Bug 179309

Summary: iOS supports some text encodings supposedly due to lack of TEC that aren't supported by the TEC decoder on macOS
Product: WebKit Reporter: Maciej Stachowiak <mjs>
Component: TextAssignee: Maciej Stachowiak <mjs>
Status: RESOLVED FIXED    
Severity: Normal CC: ap, commit-queue, darin, mitz, mmaxfield, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: Safari Technology Preview   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on:    
Bug Blocks: 179303    
Attachments:
Description Flags
Patch
none
Patch for landing none

Description Maciej Stachowiak 2017-11-05 18:19:00 PST
PLATFORM PARITY: iOS WebKit knows nonstandard encoding macos-6-10.2 with names ['macos-6-10.2'], but macOS WebKit doesn't
PLATFORM PARITY: iOS WebKit knows nonstandard encoding macos-6_2-10.4 with names ['macos-6_2-10.4', 'x-mac-greek', 'windows-10006', 'macgr', 'x-macgreek'], but macOS WebKit doesn't
PLATFORM PARITY: iOS WebKit knows nonstandard encoding macos-35-10.2 with names ['macos-35-10.2', 'x-mac-turkish', 'windows-10081', 'mactr', 'x-macturkish'], but macOS WebKit doesn't
PLATFORM PARITY: iOS WebKit knows nonstandard encoding macos-29-10.2 with names ['macos-29-10.2', 'x-mac-centraleurroman', 'windows-10029', 'x-mac-ce', 'macce', 'maccentraleurope', 'x-maccentraleurope'], but macOS WebKit doesn't
PLATFORM PARITY: iOS WebKit knows nonstandard encoding softbank-sjis with names ['softbank-sjis'], but macOS WebKit doesn't
PLATFORM PARITY: iOS WebKit knows nonstandard encoding macos-7_3-10.2 with names ['macos-7_3-10.2', 'x-mac-cyrillic', 'windows-10007', 'mac-cyrillic', 'maccy', 'x-maccyrillic', 'x-macukraine'], but macOS WebKit doesn't
Comment 1 Maciej Stachowiak 2017-11-05 18:39:31 PST
Some of these appear to be redundant aliases for encodings handled by the general ICU code path. Others are just unnecessary macos-whatever names. The only substantive difference is softbank-sjis. That is not supported by other browsers, so we probably don't need it.

Tentatively, I think we can delete this whole chunk of additions.
Comment 2 Alexey Proskuryakov 2017-11-05 19:28:59 PST
I’ll need to look up the references, but there is a lot of history behind the SoftBank encoding, and there is a very good chance that it’s still needed. But maybe it’s only needed by native clients, and can be hidden from the web. 

The other discrepancies surprise me, I thought it was the opposite - Mac supported those legacy Mac encodings, but iOS didn’t. How did you test?
Comment 3 Maciej Stachowiak 2017-11-05 20:25:31 PST
The results above are a bit misleading. The general WenKit ICUE code path supports many of these encodings, just with a different canonical name.

Actually not supported:
macos-6-10.2
macos-6_2-10.4
macos-35-10.2
macos-29-10.2
macos-7_3-10.2
softbank-sjis

Supported, but with different canonical name:

['x-mac-greek', 'windows-10006', 'macgr', 'x-macgreek'] with canonical name 'x-mac-greek' instead of 'macos-6_2-10.4'
['x-mac-turkish', 'windows-10081', 'mactr', 'x-macturkish'] with canonical name 'x-mac-turkish' instead 'of macos-35-10.2'
['x-mac-centraleurroman', 'windows-10029', 'x-mac-ce', 'macce', 'maccentraleurope', 'x-maccentraleurope'] with canonical name 'x-mac-centraleurroman' instead of 'macos-29-10.2'
['x-mac-cyrillic', 'windows-10007', 'mac-cyrillic', 'maccy', 'x-maccyrillic', 'x-macukraine'] with canonical name 'x-mac-cyrillic' instead of 'macos-7_3-10.2

I believe the iOS-specific ICU code can all be removed with the possible exception of 'softbank-sjis'. I am not sure how to find out if that is actually required for anything. It might date back to when WebKit was used for all iOS text. The generic path does support all of "shift-jis", "csshiftjis", "ms932", "ms_kanji", "sjis", "windows-31j", "x-sjis". I don't know of softbank-sjis is different. I cannot find any references for this encoding name. I do see sjis-softbank and other such names mentioned. For now, I think the right move is to delete the other aliases besides softbank-sjis, while we determine if it is actually needed for anything.
Comment 4 Alexey Proskuryakov 2017-11-06 10:22:25 PST
The SoftBank variant is still needed for e-mail, per rdar://problem/27577436. I don't know which aliases are needed for it. It may be enough to support it in API, not when parsing HTML, again not quite sure how iOS Mail works.
Comment 5 Alexey Proskuryakov 2017-11-06 10:23:59 PST
I meant rdar://problem/26935599
Comment 6 Maciej Stachowiak 2017-11-06 16:06:56 PST
I wonder how to find out if WebKit needs to specifically support softbank-sjis encoding on iOS (but not docomo-sjis or kddi-sjis). It was hard to tell from the Radars if it's required in WebKit or just in ICU. They reference emails where the only encoding mentioned is UTF-8. Maybe best discussed offline.
Comment 7 Maciej Stachowiak 2017-11-08 00:41:42 PST
Created attachment 326314 [details]
Patch
Comment 8 Darin Adler 2017-11-08 09:02:39 PST
Comment on attachment 326314 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=326314&action=review

> Source/WebCore/platform/text/TextCodecICU.cpp:173
> +    // FIXME: this may not be needed any more.

Should should mention bug 179416, capitalize "this".

> Source/WebCore/platform/text/TextCodecICU.cpp:219
> +    // FIXME: this may not be needed any more.

Ditto.
Comment 9 Maciej Stachowiak 2017-11-08 11:04:02 PST
Created attachment 326340 [details]
Patch for landing
Comment 10 WebKit Commit Bot 2017-11-08 11:35:22 PST
Comment on attachment 326340 [details]
Patch for landing

Clearing flags on attachment: 326340

Committed r224589: <https://trac.webkit.org/changeset/224589>
Comment 11 WebKit Commit Bot 2017-11-08 11:35:24 PST
All reviewed patches have been landed.  Closing bug.
Comment 12 Radar WebKit Bug Importer 2017-11-15 09:40:28 PST
<rdar://problem/35562205>