RESOLVED FIXED Bug 171128
Regression(r204605): support for "cp874" charset alias was inadvertently dropped which may cause issues on certain Thai sites
https://bugs.webkit.org/show_bug.cgi?id=171128
Summary Regression(r204605): support for "cp874" charset alias was inadvertently drop...
Chris Dumez
Reported 2017-04-21 12:13:20 PDT
text on http://www1.odn.ne.jp/pasto_cbe26550/ looks garbled after r204605, when the system's default language is Japanese. This is because we inadvertently dropped support for "cp932" character encoding alias.
Attachments
Patch (6.55 KB, patch)
2017-04-21 12:18 PDT, Chris Dumez
no flags
Patch (2.31 KB, patch)
2017-04-24 15:11 PDT, Chris Dumez
no flags
Chris Dumez
Comment 1 2017-04-21 12:13:36 PDT
Chris Dumez
Comment 2 2017-04-21 12:18:24 PDT
Alexey Proskuryakov
Comment 3 2017-04-21 13:06:51 PDT
Comment on attachment 307771 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=307771&action=review > Source/WebCore/platform/text/TextCodecICU.cpp:96 > +DECLARE_ALIASES(windows_949, "euc-kr", "cseuckr", "csksc56011987", "iso-ir-149", "korean", "ks_c_5601-1987", "ks_c_5601-1989", "ksc5601", "ksc_5601", "ms949", "x-KSC5601", "x-windows-949", "x-uhc", "cp949"); This won't work - cp949 is a different encoding in ICU than CFString thinks it is. I don't see why we would add support for non-standard encoding aliases here. Why not just special case them in defaultTextEncodingNameForSystemLanguage()? That's the only place where they should be supported, not in Web content.
Chris Dumez
Comment 4 2017-04-24 15:11:34 PDT
Alexey Proskuryakov
Comment 5 2017-04-24 16:50:39 PDT
Comment on attachment 308016 [details] Patch r=me assuming that this is the same encoding that we used to map cp874 to in the past.
Chris Dumez
Comment 6 2017-04-24 16:56:38 PDT
(In reply to Alexey Proskuryakov from comment #5) > Comment on attachment 308016 [details] > Patch > > r=me assuming that this is the same encoding that we used to map cp874 to in > the past. I am no sure it is. If you look at http://demo.icu-project.org/icu-bin/convexp. It says that "cp874" alias was using ibm-874_P100-1995 internal encoder in ICU. However, it would now use windows-874-2000. I do not know how big an issue that is.
Chris Dumez
Comment 7 2017-04-24 17:00:41 PDT
(In reply to Chris Dumez from comment #6) > (In reply to Alexey Proskuryakov from comment #5) > > Comment on attachment 308016 [details] > > Patch > > > > r=me assuming that this is the same encoding that we used to map cp874 to in > > the past. > > I am no sure it is. If you look at > http://demo.icu-project.org/icu-bin/convexp. It says that "cp874" alias was > using ibm-874_P100-1995 internal encoder in ICU. However, it would now use > windows-874-2000. I do not know how big an issue that is. http://demo.icu-project.org/icu-bin/convexp?conv=windows-874-2000&s=ALL http://demo.icu-project.org/icu-bin/convexp?conv=ibm-874_P100-1995&s=ALL
Alexey Proskuryakov
Comment 8 2017-04-24 18:01:45 PDT
The table attached to bug 160931 says "cp874: (got: windows-874)". Looking at the ICU links that you posted, and at r204605, I don't see how that could be the case. But the table is accurate, then windows-874 is definitely the way to go.
Alexey Proskuryakov
Comment 9 2017-04-24 18:21:26 PDT
> But the table But *if* the table
WebKit Commit Bot
Comment 10 2017-04-24 18:44:35 PDT
Comment on attachment 308016 [details] Patch Clearing flags on attachment: 308016 Committed r215711: <http://trac.webkit.org/changeset/215711>
WebKit Commit Bot
Comment 11 2017-04-24 18:44:37 PDT
All reviewed patches have been landed. Closing bug.
Note You need to log in before you can comment on or make changes to this bug.