Bug 171128

Summary: Regression(r204605): support for "cp874" charset alias was inadvertently dropped which may cause issues on certain Thai sites
Product: WebKit Reporter: Chris Dumez <cdumez>
Component: DOMAssignee: Chris Dumez <cdumez>
Status: RESOLVED FIXED    
Severity: Normal CC: ap, cdumez, commit-queue, darin, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on:    
Bug Blocks: 160931    
Attachments:
Description Flags
Patch
none
Patch none

Description Chris Dumez 2017-04-21 12:13:20 PDT
text on http://www1.odn.ne.jp/pasto_cbe26550/ looks garbled after r204605, when the system's default language is Japanese. This is because we inadvertently dropped support for "cp932" character encoding alias.
Comment 1 Chris Dumez 2017-04-21 12:13:36 PDT
<rdar://problem/31526844>
Comment 2 Chris Dumez 2017-04-21 12:18:24 PDT
Created attachment 307771 [details]
Patch
Comment 3 Alexey Proskuryakov 2017-04-21 13:06:51 PDT
Comment on attachment 307771 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=307771&action=review

> Source/WebCore/platform/text/TextCodecICU.cpp:96
> +DECLARE_ALIASES(windows_949, "euc-kr", "cseuckr", "csksc56011987", "iso-ir-149", "korean", "ks_c_5601-1987", "ks_c_5601-1989", "ksc5601", "ksc_5601", "ms949", "x-KSC5601", "x-windows-949", "x-uhc", "cp949");

This won't work - cp949 is a different encoding in ICU than CFString thinks it is.

I don't see why we would add support for non-standard encoding aliases here. Why not just special case them in defaultTextEncodingNameForSystemLanguage()? That's the only place where they should be supported, not in Web content.
Comment 4 Chris Dumez 2017-04-24 15:11:34 PDT
Created attachment 308016 [details]
Patch
Comment 5 Alexey Proskuryakov 2017-04-24 16:50:39 PDT
Comment on attachment 308016 [details]
Patch

r=me assuming that this is the same encoding that we used to map cp874 to in the past.
Comment 6 Chris Dumez 2017-04-24 16:56:38 PDT
(In reply to Alexey Proskuryakov from comment #5)
> Comment on attachment 308016 [details]
> Patch
> 
> r=me assuming that this is the same encoding that we used to map cp874 to in
> the past.

I am no sure it is. If you look at http://demo.icu-project.org/icu-bin/convexp. It says that "cp874" alias was using ibm-874_P100-1995 internal encoder in ICU. However, it would now use windows-874-2000. I do not know how big an issue that is.
Comment 7 Chris Dumez 2017-04-24 17:00:41 PDT
(In reply to Chris Dumez from comment #6)
> (In reply to Alexey Proskuryakov from comment #5)
> > Comment on attachment 308016 [details]
> > Patch
> > 
> > r=me assuming that this is the same encoding that we used to map cp874 to in
> > the past.
> 
> I am no sure it is. If you look at
> http://demo.icu-project.org/icu-bin/convexp. It says that "cp874" alias was
> using ibm-874_P100-1995 internal encoder in ICU. However, it would now use
> windows-874-2000. I do not know how big an issue that is.

http://demo.icu-project.org/icu-bin/convexp?conv=windows-874-2000&s=ALL
http://demo.icu-project.org/icu-bin/convexp?conv=ibm-874_P100-1995&s=ALL
Comment 8 Alexey Proskuryakov 2017-04-24 18:01:45 PDT
The table attached to bug 160931 says "cp874: (got: windows-874)". Looking at the ICU links that you posted, and at r204605, I don't see how that could be the case.

But the table is accurate, then windows-874 is definitely the way to go.
Comment 9 Alexey Proskuryakov 2017-04-24 18:21:26 PDT
> But the table

But *if* the table
Comment 10 WebKit Commit Bot 2017-04-24 18:44:35 PDT
Comment on attachment 308016 [details]
Patch

Clearing flags on attachment: 308016

Committed r215711: <http://trac.webkit.org/changeset/215711>
Comment 11 WebKit Commit Bot 2017-04-24 18:44:37 PDT
All reviewed patches have been landed.  Closing bug.