WebKit treats Big5-HKSCS as a distinct encoding from Big5, but the Encoding standard treats big5-hkscs as just another label for Big5. Chrome and Firefox report Big5 as the canonical name when using the TextDecoder API. It's not clear to me whether they actually decode it differently, though; I am not sure how to write a test for that.
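One way to test whether two labels really decode differently is to decode every two-byte sequence under both and compare the results. The following is a minimal sketch of that idea, not an actual WebKit test: it uses Python's built-in `big5` and `big5hkscs` codecs as stand-ins (they are neither the browser codecs nor the Encoding standard's index), and the function name `differing_pairs` is hypothetical. The same loop could be ported to TextDecoder in a layout test.

```python
def differing_pairs(codec_a, codec_b,
                    leads=range(0x81, 0xFF), trails=range(0x40, 0xFF)):
    """Return the two-byte sequences that the two codecs decode differently.

    Undecodable input is replaced with U+FFFD, so "decodes in one codec but
    fails in the other" also counts as a difference.
    """
    diffs = []
    for lead in leads:
        for trail in trails:
            raw = bytes([lead, trail])
            a = raw.decode(codec_a, errors="replace")
            b = raw.decode(codec_b, errors="replace")
            if a != b:
                diffs.append((raw, a, b))
    return diffs


if __name__ == "__main__":
    # Stand-in codecs; a browser test would compare two TextDecoder labels.
    diffs = differing_pairs("big5", "big5hkscs")
    print(len(diffs), "two-byte sequences decode differently")
```

If the two labels were true aliases, the returned list would be empty, which is what the standard implies for big5 and big5-hkscs.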
Here are some past revisions that may explain why we have this behavior (pointed out by Darin):
https://trac.webkit.org/changeset/3611/webkit
We changed to treat all Big5 as an alias for the Windows version (like the latest Encoding spec does).
https://trac.webkit.org/changeset/4054/webkit
We changed to treat most Big5 character sets as Big5_HKSCS_1999, unless they were explicitly Microsoft-specific.
https://trac.webkit.org/changeset/4689/webkit
We changed to treat most Big5 character sets as the DOS/Windows version, but left Big5-HKSCS alone.
It's not totally clear why Big5-HKSCS was left alone in that last change. I don't think this is compatible with what other browsers do, so we should probably abandon this direction. But I need to write some tests first.
Big5 is a large family of standards governed by various entities, and we basically never got to check if ICU supported the variant(s) that other browsers used. This is likely moot now, as Chrome also uses ICU.
These are our differences from the standard on Big5-related encodings:
MISMATCH: encoding big5-hkscs is Big5 in the standard, but Big5-HKSCS in WebKit
EXTRA NAME: WebKit knows extra nonstandard name x-windows-950 for Big5
EXTRA NAME: WebKit knows extra nonstandard name windows-950 for Big5
EXTRA NAME: WebKit knows extra nonstandard name x-big5 for Big5
EXTRA NAME: WebKit knows extra nonstandard name ms950 for Big5
EXTRA NAME: WebKit knows extra nonstandard name windows-950-2000 for Big5
EXTRA ENCODING: WebKit knows nonstandard encoding Big5-HKSCS with names ['big5-hkscs', 'big5hk', 'hkscs-big5', 'ibm-1375', 'ibm-1375_p100-2008']
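A report like the one above can be produced by diffing two label tables. This is a rough sketch, not the script that was actually used: the function names are hypothetical, the `STANDARD` table holds the Big5 labels from the Encoding standard, and the `WEBKIT` table is an assumed reconstruction built only from the names quoted in this bug, on the assumption that the remaining standard labels map to Big5 in WebKit as well.

```python
# Big5 labels as listed in the Encoding standard.
STANDARD = {
    "Big5": {"big5", "big5-hkscs", "cn-big5", "csbig5", "x-x-big5"},
}

# ASSUMPTION: reconstructed from the names quoted in this bug, not read from
# WebKit's source. Standard labels with no reported difference are assumed
# to map to Big5 in WebKit too.
WEBKIT = {
    "Big5": {"big5", "cn-big5", "csbig5", "x-x-big5", "x-windows-950",
             "windows-950", "x-big5", "ms950", "windows-950-2000"},
    "Big5-HKSCS": {"big5-hkscs", "big5hk", "hkscs-big5", "ibm-1375",
                   "ibm-1375_p100-2008"},
}


def label_map(table):
    """Flatten {encoding: labels} into {label: encoding}."""
    return {label: name for name, labels in table.items() for label in labels}


def diff_tables(standard, engine):
    """List the ways the engine's label table departs from the standard's."""
    std, eng = label_map(standard), label_map(engine)
    report = []
    for label in sorted(std):
        if label in eng and eng[label] != std[label]:
            report.append(f"MISMATCH: encoding {label} is {std[label]} in the "
                          f"standard, but {eng[label]} in WebKit")
    for name in sorted(engine):
        if name not in standard:
            report.append(f"EXTRA ENCODING: WebKit knows nonstandard encoding "
                          f"{name} with names {sorted(engine[name])}")
            continue
        for label in sorted(engine[name] - set(std)):
            report.append(f"EXTRA NAME: WebKit knows extra nonstandard name "
                          f"{label} for {name}")
    return report


if __name__ == "__main__":
    print("\n".join(diff_tables(STANDARD, WEBKIT)))
```

Keying the comparison on labels rather than encoding names is what makes the big5-hkscs case show up as a mismatch instead of as a missing label.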
Created attachment 326098
Test case for (lack of) WebKit's Big5 quirks, meant to go in LayoutTests/fast/encodings
This test case gives exactly the spec-mandated results in both Firefox and Chrome. Safari has the differences described above.
Here's the Gecko bug from when they did the merge: https://bugzilla.mozilla.org/show_bug.cgi?id=912470
It seems like their Big5 supports HKSCS character sequences. But I'm not sure if that's the same as our Big5-HKSCS or a larger set combining that and Windows-flavored Big5.
Based on http://w3c-test.org/encoding/big5-encoder.html, it doesn't look like either Big5 or Big5_HKSCS encodings from ICU quite match what the Encoding standard requires, and their failures are not the same either, so merging down to one of the two is bound to cause bugs. We might need a custom Big5 codec.
ICU seems to support several apparent Big5 variants:
ibm-1373_P100-2002
windows-950-2000
ibm-950_P110-1999
ibm-1375_P100-2008
ibm-5471_P100-2006
I'm not sure if any of these are the proper web variant.
*** Bug 159890 has been marked as a duplicate of this bug. ***
According to https://wpt.fyi/results/encoding?label=master&label=experimental&aligned&view=subtest&q=big5 we pass all the tests so this was fixed at some point. Probably by Alex?
Confirmed: https://github.com/WebKit/WebKit/commit/70a5c3285eca476faa66c6e6055d615c26c78fc4 *** This bug has been marked as a duplicate of bug 216016 ***