<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://bugs.webkit.org/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4.1"
          urlbase="https://bugs.webkit.org/"
          
          maintainer="admin@webkit.org"
>

    <bug>
          <bug_id>179307</bug_id>
          
          <creation_ts>2017-11-05 16:38:29 -0800</creation_ts>
          <short_desc>WebKit treats Big5-HKSCS as a distinct encoding from Big5, Encoding standard says it&apos;s the same</short_desc>
          <delta_ts>2022-09-27 06:44:37 -0700</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WebKit</product>
          <component>Text</component>
          <version>Safari Technology Preview</version>
          <rep_platform>Unspecified</rep_platform>
          <op_sys>Unspecified</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>DUPLICATE</resolution>
          <dup_id>216016</dup_id>
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>Normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          <blocked>179303</blocked>
          <everconfirmed>1</everconfirmed>
          <reporter name="Maciej Stachowiak">mjs</reporter>
          <assigned_to name="Nobody">webkit-unassigned</assigned_to>
          <cc>achristensen</cc>
    
    <cc>addison</cc>
    
    <cc>annevk</cc>
    
    <cc>ap</cc>
    
    <cc>darin</cc>
    
    <cc>mmaxfield</cc>
          

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>1368502</commentid>
    <comment_count>0</comment_count>
    <who name="Maciej Stachowiak">mjs</who>
    <bug_when>2017-11-05 16:38:29 -0800</bug_when>
    <thetext>WebKit treats Big5-HKSCS as a distinct encoding from Big5, but the Encoding standard says it&apos;s the same. Chrome and Firefox report Big5 as the canonical name when using the TextDecoder API. It&apos;s not clear to me whether they actually decode it differently, though; I am not sure how to make a test for that.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1368519</commentid>
    <comment_count>1</comment_count>
    <who name="Maciej Stachowiak">mjs</who>
    <bug_when>2017-11-05 18:29:18 -0800</bug_when>
    <thetext>Here are some past revisions that may explain why we have this behavior (pointed out by Darin):

https://trac.webkit.org/changeset/3611/webkit
    We changed to treat all Big5 as an alias for the Windows version (like the latest Encoding spec does)


https://trac.webkit.org/changeset/4054/webkit
    We changed to treat most Big5 character sets as Big5_HKSCS_1999, unless they were explicitly Microsoft-specific.

https://trac.webkit.org/changeset/4689/webkit
    We changed to treat most Big5 character sets as the DOS/Windows version, but left Big5-HKSCS alone.

It&apos;s not totally clear why Big5-HKSCS was left alone in that last change. I don&apos;t think this is compatible with what other browsers do, so we should probably abandon this direction. But I need to make some tests.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1368525</commentid>
    <comment_count>2</comment_count>
    <who name="Alexey Proskuryakov">ap</who>
    <bug_when>2017-11-05 19:32:48 -0800</bug_when>
    <thetext>Big5 is a large family of standards governed by various entities, and we basically never got to check if ICU supported the variant(s) that other browsers used. This is likely moot now, as Chrome also uses ICU.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1368532</commentid>
    <comment_count>3</comment_count>
    <who name="Maciej Stachowiak">mjs</who>
    <bug_when>2017-11-05 20:36:22 -0800</bug_when>
    <thetext>These are our differences from the standard on Big5-related encodings:

MISMATCH: encoding big5-hkscs is Big5 in the standard, but Big5-HKSCS in WebKit
EXTRA NAME: WebKit knows extra nonstandard name x-windows-950 for Big5
EXTRA NAME: WebKit knows extra nonstandard name windows-950 for Big5
EXTRA NAME: WebKit knows extra nonstandard name x-big5 for Big5
EXTRA NAME: WebKit knows extra nonstandard name ms950 for Big5
EXTRA NAME: WebKit knows extra nonstandard name windows-950-2000 for Big5
EXTRA ENCODING: WebKit knows nonstandard encoding Big5-HKSCS with names [&apos;big5-hkscs&apos;, &apos;big5hk&apos;, &apos;hkscs-big5&apos;, &apos;ibm-1375&apos;, &apos;ibm-1375_p100-2008&apos;]</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1368533</commentid>
    <comment_count>4</comment_count>
      <attachid>326098</attachid>
    <who name="Maciej Stachowiak">mjs</who>
    <bug_when>2017-11-05 20:41:52 -0800</bug_when>
    <thetext>Created attachment 326098
Test case for (lack of) WebKit&apos;s Big5 quirks, meant to go in LayoutTests/fast/encodings
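
The check the attachment performs can be sketched roughly as follows. This is only an illustration under stated assumptions: classifyLabel, findDeviations and the makeDecoder parameter are hypothetical names that do not appear in the attached test, which uses the shouldBe and shouldThrow helpers from js-test-pre.js instead. The two label lists are taken from the differences listed earlier in this bug.

```javascript
// Sketch only: classifyLabel, findDeviations and makeDecoder are hypothetical
// names introduced for illustration; they are not part of the attached test.

// Labels the Encoding standard maps to Big5.
const standardBig5Labels = ["big5", "big5-hkscs", "cn-big5", "csbig5", "x-x-big5"];

// Extra labels WebKit accepted at the time of this report.
const nonstandardLabels = [
    "x-windows-950", "windows-950", "x-big5", "ms950", "windows-950-2000",
    "big5hk", "hkscs-big5", "ibm-1375", "ibm-1375_p100-2008",
];

// Returns the canonical name reported for a label, or null if construction throws.
// makeDecoder is injectable so the logic can be exercised without a browser.
function classifyLabel(label, makeDecoder = (l) => new TextDecoder(l)) {
    try {
        return { label, canonical: makeDecoder(label).encoding };
    } catch (e) {
        // Any throw is treated as "label not supported".
        return { label, canonical: null };
    }
}

// Collects a human-readable line for every label that deviates from the standard.
function findDeviations(makeDecoder) {
    const deviations = [];
    for (const label of standardBig5Labels) {
        const { canonical } = classifyLabel(label, makeDecoder);
        if (canonical !== "big5")
            deviations.push(label + " resolved to " + canonical + ", expected big5");
    }
    for (const label of nonstandardLabels) {
        const { canonical } = classifyLabel(label, makeDecoder);
        if (canonical !== null)
            deviations.push(label + " was accepted as " + canonical + ", expected a throw");
    }
    return deviations;
}
```

Calling findDeviations() with no argument uses the engine TextDecoder; an empty result would mean the engine matches the label table assumed above.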

The attached test case gives exactly the spec-mandated results in Firefox and Chrome. Safari has the differences described above.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1368536</commentid>
    <comment_count>5</comment_count>
    <who name="Maciej Stachowiak">mjs</who>
    <bug_when>2017-11-05 20:58:37 -0800</bug_when>
    <thetext>Here&apos;s the Gecko bug from when they did the merge: https://bugzilla.mozilla.org/show_bug.cgi?id=912470

It seems like their Big5 supports HKSCS character sequences. But I&apos;m not sure if that&apos;s the same as our Big5-HKSCS or a superset of that and Windows-flavored Big5.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1368543</commentid>
    <comment_count>6</comment_count>
    <who name="Maciej Stachowiak">mjs</who>
    <bug_when>2017-11-05 21:56:40 -0800</bug_when>
    <thetext>Based on http://w3c-test.org/encoding/big5-encoder.html , it doesn&apos;t look like either Big5 or Big5_HKSCS encodings from ICU quite match what the Encoding standard requires, and their failures are not the same either, so merging down to one of the two is bound to cause bugs. We might need a custom Big5 codec.

ICU seems to support several apparent Big5 variants:
ibm-1373_P100-2002
windows-950-2000
ibm-950_P110-1999
ibm-1375_P100-2008
ibm-5471_P100-2006

I&apos;m not sure if any of these are the proper web variant.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1649588</commentid>
    <comment_count>7</comment_count>
    <who name="Anne van Kesteren">annevk</who>
    <bug_when>2020-05-06 07:11:42 -0700</bug_when>
    <thetext>*** Bug 159890 has been marked as a duplicate of this bug. ***</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1901469</commentid>
    <comment_count>8</comment_count>
    <who name="Anne van Kesteren">annevk</who>
    <bug_when>2022-09-27 06:27:24 -0700</bug_when>
    <thetext>According to https://wpt.fyi/results/encoding?label=master&amp;label=experimental&amp;aligned&amp;view=subtest&amp;q=big5 we pass all the tests so this was fixed at some point.

Probably by Alex?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1901480</commentid>
    <comment_count>9</comment_count>
    <who name="Anne van Kesteren">annevk</who>
    <bug_when>2022-09-27 06:44:37 -0700</bug_when>
    <thetext>Confirmed: https://github.com/WebKit/WebKit/commit/70a5c3285eca476faa66c6e6055d615c26c78fc4

*** This bug has been marked as a duplicate of bug 216016 ***</thetext>
  </long_desc>
      
          <attachment
              isobsolete="0"
              ispatch="0"
              isprivate="0"
          >
            <attachid>326098</attachid>
            <date>2017-11-05 20:41:52 -0800</date>
            <delta_ts>2017-11-05 20:41:52 -0800</delta_ts>
            <desc>Test case for (lack of) WebKit&apos;s Big5 quirks, meant to go in LayoutTests/fast/encodings</desc>
            <filename>big5-encodings.html</filename>
            <type>text/html</type>
            <size>853</size>
            <attacher name="Maciej Stachowiak">mjs</attacher>
            
              <data encoding="base64">PCFET0NUWVBFIGh0bWw+CjxodG1sPgo8aGVhZD4KPG1ldGEgY2hhcnNldD0idXRmLTgiPgo8c2Ny
aXB0IHNyYz0iLi4vLi4vcmVzb3VyY2VzL2pzLXRlc3QtcHJlLmpzIj48L3NjcmlwdD4KPC9oZWFk
Pgo8Ym9keT4KPHNjcmlwdD4KZGVzY3JpcHRpb24oIlRoaXMgdGVzdCBjaGVja3Mgc3VwcG9ydCBm
b3IgdmFyaW91cyBCaWc1IHRleHQgZW5jb2RpbmdzLiIpOwoKbGV0IGJpZzVFbmNvZGluZ3MgPSBb
ImJpZzUiLCAiYmlnNS1oa3NjcyIsICJjbi1iaWc1IiwgImNzYmlnNSIsICJ4LXgtYmlnNSJdOzsK
CmxldCBleHRyYUJpZzVFbmNvZGluZ3MgPSBbIngtd2luZG93cy05NTAiLCAid2luZG93cy05NTAi
LCAieC1iaWc1IiwgIm1zOTUwIiwgIndpbmRvd3MtOTUwLTIwMDAiLCAiYmlnNWhrIiwgImhrc2Nz
LWJpZzUiLCAiaWJtLTEzNzUiLCAiaWJtLTEzNzVfcDEwMC0yMDA4Il07CgoKZm9yIChsZXQgZW5j
b2Rpbmcgb2YgYmlnNUVuY29kaW5ncykgewogICAgbGV0IGNhbm9uaWNhbF9uYW1lX2V4cHIgPSAn
bmV3IFRleHREZWNvZGVyKCInICsgZW5jb2RpbmcgKyAnIikuZW5jb2RpbmcnOwogICAgc2hvdWxk
QmUoY2Fub25pY2FsX25hbWVfZXhwciwgIidiaWc1JyIpOwp9Cgpmb3IgKGxldCBlbmNvZGluZyBv
ZiBleHRyYUJpZzVFbmNvZGluZ3MpIHsKICAgIGxldCBjYW5vbmljYWxfbmFtZV9leHByID0gJ25l
dyBUZXh0RGVjb2RlcigiJyArIGVuY29kaW5nICsgJyIpLmVuY29kaW5nJzsKICAgIHNob3VsZFRo
cm93KGNhbm9uaWNhbF9uYW1lX2V4cHIpOwp9CgoKPC9zY3JpcHQ+CjxzY3JpcHQgc3JjPSIuLi8u
Li9yZXNvdXJjZXMvanMtdGVzdC1wb3N0LmpzIj48L3NjcmlwdD4KPC9ib2R5Pgo8L2h0bWw+Cg==
</data>

          </attachment>
      

    </bug>

</bugzilla>