<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://bugs.webkit.org/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4.1"
          urlbase="https://bugs.webkit.org/"
          
          maintainer="admin@webkit.org"
>

    <bug>
          <bug_id>22962</bug_id>
          
          <creation_ts>2008-12-22 08:11:32 -0800</creation_ts>
          <short_desc>Web page encoded as &quot;Big 5 HKSCS&quot; is not decoded properly</short_desc>
          <delta_ts>2022-09-16 15:20:48 -0700</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WebKit</product>
          <component>Page Loading</component>
          <version>528+ (Nightly build)</version>
          <rep_platform>Mac</rep_platform>
          <op_sys>OS X 10.5</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>INVALID</resolution>
          
          <see_also>https://bugs.webkit.org/show_bug.cgi?id=160931</see_also>
          <bug_file_loc>http://www.mingpaonews.com/20081222/gaa1h.htm</bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords>HasReduction, InRadar</keywords>
          <priority>P2</priority>
          <bug_severity>Normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="David Kilzer (:ddkilzer)">ddkilzer</reporter>
          <assigned_to name="Nobody">webkit-unassigned</assigned_to>
          <cc>ap</cc>
    
          <cc>cdumez</cc>
          <cc>eric</cc>
          

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>103178</commentid>
    <comment_count>0</comment_count>
    <who name="David Kilzer (:ddkilzer)">ddkilzer</who>
    <bug_when>2008-12-22 08:11:32 -0800</bug_when>
    <thetext>* SUMMARY
Web page with &quot;Big5&quot; encoding specified in &lt;meta&gt; tag (and Content-Type sent as &quot;text/html&quot;) is not detected as having &quot;Big 5 HKSCS&quot; encoding and is thus not decoded properly.  The same page loaded in Firefox 3 is detected and decoded properly.

* STEPS TO REPRODUCE
1. Launch Safari/WebKit.
2. Open URL:  http://www.mingpaonews.com/20081222/gaa1h.htm

* RESULTS
Note square boxes in the text of the story, and how the text differs after switching to &quot;Big 5 HKSCS&quot; encoding via the &quot;Text Encoding&quot; item in the View menu.

* REGRESSION
Unknown.  Tested Safari 3.2.1 on Mac OS X 10.5.6 and a local debug build of WebKit r39423.  Both showed the same behavior.

* NOTES
Firefox 3 gets it right, so WebKit should be using a similar heuristic.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>103180</commentid>
    <comment_count>1</comment_count>
    <who name="David Kilzer (:ddkilzer)">ddkilzer</who>
    <bug_when>2008-12-22 08:19:56 -0800</bug_when>
    <thetext>&lt;rdar://problem/6462924&gt;</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>103184</commentid>
    <comment_count>2</comment_count>
    <who name="Alexey Proskuryakov">ap</who>
    <bug_when>2008-12-22 08:50:50 -0800</bug_when>
    <thetext>This page uses an encoding that is different from either Big5 variant supported by Safari - note the replacement characters that appear after forcing the encoding to Big 5 HKSCS.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>103186</commentid>
    <comment_count>3</comment_count>
    <who name="Alexey Proskuryakov">ap</who>
    <bug_when>2008-12-22 08:54:45 -0800</bug_when>
    <thetext>Dave, do you know for a fact that Firefox decodes the text 100% correctly? Or just that it has no square boxes, question marks and other obvious brokenness?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>103187</commentid>
    <comment_count>4</comment_count>
    <who name="David Kilzer (:ddkilzer)">ddkilzer</who>
    <bug_when>2008-12-22 09:04:34 -0800</bug_when>
    <thetext>(In reply to comment #3)
&gt; Dave, do you know for a fact that Firefox decodes the text 100% correctly? Or
&gt; just that it has no square boxes, question marks and other obvious brokenness?

Scrolling down the page, I see replacement characters in Firefox 3 as well.  They&apos;re &quot;?&quot; characters without black diamonds around them.
</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>103188</commentid>
    <comment_count>5</comment_count>
    <who name="David Kilzer (:ddkilzer)">ddkilzer</who>
    <bug_when>2008-12-22 09:04:57 -0800</bug_when>
    <thetext>I wonder if MSIE 6/7/8 handle this page any better?

</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>103189</commentid>
    <comment_count>6</comment_count>
    <who name="Alexey Proskuryakov">ap</who>
    <bug_when>2008-12-22 09:11:26 -0800</bug_when>
    <thetext>(In reply to comment #4)
&gt; Scrolling down the page, I see replacement characters in Firefox 3 as well. 
&gt; They&apos;re &quot;?&quot; characters without black diamonds around them.

Are you sure about that? These looked like normal question marks to me.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>103190</commentid>
    <comment_count>7</comment_count>
    <who name="David Kilzer (:ddkilzer)">ddkilzer</who>
    <bug_when>2008-12-22 09:19:09 -0800</bug_when>
    <thetext>(In reply to comment #6)
&gt; (In reply to comment #4)
&gt; &gt; Scrolling down the page, I see replacement characters in Firefox 3 as well. 
&gt; &gt; They&apos;re &quot;?&quot; characters without black diamonds around them.
&gt; 
&gt; Are you sure about that? These looked like normal question marks to me.

No, I am not sure.  I do not read Chinese.  :)

I don&apos;t see any &quot;square boxes&quot; or question-marks-in-black-diamonds on the page in Firefox 3.  I *do* see a character that looks like &quot;No&quot; with the &quot;o&quot; superscript and underlined (&amp;#8470;) in the Firefox page that doesn&apos;t appear in the Safari page with &quot;Big 5 HKSCS&quot; encoding.

Also note that the black diamonds in Desktop Safari when switching text encoding to &quot;Big 5 HKSCS&quot; are simply colons on the Firefox 3 page.  Could this be a missing glyph or a decoding bug?
</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>103191</commentid>
    <comment_count>8</comment_count>
    <who name="David Kilzer (:ddkilzer)">ddkilzer</who>
    <bug_when>2008-12-22 09:21:01 -0800</bug_when>
    <thetext>The equivalent character from Desktop Safari (to the &quot;No&quot; character in Firefox 3):  &amp;#22050;</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>749931</commentid>
    <comment_count>9</comment_count>
    <who name="Eric Seidel (no email)">eric</who>
    <bug_when>2012-10-24 12:44:03 -0700</bug_when>
    <thetext>It&apos;s unclear to me if this is still an issue.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1899098</commentid>
    <comment_count>10</comment_count>
    <who name="Sam Sneddon [:gsnedders]">gsnedders</who>
    <bug_when>2022-09-16 15:20:48 -0700</bug_when>
    <thetext>Archive.org doesn&apos;t seem to have archived this either, so it&apos;s not meaningfully actionable as far as I can tell.</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>