Bug 22962
Summary: | Web page encoded as "Big 5 HKSCS" is not decoded properly | ||
---|---|---|---|
Product: | WebKit | Reporter: | David Kilzer (:ddkilzer) <ddkilzer> |
Component: | Page Loading | Assignee: | Nobody <webkit-unassigned> |
Status: | RESOLVED INVALID | ||
Severity: | Normal | CC: | ap, cdumez, eric |
Priority: | P2 | Keywords: | HasReduction, InRadar |
Version: | 528+ (Nightly build) | ||
Hardware: | Mac | ||
OS: | OS X 10.5 | ||
URL: | http://www.mingpaonews.com/20081222/gaa1h.htm | ||
See Also: | https://bugs.webkit.org/show_bug.cgi?id=160931 |
David Kilzer (:ddkilzer)
* SUMMARY
Web page with "Big5" encoding specified in <meta> tag (and Content-Type sent as "text/html") is not detected as having "Big 5 HKSCS" encoding and is thus not decoded properly. The same page loaded in Firefox 3 is detected and decoded properly.
* STEPS TO REPRODUCE
1. Launch Safari/WebKit.
2. Open URL: http://www.mingpaonews.com/20081222/gaa1h.htm
* RESULTS
Note square boxes in the text of the story, and how the text differs after switching to "Big 5 HKSCS" encoding via the "Text Encoding" item in the View menu.
* REGRESSION
Unknown. Tested Safari 3.2.1 on Mac OS X 10.5.6 and a local debug build of WebKit r39423. Both showed the same behavior.
* NOTES
Firefox 3 gets it right, so WebKit should be using a similar heuristic.
David Kilzer (:ddkilzer)
<rdar://problem/6462924>
Alexey Proskuryakov
This page uses an encoding that is different from either Big5 variant supported by Safari - note the replacement characters that appear after forcing the encoding to Big 5 HKSCS.
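The check described above (force a Big5 variant and look for replacement characters) can be sketched outside the browser. This is a minimal illustration only, assuming Python's built-in `big5` and `big5hkscs` codecs as stand-ins for the two variants; they are not necessarily the same tables Safari uses. `compare_decodings` is a hypothetical helper name, and the sample bytes are made up rather than taken from the page in question.

```python
REPLACEMENT = "\ufffd"  # U+FFFD, what a decoder emits for bytes it cannot map


def compare_decodings(raw: bytes, encodings=("big5", "big5hkscs")):
    """Decode the same bytes under each candidate encoding.

    Returns {encoding_name: (decoded_text, number_of_U+FFFD_characters)}.
    """
    results = {}
    for name in encodings:
        text = raw.decode(name, errors="replace")
        results[name] = (text, text.count(REPLACEMENT))
    return results


if __name__ == "__main__":
    # Made-up sample: two ordinary Big5 characters followed by an invalid byte.
    sample = "中文".encode("big5") + b"\xff"
    for name, (text, bad) in compare_decodings(sample).items():
        print(f"{name}: {text!r} ({bad} replacement character(s))")
```

If every candidate still yields a nonzero count on the real page bytes, that would be consistent with the page using an encoding that matches neither variant.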
Alexey Proskuryakov
Dave, do you know for a fact that Firefox decodes the text 100% correctly? Or just that it has no square boxes, question marks and other obvious brokenness?
David Kilzer (:ddkilzer)
(In reply to comment #3)
> Dave, do you know for a fact that Firefox decodes the text 100% correctly? Or
> just that it has no square boxes, question marks and other obvious brokenness?
Scrolling down the page, I see replacement characters in Firefox 3 as well. They're "?" characters without black diamonds around them.
David Kilzer (:ddkilzer)
I wonder if MSIE 6/7/8 handle this page any better?
Alexey Proskuryakov
(In reply to comment #4)
> Scrolling down the page, I see replacement characters in Firefox 3 as well.
> They're "?" characters without black diamonds around them.
Are you sure about that? These looked like normal question marks to me.
David Kilzer (:ddkilzer)
(In reply to comment #6)
> (In reply to comment #4)
> > Scrolling down the page, I see replacement characters in Firefox 3 as well.
> > They're "?" characters without black diamonds around them.
>
> Are you sure about that? These looked like normal question marks to me.
No, I am not sure. I do not read Chinese. :)
I don't see any "square boxes" or question-marks-in-black-diamonds on the page in Firefox 3. I *do* see a character that looks like "No" with the "o" superscript and underlined (№) in the Firefox page that doesn't appear in the Safari page with "Big 5 HKSCS" encoding.
Also note that the black diamonds in Desktop Safari when switching text encoding to "Big 5 HKSCS" are simply colons on the Firefox 3 page. Could this be a missing glyph or a decoding bug?
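One way to separate the two possibilities without reading Chinese is to look at the decoded code points rather than at what is drawn on screen. The sketch below is an assumption-laden illustration, not anything WebKit or Firefox does: `tally_suspect_characters` is a hypothetical helper that counts U+FFFD (the decoder gave up), literal "?" (U+003F, possibly a real question mark), and private-use code points (valid output that many fonts cannot draw, which tends to show up as boxes).

```python
import unicodedata


def tally_suspect_characters(text: str) -> dict:
    """Count characters that hint at decoding trouble versus missing glyphs."""
    counts = {"replacement": 0, "question_mark": 0, "private_use": 0}
    for ch in text:
        if ch == "\ufffd":
            # Decoder could not map the bytes at all.
            counts["replacement"] += 1
        elif ch == "?":
            # Ordinary ASCII question mark; may be legitimate punctuation.
            counts["question_mark"] += 1
        elif unicodedata.category(ch) == "Co":
            # Private-use code point: decoded fine, but a font may lack a glyph.
            counts["private_use"] += 1
    return counts
```

A high `replacement` count points toward a decoding problem, while boxes with no U+FFFD in the text would point toward a font or glyph issue.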
David Kilzer (:ddkilzer)
The equivalent character from Desktop Safari (to the "No" character in Firefox 3): 嘢
Eric Seidel (no email)
It's unclear to me if this is still an issue.
Sam Sneddon [:gsnedders]
Archive.org doesn't seem to have archived this page either, so it's not meaningfully actionable as far as I can tell.