WebKit Bugzilla
RESOLVED FIXED
19291
Chinese characters shown as garbage on search.yesky.com
https://bugs.webkit.org/show_bug.cgi?id=19291
Hong Zhao
Reported
2008-05-28 07:30:09 PDT
1. Go to www.yesky.com and wait until the site has loaded completely.
2. Type a string in the search bar at the top of the page and press search.

Expected outcome: The search result is displayed.
Actual outcome: A garbled page is displayed; Chinese characters are shown as garbage.

Note: This page works fine in IE and Firefox. Using Ethereal to compare the HTTP request data sent to the server with what IE sends, I can't see any difference. After the page is loaded (showing garbage characters instead of Chinese), the page itself declares its charset as gb2312, but that charset is not used. If I save the page and reload it from the saved file, the page is shown in Chinese. It looks like WebKit takes the encoding from HTTP instead of the charset declared inside the page.
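The behaviour described in the note (HTTP charset wins over the in-page declaration, and the saved file works because no HTTP header is involved) can be sketched as follows. This is a simplified illustration, not WebKit's actual code: the function names are hypothetical, the regular expressions are deliberately naive, and real browsers also consider byte order marks, user overrides, and other signals.

```python
import re

def charset_from_content_type(header):
    """Extract the charset parameter from a Content-Type header value, if any."""
    match = re.search(r'charset\s*=\s*"?([^\s;"]+)', header or "", re.IGNORECASE)
    return match.group(1) if match else None

def charset_from_meta(html_bytes):
    """Naive scan of the first 1024 bytes for a charset declared in a <meta> tag."""
    match = re.search(
        rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_\-]+)',
        html_bytes[:1024],
        re.IGNORECASE,
    )
    return match.group(1).decode("ascii") if match else None

def effective_charset(content_type, html_bytes):
    """Assumed precedence: the HTTP header wins; <meta> is only a fallback."""
    return charset_from_content_type(content_type) or charset_from_meta(html_bytes)

page = b'<meta http-equiv="Content-Type" content="text/html; charset=gb2312">'

# Served over HTTP with the header reported in this bug.
print(effective_charset("text/html; charset=iso8859_1", page))
# Loaded from a saved file: no HTTP header, so the <meta> declaration is used.
print(effective_charset(None, page))
```

Under these assumptions the first call picks the header value and the second falls back to gb2312, which matches the difference between the live page and the saved copy.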
Attachments
minimal test case (175 bytes, text/html), 2008-07-29 02:04 PDT, Robert Blaut, no flags
Ed Ford
Comment 1
2008-07-23 21:39:39 PDT
A packet dump shows the HTTP response header coming down the pipe with charset=iso8859_1. This is the charset that WebKit is using to display the page. Switching the charset to GB2312 from the View menu changes the page to display the proper Chinese characters. Compared results against Firefox 3: Firefox renders the page with the GB2312 charset despite the iso8859_1 value in the header. With the Firefox charset changed to iso8859_1, exactly the same sequences of garbage characters are shown as in WebKit. With both browsers set to GB2312, the Chinese character sequence is the same.
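The garbage described above is what results when GB2312-encoded bytes are decoded as ISO-8859-1. A minimal sketch using Python's built-in codecs; the sample string is an arbitrary illustration, not text taken from the reported site:

```python
# Two Chinese characters, encoded the way the page declares itself (gb2312).
text = "中文"
raw = text.encode("gb2312")

# Decoding those bytes as ISO-8859-1 never fails, because every byte maps to
# some Latin-1 character; the result is simply wrong.
garbled = raw.decode("iso-8859-1")
print(garbled)

# Decoding with the charset the page actually uses recovers the original text.
print(raw.decode("gb2312"))
```

Because the ISO-8859-1 decoding is deterministic, any two browsers that make the same wrong choice show the same garbage sequence, which is consistent with the comparison between WebKit and Firefox above.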
Alexey Proskuryakov
Comment 2
2008-07-23 22:53:55 PDT
So, Firefox does not recognize the encoding name "iso8859_1", unlike "iso8859-1". This is a violation of HTML5 rules: "When comparing a string specifying a character encoding with the name or alias of a character encoding to determine if they are equal, user agents must ignore the all characters in the ranges U+0009 to U+000D, U+0020 to U+002F, U+003A to U+0040, U+005B to U+0060, and U+007B to U+007E (all whitespace and punctuation characters in ASCII) in both names, and then perform the comparison case-insensitively."
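The comparison rule quoted above can be sketched as follows. The function names are hypothetical, and this only illustrates the quoted draft text; it is not the code of any browser.

```python
# ASCII ranges (inclusive) that the quoted rule says must be ignored.
IGNORED_RANGES = [
    (0x09, 0x0D),
    (0x20, 0x2F),
    (0x3A, 0x40),
    (0x5B, 0x60),
    (0x7B, 0x7E),
]

def normalize_label(label):
    """Drop ignored whitespace/punctuation characters and lowercase the rest."""
    kept = [
        ch for ch in label
        if not any(lo <= ord(ch) <= hi for lo, hi in IGNORED_RANGES)
    ]
    return "".join(kept).lower()

def labels_match(a, b):
    """Compare two encoding names under the quoted rule."""
    return normalize_label(a) == normalize_label(b)

print(labels_match("iso8859_1", "ISO-8859-1"))
print(labels_match("iso8859_1", "gb2312"))
```

Under this rule "iso8859_1" and "ISO-8859-1" normalize to the same string, since both the underscore (U+005F) and the hyphen (U+002D) fall inside the ignored ranges.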
Robert Blaut
Comment 3
2008-07-29 02:04:31 PDT
Created attachment 22534: minimal test case
Robert Blaut
Comment 4
2008-07-29 02:11:42 PDT
Opera 9.50 and WebKit behave correctly here according to the HTML5 spec, so the report doesn't describe a bug in WebKit. The charset definition should be corrected on the reported site. Classified as an evangelism bug.
Robert Blaut
Comment 5
2008-12-23 05:33:19 PST
I tested today and the reported site appears to be working fine. The correct charset is sent to the latest WebKit :) The bug is fixed.
Joel Parks
Comment 6
2011-03-21 11:53:41 PDT
re-purposing InTSW keyword for use by QtWebkit team