Bug 19291 - Chinese characters shown as garbage on search.yesky.com
Alias: None
Product: WebKit
Classification: Unclassified
Component: Evangelism
Version: 525.x (Safari 3.1)
Hardware: All All
Importance: P2 Normal
Assignee: Nobody
URL: http://www.yesky.com
Depends on:
Reported: 2008-05-28 07:30 PDT by Hong Zhao
Modified: 2011-03-21 11:53 PDT

minimal test case (175 bytes, text/html)
2008-07-29 02:04 PDT, Robert Blaut

Description Hong Zhao 2008-05-28 07:30:09 PDT
1. Go to www.yesky.com and wait until the site has loaded completely.
2. Type some text in the search bar at the top of the page and press Search.

Expected outcome:
Search result is displayed.

Actual outcome:
A garbled page is displayed; Chinese characters are shown as garbage.

Note: This page works fine in IE and Firefox. I used Ethereal to compare the HTTP request data sent to the server with what IE sends, and can't see any difference. After the page is loaded (showing garbage characters instead of Chinese), the page itself declares its charset as gb2312, but that charset is not used. If I save the page and reload it from the saved file, the page is shown in Chinese. It looks like WebKit takes the HTTP encoding instead of the charset declared inside the page.
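
The precedence suspected above can be sketched roughly as follows. This is only an illustration, not WebKit's actual code: the function names, the 1024-byte prescan window, and the windows-1252 default are all assumptions made for the example.

```python
import re

def charset_from_content_type(header_value):
    """Return the charset parameter of a Content-Type header value, or None."""
    match = re.search(r'charset\s*=\s*"?([^\s;"]+)', header_value, re.IGNORECASE)
    return match.group(1) if match else None

def charset_from_meta(html_bytes):
    """Return the charset declared in a <meta> tag near the start of the page, or None."""
    # Assumption: only the first 1024 bytes are scanned for a declaration.
    head = html_bytes[:1024].decode("ascii", errors="replace")
    match = re.search(r'<meta[^>]*charset\s*=\s*["\']?([^\s"\'/>;]+)', head, re.IGNORECASE)
    return match.group(1) if match else None

def pick_charset(content_type, html_bytes, default="windows-1252"):
    """The HTTP header wins over the in-page declaration; the default is a last resort."""
    return (charset_from_content_type(content_type)
            or charset_from_meta(html_bytes)
            or default)

page = b'<html><head><meta http-equiv="Content-Type" content="text/html; charset=gb2312"></head>'

# Served over HTTP with a charset in the header: the header value is used.
print(pick_charset("text/html; charset=iso8859_1", page))  # iso8859_1
# Loaded from a saved file (no charset in the header): the <meta> value is used.
print(pick_charset("text/html", page))                     # gb2312
```

Under these assumptions, a page that is garbled when served but displays correctly from a saved copy is exactly what one would expect.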
Comment 1 Ed Ford 2008-07-23 21:39:39 PDT
A packet dump shows the HTTP response header coming down the pipe with charset=iso8859_1.  This is the charset that WebKit is using to display the page.  Switching the charset to GB2312 from the View menu changes the page to display the proper Chinese characters.

Compared results against Firefox 3 -- Firefox renders the page with the GB2312 charset despite the iso8859_1 info coming from the header.  After changing the Firefox charset to iso8859_1, exactly the same sequences of garbage characters are shown as in WebKit.  With both browsers set to use GB2312, the Chinese character sequence is the same.
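
The effect described above can be reproduced in a few lines of Python. The sample string is made up for illustration and is not taken from the yesky.com page.

```python
# Two Chinese characters ("Chinese language"), encoded the way the server sends them.
text = "中文"
raw = text.encode("gb2312")          # b'\xd6\xd0\xce\xc4'

# Decoding the GB2312 bytes as ISO-8859-1 turns each byte into one Latin-1 character.
garbled = raw.decode("iso-8859-1")
print(garbled)                       # ÖÐÎÄ

# The bytes themselves are intact, so choosing GB2312 recovers the original text.
restored = garbled.encode("iso-8859-1").decode("gb2312")
print(restored == text)              # True
```

Because ISO-8859-1 maps every byte to a character, the wrong decoding never fails; it just yields the same garbage in any browser, which matches the Firefox and WebKit comparison.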
Comment 2 Alexey Proskuryakov 2008-07-23 22:53:55 PDT
So, Firefox does not recognize the encoding name "iso8859_1", unlike "iso8859-1". This is a violation of HTML5 rules: "When comparing a string specifying a character encoding with the name or alias of a character encoding to determine if they are equal, user agents must ignore all characters in the ranges U+0009 to U+000D, U+0020 to U+002F, U+003A to U+0040, U+005B to U+0060, and U+007B to U+007E (all whitespace and punctuation characters in ASCII) in both names, and then perform the comparison case-insensitively."
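
A minimal sketch of the quoted comparison rule, for illustration only. The helper names are hypothetical and this is not code from any browser; the ranges are copied from the quoted text.

```python
# Code point ranges that the quoted rule says to ignore (ASCII whitespace and punctuation).
IGNORED_RANGES = [(0x09, 0x0D), (0x20, 0x2F), (0x3A, 0x40), (0x5B, 0x60), (0x7B, 0x7E)]

def is_ignored(ch):
    """True if the character falls in one of the ignored ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in IGNORED_RANGES)

def normalize_label(label):
    """Drop ignored characters, then lowercase for a case-insensitive comparison."""
    return "".join(ch for ch in label if not is_ignored(ch)).lower()

def same_encoding_name(a, b):
    """Compare two encoding names under the quoted rule."""
    return normalize_label(a) == normalize_label(b)

print(same_encoding_name("iso8859_1", "iso8859-1"))   # True
print(same_encoding_name("iso8859_1", "ISO-8859-1"))  # True
print(same_encoding_name("iso8859_1", "gb2312"))      # False
```

Under this rule the underscore (U+005F) and the hyphen (U+002D) are both dropped, so "iso8859_1" and "iso8859-1" compare equal.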
Comment 3 Robert Blaut 2008-07-29 02:04:31 PDT
Created attachment 22534
minimal test case
Comment 4 Robert Blaut 2008-07-29 02:11:42 PDT
Opera 9.50 and WebKit behave correctly here according to the HTML5 spec, so the report doesn't describe a bug in WebKit.

A charset definition should be corrected on the reported site. Classified as an evangelism bug.
Comment 5 Robert Blaut 2008-12-23 05:33:19 PST
As of my testing today, the reported site appears to be working fine. The correct charset is sent to the latest WebKit :) The bug is fixed.
Comment 6 Joel Parks 2011-03-21 11:53:41 PDT
re-purposing InTSW keyword for use by QtWebkit team