Bug 25488
Summary: | windows-949 returned by document.{charset,characterset} is not recognized by most Korean web servers | ||
---|---|---|---|
Product: | WebKit | Reporter: | Jungshik Shin <jshin> |
Component: | Text | Assignee: | Jungshik Shin <jshin> |
Status: | RESOLVED DUPLICATE | ||
Severity: | Normal | CC: | ap |
Priority: | P2 | ||
Version: | 528+ (Nightly build) | ||
Hardware: | All | ||
OS: | All | ||
URL: | http://adx.qubi.com/openx/www/delivery/ajs.php?zoneid=35&cb=1000&charset=windows-949 |
Jungshik Shin
1. Go to http://www.qubi.com (Korean Railroad)
2. The ad frame in the middle of the page has garbled characters (UTF-8 interpreted as EUC-KR)
That frame is at http://file.qubi.com/sg_framework/sg_framework_top/season2/adserver_www_010.html
and uses document.write() to emit a <script> element. It declares its charset with a meta tag ('charset=euc-kr') at the top.
When constructing the URL for the ad to show, the script passes the value of document.charset to the server as a CGI parameter (charset).
Because WebKit (which uses ICU) maps EUC-KR to windows-949 (its superset), document.charset returns 'windows-949'. An example of the resulting URL:
http://adx.qubi.com/openx/www/delivery/ajs.php?zoneid=35&cb=1000&charset=windows-949
Unfortunately, 'windows-949' is not recognized by most Korean web servers.
Firefox, although it treats EUC-KR as windows-949 when converting to Unicode, still reports the name 'EUC-KR'. It therefore constructs the following URL, and the web server at qubi.com sends EUC-KR strings back:
http://adx.qubi.com/openx/www/delivery/ajs.php?zoneid=35&cb=1000&charset=EUC-KR
http://adx.qubi.com/openx/www/delivery/ajs.php?zoneid=35&cb=1000&charset=UTF-8 also works.
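The round trip described above can be modelled roughly as follows. This is only a sketch in C++, not the ad frame's actual JavaScript: `buildAdUrl` and `serverRecognizesCharset` are hypothetical names, and the server check merely encodes what this report observed (EUC-KR and UTF-8 work, windows-949 does not), assuming charset labels are matched case-insensitively.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical model of the ad frame's script: the charset name reported by
// the browser (document.charset) is copied verbatim into the "charset" parameter.
std::string buildAdUrl(const std::string& zoneId, const std::string& cb,
                       const std::string& documentCharset) {
    return "http://adx.qubi.com/openx/www/delivery/ajs.php?zoneid=" + zoneId +
           "&cb=" + cb + "&charset=" + documentCharset;
}

// Assumed server behaviour, based only on the observations in this report:
// "EUC-KR" and "UTF-8" are understood, "windows-949" is not.
// Case-insensitive comparison is an assumption.
bool serverRecognizesCharset(std::string name) {
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return name == "euc-kr" || name == "utf-8";
}
```

With WebKit's reported name, `serverRecognizesCharset("windows-949")` is false, which corresponds to the garbled ad; Firefox's "EUC-KR" passes.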
I was mildly worried about this issue, but bit the bullet (unforking Chrome's copy of TextCodecICU.cpp to match the WebKit trunk) because I thought few web servers would rely on the document.charset value. It appears that some ad-serving web servers in Korea use this technique to show ads in pages encoded in either UTF-8 or EUC-KR.
In the past (before Chrome unforked its copy of TextCodecICU.cpp), Chrome modified ICU's charset alias table to treat EUC-KR the same as windows-949 but left TextCodecICU.cpp alone; as a result, document.charset used to return 'EUC-KR' in Chrome. Because Safari can't modify ICU's charset alias table, that approach isn't applicable to WebKit in general.
A quick (and perhaps dirty) fix would be to add an exception to Document::encoding() so that it returns 'EUC-KR' when the encoding name is 'windows-949'. There may be a better way to handle this in TextCodecICU.cpp, but I haven't given that possibility much thought yet.
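A minimal sketch of that exception, written as a free function rather than as a change to the real Document::encoding() (whose body is not shown in this report). The name `exposedEncodingName` is hypothetical; the idea is that only the name exposed to scripts changes, while decoding still uses windows-949.

```cpp
#include <string>

// Hypothetical stand-in for the exception proposed for Document::encoding():
// report the legacy label "EUC-KR" when the decoder's canonical name is
// "windows-949" (the ICU superset actually used for decoding).
// Any other encoding name is passed through unchanged.
std::string exposedEncodingName(const std::string& canonicalName) {
    if (canonicalName == "windows-949")
        return "EUC-KR";
    return canonicalName;
}
```

Keeping the mapping at the point where the name is exposed, rather than in the codec lookup, leaves the superset decoding behaviour untouched.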
This is Chromium bug 11242:
http://code.google.com/p/chromium/issues/detail?id=11242
Jungshik Shin
Sorry for the dupe.
*** This bug has been marked as a duplicate of 25487 ***