<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://bugs.webkit.org/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4.1"
          urlbase="https://bugs.webkit.org/"
          
          maintainer="admin@webkit.org"
>

    <bug>
          <bug_id>25488</bug_id>
          
          <creation_ts>2009-04-30 12:02:00 -0700</creation_ts>
          <short_desc>windows-949 returned by document.{charset,characterSet} is not recognized by most Korean web servers</short_desc>
          <delta_ts>2009-04-30 12:02:34 -0700</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WebKit</product>
          <component>Text</component>
          <version>528+ (Nightly build)</version>
          <rep_platform>All</rep_platform>
          <op_sys>All</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>DUPLICATE</resolution>
          <dup_id>25487</dup_id>
          
          <bug_file_loc>http://adx.qubi.com/openx/www/delivery/ajs.php?zoneid=35&amp;cb=1000&amp;charset=windows-949</bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>Normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Jungshik Shin">jshin</reporter>
          <assigned_to name="Jungshik Shin">jshin</assigned_to>
          <cc>ap</cc>
          

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>119494</commentid>
    <comment_count>0</comment_count>
    <who name="Jungshik Shin">jshin</who>
    <bug_when>2009-04-30 12:02:00 -0700</bug_when>
    <thetext>1. Go to http://www.qubi.com (Korean Railroad) 
2. The ad frame in the middle of the page has garbled characters (UTF-8 interpreted as EUC-KR)

That frame is at http://file.qubi.com/sg_framework/sg_framework_top/season2/adserver_www_010.html
and uses &apos;document.write()&apos; to emit a &lt;script&gt; element. It has a meta charset declaration (&apos;charset=euc-kr&apos;) at the top.

When constructing the URL for an ad to show, the script passes document.charset to the server as a CGI parameter (charset).

Because WebKit (which uses ICU) maps euc-kr to windows-949 (its superset), document.charset returns &apos;windows-949&apos;. An example of the resulting URL:

http://adx.qubi.com/openx/www/delivery/ajs.php?zoneid=35&amp;cb=1000&amp;charset=windows-949

Unfortunately, most Korean web servers do not recognize &apos;windows-949&apos;.

Firefox, although it treats EUC-KR as windows-949 when converting to Unicode, still reports the name EUC-KR. The following URL is therefore constructed, and the web server at qubi.com sends EUC-KR strings back.

http://adx.qubi.com/openx/www/delivery/ajs.php?zoneid=35&amp;cb=1000&amp;charset=EUC-KR

http://adx.qubi.com/openx/www/delivery/ajs.php?zoneid=35&amp;cb=1000&amp;charset=UTF-8 also works. 
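
For illustration only, the URL construction described above can be sketched in Python (the page itself uses JavaScript; the helper name build_ad_url is hypothetical, and the parameter values are copied from the example URLs in this report):

```python
from urllib.parse import urlencode

# Endpoint taken from the example URLs in this report.
AD_ENDPOINT = "http://adx.qubi.com/openx/www/delivery/ajs.php"


def build_ad_url(document_charset):
    # Forward whatever charset name the browser reports as the
    # "charset" CGI parameter, as the ad script appears to do.
    query = urlencode({"zoneid": 35, "cb": 1000, "charset": document_charset})
    return AD_ENDPOINT + "?" + query


# The only difference between the failing and working requests
# is the charset name that ends up in the query string.
print(build_ad_url("windows-949"))
print(build_ad_url("EUC-KR"))
```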

I was mildly worried about this issue, but bit the bullet (unforking Chrome&apos;s copy of TextCodecICU.cpp to match WebKit trunk) because I thought there would not be many web servers relying on the document.charset value. It appears that some ad-serving web servers (in Korea) use this technique to show ads in both UTF-8 and EUC-KR pages.

Before Chrome unforked its copy of TextCodecICU.cpp, it modified ICU&apos;s charset alias table to treat EUC-KR the same as windows-949 but left TextCodecICU.cpp alone (as a result, document.charset returned &apos;EUC-KR&apos; in Chrome at that time). Because Safari can&apos;t touch ICU&apos;s charset alias table, that approach is not applicable to WebKit in general.

A quick (and perhaps dirty) fix would be to add an exception to Document::encoding() so that it returns &apos;EUC-KR&apos; when the encoding name is &apos;windows-949&apos;. Perhaps there&apos;s a better way to deal with this in TextCodecICU.cpp. I haven&apos;t yet given much thought to that possibility.
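
A minimal sketch of that exception, written in Python rather than WebKit C++ and not taken from any actual patch (the table and function names are hypothetical):

```python
# Hypothetical override table: internal codec name -> name exposed to scripts.
EXPOSED_NAME_OVERRIDES = {"windows-949": "EUC-KR"}


def exposed_charset_name(internal_name):
    # Decoding would still use the internal codec (the windows-949 superset);
    # only the name reported through document.charset is changed.
    return EXPOSED_NAME_OVERRIDES.get(internal_name, internal_name)


print(exposed_charset_name("windows-949"))
print(exposed_charset_name("UTF-8"))
```

The point of the sketch is only that the override affects the reported name, not which decoder is used.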

This is Chromium bug 11242:
http://code.google.com/p/chromium/issues/detail?id=11242</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>119495</commentid>
    <comment_count>1</comment_count>
    <who name="Jungshik Shin">jshin</who>
    <bug_when>2009-04-30 12:02:34 -0700</bug_when>
    <thetext>Sorry for the dupe.


*** This bug has been marked as a duplicate of 25487 ***</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>