<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://bugs.webkit.org/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4.1"
          urlbase="https://bugs.webkit.org/"
          
          maintainer="admin@webkit.org"
>

    <bug>
          <bug_id>14475</bug_id>
          
          <creation_ts>2007-06-30 07:10:33 -0700</creation_ts>
          <short_desc>REGRESSION: Korean (DOS) encoding doesn&apos;t work</short_desc>
          <delta_ts>2007-06-30 09:58:06 -0700</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WebKit</product>
          <component>Page Loading</component>
          <version>523.x (Safari 3)</version>
          <rep_platform>Mac (PowerPC)</rep_platform>
          <op_sys>OS X 10.4</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>INVALID</resolution>
          
          
          <bug_file_loc>http://tomyun.pe.kr/temp/safari-encoding-bug/</bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords>InRadar, Regression</keywords>
          <priority>P1</priority>
          <bug_severity>Normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Kyungdahm Yun">tomyun</reporter>
          <assigned_to name="Nobody">webkit-unassigned</assigned_to>
          <cc>ap</cc>
          

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>5865</commentid>
    <comment_count>0</comment_count>
    <who name="Kyungdahm Yun">tomyun</who>
    <bug_when>2007-06-30 07:10:33 -0700</bug_when>
    <thetext>Characters in different encodings are detected and rendered correctly when they are in a frame that properly specifies its text encoding. But when the frame is poorly structured, the encoding is not detected. Worse, one can&apos;t even change the text encoding with an explicit menu command.

I&apos;ve put together a small test with three cases. The frames are contained in a main page that specifies a default encoding in a META tag.

Frame 1: The frame contains raw encoded characters, without any HTML markup.
Frame 2: The frame has an HTML structure, but no encoding is specified.
Frame 3: The frame has an HTML structure with the encoding properly specified.

In Safari 3.0.2 (522.12) and the nightly build, Frames 1 and 2 show the problem. Attempts to change &apos;Text Encoding&apos; in the View menu failed: when I chose any encoding other than UTF-8, nothing happened, and choosing UTF-8 changed the rendered text, but only into badly broken characters. Frame 3 renders correctly.

Firefox 2.0.0.3 and Camino 1.5 have no problem at all. They even automatically detected the proper encoding for Frames 1 and 2.

Internet Explorer 7 on Windows does a good job as well. It detected the proper encoding for all frames.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>5866</commentid>
    <comment_count>1</comment_count>
      <attachid>15325</attachid>
    <who name="Kyungdahm Yun">tomyun</who>
    <bug_when>2007-06-30 07:13:06 -0700</bug_when>
    <thetext>Created attachment 15325
3 frames with different encoding setup</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>5867</commentid>
    <comment_count>2</comment_count>
    <who name="Alexey Proskuryakov">ap</who>
    <bug_when>2007-06-30 07:24:35 -0700</bug_when>
    <thetext>(In reply to comment #0)
&gt; Worse, one can&apos;t even
&gt; change the text encoding with an explicit menu command.

I have tried, and choosing Korean (Mac OS) from the menu works for me in r23841 nightly (running with Safari 3.0.2 beta). I&apos;m wondering what is different in your case. Do you have any Safari enhancers installed?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>5855</commentid>
    <comment_count>3</comment_count>
    <who name="Kyungdahm Yun">tomyun</who>
    <bug_when>2007-06-30 08:15:44 -0700</bug_when>
    <thetext>(In reply to comment #2)
&gt; I have tried, and choosing Korean (Mac OS) from the menu works for me in r23841
&gt; nightly (running with Safari 3.0.2 beta). I&apos;m wondering what is different in
&gt; your case. Do you have any Safari enhancers installed?
&gt; 

I missed that one. Actually, I (and probably many Korean users) usually use &apos;Korean (Windows, DOS)&apos;, not &apos;Korean (Mac OS)&apos;. They are slightly different variants of the EUC-KR encoding, though I&apos;m not sure exactly how they differ. Since Windows is prevalent in Korea, the former is more commonly found on the web.

Anyway, choosing &apos;Korean (Windows, DOS)&apos; should show the same result as &apos;Korean (Mac OS)&apos; in most cases. Web pages that rendered correctly in Safari 2 are now broken in Safari 3.

Also, the automatic encoding detection feature in Safari 3 seems to be somewhat broken when the page does not specify an encoding.

PS: I don&apos;t have any enhancers installed. I once had SafariStand, but uninstalled it right after the Safari 3 beta came out.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>5851</commentid>
    <comment_count>4</comment_count>
    <who name="Alexey Proskuryakov">ap</who>
    <bug_when>2007-06-30 09:02:25 -0700</bug_when>
    <thetext>&gt; Anyway, choosing &apos;Korean (Windows, DOS)&apos; should show the same result as &apos;Korean
&gt; (Mac OS)&apos; in most cases

Yes, I also see this now. Confirming that &apos;Korean (Windows, DOS)&apos; no longer works, and renaming the bug to make clear that it tracks this specific problem.

As for automatic detection, there are two issues in fact:

1) Firefox has true encoding auto-detection (using the actual page text to guess what the correct encoding is). WebKit only has it for Japanese at the moment, although other languages could also benefit from it. I suggest adding examples of sites that need auto-detection to bug 4120.

2) In your test case, the index document explicitly specifies an encoding, while its subframes do not. WebKit used to propagate the encoding from the main frame to subframes in such cases, but we stopped doing so because this approach broke many sites. If you have examples of real-life sites that are broken because of this change, please file a new bug; maybe we could find a safer solution.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>5853</commentid>
    <comment_count>5</comment_count>
      <attachid>15327</attachid>
    <who name="Alexey Proskuryakov">ap</who>
    <bug_when>2007-06-30 09:12:42 -0700</bug_when>
    <thetext>Created attachment 15327
test case

This is a test of the cp949 encoding, which Safari tries to use when the Korean (Windows, DOS) encoding is manually selected.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>5847</commentid>
    <comment_count>6</comment_count>
    <who name="Alexey Proskuryakov">ap</who>
    <bug_when>2007-06-30 09:58:06 -0700</bug_when>
    <thetext>The ICU version shipped with Tiger doesn&apos;t support the &quot;cp949&quot; encoding. Newer ICU versions do, but there the name refers to a different encoding! See &lt;http://www.icu-project.org/icu-bin/convexp?conv=ibm-949_P110-1999&amp;s=ALL&gt;.

Firefox and MSIE do not support cp949 either, and I think it&apos;s a Safari bug that it uses this name for what is actually &quot;windows-949&quot;.

Since Safari is not open source, this needs to be fixed by Apple engineers. I have filed this as &lt;rdar://5304984&gt;. As this is not a WebKit bug, closing as INVALID (please open new bugs for related issues, as discussed above).</thetext>
  </long_desc>
      
          <attachment
              isobsolete="0"
              ispatch="0"
              isprivate="0"
          >
            <attachid>15325</attachid>
            <date>2007-06-30 07:13:06 -0700</date>
            <delta_ts>2007-06-30 07:13:06 -0700</delta_ts>
            <desc>3 frames with different encoding setup</desc>
            <filename>safari-encoding-bug.zip</filename>
            <type>application/zip</type>
            <size>1709</size>
            <attacher name="Kyungdahm Yun">tomyun</attacher>
            
              <data encoding="base64">UEsDBAoAAAAAAJC03jYAAAAAAAAAAAAAAAAUABUAc2FmYXJpLWVuY29kaW5nLWJ1Zy9VVAkAA2Bc
hkaPZIZGVXgEAPUB9QFQSwMECgAAAAAAiLTeNvG9Tms6AAAAOgAAAB8AFQBzYWZhcmktZW5jb2Rp
bmctYnVnL2ZyYW1lMS5odG1sVVQJAANQXIZGi2SGRlV4BAD1AfUBsKGzqrTZtvO4trnZu+e+xsDa
wvfEq8W4xsTHzwooZXVjLWtyIGNoYXJhY3RlcnMsIHJhdyB0ZXh0KVBLAwQUAAAACABotN42Dd8D
XQABAAAXAQAAHwAVAHNhZmFyaS1lbmNvZGluZy1idWcvZnJhbWUyLmh0bWxVVAkAAxRchkaLZIZG
VXgEAPUB9QEljjFLw0AYhufmV3zN1EKbT9tOcmQwKShULXIijtfcNTlM78LlJPhzXHURlNoiSJXa
jg5O7R/wBzi4asz0wvu+8DykHp4E9GLYhwN6NIDh2f7gMAC3jXjeDRBDGlZDz9vZBWqYyqWVWrEU
sX/sgptYm+0hFkXhFV1PmxjpKSZ2kvYw1ToXHrfc9R1SVmUIxn2nRqy0qfDXr0/zxSd0oDE2bCKg
0yRYLQ7B6kpGml/7Nef+5uH2cTP9nk3nm+evl9Vi+/azvHufrZbrDzIygL7TEFdR+9JAlDDDIitM
3gKlQahIc6liyDMRybEUvAWZ0ZkwUEqBZXHe/OP9g0puafoLUEsDBBQAAAAIAG203jaJ7+x5KQEA
AFwBAAAfABUAc2FmYXJpLWVuY29kaW5nLWJ1Zy9mcmFtZTMuaHRtbFVUCQADHlyGRotkhkZVeAQA
9QH1ASXPvU7CUBQA4Bme4ngnSGivpk5aOlhINKISUmIcL+0BGktvvT1YeRxXXUw0CDExaBBGByd8
AR/AwVUunU7O75djb9XOXO+iWYdD76QBzfZB48gFZnB+brmc17xa3tg1t3fAUyJOQwplLCLO66cM
WJ8o2eM8yzIzs0ypetxr8T4Nol0eSZmiGVDAnKKtSzqgCJxiwaaQInSWb8/T2RdYUOoqMUCwyjbP
O+uRAZIAfd7Aq2F4XWWujAljMrxRggz8PKsywhvaiPvg94VKkar1tmsctxjwtchz0u7IYOQUig+3
j3dPq/HvZDxdvfy8Lmbf73/z+4/JYr78tDtKr5Rw6BuXanNN+IQqrQDGvgzCuAdpgn7YDTGoQKJk
ggo0DSR6aXmNbRSN6nf/AVBLAwQUAAAACABItN42ihAgp/AAAABnAQAAHgAVAHNhZmFyaS1lbmNv
ZGluZy1idWcvaW5kZXguaHRtbFVUCQAD11uGRotkhkZVeAQA9QH1AXWPwU7DMAxAz+wrjCUkJtGY
0Z0g7YG2iIkOpikT4li1gU5a15EaCj/FByBxhF/gd1jSlRNcHDvP8rPlfnwTqbtZApdqmsJscZ5O
IkCP6NaPiGIVd2AsjkdwYbJKN5qJkmsELJk3p0Rt24rWF7V5IDWnkqvVmO53naLgAsOBtL/20VkR
DvYkL3mlw8/vj/evNzhUuuGhJDVRaWJppTkDO9zTj0/L5wCjes16zZ563WiEvKsCZP3CzncGeZmZ
rS5IFpF3NUegrYw6m+x3AVO3TYC+f3AEfUDrcw3QmDxAl46EHfonOvkf+T2Sv9e7JdzlP1BLAQIX
AwoAAAAAAJC03jYAAAAAAAAAAAAAAAAUAA0AAAAAAAAAEADtQQAAAABzYWZhcmktZW5jb2Rpbmct
YnVnL1VUBQADYFyGRlV4AABQSwECFwMKAAAAAACItN428b1OazoAAAA6AAAAHwANAAAAAAAAAAAA
pIFHAAAAc2FmYXJpLWVuY29kaW5nLWJ1Zy9mcmFtZTEuaHRtbFVUBQADUFyGRlV4AABQSwECFwMU
AAAACABotN42Dd8DXQABAAAXAQAAHwANAAAAAAABAAAApIHTAAAAc2FmYXJpLWVuY29kaW5nLWJ1
Zy9mcmFtZTIuaHRtbFVUBQADFFyGRlV4AABQSwECFwMUAAAACABttN42ie/seSkBAABcAQAAHwAN
AAAAAAABAAAApIElAgAAc2FmYXJpLWVuY29kaW5nLWJ1Zy9mcmFtZTMuaHRtbFVUBQADHlyGRlV4
AABQSwECFwMUAAAACABItN42ihAgp/AAAABnAQAAHgANAAAAAAABAAAApIGgAwAAc2FmYXJpLWVu
Y29kaW5nLWJ1Zy9pbmRleC5odG1sVVQFAAPXW4ZGVXgAAFBLBQYAAAAABQAFALYBAADhBAAAAAA=
</data>

          </attachment>
          <attachment
              isobsolete="0"
              ispatch="0"
              isprivate="0"
          >
            <attachid>15327</attachid>
            <date>2007-06-30 09:12:42 -0700</date>
            <delta_ts>2007-06-30 09:12:42 -0700</delta_ts>
            <desc>test case</desc>
            <filename>cp949.html</filename>
            <type>text/html</type>
            <size>347</size>
            <attacher name="Alexey Proskuryakov">ap</attacher>
            
              <data encoding="base64">PCFET0NUWVBFIEhUTUwgUFVCTElDICItLy9XM0MvL0RURCBIVE1MIDQuMDEgVHJhbnNpdGlvbmFs
Ly9FTiIgImh0dHA6Ly93d3cudzMub3JnL1RSL2h0bWw0L2xvb3NlLmR0ZCI+CjxodG1sPgo8aGVh
ZD4KCTx0aXRsZT7Hwbe5wNMgMyAoZnJhbWUgMyk8L3RpdGxlPgoJPG1ldGEgaHR0cC1lcXVpdj0i
Q29udGVudC1UeXBlIiBjb250ZW50PSJ0ZXh0L2h0bWw7IGNoYXJzZXQ9Y3A5NDkiIC8+CjwvaGVh
ZD4KPGJvZHk+CQqwobOqtNm287i2udm7577GwNrC98SrxbjGxMfPPGJyIC8+CihldWMta3IgY2hh
cmFjdGVycywgZW5jb2Rpbmcgc3BlY2lmaWVkLCBwcm9wZXIgaHRtbCB0YWdzKQo8L2JvZHk+Cjwv
aHRtbD4=
</data>

          </attachment>
      

    </bug>

</bugzilla>