Bug 14475 - REGRESSION: Korean (DOS) encoding doesn't work
Status: RESOLVED INVALID
Alias: None
Product: WebKit
Classification: Unclassified
Component: Page Loading
Version: 523.x (Safari 3)
Hardware: Mac (PowerPC) OS X 10.4
Importance: P1 Normal
Assignee: Nobody
URL: http://tomyun.pe.kr/temp/safari-encod...
Keywords: InRadar, Regression
Depends on:
Blocks:
 
Reported: 2007-06-30 07:10 PDT by Kyungdahm Yun
Modified: 2007-06-30 09:58 PDT
CC List: 1 user

See Also:


Attachments
3 frames with different encoding setup (1.67 KB, application/zip)
2007-06-30 07:13 PDT, Kyungdahm Yun
test case (347 bytes, text/html)
2007-06-30 09:12 PDT, Alexey Proskuryakov

Description Kyungdahm Yun 2007-06-30 07:10:33 PDT
Characters in different encodings are detected and rendered correctly when they are in a frame that properly specifies its text encoding. But when the frame is poorly structured, the encoding is not detected. What's worse, one can't even change the text encoding with an explicit menu command.

I've made a small test covering different cases. The frames are contained in a main page that specifies a default encoding in a META tag.

Frame 1: the frame contains encoded characters in raw form, without any HTML markup.
Frame 2: the frame has an HTML structure, but no encoding is specified.
Frame 3: the frame has an HTML structure with the encoding specified properly.

In Safari 3.0.2 (522.12) and the nightly build, Frames 1 and 2 show the problem. Attempts to change the encoding via 'Text Encoding' in the View menu failed: choosing any encoding other than UTF-8 did nothing, and choosing UTF-8 changed the rendered text but left the characters badly garbled. Frame 3 renders correctly.

Firefox 2.0.0.3 and Camino 1.5 have no problem at all. They even automatically detected the proper encoding for Frames 1 and 2.

Internet Explorer 7 on Windows does a good job as well. It detected the proper encoding for all frames.
Comment 1 Kyungdahm Yun 2007-06-30 07:13:06 PDT
Created attachment 15325
3 frames with different encoding setup
Comment 2 Alexey Proskuryakov 2007-06-30 07:24:35 PDT
(In reply to comment #0)
> What's worse, one can't even
> change the text encoding with an explicit menu command.

I have tried, and choosing Korean (Mac OS) from the menu works for me in r23841 nightly (running with Safari 3.0.2 beta). I'm wondering what is different in your case. Do you have any Safari enhancers installed?
Comment 3 Kyungdahm Yun 2007-06-30 08:15:44 PDT
(In reply to comment #2)
> I have tried, and choosing Korean (Mac OS) from the menu works for me in r23841
> nightly (running with Safari 3.0.2 beta). I'm wondering what is different in
> your case. Do you have any Safari enhancers installed?
> 

I missed that one. Actually, I (and probably many Korean users) usually use 'Korean (Windows, DOS)', not 'Korean (Mac OS)'. They are slightly different variants of the EUC-KR encoding, though I'm not sure exactly which parts differ. Since Windows is prevalent in Korea, the former is more commonly found on the web.
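The difference can be illustrated briefly. Plain EUC-KR encodes only the 2,350 precomposed Hangul syllables of KS X 1001, while Microsoft's code page 949 (Unified Hangul Code, the encoding behind 'Korean (Windows, DOS)') keeps those byte values unchanged and adds the remaining 8,822 modern syllables in byte ranges that EUC-KR leaves unassigned. The sketch below uses Python's standard `euc_kr` and `cp949` codecs purely as a stand-in; it assumes nothing about how Safari or ICU map their menu names to converters, and `try_decode` is a hypothetical helper.

```python
from typing import Optional


def try_decode(data: bytes, codec: str) -> Optional[str]:
    """Decode strictly; return None instead of raising on invalid bytes."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError:
        return None


# Text made only of KS X 1001 syllables has identical bytes in both codecs,
# because CP949 keeps every EUC-KR code point in the same place.
common = "한국어"
assert common.encode("euc_kr") == common.encode("cp949")

# '똠' is not one of the 2,350 KS X 1001 syllables, so CP949 places it in its
# extension area, which uses byte values that EUC-KR never assigns.
uhc_only = "똠".encode("cp949")

print(try_decode(uhc_only, "cp949"))   # decodes back to the original syllable
print(try_decode(uhc_only, "euc_kr"))  # None: not valid strict EUC-KR
```

This is why pages written with the Windows variant can contain byte sequences that a decoder treating them as strict EUC-KR rejects, even though most text looks identical under both.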

Anyway, choosing 'Korean (Windows, DOS)' should show the same result as 'Korean (Mac OS)' in most cases. Web pages that rendered correctly in Safari 2 are broken in Safari 3.

Also, the automatic encoding detection in Safari 3 seems somewhat broken when the page does not specify an encoding.

PS: I don't have any enhancers installed. I once had SafariStand, but uninstalled it right after the Safari 3 beta came out.
Comment 4 Alexey Proskuryakov 2007-06-30 09:02:25 PDT
> Anyway, choosing 'Korean (Windows, DOS)' should show the same result as 'Korean
> (Mac OS)' in most cases

Yes, I see this now too. Confirming that 'Korean (Windows, DOS)' no longer works; renaming the bug to make clear that it tracks this specific problem.

As for automatic detection, there are actually two separate issues:

1) Firefox has true encoding auto-detection (using the actual page text to guess what the correct encoding is). WebKit only has it for Japanese at the moment, although other languages could also benefit from it. I suggest adding examples of sites that need auto-detection to bug 4120.

2) In your test case, the index document explicitly specifies an encoding, while its subframes do not. WebKit used to propagate the encoding from the main frame to subframes in such cases, but we stopped doing so because this approach broke many sites. If you have examples of real-life sites that are broken because of this change, please file a new bug; maybe we could find a safer solution.
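The behavior change in point 2 can be modeled roughly as follows. This is a hypothetical sketch, not WebKit's code: `resolve_frame_encoding` and its parameters are invented for illustration, and the precedence shown (the frame's own declared charset, then optionally the parent frame's encoding, then a default) is an assumption based only on the description above. The example values, a main page declaring EUC-KR and a browser default of ISO-8859-1, are likewise assumptions.

```python
from typing import Optional


def resolve_frame_encoding(
    declared: Optional[str],
    parent: Optional[str],
    default: str,
    inherit_from_parent: bool,
) -> str:
    """Hypothetical model of how a subframe's text encoding gets chosen.

    declared: charset the subframe itself specifies (e.g. in a META tag), if any.
    parent: encoding of the frame containing this subframe, if any.
    default: the browser's default encoding setting.
    inherit_from_parent: True models the older behavior, False the newer one.
    """
    if declared:
        # A frame that specifies its own encoding (like Frame 3) always wins.
        return declared
    if inherit_from_parent and parent:
        # Older behavior: an undeclared subframe reuses the parent's encoding.
        return parent
    # Newer behavior: an undeclared subframe falls back to the default.
    return default


# Frame 2 of the test case: HTML without a charset, inside a page assumed to be EUC-KR.
old = resolve_frame_encoding(None, "EUC-KR", "ISO-8859-1", inherit_from_parent=True)
new = resolve_frame_encoding(None, "EUC-KR", "ISO-8859-1", inherit_from_parent=False)
print(old, new)  # EUC-KR ISO-8859-1
```

Under this model, dropping inheritance is exactly what makes Frames 1 and 2 fall back to the default, while Frame 3 is unaffected; true auto-detection (point 1) would be a separate step before the fallback.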
Comment 5 Alexey Proskuryakov 2007-06-30 09:12:42 PDT
Created attachment 15327
test case

This is a test of the cp949 encoding, which Safari tries to use when the Korean (Windows, DOS) encoding is selected manually.
Comment 6 Alexey Proskuryakov 2007-06-30 09:58:06 PDT
The ICU version shipped with Tiger doesn't support the "cp949" encoding; newer ICU versions do, but it's a different encoding! See <http://www.icu-project.org/icu-bin/convexp?conv=ibm-949_P110-1999&s=ALL>.

Firefox and MSIE do not support cp949 either, and I think it's a Safari bug that it uses this name for what is actually "windows-949".

Since Safari is not open source, this needs to be fixed by Apple engineers. I have filed this as <rdar://5304984>. As this is not a WebKit bug, closing as INVALID (please open new bugs for related issues, as discussed above).