Bug 17182
Summary: | Charset declared with document.write in an external Javascript is not honored | ||
---|---|---|---|
Product: | WebKit | Reporter: | Jungshik Shin <jshin> |
Component: | Page Loading | Assignee: | Nobody <webkit-unassigned> |
Status: | RESOLVED WONTFIX | ||
Severity: | Normal | CC: | abarth, ap, ian, playmobil |
Priority: | P3 | ||
Version: | 528+ (Nightly build) | ||
Hardware: | All | ||
OS: | All | ||
URL: | http://i18nl10n.com/webkit/enc_extjs.html |
Jungshik Shin
(This report contains non-ASCII characters in UTF-8. View it with the encoding set to UTF-8.)
* How to reproduce
1. Go to the URL above (which is a reduced test case of http://event.naver.com )
* Expected:
Two Korean syllables, '가각' should appear. (in case of Naver, Korean characters should come up everywhere)
* Actual:
'°¡°¢' shows up in place of '가각'. ('가각' in EUC-KR is 0xB0 0xA1 0xB0 0xA2, and when those bytes are interpreted as ISO-8859-1, they become '°¡°¢'.)
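The byte-level explanation above can be checked with a short sketch. It assumes a Node.js runtime (for `Buffer`); the variable names and the `decodeEucKr` helper are illustrative, not part of the test case. Decoding as EUC-KR additionally assumes a `TextDecoder` that supports the `euc-kr` label, so that step is guarded and falls back to `null`.

```javascript
// The four bytes that encode '가각' in EUC-KR, as given in the report.
const eucKrBytes = Buffer.from([0xb0, 0xa1, 0xb0, 0xa2]);

// Node's 'latin1' encoding maps each byte to the code point with the
// same value, which is exactly the ISO-8859-1 interpretation.
const asLatin1 = eucKrBytes.toString("latin1");

// Hypothetical helper: decode as EUC-KR when the runtime supports it
// (assumption: Node.js built with full ICU); otherwise return null.
function decodeEucKr(bytes) {
  try {
    return new TextDecoder("euc-kr").decode(bytes);
  } catch (e) {
    return null;
  }
}

const asEucKr = decodeEucKr(eucKrBytes);

console.log(asLatin1); // the mojibake seen in the "Actual" result
console.log(asEucKr);  // the intended text, or null if unsupported
```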
http://event.naver.com sets the page encoding in a very strange way. (Naver has the largest market share, over 70%, in Korean web search/portal.) The charset is set in an external JavaScript file referred to by the HTML file. The JS file in question is http://event.naver.com/include/head2.inc. It has the following lines:
------------------------
document.write("\
<html>\
<head>\
<meta http-equiv='Content-type' content='text/html; charset=euc-kr'>\
<title>네이버 :: 이벤트</title>\
<link rel=stylesheet type=text/css href=http://event.naver.com/event.css>\
<!-- 신규 추가 GNB ---->\
<!-- 이미 이 부분이 들어가 있는 서비스라면 또 넣지 않아도 됩니다. -->\
<script type=text/javascript>document.domain = 'naver.com';</script>\
<!-- 신규 추가 GNB---->\
</head>\
<body topmargin=0 rightmargin=0 bottommargin=50 leftmargin=0 bgcolor=#FFFFFF>\
<center>\
... snip ... ");
--------------
Firefox honors the meta charset declared this way (when testing this, make sure to turn OFF the encoding detector in FF and to set the default encoding in FF to something other than Korean (EUC-KR)/ Korean (UHC)).
MS IE does not. (How can Naver serve this page in Korea, where IE has a 99% market share? They rely on the fact that the default encoding of IE is set to Korean (Windows-949) in Korea.)
What to do about it?
I've come across several sites with this strange way of charset declaration before. If honoring such declarations can be done easily, we'd want to do that. Otherwise, this is an evangelism issue (For Naver, I'll talk to my contact there to FIX this).
Alexey Proskuryakov
I believe that this is legal according to HTML5, so I'd like to track this as a bug. On the other hand, fixing it is not easy, and we match IE, so we probably won't fix it very soon.
> (For Naver, I'll talk to my contact there to FIX this).
That would be great - thank you very much!
Jungshik Shin
In http://i18nl10n.com/webkit/enc_js2.html , the charset is declared in an embedded JavaScript with document.write() and it's honored by WebKit. (MS IE still does not honor it.)
So, I'm removing parentheses around 'external javascript' in the summary.
Alexey Proskuryakov
The root cause is that WebKit doesn't look at charset declarations when parsing HTML - we only check for those in a quick pre-parsing phase. It's a deficiency in this pre-parsing phase that it picks meta declarations even inside scripts.
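A rough sketch of why the embedded case works and the external one does not, under the assumption that the pre-parse is a plain text scan of the HTML bytes. `prescanCharset` and its regular expression are hypothetical and are not WebKit's actual code; they only illustrate a scan that does not tokenize and therefore matches a meta declaration even inside a script.

```javascript
// Hypothetical prescan: search the raw HTML text for a meta charset
// declaration without tokenizing, so text inside <script> matches too.
function prescanCharset(htmlText) {
  const match = /<meta[^>]*charset\s*=\s*["']?([\w-]+)/i.exec(htmlText);
  return match ? match[1].toLowerCase() : null;
}

// Embedded script: the declaration is part of the HTML file's own text.
const inlineCase =
  `<script>document.write("<meta http-equiv='Content-type' ` +
  `content='text/html; charset=euc-kr'>");</script>`;

// External script: the HTML file only references the JS file, so the
// declaration never appears in the text being scanned.
const externalCase = `<script src="head2.inc"></script>`;

console.log(prescanCharset(inlineCase));
console.log(prescanCharset(externalCase));
```

In this sketch the external case finds nothing because the declaration only becomes visible once the script has run, by which point parsing of the document is already under way.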
Jungshik Shin
event.naver.com fixed itself a while ago (at least they added a meta charset declaration at the beginning of the HTML file).
I'm leaving this open because it's not filed as an evangelism bug for Naver but as a generic issue to deal with a meta charset declaration in an external JS file.
johnnyding
http://www.yellowurl.cn/1549919.html has the same problem.
Adam Barth
(In reply to comment #3)
> The root cause is that WebKit doesn't look at charset declarations when parsing HTML - we only check for those in a quick pre-parsing phase. It's a deficiency in this pre-parsing phase that it picks meta declarations even inside scripts.
This should be fixed now, right?
Alexey Proskuryakov
No, this still happens - and I now think that this is a WONTFIX. Switching encodings after the DOM has been partially constructed is nonsense.
Adam Barth
Ok. We should keep track of how often we run into compat problems from not making this change and re-evaluate if that happens too often.
Jungshik Shin
In comment #1, ap wrote that he thinks that it's legal in HTML 5. Is it still the case?
If so, should we ask the HTML5 WG to reconsider the HTML5 spec regarding this practice in light of this decision (WONTFIX)?