Bug 17182 - Charset declared with document.write in an external Javascript is not honored
Status: RESOLVED WONTFIX
Alias: None
Product: WebKit
Classification: Unclassified
Component: Page Loading
Version: 528+ (Nightly build)
Hardware: All All
Importance: P3 Normal
Assignee: Nobody
URL: http://i18nl10n.com/webkit/enc_extjs....
Reported: 2008-02-05 11:41 PST by Jungshik Shin
Modified: 2011-08-29 11:38 PDT

Description Jungshik Shin 2008-02-05 11:41:02 PST
(This report contains non-ASCII characters in UTF-8; view it with the encoding set to UTF-8.)

* How to reproduce
  1. Go to the URL above (a reduced test case of http://event.naver.com)

* Expected:
   Two Korean syllables, '가각', should appear. (In the case of Naver, Korean characters should come up everywhere.)

* Actual: 
   '°¡°¢' shows up in place of '가각'. ('가각' in EUC-KR is 0xB0 0xA1 0xB0 0xA2, and when those bytes are interpreted as ISO-8859-1 they become '°¡°¢'.)
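
The byte arithmetic above can be checked with a small sketch. It takes the four EUC-KR bytes quoted in this report and reads each one as an ISO-8859-1 character. The helper name decodeAsLatin1 is made up for illustration; it is not part of any browser or WebKit API.

```javascript
// Bytes of '가각' in EUC-KR, as quoted in the report.
const eucKrBytes = [0xB0, 0xA1, 0xB0, 0xA2];

// ISO-8859-1 maps every byte value to the code point with the same number,
// so String.fromCharCode reproduces that (wrong) interpretation.
// decodeAsLatin1 is a hypothetical helper name used only in this sketch.
function decodeAsLatin1(bytes) {
  return String.fromCharCode(...bytes);
}

const mojibake = decodeAsLatin1(eucKrBytes);
console.log(mojibake); // the four characters shown under "Actual"
```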

http://event.naver.com uses a very strange way of setting the page encoding (Naver has the largest market share, over 70%, in Korean web search/portals). The charset is set in an external JavaScript file referenced by the HTML file. The JS file in question is http://event.naver.com/include/head2.inc. It has the following lines:

------------------------
document.write("\
<html>\
<head>\
<meta http-equiv='Content-type' content='text/html; charset=euc-kr'>\
<title>네이버 :: 이벤트</title>\
<link rel=stylesheet type=text/css href=http://event.naver.com/event.css>\
<!-- 신규 추가  GNB ---->\
<!-- 이미 이 부분이 들어가 있는 서비스라면 또 넣지 않아도 됩니다. -->\
<script type=text/javascript>document.domain = 'naver.com';</script>\
<!-- 신규 추가  GNB---->\
</head>\
<body topmargin=0 rightmargin=0 bottommargin=50 leftmargin=0 bgcolor=#FFFFFF>\
<center>\
... snip ... "); 

--------------

Firefox honors the meta charset declared this way. (When testing this, make sure to turn OFF the encoding detector in Firefox and to set its default encoding to something other than Korean (EUC-KR) / Korean (UHC).)

MS IE does not. (How can Naver serve this page in Korea, where IE has 99% market share? They rely on the fact that the default encoding of IE is set to Korean (Windows-949) in Korea.)

What to do about it?  

I've come across several sites that declare the charset in this strange way before. If honoring such declarations can be done easily, we'd want to do that. Otherwise, this is an evangelism issue (for Naver, I'll talk to my contact there to FIX this).
Comment 1 Alexey Proskuryakov 2008-02-05 12:34:34 PST
I believe that this is legal according to HTML5, so I'd like to track this as a bug. On the other hand, fixing it is not easy, and we match IE, so we probably won't fix it very soon.

> (For Naver, I'll talk to my contact there to FIX this).

That would be great - thank you very much!
Comment 2 Jungshik Shin 2008-02-05 13:00:01 PST
In http://i18nl10n.com/webkit/enc_js2.html, the charset is declared in an embedded JavaScript with document.write(), and it's honored by WebKit. (MS IE still does not honor it.)

So, I'm removing parentheses around 'external javascript' in the summary. 
Comment 3 Alexey Proskuryakov 2008-02-05 13:26:30 PST
The root cause is that WebKit doesn't look at charset declarations when parsing HTML - we only check for those in a quick pre-parsing phase. It's a deficiency in this pre-parsing phase that it picks meta declarations even inside scripts.
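
To illustrate the difference between the inline and external cases, here is a rough sketch of a text-level prescan. It is not WebKit's code: naivePrescanCharset, the regular expression, and both sample pages are assumptions made up for this example. It only shows why a scan that does not tokenize the markup would see a declaration written by an inline script but could never see one that lives in a separate file.

```javascript
// Hypothetical stand-in for a charset prescan (NOT WebKit's implementation).
// It searches the raw markup for a <meta ... charset=...> pattern without
// tokenizing, so it cannot tell real markup from text inside a <script>.
function naivePrescanCharset(rawHtml) {
  const match = /<meta[^>]*charset\s*=\s*["']?([A-Za-z0-9_-]+)/i.exec(rawHtml);
  return match ? match[1].toLowerCase() : null;
}

// Inline case: the declaration is textually present in the HTML file.
const inlinePage =
  "<script>document.write(\"<meta http-equiv='Content-type' " +
  "content='text/html; charset=euc-kr'>\");</script>";

// External case: the HTML file only references the script, so the
// declaration is absent from the bytes that the prescan looks at.
const externalPage = "<script src='head2.inc'></script>";

console.log(naivePrescanCharset(inlinePage));
console.log(naivePrescanCharset(externalPage));
```

Under these assumptions the inline page yields a charset and the external page yields none, which matches the behavior described in comment 2.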
Comment 4 Jungshik Shin 2009-05-04 12:55:05 PDT
event.naver.com fixed itself a while ago (at least they added a meta charset declaration at the beginning of the HTML file).

I'm leaving this open because it was filed not as an evangelism bug for Naver but as a generic issue about handling a meta charset declaration in an external JS file.
Comment 5 johnnyding 2009-06-25 03:38:07 PDT
http://www.yellowurl.cn/1549919.html has the same problem.
Comment 6 Adam Barth 2011-01-20 22:24:05 PST
(In reply to comment #3)
> The root cause is that WebKit doesn't look at charset declarations when parsing HTML - we only check for those in a quick pre-parsing phase. It's a deficiency in this pre-parsing phase that it picks meta declarations even inside scripts.

This should be fixed now, right?
Comment 7 Alexey Proskuryakov 2011-01-20 22:33:59 PST
No, this still happens - and I now think that this is a WONTFIX. Switching encodings after DOM has been partially constructed is nonsense.
Comment 8 Adam Barth 2011-01-20 22:38:38 PST
Ok.  We should keep track of how often we run into compat problems from not making this change and re-evaluate if that happens too often.
Comment 9 Jungshik Shin 2011-08-29 11:38:50 PDT
In comment #1, ap wrote that he thinks this is legal in HTML5. Is that still the case?

If so, should we ask the HTML5 WG to reconsider the HTML5 spec regarding this practice in light of this decision (WONTFIX)?