Bug 16987
Summary: XMLHTTPRequest.responseText (text/html) without charset label does not inherit the referring page's charset
Product: WebKit
Component: Page Loading
Status: RESOLVED FIXED
Severity: Normal
Priority: P2
Version: 523.x (Safari 3)
Hardware: PC
OS: All
Reporter: Jungshik Shin <jshin>
Assignee: Nobody <webkit-unassigned>
CC: ap
Jungshik Shin
* How to reproduce
1. Go to http://i18nl10n.com/webkit/xhrtest.html
* Expected (IE6, Firefox and Opera all do this)
Both columns should be identical, each showing two lines with the first 5 letters of the Cyrillic alphabet.
* Actual
The 2nd line in the right column (text/html) shows gibberish rather than the first 5 letters of the Cyrillic alphabet.
The left column (http://i18nl10n.com/webkit/xhrtest1.html, with a meta tag for UTF-8) loads xhrtestdata1.txt (encoded in UTF-8 and served with the HTTP header 'Content-Type: text/plain'), while the right column (http://i18nl10n.com/webkit/xhrtest2.html, also with a meta tag for UTF-8) loads xhrtestdata2.html (encoded in UTF-8, without a meta tag, and served with the HTTP header 'Content-Type: text/html').
Neither xhrtestdata1.txt nor xhrtestdata2.html has a UTF-8 BOM.
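The absence of a BOM matters because a BOM would identify the encoding on its own. A minimal Python sketch of how one might check a payload for a UTF-8 BOM; the helper name `has_utf8_bom` and the sample bytes are illustrative assumptions, not the actual test files:

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the payload starts with the 3-byte UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


# Hypothetical stand-in for the test data: five Cyrillic letters in UTF-8.
sample = "абвгд".encode("utf-8")

print(has_utf8_bom(sample))                    # payload without a BOM
print(has_utf8_bom(codecs.BOM_UTF8 + sample))  # same payload with a BOM prepended
```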
Jungshik Shin
I forgot to mention that the gibberish is the result of interpreting the 1st 5 Cyrillic letters in UTF-8 as ISO-8859-1 (the default encoding).
There can be other variations of these two test cases.
Jungshik Shin
sorry for bug spam.
(In reply to comment #1)
> I forgot to mention that the gibberish is the result of interpreting the 1st 5
> Cyrillic letters in UTF-8 as ISO-8859-1 (the default encoding).
s/ISO-8859-1/Windows-1252/
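For illustration, the kind of gibberish described above can be reproduced with a short Python sketch. It assumes the test string is "абвгд" (the exact content of the test files is not quoted in this report) and uses Python's `cp1252` codec as a stand-in for the browser's Windows-1252 decoder:

```python
# Assumed test string: the first five letters of the Cyrillic alphabet.
text = "абвгд"

# Each of these letters takes two bytes in UTF-8 (D0 B0 ... D0 B4).
raw = text.encode("utf-8")

# Decoding those bytes as Windows-1252 maps every byte to one character,
# so each Cyrillic letter turns into a pair of Latin-1 range characters.
garbled = raw.decode("cp1252")

print(raw.hex(" "))
print(garbled)
print(raw.decode("utf-8") == text)  # decoding with the right charset round-trips
```

For these particular bytes, ISO-8859-1 and Windows-1252 give the same result, since the two encodings only differ in the 0x80-0x9F range.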
Eric Seidel (no email)
This fails for me on 3.0.4 (Mac); however, it works on TOT. I think this bug was already fixed, but I'm not sure where.
Eric Seidel (no email)
Fails in the latest Windows beta.
This is the file which would have changed:
http://trac.webkit.org/projects/webkit/log/trunk/WebCore/xml/XMLHttpRequest.cpp
A quick scan didn't find the change. However, I think we can still close this as "Fixed".
Jungshik Shin
It's 'fixed' in http://trac.webkit.org/projects/webkit/browser/trunk/WebCore/xml/XMLHttpRequest.cpp?rev=28934
However, I'm not sure that change is the 'right' thing to do. I'll make another test case and see what happens.
Jungshik Shin
I was wrong to think that 'text/html' (without a charset specified anywhere) obtained through XHR inherits its charset from the referring document in FF and IE. ap's change makes WebKit compatible with FF and IE.
Jungshik Shin
(In reply to comment #6)
> ap's change makes webkit compatible with FF and IE.
Which assumes UTF-8 when charset is not specified.
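The fallback described in this thread can be sketched in Python as follows. This is a simplified illustration, not WebKit's actual code: the function names `response_charset` and `decode_body` are hypothetical, and only the Content-Type header is consulted (no BOM or in-document declarations):

```python
from email.message import Message


def response_charset(content_type: str, fallback: str = "utf-8") -> str:
    """Return the charset parameter of a Content-Type value, or the fallback."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns None when no charset parameter is present.
    return msg.get_content_charset() or fallback


def decode_body(body: bytes, content_type: str) -> str:
    """Decode a response body using the labeled charset, else UTF-8."""
    return body.decode(response_charset(content_type))


# Unlabeled text/html: UTF-8 is assumed, regardless of the referring page.
print(response_charset("text/html"))
# Labeled response: the explicit charset wins.
print(response_charset("text/html; charset=windows-1251"))
print(decode_body("абвгд".encode("utf-8"), "text/html"))
```

With this rule, the unlabeled UTF-8 response from the test case decodes correctly, while a page in some other encoding no longer passes its charset on to the XHR response.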