VERIFIED FIXED 5548
UTF-16 charset incorrectly specified in an 8-bit document at www.ponyexpress.ru
https://bugs.webkit.org/show_bug.cgi?id=5548
Alexey Proskuryakov
Reported 2005-10-29 10:15:58 PDT
This site has pages that begin with:

    <?xml version="1.0" encoding="UTF-16"?>
    <html xmlns:msxsl="urn:schemas-microsoft-com:xslt">
    <head>
    <meta content="text/html; charset=windows-1251" http-equiv="Content-Type">

The actual encoding is windows-1251. Since UTF-16 is impossible in eight-bit documents like this, the declared encoding can and should be ignored. Firefox and MacIE render this site correctly.

The proposed patch fixes this problem, and provides test cases for it and for rdar://3182977 ("unicode" encoding handled as UTF-16 rather than UTF-8 at www.delcom-eng.com).
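The reasoning in the report (a declaration that can be read byte-for-byte as ASCII cannot belong to a UTF-16 document) can be sketched in Python. This is a minimal illustration, not the code from the patch: the names `xml_declared_encoding` and `usable_xml_encoding` are hypothetical, and the regular expression is a deliberate simplification of real XML declaration parsing.

```python
import re

# Simplified pattern for a leading XML declaration, matched against raw bytes.
# Hypothetical helper for illustration; not taken from WebKit.
XML_DECL_ENCODING = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._\-]+)["\']'
)

# Labels that would require 16-bit code units (assumed set for this sketch).
SIXTEEN_BIT_LABELS = {"utf-16", "utf-16le", "utf-16be"}


def xml_declared_encoding(data: bytes):
    """Return the lowercased encoding label from a leading XML declaration, or None."""
    match = XML_DECL_ENCODING.match(data)
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()


def usable_xml_encoding(data: bytes):
    """Return the declared label unless it contradicts the way it was read."""
    label = xml_declared_encoding(data)
    if label in SIXTEEN_BIT_LABELS:
        # The declaration matched as consecutive single bytes, so the document
        # cannot really be UTF-16; the label is discarded.
        return None
    return label


sample = b'<?xml version="1.0" encoding="UTF-16"?> <html>'
print(xml_declared_encoding(sample))  # label as written in the document
print(usable_xml_encoding(sample))    # label after the sanity check
```

A genuinely UTF-16 document would not match the byte pattern at all (it starts with a byte order mark or interleaved zero bytes), so in this sketch the check only ever rejects labels that are self-contradictory.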
Attachments
proposed patch (4.13 KB, patch)
2005-10-29 10:16 PDT, Alexey Proskuryakov
darin: review-
allow meta to override encoding from XML declaration (4.47 KB, patch)
2005-11-03 11:03 PST, Alexey Proskuryakov
darin: review+
Alexey Proskuryakov
Comment 1 2005-10-29 10:16:27 PDT
Created attachment 4525 proposed patch
Alexey Proskuryakov
Comment 2 2005-10-29 10:18:18 PDT
(the test cases go to fast/encoding)
Alexey Proskuryakov
Comment 3 2005-10-30 05:37:42 PST
Comment on attachment 4525 proposed patch

Um, this doesn't look right, clearing the review flag...
Darin Adler
Comment 4 2005-10-31 10:01:02 PST
Comment on attachment 4525 proposed patch

This doesn't look quite right to me. Testing with other browsers long ago, I found that pages marked "UTF-16" (not XML pages, but HTML ones) in <meta> tags were treated as UTF-8 by other browsers. Not default encoding (Windows Latin-1 for the "default default"), but specifically UTF-8. This patch changes that behavior to make some XML cases work better; I think that's incorrect.
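The behavior described in this comment (an HTML page whose <meta> tag says "UTF-16" is decoded as UTF-8, not as the default encoding) could look roughly like the following Python sketch. The function names and the regular expression are hypothetical and simplified; only the exact label "utf-16" is remapped here, because the comment does not say how other browsers treated variants such as UTF-16LE.

```python
import re

# Simplified scan for a charset inside a <meta> tag, done on raw bytes.
# Hypothetical helper for illustration; real HTML prescanning is more involved.
META_CHARSET = re.compile(
    rb"<meta[^>]*charset\s*=\s*[\"']?([A-Za-z0-9._\-]+)", re.IGNORECASE
)


def meta_declared_charset(data: bytes):
    """Return the lowercased charset label from the first matching <meta>, or None."""
    match = META_CHARSET.search(data)
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()


def effective_meta_charset(data: bytes):
    """Apply the remapping described in the comment to the <meta> label."""
    label = meta_declared_charset(data)
    if label == "utf-16":
        # The tag was found by scanning single bytes, so UTF-16 cannot be right;
        # per the comment, other browsers fall back to UTF-8 specifically.
        return "utf-8"
    return label


page = b'<meta http-equiv="Content-Type" content="text/html; charset=UTF-16">'
print(effective_meta_charset(page))
```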
Alexey Proskuryakov
Comment 5 2005-10-31 11:24:50 PST
Perhaps Decoder should know whether it's decoding HTML or XML (in Firefox, the encoding from <meta> tags doesn't seem to be used for XML)... I'll try to figure out the correct behavior.
Alexey Proskuryakov
Comment 6 2005-11-03 11:03:42 PST
Created attachment 4584 allow meta to override encoding from XML declaration

WinIE (but not Firefox) indeed treats HTML pages marked UTF-16 in <meta> tags as UTF-8; thank you for noticing! This new patch includes a regression test for this, too.

Allowing <meta> to override the XML encoding seems to match what Firefox does for HTML. For XHTML, Firefox ignores <meta>, but Safari doesn't; the patch doesn't change this (although it also allows such overriding).
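The precedence described here, where a <meta> charset wins over the encoding named in the XML declaration for pages handled as HTML, can be sketched as below. This is an illustrative Python sketch under stated assumptions, not the patch itself: `choose_html_encoding` is a hypothetical name, the labels are assumed to have been extracted already, and "windows-1252" is used as a stand-in for the Windows Latin-1 default mentioned earlier in the thread.

```python
from typing import Optional

# Assumed stand-in for the "Windows Latin-1" default mentioned in the thread.
DEFAULT_ENCODING = "windows-1252"


def choose_html_encoding(xml_label: Optional[str], meta_label: Optional[str]) -> str:
    """Pick an encoding label for a page handled as HTML.

    Hypothetical helper: the <meta> label, when present, overrides the
    label from the XML declaration; the default is used when neither exists.
    """
    if meta_label:
        return meta_label.lower()
    if xml_label:
        return xml_label.lower()
    return DEFAULT_ENCODING


# The page from the report declares UTF-16 in the XML declaration
# and windows-1251 in <meta>; the <meta> value is the one used.
chosen = choose_html_encoding("UTF-16", "windows-1251")
raw = "Пони".encode("windows-1251")
print(chosen, raw.decode(chosen))
```

The sketch deliberately leaves out the case of a UTF-16 label with no <meta> at all; how that case should fall back is discussed separately in this thread.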
Darin Adler
Comment 7 2005-11-03 12:27:45 PST
Comment on attachment 4584 allow meta to override encoding from XML declaration

I don't understand the logic here. You say that Gecko ignores <meta> elements entirely for "real XHTML". And you say that this patch leaves WebKit respecting <meta> elements for "real XHTML" and goes further, allowing such <meta> tags to override the character set specified in the XML declaration. This sounds like the wrong direction to go if we're looking for compatibility with Gecko. Can you clarify why this is a desirable change?
Darin Adler
Comment 8 2005-11-03 12:30:07 PST
Comment on attachment 4584 allow meta to override encoding from XML declaration

I think I see what's going on. This site isn't "real XHTML". It's "XHTML being served with a plain HTML MIME type". I guess in that case we want to match what the other major browsers do. Do they look at the character set in the XML header at all in cases like this?
Alexey Proskuryakov
Comment 9 2005-11-03 12:59:09 PST
(In reply to comment #8)
> Do they look at the character set in the XML header at all in cases like this?

Yes, Firefox and Opera do look at it; a test is at <http://nypop.com/~ap/webkit/xhtml.html>. MacIE doesn't; I cannot say about WinIE (browsershots.org doesn't work with it at the moment).

Although it's unfortunate that this patch slightly changes the "real XHTML" behavior, making it less similar to Firefox, I think that this can only be handled by making Decoder know what kind of source it parses, which looks like a separate undertaking.
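For illustration only, the "separate undertaking" mentioned here (a decoder that knows what kind of source it is parsing) might be shaped like the Python sketch below. It models the Firefox behavior as described in this thread, not what the patch does; `SourceKind` and `declared_encoding` are hypothetical names, and the MIME type strings are used only as readable enum values.

```python
from enum import Enum
from typing import Optional


class SourceKind(Enum):
    """Hypothetical tag for how the document is being parsed."""
    HTML = "text/html"                # includes XHTML served with an HTML MIME type
    XHTML = "application/xhtml+xml"   # "real XHTML"


def declared_encoding(
    kind: SourceKind, xml_label: Optional[str], meta_label: Optional[str]
) -> Optional[str]:
    """Return the declared label to honor, following the behavior described
    for Firefox in this thread (an assumption of this sketch)."""
    if kind is SourceKind.XHTML:
        # For "real XHTML", <meta> is ignored; only the XML declaration counts.
        return xml_label
    # For HTML, <meta> overrides the XML declaration when both are present.
    return meta_label or xml_label


print(declared_encoding(SourceKind.HTML, "UTF-16", "windows-1251"))
print(declared_encoding(SourceKind.XHTML, "UTF-8", "windows-1251"))
```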
Darin Adler
Comment 10 2005-11-03 13:06:49 PST
Comment on attachment 4584 allow meta to override encoding from XML declaration

OK, I'm convinced now. r=me
Alexey Proskuryakov
Comment 11 2005-11-03 13:36:31 PST
Filed the <meta> in "real XHTML" issue as bug 5620.
Alexey Proskuryakov
Comment 12 2005-11-24 21:43:29 PST
Bumping priority to P1, because the patch also fixes a regression in bug 5823.
Eric Seidel (no email)
Comment 13 2005-11-26 17:12:50 PST
ap: landing would be even easier if you provided the test case in patch form... ChangeLog entry as a bonus. :)
Eric Seidel (no email)
Comment 14 2005-11-26 18:03:25 PST
nm, I now see that's included in your patch! Thanks, landing now.