RESOLVED FIXED 14951
REGRESSION: page interpreted as UTF-8 because of stray <?xml> after <head>
https://bugs.webkit.org/show_bug.cgi?id=14951
Summary REGRESSION: page interpreted as UTF-8 because of stray <?xml> after <head>
Darin Adler
Reported 2007-08-12 15:06:48 PDT
At www.becu.org there's a missing character glyph, the <?> symbol, at what should be some whitespace. The reason for this is that we're interpreting the page as UTF-8, despite the fact that it's not labeled as such. That's because in this file there is the following stray content (not at the beginning, after the entire <head> section). <?xml version="1.0" encoding="UTF-16"?> This causes the TextResourceDecoder to decide the file is UTF-8! Doesn't happen in other browsers.
Attachments
test case (non-regression) (160 bytes, text/html)
2007-08-13 03:52 PDT, Alexey Proskuryakov
no flags
proposed fix (8.03 KB, patch)
2007-08-13 12:32 PDT, Alexey Proskuryakov
darin: review+
Darin Adler
Comment 1 2007-08-12 15:07:17 PDT
<rdar://problem/5400664> covers this bug and some other problems.
Alexey Proskuryakov
Comment 2 2007-08-13 03:48:26 PDT
This is a regression from shipping Safari/WebKit. Firefox 2 also honors an XML encoding declaration in HTML documents (which is the historical reason why we do so, too, ignoring HTML5), but only if it's at the very beginning of the file. Of course, both browsers detect an error if an XML declaration is not at the very beginning of an XML file. Also, Firefox doesn't have out quirk of treating UTF-16 declaration in 8-bit files as UTF-8.
Alexey Proskuryakov
Comment 3 2007-08-13 03:52:04 PDT
Created attachment 15951 [details] test case (non-regression) Here, the XML declaration is not at the beginning of the file, but before BODY - both shipping Safari/WebKit and TOT fail in this case (although the test doesn't work in the former due to missing support for document.characterSet).
Darin Adler
Comment 4 2007-08-13 10:45:45 PDT
Alexey, is your plan to ignore <?xml> that are not at the start of the file? I think that would probably be sufficient.
Alexey Proskuryakov
Comment 5 2007-08-13 11:13:57 PDT
(In reply to comment #4) > Alexey, is your plan to ignore <?xml> that are not at the start of the file? I > think that would probably be sufficient. Yes, that's precisely what I'm going to do here.
Alexey Proskuryakov
Comment 6 2007-08-13 12:32:31 PDT
Created attachment 15954 [details] proposed fix /me wonders how many off-by-one errors he managed to introduce this time.
Darin Adler
Comment 7 2007-08-13 12:42:09 PDT
Comment on attachment 15954 [details] proposed fix + // Is there enough data available to check for XML declaration? + if (m_buffer.size() < 8) + return false; Why not put that check before setting up ptr and pEnd? r=me
Darin Adler
Comment 8 2007-08-13 12:42:47 PDT
Oh, and don't forget to land a test case for what this fixed!
Alexey Proskuryakov
Comment 9 2007-08-13 13:22:18 PDT
(In reply to comment #7) > Why not put that check before setting up ptr and pEnd? OK > Oh, and don't forget to land a test case for what this fixed! Do you mean that there should be a test case specifically for XML declaration after HEAD? Is the included test not sufficient?
Darin Adler
Comment 10 2007-08-13 13:30:32 PDT
(In reply to comment #9) > Do you mean that there should be a test case specifically for XML declaration > after HEAD? Is the included test not sufficient? I think the included test is sufficient. Sorry, I completely overlooked it. I would have expected a test that was more like the original site too, but I don't think that's really all that helpful or important.
Alexey Proskuryakov
Comment 11 2007-08-13 22:29:17 PDT
Committed revision 25066.
Alexey Proskuryakov
Comment 12 2007-08-13 22:30:41 PDT
Oops, forgot to move the check for m_buffer.size() as promised.
Note You need to log in before you can comment on or make changes to this bug.