14951 – REGRESSION: page interpreted as UTF-8 because of stray <?xml> after <head>

RESOLVED FIXED 14951

REGRESSION: page interpreted as UTF-8 because of stray <?xml> after <head>

https://bugs.webkit.org/show_bug.cgi?id=14951

Summary REGRESSION: page interpreted as UTF-8 because of stray <?xml> after <head>

Darin Adler

Reported 2007-08-12 15:06:48 PDT

At www.becu.org there's a missing character glyph, the <?> symbol, at what should be some whitespace. The reason for this is that we're interpreting the page as UTF-8, despite the fact that it's not labeled as such. That's because in this file there is the following stray content (not at the beginning, after the entire <head> section). <?xml version="1.0" encoding="UTF-16"?> This causes the TextResourceDecoder to decide the file is UTF-8! Doesn't happen in other browsers.

Attachments
test case (non-regression) (160 bytes, text/html) 2007-08-13 03:52 PDT, Alexey Proskuryakov	no flags	Details
proposed fix (8.03 KB, patch) 2007-08-13 12:32 PDT, Alexey Proskuryakov	darin: review+	Details Formatted Diff Diff
View All Add attachment proposed patch, testcase, etc.

Darin Adler

Comment 1 2007-08-12 15:07:17 PDT

<rdar://problem/5400664> covers this bug and some other problems.

Alexey Proskuryakov

Comment 2 2007-08-13 03:48:26 PDT

This is a regression from shipping Safari/WebKit. Firefox 2 also honors an XML encoding declaration in HTML documents (which is the historical reason why we do so, too, ignoring HTML5), but only if it's at the very beginning of the file. Of course, both browsers detect an error if an XML declaration is not at the very beginning of an XML file. Also, Firefox doesn't have out quirk of treating UTF-16 declaration in 8-bit files as UTF-8.

Alexey Proskuryakov

Comment 3 2007-08-13 03:52:04 PDT

Created attachment 15951 [details] test case (non-regression) Here, the XML declaration is not at the beginning of the file, but before BODY - both shipping Safari/WebKit and TOT fail in this case (although the test doesn't work in the former due to missing support for document.characterSet).

Darin Adler

Comment 4 2007-08-13 10:45:45 PDT

Alexey, is your plan to ignore <?xml> that are not at the start of the file? I think that would probably be sufficient.

Alexey Proskuryakov

Comment 5 2007-08-13 11:13:57 PDT

(In reply to comment #4) > Alexey, is your plan to ignore <?xml> that are not at the start of the file? I > think that would probably be sufficient. Yes, that's precisely what I'm going to do here.

Alexey Proskuryakov

Comment 6 2007-08-13 12:32:31 PDT

Created attachment 15954 [details] proposed fix /me wonders how many off-by-one errors he managed to introduce this time.

Darin Adler

Comment 7 2007-08-13 12:42:09 PDT

Comment on attachment 15954 [details] proposed fix + // Is there enough data available to check for XML declaration? + if (m_buffer.size() < 8) + return false; Why not put that check before setting up ptr and pEnd? r=me

Darin Adler

Comment 8 2007-08-13 12:42:47 PDT

Oh, and don't forget to land a test case for what this fixed!

Alexey Proskuryakov

Comment 9 2007-08-13 13:22:18 PDT

(In reply to comment #7) > Why not put that check before setting up ptr and pEnd? OK > Oh, and don't forget to land a test case for what this fixed! Do you mean that there should be a test case specifically for XML declaration after HEAD? Is the included test not sufficient?

Darin Adler

Comment 10 2007-08-13 13:30:32 PDT

(In reply to comment #9) > Do you mean that there should be a test case specifically for XML declaration > after HEAD? Is the included test not sufficient? I think the included test is sufficient. Sorry, I completely overlooked it. I would have expected a test that was more like the original site too, but I don't think that's really all that helpful or important.

Alexey Proskuryakov

Comment 11 2007-08-13 22:29:17 PDT

Committed revision 25066.

Alexey Proskuryakov

Comment 12 2007-08-13 22:30:41 PDT

Oops, forgot to move the check for m_buffer.size() as promised.

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P1

Severity Normal

Classification Unclassified

Version 523.x (Safari 3)

Hardware Mac

OS OS X 10.4

Product WebKit

Component WebCore Misc.

Assignee

Alexey Proskuryakov

Reported

2007-08-12 15:06 PDT

Modified

2007-08-13 22:30 PDT History

CC List

1 user Show

URL

http://www.becu.org

Keywords InRadar, Regression

Depends on

Blocks