Bug 14951 - REGRESSION: page interpreted as UTF-8 because of stray <?xml> after <head>
Summary: REGRESSION: page interpreted as UTF-8 because of stray <?xml> after <head>
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: 523.x (Safari 3)
Hardware: Mac OS X 10.4
: P1 Normal
Assignee: Alexey Proskuryakov
URL: http://www.becu.org
Keywords: InRadar, Regression
Depends on:
Blocks:
 
Reported: 2007-08-12 15:06 PDT by Darin Adler
Modified: 2007-08-13 22:30 PDT (History)
1 user (show)

See Also:


Attachments
test case (non-regression) (160 bytes, text/html)
2007-08-13 03:52 PDT, Alexey Proskuryakov
no flags Details
proposed fix (8.03 KB, patch)
2007-08-13 12:32 PDT, Alexey Proskuryakov
darin: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Darin Adler 2007-08-12 15:06:48 PDT
At www.becu.org there's a missing character glyph, the <?> symbol, at what should be some whitespace.

The reason for this is that we're interpreting the page as UTF-8, despite the fact that it's not labeled as such. That's because in this file there is the following stray content (not at the beginning, after the entire <head> section).

    <?xml version="1.0" encoding="UTF-16"?>

This causes the TextResourceDecoder to decide the file is UTF-8! Doesn't happen in other browsers.
Comment 1 Darin Adler 2007-08-12 15:07:17 PDT
<rdar://problem/5400664> covers this bug and some other problems.
Comment 2 Alexey Proskuryakov 2007-08-13 03:48:26 PDT
This is a regression from shipping Safari/WebKit.

Firefox 2 also honors an XML encoding declaration in HTML documents (which is the historical reason why we do so, too, ignoring HTML5), but only if it's at the very beginning of the file. Of course, both browsers detect an error if an XML declaration is not at the very beginning of an XML file.

Also, Firefox doesn't have out quirk of treating UTF-16 declaration in 8-bit files as UTF-8.
Comment 3 Alexey Proskuryakov 2007-08-13 03:52:04 PDT
Created attachment 15951 [details]
test case (non-regression)

Here, the XML declaration is not at the beginning of the file, but before BODY - both shipping Safari/WebKit and TOT fail in this case (although the test doesn't work in the former due to missing support for document.characterSet).
Comment 4 Darin Adler 2007-08-13 10:45:45 PDT
Alexey, is your plan to ignore <?xml> that are not at the start of the file? I think that would probably be sufficient.
Comment 5 Alexey Proskuryakov 2007-08-13 11:13:57 PDT
(In reply to comment #4)
> Alexey, is your plan to ignore <?xml> that are not at the start of the file? I
> think that would probably be sufficient.

Yes, that's precisely what I'm going to do here.
Comment 6 Alexey Proskuryakov 2007-08-13 12:32:31 PDT
Created attachment 15954 [details]
proposed fix

/me wonders how many off-by-one errors he managed to introduce this time.
Comment 7 Darin Adler 2007-08-13 12:42:09 PDT
Comment on attachment 15954 [details]
proposed fix

+    // Is there enough data available to check for XML declaration?
+    if (m_buffer.size() < 8)
+        return false;

Why not put that check before setting up ptr and pEnd?

r=me
Comment 8 Darin Adler 2007-08-13 12:42:47 PDT
Oh, and don't forget to land a test case for what this fixed!
Comment 9 Alexey Proskuryakov 2007-08-13 13:22:18 PDT
(In reply to comment #7)
> Why not put that check before setting up ptr and pEnd?

  OK

> Oh, and don't forget to land a test case for what this fixed!

  Do you mean that there should be a test case specifically for XML declaration after HEAD? Is the included test not sufficient? 
Comment 10 Darin Adler 2007-08-13 13:30:32 PDT
(In reply to comment #9)
> Do you mean that there should be a test case specifically for XML declaration
> after HEAD? Is the included test not sufficient? 

I think the included test is sufficient. Sorry, I completely overlooked it.

I would have expected a test that was more like the original site too, but I don't think that's really all that helpful or important.
Comment 11 Alexey Proskuryakov 2007-08-13 22:29:17 PDT
Committed revision 25066.

Comment 12 Alexey Proskuryakov 2007-08-13 22:30:41 PDT
Oops, forgot to move the check for m_buffer.size() as promised.