WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
14951
REGRESSION: page interpreted as UTF-8 because of stray <?xml> after <head>
https://bugs.webkit.org/show_bug.cgi?id=14951
Summary
REGRESSION: page interpreted as UTF-8 because of stray <?xml> after <head>
Darin Adler
Reported
2007-08-12 15:06:48 PDT
At www.becu.org there's a missing character glyph, the <?> symbol, at what should be some whitespace. The reason for this is that we're interpreting the page as UTF-8, despite the fact that it's not labeled as such. That's because in this file there is the following stray content (not at the beginning, after the entire <head> section). <?xml version="1.0" encoding="UTF-16"?> This causes the TextResourceDecoder to decide the file is UTF-8! Doesn't happen in other browsers.
Attachments
test case (non-regression)
(160 bytes, text/html)
2007-08-13 03:52 PDT
,
Alexey Proskuryakov
no flags
Details
proposed fix
(8.03 KB, patch)
2007-08-13 12:32 PDT
,
Alexey Proskuryakov
darin
: review+
Details
Formatted Diff
Diff
View All
Add attachment
proposed patch, testcase, etc.
Darin Adler
Comment 1
2007-08-12 15:07:17 PDT
<
rdar://problem/5400664
> covers this bug and some other problems.
Alexey Proskuryakov
Comment 2
2007-08-13 03:48:26 PDT
This is a regression from shipping Safari/WebKit. Firefox 2 also honors an XML encoding declaration in HTML documents (which is the historical reason why we do so, too, ignoring HTML5), but only if it's at the very beginning of the file. Of course, both browsers detect an error if an XML declaration is not at the very beginning of an XML file. Also, Firefox doesn't have out quirk of treating UTF-16 declaration in 8-bit files as UTF-8.
Alexey Proskuryakov
Comment 3
2007-08-13 03:52:04 PDT
Created
attachment 15951
[details]
test case (non-regression) Here, the XML declaration is not at the beginning of the file, but before BODY - both shipping Safari/WebKit and TOT fail in this case (although the test doesn't work in the former due to missing support for document.characterSet).
Darin Adler
Comment 4
2007-08-13 10:45:45 PDT
Alexey, is your plan to ignore <?xml> that are not at the start of the file? I think that would probably be sufficient.
Alexey Proskuryakov
Comment 5
2007-08-13 11:13:57 PDT
(In reply to
comment #4
)
> Alexey, is your plan to ignore <?xml> that are not at the start of the file? I > think that would probably be sufficient.
Yes, that's precisely what I'm going to do here.
Alexey Proskuryakov
Comment 6
2007-08-13 12:32:31 PDT
Created
attachment 15954
[details]
proposed fix /me wonders how many off-by-one errors he managed to introduce this time.
Darin Adler
Comment 7
2007-08-13 12:42:09 PDT
Comment on
attachment 15954
[details]
proposed fix + // Is there enough data available to check for XML declaration? + if (m_buffer.size() < 8) + return false; Why not put that check before setting up ptr and pEnd? r=me
Darin Adler
Comment 8
2007-08-13 12:42:47 PDT
Oh, and don't forget to land a test case for what this fixed!
Alexey Proskuryakov
Comment 9
2007-08-13 13:22:18 PDT
(In reply to
comment #7
)
> Why not put that check before setting up ptr and pEnd?
OK
> Oh, and don't forget to land a test case for what this fixed!
Do you mean that there should be a test case specifically for XML declaration after HEAD? Is the included test not sufficient?
Darin Adler
Comment 10
2007-08-13 13:30:32 PDT
(In reply to
comment #9
)
> Do you mean that there should be a test case specifically for XML declaration > after HEAD? Is the included test not sufficient?
I think the included test is sufficient. Sorry, I completely overlooked it. I would have expected a test that was more like the original site too, but I don't think that's really all that helpful or important.
Alexey Proskuryakov
Comment 11
2007-08-13 22:29:17 PDT
Committed revision 25066.
Alexey Proskuryakov
Comment 12
2007-08-13 22:30:41 PDT
Oops, forgot to move the check for m_buffer.size() as promised.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug