WebKit Bugzilla
RESOLVED FIXED
3809
Should default to UTF-8 or UTF-16 for application/xml documents with omitted charset and encoding declaration
https://bugs.webkit.org/show_bug.cgi?id=3809
Henri Sivonen
Reported
2005-07-02 04:33:38 PDT
Steps to reproduce:
1) Make Safari load (either in content area or through XMLHttpRequest) an XML document that does not have an XML declaration that declares the character encoding AND does not have a BOM AND is encoded in UTF-8 AND contains characters from outside the ASCII range AND is served as either application/xml or application/xhtml+xml AND has no charset parameter on the HTTP layer. (Although the above looks very specific, the conditions commonly hold true.)
2) Observe.

Actual results: The bytes are decoded as characters according to the Default Encoding in Appearance preferences.

Expected results: Expected the bytes to be decoded as characters according to UTF-8 as per section 3.2 of RFC 3023, which defers to XML 1.0 section 4.3.3.

Additional information: Besides the obvious implications of this bug, there are two less obvious implications:
1) Safari cannot properly consume Canonical XML.
2) Safari cannot properly consume XML documents it has produced itself via XMLHttpRequest POST!
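The gap between the actual and expected results can be reproduced outside the browser. The following is only an illustrative sketch: the sample document is made up, and ISO-8859-1 is an assumption standing in for whatever Default Encoding the user has set (the report does not name one).

```python
# A tiny XML document with one non-ASCII character, no XML declaration, no BOM.
document = "<p>\u00e4</p>"  # hypothetical sample content
wire_bytes = document.encode("utf-8")

# What the report describes: the bytes are decoded with a legacy default encoding.
# ISO-8859-1 is an assumed stand-in for the user's preference setting.
actual = wire_bytes.decode("iso-8859-1")

# What XML 1.0 section 4.3.3 calls for when nothing declares an encoding.
expected = wire_bytes.decode("utf-8")

print(repr(actual))
print(repr(expected))
```

With these assumptions, the two-byte UTF-8 sequence for the non-ASCII character comes out as two unrelated Latin-1 characters in `actual`, while `expected` round-trips to the original text.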
Attachments
proposed patch (741 bytes, patch), 2005-09-09 12:49 PDT, Alexey Proskuryakov, darin: review+
Oliver Hunt
Comment 1
2005-07-21 16:26:05 PDT
Would you be able to attach a test document? Cheers, Oliver
Henri Sivonen
Comment 2
2005-09-09 01:14:22 PDT
What reduction is needed beyond the case that has been in the URL field all along?
Oliver Hunt
Comment 3
2005-09-09 01:25:10 PDT
Behaviour is wrong (confirmed against Firefox)
Alexey Proskuryakov
Comment 4
2005-09-09 12:49:23 PDT
Created attachment 3827: proposed patch

Well, the XML spec is pretty explicit about files that do not have an encoding declaration in the text declaration - they should be UTF-8 or UTF-16, unless a higher-level protocol defines a charset (4.3.3).
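The rule described in this comment can be sketched as a small decision function. This is not the attached patch, whose contents are not shown here; the function name `choose_xml_encoding`, its parameters, and the regular expression are hypothetical, and the sketch ignores cases such as UTF-16 documents without a BOM.

```python
import re
from typing import Optional

# Matches an encoding declaration in an ASCII-compatible XML or text declaration.
_DECL = re.compile(
    rb'^<\?xml\s[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def choose_xml_encoding(body: bytes, http_charset: Optional[str] = None) -> str:
    """Pick an encoding name for an application/xml entity (hypothetical helper)."""
    # 1. A charset supplied by the higher-level protocol takes precedence.
    if http_charset:
        return http_charset.lower()
    # 2. A byte order mark identifies UTF-8 or UTF-16.
    if body.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if body.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "utf-16"
    # 3. An explicit encoding declaration in the document itself.
    match = _DECL.match(body)
    if match:
        return match.group(1).decode("ascii").lower()
    # 4. Nothing declared: fall back to UTF-8, not a user preference.
    return "utf-8"
```

For example, `choose_xml_encoding("<p>\u00e4</p>".encode("utf-8"))` falls through to step 4 and returns `"utf-8"`, which is the case this bug is about.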
Alexey Proskuryakov
Comment 5
2005-09-09 12:50:57 PDT
The file from the bug URL can serve as a test case (without a link to the next test, of course).
Darin Adler
Comment 6
2005-09-09 15:36:48 PDT
Comment on attachment 3827: proposed patch

Is there any other browser that has this behavior? The comments above lead me to believe this is not working this way in Firefox.
Henri Sivonen
Comment 7
2005-09-09 23:55:57 PDT
Gecko used to have this same bug (at least in content area--not sure about XMLHttpRequest), but it has been fixed.
Alexey Proskuryakov
Comment 8
2005-09-10 03:22:28 PDT
Henri, which Gecko bugfix are you referring to? I see that Firefox 1.0.5 renders the test as expected, but I couldn't find anything in Bugzilla. I found <https://bugzilla.mozilla.org/show_bug.cgi?id=247024>, but it talks about a different issue: documents transferred with MIME type text/xml should default to us-ascii, not utf-8. I'm not sure if WebKit has the same problem, but if it has, that should be in a separate report IMO.
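The distinction drawn here between text/xml and application/xml can be illustrated with a short sketch based on RFC 3023. The helper name `charset_when_omitted` is hypothetical, and it only covers the case where the Content-Type header carries no charset parameter.

```python
from typing import Optional


def charset_when_omitted(content_type: str) -> Optional[str]:
    """Default charset for an XML media type with no charset parameter.

    Returns None when the XML rules (BOM, encoding declaration, UTF-8
    fallback) should decide instead. Hypothetical helper for illustration.
    """
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "text/xml":
        # RFC 3023 section 3.1: text/xml without charset is treated as us-ascii.
        return "us-ascii"
    # application/xml and application/xhtml+xml defer to the XML rules.
    return None
```

Under this reading, the text/xml default is a separate question from the application/xml fallback that this bug covers, which is why it would belong in its own report.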
Darin Adler
Comment 9
2005-09-11 21:57:43 PDT
Comment on attachment 3827: proposed patch

I thought about it a lot, and I think it's fine to land the fix just like this.
Lucas Forschler
Comment 10
2019-02-06 09:04:18 PST
Mass moving XML DOM bugs to the "DOM" Component.