RESOLVED FIXED 3809
Should default to UTF-8 or UTF-16 for application/xml documents with omitted charset and encoding declaration
https://bugs.webkit.org/show_bug.cgi?id=3809
Henri Sivonen
Reported 2005-07-02 04:33:38 PDT
Steps to reproduce:
1) Make Safari load (either in the content area or through XMLHttpRequest) an XML document that:
- does not have an XML declaration that declares the character encoding, AND
- does not have a BOM, AND
- is encoded in UTF-8, AND
- contains characters from outside the ASCII range, AND
- is served as either application/xml or application/xhtml+xml, AND
- has no charset parameter at the HTTP layer.
(Although the above looks very specific, these conditions commonly hold.)
2) Observe.

Actual results:
The bytes are decoded as characters according to the Default Encoding set in the Appearance preferences.

Expected results:
The bytes should be decoded as characters according to UTF-8, as per section 3.2 of RFC 3023, which defers to section 4.3.3 of XML 1.0.

Additional information:
Besides the obvious implications of this bug, there are two less obvious ones:
1) Safari cannot properly consume Canonical XML (which is always encoded in UTF-8 and omits the XML declaration).
2) Safari cannot properly consume XML documents it has itself produced via XMLHttpRequest POST!
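To make the actual vs. expected results concrete, here is a small sketch of what happens to the bytes. The payload is a hypothetical test document, and ISO-8859-1 is assumed as a stand-in for a Western "Default Encoding" preference; the actual preference value varies per user.

```python
# Hypothetical test payload: UTF-8 bytes, no XML declaration, no BOM, non-ASCII text.
payload = "<p>café</p>".encode("utf-8")

# Actual (buggy) behavior: bytes decoded with a user-preference default.
# ISO-8859-1 is an assumed example of a Western default setting.
actual = payload.decode("iso-8859-1")

# Expected behavior per RFC 3023 section 3.2 / XML 1.0 section 4.3.3: UTF-8.
expected = payload.decode("utf-8")

print(actual)    # <p>cafÃ©</p>
print(expected)  # <p>café</p>
```

The two UTF-8 bytes of "é" (0xC3 0xA9) become two separate Latin-1 characters, which is the mojibake users see.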
Attachments
proposed patch (741 bytes, patch)
2005-09-09 12:49 PDT, Alexey Proskuryakov
darin: review+
Oliver Hunt
Comment 1 2005-07-21 16:26:05 PDT
Would you be able to attach a test document? Cheers, Oliver
Henri Sivonen
Comment 2 2005-09-09 01:14:22 PDT
What reduction is needed beyond the case that has been in the URL field all along?
Oliver Hunt
Comment 3 2005-09-09 01:25:10 PDT
Behaviour is wrong (confirmed against Firefox).
Alexey Proskuryakov
Comment 4 2005-09-09 12:49:23 PDT
Created attachment 3827 (proposed patch)
Well, the XML spec is pretty explicit about files that do not have an encoding declaration in the XML or text declaration: they should be UTF-8 or UTF-16, unless a higher-level protocol defines a charset (section 4.3.3).
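A minimal sketch of the rule quoted above, as it would apply to an application/xml response under RFC 3023 section 3.2. The function name `resolve_xml_encoding` and its parameters are hypothetical (this is not WebKit's code), and the sketch ignores UCS-4 and the BOM-less detection heuristics of XML 1.0 Appendix F.

```python
import codecs
import re
from typing import Optional

# Encoding declaration inside an XML declaration at the very start of the entity.
# Simplification: assumes the declaration itself is in an ASCII-compatible encoding.
_ENCODING_DECL = re.compile(
    rb"""<\?xml[^>]*?\sencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def resolve_xml_encoding(body: bytes, charset_param: Optional[str]) -> str:
    """Choose the character encoding for an application/xml entity (sketch)."""
    # 1. A charset parameter from the higher-level protocol (HTTP) takes precedence.
    if charset_param:
        return charset_param.lower()
    # 2. A byte order mark identifies UTF-8 or UTF-16.
    if body.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if body.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "utf-16"  # Python's "utf-16" codec reads the BOM to pick byte order
    # 3. An encoding declaration in the XML declaration (re.match anchors at offset 0).
    match = _ENCODING_DECL.match(body)
    if match:
        return match.group(1).decode("ascii").lower()
    # 4. Otherwise the entity must be UTF-8; there is no user-preference fallback.
    return "utf-8"
```

In the behavior reported in this bug, step 4 effectively fell through to the Default Encoding preference instead of returning UTF-8.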
Alexey Proskuryakov
Comment 5 2005-09-09 12:50:57 PDT
The file at the bug URL can serve as a test case (without a link to the next test, of course).
Darin Adler
Comment 6 2005-09-09 15:36:48 PDT
Comment on attachment 3827 (proposed patch)
Is there any other browser that has this behavior? The comments above lead me to believe that Firefox does not work this way.
Henri Sivonen
Comment 7 2005-09-09 23:55:57 PDT
Gecko used to have this same bug (at least in the content area; I'm not sure about XMLHttpRequest), but it has been fixed.
Alexey Proskuryakov
Comment 8 2005-09-10 03:22:28 PDT
Henri, which Gecko bugfix are you referring to? I see that Firefox 1.0.5 renders the test as expected, but I couldn't find anything in Bugzilla. I found <https://bugzilla.mozilla.org/show_bug.cgi?id=247024>, but it talks about a different issue: documents transferred with MIME type text/xml should default to us-ascii, not utf-8. I'm not sure if WebKit has the same problem, but if it has, that should be in a separate report IMO.
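For contrast, the text/xml issue mentioned above differs only in what the HTTP layer implies when the charset parameter is omitted. A hypothetical helper (`protocol_default_charset` is an illustrative name, not an existing API), assuming the RFC 3023 reading that an omitted charset means us-ascii for text/xml but defers to the XML processor for application/xml:

```python
from typing import Optional


def protocol_default_charset(media_type: str, charset_param: Optional[str]) -> Optional[str]:
    """Charset implied by the HTTP layer under RFC 3023 (sketch).

    Returns None when the choice is left to the XML processor (XML 1.0 section 4.3.3).
    """
    # Strip any parameters and normalize case: "Text/XML; foo=bar" -> "text/xml".
    mt = media_type.split(";", 1)[0].strip().lower()
    if charset_param:
        return charset_param.lower()
    if mt == "text/xml":
        # RFC 3023 section 3.1: an omitted charset on text/xml means us-ascii.
        return "us-ascii"
    # application/xml, application/xhtml+xml, ...: defer to the XML processor.
    return None
```

Keeping the two media types apart like this is why the text/xml default is a separate question from this report.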
Darin Adler
Comment 9 2005-09-11 21:57:43 PDT
Comment on attachment 3827 (proposed patch)
I thought about it a lot, and I think it's fine to land the fix just like this.
Lucas Forschler
Comment 10 2019-02-06 09:04:18 PST
Mass moving XML DOM bugs to the "DOM" Component.