Bug 3809 - Should default to UTF-8 or UTF-16 for application/xml documents with omitted charset and encoding declaration
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: DOM
Version: 312.x
Hardware: Mac OS X 10.3
Importance: P2 Major
Assignee: Darin Adler
URL: http://hsivonen.iki.fi/test/mobile/la...
Keywords:
Depends on:
Blocks:
 
Reported: 2005-07-02 04:33 PDT by Henri Sivonen
Modified: 2019-02-06 09:04 PST

Attachments
proposed patch (741 bytes, patch)
2005-09-09 12:49 PDT, Alexey Proskuryakov
darin: review+

Description Henri Sivonen 2005-07-02 04:33:38 PDT
Steps to reproduce:
1) Make Safari load (either in content area or through XMLHttpRequest) an XML
document that 
  does not have an XML declaration that declares the character encoding 
  AND
  does not have a BOM 
  AND
  is encoded in UTF-8 
  AND
  contains characters from outside the ASCII range
  AND
  is served as either application/xml or application/xhtml+xml
  AND
  has no charset parameter on the HTTP layer.

(Although the above looks very specific, the conditions commonly hold true.)

2) Observe.

Actual results:
The bytes are decoded as characters according to the Default Encoding in
Appearance preferences.

Expected results:
Expected the bytes to be decoded as characters according to UTF-8 as per section
3.2 of RFC 3023, which defers to XML 1.0 section 4.3.3.
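
The gap between the actual and expected results can be illustrated with a minimal Python sketch. This is not WebKit code: the sample body is hypothetical, and ISO-8859-1 is only an assumed value for the Default Encoding preference.

```python
# Hypothetical document body: no XML declaration, no BOM, UTF-8 bytes,
# with characters outside the ASCII range.
body = "<p>päivää</p>".encode("utf-8")

# Actual behavior: the bytes are decoded with a user-preference default.
# ISO-8859-1 is an assumption here; the real preference may differ.
actual = body.decode("iso-8859-1")

# Expected behavior: with no other encoding information, decode as UTF-8.
expected = body.decode("utf-8")

print("actual:  ", actual)
print("expected:", expected)
```

Under that assumption, each non-ASCII character in the source turns into two unrelated characters in the actual result, while the expected result round-trips cleanly.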

Additional information:
Besides the obvious implications of this bug, there are two less obvious
implications:
1) Safari cannot properly consume Canonical XML.
2) Safari cannot properly consume XML documents it has produced itself via
XMLHttpRequest POST!
Comment 1 Oliver Hunt 2005-07-21 16:26:05 PDT
Would you be able to attach a test document?
Cheers,
  Oliver
Comment 2 Henri Sivonen 2005-09-09 01:14:22 PDT
What reduction is needed beyond the case that has been in the URL field all along?
Comment 3 Oliver Hunt 2005-09-09 01:25:10 PDT
Behaviour is wrong (confirmed against Firefox).
Comment 4 Alexey Proskuryakov 2005-09-09 12:49:23 PDT
Created attachment 3827
proposed patch

Well, the XML spec is pretty explicit about files that do not have an encoding
declaration in the text declaration - they should be UTF-8 or UTF-16, unless a
higher-level protocol defines a charset (4.3.3).
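
As a rough illustration of that rule, here is a simplified Python sketch. It is not the attached patch, whose contents are not shown in this report. The function name `pick_xml_encoding` and its parameters are hypothetical, and the sketch ignores UTF-32, EBCDIC, and the full autodetection procedure of XML 1.0 Appendix F.

```python
import re
from typing import Optional

# Matches an encoding declaration in an ASCII-compatible XML declaration.
_DECL = re.compile(
    rb"""^<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def pick_xml_encoding(data: bytes, http_charset: Optional[str] = None) -> str:
    """Choose an encoding for an XML entity (simplified sketch)."""
    # 1. A charset supplied by a higher-level protocol (e.g. HTTP) wins.
    if http_charset:
        return http_charset.lower()
    # 2. A byte order mark identifies UTF-8 or UTF-16.
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if data.startswith((b"\xfe\xff", b"\xff\xfe")):
        return "utf-16"
    # 3. An explicit encoding declaration, if present.
    match = _DECL.match(data[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    # 4. Nothing else to go on: default to UTF-8, never a user preference.
    return "utf-8"


# The case from this report: no charset, no BOM, no declaration.
print(pick_xml_encoding("<p>päivää</p>".encode("utf-8")))
```

The point of the last step is that the fallback is fixed by the specification rather than taken from a browser preference.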
Comment 5 Alexey Proskuryakov 2005-09-09 12:50:57 PDT
The file from the bug URL can serve as a test case (without the link to the next test, of course).
Comment 6 Darin Adler 2005-09-09 15:36:48 PDT
Comment on attachment 3827
proposed patch

Is there any other browser that has this behavior? The comments above lead me
to believe this is not working this way in Firefox.
Comment 7 Henri Sivonen 2005-09-09 23:55:57 PDT
Gecko used to have this same bug (at least in content area--not sure about
XMLHttpRequest), but it has been fixed.
Comment 8 Alexey Proskuryakov 2005-09-10 03:22:28 PDT
Henri, which Gecko bugfix are you referring to? I see that Firefox 1.0.5 renders the test as expected, but I
couldn't find anything in Bugzilla.

I found <https://bugzilla.mozilla.org/show_bug.cgi?id=247024>, but it talks about a different issue:
documents transferred with MIME type text/xml should default to us-ascii, not utf-8. I'm not sure if
WebKit has the same problem, but if it has, that should be in a separate report IMO.
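
To make the distinction between the two issues concrete, here is a small Python sketch of the transport-level default per media type when the charset parameter is omitted. The function name `default_charset_for` is hypothetical, and only the media types mentioned in this report are considered.

```python
from typing import Optional


def default_charset_for(media_type: str) -> Optional[str]:
    """Return the transport-level default charset when none is given.

    None means the transport layer says nothing, so the XML rules apply
    (BOM, then encoding declaration, then UTF-8).
    """
    media_type = media_type.strip().lower()
    if media_type == "text/xml":
        # The other issue: text/xml without a charset defaults to us-ascii.
        return "us-ascii"
    # This report: application/xml and application/xhtml+xml defer to XML.
    return None


for mt in ("text/xml", "application/xml", "application/xhtml+xml"):
    print(mt, "->", default_charset_for(mt))
```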
Comment 9 Darin Adler 2005-09-11 21:57:43 PDT
Comment on attachment 3827
proposed patch

I thought about it a lot, and I think it's fine to land the fix just like this.
Comment 10 Lucas Forschler 2019-02-06 09:04:18 PST
Mass moving XML DOM bugs to the "DOM" Component.