Bug 215764 - RESOLVED CONFIGURATION CHANGED
incorrect charset default for text/xml
https://bugs.webkit.org/show_bug.cgi?id=215764
Julian Reschke
Reported 2020-08-24 04:09:16 PDT
Apparently, when getting a content-type of "text/xml" (no charset parameter), Safari defaults to ISO-8859-1 instead of inspecting the XML content. See the test case at http://test.greenbytes.de/tech/tc/httpcontenttype/#textxmlnodefaultutf8nodecl (note that Firefox and Chrome correctly detect the charset).
Alexey Proskuryakov
Comment 1 2020-08-24 17:29:45 PDT
Could you please clarify what you expect as "inspecting the XML content"? This test case doesn't seem to have any kind of encoding declaration, so it could expect either defaulting to UTF-8, or sniffing. I think that we are probably defaulting to the embedding page charset here, and that wouldn't seem obviously wrong.
Julian Reschke
Comment 2 2020-08-24 21:29:42 PDT
I would expect that it follows <https://www.w3.org/TR/REC-xml/#sec-guessing>. That's what the other browsers do.
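For readers who have not looked at that appendix, here is a rough Python sketch of the kind of byte-level guessing it describes. It is a simplified subset for illustration only (the EBCDIC and unusual UCS-4 byte orders are left out), the function name `guess_xml_encoding` is hypothetical, and it is not the code of any browser engine.

```python
def guess_xml_encoding(data: bytes) -> str:
    """Guess the encoding family of an XML entity from its first bytes.

    Simplified sketch: a document starting with an ASCII-compatible
    "<?xml" may still name another encoding in its declaration, which
    this function does not read.
    """
    # Byte order marks. The 4-byte marks must be tested before the
    # 2-byte ones, because FF FE 00 00 also starts with FF FE.
    if data.startswith(b"\x00\x00\xfe\xff"):
        return "utf-32-be"
    if data.startswith(b"\xff\xfe\x00\x00"):
        return "utf-32-le"
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if data.startswith(b"\xfe\xff"):
        return "utf-16-be"
    if data.startswith(b"\xff\xfe"):
        return "utf-16-le"

    # No byte order mark: look at how "<?" is laid out in memory.
    if data.startswith(b"\x00<\x00?"):
        return "utf-16-be"
    if data.startswith(b"<\x00?\x00"):
        return "utf-16-le"

    # Everything else, including a document with no declaration at all,
    # is treated as UTF-8 in this sketch.
    return "utf-8"
```

Under these assumptions, a body such as `"<a>é</a>".encode("utf-8")`, which has neither a byte order mark nor an encoding declaration, is guessed as UTF-8 rather than ISO-8859-1.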
Julian Reschke
Comment 3 2020-08-24 21:32:42 PDT
And: https://www.w3.org/TR/REC-xml/#charencoding says: "Though an XML processor is required to read only entities in the UTF-8 and UTF-16 encodings, it is recognized that other encodings are used around the world, and it may be desired for XML processors to read entities that use them. In the absence of external character encoding information (such as MIME headers), parsed entities which are stored in an encoding other than UTF-8 or UTF-16 MUST begin with a text declaration (see 4.3.1 The Text Declaration) containing an encoding declaration: (...)"
Alexey Proskuryakov
Comment 4 2020-08-25 09:30:02 PDT
https://www.w3.org/TR/REC-xml/#charencoding defers to RFC 3023 for text/xml resources delivered over http, which says:

Conformant with [RFC2046], if a text/xml entity is received with the charset parameter omitted, MIME processors and XML processors MUST use the default charset value of "us-ascii"[ASCII]. In cases where the XML MIME entity is transmitted via HTTP, the default charset value is still "us-ascii". (Note: There is an inconsistency between this specification and HTTP/1.1, which uses ISO-8859-1[ISO8859] as the default for a historical reason. Since XML is a new format, a new default should be chosen for better I18N. US-ASCII was chosen, since it is the intersection of UTF-8 and ISO-8859-1 and since it is already used by MIME.)

So it looks like other browser engines violate the spec in a different way. Us inheriting the default charset from the page is at least consistent with how other text/ subresources are handled.
Julian Reschke
Comment 5 2020-08-25 12:37:28 PDT
Unless I'm missing something, https://www.w3.org/TR/REC-xml/#charencoding does not refer to RFC 3023 at all. That said, what would be relevant is the *current* definition of the text/xml media type, which is RFC 7303.

Also, it seems you missed the normative text in <https://www.w3.org/TR/REC-xml/#charencoding>:

"Though an XML processor is required to read only entities in the UTF-8 and UTF-16 encodings, it is recognized that other encodings are used around the world, and it may be desired for XML processors to read entities that use them. In the absence of external character encoding information (such as MIME headers), parsed entities which are stored in an encoding other than UTF-8 or UTF-16 MUST begin with a text declaration (see 4.3.1 The Text Declaration) containing an encoding declaration: (...)"

Note the last sentence; if there is no external character encoding information, the default is UTF-8 or UTF-16, nothing else.
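To make that reading concrete, here is a minimal Python sketch of the decision order it implies: use the charset parameter when the Content-Type header carries one, otherwise look for an encoding declaration, otherwise fall back to UTF-8. The function name `pick_xml_charset` is hypothetical, the body is assumed to be ASCII-compatible (UTF-16 and byte order marks are not handled), and this illustrates the argument rather than what any engine actually does.

```python
import re
from email.message import Message
from typing import Optional

# Matches the encoding pseudo-attribute of a leading XML or text declaration.
_ENCODING_DECL = re.compile(
    rb"^<\?xml[^>]*?\bencoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']"
)


def charset_param(content_type: str) -> Optional[str]:
    """Return the lowercased charset parameter, or None if it is absent."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def pick_xml_charset(content_type: str, body: bytes) -> str:
    """Pick a charset for an XML body (sketch, ASCII-compatible bodies only)."""
    external = charset_param(content_type)
    if external is not None:
        # External character encoding information is present.
        return external

    match = _ENCODING_DECL.match(body)
    if match:
        # No external information, but the entity declares its encoding.
        return match.group(1).decode("ascii").lower()

    # No external information and no declaration: default to UTF-8.
    return "utf-8"
```

With a bare `text/xml` header and a body that has no declaration, this sketch returns `utf-8`, which matches the behaviour the test case in the report expects.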
Alexey Proskuryakov
Comment 6 2020-08-25 13:03:17 PDT
Wrong copy/paste, I wanted to say that https://www.w3.org/TR/REC-xml/#sec-guessing referred to RFC 3023. My understanding of the specs' language is that anything loaded via HTTP falls into the "has external character encoding information" case, even when there is no charset in the HTTP headers; this just means that the external information is taken as the default for HTTP.
Julian Reschke
Comment 7 2020-08-26 03:57:16 PDT
...but there is no default in HTTP. (there was in RFC 2616, but that was removed in RFC 723* with good reasons)
Radar WebKit Bug Importer
Comment 9 2020-08-31 04:10:16 PDT
Anne van Kesteren
Comment 10 2023-12-18 04:51:42 PST
This appears to have been fixed.