<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://bugs.webkit.org/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4.1"
          urlbase="https://bugs.webkit.org/"
          
          maintainer="admin@webkit.org"
>

    <bug>
          <bug_id>215764</bug_id>
          
          <creation_ts>2020-08-24 04:09:16 -0700</creation_ts>
          <short_desc>incorrect charset default for text/xml</short_desc>
          <delta_ts>2023-12-18 04:51:42 -0800</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WebKit</product>
          <component>DOM</component>
          <version>Safari 13</version>
          <rep_platform>Unspecified</rep_platform>
          <op_sys>Unspecified</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>CONFIGURATION CHANGED</resolution>
          
          
          <bug_file_loc>http://test.greenbytes.de/tech/tc/httpcontenttype/#textxmlnodefaultutf8nodecl</bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords>InRadar</keywords>
          <priority>P2</priority>
          <bug_severity>Normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Julian Reschke">julian.reschke</reporter>
          <assigned_to name="Nobody">webkit-unassigned</assigned_to>
          <cc>annevk</cc>
    
    <cc>ap</cc>
    
    <cc>webkit-bug-importer</cc>
          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>1682294</commentid>
    <comment_count>0</comment_count>
    <who name="Julian Reschke">julian.reschke</who>
    <bug_when>2020-08-24 04:09:16 -0700</bug_when>
    <thetext>Apparently, when getting a content-type of &quot;text/xml&quot; (no charset parameter), Safari defaults to ISO-8859-1, instead of inspecting the XML content.

See testcase at

  http://test.greenbytes.de/tech/tc/httpcontenttype/#textxmlnodefaultutf8nodecl

(Note that Firefox and Chrome correctly detect the charset.)</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1682507</commentid>
    <comment_count>1</comment_count>
    <who name="Alexey Proskuryakov">ap</who>
    <bug_when>2020-08-24 17:29:45 -0700</bug_when>
    <thetext>Could you please clarify what you expect as &quot;inspecting the XML content&quot;? This test case doesn&apos;t seem to have any kind of encoding declaration, so it could expect either defaulting to UTF-8, or sniffing.

I think that we are probably defaulting to the embedding page charset here, and that wouldn&apos;t seem obviously wrong.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1682561</commentid>
    <comment_count>2</comment_count>
    <who name="Julian Reschke">julian.reschke</who>
    <bug_when>2020-08-24 21:29:42 -0700</bug_when>
    <thetext>I would expect that it follows:

   https://www.w3.org/TR/REC-xml/#sec-guessing
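
As a rough illustration of that kind of detection, here is a minimal Python sketch. It is a simplified subset of the autodetection appendix, not a complete implementation: the name `guess_xml_encoding` is hypothetical, the UCS-4 and EBCDIC cases are left out, and it assumes no charset was supplied externally.

```python
import re

# Byte patterns from the XML 1.0 autodetection appendix (simplified subset).
UTF8_BOM = bytes.fromhex("efbbbf")
UTF16_BE_BOM = bytes.fromhex("feff")
UTF16_LE_BOM = bytes.fromhex("fffe")
UTF16_BE_DECL = bytes.fromhex("003c003f")  # start of an XML declaration in UTF-16BE, no BOM
UTF16_LE_DECL = bytes.fromhex("3c003f00")  # start of an XML declaration in UTF-16LE, no BOM
ASCII_DECL = bytes.fromhex("3c3f786d6c")   # start of an XML declaration, ASCII-compatible encoding
DECL_END = bytes.fromhex("3f3e")           # end of the XML declaration

# Matches encoding="..." or encoding='...' inside the XML declaration.
ENCODING_ATTR = re.compile(rb"encoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']")


def guess_xml_encoding(data):
    """Guess the encoding of an XML entity that has no external charset."""
    if data.startswith(UTF8_BOM):
        return "utf-8"
    if data.startswith(UTF16_BE_BOM) or data.startswith(UTF16_BE_DECL):
        return "utf-16-be"
    if data.startswith(UTF16_LE_BOM) or data.startswith(UTF16_LE_DECL):
        return "utf-16-le"
    if data.startswith(ASCII_DECL):
        end = data.find(DECL_END)
        if end != -1:
            match = ENCODING_ATTR.search(data[:end])
            if match:
                return match.group(1).decode("ascii").lower()
    # No BOM and no encoding declaration: fall back to UTF-8.
    return "utf-8"
```

With no BOM and no XML declaration, as in the test case, this sketch falls through to UTF-8.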

That&apos;s what the other browsers do.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1682564</commentid>
    <comment_count>3</comment_count>
    <who name="Julian Reschke">julian.reschke</who>
    <bug_when>2020-08-24 21:32:42 -0700</bug_when>
    <thetext>And:

   https://www.w3.org/TR/REC-xml/#charencoding

says:

&quot;Though an XML processor is required to read only entities in the UTF-8 and UTF-16 encodings, it is recognized that other encodings are used around the world, and it may be desired for XML processors to read entities that use them. In the absence of external character encoding information (such as MIME headers), parsed entities which are stored in an encoding other than UTF-8 or UTF-16 MUST begin with a text declaration (see 4.3.1 The Text Declaration) containing an encoding declaration: (...)&quot;</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1682671</commentid>
    <comment_count>4</comment_count>
    <who name="Alexey Proskuryakov">ap</who>
    <bug_when>2020-08-25 09:30:02 -0700</bug_when>
    <thetext>https://www.w3.org/TR/REC-xml/#charencoding defers to RFC 3023 for text/xml resources delivered over http, which says:

      Conformant with [RFC2046], if a text/xml entity is received with
      the charset parameter omitted, MIME processors and XML processors
      MUST use the default charset value of &quot;us-ascii&quot;[ASCII].  In cases
      where the XML MIME entity is transmitted via HTTP, the default
      charset value is still &quot;us-ascii&quot;.  (Note: There is an
      inconsistency between this specification and HTTP/1.1, which uses
      ISO-8859-1[ISO8859] as the default for a historical reason.  Since
      XML is a new format, a new default should be chosen for better
      I18N.  US-ASCII was chosen, since it is the intersection of UTF-8
      and ISO-8859-1 and since it is already used by MIME.)

So it looks like other browser engines violate the spec in a different way. Us inheriting the default charset from the page is at least consistent with how other text/ subresources are handled.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1682749</commentid>
    <comment_count>5</comment_count>
    <who name="Julian Reschke">julian.reschke</who>
    <bug_when>2020-08-25 12:37:28 -0700</bug_when>
    <thetext>Unless I&apos;m missing something, https://www.w3.org/TR/REC-xml/#charencoding does not refer to RFC 3023 at all.

That said, what would be relevant is the *current* definition of the text/xml media type, which is RFC 7303.

Also, it seems you missed the normative text in &lt;https://www.w3.org/TR/REC-xml/#charencoding&gt;:

&quot;Though an XML processor is required to read only entities in the UTF-8 and UTF-16 encodings, it is recognized that other encodings are used around the world, and it may be desired for XML processors to read entities that use them. In the absence of external character encoding information (such as MIME headers), parsed entities which are stored in an encoding other than UTF-8 or UTF-16 MUST begin with a text declaration (see 4.3.1 The Text Declaration) containing an encoding declaration: (...)&quot;

Note the last sentence; if there is no external character encoding information, the default is UTF-8 or UTF-16, nothing else.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1682761</commentid>
    <comment_count>6</comment_count>
    <who name="Alexey Proskuryakov">ap</who>
    <bug_when>2020-08-25 13:03:17 -0700</bug_when>
    <thetext>Wrong copy/paste, I wanted to say that https://www.w3.org/TR/REC-xml/#sec-guessing referred to RFC 3023.

My understanding of the specs&apos; language is that anything loaded via http falls into the &quot;has external character encoding information&quot; case, even when there is no charset in the http headers; this just means that the external information is taken to be the default for http.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1682957</commentid>
    <comment_count>7</comment_count>
    <who name="Julian Reschke">julian.reschke</who>
    <bug_when>2020-08-26 03:57:16 -0700</bug_when>
    <thetext>...but there is no default in HTTP.

(there was one in RFC 2616, but it was removed in RFC 723* for good reasons)</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1682963</commentid>
    <comment_count>8</comment_count>
    <who name="Julian Reschke">julian.reschke</who>
    <bug_when>2020-08-26 04:24:44 -0700</bug_when>
    <thetext>Link: https://greenbytes.de/tech/webdav/rfc7231.html#rfc.section.B.p.4</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1684110</commentid>
    <comment_count>9</comment_count>
    <who name="Radar WebKit Bug Importer">webkit-bug-importer</who>
    <bug_when>2020-08-31 04:10:16 -0700</bug_when>
    <thetext>&lt;rdar://problem/68065097&gt;</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>2000373</commentid>
    <comment_count>10</comment_count>
    <who name="Anne van Kesteren">annevk</who>
    <bug_when>2023-12-18 04:51:42 -0800</bug_when>
    <thetext>This appears to have been fixed.</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>