<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://bugs.webkit.org/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4.1"
          urlbase="https://bugs.webkit.org/"
          
          maintainer="admin@webkit.org"
>

    <bug>
          <bug_id>3809</bug_id>
          
          <creation_ts>2005-07-02 04:33:37 -0700</creation_ts>
          <short_desc>Should default to UTF-8 or UTF-16 for application/xml documents with omitted charset and encoding declaration</short_desc>
          <delta_ts>2019-02-06 09:04:18 -0800</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WebKit</product>
          <component>DOM</component>
          <version>312.x</version>
          <rep_platform>Mac</rep_platform>
          <op_sys>OS X 10.3</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>FIXED</resolution>
          
          
          <bug_file_loc>http://hsivonen.iki.fi/test/mobile/latin.xhtml</bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>Major</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Henri Sivonen">hsivonen</reporter>
          <assigned_to name="Darin Adler">darin</assigned_to>
          <cc>ap</cc>
          <cc>cdumez</cc>

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>13672</commentid>
    <comment_count>0</comment_count>
    <who name="Henri Sivonen">hsivonen</who>
    <bug_when>2005-07-02 04:33:38 -0700</bug_when>
    <thetext>Steps to reproduce:
1) Make Safari load (either in content area or through XMLHttpRequest) an XML
document that 
  does not have an XML declaration that declares the character encoding 
  AND
  does not have a BOM 
  AND
  is encoded in UTF-8 
  AND
  contains characters from outside the ASCII range
  AND
  is served as either application/xml or application/xhtml+xml
  AND
  has no charset parameter on the HTTP layer.

(Although the above looks very specific, the conditions commonly hold true.)

2) Observe.

Actual results:
The bytes are decoded as characters according to the Default Encoding in
Appearance preferences.

Expected results:
Expected the bytes to be decoded as characters according to UTF-8 as per section
3.2 of RFC 3023, which defers to XML 1.0 section 4.3.3.

Additional information:
Besides the obvious implications of this bug, there are two less obvious
implications:
1) Safari cannot properly consume Canonical XML.
2) Safari cannot properly consume XML documents it has produced itself via
XMLHttpRequest POST!</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>15041</commentid>
    <comment_count>1</comment_count>
    <who name="Oliver Hunt">oliver</who>
    <bug_when>2005-07-21 16:26:05 -0700</bug_when>
    <thetext>Would you be able to attach a test document?
Cheers,
  Oliver</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>19157</commentid>
    <comment_count>2</comment_count>
    <who name="Henri Sivonen">hsivonen</who>
    <bug_when>2005-09-09 01:14:22 -0700</bug_when>
    <thetext>What reduction is needed beyond the case that has been in the URL field all along?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>19158</commentid>
    <comment_count>3</comment_count>
    <who name="Oliver Hunt">oliver</who>
    <bug_when>2005-09-09 01:25:10 -0700</bug_when>
    <thetext>Behaviour is wrong (confirmed against Firefox)</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>19232</commentid>
    <comment_count>4</comment_count>
      <attachid>3827</attachid>
    <who name="Alexey Proskuryakov">ap</who>
    <bug_when>2005-09-09 12:49:23 -0700</bug_when>
    <thetext>Created attachment 3827
proposed patch

Well, the XML spec is pretty explicit about files that do not have an encoding
declaration in the text declaration: they should be UTF-8 or UTF-16, unless a
higher-level protocol defines a charset (section 4.3.3).</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>19233</commentid>
    <comment_count>5</comment_count>
    <who name="Alexey Proskuryakov">ap</who>
    <bug_when>2005-09-09 12:50:57 -0700</bug_when>
    <thetext>The file from the bug URL can serve as a test case (without a link to the next test, of course).</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>19244</commentid>
    <comment_count>6</comment_count>
      <attachid>3827</attachid>
    <who name="Darin Adler">darin</who>
    <bug_when>2005-09-09 15:36:48 -0700</bug_when>
    <thetext>Comment on attachment 3827
proposed patch

Is there any other browser that has this behavior? The comments above lead me
to believe this is not working this way in Firefox.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>19254</commentid>
    <comment_count>7</comment_count>
    <who name="Henri Sivonen">hsivonen</who>
    <bug_when>2005-09-09 23:55:57 -0700</bug_when>
    <thetext>Gecko used to have this same bug (at least in content area--not sure about
XMLHttpRequest), but it has been fixed.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>19273</commentid>
    <comment_count>8</comment_count>
    <who name="Alexey Proskuryakov">ap</who>
    <bug_when>2005-09-10 03:22:28 -0700</bug_when>
    <thetext>Henri, which Gecko bugfix are you referring to? I see that Firefox 1.0.5 renders the test as expected, but I couldn&apos;t find anything in Bugzilla.

I found &lt;https://bugzilla.mozilla.org/show_bug.cgi?id=247024&gt;, but it talks about a different issue: documents transferred with MIME type text/xml should default to us-ascii, not utf-8. I&apos;m not sure if WebKit has the same problem, but if it has, that should be in a separate report IMO.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>19430</commentid>
    <comment_count>9</comment_count>
      <attachid>3827</attachid>
    <who name="Darin Adler">darin</who>
    <bug_when>2005-09-11 21:57:43 -0700</bug_when>
    <thetext>Comment on attachment 3827
proposed patch

I thought about it a lot, and I think it&apos;s fine to land the fix just like this.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1503145</commentid>
    <comment_count>10</comment_count>
    <who name="Lucas Forschler">lforschler</who>
    <bug_when>2019-02-06 09:04:18 -0800</bug_when>
    <thetext>Mass moving XML DOM bugs to the &quot;DOM&quot; Component.</thetext>
  </long_desc>
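<!--
The decoding order discussed in the comments above can be sketched as follows.
This is a minimal standalone illustration, not the code in decoder.cpp: the
function name pickXmlEncoding and its two parameters are hypothetical, UTF-16
is recognised only through a byte order mark, and the final UTF-8 fallback
follows XML 1.0 section 4.3.3 as cited in the report. It assumes the document
was served as application/xml or application/xhtml+xml.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper (not a WebKit API): chooses an encoding name for an
// XML document body.
//   httpCharset: charset parameter from the Content-Type header, empty if absent.
//   body:        the leading bytes of the document.
std::string pickXmlEncoding(std::string_view httpCharset, std::string_view body)
{
    // 1. A charset supplied by the higher-level protocol takes precedence.
    if (!httpCharset.empty())
        return std::string(httpCharset);

    // 2. A byte order mark identifies UTF-8 or UTF-16.
    if (body.substr(0, 3) == "\xEF\xBB\xBF")
        return "UTF-8";
    if (body.substr(0, 2) == "\xFE\xFF")
        return "UTF-16BE";
    if (body.substr(0, 2) == "\xFF\xFE")
        return "UTF-16LE";

    // 3. An encoding pseudo-attribute in the XML declaration, if present.
    if (body.substr(0, 5) == "<?xml") {
        const auto declEnd = body.find("?>");
        if (declEnd != std::string_view::npos) {
            const std::string_view decl = body.substr(0, declEnd);
            const auto key = decl.find("encoding");
            if (key != std::string_view::npos) {
                // The value is delimited by matching single or double quotes.
                const auto open = decl.find_first_of("\"'", key);
                if (open != std::string_view::npos) {
                    const auto close = decl.find(decl[open], open + 1);
                    if (close != std::string_view::npos)
                        return std::string(decl.substr(open + 1, close - open - 1));
                }
            }
        }
    }

    // 4. Nothing declared anywhere: assume UTF-8 rather than a user preference.
    return "UTF-8";
}
```

Under these assumptions, pickXmlEncoding("", "<p>caf\xC3\xA9</p>") returns
"UTF-8", which is the case from the steps to reproduce: no charset parameter,
no BOM and no encoding declaration. Falling through to a fixed default in the
last step, instead of consulting a browser preference, is the behaviour the
report asks for.
-->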
      
          <attachment
              isobsolete="0"
              ispatch="1"
              isprivate="0"
          >
            <attachid>3827</attachid>
            <date>2005-09-09 12:49:23 -0700</date>
            <delta_ts>2005-09-11 21:57:43 -0700</delta_ts>
            <desc>proposed patch</desc>
            <filename>xmlnocharset.txt</filename>
            <type>text/plain</type>
            <size>741</size>
            <attacher name="Alexey Proskuryakov">ap</attacher>
            
              <data encoding="base64">SW5kZXg6IGRlY29kZXIuY3BwCj09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09
PT09PT09PT09PT09PT09PT09PT09PT09PT09PT0KUkNTIGZpbGU6IC9jdnMvcm9vdC9XZWJDb3Jl
L2todG1sL21pc2MvZGVjb2Rlci5jcHAsdgpyZXRyaWV2aW5nIHJldmlzaW9uIDEuNDIKZGlmZiAt
cCAtdSAtcjEuNDIgZGVjb2Rlci5jcHAKLS0tIGRlY29kZXIuY3BwCTEgU2VwIDIwMDUgMDU6Mzc6
MTcgLTAwMDAJMS40MgorKysgZGVjb2Rlci5jcHAJOSBTZXAgMjAwNSAxOTo0NjoxMSAtMDAwMApA
QCAtNTIzLDYgKzUyMyw5IEBAIFFTdHJpbmcgRGVjb2Rlcjo6ZGVjb2RlKGNvbnN0IGNoYXIgKmRh
dGEKICAgICAgICAgICAgICAgICAgICAgICAgICAgICBzZXRFbmNvZGluZyhzdHIubWlkKHBvcywg
bGVuKSwgRW5jb2RpbmdGcm9tWE1MSGVhZGVyKTsKICAgICAgICAgICAgICAgICAgICAgICAgICAg
ICBpZiAobV90eXBlID09IEVuY29kaW5nRnJvbVhNTEhlYWRlcikKICAgICAgICAgICAgICAgICAg
ICAgICAgICAgICAgICAgZ290byBmb3VuZDsKKyAgICAgICAgICAgICAgICAgICAgICAgIH0gZWxz
ZSB7CisgICAgICAgICAgICAgICAgICAgICAgICAgICAgc2V0RW5jb2RpbmcoIlVURi04IiwgRW5j
b2RpbmdGcm9tWE1MSGVhZGVyKTsKKyAgICAgICAgICAgICAgICAgICAgICAgICAgICBnb3RvIGZv
dW5kOwogICAgICAgICAgICAgICAgICAgICAgICAgfQogICAgICAgICAgICAgICAgICAgICB9CiAK
</data>
<flag name="review"
          id="551"
          type_id="1"
          status="+"
          setter="darin"
    />
          </attachment>
      

    </bug>

</bugzilla>