<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://bugs.webkit.org/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4.1"
          urlbase="https://bugs.webkit.org/"
          
          maintainer="admin@webkit.org"
>

    <bug>
          <bug_id>44039</bug_id>
          
          <creation_ts>2010-08-15 15:59:33 -0700</creation_ts>
          <short_desc>C1 control codes shouldn&apos;t be interpreted as Microsoft characters</short_desc>
          <delta_ts>2010-08-16 09:10:37 -0700</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WebKit</product>
          <component>DOM</component>
          <version>528+ (Nightly build)</version>
          <rep_platform>PC</rep_platform>
          <op_sys>Linux</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>INVALID</resolution>
          
          
          <bug_file_loc>http://bugs.debian.org/592884</bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>Normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>0</everconfirmed>
          <reporter name="Michael Gilbert">michael.s.gilbert</reporter>
          <assigned_to name="Nobody">webkit-unassigned</assigned_to>
          <cc>ap</cc>
          <cc>vincent-webkit</cc>
          

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>264897</commentid>
    <comment_count>0</comment_count>
    <who name="Michael Gilbert">michael.s.gilbert</who>
    <bug_when>2010-08-15 15:59:33 -0700</bug_when>
    <thetext>Hi, a Debian user submitted the following bug:

  According to the W3C[*], C1 control codes such as U+0080 should not
  be interpreted as Microsoft characters. In the attached testcase,
  WebKit (tested with /usr/lib/webkit-1.0-2/libexec/GtkLauncher)
  renders U+0080 as a Euro symbol, which is incorrect.

See the linked Debian bug for more info.

Thanks,
Mike</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>265025</commentid>
    <comment_count>1</comment_count>
    <who name="Alexey Proskuryakov">ap</who>
    <bug_when>2010-08-16 05:02:45 -0700</bug_when>
    <thetext>The test case attached to the Debian bug contains &amp;#x80; in a text/html file (which also has ignored XHTML-style incantations inside). Handling of these is defined in HTML5 section 10.2.4.70 Tokenizing character references:

---------------------
If that number is one of the numbers in the first column of the following table, then this is a parse error. Find the row with that number in the first column, and return a character token for the Unicode character given in the second column of that row.

Number	Unicode character
0x00	U+FFFD	REPLACEMENT CHARACTER
0x0D	U+000D	CARRIAGE RETURN (CR)
0x80	U+20AC	EURO SIGN (€)
&lt;...&gt;
---------------------

WebKit renders &amp;#x80; as a Euro sign, so we match the HTML5 spec here.

If I re-save this test case as an XHTML file, the character reference is interpreted as U+0080.
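
To illustrate the two behaviors, here is a minimal Python sketch. It is not WebKit's tokenizer code: the names REPLACEMENTS, resolve_numeric_reference and html_mode are made up for illustration, only the three table rows quoted above are included, and the other checks the specs require (surrogates, out-of-range numbers, XML well-formedness) are left out.

```python
# Hypothetical illustration only; not taken from WebKit.
# Only the three rows quoted from the HTML5 table are listed here.
REPLACEMENTS = {
    0x00: "\uFFFD",  # REPLACEMENT CHARACTER
    0x0D: "\u000D",  # CARRIAGE RETURN (CR)
    0x80: "\u20AC",  # EURO SIGN
}


def resolve_numeric_reference(number, html_mode=True):
    """Return the character for a numeric character reference.

    html_mode=True applies the replacement table (text/html parsing);
    html_mode=False maps the number straight to its code point, which
    is the behavior seen when the file is served as XHTML.
    """
    if html_mode and number in REPLACEMENTS:
        return REPLACEMENTS[number]
    return chr(number)
```

With this sketch, resolve_numeric_reference(0x80) gives U+20AC, while resolve_numeric_reference(0x80, html_mode=False) gives U+0080.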

WebKit correctly implements the relevant specifications, and matches other browsers (I only tested Firefox 3.6.8 this time, but my recollection is that IE does the same). The W3C FAQ cited in the Debian bug is obsolete.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>265135</commentid>
    <comment_count>2</comment_count>
    <who name="Vincent Lefevre">vincent-webkit</who>
    <bug_when>2010-08-16 09:10:37 -0700</bug_when>
    <thetext>OK, thanks for the information. I&apos;ve just sent a comment to the W3C about the FAQ (using the &quot;Send us a comment&quot; link).</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>