<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://bugs.webkit.org/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4.1"
          urlbase="https://bugs.webkit.org/"
          
          maintainer="admin@webkit.org"
>

    <bug>
          <bug_id>54582</bug_id>
          
          <creation_ts>2011-02-16 13:53:58 -0800</creation_ts>
          <short_desc>REGRESSION (r73756): Some page content not displaying correctly at pay.37wan.com</short_desc>
          <delta_ts>2011-02-16 16:34:26 -0800</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WebKit</product>
          <component>WebKit Misc.</component>
          <version>528+ (Nightly build)</version>
          <rep_platform>All</rep_platform>
          <op_sys>All</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>WONTFIX</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>Normal</bug_severity>
          <target_milestone>---</target_milestone>
          <dependson>47397</dependson>
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Adele Peterson">adele</reporter>
          <assigned_to name="Nobody">webkit-unassigned</assigned_to>
          <cc>abarth</cc>
          <cc>aestes</cc>
          <cc>ap</cc>
          <cc>dimich</cc>
          <cc>ian</cc>
          <cc>jennb</cc>
          <comment_sort_order>oldest_to_newest</comment_sort_order>
          <long_desc isprivate="0" >
    <commentid>352311</commentid>
    <comment_count>0</comment_count>
      <attachid>82686</attachid>
    <who name="Adele Peterson">adele</who>
    <bug_when>2011-02-16 13:53:58 -0800</bug_when>
    <thetext>Created attachment 82686
test

Steps to reproduce:

1. Navigate to http://www.37wan.com/.
2. Enter a username in the top text field.
3. Enter a password in the bottom text field.
4. Press the login button (left orange button).
5. Press the index button (middle yellow button within the orange box).
Result: Most of the text on the page is garbled characters.

If you don&apos;t have an account, see the attached reduction, which consists of the following markup:

&lt;meta http-equiv=&quot;Content-Type&quot; contet=&quot;text/html; charset=UTF-8&quot; /&gt;
充值中心| webgame-37wan网页游戏平台

So this site has a typo: &quot;contet&quot; instead of &quot;content&quot;. If this works in all shipping browsers, we should probably send feedback to the HTML WG and make a fix.

&lt;rdar://problem/9006151&gt;</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>352315</commentid>
    <comment_count>1</comment_count>
    <who name="Adele Peterson">adele</who>
    <bug_when>2011-02-16 13:56:45 -0800</bug_when>
    <thetext>This is the change that caused the regression:

http://trac.webkit.org/changeset/73756
TextResourceDecoder::checkForHeadCharset can look way past the limit.
 https://bugs.webkit.org/show_bug.cgi?id=47397</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>352321</commentid>
    <comment_count>2</comment_count>
    <who name="Andy Estes">aestes</who>
    <bug_when>2011-02-16 13:59:47 -0800</bug_when>
    <thetext>Note that this test case doesn&apos;t reproduce the bug when viewed here, because Bugzilla serves it with a correct Content-Type header (text/html; charset=UTF-8). Saving the test case and opening it locally should do the trick.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>352372</commentid>
    <comment_count>3</comment_count>
    <who name="Jenn Braithwaite">jennb</who>
    <bug_when>2011-02-16 14:43:46 -0800</bug_when>
    <thetext>(In reply to comment #1)
&gt; This is the change that caused the regression:
&gt; 
&gt; http://trac.webkit.org/changeset/73756
&gt; TextResourceDecoder::checkForHeadCharset can look way past the limit.
&gt;  https://bugs.webkit.org/show_bug.cgi?id=47397

Prior to this change, browsers were extremely lenient with charset specification syntax.  Something like:

&lt;meta http-equiv=&quot;Content-Type&quot; foobar=&quot;notacharset=UTF-8&quot; /&gt;
充值中心| webgame-37wan网页游戏平台

would still work.  

As the goal of the change was to make charset detection more precise, I recommend not going backwards even if the above (and the test case for this bug) work with all shipping browsers.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>352390</commentid>
    <comment_count>4</comment_count>
    <who name="Alexey Proskuryakov">ap</who>
    <bug_when>2011-02-16 14:59:27 -0800</bug_when>
    <thetext>It&apos;s extremely easy to make a typo like the one here, and if it used to work in all browsers, breaking it would be unfortunate.

Generally speaking, making parsing stricter is rarely helpful.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>352415</commentid>
    <comment_count>5</comment_count>
    <who name="Jenn Braithwaite">jennb</who>
    <bug_when>2011-02-16 15:38:04 -0800</bug_when>
    <thetext>A small deviation from the spec would be: if http-equiv=&apos;Content-Type&apos; is seen in a meta tag, extract &quot;charset=xxx&quot; from the other attributes regardless of attribute name.

If this sounds acceptable, I&apos;ll make the change.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>352416</commentid>
    <comment_count>6</comment_count>
    <who name="Adam Barth">abarth</who>
    <bug_when>2011-02-16 15:39:22 -0800</bug_when>
    <thetext>(In reply to comment #5)
&gt; A small deviation from the spec would be if &quot;http-equiv=&apos;Content-Type&apos; is seen in a meta tag, extract &quot;charset=xxx&quot; from other attributes regardless of attribute name.
&gt; 
&gt; If this sounds acceptable, I&apos;ll make the change.

You should probably grab Hixie on #whatwg and touch base with him before implementing that change.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>352428</commentid>
    <comment_count>7</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2011-02-16 16:04:24 -0800</bug_when>
    <thetext>From what I can tell, this change would be wrong. In particular, if it were correct, I&apos;d expect all browsers to get a &quot;fail&quot; (1252) on this test case, but every browser I tested gets a &quot;pass&quot; (1254):
   http://www.hixie.ch/tests/adhoc/html/parsing/encoding/142.html

(To control for declaration order, I also have this test:
   http://www.hixie.ch/tests/adhoc/html/parsing/encoding/143.html
...which suggests that the issue is not simply that browsers use the last declaration.)</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>352448</commentid>
    <comment_count>8</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2011-02-16 16:31:53 -0800</bug_when>
    <thetext>It looks like IE8, Firefox 4, Opera 11, and the next versions of WebKit and Chrome will all break this page in the same way. I would vote to leave it broken. The alternative is to make the parsing of charset declarations much more complicated and to break compatibility with IE8 on this issue, which seems suboptimal.</thetext>
  </long_desc>
      
          <attachment
              isobsolete="0"
              ispatch="0"
              isprivate="0"
          >
            <attachid>82686</attachid>
            <date>2011-02-16 13:53:58 -0800</date>
            <delta_ts>2011-02-16 13:53:58 -0800</delta_ts>
            <desc>test</desc>
            <filename>37wan.html</filename>
            <type>text/html</type>
            <size>115</size>
            <attacher name="Adele Peterson">adele</attacher>
            
              <data encoding="base64">PG1ldGEgaHR0cC1lcXVpdj0iQ29udGVudC1UeXBlIiBjb250ZXQ9InRleHQvaHRtbDsgY2hhcnNl
dD1VVEYtOCIgLz4K5YWF5YC85Lit5b+DfCB3ZWJnYW1lLTM3d2Fu572R6aG15ri45oiP5bmz5Y+w
Cg==
</data>

          </attachment>
      

    </bug>

</bugzilla>