<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://bugs.webkit.org/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4.1"
          urlbase="https://bugs.webkit.org/"
          
          maintainer="admin@webkit.org"
>

    <bug>
          <bug_id>16621</bug_id>
          
          <creation_ts>2007-12-27 00:36:02 -0800</creation_ts>
          <short_desc>WebKit ignores encoding description in invalid HTML if it&apos;s too far from the start</short_desc>
          <delta_ts>2024-06-01 17:10:52 -0700</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WebKit</product>
          <component>Page Loading</component>
          <version>528+ (Nightly build)</version>
          <rep_platform>Mac</rep_platform>
          <op_sys>OS X 10.4</op_sys>
          <bug_status>NEW</bug_status>
          <resolution></resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>Normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Alexey Proskuryakov">ap</reporter>
          <assigned_to name="Nobody">webkit-unassigned</assigned_to>
          <cc>ahmad.saleem792</cc>
          <cc>darin</cc>
          <cc>ddkilzer</cc>
          <cc>ian</cc>
          <cc>jshin</cc>
          <cc>mrowe</cc>
          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>65350</commentid>
    <comment_count>0</comment_count>
    <who name="Alexey Proskuryakov">ap</who>
    <bug_when>2007-12-27 00:36:02 -0800</bug_when>
    <thetext>From bug 12526 comment 3.

Our heuristic for &lt;meta&gt; charset declarations differs from what Firefox does and from what is documented in HTML5. Namely, we do not check for &lt;meta&gt; during normal parsing, so we never restart parsing when the charset changes late in the game. We only pre-parse the first 512 bytes of the document, or the whole &lt;head&gt;, whichever is larger. This is usually enough, but we know of pages that aren&apos;t decoded correctly because of this difference.

The following two pages have a very long script (~10 kB) at the beginning, and
the charset declaration in &lt;meta&gt; is not honored.

http://db66.vnet.cn/
http://www.ddm.com/event/event84.asp?code=-548

Restarting parsing at any point is a big can of worms though - e.g., some scripts with side effects may run twice because of that.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>65372</commentid>
    <comment_count>1</comment_count>
    <who name="Mark Rowe (bdash)">mrowe</who>
    <bug_when>2007-12-27 01:58:44 -0800</bug_when>
    <thetext>Is the handling of scripts when reparsing discussed in the HTML5 specification? Is that something that should be documented in the spec?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>65384</commentid>
    <comment_count>2</comment_count>
    <who name="Alexey Proskuryakov">ap</who>
    <bug_when>2007-12-27 02:28:56 -0800</bug_when>
    <thetext>See &lt;http://www.whatwg.org/specs/web-apps/current-work/#change&gt;.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>66602</commentid>
    <comment_count>3</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2008-01-08 18:37:00 -0800</bug_when>
    <thetext>(basically, HTML5 requires that the scripts run twice.)</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>2039286</commentid>
    <comment_count>4</comment_count>
    <who name="Alexey Proskuryakov">ap</who>
    <bug_when>2024-06-01 17:10:52 -0700</bug_when>
    <thetext>*** Bug 275017 has been marked as a duplicate of this bug. ***</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>