Bug 16621 - WebKit ignores encoding description in invalid HTML if it's too far from the start
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: Page Loading
Version: 528+ (Nightly build)
Hardware: Macintosh OS X 10.4
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-12-27 00:36 PST by Alexey Proskuryakov
Modified: 2008-01-08 18:37 PST

See Also:


Attachments

Description Alexey Proskuryakov 2007-12-27 00:36:02 PST
From bug 12526 comment 3.

Our heuristic for <meta> charset declarations differs from what Firefox does and from what HTML5 documents. Namely, we neither check for <meta> during normal parsing nor restart parsing if the charset changes late in the document. We only pre-parse the first 512 bytes of the document or the whole <head>, whichever is larger. This is usually enough, but we know of pages that aren't decoded correctly because of this difference.
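
As a rough illustration of the pre-parse heuristic described above, here is a minimal Python sketch. It is not WebKit's code: the names `prescan_charset`, `PRESCAN_MIN_BYTES`, `META_CHARSET_RE` and `HEAD_END_RE` are made up for this example, the regular expressions stand in for real tag tokenization, and "the whole <head>" is assumed to mean everything up to the first `</head>`.

```python
import re

# 512 comes from the description above; everything else is a simplification.
PRESCAN_MIN_BYTES = 512

# Hypothetical patterns: a real prescanner tokenizes tags instead of using regexes.
META_CHARSET_RE = re.compile(
    rb'<meta[^>]*charset\s*=\s*["\']?([A-Za-z0-9_\-:.]+)', re.IGNORECASE
)
HEAD_END_RE = re.compile(rb'</head\s*>', re.IGNORECASE)


def prescan_charset(data: bytes):
    """Return the charset declared inside the prescan window, or None."""
    head_end = HEAD_END_RE.search(data)
    head_len = head_end.end() if head_end else 0
    # Window is the first 512 bytes or the whole <head>, whichever is larger.
    window = data[:max(PRESCAN_MIN_BYTES, head_len)]
    match = META_CHARSET_RE.search(window)
    return match.group(1).decode("ascii").lower() if match else None


META = b'<meta http-equiv="Content-Type" content="text/html; charset=gb2312">'

# Declaration near the start: found.
early = b"<html><head>" + META + b"</head><body></body></html>"

# Invalid markup with no <head> and ~10kB of script first: the declaration
# falls outside the 512-byte window and is missed.
late = b"<script>" + b"x" * 10000 + b"</script>" + META

print(prescan_charset(early))
print(prescan_charset(late))
```

Under these assumptions the second page is decoded with whatever default encoding applies, because the declaration is never seen.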

The following two pages have a very long script (~10 kB) at the beginning, and
the charset declaration in <meta> is not honored.

http://db66.vnet.cn/
http://www.ddm.com/event/event84.asp?code=-548

Restarting parsing at any point is a big can of worms though - e.g., some scripts with side effects may run twice because of that.
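
To make the double-execution concern concrete, here is a toy Python model. It is purely illustrative: the token format, the `parse` function and its `may_restart` flag are assumptions of this sketch and do not correspond to any real parser API.

```python
def parse(tokens, encoding="windows-1252", log=None, may_restart=True):
    """Toy parser over ("script", name) and ("meta", charset) tokens.

    Appending to `log` stands in for a script's side effect.
    """
    log = [] if log is None else log
    for kind, value in tokens:
        if kind == "script":
            log.append(value)  # the "side effect" happens immediately
        elif kind == "meta" and value != encoding and may_restart:
            # Late charset change: discard the parse and start over once,
            # keeping the side effects that have already happened.
            return parse(tokens, value, log, may_restart=False)
    return log


# A script that precedes a late <meta> declaration runs on both passes.
executed = parse([("script", "incrementCounter"), ("meta", "gb2312")])
print(executed)
```

In this model a script placed after the declaration would run only once, since the restart happens before the first pass reaches it.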
Comment 1 Mark Rowe (bdash) 2007-12-27 01:58:44 PST
Is the handling of scripts when reparsing discussed in the HTML5 specification?  Is that something which should be documented in the spec?
Comment 2 Alexey Proskuryakov 2007-12-27 02:28:56 PST
See <http://www.whatwg.org/specs/web-apps/current-work/#change>.
Comment 3 Ian 'Hixie' Hickson 2008-01-08 18:37:00 PST
(basically, HTML5 requires that the scripts run twice.)