|Summary:||WebKit ignores encoding description in invalid HTML if it's too far from the start|
|Product:||WebKit||Reporter:||Alexey Proskuryakov <ap>|
|Component:||Page Loading||Assignee:||Nobody <webkit-unassigned>|
|Severity:||Normal||CC:||darin, ddkilzer, ian, jshin, mrowe|
|Version:||528+ (Nightly build)|
|OS:||OS X 10.4|
Description Alexey Proskuryakov 2007-12-27 00:36:02 PST
From bug 12526 comment 3.

Our heuristic for <meta> charset declarations differs from what Firefox does and from what is documented in HTML5. Namely, we do not check for <meta> during normal parsing, so we never restart parsing if the charset changes late in the game. We only pre-parse the first 512 bytes of the document, or the whole <head>, whichever is larger. This is usually enough, but we know of pages that aren't decoded correctly because of this difference.

The following two pages have a very long script (~10 kB) at the beginning, and the charset declaration in <meta> is not honored:
http://db66.vnet.cn/
http://www.ddm.com/event/event84.asp?code=-548

Restarting parsing at any point is a big can of worms, though: some scripts with side effects may run twice because of that.
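For illustration only, the pre-parse heuristic described above can be approximated in a few lines of Python. This is a rough sketch, not WebKit's actual code: the names `prescan_charset`, `META_CHARSET`, and `HEAD_END` are hypothetical, the regular expressions are far simpler than a real tokenizer, and only the 512-byte figure comes from the description.

```python
import re

PRESCAN_MIN_BYTES = 512  # figure quoted in the description above

# Hypothetical, simplified patterns; a real sniffer tokenizes the markup.
META_CHARSET = re.compile(
    rb"<meta[^>]*charset\s*=\s*[\"']?([A-Za-z0-9_\-]+)", re.IGNORECASE
)
HEAD_END = re.compile(rb"</head\s*>", re.IGNORECASE)


def prescan_charset(data: bytes, limit: int = PRESCAN_MIN_BYTES):
    """Return the charset declared inside the prescan window, or None."""
    head_end = HEAD_END.search(data)
    # Window: the first `limit` bytes or the whole <head>, whichever is larger.
    window = max(limit, head_end.end() if head_end else 0)
    match = META_CHARSET.search(data[:window])
    return match.group(1).decode("ascii").lower() if match else None


# A page shaped like the ones reported: a ~10 kB script, then a late <meta>
# with no <head> element to extend the window.
late = (
    b"<script>" + b"x" * 10000 + b"</script>"
    + b'<meta http-equiv="Content-Type" content="text/html; charset=gb2312">'
)
print(prescan_charset(late))                   # declaration lies outside the window
print(prescan_charset(late, limit=len(late)))  # found once the whole input is scanned
```

Under these assumptions, the first call misses the declaration because it sits past the 512-byte window, which is the failure mode reported here. Widening the window finds it, but only at the cost of buffering more of the document before decoding starts.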
Comment 1 Mark Rowe (bdash) 2007-12-27 01:58:44 PST
Is the handling of scripts when reparsing discussed in the HTML5 specification? Is that something which should be documented in the spec?
Comment 2 Alexey Proskuryakov 2007-12-27 02:28:56 PST
Comment 3 Ian 'Hixie' Hickson 2008-01-08 18:37:00 PST
(basically, HTML5 requires that the scripts run twice.)