Bug 16621

Summary: WebKit ignores encoding description in invalid HTML if it's too far from the start
Product: WebKit
Component: Page Loading
Status: NEW
Severity: Normal
Priority: P2
Version: 528+ (Nightly build)
Hardware: Mac
OS: OS X 10.4
Reporter: Alexey Proskuryakov <ap>
Assignee: Nobody <webkit-unassigned>
CC: ahmad.saleem792, darin, ddkilzer, ian, jshin, mrowe

Alexey Proskuryakov
Reported 2007-12-27 00:36:02 PST
From bug 12526 comment 3.

Our heuristic for <meta> charset declarations differs from what Firefox does, and what is documented in HTML5. Namely, we do not check for <meta> during normal parsing and re-start parsing if the charset changes late in the game. We only pre-parse the first 512 bytes of the document, or the whole <head>, whichever is larger.

This is usually enough, but we know of pages that aren't decoded correctly because of this difference. The following two pages have a very long script (~ 10kB) at the beginning, and charset declaration in <meta> is not honored.

http://db66.vnet.cn/
http://www.ddm.com/event/event84.asp?code=-548

Restarting parsing at any point is a big can of worms though - e.g., some scripts with side effects may run twice because of that.
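
To illustrate the prescan limit described above, here is a rough Python sketch. It is not WebKit's actual code: the function name prescan_charset, the regular expressions, and the way the end of <head> is detected (by searching for a closing </head> tag) are all assumptions made for illustration. It only shows why a <meta> declaration that follows a long script can fall outside the scanned window when the invalid document has no <head> to extend it.

```python
import re

# Hypothetical, simplified matchers; a real prescanner tokenizes tags properly.
META_CHARSET = re.compile(
    rb"<meta[^>]*charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)", re.IGNORECASE
)
HEAD_END = re.compile(rb"</head\s*>", re.IGNORECASE)


def prescan_charset(data: bytes, min_bytes: int = 512):
    """Return the charset declared inside the prescan window, or None."""
    head_end = HEAD_END.search(data)
    # Assumed window: the first min_bytes or the whole <head>, whichever is larger.
    limit = max(min_bytes, head_end.end() if head_end else 0)
    match = META_CHARSET.search(data[:limit])
    return match.group(1).decode("ascii").lower() if match else None


# Declaration near the start of the document: found by the prescan.
early = b'<html><head><meta charset="gb2312"></head><body></body></html>'

# Invalid markup with no <head>: about 10 kB of script precedes the <meta>.
late = (
    b"<script>" + b"x" * 10000 + b"</script>"
    b'<meta http-equiv="Content-Type" content="text/html; charset=gb2312">'
)

print(prescan_charset(early))  # gb2312
print(prescan_charset(late))   # None
```

In the second case the declaration sits far beyond the first 512 bytes and there is no </head> to widen the window, so the sketch never sees it. Under the Firefox/HTML5 approach mentioned above, the parser would notice the <meta> during normal parsing and restart with the new encoding.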
Mark Rowe (bdash)
Comment 1 2007-12-27 01:58:44 PST
Is the handling of scripts when reparsing discussed in the HTML5 specification? Is that something which should be documented in the spec?
Alexey Proskuryakov
Comment 2 2007-12-27 02:28:56 PST
Ian 'Hixie' Hickson
Comment 3 2008-01-08 18:37:00 PST
(basically, HTML5 requires that the scripts run twice.)
Alexey Proskuryakov
Comment 4 2024-06-01 17:10:52 PDT
*** Bug 275017 has been marked as a duplicate of this bug. ***