From bug 12526 comment 3. Our heuristic for <meta> charset declarations differs from what Firefox does and from what HTML5 documents. Namely, we do not check for <meta> during normal parsing, and we do not restart parsing if a charset declaration turns up late in the document. We only pre-parse the first 512 bytes of the document, or the whole <head>, whichever is larger. This is usually enough, but we know of pages that aren't decoded correctly because of this difference. The following two pages have a very long script (~10 kB) at the beginning, and the charset declaration in <meta> is not honored:
http://db66.vnet.cn/
http://www.ddm.com/event/event84.asp?code=-548
Restarting parsing at an arbitrary point is a big can of worms, though; for example, some scripts with side effects may run twice because of it.
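To make the window rule concrete, here is a minimal sketch of the pre-parse described above. It is not the actual implementation: the function name `prescan_charset`, the two regular expressions, and the way the end of <head> is located are all assumptions made for illustration, and the second input is synthetic (the structure of the pages linked above was not checked).

```python
import re

MIN_PRESCAN_BYTES = 512  # the fixed lower bound mentioned in the report

# Hypothetical, simplified patterns; a real sniffer handles many more forms.
META_CHARSET = re.compile(
    rb"<meta[^>]*charset\s*=\s*[\"']?([A-Za-z0-9_\-]+)", re.IGNORECASE)
HEAD_END = re.compile(rb"</head\s*>", re.IGNORECASE)


def prescan_charset(data: bytes):
    """Return the charset declared inside the prescan window, or None."""
    head_end = HEAD_END.search(data)
    head_len = head_end.end() if head_end else 0
    # Window is the first 512 bytes or the whole <head>, whichever is larger.
    window = data[:max(MIN_PRESCAN_BYTES, head_len)]
    match = META_CHARSET.search(window)
    return match.group(1).decode("ascii").lower() if match else None


# Declaration inside a closed <head>: found regardless of the head's length.
in_head = (b"<head><script>" + b"x" * 10000 +
           b"</script><meta charset=utf-8></head>")
# Assumed failure shape: long script first, no closing </head> to extend
# the window, so the declaration sits beyond byte 512.
late = b"<script>" + b"x" * 10000 + b"</script><meta charset=gb2312>"

print(prescan_charset(in_head))
print(prescan_charset(late))
```

Under these assumptions the first document yields its declared charset and the second yields nothing, which is the kind of miss the report describes.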
Is the handling of scripts when reparsing discussed in the HTML5 specification? Is that something which should be documented in the spec?
See <http://www.whatwg.org/specs/web-apps/current-work/#change>.
(basically, HTML5 requires that the scripts run twice.)
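As a rough illustration of why a restart makes scripts run twice, here is a toy model. It is not the algorithm from the HTML5 specification: the `parse` function, the token tuples, and the default encoding are hypothetical and exist only to show the side effect.

```python
def parse(tokens, assumed="windows-1252"):
    """Toy parser: tokens are ("script", fn) or ("meta", charset) tuples."""
    encoding = assumed
    restarted = False
    while True:
        for kind, value in tokens:
            if kind == "script":
                value()  # side effects happen as soon as the script is seen
            elif kind == "meta" and value != encoding and not restarted:
                encoding, restarted = value, True
                break  # discard this pass and start over with the new charset
        else:
            return encoding  # reached the end without needing a restart


hits = []  # records each time the "script" runs
result = parse([("script", lambda: hits.append(1)), ("meta", "gb2312")])
print(result, len(hits))
```

In this model the script placed before the late <meta> executes once on the discarded pass and once more after the restart.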