RESOLVED FIXED Bug 58123
Document source preload scanned repeatedly
https://bugs.webkit.org/show_bug.cgi?id=58123
Summary: Document source preload scanned repeatedly
Antti Koivisto
Reported 2011-04-08 01:36:21 PDT
Clearing the preload scanner after parsing resumes (from http://trac.webkit.org/changeset/63154)

void HTMLDocumentParser::resumeParsingAfterScriptExecution()
{
    ASSERT(!inScriptExecution());
    ASSERT(!m_treeBuilder->isPaused());
    m_preloadScanner.clear();
    ...

makes us rescan the document text multiple times if the parser gets blocked repeatedly during loading. For example, when reloading nytimes.com we scan almost the entire 120kb main document 5 times, causing us to spend far more time in the preload scanner than in the main tokenizer.

This happens because when the scanner is cleared we lose the information about how far we have already scanned. When the parser gets blocked again, the preload scanner is re-constructed and scanning starts from the current position of the main parser:

void HTMLDocumentParser::pumpTokenizer(SynchronousMode mode)
{
    ...
    if (isWaitingForScripts()) {
        ASSERT(m_tokenizer->state() == HTMLTokenizer::DataState);
        if (!m_preloadScanner) {
            m_preloadScanner.set(new HTMLPreloadScanner(document()));
            m_preloadScanner->appendToEnd(m_input.current());
        }
        m_preloadScanner->scan();
    ...

If the blocking happens early in the document (as it often does), most of the document text gets re-scanned.
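[Editorial illustration] Avoiding the rescans comes down to the preload scanner remembering how far into the document it has already looked, so that each time the parser blocks it only scans input that arrived since the last pass. Below is a minimal standalone sketch of that idea; the PreloadScanner class, its members, and the scan() interface are simplified stand-ins invented for illustration and are not the actual WebKit classes or the patch attached to this bug.

// Standalone sketch (not WebKit code): keep a checkpoint of how much of the
// source has already been scanned, and never rescan that prefix.
#include <cstddef>
#include <iostream>
#include <string>

class PreloadScanner {
public:
    // Scan only the portion of the source that has not been seen yet.
    void scan(const std::string& source)
    {
        for (size_t i = m_scannedLength; i < source.size(); ++i) {
            // ... tokenize source[i] and issue preload requests ...
        }
        m_charactersScanned += source.size() - m_scannedLength;
        m_scannedLength = source.size(); // checkpoint: this prefix is done
    }
    size_t charactersScanned() const { return m_charactersScanned; }

private:
    size_t m_scannedLength = 0;     // how far into the document we have scanned
    size_t m_charactersScanned = 0; // total scanning work done
};

int main()
{
    std::string document = "<html><head><script src=a.js></script>";
    PreloadScanner scanner;

    scanner.scan(document); // parser blocks on a.js: scan what we have so far

    document += "<img src=b.png><script src=c.js></script>";
    scanner.scan(document); // blocked again: only the new suffix is scanned

    document += "<img src=d.png></body></html>";
    scanner.scan(document);

    // Total work equals the document length, not a multiple of it.
    std::cout << scanner.charactersScanned() << " of " << document.size()
              << " characters scanned\n";
}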
Attachments
patch (3.33 KB, patch)
2011-04-11 06:47 PDT, Antti Koivisto
tonyg: review+
Antti Koivisto
Comment 1 2011-04-11 06:47:48 PDT
Tony Gentilcore
Comment 2 2011-04-11 09:30:37 PDT
The fix looks good, but I'm slightly sad that we can't add a test to make sure this doesn't regress. I can't think of any directly observable effects. Do you think it would make any sense to try to add an order of magnitude perf test? http://trac.webkit.org/changeset/65614
Antti Koivisto
Comment 3 2011-04-11 10:39:06 PDT
(In reply to comment #2)
> The fix looks good, but I'm slightly sad that we can't add a test to make
> sure this doesn't regress. I can't think of any directly observable effects.
> Do you think it would make any sense to try to add an order of magnitude
> perf test? http://trac.webkit.org/changeset/65614

It is kinda hard: you need to run against a (throttled) web server to trigger the preload scanner, you need huge test content to make the tokenization time observable, and you need to construct the measurement so that the load throttling itself doesn't dominate. I don't think it would be worth the effort. On the other hand, the issue is easily observable with a sampling profiler on real-world web sites, so we would probably notice if it regressed.
Antti Koivisto
Comment 4 2011-04-11 11:23:01 PDT
Antti Koivisto
Comment 5 2011-04-11 12:08:37 PDT
Hmm, it should be relatively easy to add an assert that, when parsing is finished, the preload scanner has tokenized fewer characters than the main parser.
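[Editorial illustration] A hedged sketch of what such a check could look like is below; the counter names, the ParserStats struct, and the helper function are hypothetical, invented for illustration, and not taken from the patch or from WebKit.

// Standalone sketch of the sanity check suggested in comment 5.
#include <cassert>
#include <cstddef>

// Hypothetical counters; these member names do not correspond to real WebKit fields.
struct ParserStats {
    size_t preloadCharactersScanned = 0; // characters tokenized by the preload scanner
    size_t parserCharactersConsumed = 0; // characters consumed by the main tokenizer
};

// Run when parsing finishes: with the fix in place the preload scanner never
// rescans text it has already seen, so it should not have tokenized more
// characters than the main parser.
void checkNoRedundantPreloadScanning(const ParserStats& stats)
{
    assert(stats.preloadCharactersScanned <= stats.parserCharactersConsumed);
}

int main()
{
    ParserStats stats;
    stats.parserCharactersConsumed = 120000; // e.g. the whole main document
    stats.preloadCharactersScanned = 80000;  // only the parts scanned while blocked
    checkNoRedundantPreloadScanning(stats);  // would fail if rescanning multiplied the count
}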
Tony Gentilcore
Comment 6 2011-04-11 12:59:34 PDT
(In reply to comment #5)
> Hmm, it should be relatively easy to add an assert that, when parsing is
> finished, the preload scanner has tokenized fewer characters than the main
> parser.

Oh, good point. Are you putting together a follow-up?