Bug 58123

Summary:

Document source preload scanned repeatedly

Product:

WebKit

Reporter:

Antti Koivisto <koivisto>

Component:

Page Loading

Assignee:

Nobody <webkit-unassigned>

Status:

RESOLVED FIXED

Severity:

Normal

CC:

abarth, tonyg

Priority:

Version:

528+ (Nightly build)

Hardware:

OS:

OS X 10.5

Attachments:

Description	Flags
patch	tonyg: review+

Antti Koivisto

Reported 2011-04-08 01:36:21 PDT

Clearing the preload scanner after parsing resumes (from http://trac.webkit.org/changeset/63154) void HTMLDocumentParser::resumeParsingAfterScriptExecution() { ASSERT(!inScriptExecution()); ASSERT(!m_treeBuilder->isPaused()); m_preloadScanner.clear() ... makes us rescan the document text multiple times if the parser gets blocked repeatedly during loading. For example when reloading nytimes.com, we scan almost the entire 120kb main document 5 times, causing us to spent way more time in the preload scanner than the main tokenizer. This happens because when the scanner is cleared we lose information about how far we have already scanned. When the parser gets blocked again, the preload scanner is re-constructed and the scanning starts from the current position of the main parser: void HTMLDocumentParser::pumpTokenizer(SynchronousMode mode) { ... if (isWaitingForScripts()) { ASSERT(m_tokenizer->state() == HTMLTokenizer::DataState); if (!m_preloadScanner) { m_preloadScanner.set(new HTMLPreloadScanner(document())); m_preloadScanner->appendToEnd(m_input.current()); } m_preloadScanner->scan(); ... If the blocking happens early in the document (as it often does) most of the document text gets re-scanned.

Attachments
patch (3.33 KB, patch) 2011-04-11 06:47 PDT, Antti Koivisto	tonyg: review+	Details Formatted Diff Diff
View All Add attachment proposed patch, testcase, etc.

Antti Koivisto

Comment 1 2011-04-11 06:47:48 PDT

Created attachment 89001 [details] patch

Tony Gentilcore

Comment 2 2011-04-11 09:30:37 PDT

The fix looks good, but I'm slightly sad that we can't add a test to make sure this doesn't regress. I can't think of any directly observable effects. Do you think it would make any sense to try to add an order of magnitude perf test? http://trac.webkit.org/changeset/65614

Antti Koivisto

Comment 3 2011-04-11 10:39:06 PDT

(In reply to comment #2) > The fix looks good, but I'm slightly sad that we can't add a test to make sure this doesn't regress. I can't think of any directly observable effects. Do you think it would make any sense to try to add an order of magnitude perf test? http://trac.webkit.org/changeset/65614 It is kinda hard since you need to run against a (throttled) web server to trigger the preload scanner, need a huge test content to make tokenization time observable and need to construct measurement so that the load throttling itself won't dominate. I don't think it would be worth the effort. On the other hand the issue is easily observable with a sampling profiler on real world web sites. We would probably notice if it regressed.

Antti Koivisto

Comment 4 2011-04-11 11:23:01 PDT

http://trac.webkit.org/changeset/83456

Antti Koivisto

Comment 5 2011-04-11 12:08:37 PDT

Hmm, it should be relatively easy to add an assert that preload scanner has always tokenized less characters than the main parser when parsing is finished.

Tony Gentilcore

Comment 6 2011-04-11 12:59:34 PDT

(In reply to comment #5) > Hmm, it should be relatively easy to add an assert that preload scanner has always tokenized less characters than the main parser when parsing is finished. Oh, good point. Are you putting together a follow-up?

Note You need to log in before you can comment on or make changes to this bug.