Bug 58123 - Document source preload scanned repeatedly
Summary: Document source preload scanned repeatedly
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Page Loading (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC OS X 10.5
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-04-08 01:36 PDT by Antti Koivisto
Modified: 2011-04-11 12:59 PDT (History)
2 users (show)

See Also:


Attachments
patch (3.33 KB, patch)
2011-04-11 06:47 PDT, Antti Koivisto
tonyg: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Antti Koivisto 2011-04-08 01:36:21 PDT
Clearing the preload scanner after parsing resumes (from http://trac.webkit.org/changeset/63154)
 
void HTMLDocumentParser::resumeParsingAfterScriptExecution()
{
    ASSERT(!inScriptExecution());
    ASSERT(!m_treeBuilder->isPaused());

    m_preloadScanner.clear()
...

makes us rescan the document text multiple times if the parser gets blocked repeatedly during loading. For example when reloading nytimes.com, we scan almost the entire 120kb main document 5 times, causing us to spent way more time in the preload scanner than the main tokenizer.

This happens because when the scanner is cleared we lose information about how far we have already scanned. When the parser gets blocked again, the preload scanner is re-constructed and the scanning starts from the current position of the main parser:

void HTMLDocumentParser::pumpTokenizer(SynchronousMode mode)
{
...
    if (isWaitingForScripts()) {
        ASSERT(m_tokenizer->state() == HTMLTokenizer::DataState);
        if (!m_preloadScanner) {
            m_preloadScanner.set(new HTMLPreloadScanner(document()));
            m_preloadScanner->appendToEnd(m_input.current());
        }
        m_preloadScanner->scan();
...

If the blocking happens early in the document (as it often does) most of the document text gets re-scanned.
Comment 1 Antti Koivisto 2011-04-11 06:47:48 PDT
Created attachment 89001 [details]
patch
Comment 2 Tony Gentilcore 2011-04-11 09:30:37 PDT
The fix looks good, but I'm slightly sad that we can't add a test to make sure this doesn't regress. I can't think of any directly observable effects. Do you think it would make any sense to try to add an order of magnitude perf test? http://trac.webkit.org/changeset/65614
Comment 3 Antti Koivisto 2011-04-11 10:39:06 PDT
(In reply to comment #2)
> The fix looks good, but I'm slightly sad that we can't add a test to make sure this doesn't regress. I can't think of any directly observable effects. Do you think it would make any sense to try to add an order of magnitude perf test? http://trac.webkit.org/changeset/65614

It is kinda hard since you need to run against a (throttled) web server to trigger the preload scanner, need a huge test content to make tokenization time observable and need to construct measurement so that the load throttling itself won't dominate. I don't think it would be worth the effort.

On the other hand the issue is easily observable with a sampling profiler on real world web sites. We would probably notice if it regressed.
Comment 4 Antti Koivisto 2011-04-11 11:23:01 PDT
http://trac.webkit.org/changeset/83456
Comment 5 Antti Koivisto 2011-04-11 12:08:37 PDT
Hmm, it should be relatively easy to add an assert that preload scanner has always tokenized less characters than the main parser when parsing is finished.
Comment 6 Tony Gentilcore 2011-04-11 12:59:34 PDT
(In reply to comment #5)
> Hmm, it should be relatively easy to add an assert that preload scanner has always tokenized less characters than the main parser when parsing is finished.

Oh, good point. Are you putting together a follow-up?