Bug 58123

Summary: Document source preload scanned repeatedly
Product: WebKit Reporter: Antti Koivisto <koivisto>
Component: Page LoadingAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, tonyg
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: PC   
OS: OS X 10.5   
Attachments:
Description Flags
patch tonyg: review+

Description Antti Koivisto 2011-04-08 01:36:21 PDT
Clearing the preload scanner after parsing resumes (from http://trac.webkit.org/changeset/63154)
 
void HTMLDocumentParser::resumeParsingAfterScriptExecution()
{
    ASSERT(!inScriptExecution());
    ASSERT(!m_treeBuilder->isPaused());

    m_preloadScanner.clear()
...

makes us rescan the document text multiple times if the parser gets blocked repeatedly during loading. For example when reloading nytimes.com, we scan almost the entire 120kb main document 5 times, causing us to spent way more time in the preload scanner than the main tokenizer.

This happens because when the scanner is cleared we lose information about how far we have already scanned. When the parser gets blocked again, the preload scanner is re-constructed and the scanning starts from the current position of the main parser:

void HTMLDocumentParser::pumpTokenizer(SynchronousMode mode)
{
...
    if (isWaitingForScripts()) {
        ASSERT(m_tokenizer->state() == HTMLTokenizer::DataState);
        if (!m_preloadScanner) {
            m_preloadScanner.set(new HTMLPreloadScanner(document()));
            m_preloadScanner->appendToEnd(m_input.current());
        }
        m_preloadScanner->scan();
...

If the blocking happens early in the document (as it often does) most of the document text gets re-scanned.
Comment 1 Antti Koivisto 2011-04-11 06:47:48 PDT
Created attachment 89001 [details]
patch
Comment 2 Tony Gentilcore 2011-04-11 09:30:37 PDT
The fix looks good, but I'm slightly sad that we can't add a test to make sure this doesn't regress. I can't think of any directly observable effects. Do you think it would make any sense to try to add an order of magnitude perf test? http://trac.webkit.org/changeset/65614
Comment 3 Antti Koivisto 2011-04-11 10:39:06 PDT
(In reply to comment #2)
> The fix looks good, but I'm slightly sad that we can't add a test to make sure this doesn't regress. I can't think of any directly observable effects. Do you think it would make any sense to try to add an order of magnitude perf test? http://trac.webkit.org/changeset/65614

It is kinda hard since you need to run against a (throttled) web server to trigger the preload scanner, need a huge test content to make tokenization time observable and need to construct measurement so that the load throttling itself won't dominate. I don't think it would be worth the effort.

On the other hand the issue is easily observable with a sampling profiler on real world web sites. We would probably notice if it regressed.
Comment 4 Antti Koivisto 2011-04-11 11:23:01 PDT
http://trac.webkit.org/changeset/83456
Comment 5 Antti Koivisto 2011-04-11 12:08:37 PDT
Hmm, it should be relatively easy to add an assert that preload scanner has always tokenized less characters than the main parser when parsing is finished.
Comment 6 Tony Gentilcore 2011-04-11 12:59:34 PDT
(In reply to comment #5)
> Hmm, it should be relatively easy to add an assert that preload scanner has always tokenized less characters than the main parser when parsing is finished.

Oh, good point. Are you putting together a follow-up?