Bug 64369

Summary: PreloadScanner should reuse tokens from the parser
Product: WebKit Reporter: Balazs Kelemen <kbalazs>
Component: WebCore Misc.Assignee: Nobody <webkit-unassigned>
Status: RESOLVED DUPLICATE    
Severity: Normal CC: abarth, loki, tonyg
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
WIP patch none

Description Balazs Kelemen 2011-07-12 10:47:10 PDT
... and not retokenize the input.
Comment 1 Balazs Kelemen 2011-07-12 11:09:03 PDT
Created attachment 100524 [details]
WIP patch

This is a work in progress patch. I welcome your feedback.
Comment 2 Balazs Kelemen 2011-07-12 11:13:30 PDT
Performance results on PerformanceTests/Parser/html-parser.html are not really encouraging.
Reference:
avg 1736
median 1735.5
stdev 6.957010852370434
min 1717
max 1748

Patched:
avg 1881.35
median 1882
stdev 4.33906671992953
min 1865
max 1886
Comment 3 Balazs Kelemen 2011-07-12 11:15:32 PDT
The idea for this work is came from https://bugs.webkit.org/show_bug.cgi?id=63531.
Comment 4 Adam Barth 2011-07-12 11:44:13 PDT
You might have better luck caching AtomicHTMLTokens.
Comment 5 Balazs Kelemen 2011-07-13 07:09:24 PDT
After some investigation I think the whole idea is wrong. I thought that the preload scanner scanning the input again after the parser but now it seems to be it does not. The preload scanner only scans text that the parser has not seen yet. In this case it is useless to feed it with "old" tokens. Adam, could you validate this for me? Thanks.
Comment 6 Tony Gentilcore 2011-07-13 07:18:45 PDT
(In reply to comment #5)
> After some investigation I think the whole idea is wrong. I thought that the preload scanner scanning the input again after the parser but now it seems to be it does not. The preload scanner only scans text that the parser has not seen yet. In this case it is useless to feed it with "old" tokens. Adam, could you validate this for me? Thanks.

It is the other way around. The preload scanner should feed its tokens to the parser. There is no case where it would make sense for the parser to feed tokens to the preload scanner.

For example, consider:
1. <img/>
2. <script src="slow.js"></script>
3. <img/>

The Parser would tokenize #1 and #2, then the parser is forced to pause while the script downloads. In the mean time, the PreloadScanner would tokenize #3. After the script runs, the parser would have to retokenize #3 unless we save the work done by the preload scanner.

There is a little more information in this bug:
https://bugs.webkit.org/show_bug.cgi?id=57376

We should dupe one of them.
Comment 7 Balazs Kelemen 2011-07-13 07:46:38 PDT
(In reply to comment #6)
> (In reply to comment #5)
> > After some investigation I think the whole idea is wrong. I thought that the preload scanner scanning the input again after the parser but now it seems to be it does not. The preload scanner only scans text that the parser has not seen yet. In this case it is useless to feed it with "old" tokens. Adam, could you validate this for me? Thanks.
> 
> It is the other way around. The preload scanner should feed its tokens to the parser. There is no case where it would make sense for the parser to feed tokens to the preload scanner.
> 
> For example, consider:
> 1. <img/>
> 2. <script src="slow.js"></script>
> 3. <img/>
> 
> The Parser would tokenize #1 and #2, then the parser is forced to pause while the script downloads. In the mean time, the PreloadScanner would tokenize #3. After the script runs, the parser would have to retokenize #3 unless we save the work done by the preload scanner.

Thanks for the clarificatoin.

> 
> There is a little more information in this bug:
> https://bugs.webkit.org/show_bug.cgi?id=57376
> 
> We should dupe one of them.

Yep, I am going to mark this one as a duplicate.

*** This bug has been marked as a duplicate of bug 57376 ***