57376 – HTMLDocumentParser should reuse tokens from HTMLPreloadScanner

RESOLVED DUPLICATE of bug 106127

https://bugs.webkit.org/show_bug.cgi?id=57376

Tony Gentilcore

Reported 2011-03-29 12:07:41 PDT

See the FIXME in HTMLPreloadScanner::scan(). Currently we generate tokens during preloading, then discard them and retokenize during parsing. We should be able to save them and reuse them if the input stream isn't modified by script (document.write). Eric, I'm thinking of picking this up. Do you have any high-level implementation thoughts?

Tony Gentilcore

Comment 1 2011-03-29 12:18:36 PDT

I'm thinking of making an HTMLTokenSegmentedString class that subclasses the SegmentedString used by the parser and maintains a cache of tokens in the stream. The tokenizer can just return the next cached token if one exists, and the HTMLTokenSegmentedString will drop its cache when anything is inserted into the string.
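
The caching behavior proposed in this comment can be illustrated with a minimal sketch. Everything here is hypothetical: TokenCachingString, ToyToken, prescan(), nextToken() and insertAtCurrentPosition() are illustrative names, not WebKit APIs, and the toy tokenizer only splits tags from text. For brevity the sketch folds the string and the cache into one class rather than subclassing a real SegmentedString.

```cpp
#include <cstddef>
#include <deque>
#include <optional>
#include <string>
#include <utility>

// Hypothetical toy token: a tag such as "<p>" or a run of character data.
struct ToyToken {
    std::string text;
    std::size_t end; // offset just past this token in the input
};

// Hypothetical stand-in for the proposed class; not WebKit code.
class TokenCachingString {
public:
    explicit TokenCachingString(std::string text) : m_text(std::move(text)) { }

    // Preload-scanner role: tokenize ahead of the parser and keep the results.
    void prescan()
    {
        while (auto token = tokenizeAt(m_scanPos)) {
            m_scanPos = token->end;
            m_cache.push_back(std::move(*token));
        }
    }

    // Parser role: reuse a cached token when one exists, otherwise tokenize.
    std::optional<ToyToken> nextToken()
    {
        std::optional<ToyToken> token;
        if (!m_cache.empty()) {
            token = std::move(m_cache.front());
            m_cache.pop_front();
            ++m_cacheHits;
        } else {
            token = tokenizeAt(m_parsePos);
        }
        if (token) {
            m_parsePos = token->end;
            if (m_scanPos < m_parsePos)
                m_scanPos = m_parsePos;
        }
        return token;
    }

    // document.write() role: inserted input invalidates every cached token.
    void insertAtCurrentPosition(const std::string& text)
    {
        m_text.insert(m_parsePos, text);
        m_cache.clear();
        m_scanPos = m_parsePos;
    }

    std::size_t cachedTokenCount() const { return m_cache.size(); }
    std::size_t cacheHits() const { return m_cacheHits; }

private:
    // Toy tokenizer: "<...>" is one token, anything up to the next '<' is another.
    std::optional<ToyToken> tokenizeAt(std::size_t pos) const
    {
        if (pos >= m_text.size())
            return std::nullopt;
        std::size_t end;
        if (m_text[pos] == '<') {
            std::size_t close = m_text.find('>', pos);
            end = close == std::string::npos ? m_text.size() : close + 1;
        } else {
            std::size_t open = m_text.find('<', pos);
            end = open == std::string::npos ? m_text.size() : open;
        }
        return ToyToken{m_text.substr(pos, end - pos), end};
    }

    std::string m_text;
    std::size_t m_parsePos = 0; // how far the parser has consumed
    std::size_t m_scanPos = 0;  // how far the scanner has tokenized
    std::deque<ToyToken> m_cache;
    std::size_t m_cacheHits = 0;
};
```

In this sketch, an insertion at the parse position throws away all scanned-ahead tokens, which is the simplest correct policy; a real implementation might be able to keep tokens that lie before the insertion point.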

Adam Barth

Comment 2 2011-03-29 13:25:17 PDT

You're going to have a lot better luck saving AtomicHTMLTokens. They're way smaller in most cases. We mostly just need a way to buffer them and to detect when to discard the buffer because something changed.

Adam Barth

Comment 3 2011-03-29 13:36:53 PDT

I'm not sure a subclass is needed. You're probably better off making an object that composites in the SegmentedString, like we do in HTMLInputStream. Actually, maybe you should just make HTMLInputStream smarter. :)

Balazs Kelemen

Comment 4 2011-07-13 07:46:38 PDT

*** Bug 64369 has been marked as a duplicate of this bug. ***

Balazs Kelemen

Comment 5 2011-07-13 08:03:21 PDT

There is a problem with reusing the tokens from the scanner. The parsing algorithm requires the tree builder to participate in the tokenizing process at a few points, such as:
http://trac.webkit.org/browser/trunk/Source/WebCore/html/parser/HTMLTreeBuilder.cpp#L464
http://trac.webkit.org/browser/trunk/Source/WebCore/html/parser/HTMLTreeBuilder.cpp#L809
http://trac.webkit.org/browser/trunk/Source/WebCore/html/parser/HTMLTreeBuilder.cpp#L833
(There are a few more.) This means we cannot produce the correct token stream without the tree builder. This could be solved by creating a mock tree builder just for guiding the tokenizer, but it would make preloading more costly. On the other hand, we run the scanner while we are waiting for the network, so maybe it would be worthwhile.

Adam Barth

Comment 6 2011-07-13 10:18:51 PDT

If those cases are rare, we could invalidate the token stream.
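
One way to read this suggestion is sketched below, under stated assumptions. Each buffered token records the tokenizer state the scanner assumed when it produced the token; when the parser takes a token, it passes the state the tree builder actually requires, and a mismatch discards the rest of the buffer so the caller falls back to retokenizing. The names SpeculativeToken, SpeculativeTokenBuffer, push() and take() are hypothetical, and the state enum is a reduced illustration, not WebKit's. The `<plaintext>` case in the usage below is taken from the HTML parsing specification, where tree construction switches the tokenizer state after that start tag; it is not necessarily one of the cases linked in comment 5.

```cpp
#include <cstddef>
#include <deque>
#include <optional>
#include <string>
#include <utility>

// Reduced, illustrative set of tokenizer states.
enum class TokenizerState { Data, RCDATA, RAWTEXT, PLAINTEXT };

// Hypothetical buffered token, tagged with the state the scanner assumed.
struct SpeculativeToken {
    std::string name;
    TokenizerState assumedState;
};

// Hypothetical buffer of scanner-produced tokens; not WebKit code.
class SpeculativeTokenBuffer {
public:
    void push(std::string name, TokenizerState assumedState)
    {
        m_tokens.push_back({std::move(name), assumedState});
    }

    // Returns the next buffered token if the scanner's assumption matches the
    // state the tree builder requires. On a mismatch, the remaining tokens are
    // discarded and std::nullopt tells the caller to retokenize.
    std::optional<SpeculativeToken> take(TokenizerState actualState)
    {
        if (m_tokens.empty())
            return std::nullopt;
        if (m_tokens.front().assumedState != actualState) {
            m_tokens.clear();
            ++m_invalidations;
            return std::nullopt;
        }
        SpeculativeToken token = std::move(m_tokens.front());
        m_tokens.pop_front();
        return token;
    }

    std::size_t size() const { return m_tokens.size(); }
    std::size_t invalidations() const { return m_invalidations; }

private:
    std::deque<SpeculativeToken> m_tokens;
    std::size_t m_invalidations = 0;
};
```

For example, if the scanner buffers `p`, `plaintext` and `b` while assuming the Data state throughout, the first two calls to take(TokenizerState::Data) succeed, and a following take(TokenizerState::PLAINTEXT) empties the buffer. If such mismatches are rare, as the comment supposes, the cost of the occasional retokenization should be small compared with the tokens reused.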

Eric Seidel (no email)

Comment 7 2013-03-05 02:02:10 PST

This is done as part of the threaded parser. See bug 106127. I'm not sure we want to bother trying to do this on the main-thread parser. Although maybe it would make things nice to use CompactHTMLToken even on the main thread. :)

Eric Seidel (no email)

Comment 8 2013-03-05 02:05:06 PST

I don't think we plan to do this for the main-thread parser. Closing as a dupe of bug 106127 for now. Feel free to re-open.

*** This bug has been marked as a duplicate of bug 106127 ***

Status: RESOLVED DUPLICATE of bug 106127

Priority: P2

Severity: Normal

Classification: Unclassified

Version: 528+ (Nightly build)

Hardware: PC

OS: OS X 10.5

Product: WebKit

Component: WebCore Misc.

Assignee: Nobody

Reported: 2011-03-29 12:07 PDT

Modified: 2013-03-05 02:05 PST

CC List: 9 users

Duplicates (1): 64369

Blocks: 106127