Bug 54574 - Refactor pumpTokenizer loop
Summary: Refactor pumpTokenizer loop
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: 528+ (Nightly build)
Hardware: Other OS X 10.5
: P2 Normal
Assignee: Tony Gentilcore
URL:
Keywords:
Depends on:
Blocks: 54355
  Show dependency treegraph
 
Reported: 2011-02-16 11:22 PST by Tony Gentilcore
Modified: 2011-02-18 09:26 PST (History)
2 users (show)

See Also:


Attachments
Patch (8.17 KB, patch)
2011-02-16 11:43 PST, Tony Gentilcore
no flags Details | Formatted Diff | Diff
Patch (8.99 KB, patch)
2011-02-16 12:57 PST, Tony Gentilcore
no flags Details | Formatted Diff | Diff
Patch (9.40 KB, patch)
2011-02-16 17:15 PST, Tony Gentilcore
eric: review+
tonyg: commit-queue+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Tony Gentilcore 2011-02-16 11:22:46 PST
Refactor pumpTokenizer loop
Comment 1 Tony Gentilcore 2011-02-16 11:43:51 PST
Created attachment 82669 [details]
Patch
Comment 2 Tony Gentilcore 2011-02-16 11:47:06 PST
Hey Eric, Please let me know what you think of this. The main problem with bug 54355 seemed to be that I was duplicating the second half of the pumpTokenizer loop in resumeParsingAfterYield() because that method didn't know whether we are resuming from a yield just before a token or just before a script. In addition to cleaning up the loop, this will make those tricky parts go away in that patch. Happy to discuss on IRC if it isn't clear how they relate.
Comment 3 Adam Barth 2011-02-16 11:57:20 PST
Comment on attachment 82669 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=82669&action=review

> Source/WebCore/html/parser/HTMLDocumentParser.cpp:203
> +bool HTMLDocumentParser::shouldContinueParsingAfterRunningScripts(SynchronousMode mode, HTMLParserScheduler::PumpSession session)

Is it ok to pass the PumpSession by value?

> Source/WebCore/html/parser/HTMLDocumentParser.cpp:205
> +    // JavaScript may have stopped or detached the parser.

These comments are kind of silly.
Comment 4 Tony Gentilcore 2011-02-16 12:57:12 PST
Created attachment 82680 [details]
Patch
Comment 5 Tony Gentilcore 2011-02-16 12:57:21 PST
> Is it ok to pass the PumpSession by value?

Oops. Good catch, I mistakenly thought we were passing by val to shouldContinueParsing(), but I see now that isn't.

> > Source/WebCore/html/parser/HTMLDocumentParser.cpp:205
> > +    // JavaScript may have stopped or detached the parser.
> 
> These comments are kind of silly.

Removed.
Comment 6 Eric Seidel (no email) 2011-02-16 13:58:57 PST
Comment on attachment 82669 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=82669&action=review

I think this is nicer.  It a bit confusing how deep in the callstack the actual scheduling occurs.  I wonder if there is some better way to return all the way back to the pumpTokenizer loop before executing the actual "schedule me to resume later" command.

>> Source/WebCore/html/parser/HTMLDocumentParser.cpp:203
>> +bool HTMLDocumentParser::shouldContinueParsingAfterRunningScripts(SynchronousMode mode, HTMLParserScheduler::PumpSession session)
> 
> Is it ok to pass the PumpSession by value?

It's not. :)

> Source/WebCore/html/parser/HTMLDocumentParser.cpp:210
> +    if (m_treeBuilder->isPaused()) {

So why is this isPaused instead of isWaitingForScripts?  I don't understand the difference.
Comment 7 Tony Gentilcore 2011-02-16 14:15:58 PST
> I think this is nicer.  It a bit confusing how deep in the callstack the actual scheduling occurs.  I wonder if there is some better way to return all the way back to the pumpTokenizer loop before executing the actual "schedule me to resume later" command.

What if PumpSession gets a new bool: "needsYield." Then HTMLParserScheduler::shouldContinueParsing() would be renamed HTMLParserScheduler::checkForYieldBeforeToken() and rather than actually scheduling the resume, it would just set the needsYield bit. Then pumpTokenizer() can do something like this after its while-loop:

if (session.needsYield)
  m_parserScheduler->scheduleForResume();

This changes the semantic from HTMLParserScheduler demanding that the parser yield to HTMLParserScheduler suggesting that the parser yield. Should be less prone to misuse.

> > Is it ok to pass the PumpSession by value?
> 
> It's not. :)

Fixed.

> 
> > Source/WebCore/html/parser/HTMLDocumentParser.cpp:210
> > +    if (m_treeBuilder->isPaused()) {
> 
> So why is this isPaused instead of isWaitingForScripts?  I don't understand the difference.

isWaitingForScripts() returns m_treeBuilder->isPaused(), so they are exactly equivalent. Since isWaitingForScripts() is a virtual method it doesn't seem desirable to call in this hot loop.
Comment 8 Eric Seidel (no email) 2011-02-16 14:19:03 PST
(In reply to comment #7)
> > I think this is nicer.  It a bit confusing how deep in the callstack the actual scheduling occurs.  I wonder if there is some better way to return all the way back to the pumpTokenizer loop before executing the actual "schedule me to resume later" command.
> 
> What if PumpSession gets a new bool: "needsYield." Then HTMLParserScheduler::shouldContinueParsing() would be renamed HTMLParserScheduler::checkForYieldBeforeToken() and rather than actually scheduling the resume, it would just set the needsYield bit. Then pumpTokenizer() can do something like this after its while-loop:
> 
> if (session.needsYield)
>   m_parserScheduler->scheduleForResume();
> 
> This changes the semantic from HTMLParserScheduler demanding that the parser yield to HTMLParserScheduler suggesting that the parser yield. Should be less prone to misuse.

I like it!  Do you want to do that in a second patch or would such a re-write make this patch betteR?  Your call.

> > > Source/WebCore/html/parser/HTMLDocumentParser.cpp:210
> > > +    if (m_treeBuilder->isPaused()) {
> > 
> > So why is this isPaused instead of isWaitingForScripts?  I don't understand the difference.
> 
> isWaitingForScripts() returns m_treeBuilder->isPaused(), so they are exactly equivalent. Since isWaitingForScripts() is a virtual method it doesn't seem desirable to call in this hot loop.

Hmm..  Ok.  Maybe we should rename isPaused to isPausedWaitingForScripts. :)  I'm not sure if there are other things which cause us to pause bsides scripts.
Comment 9 Eric Seidel (no email) 2011-02-16 14:20:26 PST
(In reply to comment #8)
> (In reply to comment #7)
> > > I think this is nicer.  It a bit confusing how deep in the callstack the actual scheduling occurs.  I wonder if there is some better way to return all the way back to the pumpTokenizer loop before executing the actual "schedule me to resume later" command.
> > 
> > What if PumpSession gets a new bool: "needsYield." Then HTMLParserScheduler::shouldContinueParsing() would be renamed HTMLParserScheduler::checkForYieldBeforeToken() and rather than actually scheduling the resume, it would just set the needsYield bit. Then pumpTokenizer() can do something like this after its while-loop:
> > 
> > if (session.needsYield)
> >   m_parserScheduler->scheduleForResume();
> > 
> > This changes the semantic from HTMLParserScheduler demanding that the parser yield to HTMLParserScheduler suggesting that the parser yield. Should be less prone to misuse.
> 
> I like it!  Do you want to do that in a second patch or would such a re-write make this patch betteR?  Your call.

I suspect checkForYieldBeforeToken would still return a bool to yield or not?  Not sure.
Comment 10 Tony Gentilcore 2011-02-16 14:23:43 PST
> Do you want to do that in a second patch or would such a re-write make this patch betteR?

I'd like to get any refactoring out of the way in this patch.

> I suspect checkForYieldBeforeToken would still return a bool to yield or not?  Not sure.

Let me try it out.
Comment 11 Tony Gentilcore 2011-02-16 17:15:39 PST
Created attachment 82726 [details]
Patch
Comment 12 Tony Gentilcore 2011-02-16 17:17:50 PST
Ready for another look. I think this is much cleaner, let me know what you think.

I decided not to touch the isPaused vs isWaitingForScripts thing in this change. But to answer your earlier question, the only thing that can caused the tree builder to be paused is a script.
Comment 13 Tony Gentilcore 2011-02-17 09:19:36 PST
I noticed that in the diff it is kind of hard to see how nice the tokenizer loop is now:

    PumpSession session;
    while (canTakeNextToken(mode, session) && !session.needsYield) {
        m_sourceTracker.start(m_input, m_token);
        if (!m_tokenizer->nextToken(m_input.current(), m_token))
            break;
        m_sourceTracker.end(m_input, m_token);

        m_xssFilter.filterToken(m_token);

        m_treeBuilder->constructTreeFromToken(m_token);
        m_token.clear();
    }
Comment 14 Eric Seidel (no email) 2011-02-17 12:45:55 PST
Comment on attachment 82726 [details]
Patch

I think this is an improvement.  Assuming it didn't regress perf r+. (Sorry for the slow review, been sick for a week and sleeping most of the days...)
Comment 15 Tony Gentilcore 2011-02-17 12:48:20 PST
(In reply to comment #14)
> (From update of attachment 82726 [details])
> I think this is an improvement.  Assuming it didn't regress perf r+. (Sorry for the slow review, been sick for a week and sleeping most of the days...)

Thanks. Hope you are feeling better.

I updated the perf numbers in the ChangeLog. Counterintuitively to me, it seems to have gotten faster. I'm not planning to look at the asm to figure out why.
Comment 16 Tony Gentilcore 2011-02-18 09:26:09 PST
Committed r79006: <http://trac.webkit.org/changeset/79006>