WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
Bug 54574
Refactor pumpTokenizer loop
https://bugs.webkit.org/show_bug.cgi?id=54574
Summary
Refactor pumpTokenizer loop
Tony Gentilcore
Reported
2011-02-16 11:22:46 PST
Refactor pumpTokenizer loop
Attachments
Patch
(8.17 KB, patch)
2011-02-16 11:43 PST
,
Tony Gentilcore
no flags
Details
Formatted Diff
Diff
Patch
(8.99 KB, patch)
2011-02-16 12:57 PST
,
Tony Gentilcore
no flags
Details
Formatted Diff
Diff
Patch
(9.40 KB, patch)
2011-02-16 17:15 PST
,
Tony Gentilcore
eric
: review+
tonyg
: commit-queue+
Details
Formatted Diff
Diff
Show Obsolete
(2)
View All
Add attachment
proposed patch, testcase, etc.
Tony Gentilcore
Comment 1
2011-02-16 11:43:51 PST
Created
attachment 82669
[details]
Patch
Tony Gentilcore
Comment 2
2011-02-16 11:47:06 PST
Hey Eric, Please let me know what you think of this. The main problem with
bug 54355
seemed to be that I was duplicating the second half of the pumpTokenizer loop in resumeParsingAfterYield() because that method didn't know whether we are resuming from a yield just before a token or just before a script. In addition to cleaning up the loop, this will make those tricky parts go away in that patch. Happy to discuss on IRC if it isn't clear how they relate.
Adam Barth
Comment 3
2011-02-16 11:57:20 PST
Comment on
attachment 82669
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=82669&action=review
> Source/WebCore/html/parser/HTMLDocumentParser.cpp:203 > +bool HTMLDocumentParser::shouldContinueParsingAfterRunningScripts(SynchronousMode mode, HTMLParserScheduler::PumpSession session)
Is it ok to pass the PumpSession by value?
> Source/WebCore/html/parser/HTMLDocumentParser.cpp:205 > + // JavaScript may have stopped or detached the parser.
These comments are kind of silly.
Tony Gentilcore
Comment 4
2011-02-16 12:57:12 PST
Created
attachment 82680
[details]
Patch
Tony Gentilcore
Comment 5
2011-02-16 12:57:21 PST
> Is it ok to pass the PumpSession by value?
Oops. Good catch, I mistakenly thought we were passing by val to shouldContinueParsing(), but I see now that isn't.
> > Source/WebCore/html/parser/HTMLDocumentParser.cpp:205 > > + // JavaScript may have stopped or detached the parser. > > These comments are kind of silly.
Removed.
Eric Seidel (no email)
Comment 6
2011-02-16 13:58:57 PST
Comment on
attachment 82669
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=82669&action=review
I think this is nicer. It a bit confusing how deep in the callstack the actual scheduling occurs. I wonder if there is some better way to return all the way back to the pumpTokenizer loop before executing the actual "schedule me to resume later" command.
>> Source/WebCore/html/parser/HTMLDocumentParser.cpp:203 >> +bool HTMLDocumentParser::shouldContinueParsingAfterRunningScripts(SynchronousMode mode, HTMLParserScheduler::PumpSession session) > > Is it ok to pass the PumpSession by value?
It's not. :)
> Source/WebCore/html/parser/HTMLDocumentParser.cpp:210 > + if (m_treeBuilder->isPaused()) {
So why is this isPaused instead of isWaitingForScripts? I don't understand the difference.
Tony Gentilcore
Comment 7
2011-02-16 14:15:58 PST
> I think this is nicer. It a bit confusing how deep in the callstack the actual scheduling occurs. I wonder if there is some better way to return all the way back to the pumpTokenizer loop before executing the actual "schedule me to resume later" command.
What if PumpSession gets a new bool: "needsYield." Then HTMLParserScheduler::shouldContinueParsing() would be renamed HTMLParserScheduler::checkForYieldBeforeToken() and rather than actually scheduling the resume, it would just set the needsYield bit. Then pumpTokenizer() can do something like this after its while-loop: if (session.needsYield) m_parserScheduler->scheduleForResume(); This changes the semantic from HTMLParserScheduler demanding that the parser yield to HTMLParserScheduler suggesting that the parser yield. Should be less prone to misuse.
> > Is it ok to pass the PumpSession by value? > > It's not. :)
Fixed.
> > > Source/WebCore/html/parser/HTMLDocumentParser.cpp:210 > > + if (m_treeBuilder->isPaused()) { > > So why is this isPaused instead of isWaitingForScripts? I don't understand the difference.
isWaitingForScripts() returns m_treeBuilder->isPaused(), so they are exactly equivalent. Since isWaitingForScripts() is a virtual method it doesn't seem desirable to call in this hot loop.
Eric Seidel (no email)
Comment 8
2011-02-16 14:19:03 PST
(In reply to
comment #7
)
> > I think this is nicer. It a bit confusing how deep in the callstack the actual scheduling occurs. I wonder if there is some better way to return all the way back to the pumpTokenizer loop before executing the actual "schedule me to resume later" command. > > What if PumpSession gets a new bool: "needsYield." Then HTMLParserScheduler::shouldContinueParsing() would be renamed HTMLParserScheduler::checkForYieldBeforeToken() and rather than actually scheduling the resume, it would just set the needsYield bit. Then pumpTokenizer() can do something like this after its while-loop: > > if (session.needsYield) > m_parserScheduler->scheduleForResume(); > > This changes the semantic from HTMLParserScheduler demanding that the parser yield to HTMLParserScheduler suggesting that the parser yield. Should be less prone to misuse.
I like it! Do you want to do that in a second patch or would such a re-write make this patch betteR? Your call.
> > > Source/WebCore/html/parser/HTMLDocumentParser.cpp:210 > > > + if (m_treeBuilder->isPaused()) { > > > > So why is this isPaused instead of isWaitingForScripts? I don't understand the difference. > > isWaitingForScripts() returns m_treeBuilder->isPaused(), so they are exactly equivalent. Since isWaitingForScripts() is a virtual method it doesn't seem desirable to call in this hot loop.
Hmm.. Ok. Maybe we should rename isPaused to isPausedWaitingForScripts. :) I'm not sure if there are other things which cause us to pause bsides scripts.
Eric Seidel (no email)
Comment 9
2011-02-16 14:20:26 PST
(In reply to
comment #8
)
> (In reply to
comment #7
) > > > I think this is nicer. It a bit confusing how deep in the callstack the actual scheduling occurs. I wonder if there is some better way to return all the way back to the pumpTokenizer loop before executing the actual "schedule me to resume later" command. > > > > What if PumpSession gets a new bool: "needsYield." Then HTMLParserScheduler::shouldContinueParsing() would be renamed HTMLParserScheduler::checkForYieldBeforeToken() and rather than actually scheduling the resume, it would just set the needsYield bit. Then pumpTokenizer() can do something like this after its while-loop: > > > > if (session.needsYield) > > m_parserScheduler->scheduleForResume(); > > > > This changes the semantic from HTMLParserScheduler demanding that the parser yield to HTMLParserScheduler suggesting that the parser yield. Should be less prone to misuse. > > I like it! Do you want to do that in a second patch or would such a re-write make this patch betteR? Your call.
I suspect checkForYieldBeforeToken would still return a bool to yield or not? Not sure.
Tony Gentilcore
Comment 10
2011-02-16 14:23:43 PST
> Do you want to do that in a second patch or would such a re-write make this patch betteR?
I'd like to get any refactoring out of the way in this patch.
> I suspect checkForYieldBeforeToken would still return a bool to yield or not? Not sure.
Let me try it out.
Tony Gentilcore
Comment 11
2011-02-16 17:15:39 PST
Created
attachment 82726
[details]
Patch
Tony Gentilcore
Comment 12
2011-02-16 17:17:50 PST
Ready for another look. I think this is much cleaner, let me know what you think. I decided not to touch the isPaused vs isWaitingForScripts thing in this change. But to answer your earlier question, the only thing that can caused the tree builder to be paused is a script.
Tony Gentilcore
Comment 13
2011-02-17 09:19:36 PST
I noticed that in the diff it is kind of hard to see how nice the tokenizer loop is now: PumpSession session; while (canTakeNextToken(mode, session) && !session.needsYield) { m_sourceTracker.start(m_input, m_token); if (!m_tokenizer->nextToken(m_input.current(), m_token)) break; m_sourceTracker.end(m_input, m_token); m_xssFilter.filterToken(m_token); m_treeBuilder->constructTreeFromToken(m_token); m_token.clear(); }
Eric Seidel (no email)
Comment 14
2011-02-17 12:45:55 PST
Comment on
attachment 82726
[details]
Patch I think this is an improvement. Assuming it didn't regress perf r+. (Sorry for the slow review, been sick for a week and sleeping most of the days...)
Tony Gentilcore
Comment 15
2011-02-17 12:48:20 PST
(In reply to
comment #14
)
> (From update of
attachment 82726
[details]
) > I think this is an improvement. Assuming it didn't regress perf r+. (Sorry for the slow review, been sick for a week and sleeping most of the days...)
Thanks. Hope you are feeling better. I updated the perf numbers in the ChangeLog. Counterintuitively to me, it seems to have gotten faster. I'm not planning to look at the asm to figure out why.
Tony Gentilcore
Comment 16
2011-02-18 09:26:09 PST
Committed
r79006
: <
http://trac.webkit.org/changeset/79006
>
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug