RESOLVED FIXED Bug 54574
Refactor pumpTokenizer loop
https://bugs.webkit.org/show_bug.cgi?id=54574
Summary Refactor pumpTokenizer loop
Tony Gentilcore
Reported 2011-02-16 11:22:46 PST
Refactor pumpTokenizer loop
Attachments
Patch (8.17 KB, patch)
2011-02-16 11:43 PST, Tony Gentilcore
no flags
Patch (8.99 KB, patch)
2011-02-16 12:57 PST, Tony Gentilcore
no flags
Patch (9.40 KB, patch)
2011-02-16 17:15 PST, Tony Gentilcore
eric: review+
tonyg: commit-queue+
Tony Gentilcore
Comment 1 2011-02-16 11:43:51 PST
Tony Gentilcore
Comment 2 2011-02-16 11:47:06 PST
Hey Eric, Please let me know what you think of this. The main problem with bug 54355 seemed to be that I was duplicating the second half of the pumpTokenizer loop in resumeParsingAfterYield() because that method didn't know whether we are resuming from a yield just before a token or just before a script. In addition to cleaning up the loop, this will make those tricky parts go away in that patch. Happy to discuss on IRC if it isn't clear how they relate.
Adam Barth
Comment 3 2011-02-16 11:57:20 PST
Comment on attachment 82669 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=82669&action=review > Source/WebCore/html/parser/HTMLDocumentParser.cpp:203 > +bool HTMLDocumentParser::shouldContinueParsingAfterRunningScripts(SynchronousMode mode, HTMLParserScheduler::PumpSession session) Is it ok to pass the PumpSession by value? > Source/WebCore/html/parser/HTMLDocumentParser.cpp:205 > + // JavaScript may have stopped or detached the parser. These comments are kind of silly.
Tony Gentilcore
Comment 4 2011-02-16 12:57:12 PST
Tony Gentilcore
Comment 5 2011-02-16 12:57:21 PST
> Is it ok to pass the PumpSession by value? Oops. Good catch, I mistakenly thought we were passing by val to shouldContinueParsing(), but I see now that isn't. > > Source/WebCore/html/parser/HTMLDocumentParser.cpp:205 > > + // JavaScript may have stopped or detached the parser. > > These comments are kind of silly. Removed.
Eric Seidel (no email)
Comment 6 2011-02-16 13:58:57 PST
Comment on attachment 82669 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=82669&action=review I think this is nicer. It a bit confusing how deep in the callstack the actual scheduling occurs. I wonder if there is some better way to return all the way back to the pumpTokenizer loop before executing the actual "schedule me to resume later" command. >> Source/WebCore/html/parser/HTMLDocumentParser.cpp:203 >> +bool HTMLDocumentParser::shouldContinueParsingAfterRunningScripts(SynchronousMode mode, HTMLParserScheduler::PumpSession session) > > Is it ok to pass the PumpSession by value? It's not. :) > Source/WebCore/html/parser/HTMLDocumentParser.cpp:210 > + if (m_treeBuilder->isPaused()) { So why is this isPaused instead of isWaitingForScripts? I don't understand the difference.
Tony Gentilcore
Comment 7 2011-02-16 14:15:58 PST
> I think this is nicer. It a bit confusing how deep in the callstack the actual scheduling occurs. I wonder if there is some better way to return all the way back to the pumpTokenizer loop before executing the actual "schedule me to resume later" command. What if PumpSession gets a new bool: "needsYield." Then HTMLParserScheduler::shouldContinueParsing() would be renamed HTMLParserScheduler::checkForYieldBeforeToken() and rather than actually scheduling the resume, it would just set the needsYield bit. Then pumpTokenizer() can do something like this after its while-loop: if (session.needsYield) m_parserScheduler->scheduleForResume(); This changes the semantic from HTMLParserScheduler demanding that the parser yield to HTMLParserScheduler suggesting that the parser yield. Should be less prone to misuse. > > Is it ok to pass the PumpSession by value? > > It's not. :) Fixed. > > > Source/WebCore/html/parser/HTMLDocumentParser.cpp:210 > > + if (m_treeBuilder->isPaused()) { > > So why is this isPaused instead of isWaitingForScripts? I don't understand the difference. isWaitingForScripts() returns m_treeBuilder->isPaused(), so they are exactly equivalent. Since isWaitingForScripts() is a virtual method it doesn't seem desirable to call in this hot loop.
Eric Seidel (no email)
Comment 8 2011-02-16 14:19:03 PST
(In reply to comment #7) > > I think this is nicer. It a bit confusing how deep in the callstack the actual scheduling occurs. I wonder if there is some better way to return all the way back to the pumpTokenizer loop before executing the actual "schedule me to resume later" command. > > What if PumpSession gets a new bool: "needsYield." Then HTMLParserScheduler::shouldContinueParsing() would be renamed HTMLParserScheduler::checkForYieldBeforeToken() and rather than actually scheduling the resume, it would just set the needsYield bit. Then pumpTokenizer() can do something like this after its while-loop: > > if (session.needsYield) > m_parserScheduler->scheduleForResume(); > > This changes the semantic from HTMLParserScheduler demanding that the parser yield to HTMLParserScheduler suggesting that the parser yield. Should be less prone to misuse. I like it! Do you want to do that in a second patch or would such a re-write make this patch betteR? Your call. > > > Source/WebCore/html/parser/HTMLDocumentParser.cpp:210 > > > + if (m_treeBuilder->isPaused()) { > > > > So why is this isPaused instead of isWaitingForScripts? I don't understand the difference. > > isWaitingForScripts() returns m_treeBuilder->isPaused(), so they are exactly equivalent. Since isWaitingForScripts() is a virtual method it doesn't seem desirable to call in this hot loop. Hmm.. Ok. Maybe we should rename isPaused to isPausedWaitingForScripts. :) I'm not sure if there are other things which cause us to pause bsides scripts.
Eric Seidel (no email)
Comment 9 2011-02-16 14:20:26 PST
(In reply to comment #8) > (In reply to comment #7) > > > I think this is nicer. It a bit confusing how deep in the callstack the actual scheduling occurs. I wonder if there is some better way to return all the way back to the pumpTokenizer loop before executing the actual "schedule me to resume later" command. > > > > What if PumpSession gets a new bool: "needsYield." Then HTMLParserScheduler::shouldContinueParsing() would be renamed HTMLParserScheduler::checkForYieldBeforeToken() and rather than actually scheduling the resume, it would just set the needsYield bit. Then pumpTokenizer() can do something like this after its while-loop: > > > > if (session.needsYield) > > m_parserScheduler->scheduleForResume(); > > > > This changes the semantic from HTMLParserScheduler demanding that the parser yield to HTMLParserScheduler suggesting that the parser yield. Should be less prone to misuse. > > I like it! Do you want to do that in a second patch or would such a re-write make this patch betteR? Your call. I suspect checkForYieldBeforeToken would still return a bool to yield or not? Not sure.
Tony Gentilcore
Comment 10 2011-02-16 14:23:43 PST
> Do you want to do that in a second patch or would such a re-write make this patch betteR? I'd like to get any refactoring out of the way in this patch. > I suspect checkForYieldBeforeToken would still return a bool to yield or not? Not sure. Let me try it out.
Tony Gentilcore
Comment 11 2011-02-16 17:15:39 PST
Tony Gentilcore
Comment 12 2011-02-16 17:17:50 PST
Ready for another look. I think this is much cleaner, let me know what you think. I decided not to touch the isPaused vs isWaitingForScripts thing in this change. But to answer your earlier question, the only thing that can caused the tree builder to be paused is a script.
Tony Gentilcore
Comment 13 2011-02-17 09:19:36 PST
I noticed that in the diff it is kind of hard to see how nice the tokenizer loop is now: PumpSession session; while (canTakeNextToken(mode, session) && !session.needsYield) { m_sourceTracker.start(m_input, m_token); if (!m_tokenizer->nextToken(m_input.current(), m_token)) break; m_sourceTracker.end(m_input, m_token); m_xssFilter.filterToken(m_token); m_treeBuilder->constructTreeFromToken(m_token); m_token.clear(); }
Eric Seidel (no email)
Comment 14 2011-02-17 12:45:55 PST
Comment on attachment 82726 [details] Patch I think this is an improvement. Assuming it didn't regress perf r+. (Sorry for the slow review, been sick for a week and sleeping most of the days...)
Tony Gentilcore
Comment 15 2011-02-17 12:48:20 PST
(In reply to comment #14) > (From update of attachment 82726 [details]) > I think this is an improvement. Assuming it didn't regress perf r+. (Sorry for the slow review, been sick for a week and sleeping most of the days...) Thanks. Hope you are feeling better. I updated the perf numbers in the ChangeLog. Counterintuitively to me, it seems to have gotten faster. I'm not planning to look at the asm to figure out why.
Tony Gentilcore
Comment 16 2011-02-18 09:26:09 PST
Note You need to log in before you can comment on or make changes to this bug.