Refactor pumpTokenizer loop
Created attachment 82669 [details] Patch
Hey Eric, Please let me know what you think of this. The main problem with bug 54355 seemed to be that I was duplicating the second half of the pumpTokenizer loop in resumeParsingAfterYield() because that method didn't know whether we are resuming from a yield just before a token or just before a script. In addition to cleaning up the loop, this will make those tricky parts go away in that patch. Happy to discuss on IRC if it isn't clear how they relate.
Comment on attachment 82669 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=82669&action=review > Source/WebCore/html/parser/HTMLDocumentParser.cpp:203 > +bool HTMLDocumentParser::shouldContinueParsingAfterRunningScripts(SynchronousMode mode, HTMLParserScheduler::PumpSession session) Is it ok to pass the PumpSession by value? > Source/WebCore/html/parser/HTMLDocumentParser.cpp:205 > + // JavaScript may have stopped or detached the parser. These comments are kind of silly.
Created attachment 82680 [details] Patch
> Is it ok to pass the PumpSession by value? Oops. Good catch, I mistakenly thought we were passing by val to shouldContinueParsing(), but I see now that isn't. > > Source/WebCore/html/parser/HTMLDocumentParser.cpp:205 > > + // JavaScript may have stopped or detached the parser. > > These comments are kind of silly. Removed.
Comment on attachment 82669 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=82669&action=review I think this is nicer. It a bit confusing how deep in the callstack the actual scheduling occurs. I wonder if there is some better way to return all the way back to the pumpTokenizer loop before executing the actual "schedule me to resume later" command. >> Source/WebCore/html/parser/HTMLDocumentParser.cpp:203 >> +bool HTMLDocumentParser::shouldContinueParsingAfterRunningScripts(SynchronousMode mode, HTMLParserScheduler::PumpSession session) > > Is it ok to pass the PumpSession by value? It's not. :) > Source/WebCore/html/parser/HTMLDocumentParser.cpp:210 > + if (m_treeBuilder->isPaused()) { So why is this isPaused instead of isWaitingForScripts? I don't understand the difference.
> I think this is nicer. It a bit confusing how deep in the callstack the actual scheduling occurs. I wonder if there is some better way to return all the way back to the pumpTokenizer loop before executing the actual "schedule me to resume later" command. What if PumpSession gets a new bool: "needsYield." Then HTMLParserScheduler::shouldContinueParsing() would be renamed HTMLParserScheduler::checkForYieldBeforeToken() and rather than actually scheduling the resume, it would just set the needsYield bit. Then pumpTokenizer() can do something like this after its while-loop: if (session.needsYield) m_parserScheduler->scheduleForResume(); This changes the semantic from HTMLParserScheduler demanding that the parser yield to HTMLParserScheduler suggesting that the parser yield. Should be less prone to misuse. > > Is it ok to pass the PumpSession by value? > > It's not. :) Fixed. > > > Source/WebCore/html/parser/HTMLDocumentParser.cpp:210 > > + if (m_treeBuilder->isPaused()) { > > So why is this isPaused instead of isWaitingForScripts? I don't understand the difference. isWaitingForScripts() returns m_treeBuilder->isPaused(), so they are exactly equivalent. Since isWaitingForScripts() is a virtual method it doesn't seem desirable to call in this hot loop.
(In reply to comment #7) > > I think this is nicer. It a bit confusing how deep in the callstack the actual scheduling occurs. I wonder if there is some better way to return all the way back to the pumpTokenizer loop before executing the actual "schedule me to resume later" command. > > What if PumpSession gets a new bool: "needsYield." Then HTMLParserScheduler::shouldContinueParsing() would be renamed HTMLParserScheduler::checkForYieldBeforeToken() and rather than actually scheduling the resume, it would just set the needsYield bit. Then pumpTokenizer() can do something like this after its while-loop: > > if (session.needsYield) > m_parserScheduler->scheduleForResume(); > > This changes the semantic from HTMLParserScheduler demanding that the parser yield to HTMLParserScheduler suggesting that the parser yield. Should be less prone to misuse. I like it! Do you want to do that in a second patch or would such a re-write make this patch betteR? Your call. > > > Source/WebCore/html/parser/HTMLDocumentParser.cpp:210 > > > + if (m_treeBuilder->isPaused()) { > > > > So why is this isPaused instead of isWaitingForScripts? I don't understand the difference. > > isWaitingForScripts() returns m_treeBuilder->isPaused(), so they are exactly equivalent. Since isWaitingForScripts() is a virtual method it doesn't seem desirable to call in this hot loop. Hmm.. Ok. Maybe we should rename isPaused to isPausedWaitingForScripts. :) I'm not sure if there are other things which cause us to pause bsides scripts.
(In reply to comment #8) > (In reply to comment #7) > > > I think this is nicer. It a bit confusing how deep in the callstack the actual scheduling occurs. I wonder if there is some better way to return all the way back to the pumpTokenizer loop before executing the actual "schedule me to resume later" command. > > > > What if PumpSession gets a new bool: "needsYield." Then HTMLParserScheduler::shouldContinueParsing() would be renamed HTMLParserScheduler::checkForYieldBeforeToken() and rather than actually scheduling the resume, it would just set the needsYield bit. Then pumpTokenizer() can do something like this after its while-loop: > > > > if (session.needsYield) > > m_parserScheduler->scheduleForResume(); > > > > This changes the semantic from HTMLParserScheduler demanding that the parser yield to HTMLParserScheduler suggesting that the parser yield. Should be less prone to misuse. > > I like it! Do you want to do that in a second patch or would such a re-write make this patch betteR? Your call. I suspect checkForYieldBeforeToken would still return a bool to yield or not? Not sure.
> Do you want to do that in a second patch or would such a re-write make this patch betteR? I'd like to get any refactoring out of the way in this patch. > I suspect checkForYieldBeforeToken would still return a bool to yield or not? Not sure. Let me try it out.
Created attachment 82726 [details] Patch
Ready for another look. I think this is much cleaner, let me know what you think. I decided not to touch the isPaused vs isWaitingForScripts thing in this change. But to answer your earlier question, the only thing that can caused the tree builder to be paused is a script.
I noticed that in the diff it is kind of hard to see how nice the tokenizer loop is now: PumpSession session; while (canTakeNextToken(mode, session) && !session.needsYield) { m_sourceTracker.start(m_input, m_token); if (!m_tokenizer->nextToken(m_input.current(), m_token)) break; m_sourceTracker.end(m_input, m_token); m_xssFilter.filterToken(m_token); m_treeBuilder->constructTreeFromToken(m_token); m_token.clear(); }
Comment on attachment 82726 [details] Patch I think this is an improvement. Assuming it didn't regress perf r+. (Sorry for the slow review, been sick for a week and sleeping most of the days...)
(In reply to comment #14) > (From update of attachment 82726 [details]) > I think this is an improvement. Assuming it didn't regress perf r+. (Sorry for the slow review, been sick for a week and sleeping most of the days...) Thanks. Hope you are feeling better. I updated the perf numbers in the ChangeLog. Counterintuitively to me, it seems to have gotten faster. I'm not planning to look at the asm to figure out why.
Committed r79006: <http://trac.webkit.org/changeset/79006>