54574 – Refactor pumpTokenizer loop

RESOLVED FIXED 54574

Refactor pumpTokenizer loop

https://bugs.webkit.org/show_bug.cgi?id=54574

Summary Refactor pumpTokenizer loop

Tony Gentilcore

Reported 2011-02-16 11:22:46 PST

Refactor pumpTokenizer loop

Attachments
Patch (8.17 KB, patch) 2011-02-16 11:43 PST, Tony Gentilcore	no flags	Details Formatted Diff Diff
Patch (8.99 KB, patch) 2011-02-16 12:57 PST, Tony Gentilcore	no flags	Details Formatted Diff Diff
Patch (9.40 KB, patch) 2011-02-16 17:15 PST, Tony Gentilcore	eric: review+ tonyg: commit-queue+	Details Formatted Diff Diff
Show Obsolete (2) View All Add attachment proposed patch, testcase, etc.

Tony Gentilcore

Comment 1 2011-02-16 11:43:51 PST

Created attachment 82669 [details] Patch

Tony Gentilcore

Comment 2 2011-02-16 11:47:06 PST

Hey Eric, Please let me know what you think of this. The main problem with bug 54355 seemed to be that I was duplicating the second half of the pumpTokenizer loop in resumeParsingAfterYield() because that method didn't know whether we are resuming from a yield just before a token or just before a script. In addition to cleaning up the loop, this will make those tricky parts go away in that patch. Happy to discuss on IRC if it isn't clear how they relate.

Adam Barth

Comment 3 2011-02-16 11:57:20 PST

Comment on attachment 82669 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=82669&action=review > Source/WebCore/html/parser/HTMLDocumentParser.cpp:203 > +bool HTMLDocumentParser::shouldContinueParsingAfterRunningScripts(SynchronousMode mode, HTMLParserScheduler::PumpSession session) Is it ok to pass the PumpSession by value? > Source/WebCore/html/parser/HTMLDocumentParser.cpp:205 > + // JavaScript may have stopped or detached the parser. These comments are kind of silly.

Tony Gentilcore

Comment 4 2011-02-16 12:57:12 PST

Created attachment 82680 [details] Patch

Tony Gentilcore

Comment 5 2011-02-16 12:57:21 PST

> Is it ok to pass the PumpSession by value? Oops. Good catch, I mistakenly thought we were passing by val to shouldContinueParsing(), but I see now that isn't. > > Source/WebCore/html/parser/HTMLDocumentParser.cpp:205 > > + // JavaScript may have stopped or detached the parser. > > These comments are kind of silly. Removed.

Eric Seidel (no email)

Comment 6 2011-02-16 13:58:57 PST

Comment on attachment 82669 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=82669&action=review I think this is nicer. It a bit confusing how deep in the callstack the actual scheduling occurs. I wonder if there is some better way to return all the way back to the pumpTokenizer loop before executing the actual "schedule me to resume later" command. >> Source/WebCore/html/parser/HTMLDocumentParser.cpp:203 >> +bool HTMLDocumentParser::shouldContinueParsingAfterRunningScripts(SynchronousMode mode, HTMLParserScheduler::PumpSession session) > > Is it ok to pass the PumpSession by value? It's not. :) > Source/WebCore/html/parser/HTMLDocumentParser.cpp:210 > + if (m_treeBuilder->isPaused()) { So why is this isPaused instead of isWaitingForScripts? I don't understand the difference.

Tony Gentilcore

Comment 7 2011-02-16 14:15:58 PST

> I think this is nicer. It a bit confusing how deep in the callstack the actual scheduling occurs. I wonder if there is some better way to return all the way back to the pumpTokenizer loop before executing the actual "schedule me to resume later" command. What if PumpSession gets a new bool: "needsYield." Then HTMLParserScheduler::shouldContinueParsing() would be renamed HTMLParserScheduler::checkForYieldBeforeToken() and rather than actually scheduling the resume, it would just set the needsYield bit. Then pumpTokenizer() can do something like this after its while-loop: if (session.needsYield) m_parserScheduler->scheduleForResume(); This changes the semantic from HTMLParserScheduler demanding that the parser yield to HTMLParserScheduler suggesting that the parser yield. Should be less prone to misuse. > > Is it ok to pass the PumpSession by value? > > It's not. :) Fixed. > > > Source/WebCore/html/parser/HTMLDocumentParser.cpp:210 > > + if (m_treeBuilder->isPaused()) { > > So why is this isPaused instead of isWaitingForScripts? I don't understand the difference. isWaitingForScripts() returns m_treeBuilder->isPaused(), so they are exactly equivalent. Since isWaitingForScripts() is a virtual method it doesn't seem desirable to call in this hot loop.

Eric Seidel (no email)

Comment 8 2011-02-16 14:19:03 PST

(In reply to comment #7) > > I think this is nicer. It a bit confusing how deep in the callstack the actual scheduling occurs. I wonder if there is some better way to return all the way back to the pumpTokenizer loop before executing the actual "schedule me to resume later" command. > > What if PumpSession gets a new bool: "needsYield." Then HTMLParserScheduler::shouldContinueParsing() would be renamed HTMLParserScheduler::checkForYieldBeforeToken() and rather than actually scheduling the resume, it would just set the needsYield bit. Then pumpTokenizer() can do something like this after its while-loop: > > if (session.needsYield) > m_parserScheduler->scheduleForResume(); > > This changes the semantic from HTMLParserScheduler demanding that the parser yield to HTMLParserScheduler suggesting that the parser yield. Should be less prone to misuse. I like it! Do you want to do that in a second patch or would such a re-write make this patch betteR? Your call. > > > Source/WebCore/html/parser/HTMLDocumentParser.cpp:210 > > > + if (m_treeBuilder->isPaused()) { > > > > So why is this isPaused instead of isWaitingForScripts? I don't understand the difference. > > isWaitingForScripts() returns m_treeBuilder->isPaused(), so they are exactly equivalent. Since isWaitingForScripts() is a virtual method it doesn't seem desirable to call in this hot loop. Hmm.. Ok. Maybe we should rename isPaused to isPausedWaitingForScripts. :) I'm not sure if there are other things which cause us to pause bsides scripts.

Eric Seidel (no email)

Comment 9 2011-02-16 14:20:26 PST

(In reply to comment #8) > (In reply to comment #7) > > > I think this is nicer. It a bit confusing how deep in the callstack the actual scheduling occurs. I wonder if there is some better way to return all the way back to the pumpTokenizer loop before executing the actual "schedule me to resume later" command. > > > > What if PumpSession gets a new bool: "needsYield." Then HTMLParserScheduler::shouldContinueParsing() would be renamed HTMLParserScheduler::checkForYieldBeforeToken() and rather than actually scheduling the resume, it would just set the needsYield bit. Then pumpTokenizer() can do something like this after its while-loop: > > > > if (session.needsYield) > > m_parserScheduler->scheduleForResume(); > > > > This changes the semantic from HTMLParserScheduler demanding that the parser yield to HTMLParserScheduler suggesting that the parser yield. Should be less prone to misuse. > > I like it! Do you want to do that in a second patch or would such a re-write make this patch betteR? Your call. I suspect checkForYieldBeforeToken would still return a bool to yield or not? Not sure.

Tony Gentilcore

Comment 10 2011-02-16 14:23:43 PST

> Do you want to do that in a second patch or would such a re-write make this patch betteR? I'd like to get any refactoring out of the way in this patch. > I suspect checkForYieldBeforeToken would still return a bool to yield or not? Not sure. Let me try it out.

Tony Gentilcore

Comment 11 2011-02-16 17:15:39 PST

Created attachment 82726 [details] Patch

Tony Gentilcore

Comment 12 2011-02-16 17:17:50 PST

Ready for another look. I think this is much cleaner, let me know what you think. I decided not to touch the isPaused vs isWaitingForScripts thing in this change. But to answer your earlier question, the only thing that can caused the tree builder to be paused is a script.

Tony Gentilcore

Comment 13 2011-02-17 09:19:36 PST

I noticed that in the diff it is kind of hard to see how nice the tokenizer loop is now: PumpSession session; while (canTakeNextToken(mode, session) && !session.needsYield) { m_sourceTracker.start(m_input, m_token); if (!m_tokenizer->nextToken(m_input.current(), m_token)) break; m_sourceTracker.end(m_input, m_token); m_xssFilter.filterToken(m_token); m_treeBuilder->constructTreeFromToken(m_token); m_token.clear(); }

Eric Seidel (no email)

Comment 14 2011-02-17 12:45:55 PST

Comment on attachment 82726 [details] Patch I think this is an improvement. Assuming it didn't regress perf r+. (Sorry for the slow review, been sick for a week and sleeping most of the days...)

Tony Gentilcore

Comment 15 2011-02-17 12:48:20 PST

(In reply to comment #14) > (From update of attachment 82726 [details]) > I think this is an improvement. Assuming it didn't regress perf r+. (Sorry for the slow review, been sick for a week and sleeping most of the days...) Thanks. Hope you are feeling better. I updated the perf numbers in the ChangeLog. Counterintuitively to me, it seems to have gotten faster. I'm not planning to look at the asm to figure out why.

Tony Gentilcore

Comment 16 2011-02-18 09:26:09 PST

Committed r79006: <http://trac.webkit.org/changeset/79006>

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware Other

OS OS X 10.5

Product WebKit

Component New Bugs

Assignee

Tony Gentilcore

Reported

2011-02-16 11:22 PST

Modified

2011-02-18 09:26 PST History

CC List

2 users Show

URL

Keywords

Depends on

Blocks

54355

Dependencies

tree graph