Bug 93455 - Extract DOM modifications in the HTML5 parser into HTMLParserDOMMutationQueue.
Summary: Extract DOM modifications in the HTML5 parser into HTMLParserDOMMutationQueue.
Status: RESOLVED WONTFIX
Alias: None
Product: WebKit
Classification: Unclassified
Component: DOM (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Kwang Yul Seo
URL:
Keywords:
Depends on:
Blocks: 90751 93608
  Show dependency treegraph
 
Reported: 2012-08-08 02:18 PDT by Kwang Yul Seo
Modified: 2013-09-05 08:33 PDT (History)
8 users (show)

See Also:


Attachments
Patch (not for landing) (32.06 KB, patch)
2012-08-08 02:21 PDT, Kwang Yul Seo
no flags Details | Formatted Diff | Diff
Patch (not for landing) (39.82 KB, patch)
2012-08-08 05:13 PDT, Kwang Yul Seo
no flags Details | Formatted Diff | Diff
Patch (not for landing) (37.18 KB, patch)
2012-08-08 16:31 PDT, Kwang Yul Seo
no flags Details | Formatted Diff | Diff
Patch (99.63 KB, patch)
2012-08-09 05:18 PDT, Kwang Yul Seo
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Kwang Yul Seo 2012-08-08 02:18:10 PDT
This patch extracts DOM modifications in the parser into DOMModifier and makes the tree builder and the element stack use it. DOMModifier can change its behavior dynamically by changing its strategy. The default strategy of DOMModifier is set to ImmediateDOMModifierStrategy which calls DOM operations synchronously.
    
This refactoring is a preparation for speculative parsing. To run the parser in speculation mode, we must be able to queue up DOM modifications. We can do this by adding DeferredDOMModifierStategy which queues up DOM operations and execute them later.
    
Both ImmediateDOMModifierStrategy and DeferredDOMModifierStategy eventually perform the same DOM operations. So I added DOMModifierExecutor which performs the actual DOM operations. DOMModifierStrategy classes should delegate to DOMModifierExecutor to remove code duplication.
    
Changed HTMLDocumentParser to own a DOMModifier instance and pass it to HTMLTreeBuilder, HTMLConstructionSite and HTMLElementStack.
    
This patch is not complete. HTMLConstructionSite still has DOM modifications including element creation. I will move the remaining DOM modifications into DOMModifier in a follow-up patch.
Comment 1 Kwang Yul Seo 2012-08-08 02:21:02 PDT
Created attachment 157160 [details]
Patch (not for landing)
Comment 2 Kwang Yul Seo 2012-08-08 02:26:35 PDT
Eric, I've implemented the idea we discussed in Bug 93113. Would you take a look at it?

I am a bit worried about the indirection and virtual method overhead.
Comment 3 Kwang Yul Seo 2012-08-08 02:34:24 PDT
rniwa, I changed the name 'detachNode' to 'removeFromParent' in this patch.
Comment 4 Eric Seidel (no email) 2012-08-08 03:26:13 PDT
Interesting.  Do we know how much of a regression this would be for perf?
Comment 5 Kwang Yul Seo 2012-08-08 04:02:24 PDT
(In reply to comment #4)
> Interesting.  Do we know how much of a regression this would be for perf?

This patch causes a 0.3% performance regression on the html5-parser benchmark.
Comment 6 Kwang Yul Seo 2012-08-08 05:13:06 PDT
Created attachment 157188 [details]
Patch (not for landing)
Comment 7 Kwang Yul Seo 2012-08-08 05:13:43 PDT
(In reply to comment #6)
> Created an attachment (id=157188) [details]
> Patch (not for landing)

I added more DOM modifications to DOMModifier.
Comment 8 Kwang Yul Seo 2012-08-08 05:26:28 PDT
If this patch looks okay, I will move the rest of DOM modifications to DOMModifier and rebase NodeProxy patch ( https://github.com/kseo/webkit/commit/b9e6009013be65a629dcbebc7e4088c4c44f51e7 ) and speculative parser implementation on top of this patch.
Comment 9 Gyuyoung Kim 2012-08-08 05:47:27 PDT
Comment on attachment 157188 [details]
Patch (not for landing)

Attachment 157188 [details] did not pass efl-ews (efl):
Output: http://queues.webkit.org/results/13459207
Comment 10 Build Bot 2012-08-08 05:52:59 PDT
Comment on attachment 157188 [details]
Patch (not for landing)

Attachment 157188 [details] did not pass win-ews (win):
Output: http://queues.webkit.org/results/13451774
Comment 11 Early Warning System Bot 2012-08-08 05:59:57 PDT
Comment on attachment 157188 [details]
Patch (not for landing)

Attachment 157188 [details] did not pass qt-ews (qt):
Output: http://queues.webkit.org/results/13451777
Comment 12 Early Warning System Bot 2012-08-08 06:09:31 PDT
Comment on attachment 157188 [details]
Patch (not for landing)

Attachment 157188 [details] did not pass qt-wk2-ews (qt):
Output: http://queues.webkit.org/results/13453624
Comment 13 Alexey Proskuryakov 2012-08-08 10:17:01 PDT
"Modifier" is not a noun that conveys much meaning. In a sense, half of WebCore code is a "DOM modifier".

Is there a more specific way to describe the code you're extracting?
Comment 14 Ryosuke Niwa 2012-08-08 10:38:22 PDT
Comment on attachment 157188 [details]
Patch (not for landing)

View in context: https://bugs.webkit.org/attachment.cgi?id=157188&action=review

> Source/WebCore/html/parser/DOMModifier.h:42
> +class DOMModifier {

The class name reflect the fact it's specific to parser.

> Source/WebCore/html/parser/DOMModifierExecutor.h:37
> +class DOMModifierExecutor {

Do we really need this class?

> Source/WebCore/html/parser/DOMModifierStrategy.h:42
> +    virtual void beginParsingChildren(Node*) = 0;
> +    virtual void finishParsingChildren(Node*) = 0;

I am extremely concerned about the implication of all these new virtual function calls.
If there will be only two strategies, it's probably better to embed if-else in the DOMModifier class itself instead of using polymorphism.
Comment 15 Kwang Yul Seo 2012-08-08 14:15:58 PDT
(In reply to comment #13)
> "Modifier" is not a noun that conveys much meaning. In a sense, half of WebCore code is a "DOM modifier".
> 
> Is there a more specific way to describe the code you're extracting?

It's a "DOM modifier" specific to the HTML parser. After this patch, the rest of the HTML parser never (and must not) modifies the DOM. Why don't we call it HTMLParserDOMModifier? Does it sound better?
Comment 16 Ryosuke Niwa 2012-08-08 14:17:47 PDT
Comment on attachment 157188 [details]
Patch (not for landing)

I don't think we should be adding this many indirections and classes.
Comment 17 Kwang Yul Seo 2012-08-08 14:35:43 PDT
(In reply to comment #16)
> (From update of attachment 157188 [details])
> I don't think we should be adding this many indirections and classes.

Another option is to remove ImmediateDOMModifierStrategy and DOMModifierExecutor and embed DOMModifierExecutor as the default behavior in DOMModifier.

void DOMModifier::beginParsingChildren(Node* node)
{
    ASSERT(node);
    if (!m_strategy) {
        node->beginParsingChildren();
        return;
    }

    m_strategy->beginParsingChildren(node);
}

Now we have only one additional NULL check in non-speculative parsing code paths. This also reduces the number of classes involved in this indirection.
Comment 18 Ryosuke Niwa 2012-08-08 14:43:38 PDT
(In reply to comment #17)
> (In reply to comment #16)
> > (From update of attachment 157188 [details] [details])
> > I don't think we should be adding this many indirections and classes.
> 
> Another option is to remove ImmediateDOMModifierStrategy and DOMModifierExecutor and embed DOMModifierExecutor as the default behavior in DOMModifier.
> 
> void DOMModifier::beginParsingChildren(Node* node)
> {
>     ASSERT(node);
>     if (!m_strategy) {
>         node->beginParsingChildren();
>         return;
>     }
> 
>     m_strategy->beginParsingChildren(node);
> }
> 
> Now we have only one additional NULL check in non-speculative parsing code paths. This also reduces the number of classes involved in this indirection.

Why do we need this DOMModifier/Strategy classes at all? can't we just put the contents of each beginParsingChildren implementation right there?
Comment 19 Kwang Yul Seo 2012-08-08 15:05:57 PDT
(In reply to comment #18)
> Why do we need this DOMModifier/Strategy classes at all? can't we just put the contents of each beginParsingChildren implementation right there?

eseidel and I discussed this issue on https://bugs.webkit.org/show_bug.cgi?id=93113.

DOMModifier is an abstraction to handle queueing of DOM operations (and cross-thread DOM mutation later) so that the parser can continue parsing speculatively while waiting for the parser blocking script to be loaded. If the speculation ends up succeeding, we perform the queued DOM operations at once. Otherwise, we throw them away.

I admit DOMModifierStrategy might be an overkill here. I can certainly put the queueing of DOM operations directly in DOMModifier class.

Let's listen to eseidel's opinion on this.
Comment 20 Kwang Yul Seo 2012-08-08 16:31:06 PDT
Created attachment 157329 [details]
Patch (not for landing)
Comment 21 Ryosuke Niwa 2012-08-08 16:36:26 PDT
(In reply to comment #19)
> DOMModifier is an abstraction to handle queueing of DOM operations (and cross-thread DOM mutation later) so that the parser can continue parsing speculatively while waiting for the parser blocking script to be loaded. If the speculation ends up succeeding, we perform the queued DOM operations at once. Otherwise, we throw them away.

HTMLParserDOMMutationQueue?  Anyhow, I would much prefer seeing code like:

if (m_inSpeculativeParsing) {
   ~~
} else {
   ~~
}

over

m_strategy->X() where we have two X()'s.

The latter pattern is nice if we're going to have more than few cases and we want it to be extensible but it appears that we only have two code paths in this case.
Comment 22 Kwang Yul Seo 2012-08-08 16:51:37 PDT
(In reply to comment #21)
> HTMLParserDOMMutationQueue?  

It seems a bit weird to me that HTMLParserDOMMutationQueue can synchronously call DOM operations.

> Anyhow, I would much prefer seeing code like:
> 
> if (m_inSpeculativeParsing) {
>    ~~
> } else {
>    ~~
> }
> 
> over
> 
> m_strategy->X() where we have two X()'s.
> 
> The latter pattern is nice if we're going to have more than few cases and we want it to be extensible but it appears that we only have two code paths in this case.

That's exactly what I did in my previous try. At that time, I put DOM modifications into HTMLConstructionSite.

https://github.com/kseo/webkit/commit/cee9b035c7c14e73739467067fa96291d58e8082

e.g.,

void scheduleBeginParsingChildren(NodeProxy* node)
{
    if (m_isSpeculating)
        m_taskQueue.append(bind(&HTMLConstructionSite::beginParsingChildrenTask, this, PassRefPtr<NodeProxy>(node)));
    else
        beginParsingChildrenTask(node);
}


The reason why I chose the latter pattern in this patch is to keep the speculative parsing code in a separate class. I thought the performance penalty caused by virtual method calls is acceptable in speculation code paths. 

However, I agree to your point. m_strategy->X() pattern is an overkill if there are only one or two strategies. I will see if I can do this better, or I will follow your suggestion!

Thanks.
Comment 23 Kwang Yul Seo 2012-08-08 16:53:58 PDT
(In reply to comment #20)
> Created an attachment (id=157329) [details]
> Patch (not for landing)

Changes over the previous patch:

- Changed the name DOMModifier to HTMLParserDOMModifier because this class is specific to the HTML parser.

- Removed ImmediateDOMModifierStrategy. Now HTMLParserDOMModifier synchronously calls DOM operations if no HTMLParserDOMModifierStrategy is provided. This removes virtual method calls in non-speculative parsing.

- Kept HTMLParserDOMModifierExecutor. Otherwise, DeferredDOMModifierStategy (which will be added later) can't reuse the code.
Comment 24 Ryosuke Niwa 2012-08-08 16:59:57 PDT
(In reply to comment #22)
> (In reply to comment #21)
> > HTMLParserDOMMutationQueue?  
> 
> It seems a bit weird to me that HTMLParserDOMMutationQueue can synchronously call DOM operations.

Why not? We have this pattern elsewhere in the WebKit. e.g. ScopedEventQueue.

I also dislike the use of the term "modifier" here. "DOMModifier" doesn't tell us anything about the nature or the purpose of the class. All we know is that it mutates DOM, as Alexey pointed out, is what a vast majority of WebCore classes do.

What we're trying to do here is to queue up possible DOM mutations when we're in the speculative parsing.
Comment 25 Ryosuke Niwa 2012-08-08 17:04:33 PDT
(In reply to comment #22)
> void scheduleBeginParsingChildren(NodeProxy* node)
> {
>     if (m_isSpeculating)
>         m_taskQueue.append(bind(&HTMLConstructionSite::beginParsingChildrenTask, this, PassRefPtr<NodeProxy>(node)));
>     else
>         beginParsingChildrenTask(node);
> }

This code is much clearer than the one we have in your patch here.

> The reason why I chose the latter pattern in this patch is to keep the speculative parsing code in a separate class. I thought the performance penalty caused by virtual method calls is acceptable in speculation code paths. 

It's not about virtual function calls. It's about the clarity. By adding extra two years of classes, we're obscuring what the code does. Imagine you don't know anything about HTML parser and trying to decipher what's happening that code. After your patch, we have to go through 3 layers of classes to figure out what it does.

Worse yet, just looking at the DOMModififer class is not sufficient to figure out what will happen because the actual implementation is hidden behind m_strategy. Now we have to find all subclasses of the strategy class and when and how they're associated with DOMModifier. That's A LOT more work than if we had a single boolean indicating whether we're in the speculative parsing or not. In that case, the intent of this class and what it does is self-evidently clear.
Comment 26 Kwang Yul Seo 2012-08-08 17:14:25 PDT
(In reply to comment #24)
> Why not? We have this pattern elsewhere in the WebKit. e.g. ScopedEventQueue.

I didn't know that. If this naming pattern is common in WebKit, I think HTMLParserDOMMutationQueue is okay!

> I also dislike the use of the term "modifier" here. "DOMModifier" doesn't tell us anything about the nature or the purpose of the class. All we know is that it mutates DOM, as Alexey pointed out, is what a vast majority of WebCore classes do.

My intention was to collect all DOM modifications into one class. Then, we can dynamically add or remove a strategy which queues DOM mutations. DOMModifier simply modifies the DOM, while DeferredDOMModifierStrategy queues up DOM mutations. But if we combine these classes into one as you suggested, then the name should be HTMLParserDOMMutationQueue.
Comment 27 Kwang Yul Seo 2012-08-08 17:17:14 PDT
(In reply to comment #25)
> It's not about virtual function calls. It's about the clarity. By adding extra two years of classes, we're obscuring what the code does. Imagine you don't know anything about HTML parser and trying to decipher what's happening that code. After your patch, we have to go through 3 layers of classes to figure out what it does.
> 
> Worse yet, just looking at the DOMModififer class is not sufficient to figure out what will happen because the actual implementation is hidden behind m_strategy. Now we have to find all subclasses of the strategy class and when and how they're associated with DOMModifier. That's A LOT more work than if we had a single boolean indicating whether we're in the speculative parsing or not. In that case, the intent of this class and what it does is self-evidently clear.

I see! I will follow your suggestion and combine these classes into HTMLParserDOMMutationQueue.
Comment 28 Kwang Yul Seo 2012-08-09 05:18:01 PDT
Created attachment 157441 [details]
Patch
Comment 29 Adam Barth 2012-08-09 10:50:17 PDT
Sorry for not commenting earlier.  @skyul: I thought we were going to evaluate the speculative parser on a branch before landing more parser changes.
Comment 30 Adam Barth 2012-08-09 10:51:56 PDT
I see that you've already answered my question in https://bugs.webkit.org/show_bug.cgi?id=90751#c10
Comment 31 Kwang Yul Seo 2012-08-09 16:28:30 PDT
(In reply to comment #29)
> Sorry for not commenting earlier.  @skyul: I thought we were going to evaluate the speculative parser on a branch before landing more parser changes.

(In reply to comment #30)
> I see that you've already answered my question in https://bugs.webkit.org/show_bug.cgi?id=90751#c10

Yeah, please take a look at these preparation patches. I won't land them until we  decide to land the whole speculative parser. I should've turned off r+ flag in this case?
Comment 32 Eric Seidel (no email) 2013-03-05 02:02:50 PST
We took a different approach with the threaded parser work.  I think this is now WONTFIX/dupe of bug  106127.
Comment 33 Zan Dobersek 2013-09-05 08:33:14 PDT
Comment on attachment 157441 [details]
Patch

The work on this bug has ceased, removing the r? flag on the patch.