Bug 89635

Summary: NodeRareData should be directly referred from Node instead of using HashMap
Product: WebKit Reporter: Hajime Morrita <morrita>
Component: DOMAssignee: Hajime Morrita <morrita>
Status: RESOLVED DUPLICATE    
Severity: Normal CC: abarth, apavlov, ap, cmarcelo, darin, dglazkov, eric, esprehn, kling, koivisto, rniwa, sam, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 100057    
Bug Blocks: 87034, 89737    
Attachments:
Description Flags
Patch
none
Archive of layout-test-results from ec2-cr-linux-01
none
Patch rniwa: review+

Description Hajime Morrita 2012-06-20 22:57:16 PDT
Coined by Ryosuke, 
There is an observation that there is noticeable memory usage change even after haraken's 8 byte: http://trac.webkit.org/changeset/119802
because of some reasons like allocator fragmentation. This change is going to take back that bytes to put a pointer to NodeRareData.
Then we can kill the hashtable lookup when fetching it.

I understand this can be controversial. I'd like to hear your feedback.
Comment 1 Hajime Morrita 2012-06-20 22:59:50 PDT
s/noticeable/no noticeable/
Comment 2 Ryosuke Niwa 2012-06-20 23:00:30 PDT
I'm supportive of this change. When I profiled some basic DOM APIs like appendChild, removeChild, etc... we were spending some significant amount of time 5-10% look up hash maps to look for nodeRareData() to obtain treeScope(). I suspect much of that will go away if we avoided the hash lookup for the rare node data.
Comment 3 Hajime Morrita 2012-06-21 02:03:12 PDT
Created attachment 148750 [details]
Patch
Comment 4 WebKit Review Bot 2012-06-21 05:10:18 PDT
Comment on attachment 148750 [details]
Patch

Attachment 148750 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/13026082

New failing tests:
fast/forms/textarea-live-pseudo-selectors.html
fast/dom/HTMLElement/attr-dir-auto-text-form-control-child.html
fast/dom/HTMLElement/attr-dir-value-change.html
fast/dom/HTMLElement/attr-dir-auto-change-child-node.html
fast/text/international/bdi-dir-default-to-auto.html
fast/dom/HTMLElement/attr-dir-auto-change-text.html
fast/dom/HTMLElement/attr-dir-auto-remove-add-children.html
fast/text/international/bdi-neutral-wrapped.html
fast/dom/HTMLElement/attr-dir-auto.html
fast/dom/HTMLElement/attr-dir-auto-change-before-text-node.html
Comment 5 WebKit Review Bot 2012-06-21 05:10:24 PDT
Created attachment 148770 [details]
Archive of layout-test-results from ec2-cr-linux-01

The attached test failures were seen while running run-webkit-tests on the chromium-ews.
Bot: ec2-cr-linux-01  Port: <class 'webkitpy.common.config.ports.ChromiumXVFBPort'>  Platform: Linux-2.6.35-28-virtual-x86_64-with-Ubuntu-10.10-maverick
Comment 6 Andreas Kling 2012-06-21 06:30:47 PDT
I support this change as well.

There's probably a bunch of smaller cleanups we can do where it's no longer necessary to cache rareData() in a local variable, etc.
Comment 7 Hajime Morrita 2012-06-21 22:09:48 PDT
Created attachment 148961 [details]
Patch
Comment 8 Hajime Morrita 2012-06-21 22:11:42 PDT
Ryosuke, Kling, Thank you for your support :-)
I updated the patch to fix test failures and to have some cleanup which Kling mentioned.
Comment 9 Ryosuke Niwa 2012-06-21 22:39:55 PDT
Comment on attachment 148961 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=148961&action=review

> Source/WebCore/ChangeLog:9
> +        Local benchmark shows 3% speedup on dom-modify.html and 7% speedup on dom-query.html

Nice!

> Source/WebCore/dom/Element.cpp:152
>  inline ElementRareData* Element::rareData() const
>  {
> -    ASSERT(hasRareData());
> -    return static_cast<ElementRareData*>(NodeRareData::rareDataFromMap(this));
> +    return static_cast<ElementRareData*>(Node::rareData());
>  }

We can probably just inline this at the declaration.

> Source/WebCore/dom/Element.cpp:1044
>          if (hasRareData()) {

Should we replace this with rareData() ?

> Source/WebCore/dom/Node.h:724
> +    bool hasRareData() const { return m_rareData; }

Do we really need this function?

> Source/WebCore/dom/Node.h:786
> +    NodeRareData* m_rareData;

Why don't we make this OwnPtr?
Comment 10 Hajime Morrita 2012-06-21 22:43:54 PDT
(In reply to comment #9)
> (From update of attachment 148961 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=148961&action=review
> 
> > Source/WebCore/ChangeLog:9
> > +        Local benchmark shows 3% speedup on dom-modify.html and 7% speedup on dom-query.html
> 
> Nice!
> 
> > Source/WebCore/dom/Element.cpp:152
> >  inline ElementRareData* Element::rareData() const
> >  {
> > -    ASSERT(hasRareData());
> > -    return static_cast<ElementRareData*>(NodeRareData::rareDataFromMap(this));
> > +    return static_cast<ElementRareData*>(Node::rareData());
> >  }
> 
> We can probably just inline this at the declaration.
Unfortunately we cannot since ElementRareData definition isn't visible from the header file.
 
> 
> > Source/WebCore/dom/Element.cpp:1044
> >          if (hasRareData()) {
> 
> Should we replace this with rareData() ?
> 
> > Source/WebCore/dom/Node.h:724
> > +    bool hasRareData() const { return m_rareData; }
> 
> Do we really need this function?
I'd like to do that in separate patch to keep this change clean.

> 
> > Source/WebCore/dom/Node.h:786
> > +    NodeRareData* m_rareData;
> 
> Why don't we make this OwnPtr?
Because NodeRareData definition from the header file
and Node::Node is inlined (in Document.h). This is a bit ugly bat not worse than before...
Comment 11 Ryosuke Niwa 2012-06-21 22:51:03 PDT
Comment on attachment 148961 [details]
Patch

ok, fair enough. r=me provided we don't break tests :)
Comment 12 Darin Adler 2012-06-22 10:18:50 PDT
Comment on attachment 148961 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=148961&action=review

> Source/WebCore/ChangeLog:8
> +        This change adds Node::m_rareData and eliminate old raredata-support codes.

Why is adding a pointer to every Node an acceptable memory regression!? On 64-bit systems this will make every Node 8 bytes bigger! We should look at other options before we spend all this extra memory. I don’t see any data here on the change in memory use and why it’s OK.
Comment 13 Andreas Kling 2012-06-22 11:02:54 PDT
(In reply to comment #12)
> (From update of attachment 148961 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=148961&action=review
> 
> > Source/WebCore/ChangeLog:8
> > +        This change adds Node::m_rareData and eliminate old raredata-support codes.
> 
> Why is adding a pointer to every Node an acceptable memory regression!? On 64-bit systems this will make every Node 8 bytes bigger! We should look at other options before we spend all this extra memory. I don’t see any data here on the change in memory use and why it’s OK.

We just recently knocked one pointer off of every Node with the TreeShared devirtualization, so we're basically getting back to square one.

Furthermore, I think we could move the "inline" JS wrapper pointer to NodeRareData after this change and make up for much of the regression.
Comment 14 Darin Adler 2012-06-22 11:14:13 PDT
I can understand adding the pointer since this is showing up as slowness on some realistic performance tests. But in general I am not sure I understand how to make the tradeoff between memory use and speed for this data. If the data was truly rare, then clearly the speed would not matter. So the heart of this is that the data is not all that rare.

I could have used a pointer from the beginning when I added rare data, so I’m wondering how we will weigh this appropriately.
Comment 15 Ryosuke Niwa 2012-06-22 12:02:24 PDT
(In reply to comment #14)
> I can understand adding the pointer since this is showing up as slowness on some realistic performance tests. But in general I am not sure I understand how to make the tradeoff between memory use and speed for this data.

What we have observed from haraken's vtable pointer removal is that we adding a single pointer doesn't have any observable impact on memory usage probably because it fits within the fragment malloc allocates for each Node.

> If the data was truly rare, then clearly the speed would not matter. So the heart of this is that the data is not all that rare.

Possibly. The problem is with treeScope(), which is used in a lot of places. See https://bugs.webkit.org/show_bug.cgi?id=87034. There, morrita had tried to replace m_document with m_treeScope but caused a perf. regression.

Also, node lists are rare but they need to be fast when they're accessed.
Comment 16 Darin Adler 2012-06-22 17:30:52 PDT
(In reply to comment #15)
> What we have observed from haraken's vtable pointer removal is that we adding a single pointer doesn't have any observable impact on memory usage probably because it fits within the fragment malloc allocates for each Node.

Sorry, I don’t buy it. Eventually you cross the fragment size threshold, and then size does matter!

> > If the data was truly rare, then clearly the speed would not matter. So the heart of this is that the data is not all that rare.
> 
> Possibly. The problem is with treeScope(), which is used in a lot of places. See https://bugs.webkit.org/show_bug.cgi?id=87034. There, morrita had tried to replace m_document with m_treeScope but caused a perf. regression.

This worries me. We’re saying that the tree scope feature basically requires an additional pointer per node. I think that instead of tightening up we are letting size creep up.

This treeScope regression is indeed a bad thing, and I suggest we find another way to fix it.

> Also, node lists are rare but they need to be fast when they're accessed.

Not sure what this means in practice. I think this patch is really about the treeScope regression.
Comment 17 Adam Barth 2012-06-22 18:24:19 PDT
> If the data was truly rare, then clearly the speed would not matter.

I'm sure that's true as a general proposition.  It's entirely possible that the data is present only on a small fraction of nodes but those nodes (and that data) are accessed frequently.  In other words, it might be rare from a space point-of-view but not rare from an access frequency point-of-view.

(Note: I haven't looked into any of the details here, so the above might not be what's going on here.)
Comment 18 Ryosuke Niwa 2012-06-22 18:26:27 PDT
(In reply to comment #17)
> I'm sure that's true as a general proposition.  It's entirely possible that the data is present only on a small fraction of nodes but those nodes (and that data) are accessed frequently.  In other words, it might be rare from a space point-of-view but not rare from an access frequency point-of-view.

That's certainly the case for getElementsByTagName.
Comment 19 Hajime Morrita 2012-06-24 18:12:48 PDT
My guess is that 3% speedup on dom-modify is from treeScope()
and 7% speedup on dom-query is from nodelist  cache. 

Here is my another plan to attack treeScope() speedup.
- We could make it lazily-assigned value.
- We could early return when a node isn't inside shadow dom.

I personally believe this approach works well and can open opportunity for further optimization.
But as Darin mentioned, we need some data to see how the tradeoff goes.

Bug 78984 will help for that purpose. So let us come back here once it starts working.
Comment 20 Ryosuke Niwa 2012-06-25 10:19:24 PDT
I think we'll need to do this regardless of how we fix treeScope() regression. Given that jquery depends on our current optimization of getElementsByTagName('*'), this is the only way to mitigate the performance problem in the bug 73853.
Comment 21 Elliott Sprehn 2012-11-07 15:32:53 PST
This was resolved by the removal of the rare data map in Bug 100057
Comment 22 Ryosuke Niwa 2012-11-07 16:24:41 PST

*** This bug has been marked as a duplicate of bug 100057 ***