Bug 100057

Summary: Replace NodeRareData hash map with a union on m_renderer
Product: WebKit Reporter: Elliott Sprehn <esprehn>
Component: New BugsAssignee: Elliott Sprehn <esprehn>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, cmarcelo, darin, dbates, dglazkov, esprehn, gns, haraken, jchaffraix, kling, koivisto, mjs, morrita, ojan, philn, rniwa, slewis, webkit.review.bot, xan.lopez
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 101277    
Bug Blocks: 73853, 87034, 89635    
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Patch for landing
none
Patch for landing
none
Patch for landing
none
Patch none

Description Elliott Sprehn 2012-10-22 17:00:41 PDT
Replace NodeRareData hash map with a union on m_renderer
Comment 1 Elliott Sprehn 2012-10-22 17:16:58 PDT
Created attachment 170032 [details]
Patch
Comment 2 Eric Seidel (no email) 2012-10-22 17:34:05 PDT
Comment on attachment 170032 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=170032&action=review

I think this is interesting.  It's unclear to me how likely this is to cause perf changes (good or bad).

> Source/WebCore/dom/Node.h:814
> +    union DataUnion {

You should probably add a comment, explaining why, etc.
Comment 3 Build Bot 2012-10-22 17:44:02 PDT
Comment on attachment 170032 [details]
Patch

Attachment 170032 [details] did not pass win-ews (win):
Output: http://queues.webkit.org/results/14488634
Comment 4 Build Bot 2012-10-22 17:59:57 PDT
Comment on attachment 170032 [details]
Patch

Attachment 170032 [details] did not pass mac-ews (mac):
Output: http://queues.webkit.org/results/14482747
Comment 5 Darin Adler 2012-10-22 18:41:20 PDT
Comment on attachment 170032 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=170032&action=review

Looks like a good change, but requires some additional performance testing and tuning, I suspect.

Mac build failed because what was an inline function is now non-inline. You’ll have to update the export file because of that.

Not sure why the GTK and Windows builds failed.

> Source/WebCore/ChangeLog:15
> +        Use a union on Node::m_renderer between NodeRareData* and RenderObject*. This removes
> +        the overhead of accessing rare data and the memory from the map. We now get the 5%
> +        performance increase observed in Bug 90059 but when accessing node lists on any node.
> +        This should be a perf win on node-list-access.html of a similar percentage.
> +
> +        The extra pointer and conditional does not regress performance when accessing node
> +        lists through methods like getElementsByClassName on the document. I also removed
> +        multiple accesses down hot code paths like recalcStyle.

How did you measure performance outside of those DOM benchmarks? Saying “I removed multiple accesses down hot code paths” sounds like optimization by code inspection, but we need to actually test whether this measurably slows performance in code that was heavily accessing the renderer. Did you do some kind of testing?

> Source/WebCore/dom/Node.cpp:483
> +    data->setRenderer(renderer());

Really should be:

    data->setRenderer(m_data.m_renderer);

No reason to check HasRareDataFlag an extra time here, and the more verbose version also has the advantage of being slightly clearer about what’s going on.

> Source/WebCore/dom/Node.cpp:502
> +    RenderObject* renderer = this->renderer();

Really should be:

    RenderObject* renderer = m_data.m_rareData->renderer();

No reason to check HasRareDataFlag an extra time here, and the more verbose version also has the advantage of being slightly clearer about what’s going on.

> Source/WebCore/dom/Node.cpp:511
> +RenderObject* Node::renderer() const
> +{
> +    return hasRareData() ? m_data.m_rareData->renderer() : m_data.m_renderer;
> +}

Why isn’t this inlined? It’s really OK to have function overhead every time this is called? I’d expect that we’d at least put the !hasRareData() case in the header and inline it.

> Source/WebCore/dom/Node.cpp:1427
> +    RenderObject* renderer = this->renderer();
> +    if (renderer)
> +        renderer->setAnimatableStyle(s);

Our usual style would be to define the variable inside the if statement.

> Source/WebCore/dom/Node.cpp:2842
> +    RenderObject* renderer = this->renderer();
> +    if (renderer)
> +        info.addMember(renderer->style());

Our usual style would be to define the variable inside the if statement.

> Source/WebCore/dom/NodeRareData.h:183
> +        : m_renderer(0)

A shame to set m_renderer here since it will always get set again as soon as the createRareData function returns.

> Source/WebCore/dom/NodeRenderStyle.h:39
> -    if (m_renderer) 
> -        return m_renderer->style();
> +    if (renderer())
> +        return renderer()->style();

Strange that you changed this, but did not put the renderer into a local variable.
Comment 6 Maciej Stachowiak 2012-10-22 22:04:20 PDT
I'd like to see some data about the performance effects of this change. Specifically:

- What tests does it speed up and how much? Actual data, please, not just guesses.

- Tests that cover rendering (including page loading tests, dom-driven animation tests, and anything else that might be rendering heavy).

Reason for the latter is that this patch makes access to a node's renderer take a branch always, and dereference an extra pointer in the case where the node has rare data, so prima facie it seems like there is a risk of regressing anything that is hot in renderer access.


Just as for behavior-affecting changes it is the patch submitter's job to provide evidence that the patch is correct, with performance-affecting patches it is the patch submitter's job to provide evidence that the patch improves performance as intended and does not regress things that it might be feared to regress.

So I hope it is reasonable to ask for this perf test data up front.
Comment 7 Elliott Sprehn 2012-10-23 12:59:06 PDT
Comment on attachment 170032 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=170032&action=review

>> Source/WebCore/ChangeLog:15
>> +        multiple accesses down hot code paths like recalcStyle.
> 
> How did you measure performance outside of those DOM benchmarks? Saying “I removed multiple accesses down hot code paths” sounds like optimization by code inspection, but we need to actually test whether this measurably slows performance in code that was heavily accessing the renderer. Did you do some kind of testing?

I'll remove these for now and benchmark more.

>> Source/WebCore/dom/Node.cpp:511
>> +}
> 
> Why isn’t this inlined? It’s really OK to have function overhead every time this is called? I’d expect that we’d at least put the !hasRareData() case in the header and inline it.

There's header cycles trying to make them inline and knowing the internal layout of a NodeRareData. I'll make the fast path inline.
Comment 8 Elliott Sprehn 2012-10-23 13:25:04 PDT
Created attachment 170218 [details]
Patch
Comment 9 Elliott Sprehn 2012-10-23 13:26:55 PDT
(In reply to comment #8)
> Created an attachment (id=170218) [details]
> Patch

I used a base class for NodeRareData that lets me inline the whole impl of setRenderer() and renderer(). The cast in rareData() is a little gross, but this saves you from all the exports and adding the method call.

I'll run the bencharks now :)
Comment 10 Adam Barth 2012-10-25 23:42:44 PDT
Comment on attachment 170218 [details]
Patch

Marking r- to remove from EWS queue.
Comment 11 Elliott Sprehn 2012-11-01 16:02:18 PDT
Created attachment 171948 [details]
Patch

Did performance testing and removed original speculative optimizations in favor of ones that actually seem needed. There's no observable slowdown now, and there's a 15% improvement on Parser/textarea-parsing.html in addition to the 5% improvement for node lists (and treeScope)
Comment 12 Eric Seidel (no email) 2012-11-01 16:07:45 PDT
Comment on attachment 171948 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=171948&action=review

This LGTM, but I expect we're going to have some minor perf fallout from this.  Code has long assumed that renderer() was basically free, and now it's a branch.

> Source/WebCore/dom/Node.cpp:503
> +    delete m_data.m_rareData;

I assume manual delete is needed because of our use of a union?

> Source/WebCore/dom/Node.h:835
> +    // When a node has rare data we move the renderer into the rare data.

You might also mention somnewhere in this header that this makes renderer() slightly more expensive than before.  Previously renderer() was assumed free. :)
Comment 13 Elliott Sprehn 2012-11-01 17:33:01 PDT
(In reply to comment #12)
> (From update of attachment 171948 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=171948&action=review
> 
> This LGTM, but I expect we're going to have some minor perf fallout from this.  Code has long assumed that renderer() was basically free, and now it's a branch.

Yeah I'll keep an eye on it.

> 
> > Source/WebCore/dom/Node.cpp:503
> > +    delete m_data.m_rareData;
> 
> I assume manual delete is needed because of our use of a union?

Manual delete was always needed because we don't really use the OwnPtr from createRareData, we instead call leakPtr() and just manage it ourselves. You tried to fix this once: https://bugs.webkit.org/show_bug.cgi?id=17199

With the union approach we definitely can't use an OwnPtr, so I was going to just make createRareData return a bare ptr in a future patch.
Comment 14 Build Bot 2012-11-01 17:46:35 PDT
Comment on attachment 171948 [details]
Patch

Attachment 171948 [details] did not pass mac-ews (mac):
Output: http://queues.webkit.org/results/14657091

New failing tests:
svg/W3C-SVG-1.1/text-align-04-b.svg
svg/custom/tref-remove-target-crash.svg
svg/custom/text-tref-03-b-referenced-element-removal.svg
svg/custom/text-tref-03-b-tref-removal.svg
svg/custom/tref-own-content-removal.svg
svg/batik/text/textProperties.svg
svg/custom/text-linking.svg
svg/W3C-SVG-1.1-SE/styling-pres-02-f.svg
svg/custom/tref-update.svg
svg/foreignObject/text-tref-02-b.svg
svg/custom/tref-nested-events-crash.svg
svg/text/text-align-04-b.svg
svg/custom/text-tref-03-b-change-href.svg
svg/custom/text-tref-03-b-change-href-dom.svg
svg/W3C-SVG-1.1-SE/text-tref-03-b.svg
Comment 15 Elliott Sprehn 2012-11-01 17:56:36 PDT
Created attachment 171961 [details]
Patch for landing
Comment 16 Elliott Sprehn 2012-11-01 18:00:53 PDT
(In reply to comment #14)
> (From update of attachment 171948 [details])
> Attachment 171948 [details] did not pass mac-ews (mac):
> Output: http://queues.webkit.org/results/14657091
> 
> New failing tests:
> svg/W3C-SVG-1.1/text-align-04-b.svg
> svg/custom/tref-remove-target-crash.svg
> svg/custom/text-tref-03-b-referenced-element-removal.svg
> svg/custom/text-tref-03-b-tref-removal.svg
> svg/custom/tref-own-content-removal.svg
> svg/batik/text/textProperties.svg
> svg/custom/text-linking.svg
> svg/W3C-SVG-1.1-SE/styling-pres-02-f.svg
> svg/custom/tref-update.svg
> svg/foreignObject/text-tref-02-b.svg
> svg/custom/tref-nested-events-crash.svg
> svg/text/text-align-04-b.svg
> svg/custom/text-tref-03-b-change-href.svg
> svg/custom/text-tref-03-b-change-href-dom.svg
> svg/W3C-SVG-1.1-SE/text-tref-03-b.svg

I fixed the crashes. I had assumed in Text::recalcTextStyle that if you have a renderer your parent does too, but this is not the case if your parent is a ShadowRoot.

Incidentally the hack in SVGShadowText::willRecalcTextStyle is concerning, I don't think we handle Text styles properly in shadows for things outside SVG right now. I can't figure out where we ever call setStyle() on a Text if it's an immediate child of a ShadowRoot.
Comment 17 Maciej Stachowiak 2012-11-01 23:43:07 PDT
(In reply to comment #11)
> Created an attachment (id=171948) [details]
> Patch
> 
> Did performance testing and removed original speculative optimizations in favor of ones that actually seem needed. There's no observable slowdown now, and there's a 15% improvement on Parser/textarea-parsing.html in addition to the 5% improvement for node lists (and treeScope)

Could you please give some more detail on what tests you ran and what the actual raw results were (as per request in comment #6)?

Did you run any benchmarks that test page loading speed?
Comment 18 Elliott Sprehn 2012-11-02 09:42:43 PDT
(In reply to comment #17)
> (In reply to comment #11)
> > Created an attachment (id=171948) [details] [details]
> > Patch
> > 
> > Did performance testing and removed original speculative optimizations in favor of ones that actually seem needed. There's no observable slowdown now, and there's a 15% improvement on Parser/textarea-parsing.html in addition to the 5% improvement for node lists (and treeScope)
> 
> Could you please give some more detail on what tests you ran and what the actual raw results were (as per request in comment #6)?
> 

I used run-perf-tests on a Mac Pro with yes > /dev/null in a background terminal to ensure no CPU throttling (I tested many times, and brining one core to a constant 100% CPU made all run-perf-tests results have less run-to-run delta).

Results from run-perf-tests (exactly copied):

Parser/html5-full-render    ms  
Parser/textarea-parsing runs/s  

Baseline (with new RareData patch)

4590.78 ± 0.55% 
55.42   ± 0.29%

4596.38 ± 0.23%     
57.45   ± 0.19%

4565.54 ± 0.43% 3.66% Better    
56.00   ± 0.11% 1.04% Better    

r133226 by comparison:

4584.45 ± 0.53%     
48.22   ± 0.27% 13.00% Worse    

4592.59 ± 0.85%     
48.03   ± 0.62% 13.34% Worse

4531.03 ± 0.59% 1.30% Better    
46.98   ± 0.24% 15.22% Worse 

I also ran the Bindings/* tests to ensure there were no changes and there weren't. The existing tests for getElementsByTagName and other node list getters showed no regression meaning this patch is indeed the same +5% Better as observed in 90059. Unfortunately the Bindings/get-elements-by* benchmarks only test document.* versions so this doesn't show that by not regressing performance on those tests with this patch I've also generalized the performance win.

> Did you run any benchmarks that test page loading speed?

I used html5-full-render as a micro benchark for that. It loads the HTML5 spec which is huge and hammers on recalcStyle, setRenderer and other page loading centric things. That's how I decided to change Text::recalcTextStyle. This was in response to Darin's comment on figuring out which places we should really care about accessing renderer() being slower. More detail of this optimization step is in the ChangeLog.

Once this lands the Chromium bots can give us a clearer picture of page loading speed. As this patch is quite small it would be easy to just roll it out if there's a major issue just as was done the last time someone attempted to move m_document into the rare data.
Comment 19 Elliott Sprehn 2012-11-02 14:04:59 PDT
(In reply to comment #18)
> (In reply to comment #17)
> > (In reply to comment #11)
> > > Created an attachment (id=171948) [details] [details] [details]
> > > Patch
> > > 
> > > Did performance testing and removed original speculative optimizations in favor of ones that actually seem needed. There's no observable slowdown now, and there's a 15% improvement on Parser/textarea-parsing.html in addition to the 5% improvement for node lists (and treeScope)
> > 
> > Could you please give some more detail on what tests you ran and what the actual raw results were (as per request in comment #6)?
> > 
> 
> I used run-perf-tests on a Mac Pro with yes > /dev/null in a background terminal to ensure no CPU throttling (I tested many times, and brining one core to a constant 100% CPU made all run-perf-tests results have less run-to-run delta).
> 
> Results from run-perf-tests (exactly copied):
> 
> Parser/html5-full-render    ms  
> Parser/textarea-parsing runs/s  
> 
> Baseline (with new RareData patch)
> 
> 4590.78 ± 0.55% 
> 55.42   ± 0.29%
> 
> 4596.38 ± 0.23%     
> 57.45   ± 0.19%
> 
> 4565.54 ± 0.43% 3.66% Better    
> 56.00   ± 0.11% 1.04% Better    
> 
> r133226 by comparison:
> 
> 4584.45 ± 0.53%     
> 48.22   ± 0.27% 13.00% Worse    
> 
> 4592.59 ± 0.85%     
> 48.03   ± 0.62% 13.34% Worse
> 
> 4531.03 ± 0.59% 1.30% Better    
> 46.98   ± 0.24% 15.22% Worse 
> 

Eric wanted me to clarify these numbers. The numbers show that for the Parser/html5-full-render.html test the patch has effectively no effect. The last one that's 1.3% better without my patch is just normal variation as the two previous runs don't show enough difference for run-perf-tests to even show the delta. Before I tuned the Text::recalcTextStyle method I saw consistent "2% Better" without my patch on html5-full-render which is why I made changes to that method to remove the overhead.

The second test for Parser/textarea-parsing.html shows a consistent improvement of 13-15% with my patch.

The other test I ran was Bindings/get-elements-by-tag-name.html which originally showed no difference on the first iteration of this patch, but I just reran with the current iteration of the patch and it does actually show some improvement:

run-perf-tests --release --chromium Bindings/get-elements-by-tag-name.html

Baseline of r133226:
224.28	± 1.49%	2.52% Worse
227.44	± 1.91%
230.08	± 0.48%

With my patch:
249.95	± 1.27%	8.64% Better
249.27	± 1.49%	8.34% Better
243.02	± 2.21%	5.63% Better

Which makes it look like removing the branch from inside Node::rareData() and the special casing for document rare data improved that benchmark by +5% as well.

So in terms of these benchmarks this looks like all win, but of course the change that was needed in Text::recalcTextStyle highlights that there's possibly minor regressions in places where we call renderer() repeatedly. In a follow up patch we can fix those to cache the renderer in a local variable if needed just like I did in Text::recalcTextStyle and renderStyle().
Comment 20 Eric Seidel (no email) 2012-11-02 14:07:01 PDT
Comment on attachment 171961 [details]
Patch for landing

Thank you for clarifying.  As we discussed, it's likely there may be some minor regressions due to repeated renderer() calls which can easily be avoided by using a local to store the renderer() result.
Comment 21 WebKit Review Bot 2012-11-02 14:09:22 PDT
Comment on attachment 171961 [details]
Patch for landing

Rejecting attachment 171961 [details] from commit-queue.

Failed to run "['/mnt/git/webkit-commit-queue/Tools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '-..." exit_code: 2

Last 500 characters of output:
atching file Source/WebCore/dom/Node.h
Hunk #2 succeeded at 510 (offset -4 lines).
Hunk #3 succeeded at 827 with fuzz 2 (offset -6 lines).
patching file Source/WebCore/dom/NodeRareData.h
Hunk #1 succeeded at 178 (offset 2 lines).
Hunk #2 succeeded at 200 (offset 2 lines).
patching file Source/WebCore/dom/NodeRenderStyle.h
patching file Source/WebCore/dom/Text.cpp

Failed to run "[u'/mnt/git/webkit-commit-queue/Tools/Scripts/svn-apply', u'--force']" exit_code: 1 cwd: /mnt/git/webkit-commit-queue

Full output: http://queues.webkit.org/results/14670960
Comment 22 Elliott Sprehn 2012-11-02 14:16:59 PDT
Created attachment 172133 [details]
Patch for landing
Comment 23 Eric Seidel (no email) 2012-11-02 14:23:38 PDT
Comment on attachment 172133 [details]
Patch for landing

Engage!
Comment 24 Elliott Sprehn 2012-11-02 14:26:40 PDT
Created attachment 172136 [details]
Patch for landing
Comment 25 Elliott Sprehn 2012-11-02 14:27:46 PDT
(In reply to comment #23)
> (From update of attachment 172133 [details])
> Engage!

Sorry, want to cq+ one more time? I fixed the changelog to have more details from the perf tests I just ran since it turns out this is an unexpected 8% win :)
Comment 26 Eric Seidel (no email) 2012-11-02 14:28:32 PDT
Comment on attachment 172136 [details]
Patch for landing

Re-engage!
Comment 27 Early Warning System Bot 2012-11-02 14:38:31 PDT
Comment on attachment 172136 [details]
Patch for landing

Attachment 172136 [details] did not pass qt-ews (qt):
Output: http://queues.webkit.org/results/14670967
Comment 28 Early Warning System Bot 2012-11-02 14:39:02 PDT
Comment on attachment 172136 [details]
Patch for landing

Attachment 172136 [details] did not pass qt-wk2-ews (qt):
Output: http://queues.webkit.org/results/14718016
Comment 29 Elliott Sprehn 2012-11-02 14:56:38 PDT
(In reply to comment #28)
> (From update of attachment 172136 [details])
> Attachment 172136 [details] did not pass qt-wk2-ews (qt):
> Output: http://queues.webkit.org/results/14718016

Woops, it looks like new code went in this morning to track the memory usage of the map itself:

https://trac.webkit.org/changeset/133298

which means that removing the map in this patch also reduces the memory usage on nytimes.com by 250k.
Comment 30 WebKit Review Bot 2012-11-02 14:58:22 PDT
Comment on attachment 172136 [details]
Patch for landing

Attachment 172136 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/14687988
Comment 31 Build Bot 2012-11-02 15:17:56 PDT
Comment on attachment 172136 [details]
Patch for landing

Attachment 172136 [details] did not pass mac-ews (mac):
Output: http://queues.webkit.org/results/14677940
Comment 32 Build Bot 2012-11-02 15:18:45 PDT
Comment on attachment 172136 [details]
Patch for landing

Attachment 172136 [details] did not pass win-ews (win):
Output: http://queues.webkit.org/results/14713101
Comment 33 Elliott Sprehn 2012-11-02 15:23:39 PDT
Created attachment 172156 [details]
Patch

Fix bug and update changelog to note the memory savings
Comment 34 Eric Seidel (no email) 2012-11-02 16:15:14 PDT
Comment on attachment 172156 [details]
Patch

By our powers combined!
Comment 35 WebKit Review Bot 2012-11-02 16:40:07 PDT
Comment on attachment 172156 [details]
Patch

Clearing flags on attachment: 172156

Committed r133372: <http://trac.webkit.org/changeset/133372>
Comment 36 WebKit Review Bot 2012-11-02 16:40:14 PDT
All reviewed patches have been landed.  Closing bug.
Comment 37 Elliott Sprehn 2012-11-02 16:47:36 PDT
(In reply to comment #36)
> All reviewed patches have been landed.  Closing bug.

By the power of Grayskull!
Comment 38 Maciej Stachowiak 2012-11-04 07:42:12 PST
Thanks for the performance data! I've asked Apple folks to run this through our internal page load speed benchmark as well.
Comment 39 Elliott Sprehn 2012-11-07 16:08:43 PST
(In reply to comment #38)
> Thanks for the performance data! I've asked Apple folks to run this through our internal page load speed benchmark as well.

The Chromium page cyclers showed no regression so the change looks good.
Comment 40 Ryosuke Niwa 2012-11-07 16:24:41 PST
*** Bug 89635 has been marked as a duplicate of this bug. ***
Comment 41 Stephanie Lewis 2012-11-07 16:37:58 PST
Ran it through our performance tests and there were no regressions.