Bug 106726 - 20% regression on dom_perf/DomDivWalk
Summary: 20% regression on dom_perf/DomDivWalk
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P1 Normal
Assignee: Matt Falkenhagen
URL:
Keywords: PerfRegression
Depends on:
Blocks:
 
Reported: 2013-01-11 20:17 PST by Ojan Vafai
Modified: 2013-01-24 23:07 PST (History)
12 users (show)

See Also:


Attachments
Patch (4.31 KB, patch)
2013-01-22 21:17 PST, Matt Falkenhagen
no flags Details | Formatted Diff | Diff
Patch (3.25 KB, patch)
2013-01-22 22:42 PST, Matt Falkenhagen
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Comment 1 Levi Weintraub 2013-01-15 11:48:12 PST
I ran these tests locally with each of the patches reverted individually. Reverting 139402 took a full second off the runtime on my machine. I'm assigning to falken@ to investigate further.
Comment 2 Matt Falkenhagen 2013-01-16 00:00:09 PST
Not sure what's going on here. The only code from my patch that this test can hit is in Element::removedFrom. I found this code is hit over 9M times when running DOMDivWalk, so it does look suspicious. On the other hand, the change seems innocuous.

Before the patch, Element::removedFrom did:
    setIsInTopLayer(false);
which did:
    ensureElementRareData()->setIsInTopLayer(inTopLayer);
    setNeedsStyleRecalc(SyntheticStyleChange);

After patch, Element::removedFrom does:
    document()->removeFromTopLayer(this);
which does:
    if (!element->isInTopLayer())
        return;
And isInTopLayer is just:
    return hasRareData() && elementRareData()->isInTopLayer();

I don't have luck reproducing the regression locally. The test runs are about 67 ms, with or without my patch.

Any ideas how to proceed?
Comment 4 Matt Falkenhagen 2013-01-16 00:47:25 PST
(In reply to comment #3)
> Also note that I don't see the regression on http://webkit-perf.appspot.com/graph.html#tests=[[6107,2001,3001],[6107,2001,2389050],[6107,2001,173262]]&sel=1357718371985,1358323171985&displayrange=7&datatype=running

It looks like that's for DOMWalk, not DOMDivWalk. I don't see the regression in the link below either, but somehow there's no data for Chromium Linux earlier than today.

http://webkit-perf.appspot.com/graph.html#tests=[[6104,2001,173262],[6104,2001,2389050],[6104,2001,3001]]&sel=1357720675021,1358325475021&displayrange=7&datatype=running
Comment 5 Ojan Vafai 2013-01-16 09:54:52 PST
Nothing obvious there. Maybe something is getting inlined differently? Couple shot-in-the-dark things you could try:
-Move the element->isInTopLayer() check to removedFrom
-move isInTopLayer implementation to the header to get it to inline
Comment 6 Matt Falkenhagen 2013-01-16 17:25:24 PST
(In reply to comment #5)
> Nothing obvious there. Maybe something is getting inlined differently? Couple shot-in-the-dark things you could try:
> -Move the element->isInTopLayer() check to removedFrom
> -move isInTopLayer implementation to the header to get it to inline

I'll give it a try. Is there a way to submit a try job to the perf bot? It's not reproducing locally (maybe my machine is too fast). Or should I just commit and see what happens?
Comment 7 Matt Falkenhagen 2013-01-17 18:09:42 PST
Rolled out the patch: http://trac.webkit.org/changeset/140080
Comment 8 Ojan Vafai 2013-01-18 09:08:55 PST
Looks like 139402 wasn't the cause of the regression. It's been rolled out and there's no improvement. Matt, r=me on relanding that. This is why it's usually best to rollout first and ask questions later with perf regressions. Saves everyone work. :)

I wonder if it's the FrameSelection change in http://trac.webkit.org/changeset/139401/. Shinya-san, can you try a rollout to confirm? If it's not the culprit r=me on relanding it.
Comment 9 Ryosuke Niwa 2013-01-18 10:51:52 PST
Please try to run the latest locally before rolling the patch out. All these rollouts are causing svn churn.
Comment 10 Ojan Vafai 2013-01-18 10:53:50 PST
Comment 1 already tried reproducing locally and clearly the test variance is too high to get a reliable result locally. I agree that the commit churn isn't great, but until someone builds better infrastructure for trying perf changes on bots, I don't see an alternative.
Comment 11 Ryosuke Niwa 2013-01-21 05:04:56 PST
(In reply to comment #10)
> Comment 1 already tried reproducing locally and clearly the test variance is too high to get a reliable result locally. I agree that the commit churn isn't great, but until someone builds better infrastructure for trying perf changes on bots, I don't see an alternative.

If the variance is the issue, you should be able to run the test 20, 100, or even 1000 times and use the mean of those results. It should reflect the said regression if it's really reproducible.
Comment 12 Matt Falkenhagen 2013-01-22 01:24:13 PST
It looks like my patch is indeed the culprit. A short while after the rollout the perf indeed went back up and immediately after relanding it dropped back down.

Sorry for the SVN churn but I think I have to rollout again.

I don't think variance is the issue, the regression just doesn't occur on my machine. I had the test run 200 iterations instead of the normal 20 and the results of 'run-perf-tests ./PerformanceTests/DOM/DOMDivWalk.html' were:

(without my patch)
RESULT DOM: DOMDivWalk= 67.3472753909 ms
median= 67.2489583376 ms, stdev= 0.787590427719 ms, min= 64.9623333205 ms, max= 70.2458332914 ms
(with my patch)
RESULT DOM: DOMDivWalk= 68.0555829093 ms
median= 67.6872915453 ms, stdev= 1.23173675198 ms, min= 66.4260832709 ms, max= 73.4682503195 ms

It looks like we may not have a way to do a try job on the perf bot.

I'm curious why the regression doesn't occur on http://webkit-perf.appspot.com. What does it do differently from http://chromium-perf.appspot.com?
I also opened DOMDivWalk in Chrome itself but didn't seem to get meaningful results (my patch was faster than without).
Comment 13 Matt Falkenhagen 2013-01-22 03:16:05 PST
I think moving the top layer flag from ElementRareData to NodeFlags has a chance of fixing the regression, and seems to be a good idea anyway. I want to try to land the patch at bug 107542 and see if it fixes it.
Comment 14 Ryosuke Niwa 2013-01-22 10:21:28 PST
(In reply to comment #12)
> (without my patch)
> RESULT DOM: DOMDivWalk= 67.3472753909 ms
> median= 67.2489583376 ms, stdev= 0.787590427719 ms, min= 64.9623333205 ms, max= 70.2458332914 ms
> (with my patch)
> RESULT DOM: DOMDivWalk= 68.0555829093 ms
> median= 67.6872915453 ms, stdev= 1.23173675198 ms, min= 66.4260832709 ms, max= 73.4682503195 ms

There is a slight difference between the two: 67.35 vs. 68.06. I bet if you took 2000 or 5000 samples instead of a mere 200, you'll see more clear deviation between the two.
Comment 15 Matt Falkenhagen 2013-01-22 21:17:22 PST
Created attachment 184130 [details]
Patch
Comment 16 Ryosuke Niwa 2013-01-22 21:22:17 PST
Comment on attachment 184130 [details]
Patch

Is this patch speculative? If it is, I don't see how it could fix the perf. regression.
Comment 17 Matt Falkenhagen 2013-01-22 21:25:18 PST
(mid-air collision)
The NodeFlags patch did not help. I have another idea I'd like to try. I realized there is already a check in Element::removedFrom for Fullscreen. I'll try to absorb the perf hit of the check for top layer by moving the Fullscreen and top layer checks into a slow path.

This code is somewhat ugly but it should be fixed as the plan is for Fullscreen and top layer to merge (bug 107617).

I get somewhat promising results:
(with slow path patch)
RESULT DOM: DOMDivWalk= 63.8586679573 ms
median= 63.7347082957 ms, stdev= 0.864724536393 ms, min= 62.3731666322 ms, max= 70.3177498071 ms
(with current build)
RESULT DOM: DOMDivWalk= 64.9818412457 ms
median= 64.8267084325 ms, stdev= 1.28962335807 ms, min= 62.4124999352 ms, max= 70.5802498269 ms

I ran the first test for 2000 iterations but the second one only 200 (I accidentally ran 2000 on the wrong build...).

If this also fails, I'll rollout all three patches and start over.
Comment 18 Hajime Morrita 2013-01-22 21:34:32 PST
Comment on attachment 184130 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=184130&action=review

> Source/WebCore/dom/Node.cpp:-2623
> -#if ENABLE(DIALOG_ELEMENT)

we don't need this. Instead of that....

> Source/WebCore/dom/Node.h:694
> +    void setIsInTopLayer(bool);

The disabled case could be inlined here.
I think guarding whole definition(s) would be cleaner than having the ifdef in the function body.
Comment 19 Matt Falkenhagen 2013-01-22 22:42:01 PST
Created attachment 184145 [details]
Patch
Comment 20 WebKit Review Bot 2013-01-22 23:32:59 PST
Comment on attachment 184145 [details]
Patch

Clearing flags on attachment: 184145

Committed r140512: <http://trac.webkit.org/changeset/140512>
Comment 21 WebKit Review Bot 2013-01-22 23:33:03 PST
All reviewed patches have been landed.  Closing bug.
Comment 22 Matt Falkenhagen 2013-01-23 07:21:08 PST
r140512 did not fix the regression. I'll roll out the three patches (r140307, r140411, r140512).

rniwa had a clue that run-perf-tests and webkit-perf.appspot.com use a different test runner than dom_perf, which may explain why I couldn't reproduce locally and why the regression doesn't show on webkit-perf.appspot.com. Particularly, the webkit runner certainly doesn't include element removal in the benchmark time, but dom_perf might. So I have some room to investigate more before attempting to land something again.
Comment 23 Elliott Sprehn 2013-01-23 09:26:45 PST
(In reply to comment #22)
> r140512 did not fix the regression. I'll roll out the three patches (r140307, r140411, r140512).
> 
> rniwa had a clue that run-perf-tests and webkit-perf.appspot.com use a different test runner than dom_perf, which may explain why I couldn't reproduce locally and why the regression doesn't show on webkit-perf.appspot.com. Particularly, the webkit runner certainly doesn't include element removal in the benchmark time, but dom_perf might. So I have some room to investigate more before attempting to land something again.

The patches here don't make sense, doing hasRareData() || isInTopLayer() is only avoiding a single function call, and if that was the "slow part" then my patch that checks pseudoElement for BEFORE and AFTER should have caused a 40% regression: https://trac.webkit.org/changeset/140452

I don't think we're focusing on the right thing, certainly not with trying to avoid this one place of method call overhead (that's definitely not 20% of the time spent in DOMDivWalk). If anything the original patch should have made things _much_ faster since we were inadvertently allocating rare data for every element that was removed from the DOM because Element::removedFrom did setIsInTopLayer(false) which did ensureRareData() which was super bad for memory usage.

At the very least setIsInTopLayer needs to check if (!hasRareData() && !inTopLayer) return; to avoid allocating rare data for every node that gets removed...
Comment 24 Elliott Sprehn 2013-01-23 09:40:40 PST
(In reply to comment #23)
> (In reply to comment #22)
> ...
>

Can we please roll out this patch? There's no reason for this if guard, as there's no reason that isInTopLayer() should be slower than pseudoElement(BEFORE) or AFTER right above this.
Comment 25 Matt Falkenhagen 2013-01-23 13:50:06 PST
Rolled out: http://trac.webkit.org/changeset/140546

Now DOMDivWalk looks better again.
Comment 26 Hajime Morrita 2013-01-23 16:37:31 PST
(In reply to comment #25)
> Rolled out: http://trac.webkit.org/changeset/140546
> 
> Now DOMDivWalk looks better again.
Sigh. We should definitely look how our internal test runner work.
I guess we can run it on DRT instead of full Chromium with small modifications.
Then running it will have less pain.
Comment 27 Matt Falkenhagen 2013-01-24 01:11:16 PST
On IRC, Elliott had a very promising theory about what is happening. A much earlier patch (r136575) inadvertently changed Element::removedFrom to always allocate rare data, by calling setIsInTopLayer. Element::removedFrom is not called during the DOMDivWalk benchmark itself, just in the setup/teardown between iterations. This means every element gets rare data for free, and perf hits that would normally occur by allocating rare data vanish from the benchmark, resulting in a higher score.

Since my patch changed removedFrom to not allocate rare data unless needed, the score decreases.

The graph supports this theory:
http://chromium-perf.appspot.com/?tab=linux-release-webkit-latest&graph=DOMDivWalk&trace=score&rev=179000&history=700&master=ChromiumWebkit&testSuite=dom_perf&details=true

- r136575: always allocate rare data in Element::removedFrom
- r139402: first landing of my patch
- r140080: rollout
- r140307: reland
- r140546: rollout
- r140638: fix for always allocating rare data

So if this is the case, we should see a perf hit for 140638, which so far is happening.

I should have realized this too because my code before r136575 was performing the isInTopLayer() check on each Element::removedFrom call, and evidently no perf regression occurred.
Comment 28 Ojan Vafai 2013-01-24 19:09:17 PST
Seems reasonable to me. It doesn't seem to me like there's more to be had from digging deeper here.
Comment 29 Matt Falkenhagen 2013-01-24 20:21:47 PST
Just for more evidence, I slowed down rare data allocation with a sleep and tested with and without r140638:

(always allocating rare data)
RESULT DOM: DOMDivWalk= 3293.468275 ms
(not allocating rare data)
RESULT DOM: DOMDivWalk= 1865.209725 ms

The benchmark uses childNodes, which uses rare data.

I'll reland the patch.
Comment 30 Matt Falkenhagen 2013-01-24 20:24:02 PST
(In reply to comment #29)
> (always allocating rare data)
> RESULT DOM: DOMDivWalk= 3293.468275 ms
> (not allocating rare data)
> RESULT DOM: DOMDivWalk= 1865.209725 ms

Oops, I reversed those. Always allocating rare data is faster than not.
Comment 31 Ryosuke Niwa 2013-01-24 22:58:33 PST
(In reply to comment #30)
> (In reply to comment #29)
> > (always allocating rare data)
> > RESULT DOM: DOMDivWalk= 3293.468275 ms
> > (not allocating rare data)
> > RESULT DOM: DOMDivWalk= 1865.209725 ms
> 
> Always allocating rare data is faster than not.

That's not really true. We're just shifting the runtime cost to where it's not measured. We're paying the same runtime cost at the end of the day.
Comment 32 Matt Falkenhagen 2013-01-24 23:07:32 PST
(In reply to comment #31)
> That's not really true. We're just shifting the runtime cost to where it's not measured. We're paying the same runtime cost at the end of the day.

Yes that was unclear wording on my part... I meant the always allocated case has the better/faster benchmark result.