RESOLVED FIXED 106726
20% regression on dom_perf/DomDivWalk
https://bugs.webkit.org/show_bug.cgi?id=106726
Summary 20% regression on dom_perf/DomDivWalk
Attachments
Patch (4.31 KB, patch)
2013-01-22 21:17 PST, Matt Falkenhagen
no flags
Patch (3.25 KB, patch)
2013-01-22 22:42 PST, Matt Falkenhagen
no flags
Levi Weintraub
Comment 1 2013-01-15 11:48:12 PST
I ran these tests locally with each of the patches reverted individually. Reverting 139402 took a full second off the runtime on my machine. I'm assigning to falken@ to investigate further.
Matt Falkenhagen
Comment 2 2013-01-16 00:00:09 PST
Not sure what's going on here. The only code from my patch that this test can hit is in Element::removedFrom. I found this code is hit over 9M times when running DOMDivWalk, so it does look suspicious. On the other hand, the change seems innocuous. Before the patch, Element::removedFrom did: setIsInTopLayer(false); which did: ensureElementRareData()->setIsInTopLayer(inTopLayer); setNeedsStyleRecalc(SyntheticStyleChange); After patch, Element::removedFrom does: document()->removeFromTopLayer(this); which does: if (!element->isInTopLayer()) return; And isInTopLayer is just: return hasRareData() && elementRareData()->isInTopLayer(); I don't have luck reproducing the regression locally. The test runs are about 67 ms, with or without my patch. Any ideas how to proceed?
Matt Falkenhagen
Comment 4 2013-01-16 00:47:25 PST
(In reply to comment #3) > Also note that I don't see the regression on http://webkit-perf.appspot.com/graph.html#tests=[[6107,2001,3001],[6107,2001,2389050],[6107,2001,173262]]&sel=1357718371985,1358323171985&displayrange=7&datatype=running It looks like that's for DOMWalk, not DOMDivWalk. I don't see the regression in the link below either, but somehow there's no data for Chromium Linux earlier than today. http://webkit-perf.appspot.com/graph.html#tests=[[6104,2001,173262],[6104,2001,2389050],[6104,2001,3001]]&sel=1357720675021,1358325475021&displayrange=7&datatype=running
Ojan Vafai
Comment 5 2013-01-16 09:54:52 PST
Nothing obvious there. Maybe something is getting inlined differently? Couple shot-in-the-dark things you could try: -Move the element->isInTopLayer() check to removedFrom -move isInTopLayer implementation to the header to get it to inline
Matt Falkenhagen
Comment 6 2013-01-16 17:25:24 PST
(In reply to comment #5) > Nothing obvious there. Maybe something is getting inlined differently? Couple shot-in-the-dark things you could try: > -Move the element->isInTopLayer() check to removedFrom > -move isInTopLayer implementation to the header to get it to inline I'll give it a try. Is there a way to submit a try job to the perf bot? It's not reproducing locally (maybe my machine is too fast). Or should I just commit and see what happens?
Matt Falkenhagen
Comment 7 2013-01-17 18:09:42 PST
Ojan Vafai
Comment 8 2013-01-18 09:08:55 PST
Looks like 139402 wasn't the cause of the regression. It's been rolled out and there's no improvement. Matt, r=me on relanding that. This is why it's usually best to rollout first and ask questions later with perf regressions. Saves everyone work. :) I wonder if it's the FrameSelection change in http://trac.webkit.org/changeset/139401/. Shinya-san, can you try a rollout to confirm? If it's not the culprit r=me on relanding it.
Ryosuke Niwa
Comment 9 2013-01-18 10:51:52 PST
Please try to run the latest locally before rolling the patch out. All these rollouts are causing svn churn.
Ojan Vafai
Comment 10 2013-01-18 10:53:50 PST
Comment 1 already tried reproducing locally and clearly the test variance is too high to get a reliable result locally. I agree that the commit churn isn't great, but until someone builds better infrastructure for trying perf changes on bots, I don't see an alternative.
Ryosuke Niwa
Comment 11 2013-01-21 05:04:56 PST
(In reply to comment #10) > Comment 1 already tried reproducing locally and clearly the test variance is too high to get a reliable result locally. I agree that the commit churn isn't great, but until someone builds better infrastructure for trying perf changes on bots, I don't see an alternative. If the variance is the issue, you should be able to run the test 20, 100, or even 1000 times and use the mean of those results. It should reflect the said regression if it's really reproducible.
Matt Falkenhagen
Comment 12 2013-01-22 01:24:13 PST
It looks like my patch is indeed the culprit. A short while after the rollout the perf indeed went back up and immediately after relanding it dropped back down. Sorry for the SVN churn but I think I have to rollout again. I don't think variance is the issue, the regression just doesn't occur on my machine. I had the test run 200 iterations instead of the normal 20 and the results of 'run-perf-tests ./PerformanceTests/DOM/DOMDivWalk.html' were: (without my patch) RESULT DOM: DOMDivWalk= 67.3472753909 ms median= 67.2489583376 ms, stdev= 0.787590427719 ms, min= 64.9623333205 ms, max= 70.2458332914 ms (with my patch) RESULT DOM: DOMDivWalk= 68.0555829093 ms median= 67.6872915453 ms, stdev= 1.23173675198 ms, min= 66.4260832709 ms, max= 73.4682503195 ms It looks like we may not have a way to do a try job on the perf bot. I'm curious why the regression doesn't occur on http://webkit-perf.appspot.com. What does it do differently from http://chromium-perf.appspot.com? I also opened DOMDivWalk in Chrome itself but didn't seem to get meaningful results (my patch was faster than without).
Matt Falkenhagen
Comment 13 2013-01-22 03:16:05 PST
I think moving the top layer flag from ElementRareData to NodeFlags has a chance of fixing the regression, and seems to be a good idea anyway. I want to try to land the patch at bug 107542 and see if it fixes it.
Ryosuke Niwa
Comment 14 2013-01-22 10:21:28 PST
(In reply to comment #12) > (without my patch) > RESULT DOM: DOMDivWalk= 67.3472753909 ms > median= 67.2489583376 ms, stdev= 0.787590427719 ms, min= 64.9623333205 ms, max= 70.2458332914 ms > (with my patch) > RESULT DOM: DOMDivWalk= 68.0555829093 ms > median= 67.6872915453 ms, stdev= 1.23173675198 ms, min= 66.4260832709 ms, max= 73.4682503195 ms There is a slight difference between the two: 67.35 vs. 68.06. I bet if you took 2000 or 5000 samples instead of a mere 200, you'll see more clear deviation between the two.
Matt Falkenhagen
Comment 15 2013-01-22 21:17:22 PST
Ryosuke Niwa
Comment 16 2013-01-22 21:22:17 PST
Comment on attachment 184130 [details] Patch Is this patch speculative? If it is, I don't see how it could fix the perf. regression.
Matt Falkenhagen
Comment 17 2013-01-22 21:25:18 PST
(mid-air collision) The NodeFlags patch did not help. I have another idea I'd like to try. I realized there is already a check in Element::removedFrom for Fullscreen. I'll try to absorb the perf hit of the check for top layer by moving the Fullscreen and top layer checks into a slow path. This code is somewhat ugly but it should be fixed as the plan is for Fullscreen and top layer to merge (bug 107617). I get somewhat promising results: (with slow path patch) RESULT DOM: DOMDivWalk= 63.8586679573 ms median= 63.7347082957 ms, stdev= 0.864724536393 ms, min= 62.3731666322 ms, max= 70.3177498071 ms (with current build) RESULT DOM: DOMDivWalk= 64.9818412457 ms median= 64.8267084325 ms, stdev= 1.28962335807 ms, min= 62.4124999352 ms, max= 70.5802498269 ms I ran the first test for 2000 iterations but the second one only 200 (I accidentally ran 2000 on the wrong build...). If this also fails, I'll rollout all three patches and start over.
Hajime Morrita
Comment 18 2013-01-22 21:34:32 PST
Comment on attachment 184130 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=184130&action=review > Source/WebCore/dom/Node.cpp:-2623 > -#if ENABLE(DIALOG_ELEMENT) we don't need this. Instead of that.... > Source/WebCore/dom/Node.h:694 > + void setIsInTopLayer(bool); The disabled case could be inlined here. I think guarding whole definition(s) would be cleaner than having the ifdef in the function body.
Matt Falkenhagen
Comment 19 2013-01-22 22:42:01 PST
WebKit Review Bot
Comment 20 2013-01-22 23:32:59 PST
Comment on attachment 184145 [details] Patch Clearing flags on attachment: 184145 Committed r140512: <http://trac.webkit.org/changeset/140512>
WebKit Review Bot
Comment 21 2013-01-22 23:33:03 PST
All reviewed patches have been landed. Closing bug.
Matt Falkenhagen
Comment 22 2013-01-23 07:21:08 PST
r140512 did not fix the regression. I'll roll out the three patches (r140307, r140411, r140512). rniwa had a clue that run-perf-tests and webkit-perf.appspot.com use a different test runner than dom_perf, which may explain why I couldn't reproduce locally and why the regression doesn't show on webkit-perf.appspot.com. Particularly, the webkit runner certainly doesn't include element removal in the benchmark time, but dom_perf might. So I have some room to investigate more before attempting to land something again.
Elliott Sprehn
Comment 23 2013-01-23 09:26:45 PST
(In reply to comment #22) > r140512 did not fix the regression. I'll roll out the three patches (r140307, r140411, r140512). > > rniwa had a clue that run-perf-tests and webkit-perf.appspot.com use a different test runner than dom_perf, which may explain why I couldn't reproduce locally and why the regression doesn't show on webkit-perf.appspot.com. Particularly, the webkit runner certainly doesn't include element removal in the benchmark time, but dom_perf might. So I have some room to investigate more before attempting to land something again. The patches here don't make sense, doing hasRareData() || isInTopLayer() is only avoiding a single function call, and if that was the "slow part" then my patch that checks pseudoElement for BEFORE and AFTER should have caused a 40% regression: https://trac.webkit.org/changeset/140452 I don't think we're focusing on the right thing, certainly not with trying to avoid this one place of method call overhead (that's definitely not 20% of the time spent in DOMDivWalk). If anything the original patch should have made things _much_ faster since we were inadvertently allocating rare data for every element that was removed from the DOM because Element::removedFrom did setIsInTopLayer(false) which did ensureRareData() which was super bad for memory usage. At the very least setIsInTopLayer needs to check if (!hasRareData() && !inTopLayer) return; to avoid allocating rare data for every node that gets removed...
Elliott Sprehn
Comment 24 2013-01-23 09:40:40 PST
(In reply to comment #23) > (In reply to comment #22) > ... > Can we please roll out this patch? There's no reason for this if guard, as there's no reason that isInTopLayer() should be slower than pseudoElement(BEFORE) or AFTER right above this.
Matt Falkenhagen
Comment 25 2013-01-23 13:50:06 PST
Rolled out: http://trac.webkit.org/changeset/140546 Now DOMDivWalk looks better again.
Hajime Morrita
Comment 26 2013-01-23 16:37:31 PST
(In reply to comment #25) > Rolled out: http://trac.webkit.org/changeset/140546 > > Now DOMDivWalk looks better again. Sigh. We should definitely look how our internal test runner work. I guess we can run it on DRT instead of full Chromium with small modifications. Then running it will have less pain.
Matt Falkenhagen
Comment 27 2013-01-24 01:11:16 PST
On IRC, Elliott had a very promising theory about what is happening. A much earlier patch (r136575) inadvertently changed Element::removedFrom to always allocate rare data, by calling setIsInTopLayer. Element::removedFrom is not called during the DOMDivWalk benchmark itself, just in the setup/teardown between iterations. This means every element gets rare data for free, and perf hits that would normally occur by allocating rare data vanish from the benchmark, resulting in a higher score. Since my patch changed removedFrom to not allocate rare data unless needed, the score decreases. The graph supports this theory: http://chromium-perf.appspot.com/?tab=linux-release-webkit-latest&graph=DOMDivWalk&trace=score&rev=179000&history=700&master=ChromiumWebkit&testSuite=dom_perf&details=true - r136575: always allocate rare data in Element::removedFrom - r139402: first landing of my patch - r140080: rollout - r140307: reland - r140546: rollout - r140638: fix for always allocating rare data So if this is the case, we should see a perf hit for 140638, which so far is happening. I should have realized this too because my code before r136575 was performing the isInTopLayer() check on each Element::removedFrom call, and evidently no perf regression occurred.
Ojan Vafai
Comment 28 2013-01-24 19:09:17 PST
Seems reasonable to me. It doesn't seem to me like there's more to be had from digging deeper here.
Matt Falkenhagen
Comment 29 2013-01-24 20:21:47 PST
Just for more evidence, I slowed down rare data allocation with a sleep and tested with and without r140638: (always allocating rare data) RESULT DOM: DOMDivWalk= 3293.468275 ms (not allocating rare data) RESULT DOM: DOMDivWalk= 1865.209725 ms The benchmark uses childNodes, which uses rare data. I'll reland the patch.
Matt Falkenhagen
Comment 30 2013-01-24 20:24:02 PST
(In reply to comment #29) > (always allocating rare data) > RESULT DOM: DOMDivWalk= 3293.468275 ms > (not allocating rare data) > RESULT DOM: DOMDivWalk= 1865.209725 ms Oops, I reversed those. Always allocating rare data is faster than not.
Ryosuke Niwa
Comment 31 2013-01-24 22:58:33 PST
(In reply to comment #30) > (In reply to comment #29) > > (always allocating rare data) > > RESULT DOM: DOMDivWalk= 3293.468275 ms > > (not allocating rare data) > > RESULT DOM: DOMDivWalk= 1865.209725 ms > > Always allocating rare data is faster than not. That's not really true. We're just shifting the runtime cost to where it's not measured. We're paying the same runtime cost at the end of the day.
Matt Falkenhagen
Comment 32 2013-01-24 23:07:32 PST
(In reply to comment #31) > That's not really true. We're just shifting the runtime cost to where it's not measured. We're paying the same runtime cost at the end of the day. Yes that was unclear wording on my part... I meant the always allocated case has the better/faster benchmark result.
Note You need to log in before you can comment on or make changes to this bug.