Regression range: http://trac.webkit.org/log/?verbose=on&rev=139402&stop_rev=139400

REGRESSIONS:
http://chromium-perf.appspot.com/?tab=linux-release-webkit-latest&graph=DOMDivWalk&trace=score&rev=176358&history=150&master=ChromiumWebkit&testSuite=dom_perf&details=true

PROGRESSIONS:
http://chromium-perf.appspot.com/?tab=linux-release-webkit-latest&graph=Get%20Elements&trace=score&history=150&master=ChromiumWebkit&testSuite=dom_perf&details=true
I ran these tests locally with each of the patches reverted individually. Reverting 139402 took a full second off the runtime on my machine. I'm assigning to falken@ to investigate further.
Not sure what's going on here. The only code from my patch that this test can hit is in Element::removedFrom. I found this code is hit over 9M times when running DOMDivWalk, so it does look suspicious. On the other hand, the change seems innocuous.

Before the patch, Element::removedFrom did:

    setIsInTopLayer(false);

which did:

    ensureElementRareData()->setIsInTopLayer(inTopLayer);
    setNeedsStyleRecalc(SyntheticStyleChange);

After the patch, Element::removedFrom does:

    document()->removeFromTopLayer(this);

which does:

    if (!element->isInTopLayer())
        return;

And isInTopLayer() is just:

    return hasRareData() && elementRareData()->isInTopLayer();

I haven't had any luck reproducing the regression locally. The test runs are about 67 ms with or without my patch. Any ideas how to proceed?
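To make the comparison concrete, here is a minimal standalone sketch of the two paths. The types and names are heavily simplified stand-ins, not the actual WebKit classes:

```cpp
#include <cassert>
#include <memory>

// Hypothetical simplification of ElementRareData: a lazily allocated side
// structure holding rarely-used state.
struct ElementRareData {
    bool inTopLayer = false;
};

struct Element {
    std::unique_ptr<ElementRareData> m_rareData;

    bool hasRareData() const { return m_rareData != nullptr; }

    ElementRareData* ensureElementRareData()
    {
        if (!m_rareData)
            m_rareData = std::make_unique<ElementRareData>();
        return m_rareData.get();
    }

    bool isInTopLayer() const { return hasRareData() && m_rareData->inTopLayer; }

    // Before the patch: unconditionally allocates rare data just to store "false".
    void removedFromBefore() { ensureElementRareData()->inTopLayer = false; }

    // After the patch: the early-out leaves rare-data-free elements untouched.
    void removedFromAfter()
    {
        if (!isInTopLayer())
            return;
        m_rareData->inTopLayer = false;
    }
};
```

Note that for an element without rare data, the old path performs an allocation while the new one is a flag test plus an early return, so on its face the patch should be faster, not slower.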
Also note that I don't see the regression on http://webkit-perf.appspot.com/graph.html#tests=[[6107,2001,3001],[6107,2001,2389050],[6107,2001,173262]]&sel=1357718371985,1358323171985&displayrange=7&datatype=running
(In reply to comment #3)
> Also note that I don't see the regression on http://webkit-perf.appspot.com/graph.html#tests=[[6107,2001,3001],[6107,2001,2389050],[6107,2001,173262]]&sel=1357718371985,1358323171985&displayrange=7&datatype=running

It looks like that's for DOMWalk, not DOMDivWalk. I don't see the regression in the link below either, but somehow there's no data for Chromium Linux earlier than today.

http://webkit-perf.appspot.com/graph.html#tests=[[6104,2001,173262],[6104,2001,2389050],[6104,2001,3001]]&sel=1357720675021,1358325475021&displayrange=7&datatype=running
Nothing obvious there. Maybe something is getting inlined differently? A couple of shot-in-the-dark things you could try:
- Move the element->isInTopLayer() check to removedFrom
- Move the isInTopLayer() implementation to the header to get it to inline
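For the second suggestion, here is a sketch of what the header-inlined form might look like, with hypothetical simplified declarations rather than the real Node.h:

```cpp
#include <cassert>

// Element.h (sketch): defining the accessor in the class body makes it
// implicitly inline, so callers in other translation units can avoid an
// out-of-line call for the common hasRareData() == false case.
class Element {
public:
    bool hasRareData() const { return m_hasRareData; }
    bool isInTopLayer() const { return m_hasRareData && m_inTopLayer; }

    // Test helper for this sketch only; not a real WebKit API.
    void setFlags(bool hasRareData, bool inTopLayer)
    {
        m_hasRareData = hasRareData;
        m_inTopLayer = inTopLayer;
    }

private:
    bool m_hasRareData = false;
    bool m_inTopLayer = false;
};
```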
(In reply to comment #5)
> Nothing obvious there. Maybe something is getting inlined differently? Couple shot-in-the-dark things you could try:
> -Move the element->isInTopLayer() check to removedFrom
> -move isInTopLayer implementation to the header to get it to inline

I'll give it a try. Is there a way to submit a try job to the perf bot? It's not reproducing locally (maybe my machine is too fast). Or should I just commit and see what happens?
Rolled out the patch: http://trac.webkit.org/changeset/140080
Looks like 139402 wasn't the cause of the regression: it's been rolled out and there's no improvement. Matt, r=me on relanding that.

This is why it's usually best to roll out first and ask questions later with perf regressions. It saves everyone work. :)

I wonder if it's the FrameSelection change in http://trac.webkit.org/changeset/139401/. Shinya-san, can you try a rollout to confirm? If it's not the culprit, r=me on relanding it.
Please try to run the latest locally before rolling the patch out. All these rollouts are causing SVN churn.
Comment 1 already tried reproducing locally and clearly the test variance is too high to get a reliable result locally. I agree that the commit churn isn't great, but until someone builds better infrastructure for trying perf changes on bots, I don't see an alternative.
(In reply to comment #10)
> Comment 1 already tried reproducing locally and clearly the test variance is too high to get a reliable result locally. I agree that the commit churn isn't great, but until someone builds better infrastructure for trying perf changes on bots, I don't see an alternative.

If the variance is the issue, you should be able to run the test 20, 100, or even 1000 times and use the mean of those results. If the regression is real and reproducible, the mean should reflect it.
It looks like my patch is indeed the culprit. A short while after the rollout the perf went back up, and immediately after relanding it dropped back down. Sorry for the SVN churn, but I think I have to roll out again.

I don't think variance is the issue; the regression just doesn't occur on my machine. I had the test run 200 iterations instead of the normal 20, and the results of 'run-perf-tests ./PerformanceTests/DOM/DOMDivWalk.html' were:

(without my patch)
RESULT DOM: DOMDivWalk= 67.3472753909 ms
median= 67.2489583376 ms, stdev= 0.787590427719 ms, min= 64.9623333205 ms, max= 70.2458332914 ms

(with my patch)
RESULT DOM: DOMDivWalk= 68.0555829093 ms
median= 67.6872915453 ms, stdev= 1.23173675198 ms, min= 66.4260832709 ms, max= 73.4682503195 ms

It looks like we may not have a way to do a try job on the perf bot. I'm curious why the regression doesn't show up on http://webkit-perf.appspot.com. What does it do differently from http://chromium-perf.appspot.com? I also opened DOMDivWalk in Chrome itself but didn't get meaningful results (my patch was faster than without).
I think moving the top layer flag from ElementRareData to NodeFlags has a chance of fixing the regression, and seems to be a good idea anyway. I want to try to land the patch at bug 107542 and see if it fixes it.
(In reply to comment #12)
> (without my patch)
> RESULT DOM: DOMDivWalk= 67.3472753909 ms
> median= 67.2489583376 ms, stdev= 0.787590427719 ms, min= 64.9623333205 ms, max= 70.2458332914 ms
> (with my patch)
> RESULT DOM: DOMDivWalk= 68.0555829093 ms
> median= 67.6872915453 ms, stdev= 1.23173675198 ms, min= 66.4260832709 ms, max= 73.4682503195 ms

There is a slight difference between the two: 67.35 vs. 68.06. I bet if you took 2000 or 5000 samples instead of a mere 200, you'd see a clearer deviation between the two.
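As a rough sanity check on that suggestion: the standard error of the mean shrinks as 1/sqrt(n), so more iterations make a small shift easier to resolve. A standalone sketch (the numbers in the comment are illustrative context, not inputs to this code):

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Mean and standard error of the mean. With a per-run stdev around 1 ms,
// the SEM at n = 200 is already ~0.07 ms, well below the ~0.7 ms shift
// reported above, so 200 runs should resolve the effect if it exists locally.
double mean(const std::vector<double>& xs)
{
    return std::accumulate(xs.begin(), xs.end(), 0.0) / xs.size();
}

double standardErrorOfMean(const std::vector<double>& xs)
{
    const double m = mean(xs);
    double sumSq = 0.0;
    for (double x : xs)
        sumSq += (x - m) * (x - m);
    const double stdev = std::sqrt(sumSq / (xs.size() - 1));
    return stdev / std::sqrt(static_cast<double>(xs.size()));
}
```

Taking more samples helps only if the regression actually occurs on the measuring machine; a SEM far smaller than the claimed shift, with no shift observed, points at an environment difference instead.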
Created attachment 184130 [details]
Patch
Comment on attachment 184130 [details]
Patch

Is this patch speculative? If it is, I don't see how it could fix the perf regression.
(mid-air collision) The NodeFlags patch did not help. I have another idea I'd like to try.

I realized there is already a check in Element::removedFrom for Fullscreen. I'll try to absorb the perf hit of the top layer check by moving the Fullscreen and top layer checks into a slow path. This code is somewhat ugly, but it should get cleaned up anyway since the plan is for Fullscreen and top layer to merge (bug 107617).

I get somewhat promising results:

(with slow path patch)
RESULT DOM: DOMDivWalk= 63.8586679573 ms
median= 63.7347082957 ms, stdev= 0.864724536393 ms, min= 62.3731666322 ms, max= 70.3177498071 ms

(with current build)
RESULT DOM: DOMDivWalk= 64.9818412457 ms
median= 64.8267084325 ms, stdev= 1.28962335807 ms, min= 62.4124999352 ms, max= 70.5802498269 ms

I ran the first test for 2000 iterations but the second one only 200 (I accidentally ran 2000 on the wrong build...). If this also fails, I'll roll out all three patches and start over.
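A minimal sketch of the slow-path idea, assuming hypothetical simplified names (the real Fullscreen and top layer bookkeeping is more involved):

```cpp
#include <cassert>

struct Element {
    bool inFullscreen = false;
    bool inTopLayer = false;
    int slowPathCalls = 0; // instrumentation for this sketch only

    // Rare cleanup work lives out of line; most removals never reach it.
    void removedFromSlowPath()
    {
        ++slowPathCalls;
        inFullscreen = false;
        inTopLayer = false;
    }

    // Hot path: a single combined test, taken on every removal.
    void removedFrom()
    {
        if (inFullscreen || inTopLayer)
            removedFromSlowPath();
    }
};
```

The point of the restructuring is that the common case pays for one branch rather than two separate checks plus any per-feature bookkeeping.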
Comment on attachment 184130 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=184130&action=review

> Source/WebCore/dom/Node.cpp:-2623
> -#if ENABLE(DIALOG_ELEMENT)

We don't need this. Instead of that...

> Source/WebCore/dom/Node.h:694
> +    void setIsInTopLayer(bool);

The disabled case could be inlined here. I think guarding the whole definition(s) would be cleaner than having the ifdef in the function body.
Created attachment 184145 [details]
Patch
Comment on attachment 184145 [details]
Patch

Clearing flags on attachment: 184145

Committed r140512: <http://trac.webkit.org/changeset/140512>
All reviewed patches have been landed. Closing bug.
r140512 did not fix the regression. I'll roll out the three patches (r140307, r140411, r140512).

rniwa had a hunch that run-perf-tests and webkit-perf.appspot.com use a different test runner than dom_perf, which may explain why I couldn't reproduce locally and why the regression doesn't show on webkit-perf.appspot.com. In particular, the webkit runner certainly doesn't include element removal in the benchmark time, but dom_perf might. So I have some room to investigate more before attempting to land something again.
(In reply to comment #22) > r140512 did not fix the regression. I'll roll out the three patches (r140307, r140411, r140512). > > rniwa had a clue that run-perf-tests and webkit-perf.appspot.com use a different test runner than dom_perf, which may explain why I couldn't reproduce locally and why the regression doesn't show on webkit-perf.appspot.com. Particularly, the webkit runner certainly doesn't include element removal in the benchmark time, but dom_perf might. So I have some room to investigate more before attempting to land something again. The patches here don't make sense, doing hasRareData() || isInTopLayer() is only avoiding a single function call, and if that was the "slow part" then my patch that checks pseudoElement for BEFORE and AFTER should have caused a 40% regression: https://trac.webkit.org/changeset/140452 I don't think we're focusing on the right thing, certainly not with trying to avoid this one place of method call overhead (that's definitely not 20% of the time spent in DOMDivWalk). If anything the original patch should have made things _much_ faster since we were inadvertently allocating rare data for every element that was removed from the DOM because Element::removedFrom did setIsInTopLayer(false) which did ensureRareData() which was super bad for memory usage. At the very least setIsInTopLayer needs to check if (!hasRareData() && !inTopLayer) return; to avoid allocating rare data for every node that gets removed...
(In reply to comment #23)
> (In reply to comment #22)
> ...

Can we please roll out this patch? There's no reason for this if guard; there's no reason that isInTopLayer() should be slower than the pseudoElement(BEFORE) or AFTER checks right above it.
Rolled out: http://trac.webkit.org/changeset/140546

Now DOMDivWalk looks better again.
(In reply to comment #25)
> Rolled out: http://trac.webkit.org/changeset/140546
>
> Now DOMDivWalk looks better again.

Sigh. We should definitely look into how our internal test runner works. I guess we could run it on DRT instead of full Chromium with small modifications; then running it would be less painful.
On IRC, Elliott had a very promising theory about what is happening. A much earlier patch (r136575) inadvertently changed Element::removedFrom to always allocate rare data, by calling setIsInTopLayer. Element::removedFrom is not called during the DOMDivWalk benchmark itself, just in the setup/teardown between iterations. This means every element gets rare data for free, and perf hits that would normally occur when allocating rare data vanish from the benchmark, resulting in a higher score. Since my patch changed removedFrom to not allocate rare data unless needed, the score decreases.

The graph supports this theory: http://chromium-perf.appspot.com/?tab=linux-release-webkit-latest&graph=DOMDivWalk&trace=score&rev=179000&history=700&master=ChromiumWebkit&testSuite=dom_perf&details=true

- r136575: always allocate rare data in Element::removedFrom
- r139402: first landing of my patch
- r140080: rollout
- r140307: reland
- r140546: rollout
- r140638: fix for always allocating rare data

So if this is the case, we should see a perf hit for r140638, which so far is happening. I should have realized this too, because my code before r136575 was performing the isInTopLayer() check on each Element::removedFrom call, and evidently no perf regression occurred then.
Seems reasonable to me. I don't think there's more to be gained from digging deeper here.
Just for more evidence, I slowed down rare data allocation with a sleep and tested with and without r140638:

(always allocating rare data)
RESULT DOM: DOMDivWalk= 3293.468275 ms
(not allocating rare data)
RESULT DOM: DOMDivWalk= 1865.209725 ms

The benchmark uses childNodes, which uses rare data. I'll reland the patch.
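The cost-shifting effect can be sketched in isolation. Here a hypothetical lazily allocated child list stands in for rare data: if setup/teardown between iterations pre-creates it, the timed region pays no allocation even though the total work is unchanged.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for a lazily allocated structure like rare data.
struct NodeList {
    std::vector<int> children;
};

struct Element {
    std::unique_ptr<NodeList> m_list;
    int timedAllocations = 0; // allocations that land inside the "benchmark"

    NodeList& childNodes(bool insideTimedRegion)
    {
        if (!m_list) {
            if (insideTimedRegion)
                ++timedAllocations;
            m_list = std::make_unique<NodeList>();
        }
        return *m_list;
    }
};
```

An element "prewarmed" outside the timed region shows zero timed allocations, while a cold element pays one inside it; the score difference reflects where the cost is measured, not a real speedup.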
(In reply to comment #29)
> (always allocating rare data)
> RESULT DOM: DOMDivWalk= 3293.468275 ms
> (not allocating rare data)
> RESULT DOM: DOMDivWalk= 1865.209725 ms

Oops, I reversed those. Always allocating rare data is faster than not.
(In reply to comment #30)
> (In reply to comment #29)
> > (always allocating rare data)
> > RESULT DOM: DOMDivWalk= 3293.468275 ms
> > (not allocating rare data)
> > RESULT DOM: DOMDivWalk= 1865.209725 ms
>
> Always allocating rare data is faster than not.

That's not really true. We're just shifting the runtime cost to where it's not measured. We're paying the same runtime cost at the end of the day.
(In reply to comment #31)
> That's not really true. We're just shifting the runtime cost to where it's not measured. We're paying the same runtime cost at the end of the day.

Yes, that was unclear wording on my part... I meant that the always-allocating case gets the better/faster benchmark result.