Bug 99669

Summary: hover, layout and scrolling super slow
Product: WebKit Reporter: Ojan Vafai <ojan>
Component: Layout and RenderingAssignee: Nobody <webkit-unassigned>
Status: NEW    
Severity: Normal CC: allan.jensen, anilsson, arv, bdakin, eae, efidler, enne, eric, esprehn, gmak, jamesr, jchaffraix, kling, koivisto, laszlo.gombos, leviw, mikelawther, mlattanzio, pdr, rniwa, simon.fraser, staikos, tdanderson, tonikitoo
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 99940    
Bug Blocks:    
Attachments:
Description Flags
test case
none
profile
none
profile of WebProcess with run-safari of r132006 in instruments, using mouse-wheel to scroll
none
trace of chromium mac ToT w Instruments 4.5
none
another profile using sysprof (http://sysprof.com/) this time
none
sysprof profile xml file
none
sysprof screenshot of chrome with compositing enabled
none
sysprof xml file with compositing enabled
none
Testcase that doesn't kill my browser none

Ojan Vafai
Reported 2012-10-17 18:42:37 PDT
Created attachment 169322 [details] test case Here's a reduced test case from a Google app. It's actually pretty close to the original in terms of performance. I tried reducing it more, but it was hard to tell if I was removing important bits. There are at least a few bugs here (probably more): 1. The hover takes forever to kick in. This gets worse as you get lower on the page. Clearly part of this is that we iterate over every renderlayer until we find the one we need to hit. Should we be maintaining a quad-tree for the positions of RenderLayers so we can make this be O(log n) instead of O(n)? 2. Mousemove events fire multiple times while scrolling. Firefox only fires one mousemove at the end of the scroll animation. We clearly should do the same. 3. Probably due to both of the above, scrolling is super janky. If you remove the fixed position div and have threaded composited scrolling enabled, then scrolling becomes very smooth, but you see a lot of checkerboard and it's now differently janky. :(
Attachments
test case (2.19 KB, text/html)
2012-10-17 18:42 PDT, Ojan Vafai
no flags
profile (100.62 KB, application/octet-stream)
2012-10-18 10:58 PDT, Ojan Vafai
no flags
profile of WebProcess with run-safari of r132006 in instruments, using mouse-wheel to scroll (1.14 MB, application/zip)
2012-10-22 11:29 PDT, Eric Seidel (no email)
no flags
trace of chromium mac ToT w Instruments 4.5 (1.44 MB, application/zip)
2012-10-22 14:55 PDT, Ojan Vafai
no flags
another profile using sysprof (http://sysprof.com/) this time (397.64 KB, image/png)
2012-10-26 13:14 PDT, Ojan Vafai
no flags
sysprof profile xml file (1.51 MB, text/xml)
2012-10-26 13:14 PDT, Ojan Vafai
no flags
sysprof screenshot of chrome with compositing enabled (465.51 KB, image/png)
2012-10-26 13:57 PDT, Ojan Vafai
no flags
sysprof xml file with compositing enabled (4.46 MB, text/xml)
2012-10-26 13:58 PDT, Ojan Vafai
no flags
Testcase that doesn't kill my browser (2.18 KB, text/html)
2012-10-26 17:05 PDT, Simon Fraser (smfr)
no flags
Simon Fraser (smfr)
Comment 1 2012-10-17 20:18:25 PDT
Lots of time updating layer positions, which is O(N^2) or worse.
Simon Fraser (smfr)
Comment 2 2012-10-17 20:18:50 PDT
This is not dissimilar to facebook pages BTW.
Ojan Vafai
Comment 3 2012-10-18 10:58:34 PDT
Created attachment 169438 [details] profile Scrolled to the bottom then did little bit of hovering. Not surprisingly, 68% in RenderLayer::hitTestList.
Ojan Vafai
Comment 4 2012-10-18 11:04:54 PDT
Does the quad-tree approach seem reasonable to people or is there something else we should do? It obviously uses more memory, but it's roughly O(n) in the number of RenderLayers, which already are giant memory hogs. Only tricky bit I think is making sure we keep it updated as layers move around. An alternative approach would be keeping track of which layers are currently visible. This sounds less promising and more complicated to me though. Also, this approach would also benefit from the quad-tree thing.
Ojan Vafai
Comment 5 2012-10-21 10:52:13 PDT
Filed bug 99940 for issue #2 (updating mouse events and hover state too frequently) so that this bug can focus on the hit testing performance.
Allan Sandfeld Jensen
Comment 6 2012-10-21 11:57:35 PDT
I don't know what you consider a quad tree, but I have been working on designing an interval tree solution, and simply search and order the interval tree based on the longest dimension (usually height). Since a typical layout will have close to full width elements, making x-ordering much less useful. An interval tree has O(log(n)+k) run-time, where k is the number of results, and it only uses O(n) memory. The problem with it though is that it takes at least O(n*log(n)^2) to build, and probably will take O(n^2) to build in a low overhead fashion. So part of such solution will be a heuristic for when to build the tree, so it doesn't get build when it isn't needed or when the content is changing fast. Of course before we do that, please make sure you are throttleing mouse-events, there is no reason to do more than 60 per second, but a mouse often send 300 updates each second. And the idea about only doing a mouse over at the end sounds like a good idea, or alternatively, only send them at a lower pace. All this shouldn't be that big a problem though. Browsers has been doing these hit tests forever on every single mouse move event. If it gets that slow now, something else is likely to be wrong.
Eric Seidel (no email)
Comment 7 2012-10-22 11:29:34 PDT
Created attachment 169947 [details] profile of WebProcess with run-safari of r132006 in instruments, using mouse-wheel to scroll
Ojan Vafai
Comment 8 2012-10-22 11:39:26 PDT
(In reply to comment #7) > Created an attachment (id=169947) [details] > profile of WebProcess with run-safari of r132006 in instruments, using mouse-wheel to scroll Loading this trace seems to crash Instruments for me on 10.8.2. :(
Simon Fraser (smfr)
Comment 9 2012-10-22 11:50:43 PDT
That profile shows that hit testing is not the bottleneck; it's compositing stuff, and layer clip rect calculation etc.
Eric Seidel (no email)
Comment 10 2012-10-22 12:18:43 PDT
I suspect the bottlenecks may be different in Chromium vs. Safari at this point. We should probably split this bug. It's possible after both ports fix their respective low-hanging fruit there are unified wins yet to discuss.
Ojan Vafai
Comment 11 2012-10-22 14:55:01 PDT
Created attachment 169993 [details] trace of chromium mac ToT w Instruments 4.5 Curiously, TestShell on Mac performs much better than Chromium. This trace is of Chromium's renderer process. No scrolling, just hovering over items at the end of the page.
Ojan Vafai
Comment 12 2012-10-26 13:14:05 PDT
Created attachment 170988 [details] another profile using sysprof (http://sysprof.com/) this time Still shows hitTestLayer to be the hotspot. RefPtr doesn't show up here though. The difference might be that for the sysprof profile I already had the page loaded and scrolled to the bottom. I didn't do any scrolling, I just hovered over different items for ~20 seconds.
Ojan Vafai
Comment 13 2012-10-26 13:14:45 PDT
Created attachment 170990 [details] sysprof profile xml file
Ojan Vafai
Comment 14 2012-10-26 13:57:46 PDT
Created attachment 170998 [details] sysprof screenshot of chrome with compositing enabled Very different trace. Still super slow.
Ojan Vafai
Comment 15 2012-10-26 13:58:36 PDT
Created attachment 170999 [details] sysprof xml file with compositing enabled
Ojan Vafai
Comment 16 2012-10-26 14:02:24 PDT
1:49 PM <esprehn_> ojan: it seems hitTestLayer on anything with a transform calls backgroundClipRect -> parentClipRects -> parent()->updateClipRects() and that walks up the whole layer tree to the root 1:50 PM <esprehn_> it seems like there should be some caching in there to prevent that being super slow 1:51 PM <smfr> esprehn_: we blow away caches when things go between composited and not 1:51 PM <smfr> which this pages does in the extreme 1:51 PM <esprehn_> oh, because the transform is on hover 1:51 PM <esprehn_> well that's certainly part of it then 1:51 PM <smfr> transforms also blow away caches 1:52 PM <smfr> transform changes, that is :( Not sure where that leaves us in terms of fixing this.
Ojan Vafai
Comment 17 2012-10-26 15:57:26 PDT
James walked me through the compositing code. The culprit here is the following code in RenderLayerCompositor::computeCompositingRequirements: // Turn overlap testing off for later layers if it's already off, or if we have a 3D transform or an animating transform. if (!childState.m_testingOverlap || layer->has3DTransform() || isRunningAcceleratedTransformAnimation(layer->renderer())) compositingState.m_testingOverlap = false; Since we don't know the bounds of the animation / transform, we disable all overlap testing and recomposite the world. I confirmed that commenting out these two lines makes this test case fast. Some ideas from James as to how to fix this: 1. compute tighter bounds on the layer with the animation and feed that into the overlap test 2. composite all of these potentially-overlapping things into one composited layer instead of N 3. make computing all of this faster These all sound hard. :) Thinking about 3, in the Chromium composited profile: -20% is spent in TilingData::tileRect. That function seems pretty fast though, so presumably it's just getting called a ton. -30% is spent in RenderLayerBacking::~RenderLayerBacking, which spends most of its time in RenderLayerBacking::updateClippingLayers.
Simon Fraser (smfr)
Comment 18 2012-10-26 17:05:45 PDT
Created attachment 171052 [details] Testcase that doesn't kill my browser
Simon Fraser (smfr)
Comment 19 2012-10-26 17:09:15 PDT
(In reply to comment #17) > James walked me through the compositing code. The culprit here is the following code in RenderLayerCompositor::computeCompositingRequirements: > // Turn overlap testing off for later layers if it's already off, or if we have a 3D transform or an animating transform. > if (!childState.m_testingOverlap || layer->has3DTransform() || isRunningAcceleratedTransformAnimation(layer->renderer())) > compositingState.m_testingOverlap = false; > > Since we don't know the bounds of the animation / transform, we disable all overlap testing and recomposite the world. I confirmed that commenting out these two lines makes this test case fast. > > Some ideas from James as to how to fix this: > 1. compute tighter bounds on the layer with the animation and feed that into the overlap test In theory we could look at the animation endpoints (and keyframes if any) and compute a bounding box for all steps of the animation. That would certainly help. > 2. composite all of these potentially-overlapping things into one composited layer instead of N Do to this you'd have to figure out that doing so doesn't break the visual ordering. Also, it's not clear that it would help performance here. > 3. make computing all of this faster That would help in general. > These all sound hard. :) > > Thinking about 3, in the Chromium composited profile: > -20% is spent in TilingData::tileRect. That function seems pretty fast though, so presumably it's just getting called a ton. > -30% is spent in RenderLayerBacking::~RenderLayerBacking, which spends most of its time in RenderLayerBacking::updateClippingLayers.
Note You need to log in before you can comment on or make changes to this bug.