Bug 99669 - hover, layout and scrolling super slow
: hover, layout and scrolling super slow
Status: NEW
: WebKit
Layout and Rendering
: 528+ (Nightly build)
: Unspecified Unspecified
: P2 Normal
Assigned To:
:
:
: 99940
:
  Show dependency treegraph
 
Reported: 2012-10-17 18:42 PST by
Modified: 2013-01-29 20:27 PST (History)


Attachments
test case (2.19 KB, text/html)
2012-10-17 18:42 PST, Ojan Vafai
no flags Details
profile (100.62 KB, application/octet-stream)
2012-10-18 10:58 PST, Ojan Vafai
no flags Details
profile of WebProcess with run-safari of r132006 in instruments, using mouse-wheel to scroll (1.14 MB, application/zip)
2012-10-22 11:29 PST, Eric Seidel
no flags Details
trace of chromium mac ToT w Instruments 4.5 (1.44 MB, application/zip)
2012-10-22 14:55 PST, Ojan Vafai
no flags Details
another profile using sysprof (http://sysprof.com/) this time (397.64 KB, image/png)
2012-10-26 13:14 PST, Ojan Vafai
no flags Details
sysprof profile xml file (1.51 MB, text/xml)
2012-10-26 13:14 PST, Ojan Vafai
no flags Details
sysprof screenshot of chrome with compositing enabled (465.51 KB, image/png)
2012-10-26 13:57 PST, Ojan Vafai
no flags Details
sysprof xml file with compositing enabled (4.46 MB, text/xml)
2012-10-26 13:58 PST, Ojan Vafai
no flags Details
Testcase that doesn't kill my browser (2.18 KB, text/html)
2012-10-26 17:05 PST, Simon Fraser (smfr)
no flags Details


Note

You need to log in before you can comment on or make changes to this bug.


Description From 2012-10-17 18:42:37 PST
Created an attachment (id=169322) [details]
test case

Here's a reduced test case from a Google app. It's actually pretty close to the original in terms of performance. I tried reducing it more, but it was hard to tell if I was removing important bits. There are at least a few bugs here (probably more):

1. The hover takes forever to kick in. This gets worse as you get lower on the page. Clearly part of this is that we iterate over every renderlayer until we find the one we need to hit. Should we be maintaining a quad-tree for the positions of RenderLayers so we can make this be O(log n) instead of O(n)?
2. Mousemove events fire multiple times while scrolling. Firefox only fires one mousemove at the end of the scroll animation. We clearly should do the same.
3. Probably due to both of the above, scrolling is super janky.

If you remove the fixed position div and have threaded composited scrolling enabled, then scrolling becomes very smooth, but you see a lot of checkerboard and it's now differently janky. :(
------- Comment #1 From 2012-10-17 20:18:25 PST -------
Lots of time updating layer positions, which is O(N^2) or worse.
------- Comment #2 From 2012-10-17 20:18:50 PST -------
This is not dissimilar to facebook pages BTW.
------- Comment #3 From 2012-10-18 10:58:34 PST -------
Created an attachment (id=169438) [details]
profile

Scrolled to the bottom then did little bit of hovering. Not surprisingly, 68% in RenderLayer::hitTestList.
------- Comment #4 From 2012-10-18 11:04:54 PST -------
Does the quad-tree approach seem reasonable to people or is there something else we should do? It obviously uses more memory, but it's roughly O(n) in the number of RenderLayers, which already are giant memory hogs. Only tricky bit I think is making sure we keep it updated as layers move around.

An alternative approach would be keeping track of which layers are currently visible. This sounds less promising and more complicated to me though. Also, this approach would also benefit from the quad-tree thing.
------- Comment #5 From 2012-10-21 10:52:13 PST -------
Filed bug 99940 for issue #2 (updating mouse events and hover state too frequently) so that this bug can focus on the hit testing performance.
------- Comment #6 From 2012-10-21 11:57:35 PST -------
I don't know what you consider a quad tree, but I have been working on designing an interval tree solution, and simply search and order the interval tree based on the longest dimension (usually height). Since a typical layout will have close to full width elements, making x-ordering much less useful.

An interval tree has O(log(n)+k) run-time, where k is the number of results, and it only uses O(n) memory. The problem with it though is that it takes at least O(n*log(n)^2) to build, and probably will take O(n^2) to build in a low overhead fashion. So part of such solution will be a heuristic for when to build the tree, so it doesn't get build when it isn't needed or when the content is changing fast.

Of course before we do that, please make sure you are throttleing mouse-events, there is no reason to do more than 60 per second, but a mouse often send 300 updates each second. And the idea about only doing a mouse over at the end sounds like a good idea, or alternatively, only send them at a lower pace.

All this shouldn't be that big a problem though. Browsers has been doing these hit tests forever on every single mouse move event. If it gets that slow now, something else is likely to be wrong.
------- Comment #7 From 2012-10-22 11:29:34 PST -------
Created an attachment (id=169947) [details]
profile of WebProcess with run-safari of r132006 in instruments, using mouse-wheel to scroll
------- Comment #8 From 2012-10-22 11:39:26 PST -------
(In reply to comment #7)
> Created an attachment (id=169947) [details] [details]
> profile of WebProcess with run-safari of r132006 in instruments, using mouse-wheel to scroll

Loading this trace seems to crash Instruments for me on 10.8.2. :(
------- Comment #9 From 2012-10-22 11:50:43 PST -------
That profile shows that hit testing is not the bottleneck; it's compositing stuff, and layer clip rect calculation etc.
------- Comment #10 From 2012-10-22 12:18:43 PST -------
I suspect the bottlenecks may be different in Chromium vs. Safari at this point.  We should probably split this bug.  It's possible after both ports fix their respective low-hanging fruit there are unified wins yet to discuss.
------- Comment #11 From 2012-10-22 14:55:01 PST -------
Created an attachment (id=169993) [details]
trace of chromium mac ToT w Instruments 4.5

Curiously, TestShell on Mac performs much better than Chromium. This trace is of Chromium's renderer process. No scrolling, just hovering over items at the end of the page.
------- Comment #12 From 2012-10-26 13:14:05 PST -------
Created an attachment (id=170988) [details]
another profile using sysprof (http://sysprof.com/) this time

Still shows hitTestLayer to be the hotspot. RefPtr doesn't show up here though. The difference might be that for the sysprof profile I already had the page loaded and scrolled to the bottom. I didn't do any scrolling, I just hovered over different items for ~20 seconds.
------- Comment #13 From 2012-10-26 13:14:45 PST -------
Created an attachment (id=170990) [details]
sysprof profile xml file
------- Comment #14 From 2012-10-26 13:57:46 PST -------
Created an attachment (id=170998) [details]
sysprof screenshot of chrome with compositing enabled

Very different trace. Still super slow.
------- Comment #15 From 2012-10-26 13:58:36 PST -------
Created an attachment (id=170999) [details]
sysprof xml file with compositing enabled
------- Comment #16 From 2012-10-26 14:02:24 PST -------
1:49 PM <esprehn_> ojan: it seems hitTestLayer on anything with a transform calls backgroundClipRect -> parentClipRects -> parent()->updateClipRects() and that walks up the whole layer tree to the root
1:50 PM <esprehn_> it seems like there should be some caching in there to prevent that being super slow
1:51 PM <smfr> esprehn_: we blow away caches when things go between composited and not
1:51 PM <smfr> which this pages does in the extreme
1:51 PM <esprehn_> oh, because the transform is on hover
1:51 PM <esprehn_> well that's certainly part of it then
1:51 PM <smfr> transforms also blow away caches
1:52 PM <smfr> transform changes, that is

:( Not sure where that leaves us in terms of fixing this.
------- Comment #17 From 2012-10-26 15:57:26 PST -------
James walked me through the compositing code. The culprit here is the following code in RenderLayerCompositor::computeCompositingRequirements:
    // Turn overlap testing off for later layers if it's already off, or if we have a 3D transform or an animating transform.
    if (!childState.m_testingOverlap || layer->has3DTransform() || isRunningAcceleratedTransformAnimation(layer->renderer()))
        compositingState.m_testingOverlap = false;

Since we don't know the bounds of the animation / transform, we disable all overlap testing and recomposite the world. I confirmed that commenting out these two lines makes this test case fast.

Some ideas from James as to how to fix this:
1. compute tighter bounds on the layer with the animation and feed that into the overlap test
2. composite all of these potentially-overlapping things into one composited layer instead of N
3. make computing all of this faster

These all sound hard. :) 

Thinking about 3, in the Chromium composited profile:
-20% is spent in TilingData::tileRect. That function seems pretty fast though, so presumably it's just getting called a ton.
-30% is spent in RenderLayerBacking::~RenderLayerBacking, which spends most of its time in RenderLayerBacking::updateClippingLayers.
------- Comment #18 From 2012-10-26 17:05:45 PST -------
Created an attachment (id=171052) [details]
Testcase that doesn't kill my browser
------- Comment #19 From 2012-10-26 17:09:15 PST -------
(In reply to comment #17)
> James walked me through the compositing code. The culprit here is the following code in RenderLayerCompositor::computeCompositingRequirements:
>     // Turn overlap testing off for later layers if it's already off, or if we have a 3D transform or an animating transform.
>     if (!childState.m_testingOverlap || layer->has3DTransform() || isRunningAcceleratedTransformAnimation(layer->renderer()))
>         compositingState.m_testingOverlap = false;
> 
> Since we don't know the bounds of the animation / transform, we disable all overlap testing and recomposite the world. I confirmed that commenting out these two lines makes this test case fast.
> 
> Some ideas from James as to how to fix this:
> 1. compute tighter bounds on the layer with the animation and feed that into the overlap test

In theory we could look at the animation endpoints (and keyframes if any) and compute a bounding box for all steps of the animation. That would certainly help.

> 2. composite all of these potentially-overlapping things into one composited layer instead of N

Do to this you'd have to figure out that doing so doesn't break the visual ordering. Also, it's not clear that it would help performance here.

> 3. make computing all of this faster

That would help in general.

> These all sound hard. :) 
> 
> Thinking about 3, in the Chromium composited profile:
> -20% is spent in TilingData::tileRect. That function seems pretty fast though, so presumably it's just getting called a ton.
> -30% is spent in RenderLayerBacking::~RenderLayerBacking, which spends most of its time in RenderLayerBacking::updateClippingLayers.