Bug 99669

Summary: hover, layout and scrolling super slow
Product: WebKit Reporter: Ojan Vafai <ojan>
Component: Layout and RenderingAssignee: Nobody <webkit-unassigned>
Status: NEW ---    
Severity: Normal CC: allan.jensen, anilsson, arv, bdakin, eae, efidler, enne, eric, esprehn, gmak, jamesr, jchaffraix, kling, koivisto, laszlo.gombos, leviw, mikelawther, mlattanzio, pdr, rniwa, simon.fraser, staikos, tdanderson, tonikitoo
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 99940    
Bug Blocks:    
Attachments:
Description Flags
test case
none
profile
none
profile of WebProcess with run-safari of r132006 in instruments, using mouse-wheel to scroll
none
trace of chromium mac ToT w Instruments 4.5
none
another profile using sysprof (http://sysprof.com/) this time
none
sysprof profile xml file
none
sysprof screenshot of chrome with compositing enabled
none
sysprof xml file with compositing enabled
none
Testcase that doesn't kill my browser none

Description Ojan Vafai 2012-10-17 18:42:37 PDT
Created attachment 169322 [details]
test case

Here's a reduced test case from a Google app. It's actually pretty close to the original in terms of performance. I tried reducing it more, but it was hard to tell if I was removing important bits. There are at least a few bugs here (probably more):

1. The hover takes forever to kick in. This gets worse as you get lower on the page. Clearly part of this is that we iterate over every renderlayer until we find the one we need to hit. Should we be maintaining a quad-tree for the positions of RenderLayers so we can make this be O(log n) instead of O(n)?
2. Mousemove events fire multiple times while scrolling. Firefox only fires one mousemove at the end of the scroll animation. We clearly should do the same.
3. Probably due to both of the above, scrolling is super janky.

If you remove the fixed position div and have threaded composited scrolling enabled, then scrolling becomes very smooth, but you see a lot of checkerboard and it's now differently janky. :(
Comment 1 Simon Fraser (smfr) 2012-10-17 20:18:25 PDT
Lots of time updating layer positions, which is O(N^2) or worse.
Comment 2 Simon Fraser (smfr) 2012-10-17 20:18:50 PDT
This is not dissimilar to facebook pages BTW.
Comment 3 Ojan Vafai 2012-10-18 10:58:34 PDT
Created attachment 169438 [details]
profile

Scrolled to the bottom then did little bit of hovering. Not surprisingly, 68% in RenderLayer::hitTestList.
Comment 4 Ojan Vafai 2012-10-18 11:04:54 PDT
Does the quad-tree approach seem reasonable to people or is there something else we should do? It obviously uses more memory, but it's roughly O(n) in the number of RenderLayers, which already are giant memory hogs. Only tricky bit I think is making sure we keep it updated as layers move around.

An alternative approach would be keeping track of which layers are currently visible. This sounds less promising and more complicated to me though. Also, this approach would also benefit from the quad-tree thing.
Comment 5 Ojan Vafai 2012-10-21 10:52:13 PDT
Filed bug 99940 for issue #2 (updating mouse events and hover state too frequently) so that this bug can focus on the hit testing performance.
Comment 6 Allan Sandfeld Jensen 2012-10-21 11:57:35 PDT
I don't know what you consider a quad tree, but I have been working on designing an interval tree solution, and simply search and order the interval tree based on the longest dimension (usually height). Since a typical layout will have close to full width elements, making x-ordering much less useful.

An interval tree has O(log(n)+k) run-time, where k is the number of results, and it only uses O(n) memory. The problem with it though is that it takes at least O(n*log(n)^2) to build, and probably will take O(n^2) to build in a low overhead fashion. So part of such solution will be a heuristic for when to build the tree, so it doesn't get build when it isn't needed or when the content is changing fast.

Of course before we do that, please make sure you are throttleing mouse-events, there is no reason to do more than 60 per second, but a mouse often send 300 updates each second. And the idea about only doing a mouse over at the end sounds like a good idea, or alternatively, only send them at a lower pace.

All this shouldn't be that big a problem though. Browsers has been doing these hit tests forever on every single mouse move event. If it gets that slow now, something else is likely to be wrong.
Comment 7 Eric Seidel (no email) 2012-10-22 11:29:34 PDT
Created attachment 169947 [details]
profile of WebProcess with run-safari of r132006 in instruments, using mouse-wheel to scroll
Comment 8 Ojan Vafai 2012-10-22 11:39:26 PDT
(In reply to comment #7)
> Created an attachment (id=169947) [details]
> profile of WebProcess with run-safari of r132006 in instruments, using mouse-wheel to scroll

Loading this trace seems to crash Instruments for me on 10.8.2. :(
Comment 9 Simon Fraser (smfr) 2012-10-22 11:50:43 PDT
That profile shows that hit testing is not the bottleneck; it's compositing stuff, and layer clip rect calculation etc.
Comment 10 Eric Seidel (no email) 2012-10-22 12:18:43 PDT
I suspect the bottlenecks may be different in Chromium vs. Safari at this point.  We should probably split this bug.  It's possible after both ports fix their respective low-hanging fruit there are unified wins yet to discuss.
Comment 11 Ojan Vafai 2012-10-22 14:55:01 PDT
Created attachment 169993 [details]
trace of chromium mac ToT w Instruments 4.5

Curiously, TestShell on Mac performs much better than Chromium. This trace is of Chromium's renderer process. No scrolling, just hovering over items at the end of the page.
Comment 12 Ojan Vafai 2012-10-26 13:14:05 PDT
Created attachment 170988 [details]
another profile using sysprof (http://sysprof.com/) this time

Still shows hitTestLayer to be the hotspot. RefPtr doesn't show up here though. The difference might be that for the sysprof profile I already had the page loaded and scrolled to the bottom. I didn't do any scrolling, I just hovered over different items for ~20 seconds.
Comment 13 Ojan Vafai 2012-10-26 13:14:45 PDT
Created attachment 170990 [details]
sysprof profile xml file
Comment 14 Ojan Vafai 2012-10-26 13:57:46 PDT
Created attachment 170998 [details]
sysprof screenshot of chrome with compositing enabled

Very different trace. Still super slow.
Comment 15 Ojan Vafai 2012-10-26 13:58:36 PDT
Created attachment 170999 [details]
sysprof xml file with compositing enabled
Comment 16 Ojan Vafai 2012-10-26 14:02:24 PDT
1:49 PM <esprehn_> ojan: it seems hitTestLayer on anything with a transform calls backgroundClipRect -> parentClipRects -> parent()->updateClipRects() and that walks up the whole layer tree to the root
1:50 PM <esprehn_> it seems like there should be some caching in there to prevent that being super slow
1:51 PM <smfr> esprehn_: we blow away caches when things go between composited and not
1:51 PM <smfr> which this pages does in the extreme
1:51 PM <esprehn_> oh, because the transform is on hover
1:51 PM <esprehn_> well that's certainly part of it then
1:51 PM <smfr> transforms also blow away caches
1:52 PM <smfr> transform changes, that is

:( Not sure where that leaves us in terms of fixing this.
Comment 17 Ojan Vafai 2012-10-26 15:57:26 PDT
James walked me through the compositing code. The culprit here is the following code in RenderLayerCompositor::computeCompositingRequirements:
    // Turn overlap testing off for later layers if it's already off, or if we have a 3D transform or an animating transform.
    if (!childState.m_testingOverlap || layer->has3DTransform() || isRunningAcceleratedTransformAnimation(layer->renderer()))
        compositingState.m_testingOverlap = false;

Since we don't know the bounds of the animation / transform, we disable all overlap testing and recomposite the world. I confirmed that commenting out these two lines makes this test case fast.

Some ideas from James as to how to fix this:
1. compute tighter bounds on the layer with the animation and feed that into the overlap test
2. composite all of these potentially-overlapping things into one composited layer instead of N
3. make computing all of this faster

These all sound hard. :) 

Thinking about 3, in the Chromium composited profile:
-20% is spent in TilingData::tileRect. That function seems pretty fast though, so presumably it's just getting called a ton.
-30% is spent in RenderLayerBacking::~RenderLayerBacking, which spends most of its time in RenderLayerBacking::updateClippingLayers.
Comment 18 Simon Fraser (smfr) 2012-10-26 17:05:45 PDT
Created attachment 171052 [details]
Testcase that doesn't kill my browser
Comment 19 Simon Fraser (smfr) 2012-10-26 17:09:15 PDT
(In reply to comment #17)
> James walked me through the compositing code. The culprit here is the following code in RenderLayerCompositor::computeCompositingRequirements:
>     // Turn overlap testing off for later layers if it's already off, or if we have a 3D transform or an animating transform.
>     if (!childState.m_testingOverlap || layer->has3DTransform() || isRunningAcceleratedTransformAnimation(layer->renderer()))
>         compositingState.m_testingOverlap = false;
> 
> Since we don't know the bounds of the animation / transform, we disable all overlap testing and recomposite the world. I confirmed that commenting out these two lines makes this test case fast.
> 
> Some ideas from James as to how to fix this:
> 1. compute tighter bounds on the layer with the animation and feed that into the overlap test

In theory we could look at the animation endpoints (and keyframes if any) and compute a bounding box for all steps of the animation. That would certainly help.

> 2. composite all of these potentially-overlapping things into one composited layer instead of N

Do to this you'd have to figure out that doing so doesn't break the visual ordering. Also, it's not clear that it would help performance here.

> 3. make computing all of this faster

That would help in general.

> These all sound hard. :) 
> 
> Thinking about 3, in the Chromium composited profile:
> -20% is spent in TilingData::tileRect. That function seems pretty fast though, so presumably it's just getting called a ton.
> -30% is spent in RenderLayerBacking::~RenderLayerBacking, which spends most of its time in RenderLayerBacking::updateClippingLayers.