| Summary: | hover, layout and scrolling super slow | | |
| --- | --- | --- | --- |
| Product: | WebKit | Reporter: | Ojan Vafai <ojan> |
| Component: | Layout and Rendering | Assignee: | Nobody <webkit-unassigned> |
| Status: | NEW | | |
| Severity: | Normal | CC: | allan.jensen, anilsson, arv, bdakin, eae, efidler, enne, eric, esprehn, gmak, jamesr, jchaffraix, kling, koivisto, laszlo.gombos, leviw, mikelawther, mlattanzio, pdr, rniwa, simon.fraser, staikos, tdanderson, tonikitoo |
| Priority: | P2 | | |
| Version: | 528+ (Nightly build) | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Bug Depends on: | 99940 | | |
| Bug Blocks: | | | |
| Attachments: | | | |
Lots of time is spent updating layer positions, which is O(N^2) or worse. This is not dissimilar to Facebook pages, BTW.

Created attachment 169438 [details]
profile

Scrolled to the bottom, then did a little bit of hovering. Not surprisingly, 68% of the time is in RenderLayer::hitTestList.
Does the quad-tree approach seem reasonable to people, or is there something else we should do? It obviously uses more memory, but it's roughly O(n) in the number of RenderLayers, which are already giant memory hogs. The only tricky bit, I think, is making sure we keep it updated as layers move around.

An alternative approach would be keeping track of which layers are currently visible. That sounds less promising and more complicated to me, though, and it would also benefit from the quad-tree.

Filed bug 99940 for issue #2 (updating mouse events and hover state too frequently) so that this bug can focus on hit-testing performance.

I don't know what you consider a quad tree, but I have been working on designing an interval-tree solution: search and order the interval tree based on the longest dimension (usually height), since a typical layout has close to full-width elements, which makes x-ordering much less useful. An interval tree has O(log(n) + k) query time, where k is the number of results, and it only uses O(n) memory. The problem is that it takes at least O(n*log(n)^2) to build, and probably O(n^2) to build in a low-overhead fashion. So part of such a solution would be a heuristic for when to build the tree, so that it doesn't get built when it isn't needed or when the content is changing quickly.

Of course, before we do that, please make sure you are throttling mouse events; there is no reason to process more than 60 per second, but a mouse often sends 300 updates per second. The idea of only dispatching mouseover at the end sounds good too, or alternatively, send them at a lower rate.

All this shouldn't be that big a problem, though. Browsers have been doing these hit tests forever, on every single mouse-move event. If it gets this slow now, something else is likely to be wrong.
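The throttling suggestion above can be sketched roughly as follows. This is only a minimal illustration, not WebKit code; the names (MouseMoveThrottler, shouldDispatch) are hypothetical, and real event coalescing would also want to remember and replay the last dropped position.

```cpp
#include <cstdint>

// Hypothetical sketch: forward a mouse-move event only if at least ~16 ms
// (one 60 Hz frame) has elapsed since the last event we forwarded, so a
// device sending ~300 updates/sec triggers at most ~60 hit tests/sec.
class MouseMoveThrottler {
public:
    explicit MouseMoveThrottler(int64_t minIntervalMs = 16)
        : m_minIntervalMs(minIntervalMs)
        , m_lastDispatchMs(-minIntervalMs) { }

    // Returns true if the event with this timestamp should be dispatched.
    bool shouldDispatch(int64_t timestampMs)
    {
        if (timestampMs - m_lastDispatchMs < m_minIntervalMs)
            return false; // drop (or coalesce) this event
        m_lastDispatchMs = timestampMs;
        return true;
    }

private:
    int64_t m_minIntervalMs;
    int64_t m_lastDispatchMs;
};
```

With events arriving every ~3 ms, this passes roughly one in six through, which is the 300-to-60 reduction suggested above.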
Created attachment 169947 [details]
profile of WebProcess with run-safari of r132006 in Instruments, using mouse-wheel to scroll

(In reply to comment #7)
> Created an attachment (id=169947) [details]
> profile of WebProcess with run-safari of r132006 in instruments, using mouse-wheel to scroll

Loading this trace seems to crash Instruments for me on 10.8.2. :(

That profile shows that hit testing is not the bottleneck; it's compositing stuff, layer clip rect calculation, etc. I suspect the bottlenecks may be different in Chromium vs. Safari at this point. We should probably split this bug. It's possible that after both ports fix their respective low-hanging fruit, there are unified wins yet to discuss.

Created attachment 169993 [details]
trace of chromium mac ToT w Instruments 4.5
Curiously, TestShell on Mac performs much better than Chromium. This trace is of Chromium's renderer process. No scrolling, just hovering over items at the end of the page.
Created attachment 170988 [details]
another profile, using sysprof (http://sysprof.com/) this time

Still shows hitTestLayer to be the hotspot. RefPtr doesn't show up here, though. The difference might be that for the sysprof profile I already had the page loaded and scrolled to the bottom. I didn't do any scrolling; I just hovered over different items for ~20 seconds.

Created attachment 170990 [details]
sysprof profile xml file
Created attachment 170998 [details]
sysprof screenshot of chrome with compositing enabled
Very different trace. Still super slow.
Created attachment 170999 [details]
sysprof xml file with compositing enabled
1:49 PM <esprehn_> ojan: it seems hitTestLayer on anything with a transform calls backgroundClipRect -> parentClipRects -> parent()->updateClipRects() and that walks up the whole layer tree to the root
1:50 PM <esprehn_> it seems like there should be some caching in there to prevent that being super slow
1:51 PM <smfr> esprehn_: we blow away caches when things go between composited and not
1:51 PM <smfr> which this page does in the extreme
1:51 PM <esprehn_> oh, because the transform is on hover
1:51 PM <esprehn_> well that's certainly part of it then
1:51 PM <smfr> transforms also blow away caches
1:52 PM <smfr> transform changes, that is

:( Not sure where that leaves us in terms of fixing this.

James walked me through the compositing code. The culprit here is the following code in RenderLayerCompositor::computeCompositingRequirements:

    // Turn overlap testing off for later layers if it's already off, or if we have a 3D transform or an animating transform.
    if (!childState.m_testingOverlap || layer->has3DTransform() || isRunningAcceleratedTransformAnimation(layer->renderer()))
        compositingState.m_testingOverlap = false;

Since we don't know the bounds of the animation / transform, we disable all overlap testing and recomposite the world. I confirmed that commenting out these two lines makes this test case fast.

Some ideas from James as to how to fix this:
1. Compute tighter bounds on the layer with the animation and feed that into the overlap test.
2. Composite all of these potentially-overlapping things into one composited layer instead of N.
3. Make computing all of this faster.

These all sound hard. :)

Thinking about 3, in the Chromium composited profile:
- 20% is spent in TilingData::tileRect. That function seems pretty fast, though, so presumably it's just getting called a ton.
- 30% is spent in RenderLayerBacking::~RenderLayerBacking, which spends most of its time in RenderLayerBacking::updateClippingLayers.

Created attachment 171052 [details]
Testcase that doesn't kill my browser
(In reply to comment #17)
> Since we don't know the bounds of the animation / transform, we disable all overlap testing and recomposite the world. I confirmed that commenting out these two lines makes this test case fast.
>
> Some ideas from James as to how to fix this:
> 1. compute tighter bounds on the layer with the animation and feed that into the overlap test

In theory we could look at the animation endpoints (and keyframes, if any) and compute a bounding box for all steps of the animation. That would certainly help.

> 2. composite all of these potentially-overlapping things into one composited layer instead of N

To do this you'd have to figure out that doing so doesn't break the visual ordering. Also, it's not clear that it would help performance here.

> 3. make computing all of this faster

That would help in general.
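The bounding-box idea above can be sketched as follows: union the layer's bounds at every keyframe (endpoints included) and feed that conservative rectangle into the overlap test, instead of disabling overlap testing outright. This is an illustration only; Rect, unionRect, and animationBounds are hypothetical names, not WebKit API.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for an integer layer-bounds rectangle.
struct Rect {
    int x, y, width, height;
};

// Smallest rectangle covering both inputs.
Rect unionRect(const Rect& a, const Rect& b)
{
    int left = std::min(a.x, b.x);
    int top = std::min(a.y, b.y);
    int right = std::max(a.x + a.width, b.x + b.width);
    int bottom = std::max(a.y + a.height, b.y + b.height);
    return { left, top, right - left, bottom - top };
}

// Conservative bounds for the whole animation: the union of the layer's
// bounds at each keyframe. For affine transforms, every interpolated step
// of a translation lies inside the union of the endpoint bounds; other
// transforms would need per-property handling, which is elided here.
Rect animationBounds(const std::vector<Rect>& keyframeBounds)
{
    Rect result = keyframeBounds.front();
    for (const Rect& r : keyframeBounds)
        result = unionRect(result, r);
    return result;
}
```

The overlap map could then treat the animating layer as occupying this one rectangle for the duration of the animation, keeping m_testingOverlap enabled for later layers.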
Created attachment 169322 [details]
test case

Here's a reduced test case from a Google app. It's actually pretty close to the original in terms of performance. I tried reducing it more, but it was hard to tell whether I was removing important bits.

There are at least a few bugs here (probably more):
1. The hover takes forever to kick in, and this gets worse as you get lower on the page. Clearly, part of this is that we iterate over every RenderLayer until we find the one we need to hit. Should we be maintaining a quad-tree of RenderLayer positions so we can make this O(log n) instead of O(n)?
2. Mousemove events fire multiple times while scrolling. Firefox only fires one mousemove, at the end of the scroll animation. We should clearly do the same.
3. Probably due to both of the above, scrolling is super janky. If you remove the fixed-position div and have threaded composited scrolling enabled, scrolling becomes very smooth, but you see a lot of checkerboarding and it's now differently janky. :(
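The quad-tree idea in point 1 can be sketched as below: store each layer's absolute bounds in a tree that recursively subdivides space, so a point hit test only visits nodes whose region contains the point instead of scanning every layer. This is a minimal illustration with assumed names (QuadTree, Rect); real RenderLayer hit testing would still have to resolve z-order / paint order among the few returned candidates, and the tree would need updating as layers move.

```cpp
#include <memory>
#include <vector>

struct Rect {
    int x, y, w, h;
    bool contains(int px, int py) const
    {
        return px >= x && px < x + w && py >= y && py < y + h;
    }
    bool intersects(const Rect& o) const
    {
        return x < o.x + o.w && o.x < x + w && y < o.y + o.h && o.y < y + h;
    }
};

class QuadTree {
public:
    explicit QuadTree(Rect bounds, size_t capacity = 4)
        : m_bounds(bounds), m_capacity(capacity) { }

    void insert(int id, const Rect& r)
    {
        if (!m_bounds.intersects(r))
            return;
        if (m_items.size() < m_capacity || m_bounds.w <= 1 || m_bounds.h <= 1) {
            m_items.push_back({ id, r });
            return;
        }
        if (m_children.empty())
            subdivide();
        // A rect spanning several quadrants is stored in every child it touches;
        // a point query still reports it once, since the point lies in one child.
        for (auto& child : m_children)
            child->insert(id, r);
    }

    // Collect ids of all stored rects containing the point.
    void query(int px, int py, std::vector<int>& out) const
    {
        if (!m_bounds.contains(px, py))
            return;
        for (const auto& item : m_items) {
            if (item.rect.contains(px, py))
                out.push_back(item.id);
        }
        for (const auto& child : m_children)
            child->query(px, py, out);
    }

private:
    struct Item { int id; Rect rect; };

    void subdivide()
    {
        int hw = m_bounds.w / 2, hh = m_bounds.h / 2;
        int x = m_bounds.x, y = m_bounds.y;
        m_children.push_back(std::make_unique<QuadTree>(Rect { x, y, hw, hh }, m_capacity));
        m_children.push_back(std::make_unique<QuadTree>(Rect { x + hw, y, m_bounds.w - hw, hh }, m_capacity));
        m_children.push_back(std::make_unique<QuadTree>(Rect { x, y + hh, hw, m_bounds.h - hh }, m_capacity));
        m_children.push_back(std::make_unique<QuadTree>(Rect { x + hw, y + hh, m_bounds.w - hw, m_bounds.h - hh }, m_capacity));
    }

    Rect m_bounds;
    size_t m_capacity;
    std::vector<Item> m_items;
    std::vector<std::unique_ptr<QuadTree>> m_children;
};
```

The memory trade-off mentioned earlier shows up here: duplicating spanning rects across children costs extra entries, but queries stay near O(log n + k) for reasonably distributed layers.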