RESOLVED FIXED 33520
Mapping from local to container coord space in updateLayerPositions is quadratic
https://bugs.webkit.org/show_bug.cgi?id=33520
Summary Mapping from local to container coord space in updateLayerPositions is quadratic
James Robinson
Reported 2010-01-11 23:37:35 PST
The attached test pages show a huge amount of time spend mapping from local to container coordinate space, mainly in the functions: WebCore::RenderLayer::convertToLayerCoords() WebCore::RenderBox::mapLocalToContainer() WebCore::RenderBox::computeRectForRepaint() WebCore::TransformationMatrix::multLeft() [on a page with -webkit-transform styles] The test pages feature a <span> inside 2048 nested <div>s that are either position:relative or have a transform applied, so they each have a RenderLayer associated with them. The bounds of each of these <div>s is calculated in order to hit test and in order to calculate the repaint bounds of the layer. This is calculated by starting at the target renderer and walking up to each container and applying the appropriate transform recursively. There's no caching of this calculation, so each transform is applied (depth of DOM)^2 times. This affects any page that has any number of elements that have layers and have children (i.e. most all of them).
Attachments
Nested -webkit-transform elements (713 bytes, text/html)
2010-01-11 23:38 PST, James Robinson
no flags
Profile of the position:relative test page (postscript) (82.01 KB, application/octet-stream)
2010-01-11 23:43 PST, James Robinson
no flags
Profile of the -webkit-transform test page (postscript) (86.11 KB, application/octet-stream)
2010-01-11 23:44 PST, James Robinson
no flags
Test page with nested position:relative elements (710 bytes, text/html)
2010-01-12 12:56 PST, James Robinson
no flags
Patch (4.32 KB, patch)
2010-02-05 12:28 PST, James Robinson
no flags
Patch (4.20 KB, patch)
2010-02-05 13:45 PST, James Robinson
no flags
Patch (4.15 KB, patch)
2010-02-05 14:00 PST, James Robinson
no flags
Patch (6.43 KB, patch)
2010-03-04 13:27 PST, James Robinson
no flags
Patch (10.93 KB, patch)
2010-03-10 18:38 PST, James Robinson
no flags
fix ChangeLog (10.85 KB, patch)
2010-03-11 12:35 PST, James Robinson
no flags
Patch (11.11 KB, patch)
2010-03-16 12:23 PDT, James Robinson
no flags
Patch (10.91 KB, patch)
2010-03-17 14:59 PDT, James Robinson
no flags
fix RenderSVGModelObject signature to match (10.95 KB, patch)
2010-03-17 17:41 PDT, James Robinson
no flags
add some tests (20.90 KB, patch)
2010-03-31 12:53 PDT, James Robinson
simon.fraser: review+
James Robinson
Comment 1 2010-01-11 23:38:12 PST
Created attachment 46338 [details] Nested -webkit-transform elements
James Robinson
Comment 2 2010-01-11 23:43:25 PST
Created attachment 46340 [details] Profile of the position:relative test page (postscript) Here are the top few hot functions (as reported by pprof, tested using the 'test_shell' thin WebView wrapper from Chromium on a 32bit linux system): (pprof) top20 Total: 2571 samples 877 34.1% 34.1% 877 34.1% WebCore::RenderLayer::convertToLayerCoords 329 12.8% 46.9% 1223 47.6% WebCore::RenderBox::mapLocalToContainer 275 10.7% 57.6% 369 14.4% WebCore::RenderBox::computeRectForRepaint 134 5.2% 62.8% 161 6.3% WebCore::RenderObject::containingBlock 92 3.6% 66.4% 618 24.0% WebCore::RenderBoxModelObject::relativePositionOffsetX 89 3.5% 69.9% 758 29.5% WebCore::RenderBox::offsetFromContainer 81 3.2% 73.0% 303 11.8% WebCore::RenderBlock::availableWidth 78 3.0% 76.0% 78 3.0% WebCore::RenderObject::offsetFromContainer 74 2.9% 78.9% 74 2.9% WebCore::RenderBoxModelObject::borderLeft 72 2.8% 81.7% 72 2.8% WebCore::RenderObject::isRooted 69 2.7% 84.4% 162 6.3% WebCore::RenderBox::clientWidth 38 1.5% 85.9% 38 1.5% WebCore::TransformState::move 37 1.4% 87.3% 37 1.4% WebCore::RenderBoxModelObject::relativePositionOffsetY 36 1.4% 88.7% 40 1.6% WebCore::RenderBlock::isBlockFlow 32 1.2% 90.0% 32 1.2% WebCore::RenderBoxModelObject::paddingRight 29 1.1% 91.1% 29 1.1% WebCore::RenderBoxModelObject::paddingLeft 28 1.1% 92.2% 33 1.3% WebCore::RenderObject::enclosingLayer 26 1.0% 93.2% 26 1.0% WebCore::RenderObject::isHR 25 1.0% 94.2% 27 1.1% WebCore::RenderObject::enclosingBox 23 0.9% 95.1% 110 4.3% WebCore::RenderLayer::updateLayerPositions 10 0.4% 95.4% 10 0.4% WebCore::RenderBox::verticalScrollbarWidth Information about the profiler is here: http://goog-perftools.sourceforge.net/doc/cpu_profiler.html. The important column are the first (number of samples recorded inside this function), second (% of total time recorded within this function) and fourth (% of time spend in this function and its callees).
James Robinson
Comment 3 2010-01-11 23:44:36 PST
Created attachment 46341 [details] Profile of the -webkit-transform test page (postscript) Hot functions: (pprof) top20 Total: 2362 samples 496 21.0% 21.0% 496 21.0% WebCore::TransformationMatrix::multLeft 211 8.9% 29.9% 1410 59.7% WebCore::RenderBox::mapLocalToContainer 162 6.9% 36.8% 162 6.9% __floorf 157 6.6% 43.4% 310 13.1% WebCore::RenderObject::localToAbsolute 153 6.5% 49.9% 153 6.5% WebCore::TransformationMatrix::translate 127 5.4% 55.3% 127 5.4% __ceilf 117 5.0% 60.2% 117 5.0% WebCore::TransformationMatrix::mapQuad 110 4.7% 64.9% 678 28.7% WebCore::RenderBox::computeRectForRepaint 97 4.1% 69.0% 97 4.1% WebCore::TransformationMatrix::mapPoint 85 3.6% 72.6% 514 21.8% WebCore::TransformationMatrix::mapRect 82 3.5% 76.1% 82 3.5% WebCore::enclosingIntRect 50 2.1% 78.2% 88 3.7% WebCore::RenderLayer::createLocalTransformState 41 1.7% 79.9% 41 1.7% WebCore::RenderLayer::convertToLayerCoords 37 1.6% 81.5% 154 6.5% WebCore::TransformState::flattenWithTransform 35 1.5% 83.0% 35 1.5% WebCore::RenderObject::offsetFromContainer 34 1.4% 84.4% 34 1.4% WebCore::RenderObject::isRooted 29 1.2% 85.6% 40 1.7% WebCore::RenderBox::offsetFromContainer 29 1.2% 86.9% 551 23.3% WebCore::RenderObject::getTransformFromContainer 27 1.1% 88.0% 58 2.5% WebCore::FloatRect::FloatRect 23 1.0% 89.0% 23 1.0% WebCore::FloatPoint::FloatPoint 21 0.9% 89.9% 293 12.4% WebCore::TransformState::applyTransform
James Robinson
Comment 4 2010-01-12 00:10:12 PST
The cleanest solution to this would be to keep a transformation matrix around while iterating into the render tree that translates from the current local coordinate system into the coordinate space of the current repaint container. That way the transformation only has to be updated once for every container, rather than having to walk up to the root from each RenderObject. This would probably require a bit of restructuring. Additionally, it assumes that all transforms are affine, which I think they are. It'd be easier to implement if all transforms are invertible but I do not think this is the case.
Eric Seidel (no email)
Comment 5 2010-01-12 04:47:08 PST
Keeping the current transform as we traverse has precedent with things like LayoutState: http://trac.webkit.org/browser/trunk/WebCore/rendering/LayoutState.h (Only related to this discussion as an example of keeping side-state around during a rendering tree traversal.) Were I to re-write the rendering tree, I would change all classes to deal with IntPoints in the local coordinate system (content box) instead of the current system where your x/y are passes as separate integers and are relative to your containing block. Then again, that's mostly orthogonal to this issue.
Simon Fraser (smfr)
Comment 6 2010-01-12 08:12:48 PST
Ah, so you do have lots of transforms. I'll take a look and see if this can be improved.
James Robinson
Comment 7 2010-01-12 12:56:59 PST
Created attachment 46392 [details] Test page with nested position:relative elements The same test case except with position:relative elements instead of -webkit-transform. The same basic issue happens here - this isn't a bug just with -webkit-transform. I thought I had attached this test case when creating the bug but must have forgotten.
James Robinson
Comment 8 2010-01-13 17:15:44 PST
There's a similar issue with computeRectForRepaint(): it's called from updateLayerPositions(), which traverses from the root to child layers, and it traverses up to the root element applying transforms at each step. I would guess that the cleanest way to fix these issues would be to make LayoutState transform-aware and reuse it when calling updateLayerPositions() to avoid having to redo transforms from local to repaint container coordinates.
Simon Fraser (smfr)
Comment 9 2010-01-13 17:20:59 PST
Why doesn't the testcase with just position: relative use the layout-state code path?
James Robinson
Comment 10 2010-01-13 17:40:53 PST
LayoutState is used in the position:relative case, but it doesn't help much. On that page the slow mapping calls are made through a few code paths: - RenderLayer::updateLayerPositions() calls RenderLayer::convertToLayerCoords() at each layer without passing any accumulated state in. This means at each layer it still has to do a walk up every ancestor layer and recalculate the offsets. - RenderLayer::updateLayerPositions() calls RenderBox::outlineBoundsForRepaint() which calls RenderObject::localToContainerQuad() which calls RenderBox::mapLocalToContainer(), which walks up each ancestor containing renderer. - RenderLayer::hitTestLayer() which descends through each layer and calls RenderLayer::calculateRects() which calls RenderLayer::convertToLayerCoords() on each layer. No accumulated state is stored here, so at each layer a walk is made back up to the root converting to the parent's coordinate system. During layout 'proper' (RenderView::layout() and children), the layout state is used to optimize RenderBox::computeRectForRepaint(). In the post-layout updateLayerPositions() call (http://trac.webkit.org/browser/trunk/WebCore/page//FrameView.cpp#L700), the layout state is never set. However RenderBox::computeRectForRepaint() is still called via RenderLayer::updateLayerPositions() which calls RenderBox::clippedOverflowRectForRepaint() which calls RenderBox::computeRectForRepaint(). This means it goes slowpath, and since it's called unconditionally on every layer it goes slowpath for every layer in the document.
James Robinson
Comment 11 2010-01-13 17:47:02 PST
FYI, RenderLayer::updateLayerPositions() has this: // FIXME: Optimize using LayoutState and remove the disableLayoutState() call // from updateScrollInfoAfterLayout(). ASSERT(!view->layoutStateEnabled()); The comment has existed since LayoutState was added. I'm not sure exactly what addressing the FIXME would entail but it would definitely allow computeRectForRepaint() and mapLocalToContainer() to use their fastpaths within RenderLayer::updateLayerPositions(). RenderLayer::convertToLayerCoords() is not aware of layout state and would still be slow.
James Robinson
Comment 12 2010-02-01 17:47:19 PST
I think the solution to this is to create a more general version of LayoutState that stores an accumulated delta from the root (essentially just the m_layoutDelta field) and to ensure that all code that recurses down the RenderLayer tree accumulates transforms on this field. This means hitTestLayer(), layout(), and updateLayerPositions. In the short term it would be easiest to have this fall back to slow path in the presence of transforms (just like LayoutState does currently). Does this sound right? I'll hack on it some and see what I can get.
Simon Fraser (smfr)
Comment 13 2010-02-01 18:19:59 PST
On a related note, we've talked about removing all the geometry mapping code (convertToLayerCoords()) in the layer tree and just doing it via renderers. Ideally the layers would just hang off of renders, and not store their own x,y. This would allow us to use LayoutState more of the time. I'm not sure what the perf impact would be. One of the issues with convertToLayerCoords() is that it's used to map to some ancestor at times when you're not necessarily walking from the root, so you may not have all of the state available.
James Robinson
Comment 14 2010-02-01 18:35:32 PST
Yes, I've seen comments to that effect of doing all coordinate transforms via RenderObjects throughout the code. Is there a bug or a set of patches tracking that code? I think it would simplify things. convertToLayerCoords() seems to be a slow spot in cases where the code is doing a recursive walk from some 'root' through children. What's needed for these is a mechanism like LayoutState that maintained both: - some 'root' for the current walk - a transform from the current local coordinate space to that of the 'root' I don't think there are many spots in the codebase that call convertToLayerCoords() with different ancestorLayers willy-nilly, this seems to be only done when recursing a subtree using a different 'root'. If there are, they will show up as hotspots pretty quickly.
Simon Fraser (smfr)
Comment 15 2010-02-01 21:31:21 PST
(In reply to comment #14) > Yes, I've seen comments to that effect of doing all coordinate transforms via > RenderObjects throughout the code. Is there a bug or a set of patches tracking > that code? No, it is not being worked on. > convertToLayerCoords() seems to be a slow spot in cases where the code is doing > a recursive walk from some 'root' through children. What's needed for these is > a mechanism like LayoutState that maintained both: > > - some 'root' for the current walk > - a transform from the current local coordinate space to that of the 'root' Right. > I don't think there are many spots in the codebase that call > convertToLayerCoords() with different ancestorLayers willy-nilly, this seems to > be only done when recursing a subtree using a different 'root'. If there are, > they will show up as hotspots pretty quickly. I think that's right.
James Robinson
Comment 16 2010-02-02 14:36:51 PST
(In reply to comment #15) > (In reply to comment #14) > > Yes, I've seen comments to that effect of doing all coordinate transforms via > > RenderObjects throughout the code. Is there a bug or a set of patches tracking > > that code? > > No, it is not being worked on. OK. I don't think I fully understand what this means. My understanding is that currently a RenderLayer represents a set of RenderObjects that exist in the same coordinate space. Is the idea that a RenderObject would either exist in its container's coordinate space or define a new one (in which case it would also have a RenderLayer hanging off of it?), and all mappings between local<->container coordinate spaces would traverse through RenderObject::container()s? I don't quite understand why the mapping is sometimes done through the RenderLayer hierarchy and sometimes through the RenderObject container hierarchy currently. Is it always the case that in the absence of transforms the two are equivalent?
Simon Fraser (smfr)
Comment 17 2010-02-02 15:50:17 PST
> I don't quite understand why the mapping is sometimes done through the > RenderLayer > hierarchy and sometimes through the RenderObject container hierarchy currently. > Is it > always the case that in the absence of transforms the two are equivalent? The mapping though RenderLayer is mostly just a convenience, and we assume it will be faster than mapping through RenderObjects. We assert that this mapping never crosses transformed layers, so it's really just accumulating x,y offsets. Just think of RenderLayer x,y as a cache.
James Robinson
Comment 18 2010-02-03 14:44:38 PST
Here is what I shall attempt to do: *) When a RenderLayer and all of its ancestors have no transforms, store the x/y offset to the root layer directly on the RenderLayer so that coordinate mapping to the root layer is O(1). This will have to be recalculated whenever a layer or any of its parents move - I'll have to figure out how to detect that through trial and error - although I think doing this in updateLayerPosition() should be good enough. *) Special-case mapping to the root layer in convertToLayerCoords() to just use these offsets, assuming no transforms. Mapping to layers other than the root or mapping through anything with transforms will still use the same (fairly slow) codepath. *) If this all works out, see if it's reasonable to ditch the RenderLayer's x/y offset to its parent RenderLayer. This should speed up the position:relative test page and the pages in the wild that are super slow. Transforms make the world much more complicated than I can probably handle :)
James Robinson
Comment 19 2010-02-05 12:28:51 PST
James Robinson
Comment 20 2010-02-05 13:45:28 PST
James Robinson
Comment 21 2010-02-05 14:00:49 PST
James Robinson
Comment 22 2010-02-05 14:11:48 PST
This patch makes RenderLayer::convertToLayerCoords() constant time in the absence of transforms or position:fixed. I didn't optimize transforms since that requires keeping more than just an X/Y offset and is somewhat complicated. I exclude position:fixed elements because fast-path scrolling updates X/Y offsets without calling updateLayerPositions(), and convertToLayerCoords() reads these offsets directly through localToAbsolute(). It would be possible to reenable this if there was some sort of notification to the RenderLayer when the offsets changed on the corresponding RenderObject. That seemed a bit ugly. This speeds up the position:relative test case by quite a bit (~20% on my MBP). The speedup is mostly due to speeding up calculation of the repaint bounds for each layer. With this patch applied the time needed to repaint the entire frame goes from 55ms to 1ms.
WebKit Review Bot
Comment 23 2010-02-05 14:31:07 PST
James Robinson
Comment 24 2010-02-05 14:43:26 PST
Compile failure was an internal compiler error, probably a gcc 4.4 bug and not due to anything in this patch.
Eric Seidel (no email)
Comment 25 2010-02-17 15:15:50 PST
Comment on attachment 48256 [details] Patch Seems: 9 // Cached x/y offset to our root layer. Only valid if none of our ancestor layers have transforms. 580 int m_rootOffsetX; 581 int m_rootOffsetY; should be an IntSize.
Eric Seidel (no email)
Comment 26 2010-02-22 12:59:23 PST
Comment on attachment 48256 [details] Patch r- for the int/intsize comment (mostly to get this out of the 120+ item review queue)
James Robinson
Comment 27 2010-03-04 13:27:22 PST
James Robinson
Comment 28 2010-03-04 13:31:43 PST
Patch speeds up the position:relative test page by about 55%. The functionality overlaps quite a bit with LayoutState, but for this particular use most of the features of LayoutState are unnecessary. If this approach looks OK then I'd like to merge it with the offset caching portion of LayoutState to avoid duplication.
James Robinson
Comment 29 2010-03-04 14:29:07 PST
Here are the exact speedups from this patch and r55065 for the position:relative test page. Speedups: r55064->r55065 without this patch, 1.75x faster r55064 with this patch is 2.78x faster r55065 with this patch is 2.12x faster Measured times (from my MBP): r55064, no patch: 1180ms r55064, with patch: 425ms r55065, no patch: 675ms r55065, with patch: 318ms Combined, this patch and r55065 speeds up the position:relative test page by 3.71x.
Eric Seidel (no email)
Comment 30 2010-03-04 15:34:11 PST
Comment on attachment 50048 [details] Patch The style bot somehow let you slip! 1 if (!repaintContainer && v->haveCachedOffsetFrom(this)) { 962 transformState.move(v->cachedOffset()); 963 return; 964 } Should this be its own function? const RenderObject* oldCachedOffsetChild = view->moveCachedOffset(offset, renderer()); 273 274 bool disableOffsetCache = renderer()->isSVGRoot() || renderer()->hasColumns() || renderer()->hasReflection() || renderer()->hasTransform(); 275 if (disableOffsetCache) 276 view->disableCachedOffset();
James Robinson
Comment 31 2010-03-10 14:41:54 PST
After discussion in IRC with dhyatt and smfr, I will store the cached offsets on the stack in updateLayerPositions and then pass it along directly to outlineBoundsForRepaint(). This should provide most of the performance improvement without needing to store more crap off the RenderView.
James Robinson
Comment 32 2010-03-10 18:38:52 PST
James Robinson
Comment 33 2010-03-10 18:45:10 PST
I've updated the patch based on feedback. It now keeps the offset on the stack during updateLayerPositions() and doesn't rely on the RenderView at all. The offset is used for outlineBoundsForRepaint() and for the convertToLayerCoords() call in updateLayerPositions(). The amount of time spent in updateLayerPositions() on the position:relative testpage goes from ~580ms to ~105ms on my macbook pro. This patch doesn't cache an offset for composited layers or their children. This should be very easy to add in a follow-up patch.
Ojan Vafai
Comment 34 2010-03-11 09:31:12 PST
Comment on attachment 50462 [details] Patch > +++ b/WebCore/ChangeLog > + RenderLayer::updaterLayerPositions() makes a recursive walk through all RenderLayers and updates the repaint rectangles on ea > + These rectangles have to be calculated in the repaint container's coordinates, which typically means walking up to the root > + coordinates through something like RenderObject::mapLocalToContainer. This patch keeps track of the offset to the root and > + uses that for outlineBoundsForRepaint(). It's slightly hacky but outlineBoundsForRepaint() is going away soon anyway. Your first sentence got cut off.
James Robinson
Comment 35 2010-03-11 12:35:30 PST
Created attachment 50531 [details] fix ChangeLog
Dave Hyatt
Comment 36 2010-03-11 12:47:55 PST
Comment on attachment 50531 [details] fix ChangeLog The logic for disabling the cache is wrong... <div style="position:absolute;"> .... <div style="overflow:auto">.... <div style="position:absolute;left:100;top:100"> You'd have the cache enabled and add in the intermediate x/y of the overflow section, but that isn't relevant. I think a simple cache will have to check containingBlock of the child layer's renderer to make sure it matches the parent layer's renderer in order to do a cache like this.
James Robinson
Comment 37 2010-03-16 12:23:20 PDT
Darin Adler
Comment 38 2010-03-16 13:17:10 PDT
Comment on attachment 50824 [details] Patch I'd like to see debug-only code that checks that the passed-in cached offset is correct and asserts if it is not.
James Robinson
Comment 39 2010-03-17 14:59:53 PDT
James Robinson
Comment 40 2010-03-17 15:01:44 PDT
This patch adds an assertion in RenderLayer::updateLayerPositions() like Darin suggested that compares the passed-in offset with one calculated through the old slow path in debug builds. This patch also fixes a typo in the ChangeLog and simplifies the code in RenderBox::outlineBoundsForRepaint() a bit so it correctly handles boxes with negative widths.
Darin Adler
Comment 41 2010-03-17 16:58:06 PDT
Comment on attachment 50961 [details] Patch > + ASSERT(x == nonCachedX && y == nonCachedY); Generally assertions that have expressions containing && should be split up into separate assertions. That way you know can tell which one failed.
Darin Adler
Comment 42 2010-03-17 16:58:34 PDT
I’m hoping Hyatt will review again.
Simon Fraser (smfr)
Comment 43 2010-03-17 17:09:59 PDT
Comment on attachment 50961 [details] Patch > diff --git a/WebCore/rendering/RenderObject.h b/WebCore/rendering/RenderObject.h > - virtual IntRect outlineBoundsForRepaint(RenderBoxModelObject* /*repaintContainer*/) const { return IntRect(); } > + virtual IntRect outlineBoundsForRepaint(RenderBoxModelObject* /*repaintContainer*/, IntPoint* /*cachedOffsetToRepaintContainer*/ = 0) const { return IntRect(); } > diff --git a/WebCore/rendering/RenderSVGModelObject.h b/WebCore/rendering/RenderSVGModelObject.h > - virtual IntRect outlineBoundsForRepaint(RenderBoxModelObject* repaintContainer) const; > + virtual IntRect outlineBoundsForRepaint(RenderBoxModelObject* repaintContainer, IntSize*) const; IntPoint or IntSize? The signatures don't match, so something should be broken here, and isn't getting tested.
James Robinson
Comment 44 2010-03-17 17:19:04 PDT
Good catch! I'll update the headers to match.
James Robinson
Comment 45 2010-03-17 17:41:56 PDT
Created attachment 50980 [details] fix RenderSVGModelObject signature to match
James Robinson
Comment 46 2010-03-29 18:24:21 PDT
Ping? I think this is basically blocked on a review from Dave Hyatt, correct?
James Robinson
Comment 47 2010-03-31 12:53:11 PDT
Created attachment 52198 [details] add some tests
Simon Fraser (smfr)
Comment 48 2010-04-01 12:27:00 PDT
Comment on attachment 52198 [details] add some tests I tested with compositing enabled, and didn't see any issues. r=me.
James Robinson
Comment 49 2010-04-01 15:21:44 PDT
Note You need to log in before you can comment on or make changes to this bug.