RoboHornetPro spends ~25% of total test time in WebCore::Region::Shape methods RoboHornetPro supposedly adds some animation tests to the suite. I've not yet investigated all the changes from RoboHornet. Running Time Self Symbol Name 3465.0ms 14.0% 3465.0 WebCore::Region::Shape WebCore::Region::Shape::shapeOperation<WebCore::Region::Shape::UnionOperation>(WebCore::Region::Shape const&, WebCore::Region::Shape const&) 2791.0ms 11.3% 2791.0 WebCore::Region::Shape::appendSpan(int, int const*, int const*) The 4th top symbol is also Shape related: Running Time Self Symbol Name 863.0ms 3.5% 863.0 memmove$VARIANT$sse42 579.0ms 2.3% 0.0 WTF::Vector<int, 32ul>::reserveCapacity(unsigned long) 579.0ms 2.3% 0.0 void WTF::Vector<int, 32ul>::appendSlowCase<int>(int const&) 565.0ms 2.2% 0.0 WebCore::Region::Shape::appendSpan(int, int const*, int const*) 14.0ms 0.0% 0.0 WebCore::Region::Shape WebCore::Region::Shape::shapeOperation<WebCore::Region::Shape::UnionOperation>(WebCore::Region::Shape const&, WebCore::Region::Shape const&) These come from layer updates: Running Time Self Symbol Name 2791.0ms 11.3% 2791.0 WebCore::Region::Shape::appendSpan(int, int const*, int const*) 2782.0ms 11.2% 0.0 WebCore::Region::Shape WebCore::Region::Shape::shapeOperation<WebCore::Region::Shape::UnionOperation>(WebCore::Region::Shape const&, WebCore::Region::Shape const&) 2782.0ms 11.2% 0.0 WebCore::Region::unite(WebCore::Region const&) 2782.0ms 11.2% 0.0 WebCore::RenderLayerCompositor::OverlapMap::add(WebCore::RenderLayer const*, WebCore::IntRect const&) 2782.0ms 11.2% 0.0 WebCore::RenderLayerCompositor::addToOverlapMap(WebCore::RenderLayerCompositor::OverlapMap&, WebCore::RenderLayer*, WebCore::IntRect&, bool&) 2782.0ms 11.2% 0.0 WebCore::RenderLayerCompositor::computeCompositingRequirements(WebCore::RenderLayer*, WebCore::RenderLayer*, WebCore::RenderLayerCompositor::OverlapMap*, WebCore::CompositingState&, bool&, bool&) 2782.0ms 11.2% 0.0 WebCore::RenderLayerCompositor::computeCompositingRequirements(WebCore::RenderLayer*, WebCore::RenderLayer*, WebCore::RenderLayerCompositor::OverlapMap*, WebCore::CompositingState&, bool&, bool&) 2782.0ms 11.2% 0.0 WebCore::RenderLayerCompositor::computeCompositingRequirements(WebCore::RenderLayer*, WebCore::RenderLayer*, WebCore::RenderLayerCompositor::OverlapMap*, WebCore::CompositingState&, bool&, bool&) 2782.0ms 11.2% 0.0 WebCore::RenderLayerCompositor::updateCompositingLayers(WebCore::CompositingUpdateType, WebCore::RenderLayer*) 2443.0ms 9.9% 0.0 WebCore::Document::recalcStyle(WebCore::Node::StyleChange)
Created attachment 167796 [details] Profile of running RoboHornetPro 3 times in run-safari in an r130736 Release build on Mac
<rdar://problem/12462665>
http://ie.microsoft.com/testdrive/performance/robohornetpro/ is the URL. To see the 25% you need to click the "run" button in the lower right hand corner, and use Instruments to sample. You may want to run it multiple times during a sample.
I believe Region::union is simply N^2. The union operation is effectively just: union(Region a, Region b) = { return a + b; } Region r = reduce(union, regions); Each region has a vector of spans, and each union operation results in a completely new Region object, which we copy the contents of a and b into. Each time, we're individually copying over each span into the new region. If I'm doing my math right, that means we're doing at least N * (N-1)/2 span copies! (the second to last spans are copied once, the 3rd twice, etc. to N-1) Additionally, we're performing N * ln(N) mallocs from resizing the span vectors. Bad news bears.
I believe all of these Regions have exactly 2 spans (and 2 segments) in their shape. They're all just tracking a single rect for a single layer which is being animated. :(
I think there are various ways that we could fix this. N=number of regions, M= number of spans per region. 1. create a form of the union() operator which was self-modifying. That would allow us to do only N*M span copies instead of our current approximately (N*M)^2. 2. pre-allocate span/segment vectors of the correct size. That would make us have N vector allocations, instead of the current N*ln(M). 3. lazily turn a list of rects into a Region, using a Vector<IntRect> instead of the Region for the map which could eliminate all unnecessary span/segment copies. :) 4. fix bug 84393 or similar to avoid doing this operation so often. 5. find a way to turn off the OverlapMap for this (and similar) scenarios, since Simon says it shouldn't buy us anything in this example. I suspect we'll end up doing a combination of the above. I'm happy to file additional bugs if that would be helpful. I've stared at the Region code enough this afternoon that I suspect I could execute on 1 or 2 and possibly 3. But I also suspect that Anders being Mr. Region, would do a better job than I would, assuming he has the time to do so. It's also possible that 6. we could temporarily disable the OverlayMap if it's not buying us much? I don't really know what it does, so that one is a shot in the dark. :)
Don't want to disable the overlap map. That would cause memory use to balloon on some pages.
I wrote a patch to do #2 (pre-allocate vectors of the correct size). I am not sure I guessed the sizes 100% correctly, as I still see a few fastMalloc's where I wouldn't expect them, but it was a small win. I don't think #2 is worth doing. #1 is where the real money is. It's also unclear to me how exactly the OverlayMap is used -- if it's written to more than it's read from, for instance. An OverlayMap which is smart enough to normally use a Vector<IntRect> and only convert to a Region when necessary might be possible and might be a bigger win even than making Region::union fast. :)
One example of #3 might be: class RegionBuilder { void addRect() { // Adds m_rects. } void addRegion() { // Add to m_regions. } const Region& combinedRegion() { // build m_region from m_regions and m_rects if necessary, clear them and return the combined region. } private: Vector<IntRect> m_rects; Vector<Region> m_regions; Region m_region; }; typedef OverlayMap RegionBuilder; I don't know if that would work well for the OverlayMap usecase. It would depend on the ratio of adds to reads. Simon might know better how often overlayMap is read back during computeCompositingRequirements.
I have a fix for this. I'll break the idea of making unioning regions not O(N^2) off into a separate bug which Anders or someone can look at at a later time. :)
Created attachment 171584 [details] Patch
I've filed the "make Region::union not N^2'd" as bug 100814.
Comment on attachment 171584 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=171584&action=review > Source/WebCore/rendering/RenderLayerCompositor.cpp:81 > +class LazyRegion { I don't really like this name. Maybe call it RectSet or something. > Source/WebCore/rendering/RenderLayerCompositor.cpp:102 > + m_rects.resize(0); Isn't there a clear()? > Source/WebCore/rendering/RenderLayerCompositor.cpp:142 > + return m_overlapStack.last().region().intersects(bounds); Do we even need to convert to a Region to do the intersection test? Would testing the individual rects be just as fast?
Comment on attachment 171584 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=171584&action=review >> Source/WebCore/rendering/RenderLayerCompositor.cpp:142 >> + return m_overlapStack.last().region().intersects(bounds); > > Do we even need to convert to a Region to do the intersection test? Would testing the individual rects be just as fast? I don't know. It's possible that Region is just completely the wrong data structure for what OverlapMap is trying to do. It looks like hit-testing a region is O(N), just like hit-testing the rect-list would be, so you're probably right. I'm happy to make this "RectSet" and remove references to Region entirely if you think that's best?
Created attachment 171585 [details] A much simpler approach. Kudos to smfr for being a smart man.
Brilliant idea Simon. :)
Comment on attachment 171584 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=171584&action=review >>> Source/WebCore/rendering/RenderLayerCompositor.cpp:142 >>> + return m_overlapStack.last().region().intersects(bounds); >> >> Do we even need to convert to a Region to do the intersection test? Would testing the individual rects be just as fast? > > I don't know. It's possible that Region is just completely the wrong data structure for what OverlapMap is trying to do. It looks like hit-testing a region is O(N), just like hit-testing the rect-list would be, so you're probably right. > > I'm happy to make this "RectSet" and remove references to Region entirely if you think that's best? I had originally assumed that the OverlapMap needed to expose Regions (since it was originally written with Regions), but you're right, it doesn't. And hit-testing the IntRects is just as fast. :)
Comment on attachment 171585 [details] A much simpler approach. Kudos to smfr for being a smart man. View in context: https://bugs.webkit.org/attachment.cgi?id=171585&action=review > Source/WebCore/ChangeLog:35 > + * rendering/RenderLayerCompositor.cpp: > + (LazyRegion): > + (WebCore::LazyRegion::unite): > + (WebCore::LazyRegion::updateRegionIfNecessary): > + (WebCore::LazyRegion::region): > + (WebCore): > + (WebCore::RenderLayerCompositor::OverlapMap::overlapsLayers): > + (WebCore::RenderLayerCompositor::OverlapMap::pushCompositingContainer): > + (RenderLayerCompositor::OverlapMap): I don't think you regenerated your changelog. There ain't no LazyRegion no mo. > Source/WebCore/rendering/RenderLayerCompositor.cpp:112 > + const RectList& layerRects = m_overlapStack.last(); > + for (unsigned i = 0; i < layerRects.size(); i++) > + if (layerRects[i].intersects(bounds)) > + return true; Please add braces around the for loop.
Created attachment 171586 [details] Patch for landing
Thank you Sam, Simon & Anders.
Comment on attachment 171586 [details] Patch for landing Clearing flags on attachment: 171586 Committed r132990: <http://trac.webkit.org/changeset/132990>
All reviewed patches have been landed. Closing bug.
If I start out with one entry in the overlap map (0, 0, 100, 100) and then add a second entry (0, 0, 100, 100), to tell if a third rect intersects with anything in the overlap map I have to do two intersections with this list-of-rects. The idea with Region is that you only have to do one rectangle intersection since the unite() call is smart enough to collapse the two together (and in this case it is).
(In reply to comment #23) > If I start out with one entry in the overlap map (0, 0, 100, 100) and then add a second entry (0, 0, 100, 100), to tell if a third rect intersects with anything in the overlap map I have to do two intersections with this list-of-rects. The idea with Region is that you only have to do one rectangle intersection since the unite() call is smart enough to collapse the two together (and in this case it is). True. We could make my patch smart enough to handle similar cases by having it check for containment before adding a new rect. Not perfect. And I'm not sure it's worth the code w/o a test case. In practice this seems much cheaper than the Region case, as a Rect expands to 2 segments and 2 spans in a Shape, and unioning N Regions is O(N^2) (due to bug 100814) and requires O(N*ln(N)) mallocs. Regions are just really inefficient for this simple rect case, as written. I agree we can definitely do better than O(N) for containment test.
I'm certainly not wedded to this fix, btw. It just seemed like a simple way around bug 100814. It's odd to me that we seem to build this map, but never check for overlap in it in in the Matrix/RoboHornetPro example.
(In reply to comment #25) > I'm certainly not wedded to this fix, btw. It just seemed like a simple way around bug 100814. > > It's odd to me that we seem to build this map, but never check for overlap in it in in the Matrix/RoboHornetPro example. I think this is because we turn off overlap testing in the scope of the elements that are running CSS animations: // Turn overlap testing off for later layers if it's already off, or if we have a 3D transform or an animating transform. if (!childState.m_testingOverlap || layer->has3DTransform() || isRunningAcceleratedTransformAnimation(layer->renderer())) compositingState.m_testingOverlap = false; I'd have to think about whether we can just avoid putting things into the overlap map in that case; it might be that later scopes still need the relevant rectangles. I filed bug 100880.
Comment on attachment 171586 [details] Patch for landing View in context: https://bugs.webkit.org/attachment.cgi?id=171586&action=review > Source/WebCore/rendering/RenderLayerCompositor.cpp:137 > + typedef Vector<IntRect> RectList; > + Vector<RectList> m_overlapStack; If we know typical sizes for these, we might get an even better result with Vector<IntRect, 1> or the like.
(In reply to comment #27) > (From update of attachment 171586 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=171586&action=review > > > Source/WebCore/rendering/RenderLayerCompositor.cpp:137 > > + typedef Vector<IntRect> RectList; > > + Vector<RectList> m_overlapStack; > > If we know typical sizes for these, we might get an even better result with Vector<IntRect, 1> or the like. Right now the RoboHornetPro example is entirely dominated by bug 98823 (now that this bug is fixed). But once that's fixed, we might find other tweaks in this area. It's also possible that Simon and others have overlap/layer benchmarks which we could use to tweak this number.