Bug 98800

Summary: RoboHornetPro spends ~25% of total test time in WebCore::Region::Shape methods
Product: WebKit Reporter: Eric Seidel (no email) <eric>
Component: Layout and RenderingAssignee: Eric Seidel (no email) <eric>
Status: RESOLVED FIXED    
Severity: Normal CC: andersca, arajp, jamesr, jchaffraix, kling, koivisto, mrowe, ojan, simon.fraser, tonyg, webkit-bug-importer, webkit.review.bot
Priority: P2 Keywords: InRadar
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
URL: http://ie.microsoft.com/testdrive/performance/robohornetpro/
Bug Depends on: 100814, 84393    
Bug Blocks: 98798    
Attachments:
Description Flags
Profile of running RoboHornetPro 3 times in run-safari in an r130736 Release build on Mac
none
Patch
none
A much simpler approach. Kudos to smfr for being a smart man.
none
Patch for landing none

Eric Seidel (no email)
Reported 2012-10-09 11:26:09 PDT
RoboHornetPro spends ~25% of total test time in WebCore::Region::Shape methods RoboHornetPro supposedly adds some animation tests to the suite. I've not yet investigated all the changes from RoboHornet. Running Time Self Symbol Name 3465.0ms 14.0% 3465.0 WebCore::Region::Shape WebCore::Region::Shape::shapeOperation<WebCore::Region::Shape::UnionOperation>(WebCore::Region::Shape const&, WebCore::Region::Shape const&) 2791.0ms 11.3% 2791.0 WebCore::Region::Shape::appendSpan(int, int const*, int const*) The 4th top symbol is also Shape related: Running Time Self Symbol Name 863.0ms 3.5% 863.0 memmove$VARIANT$sse42 579.0ms 2.3% 0.0 WTF::Vector<int, 32ul>::reserveCapacity(unsigned long) 579.0ms 2.3% 0.0 void WTF::Vector<int, 32ul>::appendSlowCase<int>(int const&) 565.0ms 2.2% 0.0 WebCore::Region::Shape::appendSpan(int, int const*, int const*) 14.0ms 0.0% 0.0 WebCore::Region::Shape WebCore::Region::Shape::shapeOperation<WebCore::Region::Shape::UnionOperation>(WebCore::Region::Shape const&, WebCore::Region::Shape const&) These come from layer updates: Running Time Self Symbol Name 2791.0ms 11.3% 2791.0 WebCore::Region::Shape::appendSpan(int, int const*, int const*) 2782.0ms 11.2% 0.0 WebCore::Region::Shape WebCore::Region::Shape::shapeOperation<WebCore::Region::Shape::UnionOperation>(WebCore::Region::Shape const&, WebCore::Region::Shape const&) 2782.0ms 11.2% 0.0 WebCore::Region::unite(WebCore::Region const&) 2782.0ms 11.2% 0.0 WebCore::RenderLayerCompositor::OverlapMap::add(WebCore::RenderLayer const*, WebCore::IntRect const&) 2782.0ms 11.2% 0.0 WebCore::RenderLayerCompositor::addToOverlapMap(WebCore::RenderLayerCompositor::OverlapMap&, WebCore::RenderLayer*, WebCore::IntRect&, bool&) 2782.0ms 11.2% 0.0 WebCore::RenderLayerCompositor::computeCompositingRequirements(WebCore::RenderLayer*, WebCore::RenderLayer*, WebCore::RenderLayerCompositor::OverlapMap*, WebCore::CompositingState&, bool&, bool&) 2782.0ms 11.2% 0.0 WebCore::RenderLayerCompositor::computeCompositingRequirements(WebCore::RenderLayer*, WebCore::RenderLayer*, WebCore::RenderLayerCompositor::OverlapMap*, WebCore::CompositingState&, bool&, bool&) 2782.0ms 11.2% 0.0 WebCore::RenderLayerCompositor::computeCompositingRequirements(WebCore::RenderLayer*, WebCore::RenderLayer*, WebCore::RenderLayerCompositor::OverlapMap*, WebCore::CompositingState&, bool&, bool&) 2782.0ms 11.2% 0.0 WebCore::RenderLayerCompositor::updateCompositingLayers(WebCore::CompositingUpdateType, WebCore::RenderLayer*) 2443.0ms 9.9% 0.0 WebCore::Document::recalcStyle(WebCore::Node::StyleChange)
Attachments
Profile of running RoboHornetPro 3 times in run-safari in an r130736 Release build on Mac (1.14 MB, application/zip)
2012-10-09 11:27 PDT, Eric Seidel (no email)
no flags
Patch (4.13 KB, patch)
2012-10-30 22:23 PDT, Eric Seidel (no email)
no flags
A much simpler approach. Kudos to smfr for being a smart man. (4.07 KB, patch)
2012-10-30 22:45 PDT, Eric Seidel (no email)
no flags
Patch for landing (4.04 KB, patch)
2012-10-30 22:58 PDT, Eric Seidel (no email)
no flags
Eric Seidel (no email)
Comment 1 2012-10-09 11:27:02 PDT
Created attachment 167796 [details] Profile of running RoboHornetPro 3 times in run-safari in an r130736 Release build on Mac
Radar WebKit Bug Importer
Comment 2 2012-10-09 11:29:17 PDT
Eric Seidel (no email)
Comment 3 2012-10-09 17:28:38 PDT
http://ie.microsoft.com/testdrive/performance/robohornetpro/ is the URL. To see the 25% you need to click the "run" button in the lower right hand corner, and use Instruments to sample. You may want to run it multiple times during a sample.
Eric Seidel (no email)
Comment 4 2012-10-11 13:25:48 PDT
I believe Region::union is simply N^2. The union operation is effectively just: union(Region a, Region b) = { return a + b; } Region r = reduce(union, regions); Each region has a vector of spans, and each union operation results in a completely new Region object, which we copy the contents of a and b into. Each time, we're individually copying over each span into the new region. If I'm doing my math right, that means we're doing at least N * (N-1)/2 span copies! (the second to last spans are copied once, the 3rd twice, etc. to N-1) Additionally, we're performing N * ln(N) mallocs from resizing the span vectors. Bad news bears.
Eric Seidel (no email)
Comment 5 2012-10-11 13:33:11 PDT
I believe all of these Regions have exactly 2 spans (and 2 segments) in their shape. They're all just tracking a single rect for a single layer which is being animated. :(
Eric Seidel (no email)
Comment 6 2012-10-11 13:52:01 PDT
I think there are various ways that we could fix this. N=number of regions, M= number of spans per region. 1. create a form of the union() operator which was self-modifying. That would allow us to do only N*M span copies instead of our current approximately (N*M)^2. 2. pre-allocate span/segment vectors of the correct size. That would make us have N vector allocations, instead of the current N*ln(M). 3. lazily turn a list of rects into a Region, using a Vector<IntRect> instead of the Region for the map which could eliminate all unnecessary span/segment copies. :) 4. fix bug 84393 or similar to avoid doing this operation so often. 5. find a way to turn off the OverlapMap for this (and similar) scenarios, since Simon says it shouldn't buy us anything in this example. I suspect we'll end up doing a combination of the above. I'm happy to file additional bugs if that would be helpful. I've stared at the Region code enough this afternoon that I suspect I could execute on 1 or 2 and possibly 3. But I also suspect that Anders being Mr. Region, would do a better job than I would, assuming he has the time to do so. It's also possible that 6. we could temporarily disable the OverlayMap if it's not buying us much? I don't really know what it does, so that one is a shot in the dark. :)
Simon Fraser (smfr)
Comment 7 2012-10-11 14:11:16 PDT
Don't want to disable the overlap map. That would cause memory use to balloon on some pages.
Eric Seidel (no email)
Comment 8 2012-10-11 14:17:40 PDT
I wrote a patch to do #2 (pre-allocate vectors of the correct size). I am not sure I guessed the sizes 100% correctly, as I still see a few fastMalloc's where I wouldn't expect them, but it was a small win. I don't think #2 is worth doing. #1 is where the real money is. It's also unclear to me how exactly the OverlayMap is used -- if it's written to more than it's read from, for instance. An OverlayMap which is smart enough to normally use a Vector<IntRect> and only convert to a Region when necessary might be possible and might be a bigger win even than making Region::union fast. :)
Eric Seidel (no email)
Comment 9 2012-10-11 15:32:07 PDT
One example of #3 might be: class RegionBuilder { void addRect() { // Adds m_rects. } void addRegion() { // Add to m_regions. } const Region& combinedRegion() { // build m_region from m_regions and m_rects if necessary, clear them and return the combined region. } private: Vector<IntRect> m_rects; Vector<Region> m_regions; Region m_region; }; typedef OverlayMap RegionBuilder; I don't know if that would work well for the OverlayMap usecase. It would depend on the ratio of adds to reads. Simon might know better how often overlayMap is read back during computeCompositingRequirements.
Eric Seidel (no email)
Comment 10 2012-10-30 22:09:47 PDT
I have a fix for this. I'll break the idea of making unioning regions not O(N^2) off into a separate bug which Anders or someone can look at at a later time. :)
Eric Seidel (no email)
Comment 11 2012-10-30 22:23:37 PDT
Eric Seidel (no email)
Comment 12 2012-10-30 22:25:18 PDT
I've filed the "make Region::union not N^2'd" as bug 100814.
Simon Fraser (smfr)
Comment 13 2012-10-30 22:27:59 PDT
Comment on attachment 171584 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=171584&action=review > Source/WebCore/rendering/RenderLayerCompositor.cpp:81 > +class LazyRegion { I don't really like this name. Maybe call it RectSet or something. > Source/WebCore/rendering/RenderLayerCompositor.cpp:102 > + m_rects.resize(0); Isn't there a clear()? > Source/WebCore/rendering/RenderLayerCompositor.cpp:142 > + return m_overlapStack.last().region().intersects(bounds); Do we even need to convert to a Region to do the intersection test? Would testing the individual rects be just as fast?
Eric Seidel (no email)
Comment 14 2012-10-30 22:32:50 PDT
Comment on attachment 171584 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=171584&action=review >> Source/WebCore/rendering/RenderLayerCompositor.cpp:142 >> + return m_overlapStack.last().region().intersects(bounds); > > Do we even need to convert to a Region to do the intersection test? Would testing the individual rects be just as fast? I don't know. It's possible that Region is just completely the wrong data structure for what OverlapMap is trying to do. It looks like hit-testing a region is O(N), just like hit-testing the rect-list would be, so you're probably right. I'm happy to make this "RectSet" and remove references to Region entirely if you think that's best?
Eric Seidel (no email)
Comment 15 2012-10-30 22:45:41 PDT
Created attachment 171585 [details] A much simpler approach. Kudos to smfr for being a smart man.
Eric Seidel (no email)
Comment 16 2012-10-30 22:45:59 PDT
Brilliant idea Simon. :)
Eric Seidel (no email)
Comment 17 2012-10-30 22:50:48 PDT
Comment on attachment 171584 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=171584&action=review >>> Source/WebCore/rendering/RenderLayerCompositor.cpp:142 >>> + return m_overlapStack.last().region().intersects(bounds); >> >> Do we even need to convert to a Region to do the intersection test? Would testing the individual rects be just as fast? > > I don't know. It's possible that Region is just completely the wrong data structure for what OverlapMap is trying to do. It looks like hit-testing a region is O(N), just like hit-testing the rect-list would be, so you're probably right. > > I'm happy to make this "RectSet" and remove references to Region entirely if you think that's best? I had originally assumed that the OverlapMap needed to expose Regions (since it was originally written with Regions), but you're right, it doesn't. And hit-testing the IntRects is just as fast. :)
Sam Weinig
Comment 18 2012-10-30 22:54:34 PDT
Comment on attachment 171585 [details] A much simpler approach. Kudos to smfr for being a smart man. View in context: https://bugs.webkit.org/attachment.cgi?id=171585&action=review > Source/WebCore/ChangeLog:35 > + * rendering/RenderLayerCompositor.cpp: > + (LazyRegion): > + (WebCore::LazyRegion::unite): > + (WebCore::LazyRegion::updateRegionIfNecessary): > + (WebCore::LazyRegion::region): > + (WebCore): > + (WebCore::RenderLayerCompositor::OverlapMap::overlapsLayers): > + (WebCore::RenderLayerCompositor::OverlapMap::pushCompositingContainer): > + (RenderLayerCompositor::OverlapMap): I don't think you regenerated your changelog. There ain't no LazyRegion no mo. > Source/WebCore/rendering/RenderLayerCompositor.cpp:112 > + const RectList& layerRects = m_overlapStack.last(); > + for (unsigned i = 0; i < layerRects.size(); i++) > + if (layerRects[i].intersects(bounds)) > + return true; Please add braces around the for loop.
Eric Seidel (no email)
Comment 19 2012-10-30 22:58:24 PDT
Created attachment 171586 [details] Patch for landing
Eric Seidel (no email)
Comment 20 2012-10-30 22:59:57 PDT
Thank you Sam, Simon & Anders.
WebKit Review Bot
Comment 21 2012-10-30 23:48:30 PDT
Comment on attachment 171586 [details] Patch for landing Clearing flags on attachment: 171586 Committed r132990: <http://trac.webkit.org/changeset/132990>
WebKit Review Bot
Comment 22 2012-10-30 23:48:35 PDT
All reviewed patches have been landed. Closing bug.
James Robinson
Comment 23 2012-10-31 00:05:38 PDT
If I start out with one entry in the overlap map (0, 0, 100, 100) and then add a second entry (0, 0, 100, 100), to tell if a third rect intersects with anything in the overlap map I have to do two intersections with this list-of-rects. The idea with Region is that you only have to do one rectangle intersection since the unite() call is smart enough to collapse the two together (and in this case it is).
Eric Seidel (no email)
Comment 24 2012-10-31 01:19:41 PDT
(In reply to comment #23) > If I start out with one entry in the overlap map (0, 0, 100, 100) and then add a second entry (0, 0, 100, 100), to tell if a third rect intersects with anything in the overlap map I have to do two intersections with this list-of-rects. The idea with Region is that you only have to do one rectangle intersection since the unite() call is smart enough to collapse the two together (and in this case it is). True. We could make my patch smart enough to handle similar cases by having it check for containment before adding a new rect. Not perfect. And I'm not sure it's worth the code w/o a test case. In practice this seems much cheaper than the Region case, as a Rect expands to 2 segments and 2 spans in a Shape, and unioning N Regions is O(N^2) (due to bug 100814) and requires O(N*ln(N)) mallocs. Regions are just really inefficient for this simple rect case, as written. I agree we can definitely do better than O(N) for containment test.
Eric Seidel (no email)
Comment 25 2012-10-31 01:22:00 PDT
I'm certainly not wedded to this fix, btw. It just seemed like a simple way around bug 100814. It's odd to me that we seem to build this map, but never check for overlap in it in in the Matrix/RoboHornetPro example.
Simon Fraser (smfr)
Comment 26 2012-10-31 13:08:51 PDT
(In reply to comment #25) > I'm certainly not wedded to this fix, btw. It just seemed like a simple way around bug 100814. > > It's odd to me that we seem to build this map, but never check for overlap in it in in the Matrix/RoboHornetPro example. I think this is because we turn off overlap testing in the scope of the elements that are running CSS animations: // Turn overlap testing off for later layers if it's already off, or if we have a 3D transform or an animating transform. if (!childState.m_testingOverlap || layer->has3DTransform() || isRunningAcceleratedTransformAnimation(layer->renderer())) compositingState.m_testingOverlap = false; I'd have to think about whether we can just avoid putting things into the overlap map in that case; it might be that later scopes still need the relevant rectangles. I filed bug 100880.
Darin Adler
Comment 27 2012-11-01 10:02:42 PDT
Comment on attachment 171586 [details] Patch for landing View in context: https://bugs.webkit.org/attachment.cgi?id=171586&action=review > Source/WebCore/rendering/RenderLayerCompositor.cpp:137 > + typedef Vector<IntRect> RectList; > + Vector<RectList> m_overlapStack; If we know typical sizes for these, we might get an even better result with Vector<IntRect, 1> or the like.
Eric Seidel (no email)
Comment 28 2012-11-01 10:40:01 PDT
(In reply to comment #27) > (From update of attachment 171586 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=171586&action=review > > > Source/WebCore/rendering/RenderLayerCompositor.cpp:137 > > + typedef Vector<IntRect> RectList; > > + Vector<RectList> m_overlapStack; > > If we know typical sizes for these, we might get an even better result with Vector<IntRect, 1> or the like. Right now the RoboHornetPro example is entirely dominated by bug 98823 (now that this bug is fixed). But once that's fixed, we might find other tweaks in this area. It's also possible that Simon and others have overlap/layer benchmarks which we could use to tweak this number.
Note You need to log in before you can comment on or make changes to this bug.