Bug 98800

Summary: RoboHornetPro spends ~25% of total test time in WebCore::Region::Shape methods
Product: WebKit Reporter: Eric Seidel (no email) <eric>
Component: Layout and RenderingAssignee: Eric Seidel (no email) <eric>
Status: RESOLVED FIXED    
Severity: Normal CC: andersca, arajp, jamesr, jchaffraix, kling, koivisto, mrowe, ojan, simon.fraser, tonyg, webkit-bug-importer, webkit.review.bot
Priority: P2 Keywords: InRadar
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
URL: http://ie.microsoft.com/testdrive/performance/robohornetpro/
Bug Depends on: 100814, 84393    
Bug Blocks: 98798    
Attachments:
Description Flags
Profile of running RoboHornetPro 3 times in run-safari in an r130736 Release build on Mac
none
Patch
none
A much simpler approach. Kudos to smfr for being a smart man.
none
Patch for landing none

Description Eric Seidel (no email) 2012-10-09 11:26:09 PDT
RoboHornetPro spends ~25% of total test time in WebCore::Region::Shape methods

RoboHornetPro supposedly adds some animation tests to the suite.  I've not yet investigated all the changes from RoboHornet.

Running Time	Self		Symbol Name
3465.0ms   14.0%	3465.0	 	WebCore::Region::Shape WebCore::Region::Shape::shapeOperation<WebCore::Region::Shape::UnionOperation>(WebCore::Region::Shape const&, WebCore::Region::Shape const&)
2791.0ms   11.3%	2791.0	 	WebCore::Region::Shape::appendSpan(int, int const*, int const*)

The 4th top symbol is also Shape related:
Running Time	Self		Symbol Name
863.0ms    3.5%	863.0	 	memmove$VARIANT$sse42
579.0ms    2.3%	0.0	 	 WTF::Vector<int, 32ul>::reserveCapacity(unsigned long)
579.0ms    2.3%	0.0	 	  void WTF::Vector<int, 32ul>::appendSlowCase<int>(int const&)
565.0ms    2.2%	0.0	 	   WebCore::Region::Shape::appendSpan(int, int const*, int const*)
14.0ms    0.0%	0.0	 	   WebCore::Region::Shape WebCore::Region::Shape::shapeOperation<WebCore::Region::Shape::UnionOperation>(WebCore::Region::Shape const&, WebCore::Region::Shape const&)


These come from layer updates:
Running Time	Self		Symbol Name
2791.0ms   11.3%	2791.0	 	WebCore::Region::Shape::appendSpan(int, int const*, int const*)
2782.0ms   11.2%	0.0	 	 WebCore::Region::Shape WebCore::Region::Shape::shapeOperation<WebCore::Region::Shape::UnionOperation>(WebCore::Region::Shape const&, WebCore::Region::Shape const&)
2782.0ms   11.2%	0.0	 	  WebCore::Region::unite(WebCore::Region const&)
2782.0ms   11.2%	0.0	 	   WebCore::RenderLayerCompositor::OverlapMap::add(WebCore::RenderLayer const*, WebCore::IntRect const&)
2782.0ms   11.2%	0.0	 	    WebCore::RenderLayerCompositor::addToOverlapMap(WebCore::RenderLayerCompositor::OverlapMap&, WebCore::RenderLayer*, WebCore::IntRect&, bool&)
2782.0ms   11.2%	0.0	 	     WebCore::RenderLayerCompositor::computeCompositingRequirements(WebCore::RenderLayer*, WebCore::RenderLayer*, WebCore::RenderLayerCompositor::OverlapMap*, WebCore::CompositingState&, bool&, bool&)
2782.0ms   11.2%	0.0	 	      WebCore::RenderLayerCompositor::computeCompositingRequirements(WebCore::RenderLayer*, WebCore::RenderLayer*, WebCore::RenderLayerCompositor::OverlapMap*, WebCore::CompositingState&, bool&, bool&)
2782.0ms   11.2%	0.0	 	       WebCore::RenderLayerCompositor::computeCompositingRequirements(WebCore::RenderLayer*, WebCore::RenderLayer*, WebCore::RenderLayerCompositor::OverlapMap*, WebCore::CompositingState&, bool&, bool&)
2782.0ms   11.2%	0.0	 	        WebCore::RenderLayerCompositor::updateCompositingLayers(WebCore::CompositingUpdateType, WebCore::RenderLayer*)
2443.0ms    9.9%	0.0	 	         WebCore::Document::recalcStyle(WebCore::Node::StyleChange)
Comment 1 Eric Seidel (no email) 2012-10-09 11:27:02 PDT
Created attachment 167796 [details]
Profile of running RoboHornetPro 3 times in run-safari in an r130736 Release build on Mac
Comment 2 Radar WebKit Bug Importer 2012-10-09 11:29:17 PDT
<rdar://problem/12462665>
Comment 3 Eric Seidel (no email) 2012-10-09 17:28:38 PDT
http://ie.microsoft.com/testdrive/performance/robohornetpro/ is the URL.  To see the 25% you need to click the "run" button in the lower right hand corner, and use Instruments to sample.  You may want to run it multiple times during a sample.
Comment 4 Eric Seidel (no email) 2012-10-11 13:25:48 PDT
I believe Region::union is simply N^2.

The union operation is effectively just:

union(Region a, Region b) = { return a + b; }
Region r = reduce(union, regions);

Each region has a vector of spans, and each union operation results in a completely new Region object, which we copy the contents of a and b into.  Each time, we're individually copying over each span into the new region.

If I'm doing my math right, that means we're doing at least N * (N-1)/2 span copies! (the second to last spans are copied once, the 3rd twice, etc. to N-1)  Additionally, we're performing N * ln(N) mallocs from resizing the span vectors.

Bad news bears.
Comment 5 Eric Seidel (no email) 2012-10-11 13:33:11 PDT
I believe all of these Regions have exactly 2 spans (and 2 segments) in their shape.  They're all just tracking a single rect for a single layer which is being animated.  :(
Comment 6 Eric Seidel (no email) 2012-10-11 13:52:01 PDT
I think there are various ways that we could fix this.

N=number of regions, M= number of spans per region.

1. create a form of the union() operator which was self-modifying.  That would allow us to do only N*M span copies instead of our current approximately (N*M)^2.

2. pre-allocate span/segment vectors of the correct size.  That would make us have N vector allocations, instead of the current N*ln(M).

3. lazily turn a list of rects into a Region, using a Vector<IntRect> instead of the Region for the map which could eliminate all unnecessary span/segment copies. :)

4. fix bug 84393 or similar to avoid doing this operation so often.

5. find a way to turn off the OverlapMap for this (and similar) scenarios, since Simon says it shouldn't buy us anything in this example.

I suspect we'll end up doing a combination of the above.  I'm happy to file additional bugs if that would be helpful.

I've stared at the Region code enough this afternoon that I suspect I could execute on 1 or 2 and possibly 3.  But I also suspect that Anders being Mr. Region, would do a better job than I would, assuming he has the time to do so.

It's also possible that 6. we could temporarily disable the OverlayMap if it's not buying us much?  I don't really know what it does, so that one is a shot in the dark. :)
Comment 7 Simon Fraser (smfr) 2012-10-11 14:11:16 PDT
Don't want to disable the overlap map. That would cause memory use to balloon on some pages.
Comment 8 Eric Seidel (no email) 2012-10-11 14:17:40 PDT
I wrote a patch to do #2 (pre-allocate vectors of the correct size).  I am not sure I guessed the sizes 100% correctly, as I still see a few fastMalloc's where I wouldn't expect them, but it was a small win.

I don't think #2 is worth doing.  #1 is where the real money is.

It's also unclear to me how exactly the OverlayMap is used -- if it's written to more than it's read from, for instance.  An OverlayMap which is smart enough to normally use a Vector<IntRect> and only convert to a Region when necessary might be possible and might be a bigger win even than making Region::union fast. :)
Comment 9 Eric Seidel (no email) 2012-10-11 15:32:07 PDT
One example of #3 might be:

class RegionBuilder {
void addRect()
{
// Adds m_rects.
}
void addRegion()
{
// Add to m_regions.
}
const Region& combinedRegion()
{
// build m_region from m_regions and m_rects if necessary, clear them and return the combined region.
}
private:
Vector<IntRect> m_rects;
Vector<Region> m_regions;
Region m_region;
};
typedef OverlayMap RegionBuilder;

I don't know if that would work well for the OverlayMap usecase.  It would depend on the ratio of adds to reads.  Simon might know better how often overlayMap is read back during computeCompositingRequirements.
Comment 10 Eric Seidel (no email) 2012-10-30 22:09:47 PDT
I have a fix for this.  I'll break the idea of making unioning regions not O(N^2) off into a separate bug which Anders or someone can look at at a later time. :)
Comment 11 Eric Seidel (no email) 2012-10-30 22:23:37 PDT
Created attachment 171584 [details]
Patch
Comment 12 Eric Seidel (no email) 2012-10-30 22:25:18 PDT
I've filed the "make Region::union not N^2'd" as bug 100814.
Comment 13 Simon Fraser (smfr) 2012-10-30 22:27:59 PDT
Comment on attachment 171584 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=171584&action=review

> Source/WebCore/rendering/RenderLayerCompositor.cpp:81
> +class LazyRegion {

I don't really like this name. Maybe call it RectSet or something.

> Source/WebCore/rendering/RenderLayerCompositor.cpp:102
> +        m_rects.resize(0);

Isn't there a clear()?

> Source/WebCore/rendering/RenderLayerCompositor.cpp:142
> +        return m_overlapStack.last().region().intersects(bounds);

Do we even need to convert to a Region to do the intersection test? Would testing the individual rects be just as fast?
Comment 14 Eric Seidel (no email) 2012-10-30 22:32:50 PDT
Comment on attachment 171584 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=171584&action=review

>> Source/WebCore/rendering/RenderLayerCompositor.cpp:142
>> +        return m_overlapStack.last().region().intersects(bounds);
> 
> Do we even need to convert to a Region to do the intersection test? Would testing the individual rects be just as fast?

I don't know.  It's possible that Region is just completely the wrong data structure for what OverlapMap is trying to do.   It looks like hit-testing a region is O(N), just like hit-testing the rect-list would be, so you're probably right.

I'm happy to make this "RectSet" and remove references to Region entirely if you think that's best?
Comment 15 Eric Seidel (no email) 2012-10-30 22:45:41 PDT
Created attachment 171585 [details]
A much simpler approach. Kudos to smfr for being a smart man.
Comment 16 Eric Seidel (no email) 2012-10-30 22:45:59 PDT
Brilliant idea Simon. :)
Comment 17 Eric Seidel (no email) 2012-10-30 22:50:48 PDT
Comment on attachment 171584 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=171584&action=review

>>> Source/WebCore/rendering/RenderLayerCompositor.cpp:142
>>> +        return m_overlapStack.last().region().intersects(bounds);
>> 
>> Do we even need to convert to a Region to do the intersection test? Would testing the individual rects be just as fast?
> 
> I don't know.  It's possible that Region is just completely the wrong data structure for what OverlapMap is trying to do.   It looks like hit-testing a region is O(N), just like hit-testing the rect-list would be, so you're probably right.
> 
> I'm happy to make this "RectSet" and remove references to Region entirely if you think that's best?

I had originally assumed that the OverlapMap needed to expose Regions (since it was originally written with Regions), but you're right, it doesn't.  And hit-testing the IntRects is just as fast. :)
Comment 18 Sam Weinig 2012-10-30 22:54:34 PDT
Comment on attachment 171585 [details]
A much simpler approach. Kudos to smfr for being a smart man.

View in context: https://bugs.webkit.org/attachment.cgi?id=171585&action=review

> Source/WebCore/ChangeLog:35
> +        * rendering/RenderLayerCompositor.cpp:
> +        (LazyRegion):
> +        (WebCore::LazyRegion::unite):
> +        (WebCore::LazyRegion::updateRegionIfNecessary):
> +        (WebCore::LazyRegion::region):
> +        (WebCore):
> +        (WebCore::RenderLayerCompositor::OverlapMap::overlapsLayers):
> +        (WebCore::RenderLayerCompositor::OverlapMap::pushCompositingContainer):
> +        (RenderLayerCompositor::OverlapMap):

I don't think you regenerated your changelog.  There ain't no LazyRegion no mo.

> Source/WebCore/rendering/RenderLayerCompositor.cpp:112
> +        const RectList& layerRects = m_overlapStack.last();
> +        for (unsigned i = 0; i < layerRects.size(); i++)
> +            if (layerRects[i].intersects(bounds))
> +                return true;

Please add braces around the for loop.
Comment 19 Eric Seidel (no email) 2012-10-30 22:58:24 PDT
Created attachment 171586 [details]
Patch for landing
Comment 20 Eric Seidel (no email) 2012-10-30 22:59:57 PDT
Thank you Sam, Simon & Anders.
Comment 21 WebKit Review Bot 2012-10-30 23:48:30 PDT
Comment on attachment 171586 [details]
Patch for landing

Clearing flags on attachment: 171586

Committed r132990: <http://trac.webkit.org/changeset/132990>
Comment 22 WebKit Review Bot 2012-10-30 23:48:35 PDT
All reviewed patches have been landed.  Closing bug.
Comment 23 James Robinson 2012-10-31 00:05:38 PDT
If I start out with one entry in the overlap map (0, 0, 100, 100) and then add a second entry (0, 0, 100, 100), to tell if a third rect intersects with anything in the overlap map I have to do two intersections with this list-of-rects.  The idea with Region is that you only have to do one rectangle intersection since the unite() call is smart enough to collapse the two together (and in this case it is).
Comment 24 Eric Seidel (no email) 2012-10-31 01:19:41 PDT
(In reply to comment #23)
> If I start out with one entry in the overlap map (0, 0, 100, 100) and then add a second entry (0, 0, 100, 100), to tell if a third rect intersects with anything in the overlap map I have to do two intersections with this list-of-rects.  The idea with Region is that you only have to do one rectangle intersection since the unite() call is smart enough to collapse the two together (and in this case it is).

True.  We could make my patch smart enough to handle similar cases by having it check for containment before adding a new rect.  Not perfect.  And I'm not sure it's worth the code w/o a test case.

In practice this seems much cheaper than the Region case, as a Rect expands to 2 segments and 2 spans in a Shape, and unioning N Regions is O(N^2) (due to bug 100814) and requires O(N*ln(N)) mallocs.

Regions are just really inefficient for this simple rect case, as written.  I agree we can definitely do better than O(N) for containment test.
Comment 25 Eric Seidel (no email) 2012-10-31 01:22:00 PDT
I'm certainly not wedded to this fix, btw.  It just seemed like a simple way around bug 100814.

It's odd to me that we seem to build this map, but never check for overlap in it in in the Matrix/RoboHornetPro example.
Comment 26 Simon Fraser (smfr) 2012-10-31 13:08:51 PDT
(In reply to comment #25)
> I'm certainly not wedded to this fix, btw.  It just seemed like a simple way around bug 100814.
> 
> It's odd to me that we seem to build this map, but never check for overlap in it in in the Matrix/RoboHornetPro example.

I think this is because we turn off overlap testing in the scope of the elements that are running CSS animations:

    // Turn overlap testing off for later layers if it's already off, or if we have a 3D transform or an animating transform.
    if (!childState.m_testingOverlap || layer->has3DTransform() || isRunningAcceleratedTransformAnimation(layer->renderer()))
        compositingState.m_testingOverlap = false;

I'd have to think about whether we can just avoid putting things into the overlap map in that case; it might be that later scopes still need the relevant rectangles. I filed bug 100880.
Comment 27 Darin Adler 2012-11-01 10:02:42 PDT
Comment on attachment 171586 [details]
Patch for landing

View in context: https://bugs.webkit.org/attachment.cgi?id=171586&action=review

> Source/WebCore/rendering/RenderLayerCompositor.cpp:137
> +    typedef Vector<IntRect> RectList;
> +    Vector<RectList> m_overlapStack;

If we know typical sizes for these, we might get an even better result with Vector<IntRect, 1> or the like.
Comment 28 Eric Seidel (no email) 2012-11-01 10:40:01 PDT
(In reply to comment #27)
> (From update of attachment 171586 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=171586&action=review
> 
> > Source/WebCore/rendering/RenderLayerCompositor.cpp:137
> > +    typedef Vector<IntRect> RectList;
> > +    Vector<RectList> m_overlapStack;
> 
> If we know typical sizes for these, we might get an even better result with Vector<IntRect, 1> or the like.

Right now the RoboHornetPro example is entirely dominated by bug 98823 (now that this bug is fixed).  But once that's fixed, we might find other tweaks in this area.  It's also possible that Simon and others have overlap/layer benchmarks which we could use to tweak this number.