Bug 65668 - Optimize floating elements lookup
Summary: Optimize floating elements lookup
Alias: None
Product: WebKit
Classification: Unclassified
Component: CSS (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Nobody
Depends on: 65667 65868
  Show dependency treegraph
Reported: 2011-08-04 00:25 PDT by Alexandru Chiculita
Modified: 2011-08-09 00:23 PDT (History)
7 users (show)

See Also:

Patch (41.89 KB, patch)
2011-08-04 09:22 PDT, Alexandru Chiculita
no flags Details | Formatted Diff | Diff
Patch (41.91 KB, patch)
2011-08-04 09:41 PDT, Alexandru Chiculita
no flags Details | Formatted Diff | Diff
Simple Performance Test (1.99 KB, text/html)
2011-08-04 18:20 PDT, Alexandru Chiculita
no flags Details
Patch (44.28 KB, patch)
2011-08-05 09:37 PDT, Alexandru Chiculita
webkit.review.bot: commit-queue-
Details | Formatted Diff | Diff
Patch (44.28 KB, patch)
2011-08-05 10:09 PDT, Alexandru Chiculita
hyatt: review-
hyatt: commit-queue-
Details | Formatted Diff | Diff
Patch (51.22 KB, patch)
2011-08-08 07:26 PDT, Alexandru Chiculita
webkit.review.bot: commit-queue-
Details | Formatted Diff | Diff
Fixed build (51.22 KB, patch)
2011-08-08 08:01 PDT, Alexandru Chiculita
no flags Details | Formatted Diff | Diff
Patch (51.27 KB, patch)
2011-08-08 08:10 PDT, Alexandru Chiculita
no flags Details | Formatted Diff | Diff
Patch (51.26 KB, patch)
2011-08-08 08:34 PDT, Alexandru Chiculita
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Alexandru Chiculita 2011-08-04 00:25:01 PDT
Currently the floating RenderObjects are stored in a list and searched by using a linear search. When the number of floats increases the layout becomes very slow. A better structure should be used.
Comment 1 Alexandru Chiculita 2011-08-04 00:29:39 PDT
Two options are currently available:
1. Use the shape algebra algorithm described in Graphics Gems II and implemented in Regions.cpp that is used by WebKit2 to track the dirty areas
2. Use the PODIntervalTree.h  that was used by the GPU rendering of polygons in WebCore

* Regions.cpp seems to be very promising although the current implementation is not there yet. My current prototype is slower then the simple linear search. It needs to be highly optimized for our simple operation: adding a new rectangle (ie. no full memory copies, in-place updates, skipping spans before and after the rectangle)
* Uses less memory then interval trees for compact floats that have same height (eg. lots of floats per line)
* It is not able to query a list of floats that affect a line and that might later be needed for wrap-shape exclusions.
* Search time should be O(logN) where N is the amount of spans created by the algorithm.

Interval trees: 
* Current prototype is already very fast
* Uses a fixed amount of memory O(N) where N is number of floats, but I think that can be improved by merging the nodes for the compact floats. It should become similar to Regions and will also improve speed (because of having less nodes).
* Has O(logN) insert/remove/search times (N is the number of floats)
* Can query exact list of floats for a specific line

Regions.cpp uses a lot less memory when the floats are really compact. However, it seems to take a lot of time to update the structure. The memory increases when the floats are really small in height, because it will tend to create lots of spans.  However, the implementation in Regions is not optimized for lots of spans. There's an optimization mentioned at the end of the Graphics Gems chapter about skipping non-affected spans. Also we only need to unite a rect and in that case the operation can be further optimized. 

Memory allocations are also a big issue, a simple operation will copy the whole structure. The article optimizes the memory allocations by using a linked list and doing most of the updates in-place. The Regions.cpp implementation also tries to merge spans, but the article says it is not worth the time if the structure is not going to be used for a bigger amount of time. In our case the Region will always be recreated when a new layout is done. 

Indeed the query for rectangles at a specific height can be very fast, it would only require a search in a sorted list. It should always retrieve a maximum of two rectangles: one for all the floats on the left and one rectangle for all the floats on the right, so the actual calculation is pretty easy. That will get a little more complicated for positioned elements. However, when wrap-shapes will be implemented, it will need to know exactly what floats are on the line, so that the wrap-shape can be taken into account. Using this algorithm that information seems to be lost in the compression. The alternative would be to approximate the curves of the wrap-shape using rectangles and pollute the floats Region. Having that will create lots of spans and will only make it a lot slower. Moreover, such a solution will make it dependent on the resolution.

Removing the floats below a specified Y value can also be optimized by just removing all the spans after one point. That is needed when rewinding a line, for example for pagination purposes.
MarkAllDescendantsWithFloatsForLayout also removes the floats, but it looks like there's no need to update the Region in that case. That's because the layout will be forced on those blocks , so the region will be totally cleared anyway. In that case it seems it always just adds new rectangles, so a subtract is not actually needed, meaning that we can optimize only for a union between a Region and a LayoutRect.

An interval tree will create a node for each placed float. The implementation is based on a red/black balanced tree, so we got O(logN) for all the operations. The good news is that the implementation allocates the nodes in an arena, so it is very fast because no free or destroy callbacks are used. The arena goes away when the tree is cleared and that will happen at beginning of every layout. That is important because we tend to have few removals from the tree (only when a line is rewinded at the end of a column/region/page). The interval tree is very predictable,  it always uses a specific amount of memory O(n) where n is the number of floats and it has O(logN) insert/remove/search times. It also allows querying an exact list of floats that are affecting a line (needed for wrap-shape).

The memory of the interval tree may be improved by doing a search before adding a new interval. For example multiple same height floats on a single line can be merged together, all it needs to do is to update offset of the interval.
Comment 2 Alexandru Chiculita 2011-08-04 00:33:57 PDT
Per discussion with David Hyatt:
1. We will go with interval trees.
2. Also other optimizations can be explored in order to avoid tree propagation through descendant blocks.
Comment 3 Alexandru Chiculita 2011-08-04 09:22:21 PDT
Created attachment 102931 [details]
Comment 4 Alexandru Chiculita 2011-08-04 09:41:34 PDT
Created attachment 102936 [details]

Rebased the patch
Comment 5 Sam Weinig 2011-08-04 16:10:36 PDT
Does this show a speed up on any tests?  Can we construct a test that shows an improvement (something with a lot of floats)?  

Looking good though!
Comment 6 Alexandru Chiculita 2011-08-04 18:20:31 PDT
Created attachment 103014 [details]
Simple Performance Test

Created a simple performance test.
On my machine I get nearly 7fps with nightly builds and about 18fps with this patch applied.
Comment 7 Adam Barth 2011-08-04 18:39:33 PDT
> Simple Performance Test

Feel encouraged to put the perf test somewhere in this folder:

Comment 8 Alexandru Chiculita 2011-08-04 21:14:35 PDT
(In reply to comment #7)
> > Simple Performance Test
> Feel encouraged to put the perf test somewhere in this folder:
> http://trac.webkit.org/browser/trunk/PerformanceTests
> :)

I've added https://bugs.webkit.org/show_bug.cgi?id=65741 . I will submit a patch today.
Comment 9 Alexandru Chiculita 2011-08-05 09:37:14 PDT
Created attachment 103078 [details]

Removed the need to populate a vector.
Comment 10 WebKit Review Bot 2011-08-05 09:46:53 PDT
Comment on attachment 103078 [details]

Attachment 103078 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/9305871
Comment 11 Alexandru Chiculita 2011-08-05 10:09:46 PDT
Created attachment 103083 [details]

Fixed build.
Comment 12 Sam Weinig 2011-08-05 12:03:15 PDT
(In reply to comment #6)
> Created an attachment (id=103014) [details]
> Simple Performance Test
> Created a simple performance test.
> On my machine I get nearly 7fps with nightly builds and about 18fps with this patch applied.

Very nice!
Comment 13 Dave Hyatt 2011-08-05 12:13:37 PDT
Comment on attachment 103083 [details]

View in context: https://bugs.webkit.org/attachment.cgi?id=103083&action=review

Overall this looks pretty good. I have one major issue with the patch though, and that is that I would prefer it if the interval tree was built lazily if possible. I could be wrong about this but if you have something like this:

<img style="float:left;width:10px;height:1000px">
<div><div><div>....lots more nested divs.... <div>Some text

We propagate floating objects into all of the nested divs. This duplication is bad enough and is something we need to fix, but if you don't build the interval tree lazily you will also have tons of duplication in each div, even though nobody actually *tested* for overlap in any of the intermediate divs.

It seems like you could have a notion of whether or not a float's overlap has been computed that is independent of being placed and add to the interval tree lazily when someone actually does an overlap test.

I think I'd like to see a deeply nested div performance test in which many many floats intrude into all of those divs to verify the potential performance gain from the optimization I'm suggesting. That will also help us measure when we remove all of the duplicate FloatingObject construction down the road as well.

> Source/WebCore/rendering/RenderBlock.cpp:94
> +#ifndef NDEBUG
> +// These helpers are only used by the PODIntervalTree for debugging purposes.
> +String ValueToString<int>::string(const int value)
> +{
> +    return String::number(value);
> +}
> +
> +String ValueToString<RenderBlock::FloatingObject*>::string(const RenderBlock::FloatingObject* floatingObject)
> +{
> +    return String::format("%p (%dx%d %dx%d)", floatingObject, floatingObject->x(), floatingObject->y(), floatingObject->maxX(), floatingObject->maxY());
> +}
> +#endif

Can we put these at the end of the file along with the renderName dumping.

> Source/WebCore/rendering/RenderBlock.cpp:1211
> +    if (m_floatingObjects)
> +        m_floatingObjects->setHorizontalWritingMode(isHorizontalWritingMode());

Seems like you could do this inside clearFloats.

> Source/WebCore/rendering/RenderBlock.cpp:3419
>          floatingObject->setIsPlaced();
> +        m_floatingObjects->addPlacedObject(floatingObject);

You could move setIsPlaced inside addPlacedObject.

> Source/WebCore/rendering/RenderBlock.cpp:3666
>          floatingObject->setIsPlaced(true);
> +        m_floatingObjects->addPlacedObject(floatingObject);

Again, you could move setIsPlaced inside addPlacedObject.
Comment 14 Dave Hyatt 2011-08-05 12:15:40 PDT
Note my comments about "You could move setIsPlaced inside addPlacedObject." don't apply if you do the lazy interval tree.
Comment 15 Alexandru Chiculita 2011-08-08 07:26:41 PDT
Created attachment 103242 [details]

Updated the floats test to also create some empty nested divs. It seems that the lazy init optimization makes a big difference. Thanks for the tip.

Also addressed the other comments.

Regarding "You could move setIsPlaced inside addPlacedObject" comment. I think it's ok to move it, even if I changed to lazy init now.
Comment 16 WebKit Review Bot 2011-08-08 08:00:38 PDT
Comment on attachment 103242 [details]

Attachment 103242 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/9329853

New failing tests:
Comment 17 Alexandru Chiculita 2011-08-08 08:01:23 PDT
Created attachment 103246 [details]
Fixed build
Comment 18 Alexandru Chiculita 2011-08-08 08:10:04 PDT
Created attachment 103247 [details]
Comment 19 Alexandru Chiculita 2011-08-08 08:34:10 PDT
Created attachment 103252 [details]
Comment 20 Dave Hyatt 2011-08-08 09:11:34 PDT
Comment on attachment 103252 [details]

All looks good! r=me!
Comment 21 WebKit Review Bot 2011-08-08 11:09:33 PDT
Comment on attachment 103252 [details]

Clearing flags on attachment: 103252

Committed r92610: <http://trac.webkit.org/changeset/92610>
Comment 22 WebKit Review Bot 2011-08-08 11:09:39 PDT
All reviewed patches have been landed.  Closing bug.
Comment 23 Alexey Proskuryakov 2011-08-08 16:37:39 PDT
This has been rolled out in bug 65868, correct?
Comment 24 Alexandru Chiculita 2011-08-09 00:23:35 PDT
(In reply to comment #23)
> This has been rolled out in bug 65868, correct?

Yes. I've added https://bugs.webkit.org/show_bug.cgi?id=65871 and submitted the patch on that bug.