WebKit Bugzilla
New
Browse
Search+
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
65668
Optimize floating elements lookup
https://bugs.webkit.org/show_bug.cgi?id=65668
Summary
Optimize floating elements lookup
Alexandru Chiculita
Reported
2011-08-04 00:25:01 PDT
Currently the floating RenderObjects are stored in a list and searched by using a linear search. When the number of floats increases the layout becomes very slow. A better structure should be used.
Attachments
Patch
(41.89 KB, patch)
2011-08-04 09:22 PDT
,
Alexandru Chiculita
no flags
Details
Formatted Diff
Diff
Patch
(41.91 KB, patch)
2011-08-04 09:41 PDT
,
Alexandru Chiculita
no flags
Details
Formatted Diff
Diff
Simple Performance Test
(1.99 KB, text/html)
2011-08-04 18:20 PDT
,
Alexandru Chiculita
no flags
Details
Patch
(44.28 KB, patch)
2011-08-05 09:37 PDT
,
Alexandru Chiculita
webkit.review.bot
: commit-queue-
Details
Formatted Diff
Diff
Patch
(44.28 KB, patch)
2011-08-05 10:09 PDT
,
Alexandru Chiculita
hyatt
: review-
hyatt
: commit-queue-
Details
Formatted Diff
Diff
Patch
(51.22 KB, patch)
2011-08-08 07:26 PDT
,
Alexandru Chiculita
webkit.review.bot
: commit-queue-
Details
Formatted Diff
Diff
Fixed build
(51.22 KB, patch)
2011-08-08 08:01 PDT
,
Alexandru Chiculita
no flags
Details
Formatted Diff
Diff
Patch
(51.27 KB, patch)
2011-08-08 08:10 PDT
,
Alexandru Chiculita
no flags
Details
Formatted Diff
Diff
Patch
(51.26 KB, patch)
2011-08-08 08:34 PDT
,
Alexandru Chiculita
no flags
Details
Formatted Diff
Diff
Show Obsolete
(8)
View All
Add attachment
proposed patch, testcase, etc.
Alexandru Chiculita
Comment 1
2011-08-04 00:29:39 PDT
Two options are currently available: 1. Use the shape algebra algorithm described in Graphics Gems II and implemented in Regions.cpp that is used by WebKit2 to track the dirty areas 2. Use the PODIntervalTree.h that was used by the GPU rendering of polygons in WebCore Regions: * Regions.cpp seems to be very promising although the current implementation is not there yet. My current prototype is slower then the simple linear search. It needs to be highly optimized for our simple operation: adding a new rectangle (ie. no full memory copies, in-place updates, skipping spans before and after the rectangle) * Uses less memory then interval trees for compact floats that have same height (eg. lots of floats per line) * It is not able to query a list of floats that affect a line and that might later be needed for wrap-shape exclusions. * Search time should be O(logN) where N is the amount of spans created by the algorithm. Interval trees: * Current prototype is already very fast * Uses a fixed amount of memory O(N) where N is number of floats, but I think that can be improved by merging the nodes for the compact floats. It should become similar to Regions and will also improve speed (because of having less nodes). * Has O(logN) insert/remove/search times (N is the number of floats) * Can query exact list of floats for a specific line Regions.cpp uses a lot less memory when the floats are really compact. However, it seems to take a lot of time to update the structure. The memory increases when the floats are really small in height, because it will tend to create lots of spans. However, the implementation in Regions is not optimized for lots of spans. There's an optimization mentioned at the end of the Graphics Gems chapter about skipping non-affected spans. Also we only need to unite a rect and in that case the operation can be further optimized. Memory allocations are also a big issue, a simple operation will copy the whole structure. The article optimizes the memory allocations by using a linked list and doing most of the updates in-place. The Regions.cpp implementation also tries to merge spans, but the article says it is not worth the time if the structure is not going to be used for a bigger amount of time. In our case the Region will always be recreated when a new layout is done. Indeed the query for rectangles at a specific height can be very fast, it would only require a search in a sorted list. It should always retrieve a maximum of two rectangles: one for all the floats on the left and one rectangle for all the floats on the right, so the actual calculation is pretty easy. That will get a little more complicated for positioned elements. However, when wrap-shapes will be implemented, it will need to know exactly what floats are on the line, so that the wrap-shape can be taken into account. Using this algorithm that information seems to be lost in the compression. The alternative would be to approximate the curves of the wrap-shape using rectangles and pollute the floats Region. Having that will create lots of spans and will only make it a lot slower. Moreover, such a solution will make it dependent on the resolution. Removing the floats below a specified Y value can also be optimized by just removing all the spans after one point. That is needed when rewinding a line, for example for pagination purposes. MarkAllDescendantsWithFloatsForLayout also removes the floats, but it looks like there's no need to update the Region in that case. That's because the layout will be forced on those blocks , so the region will be totally cleared anyway. In that case it seems it always just adds new rectangles, so a subtract is not actually needed, meaning that we can optimize only for a union between a Region and a LayoutRect. An interval tree will create a node for each placed float. The implementation is based on a red/black balanced tree, so we got O(logN) for all the operations. The good news is that the implementation allocates the nodes in an arena, so it is very fast because no free or destroy callbacks are used. The arena goes away when the tree is cleared and that will happen at beginning of every layout. That is important because we tend to have few removals from the tree (only when a line is rewinded at the end of a column/region/page). The interval tree is very predictable, it always uses a specific amount of memory O(n) where n is the number of floats and it has O(logN) insert/remove/search times. It also allows querying an exact list of floats that are affecting a line (needed for wrap-shape). The memory of the interval tree may be improved by doing a search before adding a new interval. For example multiple same height floats on a single line can be merged together, all it needs to do is to update offset of the interval.
Alexandru Chiculita
Comment 2
2011-08-04 00:33:57 PDT
Per discussion with David Hyatt: 1. We will go with interval trees. 2. Also other optimizations can be explored in order to avoid tree propagation through descendant blocks.
Alexandru Chiculita
Comment 3
2011-08-04 09:22:21 PDT
Created
attachment 102931
[details]
Patch
Alexandru Chiculita
Comment 4
2011-08-04 09:41:34 PDT
Created
attachment 102936
[details]
Patch Rebased the patch
Sam Weinig
Comment 5
2011-08-04 16:10:36 PDT
Does this show a speed up on any tests? Can we construct a test that shows an improvement (something with a lot of floats)? Looking good though!
Alexandru Chiculita
Comment 6
2011-08-04 18:20:31 PDT
Created
attachment 103014
[details]
Simple Performance Test Created a simple performance test. On my machine I get nearly 7fps with nightly builds and about 18fps with this patch applied.
Adam Barth
Comment 7
2011-08-04 18:39:33 PDT
> Simple Performance Test
Feel encouraged to put the perf test somewhere in this folder:
http://trac.webkit.org/browser/trunk/PerformanceTests
:)
Alexandru Chiculita
Comment 8
2011-08-04 21:14:35 PDT
(In reply to
comment #7
)
> > Simple Performance Test > > Feel encouraged to put the perf test somewhere in this folder: >
http://trac.webkit.org/browser/trunk/PerformanceTests
> > :)
I've added
https://bugs.webkit.org/show_bug.cgi?id=65741
. I will submit a patch today.
Alexandru Chiculita
Comment 9
2011-08-05 09:37:14 PDT
Created
attachment 103078
[details]
Patch Removed the need to populate a vector.
WebKit Review Bot
Comment 10
2011-08-05 09:46:53 PDT
Comment on
attachment 103078
[details]
Patch
Attachment 103078
[details]
did not pass chromium-ews (chromium-xvfb): Output:
http://queues.webkit.org/results/9305871
Alexandru Chiculita
Comment 11
2011-08-05 10:09:46 PDT
Created
attachment 103083
[details]
Patch Fixed build.
Sam Weinig
Comment 12
2011-08-05 12:03:15 PDT
(In reply to
comment #6
)
> Created an attachment (id=103014) [details] > Simple Performance Test > > Created a simple performance test. > On my machine I get nearly 7fps with nightly builds and about 18fps with this patch applied.
Very nice!
Dave Hyatt
Comment 13
2011-08-05 12:13:37 PDT
Comment on
attachment 103083
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=103083&action=review
Overall this looks pretty good. I have one major issue with the patch though, and that is that I would prefer it if the interval tree was built lazily if possible. I could be wrong about this but if you have something like this: <img style="float:left;width:10px;height:1000px"> <div><div><div>....lots more nested divs.... <div>Some text We propagate floating objects into all of the nested divs. This duplication is bad enough and is something we need to fix, but if you don't build the interval tree lazily you will also have tons of duplication in each div, even though nobody actually *tested* for overlap in any of the intermediate divs. It seems like you could have a notion of whether or not a float's overlap has been computed that is independent of being placed and add to the interval tree lazily when someone actually does an overlap test. I think I'd like to see a deeply nested div performance test in which many many floats intrude into all of those divs to verify the potential performance gain from the optimization I'm suggesting. That will also help us measure when we remove all of the duplicate FloatingObject construction down the road as well.
> Source/WebCore/rendering/RenderBlock.cpp:94 > +#ifndef NDEBUG > +// These helpers are only used by the PODIntervalTree for debugging purposes. > +String ValueToString<int>::string(const int value) > +{ > + return String::number(value); > +} > + > +String ValueToString<RenderBlock::FloatingObject*>::string(const RenderBlock::FloatingObject* floatingObject) > +{ > + return String::format("%p (%dx%d %dx%d)", floatingObject, floatingObject->x(), floatingObject->y(), floatingObject->maxX(), floatingObject->maxY()); > +} > +#endif
Can we put these at the end of the file along with the renderName dumping.
> Source/WebCore/rendering/RenderBlock.cpp:1211 > + if (m_floatingObjects) > + m_floatingObjects->setHorizontalWritingMode(isHorizontalWritingMode());
Seems like you could do this inside clearFloats.
> Source/WebCore/rendering/RenderBlock.cpp:3419 > floatingObject->setIsPlaced(); > + m_floatingObjects->addPlacedObject(floatingObject);
You could move setIsPlaced inside addPlacedObject.
> Source/WebCore/rendering/RenderBlock.cpp:3666 > floatingObject->setIsPlaced(true); > + m_floatingObjects->addPlacedObject(floatingObject);
Again, you could move setIsPlaced inside addPlacedObject.
Dave Hyatt
Comment 14
2011-08-05 12:15:40 PDT
Note my comments about "You could move setIsPlaced inside addPlacedObject." don't apply if you do the lazy interval tree.
Alexandru Chiculita
Comment 15
2011-08-08 07:26:41 PDT
Created
attachment 103242
[details]
Patch Updated the floats test to also create some empty nested divs. It seems that the lazy init optimization makes a big difference. Thanks for the tip. Also addressed the other comments. Regarding "You could move setIsPlaced inside addPlacedObject" comment. I think it's ok to move it, even if I changed to lazy init now.
WebKit Review Bot
Comment 16
2011-08-08 08:00:38 PDT
Comment on
attachment 103242
[details]
Patch
Attachment 103242
[details]
did not pass chromium-ews (chromium-xvfb): Output:
http://queues.webkit.org/results/9329853
New failing tests: css2.1/t0905-c414-flt-wrap-00-e.html css2.1/t09-c5526c-display-00-e.html css2.1/t0905-c414-flt-03-c.html css1/box_properties/clear_float.html css1/box_properties/float_elements_in_series.html css1/box_properties/clear.html css1/formatting_model/vertical_formatting.html css1/box_properties/float_margin.html css2.1/t0905-c414-flt-04-c.html css2.1/t0905-c414-flt-02-c.html css2.1/t0905-c414-flt-00-d.html css2.1/t0905-c5525-fltblck-00-d-ag.html css1/formatting_model/floating_elements.html css2.1/t0905-c414-flt-wrap-01-d-g.html css1/box_properties/float.html css2.1/t0905-c414-flt-fit-01-d-g.html css1/box_properties/acid_test.html css2.1/t0905-c414-flt-01-d-g.html css1/box_properties/float_on_text_elements.html css2.1/t0905-c414-flt-fit-00-d.html
Alexandru Chiculita
Comment 17
2011-08-08 08:01:23 PDT
Created
attachment 103246
[details]
Fixed build
Alexandru Chiculita
Comment 18
2011-08-08 08:10:04 PDT
Created
attachment 103247
[details]
Patch
Alexandru Chiculita
Comment 19
2011-08-08 08:34:10 PDT
Created
attachment 103252
[details]
Patch
Dave Hyatt
Comment 20
2011-08-08 09:11:34 PDT
Comment on
attachment 103252
[details]
Patch All looks good! r=me!
WebKit Review Bot
Comment 21
2011-08-08 11:09:33 PDT
Comment on
attachment 103252
[details]
Patch Clearing flags on attachment: 103252 Committed
r92610
: <
http://trac.webkit.org/changeset/92610
>
WebKit Review Bot
Comment 22
2011-08-08 11:09:39 PDT
All reviewed patches have been landed. Closing bug.
Alexey Proskuryakov
Comment 23
2011-08-08 16:37:39 PDT
This has been rolled out in
bug 65868
, correct?
Alexandru Chiculita
Comment 24
2011-08-09 00:23:35 PDT
(In reply to
comment #23
)
> This has been rolled out in
bug 65868
, correct?
Yes. I've added
https://bugs.webkit.org/show_bug.cgi?id=65871
and submitted the patch on that bug.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug