Bug 13351 - O(N^2) behavior seen in deeply nested trees when constructing renderers, laying out renderers and destroying renderers
Summary: O(N^2) behavior seen in deeply nested trees when constructing renderers, layi...
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: 523.x (Safari 3)
Hardware: Mac OS X 10.4
: P2 Normal
Assignee: Dave Hyatt
URL:
Keywords:
Depends on: 13430 84567 13424 13431 13432 13433 13487
Blocks: 13337
  Show dependency treegraph
 
Reported: 2007-04-14 16:30 PDT by Geoffrey Garen
Modified: 2013-02-02 05:35 PST (History)
10 users (show)

See Also:


Attachments
nesting tests (309.04 KB, application/octet-stream)
2007-04-14 16:35 PDT, Geoffrey Garen
no flags Details
nesting results (410.75 KB, image/jpeg)
2007-04-14 16:36 PDT, Geoffrey Garen
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Geoffrey Garen 2007-04-14 16:30:12 PDT
An equal number of un-nested tags shows very fast O(N) behavior.

I'd like to use this bug as a master bug for all the different reasons we might see O(N^2) behavior with nested tags. As we diagnose specific causes, we should break off new bugs to track them.

I think our first task should be to test parsing, rendering, and destroying individually, and then to profile what's slow.
Comment 1 Geoffrey Garen 2007-04-14 16:35:15 PDT
Created attachment 14034 [details]
nesting tests

Attached the tests I wrote, along with results. 

Basic summary:
not nested (span): 5ms
nested (span): 99ms
nested (div): 212ms
nested (div span): 38338ms

The div/span pair is particularly slow because mixing inline and block children forces the render tree into contortions.
Comment 2 Geoffrey Garen 2007-04-14 16:36:03 PDT
Created attachment 14035 [details]
nesting results

Attached image of results graph.
Comment 3 Dave Hyatt 2007-04-14 16:56:59 PDT
Some comments on each case:

(1) The nested span case is going to result in orders of magnitude more line boxes.  If you nest to an extreme depth, then every line will have a gigantic nesting level.

It should be possible to collapse together nested spans with identical styles and line box positions (e.g., if one span's ypos and height precisely match its enclosing span's ypos and height) and that would not do any painting of their own, but this change would obviously be involved and highly dangerous.

That's how I would attack the problem though... avoid constructing line boxes for spans that would not have painted anything and that have a positioning and style that is identical to their parent inlines.

(2) Nested divs (I assume there is content in each div like text) would result in a lot of anonymous blocks being made to enclose text.  Nothing comes to mind regarding fixing it other than the hope that something obvious shows up on profiles.  This behavior should not be O(n^2), so if it is, something is going wrong.

(3) Doing block - inline - block - inline is going to be the hardest one to fix.  The rendering contortions are necessary, but maybe profiling will show areas for improvement.
Comment 4 Dave Hyatt 2007-04-14 17:00:37 PDT
The principal reason I never bothered optimizing (1) (the nested inline case) is that this really only occurs if something has gone horribly wrong with a malformed page.  In normal usage inlines just don't nest deeply at all, and setting a maximum nesting depth of 200 (like Firefox does) would probably be good enough to deal with this particular problem for now.
Comment 5 Geoffrey Garen 2007-04-14 18:35:10 PDT
(I assume there is content in each div like text)

I put a single letter 'a' inside each tag.
Comment 6 Geoffrey Garen 2007-04-15 14:59:57 PDT
Let's use this bug to track the O(N^2) performance issue, regardless of whether we set a cap. We can track the cap -- or whatever solution we devise for hanging -- with bug 13337.
Comment 7 Dave Hyatt 2007-04-20 00:32:41 PDT
Analyzing the nested span case.  Because everything is on one line, some pathological first-line behavior is occurring.  Not sure this would occur in any real-world pages, since you inevitably get off the first line pretty quickly, but nevertheless:

firstLineStyle() is O(n^2)
verticalPositionHint() is O(n^2)
Comment 8 Dave Hyatt 2007-04-20 00:34:50 PDT
Also, parsing is not O(n^2) in the nested span case.  It's just as fast as the span case.  It's building up the render tree and laying it out that is O(n^2),.
Comment 9 Dave Hyatt 2007-04-20 00:45:17 PDT
isContentEditable is O(n^2).
inlineWidth() in bidi.cpp is O(n^2).
Comment 10 Dave Hyatt 2007-04-20 01:07:26 PDT
I should clarify this and say that these functions are O(n) but are often called on each node in the chain.  
Comment 11 Dave Hyatt 2007-04-20 01:08:07 PDT
Also, all four of these are small potatoes... removing them has little effect on the overall perf numbers, but I thought I'd list them as I keep investigating.
Comment 12 Dave Hyatt 2007-04-20 01:39:02 PDT
Actually these are huge.  Geoff's tests don't measure layout (which is actually the bulk of the time).
Comment 13 Dave Hyatt 2007-04-20 01:41:26 PDT
Results for the nested span case for 8192 nesting level:

Time to parse 8192 nested elements: 9ms
Time to construct renderers for 8192 nested elements: 664ms
Time to lay out 8192 nested elements: 3017ms
TOTAL TIME 8192 nested elements: 3690ms

Results for the non-nested span case for 8192 nesting level:

Time to parse 8192 nested elements: 7ms
Time to construct renderers 8192 nested elements: 5ms
Time to lay out 8192 nested elements: 13ms
TOTAL TIME 8192 nested elements: 25ms

This is terrible.  The layout result is just stunningly bad.
Comment 14 Dave Hyatt 2007-04-20 01:44:52 PDT
Here are the new results when the four functions mentioned are fixed (I just hacked them to be O(1)):

Time to parse 8192 nested elements: 11ms
Time to construct renderers for 8192 nested elements: 670ms
Time to lay out 8192 nested elements: 381ms
TOTAL TIME 8192 nested elements: 1063ms
Comment 15 Dave Hyatt 2007-04-20 01:55:22 PDT
Here are div results for construction and layout.

Unnested case:

Time to parse 8192 nested elements: 10ms
Time to construct renderers for 8192 nested elements: 5ms
Time to lay out 8192 nested elements: 26ms
TOTAL TIME 8192 nested elements: 41ms

Nested case:

Time to parse 8192 nested elements: 9ms
Time to construct renderers for 8192 nested elements: 1230ms
Time to lay out 8192 nested elements: 51ms
TOTAL TIME 8192 nested elements: 1291ms

Note that layout of nested blocks is not adversely affected by render tree depth, and that the O(n^2) problem here now that layout has been taken into account is actually not as bad as the nested span case!
Comment 16 Dave Hyatt 2007-04-20 02:17:47 PDT
Numbers for the span/div case. 

Unnested:

Time to parse 4096 nested elements: 8ms
Time to construct renderers for 4096 nested elements: 9ms
Time to lay out 4096 nested elements: 39ms
TOTAL TIME 4096 nested elements: 56ms

Nested:

Time to parse 4096 nested elements: 9ms
Time to construct renderers for 4096 nested elements: 1759ms
Time to lay out 4096 nested elements: 73055ms
TOTAL TIME 4096 nested elements: 74823ms

Note the incredible result here.  Layout itself is pathologically slow.  The render tree construction, while bad, is nothing compared to the cost of layout.


Comment 17 Dave Hyatt 2007-04-20 02:47:16 PDT
By the way without the O(1) fixes to the nested span case, the inline/block nested case jumps to 230 seconds.  So both are affected by those functions.
Comment 18 Dave Hyatt 2007-04-20 03:12:18 PDT
We are spending most of our time calculating minmaxwidth in this pathological case.  Here are results if we could envision doing calcMinMaxWidth lazily.

Time to parse 4096 nested elements: 9ms
Time to construct renderers for 4096 nested elements: 1608ms
Time to lay out 4096 nested elements: 14993ms
TOTAL TIME 4096 nested elements: 16610ms
Comment 19 Dave Hyatt 2007-04-20 03:22:37 PDT
We always call nextOnLineExists/prevOnLineExists to try to figure out which edges of enclosing inlines should be included to get borders, padding and margins correct.  Since 99% of the time, these spans have no borders, padding or margins, this should not be necessary.

Simply turning off these calls (in determineSpacingForFlowBoxes) yields the following (very exciting) results:

Time to parse 4096 nested elements: 10ms
Time to construct renderers for 4096 nested elements: 1768ms
Time to lay out 4096 nested elements: 2041ms
TOTAL TIME 4096 nested elements: 3820ms

This is down from 230 seconds assuming every problem described in this bug so far is fixed.

Comment 20 Dave Hyatt 2007-04-20 03:31:55 PDT
Here is the result for 8192 nested blocks inside inlines with all "fixes" applied (a level I doubt any of these real-world hangs have even achieved).

Time to parse 8192 nested elements: 17ms
Time to construct renderers for 8192 nested elements: 26192ms
Time to lay out 8192 nested elements: 36236ms
TOTAL TIME 8192 nested elements: 63005ms
Comment 21 Dave Hyatt 2007-04-20 03:35:32 PDT
Even if we don't do calcInlineMinMaxWidth lazily, there are some easy wins.  The isBR() check is redundant and can be removed for a speedup.  isInlineFlow() could be sped up to check a bit instead of making a virtual function call.
Comment 22 Dave Hyatt 2007-04-20 04:43:53 PDT
Another observation: it is both expected and natural that the nested-block-inside-inline case would be O(n^2).  You will actually make n^2 RenderObjects in this case.

Note that the block inside inline behavior is currently n^4, and that's why the slowdown is so pathological.  The tree itself explodes into an O(n^2) phenomenon (unavoidable), and then O(n^2) algorithms run on each one (avoidable).
Comment 23 Dave Hyatt 2007-04-20 22:43:09 PDT
I just thought of a brilliant way to handle continuations. :)

When a block is encountered inside an inline, wrap it in an anonymous inline-block.  Make sure the anonymous inline-block has the following special characteristics:
(1) It is set to width:100% so that it fills the width of the container.
(2) It is flagged as special so that forced breaks occur before and after the inline-block.

I think that should give us continuations practically for free (and even fix bugs where an enclosing inline was a relpositioned layer, etc.)
Comment 24 Gavin Sherlock 2009-02-01 10:06:26 PST
The performance of the test cases has improved enormously (tested with r40471 on 10.5.6, on a MacPro):

not nested (span):
Time to parse, render, and destroy 1024 nested elements: 1ms
Time to parse, render, and destroy 2048 nested elements: 1ms
Time to parse, render, and destroy 3072 nested elements: 3ms
Time to parse, render, and destroy 4096 nested elements: 3ms
Time to parse, render, and destroy 5120 nested elements: 5ms
Time to parse, render, and destroy 6144 nested elements: 5ms
Time to parse, render, and destroy 7168 nested elements: 7ms
Time to parse, render, and destroy 8192 nested elements: 8ms
Time to parse, render, and destroy 9216 nested elements: 8ms
Time to parse, render, and destroy 10240 nested elements: 9ms
Time to parse, render, and destroy 11264 nested elements: 10ms
Time to parse, render, and destroy 12288 nested elements: 11ms
Time to parse, render, and destroy 13312 nested elements: 12ms
Time to parse, render, and destroy 14336 nested elements: 12ms
Time to parse, render, and destroy 15360 nested elements: 13ms
Time to parse, render, and destroy 16384 nested elements: 14ms

nested (span):
Time to parse, render, and destroy 1024 nested elements: 2ms
Time to parse, render, and destroy 2048 nested elements: 7ms
Time to parse, render, and destroy 3072 nested elements: 15ms
Time to parse, render, and destroy 4096 nested elements: 27ms
Time to parse, render, and destroy 5120 nested elements: 40ms
Time to parse, render, and destroy 6144 nested elements: 52ms
Time to parse, render, and destroy 7168 nested elements: 80ms
Time to parse, render, and destroy 8192 nested elements: 98ms
Time to parse, render, and destroy 9216 nested elements: 131ms
Time to parse, render, and destroy 10240 nested elements: 140ms
Time to parse, render, and destroy 11264 nested elements: 194ms
Time to parse, render, and destroy 12288 nested elements: 227ms
Time to parse, render, and destroy 13312 nested elements: 232ms
Time to parse, render, and destroy 14336 nested elements: 307ms
Time to parse, render, and destroy 15360 nested elements: 347ms
Time to parse, render, and destroy 16384 nested elements: 395ms

nested (div):
Time to parse, render, and destroy 1024 nested elements: 2ms
Time to parse, render, and destroy 2048 nested elements: 7ms
Time to parse, render, and destroy 3072 nested elements: 16ms
Time to parse, render, and destroy 4096 nested elements: 27ms
Time to parse, render, and destroy 5120 nested elements: 42ms
Time to parse, render, and destroy 6144 nested elements: 55ms
Time to parse, render, and destroy 7168 nested elements: 79ms
Time to parse, render, and destroy 8192 nested elements: 102ms
Time to parse, render, and destroy 9216 nested elements: 129ms
Time to parse, render, and destroy 10240 nested elements: 156ms
Time to parse, render, and destroy 11264 nested elements: 189ms
Time to parse, render, and destroy 12288 nested elements: 216ms
Time to parse, render, and destroy 13312 nested elements: 265ms
Time to parse, render, and destroy 14336 nested elements: 304ms
Time to parse, render, and destroy 15360 nested elements: 319ms
Time to parse, render, and destroy 16384 nested elements: 396ms

nested (div span):
Time to parse, render, and destroy 512 nested elements: 2ms
Time to parse, render, and destroy 1024 nested elements: 4ms
Time to parse, render, and destroy 1536 nested elements: 8ms
Time to parse, render, and destroy 2048 nested elements: 15ms
Time to parse, render, and destroy 2560 nested elements: 23ms

It's still O(n^2) for everything except the not nested case, but these times are way better than they used to be, and significantly better than Safari 3.2.1 on the same machine.  For example, the nested (div span) test case in Safari 3.2.1 gives:

Time to parse, render, and destroy 512 nested elements: 46ms
Time to parse, render, and destroy 1024 nested elements: 141ms
Time to parse, render, and destroy 1536 nested elements: 251ms
Time to parse, render, and destroy 2048 nested elements: 367ms
Time to parse, render, and destroy 2560 nested elements: 491ms

while the nested (div) case in Safari 3.2.1 gives:

Time to parse, render, and destroy 1024 nested elements: 19ms
Time to parse, render, and destroy 2048 nested elements: 72ms
Time to parse, render, and destroy 3072 nested elements: 160ms
Time to parse, render, and destroy 4096 nested elements: 281ms
Time to parse, render, and destroy 5120 nested elements: 443ms
Time to parse, render, and destroy 6144 nested elements: 630ms
Time to parse, render, and destroy 7168 nested elements: 856ms
Time to parse, render, and destroy 8192 nested elements: 1126ms
Time to parse, render, and destroy 9216 nested elements: 1427ms
Time to parse, render, and destroy 10240 nested elements: 1783ms
Time to parse, render, and destroy 11264 nested elements: 2252ms
Time to parse, render, and destroy 12288 nested elements: 2705ms
Time to parse, render, and destroy 13312 nested elements: 3412ms
Time to parse, render, and destroy 14336 nested elements: 4891ms
Time to parse, render, and destroy 15360 nested elements: 6028ms
Time to parse, render, and destroy 16384 nested elements: 7522ms

so there have been enormous improvements.
Comment 25 Dave Hyatt 2009-08-22 22:21:04 PDT
It's just the new parser depth cap of 4096.  This depth cap is basically making the tests faster.
Comment 26 Ryosuke Niwa 2012-04-15 11:12:06 PDT
It seems like the first step of resolving this bug will be to add a WebKit-style performance test.
http://trac.webkit.org/wiki/Writing%20Performance%20Tests

In fact, this might be a good demo on my perf. test talk at the contributor's meeting :)