Bug 39794 - Inserting more than 30 000 nodes is very slow
Summary: Inserting more than 30 000 nodes is very slow
Alias: None
Product: WebKit
Classification: Unclassified
Component: DOM (show other bugs)
Version: 528+ (Nightly build)
Hardware: Macintosh OS X 10.5
: P2 Normal
Assignee: Nobody
URL: http://hacks.mozilla.org/2010/05/bett...
Keywords: HasReduction
Depends on:
Reported: 2010-05-26 15:38 PDT by Anthony Ricaud
Modified: 2017-04-16 11:20 PDT (History)
7 users (show)

See Also:

Reduced testcase (794 bytes, text/html)
2010-05-26 15:38 PDT, Anthony Ricaud
no flags Details
pprof profiler graph (from linux chromium port) (31.62 KB, application/postscript)
2010-05-27 18:25 PDT, James Robinson
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Anthony Ricaud 2010-05-26 15:38:03 PDT
Created attachment 57172 [details]
Reduced testcase

The first test case in the linked page takes about 30s to insert 80 000 nodes. It took between 1s and 2s in Firefox and Opera (see comments).

While it's not a common use case, something is wrong if other browsers can do it so fast.

Note that this is not related to the perf optimizations mentioned.
Comment 1 James Robinson 2010-05-27 18:24:07 PDT
Ironically (given the source of this test case is http://hacks.mozilla.org/2010/05/better-performance-with-lazy-frame-construction/) all of the time is spent in a single lazy attachment pass.  There are a few bits of N^2 behavior:

1.) Node::attach()'s loop through nextSibling() to check for adj text nodes. http://trac.webkit.org/browser/trunk/WebCore/dom/Node.cpp#L1226.  This does a crawl through all subsequent siblings to see if they have renderers.  Slow, but not new (that code is from r29054.

2.) Node::nextRenderer() called from Node::createRendererIfNeeded().  This also does a call through all siblings.  This is also slow and also not new.

This doesn't look like a regression, so it is probably super important relative to other things.  It would be nice at least to not have to make two passes through all siblings at each attachment instead of one.
Comment 2 James Robinson 2010-05-27 18:25:27 PDT
Created attachment 57286 [details]
pprof profiler graph (from linux chromium port)

This graph shows the slowspots pretty clearly, if anyone is curious about what the callstacks look like.
Comment 3 Anthony Ricaud 2017-04-16 11:20:32 PDT
Just tested and the testcase runs in 250ms now. So it was probably fixed in the last seven years.