Bug 39794

Summary: Inserting more than 30 000 nodes is very slow
Product: WebKit Reporter: Anthony Ricaud <rik>
Component: DOMAssignee: Nobody <webkit-unassigned>
Status: RESOLVED WORKSFORME    
Severity: Normal CC: ap, catfish.man, hyatt, jamesr, jamesr, mitz, simon.fraser
Priority: P2 Keywords: HasReduction
Version: 528+ (Nightly build)   
Hardware: Mac   
OS: OS X 10.5   
URL: http://hacks.mozilla.org/2010/05/better-performance-with-lazy-frame-construction/
Attachments:
Description Flags
Reduced testcase
none
pprof profiler graph (from linux chromium port) none

Description Anthony Ricaud 2010-05-26 15:38:03 PDT
Created attachment 57172 [details]
Reduced testcase

The first test case in the linked page takes about 30s to insert 80 000 nodes. It took between 1s and 2s in Firefox and Opera (see comments).

While it's not a common use case, something is wrong if other browsers can do it so fast.

Note that this is not related to the perf optimizations mentioned.
Comment 1 James Robinson 2010-05-27 18:24:07 PDT
Ironically (given the source of this test case is http://hacks.mozilla.org/2010/05/better-performance-with-lazy-frame-construction/) all of the time is spent in a single lazy attachment pass.  There are a few bits of N^2 behavior:

1.) Node::attach()'s loop through nextSibling() to check for adj text nodes. http://trac.webkit.org/browser/trunk/WebCore/dom/Node.cpp#L1226.  This does a crawl through all subsequent siblings to see if they have renderers.  Slow, but not new (that code is from r29054.

2.) Node::nextRenderer() called from Node::createRendererIfNeeded().  This also does a call through all siblings.  This is also slow and also not new.

This doesn't look like a regression, so it is probably super important relative to other things.  It would be nice at least to not have to make two passes through all siblings at each attachment instead of one.
Comment 2 James Robinson 2010-05-27 18:25:27 PDT
Created attachment 57286 [details]
pprof profiler graph (from linux chromium port)

This graph shows the slowspots pretty clearly, if anyone is curious about what the callstacks look like.
Comment 3 Anthony Ricaud 2017-04-16 11:20:32 PDT
Just tested and the testcase runs in 250ms now. So it was probably fixed in the last seven years.