Summary: | Inserting more than 30 000 nodes is very slow | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Anthony Ricaud <rik> | ||||||
Component: | DOM | Assignee: | Nobody <webkit-unassigned> | ||||||
Status: | RESOLVED WORKSFORME | ||||||||
Severity: | Normal | CC: | ap, catfish.man, hyatt, jamesr, jamesr, mitz, simon.fraser | ||||||
Priority: | P2 | Keywords: | HasReduction | ||||||
Version: | 528+ (Nightly build) | ||||||||
Hardware: | Mac | ||||||||
OS: | OS X 10.5 | ||||||||
URL: | http://hacks.mozilla.org/2010/05/better-performance-with-lazy-frame-construction/ | ||||||||
Attachments: |
|
Ironically (given the source of this test case is http://hacks.mozilla.org/2010/05/better-performance-with-lazy-frame-construction/) all of the time is spent in a single lazy attachment pass. There are a few bits of N^2 behavior: 1.) Node::attach()'s loop through nextSibling() to check for adj text nodes. http://trac.webkit.org/browser/trunk/WebCore/dom/Node.cpp#L1226. This does a crawl through all subsequent siblings to see if they have renderers. Slow, but not new (that code is from r29054. 2.) Node::nextRenderer() called from Node::createRendererIfNeeded(). This also does a call through all siblings. This is also slow and also not new. This doesn't look like a regression, so it is probably super important relative to other things. It would be nice at least to not have to make two passes through all siblings at each attachment instead of one. Created attachment 57286 [details]
pprof profiler graph (from linux chromium port)
This graph shows the slowspots pretty clearly, if anyone is curious about what the callstacks look like.
Just tested and the testcase runs in 250ms now. So it was probably fixed in the last seven years. |
Created attachment 57172 [details] Reduced testcase The first test case in the linked page takes about 30s to insert 80 000 nodes. It took between 1s and 2s in Firefox and Opera (see comments). While it's not a common use case, something is wrong if other browsers can do it so fast. Note that this is not related to the perf optimizations mentioned.