RESOLVED FIXED 13115
REGRESSION: 1000% performance regression in DOM access by index, which was already slow
https://bugs.webkit.org/show_bug.cgi?id=13115
Summary REGRESSION: 1000% performance regression in DOM access by index, which was al...
Jon
Reported 2007-03-18 23:57:26 PDT
As of r20309, WebKit's performance on the index portion of Hixie's DOM benchmark has regressed 1000%, from 1.4s to 14s (Quad G5, 10.4.9, 1GB RAM). I believe the relevant part of the test is this: function testIndex(div) { for (var i = 0; i < count; i += 1) { divs[i] = div.childNodes[count*2 - i*2 - 1]; } } In contrast Opera completes this part of the test in .09s and Firefox 2 in .4s. (Bizarrely, the first time I run the test in shipping Safari, it comes up at around 1.5 seconds. Then, the second and third times, the time doubles (3.5s, 7s). Relaunching Safari starts the process again.)
Attachments
proposed fix (13.44 KB, patch)
2007-04-21 10:06 PDT, Alexey Proskuryakov
no flags
proposed fix (13.42 KB, patch)
2007-04-21 10:56 PDT, Alexey Proskuryakov
no flags
proposed fix (13.42 KB, patch)
2007-04-21 10:58 PDT, Alexey Proskuryakov
darin: review+
Mark Rowe (bdash)
Comment 1 2007-03-19 00:03:36 PDT
Profiling with Shark shows that WebCore::ChildNodeList::length and WebCore::ChildNodeList::item are the two hottest functions here. WebCore::Node::notifyNodeListsChildrenChanged is not far behind, though I think that relates to the other tests rather than the testIndex case.
Maciej Stachowiak
Comment 2 2007-03-19 11:52:11 PDT
David Smith
Comment 3 2007-03-24 16:31:07 PDT
I have modified the testcase to show the effects of just the relevant test more clearly. I'm seeing 95.3% in length, so I suspect item is a false result from one of the other tests. The modified testcase can be found at http://dscoder.com/indexperf.html
David Smith
Comment 4 2007-03-24 16:34:56 PDT
In the version of Safari included in 10.4.9, 92.8% is spent in ChildNodeListImpl::item()
David Smith
Comment 5 2007-03-24 17:43:29 PDT
Some more things I've noticed: The length cache is not being used at all, due to a new ChildNodeList being returned each time. If you reuse the ChildNodeList it will speed things up dramatically. I can't reproduce a 1000% regression, although there's definitely some regression.
Alice Liu
Comment 6 2007-04-05 00:02:24 PDT
I've found 3 ranges of regression, and 1 range of improvement. I was using the test at http://dscoder.com/indexperf.html 2.0.4: ~5400 ms r13058: ~5400 ms r13060: ~6300 ms r15531: ~6300 ms r15534: ~7000 ms r15536: ~7000 ms r15544: ~7500 ms r16513: ~7000 ms r17582: ~6700 ms TOT: ~6700 ms I'm not sure what to make of these findings. I don't really see how r13059 could have had an impact on this test, and r13060 definitely couldn't. For the second range, we can rule out r15533 and r15534 as causes of regression, so i guess that leaves 15532. haven't really looked into the 3rd range yet, and i'm not sure if it was actually resolved between 15544 and 15561 or if it was just a coincidental speedup.
Alexey Proskuryakov
Comment 7 2007-04-21 10:06:02 PDT
Created attachment 14123 [details] proposed fix This patch makes NodeLists use a shared cache, similar to the one HTMLCollections use. With it, ChildNodeList::length() is no longer in Shark profiles. Above comments indicate that different people get different timings on these tests. This is apparently due to them being entirely memory-bound - over 90% of time is spent on a single instruction, "lwz r3,20(r3)" (which is what Node::nextSibling() compiles to). On my machine, the "insert" time on Hixie's test went down from 26 seconds to 7 seconds on TOT, while shipping Safari result is under 6 seconds. Believe it or not, David's reduction takes 64 seconds with shipping Safari, and 7 seconds on TOT with this patch (I haven't measured bare TOT). Go figure.
Alexey Proskuryakov
Comment 8 2007-04-21 10:09:28 PDT
Cache sharing doesn't improve item() behavior, because the test iterates backwards, which isn't optimized by the cache. Obviously, that needs to be amended, but probably in a separate non-P1 bug.
Darin Adler
Comment 9 2007-04-21 10:14:21 PDT
Comment on attachment 14123 [details] proposed fix I'm not a big fan of the word "info" in class names. Perhaps "Caches" is a better name? Also, if this struct is giong to be a member of the NodeList class, then its class name can just be Caches. + return m_info->lastItem; + } else if (index > m_info->lastItemOffset) { We normally don't do else after return. ChildNodeList(Node*); + ChildNodeList(Node*, NodeListInfo*); Why are we retaining the old ChildNodeList constructor that does not take the pointer to the caches? I don't see any reason to do so.
Alexey Proskuryakov
Comment 10 2007-04-21 10:56:52 PDT
Created attachment 14124 [details] proposed fix (In reply to comment #9) > Also, if this struct is giong to be a member of the NodeList > class, then its class name can just be Caches. OK. > We normally don't do else after return. Fixed. I didn't really touch this code - only used global replace for Cache members. > Why are we retaining the old ChildNodeList constructor that does not take the > pointer to the caches? I don't see any reason to do so. Just an oversight; removed.
Alexey Proskuryakov
Comment 11 2007-04-21 10:58:38 PDT
Created attachment 14125 [details] proposed fix Renamed Info to Caches in a couple more places.
Darin Adler
Comment 12 2007-04-21 15:58:27 PDT
Comment on attachment 14125 [details] proposed fix m_ownsInfo needs to be renamed to m_ownsCaches. r=me
Alexey Proskuryakov
Comment 13 2007-04-22 01:57:22 PDT
Committed revision 21003. Filed bug 13445 for the remaining performance problem.
Note You need to log in before you can comment on or make changes to this bug.