WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
13115
REGRESSION: 1000% performance regression in DOM access by index, which was already slow
https://bugs.webkit.org/show_bug.cgi?id=13115
Summary
REGRESSION: 1000% performance regression in DOM access by index, which was al...
Jon
Reported
2007-03-18 23:57:26 PDT
As of
r20309
, WebKit's performance on the index portion of Hixie's DOM benchmark has regressed 1000%, from 1.4s to 14s (Quad G5, 10.4.9, 1GB RAM). I believe the relevant part of the test is this: function testIndex(div) { for (var i = 0; i < count; i += 1) { divs[i] = div.childNodes[count*2 - i*2 - 1]; } } In contrast Opera completes this part of the test in .09s and Firefox 2 in .4s. (Bizarrely, the first time I run the test in shipping Safari, it comes up at around 1.5 seconds. Then, the second and third times, the time doubles (3.5s, 7s). Relaunching Safari starts the process again.)
Attachments
proposed fix
(13.44 KB, patch)
2007-04-21 10:06 PDT
,
Alexey Proskuryakov
no flags
Details
Formatted Diff
Diff
proposed fix
(13.42 KB, patch)
2007-04-21 10:56 PDT
,
Alexey Proskuryakov
no flags
Details
Formatted Diff
Diff
proposed fix
(13.42 KB, patch)
2007-04-21 10:58 PDT
,
Alexey Proskuryakov
darin
: review+
Details
Formatted Diff
Diff
Show Obsolete
(2)
View All
Add attachment
proposed patch, testcase, etc.
Mark Rowe (bdash)
Comment 1
2007-03-19 00:03:36 PDT
Profiling with Shark shows that WebCore::ChildNodeList::length and WebCore::ChildNodeList::item are the two hottest functions here. WebCore::Node::notifyNodeListsChildrenChanged is not far behind, though I think that relates to the other tests rather than the testIndex case.
Maciej Stachowiak
Comment 2
2007-03-19 11:52:11 PDT
<
rdar://problem/5072349
>
David Smith
Comment 3
2007-03-24 16:31:07 PDT
I have modified the testcase to show the effects of just the relevant test more clearly. I'm seeing 95.3% in length, so I suspect item is a false result from one of the other tests. The modified testcase can be found at
http://dscoder.com/indexperf.html
David Smith
Comment 4
2007-03-24 16:34:56 PDT
In the version of Safari included in 10.4.9, 92.8% is spent in ChildNodeListImpl::item()
David Smith
Comment 5
2007-03-24 17:43:29 PDT
Some more things I've noticed: The length cache is not being used at all, due to a new ChildNodeList being returned each time. If you reuse the ChildNodeList it will speed things up dramatically. I can't reproduce a 1000% regression, although there's definitely some regression.
Alice Liu
Comment 6
2007-04-05 00:02:24 PDT
I've found 3 ranges of regression, and 1 range of improvement. I was using the test at
http://dscoder.com/indexperf.html
2.0.4: ~5400 ms
r13058
: ~5400 ms
r13060
: ~6300 ms
r15531
: ~6300 ms
r15534
: ~7000 ms
r15536
: ~7000 ms
r15544
: ~7500 ms
r16513
: ~7000 ms
r17582
: ~6700 ms TOT: ~6700 ms I'm not sure what to make of these findings. I don't really see how
r13059
could have had an impact on this test, and
r13060
definitely couldn't. For the second range, we can rule out
r15533
and
r15534
as causes of regression, so i guess that leaves 15532. haven't really looked into the 3rd range yet, and i'm not sure if it was actually resolved between 15544 and 15561 or if it was just a coincidental speedup.
Alexey Proskuryakov
Comment 7
2007-04-21 10:06:02 PDT
Created
attachment 14123
[details]
proposed fix This patch makes NodeLists use a shared cache, similar to the one HTMLCollections use. With it, ChildNodeList::length() is no longer in Shark profiles. Above comments indicate that different people get different timings on these tests. This is apparently due to them being entirely memory-bound - over 90% of time is spent on a single instruction, "lwz r3,20(r3)" (which is what Node::nextSibling() compiles to). On my machine, the "insert" time on Hixie's test went down from 26 seconds to 7 seconds on TOT, while shipping Safari result is under 6 seconds. Believe it or not, David's reduction takes 64 seconds with shipping Safari, and 7 seconds on TOT with this patch (I haven't measured bare TOT). Go figure.
Alexey Proskuryakov
Comment 8
2007-04-21 10:09:28 PDT
Cache sharing doesn't improve item() behavior, because the test iterates backwards, which isn't optimized by the cache. Obviously, that needs to be amended, but probably in a separate non-P1 bug.
Darin Adler
Comment 9
2007-04-21 10:14:21 PDT
Comment on
attachment 14123
[details]
proposed fix I'm not a big fan of the word "info" in class names. Perhaps "Caches" is a better name? Also, if this struct is giong to be a member of the NodeList class, then its class name can just be Caches. + return m_info->lastItem; + } else if (index > m_info->lastItemOffset) { We normally don't do else after return. ChildNodeList(Node*); + ChildNodeList(Node*, NodeListInfo*); Why are we retaining the old ChildNodeList constructor that does not take the pointer to the caches? I don't see any reason to do so.
Alexey Proskuryakov
Comment 10
2007-04-21 10:56:52 PDT
Created
attachment 14124
[details]
proposed fix (In reply to
comment #9
)
> Also, if this struct is giong to be a member of the NodeList > class, then its class name can just be Caches.
OK.
> We normally don't do else after return.
Fixed. I didn't really touch this code - only used global replace for Cache members.
> Why are we retaining the old ChildNodeList constructor that does not take the > pointer to the caches? I don't see any reason to do so.
Just an oversight; removed.
Alexey Proskuryakov
Comment 11
2007-04-21 10:58:38 PDT
Created
attachment 14125
[details]
proposed fix Renamed Info to Caches in a couple more places.
Darin Adler
Comment 12
2007-04-21 15:58:27 PDT
Comment on
attachment 14125
[details]
proposed fix m_ownsInfo needs to be renamed to m_ownsCaches. r=me
Alexey Proskuryakov
Comment 13
2007-04-22 01:57:22 PDT
Committed revision 21003. Filed
bug 13445
for the remaining performance problem.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug