Bug 13115 - REGRESSION: 1000% performance regression in DOM access by index, which was already slow
Summary: REGRESSION: 1000% performance regression in DOM access by index, which was al...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: DOM (show other bugs)
Version: 523.x (Safari 3)
Hardware: Mac (PowerPC) OS X 10.4
: P1 Major
Assignee: Alexey Proskuryakov
URL: http://www.hixie.ch/tests/adhoc/perf/...
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2007-03-18 23:57 PDT by Jon
Modified: 2007-04-22 01:57 PDT (History)
4 users (show)

See Also:


Attachments
proposed fix (13.44 KB, patch)
2007-04-21 10:06 PDT, Alexey Proskuryakov
no flags Details | Formatted Diff | Diff
proposed fix (13.42 KB, patch)
2007-04-21 10:56 PDT, Alexey Proskuryakov
no flags Details | Formatted Diff | Diff
proposed fix (13.42 KB, patch)
2007-04-21 10:58 PDT, Alexey Proskuryakov
darin: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Jon 2007-03-18 23:57:26 PDT
As of r20309, WebKit's performance on the index portion of Hixie's DOM benchmark has regressed 1000%, from 1.4s to 14s (Quad G5, 10.4.9, 1GB RAM). I believe the relevant part of the test is this:

function testIndex(div) {
     for (var i = 0; i < count; i += 1) {
       divs[i] = div.childNodes[count*2 - i*2 - 1];
     }
   }

In contrast Opera completes this part of the test in .09s and Firefox 2 in .4s. 

(Bizarrely, the first time I run the test in shipping Safari, it comes up at around 1.5 seconds. Then, the second and third times, the time doubles (3.5s, 7s). Relaunching Safari starts the process again.)
Comment 1 Mark Rowe (bdash) 2007-03-19 00:03:36 PDT
Profiling with Shark shows that WebCore::ChildNodeList::length and WebCore::ChildNodeList::item are the two hottest functions here.  WebCore::Node::notifyNodeListsChildrenChanged is not far behind, though I think that relates to the other tests rather than the testIndex case.
Comment 2 Maciej Stachowiak 2007-03-19 11:52:11 PDT
<rdar://problem/5072349>
Comment 3 David Smith 2007-03-24 16:31:07 PDT
I have modified the testcase to show the effects of just the relevant test more clearly. I'm seeing 95.3% in length, so I suspect item is a false result from one of the other tests.

The modified testcase can be found at http://dscoder.com/indexperf.html
Comment 4 David Smith 2007-03-24 16:34:56 PDT
In the version of Safari included in 10.4.9, 92.8% is spent in ChildNodeListImpl::item()
Comment 5 David Smith 2007-03-24 17:43:29 PDT
Some more things I've noticed:

The length cache is not being used at all, due to a new ChildNodeList being returned each time. If you reuse the ChildNodeList it will speed things up dramatically.

I can't reproduce a 1000% regression, although there's definitely some regression.
Comment 6 Alice Liu 2007-04-05 00:02:24 PDT
I've found 3 ranges of regression, and 1 range of improvement.  I was using the test at http://dscoder.com/indexperf.html

2.0.4:  ~5400 ms
r13058: ~5400 ms
r13060: ~6300 ms

r15531: ~6300 ms
r15534: ~7000 ms

r15536: ~7000 ms
r15544: ~7500 ms

r16513: ~7000 ms
r17582: ~6700 ms
TOT:    ~6700 ms

I'm not sure what to make of these findings.  I don't really see how r13059 could have had an impact on this test, and r13060 definitely couldn't.  For the second range, we can rule out r15533 and r15534 as causes of regression, so i guess that leaves 15532.  haven't really looked into the 3rd range yet, and i'm not sure if it was actually resolved between 15544 and 15561 or if it was just a coincidental speedup. 
Comment 7 Alexey Proskuryakov 2007-04-21 10:06:02 PDT
Created attachment 14123 [details]
proposed fix

This patch makes NodeLists use a shared cache, similar to the one HTMLCollections use. With it, ChildNodeList::length() is no longer in Shark profiles.

Above comments indicate that different people get different timings on these tests. This is apparently due to them being entirely memory-bound - over 90% of time is spent on a single instruction, "lwz r3,20(r3)" (which is what Node::nextSibling() compiles to).

On my machine, the "insert" time on Hixie's test went down from 26 seconds to 7 seconds on TOT, while shipping Safari result is under 6 seconds. Believe it or not, David's reduction takes 64 seconds with shipping Safari, and 7 seconds on TOT with this patch (I haven't measured bare TOT). Go figure.
Comment 8 Alexey Proskuryakov 2007-04-21 10:09:28 PDT
Cache sharing doesn't improve item() behavior, because the test iterates backwards, which isn't optimized by the cache. Obviously, that needs to be amended, but probably in a separate non-P1 bug.
Comment 9 Darin Adler 2007-04-21 10:14:21 PDT
Comment on attachment 14123 [details]
proposed fix

I'm not a big fan of the word "info" in class names. Perhaps "Caches" is a better name? Also, if this struct is giong to be a member of the NodeList class, then its class name can just be Caches.

+            return m_info->lastItem;
+        } else if (index > m_info->lastItemOffset) {

We normally don't do else after return.

     ChildNodeList(Node*);
+    ChildNodeList(Node*, NodeListInfo*);

Why are we retaining the old ChildNodeList constructor that does not take the pointer to the caches? I don't see any reason to do so.
Comment 10 Alexey Proskuryakov 2007-04-21 10:56:52 PDT
Created attachment 14124 [details]
proposed fix

(In reply to comment #9)
> Also, if this struct is giong to be a member of the NodeList
> class, then its class name can just be Caches.

  OK.

> We normally don't do else after return.

  Fixed. I didn't really touch this code - only used global replace for Cache members.

> Why are we retaining the old ChildNodeList constructor that does not take the
> pointer to the caches? I don't see any reason to do so.

  Just an oversight; removed.
Comment 11 Alexey Proskuryakov 2007-04-21 10:58:38 PDT
Created attachment 14125 [details]
proposed fix

Renamed Info to Caches in a couple more places.
Comment 12 Darin Adler 2007-04-21 15:58:27 PDT
Comment on attachment 14125 [details]
proposed fix

m_ownsInfo needs to be renamed to m_ownsCaches.

r=me
Comment 13 Alexey Proskuryakov 2007-04-22 01:57:22 PDT
Committed revision 21003.

Filed bug 13445 for the remaining performance problem.