Bug 38557 - r58526 introduced a ~30% regression on Dromaeo JS lib
Summary: r58526 introduced a ~30% regression on Dromaeo JS lib
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC OS X 10.5
: P2 Normal
Assignee: Sam Weinig
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-05-04 18:03 PDT by Gavin Barraclough
Modified: 2010-05-14 08:59 PDT (History)
6 users (show)

See Also:


Attachments
Late Heap lookup (2.47 KB, patch)
2010-05-06 11:36 PDT, anton muhin
no flags Details | Formatted Diff | Diff
Next patch: faster emptyValue (2.47 KB, patch)
2010-05-07 12:15 PDT, anton muhin
no flags Details | Formatted Diff | Diff
My emptyValue patch (1.80 KB, patch)
2010-05-07 12:37 PDT, Sam Weinig
no flags Details | Formatted Diff | Diff
Fix (4.12 KB, patch)
2010-05-07 17:47 PDT, Sam Weinig
darin: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Gavin Barraclough 2010-05-04 18:03:42 PDT
This change -> http://trac.webkit.org/changeset/58526

Regressed Dromaeo JS lib tests by ~30% on my machine.  That said, it was about an equally huge win on Dromaeo DOM core (awesome!), but given our no-regressions policy I'd imagine we'll need to find a way to avoid the performance hit if we are to keep this in the tree.  Might be best to revert & re-land a new patch, once this issue has been resolved?

The hit seems to mostly fall on the CSS style & DOM traversal tests.

http://dromaeo.com/?id=102238
http://dromaeo.com/?id=102247

Darin, cc'ing you as reviewer of the original patch.
Comment 1 anton muhin 2010-05-05 07:12:33 PDT
Gavin, sorry for introducing that.

What is interesting there is nothing like that for Chromium: http://build.chromium.org/buildbot/perf/xp-release-single-core/dromaeo_jslib/report.html?history=350&header=&thumbnail=undefined&rev=-1&graph=jslib_traverse_prototype 

The only difference between Safari and Chromium that immediately come into my mind is change to GC policy---I haven't yet implemented retention of cached node lists in v8's bindings (will implement this week).

Gavin, if you have some spare cycles, may I ask you to comment out GC thing (this is the only change in JSNodeCustom.cpp: http://trac.webkit.org/changeset/58526/trunk/WebCore/bindings/js/JSNodeCustom.cpp) and check if it solves the problem?

If you don't have any cycles, just let me know and I'll investigate.

If possible, I'd ask not to revert it for, say, a week---I'll start working on it immediately.

And just for your information: there is a pretty convenient way to compare several Dromaeo runs---just separate run ids with comma: http://dromaeo.com/?id=102238,102247

(In reply to comment #0)
> This change -> http://trac.webkit.org/changeset/58526
> 
> Regressed Dromaeo JS lib tests by ~30% on my machine.  That said, it was about
> an equally huge win on Dromaeo DOM core (awesome!), but given our
> no-regressions policy I'd imagine we'll need to find a way to avoid the
> performance hit if we are to keep this in the tree.  Might be best to revert &
> re-land a new patch, once this issue has been resolved?
> 
> The hit seems to mostly fall on the CSS style & DOM traversal tests.
> 
> http://dromaeo.com/?id=102238
> http://dromaeo.com/?id=102247
> 
> Darin, cc'ing you as reviewer of the original patch.
Comment 2 Oliver Hunt 2010-05-05 09:50:22 PDT
My thoughts would be to see whether the cost is in GC overhead (that being the obvious culprit if we're now caching GC objects) and failing that i would look at what is causing us to miss the cache (which would be the second most obvious culprit)
Comment 3 Gavin Barraclough 2010-05-05 22:40:27 PDT
Hi Anton,

I'm fairly snowed under myself this week, so Sam Weinig has kindly offered to step in for me and try to help.  I think he plans to start with the basics, take some shark profiles & see what shows up.  I'm sure he'll let you know once he has any info.

many thanks,
G.
Comment 4 anton muhin 2010-05-06 09:45:21 PDT
(In reply to comment #3)
> Hi Anton,
> 
> I'm fairly snowed under myself this week, so Sam Weinig has kindly offered to
> step in for me and try to help.  I think he plans to start with the basics,
> take some shark profiles & see what shows up.  I'm sure he'll let you know once
> he has any info.
> 
> many thanks,
> G.

Sure, Gavin.

So the main suspect is GC for now: http://dromaeo.com/?id=102562,102570,102580 . First one is head of WebKit, the second one is head of WebKit with my change backed out, the last one is head of WebKit with the following change:

$ git diff
diff --git a/WebCore/bindings/js/JSNodeCustom.cpp b/WebCore/bindings/js/JSNodeCu
index 6d61037..c2ff970 100644
--- a/WebCore/bindings/js/JSNodeCustom.cpp
+++ b/WebCore/bindings/js/JSNodeCustom.cpp
@@ -179,7 +179,7 @@ void JSNode::markChildren(MarkStack& markStack)

     Node* node = m_impl.get();
     node->markJSEventListeners(markStack);
-    node->markCachedNodeLists(markStack, *Heap::heap(this)->globalData());
+    // node->markCachedNodeLists(markStack, *Heap::heap(this)->globalData());

     // Nodes in the document are kept alive by JSDocument::mark, so, if we're i
     // the document, we need to mark the document, but we don't need to explici

That's on windows box, but I hope to repro it on MBP as well.  Looking further.
Comment 5 anton muhin 2010-05-06 09:46:30 PDT
And I assign it to myself.  But if anyone else wishes to grab it, that's fine, I'll still keep investigating.
Comment 6 Darin Adler 2010-05-06 09:48:17 PDT
(In reply to comment #4)
> -    node->markCachedNodeLists(markStack, *Heap::heap(this)->globalData());
> +    // node->markCachedNodeLists(markStack, *Heap::heap(this)->globalData());

To get maximum speed on this line of code, we need to make sure we do the minimum amount of work on nodes without outstanding node lists. Finding the heap for "this" so we can get the global data pointer is one example of unwanted work. Also, we need to inline much of the checking for node lists, including at least the rare data bit check and possibly also the rare data hash table lookup and then the subsequent check for the cached node lists pointer inside the rare data structure.

At the moment, I believe we do have the rare data bit check inlined, so the issue may be the overhead in getting the global data pointer.

If worse comes to worst we might have to devote a bit in the Node to indicating whether we have any cached node lists.
Comment 7 anton muhin 2010-05-06 09:51:41 PDT
(In reply to comment #6)
> (In reply to comment #4)
> > -    node->markCachedNodeLists(markStack, *Heap::heap(this)->globalData());
> > +    // node->markCachedNodeLists(markStack, *Heap::heap(this)->globalData());
> 
> To get maximum speed on this line of code, we need to make sure we do the
> minimum amount of work on nodes without outstanding node lists. Finding the
> heap for "this" so we can get the global data pointer is one example of
> unwanted work. Also, we need to inline much of the checking for node lists,
> including at least the rare data bit check and possibly also the rare data hash
> table lookup and then the subsequent check for the cached node lists pointer
> inside the rare data structure.
> 
> At the moment, I believe we do have the rare data bit check inlined, so the
> issue may be the overhead in getting the global data pointer.
> 
> If worse comes to worst we might have to devote a bit in the Node to indicating
> whether we have any cached node lists.

Agree, and thanks a lot for your suggestions.  Will investigate various approaches.  I inlined by your suggestion a fast check for presence of rare data, but will start with inlining the check for presence of cached node lists.  Let's hope it's not a node with tons on node lists cached on it.
Comment 8 Darin Adler 2010-05-06 09:54:21 PDT
(In reply to comment #7)
> I inlined by your suggestion a fast check for presence of rare
> data, but will start with inlining the check for presence of cached node lists.
>  Let's hope it's not a node with tons on node lists cached on it.

I think that instead, the right first step is to restructure to eliminate the call to *Heap::heap(this)->globalData() when rare data is not present.
Comment 9 anton muhin 2010-05-06 09:55:49 PDT
(In reply to comment #8)
> (In reply to comment #7)
> > I inlined by your suggestion a fast check for presence of rare
> > data, but will start with inlining the check for presence of cached node lists.
> >  Let's hope it's not a node with tons on node lists cached on it.
> 
> I think that instead, the right first step is to restructure to eliminate the
> call to *Heap::heap(this)->globalData() when rare data is not present.

Thanks a lot, Darin, going to give it a try right now.
Comment 10 Darin Adler 2010-05-06 09:58:03 PDT
The easiest way is probably to remove the JSGlobalData* argument and add a JSObject* argument instead.
Comment 11 anton muhin 2010-05-06 11:36:58 PDT
Created attachment 55276 [details]
Late Heap lookup

Alas, this doesn't solve the problem.  Any obvious mistakes?
Comment 12 anton muhin 2010-05-06 12:40:09 PDT
[+ Vitaly]

So we carried out couple of experiments.  Current results:

1) speeding up fast checks doesn't help: see patch above, I ran another patch which uses bit flag when cached node lists are present---regression still persists;

2) heap profiler (in Chromium) hints that Dromaeo tests retain documents (one per suite like "DOM Attributes (jQuery)" and those apparently keep node lists;

3) if one runs 'regressed suite' separately, omitting suites before, there is no regression: http://dromaeo.com/?id=102605,102610

So the best hypothesis for now (which I will check tomorrow) is we've got 'leaked' cached node lists which sit and make subsequent GCs slower.

If that's true, ideally Dromaeo should have cleaned up its garbage (and I'll experiment with it nulling out globals in suites).

But anyway, esp. as JSC heap is not segregated by DOMWindow yet, we probably need to collect node lists if they don't have custom properties set.

Thoughts?
Comment 13 anton muhin 2010-05-06 13:36:03 PDT
BTW, Darin, do you remember why it was decided that we need to retain cached node lists as long as node is alive?  I remember it was due to failing layout tests, but is it indeed desired semantics?

I'll check tomorrow HTML5 wording, but any insights are most appreciated.
Comment 14 Darin Adler 2010-05-06 15:13:17 PDT
(In reply to comment #13)
> BTW, Darin, do you remember why it was decided that we need to retain cached
> node lists as long as node is alive?  I remember it was due to failing layout
> tests, but is it indeed desired semantics?
> 
> I'll check tomorrow HTML5 wording, but any insights are most appreciated.

The general principle is that garbage collection must not be detectable. So if someone adds custom properties to a node list it’s important we don’t lose those. That’s the main concept. The rest is just about performance tuning, I suppose.
Comment 15 Sam Weinig 2010-05-06 16:31:42 PDT
I did the experiment of running the whole Dromaeo JavaScript Library Test and sharking just "DOM Traversal (Prototype)" (that one was particularly slow.  It seems an exorbitant amount of time is being spent creating the emptyValue QualifiedName for the TagNodeListCache in markNodeLists.  The emptyValue Qualified name can be greatly simplified in one of two ways, 1) use a 0 value for QualifiedName::m_impl or 2) use a static global.  I am trying #1 now and will report back the results.

In addition, I am going to investigate adding a call to hasCustomProperties() in markDOMObjectWrapper (or something similar that actually works, like adding a new version of markDOMObjectWrapper that does this) to avoid keeping alive the wrappers when there are no observable properties.
Comment 16 anton muhin 2010-05-06 22:25:12 PDT
(In reply to comment #14)
> (In reply to comment #13)
> > BTW, Darin, do you remember why it was decided that we need to retain cached
> > node lists as long as node is alive?  I remember it was due to failing layout
> > tests, but is it indeed desired semantics?
> > 
> > I'll check tomorrow HTML5 wording, but any insights are most appreciated.
> 
> The general principle is that garbage collection must not be detectable. So if
> someone adds custom properties to a node list it’s important we don’t lose
> those. That’s the main concept. The rest is just about performance tuning, I
> suppose.

Darin, I'll respond somewhat later---I need to carry on couple of experiments before.
Comment 17 anton muhin 2010-05-06 22:26:56 PDT
(In reply to comment #15)
> I did the experiment of running the whole Dromaeo JavaScript Library Test and
> sharking just "DOM Traversal (Prototype)" (that one was particularly slow.  It
> seems an exorbitant amount of time is being spent creating the emptyValue
> QualifiedName for the TagNodeListCache in markNodeLists.  The emptyValue
> Qualified name can be greatly simplified in one of two ways, 1) use a 0 value
> for QualifiedName::m_impl or 2) use a static global.  I am trying #1 now and
> will report back the results.
> 
> In addition, I am going to investigate adding a call to hasCustomProperties()
> in markDOMObjectWrapper (or something similar that actually works, like adding
> a new version of markDOMObjectWrapper that does this) to avoid keeping alive
> the wrappers when there are no observable properties.

Sam, thanks a lot.  Any news about your experiments?  I am asking as in ~3 hrs I'll start working on it and I'd rather not duplicate the work you're already doing.
Comment 18 anton muhin 2010-05-07 12:15:30 PDT
Created attachment 55404 [details]
Next patch: faster emptyValue
Comment 19 anton muhin 2010-05-07 12:17:54 PDT
(In reply to comment #18)
> Created an attachment (id=55404) [details]
> Next patch: faster emptyValue

This patch restores notable amount of score for me---this is all due to faster emptyValue (thanks to Sam's investigation).  Would appreciate if anyone else checks if it helps on their box.  So the main suspect for now would be iteration over cached node lists.  I am going to use NodeListsNodeData::m_listsWithCaches to see if it'll help.
Comment 20 Sam Weinig 2010-05-07 12:37:37 PDT
Created attachment 55408 [details]
My emptyValue patch
Comment 21 Darin Adler 2010-05-07 17:05:43 PDT
Comment on attachment 55404 [details]
Next patch: faster emptyValue

This is another copy of your JSGlobalData patch, not an emptyValue one.

I like Sam's approach to the emptyValue issue -- I suggest we use his.
Comment 22 Sam Weinig 2010-05-07 17:47:36 PDT
Created attachment 55442 [details]
Fix
Comment 23 Darin Adler 2010-05-07 17:48:42 PDT
Comment on attachment 55442 [details]
Fix

> +        - Only mark cached NodeLists on Documents instead of all Nodes.

Why is this OK?
Comment 24 Eric Seidel (no email) 2010-05-08 23:15:51 PDT
Attachment 55442 [details] was posted by a committer and has review+, assigning to Sam Weinig for commit.
Comment 25 Sam Weinig 2010-05-09 13:34:58 PDT
Landed in r59058.
Comment 26 anton muhin 2010-05-11 07:35:31 PDT
(In reply to comment #21)
> (From update of attachment 55404 [details])
> This is another copy of your JSGlobalData patch, not an emptyValue one.
> 
> I like Sam's approach to the emptyValue issue -- I suggest we use his.

Sorry, wrong patch.
Comment 27 anton muhin 2010-05-11 07:36:16 PDT
(In reply to comment #23)
> (From update of attachment 55442 [details])
> > +        - Only mark cached NodeLists on Documents instead of all Nodes.
> 
> Why is this OK?

I'd be curious to know as well.  And thanks a lot for fixing the problem, Sam.
Comment 28 Darin Adler 2010-05-11 10:34:34 PDT
(In reply to comment #27)
> (In reply to comment #23)
> > (From update of attachment 55442 [details] [details])
> > > +        - Only mark cached NodeLists on Documents instead of all Nodes.
> > 
> > Why is this OK?
> 
> I'd be curious to know as well.  And thanks a lot for fixing the problem, Sam.

Sam answered my question in person. Sam, when you get a chance can you comment here in writing?
Comment 29 anton muhin 2010-05-14 04:15:14 PDT
(In reply to comment #28)
> (In reply to comment #27)
> > (In reply to comment #23)
> > > (From update of attachment 55442 [details] [details] [details])
> > > > +        - Only mark cached NodeLists on Documents instead of all Nodes.
> > > 
> > > Why is this OK?
> > 
> > I'd be curious to know as well.  And thanks a lot for fixing the problem, Sam.
> 
> Sam answered my question in person. Sam, when you get a chance can you comment here in writing?

Sam, Darin,

sorry for being pushy, but could you clarify the story with Document?  I'd like to adjust v8's GC as well, but would like to have it as similar to JSC as possible.
Comment 30 Darin Adler 2010-05-14 08:59:24 PDT
Sam should answer the question. Here's what I remember: Part of the answer is that almost all use of these functions is on Document. Another part is that collection of the JavaScript wrapper only affects the lifetime of custom properties on the node list in cases where no one is retaining a pointer to that list.