This change -> http://trac.webkit.org/changeset/58526 Regressed Dromaeo JS lib tests by ~30% on my machine. That said, it was about an equally huge win on Dromaeo DOM core (awesome!), but given our no-regressions policy I'd imagine we'll need to find a way to avoid the performance hit if we are to keep this in the tree. Might be best to revert & re-land a new patch, once this issue has been resolved? The hit seems to mostly fall on the CSS style & DOM traversal tests. http://dromaeo.com/?id=102238 http://dromaeo.com/?id=102247 Darin, cc'ing you as reviewer of the original patch.
Gavin, sorry for introducing that. What is interesting there is nothing like that for Chromium: http://build.chromium.org/buildbot/perf/xp-release-single-core/dromaeo_jslib/report.html?history=350&header=&thumbnail=undefined&rev=-1&graph=jslib_traverse_prototype The only difference between Safari and Chromium that immediately come into my mind is change to GC policy---I haven't yet implemented retention of cached node lists in v8's bindings (will implement this week). Gavin, if you have some spare cycles, may I ask you to comment out GC thing (this is the only change in JSNodeCustom.cpp: http://trac.webkit.org/changeset/58526/trunk/WebCore/bindings/js/JSNodeCustom.cpp) and check if it solves the problem? If you don't have any cycles, just let me know and I'll investigate. If possible, I'd ask not to revert it for, say, a week---I'll start working on it immediately. And just for your information: there is a pretty convenient way to compare several Dromaeo runs---just separate run ids with comma: http://dromaeo.com/?id=102238,102247 (In reply to comment #0) > This change -> http://trac.webkit.org/changeset/58526 > > Regressed Dromaeo JS lib tests by ~30% on my machine. That said, it was about > an equally huge win on Dromaeo DOM core (awesome!), but given our > no-regressions policy I'd imagine we'll need to find a way to avoid the > performance hit if we are to keep this in the tree. Might be best to revert & > re-land a new patch, once this issue has been resolved? > > The hit seems to mostly fall on the CSS style & DOM traversal tests. > > http://dromaeo.com/?id=102238 > http://dromaeo.com/?id=102247 > > Darin, cc'ing you as reviewer of the original patch.
My thoughts would be to see whether the cost is in GC overhead (that being the obvious culprit if we're now caching GC objects) and failing that i would look at what is causing us to miss the cache (which would be the second most obvious culprit)
Hi Anton, I'm fairly snowed under myself this week, so Sam Weinig has kindly offered to step in for me and try to help. I think he plans to start with the basics, take some shark profiles & see what shows up. I'm sure he'll let you know once he has any info. many thanks, G.
(In reply to comment #3) > Hi Anton, > > I'm fairly snowed under myself this week, so Sam Weinig has kindly offered to > step in for me and try to help. I think he plans to start with the basics, > take some shark profiles & see what shows up. I'm sure he'll let you know once > he has any info. > > many thanks, > G. Sure, Gavin. So the main suspect is GC for now: http://dromaeo.com/?id=102562,102570,102580 . First one is head of WebKit, the second one is head of WebKit with my change backed out, the last one is head of WebKit with the following change: $ git diff diff --git a/WebCore/bindings/js/JSNodeCustom.cpp b/WebCore/bindings/js/JSNodeCu index 6d61037..c2ff970 100644 --- a/WebCore/bindings/js/JSNodeCustom.cpp +++ b/WebCore/bindings/js/JSNodeCustom.cpp @@ -179,7 +179,7 @@ void JSNode::markChildren(MarkStack& markStack) Node* node = m_impl.get(); node->markJSEventListeners(markStack); - node->markCachedNodeLists(markStack, *Heap::heap(this)->globalData()); + // node->markCachedNodeLists(markStack, *Heap::heap(this)->globalData()); // Nodes in the document are kept alive by JSDocument::mark, so, if we're i // the document, we need to mark the document, but we don't need to explici That's on windows box, but I hope to repro it on MBP as well. Looking further.
And I assign it to myself. But if anyone else wishes to grab it, that's fine, I'll still keep investigating.
(In reply to comment #4) > - node->markCachedNodeLists(markStack, *Heap::heap(this)->globalData()); > + // node->markCachedNodeLists(markStack, *Heap::heap(this)->globalData()); To get maximum speed on this line of code, we need to make sure we do the minimum amount of work on nodes without outstanding node lists. Finding the heap for "this" so we can get the global data pointer is one example of unwanted work. Also, we need to inline much of the checking for node lists, including at least the rare data bit check and possibly also the rare data hash table lookup and then the subsequent check for the cached node lists pointer inside the rare data structure. At the moment, I believe we do have the rare data bit check inlined, so the issue may be the overhead in getting the global data pointer. If worse comes to worst we might have to devote a bit in the Node to indicating whether we have any cached node lists.
(In reply to comment #6) > (In reply to comment #4) > > - node->markCachedNodeLists(markStack, *Heap::heap(this)->globalData()); > > + // node->markCachedNodeLists(markStack, *Heap::heap(this)->globalData()); > > To get maximum speed on this line of code, we need to make sure we do the > minimum amount of work on nodes without outstanding node lists. Finding the > heap for "this" so we can get the global data pointer is one example of > unwanted work. Also, we need to inline much of the checking for node lists, > including at least the rare data bit check and possibly also the rare data hash > table lookup and then the subsequent check for the cached node lists pointer > inside the rare data structure. > > At the moment, I believe we do have the rare data bit check inlined, so the > issue may be the overhead in getting the global data pointer. > > If worse comes to worst we might have to devote a bit in the Node to indicating > whether we have any cached node lists. Agree, and thanks a lot for your suggestions. Will investigate various approaches. I inlined by your suggestion a fast check for presence of rare data, but will start with inlining the check for presence of cached node lists. Let's hope it's not a node with tons on node lists cached on it.
(In reply to comment #7) > I inlined by your suggestion a fast check for presence of rare > data, but will start with inlining the check for presence of cached node lists. > Let's hope it's not a node with tons on node lists cached on it. I think that instead, the right first step is to restructure to eliminate the call to *Heap::heap(this)->globalData() when rare data is not present.
(In reply to comment #8) > (In reply to comment #7) > > I inlined by your suggestion a fast check for presence of rare > > data, but will start with inlining the check for presence of cached node lists. > > Let's hope it's not a node with tons on node lists cached on it. > > I think that instead, the right first step is to restructure to eliminate the > call to *Heap::heap(this)->globalData() when rare data is not present. Thanks a lot, Darin, going to give it a try right now.
The easiest way is probably to remove the JSGlobalData* argument and add a JSObject* argument instead.
Created attachment 55276 [details] Late Heap lookup Alas, this doesn't solve the problem. Any obvious mistakes?
[+ Vitaly] So we carried out couple of experiments. Current results: 1) speeding up fast checks doesn't help: see patch above, I ran another patch which uses bit flag when cached node lists are present---regression still persists; 2) heap profiler (in Chromium) hints that Dromaeo tests retain documents (one per suite like "DOM Attributes (jQuery)" and those apparently keep node lists; 3) if one runs 'regressed suite' separately, omitting suites before, there is no regression: http://dromaeo.com/?id=102605,102610 So the best hypothesis for now (which I will check tomorrow) is we've got 'leaked' cached node lists which sit and make subsequent GCs slower. If that's true, ideally Dromaeo should have cleaned up its garbage (and I'll experiment with it nulling out globals in suites). But anyway, esp. as JSC heap is not segregated by DOMWindow yet, we probably need to collect node lists if they don't have custom properties set. Thoughts?
BTW, Darin, do you remember why it was decided that we need to retain cached node lists as long as node is alive? I remember it was due to failing layout tests, but is it indeed desired semantics? I'll check tomorrow HTML5 wording, but any insights are most appreciated.
(In reply to comment #13) > BTW, Darin, do you remember why it was decided that we need to retain cached > node lists as long as node is alive? I remember it was due to failing layout > tests, but is it indeed desired semantics? > > I'll check tomorrow HTML5 wording, but any insights are most appreciated. The general principle is that garbage collection must not be detectable. So if someone adds custom properties to a node list it’s important we don’t lose those. That’s the main concept. The rest is just about performance tuning, I suppose.
I did the experiment of running the whole Dromaeo JavaScript Library Test and sharking just "DOM Traversal (Prototype)" (that one was particularly slow. It seems an exorbitant amount of time is being spent creating the emptyValue QualifiedName for the TagNodeListCache in markNodeLists. The emptyValue Qualified name can be greatly simplified in one of two ways, 1) use a 0 value for QualifiedName::m_impl or 2) use a static global. I am trying #1 now and will report back the results. In addition, I am going to investigate adding a call to hasCustomProperties() in markDOMObjectWrapper (or something similar that actually works, like adding a new version of markDOMObjectWrapper that does this) to avoid keeping alive the wrappers when there are no observable properties.
(In reply to comment #14) > (In reply to comment #13) > > BTW, Darin, do you remember why it was decided that we need to retain cached > > node lists as long as node is alive? I remember it was due to failing layout > > tests, but is it indeed desired semantics? > > > > I'll check tomorrow HTML5 wording, but any insights are most appreciated. > > The general principle is that garbage collection must not be detectable. So if > someone adds custom properties to a node list it’s important we don’t lose > those. That’s the main concept. The rest is just about performance tuning, I > suppose. Darin, I'll respond somewhat later---I need to carry on couple of experiments before.
(In reply to comment #15) > I did the experiment of running the whole Dromaeo JavaScript Library Test and > sharking just "DOM Traversal (Prototype)" (that one was particularly slow. It > seems an exorbitant amount of time is being spent creating the emptyValue > QualifiedName for the TagNodeListCache in markNodeLists. The emptyValue > Qualified name can be greatly simplified in one of two ways, 1) use a 0 value > for QualifiedName::m_impl or 2) use a static global. I am trying #1 now and > will report back the results. > > In addition, I am going to investigate adding a call to hasCustomProperties() > in markDOMObjectWrapper (or something similar that actually works, like adding > a new version of markDOMObjectWrapper that does this) to avoid keeping alive > the wrappers when there are no observable properties. Sam, thanks a lot. Any news about your experiments? I am asking as in ~3 hrs I'll start working on it and I'd rather not duplicate the work you're already doing.
Created attachment 55404 [details] Next patch: faster emptyValue
(In reply to comment #18) > Created an attachment (id=55404) [details] > Next patch: faster emptyValue This patch restores notable amount of score for me---this is all due to faster emptyValue (thanks to Sam's investigation). Would appreciate if anyone else checks if it helps on their box. So the main suspect for now would be iteration over cached node lists. I am going to use NodeListsNodeData::m_listsWithCaches to see if it'll help.
Created attachment 55408 [details] My emptyValue patch
Comment on attachment 55404 [details] Next patch: faster emptyValue This is another copy of your JSGlobalData patch, not an emptyValue one. I like Sam's approach to the emptyValue issue -- I suggest we use his.
Created attachment 55442 [details] Fix
Comment on attachment 55442 [details] Fix > + - Only mark cached NodeLists on Documents instead of all Nodes. Why is this OK?
Attachment 55442 [details] was posted by a committer and has review+, assigning to Sam Weinig for commit.
Landed in r59058.
(In reply to comment #21) > (From update of attachment 55404 [details]) > This is another copy of your JSGlobalData patch, not an emptyValue one. > > I like Sam's approach to the emptyValue issue -- I suggest we use his. Sorry, wrong patch.
(In reply to comment #23) > (From update of attachment 55442 [details]) > > + - Only mark cached NodeLists on Documents instead of all Nodes. > > Why is this OK? I'd be curious to know as well. And thanks a lot for fixing the problem, Sam.
(In reply to comment #27) > (In reply to comment #23) > > (From update of attachment 55442 [details] [details]) > > > + - Only mark cached NodeLists on Documents instead of all Nodes. > > > > Why is this OK? > > I'd be curious to know as well. And thanks a lot for fixing the problem, Sam. Sam answered my question in person. Sam, when you get a chance can you comment here in writing?
(In reply to comment #28) > (In reply to comment #27) > > (In reply to comment #23) > > > (From update of attachment 55442 [details] [details] [details]) > > > > + - Only mark cached NodeLists on Documents instead of all Nodes. > > > > > > Why is this OK? > > > > I'd be curious to know as well. And thanks a lot for fixing the problem, Sam. > > Sam answered my question in person. Sam, when you get a chance can you comment here in writing? Sam, Darin, sorry for being pushy, but could you clarify the story with Document? I'd like to adjust v8's GC as well, but would like to have it as similar to JSC as possible.
Sam should answer the question. Here's what I remember: Part of the answer is that almost all use of these functions is on Document. Another part is that collection of the JavaScript wrapper only affects the lifetime of custom properties on the node list in cases where no one is retaining a pointer to that list.