We spend quite of bit time here, including the function entry/exit. We should also generate different code for stayWithin and unscoped cases.
Created attachment 140621 [details] patch
Comment on attachment 140621 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=140621&action=review > Source/WebCore/dom/Node.h:861 > + const Node* n = this; Please spell out node. > Source/WebCore/dom/Node.h:869 > +inline Node* Node::traverseNextNode(const Node* stayWithin) const Should we use template to share the code? > Source/WebCore/dom/Node.h:877 > + const Node* n = this; Ditto about one-letter variable name. > Source/WebCore/dom/Node.h:878 > + while (n && !n->nextSibling() && (!stayWithin || n->parentNodeAsNode() != stayWithin)) It appears to me that we should call traverseNextNode() when stayWithin is 0. > Source/WebCore/dom/Node.h:889 > + const Node* n = this; Ditto about one-letter variable name. > Source/WebCore/dom/Node.h:903 > + const Node* n = this; Ditto about one-letter variable name. > Source/WebCore/dom/Node.h:904 > + while (n && !n->nextSibling() && (!stayWithin || n->parentNodeAsNode() != stayWithin)) Ditto about calling traverseNextSibling() instead.
I'm sure Darin & Eric have something to say about this.
Comment on attachment 140621 [details] patch Seems reasonable. Curious what benchmark hit this?
(In reply to comment #2) > > Source/WebCore/dom/Node.h:869 > > +inline Node* Node::traverseNextNode(const Node* stayWithin) const > > Should we use template to share the code? I wrote a template version and it was uglier and more confusing. As I mention in the changelog the function is simple. > > Source/WebCore/dom/Node.h:878 > > + while (n && !n->nextSibling() && (!stayWithin || n->parentNodeAsNode() != stayWithin)) > > It appears to me that we should call traverseNextNode() when stayWithin is 0. Yeah but that is an interface change and better done separately. We don't know what relies on ability to pass null. We should ASSERT(stayWithin).
(In reply to comment #5) > (In reply to comment #2) > > > > Source/WebCore/dom/Node.h:869 > > > +inline Node* Node::traverseNextNode(const Node* stayWithin) const > > > > Should we use template to share the code? > > I wrote a template version and it was uglier and more confusing. As I mention in the changelog the function is simple. Okay. > > > Source/WebCore/dom/Node.h:878 > > > + while (n && !n->nextSibling() && (!stayWithin || n->parentNodeAsNode() != stayWithin)) > > > > It appears to me that we should call traverseNextNode() when stayWithin is 0. > > Yeah but that is an interface change and better done separately. We don't know what relies on ability to pass null. We should ASSERT(stayWithin). What I meant is that you could just call traverseNextNode() in that case. Oh well, I guess compilers are smart enough to move the condition out of the loop?
(In reply to comment #6) > > Yeah but that is an interface change and better done separately. We don't know what relies on ability to pass null. We should ASSERT(stayWithin). > > What I meant is that you could just call traverseNextNode() in that case. Oh well, I guess compilers are smart enough to move the condition out of the loop? That still adds an extra branch for no reason. It should really be the callers responsibility to use the right variant as usually they just need one or the other.
(In reply to comment #7) > (In reply to comment #6) > > What I meant is that you could just call traverseNextNode() in that case. Oh well, I guess compilers are smart enough to move the condition out of the loop? > > That still adds an extra branch for no reason. It should really be the callers responsibility to use the right variant as usually they just need one or the other. Fair enough. I bet editing code is the one that calls this function with null stayWithin.
(In reply to comment #4) > (From update of attachment 140621 [details]) > Seems reasonable. Curious what benchmark hit this? Nothing really. If you sample regular page loading you can see this function showing up around ~1%.
<rdar://problem/10942765>
Created attachment 140632 [details] patch2 n -> node
Comment on attachment 140632 [details] patch2 Very reasonable patch.
Attachment 140632 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCor..." exit_code: 1 Source/WebCore/dom/Node.h:894: More than one command on the same line [whitespace/newline] [4] Total errors found: 1 in 2 files If any of these errors are false positives, please file a bug against check-webkit-style.
Comment on attachment 140632 [details] patch2 View in context: https://bugs.webkit.org/attachment.cgi?id=140632&action=review > Source/WebCore/dom/Node.h:853 > +inline Node* Node::parentNodeAsNode() const > +{ > + // Avoid circular dependency with ContainerNode. > + return reinterpret_cast<Node*>(parentNode()); > +} Normally the way we handle stuff like this is to put the functions into ContainerNode.h, instead of a trick like this one. The downside then, is that anyone calling these functions has to include ContainerNode.h, but they probably do anyway.
(In reply to comment #14) > Normally the way we handle stuff like this is to put the functions into ContainerNode.h, instead of a trick like this one. > > The downside then, is that anyone calling these functions has to include ContainerNode.h, but they probably do anyway. Generally I would prefer keeping the functions in a predictable place but I see ContainerNode.h is already full of Node function. I'll move them there.
Comment on attachment 140632 [details] patch2 Attachment 140632 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/12652231
http://trac.webkit.org/changeset/116402
Re-opened since this is blocked by 85898
Seems like the inlining here is overly aggressive. I'll try what bots think of a version that just inlines the initial firstChild/nextSibling checks.
Just for the record, all the time I saw in this function in profiles was in the implementation of HTMLCollection and the related DOM classes.
(In reply to comment #20) > Just for the record, all the time I saw in this function in profiles was in the implementation of HTMLCollection and the related DOM classes. HTMLCollection checks DOM tree version to invalidate cache so making any DOM mutation inside a loop to iterate over a HTMLCollection will introduce a O(n^2) iteration where n is the number of nodes in the subtree. We should probably move HTMLCollection to the model DynamicSubtreNodeList uses.
I still think the right amount of inlining is a good idea for this function. Inlining the whole thing is probably too much. I also think we can override this in ContainerNode to do a version that’s even tighter for cases where we are starting with an Element* rather than a Node*. The firstChild function is smaller on an Element* than on a Node*.
I think I was remembering wrong. I think the hot cases were in one of the classes derived from DynamicSubtreeNodeList.
(In reply to comment #23) > I think I was remembering wrong. I think the hot cases were in one of the classes derived from DynamicSubtreeNodeList. I see. Do you happen to know tests / websites you used to obtain this result?
I was looking at a profile of loading the front page of nytimes.com.
Created attachment 141135 [details] another version with non-inlined ancestor sibling search
Comment on attachment 141135 [details] another version with non-inlined ancestor sibling search Attachment 141135 [details] did not pass mac-ews (mac): Output: http://queues.webkit.org/results/12654902
Created attachment 141191 [details] fix exports
Comment on attachment 141191 [details] fix exports View in context: https://bugs.webkit.org/attachment.cgi?id=141191&action=review > Source/WebCore/dom/ContainerNode.h:236 > + if (firstChild()) > + return firstChild(); > + if (nextSibling()) > + return nextSibling(); Should we create a local variables for firstChid, nextSibling, etc...? Or maybe compilers are smart enough to figure that out by themselves?
(In reply to comment #29) > Should we create a local variables for firstChid, nextSibling, etc...? > Or maybe compilers are smart enough to figure that out by themselves? I think compiler knows best here.
Comment on attachment 141191 [details] fix exports Attachment 141191 [details] did not pass mac-ews (mac): Output: http://queues.webkit.org/results/12672007
Comment on attachment 141191 [details] fix exports Attachment 141191 [details] did not pass mac-ews (mac): Output: http://queues.webkit.org/results/12653924
Lets see what bots think about this one. http://trac.webkit.org/changeset/116677
Re-opened since this is blocked by 86159
The fix for the Chromium build error mentioned in bug 86159 is to add an include of ContainerNode.h to the Chromium source file that has RetainedDOMInfo::GetElementCount() in it.
(In reply to comment #35) > The fix for the Chromium build error mentioned in bug 86159 is to add an include of ContainerNode.h to the Chromium source file that has RetainedDOMInfo::GetElementCount() in it. This seems to be the right fix for chromium. But it seems that we also have quite a few other places that call Node::traverseNextNode but might not include ContainerNode.h. This could be the reason mac-ews also showed the similar error. Can the author add the comment to those methods in Node.h, saying that ContainerNode.h should be included to access these inline methods?
(In reply to comment #36) > Can the author add the comment to those methods in Node.h, saying that ContainerNode.h should be included to access these inline methods? I’m don’t think that kind of comment would be helpful. We have many other functions in Node with that requirement and this really isn’t an ongoing problem. This kind of build breakage happens at the time we add the requirement, for existing code developed before it was required, and so the comment is not relevant.
I'm starting to feel the approach involving reinterpret_cast<Node*>(parentNode()) was superior.
Relanded with Chromium build fix. http://trac.webkit.org/changeset/116742
(In reply to comment #22) > I also think we can override this in ContainerNode to do a version that’s even tighter for cases where we are starting with an Element* rather than a Node*. The firstChild function is smaller on an Element* than on a Node*. Most of the clients actually want to traverse Elements (or StyledElements) not Nodes. We might want to have traverseNextElement/traverseNextElementSibling. If we tracked whether elements have any Element children (with a bit) we could avoid looking into Text nodes at all.
Looks like on Chromium Mac this was ~15% speedup in DOM/GetElement http://webkit-perf.appspot.com/graph.html#tests=[[19048,2001,3001]]&sel=1336694619128.783,1336808170127.4514,0,750&displayrange=7&datatype=running and ~5% speedup in DOM/Table http://webkit-perf.appspot.com/graph.html#tests=[[31016,2001,3001]]&sel=none&displayrange=7&datatype=running Dromaeo/cssquery-dojo shows a likely few percent speedup too http://webkit-perf.appspot.com/graph.html#tests=[[45012,2001,3001]]&sel=1336673435809.816,1336817676814,78125,150000&displayrange=7&datatype=running
On the other hand, DOM/GridSort appears to have regressed 15% on Qt :( http://webkit-perf.appspot.com/graph.html#tests=[[9106,2001,963028]]&sel=1336678907863.6448,1336774846949.939,37.20704845814978,47.286343612334804&displayrange=7&datatype=running