Bug 73737 - HIERARCHY_REQUEST_ERR check in checkAcceptChild should be optimized for newChild without children
Summary: HIERARCHY_REQUEST_ERR check in checkAcceptChild should be optimized for newCh...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: DOM (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Ryosuke Niwa
URL:
Keywords:
Depends on: 73802
Blocks: 73853
  Show dependency treegraph
 
Reported: 2011-12-02 23:32 PST by Ryosuke Niwa
Modified: 2011-12-16 23:50 PST (History)
8 users (show)

See Also:


Attachments
perf test (1.09 KB, text/html)
2011-12-03 01:15 PST, Ryosuke Niwa
no flags Details
Patch (1.96 KB, patch)
2011-12-03 01:35 PST, Ryosuke Niwa
no flags Details | Formatted Diff | Diff
Also implement arv's optimization (2.31 KB, patch)
2011-12-04 00:03 PST, Ryosuke Niwa
no flags Details | Formatted Diff | Diff
new perf test (1.21 KB, text/html)
2011-12-04 01:14 PST, Ryosuke Niwa
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Ryosuke Niwa 2011-12-02 23:32:11 PST
Per statistics posted in http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2011-December/034038.html,
it's pretty clear that the check for HIERARCHY_REQUEST_ERR in checkAcceptChild should be optimized for the case
where newChild doesn't have any children or less than 10 children.
Comment 1 Ryosuke Niwa 2011-12-02 23:32:37 PST
Diff I used to collect statistics for completeness:

Index: Source/WebCore/dom/Node.cpp
===================================================================
--- Source/WebCore/dom/Node.cpp	(revision 101848)
+++ Source/WebCore/dom/Node.cpp	(working copy)
@@ -1309,6 +1309,15 @@
 
     // HIERARCHY_REQUEST_ERR: Raised if this node is of a type that does not allow children of the type of the
     // newChild node, or if the node to append is one of this node's ancestors.
+    {
+        unsigned i = 0;
+        for (Node* node = newChild; node && i < 100; node = node->traverseNextNode(newChild))
+            i++;
+        if (i < 100)
+            fprintf(stderr, "inserting child with %d nodes\n", i);
+        else
+            fprintf(stderr, "inserting child with 100+ nodes\n");
+    }
 
     if (newChild == newParent || newParent->isDescendantOf(newChild)) {
         ec = HIERARCHY_REQUEST_ERR;
Comment 2 Ryosuke Niwa 2011-12-03 01:15:27 PST
Created attachment 117749 [details]
perf test
Comment 3 Ryosuke Niwa 2011-12-03 01:28:59 PST
I took 11 samples on Apple's mac port applied on r101862.

Without patch:
shallow tree: 2ms
deep tree: 4.09ms (std:0.3)
deeper tree: 11.45ms (std:0.5)

With patch:
shallow tree: 2ms
deep tree: 3ms
deeper tree: 6.91ms (std 0.3)

40% improvement! The number, however, indicates that we have at least one more linear algorithm hidden somewhere.
Comment 4 Ryosuke Niwa 2011-12-03 01:35:21 PST
Created attachment 117751 [details]
Patch
Comment 5 Erik Arvidsson 2011-12-03 12:28:41 PST
Thanks for bringing this up. I forgot we talked about this a while ago.

I thought of another simple optimization we could do here. If we are comparing two trees that have different inDocument state, then we know that one cannot be a descendant of the other:

if (inDocument() != other->inDocument())
    return false;

I'm curious how many of the cases you saw where a node is appended that has no children falls into the above category? My gut feeling is that the building of the disconnected tree falls into the no child case and that the attachment to the document would fall into the above case.
Comment 6 Ryosuke Niwa 2011-12-03 15:09:59 PST
(In reply to comment #5)
> Thanks for bringing this up. I forgot we talked about this a while ago.
> 
> I thought of another simple optimization we could do here. If we are comparing two trees that have different inDocument state, then we know that one cannot be a descendant of the other:
> 
> if (inDocument() != other->inDocument())
>     return false;
> 
> I'm curious how many of the cases you saw where a node is appended that has no children falls into the above category? My gut feeling is that the building of the disconnected tree falls into the no child case and that the attachment to the document would fall into the above case.

That's an excellent obervation obervstion. I'm going to collect a stat on this as well as the number of ancesors newParent has.
Comment 7 Darin Adler 2011-12-03 19:27:18 PST
That code counts the total number of descendants, not the number of children.
Comment 8 Ryosuke Niwa 2011-12-03 21:00:34 PST
(In reply to comment #7)
> That code counts the total number of descendants, not the number of children.

Oh, right. That's what I meant. New stat coming per Erik's comment. Will wait landing this patch until that.
Comment 9 Ryosuke Niwa 2011-12-03 23:19:09 PST
(In reply to comment #5)
> I'm curious how many of the cases you saw where a node is appended that has no children falls into the above category? My gut feeling is that the building of the disconnected tree falls into the no child case and that the attachment to the document would fall into the above case.

59%.

Sample size: 42,000 - 1

n = Number of inserted nodes
    1     2     7     6  100+     3   10+     5     4   20+
0.679 0.089 0.038 0.029 0.028 0.024 0.021 0.018 0.013 0.011

d = Number of ancestors of new parent
    1   20+   10+     3     2     4     5     9     8     6
0.544 0.160 0.114 0.077 0.061 0.011 0.010 0.008 0.006 0.006

P(either but not both in document) = 0.625610133575

P(d=1 | n=1) ≈ 0.50007
P(parent not in document | n=1) ≈ 0.585
P(child not in document | n=1) ≈ 0.00681
P(either but not both in document | n=1) ≈ 0.589
P(either but not both in document) ≈ 0.626
P(n = 1, either but not both in document) ≈ 0.400
P(n = 1 | either but not both in document) ≈ 0.639
Comment 10 Ryosuke Niwa 2011-12-03 23:55:21 PST
My data suggests that we should do both optimizations.
Comment 11 Ryosuke Niwa 2011-12-04 00:03:54 PST
Created attachment 117784 [details]
Also implement arv's optimization
Comment 12 Ryosuke Niwa 2011-12-04 00:48:16 PST
Resubmitting the patch for review since I've implemented Erik's optimization as well.

Most significantly, P(n = 1 or either but not both in document) ≈ 0.904. So this patch will make 90% of cases really fast.
Comment 13 Ryosuke Niwa 2011-12-04 01:14:12 PST
Created attachment 117785 [details]
new perf test

Given that parent having more than 30 ancestors is very rare, I've updated my perf test to reflect this.
Comment 14 Erik Arvidsson 2011-12-04 11:12:03 PST
(In reply to comment #12)
> Most significantly, P(n = 1 or either but not both in document) ≈ 0.904. So this patch will make 90% of cases really fast.

90%! Wow that is huge. I expect this to have big impact on Gmail. We should ask them for numbers to see how much this brings down their latency.
Comment 15 Ryosuke Niwa 2011-12-04 12:30:04 PST
Comment on attachment 117784 [details]
Also implement arv's optimization

Thanks for the reviews, Darin!
Comment 16 Ryosuke Niwa 2011-12-04 12:32:20 PST
(In reply to comment #14)
> (In reply to comment #12)
> > Most significantly, P(n = 1 or either but not both in document) ≈ 0.904. So this patch will make 90% of cases really fast.
> 
> 90%! Wow that is huge. I expect this to have big impact on Gmail. We should ask them for numbers to see how much this brings down their latency.

Hm... we still have a linear performance, and there are so many other APIs that are much slower so I'll be surprised this had any meaningful impact on Gmail but it'll great if this patch improved their latency :)
Comment 17 WebKit Review Bot 2011-12-04 13:53:13 PST
Comment on attachment 117784 [details]
Also implement arv's optimization

Clearing flags on attachment: 117784

Committed r101967: <http://trac.webkit.org/changeset/101967>
Comment 18 WebKit Review Bot 2011-12-04 13:53:18 PST
All reviewed patches have been landed.  Closing bug.