73737 – HIERARCHY_REQUEST_ERR check in checkAcceptChild should be optimized for newChild without children

RESOLVED FIXED 73737

HIERARCHY_REQUEST_ERR check in checkAcceptChild should be optimized for newChild without children

https://bugs.webkit.org/show_bug.cgi?id=73737

Summary HIERARCHY_REQUEST_ERR check in checkAcceptChild should be optimized for newCh...

Ryosuke Niwa

Reported 2011-12-02 23:32:11 PST

Per statistics posted in http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2011-December/034038.html, it's pretty clear that the check for HIERARCHY_REQUEST_ERR in checkAcceptChild should be optimized for the case where newChild doesn't have any children or less than 10 children.

Attachments
perf test (1.09 KB, text/html) 2011-12-03 01:15 PST, Ryosuke Niwa	no flags	Details
Patch (1.96 KB, patch) 2011-12-03 01:35 PST, Ryosuke Niwa	no flags	Details Formatted Diff Diff
Also implement arv's optimization (2.31 KB, patch) 2011-12-04 00:03 PST, Ryosuke Niwa	no flags	Details Formatted Diff Diff
new perf test (1.21 KB, text/html) 2011-12-04 01:14 PST, Ryosuke Niwa	no flags	Details
Show Obsolete (2) View All Add attachment proposed patch, testcase, etc.

Ryosuke Niwa

Comment 1 2011-12-02 23:32:37 PST

Diff I used to collect statistics for completeness: Index: Source/WebCore/dom/Node.cpp =================================================================== --- Source/WebCore/dom/Node.cpp (revision 101848) +++ Source/WebCore/dom/Node.cpp (working copy) @@ -1309,6 +1309,15 @@ // HIERARCHY_REQUEST_ERR: Raised if this node is of a type that does not allow children of the type of the // newChild node, or if the node to append is one of this node's ancestors. + { + unsigned i = 0; + for (Node* node = newChild; node && i < 100; node = node->traverseNextNode(newChild)) + i++; + if (i < 100) + fprintf(stderr, "inserting child with %d nodes\n", i); + else + fprintf(stderr, "inserting child with 100+ nodes\n"); + } if (newChild == newParent || newParent->isDescendantOf(newChild)) { ec = HIERARCHY_REQUEST_ERR;

Ryosuke Niwa

Comment 2 2011-12-03 01:15:27 PST

Created attachment 117749 [details] perf test

Ryosuke Niwa

Comment 3 2011-12-03 01:28:59 PST

I took 11 samples on Apple's mac port applied on r101862. Without patch: shallow tree: 2ms deep tree: 4.09ms (std:0.3) deeper tree: 11.45ms (std:0.5) With patch: shallow tree: 2ms deep tree: 3ms deeper tree: 6.91ms (std 0.3) 40% improvement! The number, however, indicates that we have at least one more linear algorithm hidden somewhere.

Ryosuke Niwa

Comment 4 2011-12-03 01:35:21 PST

Created attachment 117751 [details] Patch

Erik Arvidsson

Comment 5 2011-12-03 12:28:41 PST

Thanks for bringing this up. I forgot we talked about this a while ago. I thought of another simple optimization we could do here. If we are comparing two trees that have different inDocument state, then we know that one cannot be a descendant of the other: if (inDocument() != other->inDocument()) return false; I'm curious how many of the cases you saw where a node is appended that has no children falls into the above category? My gut feeling is that the building of the disconnected tree falls into the no child case and that the attachment to the document would fall into the above case.

Ryosuke Niwa

Comment 6 2011-12-03 15:09:59 PST

(In reply to comment #5) > Thanks for bringing this up. I forgot we talked about this a while ago. > > I thought of another simple optimization we could do here. If we are comparing two trees that have different inDocument state, then we know that one cannot be a descendant of the other: > > if (inDocument() != other->inDocument()) > return false; > > I'm curious how many of the cases you saw where a node is appended that has no children falls into the above category? My gut feeling is that the building of the disconnected tree falls into the no child case and that the attachment to the document would fall into the above case. That's an excellent obervation obervstion. I'm going to collect a stat on this as well as the number of ancesors newParent has.

Darin Adler

Comment 7 2011-12-03 19:27:18 PST

That code counts the total number of descendants, not the number of children.

Ryosuke Niwa

Comment 8 2011-12-03 21:00:34 PST

(In reply to comment #7) > That code counts the total number of descendants, not the number of children. Oh, right. That's what I meant. New stat coming per Erik's comment. Will wait landing this patch until that.

Ryosuke Niwa

Comment 9 2011-12-03 23:19:09 PST

(In reply to comment #5) > I'm curious how many of the cases you saw where a node is appended that has no children falls into the above category? My gut feeling is that the building of the disconnected tree falls into the no child case and that the attachment to the document would fall into the above case. 59%. Sample size: 42,000 - 1 n = Number of inserted nodes 1 2 7 6 100+ 3 10+ 5 4 20+ 0.679 0.089 0.038 0.029 0.028 0.024 0.021 0.018 0.013 0.011 d = Number of ancestors of new parent 1 20+ 10+ 3 2 4 5 9 8 6 0.544 0.160 0.114 0.077 0.061 0.011 0.010 0.008 0.006 0.006 P(either but not both in document) = 0.625610133575 P(d=1 | n=1) ≈ 0.50007 P(parent not in document | n=1) ≈ 0.585 P(child not in document | n=1) ≈ 0.00681 P(either but not both in document | n=1) ≈ 0.589 P(either but not both in document) ≈ 0.626 P(n = 1, either but not both in document) ≈ 0.400 P(n = 1 | either but not both in document) ≈ 0.639

Ryosuke Niwa

Comment 10 2011-12-03 23:55:21 PST

My data suggests that we should do both optimizations.

Ryosuke Niwa

Comment 11 2011-12-04 00:03:54 PST

Created attachment 117784 [details] Also implement arv's optimization

Ryosuke Niwa

Comment 12 2011-12-04 00:48:16 PST

Resubmitting the patch for review since I've implemented Erik's optimization as well. Most significantly, P(n = 1 or either but not both in document) ≈ 0.904. So this patch will make 90% of cases really fast.

Ryosuke Niwa

Comment 13 2011-12-04 01:14:12 PST

Created attachment 117785 [details] new perf test Given that parent having more than 30 ancestors is very rare, I've updated my perf test to reflect this.

Erik Arvidsson

Comment 14 2011-12-04 11:12:03 PST

(In reply to comment #12) > Most significantly, P(n = 1 or either but not both in document) ≈ 0.904. So this patch will make 90% of cases really fast. 90%! Wow that is huge. I expect this to have big impact on Gmail. We should ask them for numbers to see how much this brings down their latency.

Ryosuke Niwa

Comment 15 2011-12-04 12:30:04 PST

Comment on attachment 117784 [details] Also implement arv's optimization Thanks for the reviews, Darin!

Ryosuke Niwa

Comment 16 2011-12-04 12:32:20 PST

(In reply to comment #14) > (In reply to comment #12) > > Most significantly, P(n = 1 or either but not both in document) ≈ 0.904. So this patch will make 90% of cases really fast. > > 90%! Wow that is huge. I expect this to have big impact on Gmail. We should ask them for numbers to see how much this brings down their latency. Hm... we still have a linear performance, and there are so many other APIs that are much slower so I'll be surprised this had any meaningful impact on Gmail but it'll great if this patch improved their latency :)

WebKit Review Bot

Comment 17 2011-12-04 13:53:13 PST

Comment on attachment 117784 [details] Also implement arv's optimization Clearing flags on attachment: 117784 Committed r101967: <http://trac.webkit.org/changeset/101967>

WebKit Review Bot

Comment 18 2011-12-04 13:53:18 PST

All reviewed patches have been landed. Closing bug.

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware Unspecified

OS Unspecified

Product WebKit

Component DOM

Assignee

Ryosuke Niwa

Reported

2011-12-02 23:32 PST

Modified

2011-12-16 23:50 PST History

CC List

8 users Show

URL

Keywords

Depends on

73802

Blocks

73853

Dependencies

tree graph