Bug 88834 - If Node X is reachable from JavaScript, all Nodes in the same tree should be kept alive
Summary: If Node X is reachable from JavaScript, all Nodes in the same tree should be ...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: DOM
Version: 528+ (Nightly build)
Hardware: Unspecified
OS: Unspecified
Priority: P2
Severity: Normal
Assignee: Kentaro Hara
Keywords: InRadar
Depends on: 92965 94324 98612
Blocks: 98658
Reported: 2012-06-11 20:34 PDT by Kentaro Hara
Modified: 2018-09-08 01:51 PDT
CC: 33 users



Attachments
Performance tests (2.05 KB, text/html) - 2012-06-11 20:36 PDT, Kentaro Hara
Patch (15.02 KB, patch) - 2012-06-11 20:39 PDT, Kentaro Hara
the first WIP patch to observe build bots (66.04 KB, patch) - 2012-06-20 23:28 PDT, Kentaro Hara
WIP patch2 (73.08 KB, patch) - 2012-06-22 07:03 PDT, Kentaro Hara
WIP patch3 (65.12 KB, patch) - 2012-06-25 17:43 PDT, Kentaro Hara
WIP patch4 (66.00 KB, patch) - 2012-06-25 23:47 PDT, Kentaro Hara
Patch (64.77 KB, patch) - 2012-06-27 05:37 PDT, Kentaro Hara
patch for 'rough' review (101.44 KB, patch) - 2012-07-31 05:09 PDT, Kentaro Hara
Simplified the algorithm. Fixed crashes. Fixed memory leaks. But the perf regression of Dromaeo/dom-modify is still unacceptable. I'm working on the optimization. (71.78 KB, patch) - 2012-08-27 21:40 PDT, Kentaro Hara
Fixed perf regression (76.15 KB, patch) - 2012-09-03 19:02 PDT, Kentaro Hara
rebased patch (76.10 KB, patch) - 2012-09-03 19:17 PDT, Kentaro Hara
overall comments are appreciated before serious review (75.81 KB, patch) - 2012-09-06 00:48 PDT, Kentaro Hara
Pathological test case 1 (1.74 KB, text/html) - 2012-09-24 22:46 PDT, Kentaro Hara
Pathological test case 2 (1.58 KB, text/html) - 2012-09-24 22:48 PDT, Kentaro Hara
Patch (76.31 KB, patch) - 2012-09-27 23:19 PDT, Kentaro Hara
Patch (23.48 KB, patch) - 2012-10-04 21:08 PDT, Geoffrey Garen (flag: webkit.review.bot: commit-queue-)
With V8 side fix (8.51 KB, patch) - 2012-10-05 00:20 PDT, Kentaro Hara
patch + Gavin's and Mark's suggested edits - V8 code (23.94 KB, patch) - 2012-10-05 15:35 PDT, Geoffrey Garen

Description Kentaro Hara 2012-06-11 20:34:02 PDT
The current reference-count algorithm of Node objects does not guarantee "If Node X has a ref-count, all the Nodes in the tree which X belongs to are kept alive".

Look at the following example:

  div = document.createElement("div");
  document.body.appendChild(div);
  div.innerHTML = '<div><div><div><span id="span"><br><br><br>text</span></div></div></div>';
  span = document.getElementById("span");
  div.innerHTML = "";
  alert(span);  // <span>
  alert(span.parentNode);  // ???
  alert(span.firstChild);  // ???

What is the expected behavior?

  (a) span.parentNode = <div>, span.firstChild = <br>
  (b) span.parentNode = null, span.firstChild = <br>
  (c) span.parentNode = <div>, span.firstChild = null
  (d) span.parentNode = null, span.firstChild = null
  (e) Any value is OK (i.e. the behavior is UNDEFINED)

Behavior of browsers:

  Safari 5.1.7: (b)
  Chrome 20.0: (b)
  Firefox 12.0: (a)
  Opera 11.64: (a)
  IE 9: (d)

We want to accomplish the behavior (a) without any performance regression and without any additional member variable in a Node object.
Comment 1 Kentaro Hara 2012-06-11 20:36:35 PDT
Created attachment 146998 [details]
Performance tests

This is a performance test that would be most affected by the overhead of the reference count algorithm.
Comment 2 Kentaro Hara 2012-06-11 20:39:08 PDT
Created attachment 146999 [details]
Patch
Comment 3 Kentaro Hara 2012-06-11 22:05:29 PDT
The idea I am implementing:

(1) No change in ref(), deref() and setParent().

(2) If the ref-count of a root node becomes 0, we call removedLastRef() for the root node.

(3) removedLastRef() calls a slightly modified version of ContainerNode::removeAllChildren().

(4) The modified version of ContainerNode::removeAllChildren() traverses child nodes, recording all the child nodes traversed.

(4-1) If all the child nodes in the DOM tree have 0 ref-count, we call destructors of all the recorded nodes (i.e. all the nodes in the DOM tree). That's it.

(4-2) If any child node X has a ref-count of 1 or more, we stop the traversal and set a "lifetime-owner-bit" on the node X. No destructor is called in this case. The "lifetime-owner-bit" indicates that the node X keeps alive the DOM tree that X belongs to.

(5) When the ref-count of the node X becomes 0, we call removedLastRef() for the root node of the node X. Goes to (3).

(6) If the node X is going to be removed from the DOM tree, we call removedLastRef() for the root node of the DOM tree. Goes to (3). This step is needed to delegate the "lifetime-owner-bit" to someone else (or to destroy all the nodes, if all the nodes in the DOM tree have a 0 ref-count).

The "lifetime-owner-bit" can be implemented by using the most significant bit of TreeShared::m_refCount. In terms of implementation, by initially setting "lifetime-owner-bit" on the root node, the implementation of TreeShared gets clearer (because the initial lifetime owner of the DOM tree is the root node).

I expect this approach to be better in the following respects:

- No overhead in ref(), deref() and setParent().

- Overhead happens only when a DOM tree holds a ref-count of 1 or more on a node other than the root node. In practice, this should be a rare case.


The patch is coming soon...
Comment 4 Geoffrey Garen 2012-06-11 22:40:04 PDT
(In reply to comment #3)
> (5) When the ref-count of the node X becomes 0, we call removedLastRef() for the root node of the node X. Goes to (3).

This algorithm is O(n^2) when the garbage collector frees a whole document, unless the garbage collector is lucky enough to deref the document node last. That's a show-stopper.

Also, "when the ref-count of the node X becomes 0" implies a change to deref(), which seems to contradict your stated goals.
Comment 5 Kentaro Hara 2012-06-11 23:07:28 PDT
Thanks ggaren.

(In reply to comment #4)
> (In reply to comment #3)
> > (5) When the ref-count of the node X becomes 0, we call removedLastRef() for the root node of the node X. Goes to (3).
> 
> This algorithm is O(n^2) when the garbage collector frees a whole document, unless the garbage collector is lucky enough to deref the document node last. That's a show-stopper.

To be precise, the O(n^2) situation can potentially happen, but it would be a pretty artificial one. The condition under which the worst case happens is

- not "unless the garbage collector is lucky enough to deref the document node last"

- but "unless the garbage collector is lucky enough to deref nodes in a similar order to the order in which removeAllChildren() delegates the lifetime-owner-bit".

In addition, what we should be concerned about is not JavaScript GC but DOM mutations caused inside the C++ DOM. When a JavaScript GC happens, the overhead of the GC itself would be much larger. (e.g. the current JSC GC looks up a root node for every node that has a strong reference.)

For example, the worst situation can happen in the following scenario:

DOM tree: document -- A -- B -- C -- D -- E

(1) Assume that there are strong references to document, C, D and E. Initially, the document has the lifetime-owner-bit.
(2) document = null. removedLastRef() is called on the document.
(3) The traversal happens. The traversal stops at C, and C takes over the lifetime-owner-bit.
(4) Nothing happens until C's ref-count becomes 0.
(5) If C's ref-count becomes 0, removedLastRef() is called for the document.
(6) The traversal stops at D, and D takes over the lifetime-owner-bit.
(7) Nothing happens until D's ref-count becomes 0.
(8) If D's ref-count becomes 0, removedLastRef() is called for the document.
(9) The traversal stops at E, and E takes over the lifetime-owner-bit.
(10) Nothing happens until E's ref-count becomes 0.
(11) If E's ref-count becomes 0, removedLastRef() is called for the document.
(12) Since the traversal completes, all the Nodes are destructed.

As you can see, if the ref-count becomes 0 in the order of E -> D -> C, then the steps (6) -- (11) are not needed. The worst case happens only when the traversal order matches the order in which the ref-count is lost.

> Also, "when the ref-count of the node X becomes 0" implies a change to deref(), which seems to contradict your stated goals.

Sorry, you're right.

One change is needed in deref():

  before: if (m_refCount <= 0 && !m_parent)
  after:  if (m_refCount == LifetimeOwnerBitMask)  // LifetimeOwnerBitMask = 0x80000000

Also one change is needed in setParent():

  before: m_parent = parent;
  after:  m_parent = parent;
          if (parent)
              m_refCount &= ~LifetimeOwnerBitMask;
          else
              m_refCount |= LifetimeOwnerBitMask;

(I hope I can upload the patch in a day...)
Comment 6 Kentaro Hara 2012-06-11 23:09:39 PDT
> - but "unless the garbage collector is lucky enough to deref nodes in a similar order to the order in which removeAllChildren() delegates the lifetime-owner-bit".

Correction: "if the garbage collector is unlucky enough to deref nodes in a similar order to the order in which removeAllChildren() delegates the lifetime-owner-bit"
Comment 7 Geoffrey Garen 2012-06-11 23:50:26 PDT
> The worst case happens only when the traversal order matches the order in which the ref-count is lost.

I'm not clear on why you think this case is artificial. It's common for nodes to be allocated in document order, and it's common for garbage collectors to run finalizers in allocation order. I would expect this case to be common.

Do you have some data that shows this case to be artificial?

> When JavaScript GC happened, the overhead of GC itself would be much larger. (e.g. the current JSC's GC looks up a root node for all the nodes that have strong references.)

I disagree. GC is O(n), not O(n^2).
Comment 8 Kentaro Hara 2012-06-11 23:58:30 PDT
(In reply to comment #7)
> > The worst case happens only when the traversal order matches the order in which the ref-count is lost.
> 
> I'm not clear on why you think this case is artificial. It's common for nodes to be allocated in document order, and it's common for garbage collectors to run finalizers in allocation order. I would expect this case to be common.
> 
> Do you have some data that shows this case to be artificial?
> 
> > When JavaScript GC happened, the overhead of GC itself would be much larger. (e.g. the current JSC's GC looks up a root node for all the nodes that have strong references.)
> 
> I disagree. GC is O(n), not O(n^2).

Ah, got it. You're right.

Anyway, the worst case happens only when the traversal order matches the order in which the ref-counts become 0. We might be able to prevent the problem by changing the traversal order, e.g. by traversing in reverse document order, or by using two traversal pointers to perturb the order; see the sketch below.
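
For example (a rough sketch; findNewLifetimeOwner() and the two helpers are hypothetical names): since nodes tend to be deref()ed in document order, searching in the opposite order makes it unlikely that the two orders resonate.

  // Visit the root first, then the rest of the tree in reverse document
  // order, returning the first node that still holds a ref-count.
  Node* findNewLifetimeOwner(Node* root)
  {
      if (root->refCount())
          return root;
      for (Node* node = lastDescendant(root); node != root; node = previousInDocumentOrder(node)) {
          if (node->refCount())
              return node;
      }
      return 0; // no owner found: the whole tree can be destroyed
  }

  static Node* lastDescendant(Node* node)
  {
      while (node->lastChild())
          node = node->lastChild();
      return node;
  }

  // Reverse pre-order: the previous sibling's deepest last descendant,
  // or else the parent.
  static Node* previousInDocumentOrder(Node* node)
  {
      if (Node* sibling = node->previousSibling())
          return lastDescendant(sibling);
      return node->parentNode();
  }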
Comment 9 Kentaro Hara 2012-06-14 08:28:14 PDT
Unfortunately I encountered a bug in ContainerNodeAlgorithms, which is blocking the implementation of the new reference-count algorithm I explained above. I am fixing the bug, just a moment please:)
Comment 10 Kentaro Hara 2012-06-20 23:28:42 PDT
Created attachment 148737 [details]
the first WIP patch to observe build bots
Comment 11 Build Bot 2012-06-20 23:56:18 PDT
Comment on attachment 148737 [details]
the first WIP patch to observe build bots

Attachment 148737 [details] did not pass win-ews (win):
Output: http://queues.webkit.org/results/13026042
Comment 12 Kentaro Hara 2012-06-22 07:03:40 PDT
Created attachment 149019 [details]
WIP patch2
Comment 13 Build Bot 2012-06-22 07:33:54 PDT
Comment on attachment 149019 [details]
WIP patch2

Attachment 149019 [details] did not pass mac-ews (mac):
Output: http://queues.webkit.org/results/13034409
Comment 14 Build Bot 2012-06-22 07:42:13 PDT
Comment on attachment 149019 [details]
WIP patch2

Attachment 149019 [details] did not pass win-ews (win):
Output: http://queues.webkit.org/results/13039358
Comment 15 Kentaro Hara 2012-06-25 17:43:36 PDT
Created attachment 149409 [details]
WIP patch3
Comment 16 Kentaro Hara 2012-06-25 23:47:36 PDT
Created attachment 149468 [details]
WIP patch4
Comment 17 Gustavo Noronha (kov) 2012-06-26 00:58:59 PDT
Comment on attachment 149468 [details]
WIP patch4

Attachment 149468 [details] did not pass gtk-ews (gtk):
Output: http://queues.webkit.org/results/13090426
Comment 18 Ojan Vafai 2012-06-26 11:40:03 PDT
Comment on attachment 149468 [details]
WIP patch4

View in context: https://bugs.webkit.org/attachment.cgi?id=149468&action=review

> Source/WebCore/platform/TreeShared.h:185
> +    uint32_t m_refCount;

Haven't had time to digest the whole patch. Just one nit here. Can you avoid all the manual bit-masking by using bitfields here?

unsigned m_isLifetimeOwner : 1;
unsigned m_refCount : 31;

Then you also don't need to add the refCount method.
Comment 19 Kentaro Hara 2012-06-27 05:37:16 PDT
Created attachment 149734 [details]
Patch
Comment 20 Kentaro Hara 2012-06-27 05:50:03 PDT
(In reply to comment #18)
> > Source/WebCore/platform/TreeShared.h:185
> > +    uint32_t m_refCount;
> 
> Just one nit here. Can you avoid all the manual bit-masking by using bitfields here?

 Done. Thanks.
Comment 22 Gustavo Noronha (kov) 2012-06-27 06:48:01 PDT
Comment on attachment 149734 [details]
Patch

Attachment 149734 [details] did not pass gtk-ews (gtk):
Output: http://queues.webkit.org/results/13101749
Comment 23 Geoffrey Garen 2012-06-27 09:18:50 PDT
Comment on attachment 149734 [details]
Patch

You haven't convinced me that O(n^2) is a good idea.
Comment 24 Kentaro Hara 2012-06-27 17:28:45 PDT
(In reply to comment #23)
> (From update of attachment 149734 [details])
> You haven't convinced me that O(n^2) is a good idea.

The document (page 19 -- page 22) explains that the O(n^2) case won't happen except in the very worst case (i.e. the case where searchNewLifetimeOwner()'s traversal order matches the deref() order), and that we would be able to avoid the worst case in common situations by traversing the tree in the following order:

- First, traverse the root node
- For the rest of nodes, traverse in the reverse DFS order

Isn't that explanation enough? Or do you not want to accept an algorithm that can hit the worst case at all?
Comment 25 Filip Pizlo 2012-06-27 17:57:21 PDT
(In reply to comment #24)
> (In reply to comment #23)
> > (From update of attachment 149734 [details] [details])
> > You haven't convinced me that O(n^2) is a good idea.
> 
> The document (page 19 -- page 22) explains that the O(n^2) case won't happen except for the very worst case (i.e. the case where the searchNewLifetimeOwner()'s traversal order matches the deref()'s traversal order), and that we would be able to avoid the worst case in common situations by traversing the tree in the following order:
> 
> - First, traverse the root node
> - For the rest of nodes, traverse in the reverse DFS order
> 
> Wouldn't the explanation be enough? Or you do not want to accept the algorithm that might cause the worst case?

This is a big change.  The proof burden to show that this doesn't break the universe is huge.  That burden is on you.

What I worry about is:

1) That you have not done sufficient performance tests.  I want to know how this affects all known benchmarks that operate on the DOM.  I want to know what this does to page load time.  You've shown me some data on Dromaeo but nothing about page load.

2) That there are cases where the DFS code induces O(n^2) behavior.  You could prove this by reporting metrics of the behavior of your code on major websites (gmail, gdocs, bing, facebook, amazon, twitter ... you know the ones).  You have not done this, and that worries me a lot.  I'd be scared to see this change land without that sort of due diligence.

3) That this is completely unnecessary.  You're solving a corner case that probably nobody will notice but us, and you're adding a lot of code to support this corner case.  I'm not convinced that any of this is necessary at all.
Comment 26 Kentaro Hara 2012-06-27 19:30:54 PDT
(In reply to comment #25)
> 1) That you have not done sufficient performance tests.  I want to know how this affects all known benchmarks that operate on the DOM.  I want to know what this does to page load time.  You've shown me some data on Dromaeo but nothing about page load.
> 
> 2) That there are cases where the DFS code induces O(n^2) behavior.  You could prove this by reporting metrics of the behavior of your code on major websites (gmail, gdocs, bing, facebook, amazon, twitter ... you know the ones).  You have not done this, and that worries me a lot.  I'd be scared to see this change land without that sort of due diligence.

OK, let me investigate them.

> 3) That this is completely unnecessary.  You're solving a corner case that probably nobody will notice but us, and you're adding a lot of code to support this corner case.  I'm not convinced that any of this is necessary at all.

At least this is mandatory for redesigning V8 GC.
Comment 27 Filip Pizlo 2012-06-27 19:33:56 PDT
(In reply to comment #26)
> (In reply to comment #25)
> > 1) That you have not done sufficient performance tests.  I want to know how this affects all known benchmarks that operate on the DOM.  I want to know what this does to page load time.  You've shown me some data on Dromaeo but nothing about page load.
> > 
> > 2) That there are cases where the DFS code induces O(n^2) behavior.  You could prove this by reporting metrics of the behavior of your code on major websites (gmail, gdocs, bing, facebook, amazon, twitter ... you know the ones).  You have not done this, and that worries me a lot.  I'd be scared to see this change land without that sort of due diligence.
> 
> OK, let me investigate them.
> 
> > 3) That this is completely unnecessary.  You're solving a corner case that probably nobody will notice but us, and you're adding a lot of code to support this corner case.  I'm not convinced that any of this is necessary at all.
> 
> At least this is mandatory for redesigning V8 GC.

JavaScriptCore's GC approach already demonstrates that this is not mandatory for GC.

Is V8's GC really that much worse than JSC's?
Comment 28 Kentaro Hara 2012-06-27 19:58:34 PDT
(In reply to comment #27)
> JavaScriptCore's GC approach already demonstrates that this is not mandatory for GC.
> 
> Is V8's GC really that much worse than JSC's?

That's a pretty long story and I am writing a document about it:) In a nutshell, the essential difference is that V8's GC is generational, and the new semantics is mandatory for reclaiming DOM objects in the minor GC.

Objective: Reclaim DOM objects in the minor GC. (Currently V8 GC reclaims DOM objects only in the major GC. All DOM objects are promoted to the old space.)

Rough explanation of the difficulty:

- There are two independent object graphs in the world; one is a JavaScript object graph, and the other is a DOM object graph.

- Regarding the JavaScript object graph, the minor GC can see only a subset of the graph; i.e. it can look up only the JavaScript objects in the new space. The minor GC does not know about JavaScript objects in the old space.

- Regarding the DOM object graph, the minor GC can see only a subset of the graph; i.e. it can traverse only the DOM objects that have wrapper objects in the new space. The minor GC does not know about DOM objects whose wrappers are in the old space, nor about DOM objects that have no wrappers at all.

- In summary, the minor GC must work on a subset of the JavaScript object graph and a subset of the DOM object graph. To judge the reachability of each DOM object correctly (conservatively) with this limited knowledge, the DOM side must provide the semantics "If Node X has a ref-count, all the Nodes in the tree which X belongs to are kept alive" (see the sketch below).
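
To illustrate (a rough sketch; NodeWrapper, impl(), refCountExcludingNewSpaceWrappers(), keepAlive() and rootOf() are all made-up names): with that guarantee, the minor GC never has to visit the old space or the full DOM graph, because liveness of any node reduces to one question about the root of its tree.

  // Minor GC: decide which new-space DOM wrappers must survive.
  void collectNewSpaceWrappers(const Vector<NodeWrapper*>& newSpaceWrappers)
  {
      HashSet<Node*> liveTreeRoots;
      // Pass 1: a tree is live if any of its nodes is referenced from
      // outside the dying wrappers (C++ RefPtrs, old-space wrappers, ...).
      for (size_t i = 0; i < newSpaceWrappers.size(); ++i) {
          Node* node = newSpaceWrappers[i]->impl();
          if (node->refCountExcludingNewSpaceWrappers())
              liveTreeRoots.add(rootOf(node));
      }
      // Pass 2: keep a wrapper iff its tree is live; everything else
      // is reclaimed without ever touching the old space.
      for (size_t i = 0; i < newSpaceWrappers.size(); ++i) {
          if (liveTreeRoots.contains(rootOf(newSpaceWrappers[i]->impl())))
              keepAlive(newSpaceWrappers[i]);
      }
  }

  static Node* rootOf(Node* node)
  {
      while (node->parentNode())
          node = node->parentNode();
      return node;
  }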
Comment 29 Filip Pizlo 2012-06-27 21:02:53 PDT
(In reply to comment #28)
> (In reply to comment #27)
> > JavaScriptCore's GC approach already demonstrates that this is not mandatory for GC.
> > 
> > Is V8's GC really that much worse than JSC's?
> 
> That's a pretty long story and I am writing a document about it:) In a nutshell, the essential difference is that V8 GC is generational, and the semantics is mandatory to reclaim DOM objects in the minor GC.

I think if you're going to change WebCore just to help V8's GC, you should justify it for more than just V8.  One good starting point is to share that document when you're ready.

> 
> Objective: Reclaim DOM objects in the minor GC. (Currently V8 GC reclaims DOM objects only in the major GC. All DOM objects are promoted to the old space.)

Is this objective necessary for performance or correctness, or both?

> 
> Rough explanation of the difficulty:
> 
> - There are two independent object graphs in the world; one is a JavaScript object graph, and the other is a DOM object graph.
> 
> - Regarding the JavaScript object graph, the minor GC can just look up a subset of the graph; i.e. the minor GC can just look up JavaScript objects in the new space. The minor GC does not know JavaScript objects in the old space.
> 
> - Regarding the DOM object graph, the minor GC can just look up a subset of the graph; i.e. the minor GC can just traverse DOM objects that have wrapper objects in the new space. The minor GC does not know DOM objects that have wrapper objects in the old space nor DOM objects that does not have wrapper objects.
> 
> - In summary, the minor GC must work on a subset of the JavaScript object graph and a subset of the DOM object graph. To judge the reachability to each DOM object correctly (conservatively) with the limited knowledge, the semantics "If Node X has a ref-count, all the Nodes in the tree which X belongs to are kept alive" in the DOM side is required.

I'm unconvinced.  I think you can build a GenGC that has cheap interaction with the DOM without having to go to all of this trouble.
Comment 30 Kentaro Hara 2012-06-27 21:41:06 PDT
(In reply to comment #29)
> I think if you're going to change WebCore just to help V8's GC, you should justify it for more than just V8. 

The benefit other than V8 is described in the "Motivation" section of the document I uploaded; i.e. improving predictability and fixing the FIXME in JSC's GC.

> One good starting point is to share that document when you're ready.

Will write up and share the document in a week or so.

> > Objective: Reclaim DOM objects in the minor GC. (Currently V8 GC reclaims DOM objects only in the major GC. All DOM objects are promoted to the old space.)
> 
> Is this objective necessary for performance or correctness, or both?

Performance only. Currently V8's GC performs much, much worse than JSC's GC (V8's GC can stall for 4 seconds on a single document.createElement()), because all DOM objects survive two copying GC cycles, are promoted to the old space, and are then reclaimed by the heavy major GC. We want to fix that behavior.
Comment 31 Filip Pizlo 2012-06-28 00:12:41 PDT
(In reply to comment #30)
> (In reply to comment #29)
> > I think if you're going to change WebCore just to help V8's GC, you should justify it for more than just V8. 
> 
> The benefit other than V8 is described in the "Motivation" section in the document I uploaded; i.e. improve predictability, fix the FIXME in JSC GC.

I'm not convinced that either of these benefits is enough to justify an O(n^2) worst-case.

> 
> > One good starting point is to share that document when you're ready.
> 
> Will write up and share the document in a week or so.

Cool, thanks!

> 
> > > Objective: Reclaim DOM objects in the minor GC. (Currently V8 GC reclaims DOM objects only in the major GC. All DOM objects are promoted to the old space.)
> > 
> > Is this objective necessary for performance or correctness, or both?
> 
> Performance only. Currently V8 GC performs much much worse than JSC GC (V8 GC can stop 4 seconds for one document.createElement()), because all DOM objects survive two copying GC cycles, are promoted to the old space, and then reclaimed by the heavy major GC. We want to fix the behavior.

Seems like a sensible goal.  But if it requires adding O(n^2) worst-case behavior to JSC as well, then I'm not sure that's so sensible.
Comment 32 Kentaro Hara 2012-06-28 00:54:37 PDT
(In reply to comment #31)
> But if it requires adding O(n^2) worst-case behavior to JSC as well, then I'm not sure that's so sensible.

- JSC's GC and V8's GC can explicitly avoid the worst case, if they want. If the GC orders its deref()s so that root nodes are deref()ed last, the worst case does not happen (rather, only the "best" case happens). The GC's deref() order is controllable. Unless the GC derefs nodes in the same order as searchNewLifetimeOwner()'s traversal, we can avoid the worst case.

- What we do need to worry about are DOM operations issued by JavaScript. If removeChild(), innerHTML = "", etc. are issued in the same order as searchNewLifetimeOwner()'s traversal, the worst case happens. DOM operations depend on each Web app, and there is no way to control their deref() order explicitly.
Comment 33 Filip Pizlo 2012-06-28 01:58:19 PDT
(In reply to comment #32)
> (In reply to comment #31)
> > But if it requires adding O(n^2) worst-case behavior to JSC as well, then I'm not sure that's so sensible.
> 
> - JSC GC and V8 GC can explicitly avoid the worst case, if they want. If GC cares the deref()'s order so that root nodes are deref()ed in the end, the worst case does not happen (rather, only the "best" case happens). deref()'s order of GC is controllable. Unless GC tries to deref nodes in the same order as searchNewLifetimeOwner()'s order, we can avoid the worst case.
> 
> - What we need to concern is DOM operations issued by JavaScript. If removeChild(), innerHTML = "", etc were issued in the same order as searchNewLifetimeOwner()'s order, the worst case happens. DOM operations depend on each Web app, and there is no way to control deref()'s order explicitly.

It's this latter point that I'm referring to.  I'm interested in the worst-case in particular.

I'm not sure it makes sense to make WebKit slower to make V8's GC faster.

Hence the high proof burden for you.  You need to really convince us that this is either not going to make things slower under any reasonable circumstance, or alternatively that the change in semantics is needed to make some website work.
Comment 34 Filip Pizlo 2012-06-28 02:41:29 PDT
(In reply to comment #32)
> (In reply to comment #31)
> > But if it requires adding O(n^2) worst-case behavior to JSC as well, then I'm not sure that's so sensible.
> 
> - JSC GC and V8 GC can explicitly avoid the worst case, if they want. If GC cares the deref()'s order so that root nodes are deref()ed in the end, the worst case does not happen (rather, only the "best" case happens). deref()'s order of GC is controllable. Unless GC tries to deref nodes in the same order as searchNewLifetimeOwner()'s order, we can avoid the worst case.

I thought about this more.  Your statement that the deref() order in the GC is controllable is almost false.  To control the deref() order, we'd have to perform all deref()'s all at once, and perform some sorting step to get the desired order.  We don't want to do any of this, since we want to invoke the deref()'s incrementally so as to reduce pause time and increase minimum mutator utilization.  This implies that the GC's deref() order is likely to be determined by some complex function of allocation order and the shape of the set of objects that survived not only the current collection, but all collections since system start.  You haven't proved that this function does not resonate with the worst-case of your algorithm.  Such a proof is hard, since reasoning about it analytically is beyond the state of the art (if we could reason about this function analytically, we could prove accurate bounds on fragmentation in non-copying GC - something we know we cannot do; the best known bounds [see http://comjnl.oxfordjournals.org/content/20/3/242.full.pdf] are horribly pathological).  Hence the closest you'll get to a proof is to do experiments.

This implies two things.  First, it raises the proof burden for your claim that the GC will *tend not* to hit the pathological case.  Put another way, it makes the current status quo appear much more attractive than your patch - with the status quo, we already know that we won't hit this pathology.  Second, it contravenes your claim that the GC can actively avoid the pathological case.

In other words, you really need to show evidence that we won't hit the worst case on real websites.  Small benchmarks like Dromaeo are unconvincing here, since they tend to just repeat the same small sequence of actions many times.  Interesting GC behavior tends to only happen in code that does not repeat the same action, but instead performs some chaotic sequence of seemingly unrelated actions.  Your only chance of studying this is on real websites.
Comment 35 Kentaro Hara 2012-06-28 04:32:22 PDT
Thanks for the detailed comment!

> In other words, you really need to show evidence that we won't hit the worst case on real websites. Your only chance of studying this is on real websites.

The point completely makes sense. Counting the number of searchNewLifetimeOwner() calls that fail to delete objects would give us real data on how the algorithm behaves in the real world. I'll investigate it after writing up the document about V8's GC.

> Your statement that the deref() order in the GC is controllable is almost false.  To control the deref() order, we'd have to perform all deref()'s all at once, and perform some sorting step to get the desired order.  We don't want to do any of this, since we want to invoke the deref()'s incrementally so as to reduce pause time and increase minimum mutator utilization.

I get your point, but I am not fully convinced by it. Let's think about it this way:

Case 1: Assume that the semantics is needed to make the real world work. In this case, my algorithm would make sense. Period.

Case 2: Assume that the semantics is not needed to make the real world work. The current TreeShared algorithm works predictably only when the referenced node is a root node, so this assumption implies that the real world always keeps a reference to a root node until the whole tree is dereferenced. Even JSC's GC takes care of this; i.e. it intentionally keeps the wrapper object of a root node alive in order to keep the whole tree alive (http://code.google.com/codesearch#OAMlx_jo-ck/src/third_party/WebKit/Source/WebCore/bindings/js/JSNodeCustom.cpp&exact_package=chromium&q=jsnodecustom&type=cs&l=92). In other words, the real world and the GC already work (and have to work) in a way that drops the reference to a root node at the very end, since that is the only way to make the behavior predictable. Even in an incremental GC, a root node should be dereferenced in the final incremental phase for its tree.

If this observation is true, it would not be difficult to make the GC dereference a root node at the very end: given that the root node is guaranteed to be in the final incremental phase for its tree, the GC just needs to skip root nodes at first and then dereference the skipped root nodes at the very end. No sorting is needed.

In summary, assuming that "the semantics is not needed to make the real world work", we can control the GC so that the root node is dereferenced at the very end; and if the root node is dereferenced at the very end, we always hit the best case (see the sketch below).
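
A rough sketch of that (the function and its Vector bookkeeping are hypothetical; a real incremental GC would do this per increment):

  // Deref non-root nodes first and the skipped roots at the very end,
  // so each dying tree is traversed exactly once (the best case).
  void derefDeadNodes(const Vector<Node*>& deadNodes)
  {
      Vector<Node*> skippedRoots;
      for (size_t i = 0; i < deadNodes.size(); ++i) {
          if (!deadNodes[i]->parentNode())
              skippedRoots.append(deadNodes[i]); // a root: postpone
          else
              deadNodes[i]->deref();
      }
      for (size_t i = 0; i < skippedRoots.size(); ++i)
          skippedRoots[i]->deref(); // may now destroy its whole tree
  }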
Comment 36 Geoffrey Garen 2012-06-28 11:00:45 PDT
Accommodating poor design in V8's garbage collector is a non-goal for the WebKit project. The WebKit project has a garbage collector that works very well with the DOM. To justify a huge and risky change to core WebKit code, you need to explain a benefit to the WebKit project, and explain why the benefit outweighs the cost.

A few further notes, based on the discussion here:

(1) I've described an algorithm that isn't O(n^2). Discussing the nuances of this O(n^2) algorithm is interesting but irrelevant until someone proves that there is no viable non-O(n^2) algorithm.

(2) Even if absolute worst-case behavior is rare, an occasional regression in average-case behavior (for example, causing "only" five or six extra traversals of some or all of the DOM during every document destruction) is a huge regression in core WebKit behavior.

(3) Since this patch retains "guardRef" behavior, it doesn't fully match the lifetime semantics of the DOM specification. I'm not even sure I'd be willing to add an O(n^2) algorithm to the core DOM just to match a corner case of the specification. I'm certainly not willing to do so without fully matching the specification.
Comment 37 Kentaro Hara 2012-06-29 06:36:10 PDT
Thanks ggaren.

(In reply to comment #36)
> Accommodating poor design in V8's garbage collector is a non-goal for the WebKit project. The WebKit project has a garbage collector that works very well with the DOM. To justify a huge and risky change to core WebKit code, you need to explain a benefit to the WebKit project, and explain why the benefit outweighs the cost.

The benefit is the predictability, as explained in the document. As far as I understand the webkit-dev@ thread, darin, maciej and rniwa think that the proposed semantics is better than the current unpredictable semantics, as long as it does not regress performance at all.

So I guess the problem with my approach is that it is still unclear whether it regresses performance, rather than that the benefit is unclear. For example, if I could invent an always-O(n) algorithm, I hope people would accept it.

As pizlo pointed out, I need to investigate real-world cases. In parallel, I am searching for an alternative always-O(n) algorithm.

> (1) I've described an algorithm that isn't O(n^2). Discussing the nuances of this O(n^2) algorithm is interesting but irrelevant until someone proves that there is no viable non-O(n^2) algorithm.

I'm trying to optimize your algorithm, but let me ask a couple of questions.

You pointed out the following three issues in my first implementation:

(1) adds a call to fprintf inside ref() and deref(), probably making them ineligible for inlining

Will fix.

(2) does a test against "!m_parent" in rootDeref, which can be removed

Will fix.

(3) adds 2 data members to every node, increasing malloc cost and memory traffic

Do you have any idea how to avoid adding the 2 data members, i.e. a pointer to the root (m_root) and a reference count for the whole tree (m_rootRefCount)? First, it would be possible to remove m_root, but then we would need to look up the root every time (O(log n) on average) or store m_root in rare data (which requires a hashmap lookup). Second, how can we remove m_rootRefCount? It would be possible to store m_rootRefCount in rare data, but then every ref()/deref() would have to do a hashmap lookup to reach the rare data.

> (2) Even if absolute worst-case behavior is rare, an occasional regression in average-case behavior (for example, causing "only" five or six extra traversals of some or all of the DOM during every document destruction) is a huge regression in core WebKit behavior.

Makes sense.

> (3) Since this patch retains "guardRef" behavior, it doesn't fully match the lifetime semantics of the DOM specification. I'm not even sure I'd be willing to add an O(n^2) algorithm to the core DOM just to match a corner case of the specification. I'm certainly not willing to do so without fully matching the specification.

Would you tell me where the spec is? (I want to read it, but couldn't find it...) Does the spec require that "if a document is referenced by ownerDocument() from any node, all the nodes under the document must be kept alive"? Anyway, I think the ownerDocument() problem is orthogonal to our current problem.
Comment 38 Alexey Proskuryakov 2012-06-29 08:06:19 PDT
There are other ways to achieve predictability besides keeping the whole tree alive, and we do implement these in some similar cases. What I mean is that parent pointers could be nulled out on detaching immediately, without waiting for GC. Then GC randomness would not be exposed to client code.
Comment 39 Kentaro Hara 2012-06-29 09:58:53 PDT
Thanks ap!

(In reply to comment #38)
> There are other ways to achieve predictability besides keeping the whole tree alive, and we do implement these in some similar cases. What I mean is that parent pointers could be nulled out on detaching immediately, without waiting for GC. Then GC randomness would not be exposed to client code.

GC is not the only source of randomness. Please look at this section (https://docs.google.com/a/google.com/document/d/1uYHpq7u5Sslj54UgzXjA7pYR53XjidpBcrCa-neOGQs/edit?pli=1#heading=h.jb5c0dei7hff). The first example is due to GC randomness, the second to randomness in the document parser, and the third to randomness in the editing undo buffer.

"predictability" might not be a good word. Even if behavior were predictable, it would make no sense to programmers if the behavior is far from understandable. "predictability and understandability" would be a requirement. I think all of these details (i.e. the lifetime semantics of DOM nodes) are speced somewhere. (I hope that ggaren would know better.)
Comment 40 Kentaro Hara 2012-07-02 06:09:32 PDT
pizlo, ggaren: I revised the lifetime owner algorithm.

Document: https://docs.google.com/a/chromium.org/document/d/17ubK4aobm4G3c2FADqEoXAwSWbCIipXwzrGFJY_atoQ/edit

Two important observations explained in the document:

- We had believed that the current TreeShared algorithm is always O(N). That belief was wrong. The TreeShared algorithm is O(N) in most cases but O(N^2) in the worst case.

- The revised algorithm is _always_ O(N). In other words, the revised algorithm not only accomplishes the predictable lifetime semantics but also _always_ performs better than the current TreeShared algorithm. I hope:)

Comments are welcome.
Comment 41 Kentaro Hara 2012-07-10 00:03:27 PDT
(In reply to comment #31)
> > > One good starting point is to share that document when you're ready.
> > 
> > Will write up and share the document in a week or so.
> 
> Cool, thanks!

Wrote up a document: https://docs.google.com/a/google.com/document/d/1Ki8WrGIHipIjzz5rl3SG7SN-ulH9JgbcQAOA4Ja-hXs/edit

The document explains why the new lifetime semantics is needed for reclaiming DOM objects in a generational GC. The algorithm proposed in the document is generally applicable to generational GCs, not limited to V8's GC.

Let me reiterate the benefits of the new lifetime semantics:

(1) Performance. The current TreeShared algorithm is O(N) in most cases but O(N^2) in the worst case. On the other hand, the lifetime owner algorithm is always O(N). In other words, the lifetime owner algorithm not only accomplishes the new lifetime semantics but also performs better than the current TreeShared algorithm.

(2) Predictability for JavaScript programmers.

(3) More efficient V8's GC.
Comment 42 Kentaro Hara 2012-07-31 05:09:14 PDT
Created attachment 155499 [details]
patch for 'rough' review
Comment 43 Kentaro Hara 2012-07-31 05:25:14 PDT
ggaren, fpizlo: I uploaded a patch that implements the revised lifetime owner algorithm. Benefits and performance results are described in the ChangeLog. I would be grateful if you could tell me your high-level views on the approach. (You don't need to look into implementation details:-)

The patch does a lot of things in one go to make the algorithm work correctly. If the approach looks acceptable overall, I'd like to break the patch down and land sub-patches one by one, starting from the non-controversial ones.
Comment 44 Gustavo Noronha (kov) 2012-07-31 08:54:04 PDT
Comment on attachment 155499 [details]
patch for 'rough' review

Attachment 155499 [details] did not pass gtk-ews (gtk):
Output: http://queues.webkit.org/results/13403174
Comment 45 Hajime Morrita 2012-07-31 18:39:02 PDT
Comment on attachment 155499 [details]
patch for 'rough' review

You could land the RefPtr elimination part separately.
Comment 46 Geoffrey Garen 2012-08-02 15:25:03 PDT
Comment on attachment 155499 [details]
patch for 'rough' review

View in context: https://bugs.webkit.org/attachment.cgi?id=155499&action=review

> Source/WebCore/ChangeLog:68
> +        I confirmed that the performance regression of Dromaeo/dom-modify is
> +        the overhead of checkIfLifetimeOwner(node) in ChildNodeRemovalNotifier::notify().
> +        As far as I experimented, it is difficult to kill the 4.67% overhead.
> +        However, I found that we can dramatically optimize ChildNodeInsertionNotifier and
> +        ChildNodeRemovalNotifier by rewriting them in an iterative manner,
> +        which will completely offset the 4.67% overhead.

Looks like the first step here is to get your optimization for ChildNodeInsertionNotifier and ChildNodeRemovalNotifier reviewed and landed, and then re-baseline these performance measurements.
Comment 47 Kentaro Hara 2012-08-02 17:38:18 PDT
(In reply to comment #46)
> Looks like the first step here is to get your optimization for ChildNodeInsertionNotifier and ChildNodeRemovalNotifier reviewed and landed, and then re-baseline these performance measurements.

Thanks ggaren! Dromaeo/dom-modify is going to be optimized by 8.2% in bug 92965. (And I have another idea for further optimization.)
Comment 48 Kentaro Hara 2012-08-27 21:40:55 PDT
Created attachment 160895 [details]
Simplified the algorithm. Fixed crashes. Fixed memory leaks. But the perf regression of Dromaeo/dom-modify is still unacceptable. I'm working on the optimization.
Comment 49 Kentaro Hara 2012-09-03 19:02:21 PDT
Created attachment 161956 [details]
Fixed perf regression
Comment 50 Kentaro Hara 2012-09-03 19:17:28 PDT
Created attachment 161957 [details]
rebased patch
Comment 51 Kentaro Hara 2012-09-03 19:19:33 PDT
ggaren, pizlo: I've updated the patch. Now I think I was able to fix crashes, memory leaks, and perf regression. I had to change the design from the original version because of a bunch of problems:) I documented the final design in this document: https://docs.google.com/a/chromium.org/document/d/1HcaeU-75xukMYSUaskeDoHL-L0FNuCDIukvPmOheY8Q/edit

- I would appreciate your feedback for overall design and implementation.

- If you could review the patch, I'm super happy. But if you don't want to take a detailed look, I can ask other folks to review the details, after you say the overall approach looks OK.

Thanks
Comment 52 Kentaro Hara 2012-09-05 22:50:39 PDT
Just FYI, here is a result of real-world evaluation of the new V8's DOM GC:

https://docs.google.com/a/google.com/document/d/1uNyNSoNedyxsT9d_qqZXMW7ISLpycV7O4PrzSO56yWM/edit#

(If you are busy, you can just take a look at Fig.2 and Table.1.)
Comment 53 Adam Barth 2012-09-06 00:23:56 PDT
Comment on attachment 161957 [details]
rebased patch

@Kentaro: Is this ready for a serious review?  If so, I can spend some time looking at it tomorrow.
Comment 54 Kentaro Hara 2012-09-06 00:27:03 PDT
(In reply to comment #53)
> (From update of attachment 161957 [details])
> @Kentaro: Is this ready for a serious review?  If so, I can spend some time looking at it tomorrow.

Thanks abarth! Not yet. I think I need to hear opinions of pizlo or ggaren about the overall approach first.
Comment 55 Kentaro Hara 2012-09-06 00:48:21 PDT
Created attachment 162440 [details]
overall comments are appreciated before serious review
Comment 56 Filip Pizlo 2012-09-06 00:55:19 PDT
Comment on attachment 162440 [details]
overall comments are appreciated before serious review

View in context: https://bugs.webkit.org/attachment.cgi?id=162440&action=review

> Source/WebCore/ChangeLog:65
> +          (Safari/Mac)
> +          Dromaeo/dom-attr        5769 runs/s => 5656 runs/s   (-1.95%)
> +          Dromaeo/dom-modify      4935 runs/s => 4770 runs/s   (-3.34%)
> +          Dromaeo/dom-query     948941 runs/s => 954218 runs/s (+0.55%)
> +          Dromaeo/dom-traverse    3870 runs/s => 3810 runs/s   (-1.03%)
> +
> +        The -3.34% regression in dom-modify (Safari/Mac) is not acceptable,
> +        but I have already optimized dom-modify by 8.2% in another patch (bug 92965)
> +        to offset the performance regression.

I think we should eliminate this 3.3% regression.  That's great that you optimized it by 8% in another patch, but that's not a good reason to regress the performance.

Have you performed the following investigation:

1) What is the statistical significance of the regression?  Have you run it multiple (at least 10) times and computed the standard deviation or even better, confidence intervals?

2) Do you know where the regression is coming from?

3) What is the performance of the rest of Dromaeo, and of Dromaeo as a whole?

> Source/WebCore/dom/ContainerNodeAlgorithms.h:66
> +    void notify(Node*, bool);

I would strongly prefer an enum instead of a bool.  This bool is hard to follow!  I particularly prefer verbose enums in these cases:

enum InsertionPointDeletionState {
    InsertionPointDeletionHasBegun,
    InsertionPointDeletionHasNotBegun
};

Or similar.

> Source/WebCore/platform/TreeShared.h:198
> +    unsigned m_nonWrapperRefCount : 16; // Reference count by normal RefPtrs in WebKit.
> +    unsigned m_lifetimeOwner : 1; // Whether this Node is a lifetime owner or not.
> +    unsigned m_wrapperRefCount : 15; // Reference count by wrapper objects.

I don't feel comfortable with such small ref counts. 32768 referrers feels small. Do you have functionality to guard against overflow?
Comment 57 Filip Pizlo 2012-09-06 01:12:14 PDT
A couple of general comments:

1) Can you articulate what the general advantages of this approach are, outside of V8, and other than the semantic changes?  I don't think you're going to convince us that this patch is a good idea if it only benefits V8.  You're also not likely to get far with the semantic argument.  So.  Does this broadly help generational GC, or other GC barriers, for example?  Does it fix a known bug on a known website?  If this change cannot help WebKit, then it's hard to swallow.

2) You should fix the 3% regression in dom-modify.  Do you have ideas of where it is coming from?

3) Dromaeo is a really bad benchmark, and it's sad that it's the best one we have for DOM manipulation.  Can you look at the performance in JSC with this change for programs that create a lot of DOM things?  Octane's benchmarks that use typed arrays might be interesting.  You should test it.  But even that won't really answer what will happen on real websites.  So another way of approaching this is: can you construct a pathological, synthetic benchmark that makes your change look as bad as possible for either JSC or V8?  The point is not to use this benchmark as an argument against your patch, but rather, to give all of us a sense of what the worst-case would be.  You could also then produce a similar test that shows a pathological, synthetic speed-up from your change on JSC (if that is possible).  If we know what the worst and best cases are, then that could give us further confidence that your change is safe.  If constructing a worst case benchmark was impossible then that would make us feel quite uneasy about this.  There's nothing worse than tracking down horrid performance on a complex website, and it would be particularly sad if this change was the reason.  On the other hand, if we knew that even in the worst case this patch regresses things, by, say, 10%, then at least we know exactly what to expect.  Even better if the worst-case regression is in the single digits!

4) You should run the following other benchmarks as a sanity check: SunSpider, Kraken, RIABench (http://www.timo-ernst.net/misc/riabench-start/).  I will list more benchmarks if I think of any.  Although those benchmarks don't necessarily stress the DOM a lot, the way they are run does rely on the GC doing exactly the right things with the DOM.  For example if in SunSpider the GC didn't blow away the iframe contents from the previously run benchmark then you'd get disastrous slow-downs.  It's useful to run these tests to ensure that this does not happen.

Other concerns are listed in my quick review (comment 56).  I'm particularly concerned with the small number of bits given to the reference counts.  If wrap-around can be used to cause premature freeing to happen, and that wrap-around only requires 2^16 objects, then that's not really a sound design, unless I'm missing something.
Comment 58 Kentaro Hara 2012-09-06 01:12:48 PDT
Thanks pizlo!

(In reply to comment #56)
> I think we should eliminate this 3.3% regression.  That's great that you optimized it by 8% in another patch, but that's not a good reason to regress the performance.

OK, I'll work out some more optimizations. However, they will land in separate patches before this patch, because removing the 3.3% regression within this patch is difficult for the reason described below.
 
> Have you performed the following investigation:
> 
> 1) What is the statistical significance of the regression?  Have you run it multiple (at least 10) times and computed the standard deviation or even better, confidence intervals?

The results are the average of 10 runs. Will investigate standard deviation etc.

> 2) Do you know where the regression is coming from?

The 3.3% regression is due to a lot of if (!m_treeScopeDepth) checks in ChildNodeInsertionNotifier::notify() and ChildNodeRemovalNotifier::notify(). The extra branches slow down performance, and they are difficult to remove because they are critical to the algorithm.

On the other hand, I don't know the root cause of the ~1% regressions in dom-attr and dom-traverse. Will investigate.

> 3) What is the performance of the rest of Dromaeo, and of Dromaeo as a whole?

No observable regression. Other benchmarks in Dromaeo do not touch DOM.


> > Source/WebCore/dom/ContainerNodeAlgorithms.h:66
> > +    void notify(Node*, bool);
> 
> I would strongly prefer an enum instead of a bool.  This bool is hard to follow!  I particularly prefer verbose enums in these cases:
> 
> enum InsertionPointDeletionState {
>     InsertionPointDeletionHasBegun,
>     InsertionPointDeletionHasNotBegun
> }
> 
> Or similar.

Will fix.

> > Source/WebCore/platform/TreeShared.h:198
> > +    unsigned m_nonWrapperRefCount : 16; // Reference count by normal RefPtrs in WebKit.
> > +    unsigned m_lifetimeOwner : 1; // Whether this Node is a lifetime owner or not.
> > +    unsigned m_wrapperRefCount : 15; // Reference count by wrapper objects.
> 
> I don't feel comfortable with such small ref counts. 32768 referrers feels small.

The rationale for the counts is explained in this section (https://docs.google.com/a/chromium.org/document/d/1HcaeU-75xukMYSUaskeDoHL-L0FNuCDIukvPmOheY8Q/edit#heading=h.cdgghoqthpgu).

> Do you have functionality to guard against overflow?

ASSERT() should be inserted, you're right. But it is difficult to insert an if(...) check because the if branch regresses performance.
Comment 59 Adam Barth 2012-09-06 01:21:58 PDT
> > > Source/WebCore/platform/TreeShared.h:198
> > > +    unsigned m_nonWrapperRefCount : 16; // Reference count by normal RefPtrs in WebKit.
> > > +    unsigned m_lifetimeOwner : 1; // Whether this Node is a lifetime owner or not.
> > > +    unsigned m_wrapperRefCount : 15; // Reference count by wrapper objects.
> > 
> > I don't feel comfortable with such small ref counts. 32768 referrers feels small.
> 
> The rationale for the counts is explained in this section (https://docs.google.com/a/chromium.org/document/d/1HcaeU-75xukMYSUaskeDoHL-L0FNuCDIukvPmOheY8Q/edit#heading=h.cdgghoqthpgu).

The doc claims there isn't a programmatic way to spam the RefCount, but why can't a web page just create 60k TreeWalker objects that hold a reference to the node via TreeWalker::m_current?
Comment 60 Adam Barth 2012-09-06 01:26:05 PDT
var walkers = [];
for (var i=0; i < 60000; ++i)
    walkers.push(document.createTreeWalker(document.body));

I haven't looked in the debugger, but isn't the ref count on document.body now around 60k?
Comment 61 Kentaro Hara 2012-09-06 01:52:19 PDT
(In reply to comment #60)
> var walkers = [];
> for (var i=0; i < 60000; ++i)
>     walkers.push(document.createTreeWalker(document.body));
> 
> I haven't looked in the debugger, but isn't the ref count on document.body now around 60k?

I confirmed that it created 120k refcounts! So, that's a problem.

A possible solution would be to fall back to a 32-bit refcount allocated in NodeRareData when the 16-bit refcount overflows (which would be a super rare case). To check whether the overflow has happened, one if branch is needed in ref(). A sketch is below.
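
Something like this (a rough sketch; ensureRareData()->incrementOverflowRefCount() is a made-up API, and UNLIKELY() is WTF's branch hint):

  static const unsigned maxInlineRefCount = (1u << 16) - 1;

  void ref()
  {
      if (UNLIKELY(m_nonWrapperRefCount == maxInlineRefCount)) {
          // Super rare: spill into a full 32-bit counter in NodeRareData;
          // the inline counter stays saturated meanwhile.
          ensureRareData()->incrementOverflowRefCount();
          return;
      }
      ++m_nonWrapperRefCount;
  }
  // deref() mirrors this: it drains the rare-data counter before it
  // starts decrementing the inline one again.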
Comment 62 Kentaro Hara 2012-09-06 02:09:09 PDT
pizlo: Thank you very much for comments.

> 2) You should fix the 3% regression in dom-modify.
> 3) Dromaeo is a really bad benchmark, you should create a worst-case benchmark.
> 5) I'm particularly concerned with the small number of bits given to the reference counts.

As a next step, I will focus on fixing the above three issues.


> 4) You should run the following other benchmarks as a sanity check: SunSpider, Kraken, RIABench

Will do. But I think (I hope) that there is no observable regression because they do not touch DOM.


> 1) Can you articulate what the general advantages of this approach are, outside of V8, and other than the semantic changes?  I don't think you're going to convince us that this patch is a good idea if it only benefits V8.  You're also not likely to get far with the semantic argument.  So.  Does this broadly help generational GC, or other GC barriers, for example?  Does it fix a known bug on a known website?  If this change cannot help WebKit, then it's hard to swallow.

- A big performance & memory improvement in a generational GC for DOM objects (https://docs.google.com/a/google.com/document/d/1uNyNSoNedyxsT9d_qqZXMW7ISLpycV7O4PrzSO56yWM/edit). The GC algorithm I've been proposing is broadly applicable to generational GCs, not limited to V8's GC. If JSC adopted a generational GC, JSC would also benefit.

- There is a known bug about the semantics (http://code.google.com/p/chromium/issues/detail?id=143478&q=label%3AWebKit-ID-88834&colspec=ID%20Pri%20Mstone%20ReleaseBlock%20OS%20Area%20Feature%20Status%20Owner%20Summary)

- That being said, I know that the above two items won't convince you. If you (and other port contributors) do not want this change, it might be possible to enable the semantics only in V8 by guarding the code with #if USE(V8) or by moving most of it into a V8-specific directory.


Actually, IMHO, if we could implement the semantics without any perf regression, the new semantics would be better than the current incomprehensible one. So I think the biggest problem is how to fix all of the perf regression.
Comment 63 Geoffrey Garen 2012-09-06 11:20:59 PDT
I broadly agree with the direction this review is taking. 

One comment about performance testing: We need some benchmark that shows average case and worse case cost of

    (a) full document tear-down

    (b) small fragment tear-down

The micro-benchmarks you're measuring cover (b) but not (a).

Also, please report benchmark results for JSC. We can't use V8 results as a baseline because, as your data shows, the V8 implementation of DOM GC is slow, so it sets a too-low bar for performance.

> - If you (and other port contributors) do not wish this change, it might be possible to enable the semantics only in V8 by surrounding #if USE(V8) or moving most of the code to V8 specific directory.

No. We simply can't maintain two #ifdef'd implementations of an algorithm that is both complex and central to the engine. That would seriously undermine our hackability, performance, security, and compatibility goals.
Comment 64 Geoffrey Garen 2012-09-06 11:26:25 PDT
> I confirmed that it created 120k refcounts! So, that's a problem.

Marking r- based on this.

(The other comments here may or may not be reasons for r-, depending on the outcome of some investigations.)
Comment 65 Kentaro Hara 2012-09-24 22:46:03 PDT
Created attachment 165523 [details]
Pathological test case 1

I'll upload the performance results in a day (covering a bunch of benchmarks in both JSC and V8).
Comment 66 Kentaro Hara 2012-09-24 22:48:02 PDT
Created attachment 165524 [details]
Pathological test case 2

I'll upload the performance results in a day (covering a bunch of benchmarks in both JSC and V8).
Comment 67 Kentaro Hara 2012-09-27 23:19:53 PDT
Created attachment 166147 [details]
Patch
Comment 68 Kentaro Hara 2012-09-27 23:21:27 PDT
fpizlo, ggaren: Uploaded a revised patch. Investigated a bunch of benchmarks. Clarified a couple of items pointed out by reviewers.

Benchmark results: https://docs.google.com/a/chromium.org/spreadsheet/ccc?key=0AlobCOyvTnPKdDJiZXh5X0tWeVZXMnQ0ZF9YUHZGdXc#gid=0

Document for more details: https://docs.google.com/a/chromium.org/document/d/1cGPSSWNENS891GuhvYTU7pKsncWUwztoVhGNe0YVlRk/edit

The summary of the document:

  - For Dromaeo, Octane, Kraken and SunSpider, no performance regression.
  - For many tests in Dromaeo jQuery and Dromaeo Prototype.js, a 10 ~ 30% performance improvement in V8.
  - For pathological test cases, ~3.1% performance regression in JSC and ~5.8% performance regression in V8.

The document also explains the following points:

  - How to manage a wrapper reference count and a normal reference count using only one 32-bit integer
  - The real cause of the ~3.3% performance regression we had observed
  - Pathological test cases that will be most affected by this change

===========

If you have more benchmark suggestions, I’m happy to investigate. I am also seeking ideas for simplifying the implementation of TreeShared.h. The current TreeShared.h is a bit difficult to read because it is written to make compilers generate fully optimized assembly (see “Topic 2” of the document for more details).

Thanks!
Comment 69 Maciej Stachowiak 2012-10-02 00:09:32 PDT
The benchmark data suggests this doesn't hurt most DOM benchmarks. Kudos for that. However...

(In reply to comment #68)
> fpizlo, ggaren: Uploaded a revised patch. Investigated a bunch of benchmarks. Clarified a couple of items pointed out by reviewers.
> 
> Benchmark results: https://docs.google.com/a/chromium.org/spreadsheet/ccc?key=0AlobCOyvTnPKdDJiZXh5X0tWeVZXMnQ0ZF9YUHZGdXc#gid=0
> 

Looks like the benchmark data doesn't include a page load speed benchmark or results of testing on popular sites, as requested by Filip in comment #25


> Document for more details: https://docs.google.com/a/chromium.org/document/d/1cGPSSWNENS891GuhvYTU7pKsncWUwztoVhGNe0YVlRk/edit

According to this document, this approach can cause memory leaks in some cases. That seems like a very poor characteristic for a reference counting algorithm. Any explanation for why it's ok to leak sometimes?
Comment 70 Kentaro Hara 2012-10-02 00:30:26 PDT
Thank you very much, maciej.

(In reply to comment #69)
> > Benchmark results: https://docs.google.com/a/chromium.org/spreadsheet/ccc?key=0AlobCOyvTnPKdDJiZXh5X0tWeVZXMnQ0ZF9YUHZGdXc#gid=0
> > 
> 
> Looks like the benchmark data doesn't include a page load speed benchmark or results of testing on popular sites, as requested by Filip in comment #25

Do you have any idea for testing that in a reliable way?

To obtain reliable results, I guess that we need to test it using some micro benchmarks. In that sense, pathological micro benchmarks for page loading would be helpful. The pathological test case (https://bugs.webkit.org/attachment.cgi?id=165523) would be one good test case. Another test case would be innerHTML="a_big_DOM_tree", which is tested in dom-modify.

If you have more benchmark suggestions, I'd be happy to test them.

> > Document for more details: https://docs.google.com/a/chromium.org/document/d/1cGPSSWNENS891GuhvYTU7pKsncWUwztoVhGNe0YVlRk/edit
> 
> According to this document, this approach can cause memory leaks in some cases. That seems like a very poor characteristic for a reference counting algorithm. Any explanation for why it's ok to leak sometimes?

To be clear: The algorithm leaks memory only in a case where there are 65535 or more (direct or indirect) normal references from "any Node in the tree to which Node X belongs" to "Node X". I guess that there is no call path that causes such a situation. I believe that, at the very least, such a situation won't happen in practice. If it did happen (and I don't think it will), the result would just be a memory leak.

(If there is any algorithm that is guaranteed not to leak memory, that would be definitely better.)
Comment 71 Adam Barth 2012-10-02 09:49:28 PDT
> Do you have any idea for testing that in a reliable way?

We run page load tests (PLT) continuously on trunk.  The easiest way to get those numbers is to land the patch and watch the graphs.  Getting reliable numbers prior to landing is possible as well, but trickier.

Kentaro, if you find me on chat I can explain how to do it.  It requires access to non-public test data, which you should have.
Comment 72 Darin Adler 2012-10-02 10:07:05 PDT
(In reply to comment #70)
> The algorithm leaks memory only in a case where there are 65535 or more (direct or indirect) normal references from "any Node in the tree to which Node X belongs" to "Node X". I guess that there is no call path that causes such a situation.

I want to understand what this means. Did you try to think of pathological cases that could lead to this situation? Can we just create a test that sets 100,000 properties on a DOM node all pointing to its parent node?
Comment 73 Geoffrey Garen 2012-10-02 10:51:08 PDT
> The algorithm leaks memory only in a case where there are 65535 or more (direct or indirect) normal references from "any Node in the tree to which Node X belongs" to "Node X".

Like Darin, I'd like to understand this statement more. However, if the plain meaning here is true, this is obviously a show-stopper, and cause for an r-.
Comment 74 Maciej Stachowiak 2012-10-02 11:23:58 PDT
(In reply to comment #71)
> > Do you have any idea for testing that in a reliable way?
> 
> We run page load tests (PLT) continuously on trunk.  The easiest way to get those numbers is to land the patch and watch the graphs.  Getting reliable numbers prior to landing is possible as well, but trickier.
> 
> Kentaro, if you find me on chat I can explain how to do it.  It requires access to non-public test data, which you should have.

Given the performance risk of this change, and the specific request from Filip, it seems like testing before landing is the right thing to do.

(In reply to comment #70)
> 
> To be clear: The algorithm leaks memory only in a case where there are 65535 or more (direct or indirect) normal references from "any Node in the tree to which Node X belongs" to "Node X". I guess that there is no call path that causes such a situation. I believe that, at the very least, such a situation won't happen in practice. If it did happen (and I don't think it will), the result would just be a memory leak.

Wouldn't something like Adam's test case from comment #60 trigger this leak? Or a node with more than 65535, which is allowed by the DOM, and could happen in case of a table with a huge number of rows?

> (If there is any algorithm that is guaranteed not to leak memory, that would be definitely better.)

The current normal refcounting algorithm doesn't have this issue. So this seems like a showstopper to me.
Comment 75 Maciej Stachowiak 2012-10-02 17:59:31 PDT
(In reply to comment #74)
> 
> Wouldn't something like Adam's test case from comment #60 trigger this leak? Or a node with more than 65535, which is allowed by the DOM, and could happen in case of a table with a huge number of rows?
> 

I meant to say, node with more than 65535 *child nodes*.
Comment 76 Kentaro Hara 2012-10-02 18:09:47 PDT
darin:
> > The algorithm leaks memory only in a case where there are 65535 or more (direct or indirect) normal references from "any Node in the tree to which Node X belongs" to "Node X". I guess that there is no call path that causes such a situation.
> 
> I want to understand what this means. Did you try to think of pathological cases that could lead to this situation? Can we just create a test that sets 100,000 properties on a DOM node all pointing to its parent node?

maciej:
> Wouldn't something like Adam's test case from comment #60 trigger this leak?

These two cases don't cause the memory leak. First of all, the memory leak does not happen unless there is a "refcount synonym". In other words, the memory leak happens only when we misinterpret the refcount value. Specifically, the memory leak can happen in a case where:

- the refcount is 65538.
- the actual situation is "0 wrapper refcount + 65538 normal refcounts".
- but we interpret it as "1 wrapper refcount + 2 normal refcounts".

In the test cases you suggested, the objects that hold the 65535 refcounts are wrapper objects. Thus, the test cases do not cause the misinterpretation. The misinterpretation can happen only when there are 65535 or more (direct or indirect) normal references from "any Node in the tree to which Node X belongs" to "Node X" _without_passing_any_wrapper_object_.
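
To illustrate, here is a minimal sketch of such a packed counter, assuming the low 16 bits hold the normal refcount and the bits above them hold the wrapper refcount (the layout is assumed from this discussion, not copied from the patch):

    #include <cassert>
    #include <cstdint>

    // Hypothetical packed counter: normal refs in the low 16 bits,
    // wrapper refs in the bits above them.
    const uint32_t normalBits = 16;

    uint32_t wrapperRefs(uint32_t packed) { return packed >> normalBits; }
    uint32_t normalRefs(uint32_t packed) { return packed & ((1u << normalBits) - 1); }

    int main()
    {
        // 65538 normal increments carry into the wrapper field, so
        // "0 wrapper refcounts + 65538 normal refcounts" reads back as
        // "1 wrapper refcount + 2 normal refcounts": the synonym that leaks.
        uint32_t packed = 65538; // 0x10002
        assert(wrapperRefs(packed) == 1);
        assert(normalRefs(packed) == 2);
        return 0;
    }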

maciej:
> Or a node with more than 65535, which is allowed by the DOM, and could happen in case of a table with a huge number of rows?

Given that tree edges are not refcounted, this won't cause the memory leak.


maciej:
> > (If there is any algorithm that is guaranteed not to leak memory, that would be definitely better.)
> 
> The current normal refcounting algorithm doesn't have this issue.

But the current normal refcounting algorithm causes the wrong DOM lifetime semantics, which is the problem I want to fix. We want a refcounting algorithm that can track a wrapper refcount and a normal refcount separately using only one 32-bit integer :)


(In reply to comment #74)
> (In reply to comment #71)
> > > Do you have any idea for testing that in a reliable way?
> > 
> > We run page load tests (PLT) continuously on trunk.  The easiest way to get those numbers is to land the patch and watch the graphs.  Getting reliable numbers prior to landing is possible as well, but trickier.
> > 
> > Kentaro, if you find me on chat I can explain how to do it.  It requires access to non-public test data, which you should have.
> 
> Given the performance risk of this change, and the specific request from Filip, it seems like testing before landing is the right thing to do.

Sure! Let me do the page load tests after we reach a consensus on the refcounting problem.
Comment 77 Maciej Stachowiak 2012-10-02 20:53:06 PDT
(In reply to comment #76)
> darin:
> > > The algorithm leaks memory only in a case where there are 65535 or more (direct or indirect) normal references from "any Node in the tree to which Node X belongs" to "Node X". I guess that there is no call path that causes such a situation.
> > 
> > I want to understand what this means. Did you try to think of pathological cases that could lead to this situation? Can we just create a test that sets 100,000 properties on a DOM node all pointing to its parent node?
> 
> maciej:
> > Wouldn't something like Adam's test case from comment #60 trigger this leak?
> 
> These two cases don't cause the memory leak. First of all, the memory leak does not happen unless there is a "refcount synonym". In other words, the memory leak happens only when we misinterpret the refcount value. Specifically, the memory leak can happen in a case where:
> 
> - the refcount is 65538.
> - the actual situation is "0 wrapper refcount + 65538 normal refcounts".
> - but we interpret it as "1 wrapper refcount + 2 normal refcounts".
> 
> In the test cases you suggested, the objects that hold the 65535 refcounts are wrapper objects. Thus, the test cases do not cause the misinterpretation. The misinterpretation can happen only when there are 65535 or more (direct or indirect) normal references from "any Node in the tree to which Node X belongs" to "Node X" _without_passing_any_wrapper_object_.

That does not sound right to me. In Adam's test case, the non-wrapper TreeWalker holds a reference to the Node. TreeWalkers can have their own separate lifetimes and are not part of a DOM tree, so they each need to separately ref their node.

Further, your document says the wrapper refcount is only for separate wrappers in different isolated worlds. That is why you say it is OK for it to be only 15 bits. If any wrapper affected the wrapper refcount instead of the normal refcount, not just wrappers for the same object in different isolated worlds, then you would trivially be able to overflow the 15-bit wrapper refcount. If Adam's test case used the wrapper refcount instead of the normal refcount, it would itself overflow 15 bits. And I don't see how it could maintain correctness without using either refcount.

> 
> maciej:
> > Or a node with more than 65535, which is allowed by the DOM, and could happen in case of a table with a huge number of rows?
> 
> Given that tree edges are not refcounted, this won't cause the memory leak.

What does create non-wrapper references, then, if it's neither wrappers nor tree edges?


> 
> 
> maciej:
> > > (If there is any algorithm that is guaranteed not to leak memory, that would be definitely better.)
> > 
> > The current normal refcounting algorithm doesn't have this issue.
> 
> But the current normal refcounting algorithm causes the wrong DOM lifetime semantics, which is the problem I want to fix. We want a refcounting algorithm that can track a wrapper refcount and a normal refcount separately using only one 32-bit integer :)

While it would be nice to fix that, a memory leak is worse than the problem you are trying to fix.

> > 
> > Given the performance risk of this change, and the specific request from Filip, it seems like testing before landing is the right thing to do.
> 
> Sure! Let me do the page load tests after we reach a consensus on the refcounting problem.

I don't think you are going to get consensus that leaking memory is acceptable unless it's provably the case that the leak is impossible, and it doesn't sound to me like it is.
Comment 78 Adam Barth 2012-10-02 23:42:29 PDT
othermaciej and I had an extensive discussion of this bug on #webkit.  Here's an attempt at a summary:

The key thing to understand is that there are four kinds of references to DOM nodes:

1) Edges of the DOM tree itself
2) References from JS wrappers in various isolated worlds
3) References within the DOM tree (i.e., from one node to another in C++)
4) References from outside the DOM tree

The patch can run into trouble in the following cases:

A) There are more than 2^15 references of type (2).  This can occur only when there are that many isolated worlds.  othermaciej and I agree that this is a reasonable API limit, but we should probably enforce that limit.

B) There are more than 2^15 references of type (3) that point to a single node.  If this occurs, we will leak the tree for much the same reason we leak the tree today if a node holds a reference to one of its ancestors.  Specifically, we'll think that there is a wrapper pointing to the popular node and therefore retain the tree.  However, retaining the tree will cause us to maintain the 2^15 references to the node.

The patch does not run into trouble if, as discussed above, there are >2^15 references of type (4).  The behavior is different than before the patch, but the difference in behavior is more correct and does not cause a memory leak.  Specifically, if there are >2^15 references of type (4) to a single node, we'll retain the tree until there are fewer than 2^15 references to that node.

Given the foregoing, othermaciej's remaining concerns are about case (B).  Specifically, he's concerned about whether (B) can occur.  If (B) can occur with realistic content, he would view that as a huge showstopper concern.  In addition, he would view any leak that occurs with this patch but not without as a major concern.  Leaks that occur without the patch but do not occur with the patch are progressions, but he's troubled by trading one set of leaks for another.

We looked around a bit to see if we could find any examples of case (B) by inspection.  Instead, we found a leak that can occur today due to an obscure violation of the "don't ref your ancestor" rule.  (I filed that issue as Bug 98234.)

My view differs a bit from othermaciej's in this regard.  In my view, if the leaks don't occur in practice but instead only occur in security-exploit-style situations, then we should fix these obscure leaks whether they're caused by (B) or by violations of "don't ref your parent."  However, for both sorts of leaks, it's likely more important to fix crash bugs that are hit by users in the wild.
Comment 79 Maciej Stachowiak 2012-10-03 01:49:41 PDT
(In reply to comment #78)

> 
> We looked around a bit to see if we could find any examples of case (B) by inspection.  Instead, we found a leak that can occur today due to an obscure violation of the "don't ref your ancestor" rule.  (I filed that issue as Bug 98234.)

I don't know if that's totally accurate. I spent maybe 5 minutes looking for examples of case (B), and I think what I found would likely turn out to be such an example on closer examination (in addition to having a potential leak with the current code base under different circumstances). I'm not going to do a full audit myself, because I believe it's up to the advocates of the patch to show that it is safe from the identified leak hazard.

> 
> My view differs a bit from othermaciej's in this regard.  In my view, if the leaks don't occur in practice but instead only occur in security-exploit-style situations, then we should fix these obscure leaks whether they're caused by (B) or by violations of "don't ref your parent."  However, for both sorts of leaks, it's likely more important to fix crash bugs that are hit by users in the wild.

We need to consider the change in programming discipline as well as the replacement of one set of bugs with another.

Is there a rule for correct use of the proposed new refcounting model that is as easy to understand as "don't ref your ancestors", or alternatively mechanically enforceable at compile time? The best I can think of is "don't ref any node in a DOM subtree 2^16 or more times from any combination of other nodes in that subtree", and I would have no idea how to try to follow that rule, or consider it in patch review. Perhaps there is a better rule that is easier to follow consistently than "don't ref your ancestors", but I do not know what it is.
Comment 80 Adam Barth 2012-10-03 02:16:44 PDT
Comment on attachment 166147 [details]
Patch

I'd like to thank everyone, especially Maciej, for taking the time to understand this patch and for providing constructive feedback.  I'm marking this patch r- because it looks like we'll need to iterate on the approach some more.
Comment 81 Geoffrey Garen 2012-10-04 21:08:18 PDT
Created attachment 167240 [details]
Patch
Comment 82 Geoffrey Garen 2012-10-04 21:11:57 PDT
Maciej asked me if there was a way to fix this in WebKit, and I came up with this.

I think this is a better solution than rewriting reference counting in the DOM, because it is simple and efficient, and it doesn't introduce security bugs or leaks.
Comment 83 Kentaro Hara 2012-10-04 21:17:38 PDT
Comment on attachment 167240 [details]
Patch

Would you add the same logic to the V8 port? (Or please skip the test in Chromium. I'll add the same logic later.)

I'd like to r+ it if there is no objection. Although this solution is not helpful for reclaiming DOM objects in minor GC cycles, we should fix the current wrong semantics anyway.
Comment 84 Maciej Stachowiak 2012-10-04 21:24:11 PDT
(In reply to comment #82)
> Maciej asked me if there was a way to fix this in WebKit, and I came up with this.
> 
> I think this is a better solution than rewriting referenced counting in the DOM because this is simple and efficient, and it doesn't introduce security bugs or leaks.

Did you performance test to ensure this doesn't cause a regression, and if so what tests did you run?
Comment 85 Geoffrey Garen 2012-10-04 21:43:39 PDT
> Did you performance test to ensure this doesn't cause a regression, and if so what tests did you run?

I was concerned that this edge case wouldn't show up in normal performance testing, so I ran these special-case tests:

(*) A stress test of removing subtrees from the DOM via .innerHTML:

		for (var i = 0; i < 5000; ++i) {
			p.innerHTML = "";
			p.appendChild(div);
		}

    => No regression. (Cost dominated by innerHTML.)

(*) A stress test of removing subtrees from the DOM via DOM APIs:

		for (var i = 0; i < 5000; ++i) {
			p.removeChild(div);
			p.appendChild(div);
		}

    => No regression. (More efficient than innerHTML. Cost still dominated by DOM tree maintenance.)

(*) A stress test of cycling subtrees through the DOM to make sure they didn't leak:

                Same as above, but new nodes each time instead of recycled nodes.

    => No leak.

(*) Kentaro's pathological test cases, attached to this bug

    => No regression.

(*) Visited facebook.com, google.com, gmail.com with a breakpoint on the slow path of willCreatePossiblyOrphanedTreeByRemoval()

    => Called ~200 times (compared to ~400,000 total JS allocations).
Comment 86 Maciej Stachowiak 2012-10-04 22:10:06 PDT
That testing sounds great. I think it would also be worthwhile to verify that the edge case does not in fact show up in normal perf testing. Does that seem reasonable, or is the risk of regressing something like Dromaeo or a page load speed benchmark too low to worry about it?

Comment 87 Geoffrey Garen 2012-10-04 23:01:55 PDT
> That testing sounds great. I think it would also be worthwhile to verify that the edge case does not in fact show up in normal perf testing. Does that seem reasonable, or is the risk of regressing something like Dromaeo or a page load speed benchmark too low to worry about it?

I ran the Dromaeo DOM core tests:

    NO PATCH: 1496.95runs/s
    PATCH: 1502.92runs/s

I don't think page load speed is a useful benchmark for this patch because navigation doesn't trigger this function. (I double-checked this on the URLs I mentioned.) Only direct manipulation via DOM APIs triggers this function.
Comment 88 Geoffrey Garen 2012-10-04 23:05:28 PDT
> Would you add the same logic to the V8 port? (Or please skip the test in Chromium. I'll add the same logic later.)

Honestly, TestExpectations has gotten so complicated that I don't feel comfortable adding to it.

If you can supply an equivalent toV8 line for the "toJS(...)" line @ JSNodeCustom.h:92, I'd be happy to land it with this patch.
Comment 89 WebKit Review Bot 2012-10-04 23:11:22 PDT
Comment on attachment 167240 [details]
Patch

Attachment 167240 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/14171581

New failing tests:
fast/dom/gc-5.html
fast/dom/gc-12.html
fast/dom/gc-dom-tree-lifetime.html
fast/dom/gc-3.html
Comment 90 Kentaro Hara 2012-10-04 23:19:41 PDT
(In reply to comment #88)
> > Would you add the same logic to the V8 port? (Or please skip the test in Chromium. I'll add the same logic later.)
> 
> Honestly, TestExpectations has gotten so complicated that I don't feel comfortable adding to it.
> 
> If you can supply an equivalent toV8 line for the "toJS(...)" line @ JSNodeCustom.h:92, I'd be happy to land it with this patch.

OK, I'll do it in a few hours.
Comment 91 Mark Rowe (bdash) 2012-10-04 23:37:46 PDT
Comment on attachment 167240 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=167240&action=review

> Source/WebCore/dom/ContainerNode.cpp:39
> +#if USE(JSC)
> +#include "JSNode.h"
> +#endif

Can you please move this outside of the sorted list of #include statements? #if'd includes are generally put in a separate paragraph from the normal #includes.
Comment 92 Gavin Barraclough 2012-10-04 23:39:01 PDT
Comment on attachment 167240 [details]
Patch

r+ to this patch as is for JSC, pending adding Kentaro's changes to support V8.
Looking at the test cases, I see test coverage (in gc-12) where a wrapper is created on a node in a disconnected subtree, ensuring that the parent remains alive. I think we discussed at least one more possible route to this problem, where a singleton disconnected node could be given children and become a root. Is this a real problem, is it possible to write a test case, and is it covered by one of these test cases?
Comment 93 WebKit Review Bot 2012-10-05 00:02:33 PDT
Comment on attachment 167240 [details]
Patch

Attachment 167240 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/14171604

New failing tests:
fast/dom/gc-5.html
fast/dom/gc-12.html
fast/dom/gc-dom-tree-lifetime.html
fast/dom/gc-3.html
Comment 94 Kentaro Hara 2012-10-05 00:20:50 PDT
Created attachment 167267 [details]
With V8 side fix

As V8 does not implement [CustomHeader], we cannot add willCreatePossiblyOrphanedTreeByRemoval() to V8NodeCustom.h. So I added it to V8DOMWrapper.cpp.

I'm investigating performance impacts on Chromium.
Comment 95 Kentaro Hara 2012-10-05 00:48:07 PDT
(In reply to comment #87)
> I ran the Dromaeo DOM core tests:
> 
>     NO PATCH: 1496.95runs/s
>     PATCH: 1502.92runs/s

Is this a total score of Dromaeo core tests? The total score is strongly affected by dom-query, and thus is not very meaningful on its own. Would you check the performance of each Dromaeo DOM test in Safari again, just in case?


I observed 6.5% regression in Dromaeo/dom-modify in Chromium.

[no patch]
dom-modify= 4182.68393722 runs/s
dom-modify= 4258.18237669 runs/s
dom-modify= 4279.1751288 runs/s
dom-modify= 4270.84756712 runs/s
dom-modify= 4285.91877381 runs/s

[with patch]
dom-modify= 3988.47523316 runs/s
dom-modify= 4022.2093293 runs/s
dom-modify= 4024.93821216 runs/s
dom-modify= 3970.77244907 runs/s
dom-modify= 3970.77244907 runs/s

Even though I inlined willCreatePossiblyOrphanedTreeByRemoval(), I couldn't kill the perf regression.
Comment 96 Geoffrey Garen 2012-10-05 09:18:26 PDT
> a singleton disconnected node could be given children and become a root

I'll add a test case for this to gc-12.html.

I realized after our conversation that this case does not require special treatment in code. It is not possible to modify a singleton node without that node already having a JavaScript reference (i.e., the reference you are using in the modifying expression).

A general way to say this is that the DOM only has API for implicitly creating an unreferenced parent due to removal (.innerHTML, editing, etc.); there's no API for implicitly creating an unreferenced parent due to insertion, akin to .ancestorHTML. (An older version of this patch included willCreatePossiblyOrphanedTreeByInsertion, to match willCreatePossiblyOrphanedTreeByRemoval. When I tried to write a test for it, I realized there's no such thing.)
Comment 97 Geoffrey Garen 2012-10-05 14:18:11 PDT
> Can you please move this outside of the sorted list of #include statements?

OK.
Comment 98 Geoffrey Garen 2012-10-05 15:25:27 PDT
> Is this a total score of Dromaeo core tests?

Yes.

> I observed 6.5% regression in Dromaeo/dom-modify in Chromium.

I re-ran dom-modify @ WebKit r130453, using dromaeo.com and PerformanceTests/Dromaeo/dom-modify.html.

dromaeo.com:

NO PATCH 431.34runs/s ±2.05%
PATCH 440.11runs/s ±2.57%

NOPATCH avg 3842.9124664019946 runs/s ±77
PATCH avg 3974.452005804083 runs/s ±74

Neither result seems to show a statistically significant change.
Comment 99 Geoffrey Garen 2012-10-05 15:33:35 PDT
Kentaro, would you like me to land the V8 code or leave it out?
Comment 100 Geoffrey Garen 2012-10-05 15:35:14 PDT
Created attachment 167397 [details]
patch + Gavin's and Mark's suggested edits - V8 code
Comment 101 Kentaro Hara 2012-10-05 17:34:29 PDT
(In reply to comment #99)
> Kentaro, would you like me to land the V8 code or leave it out?

I'd like to investigate the perf regression observed in the V8 side change. I'll do it next week. So please land the JSC side patch first.
Comment 102 Geoffrey Garen 2012-10-06 11:27:39 PDT
Committed r130584: <http://trac.webkit.org/changeset/130584>
Comment 103 Darin Adler 2012-10-06 11:34:48 PDT
Comment on attachment 167397 [details]
patch + Gavin's and Mark's suggested edits - V8 code

View in context: https://bugs.webkit.org/attachment.cgi?id=167397&action=review

> Source/WebCore/bindings/js/JSNodeCustom.h:76
> +// the C++ DOM, we ensure that the root of every tree has a JavaScript
> +// wrapper.

I would have moved that orphan word to the previous line.

> Source/WebCore/bindings/js/JSNodeCustom.h:86
> +    if (!root->isContainerNode())
> +        return;
> +
> +    if (!toContainerNode(root)->hasChildNodes())
> +        return;

You could combine both of these and just say:

    if (!root->hasChildNodes())
        return;

I believe you’d generate essentially identical code by doing that. Is there some advantage to the longer form?
Comment 104 Geoffrey Garen 2012-10-06 11:48:26 PDT
> I would have moved that orphan word to the previous line.

OK.

> You could combine both of these and just say:
> 
>     if (!root->hasChildNodes())
>         return;

Aha! I'll do this -- I didn't notice that Node had this interface.
Comment 105 Geoffrey Garen 2012-10-06 11:50:51 PDT
Committed r130587: <http://trac.webkit.org/changeset/130587>
Comment 106 Geoffrey Garen 2012-10-07 15:57:12 PDT
ASSERT fix.

Committed r130611: <http://trac.webkit.org/changeset/130611>
Comment 107 Kentaro Hara 2012-10-08 17:52:22 PDT
Hmmm, I observe a 6% perf regression in Dromaeo/dom-modify with the V8 side fix. I'll land a couple of patches that optimize DOM mutations to offset the perf regression, before landing the V8 side fix.
Comment 108 Kentaro Hara 2012-10-08 23:37:55 PDT
The correct DOM lifetime semantics are now implemented in JSC. The semantics will also be implemented in V8 soon, once I solve a couple of perf problems.

On the other hand, the original objective of this bug was to fix V8's GC so that it can collect DOM objects in minor GC cycles. If you're interested in the V8 GC issue, please follow bug 98725. In bug 98725, we are trying to fix V8's GC with another algorithm that affects only V8 and the V8 bindings, with little change to WebCore/dom.
Comment 109 Kentaro Hara 2012-12-22 03:25:43 PST
Re-opening this bug since it is not yet fixed in V8. Performance regressions have been preventing me from landing the fix to V8.
Comment 110 Ryosuke Niwa 2018-09-08 01:50:03 PDT
This has been fixed in JSC, and the V8 bindings are no longer a thing in WebKit.
Comment 111 Radar WebKit Bug Importer 2018-09-08 01:51:26 PDT
<rdar://problem/44254672>