88834 – If Node X is reachable from JavaScript, all Nodes in the same tree should be kept alive

Kentaro Hara

Reported 2012-06-11 20:34:02 PDT

The current reference-count algorithm of Node objects does not guarantee "If Node X has a ref-count, all the Nodes in the tree which X belongs to are kept alive". Look at the following example: div = document.createElement("div"); document.body.appendChild(div); div.innerHTML = '<div><div><div><span id="span"><br><br><br>text</span></div></div></div>'; span = document.getElementById("span"); div.innerHTML = ""; alert(span); // <span> alert(span.parentNode); // ??? alert(span.firstChild); // ??? What is the expected behavior?: (a) span.parentNode = <div>, span.firstChild = <br> (b) span.parentNode = null, span.firstChild = <br> (c) span.parentNode = <div>, span.firstChild = null (d) span.parentNode = null, span.firstChild = null (e) Any value is OK (i.e. the behavior is UNDEFINED) Behavior of browsers: Safari 5.1.7: (b) Chrome 20.0: (b) Firefox 12.0: (a) Opera 11.64: (a) IE 9: (d) We want to accomplish the behavior (a) without any performance regression and without any additional member variable in a Node object.

Kentaro Hara

Comment 1 2012-06-11 20:36:35 PDT

Created attachment 146998 [details] Performance tests This is a performance test that would be most affected by the overhead of the reference count algorithm.

Kentaro Hara

Comment 2 2012-06-11 20:39:08 PDT

Created attachment 146999 [details] Patch

Kentaro Hara

Comment 3 2012-06-11 22:05:29 PDT

The idea I am implementing: (1) No change in ref(), deref() and setParent(). (2) If the ref-count of a root node becomes 0, we call removedLastRef() for the root node. (3) removedLastRef() calls a slightly modified version of ContainerNode::removeAllChildren(). (4) The modified version of ContainerNode::removeAllChildren() traverses child nodes, recording all the child nodes traversed. (4-1) If all the child nodes in the DOM tree have 0 ref-count, we call destructors of all the recorded nodes (i.e. all the nodes in the DOM tree). That's it. (4-2) If any child node X has 1~ ref-count, we stop the traversal and set a "lifetime-owner-bit" on the node X. No destructor is called in this case. The "lifetime-owner-bit" indicates that the node X keeps the lifetime of the DOM tree which X belongs to. (5) When the ref-count of the node X becomes 0, we call removedLastRef() for the root node of the node X. Goes to (3). (6) If the node X is going to be removed from the DOM tree, we call removedLastRef() for the root node of the DOM tree. Goes to (3). This step is needed to delegate the "lifetime-owner-bit" to someone (or to destruct all the nodes if all the nodes in the DOM tree have 0 ref-count). The "lifetime-owner-bit" can be implemented by using the most significant bit of TreeShared::m_refCount. In terms of implementation, by initially setting "lifetime-owner-bit" on the root node, the implementation of TreeShared gets clearer (because the initial lifetime owner of the DOM tree is the root node). I guess that this approach would be better in the following points: - No overhead in ref(), deref() and setParent(). - Overhead happens when there is a DOM tree that has 1~ ref-count on other node than the root node. But this would be practically a rare case. The patch is coming soon...

Geoffrey Garen

Comment 4 2012-06-11 22:40:04 PDT

(In reply to comment #3) > (5) When the ref-count of the node X becomes 0, we call removedLastRef() for the root node of the node X. Goes to (3). This algorithm is O(n^2) when the garbage collector frees a whole document, unless the garbage collector is lucky enough to deref the document node last. That's a show-stopper. Also, "when the ref-count of the node X becomes 0" implies a change to deref(), which seems to contradict your stated goals.

Kentaro Hara

Comment 5 2012-06-11 23:07:28 PDT

Thanks ggaren. (In reply to comment #4) > (In reply to comment #3) > > (5) When the ref-count of the node X becomes 0, we call removedLastRef() for the root node of the node X. Goes to (3). > > This algorithm is O(n^2) when the garbage collector frees a whole document, unless the garbage collector is lucky enough to deref the document node last. That's a show-stopper. Accurately, O(n^2) situation can potentially happen, but it would be pretty much artificial situation. The condition in which the worst cases happen is - not "unless the garbage collector is lucky enough to deref the document node last" - but "unless the garbage collector is lucky enough to deref nodes in a similar order to the order in which removeAllChildren() delegates the lifetime-owner-bit". In addition, what we should concern would be not JavaScript GC but DOM mutations caused inside C++ DOM. When JavaScript GC happened, the overhead of GC itself would be much larger. (e.g. the current JSC's GC looks up a root node for all the nodes that have strong references.) For example, the worst situation can happen in the following scenario: DOM tree: document -- A -- B -- C -- D -- E (1) Assume that there are strong references to document, C, D and E. Initially, the document has the lifetime-owner-bit. (2) document = null. removedLastRef() is called on the document. (3) The traversal happens. The traversal stops at C, and C takes over the lifetime-owner-bit. (4) Nothing happens until C's ref-count becomes 0. (5) If C's ref-count becomes 0, removedLastRef() is called for the document. (6) The traversal stops at D, and D takes over the lifetime-owner-bit. (7) Nothing happens until D's ref-count becomes 0. (8) If D's ref-count becomes 0, removedLastRef() is called for the document. (9) The traversal stops at E, and E takes over the lifetime-owner-bit. (10) Nothing happens until E's ref-count becomes 0. (11) If E's ref-count becomes 0, removedLastRef() is called for the document. (12) Since the traversal completes, all the Nodes are destructed. As you can see, if the ref-count becomes 0 in the order of E -> D -> C, then the steps (6) -- (11) are not needed. The worst case happens only when the traversal order matches the order in which the ref-count is lost. > Also, "when the ref-count of the node X becomes 0" implies a change to deref(), which seems to contradict your stated goals. Sorry, you're right. One change is needed in deref(): before: if (m_refCount <= 0 && !m_parent) after: if (m_refCount == LifetimeOwnerBitMask) // LifetimeOwnerBitMask = 0x8000000 Also one change is needed in setParent(): before: m_parent = parent; after: m_parent = parent; if (parent) m_refCount &= ~LifetimeOwnerBitMask; else m_refcount |= LifetimeOwnerBitMask. (I hope I can upload the patch in a day...)

Kentaro Hara

Comment 6 2012-06-11 23:09:39 PDT

> - but "unless the garbage collector is lucky enough to deref nodes in a similar order to the order in which removeAllChildren() delegates the lifetime-owner-bit". Correction: "if the garbage collector is unlucky enough to deref nodes in a similar order to the order in which removeAllChildren() delegates the lifetime-owner-bit"

Geoffrey Garen

Comment 7 2012-06-11 23:50:26 PDT

> The worst case happens only when the traversal order matches the order in which the ref-count is lost. I'm not clear on why you think this case is artificial. It's common for nodes to be allocated in document order, and it's common for garbage collectors to run finalizers in allocation order. I would expect this case to be common. Do you have some data that shows this case to be artificial? > When JavaScript GC happened, the overhead of GC itself would be much larger. (e.g. the current JSC's GC looks up a root node for all the nodes that have strong references.) I disagree. GC is O(n), not O(n^2).

Kentaro Hara

Comment 8 2012-06-11 23:58:30 PDT

(In reply to comment #7) > > The worst case happens only when the traversal order matches the order in which the ref-count is lost. > > I'm not clear on why you think this case is artificial. It's common for nodes to be allocated in document order, and it's common for garbage collectors to run finalizers in allocation order. I would expect this case to be common. > > Do you have some data that shows this case to be artificial? > > > When JavaScript GC happened, the overhead of GC itself would be much larger. (e.g. the current JSC's GC looks up a root node for all the nodes that have strong references.) > > I disagree. GC is O(n), not O(n^2). Ah, got it. You're right. Anyway the worst case happens only when the traversal order matches the order in which the ref-count becomes 0. We might be able to prevent the problem by changing the traversal order; e.g. traverse in the reverse document order, or have two traversal pointers to mess up the order etc.

Kentaro Hara

Comment 9 2012-06-14 08:28:14 PDT

Unfortunately I encountered a bug of ContainerNodeAlgorithm, which is blocking the implementation of the new reference-count algorithm I explained above. I am fixing the bug, just a moment please:)

Kentaro Hara

Comment 10 2012-06-20 23:28:42 PDT

Created attachment 148737 [details] the first WIP patch to observe build bots

Build Bot

Comment 11 2012-06-20 23:56:18 PDT

Comment on attachment 148737 [details] the first WIP patch to observe build bots Attachment 148737 [details] did not pass win-ews (win): Output: http://queues.webkit.org/results/13026042

Kentaro Hara

Comment 12 2012-06-22 07:03:40 PDT

Created attachment 149019 [details] WIP patch2

Build Bot

Comment 13 2012-06-22 07:33:54 PDT

Comment on attachment 149019 [details] WIP patch2 Attachment 149019 [details] did not pass mac-ews (mac): Output: http://queues.webkit.org/results/13034409

Build Bot

Comment 14 2012-06-22 07:42:13 PDT

Comment on attachment 149019 [details] WIP patch2 Attachment 149019 [details] did not pass win-ews (win): Output: http://queues.webkit.org/results/13039358

Kentaro Hara

Comment 15 2012-06-25 17:43:36 PDT

Created attachment 149409 [details] WIP patch3

Kentaro Hara

Comment 16 2012-06-25 23:47:36 PDT

Created attachment 149468 [details] WIP patch4

Gustavo Noronha (kov)

Comment 17 2012-06-26 00:58:59 PDT

Comment on attachment 149468 [details] WIP patch4 Attachment 149468 [details] did not pass gtk-ews (gtk): Output: http://queues.webkit.org/results/13090426

Ojan Vafai

Comment 18 2012-06-26 11:40:03 PDT

Comment on attachment 149468 [details] WIP patch4 View in context: https://bugs.webkit.org/attachment.cgi?id=149468&action=review > Source/WebCore/platform/TreeShared.h:185 > + uint32_t m_refCount; Haven't had time to digest the whole patch. Just one nit here. Can you avoid all the manual bit-masking by using bitfields here? unsigned m_isLiftimeOwner : 1; unsigned m_refCount : 31; Then you also don't need to add the refCount method.

Kentaro Hara

Comment 19 2012-06-27 05:37:16 PDT

Created attachment 149734 [details] Patch

Kentaro Hara

Comment 20 2012-06-27 05:50:03 PDT

(In reply to comment #18) > > Source/WebCore/platform/TreeShared.h:185 > > + uint32_t m_refCount; > > Just one nit here. Can you avoid all the manual bit-masking by using bitfields here? Done. Thanks.

Kentaro Hara

Comment 21 2012-06-27 05:56:38 PDT

A design document: https://docs.google.com/a/chromium.org/document/d/1uYHpq7u5Sslj54UgzXjA7pYR53XjidpBcrCa-neOGQs/edit?pli=1

Gustavo Noronha (kov)

Comment 22 2012-06-27 06:48:01 PDT

Comment on attachment 149734 [details] Patch Attachment 149734 [details] did not pass gtk-ews (gtk): Output: http://queues.webkit.org/results/13101749

Geoffrey Garen

Comment 23 2012-06-27 09:18:50 PDT

Comment on attachment 149734 [details] Patch You haven't convinced me that O(n^2) is a good idea.

Kentaro Hara

Comment 24 2012-06-27 17:28:45 PDT

(In reply to comment #23) > (From update of attachment 149734 [details]) > You haven't convinced me that O(n^2) is a good idea. The document (page 19 -- page 22) explains that the O(n^2) case won't happen except for the very worst case (i.e. the case where the searchNewLifetimeOwner()'s traversal order matches the deref()'s traversal order), and that we would be able to avoid the worst case in common situations by traversing the tree in the following order: - First, traverse the root node - For the rest of nodes, traverse in the reverse DFS order Wouldn't the explanation be enough? Or you do not want to accept the algorithm that might cause the worst case?

Filip Pizlo

Comment 25 2012-06-27 17:57:21 PDT

(In reply to comment #24) > (In reply to comment #23) > > (From update of attachment 149734 [details] [details]) > > You haven't convinced me that O(n^2) is a good idea. > > The document (page 19 -- page 22) explains that the O(n^2) case won't happen except for the very worst case (i.e. the case where the searchNewLifetimeOwner()'s traversal order matches the deref()'s traversal order), and that we would be able to avoid the worst case in common situations by traversing the tree in the following order: > > - First, traverse the root node > - For the rest of nodes, traverse in the reverse DFS order > > Wouldn't the explanation be enough? Or you do not want to accept the algorithm that might cause the worst case? This is a big change. The proof burden to show that this doesn't break the universe is huge. That burden is on you. What I worry about is: 1) That you have not done sufficient performance tests. I want to know how this affects all known benchmarks that operate on the DOM. I want to know what this does to page load time. You've shown me some data on Dromaeo but nothing about page load. 2) That there are cases where the DFS code induces O(n^2) behavior. You could prove this by reporting metrics of the behavior of your code on major websites (gmail, gdocs, bing, facebook, amazon, twitter ... you know the ones). You have not done this, and that worries me a lot. I'd be scared to see this change land without that sort of due diligence. 3) That this is completely unnecessary. You're solving a corner case that probably nobody will notice but us, and you're adding a lot of code to support this corner case. I'm not convinced that any of this is necessary at all.

Kentaro Hara

Comment 26 2012-06-27 19:30:54 PDT

(In reply to comment #25) > 1) That you have not done sufficient performance tests. I want to know how this affects all known benchmarks that operate on the DOM. I want to know what this does to page load time. You've shown me some data on Dromaeo but nothing about page load. > > 2) That there are cases where the DFS code induces O(n^2) behavior. You could prove this by reporting metrics of the behavior of your code on major websites (gmail, gdocs, bing, facebook, amazon, twitter ... you know the ones). You have not done this, and that worries me a lot. I'd be scared to see this change land without that sort of due diligence. OK, let me investigate them. > 3) That this is completely unnecessary. You're solving a corner case that probably nobody will notice but us, and you're adding a lot of code to support this corner case. I'm not convinced that any of this is necessary at all. At least this is mandatory for redesigning V8 GC.

Filip Pizlo

Comment 27 2012-06-27 19:33:56 PDT

(In reply to comment #26) > (In reply to comment #25) > > 1) That you have not done sufficient performance tests. I want to know how this affects all known benchmarks that operate on the DOM. I want to know what this does to page load time. You've shown me some data on Dromaeo but nothing about page load. > > > > 2) That there are cases where the DFS code induces O(n^2) behavior. You could prove this by reporting metrics of the behavior of your code on major websites (gmail, gdocs, bing, facebook, amazon, twitter ... you know the ones). You have not done this, and that worries me a lot. I'd be scared to see this change land without that sort of due diligence. > > OK, let me investigate them. > > > 3) That this is completely unnecessary. You're solving a corner case that probably nobody will notice but us, and you're adding a lot of code to support this corner case. I'm not convinced that any of this is necessary at all. > > At least this is mandatory for redesigning V8 GC. JavaScriptCore's GC approach already demonstrates that this is not mandatory for GC. Is V8's GC really that much worse than JSC's?

Kentaro Hara

Comment 28 2012-06-27 19:58:34 PDT

(In reply to comment #27) > JavaScriptCore's GC approach already demonstrates that this is not mandatory for GC. > > Is V8's GC really that much worse than JSC's? That's a pretty long story and I am writing a document about it:) In a nutshell, the essential difference is that V8 GC is generational, and the semantics is mandatory to reclaim DOM objects in the minor GC. Objective: Reclaim DOM objects in the minor GC. (Currently V8 GC reclaims DOM objects only in the major GC. All DOM objects are promoted to the old space.) Rough explanation of the difficulty: - There are two independent object graphs in the world; one is a JavaScript object graph, and the other is a DOM object graph. - Regarding the JavaScript object graph, the minor GC can just look up a subset of the graph; i.e. the minor GC can just look up JavaScript objects in the new space. The minor GC does not know JavaScript objects in the old space. - Regarding the DOM object graph, the minor GC can just look up a subset of the graph; i.e. the minor GC can just traverse DOM objects that have wrapper objects in the new space. The minor GC does not know DOM objects that have wrapper objects in the old space nor DOM objects that does not have wrapper objects. - In summary, the minor GC must work on a subset of the JavaScript object graph and a subset of the DOM object graph. To judge the reachability to each DOM object correctly (conservatively) with the limited knowledge, the semantics "If Node X has a ref-count, all the Nodes in the tree which X belongs to are kept alive" in the DOM side is required.

Filip Pizlo

Comment 29 2012-06-27 21:02:53 PDT

(In reply to comment #28) > (In reply to comment #27) > > JavaScriptCore's GC approach already demonstrates that this is not mandatory for GC. > > > > Is V8's GC really that much worse than JSC's? > > That's a pretty long story and I am writing a document about it:) In a nutshell, the essential difference is that V8 GC is generational, and the semantics is mandatory to reclaim DOM objects in the minor GC. I think if you're going to change WebCore just to help V8's GC, you should justify it for more than just V8. One good starting point is to share that document when you're ready. > > Objective: Reclaim DOM objects in the minor GC. (Currently V8 GC reclaims DOM objects only in the major GC. All DOM objects are promoted to the old space.) Is this objective necessary for performance or correctness, or both? > > Rough explanation of the difficulty: > > - There are two independent object graphs in the world; one is a JavaScript object graph, and the other is a DOM object graph. > > - Regarding the JavaScript object graph, the minor GC can just look up a subset of the graph; i.e. the minor GC can just look up JavaScript objects in the new space. The minor GC does not know JavaScript objects in the old space. > > - Regarding the DOM object graph, the minor GC can just look up a subset of the graph; i.e. the minor GC can just traverse DOM objects that have wrapper objects in the new space. The minor GC does not know DOM objects that have wrapper objects in the old space nor DOM objects that does not have wrapper objects. > > - In summary, the minor GC must work on a subset of the JavaScript object graph and a subset of the DOM object graph. To judge the reachability to each DOM object correctly (conservatively) with the limited knowledge, the semantics "If Node X has a ref-count, all the Nodes in the tree which X belongs to are kept alive" in the DOM side is required. I'm unconvinced. I think you can build a GenGC that has cheap interaction with the DOM without having to go to all of this trouble.

Kentaro Hara

Comment 30 2012-06-27 21:41:06 PDT

(In reply to comment #29) > I think if you're going to change WebCore just to help V8's GC, you should justify it for more than just V8. The benefit other than V8 is described in the "Motivation" section in the document I uploaded; i.e. improve predictability, fix the FIXME in JSC GC. > One good starting point is to share that document when you're ready. Will write up and share the document in a week or so. > > Objective: Reclaim DOM objects in the minor GC. (Currently V8 GC reclaims DOM objects only in the major GC. All DOM objects are promoted to the old space.) > > Is this objective necessary for performance or correctness, or both? Performance only. Currently V8 GC performs much much worse than JSC GC (V8 GC can stop 4 seconds for one document.createElement()), because all DOM objects survive two copying GC cycles, are promoted to the old space, and then reclaimed by the heavy major GC. We want to fix the behavior.

Filip Pizlo

Comment 31 2012-06-28 00:12:41 PDT

(In reply to comment #30) > (In reply to comment #29) > > I think if you're going to change WebCore just to help V8's GC, you should justify it for more than just V8. > > The benefit other than V8 is described in the "Motivation" section in the document I uploaded; i.e. improve predictability, fix the FIXME in JSC GC. I'm not convinced that either of these benefits is enough to justify an O(n^2) worst-case. > > > One good starting point is to share that document when you're ready. > > Will write up and share the document in a week or so. Cool, thanks! > > > > Objective: Reclaim DOM objects in the minor GC. (Currently V8 GC reclaims DOM objects only in the major GC. All DOM objects are promoted to the old space.) > > > > Is this objective necessary for performance or correctness, or both? > > Performance only. Currently V8 GC performs much much worse than JSC GC (V8 GC can stop 4 seconds for one document.createElement()), because all DOM objects survive two copying GC cycles, are promoted to the old space, and then reclaimed by the heavy major GC. We want to fix the behavior. Seems like a sensible goal. But if it requires adding O(n^2) worst-case behavior to JSC as well, then I'm not sure that's so sensible.

Kentaro Hara

Comment 32 2012-06-28 00:54:37 PDT

(In reply to comment #31) > But if it requires adding O(n^2) worst-case behavior to JSC as well, then I'm not sure that's so sensible. - JSC GC and V8 GC can explicitly avoid the worst case, if they want. If GC cares the deref()'s order so that root nodes are deref()ed in the end, the worst case does not happen (rather, only the "best" case happens). deref()'s order of GC is controllable. Unless GC tries to deref nodes in the same order as searchNewLifetimeOwner()'s order, we can avoid the worst case. - What we need to concern is DOM operations issued by JavaScript. If removeChild(), innerHTML = "", etc were issued in the same order as searchNewLifetimeOwner()'s order, the worst case happens. DOM operations depend on each Web app, and there is no way to control deref()'s order explicitly.

Filip Pizlo

Comment 33 2012-06-28 01:58:19 PDT

(In reply to comment #32) > (In reply to comment #31) > > But if it requires adding O(n^2) worst-case behavior to JSC as well, then I'm not sure that's so sensible. > > - JSC GC and V8 GC can explicitly avoid the worst case, if they want. If GC cares the deref()'s order so that root nodes are deref()ed in the end, the worst case does not happen (rather, only the "best" case happens). deref()'s order of GC is controllable. Unless GC tries to deref nodes in the same order as searchNewLifetimeOwner()'s order, we can avoid the worst case. > > - What we need to concern is DOM operations issued by JavaScript. If removeChild(), innerHTML = "", etc were issued in the same order as searchNewLifetimeOwner()'s order, the worst case happens. DOM operations depend on each Web app, and there is no way to control deref()'s order explicitly. It's this latter point that I'm referring to. I'm interested in the worst-case in particular. I'm not sure it makes sense to make WebKit slower to make V8's GC faster. Hence the high proof burden for you. You need to really convince us that this is either not going to make things slower under any reasonable circumstance, or alternatively that the change in semantics is needed to make some website work.

Filip Pizlo

Comment 34 2012-06-28 02:41:29 PDT

(In reply to comment #32) > (In reply to comment #31) > > But if it requires adding O(n^2) worst-case behavior to JSC as well, then I'm not sure that's so sensible. > > - JSC GC and V8 GC can explicitly avoid the worst case, if they want. If GC cares the deref()'s order so that root nodes are deref()ed in the end, the worst case does not happen (rather, only the "best" case happens). deref()'s order of GC is controllable. Unless GC tries to deref nodes in the same order as searchNewLifetimeOwner()'s order, we can avoid the worst case. I thought about this more. Your statement that the deref() order in the GC is controllable is almost false. To control the deref() order, we'd have to perform all deref()'s all at once, and perform some sorting step to get the desired order. We don't want to do any of this, since we want to invoke the deref()'s incrementally so as to reduce pause time and increase minimum mutator utilization. This implies that the GC's deref() order is likely to be determined by some complex function of allocation order and the shape of the set of objects that survived not only the current collection, but all collections since system start. You haven't proved that this function does not resonate with the worst-case of your algorithm. Such a proof is hard, since reasoning about it analytically is beyond the state of the art (if we could reason about this function analytically, we could prove accurate bounds on fragmentation in non-copying GC - something we know we cannot do; the best known bounds [see http://comjnl.oxfordjournals.org/content/20/3/242.full.pdf] are horribly pathological). Hence the closest you'll get to a proof is to do experiments. This implies two things. First, it raises the proof burden for your claim that the GC will *tend not* to hit the pathological case. Put another way, it makes the current status quo appear much more attractive than your patch - with the status quo, we already know that we won't hit this pathology. Second, it contravenes your claim that the GC can actively avoid the pathological case. In other words, you really need to show evidence that we won't hit the worst case on real websites. Small benchmarks like Dromaeo are unconvincing here, since they tend to just repeat the same small sequence of actions many times. Interesting GC behavior tends to only happen in code that does not repeat the same action, but instead performs some chaotic sequence of seemingly unrelated actions. Your only chance of studying this is on real websites.

Kentaro Hara

Comment 35 2012-06-28 04:32:22 PDT

Thanks for the detailed comment! > In other words, you really need to show evidence that we won't hit the worst case on real websites. Your only chance of studying this is on real websites. The point completely makes sense. Counting the number of searchNewLifetimeOwner() that misses deleting objects would give us real data of how the algorithm works in the real world. I'll investigate it after writing up the document about V8 GC. > Your statement that the deref() order in the GC is controllable is almost false. To control the deref() order, we'd have to perform all deref()'s all at once, and perform some sorting step to get the desired order. We don't want to do any of this, since we want to invoke the deref()'s incrementally so as to reduce pause time and increase minimum mutator utilization. I got your point, but I am not so convinced with the point. Let's consider this way: Case 1: Assume that the semantics is needed to make the real world work. In this case, my algorithm would make sense. Period. Case 2: Assume that the semantics is not needed to make the real world work. In this case, considering that the current TreeShared algorithm works predictably only when a root node is an n-ref node, "the semantics is not needed to make the real world work" implies "the real world always keeps a reference to a root node until the whole tree is dereferenced". Even JSC's GC takes care of it; i.e. JSC's GC intentionally keeps a wrapper object of a root node alive in order to keep the whole tree alive (http://code.google.com/codesearch#OAMlx_jo-ck/src/third_party/WebKit/Source/WebCore/bindings/js/JSNodeCustom.cpp&exact_package=chromium&q=jsnodecustom&type=cs&l=92). In this way, the real world and GC are working (has to work) in a way that a reference of a root node is dereferenced at the very end, since that is the only way to make the behavior predictable. Even in an incremental GC, a root node should be dereferenced in the final incremental phase of the tree. If this observation is true, it would not be difficult to make the GC work so that a root node is dereferenced at the very end; i.e. given that the root node is guaranteed to be in the final incremental phase of the tree, the GC just needs to take care to dereference the root node at the very end in the final incremental phase. No sorting is needed. The GC just needs to skip root nodes at first, and then dereference the skipped root nodes at the very end. In summary, assuming that "the semantics is not needed to make the real world work", it implies that we can control the GC so that the root node is dereferenced at the very end. If the root node is dereferenced at the very end, it hits the best case.

Geoffrey Garen

Comment 36 2012-06-28 11:00:45 PDT

Accommodating poor design in V8's garbage collector is a non-goal for the WebKit project. The WebKit project has a garbage collector that works very well with the DOM. To justify a huge and risky change to core WebKit code, you need to explain a benefit to the WebKit project, and explain why the benefit outweighs the cost. A few further notes, based on the discussion here: (1) I've described an algorithm that isn't O(n^2). Discussing the nuances of this O(n^2) algorithm is interesting but irrelevant until someone proves that there is no viable non-O(n^2) algorithm. (2) Even if absolute worst-case behavior is rare, an occasional regression in average-case behavior (for example, causing "only" five or six extra traversals of some or all of the DOM during every document destruction) is a huge regression in core WebKit behavior. (3) Since this patch retains "guardRef" behavior, it doesn't fully match the lifetime semantics of the DOM specification. I'm not even sure I'd be willing to add an O(n^2) algorithm to the core DOM just to match a corner case of the specification. I'm certainly not willing to do so without fully matching the specification.

Kentaro Hara

Comment 37 2012-06-29 06:36:10 PDT

Thanks ggaren. (In reply to comment #36) > Accommodating poor design in V8's garbage collector is a non-goal for the WebKit project. The WebKit project has a garbage collector that works very well with the DOM. To justify a huge and risky change to core WebKit code, you need to explain a benefit to the WebKit project, and explain why the benefit outweighs the cost. The benefit is the predictability, as explained in the document. As far as I understand the webkit-dev@ thread, as darin, maciej and rniwa pointed out, it seems that people are thinking that the proposed semantics looks better than the current unpredictable semantics, as long as it does not regress performance at all. So I guess that the problem of my approach would be that whether it regresses performance or not is still unclear, than that the benefit is unclear. For example, if I could invent an always-O(n) algorithm, then I hope people will accept it. As pizlo pointed out, I need to investigate real world cases. In parallel, I am seeking for any alternative always-O(n) algorithm. > (1) I've described an algorithm that isn't O(n^2). Discussing the nuances of this O(n^2) algorithm is interesting but irrelevant until someone proves that there is no viable non-O(n^2) algorithm. I'm trying to optimize your algorithm, but let me ask a couple of questions. You pointed out the following three issues in my first implementation: (1) adds a call to fprintf inside ref() and deref(), probably making them ineligible for inlining Will fix. (2) does a test against "!m_parent" in rootDeref, which can be removed Will fix. (3) adds 2 data members to every node, increasing malloc cost and memory traffic Do you have any idea to avoid adding the 2 data members; i.e. a pointer to root (m_root) and a reference count of the whole tree (i.e. m_rootRefCount)? First, it would be possible to remove m_root, but in that case we need to look up the root every time (O(log n) in average) or store m_root in rare data (which requires hashmap look-up). Second, how can we remove m_rootRefCount? It would be possible to store m_rootRefCount in rare data, but in that case, every ref()/deref() has to look up a hashmap to get the rare data. > (2) Even if absolute worst-case behavior is rare, an occasional regression in average-case behavior (for example, causing "only" five or six extra traversals of some or all of the DOM during every document destruction) is a huge regression in core WebKit behavior. Makes sense. > (3) Since this patch retains "guardRef" behavior, it doesn't fully match the lifetime semantics of the DOM specification. I'm not even sure I'd be willing to add an O(n^2) algorithm to the core DOM just to match a corner case of the specification. I'm certainly not willing to do so without fully matching the specification. Would you tell me where the spec is? (I want to read it, but couldn't find it...) Does the spec require that "If a document is referenced by ownerDocument() from any node, then all the nodes under the document must be kept alive"? Anyway I think the ownerDocument() problem is orthogonal with our current problem.

Alexey Proskuryakov

Comment 38 2012-06-29 08:06:19 PDT

There are other ways to achieve predictability besides keeping the whole tree alive, and we do implement these in some similar cases. What I mean is that parent pointers could be nulled out on detaching immediately, without waiting for GC. Then GC randomness would not be exposed to client code.

Kentaro Hara

Comment 39 2012-06-29 09:58:53 PDT

Thanks ap! (In reply to comment #38) > There are other ways to achieve predictability besides keeping the whole tree alive, and we do implement these in some similar cases. What I mean is that parent pointers could be nulled out on detaching immediately, without waiting for GC. Then GC randomness would not be exposed to client code. GC is not the only guy who causes randomness. Please look at this section (https://docs.google.com/a/google.com/document/d/1uYHpq7u5Sslj54UgzXjA7pYR53XjidpBcrCa-neOGQs/edit?pli=1#heading=h.jb5c0dei7hff). The first example is due to GC randomness. The second randomness is due to randomness in a document parser. The third example is due to randomness in editing undo buffer. "predictability" might not be a good word. Even if behavior were predictable, it would make no sense to programmers if the behavior is far from understandable. "predictability and understandability" would be a requirement. I think all of these details (i.e. the lifetime semantics of DOM nodes) are speced somewhere. (I hope that ggaren would know better.)

Kentaro Hara

Comment 40 2012-07-02 06:09:32 PDT

pizlo, ggaren: I revised the lifetime owner algorithm. Document: https://docs.google.com/a/chromium.org/document/d/17ubK4aobm4G3c2FADqEoXAwSWbCIipXwzrGFJY_atoQ/edit Two important observations explained in the document: - We've believed that the current TreeShared algorithm is always O(N). The observation was wrong. The TreeShared algorithm is O(N) in most cases but O(N^2) in the worst case. - The revised algorithm is _always_ O(N). In other words, the revised algorithm not only accomplishes the predictable lifetime semantics but also _always_ performs better than the current TreeShared algorithm. I hope:) Comments are welcome.

Kentaro Hara

Comment 41 2012-07-10 00:03:27 PDT

(In reply to comment #31) > > > One good starting point is to share that document when you're ready. > > > > Will write up and share the document in a week or so. > > Cool, thanks! Wrote up a document: https://docs.google.com/a/google.com/document/d/1Ki8WrGIHipIjzz5rl3SG7SN-ulH9JgbcQAOA4Ja-hXs/edit The document explains why the new lifetime semantics is needed for reclaiming DOM objects in a generational GC. The algorithm proposed in the document is generally applicable to generational GCs, not limited to V8's GC. Let me reiterate the benefits of the new lifetime semantics: (1) Performance. The current TreeShared algorithm is O(N) in most cases but O(N^2) in the worst case. On the other hand, the lifetime owner algorithm is always O(N). In other words, the lifetime owner algorithm not only accomplishes the new lifetime semantics but also performs better than the current TreeShared algorithm. (2) Predictability for JavaScript programmers. (3) More efficient V8's GC.

Kentaro Hara

Comment 42 2012-07-31 05:09:14 PDT

Created attachment 155499 [details] patch for 'rough' review

Kentaro Hara

Comment 43 2012-07-31 05:25:14 PDT

ggaren, fpizlo: I uploaded a patch that implements the revised lifetime owner algorithm. Benefits and performance results are described in the ChangeLog. I would be grateful if you could tell me your high-level views on the approach. (You don't need to look into implementation details:-) The patch is doing a lot of things at a breath to make the algorithm work correctly. If the approach looks acceptable overall, I'd like to break down the patch and land sub-patches one by one, starting from non-controversial ones.

Gustavo Noronha (kov)

Comment 44 2012-07-31 08:54:04 PDT

Comment on attachment 155499 [details] patch for 'rough' review Attachment 155499 [details] did not pass gtk-ews (gtk): Output: http://queues.webkit.org/results/13403174

Hajime Morrita

Comment 45 2012-07-31 18:39:02 PDT

Comment on attachment 155499 [details] patch for 'rough' review You could land RefPtr elimination part separately.

Geoffrey Garen

Comment 46 2012-08-02 15:25:03 PDT

Comment on attachment 155499 [details] patch for 'rough' review View in context: https://bugs.webkit.org/attachment.cgi?id=155499&action=review > Source/WebCore/ChangeLog:68 > + I confirmed that the performance regression of Dromaeo/dom-modify is > + the overhead of checkIfLifetimeOwner(node) in ChildNodeRemovalNotifier::notify(). > + As far as I experimented, it is difficult to kill the 4.67% overhead. > + However, I found that we can dramatically optimize ChildNodeInsertionNotifier and > + ChildNodeRemovalNotifier by rewriting them in an iterative manner, > + which will completely offset the 4.67% overhead. Looks like the first step here is to get your optimization for ChildNodeInsertionNotifier and ChildNodeRemovalNotifier reviewed and landed, and then re-baseline these performance measurements.

Kentaro Hara

Comment 47 2012-08-02 17:38:18 PDT

(In reply to comment #46) > Looks like the first step here is to get your optimization for ChildNodeInsertionNotifier and ChildNodeRemovalNotifier reviewed and landed, and then re-baseline these performance measurements. Thanks ggaren! Dromaeo/dom-modify is going to be optimized by 8.2% in bug 92965. (And I have another idea for further optimization.)

Kentaro Hara

Comment 48 2012-08-27 21:40:55 PDT

Created attachment 160895 [details] Simplified the algorithm. Fixed crashes. Fixed memory leaks. But the perf regression of Dromaeo/dom-modify is still unacceptable. I'm working on the optimization.

Kentaro Hara

Comment 49 2012-09-03 19:02:21 PDT

Created attachment 161956 [details] Fixed perf regression

Kentaro Hara

Comment 50 2012-09-03 19:17:28 PDT

Created attachment 161957 [details] rebased patch

Kentaro Hara

Comment 51 2012-09-03 19:19:33 PDT

ggaren, pizlo: I've updated the patch. Now I think I was able to fix crashes, memory leaks, and perf regression. I had to change the design from the original version because of a bunch of problems:) I documented the final design in this document: https://docs.google.com/a/chromium.org/document/d/1HcaeU-75xukMYSUaskeDoHL-L0FNuCDIukvPmOheY8Q/edit - I would appreciate your feedback for overall design and implementation. - If you could review the patch, I'm super happy. But if you don't want to take a detailed look, I can ask other folks to review the details, after you say the overall approach looks OK. Thanks

Kentaro Hara

Comment 52 2012-09-05 22:50:39 PDT

Just FYI, here is a result of real-world evaluation of the new V8's DOM GC: https://docs.google.com/a/google.com/document/d/1uNyNSoNedyxsT9d_qqZXMW7ISLpycV7O4PrzSO56yWM/edit# (If you are busy, you can just take a look at Fig.2 and Table.1.)

Adam Barth

Comment 53 2012-09-06 00:23:56 PDT

Comment on attachment 161957 [details] rebased patch @Kentaro: Is this ready for a serious review? If so, I can spend some time looking at it tomorrow.

Kentaro Hara

Comment 54 2012-09-06 00:27:03 PDT

(In reply to comment #53) > (From update of attachment 161957 [details]) > @Kentaro: Is this ready for a serious review? If so, I can spend some time looking at it tomorrow. Thanks abarth! Not yet. I think I need to hear opinions of pizlo or ggaren about the overall approach first.

Kentaro Hara

Comment 55 2012-09-06 00:48:21 PDT

Created attachment 162440 [details] overall comments are appreciated before serious review

Filip Pizlo

Comment 56 2012-09-06 00:55:19 PDT

Comment on attachment 162440 [details] overall comments are appreciated before serious review View in context: https://bugs.webkit.org/attachment.cgi?id=162440&action=review > Source/WebCore/ChangeLog:65 > + (Safari/Mac) > + Dromaeo/dom-attr 5769 runs/s => 5656 runs/s (-1.95%) > + Dromaeo/dom-modify 4935 runs/s => 4770 runs/s (-3.34%) > + Dromaeo/dom-query 948941 runs/s => 954218 runs/s (+0.55%) > + Dromaeo/dom-traverse 3870 runs/s => 3810 runs/s (-1.03%) > + > + The -3.34% regression in dom-modify (Safari/Mac) is not acceptable, > + but I have already optimized dom-modify by 8.2% in another patch (bug 92965) > + to offset the performance regression. I think we should eliminate this 3.3% regression. That's great that you optimized it by 8% in another patch, but that's not a good reason to regress the performance. Have you performed the following investigation: 1) What is the statistical significance of the regression? Have you run it multiple (at least 10) times and computed the standard deviation or even better, confidence intervals? 2) Do you know where the regression is coming from? 3) What is the performance of the rest of Dromaeo, and of Dromaeo as a whole? > Source/WebCore/dom/ContainerNodeAlgorithms.h:66 > + void notify(Node*, bool); I would strongly prefer an enum instead of a bool. This bool is hard to follow! I particularly prefer verbose enums in these cases: enum InsertionPointDeletionState { InsertionPointDeletionHasBegun, InsertionPointDeletionHasNotBegun } Or similar. > Source/WebCore/platform/TreeShared.h:198 > + unsigned m_nonWrapperRefCount : 16; // Reference count by normal RefPtrs in WebKit. > + unsigned m_lifetimeOwner : 1; // Whether this Node is a lifetime owner or not. > + unsigned m_wrapperRefCount : 15; // Reference count by wrapper objects. I don't feel comfortable with such small ref counts. 32768 referees feels small. Do you have functionality to guard against overflow?

Filip Pizlo

Comment 57 2012-09-06 01:12:14 PDT

A couple of general comments: 1) Can you articulate what the general advantages of this approach are, outside of V8, and other than the semantic changes? I don't think you're going to convince us that this patch is a good idea if it only benefits V8. You're also not likely to get far with the semantic argument. So. Does this broadly help generational GC, or other GC barriers, for example? Does it fix a known bug on a known website? If this change cannot help WebKit, then it's hard to swallow. 2) You should fix the 3% regression in dom-modify. Do you have ideas of where it is coming from? 3) Dromaeo is a really bad benchmark, and it's sad that it's the best one we have for DOM manipulation. Can you look at the performance in JSC with this change for programs that create a lot of DOM things? Octane's benchmarks that use typed arrays might be interesting. You should test it. But even that won't really answer what will happen on real websites. So another way of approaching this is: can you construct a pathological, synthetic benchmark that makes your change look as bad as possible for either JSC or V8? The point is not to use this benchmark as an argument against your patch, but rather, to give all of us a sense of what the worst-case would be. You could also then produce a similar test that shows a pathological, synthetic speed-up from your change on JSC (if that is possible). If we know what the worst and best cases are, then that could give us further confidence that your change is safe. If constructing a worst case benchmark was impossible then that would make us feel quite uneasy about this. There's nothing worse than tracking down horrid performance on a complex website, and it would be particularly sad if this change was the reason. On the other hand, if we knew that even in the worst case this patch regresses things, by, say, 10%, then at least we know exactly what to expect. Even better if the worst-case regression is in the single digits! 4) You should run the following other benchmarks as a sanity check: SunSpider, Kraken, RIABench (http://www.timo-ernst.net/misc/riabench-start/). I will list more benchmarks if I think of any. Although those benchmarks don't necessarily stress the DOM a lot, the way they are run does rely on the GC doing exactly the right things with the DOM. For example if in SunSpider the GC didn't blow away the iframe contents from the previously run benchmark then you'd get disastrous slow-downs. It's useful to run these tests to ensure that this does not happen. Other concerns are listed in my quick review (comment 56). I'm particularly concerned with the small number of bits given to the reference counts. If wrap-around can be used to cause premature freeing to happen, and that wrap-around only requires 2^16 objects, then that's not really a sound design, unless I'm missing something.

Kentaro Hara

Comment 58 2012-09-06 01:12:48 PDT

Thanks pizlo! (In reply to comment #56) > I think we should eliminate this 3.3% regression. That's great that you optimized it by 8% in another patch, but that's not a good reason to regress the performance. OK, I'll invent some more optimization. However, it will be landed in a separate patch before landing this patch, because removing the 3.3% regression in this patch is difficult for the reason described below. > Have you performed the following investigation: > > 1) What is the statistical significance of the regression? Have you run it multiple (at least 10) times and computed the standard deviation or even better, confidence intervals? The results are the average of 10 runs. Will investigate standard deviation etc. > 2) Do you know where the regression is coming from? The 3.3% regression is due to a lot of if(!m_treeScopeDepth) checks in ChildNodeInsertionNotifier::notify() and ChildNodeRemovalNotifier::notify(). The if branches slows down the performance. But it is difficult to remove these if branches because they are critical for the algorithm. On the other hand, I don't know the root cause of about 1% regressions in dom-attr and dom-traverse. Will investigate it. > 3) What is the performance of the rest of Dromaeo, and of Dromaeo as a whole? No observable regression. Other benchmarks in Dromaeo do not touch DOM. > > Source/WebCore/dom/ContainerNodeAlgorithms.h:66 > > + void notify(Node*, bool); > > I would strongly prefer an enum instead of a bool. This bool is hard to follow! I particularly prefer verbose enums in these cases: > > enum InsertionPointDeletionState { > InsertionPointDeletionHasBegun, > InsertionPointDeletionHasNotBegun > } > > Or similar. Will fix. > > Source/WebCore/platform/TreeShared.h:198 > > + unsigned m_nonWrapperRefCount : 16; // Reference count by normal RefPtrs in WebKit. > > + unsigned m_lifetimeOwner : 1; // Whether this Node is a lifetime owner or not. > > + unsigned m_wrapperRefCount : 15; // Reference count by wrapper objects. > > I don't feel comfortable with such small ref counts. 32768 referees feels small. The rationale for the counts is explained in this section (https://docs.google.com/a/chromium.org/document/d/1HcaeU-75xukMYSUaskeDoHL-L0FNuCDIukvPmOheY8Q/edit#heading=h.cdgghoqthpgu). > Do you have functionality to guard against overflow? ASSERT() should be inserted, you're right. But it is difficult to insert an if(...) check because the if branch regresses performance.

Adam Barth

Comment 59 2012-09-06 01:21:58 PDT

> > > Source/WebCore/platform/TreeShared.h:198 > > > + unsigned m_nonWrapperRefCount : 16; // Reference count by normal RefPtrs in WebKit. > > > + unsigned m_lifetimeOwner : 1; // Whether this Node is a lifetime owner or not. > > > + unsigned m_wrapperRefCount : 15; // Reference count by wrapper objects. > > > > I don't feel comfortable with such small ref counts. 32768 referees feels small. > > The rationale for the counts is explained in this section (https://docs.google.com/a/chromium.org/document/d/1HcaeU-75xukMYSUaskeDoHL-L0FNuCDIukvPmOheY8Q/edit#heading=h.cdgghoqthpgu). The doc claims there isn't a programatic way to spam the RefCount, but why can't a web page just create 60k TreeWalker objects that hold a reference to the node via TreeWalker::m_current ?

Adam Barth

Comment 60 2012-09-06 01:26:05 PDT

var walkers = []; for (var i=0; i < 60000; ++i) walkers.push(document.createTreeWalker(document.body)); I haven't looked in the debugger, but isn't the ref count on document.body now around 60k?

Kentaro Hara

Comment 61 2012-09-06 01:52:19 PDT

(In reply to comment #60) > var walkers = []; > for (var i=0; i < 60000; ++i) > walkers.push(document.createTreeWalker(document.body)); > > I haven't looked in the debugger, but isn't the ref count on document.body now around 60k? I confirmed that it created 120k refcounts! So, that's a problem. A possible solution would be falling back to a 32-bit refcount allocated in NodeRareData when the 16-bit refcount overflows (, which would be a super rare case). In order to check if the overflow happens or not, one if branch is needed in ref().

Kentaro Hara

Comment 62 2012-09-06 02:09:09 PDT

pizlo: Thank you very much for comments. > 2) You should fix the 3% regression in dom-modify. > 3) Dromaeo is a really bad benchmark, you should create a worst-case benchmark. > 5) I'm particularly concerned with the small number of bits given to the reference counts. As a next step, I will focus on fixing the above three issues. > 4) You should run the following other benchmarks as a sanity check: SunSpider, Kraken, RIABench Will do. But I think (I hope) that there is no observable regression because they do not touch DOM. > 1) Can you articulate what the general advantages of this approach are, outside of V8, and other than the semantic changes? I don't think you're going to convince us that this patch is a good idea if it only benefits V8. You're also not likely to get far with the semantic argument. So. Does this broadly help generational GC, or other GC barriers, for example? Does it fix a known bug on a known website? If this change cannot help WebKit, then it's hard to swallow. - A big performance & memory improvement in a generational GC for DOM objects (https://docs.google.com/a/google.com/document/d/1uNyNSoNedyxsT9d_qqZXMW7ISLpycV7O4PrzSO56yWM/edit). The GC algorithm I've been proposing is broadly applicable to generational GCs. If JSC adopted a generational GC, JSC will be also benefited. - There is a known bug about the semantics (http://code.google.com/p/chromium/issues/detail?id=143478&q=label%3AWebKit-ID-88834&colspec=ID%20Pri%20Mstone%20ReleaseBlock%20OS%20Area%20Feature%20Status%20Owner%20Summary) - That being said, I know that the above two items won't convince you. If you (and other port contributors) do not wish this change, it might be possible to enable the semantics only in V8 by surrounding #if USE(V8) or moving most of the code to V8 specific directory. Actually, IMHO, if we could implement the semantics without any perf regression, the semantics is better than the current incomprehensible semantics. So I'm thinking that the biggest problem is how I could fix all the perf regression.

Geoffrey Garen

Comment 63 2012-09-06 11:20:59 PDT

I broadly agree with the direction this review is taking. One comment about performance testing: We need some benchmark that shows average case and worse case cost of (a) full document tear-down (b) small fragment tear-down The micro-benchmarks you're measuring cover (b) but not (a). Also, please report benchmark results for JSC. We can't use V8 results as a baseline because, as your data shows, the V8 implementation of DOM GC is slow, so it sets a too-low bar for performance. > - If you (and other port contributors) do not wish this change, it might be possible to enable the semantics only in V8 by surrounding #if USE(V8) or moving most of the code to V8 specific directory. No. We simply can't maintain two #ifdef'd implementations of an algorithm that is both complex and central to the engine. That would seriously undermine our hackability, performance, security, and compatibility goals.

Geoffrey Garen

Comment 64 2012-09-06 11:26:25 PDT

> I confirmed that it created 120k refcounts! So, that's a problem. Marking r- based on this. (The other comments here may or may not be reasons for r-, depending on the outcome of some investigations.)

Kentaro Hara

Comment 65 2012-09-24 22:46:03 PDT

Created attachment 165523 [details] Pathological test case 1 I'll upload the performance result in a day (with performance results for a bunch of benchmarks in both JSC and V8).

Kentaro Hara

Comment 66 2012-09-24 22:48:02 PDT

Created attachment 165524 [details] Pathological test case 2 I'll upload the performance result in a day (with performance results for a bunch of benchmarks in both JSC and V8).

Kentaro Hara

Comment 67 2012-09-27 23:19:53 PDT

Created attachment 166147 [details] Patch

Kentaro Hara

Comment 68 2012-09-27 23:21:27 PDT

fpizlo, ggaren: Uploaded a revised patch. Investigated a bunch of benchmarks. Clarified a couple of items pointed out by reviewers. Benchmark results: https://docs.google.com/a/chromium.org/spreadsheet/ccc?key=0AlobCOyvTnPKdDJiZXh5X0tWeVZXMnQ0ZF9YUHZGdXc#gid=0 Document for more details: https://docs.google.com/a/chromium.org/document/d/1cGPSSWNENS891GuhvYTU7pKsncWUwztoVhGNe0YVlRk/edit The summary of the document: - For Dromaeo, Octane, Kraken and Sunspider, no performance regression. - For a lot of tests of Dromaeo JQuery and Dromaeo Prototype.js, 10 ~ 30% performance improvement in V8. - For pathological test cases, ~3.1% performance regression in JSC and ~5.8% performance regression in V8. The document also explains the following points: - How to manage a wrapper reference count and a normal reference count using only one 32 bit integer - The real cause of the ~3.3% performance regression we had observed - Pathological test cases that will be most affected by this change =========== If you have more benchmark suggestions, I’m happy to investigate. Also I am seeking ideas for simplifying the implementation of TreeShared.h. The current TreeShared.h is a bit difficult to read, in order to have compilers generate fully optimized assembly (See the “Topic 2” of the document for more details). Thanks!

Maciej Stachowiak

Comment 69 2012-10-02 00:09:32 PDT

Benchmark data looks like this doesn't hurt most DOM benchmarks. Kudos for that. However... (In reply to comment #68) > fpizlo, ggaren: Uploaded a revised patch. Investigated a bunch of benchmarks. Clarified a couple of items pointed out by reviewers. > > Benchmark results: https://docs.google.com/a/chromium.org/spreadsheet/ccc?key=0AlobCOyvTnPKdDJiZXh5X0tWeVZXMnQ0ZF9YUHZGdXc#gid=0 > Looks like the benchmark data doesn't include a page load speed benchmark or results of testing on popular sites, as requested by Filip in comment #25 > Document for more details: https://docs.google.com/a/chromium.org/document/d/1cGPSSWNENS891GuhvYTU7pKsncWUwztoVhGNe0YVlRk/edit According to this document, this approach can cause memory leaks in some cases. That seems like a very poor characteristic for a reference counting algorithm. Any explanation for why it's ok to leak sometimes?

Kentaro Hara

Comment 70 2012-10-02 00:30:26 PDT

Thank you very much maciej. (In reply to comment #69) > > Benchmark results: https://docs.google.com/a/chromium.org/spreadsheet/ccc?key=0AlobCOyvTnPKdDJiZXh5X0tWeVZXMnQ0ZF9YUHZGdXc#gid=0 > > > > Looks like the benchmark data doesn't include a page load speed benchmark or results of testing on popular sites, as requested by Filip in comment #25 Do you have any idea for testing that in a reliable way? To obtain reliable results, I guess that we need to test it using some micro benchmarks. In that sense, pathological micro benchmarks for page loading would be helpful. The pathological test case (https://bugs.webkit.org/attachment.cgi?id=165523) would be one good test case. Another test case would be innerHTML="a_big_DOM_tree", which is tested in dom-modify. If you have more benchmark suggestions, I'd be happy to test them. > > Document for more details: https://docs.google.com/a/chromium.org/document/d/1cGPSSWNENS891GuhvYTU7pKsncWUwztoVhGNe0YVlRk/edit > > According to this document, this approach can cause memory leaks in some cases. That seems like a very poor characteristic for a reference counting algorithm. Any explanation for why it's ok to leak sometimes? To be clear: The algorithm leaks memory only in a case where there are 65535 or more (direct or indirect) normal references from "any Node in the tree to which Node X belongs" to "Node X". I guess that there is no call path that causes such a situation. I believe that at least such a situation won't happen practically. If it happened (I don't think it happens), it just leaks memory. (If there is any algorithm that is guaranteed not to leak memory, that would be definitely better.)

Adam Barth

Comment 71 2012-10-02 09:49:28 PDT

> Do you have any idea for testing that in a reliable way? We run page load tests (PLT) continuously on trunk. The easiest way to get those numbers is to land the patch and watch the graphs. Getting reliable numbers prior to landing is possible as well, but trickier. Kentaro, if you find me on chat I can explain how to do it. It requires access to non-public test data, which you should have.

Darin Adler

Comment 72 2012-10-02 10:07:05 PDT

(In reply to comment #70) > The algorithm leaks memory only in a case where there are 65535 or more (direct or indirect) normal references from "any Node in the tree to which Node X belongs" to "Node X". I guess that there is no call path that causes such a situation. I want to understand what this means. Did you try to think of pathological cases that could lead to this situation? Can we just create a test that sets 100,000 properties on a DOM node all pointing to its parent node?

Geoffrey Garen

Comment 73 2012-10-02 10:51:08 PDT

> The algorithm leaks memory only in a case where there are 65535 or more (direct or indirect) normal references from "any Node in the tree to which Node X belongs" to "Node X". Like Darin, I'd like to understand this statement more. However, if the plain meaning here is true, this is obviously a show-stopper, and cause for an r-.

Maciej Stachowiak

Comment 74 2012-10-02 11:23:58 PDT

(In reply to comment #71) > > Do you have any idea for testing that in a reliable way? > > We run page load tests (PLT) continuously on trunk. The easiest way to get those numbers is to land the patch and watch the graphs. Getting reliable numbers prior to landing is possible as well, but trickier. > > Kentaro, if you find me on chat I can explain how to do it. It requires access to non-public test data, which you should have. Given the performance risk of this change, and the specific request from Filip, it seems like testing before landing is the right thing to do. (In reply to comment #70) > > To be clear: The algorithm leaks memory only in a case where there are 65535 or more (direct or indirect) normal references from "any Node in the tree to which Node X belongs" to "Node X". I guess that there is no call path that causes such a situation. I believe that at least such a situation won't happen practically. If it happened (I don't think it happens), it just leaks memory. Wouldn't something like Adam's test case from comment #60 trigger this leak? Or a node with more than 65535, which is allowed by the DOM, and could happen in case of a table with a huge number of rows? > (If there is any algorithm that is guaranteed not to leak memory, that would be definitely better.) The current normal refcounting algorithm doesn't have this issue. So this seems like a showstopper to me.

Maciej Stachowiak

Comment 75 2012-10-02 17:59:31 PDT

(In reply to comment #74) > > Wouldn't something like Adam's test case from comment #60 trigger this leak? Or a node with more than 65535, which is allowed by the DOM, and could happen in case of a table with a huge number of rows? > I meant to say, node with more than 65535 *child nodes*.

Kentaro Hara

Comment 76 2012-10-02 18:09:47 PDT

darin: > > The algorithm leaks memory only in a case where there are 65535 or more (direct or indirect) normal references from "any Node in the tree to which Node X belongs" to "Node X". I guess that there is no call path that causes such a situation. > > I want to understand what this means. Did you try to think of pathological cases that could lead to this situation? Can we just create a test that sets 100,000 properties on a DOM node all pointing to its parent node? maciej: > Wouldn't something like Adam's test case from comment #60 trigger this leak? These two cases don't cause the memory leak. First of all, the memory leak does not happen unless there is "refcount synonym". In other words, the memory leak happens only when we misinterpret the refcount value. Specifically, the memory leak can happen in a case where: - the refcount is 65538. - the actual situation is "0 wrapper refcount + 65538 normal refcounts". - but we interpret it as "1 wrapper refcount + 2 normal refcounts". In the test cases you suggested, the guys who keep 65535 refcounts are wrapper objects. Thus, the test cases do not cause the misinterpretation. The misinterpretation can happen only when there are 65535 or more (direct or indirect) normal references from "any Node in the tree to which Node X belongs" to "Node X" _without_passing_any_wrapper_object_. maciej: > Or a node with more than 65535, which is allowed by the DOM, and could happen in case of a table with a huge number of rows? Given that tree edges are not refcounted, this won't cause the memory leak. maciej: > > (If there is any algorithm that is guaranteed not to leak memory, that would be definitely better.) > > The current normal refcounting algorithm doesn't have this issue. But the current normal refcounting algorithm causes the wrong DOM lifetime semantics, which is the problem I want to fix. We want a refcounting algorithm that can count up a wrapper refcount and a normal refcount separately using only one 32-bit integer:) (In reply to comment #74) > (In reply to comment #71) > > > Do you have any idea for testing that in a reliable way? > > > > We run page load tests (PLT) continuously on trunk. The easiest way to get those numbers is to land the patch and watch the graphs. Getting reliable numbers prior to landing is possible as well, but trickier. > > > > Kentaro, if you find me on chat I can explain how to do it. It requires access to non-public test data, which you should have. > > Given the performance risk of this change, and the specific request from Filip, it seems like testing before landing is the right thing to do. Sure! Let me do the page load tests after we can reach a consensus on the refcounting problem.

Maciej Stachowiak

Comment 77 2012-10-02 20:53:06 PDT

(In reply to comment #76) > darin: > > > The algorithm leaks memory only in a case where there are 65535 or more (direct or indirect) normal references from "any Node in the tree to which Node X belongs" to "Node X". I guess that there is no call path that causes such a situation. > > > > I want to understand what this means. Did you try to think of pathological cases that could lead to this situation? Can we just create a test that sets 100,000 properties on a DOM node all pointing to its parent node? > > maciej: > > Wouldn't something like Adam's test case from comment #60 trigger this leak? > > These two cases don't cause the memory leak. First of all, the memory leak does not happen unless there is "refcount synonym". In other words, the memory leak happens only when we misinterpret the refcount value. Specifically, the memory leak can happen in a case where: > > - the refcount is 65538. > - the actual situation is "0 wrapper refcount + 65538 normal refcounts". > - but we interpret it as "1 wrapper refcount + 2 normal refcounts". > > In the test cases you suggested, the guys who keep 65535 refcounts are wrapper objects. Thus, the test cases do not cause the misinterpretation. The misinterpretation can happen only when there are 65535 or more (direct or indirect) normal references from "any Node in the tree to which Node X belongs" to "Node X" _without_passing_any_wrapper_object_. That does not sound right to me. In Adam's test case, the non-wrapper TreeWalker holds a reference to the Node. TreeWalkers can have their own separate lifetimes and are not part of a DOM tree, so they need to each separately ref their node. Further, your document says the wrapper refcount is only for separate wrappers in different isolated worlds. That is why you say it is ok for it to only be 15 bits. If any wrapper affected the wrapper refcount instead of the normal refcount, not just wrappers for the same object in different isolated worlds, then you would trivially be able to overflow the 15 bit wrapper refcount. If Adam's test case used the wrapper refcount instead of the normal refcount, it would itself overflow 15 bits. And I don't see how it could maintain correctness without using either refcount. > > maciej: > > Or a node with more than 65535, which is allowed by the DOM, and could happen in case of a table with a huge number of rows? > > Given that tree edges are not refcounted, this won't cause the memory leak. What does create non-wrapper references, then, if it's neither wrappers nor tree edges? > > > maciej: > > > (If there is any algorithm that is guaranteed not to leak memory, that would be definitely better.) > > > > The current normal refcounting algorithm doesn't have this issue. > > But the current normal refcounting algorithm causes the wrong DOM lifetime semantics, which is the problem I want to fix. We want a refcounting algorithm that can count up a wrapper refcount and a normal refcount separately using only one 32-bit integer:) While it would be nice to fix that, a memory leak is worse than the problem you are trying to fix. > > > > Given the performance risk of this change, and the specific request from Filip, it seems like testing before landing is the right thing to do. > > Sure! Let me do the page load tests after we can reach a consensus on the refcounting problem. I don't think you are going to get consensus that leaking memory is acceptable. Unless it's provably the case that the leak is impossible. But it doesn't sound like it to me.

Adam Barth

Comment 78 2012-10-02 23:42:29 PDT

othermaciej and I had an extensive discussion of this bug on #webkit. Here's an attempt at a summary: The key thing to understand is that there are four kinds of references to DOM nodes: 1) Edges of the DOM tree itself 2) References from JS wrappers in various isolated worlds 3) References within the DOM tree (i.e., from one node to another in C++) 4) References from outside the DOM tree The patch can run into trouble in the following cases: A) There are more than 2^15 references of type (2). This can occur only when there are that many isolated wolds. othermaciej and I agree that this is a reasonable API limit, but we should probably enforce that limit. B) There are more than 2^15 references of type (2) that point to a single node. If this occurs, we will leak the tree for much the same reason we leak the tree today if a node holds a reference to one of its ancestors. Specifically, we'll think that there is a wrapper pointing to the popular node and therefore retain the tree. However, retaining the tree will cause us to maintaing the 2^15 references to the node. The patch does not run into trouble if, as discussed above, there are >2^15 references of type (4). The behavior is different than before the patch, but the difference in behavior is both more correct and does not cause a memory leak. Specifically, if there are >2^15 references of type (4) to a single node, we'll retain the tree until there are fewer than 2^15 references to that node. Given the forgoing, othermaciej's remaining concerns are in regards to case (B). Specifically, he's concerned about whether (B) can occur. If (B) can occur with realistic content, he would view that as a huge showstopper concern. In addition, he would view any leak that occurs with this patch but not without as a major concern. Leaks that occur without the patch but do not occur with the patch are progressions, but he's troubled by trading one set of leaks for another. We looked around a bit to see if we could find any examples of case (B) by inspection. Instead, we found a leak that can occur today due to an obscure violation of the "don't ref your ancestor" rule. (I filed that issue as Bug 98234.) My view differs a bit from othermaciej's in this regard. In my view, if the leaks don't occur in practice but instead only occur in security-exploit-style situations, then we should fix these obscure leaks whether they're caused by (B) or by violations of "don't ref your parent." However, for both sorts of leaks, it's likely more important to fix crash bugs that are hit by users in the wild.

Maciej Stachowiak

Comment 79 2012-10-03 01:49:41 PDT

(In reply to comment #78) > > We looked around a bit to see if we could find any examples of case (B) by inspection. Instead, we found a leak that can occur today due to an obscure violation of the "don't ref your ancestor" rule. (I filed that issue as Bug 98234.) I don't know if that's totally accurate. I spent maybe 5 minutes looking for examples of case (B) and I think what I found would be likely to turn out to be such an example on closer examination (in addition to having a potential leak with the current code base under different circumstances). I'm not going to do a full audit myself because I believe it's up to the advocates of the patch to show that it is safe from the identified leak hazard. > > My view differs a bit from othermaciej's in this regard. In my view, if the leaks don't occur in practice but instead only occur in security-exploit-style situations, then we should fix these obscure leaks whether they're caused by (B) or by violations of "don't ref your parent." However, for both sorts of leaks, it's likely more important to fix crash bugs that are hit by users in the wild. We need to consider the change of programming disciplines as well as the replacement of one set of bugs with another. Is there a rule for correct use of the proposed new refcounting model that is as easy to understand as "don't ref your ancestors", or alternately mechanically enforceable at compile time? The best I can think of is "don't ref any node in a DOM subtree 2^16 or more times from any combination of other nodes in that subtree", and I would have no idea how to try to follow that rule, or consider it in patch review. Perhaps there is a better rule that is easier to follow consistently than "don't ref your ancestors" but I do not know what it is.

Adam Barth

Comment 80 2012-10-03 02:16:44 PDT

Comment on attachment 166147 [details] Patch I'd like to thank everyone, especially Maciej, for taking the time to understand this patch and for providing constructive feedback. I'm marking this patch r- because it looks like we'll need to iterate on the approach some more.

Geoffrey Garen

Comment 81 2012-10-04 21:08:18 PDT

Created attachment 167240 [details] Patch

Geoffrey Garen

Comment 82 2012-10-04 21:11:57 PDT

Maciej asked me if there was a way to fix this in WebKit, and I came up with this. I think this is a better solution than rewriting referenced counting in the DOM because this is simple and efficient, and it doesn't introduce security bugs or leaks.

Kentaro Hara

Comment 83 2012-10-04 21:17:38 PDT

Comment on attachment 167240 [details] Patch Would you add the same logic to the V8 port? (Or please skip the test in Chromium. I'll add the same logic later.) I'd like to r+ it if there is no objection. Although this solution is not helpful for reclaiming DOM objects in minor GC cycles, we should anyway fix the current wrong semantics.

Maciej Stachowiak

Comment 84 2012-10-04 21:24:11 PDT

(In reply to comment #82) > Maciej asked me if there was a way to fix this in WebKit, and I came up with this. > > I think this is a better solution than rewriting referenced counting in the DOM because this is simple and efficient, and it doesn't introduce security bugs or leaks. Did you performance test to ensure this doesn't cause a regression, and if so what tests did you run?

Geoffrey Garen

Comment 85 2012-10-04 21:43:39 PDT

> Did you performance test to ensure this doesn't cause a regression, and if so what tests did you run? I was concerned that this edge case wouldn't show up in normal performance testing, so I ran these special-case tests: (*) A stress test of removing subtrees from the DOM via .innerHTML: for (var i = 0; i < 5000; ++i) { p..innerHTML = ""; p.appendChild(div); } => No regression. (Cost dominated by innerHTML.) (*) A stress test of removing subtrees from the DOM via DOM APIs: for (var i = 0; i < 5000; ++i) { p.removeChild(div); p.appendChild(div); } => No regression. (More efficient than innerHTML. Cost still dominated by DOM tree maintenance.) (*) A stress test of cycling subtrees through the DOM to make sure they didn't leak: Same as above, but new nodes each time instead of recycled nodes. => No leak. (*) Kentaro's pathological test cases, attached to this bug => No regression. (*) Visited facebook.com, google.com, gmail.com with a breakpoint on the slow path of willCreatePossiblyOrphanedTreeByRemoval() => Called ~200 times (compared to ~400,000 total JS allocations).

Maciej Stachowiak

Comment 86 2012-10-04 22:10:06 PDT

That testing sounds great. I think it would also be worthwhile to verify that the edge case does not in fact show up in normal perf testing. Does that seem reasonable, or is the risk of regressing something like Dromaeo or a page load speed benchmark too low to worry about it? (In reply to comment #85) > > Did you performance test to ensure this doesn't cause a regression, and if so what tests did you run? > > I was concerned that this edge case wouldn't show up in normal performance testing, so I ran these special-case tests: > > (*) A stress test of removing subtrees from the DOM via .innerHTML: > > for (var i = 0; i < 5000; ++i) { > p..innerHTML = ""; > p.appendChild(div); > } > > => No regression. (Cost dominated by innerHTML.) > > (*) A stress test of removing subtrees from the DOM via DOM APIs: > > for (var i = 0; i < 5000; ++i) { > p.removeChild(div); > p.appendChild(div); > } > > => No regression. (More efficient than innerHTML. Cost still dominated by DOM tree maintenance.) > > (*) A stress test of cycling subtrees through the DOM to make sure they didn't leak: > > Same as above, but new nodes each time instead of recycled nodes. > > => No leak. > > (*) Kentaro's pathological test cases, attached to this bug > > => No regression. > > (*) Visited facebook.com, google.com, gmail.com with a breakpoint on the slow path of willCreatePossiblyOrphanedTreeByRemoval() > > => Called ~200 times (compared to ~400,000 total JS allocations).

Geoffrey Garen

Comment 87 2012-10-04 23:01:55 PDT

> That testing sounds great. I think it would also be worthwhile to verify that the edge case does not in fact show up in normal perf testing. Does that seem reasonable, or is the risk of regressing something like Dromaeo or a page load speed benchmark too low to worry about it? I ran the Dromaeo DOM core tests: NO PATCH: 1496.95runs/s PATCH: 1502.92runs/s I don't think page load speed is a useful benchmark for this patch because navigation doesn't trigger this function. (I double-checked this on the URLs I mentioned.) Only direct manipulation via DOM APIs triggers this function.

Geoffrey Garen

Comment 88 2012-10-04 23:05:28 PDT

> Would you add the same logic to the V8 port? (Or please skip the test in Chromium. I'll add the same logic later.) Honestly, TestExpectations has gotten so complicated that I don't feel comfortable adding to it. If you can supply an equivalent toV8 line for the "toJS(...)" line @ JSNodeCustom.h:92, I'd be happy to land it with this patch.

WebKit Review Bot

Comment 89 2012-10-04 23:11:22 PDT

Comment on attachment 167240 [details] Patch Attachment 167240 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/14171581 New failing tests: fast/dom/gc-5.html fast/dom/gc-12.html fast/dom/gc-dom-tree-lifetime.html fast/dom/gc-3.html

Kentaro Hara

Comment 90 2012-10-04 23:19:41 PDT

(In reply to comment #88) > > Would you add the same logic to the V8 port? (Or please skip the test in Chromium. I'll add the same logic later.) > > Honestly, TestExpectations has gotten so complicated that I don't feel comfortable adding to it. > > If you can supply an equivalent toV8 line for the "toJS(...)" line @ JSNodeCustom.h:92, I'd be happy to land it with this patch. OK, I'll do it in hours.

Mark Rowe (bdash)

Comment 91 2012-10-04 23:37:46 PDT

Comment on attachment 167240 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=167240&action=review > Source/WebCore/dom/ContainerNode.cpp:39 > +#if USE(JSC) > +#include "JSNode.h" > +#endif Can you please move this outside of the sorted list of #include statements? #if'd includes are generally put in a separate paragraph from the normal #includes.

Gavin Barraclough

Comment 92 2012-10-04 23:39:01 PDT

Comment on attachment 167240 [details] Patch r+ to this patch as is for JSC, pending adding Kentaro's changes to support V8. Looking at the test cases, I see test coverage (in gc-12) where a wrapper is created on a node in a disconnected subtree, ensuring that the parent remains alive. I think we discussed at least one more possible route to this problem, where a singleton disconnected node could be given children and became a root. Is this a real problem, is it possible to write a test case, and is it covered by one of these test cases?

WebKit Review Bot

Comment 93 2012-10-05 00:02:33 PDT

Comment on attachment 167240 [details] Patch Attachment 167240 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/14171604 New failing tests: fast/dom/gc-5.html fast/dom/gc-12.html fast/dom/gc-dom-tree-lifetime.html fast/dom/gc-3.html

Kentaro Hara

Comment 94 2012-10-05 00:20:50 PDT

Created attachment 167267 [details] With V8 side fix As V8 does not implement [CustomHeader], we cannot add willCreatePossiblyOrphanedTreeByRemoval() to V8NodeCustom.h. So I added it to V8DOMWrapper.cpp. I'm investigating performance impacts on Chromium.

Kentaro Hara

Comment 95 2012-10-05 00:48:07 PDT

(In reply to comment #87) > I ran the Dromaeo DOM core tests: > > NO PATCH: 1496.95runs/s > PATCH: 1502.92runs/s Is this a total score of Dromaeo core tests? The total score is strongly affected by dom-query, and thus the score is not so meaningful. Would you check the performance of each Dromaeo DOM tests in Safari again just in case? I observed 6.5% regression in Dromaeo/dom-modify in Chromium. [no patch] dom-modify= 4182.68393722 runs/s dom-modify= 4258.18237669 runs/s dom-modify= 4279.1751288 runs/s dom-modify= 4270.84756712 runs/s dom-modify= 4285.91877381 runs/s [with patch] dom-modify= 3988.47523316 runs/s dom-modify= 4022.2093293 runs/s dom-modify= 4024.93821216 runs/s dom-modify= 3970.77244907 runs/s dom-modify= 3970.77244907 runs/s Even though I inlined willCreatePossiblyOrphanedTreeByRemoval(), I couldn't kill the perf regression.

Geoffrey Garen

Comment 96 2012-10-05 09:18:26 PDT

> a singleton disconnected node could be given children and became a root I'll add a test case for this to gc-12.html. I realized after our conversation that this case does not require special treatment in code. It is not possible to modify a singleton node without that node already having a JavaScript reference (i.e., the reference you are using in the modifying expression). A general way to say this is that the DOM only has API for implicitly creating an unreferenced parent due to removal (.innerHTML, editing, etc.); there's no API for implicitly creating an unreferenced parent due to insertion, akin to .ancestorHTML. (An older version of this patch included willCreatePossiblyOrphanedTreeByInsertion, to match willCreatePossiblyOrphanedTreeByRemoval. When I tried to write a test for it, I realized there's no such thing.)

Geoffrey Garen

Comment 97 2012-10-05 14:18:11 PDT

> Can you please move this outside of the sorted list of #include statements? OK.

Geoffrey Garen

Comment 98 2012-10-05 15:25:27 PDT

> Is this a total score of Dromaeo core tests? Yes. > I observed 6.5% regression in Dromaeo/dom-modify in Chromium. I re-ran dom-modify @ WebKit r130453, using dromaeo.com and PerformanceTests/Dromaeo/dom-modify.html. dromaeo.com: NO PATCH 431.34runs/s ±2.05% PATCH 440.11runs/s ±2.57% NOPATCH avg 3842.9124664019946 runs/s ±77 PATCH avg 3974.452005804083 runs/s ±74 Neither result seems to show a statistically significant change.

Geoffrey Garen

Comment 99 2012-10-05 15:33:35 PDT

Kentaro, would you like me to land the V8 code or leave it out?

Geoffrey Garen

Comment 100 2012-10-05 15:35:14 PDT

Created attachment 167397 [details] patch + Gavin's and Mark's suggested edits - V8 code

Kentaro Hara

Comment 101 2012-10-05 17:34:29 PDT

(In reply to comment #99) > Kentaro, would you like me to land the V8 code or leave it out? I'd like to investigate the perf regression observed in the V8 side change. I'll do it next week. So please land the JSC side patch first.

Geoffrey Garen

Comment 102 2012-10-06 11:27:39 PDT

Committed r130584: <http://trac.webkit.org/changeset/130584>

Darin Adler

Comment 103 2012-10-06 11:34:48 PDT

Comment on attachment 167397 [details] patch + Gavin's and Mark's suggested edits - V8 code View in context: https://bugs.webkit.org/attachment.cgi?id=167397&action=review > Source/WebCore/bindings/js/JSNodeCustom.h:76 > +// the C++ DOM, we ensure that the root of every tree has a JavaScript > +// wrapper. I would have moved that orphan word to the previous line. > Source/WebCore/bindings/js/JSNodeCustom.h:86 > + if (!root->isContainerNode()) > + return; > + > + if (!toContainerNode(root)->hasChildNodes()) > + return; You could combine both of these and just say: if (!root->hasChildNodes()) return; I believe you’d generate essentially the identical code by doing that. Is there some advantage to the longer form?

Geoffrey Garen

Comment 104 2012-10-06 11:48:26 PDT

> I would have moved that orphan word to the previous line. OK. > You could combine both of these and just say: > > if (!root->hasChildNodes()) > return; Aha! I'll do this -- I didn't notice that Node had this interface.

Geoffrey Garen

Comment 105 2012-10-06 11:50:51 PDT

Committed r130587: <http://trac.webkit.org/changeset/130587>

Geoffrey Garen

Comment 106 2012-10-07 15:57:12 PDT

ASSERT fix. Committed r130611: <http://trac.webkit.org/changeset/130611>

Kentaro Hara

Comment 107 2012-10-08 17:52:22 PDT

Hmmm, I observe 6% perf regression in Dromaeo/dom-modify with the V8 side fix. I'll land a couple of patches that optimizes DOM mutations to offset the perf regression, before landing the V8 side fix.

Kentaro Hara

Comment 108 2012-10-08 23:37:55 PDT

Now the correct DOM lifetime semantics was implemented in JSC. The semantics will be also soon implemented in V8, once I can solve a couple of perf problems. On the other hand, the original objective of this bug was to fix V8's GC so that it can collect DOM objects in minor GC cycles. If you're interested in the V8's GC issue, please follow bug 98725. In bug 98725, we are trying to fix V8's GC by another algorithm that affects V8 and V8 binding only, with little change in WebCore/dom.

Kentaro Hara

Comment 109 2012-12-22 03:25:43 PST

Re-opening this bug since it is not yet fixed in V8. Performance regressions have been preventing me from landing the fix to V8.

Ryosuke Niwa

Comment 110 2018-09-08 01:50:03 PDT

This has been fixed in JSC and V8 binding is no longer a thing in WebKit.

Radar WebKit Bug Importer

Comment 111 2018-09-08 01:51:26 PDT

<rdar://problem/44254672>

Attachments
Performance tests (2.05 KB, text/html) 2012-06-11 20:36 PDT, Kentaro Hara	no flags	Details
Patch (15.02 KB, patch) 2012-06-11 20:39 PDT, Kentaro Hara	no flags	Details Formatted Diff Diff
the first WIP patch to observe build bots (66.04 KB, patch) 2012-06-20 23:28 PDT, Kentaro Hara	no flags	Details Formatted Diff Diff
WIP patch2 (73.08 KB, patch) 2012-06-22 07:03 PDT, Kentaro Hara	no flags	Details Formatted Diff Diff
WIP patch3 (65.12 KB, patch) 2012-06-25 17:43 PDT, Kentaro Hara	no flags	Details Formatted Diff Diff
WIP patch4 (66.00 KB, patch) 2012-06-25 23:47 PDT, Kentaro Hara	no flags	Details Formatted Diff Diff
Patch (64.77 KB, patch) 2012-06-27 05:37 PDT, Kentaro Hara	no flags	Details Formatted Diff Diff
patch for 'rough' review (101.44 KB, patch) 2012-07-31 05:09 PDT, Kentaro Hara	no flags	Details Formatted Diff Diff
Simplified the algorithm. Fixed crashes. Fixed memory leaks. But the perf regression of Dromaeo/dom-modify is still unacceptable. I'm working on the optimization. (71.78 KB, patch) 2012-08-27 21:40 PDT, Kentaro Hara	no flags	Details Formatted Diff Diff
Fixed perf regression (76.15 KB, patch) 2012-09-03 19:02 PDT, Kentaro Hara	no flags	Details Formatted Diff Diff
rebased patch (76.10 KB, patch) 2012-09-03 19:17 PDT, Kentaro Hara	no flags	Details Formatted Diff Diff
overall comments are appreciated before serious review (75.81 KB, patch) 2012-09-06 00:48 PDT, Kentaro Hara	no flags	Details Formatted Diff Diff
Pathological test case 1 (1.74 KB, text/html) 2012-09-24 22:46 PDT, Kentaro Hara	no flags	Details
Pathological test case 2 (1.58 KB, text/html) 2012-09-24 22:48 PDT, Kentaro Hara	no flags	Details
Patch (76.31 KB, patch) 2012-09-27 23:19 PDT, Kentaro Hara	no flags	Details Formatted Diff Diff
Patch (23.48 KB, patch) 2012-10-04 21:08 PDT, Geoffrey Garen	webkit.review.bot: commit-queue-	Details Formatted Diff Diff
With V8 side fix (8.51 KB, patch) 2012-10-05 00:20 PDT, Kentaro Hara	no flags	Details Formatted Diff Diff
patch + Gavin's and Mark's suggested edits - V8 code (23.94 KB, patch) 2012-10-05 15:35 PDT, Geoffrey Garen	no flags	Details Formatted Diff Diff
Show Obsolete (12) View All Add attachment proposed patch, testcase, etc.

abarth
ap
barraclough
cmarcelo
darin
dglazkov
dominicc
donggwan.kim
fenerliserdar81
fishd
fmalita
fpizlo
ggaren
gregsimon
gustavo
japhet
koivisto
mburtin
mifenton
mihnea
mitz
mjs
pdr
philn
rafael.lobo
rniwa
sam
schenney
syoichi
webkit-bug-importer
webkit.review.bot
xan.lopez
yurys