RESOLVED FIXED 184307
Custom elements in a reaction queue can lose its JS wrapper and become HTMLUnknownElement
https://bugs.webkit.org/show_bug.cgi?id=184307
Summary Custom elements in a reaction queue can lose its JS wrapper and become HTMLUn...
jairam.ramanathan
Reported 2018-04-04 09:16:15 PDT
Created attachment 337178 [details] Test page to reproduce issue We've seen intermittent failures where custom elements fail to be upgraded (i.e. end up as HTMLUnknownElement rather than the expected class) when they're being created via an innerHTML call. The attached test case should demonstrate the issue: it creates an innerHTML string consistent of 1000 <my-component></my-component> instances and then sets it in the DOM 1000 times. In my runs, I'm seeing the connected callback getting invoked in an invalid state hundreds of times for each time through this loop.
Attachments
Test page to reproduce issue (920 bytes, text/html)
2018-04-04 09:16 PDT, jairam.ramanathan
no flags
testcase (reformatted) (957 bytes, text/html)
2018-07-27 03:05 PDT, Frédéric Wang (:fredw)
no flags
Reproduction for trunk (1.01 KB, text/html)
2018-09-13 23:05 PDT, Ryosuke Niwa
no flags
WIP logging (4.77 KB, patch)
2018-09-13 23:49 PDT, Ryosuke Niwa
no flags
WIP (3.45 KB, patch)
2018-09-14 18:47 PDT, Ryosuke Niwa
no flags
WIP2 (11.63 KB, patch)
2018-09-18 21:29 PDT, Ryosuke Niwa
no flags
WIP3 (11.66 KB, patch)
2018-09-19 13:24 PDT, Ryosuke Niwa
no flags
WIP4 (11.41 KB, patch)
2018-09-19 16:21 PDT, Ryosuke Niwa
no flags
Fixes the bug (20.00 KB, patch)
2018-09-20 00:47 PDT, Ryosuke Niwa
keith_miller: review+
Radar WebKit Bug Importer
Comment 1 2018-04-05 19:48:51 PDT
Frédéric Wang (:fredw)
Comment 2 2018-07-23 09:21:21 PDT
Setting innerHTML ends up calling replaceChildrenWithFragment (from Source/WebCore/editing/markup.cpp) which performs several node operations internally (e.g. appendChild). These operations have CEReactions and so should ensure upgrade reactions are invoked when using the public DOM API: https://dom.spec.whatwg.org/#interface-node ; I wonder if we should do something similar winternally.
Frédéric Wang (:fredw)
Comment 3 2018-07-26 01:27:16 PDT
(In reply to Frédéric Wang (:fredw) from comment #2) > Setting innerHTML ends up calling replaceChildrenWithFragment (from > Source/WebCore/editing/markup.cpp) which performs several node operations > internally (e.g. appendChild). These operations have CEReactions and so > should ensure upgrade reactions are invoked when using the public DOM API: > https://dom.spec.whatwg.org/#interface-node ; I wonder if we should do > something similar winternally. I stand corrected. This is supposed to be already handled by the CEReaction extended attribute on innerHTML.
Frédéric Wang (:fredw)
Comment 4 2018-07-27 03:05:49 PDT
Created attachment 345910 [details] testcase (reformatted) I'm not able to reproduce the bug in debug mode :-( I'm able to reproduce the bug in release mode. However, I also tried adding a RELEASE_ASSERT_NOT_REACHED() in the HTMLUnknownElement constructor but it is never reached.
Ryosuke Niwa
Comment 5 2018-08-02 22:48:26 PDT
Oh, this isn't about the custom elements not getting updated. It's that upgraded custom elements sometimes end up losing its JS wrappers :( When it does, we end up re-creating new wrapper so we end up with HTMLUnknownElement JS wrapper without the override to [[Prototype]].
Ryosuke Niwa
Comment 6 2018-08-03 00:46:47 PDT
Huh, there is a bunch of bugs here but one of the primary issue here is that innerHTML is synchronously constructing custom elements. I don't think we're supposed to do that. Fixing that somehow works around the GC problem but it just moves the bug to elsewhere. I'd have to come up with a better test case which doesn't use innerHTML so that the test case will continue to function even after we've switched innerHTML implementation.
Frédéric Wang (:fredw)
Comment 7 2018-08-03 01:01:56 PDT
(In reply to Ryosuke Niwa from comment #6) > Huh, there is a bunch of bugs here but one of the primary issue here is that > innerHTML is synchronously constructing custom elements. I don't think we're > supposed to do that. I think when setting innerHTML, the HTML fragment parsing algorithm is used so we should create an HTMLElement and enqueue an upgrade for later per https://html.spec.whatwg.org/multipage/parsing.html#create-an-element-for-the-token... This is what the patch on bug 183586 is supposed to do (see also bug 188190 for preliminary work).
Ryosuke Niwa
Comment 8 2018-08-03 01:05:49 PDT
(In reply to Frédéric Wang (:fredw) from comment #7) > (In reply to Ryosuke Niwa from comment #6) > > Huh, there is a bunch of bugs here but one of the primary issue here is that > > innerHTML is synchronously constructing custom elements. I don't think we're > > supposed to do that. > > I think when setting innerHTML, the HTML fragment parsing algorithm is used > so we should create an HTMLElement and enqueue an upgrade for later per > https://html.spec.whatwg.org/multipage/parsing.html#create-an-element-for- > the-token... This is what the patch on bug 183586 is supposed to do (see > also bug 188190 for preliminary work). That patch fixes this problem at the right place. I'll post a patch to fix it properly elsewhere.
Ryosuke Niwa
Comment 9 2018-09-13 23:05:48 PDT
Created attachment 349736 [details] Reproduction for trunk
Ryosuke Niwa
Comment 10 2018-09-13 23:49:38 PDT
Created attachment 349740 [details] WIP logging Interesting. Simply retaining elements in the custom element reaction queue isn't enough. The attached patch sill knows some failures. Note that one has to manually modify JSNodeOwner::finalize in DerivedSource to do the following: extern HashMap<Element*, unsigned>& finalizedElements(); void JSNodeOwner::finalize(JSC::Handle<JSC::Unknown> handle, void* context) { auto* jsNode = static_cast<JSNode*>(handle.slot()->asCell()); auto& world = *static_cast<DOMWrapperWorld*>(context); auto& node = jsNode->wrapped(); if (is<Element>(node) && node.isDefinedCustomElement()) { finalizedElements().set(&downcast<Element>(node), 1); } uncacheWrapper(world, &jsNode->wrapped(), jsNode); } If I add console.log right above where we append the result, I see that sometimes the wrapper goes away without this code getting code: CONSOLE MESSAGE: line 37: PASS! CONSOLE MESSAGE: line 37: PASS! Found a wrapper-less node! finalized=0 CONSOLE MESSAGE: line 35: FAIL:1999 CONSOLE MESSAGE: line 37: PASS! Found a wrapper-less node! finalized=0 CONSOLE MESSAGE: line 35: FAIL:1999 CONSOLE MESSAGE: line 37: PASS! CONSOLE MESSAGE: line 37: PASS! Found a wrapper-less node! finalized=0 CONSOLE MESSAGE: line 35: FAIL:1999 CONSOLE MESSAGE: line 37: PASS! Found a wrapper-less node! finalized=1 CONSOLE MESSAGE: line 35: FAIL:1999 CONSOLE MESSAGE: line 37: PASS! CONSOLE MESSAGE: line 37: PASS! CONSOLE MESSAGE: line 37: PASS! CONSOLE MESSAGE: line 37: PASS! CONSOLE MESSAGE: line 37: PASS! Found a wrapper-less node! finalized=1 CONSOLE MESSAGE: line 35: FAIL:1999 CONSOLE MESSAGE: line 37: PASS! CONSOLE MESSAGE: line 37: PASS! CONSOLE MESSAGE: line 37: PASS! CONSOLE MESSAGE: line 37: PASS! It's a tad strange that JSElement wrappers are going away without JSNodeOwner::finalize getting called.
Ryosuke Niwa
Comment 11 2018-09-14 00:14:39 PDT
This is VERY strange. Even if I leaked every custom element's JS wrapper by adding it as a property to the global object using a private name, inline wrappers are not there. Maybe somehow we're failing to create a JS element wrapper in some cases??? Index: Source/WebCore/bindings/js/JSDOMWindowCustom.cpp =================================================================== --- Source/WebCore/bindings/js/JSDOMWindowCustom.cpp (revision 236001) +++ Source/WebCore/bindings/js/JSDOMWindowCustom.cpp (working copy) @@ -70,11 +70,13 @@ { if (Frame* frame = wrapped().frame()) visitor.addOpaqueRoot(frame); - + // Normally JSEventTargetCustom.cpp's JSEventTarget::visitAdditionalChildren() would call this. But // even though DOMWindow is an EventTarget, JSDOMWindow does not subclass JSEventTarget, so we need // to do this here. wrapped().visitJSEventListeners(visitor); + + CustomElementReactionQueue::visitEnqueuedElements(visitor); } #if ENABLE(USER_MESSAGE_HANDLERS)
Ryosuke Niwa
Comment 12 2018-09-14 01:17:39 PDT
Oh, I was missing upgrade case. Okay, so leaking all JS element wrappers would *fix* this bug. It's still strange that JSNodeOwner::finalize isn't running for some wrappers being collected.
Ryosuke Niwa
Comment 13 2018-09-14 18:47:58 PDT
Created attachment 349844 [details] WIP Ah, the issue was that CustomElementReactionQueue::invokeAll was swapping Vector. Without it, it works for this reproduction.
Ryosuke Niwa
Comment 14 2018-09-18 21:29:59 PDT
EWS Watchlist
Comment 15 2018-09-18 21:32:09 PDT
Attachment 350093 [details] did not pass style-queue: ERROR: Source/WebCore/dom/CustomElementReactionQueue.h:31: Bad include order. Mixing system and custom headers. [build/include_order] [4] Total errors found: 1 in 6 files If any of these errors are false positives, please file a bug against check-webkit-style.
Ryosuke Niwa
Comment 16 2018-09-19 13:24:12 PDT
EWS Watchlist
Comment 17 2018-09-19 13:43:12 PDT
Attachment 350146 [details] did not pass style-queue: ERROR: Source/WebCore/dom/StrongNodeRef.cpp:27: You should add a blank line after implementation file's own header. [build/include_order] [4] Total errors found: 1 in 6 files If any of these errors are false positives, please file a bug against check-webkit-style.
Ryosuke Niwa
Comment 18 2018-09-19 16:21:22 PDT
EWS Watchlist
Comment 19 2018-09-19 16:22:46 PDT
Attachment 350157 [details] did not pass style-queue: ERROR: Source/WebCore/dom/StrongNodeRef.cpp:27: You should add a blank line after implementation file's own header. [build/include_order] [4] Total errors found: 1 in 7 files If any of these errors are false positives, please file a bug against check-webkit-style.
Ryosuke Niwa
Comment 20 2018-09-20 00:47:34 PDT
Created attachment 350178 [details] Fixes the bug
Ryosuke Niwa
Comment 21 2018-09-21 12:39:09 PDT
Ping reviewers
Keith Miller
Comment 22 2018-09-21 12:53:31 PDT
Comment on attachment 350178 [details] Fixes the bug View in context: https://bugs.webkit.org/attachment.cgi?id=350178&action=review r=me if you answer my question? > Source/WebCore/dom/StrongNodeRef.cpp:35 > + static NeverDestroyed<HashCountedSet<Node*>> map; Are there any Nodes that can exist in a worker? If they can't, can you add an assertion that we are on the MainThread?
Ryosuke Niwa
Comment 23 2018-09-21 14:12:41 PDT
(In reply to Keith Miller from comment #22) > Comment on attachment 350178 [details] > Fixes the bug > > View in context: > https://bugs.webkit.org/attachment.cgi?id=350178&action=review > > r=me if you answer my question? > > > Source/WebCore/dom/StrongNodeRef.cpp:35 > > + static NeverDestroyed<HashCountedSet<Node*>> map; > > Are there any Nodes that can exist in a worker? If they can't, can you add > an assertion that we are on the MainThread? Node can only be mutated in DOM but GC threads accesses them via isReachableFromOpaqueRoot while the main thread is suspended. As such, we can't add a isMainThread check although we can check if it's main thread or GC threads.
Geoffrey Garen
Comment 24 2018-09-21 14:31:59 PDT
Comment on attachment 350178 [details] Fixes the bug View in context: https://bugs.webkit.org/attachment.cgi?id=350178&action=review > Source/WebCore/dom/StrongNodeRef.h:36 > +class StrongNodeRefMap { I wonder if we can come up with a better name. You can probably remove "Node" from the name because the template declaration prohibits non-Node types already, and the declaration site will always name the Node type. Some options: GCReachableRef GCProtectedRef StrongRef
Simon Fraser (smfr)
Comment 25 2018-09-21 16:31:57 PDT
Can you run some tests with --world-leaks with and without the patch, and check that you didn't introduce any new document leaks?
Ryosuke Niwa
Comment 26 2018-09-21 16:46:10 PDT
(In reply to Simon Fraser (smfr) from comment #25) > Can you run some tests with --world-leaks with and without the patch, and > check that you didn't introduce any new document leaks? ./Tools/Scripts/run-webkit-tests --world-leaks --debug LayoutTests/fast/custom-elements/ LayoutTests/imported/w3c/web-platform-tests/custom-elements/ shows no leaks with the patch.
Ryosuke Niwa
Comment 27 2018-09-21 17:17:19 PDT
Darin Adler
Comment 28 2018-09-21 21:06:19 PDT
I don’t understand this. There is lots of code that can keep an element alive; in all such cases the element needs to keep its wrapper from being garbage collected since if anyone can recover a pointer to the element, it can also get to the wrapper. I don’t understand how this case is different.
Ryosuke Niwa
Comment 29 2018-09-21 22:35:20 PDT
(In reply to Darin Adler from comment #28) > I don’t understand this. There is lots of code that can keep an element > alive; in all such cases the element needs to keep its wrapper from being > garbage collected since if anyone can recover a pointer to the element, it > can also get to the wrapper. I don’t understand how this case is different. Yes, there is a lot of wrong code in WebCore right now. Basically anywhere where keep a Ref/RefPtr to a Node in the order to use it later has the same bug. e.g. https://bugs.webkit.org/show_bug.cgi?id=167652 is an issue about slotchange event. I know off the top of my head mutation observers & EventDispatcher::dispatchEvent also have the same issue. There are probably hundreds of other places where a similar bug resides. As you're well aware, ordinarily, a node's wrapper is kept alive by the virtue of it being in a document. Because document is an active DOM object, it's marked by GC as long as it's alive, and isReachableFromOpaqueRoot would return true on any node which is connected to such a document. Similarly, any other C++ code which holds onto Ref as a part of its tree structure / DOM API relationship such as node lists, etc... add those related objects in visitAdditionalChildren. However, in cases where WebCore / C++ code temporarily holds onto an element or some other DOM object exposed to scripts via Ref/RefPtr, there is no clear ownership. e.g. when a custom element is enqueued onto a reaction queue and subsequently removed from a document, it's unclear as to which object will be responsible for visiting this node. This patch's goal is to introduce a generic mechanism which can be used to retain JS wrappers of these scenarios. I intend to fix more places in the coming days / weeks / months / years.
Darin Adler
Comment 30 2018-09-22 20:55:11 PDT
(In reply to Ryosuke Niwa from comment #29) > This patch's goal is to introduce a generic mechanism which can be used to > retain JS wrappers of these scenarios. I intend to fix more places in the > coming days / weeks / months / years. I am not sure this is the right way to fix this. I think we should pursue a fix that doesn’t require us to put this in tens or hundreds of places. I seem to recall a different design for this that I discussed with Geoff many years back.
Ryosuke Niwa
Comment 31 2018-09-24 16:08:27 PDT
(In reply to Darin Adler from comment #30) > (In reply to Ryosuke Niwa from comment #29) > > This patch's goal is to introduce a generic mechanism which can be used to > > retain JS wrappers of these scenarios. I intend to fix more places in the > > coming days / weeks / months / years. > > I am not sure this is the right way to fix this. I think we should pursue a > fix that doesn’t require us to put this in tens or hundreds of places. I > seem to recall a different design for this that I discussed with Geoff many > years back. Hm... this is the design that came out of the discussions I've had with Geoff & JSC folks. Do you recall what the alternative design you discussed with Geoff was? If there is a better design, we should clearly adopt that instead.
Geoffrey Garen
Comment 32 2018-09-26 09:14:58 PDT
Comment on attachment 350178 [details] Fixes the bug Another option is to reuse EventTarget::isFiringEventListeners(). Might be faster to set / check a bit vs hash table. Clarifies that this behavior is all about async event dispatch. (A strong GC reference is a memory leak unless you can prove that some future event will clear the reference; stating that this reference is for the purpose of firing an async event listener might help communicate that.)
Darin Adler
Comment 33 2018-09-26 09:22:27 PDT
(In reply to Geoffrey Garen from comment #32) > Another option is to reuse EventTarget::isFiringEventListeners(). Might be > faster to set / check a bit vs hash table. Clarifies that this behavior is > all about async event dispatch. (A strong GC reference is a memory leak > unless you can prove that some future event will clear the reference; > stating that this reference is for the purpose of firing an async event > listener might help communicate that.) This shows that I don’t yet understand the issues clearly enough. I almost wish we had a white paper about it; hard to get right. When reading Ryosuke’s earlier comments, I thought the issue was literally "any object in the DOM that holds a strong reference to another object in the DOM". But you seem to be saying here that it has a somewhat narrower scope and events the code plans to dispatch in the future is a key factor.
Geoffrey Garen
Comment 34 2018-09-26 09:37:37 PDT
(In reply to Darin Adler from comment #30) > I am not sure this is the right way to fix this. I think we should pursue a > fix that doesn’t require us to put this in tens or hundreds of places. I > seem to recall a different design for this that I discussed with Geoff many > years back. Ryosuke, can you make a list of where we might need to use this technique? Let's see how many places it is. I'm not sure if there's another way to solve this problem. What's hard about this case is that a JavaScript event will fire on a node in the future, and the only record of that fact is a C++ RefPtr, and no other references in JS or C++ point to that node. That makes this case similar to an ActiveDOMObject for audio or video or networking, where the only thing that tells us that the wrapper should stay alive is the "hasPendingActivity()" state. We can't say "If a RefPtr points to a node, the node's wrapper stays alive", since that's a direct circular reference. There might be a way to give a node two reference counts, one for its wrappers and one for its RefPtrs, and say "If a node's RefPtr refcount is non-zero then its wrapper stays alive, and if either refcount is non-zero then the node stays alive". That might still trigger an indirect circular reference. But maybe it would Just Work (TM).
Ryosuke Niwa
Comment 35 2018-09-26 12:57:50 PDT
(In reply to Darin Adler from comment #33) > (In reply to Geoffrey Garen from comment #32) > > Another option is to reuse EventTarget::isFiringEventListeners(). Might be > > faster to set / check a bit vs hash table. Clarifies that this behavior is > > all about async event dispatch. (A strong GC reference is a memory leak > > unless you can prove that some future event will clear the reference; > > stating that this reference is for the purpose of firing an async event > > listener might help communicate that.) > > This shows that I don’t yet understand the issues clearly enough. I almost > wish we had a white paper about it; hard to get right. When reading > Ryosuke’s earlier comments, I thought the issue was literally "any object in > the DOM that holds a strong reference to another object in the DOM". But you > seem to be saying here that it has a somewhat narrower scope and events the > code plans to dispatch in the future is a key factor. The synapsis of the problem is that when C++ code retains a Node for a later use (e.g. async event, mutation observer, etc...), there is no way for GC to know that the wrapper of such a node needs to be retained. For example, let's say we have an element which is removed from a tree with a mutation observer which observes the tree. WebKit would add a mutation record of this element, and invokes the observer at the end of the current microtask. If we were to run GC between the two, there is no way for GC to realize that this element, which is no longer in the document and not referenced by any JS needs to be kept alive. GCReachableRef is meant to address these temporary C++-only references to nodes. It's not meant to retain objects in live data structures like Node tree, CSS OM, etc... which need to express relationship between objects with JS wrappers and JS objects. An alternative name I've considered was something like AsyncEventRef and DeferredNodeRef. (In reply to Geoffrey Garen from comment #34) > (In reply to Darin Adler from comment #30) > > I am not sure this is the right way to fix this. I think we should pursue a > > fix that doesn’t require us to put this in tens or hundreds of places. I > > seem to recall a different design for this that I discussed with Geoff many > > years back. > > Ryosuke, can you make a list of where we might need to use this technique? > Let's see how many places it is. Of the top of my head, I can think of: - Custom elements reaction queues (Fixed) - slotchange event (Fixed in webkit.org/b/167652) - MutationObserver (still broken) - Event path (still broken) - This is the list of event targets we collect in EventDispatcher. Each target can be removed from its parent and lose its wrapper while we're invoking event listeners on other targets. - GenericEventQueue (still broken)
Note You need to log in before you can comment on or make changes to this bug.