Bug 184307 - Custom elements in a reaction queue can lose its JS wrapper and become HTMLUnknownElement
Summary: Custom elements in a reaction queue can lose its JS wrapper and become HTMLUn...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: DOM (show other bugs)
Version: Safari 11
Hardware: Macintosh macOS 10.13
: P2 Normal
Assignee: Ryosuke Niwa
URL:
Keywords: InRadar
Depends on:
Blocks: 154907
  Show dependency treegraph
 
Reported: 2018-04-04 09:16 PDT by jairam.ramanathan
Modified: 2018-09-26 12:57 PDT (History)
18 users (show)

See Also:


Attachments
Test page to reproduce issue (920 bytes, text/html)
2018-04-04 09:16 PDT, jairam.ramanathan
no flags Details
testcase (reformatted) (957 bytes, text/html)
2018-07-27 03:05 PDT, Frédéric Wang (:fredw)
no flags Details
Reproduction for trunk (1.01 KB, text/html)
2018-09-13 23:05 PDT, Ryosuke Niwa
no flags Details
WIP logging (4.77 KB, patch)
2018-09-13 23:49 PDT, Ryosuke Niwa
no flags Details | Formatted Diff | Diff
WIP (3.45 KB, patch)
2018-09-14 18:47 PDT, Ryosuke Niwa
no flags Details | Formatted Diff | Diff
WIP2 (11.63 KB, patch)
2018-09-18 21:29 PDT, Ryosuke Niwa
no flags Details | Formatted Diff | Diff
WIP3 (11.66 KB, patch)
2018-09-19 13:24 PDT, Ryosuke Niwa
no flags Details | Formatted Diff | Diff
WIP4 (11.41 KB, patch)
2018-09-19 16:21 PDT, Ryosuke Niwa
no flags Details | Formatted Diff | Diff
Fixes the bug (20.00 KB, patch)
2018-09-20 00:47 PDT, Ryosuke Niwa
keith_miller: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description jairam.ramanathan 2018-04-04 09:16:15 PDT
Created attachment 337178 [details]
Test page to reproduce issue

We've seen intermittent failures where custom elements fail to be upgraded (i.e. end up as HTMLUnknownElement rather than the expected class) when they're being created via an innerHTML call.  The attached test case should demonstrate the issue: it creates an innerHTML string consistent of 1000 <my-component></my-component> instances and then sets it in the DOM 1000 times.  In my runs, I'm seeing the connected callback getting invoked in an invalid state hundreds of times for each time through this loop.
Comment 1 Radar WebKit Bug Importer 2018-04-05 19:48:51 PDT
<rdar://problem/39228664>
Comment 2 Frédéric Wang (:fredw) 2018-07-23 09:21:21 PDT
Setting innerHTML ends up calling replaceChildrenWithFragment (from Source/WebCore/editing/markup.cpp) which performs several node operations internally (e.g. appendChild). These operations have CEReactions and so should ensure upgrade reactions are invoked when using the public DOM API: https://dom.spec.whatwg.org/#interface-node ; I wonder if we should do something similar winternally.
Comment 3 Frédéric Wang (:fredw) 2018-07-26 01:27:16 PDT
(In reply to Frédéric Wang (:fredw) from comment #2)
> Setting innerHTML ends up calling replaceChildrenWithFragment (from
> Source/WebCore/editing/markup.cpp) which performs several node operations
> internally (e.g. appendChild). These operations have CEReactions and so
> should ensure upgrade reactions are invoked when using the public DOM API:
> https://dom.spec.whatwg.org/#interface-node ; I wonder if we should do
> something similar winternally.

I stand corrected. This is supposed to be already handled by the CEReaction extended attribute on innerHTML.
Comment 4 Frédéric Wang (:fredw) 2018-07-27 03:05:49 PDT
Created attachment 345910 [details]
testcase (reformatted)

I'm not able to reproduce the bug in debug mode :-(

I'm able to reproduce the bug in release mode. However, I also tried adding a RELEASE_ASSERT_NOT_REACHED() in the HTMLUnknownElement constructor but it is never reached.
Comment 5 Ryosuke Niwa 2018-08-02 22:48:26 PDT
Oh, this isn't about the custom elements not getting updated. It's that upgraded custom elements sometimes end up losing its JS wrappers :( When it does, we end up re-creating new wrapper so we end up with HTMLUnknownElement JS wrapper without the override to [[Prototype]].
Comment 6 Ryosuke Niwa 2018-08-03 00:46:47 PDT
Huh, there is a bunch of bugs here but one of the primary issue here is that innerHTML is synchronously constructing custom elements. I don't think we're supposed to do that.

Fixing that somehow works around the GC problem but it just moves the bug to elsewhere. I'd have to come up with a better test case which doesn't use innerHTML so that the test case will continue to function even after we've switched innerHTML implementation.
Comment 7 Frédéric Wang (:fredw) 2018-08-03 01:01:56 PDT
(In reply to Ryosuke Niwa from comment #6)
> Huh, there is a bunch of bugs here but one of the primary issue here is that
> innerHTML is synchronously constructing custom elements. I don't think we're
> supposed to do that.

I think when setting innerHTML, the HTML fragment parsing algorithm is used so we should create an HTMLElement and enqueue an upgrade for later per https://html.spec.whatwg.org/multipage/parsing.html#create-an-element-for-the-token... This is what the patch on bug 183586 is supposed to do (see also bug 188190 for preliminary work).
Comment 8 Ryosuke Niwa 2018-08-03 01:05:49 PDT
(In reply to Frédéric Wang (:fredw) from comment #7)
> (In reply to Ryosuke Niwa from comment #6)
> > Huh, there is a bunch of bugs here but one of the primary issue here is that
> > innerHTML is synchronously constructing custom elements. I don't think we're
> > supposed to do that.
> 
> I think when setting innerHTML, the HTML fragment parsing algorithm is used
> so we should create an HTMLElement and enqueue an upgrade for later per
> https://html.spec.whatwg.org/multipage/parsing.html#create-an-element-for-
> the-token... This is what the patch on bug 183586 is supposed to do (see
> also bug 188190 for preliminary work).

That patch fixes this problem at the right place. I'll post a patch to fix it properly elsewhere.
Comment 9 Ryosuke Niwa 2018-09-13 23:05:48 PDT
Created attachment 349736 [details]
Reproduction for trunk
Comment 10 Ryosuke Niwa 2018-09-13 23:49:38 PDT
Created attachment 349740 [details]
WIP logging

Interesting. Simply retaining elements in the custom element reaction queue isn't enough.
The attached patch sill knows some failures.

Note that one has to manually modify JSNodeOwner::finalize in DerivedSource to do the following:

extern HashMap<Element*, unsigned>& finalizedElements();

void JSNodeOwner::finalize(JSC::Handle<JSC::Unknown> handle, void* context)
{
    auto* jsNode = static_cast<JSNode*>(handle.slot()->asCell());
    auto& world = *static_cast<DOMWrapperWorld*>(context);
    auto& node = jsNode->wrapped();
    if (is<Element>(node) && node.isDefinedCustomElement()) {
        finalizedElements().set(&downcast<Element>(node), 1);
    }
    uncacheWrapper(world, &jsNode->wrapped(), jsNode);
}

If I add console.log right above where we append the result,
I see that sometimes the wrapper goes away without this code getting code:

CONSOLE MESSAGE: line 37: PASS!
CONSOLE MESSAGE: line 37: PASS!
Found a wrapper-less node! finalized=0
CONSOLE MESSAGE: line 35: FAIL:1999
CONSOLE MESSAGE: line 37: PASS!
Found a wrapper-less node! finalized=0
CONSOLE MESSAGE: line 35: FAIL:1999
CONSOLE MESSAGE: line 37: PASS!
CONSOLE MESSAGE: line 37: PASS!
Found a wrapper-less node! finalized=0
CONSOLE MESSAGE: line 35: FAIL:1999
CONSOLE MESSAGE: line 37: PASS!
Found a wrapper-less node! finalized=1
CONSOLE MESSAGE: line 35: FAIL:1999
CONSOLE MESSAGE: line 37: PASS!
CONSOLE MESSAGE: line 37: PASS!
CONSOLE MESSAGE: line 37: PASS!
CONSOLE MESSAGE: line 37: PASS!
CONSOLE MESSAGE: line 37: PASS!
Found a wrapper-less node! finalized=1
CONSOLE MESSAGE: line 35: FAIL:1999
CONSOLE MESSAGE: line 37: PASS!
CONSOLE MESSAGE: line 37: PASS!
CONSOLE MESSAGE: line 37: PASS!
CONSOLE MESSAGE: line 37: PASS!

It's a tad strange that JSElement wrappers are going away without JSNodeOwner::finalize getting called.
Comment 11 Ryosuke Niwa 2018-09-14 00:14:39 PDT
This is VERY strange. Even if I leaked every custom element's JS wrapper by adding it as a property to the global object using a private name, inline wrappers are not there. Maybe somehow we're failing to create a JS element wrapper in some cases???

Index: Source/WebCore/bindings/js/JSDOMWindowCustom.cpp
===================================================================
--- Source/WebCore/bindings/js/JSDOMWindowCustom.cpp	(revision 236001)
+++ Source/WebCore/bindings/js/JSDOMWindowCustom.cpp	(working copy)
@@ -70,11 +70,13 @@
 {
     if (Frame* frame = wrapped().frame())
         visitor.addOpaqueRoot(frame);
-    
+
     // Normally JSEventTargetCustom.cpp's JSEventTarget::visitAdditionalChildren() would call this. But
     // even though DOMWindow is an EventTarget, JSDOMWindow does not subclass JSEventTarget, so we need
     // to do this here.
     wrapped().visitJSEventListeners(visitor);
+
+    CustomElementReactionQueue::visitEnqueuedElements(visitor);
 }
 
 #if ENABLE(USER_MESSAGE_HANDLERS)
Comment 12 Ryosuke Niwa 2018-09-14 01:17:39 PDT
Oh, I was missing upgrade case. Okay, so leaking all JS element wrappers would *fix* this bug. It's still strange that JSNodeOwner::finalize isn't running for some wrappers being collected.
Comment 13 Ryosuke Niwa 2018-09-14 18:47:58 PDT
Created attachment 349844 [details]
WIP

Ah, the issue was that CustomElementReactionQueue::invokeAll was swapping Vector.
Without it, it works for this reproduction.
Comment 14 Ryosuke Niwa 2018-09-18 21:29:59 PDT
Created attachment 350093 [details]
WIP2
Comment 15 Build Bot 2018-09-18 21:32:09 PDT
Attachment 350093 [details] did not pass style-queue:


ERROR: Source/WebCore/dom/CustomElementReactionQueue.h:31:  Bad include order. Mixing system and custom headers.  [build/include_order] [4]
Total errors found: 1 in 6 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 16 Ryosuke Niwa 2018-09-19 13:24:12 PDT
Created attachment 350146 [details]
WIP3
Comment 17 Build Bot 2018-09-19 13:43:12 PDT
Attachment 350146 [details] did not pass style-queue:


ERROR: Source/WebCore/dom/StrongNodeRef.cpp:27:  You should add a blank line after implementation file's own header.  [build/include_order] [4]
Total errors found: 1 in 6 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 18 Ryosuke Niwa 2018-09-19 16:21:22 PDT
Created attachment 350157 [details]
WIP4
Comment 19 Build Bot 2018-09-19 16:22:46 PDT
Attachment 350157 [details] did not pass style-queue:


ERROR: Source/WebCore/dom/StrongNodeRef.cpp:27:  You should add a blank line after implementation file's own header.  [build/include_order] [4]
Total errors found: 1 in 7 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 20 Ryosuke Niwa 2018-09-20 00:47:34 PDT
Created attachment 350178 [details]
Fixes the bug
Comment 21 Ryosuke Niwa 2018-09-21 12:39:09 PDT
Ping reviewers
Comment 22 Keith Miller 2018-09-21 12:53:31 PDT
Comment on attachment 350178 [details]
Fixes the bug

View in context: https://bugs.webkit.org/attachment.cgi?id=350178&action=review

r=me if you answer my question?

> Source/WebCore/dom/StrongNodeRef.cpp:35
> +    static NeverDestroyed<HashCountedSet<Node*>> map;

Are there any Nodes that can exist in a worker? If they can't, can you add an assertion that we are on the MainThread?
Comment 23 Ryosuke Niwa 2018-09-21 14:12:41 PDT
(In reply to Keith Miller from comment #22)
> Comment on attachment 350178 [details]
> Fixes the bug
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=350178&action=review
> 
> r=me if you answer my question?
> 
> > Source/WebCore/dom/StrongNodeRef.cpp:35
> > +    static NeverDestroyed<HashCountedSet<Node*>> map;
> 
> Are there any Nodes that can exist in a worker? If they can't, can you add
> an assertion that we are on the MainThread?

Node can only be mutated in DOM but GC threads accesses them via isReachableFromOpaqueRoot while the main thread is suspended. As such, we can't add a isMainThread check although we can check if it's main thread or GC threads.
Comment 24 Geoffrey Garen 2018-09-21 14:31:59 PDT
Comment on attachment 350178 [details]
Fixes the bug

View in context: https://bugs.webkit.org/attachment.cgi?id=350178&action=review

> Source/WebCore/dom/StrongNodeRef.h:36
> +class StrongNodeRefMap {

I wonder if we can come up with a better name.

You can probably remove "Node" from the name because the template declaration prohibits non-Node types already, and the declaration site will always name the Node type.

Some options:

GCReachableRef
GCProtectedRef
StrongRef
Comment 25 Simon Fraser (smfr) 2018-09-21 16:31:57 PDT
Can you run some tests with --world-leaks with and without the patch, and check that you didn't introduce any new document leaks?
Comment 26 Ryosuke Niwa 2018-09-21 16:46:10 PDT
(In reply to Simon Fraser (smfr) from comment #25)
> Can you run some tests with --world-leaks with and without the patch, and
> check that you didn't introduce any new document leaks?

./Tools/Scripts/run-webkit-tests --world-leaks --debug LayoutTests/fast/custom-elements/ LayoutTests/imported/w3c/web-platform-tests/custom-elements/

shows no leaks with the patch.
Comment 27 Ryosuke Niwa 2018-09-21 17:17:19 PDT
Committed r236376: <https://trac.webkit.org/changeset/236376>
Comment 28 Darin Adler 2018-09-21 21:06:19 PDT
I don’t understand this. There is lots of code that can keep an element alive; in all such cases the element needs to keep its wrapper from being garbage collected since if anyone can recover a pointer to the element, it can also get to the wrapper. I don’t understand how this case is different.
Comment 29 Ryosuke Niwa 2018-09-21 22:35:20 PDT
(In reply to Darin Adler from comment #28)
> I don’t understand this. There is lots of code that can keep an element
> alive; in all such cases the element needs to keep its wrapper from being
> garbage collected since if anyone can recover a pointer to the element, it
> can also get to the wrapper. I don’t understand how this case is different.

Yes, there is a lot of wrong code in WebCore right now. Basically anywhere where keep a Ref/RefPtr to a Node in the order to use it later has the same bug. e.g. https://bugs.webkit.org/show_bug.cgi?id=167652 is an issue about slotchange event.

I know off the top of my head mutation observers & EventDispatcher::dispatchEvent also have the same issue. There are probably hundreds of other places where a similar bug resides.

As you're well aware, ordinarily, a node's wrapper is kept alive by the virtue of it being in a document. Because document is an active DOM object, it's marked by GC as long as it's alive, and isReachableFromOpaqueRoot would return true on any node which is connected to such a document. Similarly, any other C++ code which holds onto Ref as a part of its tree structure / DOM API relationship such as node lists, etc... add those related objects in visitAdditionalChildren.

However, in cases where WebCore / C++ code temporarily holds onto an element or some other DOM object exposed to scripts via Ref/RefPtr, there is no clear ownership. e.g. when a custom element is enqueued onto a reaction queue and subsequently removed from a document, it's unclear as to which object will be responsible for visiting this node.

This patch's goal is to introduce a generic mechanism which can be used to retain JS wrappers of these scenarios. I intend to fix more places in the coming days / weeks / months / years.
Comment 30 Darin Adler 2018-09-22 20:55:11 PDT
(In reply to Ryosuke Niwa from comment #29)
> This patch's goal is to introduce a generic mechanism which can be used to
> retain JS wrappers of these scenarios. I intend to fix more places in the
> coming days / weeks / months / years.

I am not sure this is the right way to fix this. I think we should pursue a fix that doesn’t require us to put this in tens or hundreds of places. I seem to recall a different design for this that I discussed with Geoff many years back.
Comment 31 Ryosuke Niwa 2018-09-24 16:08:27 PDT
(In reply to Darin Adler from comment #30)
> (In reply to Ryosuke Niwa from comment #29)
> > This patch's goal is to introduce a generic mechanism which can be used to
> > retain JS wrappers of these scenarios. I intend to fix more places in the
> > coming days / weeks / months / years.
> 
> I am not sure this is the right way to fix this. I think we should pursue a
> fix that doesn’t require us to put this in tens or hundreds of places. I
> seem to recall a different design for this that I discussed with Geoff many
> years back.

Hm... this is the design that came out of the discussions I've had with Geoff & JSC folks. Do you recall what the alternative design you discussed with Geoff was? If there is a better design, we should clearly adopt that instead.
Comment 32 Geoffrey Garen 2018-09-26 09:14:58 PDT
Comment on attachment 350178 [details]
Fixes the bug

Another option is to reuse EventTarget::isFiringEventListeners(). Might be faster to set / check a bit vs hash table. Clarifies that this behavior is all about async event dispatch. (A strong GC reference is a memory leak unless you can prove that some future event will clear the reference; stating that this reference is for the purpose of firing an async event listener might help communicate that.)
Comment 33 Darin Adler 2018-09-26 09:22:27 PDT
(In reply to Geoffrey Garen from comment #32)
> Another option is to reuse EventTarget::isFiringEventListeners(). Might be
> faster to set / check a bit vs hash table. Clarifies that this behavior is
> all about async event dispatch. (A strong GC reference is a memory leak
> unless you can prove that some future event will clear the reference;
> stating that this reference is for the purpose of firing an async event
> listener might help communicate that.)

This shows that I don’t yet understand the issues clearly enough. I almost wish we had a white paper about it; hard to get right. When reading Ryosuke’s earlier comments, I thought the issue was literally "any object in the DOM that holds a strong reference to another object in the DOM". But you seem to be saying here that it has a somewhat narrower scope and events the code plans to dispatch in the future is a key factor.
Comment 34 Geoffrey Garen 2018-09-26 09:37:37 PDT
(In reply to Darin Adler from comment #30)
> I am not sure this is the right way to fix this. I think we should pursue a
> fix that doesn’t require us to put this in tens or hundreds of places. I
> seem to recall a different design for this that I discussed with Geoff many
> years back.

Ryosuke, can you make a list of where we might need to use this technique? Let's see how many places it is.

I'm not sure if there's another way to solve this problem.

What's hard about this case is that a JavaScript event will fire on a node in the future, and the only record of that fact is a C++ RefPtr, and no other references in JS or C++ point to that node. That makes this case similar to an ActiveDOMObject for audio or video or networking, where the only thing that tells us that the wrapper should stay alive is the "hasPendingActivity()" state.

We can't say "If a RefPtr points to a node, the node's wrapper stays alive", since that's a direct circular reference.

There might be a way to give a node two reference counts, one for its wrappers and one for its RefPtrs, and say "If a node's RefPtr refcount is non-zero then its wrapper stays alive, and if either refcount is non-zero then the node stays alive". That might still trigger an indirect circular reference. But maybe it would Just Work (TM).
Comment 35 Ryosuke Niwa 2018-09-26 12:57:50 PDT
(In reply to Darin Adler from comment #33)
> (In reply to Geoffrey Garen from comment #32)
> > Another option is to reuse EventTarget::isFiringEventListeners(). Might be
> > faster to set / check a bit vs hash table. Clarifies that this behavior is
> > all about async event dispatch. (A strong GC reference is a memory leak
> > unless you can prove that some future event will clear the reference;
> > stating that this reference is for the purpose of firing an async event
> > listener might help communicate that.)
> 
> This shows that I don’t yet understand the issues clearly enough. I almost
> wish we had a white paper about it; hard to get right. When reading
> Ryosuke’s earlier comments, I thought the issue was literally "any object in
> the DOM that holds a strong reference to another object in the DOM". But you
> seem to be saying here that it has a somewhat narrower scope and events the
> code plans to dispatch in the future is a key factor.

The synapsis of the problem is that when C++ code retains a Node for a later use (e.g. async event, mutation observer, etc...), there is no way for GC to know that the wrapper of such a node needs to be retained.

For example, let's say we have an element which is removed from a tree with a mutation observer which observes the tree. WebKit would add a mutation record of this element, and invokes the observer at the end of the current microtask. If we were to run GC between the two, there is no way for GC to realize that this element, which is no longer in the document and not referenced by any JS needs to be kept alive.

GCReachableRef is meant to address these temporary C++-only references to nodes. It's not meant to retain objects in live data structures like Node tree, CSS OM, etc... which need to express relationship between objects with JS wrappers and JS objects. An alternative name I've considered was something like AsyncEventRef and DeferredNodeRef.

(In reply to Geoffrey Garen from comment #34)
> (In reply to Darin Adler from comment #30)
> > I am not sure this is the right way to fix this. I think we should pursue a
> > fix that doesn’t require us to put this in tens or hundreds of places. I
> > seem to recall a different design for this that I discussed with Geoff many
> > years back.
> 
> Ryosuke, can you make a list of where we might need to use this technique?
> Let's see how many places it is.

Of the top of my head, I can think of:
- Custom elements reaction queues (Fixed)
- slotchange event (Fixed in webkit.org/b/167652)
- MutationObserver (still broken)
- Event path (still broken) - This is the list of event targets we collect in EventDispatcher. Each target can be removed from its parent and lose its wrapper while we're invoking event listeners on other targets.
- GenericEventQueue (still broken)