Bug 208642 - REGRESSION: (r257905) [ Mac wk2 Debug ] ASSERTION FAILED: !m_isolatedWorld->isNormal() || m_wrapper || !m_jsFunction
Summary: REGRESSION: (r257905) [ Mac wk2 Debug ] ASSERTION FAILED: !m_isolatedWorld->i...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Media (show other bugs)
Version: WebKit Nightly Build
Hardware: Mac macOS 10.14
: P2 Normal
Assignee: Yusuke Suzuki
URL:
Keywords: InRadar
Depends on: 165713
Blocks:
  Show dependency treegraph
 
Reported: 2020-03-05 08:11 PST by Jason Lawrence
Modified: 2021-02-10 05:09 PST (History)
12 users (show)

See Also:


Attachments
tracks-support-no-tracks-crash-log (117.23 KB, text/plain)
2020-03-05 08:13 PST, Jason Lawrence
no flags Details
Patch (4.45 KB, patch)
2020-03-06 03:36 PST, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
Patch (10.05 KB, patch)
2020-03-07 13:44 PST, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
Patch (10.05 KB, patch)
2020-03-07 13:50 PST, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
Patch (9.98 KB, patch)
2020-03-07 13:54 PST, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
Patch (9.89 KB, patch)
2020-03-07 13:58 PST, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
Patch (10.10 KB, patch)
2020-03-07 16:39 PST, Yusuke Suzuki
darin: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Jason Lawrence 2020-03-05 08:11:41 PST
media/modern-media-controls/tracks-support/tracks-support-no-tracks.html

Description:
This test is flaky crashing on Mac wk2 Debug. It appears to have a regression point of r257905.

History:
https://results.webkit.org/?suite=layout-tests&test=media%2Fmodern-media-controls%2Ftracks-support%2Ftracks-support-no-tracks.html&style=debug&limit=50000

Crash log attached;
Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0   com.apple.JavaScriptCore      	0x0000000821882b2e WTFCrash + 14 (Assertions.cpp:305)
1   com.apple.WebCore             	0x0000000804d70b5b WTFCrashWithInfo(int, char const*, char const*, int) + 27
2   com.apple.WebCore             	0x000000080706142a WebCore::JSEventListener::jsFunction(WebCore::ScriptExecutionContext&) const + 426 (JSEventListener.h:111)
3   com.apple.WebCore             	0x000000080706080b WebCore::JSEventListener::handleEvent(WebCore::ScriptExecutionContext&, WebCore::Event&) + 219 (JSEventListener.cpp:113)
4   com.apple.WebCore             	0x000000080771857c WebCore::EventTarget::innerInvokeEventListeners(WebCore::Event&, WTF::Vector<WTF::RefPtr<WebCore::RegisteredEventListener, WTF::DumbPtrTraits<WebCore::RegisteredEventListener> >, 1ul, WTF::CrashOnOverflow, 16ul, WTF::FastMalloc>, WebCore::EventTarget::EventInvokePhase) + 956 (EventTarget.cpp:315)
5   com.apple.WebCore             	0x00000008077147f2 WebCore::EventTarget::fireEventListeners(WebCore::Event&, WebCore::EventTarget::EventInvokePhase) + 354 (EventTarget.cpp:252)
6   com.apple.WebCore             	0x000000080771815d WebCore::EventTarget::dispatchEvent(WebCore::Event&) + 333 (EventTarget.cpp:211)
7   com.apple.WebCore             	0x0000000807737310 WebCore::MainThreadGenericEventQueue::dispatchOneEvent() + 608 (GenericEventQueue.cpp:73)
8   com.apple.WebCore             	0x000000080773e011 decltype(*(std::__1::forward<WebCore::MainThreadGenericEventQueue*&>(fp0)).*fp()) std::__1::__invoke<void (WebCore::MainThreadGenericEventQueue::*&)(), WebCore::MainThreadGenericEventQueue*&, void>(void (WebCore::MainThreadGenericEventQueue::*&)(), WebCore::MainThreadGenericEventQueue*&) + 113 (type_traits:4302)
9   com.apple.WebCore             	0x000000080773df70 std::__1::__bind_return<void (WebCore::MainThreadGenericEventQueue::*)(), std::__1::tuple<WebCore::MainThreadGenericEventQueue*>, std::__1::tuple<>, __is_valid_bind_return<void (WebCore::MainThreadGenericEventQueue::*)(), std::__1::tuple<WebCore::MainThreadGenericEventQueue*>, std::__1::tuple<> >::value>::type std::__1::__apply_functor<void (WebCore::MainThreadGenericEventQueue::*)(), std::__1::tuple<WebCore::MainThreadGenericEventQueue*>, 0ul, std::__1::tuple<> >(void (WebCore::MainThreadGenericEventQueue::*&)(), std::__1::tuple<WebCore::MainThreadGenericEventQueue*>&, std::__1::__tuple_indices<0ul>, std::__1::tuple<>&&) + 64 (functional:2644)
10  com.apple.WebCore             	0x000000080773df1c std::__1::__bind_return<void (WebCore::MainThreadGenericEventQueue::*)(), std::__1::tuple<WebCore::MainThreadGenericEventQueue*>, std::__1::tuple<>, __is_valid_bind_return<void (WebCore::MainThreadGenericEventQueue::*)(), std::__1::tuple<WebCore::MainThreadGenericEventQueue*>, std::__1::tuple<> >::value>::type std::__1::__bind<void (WebCore::MainThreadGenericEventQueue::*)(), WebCore::MainThreadGenericEventQueue*>::operator()<>() + 60 (functional:2677)
11  com.apple.WebCore             	0x000000080773deb9 WTF::Detail::CallableWrapper<std::__1::__bind<void (WebCore::MainThreadGenericEventQueue::*)(), WebCore::MainThreadGenericEventQueue*>, void>::call() + 25 (Function.h:52)
12  com.apple.WebCore             	0x0000000804d8353a WTF::Function<void ()>::operator()() const + 138 (Function.h:84)
13  com.apple.WebCore             	0x0000000805061aeb WebCore::GenericTaskQueue<WebCore::Timer>::enqueueTask(WTF::Function<void ()>&&)::'lambda'()::operator()() const + 171 (GenericTaskQueue.h:108)
14  com.apple.WebCore             	0x0000000805061939 WTF::Detail::CallableWrapper<WebCore::GenericTaskQueue<WebCore::Timer>::enqueueTask(WTF::Function<void ()>&&)::'lambda'(), void>::call() + 25 (Function.h:52)
15  com.apple.WebCore             	0x0000000804d8353a WTF::Function<void ()>::operator()() const + 138 (Function.h:84)
16  com.apple.WebCore             	0x0000000808610f4e WebCore::TaskDispatcher<WebCore::Timer>::dispatchOneTask() + 222 (GenericTaskQueue.cpp:111)
17  com.apple.WebCore             	0x0000000808610c00 WebCore::TaskDispatcher<WebCore::Timer>::sharedTimerFired() + 256 (GenericTaskQueue.cpp:86)
18  com.apple.WebCore             	0x00000008086170c1 WebCore::TaskDispatcher<WebCore::Timer>::sharedTimer()::$_1::operator()() const + 17 (GenericTaskQueue.cpp:60)
19  com.apple.WebCore             	0x0000000808617089 WTF::Detail::CallableWrapper<WebCore::TaskDispatcher<WebCore::Timer>::sharedTimer()::$_1, void>::call() + 25 (Function.h:52)
20  com.apple.WebCore             	0x0000000804d8353a WTF::Function<void ()>::operator()() const + 138 (Function.h:84)
21  com.apple.WebCore             	0x0000000804e37269 WebCore::Timer::fired() + 25 (Timer.h:127)
22  com.apple.WebCore             	0x000000080866018a WebCore::ThreadTimers::sharedTimerFiredInternal() + 650 (ThreadTimers.cpp:129)
23  com.apple.WebCore             	0x000000080866a5f1 WebCore::ThreadTimers::setSharedTimer(WebCore::SharedTimer*)::$_0::operator()() const + 33 (ThreadTimers.cpp:69)
24  com.apple.WebCore             	0x000000080866a5a9 WTF::Detail::CallableWrapper<WebCore::ThreadTimers::setSharedTimer(WebCore::SharedTimer*)::$_0, void>::call() + 25 (Function.h:52)
25  com.apple.WebCore             	0x0000000804d8353a WTF::Function<void ()>::operator()() const + 138 (Function.h:84)
26  com.apple.WebCore             	0x000000080862a397 WebCore::MainThreadSharedTimer::fired() + 135 (MainThreadSharedTimer.cpp:84)
27  com.apple.WebCore             	0x00000008086d2486 WebCore::timerFired(__CFRunLoopTimer*, void*) + 38 (MainThreadSharedTimerCF.cpp:75)
28  com.apple.CoreFoundation      	0x00007fff36015804 __CFRUNLOOP_IS_CALLING_OUT_TO_A_TIMER_CALLBACK_FUNCTION__ + 20
29  com.apple.CoreFoundation      	0x00007fff360153be __CFRunLoopDoTimer + 859
30  com.apple.CoreFoundation      	0x00007fff36014e9e __CFRunLoopDoTimers + 317
31  com.apple.CoreFoundation      	0x00007fff35ff9aed __CFRunLoopRun + 2213
32  com.apple.CoreFoundation      	0x00007fff35ff8bd3 CFRunLoopRunSpecific + 499
33  com.apple.Foundation          	0x00007fff3869c1a8 -[NSRunLoop(NSRunLoop) runMode:beforeDate:] + 212
34  com.apple.Foundation          	0x00007fff3874fd8b -[NSRunLoop(NSRunLoop) run] + 76
35  libxpc.dylib                  	0x00007fff6d8ed0e1 _xpc_objc_main.cold.4 + 49
36  libxpc.dylib                  	0x00007fff6d8ed027 _xpc_objc_main + 559
37  libxpc.dylib                  	0x00007fff6d8ecb5a xpc_main + 377
38  com.apple.WebKit              	0x00000007f889321c WebKit::XPCServiceMain(int, char const**) + 1276 (XPCServiceMain.mm:166)
39  com.apple.WebKit              	0x00000007f9bdcc4b WKXPCServiceMain + 27 (WKMain.mm:33)
40  com.apple.WebKit.WebContent   	0x00000001061e2ec2 main + 34 (AuxiliaryProcessMain.cpp:30)
41  libdyld.dylib                 	0x00007fff6d69e7fd start + 1
Comment 1 Radar WebKit Bug Importer 2020-03-05 08:12:14 PST
<rdar://problem/60084741>
Comment 2 Jason Lawrence 2020-03-05 08:13:19 PST
Created attachment 392571 [details]
tracks-support-no-tracks-crash-log
Comment 3 Jason Lawrence 2020-03-05 08:23:25 PST
I can reproduce this issue with r257915 (debug) and r257905 (debug), but cannot reproduce the issue with r257904 (debug).

run-webkit-tests --iterations 1000 --exit-after-n-failures 5 --force -f --no-retry --debug media/modern-media-controls/tracks-support/tracks-support-no-tracks.html

r257915:
[51/1000] media/modern-media-controls/tracks-support/tracks-support-no-tracks.html failed unexpectedly (com.apple.WebKit.WebContent.Development crashed [pid=49295])
[51/1000] media/modern-media-controls/tracks-support/tracks-support-no-tracks.html failed unexpectedly (com.apple.WebKit.WebContent.Development crashed [pid=49293])
[61/1000] media/modern-media-controls/tracks-support/tracks-support-no-tracks.html failed unexpectedly (com.apple.WebKit.WebContent.Development crashed [pid=49288])
[89/1000] media/modern-media-controls/tracks-support/tracks-support-no-tracks.html failed unexpectedly (com.apple.WebKit.WebContent.Development crashed [pid=49297])
[91/1000] media/modern-media-controls/tracks-support/tracks-support-no-tracks.html failed unexpectedly (com.apple.WebKit.WebContent.Development crashed [pid=49412])
Exiting early after 5 failures. 88 tests run.
83 tests ran as expected, 5 didn't (912 didn't run):

r257905 (I switched from 5 to 3 failures here):
[67/1000] media/modern-media-controls/tracks-support/tracks-support-no-tracks.html failed unexpectedly (com.apple.WebKit.WebContent.Development crashed [pid=50171])
[76/1000] media/modern-media-controls/tracks-support/tracks-support-no-tracks.html failed unexpectedly (com.apple.WebKit.WebContent.Development crashed [pid=50165])
[78/1000] media/modern-media-controls/tracks-support/tracks-support-no-tracks.html failed unexpectedly (com.apple.WebKit.WebContent.Development crashed [pid=50174])
Exiting early after 3 failures. 75 tests run.
72 tests ran as expected, 3 didn't (925 didn't run):

r257904:
All 1000 tests ran as expected.
Comment 4 Ryan Haddad 2020-03-05 09:26:36 PST
ASSERTION FAILED: !m_isolatedWorld->isNormal() || m_wrapper || !m_jsFunction
./bindings/js/JSEventListener.h(111) : JSC::JSObject *WebCore::JSEventListener::jsFunction(WebCore::ScriptExecutionContext &) const
1   0x7d16190b9 WTFCrash
2   0x7b4ca328b WTFCrashWithInfo(int, char const*, char const*, int)
3   0x7b6f982ea WebCore::JSEventListener::jsFunction(WebCore::ScriptExecutionContext&) const
Comment 5 Jason Lawrence 2020-03-05 10:03:46 PST
I have rolled out r257905 here: https://trac.webkit.org/changeset/257922/webkit
Comment 6 Yusuke Suzuki 2020-03-05 13:22:00 PST
This assertion looks wrong. It is assuming that, if m_wrapper goes away, m_jsFunction should go away too. That is not guaranteed by JSC GC.
Comment 7 Yusuke Suzuki 2020-03-05 14:41:43 PST
It is saying,

1. If the world is normal.
2. If m_wrapper is gone.
3. If m_jsFunction is not gone.

Then, the assertion hits. But this is wrong. We have no guarantee that m_jsFunction is gone when m_wrapper is gone. We have conservative GC. We have generational GC. Everything makes it possible that m_jsFunction is live while m_wrapper is gone.
I'll remove this assertion.
Comment 8 Alexey Proskuryakov 2020-03-05 15:51:27 PST
DOM is supposed to guarantee that wrappers stay alive for event listeners to fire. Otherwise, code in the listeners will or will not run semi-randomly depending on whether GC happened, which would be incorrect behavior. GC must not be observable from user code.
Comment 9 Ryosuke Niwa 2020-03-05 16:03:19 PST
The assertion is saying that if m_jsFunction is not null, then m_wrapper shouldn't be null. This must be true because any event target with a valid JS event listener should have had a wrapper when the event listener was added, and the wrapper should never be collected as long as the event listener can be invoked.
Comment 10 Ryosuke Niwa 2020-03-05 16:04:14 PST
In other words, removing the assertion is not the right solution here. It's an indicative of a bug that we're collecting a wrapper of an event target even when an event can be dispatched in the future on it.
Comment 11 Yusuke Suzuki 2020-03-05 17:20:20 PST
> The assertion is saying that if m_jsFunction is not null, then m_wrapper shouldn't be null. This must be true because any event target with a valid JS event listener should have had a wrapper when the event listener was added, and the wrapper should never be collected as long as the event listener can be invoked.

No. m_jsFunction can be any value basically if m_wrapper is null. This can be collected. But this can be retained because of conservative GC, because user puts listener function to somewhere, and the other any reasons.
So, m_jsFunction in this assertion is completely unrelated to what we should test.
The correct assertion should be at least,

ASSERT(!m_isolatedWorld->isNormal() || m_wrapper);
Comment 12 Yusuke Suzuki 2020-03-05 17:21:43 PST
Currently, I'm checking two possibilities,

1. My patch had a bug. And this allows m_wrapper collected by GC. I'm now checking this possibility.
2. This was pre-existing bug. We are not hitting the crash just because m_jsFunction was dead too.
Comment 13 Yusuke Suzuki 2020-03-05 17:23:31 PST
(In reply to Yusuke Suzuki from comment #11)
> > The assertion is saying that if m_jsFunction is not null, then m_wrapper shouldn't be null. This must be true because any event target with a valid JS event listener should have had a wrapper when the event listener was added, and the wrapper should never be collected as long as the event listener can be invoked.
> 
> No. m_jsFunction can be any value basically if m_wrapper is null. This can
> be collected. But this can be retained because of conservative GC, because
> user puts listener function to somewhere, and the other any reasons.
> So, m_jsFunction in this assertion is completely unrelated to what we should
> test.
> The correct assertion should be at least,
> 
> ASSERT(!m_isolatedWorld->isNormal() || m_wrapper);

Maybe, we can also ensure that,

if (m_wrapper)
    ASSERT(m_jsFunction);

But current assertion is saying,

if (!m_wrapper)
    ASSERT(!m_jsFunction);

This is not guaranteed (you can consider conservative GC. Any objects can be marked as live).
Comment 14 Yusuke Suzuki 2020-03-05 17:28:11 PST
(In reply to Yusuke Suzuki from comment #13)
> (In reply to Yusuke Suzuki from comment #11)
> > > The assertion is saying that if m_jsFunction is not null, then m_wrapper shouldn't be null. This must be true because any event target with a valid JS event listener should have had a wrapper when the event listener was added, and the wrapper should never be collected as long as the event listener can be invoked.
> > 
> > No. m_jsFunction can be any value basically if m_wrapper is null. This can
> > be collected. But this can be retained because of conservative GC, because
> > user puts listener function to somewhere, and the other any reasons.
> > So, m_jsFunction in this assertion is completely unrelated to what we should
> > test.
> > The correct assertion should be at least,
> > 
> > ASSERT(!m_isolatedWorld->isNormal() || m_wrapper);
> 
> Maybe, we can also ensure that,
> 
> if (m_wrapper)
>     ASSERT(m_jsFunction);
> 
> But current assertion is saying,
> 
> if (!m_wrapper)
>     ASSERT(!m_jsFunction);
> 
> This is not guaranteed (you can consider conservative GC. Any objects can be
> marked as live).

And I ensured that nobody is clearing m_jsFunction when m_wrapper is gone. So the current assertion's `!m_jsFunction` part is incorrect.
Comment 15 Ryosuke Niwa 2020-03-05 18:40:04 PST
Given the backtrace, it looks like someone using GenericEventQueue isn't keeping event target's wrapper alive.
GenericEventQueue is problematic because it doesn't keep event target's wrappers alive automatically.
Each event target which gets added to GenericEventQueue needs to keep its wrapper alive.
Comment 16 Yusuke Suzuki 2020-03-06 00:04:51 PST
Several updates.

1. The assertion's normal-world check looks also wrong. We are skipping event emission if the world is not normal, this does not make sense. These events can be used to implement the internal feature, and it could emit user-visible events later.
2. So far, this looks like an existing bug due to https://bugs.webkit.org/show_bug.cgi?id=165713. By using IsoSubspace, we start putting some lower-tier cells in PreciseAllocation, and reuse them. This makes GC behavior different, and exposing the existing bug: we should re-register root if we changed the root after we register the root. I could create a test case which reproduces this assertion failure without my patch.
Comment 17 Yusuke Suzuki 2020-03-06 00:12:15 PST
(In reply to Yusuke Suzuki from comment #16)
> 2. So far, this looks like an existing bug due to
> https://bugs.webkit.org/show_bug.cgi?id=165713. By using IsoSubspace, we
> start putting some lower-tier cells in PreciseAllocation, and reuse them.
> This makes GC behavior different, and exposing the existing bug: we should
> re-register root if we changed the root after we register the root. I could
> create a test case which reproduces this assertion failure without my patch.

Let's describe what is happening.

1. HTMLVideoElement is created in the test under the current Document. So root is Document.
2. Concurrent GC starts working.
3. Marking (1)'s HTMLVideoElement and registering Document as a root.
4. The executed code removes HTMLVideoElement from Document.
5. HTMLVideoTrackList in HTMLVideoElement queries the opaque root. Since the root of HTMLVideoElement is changed to itself, HTMLVideoTrackList says "I'm live if the root set includes HTMLVideoElement". But this is not included since HTMLVideoElement registers Document as a root. And after the root is changed, it is not re-registering the new root.
6. HTMLVideoTrackList is saying I'm not reachable.
Comment 18 Ryosuke Niwa 2020-03-06 00:25:43 PST
(In reply to Yusuke Suzuki from comment #17)
> (In reply to Yusuke Suzuki from comment #16)
> > 2. So far, this looks like an existing bug due to
> > https://bugs.webkit.org/show_bug.cgi?id=165713. By using IsoSubspace, we
> > start putting some lower-tier cells in PreciseAllocation, and reuse them.
> > This makes GC behavior different, and exposing the existing bug: we should
> > re-register root if we changed the root after we register the root. I could
> > create a test case which reproduces this assertion failure without my patch.
> 
> Let's describe what is happening.
> 
> 1. HTMLVideoElement is created in the test under the current Document. So
> root is Document.
> 2. Concurrent GC starts working.
> 3. Marking (1)'s HTMLVideoElement and registering Document as a root.
> 4. The executed code removes HTMLVideoElement from Document.
> 5. HTMLVideoTrackList in HTMLVideoElement queries the opaque root. Since the
> root of HTMLVideoElement is changed to itself, HTMLVideoTrackList says "I'm
> live if the root set includes HTMLVideoElement". But this is not included
> since HTMLVideoElement registers Document as a root. And after the root is
> changed, it is not re-registering the new root.
> 6. HTMLVideoTrackList is saying I'm not reachable.

In this scenario, HTMLVideoElement needs to have a pending activity in ActiveDOMObjet while HTMLVideoTrackList/HTMLVideoElement sits in GenericEventQueue. That would prevent GC from collecting the JS wrapper since it would be reachableFromOpaqueRoot regardless of where it is.
Comment 19 Yusuke Suzuki 2020-03-06 00:34:27 PST
(In reply to Ryosuke Niwa from comment #18)
> (In reply to Yusuke Suzuki from comment #17)
> > (In reply to Yusuke Suzuki from comment #16)
> > > 2. So far, this looks like an existing bug due to
> > > https://bugs.webkit.org/show_bug.cgi?id=165713. By using IsoSubspace, we
> > > start putting some lower-tier cells in PreciseAllocation, and reuse them.
> > > This makes GC behavior different, and exposing the existing bug: we should
> > > re-register root if we changed the root after we register the root. I could
> > > create a test case which reproduces this assertion failure without my patch.
> > 
> > Let's describe what is happening.
> > 
> > 1. HTMLVideoElement is created in the test under the current Document. So
> > root is Document.
> > 2. Concurrent GC starts working.
> > 3. Marking (1)'s HTMLVideoElement and registering Document as a root.
> > 4. The executed code removes HTMLVideoElement from Document.
> > 5. HTMLVideoTrackList in HTMLVideoElement queries the opaque root. Since the
> > root of HTMLVideoElement is changed to itself, HTMLVideoTrackList says "I'm
> > live if the root set includes HTMLVideoElement". But this is not included
> > since HTMLVideoElement registers Document as a root. And after the root is
> > changed, it is not re-registering the new root.
> > 6. HTMLVideoTrackList is saying I'm not reachable.
> 
> In this scenario, HTMLVideoElement needs to have a pending activity in
> ActiveDOMObjet while HTMLVideoTrackList/HTMLVideoElement sits in
> GenericEventQueue. That would prevent GC from collecting the JS wrapper
> since it would be reachableFromOpaqueRoot regardless of where it is.

No, it does not fix the issue unfortunately :(
Even if HTMLVideoElement is live, HTMLVideoTrackList goes away since HTMLVideoTrackList is live only if it can find an opaque root, which is not properly registered by HTMLVideoElement due to race condition.
Fundamental fix should be, in some way, re-registering an root & saying "Marking" happens if the root is changed. This is some form of write-barrier.

This issue itself exists for 4 years, and this issue is hidden by this wrong assertion I think (this happens frequently for HTMLVideoElement, but this assertion does not hit due to `m_jsFunction` check and world-is-normal check since this pattern is used in non-normal world (media controls).
Comment 20 Ryosuke Niwa 2020-03-06 00:48:45 PST
(In reply to Yusuke Suzuki from comment #19)
> (In reply to Ryosuke Niwa from comment #18)
>
> > In this scenario, HTMLVideoElement needs to have a pending activity in
> > ActiveDOMObjet while HTMLVideoTrackList/HTMLVideoElement sits in
> > GenericEventQueue. That would prevent GC from collecting the JS wrapper
> > since it would be reachableFromOpaqueRoot regardless of where it is.
> 
> No, it does not fix the issue unfortunately :(
> Even if HTMLVideoElement is live, HTMLVideoTrackList goes away since
> HTMLVideoTrackList is live only if it can find an opaque root, which is not
> properly registered by HTMLVideoElement due to race condition.

Okay, so VideoTrackList needs to be an ActiveDOMObject itself and have a pending activity whenever it's in HTMLVideoElement's m_asyncEventQueue.

> Fundamental fix should be, in some way, re-registering an root & saying
> "Marking" happens if the root is changed. This is some form of write-barrier.

Hm... that sounds like a brittle approach but if that's what we need to do, then dispatchChildRemovalEvents in ContainerNode.cpp has a code to force creating JS wrapper on the removed node so perhaps we can emit a write barrier there.

Alternatively, can we use MarkingConstraint to relate these objects?
Comment 21 Ryosuke Niwa 2020-03-06 01:00:39 PST
(In reply to Ryosuke Niwa from comment #20)
> (In reply to Yusuke Suzuki from comment #19)
> > (In reply to Ryosuke Niwa from comment #18)
> >
> > > In this scenario, HTMLVideoElement needs to have a pending activity in
> > > ActiveDOMObjet while HTMLVideoTrackList/HTMLVideoElement sits in
> > > GenericEventQueue. That would prevent GC from collecting the JS wrapper
> > > since it would be reachableFromOpaqueRoot regardless of where it is.
> > 
> > No, it does not fix the issue unfortunately :(
> > Even if HTMLVideoElement is live, HTMLVideoTrackList goes away since
> > HTMLVideoTrackList is live only if it can find an opaque root, which is not
> > properly registered by HTMLVideoElement due to race condition.
> 
> Okay, so VideoTrackList needs to be an ActiveDOMObject itself and have a
> pending activity whenever it's in HTMLVideoElement's m_asyncEventQueue.

Let another approach here is to use CustomIsReachable and write a custom JSVideoTrackList::isReachableFromOpaqueRoots, or add a new variant of GenerateIsReachable like ImplMediaElementRoot which checks not just that the root of the media element is an opaque root but also if the media element has a pending activity or not.
Comment 22 Yusuke Suzuki 2020-03-06 01:20:20 PST
I think this is more fundamental problem which is not tied to HTMLVideoElement case.
For any node, this race condition happens if,

1. Node (A) is attached to the tree (TA).
2. (A)'s wrapper is marked and (TA) is registered as an opaque root.
3. The user code removes (A) from (TA).
4. Node (B), which is under (A), is invoking is-reachable-from-opaque-root query.
5. Node (B)'s root is (A) since (A) is detached from the tree (TA).
6. (4) cannot find the (A) from the set of opaque roots.

Let's consider about the fundamental fix. Every time we change the root, we need to emit "write-barrier" to this node.
Then, GC rescans it, re-register the updated root. So, the problem is how to emit "write-barrier" while we only have C++ node.

So the question is how to achieve this effect. One idea I have right now is, (1) when changing the root, (2) if GC is not running right now, noop, (2) otherwise, let's check the old root is included in the opaque root, (3) if it is included, we put the new root into the opaque root as something like remember-set mechanism. Then, conservatively we collect the necessary opaque roots. And once GC cycles pass, opaque root set is anyway cleared again. This works at least for main thread since we can access these data via commonVM().
Comment 23 Ryosuke Niwa 2020-03-06 01:44:34 PST
(In reply to Yusuke Suzuki from comment #22)
> I think this is more fundamental problem which is not tied to
> HTMLVideoElement case.
> For any node, this race condition happens if,
> 
> 1. Node (A) is attached to the tree (TA).
> 2. (A)'s wrapper is marked and (TA) is registered as an opaque root.
> 3. The user code removes (A) from (TA).
> 4. Node (B), which is under (A), is invoking is-reachable-from-opaque-root query.
> 5. Node (B)'s root is (A) since (A) is detached from the tree (TA).
> 6. (4) cannot find the (A) from the set of opaque roots.

That indeed sounds like a real bug.

> Let's consider about the fundamental fix. Every time we change the root, we
> need to emit "write-barrier" to this node.
> Then, GC rescans it, re-register the updated root. So, the problem is how to
> emit "write-barrier" while we only have C++ node.
> 
> So the question is how to achieve this effect. One idea I have right now is,
> (1) when changing the root,

This would be ContainerNode::removeAllChildrenWithScriptAssertion, ContainerNode::removeNodeWithScriptAssertion, and executeNodeInsertionWithScriptAssertion.

> (2) if GC is not running right now, noop, (2)
> otherwise, let's check the old root is included in the opaque root, (3) if
> it is included, we put the new root into the opaque root as something like
> remember-set mechanism.

Hm... I'm thinking if we can re-use something like GCReachableRefMap or add more logic to isReachableFromDOM in JSNodeCustom.cpp but I don't think we could because the issue isn't so much about the root node, which is an opaque root but rather anyone which uses a node has an opaque root.

For that matter, this seems like an issue with anything that can dynamically change its opaque root, not just Node.
Comment 24 Yusuke Suzuki 2020-03-06 01:49:22 PST
Or, super easy way (I'm not sure this is good for performance for now), I'm now considering: let's register opaque roots in output constraints!
Comment 25 Yusuke Suzuki 2020-03-06 02:01:49 PST
(In reply to Yusuke Suzuki from comment #24)
> Or, super easy way (I'm not sure this is good for performance for now), I'm
> now considering: let's register opaque roots in output constraints!

Not sure, some wiring should be required. But this sounds the cleanest solution.
Comment 26 Yusuke Suzuki 2020-03-06 02:09:29 PST
(In reply to Yusuke Suzuki from comment #25)
> (In reply to Yusuke Suzuki from comment #24)
> > Or, super easy way (I'm not sure this is good for performance for now), I'm
> > now considering: let's register opaque roots in output constraints!
> 
> Not sure, some wiring should be required. But this sounds the cleanest
> solution.

Need to check. It would be possible that I'm missing something.
Comment 27 Ryosuke Niwa 2020-03-06 02:13:01 PST
(In reply to Yusuke Suzuki from comment #24)
> Or, super easy way (I'm not sure this is good for performance for now), I'm
> now considering: let's register opaque roots in output constraints!

When you say register opaque roots in output constraints, do you mean to do that for all opaque roots?
Comment 28 Yusuke Suzuki 2020-03-06 02:52:22 PST
Ooops, maybe I found the bug.
I'm now checking but maybe this is caused by my silly mistake. Sorry.
I'm rolling out the patch for now. And ensuring the new patch is fixing this issue. But assertion is anyway wrong so I'll create a different patch for fixing the assertion too.
Comment 29 Yusuke Suzuki 2020-03-06 03:04:12 PST
OK, I found the super silly bug in my original patch https://bugs.webkit.org/show_bug.cgi?id=205107#c13. Really sorry :(
But at the same time, this assertion is still wrong. I'll use this bug for fixing the assertion to the correct form: ASSERT(m_wrapper && m_jsFunction); since,

1. Whether the world is normal or not does not matter to event emission. This is introduced to suppress some assertion failures originally, this is wrong.
2. The original assertion was saying, "if m_wrapper is dead, m_jsFunction must be dead". This is wrong since m_jsFunction can be alive due to whatever reasons. For GC managed objects, we cannot say "This is dead since it is not referenced from anywhere" since conservative GC can find it even if it is not logically referenced. Only thing we can say is, "if m_wrapper is live, m_jsFunction should be live".
Comment 30 Ryosuke Niwa 2020-03-06 03:30:26 PST
Do we still have the issue with opaque roots dynamically updating? Or was it just red herring due to that bug?
Comment 31 Yusuke Suzuki 2020-03-06 03:36:57 PST
Created attachment 392702 [details]
Patch
Comment 32 Yusuke Suzuki 2020-03-06 03:37:36 PST
(In reply to Ryosuke Niwa from comment #30)
> Do we still have the issue with opaque roots dynamically updating? Or was it
> just red herring due to that bug?

I've checked that and as I thought we are already emitting opaque root registering in the output-constraints. So this works correctly :)
Comment 33 Yusuke Suzuki 2020-03-06 03:43:11 PST
Comment on attachment 392702 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=392702&action=review

> Source/WebCore/bindings/js/JSEventListener.h:110
> +    ASSERT(!m_isolatedWorld->isNormal() || m_wrapper);

We could even say, `ASSERT(!m_isolatedWorld->isNormal() || (m_wrapper && m_jsFunction));` but not sure whether we have a case that m_jsFunction is null initialized while the m_wrapper is non-null initialized.
And ideal ASSERT would be  `ASSERT(m_wrapper && m_jsFunction)` since normal-world or not does not matter whether we should skip the event handlers.
Comment 34 Saam Barati 2020-03-06 08:04:55 PST
(In reply to Yusuke Suzuki from comment #29)
> OK, I found the super silly bug in my original patch
> https://bugs.webkit.org/show_bug.cgi?id=205107#c13. Really sorry :(

So if we don’t visit any of our output constraint spaces, we only have a single test that fails in a flaky manner?

Sounds like we need better tests :-)

> But at the same time, this assertion is still wrong. I'll use this bug for
> fixing the assertion to the correct form: ASSERT(m_wrapper && m_jsFunction);
> since,
> 
> 1. Whether the world is normal or not does not matter to event emission.
> This is introduced to suppress some assertion failures originally, this is
> wrong.
> 2. The original assertion was saying, "if m_wrapper is dead, m_jsFunction
> must be dead". This is wrong since m_jsFunction can be alive due to whatever
> reasons. For GC managed objects, we cannot say "This is dead since it is not
> referenced from anywhere" since conservative GC can find it even if it is
> not logically referenced. Only thing we can say is, "if m_wrapper is live,
> m_jsFunction should be live".
Comment 35 Alexey Proskuryakov 2020-03-06 08:57:52 PST
Comment on attachment 392702 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=392702&action=review

Since the change only makes the assertion stronger, it is good as far as I'm concerned. But tests are now asserting, so mechanically r-.

> Source/WebCore/ChangeLog:12
> +           GC, we cannot say such an condition. Even if m_wrapper is dead, m_jsFunction can be alive by various reasons: conservative
> +           GC finds it, user code stores this function somewhere reachable from the root, etc. The correct thing is `m_wrapper` must

There are a couple things here that make me a little skeptical, from a position of someone who only touched this a long time ago and is absolutely not an expert.

1. If this assertion can misfire due to conservative GC finding it, why haven't we ever seen it fire without a real bug?

2. The assertion would fire if user code stored the function somewhere, so it's essentially saying that user code must not do that. That isn't a bug in the assertion necessarily. Is there even a reason for user code to do this?

3. Checking m_jsFunction for being null is not the same as checking whether the function is alive. Once can just assign null to the member, and arguably should do this without waiting for GC. Is it even right that m_jsFunction is a weak pointer?

> Source/WebCore/ChangeLog:15
> +        2. The comments about "zombie m_jsFunctions" is wrong. We are using JSC::Weak<>. So if the object gets collected, it returns
> +           nullptr, not getting a zombie pointer.

Yes, must have been written before these were weak pointers.

> Source/WebCore/ChangeLog:17
> +        3. We are emitting write-barrier in a wrong order. In the heavily stressed scenario, it is possible that concurrent marking
> +           scans JSEventListener just after we emit the write-barrier, and this marking misses the assigned value. We must emit

Great find! Is there any test that hits this, at least in StressGC configuration?

> Source/WebCore/ChangeLog:21
> +        Still keeping normal world check. But this looks also wrong. The assertion is allowing non-normal world to get dead m_wrapper.

Yes, pretty sure it's been understood to be a dirty hack when added. Interaction between worlds and event listeners was bogus back then, and I don't know if it has been improved since.
Comment 36 Yusuke Suzuki 2020-03-06 13:10:08 PST
Comment on attachment 392702 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=392702&action=review

The test failure is saying that we invoke JSEventListener::jsFunction with `!m_wrapper` legally.
This sounds like we are also using `m_wrapper` for whether it is initialized or not. This sounds like wrong too.

>> Source/WebCore/ChangeLog:12
>> +           GC finds it, user code stores this function somewhere reachable from the root, etc. The correct thing is `m_wrapper` must
> 
> There are a couple things here that make me a little skeptical, from a position of someone who only touched this a long time ago and is absolutely not an expert.
> 
> 1. If this assertion can misfire due to conservative GC finding it, why haven't we ever seen it fire without a real bug?
> 
> 2. The assertion would fire if user code stored the function somewhere, so it's essentially saying that user code must not do that. That isn't a bug in the assertion necessarily. Is there even a reason for user code to do this?
> 
> 3. Checking m_jsFunction for being null is not the same as checking whether the function is alive. Once can just assign null to the member, and arguably should do this without waiting for GC. Is it even right that m_jsFunction is a weak pointer?

1.
This is because this assertion is using `!m_jsFuncion` wrong purpose. We are *not* firing if `m_jsFunction` is successfully collected even if `m_wrapper` is collected. This weakens the assertion so we could miss the real bugs due to this wrong condition.
And again, checking `m_jsFunction`'s liveness here is unrelated to the validity of JSEventListener. I think the key is that, previously this m_jsFunction is not Weak<>. So, when we are evaluating `!m_jsFuncion`, we are not checking the liveness of this. Rather we are checking whether we initialized it once before. However, now it is replaced with Weak<> and this `!m_jsFunction` becomes "it is not initialized OR it is collected". This is clearly wrong, and that's why this worked so far. We made this assertion's meaning weak unintentionally when we introduced Weak<> to this JSEventListener.
I guess that this could hide the real bug so far. And making the meaning of this assertion looks very wrong because this is now checking m_jsFunction's liveness too.

The assertion is saying "Even if the m_wrapper is collected, if m_jsFunction  is collected too, this is OK, let's skip event triggering". No, this is not OK, this is the bug.

2.
Yes, it is possible. You can write a code,

function handler() { };
imgElement.onload = handler;

This `handler` is m_jsFunction. You can put it anywhere. So making `m_jsFunction` alive is very easy. The key here is that, the original intention of `m_jsFunction` check is not looking into the liveness. It is checking whether we initialized this field once (because we were using WriteBarrier<>).
But now, this is changed to Weak<>, and the meaning becomes broken.

3.
Yes. And checking `m_jsFunction` for being null is not the same as checking it is not initialized too.
But this assertion is using `m_jsFunction` condition to check wether it is initialized. This is wrong. Liveness should not be checked.
Comment 37 Yusuke Suzuki 2020-03-07 13:03:54 PST
Comment on attachment 392702 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=392702&action=review

>> Source/WebCore/ChangeLog:17
>> +           scans JSEventListener just after we emit the write-barrier, and this marking misses the assigned value. We must emit
> 
> Great find! Is there any test that hits this, at least in StressGC configuration?

It is possible that if we stress this GC situation enough, we could eventually hit GC crash. But not sure whether our GC bots have the tests which stresses this path.

> Source/WebCore/bindings/js/JSEventListener.h:101
>      if (!m_jsFunction) {

Now, I think this condition is also wrong. I think this is originally inserted as "this listener is not initialized yet". But now, this `!m_jsFunction`'s meaning is different from this.
Initializing function again if the function is gone sounds wrong.
Comment 38 Yusuke Suzuki 2020-03-07 13:44:54 PST
Created attachment 392888 [details]
Patch
Comment 39 Yusuke Suzuki 2020-03-07 13:50:32 PST
Created attachment 392892 [details]
Patch
Comment 40 Yusuke Suzuki 2020-03-07 13:54:46 PST
Created attachment 392893 [details]
Patch
Comment 41 Yusuke Suzuki 2020-03-07 13:58:46 PST
Created attachment 392895 [details]
Patch
Comment 42 Yusuke Suzuki 2020-03-07 16:38:37 PST
Comment on attachment 392895 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=392895&action=review

> Source/WebCore/bindings/js/JSEventListener.cpp:56
>          m_jsFunction = JSC::Weak<JSC::JSObject>(function);

I think the assertion is saying this can be nullptr.
Comment 43 Yusuke Suzuki 2020-03-07 16:39:58 PST
Created attachment 392911 [details]
Patch
Comment 44 Yusuke Suzuki 2020-03-08 01:36:56 PST
I’m now thinking that corrected assertion found the real issue.
https://ews-build.webkit.org/results/macOS-Mojave-Debug-WK1-Tests-EWS/r392911-4549/results.html
Comment 45 Yusuke Suzuki 2020-03-08 01:56:51 PST
(In reply to Yusuke Suzuki from comment #44)
> I’m now thinking that corrected assertion found the real issue.
> https://ews-build.webkit.org/results/macOS-Mojave-Debug-WK1-Tests-EWS/
> r392911-4549/results.html

LoadableTextTrack should be kept alive while it is loading.
Comment 46 Yusuke Suzuki 2020-03-08 01:57:08 PST
(In reply to Yusuke Suzuki from comment #45)
> (In reply to Yusuke Suzuki from comment #44)
> > I’m now thinking that corrected assertion found the real issue.
> > https://ews-build.webkit.org/results/macOS-Mojave-Debug-WK1-Tests-EWS/
> > r392911-4549/results.html
> 
> LoadableTextTrack should be kept alive while it is loading.

LoadableTextTrack's HTMLTrackElement's wrapper should be kept alive.
Comment 47 Yusuke Suzuki 2020-03-08 04:04:46 PDT
(In reply to Yusuke Suzuki from comment #46)
> (In reply to Yusuke Suzuki from comment #45)
> > (In reply to Yusuke Suzuki from comment #44)
> > > I’m now thinking that corrected assertion found the real issue.
> > > https://ews-build.webkit.org/results/macOS-Mojave-Debug-WK1-Tests-EWS/
> > > r392911-4549/results.html
> > 
> > LoadableTextTrack should be kept alive while it is loading.
> 
> LoadableTextTrack's HTMLTrackElement's wrapper should be kept alive.

If LoadableTextTrack is loading a track, and firing events, if HTMLTrackElement is removed from the tree and nobody is looking into it (while event listener is set), I think this assertion hits.
Comment 48 Darin Adler 2020-03-08 23:03:42 PDT
Comment on attachment 392911 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=392911&action=review

> Source/WebCore/bindings/js/JSEventListener.cpp:57
> +    if (!wrapper) {
>          ASSERT(!function);
> +        return;
> +    }

Inconsistent to use early return for (!wrapper), but a normal if statement for (function). I suggest we write this instead:

    if (function) {
        ASSERT(wrapper);
        ...
Comment 49 Yusuke Suzuki 2020-03-09 00:23:38 PDT
Comment on attachment 392911 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=392911&action=review

>> Source/WebCore/bindings/js/JSEventListener.cpp:57
>> +    }
> 
> Inconsistent to use early return for (!wrapper), but a normal if statement for (function). I suggest we write this instead:
> 
>     if (function) {
>         ASSERT(wrapper);
>         ...

Thanks, fixed.
Comment 50 Yusuke Suzuki 2020-03-10 01:33:01 PDT
Committed r258189: <https://trac.webkit.org/changeset/258189>
Comment 51 Enrique Ocaña 2021-02-10 05:09:28 PST
This patch is solving an issue we found downstream on wpe-2.22 (and 2.28 but not on 2.30) on ARM 32bit devices (at least, as that's what we use).

We were directly accessing mediaSource.sourceBuffers from JavaScript to register an event listener on the addsourcebuffer event, but when the event happened, the event listener had been declared "dead" by the garbage collector and never ran. If we held mediaSource.sourceBuffers in a temporary variable in an outer scope, the behaviour didn't happen.

It took a long bisect to locate this fix. Thanks a lot for it!