Bug 171406 - [macOS] Flaky Crash under EventTarget::fireEventListeners on imported/blink/paint/deprecatedpaintlayer/non-self-painting-layer-overrides-visibility.html
Summary: [macOS] Flaky Crash under EventTarget::fireEventListeners on imported/blink/p...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: DOM (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Chris Dumez
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2017-04-27 16:48 PDT by Matt Lewis
Modified: 2019-02-06 09:18 PST (History)
11 users (show)

See Also:


Attachments
Patch (1.39 KB, patch)
2017-04-27 17:39 PDT, Matt Lewis
no flags Details | Formatted Diff | Diff
Patch (10.25 KB, patch)
2017-05-02 10:40 PDT, Chris Dumez
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Comment 1 Matt Lewis 2017-04-27 16:48:47 PDT
Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0   com.apple.WebCore             	0x000000011d29d0e1 0x1196f9000 + 62537953
1   com.apple.WebCore             	0x00000001197b3305 WTF::TypeCastTraits<WebCore::Document const, WebCore::ScriptExecutionContext const, false>::isOfType(WebCore::ScriptExecutionContext const&) + 21 (Document.h:1796)
2   com.apple.WebCore             	0x0000000119ae14d7 bool WTF::is<WebCore::Document, WebCore::ScriptExecutionContext>(WebCore::ScriptExecutionContext*) + 39 (TypeCasts.h:66)
3   com.apple.WebCore             	0x000000011a052b69 WebCore::EventTarget::fireEventListeners(WebCore::Event&, WTF::Vector<WTF::RefPtr<WebCore::RegisteredEventListener>, 1ul, WTF::CrashOnOverflow, 16ul>) + 137 (EventTarget.cpp:237)
4   com.apple.WebCore             	0x000000011a05298e WebCore::EventTarget::fireEventListeners(WebCore::Event&) + 318 (EventTarget.cpp:209)
5   com.apple.WebCore             	0x000000011a052829 WebCore::EventTarget::dispatchEvent(WebCore::Event&) + 233 (EventTarget.cpp:166)
6   com.apple.WebCore             	0x000000011a2a3e2f WebCore::GenericEventQueue::dispatchOneEvent() + 239 (GenericEventQueue.cpp:70)
7   com.apple.WebCore             	0x000000011a2a6b0d WTF::Function<void ()>::CallableWrapper<std::__1::__bind<void (WebCore::GenericEventQueue::*)(), WebCore::GenericEventQueue*> >::call() + 221 (Function.h:89)
8   com.apple.WebCore             	0x000000011977cdfe WTF::Function<void ()>::operator()() const + 94 (Function.h:50)
9   com.apple.WebCore             	0x000000011a2a6399 WebCore::GenericTaskQueue<WebCore::Timer>::enqueueTask(WTF::Function<void ()>&&)::'lambda'()::operator()() const + 137 (GenericTaskQueue.h:99)
10  com.apple.WebCore             	0x000000011a2a6269 WTF::Function<void ()>::CallableWrapper<WebCore::GenericTaskQueue<WebCore::Timer>::enqueueTask(WTF::Function<void ()>&&)::'lambda'()>::call() + 25 (Function.h:89)
11  com.apple.WebCore             	0x000000011977cdfe WTF::Function<void ()>::operator()() const + 94 (Function.h:50)
12  com.apple.WebCore             	0x000000011a2a7487 WebCore::TaskDispatcher<WebCore::Timer>::dispatchOneTask() + 119 (GenericTaskQueue.cpp:81)
13  com.apple.WebCore             	0x000000011a2a721f WebCore::TaskDispatcher<WebCore::Timer>::sharedTimerFired() + 255 (GenericTaskQueue.cpp:67)
14  com.apple.WebCore             	0x000000011a2a98da void std::__1::__invoke_void_return_wrapper<void>::__call<void (*&)()>(void (*&&&)()) + 42 (__functional_base:469)
15  com.apple.WebCore             	0x000000011a2a9889 std::__1::__function::__func<void (*)(), std::__1::allocator<void (*)()>, void ()>::operator()() + 41 (functional:1437)
16  com.apple.WebCore             	0x000000011975b3ea std::__1::function<void ()>::operator()() const + 26 (functional:1817)
17  com.apple.WebCore             	0x000000011975b309 WebCore::Timer::fired() + 25 (Timer.h:135)
18  com.apple.WebCore             	0x000000011c23c930 WebCore::ThreadTimers::sharedTimerFiredInternal() + 480 (ThreadTimers.cpp:121)
19  com.apple.WebCore             	0x000000011c23db71 WebCore::ThreadTimers::setSharedTimer(WebCore::SharedTimer*)::$_0::operator()() const + 33 (ThreadTimers.cpp:70)
20  com.apple.WebCore             	0x000000011c23db3d void std::__1::__invoke_void_return_wrapper<void>::__call<WebCore::ThreadTimers::setSharedTimer(WebCore::SharedTimer*)::$_0&>(WebCore::ThreadTimers::setSharedTimer(WebCore::SharedTimer*)::$_0&&&) + 45 (__functional_base:469)
21  com.apple.WebCore             	0x000000011c23dae9 std::__1::__function::__func<WebCore::ThreadTimers::setSharedTimer(WebCore::SharedTimer*)::$_0, std::__1::allocator<WebCore::ThreadTimers::setSharedTimer(WebCore::SharedTimer*)::$_0>, void ()>::operator()() + 41 (functional:1437)
22  com.apple.WebCore             	0x000000011975b3ea std::__1::function<void ()>::operator()() const + 26 (functional:1817)
23  com.apple.WebCore             	0x000000011b4ee2a8 WebCore::MainThreadSharedTimer::fired() + 104 (MainThreadSharedTimer.cpp:53)
24  com.apple.WebCore             	0x000000011b4ee639 WebCore::timerFired(__CFRunLoopTimer*, void*) + 41 (MainThreadSharedTimerCF.cpp:74)
25  com.apple.CoreFoundation      	0x00007fffaf2a0de4 __CFRUNLOOP_IS_CALLING_OUT_TO_A_TIMER_CALLBACK_FUNCTION__ + 20
26  com.apple.CoreFoundation      	0x00007fffaf2a0a73 __CFRunLoopDoTimer + 1075
27  com.apple.CoreFoundation      	0x00007fffaf2a05ca __CFRunLoopDoTimers + 298
28  com.apple.CoreFoundation      	0x00007fffaf297fa1 __CFRunLoopRun + 2081
29  com.apple.CoreFoundation      	0x00007fffaf297524 CFRunLoopRunSpecific + 420
30  com.apple.HIToolbox           	0x00007fffae7f7ebc RunCurrentEventLoopInMode + 240
31  com.apple.HIToolbox           	0x00007fffae7f7cf1 ReceiveNextEventCommon + 432
32  com.apple.HIToolbox           	0x00007fffae7f7b26 _BlockUntilNextEventMatchingListInModeWithFilter + 71
33  com.apple.AppKit              	0x00007fffacd92e24 _DPSNextEvent + 1120
34  com.apple.AppKit              	0x00007fffad50e85e -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] + 2796
35  com.apple.AppKit              	0x00007fffacd877ab -[NSApplication run] + 926
36  com.apple.AppKit              	0x00007fffacd521de NSApplicationMain + 1237
37  libxpc.dylib                  	0x00007fffc52118c7 _xpc_objc_main + 775
38  libxpc.dylib                  	0x00007fffc52102e4 xpc_main + 494
39  com.apple.WebKit.WebContent   	0x000000010c2e8105 main + 1189 (XPCServiceMain.mm:148)
40  libdyld.dylib                 	0x00007fffc4fb8235 start + 1
Comment 2 Matt Lewis 2017-04-27 16:49:43 PDT
This does not appear to be a recent regression and may already be tracked with:
rdar://problem/30945281
Comment 3 Matt Lewis 2017-04-27 17:39:02 PDT
Created attachment 308485 [details]
Patch
Comment 4 WebKit Commit Bot 2017-04-27 18:23:48 PDT
Comment on attachment 308485 [details]
Patch

Clearing flags on attachment: 308485

Committed r215912: <http://trac.webkit.org/changeset/215912>
Comment 5 WebKit Commit Bot 2017-04-27 18:23:50 PDT
All reviewed patches have been landed.  Closing bug.
Comment 6 Ryan Haddad 2017-04-27 18:25:49 PDT
Reopening because the patch only marked the test as flaky.
Comment 7 Joseph Pecoraro 2017-04-27 23:57:11 PDT
At this point the crash log is not available via the links. It would be great if crash logs were attached to the bugzilla bug when filing bugs about a crash.
Comment 8 Joseph Pecoraro 2017-04-28 00:09:11 PDT
This test just loads a page with an  <object>. I think its possible that this test itself is not the cause of the crash, but another test (perhaps the one or two preceeding it) that did not clean up properly. So skipping this test if the actual cause is another test does little good.

The crash is happening due to a timer from a GenericEventQueue firing. GenericTaskQueues are primarily used by Media code, and there aren't many. Perhaps we can just audit uses of GenericTaskQueues.
Comment 9 Joseph Pecoraro 2017-04-28 00:30:26 PDT
It doesn't look like an assert or a nullptr crash. From the snipper of a crash log provided in the earlier comment this looks like it could be a dangling pointer.

Since the number of GenericEventQueues is small I looked around.

HTMLMediaElement closes all its GenericTaskQueues in HTMLMediaElement::contextDestroyed() but doesn't close its GenericEventQueue. That seems rather suspect to me:

  (1) GenericTaskQueue was in the backtrace

  (2) The ScriptExecutionContext is apparently being destroyed here, so if
      a timer eventually fires for this queue it would be bad. Hence the crash
      apparently attempting to check the type of a ScriptExecutionContext.

Jer / Eric, what do you think? Should HTMLMediaElement be closing the m_asyncEventQueue here? Would it have been guaranteed to have been closed / cleared earlier?
Comment 10 Eric Carlson 2017-04-28 08:20:45 PDT
(In reply to Joseph Pecoraro from comment #9)
> It doesn't look like an assert or a nullptr crash. From the snipper of a
> crash log provided in the earlier comment this looks like it could be a
> dangling pointer.
> 
> Since the number of GenericEventQueues is small I looked around.
> 
> HTMLMediaElement closes all its GenericTaskQueues in
> HTMLMediaElement::contextDestroyed() but doesn't close its
> GenericEventQueue. That seems rather suspect to me:
> 
>   (1) GenericTaskQueue was in the backtrace
> 
>   (2) The ScriptExecutionContext is apparently being destroyed here, so if
>       a timer eventually fires for this queue it would be bad. Hence the
> crash
>       apparently attempting to check the type of a ScriptExecutionContext.
> 
> Jer / Eric, what do you think? Should HTMLMediaElement be closing the
> m_asyncEventQueue here? Would it have been guaranteed to have been closed /
> cleared earlier?

The m_asyncEventQueue is closed in HTMLMediaElement::stop, which should be called before contextDestroyed, but it does seem wrong that all of the other queues are closed in HTMLMediaElement::contextDestroyed. 

m_promiseTaskQueue is closed in both HTMLMediaElement::contextDestroyed and HTMLMediaElement::stop, and m_asyncEventQueue is closed in HTMLMediaElement::stop and in the destructor. I don't know if it will fix this crash, but it may be useful to put all of the queue/timer cleanup one place and add asserts in the places we assume everything has been cleaned up.
Comment 11 Jer Noble 2017-04-28 08:30:04 PDT
(In reply to Eric Carlson from comment #10)
> 
> The m_asyncEventQueue is closed in HTMLMediaElement::stop, which should be
> called before contextDestroyed, but it does seem wrong that all of the other
> queues are closed in HTMLMediaElement::contextDestroyed. 
> 
> m_promiseTaskQueue is closed in both HTMLMediaElement::contextDestroyed and
> HTMLMediaElement::stop, and m_asyncEventQueue is closed in
> HTMLMediaElement::stop and in the destructor. I don't know if it will fix
> this crash, but it may be useful to put all of the queue/timer cleanup one
> place and add asserts in the places we assume everything has been cleaned up.

We could probably fix this narrow issue by closing the queue in contextDestroyed(), but perhaps the more interesting issue is why the ScriptExecutionContext/Document was destroyed without first calling stopActiveDOMObjects(), or why the HTMLMediaElement wasn't considered to be an ActiveDOMObject at destruction time.
Comment 12 Chris Dumez 2017-05-02 09:13:08 PDT
Assuming the EventTarget is a Node (as it is likely the case since most GenericTaskQueue queues are on HTMLMediaElement), then scriptExecution() context is supposed to return the Node's document(). A node's document() is always valid & alive as long as the Node is itself alive.

Therefore, I suspect that either:
1. The EventTarget is not a Node
2. The HTMLMediaElement is itself dead.
Comment 13 Chris Dumez 2017-05-02 09:27:33 PDT
Based on the trace, it is a GenericEventQueue so it could be:
- MediaKeySession::m_eventQueue
- WebKitMediaKeySession::m_asyncEventQueue
- MediaSource::m_asyncEventQueue
- SourceBuffer::m_asyncEventQueue
- SourceBufferList::m_asyncEventQueue
- AudioContext::m_eventQueue
- HTMLMediaElement::m_asyncEventQueue
- TrackListBase::m_asyncEventQueue

As Joe pointed out, the crash is unlikely to be caused by imported/blink/paint/deprecatedpaintlayer/non-self-painting-layer-overrides-visibility.html but rather a test that ran shortly before it. Given that this is the first test in LayoutTests/imported/blink/paint/ folder, I think it is likely caused by a test inside LayoutTests/imported/blink/media/ folder.
Comment 14 Chris Dumez 2017-05-02 09:44:48 PDT
SourceBufferList is potentially suspicious as it:
- Has a GenericEventQueue data member
- overrides EventTarget::scriptExecutionContext() to return its m_scriptExecutionContext member which is an unprotected raw pointer.
- I see no code clearing m_scriptExecutionContext when the document gets destroyed

We probably want SourceBufferList to subclass ContextDestructionObserver instead.

Similar issue for TrackListBase.

TextTrack::m_scriptExecution context also looks potentially unsafe, although TextTrack does not have a GenericEventQueue (it could still be the Event::target() of an Event that is inside the GenericEventQueue).
Comment 15 Chris Dumez 2017-05-02 10:19:20 PDT
<rdar://problem/30945281>
Comment 16 Chris Dumez 2017-05-02 10:40:15 PDT
Created attachment 308832 [details]
Patch
Comment 17 Chris Dumez 2017-05-02 12:04:03 PDT
Comment on attachment 308832 [details]
Patch

Clearing flags on attachment: 308832

Committed r216084: <http://trac.webkit.org/changeset/216084>
Comment 18 Chris Dumez 2017-05-02 12:04:05 PDT
All reviewed patches have been landed.  Closing bug.
Comment 19 Lucas Forschler 2019-02-06 09:18:32 PST
Mass move bugs into the DOM component.