Bug 180174

Summary: REGRESSION: Layout Test storage/indexeddb/detached-iframe.html is a flaky crash
Product: WebKit Reporter: Matt Lewis <jlewis3>
Component: Web InspectorAssignee: Matt Baker <mattbaker>
Status: RESOLVED DUPLICATE    
Severity: Normal CC: achristensen, ap, cdumez, inspector-bugzilla-changes, joepeck, mattbaker, sihui_liu, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
WK1 Crash-Log
none
WK2 Crash Log
none
Patch joepeck: review-

Description Matt Lewis 2017-11-29 16:44:21 PST
Created attachment 327924 [details]
WK1 Crash-Log

storage/indexeddb/detached-iframe.html is a flaky crash on macOS according to the flakiness dashboard:
https://webkit-test-results.webkit.org/dashboards/flakiness_dashboard.html#showAllRuns=true&tests=storage%2Findexeddb%2Fdetached-iframe.html

Crashed Thread:

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0   com.apple.WebCore             	0x0000000450402a8c WebCore::Frame::page() const + 12 (Frame.h:373)
1   com.apple.WebCore             	0x0000000451830325 WebCore::InspectorInstrumentation::instrumentingAgentsForFrame(WebCore::Frame&) + 21 (InspectorInstrumentation.h:1431)
2   com.apple.WebCore             	0x0000000452480116 WebCore::InspectorInstrumentation::didDispatchPostMessage(WebCore::Frame&, WebCore::TimerBase&) + 54 (InspectorInstrumentation.h:731)
3   com.apple.WebCore             	0x000000045247fe57 WebCore::DOMWindow::postMessageTimerFired(WebCore::PostMessageTimer&) + 551 (DOMWindow.cpp:973)
4   com.apple.WebCore             	0x0000000452491cfe WebCore::PostMessageTimer::fired() + 350 (DOMWindow.cpp:176)
5   com.apple.WebCore             	0x00000004526a4964 WebCore::ThreadTimers::sharedTimerFiredInternal() + 452 (ThreadTimers.cpp:118)
6   com.apple.WebCore             	0x00000004526b91c1 WebCore::ThreadTimers::setSharedTimer(WebCore::SharedTimer*)::$_0::operator()() const + 33 (ThreadTimers.cpp:70)
7   com.apple.WebCore             	0x00000004526b9179 WTF::Function<void ()>::CallableWrapper<WebCore::ThreadTimers::setSharedTimer(WebCore::SharedTimer*)::$_0>::call() + 25 (Function.h:101)
8   com.apple.WebCore             	0x000000045000f1ab WTF::Function<void ()>::operator()() const + 139 (Function.h:56)
9   com.apple.WebCore             	0x000000045267c3d5 WebCore::MainThreadSharedTimer::fired() + 101 (MainThreadSharedTimer.cpp:55)
10  com.apple.WebCore             	0x000000045271bdb9 WebCore::timerFired(__CFRunLoopTimer*, void*) + 41 (MainThreadSharedTimerCF.cpp:74)
11  com.apple.CoreFoundation      	0x00007fff31a46084 __CFRUNLOOP_IS_CALLING_OUT_TO_A_TIMER_CALLBACK_FUNCTION__ + 20
12  com.apple.CoreFoundation      	0x00007fff31a45d04 __CFRunLoopDoTimer + 1108
13  com.apple.CoreFoundation      	0x00007fff31a457fa __CFRunLoopDoTimers + 346
14  com.apple.CoreFoundation      	0x00007fff31a3cfcb __CFRunLoopRun + 2427
15  com.apple.CoreFoundation      	0x00007fff31a3c3b7 CFRunLoopRunSpecific + 487
16  com.apple.HIToolbox           	0x00007fff30d49e26 RunCurrentEventLoopInMode + 286
17  com.apple.HIToolbox           	0x00007fff30d49b96 ReceiveNextEventCommon + 613
18  com.apple.HIToolbox           	0x00007fff30d49914 _BlockUntilNextEventMatchingListInModeWithFilter + 64
19  com.apple.AppKit              	0x00007fff2f014f5f _DPSNextEvent + 2085
20  com.apple.AppKit              	0x00007fff2f7aab4c -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] + 3044
21  com.apple.AppKit              	0x00007fff2f009d6d -[NSApplication run] + 764
22  com.apple.AppKit              	0x00007fff2efd8f1a NSApplicationMain + 804
23  libxpc.dylib                  	0x00007fff595ff42f _xpc_objc_main + 580
24  libxpc.dylib                  	0x00007fff595fe082 xpc_main + 417
25  com.apple.WebKit.WebContent   	0x000000010ea4912b main + 1195
26  libdyld.dylib                 	0x00007fff59332115 start + 1



I have yet to figure out a regression point, but the first failure happened with:
https://build.webkit.org/builders/Apple%20High%20Sierra%20Release%20WK2%20(Tests)/builds/1145

The crash is slightly different between the WK1 and WK2 crashes.
Comment 1 Matt Lewis 2017-11-29 16:44:56 PST
Created attachment 327925 [details]
WK2 Crash Log

Attaching the WK2 crash log as well
Comment 2 Radar WebKit Bug Importer 2017-11-29 16:47:45 PST
<rdar://problem/35761228>
Comment 3 Alexey Proskuryakov 2017-12-01 11:07:40 PST
The first failure on the bots was on 2017-11-14 around 1 am. It's not super frequent, but clearly a recent regression.

There are also some STP crashes with similar signatures, which I didn't look into carefully.
Comment 4 Joseph Pecoraro 2017-12-01 11:15:01 PST
The crash looks like a null Frame:

> Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
> Exception Codes:       KERN_INVALID_ADDRESS at 0x0000000000000040
> 
> ...
> 
> Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
> 0   com.apple.WebCore             	0x0000000450402a8c WebCore::Frame::page() const + 12 (Frame.h:373)
> 1   com.apple.WebCore             	0x0000000451830325 WebCore::InspectorInstrumentation::instrumentingAgentsForFrame(WebCore::Frame&) + 21 (InspectorInstrumentation.h:1431)
> 2   com.apple.WebCore             	0x0000000452480116 WebCore::InspectorInstrumentation::didDispatchPostMessage(WebCore::Frame&, WebCore::TimerBase&) + 54 (InspectorInstrumentation.h:731)
> 3   com.apple.WebCore             	0x000000045247fe57 WebCore::DOMWindow::postMessageTimerFired(WebCore::PostMessageTimer&) + 551 (DOMWindow.cpp:973)

Coming from:

>    void DOMWindow::postMessageTimerFired(PostMessageTimer& timer)
>    {
>        if (!document() || !isCurrentlyDisplayedInFrame())
>            return;
>
>        ...
>
>        InspectorInstrumentation::willDispatchPostMessage(*m_frame, timer);
>
>        dispatchEvent(timer.event(*document()));
>
>        InspectorInstrumentation::didDispatchPostMessage(*m_frame, timer);
>    }

So my guess is `m_frame` is nullptr by didDispatchPostMessage time (and presumably wasn't at willDispatchPostMessage time).
Comment 5 Matt Baker 2017-12-01 11:46:46 PST
(In reply to Joseph Pecoraro from comment #4)
> The crash looks like a null Frame:
> 
> > Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
> > Exception Codes:       KERN_INVALID_ADDRESS at 0x0000000000000040
> > 
> > ...
> > 
> > Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
> > 0   com.apple.WebCore             	0x0000000450402a8c WebCore::Frame::page() const + 12 (Frame.h:373)
> > 1   com.apple.WebCore             	0x0000000451830325 WebCore::InspectorInstrumentation::instrumentingAgentsForFrame(WebCore::Frame&) + 21 (InspectorInstrumentation.h:1431)
> > 2   com.apple.WebCore             	0x0000000452480116 WebCore::InspectorInstrumentation::didDispatchPostMessage(WebCore::Frame&, WebCore::TimerBase&) + 54 (InspectorInstrumentation.h:731)
> > 3   com.apple.WebCore             	0x000000045247fe57 WebCore::DOMWindow::postMessageTimerFired(WebCore::PostMessageTimer&) + 551 (DOMWindow.cpp:973)
> 
> Coming from:
> 
> >    void DOMWindow::postMessageTimerFired(PostMessageTimer& timer)
> >    {
> >        if (!document() || !isCurrentlyDisplayedInFrame())
> >            return;
> >
> >        ...
> >
> >        InspectorInstrumentation::willDispatchPostMessage(*m_frame, timer);
> >
> >        dispatchEvent(timer.event(*document()));
> >
> >        InspectorInstrumentation::didDispatchPostMessage(*m_frame, timer);
> >    }
> 
> So my guess is `m_frame` is nullptr by didDispatchPostMessage time (and
> presumably wasn't at willDispatchPostMessage time).

So something is calling FrameDestructionObserver::frameDestroyed() during event dispatch. If this is the case, can we still get the PageDebuggerAgent for the frame, in order to clean up async call stack state? Is the agent even around anymore?
Comment 6 Joseph Pecoraro 2017-12-01 11:50:35 PST
> > So my guess is `m_frame` is nullptr by didDispatchPostMessage time (and
> > presumably wasn't at willDispatchPostMessage time).
> 
> So something is calling FrameDestructionObserver::frameDestroyed() during
> event dispatch. If this is the case, can we still get the PageDebuggerAgent
> for the frame, in order to clean up async call stack state? Is the agent
> even around anymore?

The crash happens in looking up the list of agents from the frame (InspectorInstrumentation level), but the frame is null.

And PageDebuggerAgent is 1 per Page anyways, so I would expect it to exist, but we still crash before reaching that point.
Comment 7 Matt Baker 2017-12-04 09:38:37 PST
Created attachment 328353 [details]
Patch
Comment 8 Alex Christensen 2017-12-05 15:11:45 PST
Comment on attachment 328353 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=328353&action=review

> Source/WebCore/page/DOMWindow.cpp:954
> -    InspectorInstrumentation::didPostMessage(*m_frame, *timer, state);
> +    InspectorInstrumentation::didPostMessage(*document(), *timer, state);

I don't understand how this will fix anything.  We are just changing the assumption that the frame is non-null to an assumption that the document is non-null.  Is that assumption more valid?  Why don't we want an early return or something?
Comment 9 Joseph Pecoraro 2017-12-05 15:42:39 PST
Comment on attachment 328353 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=328353&action=review

>> Source/WebCore/page/DOMWindow.cpp:954
>> +    InspectorInstrumentation::didPostMessage(*document(), *timer, state);
> 
> I don't understand how this will fix anything.  We are just changing the assumption that the frame is non-null to an assumption that the document is non-null.  Is that assumption more valid?  Why don't we want an early return or something?

Yeah, I agree. I think the specific concern was at line 984, after dispatchEvent() the document and frame might not exist so we should check instead of assuming either one.
Comment 10 Joseph Pecoraro 2017-12-05 15:43:40 PST
Comment on attachment 328353 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=328353&action=review

>>> Source/WebCore/page/DOMWindow.cpp:954
>>> +    InspectorInstrumentation::didPostMessage(*document(), *timer, state);
>> 
>> I don't understand how this will fix anything.  We are just changing the assumption that the frame is non-null to an assumption that the document is non-null.  Is that assumption more valid?  Why don't we want an early return or something?
> 
> Yeah, I agree. I think the specific concern was at line 984, after dispatchEvent() the document and frame might not exist so we should check instead of assuming either one.

And if possible we should write a test. Script on the page can register for that dispatchEvent and close the frame in the JS handler.
Comment 11 Matt Baker 2017-12-06 12:07:51 PST
Comment on attachment 328353 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=328353&action=review

>>>> Source/WebCore/page/DOMWindow.cpp:954
>>>> +    InspectorInstrumentation::didPostMessage(*document(), *timer, state);
>>> 
>>> I don't understand how this will fix anything.  We are just changing the assumption that the frame is non-null to an assumption that the document is non-null.  Is that assumption more valid?  Why don't we want an early return or something?
>> 
>> Yeah, I agree. I think the specific concern was at line 984, after dispatchEvent() the document and frame might not exist so we should check instead of assuming either one.
> 
> And if possible we should write a test. Script on the page can register for that dispatchEvent and close the frame in the JS handler.

The document/frame/whatever that is passed to Inspector instrumentation is only used to retrieve instrumenting agents, so we just need something that will outlive the detached frame. We can use page() instead. As far as a test goes, detached-frame.html performs the same steps you describe (closing the frame in the JS handler during dispatchEvent). Is the existing test insufficient?
Comment 12 Joseph Pecoraro 2017-12-06 12:22:43 PST
> >> Yeah, I agree. I think the specific concern was at line 984, after dispatchEvent() the document and frame might not exist so we should check instead of assuming either one.
> > 
> > And if possible we should write a test. Script on the page can register for that dispatchEvent and close the frame in the JS handler.
> 
> The document/frame/whatever that is passed to Inspector instrumentation is
> only used to retrieve instrumenting agents, so we just need something that
> will outlive the detached frame. We can use page() instead.

page() just uses m_frame under the hood, and can be null:

    Page* DOMWindow::page()
    {
        return frame() ? frame()->page() : nullptr;
    }

So it is no different, and would still need an if check.


> As far as a test goes, detached-frame.html performs the same steps you
> describe (closing the frame in the JS handler during dispatchEvent).
> Is the existing test insufficient?

I suppose it is, but its flakey and could go away, be skipped, or changed and we'd never know if there is coverage for this issue which we want to fix. Ideally there would be a specific test covering this specific scenario. If one can't be constructed, then we can fallback to the existing test.
Comment 13 Sihui Liu 2018-11-08 15:33:39 PST

*** This bug has been marked as a duplicate of bug 191217 ***