RESOLVED DUPLICATE of bug 191217 180174
REGRESSION: Layout Test storage/indexeddb/detached-iframe.html is a flaky crash
https://bugs.webkit.org/show_bug.cgi?id=180174
Summary REGRESSION: Layout Test storage/indexeddb/detached-iframe.html is a flaky crash
Matt Lewis
Reported 2017-11-29 16:44:21 PST
Created attachment 327924 [details] WK1 Crash-Log storage/indexeddb/detached-iframe.html is a flaky crash on macOS according to the flakiness dashboard: https://webkit-test-results.webkit.org/dashboards/flakiness_dashboard.html#showAllRuns=true&tests=storage%2Findexeddb%2Fdetached-iframe.html Crashed Thread: Thread 0 Crashed:: Dispatch queue: com.apple.main-thread 0 com.apple.WebCore 0x0000000450402a8c WebCore::Frame::page() const + 12 (Frame.h:373) 1 com.apple.WebCore 0x0000000451830325 WebCore::InspectorInstrumentation::instrumentingAgentsForFrame(WebCore::Frame&) + 21 (InspectorInstrumentation.h:1431) 2 com.apple.WebCore 0x0000000452480116 WebCore::InspectorInstrumentation::didDispatchPostMessage(WebCore::Frame&, WebCore::TimerBase&) + 54 (InspectorInstrumentation.h:731) 3 com.apple.WebCore 0x000000045247fe57 WebCore::DOMWindow::postMessageTimerFired(WebCore::PostMessageTimer&) + 551 (DOMWindow.cpp:973) 4 com.apple.WebCore 0x0000000452491cfe WebCore::PostMessageTimer::fired() + 350 (DOMWindow.cpp:176) 5 com.apple.WebCore 0x00000004526a4964 WebCore::ThreadTimers::sharedTimerFiredInternal() + 452 (ThreadTimers.cpp:118) 6 com.apple.WebCore 0x00000004526b91c1 WebCore::ThreadTimers::setSharedTimer(WebCore::SharedTimer*)::$_0::operator()() const + 33 (ThreadTimers.cpp:70) 7 com.apple.WebCore 0x00000004526b9179 WTF::Function<void ()>::CallableWrapper<WebCore::ThreadTimers::setSharedTimer(WebCore::SharedTimer*)::$_0>::call() + 25 (Function.h:101) 8 com.apple.WebCore 0x000000045000f1ab WTF::Function<void ()>::operator()() const + 139 (Function.h:56) 9 com.apple.WebCore 0x000000045267c3d5 WebCore::MainThreadSharedTimer::fired() + 101 (MainThreadSharedTimer.cpp:55) 10 com.apple.WebCore 0x000000045271bdb9 WebCore::timerFired(__CFRunLoopTimer*, void*) + 41 (MainThreadSharedTimerCF.cpp:74) 11 com.apple.CoreFoundation 0x00007fff31a46084 __CFRUNLOOP_IS_CALLING_OUT_TO_A_TIMER_CALLBACK_FUNCTION__ + 20 12 com.apple.CoreFoundation 0x00007fff31a45d04 __CFRunLoopDoTimer + 1108 13 com.apple.CoreFoundation 0x00007fff31a457fa __CFRunLoopDoTimers + 346 14 com.apple.CoreFoundation 0x00007fff31a3cfcb __CFRunLoopRun + 2427 15 com.apple.CoreFoundation 0x00007fff31a3c3b7 CFRunLoopRunSpecific + 487 16 com.apple.HIToolbox 0x00007fff30d49e26 RunCurrentEventLoopInMode + 286 17 com.apple.HIToolbox 0x00007fff30d49b96 ReceiveNextEventCommon + 613 18 com.apple.HIToolbox 0x00007fff30d49914 _BlockUntilNextEventMatchingListInModeWithFilter + 64 19 com.apple.AppKit 0x00007fff2f014f5f _DPSNextEvent + 2085 20 com.apple.AppKit 0x00007fff2f7aab4c -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] + 3044 21 com.apple.AppKit 0x00007fff2f009d6d -[NSApplication run] + 764 22 com.apple.AppKit 0x00007fff2efd8f1a NSApplicationMain + 804 23 libxpc.dylib 0x00007fff595ff42f _xpc_objc_main + 580 24 libxpc.dylib 0x00007fff595fe082 xpc_main + 417 25 com.apple.WebKit.WebContent 0x000000010ea4912b main + 1195 26 libdyld.dylib 0x00007fff59332115 start + 1 I have yet to figure out a regression point, but the first failure happened with: https://build.webkit.org/builders/Apple%20High%20Sierra%20Release%20WK2%20(Tests)/builds/1145 The crash is slightly different between the WK1 and WK2 crashes.
Attachments
WK1 Crash-Log (114.69 KB, text/plain)
2017-11-29 16:44 PST, Matt Lewis
no flags
WK2 Crash Log (95.98 KB, text/plain)
2017-11-29 16:44 PST, Matt Lewis
no flags
Patch (6.11 KB, patch)
2017-12-04 09:38 PST, Matt Baker
joepeck: review-
Matt Lewis
Comment 1 2017-11-29 16:44:56 PST
Created attachment 327925 [details] WK2 Crash Log Attaching the WK2 crash log as well
Radar WebKit Bug Importer
Comment 2 2017-11-29 16:47:45 PST
Alexey Proskuryakov
Comment 3 2017-12-01 11:07:40 PST
The first failure on the bots was on 2017-11-14 around 1 am. It's not super frequent, but clearly a recent regression. There are also some STP crashes with similar signatures, which I didn't look into carefully.
Joseph Pecoraro
Comment 4 2017-12-01 11:15:01 PST
The crash looks like a null Frame: > Exception Type: EXC_BAD_ACCESS (SIGSEGV) > Exception Codes: KERN_INVALID_ADDRESS at 0x0000000000000040 > > ... > > Thread 0 Crashed:: Dispatch queue: com.apple.main-thread > 0 com.apple.WebCore 0x0000000450402a8c WebCore::Frame::page() const + 12 (Frame.h:373) > 1 com.apple.WebCore 0x0000000451830325 WebCore::InspectorInstrumentation::instrumentingAgentsForFrame(WebCore::Frame&) + 21 (InspectorInstrumentation.h:1431) > 2 com.apple.WebCore 0x0000000452480116 WebCore::InspectorInstrumentation::didDispatchPostMessage(WebCore::Frame&, WebCore::TimerBase&) + 54 (InspectorInstrumentation.h:731) > 3 com.apple.WebCore 0x000000045247fe57 WebCore::DOMWindow::postMessageTimerFired(WebCore::PostMessageTimer&) + 551 (DOMWindow.cpp:973) Coming from: > void DOMWindow::postMessageTimerFired(PostMessageTimer& timer) > { > if (!document() || !isCurrentlyDisplayedInFrame()) > return; > > ... > > InspectorInstrumentation::willDispatchPostMessage(*m_frame, timer); > > dispatchEvent(timer.event(*document())); > > InspectorInstrumentation::didDispatchPostMessage(*m_frame, timer); > } So my guess is `m_frame` is nullptr by didDispatchPostMessage time (and presumably wasn't at willDispatchPostMessage time).
Matt Baker
Comment 5 2017-12-01 11:46:46 PST
(In reply to Joseph Pecoraro from comment #4) > The crash looks like a null Frame: > > > Exception Type: EXC_BAD_ACCESS (SIGSEGV) > > Exception Codes: KERN_INVALID_ADDRESS at 0x0000000000000040 > > > > ... > > > > Thread 0 Crashed:: Dispatch queue: com.apple.main-thread > > 0 com.apple.WebCore 0x0000000450402a8c WebCore::Frame::page() const + 12 (Frame.h:373) > > 1 com.apple.WebCore 0x0000000451830325 WebCore::InspectorInstrumentation::instrumentingAgentsForFrame(WebCore::Frame&) + 21 (InspectorInstrumentation.h:1431) > > 2 com.apple.WebCore 0x0000000452480116 WebCore::InspectorInstrumentation::didDispatchPostMessage(WebCore::Frame&, WebCore::TimerBase&) + 54 (InspectorInstrumentation.h:731) > > 3 com.apple.WebCore 0x000000045247fe57 WebCore::DOMWindow::postMessageTimerFired(WebCore::PostMessageTimer&) + 551 (DOMWindow.cpp:973) > > Coming from: > > > void DOMWindow::postMessageTimerFired(PostMessageTimer& timer) > > { > > if (!document() || !isCurrentlyDisplayedInFrame()) > > return; > > > > ... > > > > InspectorInstrumentation::willDispatchPostMessage(*m_frame, timer); > > > > dispatchEvent(timer.event(*document())); > > > > InspectorInstrumentation::didDispatchPostMessage(*m_frame, timer); > > } > > So my guess is `m_frame` is nullptr by didDispatchPostMessage time (and > presumably wasn't at willDispatchPostMessage time). So something is calling FrameDestructionObserver::frameDestroyed() during event dispatch. If this is the case, can we still get the PageDebuggerAgent for the frame, in order to clean up async call stack state? Is the agent even around anymore?
Joseph Pecoraro
Comment 6 2017-12-01 11:50:35 PST
> > So my guess is `m_frame` is nullptr by didDispatchPostMessage time (and > > presumably wasn't at willDispatchPostMessage time). > > So something is calling FrameDestructionObserver::frameDestroyed() during > event dispatch. If this is the case, can we still get the PageDebuggerAgent > for the frame, in order to clean up async call stack state? Is the agent > even around anymore? The crash happens in looking up the list of agents from the frame (InspectorInstrumentation level), but the frame is null. And PageDebuggerAgent is 1 per Page anyways, so I would expect it to exist, but we still crash before reaching that point.
Matt Baker
Comment 7 2017-12-04 09:38:37 PST
Alex Christensen
Comment 8 2017-12-05 15:11:45 PST
Comment on attachment 328353 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=328353&action=review > Source/WebCore/page/DOMWindow.cpp:954 > - InspectorInstrumentation::didPostMessage(*m_frame, *timer, state); > + InspectorInstrumentation::didPostMessage(*document(), *timer, state); I don't understand how this will fix anything. We are just changing the assumption that the frame is non-null to an assumption that the document is non-null. Is that assumption more valid? Why don't we want an early return or something?
Joseph Pecoraro
Comment 9 2017-12-05 15:42:39 PST
Comment on attachment 328353 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=328353&action=review >> Source/WebCore/page/DOMWindow.cpp:954 >> + InspectorInstrumentation::didPostMessage(*document(), *timer, state); > > I don't understand how this will fix anything. We are just changing the assumption that the frame is non-null to an assumption that the document is non-null. Is that assumption more valid? Why don't we want an early return or something? Yeah, I agree. I think the specific concern was at line 984, after dispatchEvent() the document and frame might not exist so we should check instead of assuming either one.
Joseph Pecoraro
Comment 10 2017-12-05 15:43:40 PST
Comment on attachment 328353 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=328353&action=review >>> Source/WebCore/page/DOMWindow.cpp:954 >>> + InspectorInstrumentation::didPostMessage(*document(), *timer, state); >> >> I don't understand how this will fix anything. We are just changing the assumption that the frame is non-null to an assumption that the document is non-null. Is that assumption more valid? Why don't we want an early return or something? > > Yeah, I agree. I think the specific concern was at line 984, after dispatchEvent() the document and frame might not exist so we should check instead of assuming either one. And if possible we should write a test. Script on the page can register for that dispatchEvent and close the frame in the JS handler.
Matt Baker
Comment 11 2017-12-06 12:07:51 PST
Comment on attachment 328353 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=328353&action=review >>>> Source/WebCore/page/DOMWindow.cpp:954 >>>> + InspectorInstrumentation::didPostMessage(*document(), *timer, state); >>> >>> I don't understand how this will fix anything. We are just changing the assumption that the frame is non-null to an assumption that the document is non-null. Is that assumption more valid? Why don't we want an early return or something? >> >> Yeah, I agree. I think the specific concern was at line 984, after dispatchEvent() the document and frame might not exist so we should check instead of assuming either one. > > And if possible we should write a test. Script on the page can register for that dispatchEvent and close the frame in the JS handler. The document/frame/whatever that is passed to Inspector instrumentation is only used to retrieve instrumenting agents, so we just need something that will outlive the detached frame. We can use page() instead. As far as a test goes, detached-frame.html performs the same steps you describe (closing the frame in the JS handler during dispatchEvent). Is the existing test insufficient?
Joseph Pecoraro
Comment 12 2017-12-06 12:22:43 PST
> >> Yeah, I agree. I think the specific concern was at line 984, after dispatchEvent() the document and frame might not exist so we should check instead of assuming either one. > > > > And if possible we should write a test. Script on the page can register for that dispatchEvent and close the frame in the JS handler. > > The document/frame/whatever that is passed to Inspector instrumentation is > only used to retrieve instrumenting agents, so we just need something that > will outlive the detached frame. We can use page() instead. page() just uses m_frame under the hood, and can be null: Page* DOMWindow::page() { return frame() ? frame()->page() : nullptr; } So it is no different, and would still need an if check. > As far as a test goes, detached-frame.html performs the same steps you > describe (closing the frame in the JS handler during dispatchEvent). > Is the existing test insufficient? I suppose it is, but its flakey and could go away, be skipped, or changed and we'd never know if there is coverage for this issue which we want to fix. Ideally there would be a specific test covering this specific scenario. If one can't be constructed, then we can fallback to the existing test.
Sihui Liu
Comment 13 2018-11-08 15:33:39 PST
*** This bug has been marked as a duplicate of bug 191217 ***
Note You need to log in before you can comment on or make changes to this bug.