RESOLVED FIXED 237917
[WPE][GTK] Fix a crash after r290360
https://bugs.webkit.org/show_bug.cgi?id=237917
Summary [WPE][GTK] Fix a crash after r290360
Pablo Saavedra
Reported 2022-03-15 13:15:02 PDT
When navigating from one website to another with a different domain with `JS location.replace("https://other.domain.foo")` there is chances to get this crash: ``` was generated by `/usr/libexec/wpe-webkit-1.0/WPEWebProcess 17 75'. Program terminated with signal SIGSEGV, Segmentation fault. #0 0x74eeb448 in WebKit::WebProcess::terminate() () from /webkit/usr/lib/libWPEWebKit-1.0.so.3.16.8 [Current thread is 1 (LWP 115)] (gdb) bt #0 0x74eeb448 in WebKit::WebProcess::terminate() () from /webkit/usr/lib/libWPEWebKit-1.0.so.3.16.8 #1 0x74eeb2dc in WebKit::WebProcess::removeWebPage(WTF::ObjectIdentifier<WebCore::PageIdentifierType>) () from /webkit/usr/lib/libWPEWebKit-1.0.so.3.16.8 #2 0x74f75554 in WebKit::WebPage::close() () from /webkit/usr/lib/libWPEWebKit-1.0.so.3.16.8 #3 0x74f94c96 in WebKit::WebProcess::stopRunLoop() () from /webkit/usr/lib/libWPEWebKit-1.0.so.3.16.8 #4 0x74d62986 in IPC::Connection::dispatchMessage(std::unique_ptr<IPC::Decoder, std::default_delete<IPC::Decoder> >) () from /webkit/usr/lib/libWPEWebKit-1.0.so.3.16.8 #5 0x74d62c22 in IPC::Connection::dispatchOneIncomingMessage() () from /webkit/usr/lib/libWPEWebKit-1.0.so.3.16.8 #6 0x7686b89a in WTF::RunLoop::performWork() () from /webkit/usr/lib/libWPEWebKit-1.0.so.3.16.8 #7 0x768a6f70 in WTF::RunLoop::RunLoop()::$_1::__invoke(void*) () from /webkit/usr/lib/libWPEWebKit-1.0.so.3.16.8 #8 0x768a6664 in WTF::RunLoop::$_0::__invoke(_GSource*, int (*)(void*), void*) () from /webkit/usr/lib/libWPEWebKit-1.0.so.3.16.8 #9 0x7453d7b6 in g_main_dispatch (context=0x19948c8) at ../glib-2.62.6/glib/gmain.c:3216 #10 g_main_context_dispatch (context=context@entry=0x19948c8) at ../glib-2.62.6/glib/gmain.c:3908 #11 0x7453da4c in g_main_context_iterate (context=0x19948c8, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at ../glib-2.62.6/glib/gmain.c:3981 #12 0x7453dcb8 in g_main_loop_run (loop=0x1995e58) at ../glib-2.62.6/glib/gmain.c:4175 #13 0x768a6ab0 in WTF::RunLoop::run() () from /webkit/usr/lib/libWPEWebKit-1.0.so.3.16.8 #14 0x74f95620 in int WebKit::AuxiliaryProcessMain<WebKit::WebProcessMainWPE>(int, char**) () from /webkit/usr/lib/libWPEWebKit-1.0.so.3.16.8 #15 0x748309fa in __libc_start_main (main=0x456fe0, argc=0, argv=0x0, init=<optimized out>, fini=0x455655 <__libc_csu_fini>, rtld_fini=0x76f13029 <_dl_fini>, stack_end=0x7eb164d4) at libc-start.c:308 #16 0x00455508 in _start () at start.S:112 Backtrace stopped: previous frame identical to this frame (corrupt stack?) ```
Attachments
patch (1.33 KB, patch)
2022-03-15 13:29 PDT, Pablo Saavedra
no flags
patch (1.44 KB, patch)
2022-03-15 14:02 PDT, Pablo Saavedra
ews-feeder: commit-queue-
Patch (2.81 KB, patch)
2022-03-17 04:37 PDT, Carlos Garcia Campos
no flags
Pablo Saavedra
Comment 1 2022-03-15 13:23:56 PDT
The segfault is because a doble-call to the WebProcess::terminate() method in the WebProcess::shutdown() path. ``` void WebProcess::terminate() { #ifndef NDEBUG GCController::singleton().garbageCollectNow(); FontCache::singleton().invalidate(); MemoryCache::singleton().setDisabled(true); #endif m_webConnection->invalidate(); <<<<<<<< invalid access during the second invocation m_webConnection = nullptr; <<<<<<<<< set null in the first invocation platformTerminate(); AuxiliaryProcess::terminate(); } ``` Here the stack method calls: AuxiliaryProcess::shutDown(): ``` -> terminate() -> WebProcess::terminate() -> AuxiliaryProcess::terminate() -> AuxiliaryProcess::terminate() -> stopRunLoop() -> WebProcess::stopRunLoop() (from glib/WebProcessGLib.cpp) -> WebPage::close() -> WebProcess::singleton().removeWebPage(m_identifier) -> AuxiliaryProcess::enableTermination() -- m_terminationCounter: 0 -> WebProcess::terminate() ```
Pablo Saavedra
Comment 2 2022-03-15 13:29:01 PDT
Pablo Saavedra
Comment 3 2022-03-15 13:45:44 PDT
With the proposed patch, the new stack could someth WebProcess::stopRunLoop() (from glib/WebProcessGLib.cpp) ``` -> WebPage::close() -> WebProcess::singleton().removeWebPage(m_identifier) -> WebProcess::removeWebPage() -> AuxiliaryProcess::enableTermination() -> WebProcess::terminate() <<<<<<<<<<< Called for each WebProcess associated to each WebPage ... -> WebPage::~WebPage() -> AuxiliaryProcess::stopRunLoop() -> platformStopRunLoop() ```
Pablo Saavedra
Comment 4 2022-03-15 14:02:54 PDT
Michael Catanzaro
Comment 5 2022-03-15 16:56:24 PDT
So many #ifs. I wonder if we can make other ports use the same implementation of AuxiliaryProcess::didClose that we do, where we call stopRunLoop. I know Apple was previously opposed to similar approaches, but I really doubt one run loop iteration should have a major performance impact on process termination speed. That would allow considerably reducing #ifs, and should hopefully make it safer to modify this code without introducing platform-specific problems.
Pablo Saavedra
Comment 6 2022-03-16 02:15:38 PDT
Comment on attachment 454756 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=454756&action=review Notice that this change, even if could be good for avoiding redundant calls to the RunLoop::main().stop(), is not mandatory since it doesn't harms the shutDown logic. > Source/WebKit/Shared/AuxiliaryProcess.cpp:216 > +#if !PLATFORM(GTK) && !PLATFORM(WPE) Notice that this change, even if could be good for avoiding redundant calls to the RunLoop::main().stop(), is not mandatory since it doesn't harms the shutDown logic.
Pablo Saavedra
Comment 7 2022-03-16 02:51:54 PDT
(In reply to Michael Catanzaro from comment #5) > So many #ifs. > > I wonder if we can make other ports use the same implementation of > AuxiliaryProcess::didClose that we do, where we call stopRunLoop. I know > Apple was previously opposed to similar approaches, but I really doubt one > run loop iteration should have a major performance impact on process > termination speed. That would allow considerably reducing #ifs, and should > hopefully make it safer to modify this code without introducing > platform-specific problems. AFAIK, the major concern to move this to the WebProcess.cpp is the loop stop is too slow in some ports. This is the reason why didClose() is implemented with `_exit(EXIT_SUCCESS)`. I don't think that situation changed a lot since the last time this was discussed.
Pablo Saavedra
Comment 8 2022-03-16 02:56:32 PDT
BTW, a simple safe-guard like this also fix the crash: ``` void WebProcess::terminate() { #ifndef NDEBUG GCController::singleton().garbageCollectNow(); FontCache::singleton().invalidate(); MemoryCache::singleton().setDisabled(true); #endif if (m_webConnection) <<<<<<<<<<<<<<< safe-guarded access { m_webConnection->invalidate(); <<<<<<<< invalid access during the second invocation m_webConnection = nullptr; <<<<<<<<< set null in the first invocation } platformTerminate(); AuxiliaryProcess::terminate(); } ``` ; however I don't think the recursive logic followed by the terminate() could be considered strictly correct: ``` -> terminate() -> WebProcess::terminate() -> AuxiliaryProcess::terminate() -> AuxiliaryProcess::terminate() -> stopRunLoop() -> WebProcess::stopRunLoop() (from glib/WebProcessGLib.cpp) -> WebPage::close() -> WebProcess::singleton().removeWebPage(m_identifier) -> AuxiliaryProcess::enableTermination() -- m_terminationCounter: 0 -> WebProcess::terminate() ```
Carlos Garcia Campos
Comment 9 2022-03-16 03:54:10 PDT
Checking m_webConnection was also my first idea, but returning early instead. However, I would like to understand how this can happen, because I would expect the page map to be empty when shutdown is called.
Pablo Saavedra
Comment 10 2022-03-16 07:43:12 PDT
``` XXX ( thread &1844441728 pid &001) -- WebPageProxy::commitProvisionalPage() -- m_process->removeWebPage() XXX ( thread &1844441728 pid &001) -- WebProcessProxy::removeWebPage() XXX ( thread &1844441728 pid &001) -- WebProcessProxy::maybeShutDown() XXX ( thread &1844441728 pid &001) -- WebProcessProxy::canTerminateAuxiliaryProcess() [1] XXX WebProcessProxy::canTerminateAuxiliaryProcess: (pageCount=0, provisionalPageCount=0, m_suspendedPageCount=0, m_isInProcessCache=0, m_shutdownPreventingScopeCounter=0) XXX WebProcessProxy::canTerminateAuxiliaryProcess: true XXX ( thread &1844441728 pid &001) -- WebProcessProxy::maybeShutDown() -> shutDown() XXX ( thread &1844441728 pid &001) -- WebProcessProxy::shutDown() XXX ( thread &1887639264 pid &111) -- AuxiliaryProcess::shutDown() -> terminate() -- m_terminationCounter: 1 XXX ( thread &1887639264 pid &111) -- WebProcess::terminate() -- BEGIN -- this (&0x6fff6000) XXX ( thread &1887639264 pid &111) -- WebProcess::terminate() -- 1 -- this (&0x6fff6000) XXX ( thread &1887639264 pid &111) -- WebProcess::terminate() -- 1.1 -- this (&0x6fff6000) XXX ( thread &1887639264 pid &111) -- WebProcess::terminate() -- 2 -- this (&0x6fff6000) XXX ( thread &1887639264 pid &111) -- WebProcess::terminate() -- 3 -- this (&0x6fff6000) XXX ( thread &1887639264 pid &111) -- glib/WebProcessGLib.cpp -- WebProcess::stopRunLoop() [2] XXX ( thread &1887639264 pid &111) -- WebPage::close() --> This will lead in a 2nd terminate() czll ``` notice that: * in [1] pageCount is WebProcessProxy.m_pageMap.size() -> 0. * in [2] WebPage::close() is called because the m_WebProcess.pageMap is not empty At that point it looks like there is a desynchrony between the WebProcessState state and the WebProcess state
Pablo Saavedra
Comment 11 2022-03-16 08:05:10 PDT
(In reply to Carlos Garcia Campos from comment #9) > Checking m_webConnection was also my first idea, but returning early > instead. However, I would like to understand how this can happen, because I > would expect the page map to be empty when shutdown is called. Notice that a similar decision was taken in the WebProcessProxy: ``` void WebProcessProxy::shutDown() { ... if (m_webConnection) { m_webConnection->invalidate(); m_webConnection = nullptr; } ```
Pablo Saavedra
Comment 12 2022-03-16 09:10:06 PDT
Coming back to the comment:10 This change could prevent the m_pageMap being not empty by calling for the page close() before the page being removed from the WebProcessProxy map: ``` diff --git a/Source/WebKit/UIProcess/WebPageProxy.cpp b/Source/WebKit/UIProcess/WebPageProxy.cpp index a7e19d46..745559a3 100644 --- a/Source/WebKit/UIProcess/WebPageProxy.cpp +++ b/Source/WebKit/UIProcess/WebPageProxy.cpp @@ -3607,6 +3615,11 @@ void WebPageProxy::commitProvisionalPage(FrameIdentifier frameID, FrameInfoData& m_process->removeMessageReceiver(Messages::WebPageProxy::messageReceiverName(), m_webPageID); auto* navigation = navigationState().navigation(m_provisionalPage->navigationID()); bool didSuspendPreviousPage = navigation && !m_provisionalPage->shouldClosePreviousPageAfterCommit() ? suspendCurrentPageIfPossible(*navigation, mainFrameIDInPreviousProcess, m_provisionalPage->processSwapRequestedByClient(), shouldDelayClosingUntilFirstLayerFlush) : false; + + RunLoop::current().dispatch([destinationID = messageSenderDestinationID(), protectedProcess = m_process.copyRef(), preventProcessShutdownScope = m_process->shutdownPreventingScope()] { + protectedProcess->send(Messages::WebPage::Close(), destinationID); + }); + m_process->removeWebPage(*this, m_websiteDataStore.ptr() == &m_provisionalPage->process().websiteDataStore() ? WebProcessProxy::EndsUsingDataStore::No : WebProcessProxy::EndsUsingDataStore::Yes); // There is no way we'll be able to return to the page in the previous page so close it. ``` Resulting in this workflow: ``` XXX ( layerFlush thread &1887233760) -- CoordinatedGraphicsLayer::~CoordinatedGraphicsLayer() -- END XXX ( ??? thread &1844441728 pid &1) -- WebProcessProxy::WebProcessProxy() -- m_shutdownPreventingScopeCounter XXX ( ??? thread &1844441728 pid &1) -- WebPageProxy::commitProvisionalPage() -- m_process->removeWebPage() XXX ( ??? thread &1844441728 pid &1) -- WebProcessProxy::removeWebPage() XXX ( ??? thread &1844441728 pid &1) -- WebProcessProxy::maybeShutDown() XXX ( ??? thread &1844441728 pid &1) -- WebProcessProxy::canTerminateAuxiliaryProcess() XXX WebProcessProxy::canTerminateAuxiliaryProcess: (pageCount=0, provisionalPageCount=0, m_suspendedPageCount=0, m_isInProcessCache=0, m_shutdownPreventingScopeCounter=1) XXX ( ??? thread &1844441728 pid &1) -- WebProcessProxy::addExistingWebPage() XXX ( ??? thread &1886918368 pid &168) -- WebPage::close() <<<<<<<<<<< [1] XXX ( ??? thread &1844441728 pid &1) -- WebProcessProxy::removeProvisionalPageProxy() XXX ( ??? thread &1844441728 pid &1) -- WebProcessProxy::maybeShutDown() XXX ( ??? thread &1844441728 pid &1) -- WebProcessProxy::canTerminateAuxiliaryProcess() XXX WebProcessProxy::canTerminateAuxiliaryProcess: (pageCount=1, provisionalPageCount=0, m_suspendedPageCount=0, m_isInProcessCache=0, m_shutdownPreventingScopeCounter=0) ... XXX ( ??? thread &1886918368 pid &168) -- WebPage::close() -> WebProcess::singleton().removeWebPage(m_identifier) XXX (? t:&1886918368 p:&:168) -- WebProcess::removeWebPage() -- BEGIN -- this (&0x6fef6000) XXX ( ??? thread &1886918368 pid &168) -- AuxiliaryProcess::enableTermination() -- m_terminationCounter: 0 XXX (? t:&1886918368 p:&:168) -- AuxiliaryProcess::terminationTimerFired()() (&0x6fef6000) XXX (? t:&1886918368 p:&:168) -- WebProcess::shouldTerminate() -- BEGIN -- this (&0x6fef6000) XXX ( ??? thread &1844441728 pid &1) -- WebProcessProxy::WebProcessProxy() -- m_shutdownPreventingScopeCounter XXX ( ??? thread &1844441728 pid &1) -- WebProcessProxy::maybeShutDown() XXX ( ??? thread &1844441728 pid &1) -- WebProcessProxy::canTerminateAuxiliaryProcess() XXX WebProcessProxy::canTerminateAuxiliaryProcess: (pageCount=0, provisionalPageCount=0, m_suspendedPageCount=0, m_isInProcessCache=0, m_shutdownPreventingScopeCounter=0) XXX WebProcessProxy::canTerminateAuxiliaryProcess: true XXX ( ??? thread &1844441728 pid &1) -- WebProcessProxy::maybeShutDown() -> shutDown() XXX ( ??? thread &1844441728 pid &1) -- WebProcessProxy::shutDown() XXX (? t:&1886918368 p:&:168) -- WebProcess::shouldTerminate() -- END -- true -- this (&0x6fef6000) XXX (? t:&1886918368 p:&:168) -- WebProcess::terminate() -- BEGIN -- this (&0x6fef6000) XXX ( ??? thread &1886918368 pid &168) -- glib/WebProcessGLib.cpp -- WebProcess::stopRunLoop() (no WebPage.close() because it was alrady called in [1] before the shutdown) ... ``` ; apparently more logically right.
Carlos Garcia Campos
Comment 13 2022-03-17 04:37:54 PDT
Pablo Saavedra
Comment 14 2022-03-17 04:45:53 PDT
(In reply to Carlos Garcia Campos from comment #13) > Created attachment 454958 [details] > Patch Yes, this patch also works. Moreover, this avoids shouldTerminate() being called and this again the shutdown() one more time. As, Carlos explained to me, in case of shutdown, we want to inconditionally terminate the WebProcess. We dont want ask to the UI-process if we should do it because it is the UI-process itself who is shutting down the WebProcess. This patch is clear r+ for me. Thanks Carlos.
Michael Catanzaro
Comment 15 2022-03-17 09:44:05 PDT
Comment on attachment 454958 [details] Patch I like that you managed this in a non-intrusive and cross-platform way. Needs owner approval, though.
Carlos Garcia Campos
Comment 16 2022-03-18 02:51:25 PDT
Note You need to log in before you can comment on or make changes to this bug.