Bug 237917 - [WPE][GTK] Fix a crash after r290360
Summary: [WPE][GTK] Fix a crash after r290360
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKitGTK (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Linux
: P2 Normal
Assignee: Pablo Saavedra
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-03-15 13:15 PDT by Pablo Saavedra
Modified: 2022-03-18 02:53 PDT (History)
9 users (show)

See Also:


Attachments
patch (1.33 KB, patch)
2022-03-15 13:29 PDT, Pablo Saavedra
no flags Details | Formatted Diff | Diff
patch (1.44 KB, patch)
2022-03-15 14:02 PDT, Pablo Saavedra
ews-feeder: commit-queue-
Details | Formatted Diff | Diff
Patch (2.81 KB, patch)
2022-03-17 04:37 PDT, Carlos Garcia Campos
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Pablo Saavedra 2022-03-15 13:15:02 PDT
When navigating from one website to another with a different domain with `JS location.replace("https://other.domain.foo")` there is chances to get this crash:


```
was generated by `/usr/libexec/wpe-webkit-1.0/WPEWebProcess 17 75'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x74eeb448 in WebKit::WebProcess::terminate() () from /webkit/usr/lib/libWPEWebKit-1.0.so.3.16.8
[Current thread is 1 (LWP 115)]
(gdb) bt
#0  0x74eeb448 in WebKit::WebProcess::terminate() () from /webkit/usr/lib/libWPEWebKit-1.0.so.3.16.8
#1  0x74eeb2dc in WebKit::WebProcess::removeWebPage(WTF::ObjectIdentifier<WebCore::PageIdentifierType>) () from /webkit/usr/lib/libWPEWebKit-1.0.so.3.16.8
#2  0x74f75554 in WebKit::WebPage::close() () from /webkit/usr/lib/libWPEWebKit-1.0.so.3.16.8
#3  0x74f94c96 in WebKit::WebProcess::stopRunLoop() () from /webkit/usr/lib/libWPEWebKit-1.0.so.3.16.8
#4  0x74d62986 in IPC::Connection::dispatchMessage(std::unique_ptr<IPC::Decoder, std::default_delete<IPC::Decoder> >) () from /webkit/usr/lib/libWPEWebKit-1.0.so.3.16.8
#5  0x74d62c22 in IPC::Connection::dispatchOneIncomingMessage() () from /webkit/usr/lib/libWPEWebKit-1.0.so.3.16.8
#6  0x7686b89a in WTF::RunLoop::performWork() () from /webkit/usr/lib/libWPEWebKit-1.0.so.3.16.8
#7  0x768a6f70 in WTF::RunLoop::RunLoop()::$_1::__invoke(void*) () from /webkit/usr/lib/libWPEWebKit-1.0.so.3.16.8
#8  0x768a6664 in WTF::RunLoop::$_0::__invoke(_GSource*, int (*)(void*), void*) () from /webkit/usr/lib/libWPEWebKit-1.0.so.3.16.8
#9  0x7453d7b6 in g_main_dispatch (context=0x19948c8) at ../glib-2.62.6/glib/gmain.c:3216
#10 g_main_context_dispatch (context=context@entry=0x19948c8) at ../glib-2.62.6/glib/gmain.c:3908
#11 0x7453da4c in g_main_context_iterate (context=0x19948c8, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at ../glib-2.62.6/glib/gmain.c:3981
#12 0x7453dcb8 in g_main_loop_run (loop=0x1995e58) at ../glib-2.62.6/glib/gmain.c:4175
#13 0x768a6ab0 in WTF::RunLoop::run() () from /webkit/usr/lib/libWPEWebKit-1.0.so.3.16.8
#14 0x74f95620 in int WebKit::AuxiliaryProcessMain<WebKit::WebProcessMainWPE>(int, char**) () from /webkit/usr/lib/libWPEWebKit-1.0.so.3.16.8
#15 0x748309fa in __libc_start_main (main=0x456fe0, argc=0, argv=0x0, init=<optimized out>, fini=0x455655 <__libc_csu_fini>, rtld_fini=0x76f13029 <_dl_fini>, stack_end=0x7eb164d4) at libc-start.c:308
#16 0x00455508 in _start () at start.S:112
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
```
Comment 1 Pablo Saavedra 2022-03-15 13:23:56 PDT
The segfault is because a doble-call to the WebProcess::terminate() method in the WebProcess::shutdown() path.


```
void WebProcess::terminate()
{
#ifndef NDEBUG
    GCController::singleton().garbageCollectNow();
    FontCache::singleton().invalidate();
    MemoryCache::singleton().setDisabled(true);
#endif

    m_webConnection->invalidate(); <<<<<<<< invalid access during the second invocation 
    m_webConnection = nullptr; <<<<<<<<< set null in the first invocation

    platformTerminate();

    AuxiliaryProcess::terminate();
}

```

Here the stack method calls:

AuxiliaryProcess::shutDown():

```
-> terminate()
   -> WebProcess::terminate()
      -> AuxiliaryProcess::terminate()
         -> AuxiliaryProcess::terminate() -> stopRunLoop()
            -> WebProcess::stopRunLoop() (from glib/WebProcessGLib.cpp)
               -> WebPage::close()
                  -> WebProcess::singleton().removeWebPage(m_identifier)
                     -> AuxiliaryProcess::enableTermination() -- m_terminationCounter: 0
                        -> WebProcess::terminate()
```
Comment 2 Pablo Saavedra 2022-03-15 13:29:01 PDT
Created attachment 454750 [details]
patch
Comment 3 Pablo Saavedra 2022-03-15 13:45:44 PDT
With the proposed patch, the new stack could someth

WebProcess::stopRunLoop()  (from glib/WebProcessGLib.cpp)


```
-> WebPage::close()
   -> WebProcess::singleton().removeWebPage(m_identifier)
      -> WebProcess::removeWebPage()
         -> AuxiliaryProcess::enableTermination()
            -> WebProcess::terminate()    <<<<<<<<<<<  Called for each WebProcess associated to each WebPage
  ...
-> WebPage::~WebPage()
-> AuxiliaryProcess::stopRunLoop()
   -> platformStopRunLoop()
```
Comment 4 Pablo Saavedra 2022-03-15 14:02:54 PDT
Created attachment 454756 [details]
patch
Comment 5 Michael Catanzaro 2022-03-15 16:56:24 PDT
So many #ifs.

I wonder if we can make other ports use the same implementation of AuxiliaryProcess::didClose that we do, where we call stopRunLoop. I know Apple was previously opposed to similar approaches, but I really doubt one run loop iteration should have a major performance impact on process termination speed. That would allow considerably reducing #ifs, and should hopefully make it safer to modify this code without introducing platform-specific problems.
Comment 6 Pablo Saavedra 2022-03-16 02:15:38 PDT
Comment on attachment 454756 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=454756&action=review

Notice that this change, even if could be good for avoiding redundant calls to the RunLoop::main().stop(), is not mandatory since it doesn't harms the shutDown logic.

> Source/WebKit/Shared/AuxiliaryProcess.cpp:216
> +#if !PLATFORM(GTK) && !PLATFORM(WPE)

Notice that this change, even if could be good for avoiding redundant calls to the RunLoop::main().stop(), is not mandatory since it doesn't harms the shutDown logic.
Comment 7 Pablo Saavedra 2022-03-16 02:51:54 PDT
(In reply to Michael Catanzaro from comment #5)
> So many #ifs.
> 
> I wonder if we can make other ports use the same implementation of
> AuxiliaryProcess::didClose that we do, where we call stopRunLoop. I know
> Apple was previously opposed to similar approaches, but I really doubt one
> run loop iteration should have a major performance impact on process
> termination speed. That would allow considerably reducing #ifs, and should
> hopefully make it safer to modify this code without introducing
> platform-specific problems.

AFAIK, the major concern to move this to the WebProcess.cpp is the loop stop is too slow in some ports. This is the reason why didClose() is implemented with `_exit(EXIT_SUCCESS)`. I don't think that situation changed a lot since the last time this was discussed.
Comment 8 Pablo Saavedra 2022-03-16 02:56:32 PDT
BTW, a simple safe-guard like this also fix the crash:

```
void WebProcess::terminate()
{
#ifndef NDEBUG
    GCController::singleton().garbageCollectNow();
    FontCache::singleton().invalidate();
    MemoryCache::singleton().setDisabled(true);
#endif

    if (m_webConnection) <<<<<<<<<<<<<<< safe-guarded access
    {
        m_webConnection->invalidate(); <<<<<<<< invalid access during the second invocation 
        m_webConnection = nullptr; <<<<<<<<< set null in the first invocation
    }

    platformTerminate();

    AuxiliaryProcess::terminate();
}

```

; however I don't think the recursive logic followed by the terminate() could be considered strictly correct:


```
-> terminate()
   -> WebProcess::terminate()
      -> AuxiliaryProcess::terminate()
         -> AuxiliaryProcess::terminate() -> stopRunLoop()
            -> WebProcess::stopRunLoop() (from glib/WebProcessGLib.cpp)
               -> WebPage::close()
                  -> WebProcess::singleton().removeWebPage(m_identifier)
                     -> AuxiliaryProcess::enableTermination() -- m_terminationCounter: 0
                        -> WebProcess::terminate()
```
Comment 9 Carlos Garcia Campos 2022-03-16 03:54:10 PDT
Checking m_webConnection was also my first idea, but returning early instead. However, I would like to understand how this can happen, because I would expect the page map to be empty when shutdown is called.
Comment 10 Pablo Saavedra 2022-03-16 07:43:12 PDT


```
XXX (  thread &1844441728 pid &001) -- WebPageProxy::commitProvisionalPage() -- m_process->removeWebPage()
XXX (  thread &1844441728 pid &001) -- WebProcessProxy::removeWebPage()
XXX (  thread &1844441728 pid &001) -- WebProcessProxy::maybeShutDown()
XXX (  thread &1844441728 pid &001) -- WebProcessProxy::canTerminateAuxiliaryProcess()
[1] XXX WebProcessProxy::canTerminateAuxiliaryProcess: (pageCount=0, provisionalPageCount=0, m_suspendedPageCount=0, m_isInProcessCache=0, m_shutdownPreventingScopeCounter=0)
XXX WebProcessProxy::canTerminateAuxiliaryProcess: true
XXX (  thread &1844441728 pid &001) -- WebProcessProxy::maybeShutDown() -> shutDown()
XXX (  thread &1844441728 pid &001) -- WebProcessProxy::shutDown()
XXX (  thread &1887639264 pid &111) -- AuxiliaryProcess::shutDown() -> terminate() --  m_terminationCounter: 1
XXX (  thread &1887639264 pid &111) --                         WebProcess::terminate() -- BEGIN -- this (&0x6fff6000)
XXX (  thread &1887639264 pid &111) --                         WebProcess::terminate() -- 1 -- this (&0x6fff6000)
XXX (  thread &1887639264 pid &111) --                         WebProcess::terminate() -- 1.1 -- this (&0x6fff6000)
XXX (  thread &1887639264 pid &111) --                         WebProcess::terminate() -- 2 -- this (&0x6fff6000)
XXX (  thread &1887639264 pid &111) --                         WebProcess::terminate() -- 3 -- this (&0x6fff6000)
XXX (  thread &1887639264 pid &111) -- glib/WebProcessGLib.cpp -- WebProcess::stopRunLoop()
[2] XXX (  thread &1887639264 pid &111) -- WebPage::close()
                                       --> This will lead in a 2nd terminate() czll
```


notice that:

* in [1] pageCount is WebProcessProxy.m_pageMap.size() -> 0.
* in [2] WebPage::close() is called because the m_WebProcess.pageMap is not empty


At that point it looks like there is a desynchrony between the WebProcessState state and the WebProcess state
Comment 11 Pablo Saavedra 2022-03-16 08:05:10 PDT
(In reply to Carlos Garcia Campos from comment #9)
> Checking m_webConnection was also my first idea, but returning early
> instead. However, I would like to understand how this can happen, because I
> would expect the page map to be empty when shutdown is called.


Notice that a similar decision was taken in the WebProcessProxy:

```
void WebProcessProxy::shutDown()
{
...

    if (m_webConnection) {
        m_webConnection->invalidate();
        m_webConnection = nullptr;
    }

```
Comment 12 Pablo Saavedra 2022-03-16 09:10:06 PDT
Coming back to the comment:10

This change could prevent the m_pageMap being not empty by calling for the page close() before the page being removed from the WebProcessProxy map:



```
diff --git a/Source/WebKit/UIProcess/WebPageProxy.cpp b/Source/WebKit/UIProcess/WebPageProxy.cpp
index a7e19d46..745559a3 100644
--- a/Source/WebKit/UIProcess/WebPageProxy.cpp
+++ b/Source/WebKit/UIProcess/WebPageProxy.cpp
@@ -3607,6 +3615,11 @@ void WebPageProxy::commitProvisionalPage(FrameIdentifier frameID, FrameInfoData&
     m_process->removeMessageReceiver(Messages::WebPageProxy::messageReceiverName(), m_webPageID);
     auto* navigation = navigationState().navigation(m_provisionalPage->navigationID());
     bool didSuspendPreviousPage = navigation && !m_provisionalPage->shouldClosePreviousPageAfterCommit() ? suspendCurrentPageIfPossible(*navigation, mainFrameIDInPreviousProcess, m_provisionalPage->processSwapRequestedByClient(), shouldDelayClosingUntilFirstLayerFlush) : false;
+
+    RunLoop::current().dispatch([destinationID = messageSenderDestinationID(), protectedProcess = m_process.copyRef(), preventProcessShutdownScope = m_process->shutdownPreventingScope()] {
+        protectedProcess->send(Messages::WebPage::Close(), destinationID);
+    });
+
     m_process->removeWebPage(*this, m_websiteDataStore.ptr() == &m_provisionalPage->process().websiteDataStore() ? WebProcessProxy::EndsUsingDataStore::No : WebProcessProxy::EndsUsingDataStore::Yes);
 
     // There is no way we'll be able to return to the page in the previous page so close it.

```

Resulting in this workflow:


```
XXX (     layerFlush thread &1887233760) -- CoordinatedGraphicsLayer::~CoordinatedGraphicsLayer() -- END
XXX ( ???            thread &1844441728 pid &1) -- WebProcessProxy::WebProcessProxy() -- m_shutdownPreventingScopeCounter
XXX ( ???            thread &1844441728 pid &1) -- WebPageProxy::commitProvisionalPage() -- m_process->removeWebPage()
XXX ( ???            thread &1844441728 pid &1) -- WebProcessProxy::removeWebPage()
XXX ( ???            thread &1844441728 pid &1) -- WebProcessProxy::maybeShutDown()
XXX ( ???            thread &1844441728 pid &1) -- WebProcessProxy::canTerminateAuxiliaryProcess()
XXX WebProcessProxy::canTerminateAuxiliaryProcess: (pageCount=0, provisionalPageCount=0, m_suspendedPageCount=0, m_isInProcessCache=0, m_shutdownPreventingScopeCounter=1)
XXX ( ???            thread &1844441728 pid &1) -- WebProcessProxy::addExistingWebPage()
XXX ( ???            thread &1886918368 pid &168) -- WebPage::close()                                   <<<<<<<<<<< [1]
XXX ( ???            thread &1844441728 pid &1) -- WebProcessProxy::removeProvisionalPageProxy()
XXX ( ???            thread &1844441728 pid &1) -- WebProcessProxy::maybeShutDown()
XXX ( ???            thread &1844441728 pid &1) -- WebProcessProxy::canTerminateAuxiliaryProcess()
XXX WebProcessProxy::canTerminateAuxiliaryProcess: (pageCount=1, provisionalPageCount=0, m_suspendedPageCount=0, m_isInProcessCache=0, m_shutdownPreventingScopeCounter=0)
...
XXX ( ???            thread &1886918368 pid &168) -- WebPage::close() -> WebProcess::singleton().removeWebPage(m_identifier)
XXX (? t:&1886918368 p:&:168) --                         WebProcess::removeWebPage() -- BEGIN -- this (&0x6fef6000)
XXX ( ???            thread &1886918368 pid &168) -- AuxiliaryProcess::enableTermination() -- m_terminationCounter: 0
XXX (? t:&1886918368 p:&:168) --                         AuxiliaryProcess::terminationTimerFired()() (&0x6fef6000)
XXX (? t:&1886918368 p:&:168) --                         WebProcess::shouldTerminate() -- BEGIN -- this (&0x6fef6000)
XXX ( ???            thread &1844441728 pid &1) -- WebProcessProxy::WebProcessProxy() -- m_shutdownPreventingScopeCounter
XXX ( ???            thread &1844441728 pid &1) -- WebProcessProxy::maybeShutDown()
XXX ( ???            thread &1844441728 pid &1) -- WebProcessProxy::canTerminateAuxiliaryProcess()
XXX WebProcessProxy::canTerminateAuxiliaryProcess: (pageCount=0, provisionalPageCount=0, m_suspendedPageCount=0, m_isInProcessCache=0, m_shutdownPreventingScopeCounter=0)
XXX WebProcessProxy::canTerminateAuxiliaryProcess: true
XXX ( ???            thread &1844441728 pid &1) -- WebProcessProxy::maybeShutDown() -> shutDown()
XXX ( ???            thread &1844441728 pid &1) -- WebProcessProxy::shutDown()
XXX (? t:&1886918368 p:&:168) --                         WebProcess::shouldTerminate() -- END -- true -- this (&0x6fef6000)
XXX (? t:&1886918368 p:&:168) --                         WebProcess::terminate() -- BEGIN -- this (&0x6fef6000)
XXX ( ???            thread &1886918368 pid &168) -- glib/WebProcessGLib.cpp -- WebProcess::stopRunLoop()
(no WebPage.close() because it was alrady called in [1] before the shutdown)
...
```


; apparently more logically right.
Comment 13 Carlos Garcia Campos 2022-03-17 04:37:54 PDT
Created attachment 454958 [details]
Patch
Comment 14 Pablo Saavedra 2022-03-17 04:45:53 PDT
(In reply to Carlos Garcia Campos from comment #13)
> Created attachment 454958 [details]
> Patch

Yes, this patch also works. Moreover, this avoids shouldTerminate() being called and this again the shutdown() one more time.

As, Carlos explained to me, in  case of shutdown, we want to inconditionally terminate the WebProcess. We dont want ask to the UI-process if we should do it because it is the UI-process itself who is shutting down the WebProcess.

This patch is clear r+ for me.

Thanks Carlos.
Comment 15 Michael Catanzaro 2022-03-17 09:44:05 PDT
Comment on attachment 454958 [details]
Patch

I like that you managed this in a non-intrusive and cross-platform way. Needs owner approval, though.
Comment 16 Carlos Garcia Campos 2022-03-18 02:51:25 PDT
Committed r291475 (248591@trunk): <https://commits.webkit.org/248591@trunk>