Bug 241583

Summary: Make sure WebPageProxy doesn't leak though strong references in async IPC callbacks
Product: WebKit Reporter: Yury Semikhatsky <yurys>
Component: WebKit2Assignee: Nobody <webkit-unassigned>
Status: NEW ---    
Severity: Normal CC: achristensen, bugs-noreply, cdumez, kkinnunen, mcatanzaro, sihui_liu, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 241353, 242217    
Bug Blocks:    

Description Yury Semikhatsky 2022-06-13 18:56:20 PDT
Filing this bug as suggested in https://github.com/WebKit/WebKit/pull/1455#discussion_r896030512 to
ensure there are no async IPC message callbacks holding a strong reference to WebPageProxy as
it may lead to a web process leak similar to the one in https://bugs.webkit.org/show_bug.cgi?id=241353
when async callback ends up holding WebProcessPool and web process alive though the following
reference chain:


  asyncReplyHandlerMap
           │
           ▼
sendWithAsyncReply callback
           │
           ▼
      WebPageProxy
           │
           ▼
  API::PageConfiguration
           │
           ▼
     WebProcessPool
           │
           ▼
    WebProcessCache
           │
           ▼
      CachedProcess
           │
           ▼
     WebProccessProxy
Comment 1 Michael Catanzaro 2022-06-14 05:22:48 PDT
From bug #241353:

(In reply to Yury Semikhatsky from comment #6)
> Another thing that may be worth implementing is killing stopped web
> processes if the UI process exits while some of the WebProcessProxy objects
> are still alive (intentionally due to the embedder logic or unintentionally
> due to bugs like this). As far as I understand, normally closing the UI
> process will close one end of the IPC connection and the Web process will
> exit as a result. However, if the WebProcess is stopped it will not react to
> the state changes of the IPC pipe and will keep hanging.

That's correct, the subprocesses terminate when they notice that their IPC connection to the UI process has closed. If the subprocess is stopped, that will (obviously) never happen. I think it would suffice to send all subprocesses SIGCONT when the UI process terminates, but that won't work if the UI process crashes.

I'm afraid that to manage this robustly, we've likely reached the point where we need a manager process that does nothing except check whether the UI process is alive and SIGCONT the subprocesses if it dies. Linux has prctl(SET_PDEATHSIG) that in theory could be used to send SIGCONT automatically, but last I checked it seems to be unreliable or broken in practice, and other Unix OSes don't have this. Not sure what's available on macOS/iOS.
Comment 2 Michael Catanzaro 2022-06-15 08:17:18 PDT
Checking how PR_SET_PDEATHSIG works, I think I was just misusing it before when I convinced myself that it was unreliable or broken. I had previously attempted to use it between fork() and exec(), but it is not async-signal-safe, so that's illegal.

Anyway, possible solution:

diff --git a/Source/WebKit/Shared/unix/AuxiliaryProcessMain.cpp b/Source/WebKit/Shared/unix/AuxiliaryProcessMain.cpp
index 5f3b994864fa..ac763ecd9b08 100644
--- a/Source/WebKit/Shared/unix/AuxiliaryProcessMain.cpp
+++ b/Source/WebKit/Shared/unix/AuxiliaryProcessMain.cpp
@@ -32,6 +32,10 @@
 #include <stdlib.h>
 #include <string.h>
 
+#if OS(LINUX)
+#include <sys/prctl.h>
+#endif
+
 namespace WebKit {
 
 bool AuxiliaryProcessMainCommon::parseCommandLine(int argc, char** argv)
@@ -56,6 +60,10 @@ void AuxiliaryProcess::platformInitialize(const AuxiliaryProcessInitializationPa
     RELEASE_ASSERT(!sigemptyset(&signalAction.sa_mask));
     signalAction.sa_handler = SIG_IGN;
     RELEASE_ASSERT(!sigaction(SIGPIPE, &signalAction, nullptr));
+
+#if OS(LINUX)
+    prctl(PR_SET_PDEATHSIG, SIGCONT, 0, 0, 0);
+#endif
 }
 
 } // namespace WebKit


I attempted to test this by manually sending SIGSTOP to the web process and SIGTERM to the UI process, but the web process and all other child processes actually did die when the UI process received SIGTERM. Not sure what killed the web process.
Comment 3 Radar WebKit Bug Importer 2022-06-20 18:57:11 PDT
<rdar://problem/95561863>
Comment 4 Michael Catanzaro 2022-07-01 05:42:30 PDT
If we don't have a robust solution for this, then we probably need to revert 250678@main "WebProcessProxy should not hold WebsiteDataStore alive when there is no page." Status quo is a game of whack-a-mole with web process leaks. :/

Except... question: if the WebsiteDataStore is alive, then is the network process kept alive? If that commit was fixing network process leaks, we don't want to trade one process leak for another.
Comment 5 Chris Dumez 2022-07-01 07:50:11 PDT
Shouldn't sendWithAsyncReply() wake up the suspended destination WebProcess until it responds? This is at least how it works for us on iOS with process suspension.

On macOS, I don't believe this is an issue for us as we don't currently do process suspension.
Comment 6 Yury Semikhatsky 2022-07-01 10:04:28 PDT
(In reply to Chris Dumez from comment #5)
> Shouldn't sendWithAsyncReply() wake up the suspended destination WebProcess
> until it responds? This is at least how it works for us on iOS with process
> suspension.
> 

I believe in https://bugs.webkit.org/show_bug.cgi?id=241353 scenario sendWithAsyncReply was called before the process was suspended, i.e. the process got suspended while there already were pending async IPC calls.
Comment 7 Chris Dumez 2022-07-01 10:07:10 PDT
(In reply to Yury Semikhatsky from comment #6)
> (In reply to Chris Dumez from comment #5)
> > Shouldn't sendWithAsyncReply() wake up the suspended destination WebProcess
> > until it responds? This is at least how it works for us on iOS with process
> > suspension.
> > 
> 
> I believe in https://bugs.webkit.org/show_bug.cgi?id=241353 scenario
> sendWithAsyncReply was called before the process was suspended, i.e. the
> process got suspended while there already were pending async IPC calls.

Right, on iOS, whenever we call sendWithAsyncReply(), we basically grab (and capture) a token that prevents process suspension.
Comment 8 Chris Dumez 2022-07-01 10:08:24 PDT
(In reply to Chris Dumez from comment #7)
> (In reply to Yury Semikhatsky from comment #6)
> > (In reply to Chris Dumez from comment #5)
> > > Shouldn't sendWithAsyncReply() wake up the suspended destination WebProcess
> > > until it responds? This is at least how it works for us on iOS with process
> > > suspension.
> > > 
> > 
> > I believe in https://bugs.webkit.org/show_bug.cgi?id=241353 scenario
> > sendWithAsyncReply was called before the process was suspended, i.e. the
> > process got suspended while there already were pending async IPC calls.
> 
> Right, on iOS, whenever we call sendWithAsyncReply(), we basically grab (and
> capture) a token that prevents process suspension.

It's this backgroundActivity here:
```
    if (asyncReplyInfo && canSendMessage() && shouldStartProcessThrottlerActivity == ShouldStartProcessThrottlerActivity::Yes) {
        auto completionHandler = std::exchange(asyncReplyInfo->first, nullptr);
        asyncReplyInfo->first = [activity = throttler().backgroundActivity({ }), completionHandler = WTFMove(completionHandler)](IPC::Decoder* decoder) mutable {
            completionHandler(decoder);
        };
    }
```
Comment 9 Yury Semikhatsky 2022-07-01 13:22:30 PDT
(In reply to Chris Dumez from comment #7)
> Right, on iOS, whenever we call sendWithAsyncReply(), we basically grab (and
> capture) a token that prevents process suspension.

Do you mean checking ProcessThrottler::shouldBeRunnable() before suspending the process? It should be a safe check to do before suspending the process on linux. I don't see corresponding code that suspends the process and calls the throttler on iOS, can you point me to that?
Comment 10 Chris Dumez 2022-07-01 13:27:38 PDT
(In reply to Yury Semikhatsky from comment #9)
> (In reply to Chris Dumez from comment #7)
> > Right, on iOS, whenever we call sendWithAsyncReply(), we basically grab (and
> > capture) a token that prevents process suspension.
> 
> Do you mean checking ProcessThrottler::shouldBeRunnable() before suspending
> the process? It should be a safe check to do before suspending the process
> on linux. I don't see corresponding code that suspends the process and calls
> the throttler on iOS, can you point me to that?

On iOS, WebKit does not suspend processes, it is the operating system that suspends processes unless they're holding what we call a process assertion. The backgoundActivity that we capture in the lambdas is just a counter that controls whether or not we should hold a process assertion for the process.

WPE could adopt ProcessAssertion / ProcessThrottler to control suspension similarly to what we do (although they'd do the suspension themselves when releasing the process assertion).

Alternatively, WPE could do their own thing but that works in a similar way. Basically, they'd need an object they can hold (and capture in lambdas) that can delay process suspension until released.
Comment 11 Sihui Liu 2022-07-22 15:37:37 PDT
(In reply to Michael Catanzaro from comment #4)
> If we don't have a robust solution for this, then we probably need to revert
> 250678@main "WebProcessProxy should not hold WebsiteDataStore alive when
> there is no page." Status quo is a game of whack-a-mole with web process
> leaks. :/
> 
> Except... question: if the WebsiteDataStore is alive, then is the network
> process kept alive? If that commit was fixing network process leaks, we
> don't want to trade one process leak for another.

In our current implementation, if WebsiteDataStore is alive, and it has used network process before, the network process will be kept alive. (You may check the use of m_networkProcess in WebsiteDataStore.)
Comment 12 Yury Semikhatsky 2022-08-16 15:40:54 PDT
(In reply to Chris Dumez from comment #10)
> 
> WPE could adopt ProcessAssertion / ProcessThrottler to control suspension
> similarly to what we do (although they'd do the suspension themselves when
> releasing the process assertion).
> 
> Alternatively, WPE could do their own thing but that works in a similar way.
> Basically, they'd need an object they can hold (and capture in lambdas) that
> can delay process suspension until released.

This can help to reliably account for pending activities in the UI process and prevent the child process from stopping which solves part of the problem. Once the child process is paused the UI process can be killed with SIGKILL in which case the child will continue hanging (as it only responds to SIGCONT and SIGKILL while it's stopped). Looks like addressing such use case would anyway require some sort of "a manager process" as Michael Catanzaro described above.
Comment 13 Alex Christensen 2022-08-16 15:57:27 PDT
FYI 250678@main was effectively reverted in 253449@main
Comment 14 Michael Catanzaro 2022-08-16 17:36:25 PDT
OK... in that case, I think we can close this now. Do you agree, Yury?

Non-Cocoa ports still probably need a more robust ownership model for the network process. The Cocoa setup of one singleton process, rather than one process per data store, seems superior to me. But improving this would be a different problem for a different bug. Agreed?
Comment 15 Yury Semikhatsky 2022-08-17 15:49:13 PDT
(In reply to Michael Catanzaro from comment #14)
> OK... in that case, I think we can close this now. Do you agree, Yury?
> 
The bug with WebPageProxy (and web process) leak was exposed by the reverted patch and now that it has been reverted the original repro might not trigger the issue, but I believe the underlying issue of preventing a process from suspending while it has pending tasks is still there. So I'd keep this bug open to track progress on that.

The other issue mentioned above (if the web process is stopped and the UI processes gets killed the former will leak) can probably be moved into another bug. It might make sense to have an api to disable process suspension logic for the embedders that care about possible leaks (we do).


> Non-Cocoa ports still probably need a more robust ownership model for the
> network process. The Cocoa setup of one singleton process, rather than one
> process per data store, seems superior to me. But improving this would be a
> different problem for a different bug. Agreed?

I may be missing something but it doesn't seem directly related to this bug report and would be a topic of a different bug.
Comment 16 Michael Catanzaro 2022-08-17 15:56:53 PDT
> It might make sense to have an api to disable process suspension logic for the embedders that care about possible leaks (we do).

We need to make sure process leaks are not possible period. Nobody wants processes to leak.
Comment 17 Michael Catanzaro 2023-09-25 07:50:02 PDT
(In reply to Michael Catanzaro from comment #14)
> Non-Cocoa ports still probably need a more robust ownership model for the
> network process. The Cocoa setup of one singleton process, rather than one
> process per data store, seems superior to me. But improving this would be a
> different problem for a different bug. Agreed?

GTK/WPE did switch to singleton network process in the GTK 4 and WPE 2 API versions.

(In reply to Michael Catanzaro from comment #16)
> We need to make sure process leaks are not possible period. Nobody wants
> processes to leak.

Nowadays the web process has a watchdog thread that will crash the web process 10 seconds after the UI process exits (crashAfter10Seconds in WebProcess.cpp). But it's only enabled for GTK and WPE. Probably makes sense for it to be enabled on all ports.