Make sure WebPageProxy doesn't leak though strong references in async IPC callbacks
https://bugs.webkit.org/show_bug.cgi?id=241583
Summary Make sure WebPageProxy doesn't leak though strong references in async IPC cal...
Yury Semikhatsky
Reported 2022-06-13 18:56:20 PDT
Filing this bug as suggested in https://github.com/WebKit/WebKit/pull/1455#discussion_r896030512 to ensure there are no async IPC message callbacks holding a strong reference to WebPageProxy as it may lead to a web process leak similar to the one in https://bugs.webkit.org/show_bug.cgi?id=241353 when async callback ends up holding WebProcessPool and web process alive though the following reference chain: asyncReplyHandlerMap │ ▼ sendWithAsyncReply callback │ ▼ WebPageProxy │ ▼ API::PageConfiguration │ ▼ WebProcessPool │ ▼ WebProcessCache │ ▼ CachedProcess │ ▼ WebProccessProxy
Attachments
Michael Catanzaro
Comment 1 2022-06-14 05:22:48 PDT
From bug #241353: (In reply to Yury Semikhatsky from comment #6) > Another thing that may be worth implementing is killing stopped web > processes if the UI process exits while some of the WebProcessProxy objects > are still alive (intentionally due to the embedder logic or unintentionally > due to bugs like this). As far as I understand, normally closing the UI > process will close one end of the IPC connection and the Web process will > exit as a result. However, if the WebProcess is stopped it will not react to > the state changes of the IPC pipe and will keep hanging. That's correct, the subprocesses terminate when they notice that their IPC connection to the UI process has closed. If the subprocess is stopped, that will (obviously) never happen. I think it would suffice to send all subprocesses SIGCONT when the UI process terminates, but that won't work if the UI process crashes. I'm afraid that to manage this robustly, we've likely reached the point where we need a manager process that does nothing except check whether the UI process is alive and SIGCONT the subprocesses if it dies. Linux has prctl(SET_PDEATHSIG) that in theory could be used to send SIGCONT automatically, but last I checked it seems to be unreliable or broken in practice, and other Unix OSes don't have this. Not sure what's available on macOS/iOS.
Michael Catanzaro
Comment 2 2022-06-15 08:17:18 PDT
Checking how PR_SET_PDEATHSIG works, I think I was just misusing it before when I convinced myself that it was unreliable or broken. I had previously attempted to use it between fork() and exec(), but it is not async-signal-safe, so that's illegal. Anyway, possible solution: diff --git a/Source/WebKit/Shared/unix/AuxiliaryProcessMain.cpp b/Source/WebKit/Shared/unix/AuxiliaryProcessMain.cpp index 5f3b994864fa..ac763ecd9b08 100644 --- a/Source/WebKit/Shared/unix/AuxiliaryProcessMain.cpp +++ b/Source/WebKit/Shared/unix/AuxiliaryProcessMain.cpp @@ -32,6 +32,10 @@ #include <stdlib.h> #include <string.h> +#if OS(LINUX) +#include <sys/prctl.h> +#endif + namespace WebKit { bool AuxiliaryProcessMainCommon::parseCommandLine(int argc, char** argv) @@ -56,6 +60,10 @@ void AuxiliaryProcess::platformInitialize(const AuxiliaryProcessInitializationPa RELEASE_ASSERT(!sigemptyset(&signalAction.sa_mask)); signalAction.sa_handler = SIG_IGN; RELEASE_ASSERT(!sigaction(SIGPIPE, &signalAction, nullptr)); + +#if OS(LINUX) + prctl(PR_SET_PDEATHSIG, SIGCONT, 0, 0, 0); +#endif } } // namespace WebKit I attempted to test this by manually sending SIGSTOP to the web process and SIGTERM to the UI process, but the web process and all other child processes actually did die when the UI process received SIGTERM. Not sure what killed the web process.
Radar WebKit Bug Importer
Comment 3 2022-06-20 18:57:11 PDT
Michael Catanzaro
Comment 4 2022-07-01 05:42:30 PDT
If we don't have a robust solution for this, then we probably need to revert 250678@main "WebProcessProxy should not hold WebsiteDataStore alive when there is no page." Status quo is a game of whack-a-mole with web process leaks. :/ Except... question: if the WebsiteDataStore is alive, then is the network process kept alive? If that commit was fixing network process leaks, we don't want to trade one process leak for another.
Chris Dumez
Comment 5 2022-07-01 07:50:11 PDT
Shouldn't sendWithAsyncReply() wake up the suspended destination WebProcess until it responds? This is at least how it works for us on iOS with process suspension. On macOS, I don't believe this is an issue for us as we don't currently do process suspension.
Yury Semikhatsky
Comment 6 2022-07-01 10:04:28 PDT
(In reply to Chris Dumez from comment #5) > Shouldn't sendWithAsyncReply() wake up the suspended destination WebProcess > until it responds? This is at least how it works for us on iOS with process > suspension. > I believe in https://bugs.webkit.org/show_bug.cgi?id=241353 scenario sendWithAsyncReply was called before the process was suspended, i.e. the process got suspended while there already were pending async IPC calls.
Chris Dumez
Comment 7 2022-07-01 10:07:10 PDT
(In reply to Yury Semikhatsky from comment #6) > (In reply to Chris Dumez from comment #5) > > Shouldn't sendWithAsyncReply() wake up the suspended destination WebProcess > > until it responds? This is at least how it works for us on iOS with process > > suspension. > > > > I believe in https://bugs.webkit.org/show_bug.cgi?id=241353 scenario > sendWithAsyncReply was called before the process was suspended, i.e. the > process got suspended while there already were pending async IPC calls. Right, on iOS, whenever we call sendWithAsyncReply(), we basically grab (and capture) a token that prevents process suspension.
Chris Dumez
Comment 8 2022-07-01 10:08:24 PDT
(In reply to Chris Dumez from comment #7) > (In reply to Yury Semikhatsky from comment #6) > > (In reply to Chris Dumez from comment #5) > > > Shouldn't sendWithAsyncReply() wake up the suspended destination WebProcess > > > until it responds? This is at least how it works for us on iOS with process > > > suspension. > > > > > > > I believe in https://bugs.webkit.org/show_bug.cgi?id=241353 scenario > > sendWithAsyncReply was called before the process was suspended, i.e. the > > process got suspended while there already were pending async IPC calls. > > Right, on iOS, whenever we call sendWithAsyncReply(), we basically grab (and > capture) a token that prevents process suspension. It's this backgroundActivity here: ``` if (asyncReplyInfo && canSendMessage() && shouldStartProcessThrottlerActivity == ShouldStartProcessThrottlerActivity::Yes) { auto completionHandler = std::exchange(asyncReplyInfo->first, nullptr); asyncReplyInfo->first = [activity = throttler().backgroundActivity({ }), completionHandler = WTFMove(completionHandler)](IPC::Decoder* decoder) mutable { completionHandler(decoder); }; } ```
Yury Semikhatsky
Comment 9 2022-07-01 13:22:30 PDT
(In reply to Chris Dumez from comment #7) > Right, on iOS, whenever we call sendWithAsyncReply(), we basically grab (and > capture) a token that prevents process suspension. Do you mean checking ProcessThrottler::shouldBeRunnable() before suspending the process? It should be a safe check to do before suspending the process on linux. I don't see corresponding code that suspends the process and calls the throttler on iOS, can you point me to that?
Chris Dumez
Comment 10 2022-07-01 13:27:38 PDT
(In reply to Yury Semikhatsky from comment #9) > (In reply to Chris Dumez from comment #7) > > Right, on iOS, whenever we call sendWithAsyncReply(), we basically grab (and > > capture) a token that prevents process suspension. > > Do you mean checking ProcessThrottler::shouldBeRunnable() before suspending > the process? It should be a safe check to do before suspending the process > on linux. I don't see corresponding code that suspends the process and calls > the throttler on iOS, can you point me to that? On iOS, WebKit does not suspend processes, it is the operating system that suspends processes unless they're holding what we call a process assertion. The backgoundActivity that we capture in the lambdas is just a counter that controls whether or not we should hold a process assertion for the process. WPE could adopt ProcessAssertion / ProcessThrottler to control suspension similarly to what we do (although they'd do the suspension themselves when releasing the process assertion). Alternatively, WPE could do their own thing but that works in a similar way. Basically, they'd need an object they can hold (and capture in lambdas) that can delay process suspension until released.
Sihui Liu
Comment 11 2022-07-22 15:37:37 PDT
(In reply to Michael Catanzaro from comment #4) > If we don't have a robust solution for this, then we probably need to revert > 250678@main "WebProcessProxy should not hold WebsiteDataStore alive when > there is no page." Status quo is a game of whack-a-mole with web process > leaks. :/ > > Except... question: if the WebsiteDataStore is alive, then is the network > process kept alive? If that commit was fixing network process leaks, we > don't want to trade one process leak for another. In our current implementation, if WebsiteDataStore is alive, and it has used network process before, the network process will be kept alive. (You may check the use of m_networkProcess in WebsiteDataStore.)
Yury Semikhatsky
Comment 12 2022-08-16 15:40:54 PDT
(In reply to Chris Dumez from comment #10) > > WPE could adopt ProcessAssertion / ProcessThrottler to control suspension > similarly to what we do (although they'd do the suspension themselves when > releasing the process assertion). > > Alternatively, WPE could do their own thing but that works in a similar way. > Basically, they'd need an object they can hold (and capture in lambdas) that > can delay process suspension until released. This can help to reliably account for pending activities in the UI process and prevent the child process from stopping which solves part of the problem. Once the child process is paused the UI process can be killed with SIGKILL in which case the child will continue hanging (as it only responds to SIGCONT and SIGKILL while it's stopped). Looks like addressing such use case would anyway require some sort of "a manager process" as Michael Catanzaro described above.
Alex Christensen
Comment 13 2022-08-16 15:57:27 PDT
FYI 250678@main was effectively reverted in 253449@main
Michael Catanzaro
Comment 14 2022-08-16 17:36:25 PDT
OK... in that case, I think we can close this now. Do you agree, Yury? Non-Cocoa ports still probably need a more robust ownership model for the network process. The Cocoa setup of one singleton process, rather than one process per data store, seems superior to me. But improving this would be a different problem for a different bug. Agreed?
Yury Semikhatsky
Comment 15 2022-08-17 15:49:13 PDT
(In reply to Michael Catanzaro from comment #14) > OK... in that case, I think we can close this now. Do you agree, Yury? > The bug with WebPageProxy (and web process) leak was exposed by the reverted patch and now that it has been reverted the original repro might not trigger the issue, but I believe the underlying issue of preventing a process from suspending while it has pending tasks is still there. So I'd keep this bug open to track progress on that. The other issue mentioned above (if the web process is stopped and the UI processes gets killed the former will leak) can probably be moved into another bug. It might make sense to have an api to disable process suspension logic for the embedders that care about possible leaks (we do). > Non-Cocoa ports still probably need a more robust ownership model for the > network process. The Cocoa setup of one singleton process, rather than one > process per data store, seems superior to me. But improving this would be a > different problem for a different bug. Agreed? I may be missing something but it doesn't seem directly related to this bug report and would be a topic of a different bug.
Michael Catanzaro
Comment 16 2022-08-17 15:56:53 PDT
> It might make sense to have an api to disable process suspension logic for the embedders that care about possible leaks (we do). We need to make sure process leaks are not possible period. Nobody wants processes to leak.
Michael Catanzaro
Comment 17 2023-09-25 07:50:02 PDT
(In reply to Michael Catanzaro from comment #14) > Non-Cocoa ports still probably need a more robust ownership model for the > network process. The Cocoa setup of one singleton process, rather than one > process per data store, seems superior to me. But improving this would be a > different problem for a different bug. Agreed? GTK/WPE did switch to singleton network process in the GTK 4 and WPE 2 API versions. (In reply to Michael Catanzaro from comment #16) > We need to make sure process leaks are not possible period. Nobody wants > processes to leak. Nowadays the web process has a watchdog thread that will crash the web process 10 seconds after the UI process exits (crashAfter10Seconds in WebProcess.cpp). But it's only enabled for GTK and WPE. Probably makes sense for it to be enabled on all ports.
Note You need to log in before you can comment on or make changes to this bug.