Since r294381 we (Playwright) often (every second run or so) see WPEWebProcesses that stay alive after the browser exits. The tests create a few WebsiteDataStore/WebProcessPool instances and then one/two pages in them. When exiting the UIProcess we now routinely see WebsiteDataStore and WebProcessPool with one WebProcess in it whose sessionID is the same as that of the WebsiteDataStore. WebsiteDataStore::processes() is empty. The repro is quite complicated, so I'm going to continue debugging.
Sihui Liu, perhaps you know off the top of your head what may have changed?
The problem is caused by the change in the behavior of WebProcessCache::canCacheProcess. I see that it was discussed in the code review in https://bugs.webkit.org/show_bug.cgi?id=238892#c10 so maybe Carlos knows what's going on?
Before the change the process would not be cached and now it gets cached. Not sure if this is the desirable behavior for a process that contained pages that use ephemeral datastore. The symptoms as I mentioned above is leaking WPEWebProcess after closing all pages in the browser. There is also now a hanging WebPageProxy object now. It may well be that the change just exposed an existing bug in how the process cache works in WPE.
Any chance you know how to reproduce this? Even if it's not a reliable reproducer, having a test that sometimes fails would be useful.
(In reply to Yury Semikhatsky from comment #1)
> It may well be that the change just
> exposed an existing bug in how the process cache works in WPE.
> Any chance you know how to reproduce this? Even if it's not a reliable
> reproducer, having a test that sometimes fails would be useful.
I have a reliable reproduction but it requires running 76 tests in playwright (didn't manage to isolate it further) after which closing browser hangs because of the leaking process.
The steps would be like this:
git clone https://github.com/yury-s/playwright/tree/wk-1649
git co wk-1649
npx playwright install
CI=1 taskset 3 npm run wtest -- 'trace-viewer|tracing|video' --reporter=list
This is probably not very helpful.
Now that we know at which lines caused the problem I can try to create a webkit test for that.
I managed to debug what causes the issue. We call WebPageProxy::forceRepaint, which in turn sends an async message to the web process and captures a strong reference to the WebPageProxy in the async callback. In WPE on a static page the callback is not invoked for a long time and the client closes the page. While the page is closed, the callback is still in asyncReplyHandlerMap and keeps WebProcessPool alive. Since the closed page was last page in the web process the process gets into WebProcessCache and in 30s gets stopped (CachedProcess::suspensionTimerFired). When the browser exists, the web process is still alive an is in the STOPPED state. Since the reference to WebProcessPool (and transitively to WebProcessProxy) still exists the web process is not killed and keeps hanging after the UIProcess exited.
The retention chain looks like this:
I'll upload a fix for this particular issue shorty but it looks like a bigger problem with stopping web process when there are in-flight IPC messages.
Pull request: https://github.com/WebKit/WebKit/pull/1455
Another thing that may be worth implementing is killing stopped web processes if the UI process exits while some of the WebProcessProxy objects are still alive (intentionally due to the embedder logic or unintentionally due to bugs like this). As far as I understand, normally closing the UI process will close one end of the IPC connection and the Web process will exit as a result. However, if the WebProcess is stopped it will not react to the state changes of the IPC pipe and will keep hanging.
Committed r295511 (251516@main): <https://commits.webkit.org/251516@main>
Reviewed commits have been landed. Closing PR #1455 and removing active labels.