Bug 279914 - REGRESSION(283414@main): [WPE][GTK] Evolution uses new web process for every mail preview
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKitGTK
Version: Other
Hardware: PC Linux
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords:
Duplicates: 280180
Depends on:
Blocks:
 
Reported: 2024-09-18 11:46 PDT by Michael Catanzaro
Modified: 2024-10-01 08:08 PDT
CC List: 5 users

See Also:


Description Michael Catanzaro 2024-09-18 11:46:48 PDT
Moving this from https://bugzilla.redhat.com/show_bug.cgi?id=2313367

"""
After running Evolution in Rawhide for a while, there are 100+ bwrap-ed WebKitWebProcess subprocesses running, and the count seems to increase with each new mail preview.

The ps looks like:
150089 ?        Sl     0:54  |   \_ /usr/bin/evolution
 150140 ?        SLl    0:00  |       \_ /usr/libexec/webkit2gtk-4.1/WebKitNetworkProcess 7 251 254
 150141 ?        S      0:00  |       \_ /usr/bin/bwrap --args 265 -- /usr/bin/xdg-dbus-proxy --args=260
 150142 ?        S      0:00  |       |   \_ /usr/bin/bwrap --args 265 -- /usr/bin/xdg-dbus-proxy --args=260
 150143 ?        Sl     0:00  |       |       \_ /usr/bin/xdg-dbus-proxy --args=260
 150249 ?        S      0:00  |       \_ /usr/bin/bwrap --args 269 -- /usr/libexec/webkit2gtk-4.1/WebKitWebProcess 43 260 255
 150250 ?        S      0:00  |       |   \_ /usr/bin/bwrap --args 269 -- /usr/libexec/webkit2gtk-4.1/WebKitWebProcess 43 260 255
 150254 ?        SLl    0:00  |       |       \_ /usr/libexec/webkit2gtk-4.1/WebKitWebProcess 43 260 255
 150271 ?        S      0:00  |       \_ /usr/bin/bwrap --args 277 -- /usr/libexec/webkit2gtk-4.1/WebKitWebProcess 46 271 273
 150272 ?        S      0:00  |       |   \_ /usr/bin/bwrap --args 277 -- /usr/libexec/webkit2gtk-4.1/WebKitWebProcess 46 271 273
 150277 ?        SLl    0:00  |       |       \_ /usr/libexec/webkit2gtk-4.1/WebKitWebProcess 46 271 273
 150303 ?        S      0:00  |       \_ /usr/bin/bwrap --args 279 -- /usr/libexec/webkit2gtk-4.1/WebKitWebProcess 56 271 274
 150305 ?        S      0:00  |       |   \_ /usr/bin/bwrap --args 279 -- /usr/libexec/webkit2gtk-4.1/WebKitWebProcess 56 271 274
 150309 ?        SLl    0:00  |       |       \_ /usr/libexec/webkit2gtk-4.1/WebKitWebProcess 56 271 274
 150322 ?        S      0:00  |       \_ /usr/bin/bwrap --args 279 -- /usr/libexec/webkit2gtk-4.1/WebKitWebProcess 63 268 271
 150324 ?        S      0:00  |       |   \_ /usr/bin/bwrap --args 279 -- /usr/libexec/webkit2gtk-4.1/WebKitWebProcess 63 268 271
 150326 ?        SLl    0:00  |       |       \_ /usr/libexec/webkit2gtk-4.1/WebKitWebProcess 63 268 271
 150323 ?        S      0:00  |       \_ /usr/bin/bwrap --args 283 -- /usr/libexec/webkit2gtk-4.1/WebKitWebProcess 64 268 275
 150325 ?        S      0:00  |       |   \_ /usr/bin/bwrap --args 283 -- /usr/libexec/webkit2gtk-4.1/WebKitWebProcess 64 268 275
 150327 ?        SLl    0:00  |       |       \_ /usr/libexec/webkit2gtk-4.1/WebKitWebProcess 64 268 275
 150328 ?        S      0:00  |       \_ /usr/bin/bwrap --args 287 -- /usr/libexec/webkit2gtk-4.1/WebKitWebProcess 70 277 275
 150330 ?        S      0:00  |       |   \_ /usr/bin/bwrap --args 287 -- /usr/libexec/webkit2gtk-4.1/WebKitWebProcess 70 277 275
 150333 ?        SLl    0:00  |       |       \_ /usr/libexec/webkit2gtk-4.1/WebKitWebProcess 70 277 275
...
With the tree numbers increasing for every bwrap instance  

This happens with webkitgtk-2.46.0-3.fc42
Doesn't happen with webkitgtk-2.45.92-3.fc42
"""
Comment 1 Michael Catanzaro 2024-09-18 12:22:16 PDT
I have no clue what is wrong here. I don't know what to do about this other than reverting 283414@main and 281488@main and turning off the web process cache.
Comment 2 Michael Catanzaro 2024-09-19 12:23:49 PDT
Unfortunately I'm just not seeing this myself.

 * Mail preview web views don't seem to use a new web process at all
 * Composer web views create two new web processes, but they also quit when the composer window is destroyed. Not seeing any leak.

I think it's *weird* that each composer requires two web views. That's probably a bug. But they aren't leaking for me. Also, all web processes quit when I quit Evolution.
Comment 3 Yanko Kaneti 2024-09-20 03:17:30 PDT
(In reply to Michael Catanzaro from comment #2)
> Unfortunately I'm just not seeing this myself.

Strange. It's straightforward to reproduce here, on Rawhide.
They all go away when I quit Evolution, but currently I have 100+ and the count keeps going up.

The first time I noticed it was while wondering why my session died, because dbus-broker complained about "Too many open files".
Comment 4 Michael Catanzaro 2024-09-20 06:20:29 PDT
(In reply to Yanko Kaneti from comment #3)
> Strange. Its straight forward to reproduce here. On rawhide..

I tested Evolution with (a) my development jhbuild, and (b) F41 with webkitgtk6.0-2.46.0-2.fc41. Behavior is the same in both cases. I would expect rawhide to behave the same as F41; no clue why there is a difference.
Comment 5 Michael Catanzaro 2024-09-23 05:56:49 PDT
*** Bug 280180 has been marked as a duplicate of this bug. ***
Comment 6 Milan Crha 2024-09-23 08:18:39 PDT
For me, I just:
1. open evolution in a folder with more than one message. 
2. move from one message to another
3. repeat several times, can be just an up & down arrow press

From the second move to another message onward, the WebKitWebProcess processes stop disappearing.
To test more, you can double-click on one message in the message list to open a dedicated window for it and move from one message (and back) in that new window. When you close the window the associated WebKitWebView instance is freed (while Evolution is left running).

One observation: Devhelp also uses the WebKitGTK 4.1 API, but it has only a single WebKitWebProcess running, even right after start, while Evolution has two. That's a clue, I believe, because there should not be two WebKitWebProcess instances just after the app has started and loaded the first page (or the second: Evolution first writes "Retrieving message xxx" there and, once the message is loaded, shows its content, which involves reloading the web content).

You mentioned web process caching. How does that work: is it keyed by the web page URL? The URL and page content change dynamically in Evolution, and iframes are used, but iframes as such do not seem to cause this.
Comment 7 Michael Catanzaro 2024-09-24 18:41:57 PDT
I tried the test app you attached in bug #280180 and it functions exactly as you describe. This behavior is consistent with PSON (process swap on navigation) being improperly enabled, though (bug #278016), which regressed in 279914@main. But there is one thing I don't understand: the security origin should always be "file://", so there shouldn't actually be any cross-origin navigation. Also, 279914@main is relatively old, and Yanko is confident that the recent 283414@main is to blame for this. Hmm.

I wonder if the fix in bug #278016 should resolve this for now. I will check tomorrow. Even if that fixes it, you surely do want Evolution to operate correctly when PSON is enabled, because it's mandatory in the GTK 4 API version, so the problem is going to return in the future (unless you plan to stay on GTK 3 forever). So we'll need to investigate further regardless.

(That said, to determine whether the web processes are actually leaked -- and we do have real leaks in bug #279913 -- you have to wait 5 minutes for them to be flushed from the web process cache. In this case, I suspect they're just cached. Of course we still don't want to be caching a new web process every time you look at a different email.)
Comment 8 Milan Crha 2024-09-24 23:01:20 PDT
> you surely do want Evolution to operate correctly when PSON is enabled

Do you mean on the WebKitWebProcess side (the processes created because Evolution uses WebKitGTK), or is it something the Evolution code should change?
Comment 9 Michael Catanzaro 2024-09-25 07:22:31 PDT
I don't know yet.
Comment 10 Michael Catanzaro 2024-09-25 14:36:29 PDT
(In reply to Michael Catanzaro from comment #7)
> I tried the test app you attached in bug #280180 and it functions exactly as
> you describe.

Wait. Yesterday I was testing with webkit2gtk4.1-1.fc41. But that *predates* 283414@main!
Comment 11 Michael Catanzaro 2024-09-25 14:37:20 PDT
(In reply to Michael Catanzaro from comment #10)
> Wait. Yesterday I was testing with webkit2gtk4.1-1.fc41. But that *predates*
> 283414@main!

Whoops, I meant: webkit2gtk4.1-2.45.91-1.fc41.

But in theory, this regression was introduced between 2.45.92 and 2.46.0.
Comment 12 Milan Crha 2024-09-25 23:44:46 PDT
> But in theory, this regression was introduced between 2.45.92 and 2.46.0.

Heh, you have it written at the end of comment #0. ;)
Comment 13 Michael Catanzaro 2024-09-26 14:15:18 PDT
Sorry I'm taking a while to get to this. It's nearly at the top of my to-do list. I know we need to figure this out soon.
Comment 14 Michael Catanzaro 2024-09-27 08:03:44 PDT
I can reproduce the behavior you describe in bug #280180 with 2.45.91, but notably not with my development build on main.
Comment 15 Michael Catanzaro 2024-09-27 08:13:36 PDT
(In reply to Michael Catanzaro from comment #14)
> I can reproduce the behavior you describe in bug #280180 with 2.45.91, but
> notably not with my development build on main.

Actually, sorry, I confused myself. No, I can't. With 2.45.91, I always see one web process for each web view, plus one cached web process to use for the next web view. With the main branch, I see just one web process per web view.
Comment 16 Michael Catanzaro 2024-09-27 08:33:11 PDT
Now I'm trying to test using eb5031b47e5de0ace92debcedc01f7a80c9ec45b which is "REGRESSION(281488@main): [WPE][GTK] Process launching is slow" on the 2.46 branch. However, the web process is very unfortunately crashing in WebGamepadProvider::gamepadDisconnected, which makes it impossible to test the suspected regression commit. I guess I'd better try to sabotage that code.
Comment 17 Michael Catanzaro 2024-09-27 09:58:11 PDT
OK, even with the exact commit that is supposed to be the regression commit, I cannot reproduce any problem whatsoever when using Milan's test app. So I'm going to stop using the test app.

But now I *can* reproduce the problem in Evolution! 

Next step: disable PSON (bug #278016), see if problem goes away. Yes it does.

Next step: reenable PSON, revert 283414@main, see if the problem goes away. Yes it does. So Yanko was correct to blame 283414@main.

Final step: disable PSON, un-revert 283414@main, confirm the problem is back, just to double check. Yes indeed.

So this regression is caused by the *combination* of both 283414@main and the problem fixed by 284088@main. 284088@main was enough to fix it.

So the emergency will be resolved with 2.46.1 release. But we still need to figure out why 283414@main breaks Evolution when PSON is enabled.
Comment 18 Michael Catanzaro 2024-09-27 10:01:23 PDT
I'm going to use FIXED status despite the remaining mystery, since this problem doesn't seem to affect other apps that use PSON. At least Epiphany is definitely unaffected. Even Milan's test app is unaffected; I think he got confused because exactly one web process was cached, but we don't cache multiple web processes for the same security origin, and Evolution only uses one security origin ("file://"). I think that in order to trigger this bug, Evolution must be doing something differently than the test app does.
Comment 19 Milan Crha 2024-09-30 00:19:31 PDT
(In reply to Michael Catanzaro from comment #18)
> ... and Evolution only uses one security origin ("file://").

It uses more than one: "mail://..." for the mail content and "evo-file:///" for the "loading" screen. The iframes use "mail://..." too, if I'm not mistaken.

> I think that in order to trigger this bug, Evolution must be doing something differently than the test app does.

Right, I believe I missed some detail in the test app. I'm sorry.
Comment 20 Michael Catanzaro 2024-09-30 08:35:37 PDT
Oh, that's excellent news, because that explains everything. The process swaps are expected because you're switching between different security origins.

My suggestion is to stick to just one security origin for everything, because Evolution doesn't actually want unnecessary process swaps. Then try enabling PSON manually and verify that it works without extra process swapping, so that you can be prepared for the future, since PSON is mandatory when using GTK 4. (Alternatively, if you never plan to port to GTK 4, you can ignore this.)
Comment 21 Milan Crha 2024-10-01 00:34:30 PDT
I'm not sure what you mean by "the process swaps". Regardless of PSON, and regardless of whether the origins differ, I still expect that when a WebKitWebView vanishes, it takes the (cached?) WebKitWebProcess processes with it. They may also vanish while the web view is still alive, for which you mentioned a 5-minute interval; that may or may not be a problem, considering I read new mail shortly after starting Evolution and then leave it idle, checking for new mail on its own.

Maybe I misunderstood it, because with webkit2gtk4.1-2.46.0-3.fc42.x86_64 I viewed messages several times, which produced 28 lines in `ps ax | grep WebKitWeb` output, and then I left it idle for almost two hours without touching anything in Evolution. After that time there was no change: still 28 lines/processes.

If they are cached, what are they cached for? I can switch between two messages and there is no reuse of the cached WebProcess, new processes are created on each switch.

Evo cannot use the same scheme for everything, the scheme declares what's expected behind the URL and the proper resource is loaded accordingly.

Is there any WebKitSetting or WebKitWebContext option to disallow this on the app level? You mentioned it'll be mandatory anyway, thus I guess there is not. Or if there is, it can be void (existing, but doing nothing) in the future anyway.
Comment 22 Michael Catanzaro 2024-10-01 06:44:59 PDT
(In reply to Milan Crha from comment #21)
> I'm not sure what you mean with "the process swaps". I still expect, PSON
> not PSON, different origin not different origin, that when a WebKitWebView
> vanishes, it takes the (cached?) WebKitWebProcess-es together with it.

No, they could be reused for a different future web view. Their lifetime should be tied to the WebKitWebContext.

> Maybe I misunderstood it, because with webkit2gtk4.1-2.46.0-3.fc42.x86_64 I views a message several times, which generated 28 lines in `ps ax | grep WebKitWeb` output and then I left it idle for almost two hours, not touching anything in Evolution and after that time there was no change, still 28 lines/processes.

The cache timeout on other platforms is 30 minutes, but for WPE/GTK I lowered it to 5 minutes because that seemed excessive. They certainly shouldn't last two hours. I would report a separate bug for this, because something must be wrong.
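To make the expected behavior concrete, here is a toy model of timeout-based eviction, assuming each cached process is evicted once it has sat in the cache for the timeout. This is a hypothetical sketch, not WebKit's eviction code; `CACHE_TIMEOUT_SECONDS`, `is_expired` and `count_live` are invented names, and only the 5-minute value comes from the comment above.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Assumption: the 5-minute WPE/GTK cache timeout mentioned above, in seconds. */
#define CACHE_TIMEOUT_SECONDS (5 * 60)

/* Toy rule (hypothetical, not WebKit code): a process that entered the cache
 * at cached_at should have been evicted once the timeout has elapsed. */
static int is_expired(time_t cached_at, time_t now)
{
    return difftime(now, cached_at) >= CACHE_TIMEOUT_SECONDS;
}

/* Counts how many of the n cached processes should still be alive at now. */
size_t count_live(const time_t *cached_at, size_t n, time_t now)
{
    assert(cached_at != NULL || n == 0);
    size_t live = 0;
    for (size_t i = 0; i < n; i++) {
        if (!is_expired(cached_at[i], now))
            live++;
    }
    return live;
}
```

Under this model, processes cached before a two-hour idle period would all be gone, so a count that stays at 28 points at a separate problem rather than at normal caching.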
 
> If they are cached, what are they cached for? I can switch between two
> messages and there is no reuse of the cached WebProcess, new processes are
> created on each switch.

If it's the same security origin (protocol, host, and port), then it should be reused. So if a cached web process previously displayed a mail:// URL, it should get reused when you want to display another mail:// URL. If it previously displayed an evo-file:// URL, it should get reused for that. Presumably my assumption is wrong, though. Maybe it's being treated as an opaque origin, in which case it will never compare equal to any other security origin.
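The reuse rule described above can be illustrated with a hypothetical toy model in C: a tuple origin (scheme, host, port) compares equal to another origin with the same three parts, while an opaque origin matches only itself. This is not how WebKit's process cache is implemented; `Origin`, `same_origin`, `navigate`, `simulate_switches` and the host string are invented for illustration, and whether Evolution's mail:// URLs really end up opaque is exactly the open question.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical toy model, not WebKit code. A tuple origin is
 * (scheme, host, port); an opaque origin is equal only to itself. */
typedef struct {
    const char *scheme;
    const char *host;
    int port;     /* -1 stands for "no explicit port" */
    bool opaque;
} Origin;

static bool same_origin(const Origin *a, const Origin *b)
{
    if (a->opaque || b->opaque)
        return a == b; /* opaque: only the very same origin object matches */
    return strcmp(a->scheme, b->scheme) == 0 &&
           strcmp(a->host, b->host) == 0 &&
           a->port == b->port;
}

#define MAX_CACHED 64

/* Each slot remembers the origin its (pretend) web process last served. */
typedef struct {
    const Origin *slots[MAX_CACHED];
    int count;
} ProcessCache;

/* Reuses a process whose origin matches target; otherwise "launches" one.
 * Returns true if a new process was launched. No eviction in this toy. */
static bool navigate(ProcessCache *cache, const Origin *target)
{
    for (int i = 0; i < cache->count; i++) {
        if (same_origin(cache->slots[i], target))
            return false;
    }
    assert(cache->count < MAX_CACHED);
    cache->slots[cache->count++] = target;
    return true;
}

/* Alternates between two messages n times and returns how many processes
 * were launched. With opaque_origins, every navigation gets a fresh opaque
 * origin, as would happen if the mail:// URL were treated as opaque. */
int simulate_switches(bool opaque_origins, int n)
{
    assert(n >= 0 && n <= MAX_CACHED);
    Origin message_a = { "mail", "hypothetical-host", -1, false };
    Origin message_b = { "mail", "hypothetical-host", -1, false };
    Origin fresh[MAX_CACHED];
    ProcessCache cache = { .count = 0 };
    int launched = 0;

    for (int i = 0; i < n; i++) {
        const Origin *target;
        if (opaque_origins) {
            fresh[i] = (Origin){ "mail", "hypothetical-host", -1, true };
            target = &fresh[i];
        } else {
            target = (i % 2 == 0) ? &message_a : &message_b;
        }
        if (navigate(&cache, target))
            launched++;
    }
    return launched;
}
```

In this sketch, switching ten times between two mail:// messages launches a single process when the origins are tuples, but ten when each navigation gets a fresh opaque origin, which would be consistent with the growth described in this bug if the opaque-origin guess turns out to be right.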

> Is there any WebKitSetting or WebKitWebContext option to disallow this on
> the app level? You mentioned it'll be mandatory anyway, thus I guess there
> is not. Or if there is, it can be void (existing, but doing nothing) in the
> future anyway.

In GTK 3 it's off by default, so there's no need to disable it, but the setting is process-swap-on-cross-site-navigation-enabled. In GTK 4 it's on by default and cannot be disabled.
Comment 23 Michael Catanzaro 2024-10-01 06:46:18 PDT
(In reply to Michael Catanzaro from comment #22)
> The cache timeout on other platforms is 30 minutes, but for WPE/GTK I
> lowered it to 5 minutes because that seemed excessive. They certainly
> shouldn't last two hours. I would report a separate bug for this, because
> something must be wrong.

Are the processes stopped (with SIGSTOP)? If so, that's bug #280014.
Comment 24 Milan Crha 2024-10-01 07:24:43 PDT
I'm switching between two emails: I select the first, then the second, then the first, then the second, and so on, and their URIs do not change; they are always the same. Still, the process count increases after each move to the other mail.

> Are the processes stopped (with SIGSTOP)?

How do I know? I do not see it in `ps ax`, or I'm looking in the wrong place. If you know a command, I can run it.

I opened bug #280681 as a follow up for both leftover issues, with a modified reproducer, which reproduces the problem out of Evolution this time.
Comment 25 Michael Catanzaro 2024-10-01 08:08:52 PDT
(In reply to Milan Crha from comment #24)
> How do I know? I do not see it in `ps ax`, or I'm looking in a wrong place.
> If you know a command, I can run it.

There is a Status column you can enable in gnome-system-monitor. If it says "Stopped" instead of "Sleeping" or "Running" then it's bug #280014. Probably not, though.
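For a terminal check, the STAT column of `ps ax` (the second column of the process listing in the description, showing S, Sl, SLl) would show T for a process stopped by a signal. On Linux the same letter is the third field of /proc/<pid>/stat (see proc(5)). Below is a minimal C sketch that reads it; `stat_line_state` and `process_state` are hypothetical helper names, not part of any existing tool.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Extracts the one-letter state ('S' sleeping, 'R' running, 'T' stopped, ...)
 * from a /proc/<pid>/stat line. The command name is wrapped in parentheses
 * and may itself contain ") ", so search from the last ')'.
 * Returns '\0' if the line does not look like a stat line. */
char stat_line_state(const char *line)
{
    assert(line != NULL);
    const char *paren = strrchr(line, ')');
    if (paren == NULL || paren[1] != ' ' || paren[2] == '\0')
        return '\0';
    return paren[2];
}

/* Reads /proc/<pid>/stat (Linux only); returns '\0' if it cannot be read. */
char process_state(int pid)
{
    char path[64];
    char line[1024];
    snprintf(path, sizeof path, "/proc/%d/stat", pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return '\0';
    char *ok = fgets(line, sizeof line, f);
    fclose(f);
    return ok != NULL ? stat_line_state(line) : '\0';
}
```

For example, if `process_state()` returned 'T' for one of the leftover WebKitWebProcess PIDs, that would point at bug #280014; 'S' would mean the process is merely sleeping.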
 
> I opened bug #280681 as a follow up for both leftover issues, with a
> modified reproducer, which reproduces the problem out of Evolution this time.

Thanks.