Bug 223479

Summary: [WPE][GTK] Run web processes in separate cgroups using systemd-run
Product: WebKit Reporter: Michael Catanzaro <mcatanzaro>
Component: WebKitGTKAssignee: Michael Catanzaro <mcatanzaro>
Status: RESOLVED WONTFIX    
Severity: Normal CC: annulen, benjamin, bugs-noreply, ews-watchlist, gyuyoung.kim, mcatanzaro, ryuan.choi, sergio
Priority: P2    
Version: WebKit Nightly Build   
Hardware: PC   
OS: Linux   
Attachments:
Description Flags
Patch
none
Patch ews-feeder: commit-queue-

Description Michael Catanzaro 2021-03-18 14:11:54 PDT
systemd 248 introduces systemd-oomd, a userspace oom killer contributed by Facebook. Unlike earlyoom and other earlier killers, systemd-oomd operates with cgroup-level granularity. That is, if a process in any given cgroup is using an excessive amount of memory, systemd-oomd will kill the entire cgroup all at once. This has implications for desktop environments like GNOME and KDE, which have started launching applications in separate cgroups to ensure a misbehaving application does not cause the entire desktop to be killed. It also has implications for multiprocess applications like terminals and web browsers.

In WebKit, the web processes are untrusted and are designed to crash and be restarted independently of each other. If a single web process uses an excessive amount of memory, we do not want systemd-oomd to kill the entire browser; instead, it should just kill the affected web process. We can achieve this using systemd-run. Of course, there is no point in doing this unless we are already building with -DUSE_SYSTEMD=ON. Also, we'll restrict this behavior to systemd 246 or newer, which adds the --slice-inherit argument to systemd-run. Both the bubblewrap sandbox launcher and the legacy unsandboxed process launcher are modified. The flatpak-spawn launcher is not modified because the spawn portal already runs subprocesses in a new cgroup. Note that it doesn't make sense to do this for the network process (or other trusted subprocesses), since WebKit is not designed to cope with the network process disappearing. Only the web processes should be run in separate cgroups.
Comment 1 Michael Catanzaro 2021-03-18 14:15:19 PDT
Before, all processes (including two web processes) in one cgroup:

│   │ │ ├─app-org.gnome.Terminal.slice 
│   │ │ │ ├─vte-spawn-f5ea225f-122d-4e91-90d7-cb738bf49f4a.scope 
│   │ │ │ │ ├─77720 bash
│   │ │ │ │ ├─78871 systemd-cgls
│   │ │ │ │ └─78872 less
│   │ │ │ ├─vte-spawn-043103de-4ec0-46be-9edd-27a709be51ac.scope 
│   │ │ │ │ ├─39766 bash
│   │ │ │ │ ├─78590 epiphany
│   │ │ │ │ ├─78598 /usr/libexec/webkit2gtk-4.0/WebKitNetworkProcess 1 20
│   │ │ │ │ ├─78663 /usr/bin/bwrap --args 52 -- /usr/bin/xdg-dbus-proxy --args=49
│   │ │ │ │ ├─78664 /usr/bin/bwrap --args 52 -- /usr/bin/xdg-dbus-proxy --args=49
│   │ │ │ │ ├─78667 /usr/bin/xdg-dbus-proxy --args=49
│   │ │ │ │ ├─78669 /usr/bin/bwrap --args 53 -- /usr/bin/xdg-dbus-proxy --args=50
│   │ │ │ │ ├─78670 /usr/bin/bwrap --args 53 -- /usr/bin/xdg-dbus-proxy --args=50
│   │ │ │ │ ├─78671 /usr/bin/xdg-dbus-proxy --args=50
│   │ │ │ │ ├─78673 /usr/bin/bwrap --args 50 -- /usr/libexec/webkit2gtk-4.0/WebKitWebProcess 11 44
│   │ │ │ │ ├─78674 /usr/bin/bwrap --args 50 -- /usr/libexec/webkit2gtk-4.0/WebKitWebProcess 11 44
│   │ │ │ │ ├─78675 /usr/bin/bwrap --args 57 -- /usr/libexec/webkit2gtk-4.0/WebKitWebProcess 17 53
│   │ │ │ │ ├─78676 /usr/libexec/webkit2gtk-4.0/WebKitWebProcess 11 44
│   │ │ │ │ ├─78679 /usr/bin/bwrap --args 57 -- /usr/libexec/webkit2gtk-4.0/WebKitWebProcess 17 53
│   │ │ │ │ └─78680 /usr/libexec/webkit2gtk-4.0/WebKitWebProcess 17 53
│   │ │ │ └─gnome-terminal-server.service 
│   │ │ │   └─35067 /usr/libexec/gnome-terminal-server

After, web processes split into separate cgroups:

│   │ │ ├─app-org.gnome.Terminal.slice 
│   │ │ │ ├─vte-spawn-f5ea225f-122d-4e91-90d7-cb738bf49f4a.scope 
│   │ │ │ │ ├─77720 bash
│   │ │ │ │ ├─86472 epiphany
│   │ │ │ │ ├─86481 /home/mcatanzaro/Projects/GNOME/install/libexec/webkit2gtk-4.0/WebKitNetworkProcess 1 18
│   │ │ │ │ ├─86545 /usr/bin/bwrap --args 51 -- /usr/bin/xdg-dbus-proxy --args=48
│   │ │ │ │ ├─86546 /usr/bin/bwrap --args 51 -- /usr/bin/xdg-dbus-proxy --args=48
│   │ │ │ │ ├─86548 /usr/bin/xdg-dbus-proxy --args=48
│   │ │ │ │ ├─86550 /usr/bin/bwrap --args 52 -- /usr/bin/xdg-dbus-proxy --args=49
│   │ │ │ │ ├─86551 /usr/bin/bwrap --args 52 -- /usr/bin/xdg-dbus-proxy --args=49
│   │ │ │ │ └─86552 /usr/bin/xdg-dbus-proxy --args=49
│   │ │ │ ├─vte-spawn-043103de-4ec0-46be-9edd-27a709be51ac.scope 
│   │ │ │ │ ├─39766 bash
│   │ │ │ │ ├─86769 systemd-cgls
│   │ │ │ │ └─86770 less
│   │ │ │ ├─run-r4b1acfd6a19a4fe78622c8b608dde6ef.scope 
│   │ │ │ │ ├─86554 /usr/bin/bwrap --args 49 -- /home/mcatanzaro/Projects/GNOME/install/libexec/webkit2gtk-4.0/WebKitWebProcess 11 43
│   │ │ │ │ ├─86555 /usr/bin/bwrap --args 49 -- /home/mcatanzaro/Projects/GNOME/install/libexec/webkit2gtk-4.0/WebKitWebProcess 11 43
│   │ │ │ │ └─86558 /home/mcatanzaro/Projects/GNOME/install/libexec/webkit2gtk-4.0/WebKitWebProcess 11 43
│   │ │ │ ├─run-r02f98d5665514673a73066896b4a1c65.scope 
│   │ │ │ │ ├─86556 /usr/bin/bwrap --args 56 -- /home/mcatanzaro/Projects/GNOME/install/libexec/webkit2gtk-4.0/WebKitWebProcess 17 52
│   │ │ │ │ ├─86560 /usr/bin/bwrap --args 56 -- /home/mcatanzaro/Projects/GNOME/install/libexec/webkit2gtk-4.0/WebKitWebProcess 17 52
│   │ │ │ │ └─86561 /home/mcatanzaro/Projects/GNOME/install/libexec/webkit2gtk-4.0/WebKitWebProcess 17 52
│   │ │ │ └─gnome-terminal-server.service 
│   │ │ │   └─35067 /usr/libexec/gnome-terminal-server

(Note that bwrap execs itself, so there are two bwrap processes with identical command lines for every web process and xdg-dbus-proxy process.)
Comment 2 Michael Catanzaro 2021-03-18 14:29:09 PDT
Created attachment 423657 [details]
Patch
Comment 3 Michael Catanzaro 2021-03-18 15:54:42 PDT
It seems I have some #if guards messed up somewhere. I'm bisecting something now, will fix it later.
Comment 4 Michael Catanzaro 2021-03-18 18:21:53 PDT
Created attachment 423678 [details]
Patch
Comment 5 Benjamin Berg 2021-03-19 01:58:16 PDT
On the patch itself seems good in general.

Some suggestions:

1. I would suggest adding "--collect" to the systemd-run options. There is no reason to leave transient units lying around if one of the helper processes does not exit cleanly.

2. With the unit name as-is, you'll break the resource usage display in "GNOME Usage" and KDE. I think you can fix this by using "sd_pid_get_user_unit", splitting off the trailing unit type (.service/.scope), and then appending e.g. the PID (without a new "-" character for scopes).

3. If you need to grab your own unit name already. Maybe add a `PartOf=` to your own unit. That way systemd guarantees it'll eventually be killed after the parent is gone. I doubt this is important though, just feels like a nice touch to me.


I'll discuss this with some other people. My fear is that we'll want something quite different in the long run. This was the reason for wanting to go for an external hack (i.e. the "cgroupify" thingy I wrote), as it means we can propose a more sound long term solution to browsers eventually and just disable it.

Unfortunately, so far all discussions about this have ended inconclusively.
Comment 6 Michael Catanzaro 2021-03-19 06:20:52 PDT
(In reply to Benjamin Berg from comment #5)
> On the patch itself seems good in general.
> 
> Some suggestions:
> 
> 1. I would suggest adding "--collect" to the systemd-run options. There is
> no reason to leave transient units lying around if one of the helper
> processes does not exit cleanly.

OK.

> 2. With the unit name as-is, you'll break the resource usage display in
> "GNOME Usage" and KDE. I think you can fix this by using
> "sd_pid_get_user_unit", splitting off the trailing unit type
> (.service/.scope), and then appending e.g. the PID (without a new "-"
> character for scopes).

OK, passing the result to the --unit flag I suppose.

> 3. If you need to grab your own unit name already. Maybe add a `PartOf=` to
> your own unit. That way systemd guarantees it'll eventually be killed after
> the parent is gone. I doubt this is important though, just feels like a nice
> touch to me.

Using the --property flag? It's not needed because the web process will close itself when its connection to the UI process is closed.

> I'll discuss this with some other people. My fear is that we'll want
> something quite different in the long run. This was the reason for wanting
> to go for an external hack (i.e. the "cgroupify" thingy I wrote), as it
> means we can propose a more sound long term solution to browsers eventually
> and just disable it.
> 
> Unfortunately, so far all discussions about this have ended inconclusively.

I haven't heard of cgroupify. :) Anyway sure, let's hold off on this until discussions progress further. FWIW I don't think any of the above is too hard, except calculating our own scope name to provide to --unit sounds a little bit annoying.
Comment 7 Michael Catanzaro 2021-03-30 16:42:59 PDT
This is obsoleted by https://gitlab.freedesktop.org/benzea/uresourced/-/commit/9be60420d117b5015ef22eb6937f946db971bc68, at least for now.