Bug 260723 - [GTK] Web process crash when creating GMainContext: Creating pipes for GWakeup: Too many open files
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKitGTK
Version: WebKit Nightly Build
Hardware: PC Linux
Importance: P2 Normal
Assignee: Nobody
Duplicates: 264674 266577 274241
Reported: 2023-08-25 11:42 PDT by Michael Catanzaro
Modified: 2024-09-05 04:35 PDT

Attachments
full backtrace (second crash) (19.81 KB, text/plain)
2024-08-29 09:46 PDT, Michael Catanzaro

Description Michael Catanzaro 2023-08-25 11:42:50 PDT
Look at this web process crash:

Thread 1 (Thread 0x7ff1bdffb6c0 (LWP 219)):
#0  g_log_structured_array (log_level=<optimized out>, fields=0x7ff1bdffa830, n_fields=4) at ../glib/gmessages.c:556
#1  0x00007ff4d7a8588c in g_log_default_handler (log_domain=log_domain@entry=0x7ff4d7ade00e "GLib", log_level=log_level@entry=6, message=message@entry=0x7ff1a0000d50 "Creating pipes for GWakeup: Too many open files", unused_data=unused_data@entry=0x0) at ../glib/gmessages.c:3284
#2  0x00007ff4c006d242 in trap_handler (log_domain=log_domain@entry=0x7ff4d7ade00e "GLib", log_level=log_level@entry=6, message=message@entry=0x7ff1a0000d50 "Creating pipes for GWakeup: Too many open files", user_data=user_data@entry=0x0) at ../lib/ephy-debug.c:104
#3  0x00007ff4d7a85b36 in g_logv (log_domain=0x7ff4d7ade00e "GLib", log_level=G_LOG_LEVEL_ERROR, format=<optimized out>, args=args@entry=0x7ff1bdffa9b0) at ../glib/gmessages.c:1392
#4  0x00007ff4d7a85e23 in g_log (log_domain=log_domain@entry=0x7ff4d7ade00e "GLib", log_level=log_level@entry=G_LOG_LEVEL_ERROR, format=format@entry=0x7ff4d7aeb330 "Creating pipes for GWakeup: %s") at ../glib/gmessages.c:1461
#5  0x00007ff4d7ad51aa in g_wakeup_new () at ../glib/gwakeup.c:164
#6  0x00007ff4d7a79a2f in g_main_context_new_with_flags (flags=<optimized out>) at ../glib/gmain.c:786
#7  0x00007ff4db5515f1 in WTF::RunLoop::RunLoop() (this=0x7ff2fe9ac360) at /buildstream/gnome/sdk/webkitgtk-6.0.bst/Source/WTF/wtf/glib/RunLoopGLib.cpp:66
#8  0x00007ff4db4f7e7d in WTF::RunLoop::Holder::Holder() (this=0x7ff2595b00d0) at /buildstream/gnome/sdk/webkitgtk-6.0.bst/Source/WTF/wtf/RunLoop.cpp:46
#9  WTF::ThreadSpecific<WTF::RunLoop::Holder, (WTF::CanBeGCThread)0>::Data::Data(WTF::ThreadSpecific<WTF::RunLoop::Holder, (WTF::CanBeGCThread)0>*) (this=0x7ff2595b00d0, owner=<optimized out>) at /buildstream/gnome/sdk/webkitgtk-6.0.bst/Source/WTF/wtf/ThreadSpecific.h:94
#10 WTF::ThreadSpecific<WTF::RunLoop::Holder, (WTF::CanBeGCThread)0>::set() (this=<optimized out>) at /buildstream/gnome/sdk/webkitgtk-6.0.bst/Source/WTF/wtf/ThreadSpecific.h:195
#11 WTF::ThreadSpecific<WTF::RunLoop::Holder, (WTF::CanBeGCThread)0>::operator WTF::RunLoop::Holder*() (this=<optimized out>) at /buildstream/gnome/sdk/webkitgtk-6.0.bst/Source/WTF/wtf/ThreadSpecific.h:211
#12 WTF::ThreadSpecific<WTF::RunLoop::Holder, (WTF::CanBeGCThread)0>::operator->() (this=<optimized out>) at /buildstream/gnome/sdk/webkitgtk-6.0.bst/Source/WTF/wtf/ThreadSpecific.h:217
#13 WTF::RunLoop::current() () at /buildstream/gnome/sdk/webkitgtk-6.0.bst/Source/WTF/wtf/RunLoop.cpp:79
#14 0x00007ff4db4f88f2 in WTF::RunLoop::create(char const*, WTF::ThreadType, WTF::Thread::QOS)::$_0::operator()() const (this=0x7ff2be9381e8) at /buildstream/gnome/sdk/webkitgtk-6.0.bst/Source/WTF/wtf/RunLoop.cpp:112
#15 WTF::Detail::CallableWrapper<WTF::RunLoop::create(char const*, WTF::ThreadType, WTF::Thread::QOS)::$_0, void>::call() (this=0x400) at /buildstream/gnome/sdk/webkitgtk-6.0.bst/Source/WTF/wtf/Function.h:53
#16 0x00007ff4db4fb9a7 in WTF::Function<void ()>::operator()() const (this=<optimized out>) at /buildstream/gnome/sdk/webkitgtk-6.0.bst/Source/WTF/wtf/Function.h:82
#17 WTF::Thread::entryPoint(WTF::Thread::NewThreadContext*) (newThreadContext=0x7ff2f23f02c0) at /buildstream/gnome/sdk/webkitgtk-6.0.bst/Source/WTF/wtf/Threading.cpp:250
#18 0x00007ff4db55582d in WTF::wtfThreadEntryPoint(void*) (context=0x400) at /buildstream/gnome/sdk/webkitgtk-6.0.bst/Source/WTF/wtf/posix/ThreadingPOSIX.cpp:242
#19 0x00007ff4dbc8ee39 in start_thread (arg=<optimized out>) at pthread_create.c:444
#20 0x00007ff4dbd16cc4 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100

Well, that's unfortunate.

pipe(2) says:

       EMFILE The per‐process limit on the number of open file descriptors has been reached.

       ENFILE The system‐wide limit on the total number of open files has been reached.

       ENFILE The user hard limit on memory that can be allocated for pipes has been reached and the caller is  not
              privileged; see pipe(7).

errno(3) says:

       EMFILE          Too many open files (POSIX.1‐2001).  Commonly caused by exceeding the RLIMIT_NOFILE resource
                       limit described in getrlimit(2).  Can also be caused by exceeding  the  limit  specified  in
                       /proc/sys/fs/nr_open.

       ENFILE          Too  many  open  files in system (POSIX.1‐2001).  On Linux, this is probably a result of
                       encountering the /proc/sys/fs/file-max limit (see proc(5)).

The Epiphany UI process already increases its fd limit up to the hard limit: https://gitlab.gnome.org/GNOME/epiphany/-/merge_requests/1304. At first I thought "maybe the web process needs to do this too," but a typical web process only needs about 25 file descriptors open, so probably not. I wonder if this unfortunate crashed web process wound up hitting some sort of bug that caused it to open an extremely high number of file descriptors? Not sure how we could debug something like this, especially since I have no clue what this web process was doing before it crashed.
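
For readers unfamiliar with the approach mentioned above (raising the soft fd limit up to the hard limit), here is a minimal sketch of the idea in Python. It is not Epiphany's code, which is C; the function name raise_nofile_soft_limit is made up for illustration, and only the standard resource module is used.

```python
import resource

def raise_nofile_soft_limit():
    """Raise the soft RLIMIT_NOFILE to the hard limit.

    Returns (old_soft, new_soft). If the kernel refuses the change, the
    limit is left as it was and new_soft == old_soft.
    """
    old_soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if old_soft != hard:
        try:
            # An unprivileged process may raise its soft limit as far as
            # the hard limit, but no further.
            resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
        except (ValueError, OSError):
            pass  # keep the existing limit
    new_soft, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
    return old_soft, new_soft

if __name__ == "__main__":
    old, new = raise_nofile_soft_limit()
    print(f"soft RLIMIT_NOFILE: {old} -> {new}")
```

Note that this only postpones the crash if something is leaking descriptors; it does not address the leak itself.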
Comment 1 Michael Catanzaro 2023-08-25 11:45:34 PDT
Oh, I forgot to say, the system-wide limits are extremely high:

$ cat /proc/sys/fs/nr_open
1073741816
$ cat /proc/sys/fs/file-max
9223372036854775807
$ ulimit -n
1024

so we are almost certainly hitting the soft RLIMIT_NOFILE limit, which is 1024 (because any value higher than that breaks applications that use select()).
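
To make the comparison explicit, a small sketch that picks the smallest of the three limits, fed with the values printed above. The helper name binding_fd_limit is hypothetical. It is a simplification: fs.file-max is a system-wide total rather than a per-process cap, so treating it as directly comparable is only reasonable when it is as large as it is here.

```python
def binding_fd_limit(soft_nofile, nr_open, file_max):
    """Return (name, value) of the smallest of the three limits."""
    limits = {
        "RLIMIT_NOFILE (soft)": soft_nofile,  # ulimit -n
        "fs.nr_open": nr_open,                # /proc/sys/fs/nr_open
        "fs.file-max": file_max,              # /proc/sys/fs/file-max
    }
    name = min(limits, key=limits.get)
    return name, limits[name]

# Values reported in this comment.
print(binding_fd_limit(1024, 1073741816, 9223372036854775807))
```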
Comment 2 Michael Catanzaro 2023-08-28 08:01:34 PDT
Hit this again yesterday. :(
Comment 3 Michael Catanzaro 2023-12-01 11:19:40 PST
Hit again today when loading:

https://www.cnn.com/dodo-de-extinction-mauritius-spc-intl-scn/index.html

I strongly suspect either GStreamer or our multimedia code is leaking file descriptors, but we have no good way to know because once the process has crashed it's too late to check what fds are open, and until it has crashed we don't know that anything is wrong.
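
One possible way around the "too late once it has crashed" problem is to sample the fd count of a live web process from outside and flag it before it reaches the limit. The sketch below is only an illustration under some assumptions: Linux with /proc mounted, permission to read the target's /proc entries, and made-up helper names (count_open_fds, soft_nofile_limit, fd_pressure). The 0.8 threshold is arbitrary.

```python
import os

def count_open_fds(pid):
    """Number of file descriptors currently open in process `pid`."""
    return len(os.listdir(f"/proc/{pid}/fd"))

def soft_nofile_limit(pid):
    """Soft 'Max open files' limit of `pid`, parsed from /proc/<pid>/limits."""
    with open(f"/proc/{pid}/limits") as f:
        for line in f:
            if line.startswith("Max open files"):
                # Columns: "Max open files", soft limit, hard limit, units.
                value = line.split()[3]
                return float("inf") if value == "unlimited" else int(value)
    raise RuntimeError("'Max open files' not found in /proc/<pid>/limits")

def fd_pressure(pid, threshold=0.8):
    """Return (used, limit, near_limit) for process `pid`."""
    used = count_open_fds(pid)
    limit = soft_nofile_limit(pid)
    return used, limit, used >= threshold * limit

if __name__ == "__main__":
    used, limit, near = fd_pressure(os.getpid())
    print(f"{used}/{limit} fds in use, near limit: {near}")
```

Run periodically against the web process pid, something like this would give a chance to inspect /proc/<pid>/fd while the process is still alive.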
Comment 4 Philippe Normand 2023-12-09 10:50:57 PST
Well, current Ephy TP leaks MSE AppendPipelines, at least...

Run with --env=GST_DEBUG="GST_TRACER*:8" --env="GST_TRACERS=leaks(filters=GstElement)" then load some page with MSE videos, wait for a while, open an empty tab, close video tab, close Ephy.

Then you should see logs like this:

object-alive, type-name=(string)GstPipeline, address=(gpointer)0x55cd47958240, description=(string)<append-pipeline-audio-mp4-1>
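
When the tracer prints many such lines, a short script can tally them per type. This is a rough sketch that only assumes the three fields visible in the line above (type-name, address, description); real tracer output may carry extra prefixes or fields, which the regular expression simply ignores. The function name summarize_leaks is invented for this example.

```python
import re
from collections import Counter

# Matches the fields shown in the sample log line from this comment.
ALIVE_RE = re.compile(
    r"object-alive, type-name=\(string\)(?P<type>[^,]+), "
    r"address=\(gpointer\)(?P<addr>0x[0-9a-fA-F]+), "
    r"description=\(string\)(?P<desc><[^>]*>)"
)

def summarize_leaks(log_text):
    """Return (Counter of live objects per type, list of descriptions)."""
    by_type = Counter()
    descriptions = []
    for match in ALIVE_RE.finditer(log_text):
        by_type[match.group("type")] += 1
        descriptions.append(match.group("desc"))
    return by_type, descriptions

if __name__ == "__main__":
    sample = (
        "object-alive, type-name=(string)GstPipeline, "
        "address=(gpointer)0x55cd47958240, "
        "description=(string)<append-pipeline-audio-mp4-1>"
    )
    print(summarize_leaks(sample))
```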
Comment 5 Michael Catanzaro 2024-03-01 06:27:34 PST
*** Bug 264674 has been marked as a duplicate of this bug. ***
Comment 6 Michael Catanzaro 2024-03-01 06:27:42 PST
*** Bug 266577 has been marked as a duplicate of this bug. ***
Comment 7 Michael Catanzaro 2024-04-30 19:44:55 PDT
This happens relatively frequently when loading articles on cnn.com (as in comment #3), but unfortunately not frequently enough to be reproducible.
Comment 8 Michael Catanzaro 2024-05-16 05:23:12 PDT
OK, finally I have a reproducer. Load https://globalnews.ca/news/10497774/turks-and-caicos-american-tourist-arrested-ammo/ in Ephy Tech Preview. It's not a sure thing, but it *almost* always crashes. If you can't reproduce after loading the page just once, then load it a couple more times in different web views and it will almost definitely crash.
Comment 9 Michael Catanzaro 2024-05-16 05:35:04 PDT
This same reproducer also triggers bug #266573.
Comment 10 Philippe Normand 2024-05-16 06:09:22 PDT
(In reply to Michael Catanzaro from comment #8)
> OK, finally I have a reproducer. Load
> https://globalnews.ca/news/10497774/turks-and-caicos-american-tourist-
> arrested-ammo/ in Ephy Tech Preview. It's not a sure thing, but it *almost*
> always crashes. If you can't reproduce after loading the page just once,
> then load it a couple more times in different web views and it will almost
> definitely crash.

Doesn't crash here, but I may have found a nice leak that could be the culprit. See bug 274254.
Comment 11 Michael Catanzaro 2024-05-16 08:51:40 PDT
*** Bug 274241 has been marked as a duplicate of this bug. ***
Comment 12 Philippe Normand 2024-05-19 04:39:02 PDT
(In reply to Michael Catanzaro from comment #8)
> OK, finally I have a reproducer. Load
> https://globalnews.ca/news/10497774/turks-and-caicos-american-tourist-
> arrested-ammo/ in Ephy Tech Preview. It's not a sure thing, but it *almost*
> always crashes. If you can't reproduce after loading the page just once,
> then load it a couple more times in different web views and it will almost
> definitely crash.

I can reproduce this in TP, but not if WEBKIT_DISABLE_SANDBOX_THIS_IS_DANGEROUS is set...

This seems related to the limits set on the WebProcess sandbox by flatpak-spawn.

The issue doesn't happen in MiniBrowser dev build, the flatpak stuff not being used there.
Comment 13 Michael Catanzaro 2024-06-07 03:59:57 PDT
(In reply to Philippe Normand from comment #12)
> The issue doesn't happen in MiniBrowser dev build, the flatpak stuff not
> being used there.

It does for me, even with the bubblewrap sandbox disabled. I can reproduce this crash very reliably by running:

$ jhbuild run env WEBKIT_DISABLE_SANDBOX_THIS_IS_DANGEROUS=1 WEBKIT_GST_HARNESS_DUMP_DIR=~/dump GST_DEBUG=webkit*:9 G_DEBUG= ~/Projects/GNOME/install/libexec/webkitgtk-6.0/MiniBrowser https://www.vox.com/future-perfect/352359/milk-dairy-schools
Comment 14 Michael Catanzaro 2024-06-07 04:10:30 PDT
(In reply to Philippe Normand from comment #10) 
> Doesn't crash here, but I may have found a nice leak that could be the
> culprit. See bug 274254.

And I'm testing 279750@main, so I already have this fix unfortunately.
Comment 15 Kdwk 2024-07-05 22:38:18 PDT
This bug is making reddit.com reliably crash.

Scrolling Reddit for too long invariably leads to this crash and stack trace.

WebKitGTK 2.45.4/ Spidey 1.0
Comment 16 Philippe Normand 2024-07-07 04:34:52 PDT
(In reply to Kdwk from comment #15)
> This bug is making reddit.com reliably crash.
> 
> Scrolling Reddit for too long invariably leads to this crash and stack trace.
> 

Can you try https://github.com/WebKit/WebKit/pull/30549 ?
Comment 17 Michael Catanzaro 2024-07-07 08:18:11 PDT
(In reply to Michael Catanzaro from comment #8)
> OK, finally I have a reproducer. Load
> https://globalnews.ca/news/10497774/turks-and-caicos-american-tourist-
> arrested-ammo/ in Ephy Tech Preview. It's not a sure thing, but it *almost*
> always crashes. If you can't reproduce after loading the page just once,
> then load it a couple more times in different web views and it will almost
> definitely crash.

Unfortunately this reproducer no longer works. I can load this page in many tabs at once without trouble. I do still see this crash somewhat regularly, though; I just don't have a reproducer anymore.

Hopefully Kdwk is able to reproduce on reddit. I use reddit a lot, but never see this happen there.
Comment 18 Nils K 2024-07-18 13:00:47 PDT
Based on the description it seems like we too are running into this bug on our digital signage devices which are running cog, i.e. WPE WebKit.
They show a set list of videos, images, and iframes on repeat, forever.

The crashes only happen once the browser process has been running for many hours - about once per day.
I would be more than happy to provide additional data that could help figure out the root cause if someone tells me what data/commands would be helpful.

Backtrace:
#0  g_log_structured_array (log_level=<optimized out>, fields=0x7f97af1ff680, n_fields=4) at ../glib/gmessages.c:426
#1  0x00007fad69405d27 in g_log_default_handler (log_domain=log_domain@entry=0x7fad694ba2eb "GLib", log_level=log_level@entry=6, 
    message=message@entry=0x7fac881a2460 "Creating pipes for GWakeup: Too many open files", unused_data=unused_data@entry=0x0) at ../glib/gmessages.c:3357
#2  0x00007fad693fcb29 in g_logv (log_domain=0x7fad694ba2eb "GLib", log_level=G_LOG_LEVEL_ERROR, format=<optimized out>, args=args@entry=0x7f97af1ff7e0) at ../glib/gmessages.c:1246
#3  0x00007fad693fcea3 in g_log (log_domain=<optimized out>, log_level=<optimized out>, format=<optimized out>) at ../glib/gmessages.c:1315
#4  0x00007fad6945175a in g_wakeup_new () at ../glib/gwakeup.c:162
#5  0x00007fad693f4618 in g_main_context_new_with_flags (flags=<optimized out>) at ../glib/gmain.c:658
#6  0x00007fad6ae9dd0d in WTF::RunLoop::current() () from /lib64/libWPEWebKit-2.0.so.1
#7  0x00007fad6ae9ded7 in WTF::Detail::CallableWrapper<WTF::RunLoop::create(char const*, WTF::ThreadType, WTF::Thread::QOS)::{lambda()#1}, void>::call() ()
   from /lib64/libWPEWebKit-2.0.so.1
#8  0x00007fad6aeeb162 in WTF::wtfThreadEntryPoint(void*) [clone .lto_priv.0] () from /lib64/libWPEWebKit-2.0.so.1
#9  0x00007fad698a6507 in start_thread (arg=<optimized out>) at pthread_create.c:447
#10 0x00007fad6992a214 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100

System log directly before crash:
[70612.397185] weston[26604]: shared memfd open() failed: Too many open files
[70612.401727] weston[26604]: Failed to create secure directory (/run/user/999/pulse): Too many open files
[70612.401859] weston[26604]: socket(): Too many open files
[70612.691026] weston[26604]: (WPEWebProcess:2): GStreamer-CRITICAL **: 19:54:48.160: gst_poll_write_control: assertion 'set != NULL' failed
[70612.691263] weston[26604]: (WPEWebProcess:2): GStreamer-CRITICAL **: 19:54:48.161: gst_poll_write_control: assertion 'set != NULL' failed
[70612.691668] weston[26604]: (WPEWebProcess:2): GStreamer-CRITICAL **: 19:54:48.161: gst_poll_write_control: assertion 'set != NULL' failed
[70612.691814] weston[26604]: (WPEWebProcess:2): GStreamer-CRITICAL **: 19:54:48.161: gst_poll_write_control: assertion 'set != NULL' failed
[70612.691923] weston[26604]: (WPEWebProcess:2): GStreamer-CRITICAL **: 19:54:48.161: gst_poll_write_control: assertion 'set != NULL' failed
[70612.692049] weston[26604]: (WPEWebProcess:2): GStreamer-CRITICAL **: 19:54:48.162: gst_poll_write_control: assertion 'set != NULL' failed
[70612.692177] weston[26604]: (WPEWebProcess:2): GStreamer-CRITICAL **: 19:54:48.162: gst_poll_write_control: assertion 'set != NULL' failed
[70612.692316] weston[26604]: (WPEWebProcess:2): GStreamer-CRITICAL **: 19:54:48.162: gst_poll_write_control: assertion 'set != NULL' failed
[70612.692429] weston[26604]: (WPEWebProcess:2): GStreamer-CRITICAL **: 19:54:48.162: gst_poll_write_control: assertion 'set != NULL' failed
[70612.692551] weston[26604]: (WPEWebProcess:2): GStreamer-CRITICAL **: 19:54:48.162: gst_poll_write_control: assertion 'set != NULL' failed
[70612.692650] weston[26604]: (WPEWebProcess:2): GStreamer-CRITICAL **: 19:54:48.162: gst_poll_read_control: assertion 'set != NULL' failed
[70612.692779] weston[26604]: (WPEWebProcess:2): GStreamer-CRITICAL **: 19:54:48.162: gst_poll_read_control: assertion 'set != NULL' failed
[70612.710390] weston[26604]: (WPEWebProcess:2): GStreamer-CRITICAL **: 19:54:48.180: gst_poll_read_control: assertion 'set != NULL' failed
[70612.713099] weston[26604]: (WPEWebProcess:2): GStreamer-CRITICAL **: 19:54:48.183: gst_poll_read_control: assertion 'set != NULL' failed
[70619.904335] weston[26604]: http://localhost/_nuxt/CbqYvi5G.js:14:610: CONSOLE ERROR TypeError: (m.value??[]).filter is not a function. (In '(m.value??[]).filter(b=>b.Content!==null>
[70626.032338] weston[26604]: https://display.soundtrackyourbrand.com/assets/index.63e64c3a.js:42:852: CONSOLE DEBUG [render] Track already rendered
[70626.032804] weston[26604]: https://display.soundtrackyourbrand.com/assets/index.63e64c3a.js:25:173: CONSOLE DEBUG [app] Loading <REDACTED>
[70626.084897] weston[26604]: CONSOLE NETWORK INFO Successfully preconnected to https://i.soundcdn.com/
[70626.112462] weston[26604]: (WPEWebProcess:2): GLib-ERROR **: 19:55:01.582: Creating pipes for GWakeup: Too many open files
[70623.988784] kernel: traps: CryptoQueue[66906] trap int3 ip:7fad694002e7 sp:7f97af1ff630 error:0 in libglib-2.0.so.0.8000.3[7fad693b8000+a4000]
[70626.165246] systemd[1]: Created slice system-systemd\x2dcoredump.slice - Slice /system/systemd-coredump.
[70626.197365] systemd[1]: Started systemd-coredump@0-66907-0.service - Process Core Dump (PID 66907/UID 0).
[70643.337417] systemd-coredump[66908]: [🡕] Process 26604 (WPEWebProcess) of user 999 dumped core.
Comment 19 Nils K 2024-07-18 13:19:26 PDT
PS: We are also observing a slow but steady memory increase. It could be that this memory leak also stems from these open file descriptors.
Comment 20 Philippe Normand 2024-07-18 13:26:45 PDT
(In reply to nilskemail+webkit from comment #19)
> PS: We are also observing a slow but steady memory increase. It could be
> that this memory leak also stems from these open file descriptors.

Yes, pipelines leaking likely induce FD leaks and possibly memory leaks.
With which version is this happening?
Have you checked with git main?
Comment 21 Philippe Normand 2024-07-18 13:28:37 PDT
Also having a minimal test-case would help, if possible.

I assume mem usage is constant when using other web-engines such as Gecko or Chromium/etc?
Comment 22 Nils K 2024-07-18 13:50:04 PDT
> Yes, pipelines leaking likely induce FD leaks and possibly memory leaks.
> With which version is this happening?

cog 0.18.3 (WPE WebKit 2.44.1)

> Have you checked with git main?

No, sadly I do not have the capacity to do that. Apart from the time aspect, I also only have a single device on which I can test this for a long time, and I already use it for other purposes.

> Also having a minimal test-case would help, if possible.

Would a bundled web app be okay? In that case I might be able to provide you with the compiled web app and a mock/dummy server replacing our client logic.

> I assume mem usage is constant when using other web-engines such as Gecko or Chromium/etc?

I have not tried to run it on different web engines, yet (again due to only having a single device available for long running tests).
Comment 23 Philippe Normand 2024-07-19 02:12:13 PDT
> Would a bundled web app be okay?

Yes!
Comment 24 Nils K 2024-07-20 11:03:01 PDT
I did not yet have time to create a standalone reproducer as I need to remove some logic before sharing it. However, I investigated a bit more and found out a few things (not sure how much of that is already known). Whilst doing so I was also able to adjust my setup to provoke a crash within a few hours instead of a whole day.

* The owner of a large amount of fds is the root WPEWebProcess, i.e. the direct child of bwrap, not any of its children (according to `pstree -p`)
* The open fds of that process seem to be mostly `/dmabuf:` (according to `lsof`)
* The leaked fds seem to correlate with our memory leak
* The amount of leaked memory depends on the resolution of the screen, i.e. switching from 1440p to 1080p had no influence on the amount of open fds but the memory rose slower
* Crashing of the WPEWebProcess does not free the memory again. According to the system monitoring the amount of unevictable memory does drop upon the crash, however, it still shows up as shared memory and does not drop there.
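
For anyone who wants to repeat the fd breakdown above without lsof, here is a minimal sketch that groups a process's descriptors by what they point to. It assumes Linux, permission to read /proc/<pid>/fd, and that dma-buf descriptors show up with a link target starting with "/dmabuf" (as the lsof output suggests). The names classify_fd_target and fd_histogram are made up for illustration.

```python
import os
from collections import Counter

def classify_fd_target(target):
    """Map a /proc/<pid>/fd link target to a coarse category."""
    if target.startswith("/dmabuf"):
        return "dmabuf"
    for prefix in ("socket:", "pipe:", "anon_inode:"):
        if target.startswith(prefix):
            return prefix.rstrip(":")
    return "file"  # regular files, devices, directories, ...

def fd_histogram(pid):
    """Count the open fds of `pid` per category."""
    counts = Counter()
    fd_dir = f"/proc/{pid}/fd"
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # fd was closed between listdir() and readlink()
        counts[classify_fd_target(target)] += 1
    return counts

if __name__ == "__main__":
    for kind, count in fd_histogram(os.getpid()).most_common():
        print(f"{count:6d} {kind}")
```

Comparing two such histograms taken some minutes apart should show which category is growing.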

I also got a new crash with a different backtrace. This might be due to enabling G_DEBUG=fatal-criticals for another bug report (https://bugs.webkit.org/show_bug.cgi?id=276819) causing the process to abort earlier (though even the first CRITICAL log message is a bit different)...

Backtrace:
#0  _g_log_abort (breakpoint=<optimized out>) at ../glib/gmessages.c:426
#1  g_logv (log_domain=0x7f4edd3e89ae "GStreamer", log_level=G_LOG_LEVEL_CRITICAL, format=<optimized out>, args=args@entry=0x7ffebc36e0d0) at ../glib/gmessages.c:1273
#2  0x00007f4edd53eea3 in g_log (log_domain=<optimized out>, log_level=<optimized out>, format=<optimized out>) at ../glib/gmessages.c:1315
#3  0x00007f4edd347ceb in gst_bus_constructed (object=0x556dd1401960) at ../gst/gstbus.c:184
#4  0x00007f4edc1f217a in g_object_new_internal (class=0x556dd0c7ecb0, params=0x0, n_params=0) at ../gobject/gobject.c:2657
#5  0x00007f4edc1f361e in g_object_new_internal (class=<optimized out>, params=<optimized out>, n_params=<optimized out>) at ../gobject/gobject.c:2603
#6  g_object_new_with_properties (object_type=<optimized out>, n_properties=<optimized out>, names=names@entry=0x0, values=values@entry=0x0) at ../gobject/gobject.c:2769
#7  0x00007f4edc1f4641 in g_object_new (object_type=<optimized out>, first_property_name=first_property_name@entry=0x0) at ../gobject/gobject.c:2415
#8  0x00007f4edd346fbe in gst_bus_new () at ../gst/gstbus.c:310
#9  0x00007f4e87e1de77 in gst_auto_detect_find_best (self=0x556dd119bd50) at ../gst/autodetect/gstautodetect.c:260
#10 gst_auto_detect_detect (self=0x556dd119bd50) at ../gst/autodetect/gstautodetect.c:368
#11 gst_auto_detect_change_state (element=0x556dd119bd50, transition=GST_STATE_CHANGE_NULL_TO_READY) at ../gst/autodetect/gstautodetect.c:420
#12 0x00007f4edd362306 in gst_element_change_state (element=element@entry=0x556dd119bd50, transition=transition@entry=GST_STATE_CHANGE_NULL_TO_READY) at ../gst/gstelement.c:3101
#13 0x00007f4edd362bcb in gst_element_set_state_func (element=0x556dd119bd50, state=GST_STATE_READY) at ../gst/gstelement.c:3055
#14 0x00007f4edd3396b9 in gst_bin_element_set_state (bin=<optimized out>, element=0x556dd119bd50, base_time=0, start_time=0, current=<optimized out>, next=<optimized out>)
    at ../gst/gstbin.c:2582
#15 gst_bin_change_state_func (element=0x556dd13faa20, transition=GST_STATE_CHANGE_NULL_TO_READY) at ../gst/gstbin.c:2934
#16 0x00007f4edd362306 in gst_element_change_state (element=element@entry=0x556dd13faa20, transition=transition@entry=GST_STATE_CHANGE_NULL_TO_READY) at ../gst/gstelement.c:3101
#17 0x00007f4edd362bcb in gst_element_set_state_func (element=0x556dd13faa20, state=GST_STATE_READY) at ../gst/gstelement.c:3055
#18 0x00007f4e87481977 in activate_sink (playbin=<optimized out>, sink=0x556dd13faa20, activated=0x7ffebc36e6cc) at ../gst/playback/gstplaybin2.c:4528
#19 activate_sink (playbin=<optimized out>, sink=0x556dd13faa20, activated=0x7ffebc36e6cc) at ../gst/playback/gstplaybin2.c:4503
#20 0x00007f4e874a72d0 in activate_group (target=GST_STATE_PAUSED, playbin=0x556dd1402b50, group=0x556dd1402fe0) at ../gst/playback/gstplaybin2.c:5316
#21 setup_next_source.constprop.0 (playbin=playbin@entry=0x556dd1402b50, target=GST_STATE_PAUSED) at ../gst/playback/gstplaybin2.c:5741
#22 0x00007f4e8747f229 in gst_play_bin_change_state (element=0x556dd1402b50, transition=<optimized out>) at ../gst/playback/gstplaybin2.c:5870
#23 0x00007f4edd362306 in gst_element_change_state (element=element@entry=0x556dd1402b50, transition=GST_STATE_CHANGE_READY_TO_PAUSED) at ../gst/gstelement.c:3101
#24 0x00007f4edd362881 in gst_element_continue_state (element=element@entry=0x556dd1402b50, ret=ret@entry=GST_STATE_CHANGE_SUCCESS) at ../gst/gstelement.c:2809
#25 0x00007f4edd36234a in gst_element_change_state (element=element@entry=0x556dd1402b50, transition=transition@entry=GST_STATE_CHANGE_NULL_TO_READY) at ../gst/gstelement.c:3140
#26 0x00007f4edd362bcb in gst_element_set_state_func (element=0x556dd1402b50, state=GST_STATE_PAUSED) at ../gst/gstelement.c:3055
#27 0x00007f4ee02a36e6 in WebCore::MediaPlayerPrivateGStreamer::changePipelineState(GstState) () from /lib64/libWPEWebKit-2.0.so.1
#28 0x00007f4ee02a4095 in WebCore::MediaPlayerPrivateGStreamer::commitLoad() () from /lib64/libWPEWebKit-2.0.so.1
#29 0x00007f4ee02e23d0 in WebCore::MediaPlayerPrivateGStreamer::load(WTF::String const&) () from /lib64/libWPEWebKit-2.0.so.1
#30 0x00007f4ee027085b in WebCore::MediaPlayer::loadWithNextMediaEngine(WebCore::MediaPlayerFactory const*) () from /lib64/libWPEWebKit-2.0.so.1
#31 0x00007f4ee0270e5d in WebCore::MediaPlayer::load(WTF::URL const&, WebCore::ContentType const&, WTF::String const&, bool) () from /lib64/libWPEWebKit-2.0.so.1
#32 0x00007f4edfe06463 in WebCore::HTMLMediaElement::loadResource(WTF::URL const&, WebCore::ContentType&, WTF::String const&) () from /lib64/libWPEWebKit-2.0.so.1
#33 0x00007f4edfe07f02 in WTF::Detail::CallableWrapper<WebCore::HTMLMediaElement::selectMediaResource()::{lambda()#1}, void>::call() () from /lib64/libWPEWebKit-2.0.so.1
#34 0x00007f4edfc3ab75 in WebCore::EventLoop::run(std::optional<WTF::ApproximateTime>) () from /lib64/libWPEWebKit-2.0.so.1
#35 0x00007f4edfc98baf in WebCore::WindowEventLoop::didReachTimeToRun() () from /lib64/libWPEWebKit-2.0.so.1
#36 0x00007f4ee01e1d62 in WTF::Detail::CallableWrapper<WebCore::ThreadTimers::setSharedTimer(WebCore::SharedTimer*)::{lambda()#1}, void>::call() () from /lib64/libWPEWebKit-2.0.so.1
#37 0x00007f4edeeebdfe in WTF::RunLoop::TimerBase::TimerBase(WTF::RunLoop&)::{lambda(void*)#1}::_FUN(void*) () from /lib64/libWPEWebKit-2.0.so.1
#38 0x00007f4edef1e53d in WTF::RunLoop::{lambda(_GSource*, int (*)(void*), void*)#1}::_FUN(_GSource*, int (*)(void*), void*) [clone .lto_priv.0] () from /lib64/libWPEWebKit-2.0.so.1
#39 0x00007f4edd538e8c in g_main_dispatch (context=0x556dd0b6dd50) at ../glib/gmain.c:3344
#40 g_main_context_dispatch_unlocked (context=0x556dd0b6dd50) at ../glib/gmain.c:4152
#41 0x00007f4edd59ac98 in g_main_context_iterate_unlocked.isra.0 (context=0x556dd0b6dd50, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at ../glib/gmain.c:4217
#42 0x00007f4edd53ef37 in g_main_loop_run (loop=0x556dd0b6dea0) at ../glib/gmain.c:4419
#43 0x00007f4edeeec008 in WTF::RunLoop::run() () from /lib64/libWPEWebKit-2.0.so.1
#44 0x00007f4ede0c9cd6 in WebKit::WebProcessMain(int, char**) () from /lib64/libWPEWebKit-2.0.so.1
#45 0x00007f4edd839088 in __libc_start_call_main (main=main@entry=0x556dac959070 <main>, argc=argc@entry=3, argv=argv@entry=0x7ffebc36f478)
    at ../sysdeps/nptl/libc_start_call_main.h:58
#46 0x00007f4edd83914b in __libc_start_main_impl (main=0x556dac959070 <main>, argc=3, argv=0x7ffebc36f478, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
    stack_end=0x7ffebc36f468) at ../csu/libc-start.c:360
#47 0x0000556dac9590a5 in _start ()


System journal:
Jul 20 19:26:04 Digital-Signage-Player weston[73445]: The per-process limit on the number of open file descriptors has been reached.
Jul 20 19:26:04 Digital-Signage-Player weston[73445]: ERROR: cannot create wakeup pipe
Jul 20 19:26:58 Digital-Signage-Player weston[73445]: Failed to create secure directory (/run/user/999/pulse): Too many open files
Jul 20 19:26:58 Digital-Signage-Player weston[73445]: socket(): Too many open files
Jul 20 19:26:58 Digital-Signage-Player weston[73445]: (WPEWebProcess:2): GStreamer-CRITICAL **: 19:26:58.952: gst_poll_get_read_gpollfd: assertion 'set != NULL' failed
Jul 20 19:26:58 Digital-Signage-Player kernel: traps: WPEWebProcess[73445] trap int3 ip:7f4edd53ec28 sp:7ffebc36e000 error:0 in libglib-2.0.so.0.8000.3[7f4edd4fa000+a4000]
Comment 25 Nils K 2024-07-21 15:32:10 PDT
(In reply to Philippe Normand from comment #23)
> > Would a bundled web app be okay?
> 
> Yes!

Just some static files are likely not sufficient, as I have some endpoints that cannot be cleanly mapped to files (e.g. /foo/bar and /foo/bar/baz, or /foo?a and /foo?b). I can provide you with either a HAR file or a simple python server which we use for testing. What is easier for you?
Comment 26 Nils K 2024-07-22 18:01:51 PDT
(In reply to Philippe Normand from comment #23)
> > Would a bundled web app be okay?
> 
> Yes!

I uploaded it at the file share platform of my university https://gigamove.rwth-aachen.de/en/download/3de530d665fdd1fccdb78c0d62175c45 (valid for two weeks - if it expires just ping me). In the file is the web app as well as a basic python server which handles some mimetypes etc. After extracting (and installing hypercorn+quart for the python server) just run "hypercorn -b 127.0.0.1:8000 player:app".
If you have any problems with the archive or getting it running please write me, I can also give you a HAR file or something else, whichever is most convenient. cog was then started with "--media-playback-requires-user-gesture=false" such that videos play automatically.

---

In addition to the points from my previous comment I have again done a bit of searching for possible causes. Within the GStreamer GitLab I found the following two issues that could be related:

- https://gitlab.freedesktop.org/gstreamer/gstreamer-vaapi/-/merge_requests/27
- https://gitlab.freedesktop.org/gstreamer/gstreamer-vaapi/-/issues/106

However, these are both about the old vaapi* plugins which we are not using. We are using the new va* plugin.

I wanted to note this because I am not sure if WebKit handles stuff differently depending on which decoders are used by GStreamer.

We have a device which seems to stumble upon a driver error (https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/11735) with the result that VA-API afterwards no longer functions. Curiously this device no longer seems to amass file descriptors once VA-API is broken. I did not yet have time to look further into this but I assume that WebKit/GStreamer falls back to software rendering. I will have to set up a test where I explicitly disable the va* plugins and see if the crash still occurs (but I am short on time currently :/).
Comment 27 Philippe Normand 2024-07-25 04:37:18 PDT
You should check if the leak happens with GStreamer/va.

I've tested your app here (I was asking a simple app, is the 1.5MB of JS necessary for this kind of app?) with git main, on WebKitGTK on my desktop with the va plugin (AMD GPU) and it seems OK. I'll leave it running for an hour.
Comment 28 Nils K 2024-07-25 06:21:27 PDT
(In reply to Philippe Normand from comment #27)
> You should check if the leak happens with GStreamer/va.

I do not have a lot of experience with the GStreamer API, and to properly test this I assume my setup should be similar to what WebKit uses (i.e. start/stop states, which sink/source etc.).

> I've tested your app here (I was asking a simple app, is the 1.5MB of JS
> necessary for this kind of app?) with git main, on WebKitGTK on my desktop
> with the va plugin (AMD GPU) and it seems OK. I'll leave it running for an
> hour.

Yeah sorry about the size. A lot of that will be unused code which the bundler did not remove and the framework (Nuxt/Vue). I did not have the time to create a standalone reproducer and will be on vacation in a few days so I wanted to at least get something out there for you to check.
Comment 29 Nils K 2024-08-11 11:39:03 PDT
I am now back from vacation. If it helps I could take a bit of time next week to try to create a reproducer without any JS framework involved. However, based on https://bugs.webkit.org/show_bug.cgi?id=277076 it seems like you already found the fault? (Just not a fix yet?)
Comment 30 Michael Catanzaro 2024-08-29 09:43:46 PDT
(In reply to nilskemail+webkit from comment #24)
> Jul 20 19:26:58 Digital-Signage-Player weston[73445]: (WPEWebProcess:2):
> GStreamer-CRITICAL **: 19:26:58.952: gst_poll_get_read_gpollfd: assertion
> 'set != NULL' failed

I got a backtrace for this one. Normally I would report a second bug for a second crash, but in this case it seems clear enough that the fd exhaustion is the underlying problem in both cases.
Comment 31 Michael Catanzaro 2024-08-29 09:46:14 PDT
Created attachment 472345 [details]
full backtrace (second crash)
Comment 32 Michael Catanzaro 2024-08-29 09:46:46 PDT
Forgot to mention: I hit this when loading https://duckduckgo.com search results page.
Comment 33 Philippe Normand 2024-08-29 11:31:22 PDT
We're well aware of the stack trace, but by then the fd limit has already been reached, so it's a bit useless... Someone would need to track this with valgrind or similar tools.
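
Short of a full valgrind run, a rough sketch like the following could at least show which kind of descriptor is piling up in a web process before the limit is hit. It assumes Linux with /proc mounted; the name summarize_fds and the grouping scheme are made up for this example and are not part of WebKit or GLib.

```python
import os
import sys
from collections import Counter


def summarize_fds(pid="self"):
    """Group the open fds of a process by what they point to (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    kinds = Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The fd was closed between listdir() and readlink().
            continue
        # Targets look like "pipe:[12345]", "socket:[678]",
        # "anon_inode:[eventfd]" or a plain path. Strip the inode number
        # of pipes and sockets so that they are counted together.
        if target.startswith(("pipe:", "socket:")):
            kind = target.split(":", 1)[0]
        else:
            kind = target
        kinds[kind] += 1
    return kinds


if __name__ == "__main__":
    # Usage: python3 summarize_fds.py <pid of the web process>
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    for kind, count in summarize_fds(pid).most_common(10):
        print(f"{count:6d}  {kind}")
```

Running it a few times against the web process pid while the page is open should show whether pipes, sockets, eventfds or regular files are the ones growing.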
Comment 34 Michael Catanzaro 2024-08-29 18:02:26 PDT
So my old reproducer from comment #8 doesn't work anymore, but I found a new one: visit https://www.firstalert4.com/2024/08/29/explosion-causes-manhole-covers-blow-off-north-st-louis/ and scroll down
Comment 35 Michael Catanzaro 2024-08-29 18:15:28 PDT
Unfortunately my development build doesn't crash at all, only Tech Preview. So I tried:

$ flatpak run -d --command=bash org.gnome.Epiphany.Devel
[📦 org.gnome.Epiphany.Devel ~]$ valgrind --trace-children=yes --track-fds=yes epiphany -p https://www.firstalert4.com/2024/08/29/explosion-causes-manhole-covers-blow-off-north-st-louis/

Unfortunately the web process just immediately crashes and valgrind outputs a terrifying spew of serious-looking warnings. valgrind thinks that WebKit writes to file descriptors that it has already closed a *lot*, and valgrind does not like this. I'm not sure valgrind is correct, though; some of the warnings I see look like valgrind is getting confused by the multiple processes. I'll look closer tomorrow and report a separate bug if needed.
Comment 36 Michael Catanzaro 2024-08-30 07:27:51 PDT
Now https://gitlab.gnome.org/GNOME/epiphany/-/issues/2448 is crashing too. Reproducer: load the page and go do something else for a few minutes.

This crash has been happening for a long time, but for some reason it's much worse as of yesterday. Not sure why.
Comment 37 Michael Catanzaro 2024-08-30 09:51:27 PDT
(In reply to Michael Catanzaro from comment #35)
> Unfortunately the web process just immediately crashes and valgrind outputs
> a terrifying spew of serious-looking warnings. valgrind thinks that WebKit
> writes to file descriptors that it has already closed a *lot*, and valgrind
> does not like this. I'm not sure valgrind is correct, though; some of the
> warnings I see look like valgrind is getting confused by the multiple
> processes. I'll look closer tomorrow and report a separate bug if needed.

https://gitlab.gnome.org/GNOME/gtk/-/issues/6969

https://gitlab.gnome.org/GNOME/glib/-/merge_requests/4227
Comment 38 Michael Catanzaro 2024-08-30 11:14:46 PDT
I'm actually suspecting a bug in valgrind now, https://gitlab.gnome.org/GNOME/gtk/-/issues/6969#note_2211047. That will make it difficult to use to track down this bug.
Comment 39 Michael Catanzaro 2024-09-02 10:05:18 PDT
The valgrind developers have fixed the issue, but that doesn't do us much good because I can only reproduce the crash in my flatpak environment, so I cannot use the special valgrind I have built.
Comment 40 Philippe Normand 2024-09-02 10:29:35 PDT
Can you check if the issue remains with WEBKIT_DISABLE_DMABUF_RENDERER=1 ?
Comment 41 Michael Catanzaro 2024-09-02 15:42:40 PDT
(In reply to Philippe Normand from comment #40)
> Can you check if the issue remains with WEBKIT_DISABLE_DMABUF_RENDERER=1 ?

Yes, it still crashes immediately.

Most reliable and non-geoblocked reproducer: https://www.reddit.com/r/IdiotsInCars/
Comment 42 Carlos Garcia Campos 2024-09-02 22:46:31 PDT
(In reply to Philippe Normand from comment #40)
> Can you check if the issue remains with WEBKIT_DISABLE_DMABUF_RENDERER=1 ?

Maybe you meant WEBKIT_GST_DMABUF_SINK_DISABLED?
Comment 43 Philippe Normand 2024-09-03 05:43:24 PDT
(In reply to Michael Catanzaro from comment #39)
> I can only reproduce the crash in my flatpak environment, so I
> cannot use the special valgrind I have built.

If this happens only in flatpak could it be an issue in the runtime?
Comment 44 Philippe Normand 2024-09-03 05:43:53 PDT
(In reply to Carlos Garcia Campos from comment #42)
> (In reply to Philippe Normand from comment #40)
> > Can you check if the issue remains with WEBKIT_DISABLE_DMABUF_RENDERER=1 ?
> 
> Maybe you meant WEBKIT_GST_DMABUF_SINK_DISABLED?

Nope
Comment 45 Michael Catanzaro 2024-09-03 05:46:32 PDT
(In reply to Philippe Normand from comment #43)
> If this happens only in flatpak could it be an issue in the runtime?

I think it's actually the opposite. The videos that crash Ephy Tech Preview are all broken in my jhbuild environment. Using the reddit example, all the videos either do not start or play with audio only and no video.

I have this warning, which has been there for years:

(WebKitWebProcess:2): GStreamer-WARNING **: 07:44:13.839: External plugin loader failed. This most likely means that the plugin loader helper binary was not found or could not be run. You might need to set the GST_PLUGIN_SCANNER environment variable if your setup is unusual. This should normally not be required though.

(In reply to Carlos Garcia Campos from comment #42)
> Maybe you meant WEBKIT_GST_DMABUF_SINK_DISABLED?

Still crashes immediately.
Comment 46 Philippe Normand 2024-09-03 05:50:06 PDT
Your jhbuild seems broken then, and this GStreamer warning is afaik unrelated. Can you play any video with gst-play-1.0 in the jhbuild shell?
Comment 47 Michael Catanzaro 2024-09-03 06:04:27 PDT
(In reply to Philippe Normand from comment #46)
> Your jhbuild seems broken then, and this GStreamer warning is afaik
> unrelated. Can you play any video with gst-play-1.0 in the jhbuild shell?

I hear audio, but I don't see any video.

I see two warnings:

WARNING Your GStreamer installation is missing a plug-in.
WARNING debug information: ../gst/playback/gstdecodebin3.c(3312): mq_slot_check_reconfiguration (): /GstPlayBin3:playbin/GstURIDecodeBin3:uridecodebin3/GstDecodebin3:decodebin3-0:
Comment 48 Philippe Normand 2024-09-03 06:16:53 PDT
> Most reliable and non-geoblocked reproducer: https://www.reddit.com/r/IdiotsInCars/

Well, I'm not sure what to do here. When you scroll away from a video, the player is paused, which means the resources are not released. So the paused players accumulate as you scroll down and the number of FDs keeps growing until doom happens.

Same issue happens if I make MediaPlayerPrivateGStreamer::setVisibleInViewport() return early...

We have a timer in the player that will tear down the paused pipeline after 5 minutes of inactivity, but that's disabled for MSE players (not sure why).
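
To make the failure mode concrete: if every paused player keeps its descriptors, the process eventually reaches RLIMIT_NOFILE and the next attempt to create a pipe fails with EMFILE ("Too many open files"), which is the error in the original backtrace. A toy illustration, not WebKit code: plain pipes stand in for whatever descriptors a paused pipeline holds, and pipes_until_emfile and the limit of 64 are made up for the example.

```python
import errno
import os
import resource


def pipes_until_emfile(soft_limit=64):
    """Open pipes under a lowered RLIMIT_NOFILE until the kernel refuses.

    Returns (number of pipes created, errno of the failure).
    """
    old_soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if old_soft != resource.RLIM_INFINITY:
        # Never try to raise the soft limit above its current value.
        soft_limit = min(soft_limit, old_soft)
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))
    held = []  # never released, like the fds of players that stay PAUSED
    try:
        while True:
            held.append(os.pipe())
    except OSError as exc:
        return len(held), exc.errno
    finally:
        # Clean up and restore the original limit.
        for read_end, write_end in held:
            os.close(read_end)
            os.close(write_end)
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, hard))


if __name__ == "__main__":
    count, err = pipes_until_emfile()
    print(f"created {count} pipes before failing: {os.strerror(err)}")
```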
Comment 49 Philippe Normand 2024-09-03 06:25:00 PDT
That was changed in bug 264739 ... CCing Quique.
Comment 50 Michael Catanzaro 2024-09-03 06:42:49 PDT
(In reply to Philippe Normand from comment #48)
> Well, I'm not sure what to do here, when you scroll away from a video, the
> player is paused, which means the resources are not released. So the paused
> players accumulate as you scroll down and the number of FDs keep growing
> until doom happens.

Hm, for me the page crashes immediately. I don't have to actually play any video before the crash occurs, so I would expect all players should be paused at the time of the crash? (Maybe not?)

Then the other websites that trigger this crash generally only have a couple videos.
Comment 51 Enrique Ocaña 2024-09-03 07:04:22 PDT
So, the patch I brought in https://bugs.webkit.org/show_bug.cgi?id=264739 leaves "old" players in PAUSED state to keep the last frame from disappearing and turning black on downstream platforms that need the player (and the internal platform-specific multimedia engine) working to show that last frame.

I see some options here:

a) Leave the decision to NOT move to READY and stay in PAUSED (old behaviour) as a Quirk that would be enabled on specific downstream platforms.

b) Maintain a global queue of (weak?) references to MediaPlayerPrivate. For instance, holding the latest 20 active players in "PAUSED because of timeout" state. When more players need to be created and the maximum limit of "PAUSED because of timeout" active players (20) is reached, we start moving the oldest ones to READY, freeing resources at the expense of causing the "last frame shown as blank" issue that https://bugs.webkit.org/show_bug.cgi?id=264739 intended to fix.
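
A minimal sketch of what option b) could look like, written in Python rather than WebKit's C++ just to show the bookkeeping. Player is a stand-in for MediaPlayerPrivate, and every name here (PausedPlayerQueue, player_paused_by_timeout, move_to_ready) is hypothetical. For simplicity the limit is enforced when a player enters the queue rather than when a new player is created.

```python
import weakref
from collections import deque

MAX_PAUSED_PLAYERS = 20  # the example limit from option b)


class Player:
    """Stand-in for MediaPlayerPrivate; it only tracks a pipeline state."""

    def __init__(self, name):
        self.name = name
        self.state = "PLAYING"

    def move_to_ready(self):
        # The real player would release its pipeline resources here,
        # at the cost of losing the last displayed frame.
        self.state = "READY"


class PausedPlayerQueue:
    """Global queue of weak references to players paused by the timeout."""

    def __init__(self, limit=MAX_PAUSED_PLAYERS):
        self._limit = limit
        self._paused = deque()  # weak references, oldest first

    def _forget(self, player):
        # Drop dead references and any existing entry for this player.
        self._paused = deque(
            ref for ref in self._paused
            if ref() is not None and ref() is not player
        )

    def player_paused_by_timeout(self, player):
        self._forget(player)
        player.state = "PAUSED"
        self._paused.append(weakref.ref(player))
        # Over the limit: move the oldest paused players to READY.
        while len(self._paused) > self._limit:
            oldest = self._paused.popleft()()
            if oldest is not None:
                oldest.move_to_ready()

    def player_resumed(self, player):
        self._forget(player)
        player.state = "PLAYING"


if __name__ == "__main__":
    queue = PausedPlayerQueue()
    players = [Player(f"video-{i}") for i in range(25)]
    for player in players:
        queue.player_paused_by_timeout(player)
    print([p.name for p in players if p.state == "READY"])
```

The weak references mean the queue never keeps a destroyed player alive; it only decides which of the surviving paused players gives up its resources first.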
Comment 52 Enrique Ocaña 2024-09-03 07:17:57 PDT
Still, it would be good to know if a simple revert of https://commits.webkit.org/270766@main would fix the issue, to see whether working on any of the approaches that I mentioned would actually fix anything in the end.
Comment 53 Nils K 2024-09-03 07:21:52 PDT
In the scenario where I encounter this, the video elements are destroyed and at most 2 are present at any time (one to show the current video and one in the background to preload the next one). So while many paused video elements might also cause fd exhaustion, in my case the fd exhaustion is caused by a leak somewhere else.
Comment 54 Michael Catanzaro 2024-09-03 07:49:25 PDT
Note this bug report is three months older than 270766@main.
Comment 55 Michael Catanzaro 2024-09-04 09:17:06 PDT
I just hit this crash on https://gitlab.gnome.org/GNOME/gtk/-/issues/6983 which appears to have only one media element.

(In reply to Enrique Ocaña from comment #52)
> Still, it would be good to know if a simple revert of
> https://commits.webkit.org/270766@main would fix the issue, to see if
> working on any of the approaches that I mentioned actually would fix
> anything in the end or not.

Normally I would try to test this, but since the bug never occurs in my development environment, that's hard. We could try reverting it in the GNOME runtime to see what happens to Epiphany Tech Preview, but this is annoying, and since the bug predates the commit you identified, I guess it's probably not worth the effort this time?
Comment 56 Enrique Ocaña 2024-09-05 04:35:45 PDT
Yeah, probably it's not worth the effort.