Created attachment 468593 [details]
backtrace

Similar to bug #257551, bug #259504, and bug #260796, here's another crash after 10 seconds when the main thread has deadlocked deep inside GStreamer. The main thread is stuck here:

Thread 2 (Thread 0x7f115d3e7f40 (LWP 2)):
#0  0x00007f11655113cb in __GI_sched_yield () at ../sysdeps/unix/syscall-template.S:120
#1  0x00007f11610b1279 in g_thread_yield () at ../glib/gthread-posix.c:1329
#2  0x00007f1161d55ad5 in default_stop (pool=0x7f0e941fcf70 [GstVideoBufferPool|videobufferpool1602]) at ../gst/gstbufferpool.c:415
#3  0x00007f1161d4eb53 in do_stop (pool=pool@entry=0x7f0e941fcf70 [GstVideoBufferPool|videobufferpool1602]) at ../gst/gstbufferpool.c:440
#4  0x00007f1161d56b70 in dec_outstanding (pool=0x7f0e941fcf70 [GstVideoBufferPool|videobufferpool1602]) at ../gst/gstbufferpool.c:1207
#5  dec_outstanding (pool=0x7f0e941fcf70 [GstVideoBufferPool|videobufferpool1602]) at ../gst/gstbufferpool.c:1197
#6  gst_buffer_pool_release_buffer (buffer=0x7f0e941fff80 [GstBuffer], pool=0x7f0e941fcf70 [GstVideoBufferPool|videobufferpool1602]) at ../gst/gstbufferpool.c:1393
#7  gst_buffer_pool_release_buffer (pool=0x7f0e941fcf70 [GstVideoBufferPool|videobufferpool1602], buffer=0x7f0e941fff80 [GstBuffer]) at ../gst/gstbufferpool.c:1372
#8  0x00007f1161d56bb8 in _gst_buffer_dispose (buffer=0x7f0e941fff80 [GstBuffer]) at ../gst/gstbuffer.c:776
#9  0x00007f1161d8c0d4 in gst_mini_object_unref (mini_object=0x7f0e941fff80 [GstBuffer]) at ../gst/gstminiobject.c:669
#10 0x00007f1161dbefca in _gst_sample_free (sample=0x55b845127710 [GstSample]) at ../gst/gstsample.c:87
#11 0x00007f1167d89570 in WTF::GRefPtr<_GstSample>::clear() (this=0x7f109a9f07b8) at WTF/Headers/wtf/glib/GRefPtr.h:90
#12 WebCore::ImageGStreamer::~ImageGStreamer() (this=0x7f109a9f07b0) at /buildstream/gnome/sdk/webkitgtk-6.0.bst/Source/WebCore/platform/graphics/gstreamer/ImageGStreamerCairo.cpp:175

All-threads backtrace attached.
Deadlock or infinite loop?
Hm, I guess it's indeed an infinite (or excessively slow) loop inside default_stop().
It's a bit weird; maybe something is buggy in gst_video_convert_sample()... The returned sample still holds a buffer belonging to the decoder buffer pool, which was disposed before gst_video_convert_sample() returned... We might need to do a deep copy of the sample, I think...
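For what it's worth, here is a minimal sketch of the "deep copy" idea on the caller's side. The helper name and its placement are assumptions of mine, not the fix that eventually landed; it only illustrates detaching a converted sample from its producer's buffer pool:

/* Hypothetical helper (illustration only, not the actual WebKit/GStreamer
 * change): deep-copy the buffer wrapped by a sample so the result no
 * longer references memory owned by the decoder's buffer pool. */
#include <gst/gst.h>

static GstSample *
copy_sample_detached (GstSample * sample)
{
  GstBuffer *buffer = gst_sample_get_buffer (sample);
  GstBuffer *copy;
  GstSample *detached;

  if (buffer == NULL)
    return gst_sample_ref (sample);

  /* gst_buffer_copy_deep() copies the memory too, so the copy no longer
   * points into memory owned by the (soon to be stopped) decoder pool. */
  copy = gst_buffer_copy_deep (buffer);
  detached = gst_sample_new (copy, gst_sample_get_caps (sample),
      gst_sample_get_segment (sample), NULL);
  gst_buffer_unref (copy);

  return detached;
}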
I filed a GStreamer bug about this: https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/3131
I just hit this again when viewing: https://news.sky.com/story/plane-forced-to-return-to-airport-after-horse-gets-loose-on-board-13008688

This crash occurred shortly after a different web process died due to bug #264674 when attempting to display the same page. Notably, the UI process died as well, although there is no core dump for it, so presumably it quit on its own and we'll never learn why. :/

I also found a huge spam of warnings in my journal. The first ones look like this:

Nov 15 09:55:54 chargestone-cave org.gnome.Epiphany.Devel.desktop[35157]: (WebKitWebProcess:2): GStreamer-CRITICAL **: 09:55:54.594: gst_poll_write_control: assertion 'set != NULL' failed
Nov 15 09:55:54 chargestone-cave org.gnome.Epiphany.Devel.desktop[35157]: (WebKitWebProcess:2): GStreamer-CRITICAL **: 09:55:54.594: gst_poll_write_control: assertion 'set != NULL' failed
Nov 15 09:55:54 chargestone-cave org.gnome.Epiphany.Devel.desktop[35157]: (WebKitWebProcess:2): GStreamer-CRITICAL **: 09:55:54.594: gst_poll_read_control: assertion 'set != NULL' failed

So that explains why the loop never quits:

while (!gst_poll_read_control (priv->poll))

It will always return FALSE due to the precondition violation.
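For context, a simplified paraphrase of the draining loop in default_stop() (abridged from gst/gstbufferpool.c; the real code differs in details), along with my reading of why the precondition failure makes it spin rather than bail out — that last part is an assumption, not something stated in the logs:

/* Paraphrase of default_stop()'s draining loop, not the verbatim source.
 * If priv->poll is NULL, gst_poll_read_control() fails its 'set != NULL'
 * precondition and returns FALSE without updating errno; if errno still
 * happens to hold EWOULDBLOCK from an earlier call, we yield and retry
 * forever, which would match the g_thread_yield() frame in the backtrace. */
while ((buffer = gst_atomic_queue_pop (priv->queue))) {
  while (!gst_poll_read_control (priv->poll)) {
    if (errno == EWOULDBLOCK) {
      /* control data not written yet: wait a bit and retry */
      g_thread_yield ();
      continue;
    } else {
      /* genuine error on the control socket: stop retrying */
      break;
    }
  }
  /* ... free the buffer ... */
}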
This makes no sense and I am unable to reproduce this issue. Sorry.
I believe I've effectively lost my system journal to this bug; I now have only a few days of history in my journal instead of the several years of history that I would expect. It's all spammed with:

gst_poll_read_control: assertion 'set != NULL' failed

I see about 30,000 of these in the first two seconds of my journal; after that, there's a 30 second gap and then more of the same. I haven't checked beyond that, but I think it's safe to assume that my system journal is probably >99% this.

Can we turn this into a g_assert() to crash the process instead of spamming the journal? (Arguably there is a systemd bug here for even allowing one process to create this much spam, but oh well.)
How about we add an assertion to crash the process when this bug is hit, so it doesn't spam the journal? I'm trying to investigate a hardware error that happened yesterday at the same time that I hit this bug, and having to wade through thousands of lines of this is not helpful. I gave up on hitting PageDown after 100,000 lines of this error message, and I suspect that might be only a small fraction of the spam; it's so much that journalctl isn't even able to page to the end.
I'm not sure I follow. IIUC:

gst_poll_read_control: assertion 'set != NULL' failed

happens because we run out of file descriptors. In any case that's deep in GStreamer, so I'm not sure what you want to do about this. Obviously the proper fix would be to fix the leaks :)
What I'm suggesting is a stopgap measure until this can be properly investigated and fixed. Something like:

diff --git a/subprojects/gstreamer/gst/gstbufferpool.c b/subprojects/gstreamer/gst/gstbufferpool.c
index d7da0cd4ad..d06c78fd6a 100644
--- a/subprojects/gstreamer/gst/gstbufferpool.c
+++ b/subprojects/gstreamer/gst/gstbufferpool.c
@@ -408,6 +408,7 @@ default_stop (GstBufferPool * pool)
 
   /* clear the pool */
   while ((buffer = gst_atomic_queue_pop (priv->queue))) {
+    g_assert (priv->poll != NULL);
     while (!gst_poll_read_control (priv->poll)) {
       if (errno == EWOULDBLOCK) {
         /* We put the buffer into the queue but did not finish writing control

While a crash is of course not great, it would be *much* less inconvenient than a denial of service attack against the system journal. :)
(In reply to Michael Catanzaro from comment #5)
> I just hit this again when viewing:
>
> https://news.sky.com/story/plane-forced-to-return-to-airport-after-horse-
> gets-loose-on-board-13008688

After disabling my pihole I can reproduce the warning (GStreamer-CRITICAL **: 16:30:29.127: gst_poll_write_control: assertion 'set != NULL' failed) in TP (gotta scroll down to the bottom and wait). I can't reproduce this with current git main, so... shrug. Maybe this was fixed then? There were a bunch of leak fixes in the last couple months.
OK I can reproduce the issue after setting autoplay-policy=deny and enabling leak tracing...
Pull request: https://github.com/WebKit/WebKit/pull/24684
Committed 275032@main (e653f5a19d03): <https://commits.webkit.org/275032@main> Reviewed commits have been landed. Closing PR #24684 and removing active labels.
<rdar://problem/123304014>
*** Bug 266573 has been marked as a duplicate of this bug. ***
*** Bug 270698 has been marked as a duplicate of this bug. ***