RESOLVED FIXED Bug 264824
[GStreamer] Crash after 10 seconds on watchdog thread due to loop when destroying ~ImageDecoderGStreamerSample
https://bugs.webkit.org/show_bug.cgi?id=264824
Summary [GStreamer] Crash after 10 seconds on watchdog thread due to loop when destro...
Michael Catanzaro
Reported 2023-11-14 10:46:35 PST
Created attachment 468593 [details] backtrace Similar to bug #257551, bug #259504, and bug #260796, here's another crash after 10 seconds when the main thread has deadlocked deep inside GStreamer. The main thread is stuck here: hread 2 (Thread 0x7f115d3e7f40 (LWP 2)): #0 0x00007f11655113cb in __GI_sched_yield () at ../sysdeps/unix/syscall-template.S:120 #1 0x00007f11610b1279 in g_thread_yield () at ../glib/gthread-posix.c:1329 #2 0x00007f1161d55ad5 in default_stop (pool=0x7f0e941fcf70 [GstVideoBufferPool|videobufferpool1602]) at ../gst/gstbufferpool.c:415 #3 0x00007f1161d4eb53 in do_stop (pool=pool@entry=0x7f0e941fcf70 [GstVideoBufferPool|videobufferpool1602]) at ../gst/gstbufferpool.c:440 #4 0x00007f1161d56b70 in dec_outstanding (pool=0x7f0e941fcf70 [GstVideoBufferPool|videobufferpool1602]) at ../gst/gstbufferpool.c:1207 #5 dec_outstanding (pool=0x7f0e941fcf70 [GstVideoBufferPool|videobufferpool1602]) at ../gst/gstbufferpool.c:1197 #6 gst_buffer_pool_release_buffer (buffer=0x7f0e941fff80 [GstBuffer], pool=0x7f0e941fcf70 [GstVideoBufferPool|videobufferpool1602]) at ../gst/gstbufferpool.c:1393 #7 gst_buffer_pool_release_buffer (pool=0x7f0e941fcf70 [GstVideoBufferPool|videobufferpool1602], buffer=0x7f0e941fff80 [GstBuffer]) at ../gst/gstbufferpool.c:1372 #8 0x00007f1161d56bb8 in _gst_buffer_dispose (buffer=0x7f0e941fff80 [GstBuffer]) at ../gst/gstbuffer.c:776 #9 0x00007f1161d8c0d4 in gst_mini_object_unref (mini_object=0x7f0e941fff80 [GstBuffer]) at ../gst/gstminiobject.c:669 #10 0x00007f1161dbefca in _gst_sample_free (sample=0x55b845127710 [GstSample]) at ../gst/gstsample.c:87 #11 0x00007f1167d89570 in WTF::GRefPtr<_GstSample>::clear() (this=0x7f109a9f07b8) at WTF/Headers/wtf/glib/GRefPtr.h:90 #12 WebCore::ImageGStreamer::~ImageGStreamer() (this=0x7f109a9f07b0) at /buildstream/gnome/sdk/webkitgtk-6.0.bst/Source/WebCore/platform/graphics/gstreamer/ImageGStreamerCairo.cpp:175 All-threads backtrace attached.
Attachments
backtrace (94.92 KB, text/plain)
2023-11-14 10:46 PST, Michael Catanzaro
no flags
Philippe Normand
Comment 1 2023-11-15 07:00:03 PST
Deadlock or infinite loop?
Michael Catanzaro
Comment 2 2023-11-15 07:05:28 PST
Hm, I guess it's indeed an infinite (or excessively slow) loop inside default_stop().
Philippe Normand
Comment 3 2023-11-15 07:18:32 PST
It's a bit weird, maybe something buggy in gst_video_convert_sample()... The returned sample is ref'ed in the decoder buffer pool, which was disposed before returning from gst_video_convert_sample() ... We might need to do a deep copy of the sample I think...
Philippe Normand
Comment 4 2023-11-15 07:41:48 PST
Michael Catanzaro
Comment 5 2023-11-15 08:04:45 PST
I just hit this again when viewing: https://news.sky.com/story/plane-forced-to-return-to-airport-after-horse-gets-loose-on-board-13008688 This crash occurred shortly after a different web process died to bug #264674 when attempting to display the same page. Notably, the UI process died as well, although there is no core dump for it, so presumably it quit itself and we'll never learn why. :/ I also found a huge spam of warnings in my journal. The first look like this: Nov 15 09:55:54 chargestone-cave org.gnome.Epiphany.Devel.desktop[35157]: (WebKitWebProcess:2): GStreamer-CRITICAL **: 09:55:54.594: gst_poll_write_control: assertion 'set != NULL' failed Nov 15 09:55:54 chargestone-cave org.gnome.Epiphany.Devel.desktop[35157]: (WebKitWebProcess:2): GStreamer-CRITICAL **: 09:55:54.594: gst_poll_write_control: assertion 'set != NULL' failed Nov 15 09:55:54 chargestone-cave org.gnome.Epiphany.Devel.desktop[35157]: (WebKitWebProcess:2): GStreamer-CRITICAL **: 09:55:54.594: gst_poll_read_control: assertion 'set != NULL' failed So that explains why the loop never quits: while (!gst_poll_read_control (priv->poll)) It will always return FALSE due to the precondition violation.
Philippe Normand
Comment 6 2023-11-15 10:00:42 PST
This makes no sense and I am unable to reproduce this issue. Sorry.
Michael Catanzaro
Comment 7 2024-01-01 06:50:34 PST
I believe I've effectively lost my system journal to this bug; I now have only a few days of history in my journal instead of the several years of history that I would expect. It's all spammed with: gst_poll_read_control: assertion 'set != NULL' failed I see about 30,000 of these in the first two seconds of my journal; after that, there's a 30 second gap and then more of the same. I haven't checked beyond that but I think it's safe to assume that my system journal is probably >99% this. Can we turn this into a g_assert() to crash the process instead of spam the journal? (Arguably there is a systemd bug here for even allowing one process to create this much spam, but oh well.)
Michael Catanzaro
Comment 8 2024-01-25 07:23:26 PST
How about we add an assertion to crash the process when this bug is hit, so it doesn't spam the journal? I'm trying to investigate a hardware error that happened yesterday at the same time that I hit this bug, and having to wade through thousands of lines of this is not helpful. I gave up on hitting PageDown after 100,000 lines of this error message, and I suspect that might be only a small fraction of the spam; it's so much that journalctl isn't even able to page to the end.
Philippe Normand
Comment 9 2024-01-25 10:36:12 PST
I'm not sure to follow. iiuc: gst_poll_read_control: assertion 'set != NULL' failed happens because we run out of descriptors. In any case that's deep in GStreamer. So not sure what you want to do about this. Obviously the proper fix would be to fix the leaks :)
Michael Catanzaro
Comment 10 2024-01-25 11:39:43 PST
What I'm suggesting as a stopgap measure until this can be properly investigated and fixed. Something like: diff --git a/subprojects/gstreamer/gst/gstbufferpool.c b/subprojects/gstreamer/gst/gstbufferpool.c index d7da0cd4ad..d06c78fd6a 100644 --- a/subprojects/gstreamer/gst/gstbufferpool.c +++ b/subprojects/gstreamer/gst/gstbufferpool.c @@ -408,6 +408,7 @@ default_stop (GstBufferPool * pool) /* clear the pool */ while ((buffer = gst_atomic_queue_pop (priv->queue))) { + g_assert (priv->poll != NULL); while (!gst_poll_read_control (priv->poll)) { if (errno == EWOULDBLOCK) { /* We put the buffer into the queue but did not finish writing control While a crash of course is not great, it would be *much* less inconvenient than a denial of service attack against the system journal. :)
Philippe Normand
Comment 11 2024-02-16 08:39:14 PST
(In reply to Michael Catanzaro from comment #5) > I just hit this again when viewing: > > https://news.sky.com/story/plane-forced-to-return-to-airport-after-horse- > gets-loose-on-board-13008688 > After disabling my pihole I can reproduce the warning (GStreamer-CRITICAL **: 16:30:29.127: gst_poll_write_control: assertion 'set != NULL' failed) in TP (gotta scroll down to the bottom and wait). I can't reproduce this with current git main, so ... shrug. Maybe this was fixed then? There were a bunch of leak fixes in the last couple months.
Philippe Normand
Comment 12 2024-02-16 09:24:52 PST
OK I can reproduce the issue after setting autoplay-policy=deny and enabling leak tracing...
Philippe Normand
Comment 13 2024-02-17 05:11:11 PST
EWS
Comment 14 2024-02-20 03:50:15 PST
Committed 275032@main (e653f5a19d03): <https://commits.webkit.org/275032@main> Reviewed commits have been landed. Closing PR #24684 and removing active labels.
Radar WebKit Bug Importer
Comment 15 2024-02-20 03:51:20 PST
Michael Catanzaro
Comment 16 2024-03-01 06:28:18 PST
*** Bug 266573 has been marked as a duplicate of this bug. ***
Philippe Normand
Comment 17 2024-03-08 09:40:24 PST
*** Bug 270698 has been marked as a duplicate of this bug. ***
Note You need to log in before you can comment on or make changes to this bug.