Summary: | [GStreamer] Crash after 10 seconds on watchdog thread due to loop when destroying ~ImageDecoderGStreamerSample | ||||||
---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Michael Catanzaro <mcatanzaro> | ||||
Component: | Media | Assignee: | Philippe Normand <philn> | ||||
Status: | RESOLVED FIXED | ||||||
Severity: | Normal | CC: | mcatanzaro, philn, vitaly, webkit-bug-importer | ||||
Priority: | P2 | Keywords: | InRadar | ||||
Version: | WebKit Nightly Build | ||||||
Hardware: | PC | ||||||
OS: | Linux | ||||||
See Also: |
https://bugs.webkit.org/show_bug.cgi?id=257551 https://bugs.webkit.org/show_bug.cgi?id=259504 https://bugs.webkit.org/show_bug.cgi?id=260796 https://bugs.webkit.org/show_bug.cgi?id=266573 https://bugs.webkit.org/show_bug.cgi?id=260723 |
Description
Michael Catanzaro
2023-11-14 10:46:35 PST
Deadlock or infinite loop?

Hm, I guess it's indeed an infinite (or excessively slow) loop inside default_stop(). It's a bit weird, maybe something buggy in gst_video_convert_sample()... The returned sample is ref'ed in the decoder buffer pool, which was disposed before returning from gst_video_convert_sample()... We might need to do a deep copy of the sample, I think...

I filed a GStreamer bug about this: https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/3131

I just hit this again when viewing:

https://news.sky.com/story/plane-forced-to-return-to-airport-after-horse-gets-loose-on-board-13008688

This crash occurred shortly after a different web process died due to bug #264674 when attempting to display the same page. Notably, the UI process died as well, although there is no core dump for it, so presumably it quit itself and we'll never learn why. :/

I also found a huge spam of warnings in my journal. The first ones look like this:

Nov 15 09:55:54 chargestone-cave org.gnome.Epiphany.Devel.desktop[35157]: (WebKitWebProcess:2): GStreamer-CRITICAL **: 09:55:54.594: gst_poll_write_control: assertion 'set != NULL' failed
Nov 15 09:55:54 chargestone-cave org.gnome.Epiphany.Devel.desktop[35157]: (WebKitWebProcess:2): GStreamer-CRITICAL **: 09:55:54.594: gst_poll_write_control: assertion 'set != NULL' failed
Nov 15 09:55:54 chargestone-cave org.gnome.Epiphany.Devel.desktop[35157]: (WebKitWebProcess:2): GStreamer-CRITICAL **: 09:55:54.594: gst_poll_read_control: assertion 'set != NULL' failed

So that explains why the loop never quits:

while (!gst_poll_read_control (priv->poll))

It will always return FALSE due to the precondition violation.

This makes no sense and I am unable to reproduce this issue. Sorry.

I believe I've effectively lost my system journal to this bug; I now have only a few days of history in my journal instead of the several years of history that I would expect.
It's all spammed with:

gst_poll_read_control: assertion 'set != NULL' failed

I see about 30,000 of these in the first two seconds of my journal; after that, there's a 30 second gap and then more of the same. I haven't checked beyond that, but I think it's safe to assume that my system journal is probably >99% this.

Can we turn this into a g_assert() to crash the process instead of spamming the journal? (Arguably there is a systemd bug here for even allowing one process to create this much spam, but oh well.)

How about we add an assertion to crash the process when this bug is hit, so it doesn't spam the journal? I'm trying to investigate a hardware error that happened yesterday at the same time that I hit this bug, and having to wade through thousands of lines of this is not helpful. I gave up on hitting PageDown after 100,000 lines of this error message, and I suspect that might be only a small fraction of the spam; it's so much that journalctl isn't even able to page to the end.

I'm not sure I follow. IIUC, "gst_poll_read_control: assertion 'set != NULL' failed" happens because we run out of descriptors. In any case that's deep in GStreamer, so I'm not sure what you want to do about this. Obviously the proper fix would be to fix the leaks :)

What I'm suggesting is a stopgap measure until this can be properly investigated and fixed.
Something like:

diff --git a/subprojects/gstreamer/gst/gstbufferpool.c b/subprojects/gstreamer/gst/gstbufferpool.c
index d7da0cd4ad..d06c78fd6a 100644
--- a/subprojects/gstreamer/gst/gstbufferpool.c
+++ b/subprojects/gstreamer/gst/gstbufferpool.c
@@ -408,6 +408,7 @@ default_stop (GstBufferPool * pool)
   /* clear the pool */
   while ((buffer = gst_atomic_queue_pop (priv->queue))) {
+    g_assert (priv->poll != NULL);
     while (!gst_poll_read_control (priv->poll)) {
       if (errno == EWOULDBLOCK) {
         /* We put the buffer into the queue but did not finish writing control

While a crash of course is not great, it would be *much* less inconvenient than a denial of service attack against the system journal. :)

(In reply to Michael Catanzaro from comment #5)
> I just hit this again when viewing:
>
> https://news.sky.com/story/plane-forced-to-return-to-airport-after-horse-gets-loose-on-board-13008688
>

After disabling my pihole I can reproduce the warning (GStreamer-CRITICAL **: 16:30:29.127: gst_poll_write_control: assertion 'set != NULL' failed) in TP (gotta scroll down to the bottom and wait).

I can't reproduce this with current git main, so ... shrug.

Maybe this was fixed then? There were a bunch of leak fixes in the last couple months.

OK I can reproduce the issue after setting autoplay-policy=deny and enabling leak tracing...

Pull request: https://github.com/WebKit/WebKit/pull/24684

Committed 275032@main (e653f5a19d03): <https://commits.webkit.org/275032@main>

Reviewed commits have been landed. Closing PR #24684 and removing active labels.

*** Bug 266573 has been marked as a duplicate of this bug. ***

*** Bug 270698 has been marked as a duplicate of this bug. ***
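
For readers who want to see why the loop quoted in this report cannot terminate, here is a minimal sketch in plain C. It does not use GStreamer or GLib. FakePoll, fake_poll_read_control(), drain_unguarded(), drain_guarded() and the max_spins cap are all hypothetical stand-ins invented for illustration. The only behaviour taken from the report is that the read-control call logs a critical and returns FALSE when its argument is NULL, and that the caller retries while it returns FALSE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for GstPoll; NOT the real GStreamer type. */
typedef struct {
    int pending; /* number of control bytes waiting to be read */
} FakePoll;

/* Counts how many "assertion 'set != NULL' failed" lines would be logged. */
static unsigned long critical_count = 0;

/* Models a precondition check that logs and returns false instead of
 * aborting, which is the behaviour described in the report. */
static bool fake_poll_read_control(FakePoll *set)
{
    if (set == NULL) {
        critical_count++; /* one journal line per call */
        return false;
    }
    if (set->pending == 0)
        return false;
    set->pending--;
    return true;
}

/* Same shape as the quoted loop: retry until the read succeeds.
 * max_spins is an illustrative cap so this sketch terminates; the loop
 * in the report has no such cap. Returns the number of failed attempts. */
static unsigned long drain_unguarded(FakePoll *set, unsigned long max_spins)
{
    unsigned long spins = 0;

    assert(max_spins > 0);
    while (!fake_poll_read_control(set)) {
        if (++spins >= max_spins)
            break;
    }
    return spins;
}

typedef enum { DRAIN_OK, DRAIN_NO_POLL, DRAIN_GAVE_UP } DrainStatus;

/* Fail-fast variant: check the precondition once, before looping.
 * The stopgap proposed in the report uses g_assert(), which aborts the
 * process; a status code is returned here only to keep the sketch testable. */
static DrainStatus drain_guarded(FakePoll *set, unsigned long max_spins)
{
    if (set == NULL)
        return DRAIN_NO_POLL;
    return drain_unguarded(set, max_spins) < max_spins ? DRAIN_OK : DRAIN_GAVE_UP;
}
```

With a NULL poll, drain_unguarded() fails on every iteration and logs once per attempt, which matches the journal flood described in this report. The guarded variant makes no read attempt at all in that case, so nothing is logged.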