Summary: | [MSE][GStreamer] Early flushes may deadlock WebkitMediaSrc | ||
---|---|---|---|
Product: | WebKit | Reporter: | Carlos Bentzen <cadubentzen> |
Component: | Media | Assignee: | Nobody <webkit-unassigned> |
Status: | NEW --- | ||
Severity: | Normal | CC: | bandou.yacine, kdwkleung, philn |
Priority: | P2 | ||
Version: | WebKit Nightly Build | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
See Also: |
https://bugs.webkit.org/show_bug.cgi?id=256428 https://bugs.webkit.org/show_bug.cgi?id=230028 |
Description
Carlos Bentzen
2024-04-19 06:20:59 PDT
Pull request: https://github.com/WebKit/WebKit/pull/27517 I have a WIP patch, but needs a lot of refinement before proposing to merge it. What I have found out so far: 1. In media-source-abort-resets-parser.html, we deadlock because FLUSH_START (which travels out-of-band), arrives at the playsink earlier than STREAM_START. STREAM_START is only handled when the sticky events are pushed after sending FLUSH_STOP. When handling that, it in turn blocks the video pad and waits for FLUSH_START, but FLUSH_START arrived at the pad before and thus it deadlocks. In gstreamer code, STREAM_START will block in gstpad.c do_probe_callbacks() on GST_PAD_BLOCK_WAIT(pad); It would have been unblocked by a FLUSH_START in gst_pad_push_event_unchecked() via GST_PAD_BLOCK_BROADCAST(pad); (but remember, in the test we get the broadcast call first and then the block call). Another thing that is important to trigger this is the fact that the media segment contains both video and audio, but no audio samples are pushed. The test is pushing half the bytes of the segment and aborting, but the audio samples are in the final 2% of the file. Why is that important? Because the audio stream's STREAM_START never arrives and thus doesn't trigger a reconfiguration in the playsink (see gstplaysink.c sinkpad_blocked_cb()), which would unblock the pads as well. To confirm this, simply change the manifest in the test from 'test-fragmented-manifest.json' to 'test-fragmented-video-manifest.json' and the issue goes away. For fixing that, I added a synchronization point before FLUSH_START to wait for STREAM_START to have been handled in the playsink for all streams that pushed buffers. This was achieved with a probe in the playsink pad. Very ugly, but it works. 2. For media/media-source/media-source-current-time.html, the case above happens, but also the second level of racing: sending a FLUSH_START after all STREAM_START's have been handled, but before playsink has caps for the streams on its pads. In this case Continuing... 2. For media/media-source/media-source-current-time.html, the case above happens, but also the second level of racing: sending a FLUSH_START after all STREAM_START's have been handled, but before playsink has caps for the streams on its pads. In this case, I don't fully understand it yet, but essentially the pipeline will reject buffers returning flushing indefinitely, even though FLUSH_START and FLUSH_STOP are sent. Added another synchronization point for, if we have received all STREAM_START's, to only flush after playsink has all the caps. Achieved that by connecting to the "notify::caps" signal on the playsink pads. Again, very ugly but it worked as well. Now knowing this I'd like to see if this synchronization could be moved to MediaPlayerPrivateGStreamer[MSE]. If very weird having to add API and condition variables in the source element for a pipeline issue. And see if there's any way to actually fix this in playsink or playbin3 elements themselves, if I can reduce this to a GStreamer-only test case. (In reply to Carlos Bentzen from comment #4) > If very weird having to add API and It's very weird* According to my analysis, we lose the FLUSH-START event (which is not a sticky event) in the multiqueue source pad which is not yet linked to the avdec_h264 sink pad, so when the multiqueue is linked to avdec_h264 and then to playsink, the sticky events are sent, the first being STREAM-START, immediately afterwards we receive the FLUSH-STOP in playsink without FLUSH-START which is lost ! See this traces: 0:00:00.588103102 [36m 83[00m 0x564b9de40210 [37mDEBUG [00m [00m multiqueue gstmultiqueue.c:2680:gst_multi_queue_sink_event:<multiqueue0:queue_0>[00m Received flush start event 0:00:00.588106251 [36m 83[00m 0x564b9de40210 [37mDEBUG [00m [00;01;31;44m GST_PADS gstpad.c:5657:gst_pad_push_event_unchecked:<multiqueue0:src_0>[00m Dropping event flush-start because pad is not linked 0:00:00.588108455 [36m 83[00m 0x564b9de40210 [37mDEBUG [00m [00m multiqueue gstmultiqueue.c:1382:gst_single_queue_flush:<multiqueue0:queue_0>[00m flush start 0:00:00.588110632 [36m 83[00m 0x564b9de40210 [33;01mLOG [00m [00m multiqueue gstmultiqueue.c:1392:gst_single_queue_flush:<multiqueue0:queue_0>[00m Waking up eventually waiting task Of course, this analysis is about the test: - media/media-source/media-source-abort-resets-parser.html Related, I suppose? https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6763 Yes, this MR fixes one of some cases of deadlock. but it doesn't fixe the timeout of this LayoutTest "media/media-source/media-source-abort-resets-parser.html". *** Bug 274190 has been marked as a duplicate of this bug. *** *** Bug 276716 has been marked as a duplicate of this bug. *** |