Bug 272975
Summary: | [MSE][GStreamer] Early flushes may deadlock WebkitMediaSrc | ||
---|---|---|---|
Product: | WebKit | Reporter: | Carlos Bentzen <cadubentzen> |
Component: | Media | Assignee: | Nobody <webkit-unassigned> |
Status: | NEW | ||
Severity: | Normal | CC: | bandou.yacine, kdwkleung, philn |
Priority: | P2 | ||
Version: | WebKit Nightly Build | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
See Also: |
https://bugs.webkit.org/show_bug.cgi?id=256428 https://bugs.webkit.org/show_bug.cgi?id=230028 |
Carlos Bentzen
We have some tests that timeout while sending FLUSH_STOP in webKitMediaSrcStreamFlush.
- media/media-source/media-source-abort-resets-parser.html: 100% for me in release mode)
- media/media-source/media-source-seek-twice.html: 20% of the time when running that test with --fully-parallel --repeat-each=1000
- media/media-source/media-source-current-time.html: 20% of the time when running that test with --fully-parallel --repeat-each=1000)
They have in common the fact that flushes are requested very early, before the playbin pipeline is fully created and connected.
Attachments | ||
---|---|---|
Add attachment proposed patch, testcase, etc. |
Carlos Bentzen
Pull request: https://github.com/WebKit/WebKit/pull/27517
Carlos Bentzen
I have a WIP patch, but needs a lot of refinement before proposing to merge it.
What I have found out so far:
1. In media-source-abort-resets-parser.html, we deadlock because FLUSH_START (which travels out-of-band), arrives at the playsink earlier than STREAM_START. STREAM_START is only handled when the sticky events are pushed after sending FLUSH_STOP. When handling that, it in turn blocks the video pad and waits for FLUSH_START, but FLUSH_START arrived at the pad before and thus it deadlocks.
In gstreamer code, STREAM_START will block in gstpad.c do_probe_callbacks() on GST_PAD_BLOCK_WAIT(pad); It would have been unblocked by a FLUSH_START in gst_pad_push_event_unchecked() via GST_PAD_BLOCK_BROADCAST(pad); (but remember, in the test we get the broadcast call first and then the block call).
Another thing that is important to trigger this is the fact that the media segment contains both video and audio, but no audio samples are pushed. The test is pushing half the bytes of the segment and aborting, but the audio samples are in the final 2% of the file. Why is that important? Because the audio stream's STREAM_START never arrives and thus doesn't trigger a reconfiguration in the playsink (see gstplaysink.c sinkpad_blocked_cb()), which would unblock the pads as well. To confirm this, simply change the manifest in the test from 'test-fragmented-manifest.json' to 'test-fragmented-video-manifest.json' and the issue goes away.
For fixing that, I added a synchronization point before FLUSH_START to wait for STREAM_START to have been handled in the playsink for all streams that pushed buffers. This was achieved with a probe in the playsink pad. Very ugly, but it works.
2. For media/media-source/media-source-current-time.html, the case above happens, but also the second level of racing: sending a FLUSH_START after all STREAM_START's have been handled, but before playsink has caps for the streams on its pads.
In this case
Carlos Bentzen
Continuing...
2. For media/media-source/media-source-current-time.html, the case above happens, but also the second level of racing: sending a FLUSH_START after all STREAM_START's have been handled, but before playsink has caps for the streams on its pads.
In this case, I don't fully understand it yet, but essentially the pipeline will reject buffers returning flushing indefinitely, even though FLUSH_START and FLUSH_STOP are sent.
Added another synchronization point for, if we have received all STREAM_START's, to only flush after playsink has all the caps. Achieved that by connecting to the "notify::caps" signal on the playsink pads. Again, very ugly but it worked as well.
Carlos Bentzen
Now knowing this I'd like to see if this synchronization could be moved to MediaPlayerPrivateGStreamer[MSE]. If very weird having to add API and condition variables in the source element for a pipeline issue. And see if there's any way to actually fix this in playsink or playbin3 elements themselves, if I can reduce this to a GStreamer-only test case.
Carlos Bentzen
(In reply to Carlos Bentzen from comment #4)
> If very weird having to add API and
It's very weird*
Yacine Bandou
According to my analysis, we lose the FLUSH-START event (which is not a sticky event) in the multiqueue source pad which is not yet linked to the avdec_h264 sink pad, so when the multiqueue is linked to avdec_h264 and then to playsink, the sticky events are sent, the first being STREAM-START, immediately afterwards we receive the FLUSH-STOP in playsink without FLUSH-START which is lost !
See this traces:
0:00:00.588103102 [36m 83[00m 0x564b9de40210 [37mDEBUG [00m [00m multiqueue gstmultiqueue.c:2680:gst_multi_queue_sink_event:<multiqueue0:queue_0>[00m Received flush start event
0:00:00.588106251 [36m 83[00m 0x564b9de40210 [37mDEBUG [00m [00;01;31;44m GST_PADS gstpad.c:5657:gst_pad_push_event_unchecked:<multiqueue0:src_0>[00m Dropping event flush-start because pad is not linked
0:00:00.588108455 [36m 83[00m 0x564b9de40210 [37mDEBUG [00m [00m multiqueue gstmultiqueue.c:1382:gst_single_queue_flush:<multiqueue0:queue_0>[00m flush start
0:00:00.588110632 [36m 83[00m 0x564b9de40210 [33;01mLOG [00m [00m multiqueue gstmultiqueue.c:1392:gst_single_queue_flush:<multiqueue0:queue_0>[00m Waking up eventually waiting task
Yacine Bandou
Of course, this analysis is about the test:
- media/media-source/media-source-abort-resets-parser.html
Philippe Normand
Related, I suppose? https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6763
Yacine Bandou
Yes, this MR fixes one of some cases of deadlock. but it doesn't fixe the timeout of this LayoutTest "media/media-source/media-source-abort-resets-parser.html".
Philippe Normand
*** Bug 274190 has been marked as a duplicate of this bug. ***
Philippe Normand
*** Bug 276716 has been marked as a duplicate of this bug. ***