RESOLVED FIXED 144040
[GStreamer] Adaptive streaming issues
https://bugs.webkit.org/show_bug.cgi?id=144040
Summary [GStreamer] Adaptive streaming issues
Philippe Normand
Reported 2015-04-22 03:34:37 PDT
In the case of HLS playback the source element used for fragments downloading does a (sometimes too) quick state transition: PAUSED->READY->PAUSED. However our httpsrc element starts a GMainLoop in its PAUSED->READY transition without waiting its completion. So sometimes webKitWebSrcStop() will be called too late, during the READY->PAUSED transition and playback will stall.
Attachments
patch (3.91 KB, patch)
2015-04-22 06:37 PDT, Philippe Normand
no flags
patch (2.45 KB, patch)
2015-04-23 06:33 PDT, Philippe Normand
no flags
patch (2.71 KB, patch)
2015-04-23 06:34 PDT, Philippe Normand
no flags
patch (3.20 KB, patch)
2015-06-18 07:15 PDT, Philippe Normand
mcatanzaro: review+
Patch (6.75 KB, patch)
2016-05-05 05:07 PDT, Carlos Garcia Campos
pnormand: review+
pnormand: commit-queue-
WIP (15.57 KB, patch)
2016-06-20 05:44 PDT, Carlos Garcia Campos
no flags
Updated patch to include ChangeLog (18.70 KB, patch)
2016-06-27 00:07 PDT, Carlos Garcia Campos
no flags
Updated patch (20.58 KB, patch)
2016-06-27 23:59 PDT, Carlos Garcia Campos
pnormand: review+
Philippe Normand
Comment 1 2015-04-22 06:37:47 PDT
Philippe Normand
Comment 2 2015-04-23 06:33:52 PDT
Philippe Normand
Comment 3 2015-04-23 06:34:25 PDT
Carlos Garcia Campos
Comment 4 2015-05-05 01:28:47 PDT
Comment on attachment 251430 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=251430&action=review > Source/WebCore/ChangeLog:9 > + Block in state transition to prevent element to enter in an > + inconsistent state in the next READY->PAUSED transition. to prevent element from entering? I'm not english expert at all, though. Could you elaborate a bit more? the bug title says "Adaptive streaming issues", what issues? what's exactly the problem we are solving and how? > Source/WebCore/platform/graphics/gstreamer/WebKitWebSourceGStreamer.cpp:-419 > - ASSERT(isMainThread()); So this could be used by any thread? or should use just invert the assert? we should make more obvious what is expected to run in each thread.
Philippe Normand
Comment 5 2015-06-18 07:10:54 PDT
Comment on attachment 251430 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=251430&action=review >> Source/WebCore/ChangeLog:9 >> + inconsistent state in the next READY->PAUSED transition. > > to prevent element from entering? I'm not english expert at all, though. Could you elaborate a bit more? the bug title says "Adaptive streaming issues", what issues? what's exactly the problem we are solving and how? I described the issue in the bug description but yeah, sure, I can do it here as well :) >> Source/WebCore/platform/graphics/gstreamer/WebKitWebSourceGStreamer.cpp:-419 >> - ASSERT(isMainThread()); > > So this could be used by any thread? or should use just invert the assert? we should make more obvious what is expected to run in each thread. Any thread. The adaptive streaming scenario is a bit special because the demuxer will ask the uridownloader to perform its task in a thread, AFAIK.
Philippe Normand
Comment 6 2015-06-18 07:15:18 PDT
Sebastian Dröge (slomo)
Comment 7 2016-01-03 00:43:32 PST
Alternatively you could block in READY->PAUSED, e.g. by adding an idle source to the main loop and triggering a condition variable from there. That's what is done elsewhere in GStreamer in such situations.
Philippe Normand
Comment 8 2016-01-04 04:43:24 PST
I don't think I'll land this patch just yet, bug 152043 is also related.
Philippe Normand
Comment 9 2016-04-05 00:23:20 PDT
*** Bug 156186 has been marked as a duplicate of this bug. ***
Philippe Normand
Comment 10 2016-04-05 00:23:28 PDT
*** Bug 154579 has been marked as a duplicate of this bug. ***
Philippe Normand
Comment 11 2016-04-05 00:24:43 PDT
*** Bug 151634 has been marked as a duplicate of this bug. ***
Philippe Normand
Comment 12 2016-04-05 00:24:55 PDT
*** Bug 149672 has been marked as a duplicate of this bug. ***
Philippe Normand
Comment 13 2016-04-05 00:25:23 PDT
*** Bug 138606 has been marked as a duplicate of this bug. ***
Michael Catanzaro
Comment 14 2016-04-05 09:36:52 PDT
It looks like we need to prioritize this issue.
Philippe Normand
Comment 15 2016-04-07 05:53:33 PDT
*** Bug 156323 has been marked as a duplicate of this bug. ***
Philippe Normand
Comment 16 2016-04-07 08:35:52 PDT
Comment on attachment 255113 [details] patch This patch is bitrotten. The long-term fix will be to use the HTTP context sharing which should (I hope) be merged into gst 1.9 soon.
Carlos Garcia Campos
Comment 17 2016-05-05 05:07:01 PDT
Created attachment 278162 [details] Patch This fixes the crashes, but not the layout tests, those still time out for different reasons.
Philippe Normand
Comment 18 2016-05-05 05:49:06 PDT
Comment on attachment 278162 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=278162&action=review Thanks for the explanation! > Source/WebCore/ChangeLog:15 > + owner is not expecting and causing rumntoime critical warnings and very often web process crashes. typoe: rumntoime
Carlos Garcia Campos
Comment 19 2016-05-05 06:03:18 PDT
Michael Catanzaro
Comment 20 2016-05-05 10:35:07 PDT
I think we need to file a GStreamer bug for this, or link to one here if one already exists.
Carlos Garcia Campos
Comment 21 2016-05-05 22:56:44 PDT
(In reply to comment #20) > I think we need to file a GStreamer bug for this, or link to one here if one > already exists. That's the plan, but I didn't find the time yet
Philippe Normand
Comment 22 2016-05-24 00:06:37 PDT
*** Bug 153587 has been marked as a duplicate of this bug. ***
WebKit Commit Bot
Comment 23 2016-06-14 09:51:06 PDT
Re-opened since this is blocked by bug 158740
Michael Catanzaro
Comment 24 2016-06-14 09:53:41 PDT
FWIW as long as the GStreamer bug is fixed, I don't think it's necessarily desirable to have workarounds in WebKit... distros should just update GStreamer.
Carlos Garcia Campos
Comment 25 2016-06-14 22:09:20 PDT
(In reply to comment #23) > Re-opened since this is blocked by bug 158740 What happens now in fb/twitter? does the web process crash?
Michael Catanzaro
Comment 26 2016-06-15 04:56:46 PDT
(In reply to comment #25) > What happens now in fb/twitter? does the web process crash? Andres, can you answer this? Did it crash often, sometimes, or never in 2.12.2? I rolled out the patch because my understanding is these sites were usable in 2.12.2 but not in 2.12.3.
Andres Gomez Garcia
Comment 27 2016-06-15 05:29:51 PDT
(In reply to comment #25) > (In reply to comment #23) > > Re-opened since this is blocked by bug 158740 > > What happens now in fb/twitter? does the web process crash? With pre 2.12.2, the process crashed every now and then. For me, mostly in FB since I don't use Twitter that much. With 2.12.2, it crashed in Twitter very, very often. Basically, every time a HLS video was embedded in the page. Similar in FB. Only, if you control when to play the videos, you would avoid it to crash. With 2.12.3, my experience is that FB gets stalled all the time. It is specially bad because it gets stalled when the front page is loading. Basically, you cannot do anything. I suppose it happens when in your front page appears some video but you don't need to scroll to it and try to play it, which would trigger the crash in former versions, it just gets stalled on loading. In any case, this is just my experience. Others may have a different one.
Carlos Garcia Campos
Comment 28 2016-06-20 05:42:55 PDT
Ok, I think I've managed to fix this, but I'm not 100% sure because it's not always easy to reproduce all the deadlocks and threading issues are hard. Anyway, I've found two different deadlocks that are blocking the web process: - The first one is actually a set of deadlocks that can happen in many places. It happens because gst is using several threads to download manifest, fragments, monitor the downloads, etc. To download the fragments and manifest it always creates the source element in a separate thread, something that is not actually expected to happen in WebKit source element (see bug #152043). Our source element is always scheduling tasks (start, stop, need-data, enough-data and seek) to the main thread, and those downloads that use the ResourceHandleStreamingClient (there's no player associated) also happen in the main thread, libsoup calls all its async callbacks in the main thread. So, the result is that it can happen that we end up blocking the main thread in a lock until the download finishes, but the download never finishes because tasks are scheduled in the main thread that is blocked in a lock. I managed to fix this by always using a secondary thread for downloads made by ResourceHandleStreamingClient. My first idea was to do the download in the same secondary thread that calls changeState, or in the first NeedData (note that changeState and NeedData are called form different threads), but those secondary threads don't have a run loop running, and we need a different GMainContext in the running thread for libsoup to send callbacks to the right thread. So, in the end, I moved the download to a thread using its own run loop. I also had to refactor the tasks a bit, leaving the thread safe parts to be run in the calling thread allays, and only scheduling to the main thread in case of not using ResourceHandleStreamingClient and only for the non thread safe parts. - The second deadlock happens when the media player is destroyed by the garbacge collector and pipeline is set to NULL state. This always happens in the main thread, since the garbage colector frees the objects in the main thread. The situation is the following one: Main Thread: - blocked on GST_PAD_STREAM_LOCK (post_activate() gstpad.c) Secondary Thread: - blocked on GST_MANIFEST_LOCK (gst_adaptive_demux_src_event() reconfigure gstadaptivedemux.c) - has EXPOSE_LOCK of gstdecodebin2 taken from source_pad_blocked_cb() Secondary Thread: - blocked on EXPOSE_LOCK (no_more_pads_cb() gstdecodebin2.c) - it seems to have GST_PAD_STREAM_LOCK of gstpad taken from gst_pad_send_event_unchecked() - has GST_MANIFEST_LOCK of gstadaptivedemux taken from _src_event, got fragment EOS Secondary Thread: - infinite loop waiting for fragment download to finish (gst_adaptive_demux_stream_download_uri() I haven't been able to fix or workaround this from WebKit code, the only chnage the seemed to work was releasing the manifest lock in gst_adaptive_demux_expose_streams before calling gst_element_no_more_pads) can taking it again. But I have no idea if the change is correct or just a workaround, the gst code is full of locks and conditions that is very difficult to follow. So, I'll submit the WebKit patch here as WIP for now, and if Sebastian confirms the gst changes makes sense, I'll submit the patch to gst too. A possible workaround for WebKit, to avoid depending on changes in gst would also be appreciated. Please, try both patches and let me know if it fixes the problem for you. In the mean time I'll run the layout tests to make sure I haven't introduced any regression.
Carlos Garcia Campos
Comment 29 2016-06-20 05:44:21 PDT
Carlos Garcia Campos
Comment 30 2016-06-20 05:47:59 PDT
You first need to revert the revert of r200455 since it was a perfectly valid fix (or use a newer version of gst-plugins-bad, since r200455 was indeed a workaround for a gst bug that was fixed already). I forgot to say that this patch should also fix bug #152043, since we no longer use the main thread notifier for sources not created in the main thread.
Carlos Garcia Campos
Comment 31 2016-06-27 00:07:20 PDT
Created attachment 282104 [details] Updated patch to include ChangeLog Just added the changelog after confirming that the remaining deadlocks need to be fixed in gst. I'll try to find a workaround in WebKit, but it doesn't seem easy. Also remember that we need to unrollout r200455, so the patch won't apply to current trunk
Philippe Normand
Comment 32 2016-06-27 08:07:33 PDT
Comment on attachment 282104 [details] Updated patch to include ChangeLog Looks good to me but perhaps Zan or Martin would also like to step in.
Carlos Garcia Campos
Comment 33 2016-06-27 23:59:51 PDT
Created attachment 282217 [details] Updated patch It just includes r200455 as well, as requested by phil, so that it applies to current trunk
WebKit Commit Bot
Comment 34 2016-06-28 00:01:59 PDT
Attachment 282217 [details] did not pass style-queue: ERROR: Source/WebCore/platform/graphics/gstreamer/WebKitWebSourceGStreamer.cpp:1081: More than one command on the same line [whitespace/newline] [4] Total errors found: 1 in 4 files If any of these errors are false positives, please file a bug against check-webkit-style.
Carlos Garcia Campos
Comment 35 2016-06-28 23:34:30 PDT
Michael Catanzaro
Comment 36 2017-01-28 17:18:45 PST
I think this caused bug #167003.
Charlie Turner
Comment 37 2017-07-03 03:05:40 PDT
It's hard to say whether this bug is "resolved" due to the general nature of it, but there's another adaptive streaming issue which seem a bit gnarly. I've documented it over here: https://bugs.webkit.org/show_bug.cgi?id=172240 -- it could use some discussion IMO, and it collects together the various bits of investigation done in HLS so far.
Note You need to log in before you can comment on or make changes to this bug.