Bug 144040 - [GStreamer] Adaptive streaming issues
Summary: [GStreamer] Adaptive streaming issues
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Platform (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Nobody
URL:
Keywords:
: 138606 149672 151634 153587 154579 156186 156323 (view as bug list)
Depends on: 158740
Blocks: 152043
  Show dependency treegraph
 
Reported: 2015-04-22 03:34 PDT by Philippe Normand
Modified: 2019-03-13 05:07 PDT (History)
12 users (show)

See Also:


Attachments
patch (3.91 KB, patch)
2015-04-22 06:37 PDT, Philippe Normand
no flags Details | Formatted Diff | Diff
patch (2.45 KB, patch)
2015-04-23 06:33 PDT, Philippe Normand
no flags Details | Formatted Diff | Diff
patch (2.71 KB, patch)
2015-04-23 06:34 PDT, Philippe Normand
no flags Details | Formatted Diff | Diff
patch (3.20 KB, patch)
2015-06-18 07:15 PDT, Philippe Normand
mcatanzaro: review+
Details | Formatted Diff | Diff
Patch (6.75 KB, patch)
2016-05-05 05:07 PDT, Carlos Garcia Campos
pnormand: review+
pnormand: commit-queue-
Details | Formatted Diff | Diff
WIP (15.57 KB, patch)
2016-06-20 05:44 PDT, Carlos Garcia Campos
no flags Details | Formatted Diff | Diff
Updated patch to include ChangeLog (18.70 KB, patch)
2016-06-27 00:07 PDT, Carlos Garcia Campos
no flags Details | Formatted Diff | Diff
Updated patch (20.58 KB, patch)
2016-06-27 23:59 PDT, Carlos Garcia Campos
pnormand: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Philippe Normand 2015-04-22 03:34:37 PDT
In the case of HLS playback the source element used for fragments downloading does a (sometimes too) quick state transition: PAUSED->READY->PAUSED. However our httpsrc element starts a GMainLoop in its PAUSED->READY transition without waiting its completion. So sometimes webKitWebSrcStop() will be called too late, during the READY->PAUSED transition and playback will stall.
Comment 1 Philippe Normand 2015-04-22 06:37:47 PDT
Created attachment 251310 [details]
patch
Comment 2 Philippe Normand 2015-04-23 06:33:52 PDT
Created attachment 251429 [details]
patch
Comment 3 Philippe Normand 2015-04-23 06:34:25 PDT
Created attachment 251430 [details]
patch
Comment 4 Carlos Garcia Campos 2015-05-05 01:28:47 PDT
Comment on attachment 251430 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=251430&action=review

> Source/WebCore/ChangeLog:9
> +        Block in state transition to prevent element to enter in an
> +        inconsistent state in the next READY->PAUSED transition.

to prevent element from entering? I'm not english expert at all, though. Could you elaborate a bit more? the bug title says "Adaptive streaming issues", what issues? what's exactly the problem we are solving and how?

> Source/WebCore/platform/graphics/gstreamer/WebKitWebSourceGStreamer.cpp:-419
> -    ASSERT(isMainThread());

So this could be used by any thread? or should use just invert the assert? we should make more obvious what is expected to run in each thread.
Comment 5 Philippe Normand 2015-06-18 07:10:54 PDT
Comment on attachment 251430 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=251430&action=review

>> Source/WebCore/ChangeLog:9
>> +        inconsistent state in the next READY->PAUSED transition.
> 
> to prevent element from entering? I'm not english expert at all, though. Could you elaborate a bit more? the bug title says "Adaptive streaming issues", what issues? what's exactly the problem we are solving and how?

I described the issue in the bug description but yeah, sure, I can do it here as well :)

>> Source/WebCore/platform/graphics/gstreamer/WebKitWebSourceGStreamer.cpp:-419
>> -    ASSERT(isMainThread());
> 
> So this could be used by any thread? or should use just invert the assert? we should make more obvious what is expected to run in each thread.

Any thread.
The adaptive streaming scenario is a bit special because the demuxer will ask the uridownloader to perform its task in a thread, AFAIK.
Comment 6 Philippe Normand 2015-06-18 07:15:18 PDT
Created attachment 255113 [details]
patch
Comment 7 Sebastian Dröge (slomo) 2016-01-03 00:43:32 PST
Alternatively you could block in READY->PAUSED, e.g. by adding an idle source to the main loop and triggering a condition variable from there. That's what is done elsewhere in GStreamer in such situations.
Comment 8 Philippe Normand 2016-01-04 04:43:24 PST
I don't think I'll land this patch just yet, bug 152043 is also related.
Comment 9 Philippe Normand 2016-04-05 00:23:20 PDT
*** Bug 156186 has been marked as a duplicate of this bug. ***
Comment 10 Philippe Normand 2016-04-05 00:23:28 PDT
*** Bug 154579 has been marked as a duplicate of this bug. ***
Comment 11 Philippe Normand 2016-04-05 00:24:43 PDT
*** Bug 151634 has been marked as a duplicate of this bug. ***
Comment 12 Philippe Normand 2016-04-05 00:24:55 PDT
*** Bug 149672 has been marked as a duplicate of this bug. ***
Comment 13 Philippe Normand 2016-04-05 00:25:23 PDT
*** Bug 138606 has been marked as a duplicate of this bug. ***
Comment 14 Michael Catanzaro 2016-04-05 09:36:52 PDT
It looks like we need to prioritize this issue.
Comment 15 Philippe Normand 2016-04-07 05:53:33 PDT
*** Bug 156323 has been marked as a duplicate of this bug. ***
Comment 16 Philippe Normand 2016-04-07 08:35:52 PDT
Comment on attachment 255113 [details]
patch

This patch is bitrotten.

The long-term fix will be to use the HTTP context sharing which should (I hope) be merged into gst 1.9 soon.
Comment 17 Carlos Garcia Campos 2016-05-05 05:07:01 PDT
Created attachment 278162 [details]
Patch

This fixes the crashes, but not the layout tests, those still time out for different reasons.
Comment 18 Philippe Normand 2016-05-05 05:49:06 PDT
Comment on attachment 278162 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=278162&action=review

Thanks for the explanation!

> Source/WebCore/ChangeLog:15
> +        owner is not expecting and causing rumntoime critical warnings and very often web process crashes.

typoe: rumntoime
Comment 19 Carlos Garcia Campos 2016-05-05 06:03:18 PDT
Committed r200455: <http://trac.webkit.org/changeset/200455>
Comment 20 Michael Catanzaro 2016-05-05 10:35:07 PDT
I think we need to file a GStreamer bug for this, or link to one here if one already exists.
Comment 21 Carlos Garcia Campos 2016-05-05 22:56:44 PDT
(In reply to comment #20)
> I think we need to file a GStreamer bug for this, or link to one here if one
> already exists.

That's the plan, but I didn't find the time yet
Comment 22 Philippe Normand 2016-05-24 00:06:37 PDT
*** Bug 153587 has been marked as a duplicate of this bug. ***
Comment 23 WebKit Commit Bot 2016-06-14 09:51:06 PDT
Re-opened since this is blocked by bug 158740
Comment 24 Michael Catanzaro 2016-06-14 09:53:41 PDT
FWIW as long as the GStreamer bug is fixed, I don't think it's necessarily desirable to have workarounds in WebKit... distros should just update GStreamer.
Comment 25 Carlos Garcia Campos 2016-06-14 22:09:20 PDT
(In reply to comment #23)
> Re-opened since this is blocked by bug 158740

What happens now in fb/twitter? does the web process crash?
Comment 26 Michael Catanzaro 2016-06-15 04:56:46 PDT
(In reply to comment #25)
> What happens now in fb/twitter? does the web process crash?

Andres, can you answer this? Did it crash often, sometimes, or never in 2.12.2?

I rolled out the patch because my understanding is these sites were usable in 2.12.2 but not in 2.12.3.
Comment 27 Andres Gomez Garcia 2016-06-15 05:29:51 PDT
(In reply to comment #25)
> (In reply to comment #23)
> > Re-opened since this is blocked by bug 158740
> 
> What happens now in fb/twitter? does the web process crash?

With pre 2.12.2, the process crashed every now and then. For me, mostly in FB since I don't use Twitter that much.

With 2.12.2, it crashed in Twitter very, very often. Basically, every time a HLS video was embedded in the page. Similar in FB. Only, if you control when to play the videos, you would avoid it to crash.

With 2.12.3, my experience is that FB gets stalled all the time. It is specially bad because it gets stalled when the front page is loading. Basically, you cannot do anything. I suppose it happens when in your front page appears some video but you don't need to scroll to it and try to play it, which would trigger the crash in former versions, it just gets stalled on loading.

In any case, this is just my experience. Others may have a different one.
Comment 28 Carlos Garcia Campos 2016-06-20 05:42:55 PDT
Ok, I think I've managed to fix this, but I'm not 100% sure because it's not always easy to reproduce all the deadlocks and threading issues are hard. Anyway, I've found two different deadlocks that are blocking the web process:

 - The first one is actually a set of deadlocks that can happen in many places. It happens because gst is using several threads to download manifest, fragments, monitor the downloads, etc. To download the fragments and manifest it always creates the source element in a separate thread, something that is not actually expected to happen in WebKit source element (see bug #152043). Our source element is always scheduling tasks (start, stop, need-data, enough-data and seek) to the main thread, and those downloads that use the ResourceHandleStreamingClient (there's no player associated) also happen in the main thread, libsoup calls all its async callbacks in the main thread. So, the result is that it can happen that we end up blocking the main thread in a lock until the download finishes, but the download never finishes because tasks are scheduled in the main thread that is blocked in a lock. I managed to fix this by always using a secondary thread for downloads made by ResourceHandleStreamingClient. My first idea was to do the download in the same secondary thread that calls changeState, or in the first NeedData (note that changeState and NeedData are called form different threads), but those secondary threads don't have a run loop running, and we need a different GMainContext in the running thread for libsoup to send callbacks to the right thread. So, in the end, I moved the download to a thread using its own run loop. I also had to refactor the tasks a bit, leaving the thread safe parts to be run in the calling thread allays, and only scheduling to the main thread in case of not using ResourceHandleStreamingClient and only for the non thread safe parts.

 - The second deadlock happens when the media player is destroyed by the garbacge collector and pipeline is set to NULL state. This always happens in the main thread, since the garbage colector frees the objects in the main thread. The situation is the following one:

Main Thread:
  - blocked on GST_PAD_STREAM_LOCK (post_activate() gstpad.c)

Secondary Thread:
  - blocked on GST_MANIFEST_LOCK (gst_adaptive_demux_src_event() reconfigure gstadaptivedemux.c)
  - has EXPOSE_LOCK of gstdecodebin2 taken from source_pad_blocked_cb()

Secondary Thread:
  - blocked on EXPOSE_LOCK (no_more_pads_cb() gstdecodebin2.c)
  - it seems to have GST_PAD_STREAM_LOCK of gstpad taken from gst_pad_send_event_unchecked()
  - has GST_MANIFEST_LOCK of gstadaptivedemux taken from _src_event, got fragment EOS

Secondary Thread:
  - infinite loop waiting for fragment download to finish (gst_adaptive_demux_stream_download_uri()

I haven't been able to fix or workaround this from WebKit code, the only chnage the seemed to work was releasing the manifest lock in gst_adaptive_demux_expose_streams before calling gst_element_no_more_pads) can taking it again. But I have no idea if the change is correct or just a workaround, the gst code is full of locks and conditions that is very difficult to follow.

So, I'll submit the WebKit patch here as WIP for now, and if Sebastian confirms the gst changes makes sense, I'll submit the patch to gst too. A possible workaround for WebKit, to avoid depending on changes in gst would also be appreciated. Please, try both patches and let me know if it fixes the problem for you. In the mean time I'll run the layout tests to make sure I haven't introduced any regression.
Comment 29 Carlos Garcia Campos 2016-06-20 05:44:21 PDT
Created attachment 281646 [details]
WIP
Comment 30 Carlos Garcia Campos 2016-06-20 05:47:59 PDT
You first need to revert the revert of r200455 since it was a perfectly valid fix (or use a newer version of gst-plugins-bad, since r200455 was indeed a workaround for a gst bug that was fixed already).
I forgot to say that this patch should also fix bug #152043, since we no longer use the main thread notifier for sources not created in the main thread.
Comment 31 Carlos Garcia Campos 2016-06-27 00:07:20 PDT
Created attachment 282104 [details]
Updated patch to include ChangeLog

Just added the changelog after confirming that the remaining deadlocks need to be fixed in gst. I'll try to find a workaround in WebKit, but it doesn't seem easy. Also remember that we need to unrollout r200455, so the patch won't apply to current trunk
Comment 32 Philippe Normand 2016-06-27 08:07:33 PDT
Comment on attachment 282104 [details]
Updated patch to include ChangeLog

Looks good to me but perhaps Zan or Martin would also like to step in.
Comment 33 Carlos Garcia Campos 2016-06-27 23:59:51 PDT
Created attachment 282217 [details]
Updated patch

It just includes r200455 as well, as requested by phil, so that it applies to current trunk
Comment 34 WebKit Commit Bot 2016-06-28 00:01:59 PDT
Attachment 282217 [details] did not pass style-queue:


ERROR: Source/WebCore/platform/graphics/gstreamer/WebKitWebSourceGStreamer.cpp:1081:  More than one command on the same line  [whitespace/newline] [4]
Total errors found: 1 in 4 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 35 Carlos Garcia Campos 2016-06-28 23:34:30 PDT
Committed r202615: <http://trac.webkit.org/changeset/202615>
Comment 36 Michael Catanzaro 2017-01-28 17:18:45 PST
I think this caused bug #167003.
Comment 37 Charlie Turner 2017-07-03 03:05:40 PDT
It's hard to say whether this bug is "resolved" due to the general nature of it, but there's another adaptive streaming issue which seem a bit gnarly. I've documented it over here: https://bugs.webkit.org/show_bug.cgi?id=172240 -- it could use some discussion IMO, and it collects together the various bits of investigation done in HLS so far.