Bug 180978 - Deadlock in GStreamer video sink during shutdown
Summary: Deadlock in GStreamer video sink during shutdown
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Media (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Nobody
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2017-12-19 09:54 PST by Sebastian Dröge (slomo)
Modified: 2018-01-15 02:51 PST (History)
9 users (show)

See Also:


Attachments
0001-GStreamer-Don-t-wait-for-draw-condition-variable-whe.patch (5.24 KB, patch)
2017-12-19 09:56 PST, Sebastian Dröge (slomo)
no flags Details | Formatted Diff | Diff
0001-GStreamer-Don-t-wait-for-draw-condition-variable-whe.patch (4.46 KB, patch)
2017-12-19 10:03 PST, Sebastian Dröge (slomo)
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Sebastian Dröge (slomo) 2017-12-19 09:54:54 PST
There's a rare deadlock in the GStreamer video sink on shutdown. What can happen is that the video sink is waiting on the draw condition variable for the main thread to actually render the frame, while the main thread is waiting for the GStreamer video sink (and its streaming thread) to shut down.

Patch follows in a minute, but it needs some longer testing to make sure this problem is solved for real. In the testcase it happens about once every 24 hours.
Comment 1 Sebastian Dröge (slomo) 2017-12-19 09:56:27 PST
Created attachment 329768 [details]
0001-GStreamer-Don-t-wait-for-draw-condition-variable-whe.patch
Comment 2 Sebastian Dröge (slomo) 2017-12-19 10:03:51 PST
Created attachment 329769 [details]
0001-GStreamer-Don-t-wait-for-draw-condition-variable-whe.patch

There was some locale confusion by GIT, this should be better now
Comment 3 Sebastian Dröge (slomo) 2017-12-21 06:14:10 PST
(In reply to Sebastian Dröge (slomo) from comment #0)

> Patch follows in a minute, but it needs some longer testing to make sure
> this problem is solved for real. In the testcase it happens about once every
> 24 hours.

Probably confirmed, this deadlock did not happen again
Comment 4 Carlos Garcia Campos 2018-01-03 03:33:48 PST
Miguel, could you review this? you were the last one dealing with the deadlocks related to the draw timer. I'll r+ it if approved by Miguel.
Comment 5 Xabier Rodríguez Calvar 2018-01-09 00:43:46 PST
(In reply to Carlos Garcia Campos from comment #4)
> Miguel, could you review this? you were the last one dealing with the
> deadlocks related to the draw timer. I'll r+ it if approved by Miguel.

I agree, just pinged him.
Comment 6 Miguel Gomez 2018-01-09 01:12:24 PST
(In reply to Sebastian Dröge (slomo) from comment #3)
> (In reply to Sebastian Dröge (slomo) from comment #0)
> 
> > Patch follows in a minute, but it needs some longer testing to make sure
> > this problem is solved for real. In the testcase it happens about once every
> > 24 hours.
> 
> Probably confirmed, this deadlock did not happen again

Sorry for the late reply.

Is this issue happening when accelerated compositing is disabled? Or does it happen with accelerated compositing enabled but gstreamer-gl is disabled? I ask because the patch touches the two cases but I'm not sure which is one you're trying to fix.

My bet is that this is the AC enabled non gst-gl case, cause I think with AC disabled the deadlock should not be happening. In that case, I think what we need is just to modify MediaPlayerPrivateGStreamerBase::cancelRepaint() to notify drawCondition always (as you did in your patch, but we don't need the m_drawCancelled variable) instead of just doing it for the non AC case. I think that the problem we find here is that the compositing thread has been destroyed so the gstreamer thread will never get the drawCondition notification from it, and we need to send it from the main thread prior to destroying the player.
Comment 7 Sebastian Dröge (slomo) 2018-01-09 05:50:59 PST
It happens when accelerated compositing is disabled, that's the case I was debugging and which this patch is fixing.

I only touched that other case for completeness to ensure that nothing is ever waiting on the condition variable during shutdown, no matter which code path is taken.


Why is the cancelled boolean unneeded in your opinion? One potential problem here is that the condition variable is signalled (due to cancelling) before waiting happens, and then it would be waiting forever if we didn't have another boolean flag that can be checked before waiting to decide whether we have to wait or should not wait because nothing is ever going to wake us up again.
Comment 8 Miguel Gomez 2018-01-09 07:01:11 PST
(In reply to Sebastian Dröge (slomo) from comment #7)
> It happens when accelerated compositing is disabled, that's the case I was
> debugging and which this patch is fixing.
> 
> I only touched that other case for completeness to ensure that nothing is
> ever waiting on the condition variable during shutdown, no matter which code
> path is taken.

Ah, ok. Bad assumption from my side then.

> Why is the cancelled boolean unneeded in your opinion? One potential problem
> here is that the condition variable is signalled (due to cancelling) before
> waiting happens, and then it would be waiting forever if we didn't have
> another boolean flag that can be checked before waiting to decide whether we
> have to wait or should not wait because nothing is ever going to wake us up
> again.

The boolean comment was in case the problem was with AC enabled. But it's not the case.

cancelRepaint() gets called in the destructor after the sink's repaint callback is disconnected. I was assuming that after this no new repaint callbacks could happen, but if you were able to reproduce a situation where cancelRepaint() notifies the condition before triggerRepaint() waits on it, then it means that after disconnecting the signals there might be callbacks running anyway. The usage of the boolean makes sense then.

But I'm not sure about trying to fix the AC case as well, as in that case the condition is not notified from the main thread but from the compositing one. And notifying it from the main thread could cause gstreamer to release the frame while the compositor thread is still using it. In this case I would prefer to leave the behavior as is: the main thread is blocked while trying to set the pipeline to null while the compositor thread is using the frame. Once the compositor thread notifies the gstreamer thread, this can proceed to set the null state and then the main thread can continue.
Comment 9 Sebastian Dröge (slomo) 2018-01-09 07:43:12 PST
(In reply to Miguel Gomez from comment #8)
> But I'm not sure about trying to fix the AC case as well, as in that case
> the condition is not notified from the main thread but from the compositing
> one. And notifying it from the main thread could cause gstreamer to release
> the frame while the compositor thread is still using it. In this case I
> would prefer to leave the behavior as is: the main thread is blocked while
> trying to set the pipeline to null while the compositor thread is using the
> frame. Once the compositor thread notifies the gstreamer thread, this can
> proceed to set the null state and then the main thread can continue.

Is it guaranteed however that in the AC case the compositor thread is always going to wake up the condition variable, even if both the GStreamer streaming thread and the main thread are blocked?

Also even in the AC case it seems to make sense to remember if we were cancelled (i.e. if we're shutting down), and if so just not wait (or render or anything) at all.
Comment 10 Sebastian Dröge (slomo) 2018-01-13 11:03:08 PST
How should we move forward here?
Comment 11 Miguel Gomez 2018-01-15 01:21:46 PST
(In reply to Sebastian Dröge (slomo) from comment #10)
> How should we move forward here?

Let's just go ahead with your patch. Carlos, could you r+, please?
Comment 12 Philippe Normand 2018-01-15 01:56:35 PST
Sebastian, do you want the commit-queue to land this patch? I don't remember if you have a SVN account or not.
Comment 13 Sebastian Dröge (slomo) 2018-01-15 02:27:08 PST
(In reply to Philippe Normand from comment #12)
> Sebastian, do you want the commit-queue to land this patch? I don't remember
> if you have a SVN account or not.

I do, but let's just have the commit-queue handle that. I'm lazy :)
Comment 14 WebKit Commit Bot 2018-01-15 02:50:49 PST
Comment on attachment 329769 [details]
0001-GStreamer-Don-t-wait-for-draw-condition-variable-whe.patch

Clearing flags on attachment: 329769

Committed r226947: <https://trac.webkit.org/changeset/226947>
Comment 15 WebKit Commit Bot 2018-01-15 02:50:50 PST
All reviewed patches have been landed.  Closing bug.
Comment 16 Radar WebKit Bug Importer 2018-01-15 02:51:36 PST
<rdar://problem/36517690>