Summary: | [GStreamer] Some layout tests issue "g_mutex_clear() called on uninitialised or locked mutex" and flaky crash in ~MediaPlayerPrivateGStreamerBase | ||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Fujii Hironori <Hironori.Fujii> | ||||||||||||||||
Component: | WebKitGTK | Assignee: | Fujii Hironori <Hironori.Fujii> | ||||||||||||||||
Status: | RESOLVED FIXED | ||||||||||||||||||
Severity: | Normal | CC: | bugs-noreply, buildbot, calvaris, cgarcia, commit-queue, magomez, pnormand | ||||||||||||||||
Priority: | P2 | ||||||||||||||||||
Version: | WebKit Nightly Build | ||||||||||||||||||
Hardware: | Unspecified | ||||||||||||||||||
OS: | Unspecified | ||||||||||||||||||
Attachments: |
|
Description
Fujii Hironori
2017-06-28 19:24:20 PDT
Created attachment 314122 [details]
WIP patch
I created a WIP patch.
* Lock m_sampleMutex at the beginning of signal handlers
* Lock m_sampleMutex while unregistering singnal handlers
* Change the mutex type of m_sampleMutex to GRecMutex
This WIP patch seems to solve the problem.
I'm not sure this is a right fix.
The sample mutex is not actually a GST mutex, so I wonder if we could use WTF locks instead. (In reply to Carlos Garcia Campos from comment #2) > The sample mutex is not actually a GST mutex, so I wonder if we could use > WTF locks instead. Yes, please. Created attachment 314128 [details]
WIP patch
Thank you, KaL and Xabier.
I revised WIP patch using WTF::RecursiveLockAdapter<Lock>.
I think there are still tiny possibility that a destructed mutex will be locked.
But, I don't have any idea to solve that.
(In reply to Fujii Hironori from comment #4) > Created attachment 314128 [details] > WIP patch > > Thank you, KaL and Xabier. > I revised WIP patch using WTF::RecursiveLockAdapter<Lock>. > > I think there are still tiny possibility that a destructed mutex will be > locked. > But, I don't have any idea to solve that. Do we really need recursive mutex? Is this something that has started to happen recently? It would be interesting to understand why this is happening now. https://build.webkit.org/builders/GTK%20Linux%2064-bit%20Release%20%28Tests%29/builds/1679 This is the oldest build which has same crash. Maybe, Bug 173248 has introduced this bug. Created attachment 314243 [details]
Patch
I think the actual problem here is that cancelRepaint() does nothing in the case of GstGL when accelerated compositing is enabled. cancelRepaint() is called in the destructor before clearing the mutex, so we need to make it cancel the repaint to ensure the mutex is released. Another idea is incrementing the ref count when adding to the Appsink and decrementing it when the Appsink is destructed. (In reply to Fujii Hironori from comment #9) > Another idea is incrementing the ref count when adding to the Appsink and > decrementing it when the Appsink is destructed. Anything that doesn't require adding more locks is fine with me :-) Created attachment 314447 [details] WIP patch with gst_element_get_state() (In reply to Fujii Hironori from comment #9) > Another idea is incrementing the ref count when adding to the Appsink and > decrementing it when the Appsink is destructed. This my idea was wrong. Incrementing ref count of MediaPlayerPrivateGStreamerBase won't trigger the destructor of MediaPlayerPrivateGStreamerBase. (In reply to Carlos Garcia Campos from comment #8) > I think the actual problem here is that cancelRepaint() does nothing in the > case of GstGL when accelerated compositing is enabled. cancelRepaint() is > called in the destructor before clearing the mutex, so we need to make it > cancel the repaint to ensure the mutex is released. I think unregistering singnal handler is the equivalent of the cancelRepaint. The problem is that unregistering singnal handler is done asynchronously. According to the document of gst_element_get_state(), <https://gstreamer.freedesktop.org/data/doc/gstreamer/head/gstreamer/html/GstElement.html#gst-element-get-state> gst_element_get_state() waits for the thread. I created another WIP patch with it. Comment on attachment 314447 [details] WIP patch with gst_element_get_state() View in context: https://bugs.webkit.org/attachment.cgi?id=314447&action=review > Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamerBase.cpp:247 > + GstState state; > + gst_element_get_state(appsink.get(), &state, nullptr, GST_CLOCK_TIME_NONE); This looks like a hack to me. I'm going to write a dissertation about the possible scenarios when a new sample arrives and how I think they work, and things that I think that could be improved or that I have doubts whether they are properly implemented. Please tell me what do you think. I have an idea to fix the bug but I need to be sure that my analysis is right. (I hope bugzilla doesn't screw the formatting). All this happens inside MediaPlayerPrivateGStreamerBase. When a new sample arrives from gstreamer, triggerRepaint() gets called (and run inside the gstreamer thread). It replaces the current sample with the new one and then there are 3 scenarios to paint the new sample: * AC disabled (fallback videoSink is used). AC disabled gets detected by checking m_renderingCanBeAccelerated value. In this case a repaint gets scheduled in the main thread using m_drawTimer and the gstreamer thread waits in m_drawCondition. The main thread executes repaint(), which notifies the condition when it finishes to let the gstreamer thread continue. Repaints can be cancelled in this case by stopping m_drawTimer and notifying the signal, as it's done. Calls to cancelRepaint() can be requested by the video sink at any moment. Something that can be improved here is the repaint() function. It's currently checking some conditions to decide whether it was to notify the condition. But repaint() is only called when AC is disabled, and we always need to notify in that situation, so the checks are useless. * AC enabled and using gst-gl (gl videoSink is used): In this case the frame in already in a texture and we just need to send the texture to the compositor. This is done in the gstreamer thread because it's a fast operation. No locks are involved here and no interaction with the main thread is required. I have doubts about how we can cancel a repaint here. The only idea that comes to my mind is to set a flag from the main thread that avoids calling pushTextureToCompositor() from triggerRepaint() when it's set. Also, in this case cancelRepaint() only gets called in the destructor. This seems to be the situation that causes the test crash. The destructor is running in the main thread while the gstreamer thread is executing pushTextureToCompositor(), which holds the sample lock. We have to guarantee that pushTextureToCompositor() is not running and is not going to run anymore before we clear the sample mutex. * AC enabled and not using gst-gl (fallback videoSink is used): the frame is in a normal memory buffer and it needs to be copied into a texture before being sent to the compositor. The copy is performed in the compositor thread (I'm not 100% sure why, but I think it's because we want to allocate the destination texture there). A call to pushTextureToCompositor() is scheduled on the compositor thread and the gstreamer thread waits on m_drawCondition, which gets notified at the end of pushTextureToCompositor(). Currently cancelRepaint() tries to cancel this by stopping the m_drawTimer (which is wrong cause it's not used) and notifying m_drawCondition. This releases the GStreamer thread, that could perform other operations before pushTextureToCompositor() hasn't finished, which I think is not a good idea. I think we should not notify the condition in this case and let pushTextureToCompositor() finish. In this case, cancelRepaint() can be invoked from the videoSink at any moment as well. I'll send a WiP patch with a fix proposal so we can discuss it. Created attachment 314718 [details]
WiP patch
Patch changes:
* repaint() always notifies m_drawCondition
* cancelRepaint() only does something when AC is disabled
* triggerRepaint() locks m_drawMutex when handling both AC cases. If after getting the lock the m_deleting flag is enabled, don't try to render the new frame.
* destructor:
* disconnect the signals before
* when AC is enabled, lock m_drawMutex and set the m_deleting flag to true. This ensures that we wait there until pushTextureToCompositor() is finished if it's being executed, and that it won't be executed if it's pending.
I haven't extensively tested this yet.
Comment on attachment 314718 [details] WiP patch View in context: https://bugs.webkit.org/attachment.cgi?id=314718&action=review A more expert in rendering should review this but other than the missing changelog I see the patch ok. > Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamerBase.cpp:248 > + // If AC is disabled, this will release the gstreamer thread from the condition. Otherwise GStreamer > Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamerBase.cpp:252 > + // FIXME: I assume that at this point on no new signals can arrive, but there's the possibility that a last Is it your intention to write a better patch for this or why do you write FIXME here? I think your patch doesn't solve the original problem of this bug. m_sampleMutex and m_drawMutex will be locked after destructed. I've found another problem. m_renderingCanBeAccelerated is accessed from multiple threads without mutex. (In reply to Xabier Rodríguez Calvar from comment #15) > > Is it your intention to write a better patch for this or why do you write > FIXME here? It's a WiP patch. I'd like to get some feedback from Carlos García regarding the approach before proposing a formal one. Created attachment 315792 [details]
Patch
Attachment 315792 [details] did not pass style-queue:
ERROR: Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamerBase.cpp:498: More than one command on the same line [whitespace/newline] [4]
ERROR: Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamerBase.cpp:554: More than one command on the same line [whitespace/newline] [4]
Total errors found: 2 in 3 files
If any of these errors are false positives, please file a bug against check-webkit-style.
Comment on attachment 315792 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=315792&action=review Maybe Žan can have another look but you have r=me. > Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamerBase.h:174 > + void triggerVolumeChangedNotification(); > static void muteChangedCallback(MediaPlayerPrivateGStreamerBase*); > + void triggerMuteChangedNotification(); These two functions are called at the end of the callers, they are called only once and have just one line (besides the lock) so I think you can inline them. I've been talking to Carlos García about how the MainThreadNotifier works, and it seems that adding the locks to the volume and mute functions is not necessary. Makes the patch simpler :). I'll upload a new version. Created attachment 315932 [details]
Patch
The specification of gst_element_set_state(): https://gstreamer.freedesktop.org/data/doc/gstreamer/head/gstreamer/html/GstElement.html#gst-element-set-state > This function can return GST_STATE_CHANGE_ASYNC, in which case > the element will perform the remainder of the state change > asynchronously in another thread. An application can use > gst_element_get_state() to wait for the completion of the state > change or it can wait for a GST_MESSAGE_ASYNC_DONE or > GST_MESSAGE_STATE_CHANGED on the bus. > > State changes to GST_STATE_READY or GST_STATE_NULL never return > GST_STATE_CHANGE_ASYNC. I didn't read the last sentence. Looks a nice fix to me. I have tested your patch locally and confirmed the crash was fixed. Comment on attachment 315932 [details]
Patch
Excellent! this looks much better, thanks.
Comment on attachment 315932 [details] Patch Clearing flags on attachment: 315932 Committed r219681: <http://trac.webkit.org/changeset/219681> All reviewed patches have been landed. Closing bug. |