Bug 173952

Summary: [GStreamer] Some layout tests issue "g_mutex_clear() called on uninitialised or locked mutex" and flaky crash in ~MediaPlayerPrivateGStreamerBase
Product: WebKit Reporter: Fujii Hironori <Hironori.Fujii>
Component: WebKitGTKAssignee: Fujii Hironori <Hironori.Fujii>
Status: RESOLVED FIXED    
Severity: Normal CC: bugs-noreply, buildbot, calvaris, cgarcia, commit-queue, magomez, pnormand
Priority: P2    
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
WIP patch
none
WIP patch
none
Patch
none
WIP patch with gst_element_get_state()
none
WiP patch
none
Patch
none
Patch none

Description Fujii Hironori 2017-06-28 19:24:20 PDT
> STDERR: g_mutex_clear() called on uninitialised or locked mutex

For example,
https://build.webkit.org/results/GTK%20Linux%2064-bit%20Release%20(Tests)/r218899%20(1980)/media/media-source/media-source-abort-resets-parser-crash-log.txt

Callstack:

> Thread 1 (Thread 0x7f81a90c6a80 (LWP 3818)):
> #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> #1  0x00007f81b06aa3fa in __GI_abort () at abort.c:89
> #2  0x00007f81b51aa78f in g_mutex_clear () at /home/slave/webkitgtk/gtk-linux-64-release-tests/build/WebKitBuild/DependenciesGTK/Source/glib-2.52.1/glib/gthread-posix.c:1300
> #3  0x00007f81bd783b81 in WebCore::MediaPlayerPrivateGStreamerBase::~MediaPlayerPrivateGStreamerBase() ()
> #4  0x00007f81bd7808c0 in WebCore::MediaPlayerPrivateGStreamer::~MediaPlayerPrivateGStreamer() ()
> #5  0x00007f81bd780a29 in WebCore::MediaPlayerPrivateGStreamer::~MediaPlayerPrivateGStreamer() ()
> #6  0x00007f81bd20e9aa in WebCore::MediaPlayer::~MediaPlayer() ()
> #7  0x00007f81bd20ea39 in WebCore::MediaPlayer::~MediaPlayer() [clone .localalias.101] ()
> #8  0x00007f81bcdc72a2 in WebCore::HTMLMediaElement::clearMediaPlayer(WebCore::HTMLMediaElementEnums::DelayedActionType) ()
> #9  0x00007f81bcdca111 in WebCore::HTMLMediaElement::stop() ()
> #10 0x00007f81bcc41e4b in WebCore::ScriptExecutionContext::stopActiveDOMObjects() ()
> #11 0x00007f81bcbcaa00 in WebCore::Document::stopActiveDOMObjects() ()
> #12 0x00007f81bcbcab07 in WebCore::Document::prepareForDestruction() ()
> #13 0x00007f81bd096d72 in WebCore::Frame::setView(WTF::RefPtr<WebCore::FrameView>&&) ()
> #14 0x00007f81bd0999cc in WebCore::Frame::createView(WebCore::IntSize const&, WebCore::Color const&, bool, WebCore::IntSize const&, WebCore::IntRect const&, bool, WebCore::ScrollbarMode, bool, WebCore::ScrollbarMode, bool) ()
> #15 0x00007f81b96f889b in WebKit::WebFrameLoaderClient::transitionToCommittedForNewPage() () from /home/slave/webkitgtk/gtk-linux-64-release/build/WebKitBuild/Release/lib/libwebkit2gtk-4.0.so.37
> #16 0x00007f81bcf669aa in WebCore::FrameLoader::transitionToCommitted(WebCore::CachedPage*) ()
> #17 0x00007f81bcf66ae0 in WebCore::FrameLoader::commitProvisionalLoad() ()
> #18 0x00007f81bcf40b74 in WebCore::DocumentLoader::finishedLoading() ()
> #19 0x00007f81bcf41129 in WebCore::DocumentLoader::maybeLoadEmpty() ()
> #20 0x00007f81bcf43cdd in WebCore::DocumentLoader::startLoadingMainResource() ()
> #21 0x00007f81bcf674e8 in WebCore::FrameLoader::continueLoadAfterNavigationPolicy(WebCore::ResourceRequest const&, WebCore::FormState*, bool, WebCore::AllowNavigationToInvalidURL) ()
> #22 0x00007f81bcf941cb in WebCore::PolicyCallback::call(bool) ()
> #23 0x00007f81bcf99fda in WebCore::PolicyChecker::continueAfterNavigationPolicy(WebCore::PolicyAction) ()
> #24 0x00007f81b97003b8 in WebKit::WebFrameLoaderClient::dispatchDecidePolicyForNavigationAction(WebCore::NavigationAction const&, WebCore::ResourceRequest const&, WebCore::FormState*, WTF::Function<void (WebCore::PolicyAction)>&&) () from /home/slave/webkitgtk/gtk-linux-64-release/build/WebKitBuild/Release/lib/libwebkit2gtk-4.0.so.37
> #25 0x00007f81bcf9c611 in WebCore::PolicyChecker::checkNavigationPolicy(WebCore::ResourceRequest const&, bool, WebCore::DocumentLoader*, WebCore::FormState*, WTF::Function<void (WebCore::ResourceRequest const&, WebCore::FormState*, bool)>) ()
> #26 0x00007f81bcf678e4 in WebCore::FrameLoader::loadWithDocumentLoader(WebCore::DocumentLoader*, WebCore::FrameLoadType, WebCore::FormState*, WebCore::AllowNavigationToInvalidURL) ()
> #27 0x00007f81bcf6a1e1 in WebCore::FrameLoader::load(WebCore::FrameLoadRequest&&) ()
> #28 0x00007f81b972ad65 in WebKit::WebPage::loadRequest(WebKit::LoadParameters const&) () from /home/slave/webkitgtk/gtk-linux-64-release/build/WebKitBuild/Release/lib/libwebkit2gtk-4.0.so.37
> #29 0x00007f81b98fc6b7 in void IPC::handleMessage<Messages::WebPage::LoadAlternateHTMLString, WebKit::WebPage, void (WebKit::WebPage::*)(WebKit::LoadParameters const&)>(IPC::Decoder&, WebKit::WebPage*, void (WebKit::WebPage::*)(WebKit::LoadParameters const&)) () from /home/slave/webkitgtk/gtk-linux-64-release/build/WebKitBuild/Release/lib/libwebkit2gtk-4.0.so.37
> #30 0x00007f81b98fa359 in WebKit::WebPage::didReceiveWebPageMessage(IPC::Connection&, IPC::Decoder&) () from /home/slave/webkitgtk/gtk-linux-64-release/build/WebKitBuild/Release/lib/libwebkit2gtk-4.0.so.37
> #31 0x00007f81b9510789 in IPC::MessageReceiverMap::dispatchMessage(IPC::Connection&, IPC::Decoder&) () from /home/slave/webkitgtk/gtk-linux-64-release/build/WebKitBuild/Release/lib/libwebkit2gtk-4.0.so.37
> #32 0x00007f81b9672516 in WebKit::WebProcess::didReceiveMessage(IPC::Connection&, IPC::Decoder&) () from /home/slave/webkitgtk/gtk-linux-64-release/build/WebKitBuild/Release/lib/libwebkit2gtk-4.0.so.37
> #33 0x00007f81b950c64b in IPC::Connection::dispatchMessage(std::unique_ptr<IPC::Decoder, std::default_delete<IPC::Decoder> >) () from /home/slave/webkitgtk/gtk-linux-64-release/build/WebKitBuild/Release/lib/libwebkit2gtk-4.0.so.37
> #34 0x00007f81b950d57c in IPC::Connection::dispatchOneMessage() () from /home/slave/webkitgtk/gtk-linux-64-release/build/WebKitBuild/Release/lib/libwebkit2gtk-4.0.so.37
> #35 0x00007f81b74373dd in WTF::RunLoop::performWork() () from /home/slave/webkitgtk/gtk-linux-64-release/build/WebKitBuild/Release/lib/libjavascriptcoregtk-4.0.so.18
> #36 0x00007f81b746cd39 in WTF::RunLoop::RunLoop()::{lambda(void*)#1}::_FUN(void*) () from /home/slave/webkitgtk/gtk-linux-64-release/build/WebKitBuild/Release/lib/libjavascriptcoregtk-4.0.so.18
> #37 0x00007f81b51665ca in g_main_dispatch () at /home/slave/webkitgtk/gtk-linux-64-release-tests/build/WebKitBuild/DependenciesGTK/Source/glib-2.52.1/glib/gmain.c:3212
> #38 g_main_context_dispatch () at /home/slave/webkitgtk/gtk-linux-64-release-tests/build/WebKitBuild/DependenciesGTK/Source/glib-2.52.1/glib/gmain.c:3865
> #39 0x00007f81b5166948 in g_main_context_iterate () at /home/slave/webkitgtk/gtk-linux-64-release-tests/build/WebKitBuild/DependenciesGTK/Source/glib-2.52.1/glib/gmain.c:3938
> #40 0x00007f81b5166c62 in g_main_loop_run () at /home/slave/webkitgtk/gtk-linux-64-release-tests/build/WebKitBuild/DependenciesGTK/Source/glib-2.52.1/glib/gmain.c:4134
> #41 0x00007f81b746d6e0 in WTF::RunLoop::run() () from /home/slave/webkitgtk/gtk-linux-64-release/build/WebKitBuild/Release/lib/libjavascriptcoregtk-4.0.so.18
> #42 0x00007f81b98a6b62 in int WebKit::ChildProcessMain<WebKit::WebProcess, WebKit::WebProcessMain>(int, char**) () from /home/slave/webkitgtk/gtk-linux-64-release/build/WebKitBuild/Release/lib/libwebkit2gtk-4.0.so.37
> #43 0x00007f81b06962b1 in __libc_start_main (main=0x7f81bc799000 <main>, argc=2, argv=0x7ffebfbcc718, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffebfbcc708) at ../csu/libc-start.c:291
> #44 0x00007f81bc79954a in _start ()


Thread 29 seems to be locking the mutex.

> Thread 29 (Thread 0x7f80e3af9700 (LWP 7329)):
(...)
> #9  0x00007f81b6127e36 in gst_video_frame_map_id () at /home/slave/webkitgtk/gtk-linux-64-release-tests/build/WebKitBuild/DependenciesGTK/Source/gst-plugins-base-1.10.4/gst-libs/gst/video/video-frame.c:103
> #10 0x00007f81bd783414 in WebCore::MediaPlayerPrivateGStreamerBase::pushTextureToCompositor() ()
> #11 0x00007f81bd7836f8 in WebCore::MediaPlayerPrivateGStreamerBase::triggerRepaint(_GstSample*) ()
> #12 0x00007f81bd7839fe in WebCore::MediaPlayerPrivateGStreamerBase::newPrerollCallback(_GstElement*, WebCore::MediaPlayerPrivateGStreamerBase*) ()
> #13 0x00007f81ac9fc038 in ffi_call_unix64 () from /usr/lib/x86_64-linux-gnu/libffi.so.6
(...)
Comment 1 Fujii Hironori 2017-06-29 00:59:09 PDT
Created attachment 314122 [details]
WIP patch

I created a WIP patch.

* Lock m_sampleMutex at the beginning of signal handlers
* Lock m_sampleMutex while unregistering singnal handlers
* Change the mutex type of m_sampleMutex to GRecMutex

This WIP patch seems to solve the problem.
I'm not sure this is a right fix.
Comment 2 Carlos Garcia Campos 2017-06-29 02:01:39 PDT
The sample mutex is not actually a GST mutex, so I wonder if we could use WTF locks instead.
Comment 3 Xabier Rodríguez Calvar 2017-06-29 02:21:07 PDT
(In reply to Carlos Garcia Campos from comment #2)
> The sample mutex is not actually a GST mutex, so I wonder if we could use
> WTF locks instead.

Yes, please.
Comment 4 Fujii Hironori 2017-06-29 03:36:17 PDT
Created attachment 314128 [details]
WIP patch

Thank you, KaL and Xabier.
I revised WIP patch using WTF::RecursiveLockAdapter<Lock>.

I think there are still tiny possibility that a destructed mutex will be locked.
But, I don't have any idea to solve that.
Comment 5 Carlos Garcia Campos 2017-06-29 03:50:57 PDT
(In reply to Fujii Hironori from comment #4)
> Created attachment 314128 [details]
> WIP patch
> 
> Thank you, KaL and Xabier.
> I revised WIP patch using WTF::RecursiveLockAdapter<Lock>.
> 
> I think there are still tiny possibility that a destructed mutex will be
> locked.
> But, I don't have any idea to solve that.

Do we really need recursive mutex? Is this something that has started to happen recently? It would be interesting to understand why this is happening now.
Comment 6 Fujii Hironori 2017-06-29 20:53:58 PDT
https://build.webkit.org/builders/GTK%20Linux%2064-bit%20Release%20%28Tests%29/builds/1679
This is the oldest build which has same crash.

Maybe, Bug 173248 has introduced this bug.
Comment 7 Fujii Hironori 2017-06-30 00:10:07 PDT
Created attachment 314243 [details]
Patch
Comment 8 Carlos Garcia Campos 2017-06-30 00:47:50 PDT
I think the actual problem here is that cancelRepaint() does nothing in the case of GstGL when accelerated compositing is enabled. cancelRepaint() is called in the destructor before clearing the mutex, so we need to make it cancel the repaint to ensure the mutex is released.
Comment 9 Fujii Hironori 2017-06-30 06:53:38 PDT
Another idea is incrementing the ref count when adding to the Appsink and decrementing it when the Appsink is destructed.
Comment 10 Carlos Garcia Campos 2017-06-30 07:05:45 PDT
(In reply to Fujii Hironori from comment #9)
> Another idea is incrementing the ref count when adding to the Appsink and
> decrementing it when the Appsink is destructed.

Anything that doesn't require adding more locks is fine with me :-)
Comment 11 Fujii Hironori 2017-07-02 22:22:28 PDT
Created attachment 314447 [details]
WIP patch with gst_element_get_state()

(In reply to Fujii Hironori from comment #9)
> Another idea is incrementing the ref count when adding to the Appsink and
> decrementing it when the Appsink is destructed.

This my idea was wrong.
Incrementing ref count of MediaPlayerPrivateGStreamerBase won't trigger the destructor of MediaPlayerPrivateGStreamerBase.

(In reply to Carlos Garcia Campos from comment #8)
> I think the actual problem here is that cancelRepaint() does nothing in the
> case of GstGL when accelerated compositing is enabled. cancelRepaint() is
> called in the destructor before clearing the mutex, so we need to make it
> cancel the repaint to ensure the mutex is released.

I think unregistering singnal handler is the equivalent of the cancelRepaint.
The problem is that unregistering singnal handler is done asynchronously.

According to the document of gst_element_get_state(),
<https://gstreamer.freedesktop.org/data/doc/gstreamer/head/gstreamer/html/GstElement.html#gst-element-get-state>
gst_element_get_state() waits for the thread.
I created another WIP patch with it.
Comment 12 Xabier Rodríguez Calvar 2017-07-04 01:38:17 PDT
Comment on attachment 314447 [details]
WIP patch with gst_element_get_state()

View in context: https://bugs.webkit.org/attachment.cgi?id=314447&action=review

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamerBase.cpp:247
> +            GstState state;
> +            gst_element_get_state(appsink.get(), &state, nullptr, GST_CLOCK_TIME_NONE);

This looks like a hack to me.
Comment 13 Miguel Gomez 2017-07-06 03:22:57 PDT
I'm going to write a dissertation about the possible scenarios when a new sample arrives and how I think they work, and things that I think that could be improved or that I have doubts whether they are properly implemented. Please tell me what do you think. I have an idea to fix the bug but I need to be sure that my analysis is right. (I hope bugzilla doesn't screw the formatting).

All this happens inside MediaPlayerPrivateGStreamerBase.
When a new sample arrives from gstreamer, triggerRepaint() gets called (and run inside the gstreamer thread). It replaces the current sample with the new one and then there are 3 scenarios to paint the new sample:

* AC disabled (fallback videoSink is used). AC disabled gets detected by checking m_renderingCanBeAccelerated value. In this case a repaint gets scheduled in the main thread using m_drawTimer and the gstreamer thread waits in m_drawCondition. The main thread executes repaint(), which notifies the condition when it finishes to let the gstreamer thread continue.
Repaints can be cancelled in this case by stopping m_drawTimer and notifying the signal, as it's done. Calls to cancelRepaint() can be requested by the video sink at any moment.
Something that can be improved here is the repaint() function. It's currently checking some conditions to decide whether it was to notify the condition. But repaint() is only called when AC is disabled, and we always need to notify in that situation, so the checks are useless.


* AC enabled and using gst-gl (gl videoSink is used): In this case the frame in already in a texture and we just need to send the texture to the compositor. This is done in the gstreamer thread because it's a fast operation. No locks are involved here and no interaction with the main thread is required.
I have doubts about how we can cancel a repaint here. The only idea that comes to my mind is to set a flag from the main thread that avoids calling pushTextureToCompositor() from triggerRepaint() when it's set. Also, in this case cancelRepaint() only gets called in the destructor.
This seems to be the situation that causes the test crash. The destructor is running in the main thread while the gstreamer thread is executing pushTextureToCompositor(), which holds the sample lock. We have to guarantee that pushTextureToCompositor() is not running and is not going to run anymore before we clear the sample mutex.


* AC enabled and not using gst-gl (fallback videoSink is used): the frame is in a normal memory buffer and it needs to be copied into a texture before being sent to the compositor. The copy is performed in the compositor thread (I'm not 100% sure why, but I think it's because we want to allocate the destination texture there). A call to pushTextureToCompositor() is scheduled on the compositor thread and the gstreamer thread waits on m_drawCondition, which gets notified at the end of pushTextureToCompositor().
Currently cancelRepaint() tries to cancel this by stopping the m_drawTimer (which is wrong cause it's not used) and notifying m_drawCondition. This releases the GStreamer thread, that could perform other operations before  pushTextureToCompositor() hasn't finished, which I think is not a good idea. I think we should not notify the condition in this case and let pushTextureToCompositor() finish.
In this case, cancelRepaint() can be invoked from the videoSink at any moment as well.


I'll send a WiP patch with a fix proposal so we can discuss it.
Comment 14 Miguel Gomez 2017-07-06 04:26:24 PDT
Created attachment 314718 [details]
WiP patch

Patch changes:

* repaint() always notifies m_drawCondition
* cancelRepaint() only does something when AC is disabled
* triggerRepaint() locks m_drawMutex when handling both AC cases. If after getting the lock the m_deleting flag is enabled, don't try to render the new frame.
* destructor:
  * disconnect the signals before
  * when AC is enabled, lock m_drawMutex and set the m_deleting flag to true. This ensures that we wait there until pushTextureToCompositor() is finished if it's being executed, and that it won't be executed if it's pending.

I haven't extensively tested this yet.
Comment 15 Xabier Rodríguez Calvar 2017-07-06 08:50:02 PDT
Comment on attachment 314718 [details]
WiP patch

View in context: https://bugs.webkit.org/attachment.cgi?id=314718&action=review

A more expert in rendering should review this but other than the missing changelog I see the patch ok.

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamerBase.cpp:248
> +    // If AC is disabled, this will release the gstreamer thread from the condition. Otherwise

GStreamer

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamerBase.cpp:252
> +    // FIXME: I assume that at this point on no new signals can arrive, but there's the possibility that a last

Is it your intention to write a better patch for this or why do you write FIXME here?
Comment 16 Fujii Hironori 2017-07-06 18:38:33 PDT
I think your patch doesn't solve the original problem of this bug.
m_sampleMutex and m_drawMutex will be locked after destructed.

I've found another problem.
m_renderingCanBeAccelerated is accessed from multiple threads without mutex.
Comment 17 Miguel Gomez 2017-07-07 00:38:55 PDT
(In reply to Xabier Rodríguez Calvar from comment #15)
> 
> Is it your intention to write a better patch for this or why do you write
> FIXME here?

It's a WiP patch. I'd like to get some feedback from Carlos García regarding the approach before proposing a formal one.
Comment 18 Miguel Gomez 2017-07-18 06:08:22 PDT
Created attachment 315792 [details]
Patch
Comment 19 Build Bot 2017-07-18 06:10:07 PDT
Attachment 315792 [details] did not pass style-queue:


ERROR: Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamerBase.cpp:498:  More than one command on the same line  [whitespace/newline] [4]
ERROR: Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamerBase.cpp:554:  More than one command on the same line  [whitespace/newline] [4]
Total errors found: 2 in 3 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 20 Xabier Rodríguez Calvar 2017-07-18 07:45:44 PDT
Comment on attachment 315792 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=315792&action=review

Maybe Žan can have another look but you have r=me.

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamerBase.h:174
> +    void triggerVolumeChangedNotification();
>      static void muteChangedCallback(MediaPlayerPrivateGStreamerBase*);
> +    void triggerMuteChangedNotification();

These two functions are called at the end of the callers, they are called only once and have just one line (besides the lock) so I think you can inline them.
Comment 21 Miguel Gomez 2017-07-19 02:02:35 PDT
I've been talking to Carlos García about how the MainThreadNotifier works, and it seems that adding the locks to the volume and mute functions is not necessary. Makes the patch simpler :). I'll upload a new version.
Comment 22 Miguel Gomez 2017-07-19 10:27:16 PDT
Created attachment 315932 [details]
Patch
Comment 23 Fujii Hironori 2017-07-19 17:56:03 PDT
The specification of gst_element_set_state():

https://gstreamer.freedesktop.org/data/doc/gstreamer/head/gstreamer/html/GstElement.html#gst-element-set-state

> This function can return GST_STATE_CHANGE_ASYNC, in which case
> the element will perform the remainder of the state change
> asynchronously in another thread. An application can use
> gst_element_get_state() to wait for the completion of the state
> change or it can wait for a GST_MESSAGE_ASYNC_DONE or
> GST_MESSAGE_STATE_CHANGED on the bus.
> 
> State changes to GST_STATE_READY or GST_STATE_NULL never return
> GST_STATE_CHANGE_ASYNC.

I didn't read the last sentence. Looks a nice fix to me.
I have tested your patch locally and confirmed the crash was fixed.
Comment 24 Carlos Garcia Campos 2017-07-19 22:49:11 PDT
Comment on attachment 315932 [details]
Patch

Excellent! this looks much better, thanks.
Comment 25 WebKit Commit Bot 2017-07-20 06:07:05 PDT
Comment on attachment 315932 [details]
Patch

Clearing flags on attachment: 315932

Committed r219681: <http://trac.webkit.org/changeset/219681>
Comment 26 WebKit Commit Bot 2017-07-20 06:07:07 PDT
All reviewed patches have been landed.  Closing bug.