Summary: | REGRESSION(r243197): [GStreamer] Web process hangs when scrolling twitter timeline which contains HLS videos | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Jérémy Lal <kapouer> | ||||||||||||||
Component: | WebKitGTK | Assignee: | Philippe Normand <pnormand> | ||||||||||||||
Status: | RESOLVED FIXED | ||||||||||||||||
Severity: | Normal | CC: | andrea.giammarchi, bugs-noreply, calvaris, commit-queue, cturner, ews-watchlist, mcatanzaro, pnormand, richcoe2, svillar | ||||||||||||||
Priority: | P2 | ||||||||||||||||
Version: | Other | ||||||||||||||||
Hardware: | Unspecified | ||||||||||||||||
OS: | Linux | ||||||||||||||||
Attachments: |
|
Description
Jérémy Lal
2019-05-03 06:58:03 PDT
Also it did not happen with webkit2gtk 2.22.7. Please provide a URL test-case. By 2.24, do you mean 2.24.0 or 2.24.1 ? 2.24.1, the one currently in debian/testing. https://twitter.com/search?q=video&src=typd and scroll down until it hangs. Ok, this is a deadlock between adaptivedemux and the webkithttpsrc element used for HLS manifest or fragments downloading. I'm investigating this issue. Created attachment 369484 [details]
Patch
Comment on attachment 369484 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=369484&action=review > Source/WebCore/platform/graphics/gstreamer/MainThreadNotifier.h:73 > + GST_CAT_LEVEL_LOG(webkit_media_player_debug, GST_LEVEL_WARNING, m_object.get(), "Notifier was invalidated, bailing out"); Do you think this notification should never happen? I don't think it's so weird that a task has been enqueued, notifier becomes invalid and the task is run. Wouldn't it be interesting to make this at least INFO? > Source/WebCore/platform/graphics/gstreamer/MainThreadNotifier.h:97 > + m_synchronousNotificationCondition.wait(m_synchronousNotificationMutex); Considering we are waking up everybody below, shouldn't we check for the notification flag to be still active? Maybe a conditional wait? > Source/WebCore/platform/graphics/gstreamer/MainThreadNotifier.h:158 > + GRefPtr<GstObject> m_object; I think this should be called m_gstObject. > Source/WebCore/platform/graphics/gstreamer/WebKitWebSourceGStreamer.cpp:614 > + GST_DEBUG_OBJECT(src, "Closing session, loader: %p, resource: %p", priv->loader.get(), priv->resource.get()); You could put this as else of the next if > Source/WebCore/platform/graphics/gstreamer/WebKitWebSourceGStreamer.cpp:649 > + return TRUE; Shouldn't we unref the event before this here? Comment on attachment 369484 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=369484&action=review >> Source/WebCore/platform/graphics/gstreamer/MainThreadNotifier.h:73 >> + GST_CAT_LEVEL_LOG(webkit_media_player_debug, GST_LEVEL_WARNING, m_object.get(), "Notifier was invalidated, bailing out"); > > Do you think this notification should never happen? I don't think it's so weird that a task has been enqueued, notifier becomes invalid and the task is run. Wouldn't it be interesting to make this at least INFO? Yes, makes sense :) >> Source/WebCore/platform/graphics/gstreamer/MainThreadNotifier.h:97 >> + m_synchronousNotificationCondition.wait(m_synchronousNotificationMutex); > > Considering we are waking up everybody below, shouldn't we check for the notification flag to be still active? Maybe a conditional wait? Hum, I'm not sure ... I'll try. >> Source/WebCore/platform/graphics/gstreamer/WebKitWebSourceGStreamer.cpp:649 >> + return TRUE; > > Shouldn't we unref the event before this here? No, gst_base_src_event() does it for us. Comment on attachment 369484 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=369484&action=review >>> Source/WebCore/platform/graphics/gstreamer/MainThreadNotifier.h:97 >>> + m_synchronousNotificationCondition.wait(m_synchronousNotificationMutex); >> >> Considering we are waking up everybody below, shouldn't we check for the notification flag to be still active? Maybe a conditional wait? > > Hum, I'm not sure ... I'll try. Actually I'm not sure I understood what you meant. By notification flag, do you mean m_pendingNotifications? If so, I think the existing !isMainThread() test before the wait is more than enough already :) Comment on attachment 369484 [details] Patch Attachment 369484 [details] did not pass win-ews (win): Output: https://webkit-queues.webkit.org/results/12143904 New failing tests: security/contentSecurityPolicy/video-with-file-url-allowed-by-media-src-star-with-AllowContentSecurityPolicySourceStarToMatchAnyProtocol-enabled.html storage/indexeddb/modern/gc-closes-database.html http/tests/css/filters-on-iframes.html Created attachment 369493 [details]
Archive of layout-test-results from ews215 for win-future
The attached test failures were seen while running run-webkit-tests on the win-ews.
Bot: ews215 Port: win-future Platform: CYGWIN_NT-10.0-17763-3.0.5-338.x86_64-x86_64-64bit
False positive (again). Comment on attachment 369484 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=369484&action=review > Source/WebCore/platform/graphics/gstreamer/MainThreadNotifier.h:91 > + m_synchronousNotificationCondition.notifyOne(); While answering your comment below I realized this could be an issue as well because of having now shared lock and condition. Imagine you have two pending notifications, you could be waking the wrong one here, couldn't you? I think a conditional wait below could fix this as well, couldn't it? >>>> Source/WebCore/platform/graphics/gstreamer/MainThreadNotifier.h:97 >>>> + m_synchronousNotificationCondition.wait(m_synchronousNotificationMutex); >>> >>> Considering we are waking up everybody below, shouldn't we check for the notification flag to be still active? Maybe a conditional wait? >> >> Hum, I'm not sure ... I'll try. > > Actually I'm not sure I understood what you meant. By notification flag, do you mean m_pendingNotifications? If so, I think the existing !isMainThread() test before the wait is more than enough already :) Yes, I mean to check the m_pendingNotifications flags. The use case I am thinking of is you cancel a certain notification type but then you get notified for all other types and wake them here and shouldn't they be still waiting? There's another issue to address, 2 webkitwebsrc operating from different threads, both requiring access to the main thread at the same time, leading to another deadlock... Created attachment 371048 [details]
WiP patch
I spent a lot of time debugging this, and it seems like we can't achieve deadlock freedom with the design of the adaptive demuxer, and the constraint that all our network operations happen on the main thread. I tried using the AbortableTaskQueue abstraction instead of the MainThreadNotifier, which seemed in the same vain as philn's changes to the MainThreadNotifier, but there's no clear time when a startAborting phase is going to guarantee adaptivedemux doesn't kick off new get_range requests at an unfortunate time for us. It's not just in the destruction phase of the player when this can happen, it can also happen in changePipelineState calls to READY as well. I tried making the WebSrc behave asynchronously, such that no state change will block (see gst_base_src_set_async). That solves the MediaPlayer taking the main thread during shutdown and blocking on the fragment downloader threads, but there are other issues with this approach that caused it to be a dead-end that are internal to GStreamer. I tried adding signal handlers to collect all created websrcs inside the player, such that at shutdown time, before we issue a change_state to NULL, we can forcibly close all network sessions on the main thread before shutting the rest of the pipeline down. This helps a lot, but it doesn't stop the tricky case of adaptivedemux's EOS handling. When the websrc completes the download of a manifest, adaptive demux uses the EOS on the manifest file to signal it should start new srcs to download the fragments. This can happen, due to bad timing, at shutdown and render the careful collection of websrcs in the player pointless. I tried a few variations on philn's idea of adding timeouts to some of the conditional waits. These kinds of solutions can get real tricky, and indeed I didn't manage to eliminate the deadlocks, they were just a little more flakey rather than easy to trigger. I tried sending EOS's and/or GST_MESSAGE_ERRORS to the uridownloader, to try and get it to unblock itself in the range requests at the correct times. These are the only ways to signal that element that it should interrupt it's blocking operations. Again, improvements, but not complete solutions. The fundamental problem here is that the EOS handler takes an API_LOCK in the manifest, which blocks any state changes happening until fragments are downloaded (state changes also need the API lock). If, during the network request window, the player is destroyed, which can happen whenever with a scrolling view of players, the main thread is taken over and can not complete it's state change, and the network request completion signal never makes it back to the main thread. Deadlock. So, I only see three options now, 1) Someone else takes a look and hopefully spots a functional approach to this issue 2) We revert using webkitwebsrc in the adaptivedemuxers (bye sandboxing) 3) We stop advertising support for application/x-hls. I'd like for 1) to happen :), 2) is sad times, and 3) is not that crazy but seemingly a bit impractical. Chrome & FF don't support HLS natively AFAICT, what happens on Twitter is a JS library does the HLS protocol and exposes it as MSE blobs. Unfortunately, for reasons I didn't debug, when you fake the user agent as FF, no videos play. I suspect either an issue in our MSE player, or something FF specific in the HLS library twitter is using. Unfortunately because we're a WebKit, most sites will want to using the native HLS support, because Safari supports it well I assume. (In reply to Charlie Turner from comment #15) > 2) We revert using webkitwebsrc in the adaptivedemuxers (bye sandboxing) That would also reintroduce CVE-2019-11070. *** Bug 198857 has been marked as a duplicate of this bug. *** *** Bug 198958 has been marked as a duplicate of this bug. *** *** Bug 199217 has been marked as a duplicate of this bug. *** *** Bug 199287 has been marked as a duplicate of this bug. *** Created attachment 373468 [details]
Patch
Above patch also a bit of a WIP, since the upstream GStreamer review has not begun yet. Charlie, you're a hero! But what is the plan for older versions of GStreamer? I think it's OK to regress functionality if GStreamer isn't new enough, but of course not OK to deadlock. (In reply to Michael Catanzaro from comment #23) > Charlie, you're a hero! But what is the plan for older versions of GStreamer? > > I think it's OK to regress functionality if GStreamer isn't new enough, but > of course not OK to deadlock. Thanks :blush: but lets see how it settles with users, hopefully I can get some testers here with the patch. I think we have to just disable HLS support for now until the GStreamer patch makes its way to users through the distro channels. Especially on Twitter this is going to be painful for those users who enjoy watching videos on that site. Created attachment 373621 [details]
Patch for landing
Created attachment 373622 [details]
Patch for landing
Comment on attachment 373622 [details] Patch for landing Clearing flags on attachment: 373622 Committed r247215: <https://trac.webkit.org/changeset/247215> All reviewed patches have been landed. Closing bug. the patch is working for me. |