RESOLVED FIXED Bug 274261
REGRESSION(276798@main): [GStreamer] Excessive CPU usage from cached web process after closing nbcnews.com page, failure to display web content after reopening page
https://bugs.webkit.org/show_bug.cgi?id=274261
Summary REGRESSION(276798@main): [GStreamer] Excessive CPU usage from cached web proc...
Michael Catanzaro
Reported 2024-05-16 08:25:51 PDT
Reproducer: visit https://www.nbcnews.com/nbc-out/out-news/florida-teen-says-was-denied-entry-prom-wearing-suit-rcna152440 in Epiphany Tech Preview with WebKitGTK 2.45.2. Close the page. When the page is closed, the associated web process keeps running (because it enters the web process cache) and begins excessively churning CPU, which is terrible for the user's battery life. If I then reload the same page in a different web view, WebKit is unable to render anything because the cached web process is out of control. I only noticed because my system fans start running at a higher speed when this happens. (Cached web processes are supposed to be suspended, but this has never actually worked, bug #262794. It's also not really relevant, because even if the process was properly suspended and then resumed, that would surely only defer this bug such that it doesn't occur until the page is reopened.) I tried taking a few backtraces of the out-of-control web process. Unfortunately most of them point to polling in the main context, which is not useful. But here are some that show the web process doing basically random web process stuff: (gdb) bt #0 0x00007f6676f3747e in WebCore::HTMLMediaElement::mediaPlayerReadyStateChanged() () from target:/usr/lib/x86_64-linux-gnu/libwebkitgtk-6.0.so.4 #1 0x00007f6676d0fbaf in WebCore::EventLoop::run(std::optional<WTF::ApproximateTime>) () from target:/usr/lib/x86_64-linux-gnu/libwebkitgtk-6.0.so.4 #2 0x00007f6676db2a0c in WebCore::WindowEventLoop::didReachTimeToRun() () from target:/usr/lib/x86_64-linux-gnu/libwebkitgtk-6.0.so.4 #3 0x00007f6677483e4f in WebCore::ThreadTimers::sharedTimerFiredInternal() () from target:/usr/lib/x86_64-linux-gnu/libwebkitgtk-6.0.so.4 #4 0x00007f66743f6b75 in WTF::RunLoop::TimerBase::TimerBase(WTF::RunLoop&)::$_0::__invoke(void*) () from target:/usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-6.0.so.1 #5 0x00007f66743f5c31 in WTF::RunLoop::$_0::__invoke(_GSource*, int (*)(void*), void*) () from target:/usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-6.0.so.1 #6 0x00007f667080f767 in g_main_dispatch () from target:/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0 #7 0x00007f6670811907 in g_main_context_iterate_unlocked.isra () from target:/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0 #8 0x00007f66708123a7 in g_main_loop_run () from target:/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0 #9 0x00007f66743f6211 in WTF::RunLoop::run() () from target:/usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-6.0.so.1 #10 0x00007f6675bbae47 in int WebKit::AuxiliaryProcessMain<WebKit::WebProcessMainGtk>(int, char**) () from target:/usr/lib/x86_64-linux-gnu/libwebkitgtk-6.0.so.4 #11 0x00007f6674c3b08a in __libc_start_call_main () from target:/usr/lib/x86_64-linux-gnu/libc.so.6 #12 0x00007f6674c3b14b in __libc_start_main () from target:/usr/lib/x86_64-linux-gnu/libc.so.6 #13 0x000055c267573085 in _start () Why is HTMLMediaElement doing anything when there is no web view? It shouldn't even possible for an HTMLMediaElement to still exist because surely the WebPage object should have been destroyed. There shouldn't be any DOM left. Maybe it's been leaked? I got multiple backtraces that look like this, so this was (gdb) bt #0 0x00007f2f16db5bf8 in WTF::Detail::CallableWrapper<WebCore::WindowEventLoop::didReachTimeToRun()::$_0, bool, WebCore::ScriptExecutionContext&>::call(WebCore::ScriptExecutionContext&) () from target:/usr/lib/x86_64-linux-gnu/libwebkitgtk-6.0.so.4 #1 0x00007f2f16d10190 in WebCore::EventLoop::findMatchingAssociatedContext(WTF::Function<bool (WebCore::ScriptExecutionContext&)> const&) () from target:/usr/lib/x86_64-linux-gnu/libwebkitgtk-6.0.so.4 #2 0x00007f2f16db2a30 in WebCore::WindowEventLoop::didReachTimeToRun() () from target:/usr/lib/x86_64-linux-gnu/libwebkitgtk-6.0.so.4 #3 0x00007f2f17483e4f in WebCore::ThreadTimers::sharedTimerFiredInternal() () from target:/usr/lib/x86_64-linux-gnu/libwebkitgtk-6.0.so.4 #4 0x00007f2f145f6b75 in WTF::RunLoop::TimerBase::TimerBase(WTF::RunLoop&)::$_0::__invoke(void*) () from target:/usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-6.0.so.1 #5 0x00007f2f145f5c31 in WTF::RunLoop::$_0::__invoke(_GSource*, int (*)(void*), void*) () from target:/usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-6.0.so.1 #6 0x00007f2f10918767 in g_main_dispatch () from target:/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0 #7 0x00007f2f1091a907 in g_main_context_iterate_unlocked.isra () from target:/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0 #8 0x00007f2f1091b3a7 in g_main_loop_run () from target:/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0 #9 0x00007f2f145f6211 in WTF::RunLoop::run() () from target:/usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-6.0.so.1 #10 0x00007f2f15bbae47 in int WebKit::AuxiliaryProcessMain<WebKit::WebProcessMainGtk>(int, char**) () from target:/usr/lib/x86_64-linux-gnu/libwebkitgtk-6.0.so.4 #11 0x00007f2f14c3b08a in __libc_start_call_main () from target:/usr/lib/x86_64-linux-gnu/libc.so.6 #12 0x00007f2f14c3b14b in __libc_start_main () from target:/usr/lib/x86_64-linux-gnu/libc.so.6 #13 0x000055f3a2709085 in _start () A timer is firing. It's probably about to run the HTMLMediaElement code? I also got a backtrace where it looks like the timer just finished firing. (gdb) bt #0 0x00007f2f16d82640 in void WebCore::SelectorDataList::execute<WebCore::Element*>(WebCore::ContainerNode&, WebCore::Element*&) const () from target:/usr/lib/x86_64-linux-gnu/libwebkitgtk-6.0.so.4 #1 0x00007f2f16d81ada in WebCore::SelectorDataList::queryFirst(WebCore::ContainerNode&) const () from target:/usr/lib/x86_64-linux-gnu/libwebkitgtk-6.0.so.4 #2 0x00007f2f16c73e82 in WebCore::ContainerNode::querySelector(WTF::String const&) () from target:/usr/lib/x86_64-linux-gnu/libwebkitgtk-6.0.so.4 #3 0x00007f2f15eff8e6 in WebCore::jsDocumentPrototypeFunction_querySelector(JSC::JSGlobalObject*, JSC::CallFrame*) () from target:/usr/lib/x86_64-linux-gnu/libwebkitgtk-6.0.so.4 #4 0x00007f2eabe0c038 in ?? () #5 0x00007ffe862e8130 in ?? () #6 0x00007f2eaca10314 in ?? () #7 0x0000000000000000 in ?? () This looks like the cached process is executing miscellaneous JavaScript code.
Attachments
Michael Catanzaro
Comment 1 2024-05-17 06:57:33 PDT
I will test to see if I can bisect this. (Probably next week.)
Michael Catanzaro
Comment 2 2024-05-17 12:45:39 PDT
f91aeb92bd8ee8b8f6d7242afad4e07d5bfff6f3 is the first bad commit commit f91aeb92bd8ee8b8f6d7242afad4e07d5bfff6f3 (HEAD) Author: Carlos Bentzen <cadubentzen@igalia.com> Date: Thu Mar 28 13:24:38 2024 -0700 [GStreamer][MSE] Crash in webKitMediaSrcStreamFlush https://bugs.webkit.org/show_bug.cgi?id=260455 Reviewed by Alicia Boya Garcia. Fix crash and playback of video on https://www.apple.com/apple-watch-ultra-2/. The video starts outside of the viewport and is played via MSE. The issue was caused due to failing to transition the pipeline to PAUSED in MediaPlayerPrivateGStreamer::setVisibleInViewport when an MSE video starts initially outside of the viewport. That, in turn, is caused because we try to set playbin to PAUSED without an URL set. Then, the media player pipeline gets stuck in READY, while webkitmediasrc is in the PAUSED state. With this, the video never starts playing back. Moreover, scrolling the page for the video to be in and out of the viewport, it triggers a downgrade state in webkitmediasrc to READY, which if coupled with a seek attempt then causes a crash in WebKitMediaSourceGStreamer when flushing. The fix is two-fold: 1. Avoid setting the state to PAUSED if no playbin URL has been set. 2. Avoid setting the state of the pipeline to READY in MediaPlayerPrivateGStreamer::setVisibleInViewport if the media player is MSE. Added a test that triggers the described behavior and crashes without the fix. It scrolls the page to move a video in and out of the viewport, while also seeking to the start on every scroll. Currently, part of the code to trigger the crash isn't executed due to WEBKIT_GST_ALLOW_PLAYBACK_OF_INVISIBLE_VIDEOS=1 being set in the test driver in Tools/Scripts/webkitpy/port/glib.py (264017@main), which needs to be commented manually. We should find a way to add a test preference for this code path to be enabled. * LayoutTests/media/media-source/media-source-muted-scroll-and-seek-crash-expected.txt: Added. * LayoutTests/media/media-source/media-source-muted-scroll-and-seek-crash.html: Added. * Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp: (WebCore::MediaPlayerPrivateGStreamer::load): Set playbin URL before calling setVisibleInViewport. (WebCore::MediaPlayerPrivateGStreamer::setVisibleInViewport): Avoid state transitions to READY if MSE and avoid setting the pipeline to PAUSED if playbin URL is not set. Canonical link: https://commits.webkit.org/276798@main LayoutTests/media/media-source/media-source-muted-scroll-and-seek-crash-expected.txt | 4 ++++ LayoutTests/media/media-source/media-source-muted-scroll-and-seek-crash.html | 65 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp | 14 +++++++++++--- 3 files changed, 80 insertions(+), 3 deletions(-) create mode 100644 LayoutTests/media/media-source/media-source-muted-scroll-and-seek-crash-expected.txt create mode 100644 LayoutTests/media/media-source/media-source-muted-scroll-and-seek-crash.html
Michael Catanzaro
Comment 3 2024-05-17 12:49:38 PDT
*** Bug 274329 has been marked as a duplicate of this bug. ***
Michael Catanzaro
Comment 4 2024-05-17 12:52:08 PDT
Honestly it's pretty bad that the UI process allows the web process to utilize an unlimited amount of CPU. We have a memory limit; probably also need a CPU limit....
Michael Catanzaro
Comment 5 2024-05-17 12:53:20 PDT
EWS
Comment 6 2024-05-17 16:02:28 PDT
Committed 278929@main (a57fafd69d27): <https://commits.webkit.org/278929@main> Reviewed commits have been landed. Closing PR #28727 and removing active labels.
Michael Catanzaro
Comment 7 2024-05-26 10:34:21 PDT
Reopening. Something is wrong, because the regression persists after https://gitlab.gnome.org/GNOME/gnome-build-meta/-/merge_requests/2910, even though this was a very straightforward bisect. I'll need to investigate more.
Michael Catanzaro
Comment 8 2024-05-27 14:43:29 PDT
This bug is fixed. We just have a second excessive CPU usage regression, which I haven't figured out how to reproduce yet.
Michael Catanzaro
Comment 10 2024-05-28 10:28:33 PDT
Reproducer: same procedure as with nbcnews.com, but use https://www.ksdk.com/article/news/local/business-journal/why-fewer-cicadas-in-city-of-st-louis/63-387e0227-992c-43f0-80a7-ceaa892a3561 instead. The underlying bug is different though. (This is a repost of my previous comment, this time without the tracking parameter.)
Michael Catanzaro
Comment 11 2024-05-28 14:46:47 PDT
Actually, the first link is better because without the tracking parameter, we get a paywall.
Michael Catanzaro
Comment 12 2024-05-28 14:52:44 PDT
Created bug #274802 for this sequel problem.
Note You need to log in before you can comment on or make changes to this bug.