Summary: | [GTK] excessive wakeups/polling due to gdk_frame_clock_begin_updating
---|---
Product: | WebKit
Component: | WebKitGTK
Reporter: | Tomáš Janoušek <webkit>
Assignee: | Nobody <webkit-unassigned>
Status: | RESOLVED FIXED
Severity: | Normal
Priority: | P2
Version: | Other
Hardware: | PC
OS: | Linux
CC: | berto, bugs-noreply, calestyo, cgarcia, clopez, mcatanzaro, mcrha, pnormand, svillar, zan
Description
Tomáš Janoušek
2020-04-15 11:46:57 PDT
Thanks for such a detailed bug report. We have noticed this, but it wasn't always reproducible for me. I assumed we were leaking a DisplayRefreshMonitor at some point, but it seems we aren't, so maybe the problem is the way we use the GdkFrameClock. The reason we don't call end_updating is that we assume the frame clock is destroyed when the offscreen window is destroyed.

I've tried to reproduce this again with MiniBrowser, evolution and liferea and I still can't reproduce it :-(

> I've tried to reproduce this again with MiniBrowser, evolution and liferea and I still can't reproduce it :-(

Weird. :-/ I can reproduce it with 2.28.1 just by launching liferea, or by launching surf about:blank and then stracing the WebKitWebProcess. With 2.26.4 surf about:blank is okay, but surf https://en.wikipedia.org/wiki/GIF#/media/File:Rotating_earth_(large).gif triggers it. I should also say that my system is amd64 Debian testing.

Just to be clear, I see the GdkFrameClock being used, but it's correctly destroyed as expected.

Perhaps you have a different version of gtk that fixed/worked around this? Or maybe it's a difference between X11 and Wayland backends? I'm using X11 and gtk 3.24.18.

Anyway, if you want the offscreen window destruction to clean things up, perhaps https://developer.gnome.org/gtk3/unstable/GtkWidget.html#gtk-widget-add-tick-callback is better than gdk_frame_clock_begin_updating? (BTW, if you want a more interactive communication channel, I'm Liskni_si on freenode and @Liskni_si on Twitter.)

(In reply to Tomáš Janoušek from comment #7)
> Perhaps you have a different version of gtk that fixed/worked around this?
> Or maybe it's a difference between X11 and Wayland backends? I'm using X11
> and gtk 3.24.18.

Maybe. I don't think it's X11 or Wayland specific, because other people reported the problem with evolution under Wayland.
(In reply to Tomáš Janoušek from comment #8)
> Anyway, if you want the offscreen window destruction to clean things up,
> perhaps
> https://developer.gnome.org/gtk3/unstable/GtkWidget.html#gtk-widget-add-tick-callback
> is better than gdk_frame_clock_begin_updating?

I've checked the GdkFrameClock is properly destroyed.

> (BTW, if you want a more interactive communication channel, I'm Liskni_si on
> freenode and @Liskni_si on Twitter.)

I'll ping you tomorrow on IRC then. Thanks!

Hey.

The following probably has nothing to do with this bug (which I'm also experiencing, at least in Evolution): could this issue somehow be caused by a kernel change (specifically something that happened in kernels >5.2)?

I see this rapidly repeating pattern:

[pid 243332] recvmsg(3, {msg_namelen=0}, 0) = -1 EAGAIN (Resource temporarily unavailable)
[pid 243332] poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}], 2, 17) = 0 (Timeout)

in quite a number of applications (diodon is one: https://bugs.launchpad.net/diodon/+bug/1871008 and I've also noticed it in cinnamon: https://github.com/linuxmint/cinnamon/issues/9085#issuecomment-615543487 ).

These in turn are all programs that might play a part in a bigger issue I have suffered from since kernels >5.2, namely a massive increase in CPU temperature and power consumption, for GPU-intensive but also non-GPU-intensive workloads. See https://bugzilla.kernel.org/show_bug.cgi?id=207245 (with links to other related bug reports) for more details.

Cheers,
Chris.

If I were you, I would try bisecting the kernel.

Christoph Anton Mitterer, yeah, if you're running several apps that each use WebKitGTK and suffer from this issue, that would indeed make your CPU considerably hotter. I'm lucky to be running just one and I still had to downgrade. :-)

Anyway, I added a reply here as well: https://gitlab.freedesktop.org/drm/intel/-/issues/953#note_470483 because I think the kernel issue may not be fully fixed either, and might be making it worse.
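The recvmsg/poll pattern quoted above can be quantified from a saved strace log. The following is only a rough sketch, assuming the trace was written to a file (for example with `strace -f -p <pid> -o trace.log`); the function names and the file name are made up for illustration and are not part of WebKit or strace. A 17 ms poll timeout works out to roughly 58 wakeups per second with integer division, which is in line with a 60 Hz frame clock.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for inspecting an strace log; names are illustrative only.

# Count poll() calls that returned because their timeout expired,
# i.e. lines ending in "= 0 (Timeout)".
count_poll_timeouts() {
    grep -c -E 'poll\(.*\) += 0 \(Timeout\)' "$1"
}

# Approximate wakeups per second implied by a poll timeout in milliseconds
# (integer division, so 17 ms gives 58 rather than 58.8).
wakeups_per_second() {
    echo $(( 1000 / $1 ))
}

# Example usage (trace.log is an assumed file name):
#   count_poll_timeouts trace.log
#   wakeups_per_second 17
```

Dividing the timeout count by the duration of the trace gives the observed wakeup rate, which can then be compared with the rate implied by the timeout value.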
@Michael: Well that's probably the next thing on the road.

@Tomáš: Thanks for your comments, though I kinda doubt that the WebKit issue is the only reason for my problems (see e.g. my extensive tests, which show that even without Evolution or any GUI activity, there's a huge difference between 5.2 and 5.5 (worse)... and within each between intel_pstate being enabled (worse) and not).

Having e.g. Evolution run with some WebKit process in a state of high CPU utilisation (which is likely this bug) just makes it worse.

Could anyone else check whether they see *this* issue here with much older kernels (i.e. 5.2 and below)?

Also, any fix for this on the horizon? :-)

(In reply to Christoph Anton Mitterer from comment #14)
> @Michael: Well that's probably the next thing on the road.
>
> @Tomáš: Thanks for your comments, though I kinda doubt that the WebKit issue
> is the only reason for my problems (see e.g. my extensive tests, which show
> that even without Evolution or any GUI activity, there's a huge difference
> between 5.2 and 5.5 (worse)... and within each between intel_pstate being
> enabled (worse) and not).
>
> Having e.g. Evolution run with some WebKit process in a state of
> high CPU utilisation (which is likely this bug) just makes it worse.
>
> Could anyone else check whether they see *this* issue here with much older
> kernels (i.e. 5.2 and below)?
>
> Also, any fix for this on the horizon? :-)

I need to reproduce it to fix it. Or some help from someone who can reproduce it to debug the issue and understand the problem. Another solution might be to forget about GdkFrameClock and use a simple timer at 60fps.

> I need to reproduce it to fix it. Or some help from someone who can reproduce it to debug the issue and understand the problem. Another solution might be to forget about GdkFrameClock and use a simple timer at 60fps.
I'd love to help but I don't know what else I can do here. Any ideas?
Last week you said you'd ping me on IRC on Friday... :-/
Created attachment 397337 [details]
Patch
Finally found the problem with help from Tomáš. Thanks!
Committed r260567: <https://trac.webkit.org/changeset/260567>

Comment on attachment 397337 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=397337&action=review

Awesome!

> Source/WebCore/platform/graphics/gtk/DisplayRefreshMonitorGtk.cpp:50
> + gtk_widget_destroy(m_window);

Do we really need the destroy? Isn't it enough to call _end_updating()?

(In reply to Sergio Villar Senin from comment #19)
> Comment on attachment 397337 [details]
> Patch
>
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=397337&action=review
>
> Awesome!
>
> > Source/WebCore/platform/graphics/gtk/DisplayRefreshMonitorGtk.cpp:50
> > + gtk_widget_destroy(m_window);
>
> Do we really need the destroy? Isn't it enough to call _end_updating()?

Beh, I've just realized that we're already calling it, never mind.

Is it a good idea to:

ASSERT(frameClock);

in both places? The gtk_widget_get_frame_clock() can return NULL, according to its documentation.

You can also use g_signal_handlers_disconnect_by_data(), which was added in glib 2.32. I do not see which version of glib WebKitGTK depends on (not from FindGLIB.cmake at least).

(In reply to Milan Crha from comment #21)
> You can also use g_signal_handlers_disconnect_by_data(), which was added in
> glib 2.32. I do not see which version of glib WebKitGTK depends on (not from
> FindGLIB.cmake at least).

It's currently set to 2.44 in Source/cmake/OptionsGTK.cmake:

find_package(GLIB 2.44.0 REQUIRED COMPONENTS gio gio-unix gobject gthread gmodule)

And it would be OK to raise that minimum version to 2.56 (it's what Ubuntu 18.04 ships).

(In reply to Carlos Alberto Lopez Perez from comment #22)
> ...And it would be OK to raise that minimum version...

No need to raise it, from my point of view. As long as you use API available in the old version, it's all fine.

(In reply to Milan Crha from comment #21)
> Is it a good idea to:
>
> ASSERT(frameClock);
>
> in both places? The gtk_widget_get_frame_clock() can return NULL, according
> to its documentation.

Yes, but in this case we are creating a toplevel that we manually realize, so the frame clock is already created for sure.

> You can also use g_signal_handlers_disconnect_by_data(), which was added in
> glib 2.32. I do not see which version of glib WebKitGTK depends on (not from
> FindGLIB.cmake at least).

I guess it doesn't matter unless it's more efficient.

(In reply to Carlos Garcia Campos from comment #24)
> I guess it doesn't matter unless it's more efficient.

I doubt it's more efficient. I suggested it rather for readability/convenience. Nothing important.

Hey guys.

I've got the fix now with a recent Debian package, and the extreme CPU utilisation is in fact fixed.

However, I still see a little sporadic CPU utilisation from Evolution's WebKit processes every now and then. It's really nothing dramatic: most of the time they're at 0%, but they do go up to, say, 2-3.3% every once in a while, *even though nothing happens in Evolution* (I do not select any other mail, so no content would have to be redrawn, nor are there any animated GIFs or the like). It even happens when Evolution is minimised.

Is this kind of expected, or something that one should follow up on?

Thanks,
Chris.

Evolution can indirectly trigger a repaint of the message preview when the theme style changes. It can also happen when the window gains or loses focus. I currently do not know what else could cause that, and none of these would happen periodically. Maybe install debuginfo for (or build with it) evolution itself (not WebKit, it's too huge) and, when it happens, supposing there is enough time to do it, catch a series of backtraces of the offending process, which may or may not shed a bit of light on this.
I catch backtraces with this command:

    for i in {1..10}; do gdb --batch --ex "t a a bt" -pid=`pidof evolution` &>bt$i.txt; sleep 0.1; done

You can tweak the delay between them with the sleep command (set to 100ms in the command). The backtraces may not expose any private information, but, please, check the files for any private information, like passwords, email address, server addresses,... I usually search for "pass" at least (quotes for clarity only). Just in case.

I do not know whether we should keep this here, as the bug itself is fixed. Feel free to open a new bug on the evolution side and we can investigate there [1].

[1] https://gitlab.gnome.org/GNOME/evolution/issues/new
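The privacy check suggested above (searching the captured files for "pass" and similar strings) could be scripted along these lines. This is just a sketch: the function name is hypothetical, and it assumes the backtraces were saved as bt1.txt, bt2.txt, ... in a single directory, as the command in this comment does. Which patterns to search for beyond "pass" is left to the reader.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Evolution or gdb.
# Usage: find_sensitive_backtraces DIR PATTERN...
# Prints the bt*.txt files in DIR that contain any of the given patterns
# (case-insensitive), so they can be reviewed before being attached to a bug.
find_sensitive_backtraces() {
    dir=$1
    shift
    for pattern in "$@"; do
        # -l prints only the names of matching files; -i ignores case.
        grep -l -i -e "$pattern" "$dir"/bt*.txt 2>/dev/null
    done | sort -u
}

# Example usage (run in the directory holding the backtrace files):
#   find_sensitive_backtraces . pass
```

An empty result only means none of the given patterns matched, so it is still worth skimming the files by hand before posting them.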