Visiting http://www.theonion.com/ sometimes, but not always, hangs my entire desktop. My lowball guess is that this happens roughly 20% of the time.
The first warning sign is that the page is taking unusually long to load (3-4 seconds). If I react very quickly and close the tab or my browser, then the day is saved. Otherwise, I can at first move the mouse pointer very sluggishly, but not click on anything, and then I completely lose control. I have no choice but to hold the power button to shut down my computer.
I've had no success attempting to debug this, since I don't know how to trigger the bug without borking my computer.
webkitgtk3-2.2.3-1.fc20
epiphany-3.10.2-1.fc20
So here's the thing. You are using WebKitGTK+ 2.2 which is rather old. I can reproduce this with WKGTK 2.4, but since this had not happened before to either of us, I think something buggy got into Fedora (so far all people affected report using Fedora 20) somewhere in the stack.
I'm currently using the GNOME 3.12 copr on Fedora 20, but I've reproduced this on Arch in the past, so it's nothing Fedora-specific. I think it's been happening since the GNOME 3.8 era (so WebKitGTK+ 2.0).
I cannot reproduce this anymore after a Fedora upgrade. Both LinkedIn and The Onion work fine for me. There were mesa packages upgraded, can't pinpoint anything relevant in the changelogs though.
Here are three backtraces taken on projects.archlinux.org (where I experience the same symptoms). They all look quite similar in thread one. Stepping through the third case in gdb, I was able to successfully return to WebKit's main loop, which suggests that this may be a red herring.
I can't ever seem to reproduce it by refreshing the same page over and over, so it might only occur for new web processes, or the first time the page is loaded in the current process.
Also, it's a memory allocation loop (at least sometimes): WebKit was eating up ~4GiB when I stopped it this time, but it would have kept rising up to the size of my physical memory before the system hung.
I can sort of reproduce the Onion one with MiniBrowser built from master, on Fedora 21, using Mesa 10.4.x. The "sort of" is because I see a significant delay when the page loads, and if I interrupt loading the backtrace looks quite similar to what is in comment 9. BUT if I wait long enough, the page loads. And my desktop does not hang. Is it definitely a hang, or if you walk away and make a sandwich, do you find things back to normal upon your return?
As for the backtraces in comment 10 and comment 11: Are those from the Onion? Or are they from pages with select elements? And if the latter, is the problem seen primarily when there are a bunch of options in a given select element? And if so to that, if you do the make-a-sandwich test, does the problem eventually go away or is it a proper hang?
(In reply to comment #13)
> I can sort of reproduce the Onion one with MiniBrowser built from master, on
> Fedora 21, using Mesa 10.4.x. The "sort of" is because I see a significant
> delay when the page loads, and if I interrupt loading the backtrace looks
> quite similar to what is in comment 9. BUT if I wait long enough, the page
> loads.
Often there is a significant delay, but the browser recovers and completes the load. When the load completes, I can look in System Monitor and see that I have one web process using 3.4 GiB of memory, the rest with 40 MiB apiece. I think that when the browser does not recover within a reasonable amount of time, it has simply allocated much more than 3 GiB and begun swapping.
> And my desktop does not hang. Is it definitely a hang, or if you walk
> away and make a sandwich, do you find things back to normal upon your return?
It does recover after a "sandwich," but it's very slow when I return (e.g. unlocking the computer takes ~5s, launching any application takes ~15s) and the desktop will regularly hang for ~5s intervals. I think this is just swapping.
(In reply to comment #13)
> As for the backtraces in comment 10 and comment 11: Are those from the
> Onion?
No:
(In reply to comment #8)
> Here are three backtraces taken on projects.archlinux.org (where I
> experience the same symptoms).
If you think that is a different bug, I can reopen bug #126123, but I hope they're the same. The Onion is worse for testing because it reloads itself (probably to trick its advertisers into thinking it gets more page views than it really does) and it's not fun when I forget and a background tab triggers the bug.
> Or are they from pages with select elements? And if the latter, is
> the problem seen primarily when there are a bunch of options in a given
> select element?
Yes, that must be it, good call! projects.archlinux.org has a huge combo box in the upper right that I never noticed until you suggested I look for select elements. bugzilla.redhat.com has one as well. These are well known to perform horrendously (I use Firefox when I need to change a Component on bugzilla.redhat.com).
I don't see one on the Onion, though.
So I see that you've changed the summary to reflect the combo box issue. Before the summary stated that the problem is on the Onion's site which lacks combo boxes. Were it me, I'd probably split this into two bugs because, well, it's two bugs. ;) See your backtraces.
This bug has become a bit confused, due to the various web sites studied in the past:
theonion.com, which no longer seems to exhibit this issue
www.archlinux.org/packages, ditto
bugzilla.redhat.com, still a big problem, but not a reliable reproducer
A user recently found a site that reproduces this reliably: [1]
Let's repurpose this bug to cover just the site where we can reproduce this reliably. If the bug still exists on other sites after we have fixed it on this one, we can open separate bugs.
***Warning*** clicking the link below will hang your computer unless you are prepared to immediately kill your web process.
[1] http://www.reuters.com/article/2015/09/02/us-iran-nuclear-congress-idUSKCN0R21L620150902
At least on the Reuters page the DRM allocator nodes appear to be leaked, resulting in an OOM.
From attached webprocess.smaps,
cat webprocess.smaps | grep "drm mm object"
yields ~2700 such entries, each 4MB in size, or >10GB in total (at ~11GB consumption).
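To turn the grep above into a total, here is a minimal sketch that sums the `Size:` field of every smaps mapping whose header line contains a given name. It assumes the usual `/proc/<pid>/smaps` layout (a header line "start-end perms offset dev inode [pathname]" followed by "Key: value" lines); `sumMappingsNamed` and `MappingTotals` are hypothetical names, not part of any tool used in this bug.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Totals for all mappings whose header line mentions a given name.
struct MappingTotals {
    std::size_t count = 0;
    std::uint64_t sizeKiB = 0;
};

// Scans smaps-style text: a header line starts each mapping, and the
// following "Key: value" lines describe it until the next header.
MappingTotals sumMappingsNamed(std::istream& smaps, const std::string& name)
{
    assert(!name.empty());
    MappingTotals totals;
    bool inMatchingMapping = false;
    std::string line;
    while (std::getline(smaps, line)) {
        std::istringstream tokens(line);
        std::string first;
        tokens >> first;
        if (first.empty())
            continue;
        if (first.back() != ':') {
            // Header line (first token is an address range): start a new mapping.
            inMatchingMapping = line.find(name) != std::string::npos;
            if (inMatchingMapping)
                ++totals.count;
        } else if (inMatchingMapping && first == "Size:") {
            // "Size:   4096 kB" -> add the numeric value in kB.
            std::uint64_t kib = 0;
            tokens >> kib;
            totals.sizeKiB += kib;
        }
    }
    return totals;
}
```

Pass an `std::ifstream` opened on webprocess.smaps together with "drm mm object". If each of the ~2700 entries is exactly 4096 kB, the total is 2700 × 4096 kB = 11,059,200 kB, or about 10.5 GiB, consistent with the figure above.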
No idea about the why.
A Radeon user reports it crashes the driver:
[drm:radeon_cs_ioctl [radeon]] *ERROR* Failed to parse relocation -12
But presumably that's a symptom of the leak.
I've been experiencing this problem with LinkedIn recently when accepting invitations to connect (when clicking the button). FWIW, this adds some precision to the earlier mention of LinkedIn by someone else.
Another similar issue:
"""
Then, from the top bar I place my pointer over the "Connections" menu so it is expanded, once expanded, I click on the "See all" link:
https://www.linkedin.com/people/pymk/hub?ref=global-nav&trk=nav_utilities_invites_header
Upon doing so, the browser freezes and I notice my computer's hard disk starting to work hard. Checking with "top", the WebKitWebProcess is growing its memory usage by gigabytes!!! and it doesn't seem to stop.
This may not be a problem in WKGTK+ but in the graphics driver: after repeating the process with GDB attached to the WebKitWebProcess, GDB reports that the point at which it exits when I manually kill the process is in:
/usr/lib/x86_64-linux-gnu/dri/i965_dri.so
"""
You can reproduce this without killing your desktop by using Linux cgroups to limit the amount of memory that WebKit can use. This makes the problem easier to reproduce and test:
$ sudo cgcreate -g memory:/testwebkitmem
$ echo $(( 1 * 1024 * 1024 * 1024 ))| sudo tee /sys/fs/cgroup/memory/testwebkitmem/memory.limit_in_bytes
$ echo $(( 2 * 1024 * 1024 * 1024 ))| sudo tee /sys/fs/cgroup/memory/testwebkitmem/memory.memsw.limit_in_bytes
$ sudo cgexec -g memory:/testwebkitmem sudo -H -u $USER Tools/jhbuild/jhbuild-wrapper --gtk run ./WebKitBuild/Release/bin/MiniBrowser http://www.reuters.com/article/us-new-york-flightcenter-idUSKCN0SC14B20151018
So, I quickly get WebKitWebProcess killed by the kernel's OOM-killer:
Task in /testwebkitmem killed as a result of limit of /testwebkitmem
memory: usage 1048576kB, limit 1048576kB, failcnt 72526
memory+swap: usage 1967216kB, limit 2097152kB, failcnt 0
kmem: usage 0kB, limit 18014398509481983kB, failcnt 0
Memory cgroup stats for /testwebkitmem: cache:3480KB rss:1045096KB rss_huge:0KB mapped_file:3476KB writeback:285804KB swap:918640KB inactive_anon:524344KB active_anon:524192KB inactive_file:0KB active_file:0KB unevictable:0KB
[ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[17018] 0 17018 13843 806 31 120 0 sudo
[17019] 0 17019 13843 0 29 120 0 sudo
[17020] 1000 17020 515762 18778 226 6572 0 MiniBrowser
[17040] 1000 17040 775055 15724 312 6776 0 WebKitNetworkPr
[17042] 1000 17042 1085848 209041 1182 288385 0 WebKitWebProces
Memory cgroup out of memory: Kill process 17042 (WebKitWebProces) score 951 or sacrifice child
Killed process 17042 (WebKitWebProces) total-vm:4343392kB, anon-rss:752824kB, file-rss:83340kB
But this doesn't kill my system or cause any instability, so it's a nice trick to debug memory problems.
My 2 cents.
Thanks for the neat cgroups suggestion and for taking the backup, Carlos!
My less refined way to test this safely is to type 'killall WebKitWebProcess' in a terminal, keep System Monitor open, and kill the process before it gets out of hand, but that's pretty risky...
(In reply to comment #28)
> Can you also reproduce the problem with this one?
>
> http://www.reuters.com/article/2015/10/18/us-new-york-flightcenter-idUSKCN0SC14B20151018
Yes, on the first try.
(In reply to comment #32)
> FWIW we've now documented this problem on Intel, Nvidia, and AMD graphics.
This is also happening on my laptop with the NVIDIA proprietary drivers. I don't see any artifacts on the screen; I just end up in an OOM situation.
Created attachment 270564
BT from gdb
I'm using WebKitGtk+ with my own JHBuild setting:
https://github.com/tanty/jhbuild-epiphany/tree/master
Epiphany 3.18.0 and WebKit 2.10.7
I'm running Epiphany with the dconf key:
"process-model" = "shared-secondary-process"
Attached is a backtrace taken while memory usage was increasing exponentially.
I can confirm this is not reproducible (with any of the listed links in this report) with WebKitGtk+ from my own JHBuild setting:
https://github.com/tanty/jhbuild-epiphany/tree/master
WebKit 2.10.7, using MiniBrowser, and passing the CMake arg: -DENABLE_OPENGL=OFF
I can confirm this IS REPRODUCIBLE with WebKitGtk+ from my own JHBuild setting:
https://github.com/tanty/jhbuild-epiphany/tree/master
WebKit 2.10.7, using MiniBrowser, and passing the CMake arg: -DUSE_REDIRECTED_XCOMPOSITE_WINDOW=OFF
(In reply to comment #37)
> I can confirm this IS REPRODUCIBLE with WebKitGtk+ from my own JHBuild
> setting:
> https://github.com/tanty/jhbuild-epiphany/tree/master
>
> WebKit 2.10.7, using MiniBrowser, and passing the CMake arg:
> -DUSE_REDIRECTED_XCOMPOSITE_WINDOW=OFF
Thanks!, so this time I can't blame the redirected window :-P
OK, I've been debugging this and I know what the problem is. In the test case saved by Carlos Lopez, we have a layer of size 12000079x525, which is huge. The tiled backing store (TextureMapperTiledBackingStore) creates tiles for the whole layer bounds, which means ~6000 tiles of 2000x525. If I'm understanding the code correctly, the tiled backing store does not take the visible area into account at all. This is a problem for huge layers, but even for smaller layers we might be wasting memory and CPU computing tiles that are offscreen. So I think the tiled backing store should build a coverage area based on the visible size and create tiles only for that area. I've noticed that texmap/coordinated/TiledBackingStore does that, so I wonder if we could somehow reuse its code, or have a common tile manager that both backing stores could use. I'm not a graphics expert, so I hope someone with more experience can help here now that we know what the problem is; otherwise I'll try to do it myself, and any help would be really appreciated.
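The numbers above can be sanity-checked with a little arithmetic. This sketch assumes a maximum tile size of 2000x2000 (clipped to the 525 px layer height, which gives the 2000x525 tiles mentioned) and 4 bytes per pixel; neither value is taken from the WebKit source, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

struct IntSize {
    std::int64_t width;
    std::int64_t height;
};

// Integer division rounding up, for non-negative values and a positive divisor.
std::int64_t ceilDiv(std::int64_t value, std::int64_t divisor)
{
    assert(value >= 0 && divisor > 0);
    return (value + divisor - 1) / divisor;
}

// Tiles needed when the whole layer is tiled, ignoring what is visible.
std::int64_t tileCountForWholeLayer(IntSize layer, IntSize maxTile)
{
    return ceilDiv(layer.width, maxTile.width) * ceilDiv(layer.height, maxTile.height);
}

// Backing memory for the whole layer, assuming 4 bytes per pixel.
// Edge tiles are clipped to the layer, so this is simply the layer area.
std::int64_t backingBytesForWholeLayer(IntSize layer)
{
    return layer.width * layer.height * 4;
}
```

Under these assumptions the 12000079x525 layer needs 6001 tiles. Each full tile costs 2000 × 525 × 4 = 4,200,000 bytes (about 4 MiB, in line with the ~4MB "drm mm object" entries reported earlier). The whole layer would need roughly 25.2 GB if every tile were allocated.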
Created attachment 270986
WIP
This is just an experiment, I don't think it's the right solution, but could work as a workaround. The idea is basically to provide the visible rectangle to the texture mapper, and never create tiles that don't intersect with the visible rect.
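Here is a minimal sketch of that idea in plain C++ rather than WebKit's own types. `IntRect` and `tilesForVisibleRect` are hypothetical stand-ins, and the tile grid is assumed to be anchored at the layer origin. Instead of walking every tile of the layer, the function computes the column and row range that overlaps the visible rect and creates only those tiles, clipped to the layer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct IntRect {
    std::int64_t x = 0;
    std::int64_t y = 0;
    std::int64_t width = 0;
    std::int64_t height = 0;
    std::int64_t maxX() const { return x + width; }
    std::int64_t maxY() const { return y + height; }
    bool isEmpty() const { return width <= 0 || height <= 0; }
};

// Overlap of two rects, or an empty rect if they don't intersect.
IntRect intersection(const IntRect& a, const IntRect& b)
{
    std::int64_t left = std::max(a.x, b.x);
    std::int64_t top = std::max(a.y, b.y);
    std::int64_t right = std::min(a.maxX(), b.maxX());
    std::int64_t bottom = std::min(a.maxY(), b.maxY());
    if (right <= left || bottom <= top)
        return {};
    return { left, top, right - left, bottom - top };
}

// Returns only the tiles (clipped to the layer) that overlap visibleRect.
std::vector<IntRect> tilesForVisibleRect(const IntRect& layer, std::int64_t tileWidth,
    std::int64_t tileHeight, const IntRect& visibleRect)
{
    assert(tileWidth > 0 && tileHeight > 0);
    std::vector<IntRect> tiles;
    IntRect area = intersection(layer, visibleRect);
    if (area.isEmpty())
        return tiles;

    // Index range of grid cells covering 'area'; offscreen cells are never visited.
    std::int64_t firstColumn = (area.x - layer.x) / tileWidth;
    std::int64_t lastColumn = (area.maxX() - layer.x - 1) / tileWidth;
    std::int64_t firstRow = (area.y - layer.y) / tileHeight;
    std::int64_t lastRow = (area.maxY() - layer.y - 1) / tileHeight;

    for (std::int64_t row = firstRow; row <= lastRow; ++row) {
        for (std::int64_t column = firstColumn; column <= lastColumn; ++column) {
            IntRect tile { layer.x + column * tileWidth, layer.y + row * tileHeight, tileWidth, tileHeight };
            tiles.push_back(intersection(tile, layer));
        }
    }
    return tiles;
}
```

For the 12000079x525 layer with 2000x2000 tiles, a 1024x525 visible rect at the origin yields a single 2000x525 tile instead of 6001.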
Not directly related with this bug, but I guess is interesting info anyway:
I'm no longer able to reproduce this problem if I force AC mode off. RAM usage no longer grows abnormally. I have uploaded a patch to bug 154147 that allows doing this just by setting an environment variable before running the browser.
As cgarcia mentioned, this whole problem happened because of the outdated TextureMapperBackingStore. I modified it to use the basic concepts of a visible rect and a coverage area when creating tiles in the TextureMapperBackingStore, instead of replacing it with a TiledBackingStore, because the coordinated graphics code and GTK's texture mapper handle scrolling and bitmap buffers somewhat differently.
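One way to read "coverage area" is the visible rect extended by a margin, so that tiles just outside the viewport already exist when the user scrolls. The sketch below assumes a size multiplier (the 2.0 used in the example is an assumption, not a value from the patch); `coverageRect` is a hypothetical helper whose result would be fed to tile creation in place of the raw visible rect.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

struct IntRect {
    std::int64_t x = 0;
    std::int64_t y = 0;
    std::int64_t width = 0;
    std::int64_t height = 0;
};

// Grows visibleRect by (multiplier - 1) / 2 of its size on each side, then
// clips the result to the layer bounds so no tiles are planned offscreen.
IntRect coverageRect(const IntRect& visibleRect, const IntRect& layer, double multiplier)
{
    assert(multiplier >= 1.0);
    auto dx = static_cast<std::int64_t>(visibleRect.width * (multiplier - 1) / 2);
    auto dy = static_cast<std::int64_t>(visibleRect.height * (multiplier - 1) / 2);

    std::int64_t left = std::max(visibleRect.x - dx, layer.x);
    std::int64_t top = std::max(visibleRect.y - dy, layer.y);
    std::int64_t right = std::min(visibleRect.x + visibleRect.width + dx, layer.x + layer.width);
    std::int64_t bottom = std::min(visibleRect.y + visibleRect.height + dy, layer.y + layer.height);
    if (right <= left || bottom <= top)
        return {};
    return { left, top, right - left, bottom - top };
}
```

With a 1000x525 visible rect at x = 1000 on the 12000079x525 layer and a multiplier of 2.0, the coverage rect becomes 2000x525 starting at x = 500: a few tiles at most, regardless of how wide the layer is.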
Comment on attachment 271659
Patch
> Source/WebCore/platform/graphics/texmap/GraphicsLayerTextureMapper.cpp:652
> +void GraphicsLayerTextureMapper::markVisibleRectAsDirty()
> +{
> + m_isVisibleRectDirty = true;
> +
> + if (maskLayer())
> + downcast<GraphicsLayerTextureMapper>(*maskLayer()).markVisibleRectAsDirty();
> + if (replicaLayer())
> + downcast<GraphicsLayerTextureMapper>(*replicaLayer()).markVisibleRectAsDirty();
> + for (auto* child : children())
> + downcast<GraphicsLayerTextureMapper>(*child).markVisibleRectAsDirty();
> +}
This is wasteful. It would be more efficient to set up a state object during the layer flushing, copy it for every layer and update the 'is-visible-rect-dirty' parameter by or-ing the m_isVisibleRectDirty value into it, and then pass it to every child.
> Source/WebCore/platform/graphics/texmap/GraphicsLayerTextureMapper.cpp:663
> +bool GraphicsLayerTextureMapper::selfOrAncestorHasActiveTransformAnimation() const
> +{
> + if (m_animations.hasActiveAnimationsOfType(AnimatedPropertyTransform))
> + return true;
> +
> + if (!parent())
> + return false;
> +
> + return downcast<GraphicsLayerTextureMapper>(*parent()).selfOrAncestorHasActiveTransformAnimation();
> +}
This too could be included in that state object, removing the need to go back up the layer tree to find a parent with any active animation.
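The reviewer's suggestion could look roughly like this. A small state object is copied for every layer during one top-down flush, and each layer ORs its own flags into the copy before passing it to its children. `FlushState`, `Layer` and `flushLayer` are hypothetical names, not the WebKit classes, and mask and replica layers are left out for brevity.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical per-flush state, copied for each layer and handed to its children.
struct FlushState {
    bool visibleRectDirty = false;
    bool selfOrAncestorHasActiveTransformAnimation = false;
};

// Hypothetical stand-in for a layer (mask and replica layers omitted).
struct Layer {
    bool isVisibleRectDirty = false;
    bool hasActiveTransformAnimation = false;
    // Results computed during the flush.
    bool needsVisibleRectUpdate = false;
    bool selfOrAncestorHasActiveTransformAnimation = false;
    std::vector<std::unique_ptr<Layer>> children;
};

// Single top-down walk: no layer has to walk down to mark descendants dirty
// or walk up to look for an animated ancestor.
void flushLayer(Layer& layer, FlushState state)
{
    state.visibleRectDirty = state.visibleRectDirty || layer.isVisibleRectDirty;
    state.selfOrAncestorHasActiveTransformAnimation =
        state.selfOrAncestorHasActiveTransformAnimation || layer.hasActiveTransformAnimation;

    layer.needsVisibleRectUpdate = state.visibleRectDirty;
    layer.selfOrAncestorHasActiveTransformAnimation = state.selfOrAncestorHasActiveTransformAnimation;
    layer.isVisibleRectDirty = false;

    for (auto& child : layer.children) {
        assert(child);
        flushLayer(*child, state); // 'state' is passed by value, so siblings are unaffected.
    }
}
```

Compared with `markVisibleRectAsDirty()` and `selfOrAncestorHasActiveTransformAnimation()` in the patch, this visits each layer once per flush instead of re-walking subtrees or ancestor chains.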
> Source/WebCore/platform/graphics/texmap/GraphicsLayerTextureMapper.cpp:682
> + m_layerTransform.combineTransforms(parent() ? downcast<GraphicsLayerTextureMapper>(*parent()).m_layerTransform.combinedForChildren() : TransformationMatrix());
combineTransforms() could only be called if parent() exists, no?
(In reply to comment #49)
> It would be really nice to get an EWS for WinCairo.
The WinCairo port and its buildbot aren't maintained well;
nobody fixes build breaks or bot issues from day to day. I don't
think anybody would be interested in maintaining an EWS too.
Today's complaint: "so a friend sent me a link to a tweet that froze my computer for 5 minutes :P"
I've seen enough circumstantial evidence of this that I'm now fairly confident that the set of sites broken by this commit are the same as the set of sites that suffer from this issue.
Carlos says the issue is fixed in the threaded compositor; we're keeping it open until we remove the ability to build without the threaded compositor, but I guess it's not a priority anymore (Yoon).
(In reply to comment #61)
> we're keeping it open until we remove the ability to build without threaded compositor
So, can we do this? It seems to be working fine, right?
Or: is this bug still an issue when WEBKIT_DISABLE_COMPOSITING_MODE is used?
(In reply to comment #62)
> (In reply to comment #61)
> > we're keeping it open until we remove the ability to build without threaded compositor
>
> So, can we do this? It seems to be working fine, right?
No, we don't have enough feedback yet; it might be broken on some particular GPUs, for example.
> Or: is this bug still an issue when WEBKIT_DISABLE_COMPOSITING_MODE is used?
No, it only happens with AC on.
One of the real-world examples of this bug is TypeScript Playground (https://www.typescriptlang.org/play/), which hosts Monaco editor.
Crashing with AC on.
--------------------------------------------------------
System: Win 10.
Webkit r209238.
Cairo v1.14.4
Graphics: Intel® HD Graphics 4400; nVidia GeForce 750 M
--------------------------------------------------------
Patch (https://bugs.webkit.org/attachment.cgi?id=271659) resolves the issue only partially: some tiles remain uninitialized (i.e. those that don't intersect the visible rect).
Just try to scroll a sample on TypeScript Playground's page.
GitHub is affected as well: single-file pages render huge tables, which require tiles to be preinitialized.
For example, try to open (https://github.com/WebKit/webkit/blob/master/Source/WebCore/platform/graphics/texmap/coordinated/TiledBackingStore.cpp).
Karlen, did you build WebKit with -DENABLE_THREADED_COMPOSITOR=OFF?
This issue has kinda dropped off the roadmap because that is a non-default configuration.
Hi Michael,
Yes without threaded compositor feature enabled, so texmap/TextureMapperTiledBackingStore is being used instead of texmap/coordinated/TiledBackingStore.
Nope, I must have misread the patch. Looks like it's still around if ENABLE_OPENGL=OFF.
I think we should remove support for ENABLE_OPENGL=OFF if nobody fixes this bug.
2015-01-06 12:02 PST, Michael Catanzaro
2015-01-06 12:03 PST, Michael Catanzaro
2015-01-06 12:03 PST, Michael Catanzaro
2015-09-13 10:15 PDT, Zan Dobersek
2016-02-03 00:56 PST, Andres Gomez Garcia
2016-02-10 06:40 PST, Carlos Garcia Campos
2016-02-18 06:38 PST, Gwang Yoon Hwang