Bug 164052

Summary: [GTK] Since the memory pressure relief has been activated, my disk has a high usage and the desktop stalls
Product: WebKit Reporter: Andres Gomez Garcia <agomez>
Component: WebKitGTK Assignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: bugs-noreply, cgarcia, clopez, mcatanzaro, rishi.is, tpopela, zan
Priority: P2    
Version: WebKit Nightly Build   
Hardware: PC   
OS: Linux   
See Also: https://bugzilla.gnome.org/show_bug.cgi?id=773605
Attachments:
Description                                                                               Flags
Patch                                                                                     mcatanzaro: review+, buildbot: commit-queue-
Archive of layout-test-results from ews115 for mac-yosemite                               none
Several memory measures with different configurations and patches on top of 2.15.2        none

Description Andres Gomez Garcia 2016-10-27 01:53:43 PDT
The memory pressure relief was activated recently in WKGTK+

Using Ephy, every time it takes action, I can see something like this:

Memory pressure relief: Empty the PageCache: =dirty (at 8351166464 bytes)
Memory pressure relief: Prune MemoryCache live resources: =dirty (at 8351166464 bytes)
Memory pressure relief: Drain CSSValuePool: =dirty (at 8351166464 bytes)
Memory pressure relief: Discard StyleResolvers: =dirty (at 8351166464 bytes)
Memory pressure relief: Discard all JIT-compiled code: =dirty (at 8351166464 bytes)
Memory pressure relief: Dropping buffered data from paused media elements: =dirty (at 8351166464 bytes)
Memory pressure relief: Purge inactive FontData: =dirty (at 8351166464 bytes)
Memory pressure relief: Clear WidthCaches: =dirty (at 8351166464 bytes)
Memory pressure relief: Discard Selector Query Cache: =dirty (at 8351166464 bytes)
Memory pressure relief: Prune MemoryCache dead resources: =dirty (at 8351166464 bytes)
Memory pressure relief: Prune presentation attribute cache: =dirty (at 8351166464 bytes)
Memory pressure relief: Run malloc_trim: =dirty (at 8351166464 bytes)
Memory pressure relief: Release free FastMalloc memory: =dirty (at 8351166464 bytes)


The mechanism is definitely good, but the problem comes when it doesn't succeed in freeing much memory. What happens is that it kicks in again, and again, and again, not achieving much but making my hard drive go crazy and, therefore, almost stalling my whole desktop.

Sometimes, at some point it stops and I can use the desktop again, but any action that I take in Ephy triggers the memory pressure relief again and I have to wait for some more minutes in the hope that I will recover control over my desktop.
Comment 1 Andres Gomez Garcia 2016-10-27 01:55:37 PDT
Ideally, if the pressure relief is not achieving much, it should stop trying.

I think that it would be good, then, to have some public signal notifying that WK is running out of memory and the pressure relief is not helping.

That way, apps that are using WK will be able to warn the user about it.

For example, Ephy could display a warning along these lines:

"Ephy is running out of memory. Please, close some tabs or other programs"
Comment 2 Debarshi Ray 2016-10-27 15:37:49 PDT
(In reply to comment #1)
> I think that it would be good, then, to have some public signal notifying
> that WK is getting out of memory and the pressure relief is not helping.
> 
> That way, apps that are using WK will be able to warn the user about it.
> 
> For example, with Ephy, it could be displayed a warning in the fashion:
> 
> "Ephy is running out of memory. Please, close some tabs or other programs"

The same workload / number of tabs that used to work quite well without the memory pressure relief is now swapping. So I doubt WK is really running out of memory.

For what it is worth, I have 8GB of physical RAM.
Comment 3 Carlos Garcia Campos 2016-10-27 22:47:04 PDT
(In reply to comment #2)
> (In reply to comment #1)
> > I think that it would be good, then, to have some public signal notifying
> > that WK is getting out of memory and the pressure relief is not helping.
> > 
> > That way, apps that are using WK will be able to warn the user about it.
> > 
> > For example, with Ephy, it could be displayed a warning in the fashion:
> > 
> > "Ephy is running out of memory. Please, close some tabs or other programs"
> 
> The same workload / number of tabs that used to work quite well without the
> memory pressure relief is now swapping. So I doubt WK is really running out
> of memory.
> 
> For what it is worth, I have 8GB of physical RAM.

That's a different issue though. This bug is about running the memory pressure handler several times when the memory situation is critical, assuming the measurement is correct. Another issue is that we might be calculating the available memory wrongly, or that we are triggering the handler too early (we currently use 90%). Note also that the memory pressure handler doesn't monitor WebKit's memory, but the system memory, so it's possible that WebKit is not the one causing the high memory usage, but we try to help in any case by freeing caches, etc. In those cases it's common that we end up freeing very little memory or nothing; we could check that and increase the interval before installing the handler again.
If you think we are measuring wrong, or that 90% is too early, file a new bug report, please.
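
To make the 90% check concrete, here is a minimal sketch of how a system-wide usage percentage could be derived from /proc/meminfo-style text. It is not the WebKit implementation: the helper names (usedMemoryPercentage, isUnderMemoryPressure) are hypothetical, and the formula (MemTotal - MemAvailable) / MemTotal is an assumption about what "memory used" means. MemAvailable is only reported by reasonably recent kernels, so the helper returns -1 when it is missing.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper, not WebKit code. Parses /proc/meminfo-style text and
// returns the percentage of system memory in use, assuming
// used = MemTotal - MemAvailable. Returns -1 if a field is missing or the
// values are inconsistent.
double usedMemoryPercentage(std::istream& meminfo)
{
    uint64_t totalKB = 0;
    uint64_t availableKB = 0;
    bool hasTotal = false;
    bool hasAvailable = false;

    std::string line;
    while (std::getline(meminfo, line)) {
        // Each line looks like "MemTotal:        8155436 kB".
        std::istringstream fields(line);
        std::string key;
        uint64_t valueKB = 0;
        if (!(fields >> key >> valueKB))
            continue;
        if (key == "MemTotal:") {
            totalKB = valueKB;
            hasTotal = true;
        } else if (key == "MemAvailable:") {
            availableKB = valueKB;
            hasAvailable = true;
        }
    }

    if (!hasTotal || !hasAvailable || !totalKB || availableKB > totalKB)
        return -1;
    return 100.0 * static_cast<double>(totalKB - availableKB) / static_cast<double>(totalKB);
}

// The 90% default mirrors the threshold mentioned above; treating an
// unreadable measurement as "no pressure" is an assumption of this sketch.
bool isUnderMemoryPressure(std::istream& meminfo, double thresholdPercentage = 90.0)
{
    double used = usedMemoryPercentage(meminfo);
    return used >= 0 && used >= thresholdPercentage;
}
```

In real use the stream would be a std::ifstream opened on /proc/meminfo. Note that this measures the whole system, so a high value says nothing about how much of it WebKit could actually release.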
Comment 4 Debarshi Ray 2016-11-02 07:38:00 PDT
(In reply to comment #3)
> (In reply to comment #2)
>> The same workload / number of tabs that used to work quite well without the
>> memory pressure relief is now swapping. So I doubt WK is really running out
>> of memory.
> 
> That's a different issue though. This bug is about running the memory
> pressure handler several times when the memory situation is critical,
> assuming the measure is correct. Another issue is that we might be
> calculating the memory available wrongly, or that we are triggering the
> handler too early (we currently use 90%). Note also that the memory pressure
> doesn't monitors the webkit memory, but the system memory, so it's possible
> that it's not WebKit the one causing the high memory usage, but we try to
> help in any case by freeing caches, etc. In those cases it's common that we
> end up freeing very little memory or nothing, we could check that and
> increase the interval to install the handler again.

I can reliably make it swap with just one terminal (running screen over SSH) and epiphany with a dozen tabs. The tabs have a bunch of bugzilla.gnome.org pages, and some other websites. The tabs act as a TODO list, so I have had them for a while and it wasn't a problem before.

It can always be some other thing that is eating away memory. However, it seems unlikely since this happens just after opening some new tabs on a freshly booted machine.

> If you think we are measuring wrong, or that 90% is too early, file a new
> bug report, please.

Or I can wait for this bug to be fixed and see if it also addresses my problem. :)
Comment 5 Carlos Garcia Campos 2016-11-02 07:48:45 PDT
(In reply to comment #4)
> (In reply to comment #3)
> > (In reply to comment #2)
> >> The same workload / number of tabs that used to work quite well without the
> >> memory pressure relief is now swapping. So I doubt WK is really running out
> >> of memory.
> > 
> > That's a different issue though. This bug is about running the memory
> > pressure handler several times when the memory situation is critical,
> > assuming the measure is correct. Another issue is that we might be
> > calculating the memory available wrongly, or that we are triggering the
> > handler too early (we currently use 90%). Note also that the memory pressure
> > doesn't monitors the webkit memory, but the system memory, so it's possible
> > that it's not WebKit the one causing the high memory usage, but we try to
> > help in any case by freeing caches, etc. In those cases it's common that we
> > end up freeing very little memory or nothing, we could check that and
> > increase the interval to install the handler again.
> 
> I can reliably make it swap with just one terminal (running screen over SSH)
> and epiphany with a dozen tabs. The tabs have a bunch of bugzilla.gnome.org
> pages, and some other websites. The tabs act as a TODO list, so I have had
> them for a while and it wasn't a problem before.
> 
> It can always be some other thing that is eating away memory. However, it
> seems unlikely since this happens just after opening some new tabs on a
> freshly booted machine.
> 
> > If you think we are measuring wrong, or that 90% is too early, file a new
> > bug report, please.
> 
> Or I can wait for this bug to be fixed and see if it also addresses my
> problem. :)

Since you can reliably reproduce it, it would be very useful for us to have some information to fix this bug. Could you please check the current memory usage of the system when this happens? Check what gnome-system-monitor says and, more importantly, the contents of /proc/meminfo at that moment.
Comment 6 Debarshi Ray 2016-11-02 13:36:35 PDT
(In reply to comment #5)
> Since you can reliably reproduce it, it would be very useful for us to know
> some information to fix this bug. Could you please check the current memory
> usage of the system when this happens? Check what gnome-system-monitor says,

Running gnome-system-monitor reminded me that I don't have a swap partition on either of my machines. I completely forgot about this because it has never been a problem in the last 4 years. :)

They both have 8G of RAM, and so far I have managed to build libreoffice or webkit while simultaneously getting other work done; Firefox and Chrome work; etc.

The 'Resources' tab in gnome-system-monitor showed memory use jump to 7G+ and there was a similar rise in CPU activity, possibly due to kswapd. Next time (the machine freezes so I can't interact while this is going on) I will keep the 'Processes' tab open and see what happens there.

> and what is more important the contents of /proc/meminfo at that moment.

Ok.
Comment 7 Carlos Alberto Lopez Perez 2016-11-02 15:17:17 PDT
(In reply to comment #3)
> Note also that the memory pressure
> doesn't monitors the webkit memory, but the system memory, so it's possible
> that it's not WebKit the one causing the high memory usage, but we try to
> help in any case by freeing caches, etc.

Is there any easy way of checking how many bytes would be discarded before triggering the handler?

> In those cases it's common that we
> end up freeing very little memory or nothing, we could check that and
> increase the interval to install the handler again.
> If you think we are measuring wrong, or that 90% is too early, file a new
> bug report, please.

One idea would be to avoid executing the memory pressure handler more than 2 or 3 times per minute.
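
A minimal sketch of that kind of limit, assuming a sliding one-minute window. The class name HandlerRateLimiter and its interface are hypothetical and not part of WebKit; the caller passes the current time in so the behaviour is easy to test.

```cpp
#include <chrono>
#include <cstddef>
#include <deque>

// Hypothetical sliding-window limiter, not WebKit code: allows at most
// maxRuns handler executions within any window-long period.
class HandlerRateLimiter {
public:
    using Clock = std::chrono::steady_clock;

    HandlerRateLimiter(std::size_t maxRuns, std::chrono::seconds window)
        : m_maxRuns(maxRuns)
        , m_window(window)
    {
    }

    // Returns true and records the run if the handler may execute at `now`.
    bool tryAcquire(Clock::time_point now)
    {
        // Forget runs that have fallen out of the window.
        while (!m_runs.empty() && now - m_runs.front() >= m_window)
            m_runs.pop_front();
        if (m_runs.size() >= m_maxRuns)
            return false;
        m_runs.push_back(now);
        return true;
    }

private:
    std::size_t m_maxRuns;
    std::chrono::seconds m_window;
    std::deque<Clock::time_point> m_runs;
};
```

With HandlerRateLimiter limiter(3, std::chrono::seconds(60)), a fourth trigger inside the same minute is simply skipped. A fixed cap like this does not look at how much memory was actually freed, which is the trade-off against the interval-based approach discussed in comment #3.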
Comment 8 Carlos Alberto Lopez Perez 2016-11-02 15:47:29 PDT
(In reply to comment #6)
> The 'Resources' tab in gnome-system-monitor showed memory use jump to 7G+
> and there was a similar rise in CPU activity. Possibly due to kswapd. Next
> time (the machine freezes so I can interact while this is going on) I will
> keep the 'Processes' tab open and see what happens there.
> 

Please, try the following:

1. Download this script https://people.igalia.com/clopez/printwebkitmemory.sh
2. Execute it and redirect the output to a file:

wget https://people.igalia.com/clopez/printwebkitmemory.sh
chmod +x printwebkitmemory.sh
./printwebkitmemory.sh > memory_stats_webkit.txt

Every 5 seconds it will print the memory stats of your system and the memory usage numbers of the WebKit-related processes (if any are running).

Then when your system freezes, correlate the time at which the event happened with the data from the file.

Upload here the section of the file near the time at which the freeze happened.
Comment 9 Michael Catanzaro 2016-11-03 09:21:34 PDT
One more request. AFTER following the steps Carlos has in comment #8, please try this F23 scratch build: 

http://koji.fedoraproject.org/koji/taskinfo?taskID=16283792

It should be the same as what you tested, except it's built with -DENABLE_THREADED_COMPOSITOR=OFF. See if that makes a difference.

We think you might be impacted by bug #164049 as well.
Comment 10 Carlos Garcia Campos 2016-11-10 06:49:36 PST
(In reply to comment #0)
> The memory pressure relief was activated recently in WKGTK+
> 
> Using Ephy, every time it takes action, I can see something like this:
> 
> Memory pressure relief: Empty the PageCache: =dirty (at 8351166464 bytes)
> Memory pressure relief: Prune MemoryCache live resources: =dirty (at
> 8351166464 bytes)
> Memory pressure relief: Drain CSSValuePool: =dirty (at 8351166464 bytes)
> Memory pressure relief: Discard StyleResolvers: =dirty (at 8351166464 bytes)
> Memory pressure relief: Discard all JIT-compiled code: =dirty (at 8351166464
> bytes)
> Memory pressure relief: Dropping buffered data from paused media elements:
> =dirty (at 8351166464 bytes)
> Memory pressure relief: Purge inactive FontData: =dirty (at 8351166464 bytes)
> Memory pressure relief: Clear WidthCaches: =dirty (at 8351166464 bytes)
> Memory pressure relief: Discard Selector Query Cache: =dirty (at 8351166464
> bytes)
> Memory pressure relief: Prune MemoryCache dead resources: =dirty (at
> 8351166464 bytes)
> Memory pressure relief: Prune presentation attribute cache: =dirty (at
> 8351166464 bytes)
> Memory pressure relief: Run malloc_trim: =dirty (at 8351166464 bytes)
> Memory pressure relief: Release free FastMalloc memory: =dirty (at
> 8351166464 bytes)
> 
> 
> The mechanism is definitively good but the problem is when it doesn't
> succeed on freeing much memory.

Those values are useless because they are wrong :-( See bug #164589.

> What it happens is that gets in action
> again, and again, and again, not achieving much but making my hard drive get
> crazy and, therefore, almost stalling my whole desktop.
> 
> Sometimes, at some point it stops and I can retake the usage of the desktop
> but any action that I do in Ephy triggers again the memory pressure relief
> and I have to wait for some more minutes in hope that I will recover control
> over my desktop.

We currently wait at least 5 seconds before trying again. My plan is to check the memory released and, if we can't free at least 1MB, wait for 30 seconds before trying again.
Comment 11 Carlos Garcia Campos 2016-11-10 07:23:52 PST
Created attachment 294370 [details]
Patch

Could you guys try this patch? I picked some values and did some tests, but we can adapt them if they are not good enough. The idea is that if we don't free at least 1MB, we wait for 30 seconds instead of 5 before trying again.
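
For readers who don't want to open the attachment, the policy can be sketched roughly as below. This is an illustration only, not the patch itself: the function and constant names are hypothetical, and reading "1MB" as 1024 * 1024 bytes is an assumption.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical constants; the 5 s / 30 s / 1MB values come from the
// description above, their exact representation here is an assumption.
constexpr int64_t minimumBytesFreedForShortHoldOff = 1024 * 1024;
constexpr std::chrono::seconds shortHoldOff { 5 };
constexpr std::chrono::seconds longHoldOff { 30 };

// Chooses how long to wait before the memory pressure handler may run again.
// bytesFreed is signed because memory use can grow while the handler runs.
std::chrono::seconds holdOffInterval(int64_t bytesFreed)
{
    return bytesFreed >= minimumBytesFreedForShortHoldOff ? shortHoldOff : longHoldOff;
}
```

A run that frees nothing, or that ends with more memory in use than before, backs off for 30 seconds; a run that frees 1MB or more keeps the 5-second interval.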
Comment 12 Build Bot 2016-11-10 11:01:52 PST
Comment on attachment 294370 [details]
Patch

Attachment 294370 [details] did not pass mac-debug-ews (mac):
Output: http://webkit-queues.webkit.org/results/2491769

New failing tests:
imported/w3c/web-platform-tests/IndexedDB/transaction-create_in_versionchange.htm
Comment 13 Build Bot 2016-11-10 11:01:55 PST
Created attachment 294387 [details]
Archive of layout-test-results from ews115 for mac-yosemite

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews115  Port: mac-yosemite  Platform: Mac OS X 10.10.5
Comment 14 Debarshi Ray 2016-11-11 03:33:47 PST
(In reply to comment #8)
> (In reply to comment #6)
> > The 'Resources' tab in gnome-system-monitor showed memory use jump to 7G+
> > and there was a similar rise in CPU activity. Possibly due to kswapd. Next
> > time (the machine freezes so I can interact while this is going on) I will
> > keep the 'Processes' tab open and see what happens there.
> > 
> 
> Please, try the following:
> 
> 1. Download this script https://people.igalia.com/clopez/printwebkitmemory.sh
> 2. Execute it and redirect the output to a file:

Damn! I can't get it to swap on this Fedora 24 laptop with webkitgtk4-2.14.1-1.fc24.x86_64. It used to happen very predictably with the same version, and both my laptops used to be very similar memory-wise - 8G RAM, no swap partition.

Maybe I just need to go back to using Ephy by default to trigger it again. (I have temporarily switched to Firefox.)
Comment 15 Michael Catanzaro 2016-11-12 18:39:48 PST
I think we need to know if it works for Rishi and Andres before proceeding with this.
Comment 16 Debarshi Ray 2016-11-23 23:15:22 PST
(In reply to comment #14)
> (In reply to comment #8)
> > (In reply to comment #6)
> > > The 'Resources' tab in gnome-system-monitor showed memory use jump to 7G+
> > > and there was a similar rise in CPU activity. Possibly due to kswapd. Next
> > > time (the machine freezes so I can interact while this is going on) I will
> > > keep the 'Processes' tab open and see what happens there.
> > > 
> > 
> > Please, try the following:
> > 
> > 1. Download this script https://people.igalia.com/clopez/printwebkitmemory.sh
> > 2. Execute it and redirect the output to a file:
> 
> Damn! I can't get it to swap on this Fedora 24 laptop with
> webkitgtk4-2.14.1-1.fc24.x86_64. It used to happen very predictably with the
> same version, and both my laptops used to be very similar memory-wise - 8G
> RAM, no swap partition.
> 
> Maybe I just need to go back to using Ephy by default to trigger it again.
> (I have temporarily switched to Firefox.)

I managed to trigger this a few times by - (a) opening a bunch of links from FB, (b) creating Fedora updates from admin.fedoraproject.org/updates. Unfortunately when kswapd kicks in, the printwebkitmemory.sh script crashes. 'fork' fails with 'unable to allocate memory', which crashes the bash child process.

Maybe I should actually test some of the webkit patches that were posted. Is there any scratch build for Fedora 24 or 25 that I could try?
Comment 17 Carlos Alberto Lopez Perez 2016-11-24 04:10:04 PST
(In reply to comment #16)
> (In reply to comment #14)
> > (In reply to comment #8)
> > > (In reply to comment #6)
> > > > The 'Resources' tab in gnome-system-monitor showed memory use jump to 7G+
> > > > and there was a similar rise in CPU activity. Possibly due to kswapd. Next
> > > > time (the machine freezes so I can interact while this is going on) I will
> > > > keep the 'Processes' tab open and see what happens there.
> > > > 
> > > 
> > > Please, try the following:
> > > 
> > > 1. Download this script https://people.igalia.com/clopez/printwebkitmemory.sh
> > > 2. Execute it and redirect the output to a file:
> > 
> > Damn! I can't get it to swap on this Fedora 24 laptop with
> > webkitgtk4-2.14.1-1.fc24.x86_64. It used to happen very predictably with the
> > same version, and both my laptops used to be very similar memory-wise - 8G
> > RAM, no swap partition.
> > 
> > Maybe I just need to go back to using Ephy by default to trigger it again.
> > (I have temporarily switched to Firefox.)
> 
> I managed to trigger this a few times by - (a) opening a bunch of links from
> FB, (b) creating Fedora updates from admin.fedoraproject.org/updates.
> Unfortunately when kswapd kicks in, the printwebkitmemory.sh script crashes.
> 'fork' fails with 'unable to allocate memory', which crashes the bash child
> process.
> 
> Maybe I should actually test some of the webkit patches that were posted. Is
> there any scratch build for Fedora 24 or 25 that I could try?

Wow...

Ok... you can use cgroups to run the browser inside a memory cgroup with a limited amount of RAM. 

Do this:

# Create the memory cgroup
$ sudo cgcreate -a $USER:$USER -s 777 -g memory:/webkit
# Set it to have a limit of 3GB of RAM / 4GB of RAM plus swap. (If you only have 4GB of RAM or less, lower these values)
$ echo $(( 3 * 1024 * 1024 * 1024 )) > /sys/fs/cgroup/memory/webkit/memory.limit_in_bytes
$ echo $(( 4 * 1024 * 1024 * 1024 )) > /sys/fs/cgroup/memory/webkit/memory.memsw.limit_in_bytes
# Run the browser inside the cgroup
$ cgexec -g memory:webkit /path/to/MiniBrowser


Now you can also check the cgroup memory stats when the browser runs:

$ while sleep 1; do cat /sys/fs/cgroup/memory/webkit/memory.memsw.usage_in_bytes; done

The browser will not be allowed to use more than 3GB of RAM, or 4GB of RAM plus swap, so the rest of your system will keep working without problems even if the browser goes crazy. The kernel will just kill it if it attempts to use more than 4GB in total.
Comment 18 Tomas Popela 2016-11-24 04:43:15 PST
(In reply to comment #16)
> Maybe I should actually test some of the webkit patches that were posted. Is
> there any scratch build for Fedora 24 or 25 that I could try?

There is a scratch build for you, Rishi: http://koji.fedoraproject.org/koji/taskinfo?taskID=16595876 It includes this patch and

https://bugs.webkit.org/show_bug.cgi?id=164589
https://bugs.webkit.org/show_bug.cgi?id=160497

as well.
Comment 19 Tomas Popela 2016-12-02 02:43:31 PST
The build from the previous comment failed, but I made a new one:

F24 - http://koji.fedoraproject.org/koji/taskinfo?taskID=16668274
F25 - http://koji.fedoraproject.org/koji/taskinfo?taskID=16668227

but this one doesn't have the https://bugs.webkit.org/show_bug.cgi?id=160497 applied (it was the reason why the previous build failed).
Comment 20 Andres Gomez Garcia 2016-12-09 02:41:17 PST
Created attachment 296649 [details]
Several memory measures with different configurations and patches on top of 2.15.2
Comment 21 Zan Dobersek 2016-12-10 08:43:11 PST
(In reply to comment #20)
> Created attachment 296649 [details]
> Several memory measures with different configurations and patches on top of
> 2.15.2

Can you provide the specific set of Web pages you're loading here? Or at least a set we can settle on for analysis?
Comment 22 Michael Catanzaro 2016-12-19 20:59:02 PST
(Note: I've removed this patch from the list of proposed backports for 2.14, because nothing has been committed yet. Feel free to put it back once something has been committed.)

(In reply to comment #20)
> Created attachment 296649 [details]
> Several memory measures with different configurations and patches on top of
> 2.15.2

I think the main goal here was to see if your desktop still stalls after this patch is applied before committing it. Independent of the change in r208975, did this patch help with your problem?
Comment 23 Andres Gomez Garcia 2016-12-20 01:05:04 PST
(In reply to comment #22)
> (Note: I've removed this patch from the list of proposed backports for 2.14,
> because nothing has been committed yet. Feel free to put it back once
> something has been committed.)

Not sure why you did this, since they are proposed backports. Anyway ...

> (In reply to comment #20)
> > Created attachment 296649 [details]
> > Several memory measures with different configurations and patches on top of
> > 2.15.2
> 
> I think the main goal here was to see if your desktop still stalls after
> this patch is applied before committing it. Independent of the change in
> r208975, did this patch help with your problem?

The patch helps to recover control more quickly.

The desktop still stalls, though.
Comment 24 Carlos Garcia Campos 2016-12-27 08:25:43 PST
(In reply to comment #22)
> (Note: I've removed this patch from the list of proposed backports for 2.14,
> because nothing has been committed yet. Feel free to put it back once
> something has been committed.)
> 
> (In reply to comment #20)
> > Created attachment 296649 [details]
> > Several memory measures with different configurations and patches on top of
> > 2.15.2
> 
> I think the main goal here was to see if your desktop still stalls after
> this patch is applied before committing it. Independent of the change in
> r208975, did this patch help with your problem?

No, this patch doesn't try to fix the desktop stalling, but the high CPU usage when memory pressure is triggered and the web process doesn't release much memory. If we can't release memory and usage continues growing, the desktop will end up stalling anyway.
Comment 25 Carlos Garcia Campos 2016-12-28 00:47:18 PST
Could someone review this? I want to include this patch in the 2.14.3 release. Feedback from people who tried this patch would be appreciated.
Comment 26 Michael Catanzaro 2016-12-28 10:40:53 PST
Comment on attachment 294370 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=294370&action=review

> Source/WebCore/platform/linux/MemoryPressureHandlerLinux.cpp:291
> +    size_t processMemory = processMemoryUsage();

That reminds me that this function ought to return a pair of values now, see bug #165533. But that's a separate issue that doesn't have to be fixed here.

> Source/WebCore/platform/linux/MemoryPressureHandlerLinux.cpp:293
> +    int64_t bytesFreed = processMemory - processMemoryUsage();

Careful, you have integer underflow here. You have to cast either processMemory or the result of processMemoryUsage() to int64_t on the right-hand side of the assignment, as the result of the subtraction is a size_t, i.e. the value has already underflowed before it gets assigned to the int64_t. It might work properly on x86_64, but with a 32-bit size_t I think you'd wind up with a huge positive number for bytesFreed when it should be negative.
Comment 27 Michael Catanzaro 2016-12-28 10:41:44 PST
(In reply to comment #26)
> Careful, you have integer underflow here

(in the case that memory use somehow increases after running the memory pressure handler)
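
A small illustration of the problem, using uint32_t to stand in for a 32-bit size_t so the effect is visible on any platform. The function names are made up for this example and do not appear in the patch.

```cpp
#include <cstdint>

// Mirrors the pattern in the patch on a platform with a 32-bit size_t:
// the subtraction happens in the unsigned type and wraps around before
// the result is widened to int64_t.
int64_t bytesFreedWrapping(uint32_t before, uint32_t after)
{
    uint32_t difference = before - after; // wraps modulo 2^32 if after > before
    return difference;
}

// Casting the operands first makes the subtraction happen in int64_t,
// so memory growth shows up as a negative number.
int64_t bytesFreedCastFirst(uint32_t before, uint32_t after)
{
    return static_cast<int64_t>(before) - static_cast<int64_t>(after);
}
```

With before = 100 and after = 150, bytesFreedWrapping returns 4294967246 while bytesFreedCastFirst returns -50. When memory actually shrinks, both return the same value.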
Comment 28 Carlos Garcia Campos 2017-01-02 05:55:21 PST
Committed r210223: <http://trac.webkit.org/changeset/210223>
Comment 29 Debarshi Ray 2017-08-23 02:30:31 PDT
Sorry for not having responded earlier. I think this is indeed fixed for me. I haven't been able to reproduce with Fedora 26 so far:
epiphany-3.24.3-1.fc26.x86_64
webkitgtk4-2.16.6-1.fc26.x86_64