Bug 65766

Summary: [Linux] OSAllocator::reserveUncommitted should not commit physical memory
Product: WebKit Reporter: Uli Schlachter <psychon>
Component: JavaScriptCoreAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: barraclough, benjamin, eric, evan, hausmann, landry, mdempsky, mhahnenberg, mrobinson, ossy, tony, tonyg, webkit.review.bot, willchan, xan.lopez
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: PC   
OS: Linux   
Bug Depends on: 96966, 108632    
Bug Blocks:    
Attachments:
Description Flags
Output of valgrind --massif
none
Patch fixing this problem
none
Slightly different patch, now hopefully it pleases the bot :-)
none
New patch for using mmap(PROT_NONE) and mprotect() for reserving virtual memory. none

Description Uli Schlachter 2011-08-05 05:28:07 PDT
Created attachment 103062 [details]
Output of valgrind --massif

Hi,

since debian updated me to a newer webkit version, I can't open more than one browser window since webkit dies the out-of-memory death during startup. The bug here clearly is a duplicate of bug#42756. However, I have 2 in /proc/sys/vm/overcommit_memory. That means that MAP_NORESERVE is ignored and thus the fix from there doesn't work.

I am using this value so that too big allocations fail immediately and not some random time later, which is quite hard to debug (as can be seen by that other bug). Also, this avoids the evil OOM killer which always picks the wrong process anyway.

I reproduced this problem with WebKit-r92445.tar.bz2 and Programs/GtkLauncher. Attached is the ms_print output for "valgrind --tool=massif --pages-as-heap=yes Programs/GtkLauncher" which shows that this mmap()'d 1GiB address space.

Obviously, setting /proc/sys/vm/overcommit_memory to 0 or 1 hides this problem and nothing crashes anymore, but that doesn't count as a solution for me.

Uli
Comment 1 Gavin Barraclough 2011-08-05 12:32:17 PDT
I'm going to retitle this bug since "JIT still requires VM overcommit" is not a bug, this is by design. :o)

We need a mechanism to map VM space without reserving physical memory.  It sounds like MAP_NORESERVE should do exactly this, so my first question would be, why is this not working?  If the kernel is not obeying the MAP_NORESERVE flag, it seems there are two options:

(1) Make the kernel obey MAP_NORESERVE.
(2) Find an alternate interface that will allocate VM without reserving physical memory.
Comment 2 Gavin Barraclough 2011-08-07 18:42:50 PDT
https://bugs.webkit.org/show_bug.cgi?id=61137 may be related
Comment 3 Mark Hahnenberg 2011-08-09 11:36:25 PDT
> Obviously, setting /proc/sys/vm/overcommit_memory to 0 or 1 hides this problem and nothing crashes anymore, but that doesn't count as a solution for me.
> 
What is the value in /proc/sys/vm/overcommit_ratio?  I think the default is 50, so you could up it to 100…but I mean, the underlying problem is that you're using 2 in overcommit_memory.  It seems that this setting is made to ignore MAP_NORESERVE by design in that it *always* checks to see if you have enough physical memory + swap as determined by your overcommit_ratio.  I think it's pretty unlikely that JSC will actually use that much memory.  If it does, then that is very likely to be a bug :-)  In the end though, if using 0 or 1 is a deal breaker…

tldr; (you|we)'ll have to go with option #2 from comment #1.
Comment 4 Mark Hahnenberg 2011-08-09 12:06:25 PDT
Examining the documentation within the Linux kernel, it explicitly says that the MAP_NORESERVE flag is ignored for overcommit_memory mode 2:

http://lxr.linux.no/linux/Documentation/vm/overcommit-accounting
Comment 5 Mark Hahnenberg 2011-08-09 12:18:30 PDT
A third option might be to "volunteer" ourselves if we're using a lot of memory to be killed by the OOM killer when it's searching for a process to kill.  We have some influence on the OOM killer by increasing or decreasing the value in /proc/<pid>/oomadj (http://linux-mm.org/OOM_Killer).  This could somewhat alleviate the problem that the OOM killer kills a "random" process that the OP described.  I'm not sure how effective this option might be, but I thought I'd at least bring it up.
Comment 6 Xan Lopez 2011-11-29 02:03:50 PST
 My opinion here is that if you are setting a '2' in /proc/sys/vm/overcommit_memory you are pretty much on your own, and something like you mention in comment #5 might be the best solution unless we can find alternate interfaces to allocate VM.
Comment 7 Gavin Barraclough 2011-11-29 12:29:50 PST
(In reply to comment #6)
>  My opinion here is that if you are setting a '2' in /proc/sys/vm/overcommit_memory you are pretty much on your own, and something like you mention in comment #5 might be the best solution unless we can find alternate interfaces to allocate VM.

I'm okay with a workaround like this going in, but I'm not sure that it's really helps at all.  It just means that when the OOM killer kicks in, your apps won't randomly get killed - instead it will routinely be the WebKit using app that gets the chop.  That doesn't seem unreasonable – after all, it is likely due to the bug that WebKit commits too much memory that results in the OOM killer deciding to take an app down, so we should take responsibility and let the axe fall on us.  But overall this just makes WebKit less stable – it increase the rate at which WebKit 'crashes' (the fine distinction between a crash and an OOM death likely being uninteresting to the user).

So long as the JIT is designed to run on systems that support allocating VM without committing physical memory (which I don't really see changing) the only real fix seems to be to find a way to allocate VM without committing physical memory.  :o)

If we do land a change like this, we should not close this bug – we still need to track that reserveUncommitted does not work correctly on Linux.
Comment 8 Uli Schlachter 2012-07-22 12:55:53 PDT
(In reply to comment #1)
> We need a mechanism to map VM space without reserving physical memory.  It sounds like MAP_NORESERVE should do exactly this, so my first question would be, why is this not working?  If the kernel is not obeying the MAP_NORESERVE flag, it seems there are two options:
> 
> (1) Make the kernel obey MAP_NORESERVE.
> (2) Find an alternate interface that will allocate VM without reserving physical memory.

Mapping anonymous memory read-only doesn't make the (linux) kernel count it as committed. That should make it possible to allocate virtual address space without reserving physical memory, even when overcommit is disabled.

Actually making this memory usable (OSAllocator::commit(), I guess) would then require mprotect() to make the memory writable/executable.

Could this scheme work? If not, which part did I miss?
Comment 9 Uli Schlachter 2012-09-16 10:26:00 PDT
Created attachment 164320 [details]
Patch fixing this problem

The attached patch implements my proposed solution. I tested this with vm.overcommit_memory=2. Without this patch, starting ./Programs/MiniBrowser fails. With the patch, starting 5 instances of this program is no problem.
Comment 10 WebKit Review Bot 2012-09-16 10:27:32 PDT
Attachment 164320 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WTF/ChangeLog', u'Source/WTF/wtf/OS..." exit_code: 1
Source/WTF/wtf/OSAllocatorPosix.cpp:139:  Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons.  [readability/comparison_to_zero] [5]
Total errors found: 1 in 2 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 11 Uli Schlachter 2012-09-16 10:35:14 PDT
Created attachment 164321 [details]
Slightly different patch, now hopefully it pleases the bot :-)
Comment 12 Gavin Barraclough 2012-09-16 19:57:06 PDT
Comment on attachment 164321 [details]
Slightly different patch, now hopefully it pleases the bot :-)

This looks great, do you happen to have any knowledge of the BSD VM system, and whether this fix may apply? – OpenBSD have the same problem (see https://bugs.webkit.org/show_bug.cgi?id=61137) I'm wondering if the #if guard should include OS(OPENBSD), OS(NETBSD) and/or OS(FREEBSD) in addition to OS(LINUX).
Comment 13 Landry Breuil 2012-09-17 12:22:57 PDT
(In reply to comment #12)
> (From update of attachment 164321 [details])
> This looks great, do you happen to have any knowledge of the BSD VM system, and whether this fix may apply? – OpenBSD have the same problem (see https://bugs.webkit.org/show_bug.cgi?id=61137) I'm wondering if the #if guard should include OS(OPENBSD), OS(NETBSD) and/or OS(FREEBSD) in addition to OS(LINUX).

From discussions i had with some of our VM devs, this wont work on OpenBSD - ie will blow at runtime, and apparently will throw away all the security measures (address space randomization and all). Don't ask me why, i'm just the messenger, and i don't understand how this whole stuff works.

As for bug 61137, it's just workarounding the fact that pre-allocating 1Gb at startup when ulimit -m is 512Mb will directly blow up. So even if we had a 'mechanism to allocate VM without committing the physical pages up front' i think this wouldnt work.
Comment 14 Uli Schlachter 2012-09-17 12:40:47 PDT
(In reply to comment #13)
> (In reply to comment #12)
> > (From update of attachment 164321 [details] [details])
> > This looks great, do you happen to have any knowledge of the BSD VM system, and whether this fix may apply? – OpenBSD have the same problem (see https://bugs.webkit.org/show_bug.cgi?id=61137) I'm wondering if the #if guard should include OS(OPENBSD), OS(NETBSD) and/or OS(FREEBSD) in addition to OS(LINUX).
> 
> From discussions i had with some of our VM devs, this wont work on OpenBSD - ie will blow at runtime, and apparently will throw away all the security measures (address space randomization and all). Don't ask me why, i'm just the messenger, and i don't understand how this whole stuff works.

I doubt that alot. The only thing that changes is that instead of asking for read-write-exec it now asks for "no access" memory. It doesn't dictate the virtual address of the memory block. So IMHO this shouldn't affect ASLR, but I won't ask you why it does affect it. :-)

> As for bug 61137, it's just workarounding the fact that pre-allocating 1Gb at startup when ulimit -m is 512Mb will directly blow up. So even if we had a 'mechanism to allocate VM without committing the physical pages up front' i think this wouldnt work.

"ulimit -m" limits the memory size. "ulimit -v" limits virtual memory. This code tries to reserve virtual memory without getting any "real" memory for it, so the 512 MiB limit on ulimit -m shouldn't matter (yet). In theory. So, from my understanding, this shouldn't hit the "ulimit -m" limit until it actually starts using that memory.
Comment 15 WebKit Review Bot 2012-09-17 13:06:48 PDT
Comment on attachment 164321 [details]
Slightly different patch, now hopefully it pleases the bot :-)

Clearing flags on attachment: 164321

Committed r128796: <http://trac.webkit.org/changeset/128796>
Comment 16 WebKit Review Bot 2012-09-17 13:06:52 PDT
All reviewed patches have been landed.  Closing bug.
Comment 17 Matthew Dempsky 2012-09-17 13:14:39 PDT
> "ulimit -m" limits the memory size. "ulimit -v" limits virtual memory.

So OpenBSD doesn't have this distinction.  We [OpenBSD] don't currently support unreserved mappings, nor do we have any intention to currently.  E.g., we don't have a MAP_NORESERVE flag for mmap() or any kernel knobs analogous to overcommit_memory/overcommit_ratio, and there's no -v option for our ulimit.

That said, I think it would be reasonable for us to still support PROT_NONE + MAP_NORESERVE mappings just for the purpose of reserving an (inaccessible) region of virtual memory that doesn't count against the process's used memory limit and with the guarantee that mmap() won't try to reuse that memory later unless explicitly requested to (i.e., with MAP_FIXED).  Is that adequate for WebKit's needs?

To be explicit, I'm envisioning syscalls like this on OpenBSD:

  p = mmap(0, 1 << 20, PROT_NONE, MAP_ANON | MAP_NORESERVE, -1, 0);
  /* [p, p+1M) is still unusable, but the OS will not reuse it for something else */

  mmap(p + 65536, 16384, PROT_READ | PROT_WRITE, MAP_ANON | MAP_FIXED, -1, 0);
  /* [p+64k, p+80k) is now readable+writable and counts as 16k against process resource limit; call might instead fail though if resources are exhausted */

  mmap(p + 65536, 8192, PROT_NONE, MAP_ANON | MAP_NORESERVE | MAP_FIXED, -1, 0);
  /* remap [p+64k, p+72k) as noreserve, effectively freeing it except for OS reuse guarantees; [p+72k, p+80k) is still usable and counts as only 8k against resource limits */
Comment 18 Gavin Barraclough 2012-09-17 13:44:23 PDT
(In reply to comment #17)
> That said, I think it would be reasonable for us to still support PROT_NONE + MAP_NORESERVE mappings just for the purpose of reserving an (inaccessible) region of virtual memory that doesn't count against the process's used memory limit and with the guarantee that mmap() won't try to reuse that memory later unless explicitly requested to (i.e., with MAP_FIXED).  Is that adequate for WebKit's needs?

This sounds like it should work perfectly to me.

FYI you may want to move further discussion over to bug#61137!
Comment 19 WebKit Review Bot 2012-09-17 15:57:03 PDT
Re-opened since this is blocked by 96966
Comment 20 Csaba Osztrogonác 2012-09-17 16:03:42 PDT
(In reply to comment #19)
> Re-opened since this is blocked by 96966

The patch is reverted by https://trac.webkit.org/changeset/128818 
because it broke everything. (At least Qt, GTK, EFL) No, I won't 
upload backtraces, links, etc. build.webkit.org is your friend,
you guys should have watch it after landing this patch ...
Comment 21 Uli Schlachter 2012-09-18 09:03:39 PDT
Created attachment 164570 [details]
New patch for using mmap(PROT_NONE) and mprotect() for reserving virtual memory.

(In reply to comment #20)
> (In reply to comment #19)
> > Re-opened since this is blocked by 96966
> 
> The patch is reverted by https://trac.webkit.org/changeset/128818 
> because it broke everything. (At least Qt, GTK, EFL) No, I won't 
> upload backtraces, links, etc. build.webkit.org is your friend,
> you guys should have watch it after landing this patch ...

Urgh. I'm sorry. I did two big mistakes. The bigger one was not to test the changes I did in response to the style warnings. The second mistake was changing "if (mprotect() != 0)" into "if (!mprotect())".

So here is a new patch which removes those evil "!". I even tested this one.

Again: Sorry, entirely my fault.
Comment 22 Eric Seidel (no email) 2012-10-30 13:27:35 PDT
I'm not sure if Chromium even uses this code on Linux, but I've CC'd some linux peeps.  If they like this, I'm happy to r+ this  (or they can).
Comment 23 William Chan 2012-10-30 13:47:57 PDT
V8 does its own allocations directly, and for malloc() Chromium/Linux uses TCMalloc, which does its own allocations directly as well. My brief look at the code seems to indicate this is used by JavaScriptCore for its OS memory allocations. Therefore I believe, but am not absolutely certain, that Chromium/Linux does not use this code.
Comment 24 Tony Chang 2012-10-30 14:51:41 PDT
Will looks correct. I don't see OSAllocated used outside of JavaScriptCore, which Chromium doesn't use.
Comment 25 WebKit Review Bot 2012-12-18 00:38:30 PST
Comment on attachment 164570 [details]
New patch for using mmap(PROT_NONE) and mprotect() for reserving virtual memory.

Clearing flags on attachment: 164570

Committed r137994: <http://trac.webkit.org/changeset/137994>
Comment 26 WebKit Review Bot 2012-12-18 00:38:35 PST
All reviewed patches have been landed.  Closing bug.
Comment 27 Csaba Osztrogonác 2013-02-15 03:26:49 PST
(In reply to comment #25)
> (From update of attachment 164570 [details])
> Clearing flags on attachment: 164570
> 
> Committed r137994: <http://trac.webkit.org/changeset/137994>

Are you going to fix the regression caused by this patch on Thumb2?
See https://bugs.webkit.org/show_bug.cgi?id=108632 for details.