Bug 118733 - Javascript JIT still allocates 2GB of memory on x86-64 Linux
Summary: Javascript JIT still allocates 2GB of memory on x86-64 Linux
Status: UNCONFIRMED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-07-16 07:45 PDT by Török Edwin
Modified: 2013-07-16 10:45 PDT

See Also:


Attachments
Use MAP_32BIT (1.74 KB, patch)
2013-07-16 07:45 PDT, Török Edwin
oliver: review-

Description Török Edwin 2013-07-16 07:45:06 PDT
Created attachment 206778 [details]
Use MAP_32BIT

See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=712387 for the original bugreport.

Currently the javascript JIT allocates 2GB on x86-64, to ensure that all jumps are within a 2GB range.
This causes problems without overcommit (or without a swapfile), even on a machine with 8GB of physical RAM.

Attached is a patch that uses MAP_32BIT, instead of the wasteful allocation of 2GB.
Comment 1 Oliver Hunt 2013-07-16 08:48:21 PDT
Comment on attachment 206778 [details]
Use MAP_32BIT

View in context: https://bugs.webkit.org/attachment.cgi?id=206778&action=review

> b/src/3rdparty/javascriptcore/JavaScriptCore/jit/ExecutableAllocatorFixedVMPool.cpp:44
> -#ifdef QT_USE_ONEGB_VMALLOCATOR
> -    #define VM_POOL_SIZE (1024u * 1024u * 1024u) // 1Gb
> -#else
> -    #define VM_POOL_SIZE (2u * 1024u * 1024u * 1024u) // 2Gb
> -#endif
> +    // On x86-64, where we require all jumps to have a 2Gb max range we'll use
> +    // MAP_32BIT
> +    #define VM_POOL_SIZE (32u * 1024u * 1024u) // 32Mb

Nope, we aren't taking a 32Mb JIT segment on 64 bit. This change also makes the comment above the define incorrect.
Comment 2 Oliver Hunt 2013-07-16 08:54:00 PDT
(In reply to comment #0)
> Created an attachment (id=206778) [details]
> Use MAP_32BIT
> 
> See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=712387 for the original bugreport.
> 
> Currently the javascript JIT allocates 2GB on x86-64, to ensure that all jumps are within a 2GB range.
> This causes problems without overcommit (or without a swapfile), even on a machine with 8GB of physical RAM.
> 
> Attached is a patch that uses MAP_32BIT, instead of the wasteful allocation of 2GB.

It doesn't use 2gig of memory, it reserves 2gig of address space.  I would be stunned if linux cannot handle reserving address space as that's a common technique used by garbage collectors.

32Mb is also vastly too small to successfully jit large pieces of JS so i'll assume that there were no perf tests run on this patch either.

I think the correct fix here is to find out how linux GCs reserve address space without having the VM allocate physical backing memory.
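For reference, a minimal sketch of the reserve-then-commit idiom this comment is asking about, assuming Linux and plain `mmap`/`mprotect`. The function names are hypothetical and are not JavaScriptCore's. The assumption is that a `PROT_NONE` mapping with `MAP_NORESERVE` is generally not charged against the commit limit, and that pages only gain physical backing once a sub-range is made accessible and touched. Whether a given kernel and overcommit setting behaves this way is exactly what the thread is trying to establish.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

// Reserve `size` bytes of address space without requesting backing memory.
// Returns nullptr on failure. (Hypothetical helper, not the WebKit API.)
void* reserveAddressSpace(std::size_t size)
{
    assert(size > 0);
    void* p = mmap(nullptr, size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
}

// Make a page-aligned sub-range of the reservation readable and writable.
// Physical pages are only supplied when the range is first touched.
bool commitRange(void* start, std::size_t size)
{
    return mprotect(start, size, PROT_READ | PROT_WRITE) == 0;
}

// Give the whole reservation back to the kernel.
void releaseAddressSpace(void* base, std::size_t size)
{
    munmap(base, size);
}
```

A caller would reserve the full pool once, then call `commitRange` on small pieces as code is emitted, so resident memory tracks what is actually used rather than the size of the reservation.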
Comment 3 Török Edwin 2013-07-16 09:03:25 PDT
(In reply to comment #2)
> (In reply to comment #0)
> > Created an attachment (id=206778) [details]
> > Use MAP_32BIT
> > 
> > See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=712387 for the original bugreport.
> > 
> > Currently the javascript JIT allocates 2GB on x86-64, to ensure that all jumps are within a 2GB range.
> > This causes problems without overcommit (or without a swapfile), even on a machine with 8GB of physical RAM.
> > 
> > Attached is a patch that uses MAP_32BIT, instead of the wasteful allocation of 2GB.
> 
> It doesn't use 2gig of memory, it reserves 2gig of address space.  I would be stunned if linux cannot handle reserving address space as that's a common technique used by garbage collectors.
> 
> 32Mb is also vastly too small to successfully jit large pieces of JS so i'll assume that there were no perf tests run on this patch either.

Right, I have tested this only with Qt/KDE, where JIT performance doesn't really matter (in fact I'd be perfectly happy with disabling the JIT for it).

I haven't checked how the allocation code works, but won't it allocate more pools when it runs out of the 32MB?

I've chosen 32MB because that is what the 32-bit code uses, but if a larger value would be more appropriate then both should be updated.

> 
> I think the correct fix here is to find out how linux GCs reserve address space without having the VM allocate physical backing memory.

There is a MAP_NORESERVE on the allocation but that doesn't seem to do what it's supposed to: the OOM killer kicks in and starts killing applications once KWin+plasma-desktop+other applications exceed my physical RAM (8GB).
If I run the patched libqt4-script (which has the patched javascriptcore), then the memory usage of KDE is no longer >2GB, and the OOM killer never kicks in.

Reserving 2GB of VIRT, for a feature that is not critical for the application is not really nice...
Comment 4 Oliver Hunt 2013-07-16 09:15:18 PDT
(In reply to comment #3)
> > 32Mb is also vastly too small to successfully jit large pieces of JS so i'll assume that there were no perf tests run on this patch either.
> 
> Right, I have tested this only with Qt/KDE, where JIT performance doesn't really matter (in fact I'd be perfectly happy with disabling the JIT for it).
>

Have you talked to any other Qt embedders? 
 
> I haven't checked how the allocation code works, but won't it allocate more pools when it runs out of the 32MB?

No.  The whole point of the FixedVMPool is that there is a single region of the address space in which all JIT compiled code goes. This allows us to guarantee that all jumps are within the 32-bit range allowed by a direct PC-relative branch.
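To make the range argument concrete, here is a small sketch of the check a linker would effectively rely on, assuming the 5-byte x86-64 `jmp rel32` / `call rel32` encoding, where the signed 32-bit displacement is measured from the end of the instruction. The helper name `fitsInRel32` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Returns true if a 5-byte rel32 branch located at `from` can reach `to`.
// The displacement is relative to the address just past the instruction.
bool fitsInRel32(std::uint64_t from, std::uint64_t to)
{
    // Unsigned subtraction wraps; reinterpreting the result as signed
    // yields the (possibly negative) displacement.
    const std::int64_t delta = static_cast<std::int64_t>(to - (from + 5));
    return delta >= std::numeric_limits<std::int32_t>::min()
        && delta <= std::numeric_limits<std::int32_t>::max();
}
```

If both addresses lie inside one contiguous pool of at most 2GB, the displacement always fits, wherever in the 64-bit address space that pool happens to sit.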

> 
> I've chosen 32MB because that is what the 32-bit code uses, but if a larger value would be more appropriate then both should be updated.
> 

Desktop 32-bit does not use the FixedVMPool, I think the only 32bit platform that uses it is iOS.


> > 
> > I think the correct fix here is to find out how linux GCs reserve address space without having the VM allocate physical backing memory.
> 
> There is a MAP_NORESERVE on the allocation but that doesn't seem to do what it's supposed to: the OOM killer kicks in and starts killing applications once KWin+plasma-desktop+other applications exceed my physical RAM (8GB).
> If I run the patched libqt4-script (which has the patched javascriptcore), then the memory usage of KDE is no longer >2GB, and the OOM killer never kicks in.
> 
> Reserving 2GB of VIRT, for a feature that is not critical for the application is not really nice...

Reserving 2gig of address space when there are terabytes (exabytes?) of address space available should be fine.  JSC and FastMalloc both make extensive use of reserved but unused address space, this seems like something where we need to know exactly what needs to happen to get linux to do the right thing (I refuse to believe that the linux VM can't handle this given that even windows can)
Comment 5 Török Edwin 2013-07-16 09:25:00 PDT
(In reply to comment #4)
> (In reply to comment #3)
> > > 32Mb is also vastly too small to successfully jit large pieces of JS so i'll assume that there were no perf tests run on this patch either.
> > 
> > Right, I have tested this only with Qt/KDE, where JIT performance doesn't really matter (in fact I'd be perfectly happy with disabling the JIT for it).
> >
> 
> Have you talked to any other Qt embedders? 
> 
> > I haven't checked how the allocation code works, but won't it allocate more pools when it runs out of the 32MB?
> 
> No.  The whole point of the FixedVMPool is that there is a single region of the address space in which all JIT compiled code goes. This allows us to guarantee that all jumps are within the 32-bit range allowed by a direct PC-relative branch.

My patch uses MAP_32BIT to guarantee that jumps are within 32-bit range.

> 
> > 
> > I've chosen 32MB because that is what the 32-bit code uses, but if a larger value would be more appropriate then both should be updated.
> > 
> 
> Desktop 32-bit does not use the FixedVMPool, I think the only 32bit platform that uses it is iOS.
> 
> 
> > > 
> > > I think the correct fix here is to find out how linux GCs reserve address space without having the VM allocate physical backing memory.
> > 
> > There is a MAP_NORESERVE on the allocation but that doesn't seem to do what it's supposed to: the OOM killer kicks in and starts killing applications once KWin+plasma-desktop+other applications exceed my physical RAM (8GB).
> > If I run the patched libqt4-script (which has the patched javascriptcore), then the memory usage of KDE is no longer >2GB, and the OOM killer never kicks in.
> > 
> > Reserving 2GB of VIRT, for a feature that is not critical for the application is not really nice...
> 
> Reserving 2gig of address space when there are terabytes (exabytes?) of address space available should be fine.  JSC and FastMalloc both make extensive use of reserved but unused address space, this seems like something where we need to know exactly what needs to happen to get linux to do the right thing (I refuse to believe that the linux VM can't handle this given that even windows can)

I'll try to write some standalone testcases with mmap to see what happens.
I'm running Linux 3.10.1, it could be a regression kernel-side.

In general it's better to not overcommit, unless you have to. In this case I think the 2GB reservation can be worked around by using MAP_32BIT, so it is not an absolute requirement to overcommit by that much.
Comment 6 Gavin Barraclough 2013-07-16 10:05:26 PDT
(In reply to comment #5)
> In general it's better to not overcommit, unless you have to. In this case I think the 2GB reservation can be worked around by using MAP_32BIT, so it is not an absolute requirement to overcommit by that much.

Agreed, but note that the memory is being allocated by a method called 'reserveUncommitted'. An implementation of this method should be allocating VM without committing, so there should be no overcommit.  Unfortunately it appears that this is a little trickier to achieve on Linux than other platforms (Windows appears to win at best VM API for this particular use case), but it seems to have been possible to coax Linux into doing so in the past, and the real fix here should be to get this going again.
Comment 7 Oliver Hunt 2013-07-16 10:08:40 PDT
(In reply to comment #5)
> (In reply to comment #4)
> > (In reply to comment #3)
> > > > 32Mb is also vastly too small to successfully jit large pieces of JS so i'll assume that there were no perf tests run on this patch either.
> > > 
> > > Right, I have tested this only with Qt/KDE, where JIT performance doesn't really matter (in fact I'd be perfectly happy with disabling the JIT for it).
> > >
> > 
> > Have you talked to any other Qt embedders? 
> > 
> > > I haven't checked how the allocation code works, but won't it allocate more pools when it runs out of the 32MB?
> > 
> > No.  The whole point of the FixedVMPool is that there is a single region of the address space in which all JIT compiled code goes. This allows us to guarantee that all jumps are within the 32-bit range allowed by a direct PC-relative branch.
> 
> My patch uses MAP_32BIT to guarantee that jumps are within 32-bit range.

As long as the FixedVMPool is 2gig or less you get that guarantee.  MAP_32BIT is irrelevant to that.

Please read my replies, I said _No_.  The FixedVMPool allocator means we have a _fixed_ pool.  There is only one allocation.  There are no more after that.  All JIT code goes into that one allocation.  If that region is exhausted, and we're unable to reclaim sufficient space we will just drop our generated code at link time and fall back to the interpreter.
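The behavior described here can be sketched as a bump allocator over one fixed region, with the caller falling back to the interpreter when the region is exhausted. This is an illustration only: `FixedPoolSketch`, `ExecutionMode` and `linkOrFallBack` are hypothetical names, and the real allocator also tries to reclaim space before giving up, which this sketch omits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// One fixed region, handed out front to back. There is never a second pool.
class FixedPoolSketch {
public:
    FixedPoolSketch(std::uintptr_t base, std::size_t size)
        : m_next(base), m_end(base + size)
    {
        assert(base != 0); // 0 is used as the "out of space" result
    }

    // Returns the start of the allocation, or 0 if the pool is exhausted.
    std::uintptr_t allocate(std::size_t bytes)
    {
        if (bytes > m_end - m_next)
            return 0;
        const std::uintptr_t result = m_next;
        m_next += bytes;
        return result;
    }

private:
    std::uintptr_t m_next;
    std::uintptr_t m_end;
};

enum class ExecutionMode { JIT, Interpreter };

// At link time: keep the generated code if it fits, otherwise drop it.
ExecutionMode linkOrFallBack(FixedPoolSketch& pool, std::size_t codeSize)
{
    return pool.allocate(codeSize) ? ExecutionMode::JIT : ExecutionMode::Interpreter;
}
```

Under this model a 32MB pool simply means that everything past the first 32MB of generated code runs interpreted, which is the performance concern raised earlier in the thread.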

> In general it's better to not overcommit, unless you have to. In this case I think the 2GB reservation can be worked around by using MAP_32BIT, so it is not an absolute requirement to overcommit by that much.

What? MAP_32BIT is not an answer here.

Do you understand what this code is doing?  Your references to MAP_32BIT are confusing me.

My understanding of MAP_32BIT is that it is a linux flag to force page allocation in the low 4 gig of the 64 bit address space, that's all.  Is my understanding of that behavior correct?
Comment 8 Török Edwin 2013-07-16 10:45:53 PDT
(In reply to comment #7)
> (In reply to comment #5)
> > (In reply to comment #4)
> > > (In reply to comment #3)
> > > > > 32Mb is also vastly too small to successfully jit large pieces of JS so i'll assume that there were no perf tests run on this patch either.
> > > > 
> > > > Right, I have tested this only with Qt/KDE, where JIT performance doesn't really matter (in fact I'd be perfectly happy with disabling the JIT for it).
> > > >
> > > 
> > > Have you talked to any other Qt embedders? 
> > > 
> > > > I haven't checked how the allocation code works, but won't it allocate more pools when it runs out of the 32MB?
> > > 
> > > No.  The whole point of the FixedVMPool is that there is a single region of the address space in which all JIT compiled code goes. This allows us to guarantee that all jumps are within the 32-bit range allowed by a direct PC-relative branch.
> > 
> > My patch uses MAP_32BIT to guarantee that jumps are within 32-bit range.
> 
> As long as the FixedVMPool is 2gig or less you get that guarantee.  MAP_32BIT is irrelevant to that.
> 
> Please read my replies, I said _No_.  The FixedVMPool allocator means we have a _fixed_ pool.  There is only one allocation.  There are no more after that.  All JIT code goes into that one allocation.  If that region is exhausted, and we're unable to reclaim sufficient space we will just drop our generated code at link time and fall back to the interpreter.
> 
> > In general it's better to not overcommit, unless you have to. In this case I think the 2GB reservation can be worked around by using MAP_32BIT, so it is not an absolute requirement to overcommit by that much.
> 
> What? MAP_32BIT is not an answer here.

> 
> Do you understand what this code is doing?  Your references to MAP_32BIT are confusing me.

Turns out my patch is indeed flawed, as I've only read/patched the part of the code that showed up in gdb, and I somehow thought you would do multiple allocations and then MAP_32BIT could've been used to keep them no more than 2GB apart. But that is ExecutableAllocatorPosix.cpp, not ExecutableAllocatorFixedVMPool.cpp.

So a possible solution would be to use ExecutableAllocatorPosix with MAP_32BIT, but let's first try to find out where the regression happened.
(I know when it happened: 2013-06-15, when I filed the Debian bug, but I upgraded Qt, KDE, and the kernel around that time.
I'm still working on that mmap testcase, and will let you know once I have a working one. Worst case, I'll try downgrading packages one by one to figure out which is the culprit.)

> 
> My understanding of MAP_32BIT is that it is a linux flag to force page allocation in the low 4 gig of the 64 bit address space, that's all.  Is my understanding of that behavior correct?

The man page says lower 2GB:
 MAP_32BIT (since Linux 2.4.20, 2.6)
              Put the mapping into the first 2 Gigabytes of the process address space.
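
A quick way to observe the man page's statement is a sketch like the one below, assuming x86-64 Linux, where `MAP_32BIT` is defined. The helper names are hypothetical. On platforms without the flag the code still compiles but performs an ordinary mapping, so the address check is only meaningful when `hasMap32Bit()` returns true.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sys/mman.h>

// True when this platform's headers provide MAP_32BIT.
constexpr bool hasMap32Bit()
{
#ifdef MAP_32BIT
    return true;
#else
    return false;
#endif
}

// Map `size` bytes of anonymous memory, requesting low placement if possible.
// Returns nullptr on failure.
void* mapLow(std::size_t size)
{
    int flags = MAP_PRIVATE | MAP_ANONYMOUS;
#ifdef MAP_32BIT
    flags |= MAP_32BIT;
#endif
    void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE, flags, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
}

// Checks that [p, p + size) lies entirely within the first 2GB.
bool endsBelow2GB(const void* p, std::size_t size)
{
    return reinterpret_cast<std::uintptr_t>(p) + size <= (std::uintptr_t{1} << 31);
}
```

Note that the flag constrains where each mapping lands, not how far apart code regions are from whatever they branch to, so it only helps the jump-range problem if every branch target also lives in that low region.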