Bug 183329 - Define GIGACAGE_ALLOCATION_CAN_FAIL on Linux
Summary: Define GIGACAGE_ALLOCATION_CAN_FAIL on Linux
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: bmalloc (show other bugs)
Version: WebKit Nightly Build
Hardware: PC Linux
: P3 Normal
Assignee: Michael Catanzaro
URL:
Keywords: Gtk, InRadar
: 183594 (view as bug list)
Depends on:
Blocks:
 
Reported: 2018-03-05 06:44 PST by Jeremy Bicha
Modified: 2018-05-22 03:49 PDT (History)
12 users (show)

See Also:


Attachments
Patch (1.93 KB, patch)
2018-03-14 07:41 PDT, Michael Catanzaro
no flags Details | Formatted Diff | Diff
Patch (1.54 KB, patch)
2018-05-21 06:30 PDT, Yusuke Suzuki
mcatanzaro: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Jeremy Bicha 2018-03-05 06:44:40 PST
Ubuntu 18.04 (pre-Beta) recently upgraded to webkit2gtk 2.19.91. We are receiving lots of crash reports from Deja Dup at https://launchpad.net/bugs/1751460

Deja Dup is versioned 37.1.

Stacktrace
----------
#0  0x00007fac7224d588 in  () at /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18
#1  0x00007fac76c64827 in __pthread_once_slow (once_control=0x7fac724b502c, init_routine=0x7fac69330490 <__once_proxy>) at pthread_once.c:116
        _buffer = {__routine = 0x7fac76c64880 <clear_once_control>, __arg = 0x7fac724b502c, __canceltype = 2008932720, __prev = 0x0}
        val = <optimized out>
        newval = <optimized out>
#2  0x00007fac7224ce0d in Gigacage::ensureGigacage() () at /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18
#3  0x00007fac7224dd01 in bmalloc::Heap::Heap(bmalloc::HeapKind, std::lock_guard<bmalloc::StaticMutex>&) () at /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18
#4  0x00007fac7224bb10 in bmalloc::PerProcess<bmalloc::PerHeapKind<bmalloc::Heap> >::getSlowCase() () at /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18
#5  0x00007fac7224b7d4 in bmalloc::Cache::Cache(bmalloc::HeapKind) () at /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18
#6  0x00007fac7224bbf6 in bmalloc::PerThread<bmalloc::PerHeapKind<bmalloc::Cache> >::getSlowCase() () at /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18
#7  0x00007fac7224b87f in bmalloc::Cache::allocateSlowCaseNullCache(bmalloc::HeapKind, unsigned long) () at /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18
#8  0x00007fac72230f56 in WTF::StringImpl::createFromLiteral(char const*, unsigned int) () at /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18
#9  0x00007fac72230fe1 in WTF::StringImpl::createFromLiteral(char const*) () at /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18
#10 0x00007fac7223d3c0 in WTF::String::String(WTF::ASCIILiteral) () at /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18
#11 0x00007fac729f7557 in  () at /usr/lib/x86_64-linux-gnu/libwebkit2gtk-4.0.so.37
#12 0x00007fac779c5733 in call_init (env=0x7ffc3677ae08, argv=0x7ffc3677adf8, argc=1, l=<optimized out>) at dl-init.c:72
        j = <optimized out>
        jm = <optimized out>
        addrs = <optimized out>
        init_array = <optimized out>
        env = 0x7ffc3677ae08
        argv = 0x7ffc3677adf8
        argc = 1
        l = <optimized out>
        preinit_array = <optimized out>
        preinit_array_size = <optimized out>
        i = 6
#13 0x00007fac779c5733 in _dl_init (main_map=0x7fac77bde170, argc=1, argv=0x7ffc3677adf8, env=0x7ffc3677ae08) at dl-init.c:119
        preinit_array = <optimized out>
        preinit_array_size = <optimized out>
        i = 6
#14 0x00007fac779b60ca in _dl_start_user () at /lib64/ld-linux-x86-64.so.2
#15 0x0000000000000001 in  ()
#16 0x00007ffc3677b9c3 in  ()
#17 0x0000000000000000 in  ()
Comment 1 Jeremy Bicha 2018-03-05 06:45:36 PST
Thread Stacktrace
=================
.
Thread 1 (Thread 0x7fac77b6aac0 (LWP 3178)):
#0  0x00007fac7224d588 in  () at /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18
#1  0x00007fac76c64827 in __pthread_once_slow (once_control=0x7fac724b502c, init_routine=0x7fac69330490 <__once_proxy>) at pthread_once.c:116
        _buffer = {__routine = 0x7fac76c64880 <clear_once_control>, __arg = 0x7fac724b502c, __canceltype = 2008932720, __prev = 0x0}
        val = <optimized out>
        newval = <optimized out>
#2  0x00007fac7224ce0d in Gigacage::ensureGigacage() () at /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18
#3  0x00007fac7224dd01 in bmalloc::Heap::Heap(bmalloc::HeapKind, std::lock_guard<bmalloc::StaticMutex>&) () at /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18
#4  0x00007fac7224bb10 in bmalloc::PerProcess<bmalloc::PerHeapKind<bmalloc::Heap> >::getSlowCase() () at /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18
#5  0x00007fac7224b7d4 in bmalloc::Cache::Cache(bmalloc::HeapKind) () at /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18
#6  0x00007fac7224bbf6 in bmalloc::PerThread<bmalloc::PerHeapKind<bmalloc::Cache> >::getSlowCase() () at /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18
#7  0x00007fac7224b87f in bmalloc::Cache::allocateSlowCaseNullCache(bmalloc::HeapKind, unsigned long) () at /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18
#8  0x00007fac72230f56 in WTF::StringImpl::createFromLiteral(char const*, unsigned int) () at /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18
#9  0x00007fac72230fe1 in WTF::StringImpl::createFromLiteral(char const*) () at /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18
#10 0x00007fac7223d3c0 in WTF::String::String(WTF::ASCIILiteral) () at /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18
#11 0x00007fac729f7557 in  () at /usr/lib/x86_64-linux-gnu/libwebkit2gtk-4.0.so.37
#12 0x00007fac779c5733 in call_init (env=0x7ffc3677ae08, argv=0x7ffc3677adf8, argc=1, l=<optimized out>) at dl-init.c:72
        j = <optimized out>
        jm = <optimized out>
        addrs = <optimized out>
        init_array = <optimized out>
        env = 0x7ffc3677ae08
        argv = 0x7ffc3677adf8
        argc = 1
        l = <optimized out>
        preinit_array = <optimized out>
        preinit_array_size = <optimized out>
        i = 6
#13 0x00007fac779c5733 in _dl_init (main_map=0x7fac77bde170, argc=1, argv=0x7ffc3677adf8, env=0x7ffc3677ae08) at dl-init.c:119
        preinit_array = <optimized out>
        preinit_array_size = <optimized out>
        i = 6
#14 0x00007fac779b60ca in _dl_start_user () at /lib64/ld-linux-x86-64.so.2
#15 0x0000000000000001 in  ()
#16 0x00007ffc3677b9c3 in  ()
#17 0x0000000000000000 in  ()
Comment 2 Carlos Garcia Campos 2018-03-05 06:51:05 PST
The bt looks similar to the one in https://bugs.webkit.org/show_bug.cgi?id=179914#c39
Comment 3 Michael Catanzaro 2018-03-05 09:08:26 PST
The failure occurs here:

            // FIXME: Randomize where this goes.
            // https://bugs.webkit.org/show_bug.cgi?id=175245
            void* base = tryVMAllocate(maxAlignment, totalSize);
            if (!base) {
                if (GIGACAGE_ALLOCATION_CAN_FAIL)
                    return;
                fprintf(stderr, "FATAL: Could not allocate gigacage memory with maxAlignment = %lu, totalSize = %lu.\n", maxAlignment, totalSize);
                BCRASH();
            }

So tryVMAllocate fails. That means bmalloc was unable to allocate virtual memory. That's not supposed to fail (obviously). Implementation is here:

inline void* tryVMAllocate(size_t vmSize)
{
    vmValidate(vmSize);
    void* result = mmap(0, vmSize, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON | BMALLOC_NORESERVE, BMALLOC_VM_TAG, 0);
    if (result == MAP_FAILED)
        return nullptr;
    return result;
}

So the problem boils down to this mmap call. It's very strange that this is only happening with Deja Dup. Other applications are unaffected?
Comment 4 Michael Catanzaro 2018-03-05 09:13:36 PST
Jeremy, here's some debug you could try adding to Source/bmalloc/bmalloc/VMAllocate.h:

// At the top of the file, before the bmalloc namespace
#include <cstring>
#include <errno.h>

inline void* tryVMAllocate(size_t vmSize)
{
    vmValidate(vmSize);
    void* result = mmap(0, vmSize, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON | BMALLOC_NORESERVE, BMALLOC_VM_TAG, 0);
    if (result == MAP_FAILED)
{
WTFLogAlways("%s: mmap failed: %d (%s)", __FUNCTION__, errno, strerror(errno));
        return nullptr;
}
    return result;
}

That would tell us which of the many possible errors are occurring here.

And if you need an immediate workaround, you can of course build with -DUSE_SYSTEM_MALLOC=ON. That will be bad, so I can't recommend that... but you're already disabling GStreamerGL and web fonts.... :P
Comment 5 Michael Catanzaro 2018-03-05 09:15:35 PST
(In reply to Michael Catanzaro from comment #3)
>                 fprintf(stderr, "FATAL: Could not allocate gigacage memory
> with maxAlignment = %lu, totalSize = %lu.\n", maxAlignment, totalSize);

This might be helpful too. That should also print before the crash.

(In reply to Michael Catanzaro from comment #4)
> inline void* tryVMAllocate(size_t vmSize)
> {
>     vmValidate(vmSize);
>     void* result = mmap(0, vmSize, PROT_READ | PROT_WRITE, MAP_PRIVATE |
> MAP_ANON | BMALLOC_NORESERVE, BMALLOC_VM_TAG, 0);
>     if (result == MAP_FAILED)
> {
> WTFLogAlways("%s: mmap failed: %d (%s)", __FUNCTION__, errno,
> strerror(errno));
>         return nullptr;
> }
>     return result;
> }

I forgot we can't use WTF here. Got to use fprintf:

fprintf(stderr, "%s: mmap failed: %d (%s)", __FUNCTION__, errno, strerror(errno));
Comment 6 Yusuke Suzuki 2018-03-05 09:22:47 PST
(In reply to Michael Catanzaro from comment #4)
> Jeremy, here's some debug you could try adding to
> Source/bmalloc/bmalloc/VMAllocate.h:
> 
> // At the top of the file, before the bmalloc namespace
> #include <cstring>
> #include <errno.h>
> 
> inline void* tryVMAllocate(size_t vmSize)
> {
>     vmValidate(vmSize);
>     void* result = mmap(0, vmSize, PROT_READ | PROT_WRITE, MAP_PRIVATE |
> MAP_ANON | BMALLOC_NORESERVE, BMALLOC_VM_TAG, 0);
>     if (result == MAP_FAILED)
> {
> WTFLogAlways("%s: mmap failed: %d (%s)", __FUNCTION__, errno,
> strerror(errno));
>         return nullptr;
> }
>     return result;
> }
> 
> That would tell us which of the many possible errors are occurring here.
> 
> And if you need an immediate workaround, you can of course build with
> -DUSE_SYSTEM_MALLOC=ON. That will be bad, so I can't recommend that... but
> you're already disabling GStreamerGL and web fonts.... :P

The immediate fix is disabling Gigacage by setting GIGACAGE_ENABLED 0 in bmalloc/Gigacage.h.
This keeps bmalloc, but disables Gigacage.

My guess is that Linux fails to mmap regions and returns MAP_FAILED if the size is very large.
But I'm not sure right now since it is working on my environment...
Anyway, @mcatanzaro, do you know the way to allocate virtual memory region which does not have actual backing pages?
Comment 7 Yusuke Suzuki 2018-03-05 09:48:33 PST
(In reply to Yusuke Suzuki from comment #6)
> (In reply to Michael Catanzaro from comment #4)
> > Jeremy, here's some debug you could try adding to
> > Source/bmalloc/bmalloc/VMAllocate.h:
> > 
> > // At the top of the file, before the bmalloc namespace
> > #include <cstring>
> > #include <errno.h>
> > 
> > inline void* tryVMAllocate(size_t vmSize)
> > {
> >     vmValidate(vmSize);
> >     void* result = mmap(0, vmSize, PROT_READ | PROT_WRITE, MAP_PRIVATE |
> > MAP_ANON | BMALLOC_NORESERVE, BMALLOC_VM_TAG, 0);
> >     if (result == MAP_FAILED)
> > {
> > WTFLogAlways("%s: mmap failed: %d (%s)", __FUNCTION__, errno,
> > strerror(errno));
> >         return nullptr;
> > }
> >     return result;
> > }
> > 
> > That would tell us which of the many possible errors are occurring here.
> > 
> > And if you need an immediate workaround, you can of course build with
> > -DUSE_SYSTEM_MALLOC=ON. That will be bad, so I can't recommend that... but
> > you're already disabling GStreamerGL and web fonts.... :P
> 
> The immediate fix is disabling Gigacage by setting GIGACAGE_ENABLED 0 in
> bmalloc/Gigacage.h.
> This keeps bmalloc, but disables Gigacage.
> 
> My guess is that Linux fails to mmap regions and returns MAP_FAILED if the
> size is very large.
> But I'm not sure right now since it is working on my environment...
> Anyway, @mcatanzaro, do you know the way to allocate virtual memory region
> which does not have actual backing pages?

My guess is that mmap with READ/WRITE starts populating backing pages and it fails.
I think we potentially have two ways to fix this issue.

The first way is,
1. first allocate region with mmap NO_PROTO
2. set read/write if necessary

But since Darwin is not requiring this mechanism, I think we need to change the current code a bit.
For example, vmAllocate() assumes that returned memory is accessible. But this assumption is slightly broken in Linux if we take this way. We need a bit additional code for Linux.

The second way is
1. first allocate region with mmap NO_PROTO
2. deallocate backing pages
3. set read/write for this region immediately

I'm not sure it works. But if it works, the change should be minimum.
Comment 8 Michael Catanzaro 2018-03-05 09:55:48 PST
One thing I just noticed, from https://bugzilla.gnome.org/show_bug.cgi?id=794056, is that this stack trace is exactly the same as the one valgrind hits when trying to use valgrind to debug epiphany since gigacage was enabled. That's probably expected, since valgrind attempts to allocate shadow bytes to keep track of the entire address space, which isn't practical under gigacage.

(In reply to Yusuke Suzuki from comment #6)
> Anyway, @mcatanzaro, do you know the way to allocate virtual memory region
> which does not have actual backing pages?

I have no clue about this. I thought actual backing pages were only employed once the memory region is actually used.
Comment 9 Michael Catanzaro 2018-03-05 10:00:35 PST
(In reply to Yusuke Suzuki from comment #6)
> My guess is that Linux fails to mmap regions and returns MAP_FAILED if the
> size is very large.

Keep in mind, we've had gigacage enabled and working fine for half a year....
Comment 10 Yusuke Suzuki 2018-03-05 10:03:36 PST
(In reply to Michael Catanzaro from comment #8)
> One thing I just noticed, from
> https://bugzilla.gnome.org/show_bug.cgi?id=794056, is that this stack trace
> is exactly the same as the one valgrind hits when trying to use valgrind to
> debug epiphany since gigacage was enabled. That's probably expected, since
> valgrind attempts to allocate shadow bytes to keep track of the entire
> address space, which isn't practical under gigacage.

It sounds reasonable assumption to me.

> 
> (In reply to Yusuke Suzuki from comment #6)
> > Anyway, @mcatanzaro, do you know the way to allocate virtual memory region
> > which does not have actual backing pages?
> 
> I have no clue about this. I thought actual backing pages were only employed
> once the memory region is actually used.

(In reply to Michael Catanzaro from comment #9)
> (In reply to Yusuke Suzuki from comment #6)
> > My guess is that Linux fails to mmap regions and returns MAP_FAILED if the
> > size is very large.
> 
> Keep in mind, we've had gigacage enabled and working fine for half a year....

Yeah, right. If it returns MAP_FAILED due to this issue, we should more constantly see this issue in our daily development. So my guess seems wrong.
Comment 11 Michael Catanzaro 2018-03-05 10:37:00 PST
Something could be different in Ubuntu, or in Deja Dup specifically?

I'm actually really surprised that Deja Dup now depends on WebKit. Must be via gnome-online-accounts.
Comment 12 Jeremy Bicha 2018-03-05 10:39:54 PST
(In reply to Michael Catanzaro from comment #11)
> I'm actually really surprised that Deja Dup now depends on WebKit. Must be
> via gnome-online-accounts.

Good question. I don't see any webkit code in Deja Dup but it does use GOA.
Comment 13 Michael Catanzaro 2018-03-13 09:00:09 PDT
*** Bug 183594 has been marked as a duplicate of this bug. ***
Comment 14 Michael Catanzaro 2018-03-13 09:25:25 PDT
So now we have:

 * Our first bug reports against applications other than Deja Dup
 * Our first bug report from a distro other than Ubuntu

Seems like a serious problem.
Comment 15 Michael Catanzaro 2018-03-13 09:26:39 PDT
(In reply to Michael Catanzaro from comment #4)
> Jeremy, here's some debug you could try adding to
> Source/bmalloc/bmalloc/VMAllocate.h

^ This remains the next step here. We need to know why the mmap() call is failing.
Comment 16 Jeremy Bicha 2018-03-13 17:25:51 PDT
(In reply to Michael Catanzaro from comment #4)
> Jeremy, here's some debug you could try adding to
> Source/bmalloc/bmalloc/VMAllocate.h:

webkit2gtk failed to build for me with those lines.

https://launchpad.net/~jbicha/+archive/ubuntu/tempdeja/+sourcepub/8853741/+listing-archive-extra

Build log excerpt
=================

In file included from /<<PKGBUILDDIR>>/Source/bmalloc/bmalloc/SmallPage.h:32:0,
                 from /<<PKGBUILDDIR>>/Source/bmalloc/bmalloc/Deallocator.h:31,
                 from /<<PKGBUILDDIR>>/Source/bmalloc/bmalloc/Cache.h:31,
                 from /<<PKGBUILDDIR>>/Source/bmalloc/bmalloc/bmalloc.h:29,
                 from /<<PKGBUILDDIR>>/Source/bmalloc/bmalloc/BMalloced.h:28,
                 from /<<PKGBUILDDIR>>/Source/bmalloc/bmalloc/IsoHeapImpl.h:28,
                 from /<<PKGBUILDDIR>>/Source/bmalloc/bmalloc/AllIsoHeaps.h:28,
                 from /<<PKGBUILDDIR>>/Source/bmalloc/bmalloc/AllIsoHeaps.cpp:26:
/<<PKGBUILDDIR>>/Source/bmalloc/bmalloc/VMAllocate.h: In function ‘void* tryVMAllocate(size_t)’:
/<<PKGBUILDDIR>>/Source/bmalloc/bmalloc/VMAllocate.h:48:5: error: ‘vmValidate’ was not declared in this scope
     vmValidate(vmSize);
     ^~~~~~~~~~
/<<PKGBUILDDIR>>/Source/bmalloc/bmalloc/VMAllocate.h:49:85: error: ‘BMALLOC_NORESERVE’ was not declared in this scope
     void* result = mmap(0, vmSize, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON | BMALLOC_NORESERVE, BMALLOC_VM_TAG, 0);
                                                                                     ^~~~~~~~~~~~~~~~~
/<<PKGBUILDDIR>>/Source/bmalloc/bmalloc/VMAllocate.h:49:104: error: ‘BMALLOC_VM_TAG’ was not declared in this scope
     void* result = mmap(0, vmSize, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON | BMALLOC_NORESERVE, BMALLOC_VM_TAG, 0);
Comment 17 Michael Catanzaro 2018-03-13 17:37:08 PDT
(In reply to Jeremy Bicha from comment #16)
> (In reply to Michael Catanzaro from comment #4)
> > Jeremy, here's some debug you could try adding to
> > Source/bmalloc/bmalloc/VMAllocate.h:
> 
> webkit2gtk failed to build for me with those lines.

Do: keep the two new #includes where you added them, at the top, above the bmalloc namespace

Don't: add a new definition of tryVMAllocate. Just add the print to the one that already exists, lower in the file. Watch the braces.
Comment 18 Michael Catanzaro 2018-03-13 19:58:02 PDT
Here's a debug patch:

diff --git a/Source/bmalloc/bmalloc/VMAllocate.h b/Source/bmalloc/bmalloc/VMAllocate.h
index 713e9c9fbb6..5c9c8553b1a 100644
--- a/Source/bmalloc/bmalloc/VMAllocate.h
+++ b/Source/bmalloc/bmalloc/VMAllocate.h
@@ -35,6 +35,9 @@
 #include <sys/mman.h>
 #include <unistd.h>
 
+#include <cstdio>
+#include <errno.h>
+
 #if BOS(DARWIN)
 #include <mach/vm_page_size.h>
 #include <mach/vm_statistics.h>
@@ -123,7 +126,10 @@ inline void* tryVMAllocate(size_t vmSize)
     vmValidate(vmSize);
     void* result = mmap(0, vmSize, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON | BMALLOC_NORESERVE, BMALLOC_VM_TAG, 0);
     if (result == MAP_FAILED)
+{
+fprintf(stderr, "%s: mmap failed: vmSize=%zu errno=%d (%s)", __FUNCTION__, vmSize, errno, strerror(errno));
         return nullptr;
+}
     return result;
 }
Comment 19 Tomas Popela 2018-03-14 01:46:26 PDT
(In reply to Michael Catanzaro from comment #18)
> Here's a debug patch:

It's missing <cstring> include for strerror..

diff -up webkitgtk-2.20.0/Source/bmalloc/bmalloc/VMAllocate.h.bmalloc_debug webkitgtk-2.20.0/Source/bmalloc/bmalloc/VMAllocate.h
--- webkitgtk-2.20.0/Source/bmalloc/bmalloc/VMAllocate.h.bmalloc_debug	2018-02-19 08:45:33.000000000 +0100
+++ webkitgtk-2.20.0/Source/bmalloc/bmalloc/VMAllocate.h	2018-03-14 09:26:14.977674938 +0100
@@ -35,6 +35,10 @@
 #include <sys/mman.h>
 #include <unistd.h>
 
+#include <cstdio>
+#include <errno.h>
+#include <cstring>
+
 #if BOS(DARWIN)
 #include <mach/vm_page_size.h>
 #include <mach/vm_statistics.h>
@@ -122,8 +126,10 @@ inline void* tryVMAllocate(size_t vmSize
 {
     vmValidate(vmSize);
     void* result = mmap(0, vmSize, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON | BMALLOC_NORESERVE, BMALLOC_VM_TAG, 0);
-    if (result == MAP_FAILED)
-        return nullptr;
+    if (result == MAP_FAILED) {
+         fprintf(stderr, "%s: mmap failed: vmSize=%zu errno=%d (%s)", __FUNCTION__, vmSize, errno, strerror(errno));
+         return nullptr;
+    }
     return result;
 }
Comment 20 Michael Catanzaro 2018-03-14 06:47:41 PDT
I swear I tested it, but I tested on trunk, so maybe cstring was getting pulled in elsewhere.
Comment 21 Tomas Popela 2018-03-14 06:53:14 PDT
(In reply to Michael Catanzaro from comment #20)
> I swear I tested it, but I tested on trunk, so maybe cstring was getting
> pulled in elsewhere.

Nevermind, see Jiri's response in https://bugzilla.redhat.com/show_bug.cgi?id=1554873#c6. Updating mutter fixed the issue..
Comment 22 Michael Catanzaro 2018-03-14 07:04:32 PDT
(In reply to Michael Catanzaro from comment #15)
> ^ This remains the next step here. We need to know why the mmap() call is
> failing.

Jeremy discovered Deja Dup is setting a virtual memory limit in its desktop file:

Exec=sh -c "ulimit -v 1000000; exec @pkglibexecdir@/deja-dup-monitor"

which is incompatible with Gigacage. It needs to not do that. This is NOTABUG. Good find, Jeremy!

(In reply to Tomas Popela from comment #21)
> Nevermind, see Jiri's response in
> https://bugzilla.redhat.com/show_bug.cgi?id=1554873#c6. Updating mutter
> fixed the issue..

The mutter update would definitely fix the problem described (app fails to launch another app), but I don't see how it could possibly be related to the ensureGigacage() backtrace. Anyway, that bug is closed now too... we can always revisit if it becomes a problem again.
Comment 23 Jeremy Bicha 2018-03-14 07:11:31 PDT
To give some more context about what Deja Dup was doing…

See 
https://salsa.debian.org/gnome-team/deja-dup/blob/debian/master/data/org.gnome.DejaDup.Monitor.desktop.in

which points to https://launchpad.net/bugs/1302416

And here's the output when I ran the command via webkit2gtk 2.20 built with the debugging patch proposed earlier here

$ ulimit -v 1000000; /usr/lib/deja-dup/deja-dup-monitor
tryVMAllocate: mmap failed: vmSize=154618822656 errno=12 (Cannot allocate memory)
FATAL: Could not allocate gigacage memory with
 maxAlignment = 34359738368, totalSize = 120259084288.
Segmentation fault (core dumped)
Comment 24 Michael Catanzaro 2018-03-14 07:32:02 PDT
Seb from Ubuntu is worried that people will have set a vmlimit for the entire desktop session, and requests that we improve the error message here.
Comment 25 Michael Catanzaro 2018-03-14 07:41:23 PDT
Created attachment 335768 [details]
Patch
Comment 26 Tomas Popela 2018-03-14 07:43:23 PDT
(In reply to Michael Catanzaro from comment #25)
> Created attachment 335768 [details]
> Patch

Informal r+ as we spoke together on IRC..
Comment 27 Michael Catanzaro 2018-03-14 09:06:59 PDT
Filip, Geoffrey, does this error message look OK to you? Want any adjustments?
Comment 28 dytynenko.roman 2018-03-14 10:39:47 PDT
ulimit -a returned virtual memory unlimited, but I figured out that this error was triggered because I explicitly disabled overcommit by
vm.overcommit_memory = 2
setting it to 1 fixed the issue
Comment 29 Filip Pizlo 2018-03-14 10:53:06 PDT
Comment on attachment 335768 [details]
Patch

You might want to set GIGACAEG_ALLOCATION_CAN_FAIL to true on Linux until that's fixed.
Comment 30 Michael Catanzaro 2018-03-14 12:46:10 PDT
(In reply to dytynenko.roman from comment #28)
> ulimit -a returned virtual memory unlimited, but I figured out that this
> error was triggered because I explicitly disabled overcommit by
> vm.overcommit_memory = 2
> setting it to 1 fixed the issue

Oh boy...

(In reply to Filip Pizlo from comment #29)
> Comment on attachment 335768 [details]
> Patch
> 
> You might want to set GIGACAEG_ALLOCATION_CAN_FAIL to true on Linux until
> that's fixed.

I'm not sure what we should do here. We definitely don't need to change anything for Deja Dup; it's easier to fix Deja Dup than to update WebKit, anyway. Disabling overcommit is worth considering, though, and the connection between overcommit and browser security is not obvious.

 * If we leave things unchanged, WebKit will crash with overcommit disabled.
 * If we set GIGACAGE_ALLOCATION_CAN_FAIL, WebKit will not crash, but an important security feature will be lost, and users will not notice.

My inclination is to say Gigacage is sufficiently important that WebKit should crash if overcommit is disabled. Perhaps that's not very kind, though. Thoughts?
Comment 31 Michael Catanzaro 2018-03-16 09:41:04 PDT
Comment on attachment 335768 [details]
Patch

Let's keep this error message. System administrators should consider the needs of the software they are running before setting virtual memory limits or disabling overcommit.
Comment 32 WebKit Commit Bot 2018-03-16 10:06:07 PDT
Comment on attachment 335768 [details]
Patch

Clearing flags on attachment: 335768

Committed r229669: <https://trac.webkit.org/changeset/229669>
Comment 33 WebKit Commit Bot 2018-03-16 10:06:09 PDT
All reviewed patches have been landed.  Closing bug.
Comment 34 Radar WebKit Bug Importer 2018-03-16 10:07:18 PDT
<rdar://problem/38548430>
Comment 35 Yusuke Suzuki 2018-03-16 10:57:40 PDT
(In reply to dytynenko.roman from comment #28)
> ulimit -a returned virtual memory unlimited, but I figured out that this
> error was triggered because I explicitly disabled overcommit by
> vm.overcommit_memory = 2
> setting it to 1 fixed the issue

OVERCOMMIT_NEVER (2) makes mmap's flag MAP_NORESERVE meaningless.
It makes Gigacage address space allocation failed.
https://github.com/torvalds/linux/blob/master/mm/mmap.c#L1475
So it sounds reasonable to me that vm.overcommit_memory=0 or 1 fixes this issue.
Comment 36 Tomas Popela 2018-04-23 01:32:41 PDT
So it looks like there is a way of avoiding this in most cases. Florian Weimer suggested the following in https://bugzilla.redhat.com/show_bug.cgi?id=1570021:


mmap(NULL, 154618822656, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)

MAP_NORESERVE does not have the intended effect with vm.overcommit_memory=2 (which enables the beancounter/disables overcommit).

The standard way of reserving address space is to use a PROT_NONE mapping and make parts readable/writable as needed using mprotect later.  This will work with vm.overcommit_memory=2 as well because initially PROT_NONE mappings do not count towards the commit limit.


Yusuke, would you mind looking at this? You seem to know this low level stuff best.
Comment 37 Michael Catanzaro 2018-04-23 05:59:29 PDT
Of course he notices due to Emacs... OK, reopening.
Comment 38 Daniel Kondor 2018-05-09 22:33:38 PDT
Hi,

I just started having problems due to this issue recently on Ubuntu 16.04.4 with libjavascriptcoregtk-4.0-18 version 2.20.1-0ubuntu0.16.04.1. Several Gnome 3.18 applications will crash on startup (including Rhythmbox 3.3, yelp, gnome-control-center, unity-control-center 15.04.0+16.04.20171130-0ubuntu1, zenity, but basically anything that depends on WebKit I guess). I get the following (or similar) error message:

FATAL: Could not allocate gigacage memory with maxAlignment = 34359738368, totalSize = 103079215104.
(Make sure you have not set a virtual memory limit.)

Setting the environment variable GIGACAGE_ENABLED=0 is a good workaround.

I have overcommit_memory = 2, and I prefer to have it that way (short reason: I develop and run various programs for scientific computations, which sometimes need a lot of memory -- in my case, it is clearly preferable to have a memory allocation fail, producing a meaningful error message than having a seemingly random crash later or invoking the OOM killer). Nevertheless, I still use Gnome desktop applications, which depend on WebKit, producing this issue. 

My understanding is that the only way to fix this is to use PROT_NONE when allocating address space and later use mprotect() before actually using it. I understand that depending the structure of the code, that could be a lot of work to do for a somewhat specialized use case.

Let me know if I can provide any more info.


Stack trace:

Thread 1 (Thread 0x7ffff7edda80 (LWP 16799)):
#0  0x00007fffeccdb905 in ?? ()
   from /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18
#1  0x00007ffff66b4a99 in __pthread_once_slow (once_control=0x7fffecf49008, 
    init_routine=0x7fffe8aceac0 <__once_proxy>) at pthread_once.c:116
#2  0x00007fffeccdb243 in Gigacage::ensureGigacage() ()
   from /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18
#3  0x00007fffecce05a8 in bmalloc::mapToActiveHeapKind(bmalloc::HeapKind) ()
   from /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18
#4  0x00007fffeccd9d3e in bmalloc::Cache::allocateSlowCaseNullCache(bmalloc::HeapKind, unsigned long) ()
   from /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18
#5  0x00007fffeccbf6f6 in WTF::StringImpl::createFromLiteral(char const*, unsigned int) () from /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18
#6  0x00007fffeccbf781 in WTF::StringImpl::createFromLiteral(char const*) ()
   from /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18
#7  0x00007fffecccbc20 in WTF::String::String(WTF::ASCIILiteral) ()
   from /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18
#8  0x00007ffff3865a17 in ?? ()
   from /usr/lib/x86_64-linux-gnu/libwebkit2gtk-4.0.so.37
#9  0x00007ffff7de76ba in call_init (l=<optimized out>, argc=argc@entry=1, 
    argv=argv@entry=0x7fffffffe608, env=env@entry=0x7fffffffe618)
    at dl-init.c:72
#10 0x00007ffff7de77cb in call_init (env=0x7fffffffe618, argv=0x7fffffffe608, 
    argc=1, l=<optimized out>) at dl-init.c:30
#11 _dl_init (main_map=0x7ffff7ffe168, argc=1, argv=0x7fffffffe608, 
    env=0x7fffffffe618) at dl-init.c:120
#12 0x00007ffff7dd7c6a in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#13 0x0000000000000001 in ?? ()
#14 0x00007fffffffe91c in ?? ()
#15 0x0000000000000000 in ?? ()
Comment 39 Michael Catanzaro 2018-05-10 07:14:24 PDT
(In reply to Daniel Kondor from comment #38)
> My understanding is that the only way to fix this is to use PROT_NONE when
> allocating address space and later use mprotect() before actually using it.
> I understand that depending the structure of the code, that could be a lot
> of work to do for a somewhat specialized use case.

Yeah, that's exactly what needs to happen here. I took a look at this, but wasn't sure where the mprotect()s were needed.
Comment 40 Yusuke Suzuki 2018-05-10 07:15:27 PDT
(In reply to Michael Catanzaro from comment #39)
> (In reply to Daniel Kondor from comment #38)
> > My understanding is that the only way to fix this is to use PROT_NONE when
> > allocating address space and later use mprotect() before actually using it.
> > I understand that depending the structure of the code, that could be a lot
> > of work to do for a somewhat specialized use case.
> 
> Yeah, that's exactly what needs to happen here. I took a look at this, but
> wasn't sure where the mprotect()s were needed.

I'm a bit worried about this approach in terms of performance...
Comment 41 Daniel Kondor 2018-05-11 03:01:45 PDT
Just another thought on a potential further implication:
if PROT_NONE was the default and mprotect() used to make individual pages writable / readable, any out of bounds accesses (due to e.g. any bugs / exploits that might come up later) have some chance of triggering a segfault if it falls to another page not reserved with mprotect(). 

Not being familiar with the code, I'm not sure if this would be a feature or if guard pages like this are already implemented. Just to point out that any change will probably have more implications than I originally would have thought :)
Comment 42 Yusuke Suzuki 2018-05-21 06:30:57 PDT
Created attachment 340850 [details]
Patch
Comment 43 Michael Catanzaro 2018-05-21 08:11:31 PDT
Comment on attachment 340850 [details]
Patch

Well I'm not sure if we should do this, because Gigacage is an important security feature. There are some notes at https://labs.mwrinfosecurity.com/blog/some-brief-notes-on-webkit-heap-hardening/ about all the hardening features that this will silently disable.

But I guess not crashing is good, so r=me.

Did you try Florian's suggestion? Were there problems with performance?
Comment 44 Yusuke Suzuki 2018-05-22 03:45:55 PDT
(In reply to Michael Catanzaro from comment #43)
> Comment on attachment 340850 [details]
> Patch
> 
> Well I'm not sure if we should do this, because Gigacage is an important
> security feature. There are some notes at
> https://labs.mwrinfosecurity.com/blog/some-brief-notes-on-webkit-heap-
> hardening/ about all the hardening features that this will silently disable.
> 
> But I guess not crashing is good, so r=me.
> 
> Did you try Florian's suggestion? Were there problems with performance?

Currently, I'm not trying this.
I think currently this patch is OK since it avoids crashing. And users can enable gigacage by enabling overcommit :)
Comment 45 Yusuke Suzuki 2018-05-22 03:49:22 PDT
Committed r232059: <https://trac.webkit.org/changeset/232059>