RESOLVED FIXED 183329
Define GIGACAGE_ALLOCATION_CAN_FAIL on Linux
https://bugs.webkit.org/show_bug.cgi?id=183329
Summary Define GIGACAGE_ALLOCATION_CAN_FAIL on Linux
Jeremy Bicha
Reported 2018-03-05 06:44:40 PST
Ubuntu 18.04 (pre-Beta) recently upgraded to webkit2gtk 2.19.91. We are receiving lots of crash reports from Deja Dup at https://launchpad.net/bugs/1751460 Deja Dup is versioned 37.1. Stacktrace ---------- #0 0x00007fac7224d588 in () at /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18 #1 0x00007fac76c64827 in __pthread_once_slow (once_control=0x7fac724b502c, init_routine=0x7fac69330490 <__once_proxy>) at pthread_once.c:116 _buffer = {__routine = 0x7fac76c64880 <clear_once_control>, __arg = 0x7fac724b502c, __canceltype = 2008932720, __prev = 0x0} val = <optimized out> newval = <optimized out> #2 0x00007fac7224ce0d in Gigacage::ensureGigacage() () at /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18 #3 0x00007fac7224dd01 in bmalloc::Heap::Heap(bmalloc::HeapKind, std::lock_guard<bmalloc::StaticMutex>&) () at /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18 #4 0x00007fac7224bb10 in bmalloc::PerProcess<bmalloc::PerHeapKind<bmalloc::Heap> >::getSlowCase() () at /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18 #5 0x00007fac7224b7d4 in bmalloc::Cache::Cache(bmalloc::HeapKind) () at /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18 #6 0x00007fac7224bbf6 in bmalloc::PerThread<bmalloc::PerHeapKind<bmalloc::Cache> >::getSlowCase() () at /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18 #7 0x00007fac7224b87f in bmalloc::Cache::allocateSlowCaseNullCache(bmalloc::HeapKind, unsigned long) () at /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18 #8 0x00007fac72230f56 in WTF::StringImpl::createFromLiteral(char const*, unsigned int) () at /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18 #9 0x00007fac72230fe1 in WTF::StringImpl::createFromLiteral(char const*) () at /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18 #10 0x00007fac7223d3c0 in WTF::String::String(WTF::ASCIILiteral) () at /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18 #11 0x00007fac729f7557 in () at /usr/lib/x86_64-linux-gnu/libwebkit2gtk-4.0.so.37 #12 0x00007fac779c5733 in call_init (env=0x7ffc3677ae08, argv=0x7ffc3677adf8, argc=1, l=<optimized out>) at dl-init.c:72 j = <optimized out> jm = <optimized out> addrs = <optimized out> init_array = <optimized out> env = 0x7ffc3677ae08 argv = 0x7ffc3677adf8 argc = 1 l = <optimized out> preinit_array = <optimized out> preinit_array_size = <optimized out> i = 6 #13 0x00007fac779c5733 in _dl_init (main_map=0x7fac77bde170, argc=1, argv=0x7ffc3677adf8, env=0x7ffc3677ae08) at dl-init.c:119 preinit_array = <optimized out> preinit_array_size = <optimized out> i = 6 #14 0x00007fac779b60ca in _dl_start_user () at /lib64/ld-linux-x86-64.so.2 #15 0x0000000000000001 in () #16 0x00007ffc3677b9c3 in () #17 0x0000000000000000 in ()
Attachments
Patch (1.93 KB, patch)
2018-03-14 07:41 PDT, Michael Catanzaro
no flags
Patch (1.54 KB, patch)
2018-05-21 06:30 PDT, Yusuke Suzuki
mcatanzaro: review+
Jeremy Bicha
Comment 1 2018-03-05 06:45:36 PST
Thread Stacktrace ================= . Thread 1 (Thread 0x7fac77b6aac0 (LWP 3178)): #0 0x00007fac7224d588 in () at /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18 #1 0x00007fac76c64827 in __pthread_once_slow (once_control=0x7fac724b502c, init_routine=0x7fac69330490 <__once_proxy>) at pthread_once.c:116 _buffer = {__routine = 0x7fac76c64880 <clear_once_control>, __arg = 0x7fac724b502c, __canceltype = 2008932720, __prev = 0x0} val = <optimized out> newval = <optimized out> #2 0x00007fac7224ce0d in Gigacage::ensureGigacage() () at /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18 #3 0x00007fac7224dd01 in bmalloc::Heap::Heap(bmalloc::HeapKind, std::lock_guard<bmalloc::StaticMutex>&) () at /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18 #4 0x00007fac7224bb10 in bmalloc::PerProcess<bmalloc::PerHeapKind<bmalloc::Heap> >::getSlowCase() () at /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18 #5 0x00007fac7224b7d4 in bmalloc::Cache::Cache(bmalloc::HeapKind) () at /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18 #6 0x00007fac7224bbf6 in bmalloc::PerThread<bmalloc::PerHeapKind<bmalloc::Cache> >::getSlowCase() () at /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18 #7 0x00007fac7224b87f in bmalloc::Cache::allocateSlowCaseNullCache(bmalloc::HeapKind, unsigned long) () at /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18 #8 0x00007fac72230f56 in WTF::StringImpl::createFromLiteral(char const*, unsigned int) () at /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18 #9 0x00007fac72230fe1 in WTF::StringImpl::createFromLiteral(char const*) () at /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18 #10 0x00007fac7223d3c0 in WTF::String::String(WTF::ASCIILiteral) () at /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18 #11 0x00007fac729f7557 in () at /usr/lib/x86_64-linux-gnu/libwebkit2gtk-4.0.so.37 #12 0x00007fac779c5733 in call_init (env=0x7ffc3677ae08, argv=0x7ffc3677adf8, argc=1, l=<optimized out>) at dl-init.c:72 j = <optimized out> jm = <optimized out> addrs = <optimized out> init_array = <optimized out> env = 0x7ffc3677ae08 argv = 0x7ffc3677adf8 argc = 1 l = <optimized out> preinit_array = <optimized out> preinit_array_size = <optimized out> i = 6 #13 0x00007fac779c5733 in _dl_init (main_map=0x7fac77bde170, argc=1, argv=0x7ffc3677adf8, env=0x7ffc3677ae08) at dl-init.c:119 preinit_array = <optimized out> preinit_array_size = <optimized out> i = 6 #14 0x00007fac779b60ca in _dl_start_user () at /lib64/ld-linux-x86-64.so.2 #15 0x0000000000000001 in () #16 0x00007ffc3677b9c3 in () #17 0x0000000000000000 in ()
Carlos Garcia Campos
Comment 2 2018-03-05 06:51:05 PST
The bt looks similar to the one in https://bugs.webkit.org/show_bug.cgi?id=179914#c39
Michael Catanzaro
Comment 3 2018-03-05 09:08:26 PST
The failure occurs here: // FIXME: Randomize where this goes. // https://bugs.webkit.org/show_bug.cgi?id=175245 void* base = tryVMAllocate(maxAlignment, totalSize); if (!base) { if (GIGACAGE_ALLOCATION_CAN_FAIL) return; fprintf(stderr, "FATAL: Could not allocate gigacage memory with maxAlignment = %lu, totalSize = %lu.\n", maxAlignment, totalSize); BCRASH(); } So tryVMAllocate fails. That means bmalloc was unable to allocate virtual memory. That's not supposed to fail (obviously). Implementation is here: inline void* tryVMAllocate(size_t vmSize) { vmValidate(vmSize); void* result = mmap(0, vmSize, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON | BMALLOC_NORESERVE, BMALLOC_VM_TAG, 0); if (result == MAP_FAILED) return nullptr; return result; } So the problem boils down to this mmap call. It's very strange that this is only happening with Deja Dup. Other applications are unaffected?
Michael Catanzaro
Comment 4 2018-03-05 09:13:36 PST
Jeremy, here's some debug you could try adding to Source/bmalloc/bmalloc/VMAllocate.h: // At the top of the file, before the bmalloc namespace #include <cstring> #include <errno.h> inline void* tryVMAllocate(size_t vmSize) { vmValidate(vmSize); void* result = mmap(0, vmSize, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON | BMALLOC_NORESERVE, BMALLOC_VM_TAG, 0); if (result == MAP_FAILED) { WTFLogAlways("%s: mmap failed: %d (%s)", __FUNCTION__, errno, strerror(errno)); return nullptr; } return result; } That would tell us which of the many possible errors are occurring here. And if you need an immediate workaround, you can of course build with -DUSE_SYSTEM_MALLOC=ON. That will be bad, so I can't recommend that... but you're already disabling GStreamerGL and web fonts.... :P
Michael Catanzaro
Comment 5 2018-03-05 09:15:35 PST
(In reply to Michael Catanzaro from comment #3) > fprintf(stderr, "FATAL: Could not allocate gigacage memory > with maxAlignment = %lu, totalSize = %lu.\n", maxAlignment, totalSize); This might be helpful too. That should also print before the crash. (In reply to Michael Catanzaro from comment #4) > inline void* tryVMAllocate(size_t vmSize) > { > vmValidate(vmSize); > void* result = mmap(0, vmSize, PROT_READ | PROT_WRITE, MAP_PRIVATE | > MAP_ANON | BMALLOC_NORESERVE, BMALLOC_VM_TAG, 0); > if (result == MAP_FAILED) > { > WTFLogAlways("%s: mmap failed: %d (%s)", __FUNCTION__, errno, > strerror(errno)); > return nullptr; > } > return result; > } I forgot we can't use WTF here. Got to use fprintf: fprintf(stderr, "%s: mmap failed: %d (%s)", __FUNCTION__, errno, strerror(errno));
Yusuke Suzuki
Comment 6 2018-03-05 09:22:47 PST
(In reply to Michael Catanzaro from comment #4) > Jeremy, here's some debug you could try adding to > Source/bmalloc/bmalloc/VMAllocate.h: > > // At the top of the file, before the bmalloc namespace > #include <cstring> > #include <errno.h> > > inline void* tryVMAllocate(size_t vmSize) > { > vmValidate(vmSize); > void* result = mmap(0, vmSize, PROT_READ | PROT_WRITE, MAP_PRIVATE | > MAP_ANON | BMALLOC_NORESERVE, BMALLOC_VM_TAG, 0); > if (result == MAP_FAILED) > { > WTFLogAlways("%s: mmap failed: %d (%s)", __FUNCTION__, errno, > strerror(errno)); > return nullptr; > } > return result; > } > > That would tell us which of the many possible errors are occurring here. > > And if you need an immediate workaround, you can of course build with > -DUSE_SYSTEM_MALLOC=ON. That will be bad, so I can't recommend that... but > you're already disabling GStreamerGL and web fonts.... :P The immediate fix is disabling Gigacage by setting GIGACAGE_ENABLED 0 in bmalloc/Gigacage.h. This keeps bmalloc, but disables Gigacage. My guess is that Linux fails to mmap regions and returns MAP_FAILED if the size is very large. But I'm not sure right now since it is working on my environment... Anyway, @mcatanzaro, do you know the way to allocate virtual memory region which does not have actual backing pages?
Yusuke Suzuki
Comment 7 2018-03-05 09:48:33 PST
(In reply to Yusuke Suzuki from comment #6) > (In reply to Michael Catanzaro from comment #4) > > Jeremy, here's some debug you could try adding to > > Source/bmalloc/bmalloc/VMAllocate.h: > > > > // At the top of the file, before the bmalloc namespace > > #include <cstring> > > #include <errno.h> > > > > inline void* tryVMAllocate(size_t vmSize) > > { > > vmValidate(vmSize); > > void* result = mmap(0, vmSize, PROT_READ | PROT_WRITE, MAP_PRIVATE | > > MAP_ANON | BMALLOC_NORESERVE, BMALLOC_VM_TAG, 0); > > if (result == MAP_FAILED) > > { > > WTFLogAlways("%s: mmap failed: %d (%s)", __FUNCTION__, errno, > > strerror(errno)); > > return nullptr; > > } > > return result; > > } > > > > That would tell us which of the many possible errors are occurring here. > > > > And if you need an immediate workaround, you can of course build with > > -DUSE_SYSTEM_MALLOC=ON. That will be bad, so I can't recommend that... but > > you're already disabling GStreamerGL and web fonts.... :P > > The immediate fix is disabling Gigacage by setting GIGACAGE_ENABLED 0 in > bmalloc/Gigacage.h. > This keeps bmalloc, but disables Gigacage. > > My guess is that Linux fails to mmap regions and returns MAP_FAILED if the > size is very large. > But I'm not sure right now since it is working on my environment... > Anyway, @mcatanzaro, do you know the way to allocate virtual memory region > which does not have actual backing pages? My guess is that mmap with READ/WRITE starts populating backing pages and it fails. I think we potentially have two ways to fix this issue. The first way is, 1. first allocate region with mmap NO_PROTO 2. set read/write if necessary But since Darwin is not requiring this mechanism, I think we need to change the current code a bit. For example, vmAllocate() assumes that returned memory is accessible. But this assumption is slightly broken in Linux if we take this way. We need a bit additional code for Linux. The second way is 1. first allocate region with mmap NO_PROTO 2. deallocate backing pages 3. set read/write for this region immediately I'm not sure it works. But if it works, the change should be minimum.
Michael Catanzaro
Comment 8 2018-03-05 09:55:48 PST
One thing I just noticed, from https://bugzilla.gnome.org/show_bug.cgi?id=794056, is that this stack trace is exactly the same as the one valgrind hits when trying to use valgrind to debug epiphany since gigacage was enabled. That's probably expected, since valgrind attempts to allocate shadow bytes to keep track of the entire address space, which isn't practical under gigacage. (In reply to Yusuke Suzuki from comment #6) > Anyway, @mcatanzaro, do you know the way to allocate virtual memory region > which does not have actual backing pages? I have no clue about this. I thought actual backing pages were only employed once the memory region is actually used.
Michael Catanzaro
Comment 9 2018-03-05 10:00:35 PST
(In reply to Yusuke Suzuki from comment #6) > My guess is that Linux fails to mmap regions and returns MAP_FAILED if the > size is very large. Keep in mind, we've had gigacage enabled and working fine for half a year....
Yusuke Suzuki
Comment 10 2018-03-05 10:03:36 PST
(In reply to Michael Catanzaro from comment #8) > One thing I just noticed, from > https://bugzilla.gnome.org/show_bug.cgi?id=794056, is that this stack trace > is exactly the same as the one valgrind hits when trying to use valgrind to > debug epiphany since gigacage was enabled. That's probably expected, since > valgrind attempts to allocate shadow bytes to keep track of the entire > address space, which isn't practical under gigacage. It sounds reasonable assumption to me. > > (In reply to Yusuke Suzuki from comment #6) > > Anyway, @mcatanzaro, do you know the way to allocate virtual memory region > > which does not have actual backing pages? > > I have no clue about this. I thought actual backing pages were only employed > once the memory region is actually used. (In reply to Michael Catanzaro from comment #9) > (In reply to Yusuke Suzuki from comment #6) > > My guess is that Linux fails to mmap regions and returns MAP_FAILED if the > > size is very large. > > Keep in mind, we've had gigacage enabled and working fine for half a year.... Yeah, right. If it returns MAP_FAILED due to this issue, we should more constantly see this issue in our daily development. So my guess seems wrong.
Michael Catanzaro
Comment 11 2018-03-05 10:37:00 PST
Something could be different in Ubuntu, or in Deja Dup specifically? I'm actually really surprised that Deja Dup now depends on WebKit. Must be via gnome-online-accounts.
Jeremy Bicha
Comment 12 2018-03-05 10:39:54 PST
(In reply to Michael Catanzaro from comment #11) > I'm actually really surprised that Deja Dup now depends on WebKit. Must be > via gnome-online-accounts. Good question. I don't see any webkit code in Deja Dup but it does use GOA.
Michael Catanzaro
Comment 13 2018-03-13 09:00:09 PDT
*** Bug 183594 has been marked as a duplicate of this bug. ***
Michael Catanzaro
Comment 14 2018-03-13 09:25:25 PDT
So now we have: * Our first bug reports against applications other than Deja Dup * Our first bug report from a distro other than Ubuntu Seems like a serious problem.
Michael Catanzaro
Comment 15 2018-03-13 09:26:39 PDT
(In reply to Michael Catanzaro from comment #4) > Jeremy, here's some debug you could try adding to > Source/bmalloc/bmalloc/VMAllocate.h ^ This remains the next step here. We need to know why the mmap() call is failing.
Jeremy Bicha
Comment 16 2018-03-13 17:25:51 PDT
(In reply to Michael Catanzaro from comment #4) > Jeremy, here's some debug you could try adding to > Source/bmalloc/bmalloc/VMAllocate.h: webkit2gtk failed to build for me with those lines. https://launchpad.net/~jbicha/+archive/ubuntu/tempdeja/+sourcepub/8853741/+listing-archive-extra Build log excerpt ================= In file included from /<<PKGBUILDDIR>>/Source/bmalloc/bmalloc/SmallPage.h:32:0, from /<<PKGBUILDDIR>>/Source/bmalloc/bmalloc/Deallocator.h:31, from /<<PKGBUILDDIR>>/Source/bmalloc/bmalloc/Cache.h:31, from /<<PKGBUILDDIR>>/Source/bmalloc/bmalloc/bmalloc.h:29, from /<<PKGBUILDDIR>>/Source/bmalloc/bmalloc/BMalloced.h:28, from /<<PKGBUILDDIR>>/Source/bmalloc/bmalloc/IsoHeapImpl.h:28, from /<<PKGBUILDDIR>>/Source/bmalloc/bmalloc/AllIsoHeaps.h:28, from /<<PKGBUILDDIR>>/Source/bmalloc/bmalloc/AllIsoHeaps.cpp:26: /<<PKGBUILDDIR>>/Source/bmalloc/bmalloc/VMAllocate.h: In function ‘void* tryVMAllocate(size_t)’: /<<PKGBUILDDIR>>/Source/bmalloc/bmalloc/VMAllocate.h:48:5: error: ‘vmValidate’ was not declared in this scope vmValidate(vmSize); ^~~~~~~~~~ /<<PKGBUILDDIR>>/Source/bmalloc/bmalloc/VMAllocate.h:49:85: error: ‘BMALLOC_NORESERVE’ was not declared in this scope void* result = mmap(0, vmSize, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON | BMALLOC_NORESERVE, BMALLOC_VM_TAG, 0); ^~~~~~~~~~~~~~~~~ /<<PKGBUILDDIR>>/Source/bmalloc/bmalloc/VMAllocate.h:49:104: error: ‘BMALLOC_VM_TAG’ was not declared in this scope void* result = mmap(0, vmSize, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON | BMALLOC_NORESERVE, BMALLOC_VM_TAG, 0);
Michael Catanzaro
Comment 17 2018-03-13 17:37:08 PDT
(In reply to Jeremy Bicha from comment #16) > (In reply to Michael Catanzaro from comment #4) > > Jeremy, here's some debug you could try adding to > > Source/bmalloc/bmalloc/VMAllocate.h: > > webkit2gtk failed to build for me with those lines. Do: keep the two new #includes where you added them, at the top, above the bmalloc namespace Don't: add a new definition of tryVMAllocate. Just add the print to the one that already exists, lower in the file. Watch the braces.
Michael Catanzaro
Comment 18 2018-03-13 19:58:02 PDT
Here's a debug patch: diff --git a/Source/bmalloc/bmalloc/VMAllocate.h b/Source/bmalloc/bmalloc/VMAllocate.h index 713e9c9fbb6..5c9c8553b1a 100644 --- a/Source/bmalloc/bmalloc/VMAllocate.h +++ b/Source/bmalloc/bmalloc/VMAllocate.h @@ -35,6 +35,9 @@ #include <sys/mman.h> #include <unistd.h> +#include <cstdio> +#include <errno.h> + #if BOS(DARWIN) #include <mach/vm_page_size.h> #include <mach/vm_statistics.h> @@ -123,7 +126,10 @@ inline void* tryVMAllocate(size_t vmSize) vmValidate(vmSize); void* result = mmap(0, vmSize, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON | BMALLOC_NORESERVE, BMALLOC_VM_TAG, 0); if (result == MAP_FAILED) +{ +fprintf(stderr, "%s: mmap failed: vmSize=%zu errno=%d (%s)", __FUNCTION__, vmSize, errno, strerror(errno)); return nullptr; +} return result; }
Tomas Popela
Comment 19 2018-03-14 01:46:26 PDT
(In reply to Michael Catanzaro from comment #18) > Here's a debug patch: It's missing <cstring> include for strerror.. diff -up webkitgtk-2.20.0/Source/bmalloc/bmalloc/VMAllocate.h.bmalloc_debug webkitgtk-2.20.0/Source/bmalloc/bmalloc/VMAllocate.h --- webkitgtk-2.20.0/Source/bmalloc/bmalloc/VMAllocate.h.bmalloc_debug 2018-02-19 08:45:33.000000000 +0100 +++ webkitgtk-2.20.0/Source/bmalloc/bmalloc/VMAllocate.h 2018-03-14 09:26:14.977674938 +0100 @@ -35,6 +35,10 @@ #include <sys/mman.h> #include <unistd.h> +#include <cstdio> +#include <errno.h> +#include <cstring> + #if BOS(DARWIN) #include <mach/vm_page_size.h> #include <mach/vm_statistics.h> @@ -122,8 +126,10 @@ inline void* tryVMAllocate(size_t vmSize { vmValidate(vmSize); void* result = mmap(0, vmSize, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON | BMALLOC_NORESERVE, BMALLOC_VM_TAG, 0); - if (result == MAP_FAILED) - return nullptr; + if (result == MAP_FAILED) { + fprintf(stderr, "%s: mmap failed: vmSize=%zu errno=%d (%s)", __FUNCTION__, vmSize, errno, strerror(errno)); + return nullptr; + } return result; }
Michael Catanzaro
Comment 20 2018-03-14 06:47:41 PDT
I swear I tested it, but I tested on trunk, so maybe cstring was getting pulled in elsewhere.
Tomas Popela
Comment 21 2018-03-14 06:53:14 PDT
(In reply to Michael Catanzaro from comment #20) > I swear I tested it, but I tested on trunk, so maybe cstring was getting > pulled in elsewhere. Nevermind, see Jiri's response in https://bugzilla.redhat.com/show_bug.cgi?id=1554873#c6. Updating mutter fixed the issue..
Michael Catanzaro
Comment 22 2018-03-14 07:04:32 PDT
(In reply to Michael Catanzaro from comment #15) > ^ This remains the next step here. We need to know why the mmap() call is > failing. Jeremy discovered Deja Dup is setting a virtual memory limit in its desktop file: Exec=sh -c "ulimit -v 1000000; exec @pkglibexecdir@/deja-dup-monitor" which is incompatible with Gigacage. It needs to not do that. This is NOTABUG. Good find, Jeremy! (In reply to Tomas Popela from comment #21) > Nevermind, see Jiri's response in > https://bugzilla.redhat.com/show_bug.cgi?id=1554873#c6. Updating mutter > fixed the issue.. The mutter update would definitely fix the problem described (app fails to launch another app), but I don't see how it could possibly be related to the ensureGigacage() backtrace. Anyway, that bug is closed now too... we can always revisit if it becomes a problem again.
Jeremy Bicha
Comment 23 2018-03-14 07:11:31 PDT
To give some more context about what Deja Dup was doing… See https://salsa.debian.org/gnome-team/deja-dup/blob/debian/master/data/org.gnome.DejaDup.Monitor.desktop.in which points to https://launchpad.net/bugs/1302416 And here's the output when I ran the command via webkit2gtk 2.20 built with the debugging patch proposed earlier here $ ulimit -v 1000000; /usr/lib/deja-dup/deja-dup-monitor tryVMAllocate: mmap failed: vmSize=154618822656 errno=12 (Cannot allocate memory) FATAL: Could not allocate gigacage memory with maxAlignment = 34359738368, totalSize = 120259084288. Segmentation fault (core dumped)
Michael Catanzaro
Comment 24 2018-03-14 07:32:02 PDT
Seb from Ubuntu is worried that people will have set a vmlimit for the entire desktop session, and requests that we improve the error message here.
Michael Catanzaro
Comment 25 2018-03-14 07:41:23 PDT
Tomas Popela
Comment 26 2018-03-14 07:43:23 PDT
(In reply to Michael Catanzaro from comment #25) > Created attachment 335768 [details] > Patch Informal r+ as we spoke together on IRC..
Michael Catanzaro
Comment 27 2018-03-14 09:06:59 PDT
Filip, Geoffrey, does this error message look OK to you? Want any adjustments?
dytynenko.roman
Comment 28 2018-03-14 10:39:47 PDT
ulimit -a returned virtual memory unlimited, but I figured out that this error was triggered because I explicitly disabled overcommit by vm.overcommit_memory = 2 setting it to 1 fixed the issue
Filip Pizlo
Comment 29 2018-03-14 10:53:06 PDT
Comment on attachment 335768 [details] Patch You might want to set GIGACAEG_ALLOCATION_CAN_FAIL to true on Linux until that's fixed.
Michael Catanzaro
Comment 30 2018-03-14 12:46:10 PDT
(In reply to dytynenko.roman from comment #28) > ulimit -a returned virtual memory unlimited, but I figured out that this > error was triggered because I explicitly disabled overcommit by > vm.overcommit_memory = 2 > setting it to 1 fixed the issue Oh boy... (In reply to Filip Pizlo from comment #29) > Comment on attachment 335768 [details] > Patch > > You might want to set GIGACAEG_ALLOCATION_CAN_FAIL to true on Linux until > that's fixed. I'm not sure what we should do here. We definitely don't need to change anything for Deja Dup; it's easier to fix Deja Dup than to update WebKit, anyway. Disabling overcommit is worth considering, though, and the connection between overcommit and browser security is not obvious. * If we leave things unchanged, WebKit will crash with overcommit disabled. * If we set GIGACAGE_ALLOCATION_CAN_FAIL, WebKit will not crash, but an important security feature will be lost, and users will not notice. My inclination is to say Gigacage is sufficiently important that WebKit should crash if overcommit is disabled. Perhaps that's not very kind, though. Thoughts?
Michael Catanzaro
Comment 31 2018-03-16 09:41:04 PDT
Comment on attachment 335768 [details] Patch Let's keep this error message. System administrators should consider the needs of the software they are running before setting virtual memory limits or disabling overcommit.
WebKit Commit Bot
Comment 32 2018-03-16 10:06:07 PDT
Comment on attachment 335768 [details] Patch Clearing flags on attachment: 335768 Committed r229669: <https://trac.webkit.org/changeset/229669>
WebKit Commit Bot
Comment 33 2018-03-16 10:06:09 PDT
All reviewed patches have been landed. Closing bug.
Radar WebKit Bug Importer
Comment 34 2018-03-16 10:07:18 PDT
Yusuke Suzuki
Comment 35 2018-03-16 10:57:40 PDT
(In reply to dytynenko.roman from comment #28) > ulimit -a returned virtual memory unlimited, but I figured out that this > error was triggered because I explicitly disabled overcommit by > vm.overcommit_memory = 2 > setting it to 1 fixed the issue OVERCOMMIT_NEVER (2) makes mmap's flag MAP_NORESERVE meaningless. It makes Gigacage address space allocation failed. https://github.com/torvalds/linux/blob/master/mm/mmap.c#L1475 So it sounds reasonable to me that vm.overcommit_memory=0 or 1 fixes this issue.
Tomas Popela
Comment 36 2018-04-23 01:32:41 PDT
So it looks like there is a way of avoiding this in most cases. Florian Weimer suggested the following in https://bugzilla.redhat.com/show_bug.cgi?id=1570021: mmap(NULL, 154618822656, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory) MAP_NORESERVE does not have the intended effect with vm.overcommit_memory=2 (which enables the beancounter/disables overcommit). The standard way of reserving address space is to use a PROT_NONE mapping and make parts readable/writable as needed using mprotect later. This will work with vm.overcommit_memory=2 as well because initially PROT_NONE mappings do not count towards the commit limit. Yusuke, would you mind looking at this? You seem to know this low level stuff best.
Michael Catanzaro
Comment 37 2018-04-23 05:59:29 PDT
Of course he notices due to Emacs... OK, reopening.
Daniel Kondor
Comment 38 2018-05-09 22:33:38 PDT
Hi, I just started having problems due to this issue recently on Ubuntu 16.04.4 with libjavascriptcoregtk-4.0-18 version 2.20.1-0ubuntu0.16.04.1. Several Gnome 3.18 applications will crash on startup (including Rhythmbox 3.3, yelp, gnome-control-center, unity-control-center 15.04.0+16.04.20171130-0ubuntu1, zenity, but basically anything that depends on WebKit I guess). I get the following (or similar) error message: FATAL: Could not allocate gigacage memory with maxAlignment = 34359738368, totalSize = 103079215104. (Make sure you have not set a virtual memory limit.) Setting the environment variable GIGACAGE_ENABLED=0 is a good workaround. I have overcommit_memory = 2, and I prefer to have it that way (short reason: I develop and run various programs for scientific computations, which sometimes need a lot of memory -- in my case, it is clearly preferable to have a memory allocation fail, producing a meaningful error message than having a seemingly random crash later or invoking the OOM killer). Nevertheless, I still use Gnome desktop applications, which depend on WebKit, producing this issue. My understanding is that the only way to fix this is to use PROT_NONE when allocating address space and later use mprotect() before actually using it. I understand that depending the structure of the code, that could be a lot of work to do for a somewhat specialized use case. Let me know if I can provide any more info. Stack trace: Thread 1 (Thread 0x7ffff7edda80 (LWP 16799)): #0 0x00007fffeccdb905 in ?? () from /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18 #1 0x00007ffff66b4a99 in __pthread_once_slow (once_control=0x7fffecf49008, init_routine=0x7fffe8aceac0 <__once_proxy>) at pthread_once.c:116 #2 0x00007fffeccdb243 in Gigacage::ensureGigacage() () from /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18 #3 0x00007fffecce05a8 in bmalloc::mapToActiveHeapKind(bmalloc::HeapKind) () from /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18 #4 0x00007fffeccd9d3e in bmalloc::Cache::allocateSlowCaseNullCache(bmalloc::HeapKind, unsigned long) () from /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18 #5 0x00007fffeccbf6f6 in WTF::StringImpl::createFromLiteral(char const*, unsigned int) () from /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18 #6 0x00007fffeccbf781 in WTF::StringImpl::createFromLiteral(char const*) () from /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18 #7 0x00007fffecccbc20 in WTF::String::String(WTF::ASCIILiteral) () from /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18 #8 0x00007ffff3865a17 in ?? () from /usr/lib/x86_64-linux-gnu/libwebkit2gtk-4.0.so.37 #9 0x00007ffff7de76ba in call_init (l=<optimized out>, argc=argc@entry=1, argv=argv@entry=0x7fffffffe608, env=env@entry=0x7fffffffe618) at dl-init.c:72 #10 0x00007ffff7de77cb in call_init (env=0x7fffffffe618, argv=0x7fffffffe608, argc=1, l=<optimized out>) at dl-init.c:30 #11 _dl_init (main_map=0x7ffff7ffe168, argc=1, argv=0x7fffffffe608, env=0x7fffffffe618) at dl-init.c:120 #12 0x00007ffff7dd7c6a in _dl_start_user () from /lib64/ld-linux-x86-64.so.2 #13 0x0000000000000001 in ?? () #14 0x00007fffffffe91c in ?? () #15 0x0000000000000000 in ?? ()
Michael Catanzaro
Comment 39 2018-05-10 07:14:24 PDT
(In reply to Daniel Kondor from comment #38) > My understanding is that the only way to fix this is to use PROT_NONE when > allocating address space and later use mprotect() before actually using it. > I understand that depending the structure of the code, that could be a lot > of work to do for a somewhat specialized use case. Yeah, that's exactly what needs to happen here. I took a look at this, but wasn't sure where the mprotect()s were needed.
Yusuke Suzuki
Comment 40 2018-05-10 07:15:27 PDT
(In reply to Michael Catanzaro from comment #39) > (In reply to Daniel Kondor from comment #38) > > My understanding is that the only way to fix this is to use PROT_NONE when > > allocating address space and later use mprotect() before actually using it. > > I understand that depending the structure of the code, that could be a lot > > of work to do for a somewhat specialized use case. > > Yeah, that's exactly what needs to happen here. I took a look at this, but > wasn't sure where the mprotect()s were needed. I'm a bit worried about this approach in terms of performance...
Daniel Kondor
Comment 41 2018-05-11 03:01:45 PDT
Just another thought on a potential further implication: if PROT_NONE was the default and mprotect() used to make individual pages writable / readable, any out of bounds accesses (due to e.g. any bugs / exploits that might come up later) have some chance of triggering a segfault if it falls to another page not reserved with mprotect(). Not being familiar with the code, I'm not sure if this would be a feature or if guard pages like this are already implemented. Just to point out that any change will probably have more implications than I originally would have thought :)
Yusuke Suzuki
Comment 42 2018-05-21 06:30:57 PDT
Michael Catanzaro
Comment 43 2018-05-21 08:11:31 PDT
Comment on attachment 340850 [details] Patch Well I'm not sure if we should do this, because Gigacage is an important security feature. There are some notes at https://labs.mwrinfosecurity.com/blog/some-brief-notes-on-webkit-heap-hardening/ about all the hardening features that this will silently disable. But I guess not crashing is good, so r=me. Did you try Florian's suggestion? Were there problems with performance?
Yusuke Suzuki
Comment 44 2018-05-22 03:45:55 PDT
(In reply to Michael Catanzaro from comment #43) > Comment on attachment 340850 [details] > Patch > > Well I'm not sure if we should do this, because Gigacage is an important > security feature. There are some notes at > https://labs.mwrinfosecurity.com/blog/some-brief-notes-on-webkit-heap- > hardening/ about all the hardening features that this will silently disable. > > But I guess not crashing is good, so r=me. > > Did you try Florian's suggestion? Were there problems with performance? Currently, I'm not trying this. I think currently this patch is OK since it avoids crashing. And users can enable gigacage by enabling overcommit :)
Yusuke Suzuki
Comment 45 2018-05-22 03:49:22 PDT
Note You need to log in before you can comment on or make changes to this bug.