Bug 170628

Summary: WebAssembly: manage memory better
Product: WebKit Reporter: JF Bastien <jfbastien>
Component: JavaScriptCoreAssignee: JF Bastien <jfbastien>
Status: RESOLVED FIXED    
Severity: Normal CC: benjamin, buildbot, cdumez, cmarcelo, commit-queue, dbates, fpizlo, jbedard, jfbastien, keith_miller, mark.lam, msaboff, ryanhaddad, sbarati
Priority: P2    
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugs.webkit.org/show_bug.cgi?id=163600
https://bugs.webkit.org/show_bug.cgi?id=170218
https://bugs.webkit.org/show_bug.cgi?id=169892
https://bugs.webkit.org/show_bug.cgi?id=169882
https://bugs.webkit.org/show_bug.cgi?id=169713
https://bugs.webkit.org/show_bug.cgi?id=169728
https://bugs.webkit.org/show_bug.cgi?id=165704
https://bugs.webkit.org/show_bug.cgi?id=165935
Bug Depends on:    
Bug Blocks: 159775, 170748, 170773, 170774, 170788, 170826, 170690, 170751, 170825, 170832, 170909    
Attachments:
Description Flags
preliminary patch
jfbastien: commit-queue-
add multi-agent test
jfbastien: commit-queue-
tune GC after discussing with Fil
jfbastien: commit-queue-
Updates
jfbastien: commit-queue-
Small fix
jfbastien: commit-queue-
patch
keith_miller: review-, keith_miller: commit-queue-
patch
none
patch
keith_miller: review+, keith_miller: commit-queue-
patch
commit-queue: commit-queue-
patch none

Description JF Bastien 2017-04-07 17:12:31 PDT
We could do way better with how we manage fast memories. I've got an upcoming patch which does a bunch of stuff to improve this situation.
Comment 1 JF Bastien 2017-04-07 17:13:00 PDT
*** Bug 163600 has been marked as a duplicate of this bug. ***
Comment 2 JF Bastien 2017-04-07 17:13:19 PDT
*** Bug 170218 has been marked as a duplicate of this bug. ***
Comment 3 JF Bastien 2017-04-07 17:13:28 PDT
*** Bug 169892 has been marked as a duplicate of this bug. ***
Comment 4 JF Bastien 2017-04-07 17:13:38 PDT
*** Bug 169882 has been marked as a duplicate of this bug. ***
Comment 5 JF Bastien 2017-04-07 17:13:55 PDT
*** Bug 169713 has been marked as a duplicate of this bug. ***
Comment 6 JF Bastien 2017-04-07 17:14:23 PDT
*** Bug 169728 has been marked as a duplicate of this bug. ***
Comment 7 JF Bastien 2017-04-07 17:14:32 PDT
*** Bug 165704 has been marked as a duplicate of this bug. ***
Comment 8 JF Bastien 2017-04-07 17:14:42 PDT
*** Bug 165935 has been marked as a duplicate of this bug. ***
Comment 9 JF Bastien 2017-04-07 17:54:34 PDT
Created attachment 306558 [details]
preliminary patch

Here's a first stab at it. There are a bunch more things to do, but the big parts are in place and could be reviewed.
Comment 10 JF Bastien 2017-04-10 10:52:20 PDT
Created attachment 306710 [details]
add multi-agent test

Still WIP.
Comment 11 JF Bastien 2017-04-10 13:54:45 PDT
Created attachment 306738 [details]
tune GC after discussing with Fil

WIP
Comment 12 JF Bastien 2017-04-10 16:27:21 PDT
Created attachment 306754 [details]
Updates

A few fixes, and after discussion with Fil we're back to calling vm.heap.collectAllGarbage(); when we fail getting a memory from the cache. On my desktop the GC thread was getting behind if I had something else (like a compile) running, and I wasn't reliably getting fast memories. This update makes it 100% reliable that we'll get fast memories on my desktop.

This is the test I ran:

const memories = 128;
const verbose = true;
const initial = 1;

let types = {};

for (let m = 0; m < memories; ++m) {
    let memory = new WebAssembly.Memory({ initial: initial });
    let type = WebAssemblyMemoryMode(memory);
    types[type] = types[type] ? types[type] + 1 : 1;
}

if (verbose) {
    let got = "Got: ";
    for (let p in types)
        got += ` ${types[p]}: ${p}`;
    print(got);
}
Comment 13 JF Bastien 2017-04-10 21:29:21 PDT
Created attachment 306773 [details]
Small fix

WIP
Comment 14 JF Bastien 2017-04-11 16:48:32 PDT
Created attachment 306873 [details]
patch

Ready for review!
Comment 15 Build Bot 2017-04-11 16:51:13 PDT
Attachment 306873 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/wasm/WasmMemory.cpp:85:  Multi line control clauses should use braces.  [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/wasm/WasmMemory.cpp:141:  One line control clauses should not use braces.  [whitespace/braces] [4]
ERROR: Source/JavaScriptCore/wasm/WasmMemory.cpp:285:  Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons.  [readability/comparison_to_zero] [5]
ERROR: Source/JavaScriptCore/wasm/WasmMemory.cpp:305:  Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons.  [readability/comparison_to_zero] [5]
ERROR: Source/JavaScriptCore/wasm/WasmB3IRGenerator.cpp:558:  When wrapping a line, only indent 4 spaces.  [whitespace/indent] [3]
ERROR: Source/JavaScriptCore/wasm/WasmMemory.h:64:  The parameter name "vm" adds no information, so it should be removed.  [readability/parameter_name] [5]
ERROR: Source/JavaScriptCore/wasm/WasmMemory.h:89:  The parameter name "vm" adds no information, so it should be removed.  [readability/parameter_name] [5]
Total errors found: 7 in 37 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 16 Keith Miller 2017-04-11 21:11:14 PDT
Comment on attachment 306873 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=306873&action=review

r- on removal of locking for memories, until convinced otherwise. Also, have some nits.

> JSTests/wasm/function-tests/memory-access-past-4gib.js:61
> +    const memory = new WebAssembly.Memory(memoryDeclaration);
> +    if (verbose)
> +        print(WebAssemblyMemoryMode(memory));

Why not move this out of the loop?

> Source/JavaScriptCore/ChangeLog:31
> +        watermark of fast memories was allocated.

Maybe we should have a system where Wasm memories get unmapped if enough GCs happen without them reentering use.

> Source/JavaScriptCore/ChangeLog:50
> +        on iOS. Once we fix th above issues we'll want to re-visit and

typo: th => the

> Source/JavaScriptCore/ChangeLog:64
> +        * b3/B3LowerToAir.cpp: fix a baaaaddd bug where unsigned->signed
> +        conversion allowed out-of-bounds reads by -2GiB. I'll follow-up in
> +        bug #170692 to prevent this type of bug once and for all.

Interesting, I'm surprised that no tests caught this.

> Source/JavaScriptCore/b3/B3LowerToAir.cpp:3177
> +                RELEASE_ASSERT(limitValue <= value->redzoneLimit());

I would make this a debug assert.

> Source/JavaScriptCore/wasm/WasmB3IRGenerator.cpp:353
> +                RELEASE_ASSERT(m_memorySizeGPR == pinnedGPR);
> +                break;
> +            case MemoryMode::Signaling:
> +                RELEASE_ASSERT(InvalidGPRReg == pinnedGPR);
> +                break;
> +            case MemoryMode::NumberOfMemoryModes:
> +                RELEASE_ASSERT_NOT_REACHED();

Nit: I'm not sure these need to be release asserts. It seems unlikely this will catch any release bugs in practice. This function might also be called many times.

> Source/JavaScriptCore/wasm/WasmMemory.cpp:51
> +constexpr size_t fastMemoryCacheHardLimit { 16};

nit: { 16} => { 16 }

> Source/JavaScriptCore/wasm/WasmMemory.cpp:58
> +std::atomic<void*> fastMemoryCache[fastMemoryCacheHardLimit] = { ATOMIC_VAR_INIT(nullptr) };
> +std::atomic<void*> currentlyActiveFastMemories[fastMemoryAllocationSoftLimit] = { ATOMIC_VAR_INIT(nullptr) };
> +std::atomic<size_t> currentlyAllocatedFastMemories = ATOMIC_VAR_INIT(0);
> +std::atomic<size_t> observedMaximumFastMemory = ATOMIC_VAR_INIT(0);

I'm not sure I see what the real advantage of not using a lock here. It feels like it just makes the other code more complicated. It seems really unlikely that we are going to be allocating more than one memory at the same time in different threads. Even if we are, we potentially do a full GC anyway, so we can't be too worried about a latency problem here.

> Source/JavaScriptCore/wasm/WasmMemory.cpp:124
> +            dataLogLnIf(verbose && memory, "tryGetFastMemory waited on GC and retried successfully");

How is this ever true?

> Websites/webkit.org/docs/b3/intermediate-representation.html:661
>          WasmBoundsCheckValue class.</dd>

You should update this to comment on the InvalidGPR case and the maximum.
Comment 17 JF Bastien 2017-04-11 22:02:18 PDT
(In reply to Keith Miller from comment #16)
> Comment on attachment 306873 [details]
> patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=306873&action=review
> 
> r- on removal of locking for memories, until convinced otherwise. Also, have
> some nits.

I removed locking for a few reasons which I should better describe.

The main one is that I'd rather not grab a lock in a signal handler. It's unlikely to cause badness here given the scope of this lock, but is generally a bad idea because in general signals should happen when we've messed up... except here, we're doing it on purpose, which doesn't mean we won't mess up elsewhere! This will get more complicated once we add support for multiple threads sharing memories and being able to grow their memories. Right now I think it's benign, but I'd rather not have to concern myself with this.

It seems weird to acquire a lock for the GC as well. I guess we could make the watermark update under the lock, and the read racy.

I want to perform this initialization super early in the process. That's my main reasoning for moving to a static array (well, and simplicity).

So it's not about speed or reduced contention. In fact, I'd use something else than a FIFO if these were a concern. I stayed with a FIFO because it's super easy: visit the bounded list from start to finish, until we find what we want.

So I think this approach is better, not because it's faster or scales more (we're about to do syscalls anyways), but because it's a statically-sized array traversal, nothing more.

I should update the ChangeLog with that info.
Comment 18 JF Bastien 2017-04-12 09:18:14 PDT
(In reply to JF Bastien from comment #17)
> (In reply to Keith Miller from comment #16)
> > Comment on attachment 306873 [details]
> > patch
> > 
> > View in context:
> > https://bugs.webkit.org/attachment.cgi?id=306873&action=review
> > 
> > r- on removal of locking for memories, until convinced otherwise. Also, have
> > some nits.
> 
> I removed locking for a few reasons which I should better describe.
> 
> The main one is that I'd rather not grab a lock in a signal handler. It's
> unlikely to cause badness here given the scope of this lock, but is
> generally a bad idea because in general signals should happen when we've
> messed up... except here, we're doing it on purpose, which doesn't mean we
> won't mess up elsewhere! This will get more complicated once we add support
> for multiple threads sharing memories and being able to grow their memories.
> Right now I think it's benign, but I'd rather not have to concern myself
> with this.
> 
> It seems weird to acquire a lock for the GC as well. I guess we could make
> the watermark update under the lock, and the read racy.
> 
> I want to perform this initialization super early in the process. That's my
> main reasoning for moving to a static array (well, and simplicity).
> 
> So it's not about speed or reduced contention. In fact, I'd use something
> else than a FIFO if these were a concern. I stayed with a FIFO because it's
> super easy: visit the bounded list from start to finish, until we find what
> we want.
> 
> So I think this approach is better, not because it's faster or scales more
> (we're about to do syscalls anyways), but because it's a statically-sized
> array traversal, nothing more.
> 
> I should update the ChangeLog with that info.

Forgot to say: the original reason I wasn't thrilled by this was that the fast memory code uses CRASH() when things go bad, and in release the following happens:

    *(int *)(uintptr_t)0xbbadbeef = 0;

That's great for backtraces... except it's a segfault, so the signal handler will be called first, and try to determine how bad 0xbbadbeef really is, which will acquire the fast memory lock. If we already held it then we deadlock, don't get a crash report, and get a bad user experience.

There are 3 possible fixes to this:

 1. Don't acquire a lock when figuring out whether the address is in range.
 2. Always release the lock before CRASH().
 3. Don't use locks.

I think they're all manageable, but I think 3. is the hardest to mess up in that way later.

I agree, though, that atomics are harder than locks. I tried to keep it simple with a FIFO, but I can add more comments and details in the ChangeLog.
Comment 19 JF Bastien 2017-04-12 10:25:13 PDT
Created attachment 306919 [details]
patch

> JSTests/wasm/function-tests/memory-access-past-4gib.js:61
> > +    const memory = new WebAssembly.Memory(memoryDeclaration);
> > +    if (verbose)
> > +        print(WebAssemblyMemoryMode(memory));
> 
> Why not move this out of the loop?

Because I want to get different memories with different declarations each time, and would rather not pre-allocate the memory before the loop.


> > Source/JavaScriptCore/ChangeLog:31
> > +        watermark of fast memories was allocated.
> 
> Maybe we should have a system where Wasm memories get unmapped if enough GCs
> happen without them reentering use.

Filed https://bugs.webkit.org/show_bug.cgi?id=170773


> > Source/JavaScriptCore/wasm/WasmMemory.cpp:58
> > +std::atomic<void*> fastMemoryCache[fastMemoryCacheHardLimit] = { ATOMIC_VAR_INIT(nullptr) };
> > +std::atomic<void*> currentlyActiveFastMemories[fastMemoryAllocationSoftLimit] = { ATOMIC_VAR_INIT(nullptr) };
> > +std::atomic<size_t> currentlyAllocatedFastMemories = ATOMIC_VAR_INIT(0);
> > +std::atomic<size_t> observedMaximumFastMemory = ATOMIC_VAR_INIT(0);
> 
> I'm not sure I see what the real advantage of not using a lock here. It
> feels like it just makes the other code more complicated. It seems really
> unlikely that we are going to be allocating more than one memory at the same
> time in different threads. Even if we are, we potentially do a full GC
> anyway, so we can't be too worried about a latency problem here.

I updated the ChangeLog with some of the details I posted before, and improved comments around this code. I also simplified some of the atomics.


> > Source/JavaScriptCore/wasm/WasmMemory.cpp:124
> > +            dataLogLnIf(verbose && memory, "tryGetFastMemory waited on GC and retried successfully");
> 
> How is this ever true?

Copy / pasta! Fixed.
Comment 20 Build Bot 2017-04-12 10:26:53 PDT
Attachment 306919 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/wasm/WasmMemory.cpp:302:  Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons.  [readability/comparison_to_zero] [5]
ERROR: Source/JavaScriptCore/wasm/WasmMemory.cpp:322:  Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons.  [readability/comparison_to_zero] [5]
ERROR: Source/JavaScriptCore/runtime/Options.h:441:  Code inside a namespace should not be indented.  [whitespace/indent] [4]
ERROR: Source/JavaScriptCore/runtime/Options.h:440:  One space before end of line comments  [whitespace/comments] [5]
ERROR: Source/JavaScriptCore/runtime/Options.h:440:  Should have a space between // and comment  [whitespace/comments] [4]
Total errors found: 5 in 37 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 21 Saam Barati 2017-04-12 11:25:19 PDT
(In reply to JF Bastien from comment #18)
> (In reply to JF Bastien from comment #17)
> > (In reply to Keith Miller from comment #16)
> > > Comment on attachment 306873 [details]
> > > patch
> > > 
> > > View in context:
> > > https://bugs.webkit.org/attachment.cgi?id=306873&action=review
> > > 
> > > r- on removal of locking for memories, until convinced otherwise. Also, have
> > > some nits.
> > 
> > I removed locking for a few reasons which I should better describe.
> > 
> > The main one is that I'd rather not grab a lock in a signal handler. It's
> > unlikely to cause badness here given the scope of this lock, but is
> > generally a bad idea because in general signals should happen when we've
> > messed up... except here, we're doing it on purpose, which doesn't mean we
> > won't mess up elsewhere! This will get more complicated once we add support
> > for multiple threads sharing memories and being able to grow their memories.
> > Right now I think it's benign, but I'd rather not have to concern myself
> > with this.
> > 
> > It seems weird to acquire a lock for the GC as well. I guess we could make
> > the watermark update under the lock, and the read racy.
> > 
> > I want to perform this initialization super early in the process. That's my
> > main reasoning for moving to a static array (well, and simplicity).
> > 
> > So it's not about speed or reduced contention. In fact, I'd use something
> > else than a FIFO if these were a concern. I stayed with a FIFO because it's
> > super easy: visit the bounded list from start to finish, until we find what
> > we want.
> > 
> > So I think this approach is better, not because it's faster or scales more
> > (we're about to do syscalls anyways), but because it's a statically-sized
> > array traversal, nothing more.
> > 
> > I should update the ChangeLog with that info.
> 
> Forgot to say: the original reason I wasn't thrilled by this was that the
> fast memory code uses CRASH() when things go bad, and in release the
> following happens:
> 
>     *(int *)(uintptr_t)0xbbadbeef = 0;
> 
> That's great for backtraces... except it's a segfault, so the signal handler
> will be called first, and try to determine how bad 0xbbadbeef really is,
> which will acquire the fast memory lock. If we already held it then we
> deadlock, don't get a crash report, and get a bad user experience.
Doesn't the signal handler first check if you're executing JIT code?

> 
> There are 3 possible fixes to this:
> 
>  1. Don't acquire a lock when figuring out whether the address is in range.
>  2. Always release the lock before CRASH().
>  3. Don't use locks.
> 
> I think they're all manageable, but I think 3. is the hardest to mess up in
> that way later.
> 
> I agree, though, that atomics are harder than locks. I tried to keep it
> simple with a FIFO, but I can add more comments and details in the ChangeLog.
Comment 22 Michael Saboff 2017-04-12 12:08:30 PDT
Comment on attachment 306919 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=306919&action=review

I checked the atomic code in WasmMemory and it looks fine to me.

> Source/JavaScriptCore/ChangeLog:56
> +        in a signal handler because in general signals should happen when

Change *should* to *can*.

> Source/JavaScriptCore/ChangeLog:58
> +        raising a signal on purpose and handling it. However this which

Eliminate *which*

> Source/JavaScriptCore/wasm/WasmMemory.cpp:152
> +            dataLogLnIf(verbose && memory, "tryGetFastMemory waited on GC and retried successfully");

Suggestion: Change this to dataLogLnIf(verbose, "tryGetFastMemory waited on GC and retried ", memory ? "successfully" : "unsuccessfully"); so that you can log how successful GC is at freeing up fast memory.

> Source/JavaScriptCore/wasm/WasmMemory.cpp:168
> +                        dataLogLnIf(verbose, "tryGetFastMemory currently observer maximum is now ", currentlyAllocated);

In the message, should *observer* really be *observed*?

> Source/JavaScriptCore/wasm/WasmMemory.cpp:345
>  RefPtr<Memory> Memory::createImpl(VM& vm, PageCount initial, PageCount maximum, std::optional<MemoryMode> requiredMode)

It doesn't appear that we call this anywhere where the optional<MemoryMode> is passed.  Seems like we could simplify the code if we eliminated this parameter and the corresponding logic.
Comment 23 JF Bastien 2017-04-12 13:42:03 PDT
> > That's great for backtraces... except it's a segfault, so the signal handler
> > will be called first, and try to determine how bad 0xbbadbeef really is,
> > which will acquire the fast memory lock. If we already held it then we
> > deadlock, don't get a crash report, and get a bad user experience.
> Doesn't the signal handler first check if you're executing JIT code?

Correct. We can however reach the wasm memory code from JIT code, so any future changes there mean we have to be careful.

My point still stands though: I'd really rather not acquire locks in a signal handler. It's the type of code that's easy to get wrong, easy to make wrong in the future, and hard to predict because signals occur when we've already messed something up. It's not a high-priority task to fix, but we should fix as we can. I could do so here, so I did.

I know this sounds tinfoil-hatty :)
Comment 24 JF Bastien 2017-04-12 14:12:06 PDT
Created attachment 306933 [details]
patch

Address Michael's comments.
Comment 25 Build Bot 2017-04-12 14:14:41 PDT
Attachment 306933 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/wasm/WasmMemory.cpp:302:  Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons.  [readability/comparison_to_zero] [5]
ERROR: Source/JavaScriptCore/wasm/WasmMemory.cpp:322:  Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons.  [readability/comparison_to_zero] [5]
ERROR: Source/JavaScriptCore/runtime/Options.h:441:  Code inside a namespace should not be indented.  [whitespace/indent] [4]
ERROR: Source/JavaScriptCore/runtime/Options.h:440:  One space before end of line comments  [whitespace/comments] [5]
ERROR: Source/JavaScriptCore/runtime/Options.h:440:  Should have a space between // and comment  [whitespace/comments] [4]
Total errors found: 5 in 37 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 26 Keith Miller 2017-04-13 12:10:39 PDT
Comment on attachment 306933 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=306933&action=review

r=me with more comments.

> Source/JavaScriptCore/wasm/WasmMemory.cpp:480
> +    void* newMemory = tryGetSlowMemory(desiredSize);

I'm not sure we want to grow to the exact size. Usually you want to double up so you can avoid the pathological case of repeatedly calling grow(1).

I would make this:

std::min(std::max(desiredSize, m_mappedCapacity * 2), 2 << 32).

> Websites/webkit.org/docs/b3/intermediate-representation.html:659
> +        InvalidGPR in which case the out-of-bounds limit is 4GiB. In the later case a maximum parameter

nit: I think you need a comma after InvalidGPR.
Comment 27 JF Bastien 2017-04-13 14:00:26 PDT
Created attachment 307020 [details]
patch

Fix html pages. I filed a bug for the other comment since I'm not sure we want to do this.
Comment 28 WebKit Commit Bot 2017-04-13 14:02:08 PDT
Comment on attachment 307020 [details]
patch

Rejecting attachment 307020 [details] from commit-queue.

Failed to run "['/Volumes/Data/EWS/WebKit/Tools/Scripts/webkit-patch', '--status-host=webkit-queues.webkit.org', '--bot-id=webkit-cq-02', 'validate-changelog', '--check-oops', '--non-interactive', 307020, '--port=mac']" exit_code: 1 cwd: /Volumes/Data/EWS/WebKit

ChangeLog entry in JSTests/ChangeLog contains OOPS!.

Full output: http://webkit-queues.webkit.org/results/3530417
Comment 29 JF Bastien 2017-04-13 14:05:59 PDT
Created attachment 307021 [details]
patch

OOPS
Comment 30 WebKit Commit Bot 2017-04-13 14:48:46 PDT
Comment on attachment 307021 [details]
patch

Clearing flags on attachment: 307021

Committed r215340: <http://trac.webkit.org/changeset/215340>
Comment 31 WebKit Commit Bot 2017-04-13 14:48:48 PDT
All reviewed patches have been landed.  Closing bug.
Comment 32 Ryan Haddad 2017-04-13 15:27:56 PDT
This change appears to have broken the Windows build:

https://build.webkit.org/builders/Apple%20Win%20Release%20%28Build%29/builds/657
Comment 33 JF Bastien 2017-04-13 15:30:26 PDT
(In reply to Ryan Haddad from comment #32)
> This change appears to have broken the Windows build:
> 
> https://build.webkit.org/builders/Apple%20Win%20Release%20%28Build%29/builds/
> 657

Interesting, I didn't realize that AssemblerCommon.h was included in Options.h. I can just delete the one I added and fix this build issue.

Patch forthcoming.
Comment 34 JF Bastien 2017-04-13 15:33:29 PDT
(In reply to JF Bastien from comment #33)
> (In reply to Ryan Haddad from comment #32)
> > This change appears to have broken the Windows build:
> > 
> > https://build.webkit.org/builders/Apple%20Win%20Release%20%28Build%29/builds/
> > 657
> 
> Interesting, I didn't realize that AssemblerCommon.h was included in
> Options.h. I can just delete the one I added and fix this build issue.
> 
> Patch forthcoming.

https://bugs.webkit.org/show_bug.cgi?id=170832
Comment 35 Saam Barati 2017-04-13 20:38:41 PDT
Comment on attachment 307021 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=307021&action=review

> Source/JavaScriptCore/ChangeLog:52
> +        Part of this good positioning is a facility to pre-allocate fast
> +        memories very early at startup, before any fragmentation
> +        occurs. This is currently disabled but worked extremely reliably
> +        on iOS. Once we fix the above issues we'll want to re-visit and
> +        turn on pre-allocation.

So this is off on macOS too? If so, does it need to be checked in?

> Source/JavaScriptCore/ChangeLog:70
> +        This is a segfault, which our fast memory signal handler tries to
> +        handle. It does so by first figuring out whether 0xbbadbeef is in
> +        a fast memory region, reqiring a lock. If we CRASH() while holding
> +        the lock then our thread self-deadlocks, giving us no crash report
> +        and a bad user experience.

This is not an accurate description of what happens. We first check if the PC is in JIT code. That makes all the difference.

> Source/JavaScriptCore/b3/B3LowerToAir.cpp:3174
> +                // We know that anything above the declared 'maximum' will trap, so we can compare against that number. If there was no declared
> +                // 'maximum' then we still know that any access above 4GiB will trap, no need to add the redzone.

Your comment here is off by 1. It should be above or equal, not above.

> Source/JavaScriptCore/b3/B3LowerToAir.cpp:3176
> +                size_t limitValue = value->maximum() ? value->maximum().bytes() : std::numeric_limits<uint32_t>::max();

IMO, this logic should not be in Air.
I think it's cleaner if Air/B3 don't know about Wasm data types.
I think WasmB3IRGenerator should contain this logic, and always give the WasmBoundsCheckValue a size_t maximum.
Then, you can also remove the ifdef. Even though this opcode has Wasm in the name, other users of B3 could theoretically use it.

Also, this code looks very wrong to me, and looks to have a security hole. Maybe I'm wrong, but you're trusting that we've allocated the maximum() bytes, but we might not be able to, and we might just allocate initial bytes.

> Source/JavaScriptCore/b3/B3WasmBoundsCheckValue.h:69
> +    JS_EXPORT_PRIVATE WasmBoundsCheckValue(Origin, Value* ptr, GPRReg pinnedGPR, unsigned offset, PageCount maximum);

See comment above, I'd have this take a size_t.

> Source/JavaScriptCore/wasm/WasmB3IRGenerator.cpp:557
> +        // We've virtually mapped 4GiB+redzone fo this memory. Only the user-allocated pages are addressable, contiguously in range [0, current], and everything above is mapped PROT_NONE. We don't need to perform any explicit bounds check in the 4GiB range because WebAssembly register memory accesses are 32-bit. However WebAssembly register+immediate accesses perform the addition in 64-bit which can push an access above the 32-bit limit. The redzone will catch most small immediates, and we'll explicitly bounds check any register + large immediate access.

Typo: "fo" => "for".
Also, style nit: We usually make really long comments multiline in JSC.

> Source/JavaScriptCore/wasm/WasmB3IRGenerator.cpp:1302
> +    if (!!info.memory) {
> +        // Resetting the register prevents the GC from mistakenly thinking that the context is still live.
> +        GPRReg baseMemory = pinnedRegs.baseMemoryPointer;
> +        jit.move(CCallHelpers::TrustedImm32(0), baseMemory);
> +    }

I'm 99% that this literally does nothing. The loop after this is restoring callee saves, which baseMemory should be.

> Source/JavaScriptCore/wasm/WasmMemory.cpp:69
> +        webAssemblyCouldntUnmapMemoryBytes();

Style: No need for the word "bytes" at the end of this function name.

> Source/JavaScriptCore/wasm/WasmMemory.cpp:81
> +void zeroAndUnprotectBytes(void* start, size_t bytes)
>  {
> -    dataLogIf(verbose, "Attempting to mmap ", bytes, " bytes: ");
> -    // FIXME: It would be nice if we had a VM tag for wasm memory. https://bugs.webkit.org/show_bug.cgi?id=163600
> -    void* result = mmap(nullptr, bytes, PROT_NONE, MAP_PRIVATE | MAP_ANON, -1, 0);
> -    if (result == MAP_FAILED) {
> -        dataLogLnIf(verbose, "failed");
> -        return false;
> +    if (bytes) {
> +        dataLogLnIf(verbose, "Zeroing and unprotecting ", bytes, " from ", RawPointer(start));
> +        // FIXME: We could be smarter about memset / mmap / madvise. Here, we may not need to act synchronously, or maybe we can memset+unprotect smaller ranges of memory (which would pay off if not all the writable memory was actually physically backed: memset forces physical backing only to unprotect it right after). https://bugs.webkit.org/show_bug.cgi?id=170343
> +        memset(start, 0, bytes);
> +        if (UNLIKELY(mprotect(start, bytes, PROT_NONE)))
> +            webAssemblyCouldntUnprotectMemory();
>      }
> -    dataLogLnIf(verbose, "succeeded");
> -    memory = result;
> -    return true;
>  }

Style nit: I wouldn't have this be its own function. It has one caller, and it's super confusing when not in the context of its caller. mprotect can fail for valid reasons, but it should not fail in this context. That context is lost when this is its own function.

> Source/JavaScriptCore/wasm/WasmMemory.cpp:152
> +            // No memory was available in the cache. Maybe waiting on GC will find a free one.
> +            // FIXME collectSync(Full) and custom eager destruction of wasm memories could be better. For now use collectAllGarbage. Also, nothing tells us the current VM is holding onto fast memories. https://bugs.webkit.org/show_bug.cgi?id=170748
> +            dataLogLnIf(verbose, "tryGetFastMemory waiting on GC and retrying");
> +            vm.heap.collectAllGarbage();
> +            memory = tryGetCachedFastMemory();
> +            dataLogLnIf(verbose, "tryGetFastMemory waited on GC and retried ", memory? "successfully" : "unseccessfully");

This looks like a regression given your preset numbers. It looks like we'll sync GC the first time we allocate Wasm memory. We should not have this behavior. It's unfortunate WasmBench doesn't time creating a memory. Perhaps we should change that. I would suspect something like this would show up. I suspect your patch is causing us to GC more, not less.

> Source/JavaScriptCore/wasm/WasmMemory.cpp:155
> +        // The soft limit is inherently racy because checking+allocation isn't atomic. Exceeding it slightly is fine.

Why is it OK if addressIsInActiveFastMemory uses this limit? This seems slightly wrong? Or is it OK b/c of tryAddToCurrentlyActiveFastMemories?

> Source/JavaScriptCore/wasm/WasmMemory.cpp:195
> +}

I think this function would be a lot easier to read and reason about if you just used a lock for fast memory related things. I'm not sure making everything lock free will buy us anything but pain when reading this code.

> Source/JavaScriptCore/wasm/WasmMemory.cpp:276
> +    const auto startTime = MonotonicTime::now();

Please only do this if verbose.

> Source/JavaScriptCore/wasm/WasmMemory.cpp:283
> +                void* startAddress = reinterpret_cast<char*>(memory) + Memory::fastMappedBytes() * subMemory + subMemory;

This looks off by num subMemory. Why "+ subMemory"? Pretty sure this is wrong.

> Source/JavaScriptCore/wasm/WasmMemory.cpp:315
> +    const auto endTime = MonotonicTime::now();

Please only do this if verbose.

> Source/JavaScriptCore/wasm/WasmMemory.cpp:321
> +    for (size_t cacheEntry = 0; cacheEntry < fastMemoryPreallocateCount; ++cacheEntry) {
> +        void* startAddress = fastMemoryCache[cacheEntry].load(std::memory_order_relaxed);
> +        ASSERT(startAddress);
> +        dataLogLnIf(verbose, "Pre-allocation of WebAssembly fast memory at ", RawPointer(startAddress));
> +    }

Style nit: This whole loop should only happen "if !ASSERT_DISABLED". I think LLVM will do that for you, but I think doing this yourself makes the code easier to read.

> Source/JavaScriptCore/wasm/WasmMemory.cpp:387
> +            mappedCapacityBytes = initialBytes;

This case is why I think your change to B3/Air is wrong.
Comment 36 Jonathan Bedard 2017-04-14 08:38:50 PDT
This caused a regression in the Windows build.

<https://trac.webkit.org/changeset/215340/webkit>
<https://build.webkit.org/builders/Apple%20Win%20Release%20%28Build%29/builds/657>
Comment 37 JF Bastien 2017-04-17 11:23:34 PDT
> > Source/JavaScriptCore/ChangeLog:52
> > +        Part of this good positioning is a facility to pre-allocate fast
> > +        memories very early at startup, before any fragmentation
> > +        occurs. This is currently disabled but worked extremely reliably
> > +        on iOS. Once we fix the above issues we'll want to re-visit and
> > +        turn on pre-allocation.
> 
> So this is off on macOS too? If so, does it need to be checked in?

Correct. I think it's doing exactly what we want, and the problem we encountered is orthogonal to this. When I try to fix the problem I'll need to re-enable pre-allocation, so I'd rather not have to dig it back up.

> > Source/JavaScriptCore/ChangeLog:70
> > +        This is a segfault, which our fast memory signal handler tries to
> > +        handle. It does so by first figuring out whether 0xbbadbeef is in
> > +        a fast memory region, reqiring a lock. If we CRASH() while holding
> > +        the lock then our thread self-deadlocks, giving us no crash report
> > +        and a bad user experience.
> 
> This is not an accurate description of what happens. We first check if the
> PC is in JIT code. That makes all the difference.

You are correct.


> > Source/JavaScriptCore/b3/B3LowerToAir.cpp:3174
> > +                // We know that anything above the declared 'maximum' will trap, so we can compare against that number. If there was no declared
> > +                // 'maximum' then we still know that any access above 4GiB will trap, no need to add the redzone.
> 
> Your comment here is off by 1. It should be above or equal, not above.

I'll fix in a follow-up.


> > Source/JavaScriptCore/b3/B3LowerToAir.cpp:3176
> > +                size_t limitValue = value->maximum() ? value->maximum().bytes() : std::numeric_limits<uint32_t>::max();
> 
> IMO, this logic should not be in Air.
> I think it's cleaner if Air/B3 don't know about Wasm data types.
> I think WasmB3IRGenerator should contain this logic, and always give the
> WasmBoundsCheckValue a size_t maximum.
> Then, you can also remove the ifdef. Even though this opcode has Wasm in the
> name, other users of B3 could theoretically use it.

I'll fix in a follow-up.


> Also, this code looks very wrong to me, and looks to have a security hole.
> Maybe I'm wrong, but you're trusting that we've allocated the maximum()
> bytes, but we might not be able to, and we might just allocate initial bytes.

As we discussed offline last week, no hole here. Because this is ever only used for fast memory, we know that a full 4GiB+redzone was allocated and will trap if OOB. The non-trapping area is [0-initial), and [initial,maximum) will trap (as will anything above maximum).


> > Source/JavaScriptCore/b3/B3WasmBoundsCheckValue.h:69
> > +    JS_EXPORT_PRIVATE WasmBoundsCheckValue(Origin, Value* ptr, GPRReg pinnedGPR, unsigned offset, PageCount maximum);
> 
> See comment above, I'd have this take a size_t.

I'll fix in a follow-up.


> > Source/JavaScriptCore/wasm/WasmMemory.cpp:81
> > +void zeroAndUnprotectBytes(void* start, size_t bytes)
> >  {
> > -    dataLogIf(verbose, "Attempting to mmap ", bytes, " bytes: ");
> > -    // FIXME: It would be nice if we had a VM tag for wasm memory. https://bugs.webkit.org/show_bug.cgi?id=163600
> > -    void* result = mmap(nullptr, bytes, PROT_NONE, MAP_PRIVATE | MAP_ANON, -1, 0);
> > -    if (result == MAP_FAILED) {
> > -        dataLogLnIf(verbose, "failed");
> > -        return false;
> > +    if (bytes) {
> > +        dataLogLnIf(verbose, "Zeroing and unprotecting ", bytes, " from ", RawPointer(start));
> > +        // FIXME: We could be smarter about memset / mmap / madvise. Here, we may not need to act synchronously, or maybe we can memset+unprotect smaller ranges of memory (which would pay off if not all the writable memory was actually physically backed: memset forces physical backing only to unprotect it right after). https://bugs.webkit.org/show_bug.cgi?id=170343
> > +        memset(start, 0, bytes);
> > +        if (UNLIKELY(mprotect(start, bytes, PROT_NONE)))
> > +            webAssemblyCouldntUnprotectMemory();
> >      }
> > -    dataLogLnIf(verbose, "succeeded");
> > -    memory = result;
> > -    return true;
> >  }
> 
> Style nit: I wouldn't have this be its own function. It has one caller, and
> it's super confusing when not in the context of its caller. mprotect can
> fail for valid reasons, but it should not fail in this context. That context
> is lost when this is its own function.

That will change when I address 170343, that function will be standalone and be fancier, so moving it back out seems impractical.


> > Source/JavaScriptCore/wasm/WasmMemory.cpp:152
> > +            // No memory was available in the cache. Maybe waiting on GC will find a free one.
> > +            // FIXME collectSync(Full) and custom eager destruction of wasm memories could be better. For now use collectAllGarbage. Also, nothing tells us the current VM is holding onto fast memories. https://bugs.webkit.org/show_bug.cgi?id=170748
> > +            dataLogLnIf(verbose, "tryGetFastMemory waiting on GC and retrying");
> > +            vm.heap.collectAllGarbage();
> > +            memory = tryGetCachedFastMemory();
> > +            dataLogLnIf(verbose, "tryGetFastMemory waited on GC and retried ", memory? "successfully" : "unseccessfully");
> 
> This looks like a regression given your preset numbers. It looks like we'll
> sync GC the first time we allocate Wasm memory. We should not have this
> behavior. It's unfortunate WasmBench doesn't time creating a memory. Perhaps
> we should change that. I would suspect something like this would show up. I
> suspect your patch is causing us to GC more, not less.

Good point, I'll fix in a follow-up.


> > Source/JavaScriptCore/wasm/WasmMemory.cpp:155
> > +        // The soft limit is inherently racy because checking+allocation isn't atomic. Exceeding it slightly is fine.
> 
> Why is it OK if addressIsInActiveFastMemory uses this limit? This seems
> slightly wrong? Or is it OK b/c of tryAddToCurrentlyActiveFastMemories?

fastMemoryAllocationSoftLimit is a constexpr which bounds the static global array. addressIsInActiveFastMemory just uses it as the upper bound of that array, there can't be more than that number actually active. There can, however, be more than that allocated if allocations race to creation. If that unlikely race happens then tryAddToCurrentlyActiveFastMemories returns false, and we begrudgingly relinquish that fast memory because even though we could use it we have no way to account for it as active (because the array is full).


> > Source/JavaScriptCore/wasm/WasmMemory.cpp:195
> > +}
> 
> I think this function would be a lot easier to read and reason about if you
> just used a lock for fast memory related things. I'm not sure making
> everything lock free will buy us anything but pain when reading this code.

I'd really rather avoid the lock in the signal handler.


> > Source/JavaScriptCore/wasm/WasmMemory.cpp:276
> > +    const auto startTime = MonotonicTime::now();
> 
> Please only do this if verbose.

In follow-up.


> > Source/JavaScriptCore/wasm/WasmMemory.cpp:283
> > +                void* startAddress = reinterpret_cast<char*>(memory) + Memory::fastMappedBytes() * subMemory + subMemory;
> 
> This looks off by num subMemory. Why "+ subMemory"? Pretty sure this is
> wrong.

Eek, horrible copy / pasta. Follow-up.


> > Source/JavaScriptCore/wasm/WasmMemory.cpp:387
> > +            mappedCapacityBytes = initialBytes;
> 
> This case is why I think your change to B3/Air is wrong.

This is only for slow memories, whereas the above is only for fast memories.