Bug 177305 - WebAssembly: cache memory address / size on instance
Summary: WebAssembly: cache memory address / size on instance
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebAssembly (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: GSkachkov
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2017-09-21 09:31 PDT by JF Bastien
Modified: 2018-02-23 15:18 PST (History)
10 users (show)

See Also:


Attachments
Patch (15.85 KB, patch)
2017-12-15 14:10 PST, GSkachkov
no flags Details | Formatted Diff | Diff
Patch (19.10 KB, patch)
2017-12-16 11:30 PST, GSkachkov
no flags Details | Formatted Diff | Diff
Patch (19.95 KB, patch)
2017-12-17 10:59 PST, GSkachkov
no flags Details | Formatted Diff | Diff
Patch (20.91 KB, patch)
2017-12-20 12:11 PST, GSkachkov
no flags Details | Formatted Diff | Diff
Patch (22.39 KB, patch)
2018-01-07 02:00 PST, GSkachkov
no flags Details | Formatted Diff | Diff
Patch (22.48 KB, patch)
2018-01-11 14:14 PST, GSkachkov
no flags Details | Formatted Diff | Diff
Patch (22.73 KB, patch)
2018-02-23 12:55 PST, GSkachkov
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description JF Bastien 2017-09-21 09:31:55 PDT
(From a FIXME that didn't have a bug)

In Wasm::Memory, we should cache address / size to the instances to avoid a load on instance->instance calls. This will require updating all the instances when grow is called.
Comment 1 GSkachkov 2017-12-04 12:11:16 PST
I can try to fix this issue.

Wasm::Instance will have to add property cachedMemory and cachedSize, but Wasm::Memory should contain links to Wasm::Instance where Wasm::Memory is used to iterate within `grow`, is it correct understating of the issue?
Comment 2 JF Bastien 2017-12-04 17:05:41 PST
(In reply to GSkachkov from comment #1)
> I can try to fix this issue.

Cool!

> Wasm::Instance will have to add property cachedMemory and cachedSize,

Yes, and use it in the JIT instead of doing the indirection.

> Wasm::Memory should contain links to Wasm::Instance where Wasm::Memory is
> used to iterate within `grow`, is it correct understating of the issue?

Wasm::Memory can be used by multiple Wasm::Instance at the same time, so the Wasm::Memory has to know all of the Wasm::Instances that refer to it. The Instance owns the memory, so the back-pointers need to be weak references. Then yes, you would update all of them from grow.
Comment 3 GSkachkov 2017-12-15 14:10:15 PST
Created attachment 329518 [details]
Patch

WiP. Caching memory size & addess in  wasm::Instance
Comment 4 JF Bastien 2017-12-15 14:11:03 PST
(In reply to GSkachkov from comment #3)
> Created attachment 329518 [details]
> Patch
> 
> WiP. Caching memory size & addess in  wasm::Instance

You've marked as "review?". Is it WIP or ready for review?
Comment 5 GSkachkov 2017-12-15 14:15:14 PST
(In reply to JF Bastien from comment #4)
> (In reply to GSkachkov from comment #3)
> > Created attachment 329518 [details]
> > Patch
> > 
> > WiP. Caching memory size & addess in  wasm::Instance
> 
> You've marked as "review?". Is it WIP or ready for review?

Sorry, it is WiP and I removed review flag, but I would like to have some feedback if my changes related to what is expected to do in this task.
Comment 6 GSkachkov 2017-12-16 11:30:31 PST
Created attachment 329580 [details]
Patch

Patch for review
Comment 7 GSkachkov 2017-12-17 10:59:47 PST
Created attachment 329610 [details]
Patch

Patch for review + small clearence
Comment 8 JF Bastien 2017-12-18 09:12:52 PST
Comment on attachment 329610 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=329610&action=review

Very cool! That's the right direction, but I have a few comments.

Also for testing you might want to look at the memory fuzz test I wrote in JSTests/wasm/fuzz

> Source/JavaScriptCore/wasm/WasmB3IRGenerator.cpp:-1284
> -            ASSERT(sizeRegs[0].sizeRegister != baseMemory);

I think we should keep this assert.

> Source/JavaScriptCore/wasm/WasmB3IRGenerator.cpp:1281
>              ASSERT(!sizeRegs[0].sizeOffset);

Can you also assert that sizeRegs.size() == 1 ? The code below assumes it is, and would be wrong if that weren't the case. You can like to FIXME https://bugs.webkit.org/show_bug.cgi?id=162952

> Source/JavaScriptCore/wasm/WasmBinding.cpp:-68
> -    ASSERT(!sizeRegs[0].sizeOffset); // The following code assumes we start at 0, and calculates subsequent size registers relative to 0.

Same here on keeping the assert, and adding one for https://bugs.webkit.org/show_bug.cgi?id=162952

> Source/JavaScriptCore/wasm/WasmInstance.h:77
> +        updateChechedMemory();

Typo "cached".

> Source/JavaScriptCore/wasm/WasmInstance.h:79
> +    void updateChechedMemory()

Typo.

> Source/JavaScriptCore/wasm/WasmInstance.h:147
> +    WeakPtrFactory<Instance> m_weakPtrFactory;

I would try to organize members so that the ones accessed by the JIT are together, at the start. That way we need only small immediates, and they'll be in the same cacheline when accessed.

So the cached values would go after m_context, and the weak ptr factory after the call frame callback.

> Source/JavaScriptCore/wasm/js/JSToWasm.cpp:193
> +            jit.loadWasmContextInstance(baseMemory);

This now happens twice if the memory wasn't signaling and we use fast TLS.

It might be easier to have 4 code paths without any sharing, for all combinations of signal/no + fast TLS/no.

> JSTests/wasm/function-tests/memory-reuse.js:4
> +{

This scope doesn't seem useful?

> JSTests/wasm/function-tests/memory-reuse.js:8
> +    function createWasmModule(memory) {

This returns instances, so I'd call it "createInstance".

> JSTests/wasm/function-tests/memory-reuse.js:41
> +        for (let i = 0; i < buf.length/4; ++i) {

Spaces between divide. Same thing for other binary operators below.

> JSTests/wasm/function-tests/memory-reuse.js:56
> +        const pageSize = 64 * 1024;

Hoist this to the top of the file, and remove the other one above.

> JSTests/wasm/function-tests/memory-reuse.js:58
> +        const modules = [];

As above, I'd rename to "instances".

> JSTests/wasm/function-tests/memory-reuse.js:68
> +    }

I think the start of the test is good, but you then need to grow the memory. There are two ways to do so: as the grow_memory opcode directly in WebAssembly IR, or through the JS API on WebAssembly.Memory. You should then re-run doCheck().

The difficulty in doing this test is that you want them memory's base void* to have moved (hard!), and the size to change (easy!). You're not really testing size because signaling memory does that implicitly. I think you want a VM primitive for testing that allows you to allocate a WebAssembly.Memory that is never signaling. That way to can test size by going out-of-bounds, catching the trap, then growing, and same out-of-bounds should not trap anymore. Now with a non-signaling memory you know that enough growth will cause us to re-allocate and memmove then void* memory too!

After doing so, the tests should kill instances and force-invoke the GC to try to make sure they die, and then re-run the checks again.
Comment 9 GSkachkov 2017-12-20 12:11:48 PST
Created attachment 329942 [details]
Patch

Fix comments
Comment 10 GSkachkov 2017-12-20 12:15:03 PST
Comment on attachment 329610 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=329610&action=review

>> Source/JavaScriptCore/wasm/WasmB3IRGenerator.cpp:-1284
>> -            ASSERT(sizeRegs[0].sizeRegister != baseMemory);
> 
> I think we should keep this assert.

Done

>> Source/JavaScriptCore/wasm/WasmB3IRGenerator.cpp:1281
>>              ASSERT(!sizeRegs[0].sizeOffset);
> 
> Can you also assert that sizeRegs.size() == 1 ? The code below assumes it is, and would be wrong if that weren't the case. You can like to FIXME https://bugs.webkit.org/show_bug.cgi?id=162952

As I can see we have this assert, line 1283/1279

>> Source/JavaScriptCore/wasm/WasmBinding.cpp:-68
>> -    ASSERT(!sizeRegs[0].sizeOffset); // The following code assumes we start at 0, and calculates subsequent size registers relative to 0.
> 
> Same here on keeping the assert, and adding one for https://bugs.webkit.org/show_bug.cgi?id=162952

Done

>> Source/JavaScriptCore/wasm/WasmInstance.h:77
>> +        updateChechedMemory();
> 
> Typo "cached".

Fixed

>> Source/JavaScriptCore/wasm/WasmInstance.h:79
>> +    void updateChechedMemory()
> 
> Typo.

Fixed

>> Source/JavaScriptCore/wasm/WasmInstance.h:147
>> +    WeakPtrFactory<Instance> m_weakPtrFactory;
> 
> I would try to organize members so that the ones accessed by the JIT are together, at the start. That way we need only small immediates, and they'll be in the same cacheline when accessed.
> 
> So the cached values would go after m_context, and the weak ptr factory after the call frame callback.

Done

>> Source/JavaScriptCore/wasm/js/JSToWasm.cpp:193
>> +            jit.loadWasmContextInstance(baseMemory);
> 
> This now happens twice if the memory wasn't signaling and we use fast TLS.
> 
> It might be easier to have 4 code paths without any sharing, for all combinations of signal/no + fast TLS/no.

I've tried rewrite this, hope I did not lost something.

>> JSTests/wasm/function-tests/memory-reuse.js:4
>> +{
> 
> This scope doesn't seem useful?

Removed

>> JSTests/wasm/function-tests/memory-reuse.js:8
>> +    function createWasmModule(memory) {
> 
> This returns instances, so I'd call it "createInstance".

Fixed

>> JSTests/wasm/function-tests/memory-reuse.js:41
>> +        for (let i = 0; i < buf.length/4; ++i) {
> 
> Spaces between divide. Same thing for other binary operators below.

Fixed

>> JSTests/wasm/function-tests/memory-reuse.js:56
>> +        const pageSize = 64 * 1024;
> 
> Hoist this to the top of the file, and remove the other one above.

Done

>> JSTests/wasm/function-tests/memory-reuse.js:58
>> +        const modules = [];
> 
> As above, I'd rename to "instances".

Done

>> JSTests/wasm/function-tests/memory-reuse.js:68
>> +    }
> 
> I think the start of the test is good, but you then need to grow the memory. There are two ways to do so: as the grow_memory opcode directly in WebAssembly IR, or through the JS API on WebAssembly.Memory. You should then re-run doCheck().
> 
> The difficulty in doing this test is that you want them memory's base void* to have moved (hard!), and the size to change (easy!). You're not really testing size because signaling memory does that implicitly. I think you want a VM primitive for testing that allows you to allocate a WebAssembly.Memory that is never signaling. That way to can test size by going out-of-bounds, catching the trap, then growing, and same out-of-bounds should not trap anymore. Now with a non-signaling memory you know that enough growth will cause us to re-allocate and memmove then void* memory too!
> 
> After doing so, the tests should kill instances and force-invoke the GC to try to make sure they die, and then re-run the checks again.

Rewritten, I hope it is close to what is expected
Comment 11 JF Bastien 2018-01-04 09:01:46 PST
This patch will conflict with a not-yet-committed but important patch. I don't think the conflict will be very bad, but if you don't mind I'd like to hold off committing this until the other patch can land.

Once it's in I'll do the other round of reviews, because this is a really neat improvement that we definitely want!
Comment 12 GSkachkov 2018-01-04 10:02:30 PST
(In reply to JF Bastien from comment #11)
> This patch will conflict with a not-yet-committed but important patch. I
> don't think the conflict will be very bad, but if you don't mind I'd like to
> hold off committing this until the other patch can land.
> 
> Once it's in I'll do the other round of reviews, because this is a really
> neat improvement that we definitely want!

Yes sure, let me know when I need to rebase.
Comment 13 JF Bastien 2018-01-05 20:30:11 PST
(In reply to GSkachkov from comment #12)
> (In reply to JF Bastien from comment #11)
> > This patch will conflict with a not-yet-committed but important patch. I
> > don't think the conflict will be very bad, but if you don't mind I'd like to
> > hold off committing this until the other patch can land.
> > 
> > Once it's in I'll do the other round of reviews, because this is a really
> > neat improvement that we definitely want!
> 
> Yes sure, let me know when I need to rebase.

Here's the patch:
  https://bugs.webkit.org/show_bug.cgi?id=181313

There's now also an index mask that needs to be cached in the same way as memory and size.
Comment 14 GSkachkov 2018-01-07 02:00:20 PST
Created attachment 330654 [details]
Patch

Rebased patch & using cachedIndexingMask property
Comment 15 GSkachkov 2018-01-07 03:03:45 PST
(In reply to JF Bastien from comment #13)
> (In reply to GSkachkov from comment #12)
> > (In reply to JF Bastien from comment #11)
> > > This patch will conflict with a not-yet-committed but important patch. I
> > > don't think the conflict will be very bad, but if you don't mind I'd like to
> > > hold off committing this until the other patch can land.
> > > 
> > > Once it's in I'll do the other round of reviews, because this is a really
> > > neat improvement that we definitely want!
> > 
> > Yes sure, let me know when I need to rebase.
> 
> Here's the patch:
>   https://bugs.webkit.org/show_bug.cgi?id=181313
> 
> There's now also an index mask that needs to be cached in the same way as
> memory and size.

There is a new rebased patch and I added caching of the index mask in the same way as did for other properties.
Comment 16 GSkachkov 2018-01-11 14:14:14 PST
Created attachment 331117 [details]
Patch

Fix error in added tests
Comment 17 GSkachkov 2018-01-25 13:54:55 PST
JF Bastien, Are there any performance tests that I can run for Wasm and check any difference with my patch and without?
Comment 18 JF Bastien 2018-01-25 13:56:58 PST
(In reply to GSkachkov from comment #17)
> JF Bastien, Are there any performance tests that I can run for Wasm and
> check any difference with my patch and without?

We don't have public ones because of licenses. The WebAssembly paper used:
 - PolyBenchC-3.2
 - SciMark2.1

It's also fairly easy to compile SPEC2k (non-Fortran parts) and run the training sets.

Otherwise Emscripten has benchmarks as well.
Comment 19 GSkachkov 2018-01-26 12:56:58 PST
(In reply to JF Bastien from comment #18)
> (In reply to GSkachkov from comment #17)
> > JF Bastien, Are there any performance tests that I can run for Wasm and
> > check any difference with my patch and without?
> 
> We don't have public ones because of licenses. The WebAssembly paper used:
>  - PolyBenchC-3.2
>  - SciMark2.1
> 
> It's also fairly easy to compile SPEC2k (non-Fortran parts) and run the
> training sets.
> 
> Otherwise Emscripten has benchmarks as well.

Thanks for pointing to tests, will prepare some figures during the week.
Comment 20 JF Bastien 2018-02-20 14:31:44 PST
Comment on attachment 331117 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=331117&action=review

This looks pretty good, with a few comments I'm ready to r=me. Also, did you run the benchmarks you said you would?

> Source/JavaScriptCore/wasm/WasmMemory.cpp:403
> +        }

We never remove weak pointers that are now nullptr, correct? That's a (very slow) leak because m_instances never shrinks. I don't think grow() is the right place to remove null entries, though. Maybe in registerInstance you could first traverse m_instances looking for a null entry, reuse the entry if there's one, and if you can't find one then append?

> Source/JavaScriptCore/wasm/js/JSToWasm.cpp:181
> +        GPRReg memoryGRP = Context::useFastTLS() ? baseMemory : wasmContextInstanceGPR;

I'd rename memoryGPR to currentInstanceGPR, since we're using the basemMemory register to hold the instance.
Comment 21 GSkachkov 2018-02-21 12:43:49 PST
(In reply to JF Bastien from comment #20)
> Comment on attachment 331117 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=331117&action=review
> 
> This looks pretty good, with a few comments I'm ready to r=me. Also, did you
> run the benchmarks you said you would?
> 
> > Source/JavaScriptCore/wasm/WasmMemory.cpp:403
> > +        }
> 
> We never remove weak pointers that are now nullptr, correct? That's a (very
> slow) leak because m_instances never shrinks. I don't think grow() is the
> right place to remove null entries, though. Maybe in registerInstance you
> could first traverse m_instances looking for a null entry, reuse the entry
> if there's one, and if you can't find one then append?
> 
> > Source/JavaScriptCore/wasm/js/JSToWasm.cpp:181
> > +        GPRReg memoryGRP = Context::useFastTLS() ? baseMemory : wasmContextInstanceGPR;
> 
> I'd rename memoryGPR to currentInstanceGPR, since we're using the
> basemMemory register to hold the instance.
Thanks for the feedback, will try to fix comments soon. 
Ohh yes, I forget about perf tests. I downloaded tests, but had some troubles with running then switch to another task. I'll try provide figures with next patch.
Comment 22 GSkachkov 2018-02-23 10:16:34 PST
(In reply to JF Bastien from comment #20)
> Comment on attachment 331117 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=331117&action=review
> 
> This looks pretty good, with a few comments I'm ready to r=me. Also, did you
> run the benchmarks you said you would?
> 

There is a result for scimark2_1c:

test was build with following parameters:
emcc   -O *.c -s WASM=1  -s ALLOW_MEMORY_GROWTH=1 -s ABORTING_MALLOC=0 -o scimark2_1c.html 

and checked by following url:
https://gskachkov.github.io/wasm-perf-tests/scimark2_1c/scimark2_1c.html

Result for Safari with Patch

**                                                              **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to pozo@nist.gov)     **
**                                                              **
Using       2.00 seconds min time per kenel.
Composite Score:         1186.26
FFT             Mflops:  1281.59    (N=1024)
SOR             Mflops:  1269.40    (100 x 100)
MonteCarlo:     Mflops:   184.56
Sparse matmult  Mflops:  1254.28    (N=1000, nz=5000)
LU              Mflops:  1941.46    (M=100, N=100)

Result for Safari without Patch:

**                                                              **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to pozo@nist.gov)     **
**                                                              **
Using       2.00 seconds min time per kenel.
Composite Score:         1182.51
FFT             Mflops:  1310.47    (N=1024)
SOR             Mflops:  1251.27    (100 x 100)
MonteCarlo:     Mflops:   184.94
Sparse matmult  Mflops:  1231.30    (N=1000, nz=5000)
LU              Mflops:  1934.58    (M=100, N=100)


Result from terminal:

**                                                              **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to pozo@nist.gov)     **
**                                                              **
Using       2.00 seconds min time per kenel.
Composite Score:         1879.40
FFT             Mflops:  1681.28    (N=1024)
SOR             Mflops:  1253.96    (100 x 100)
MonteCarlo:     Mflops:   488.40
Sparse matmult  Mflops:  2064.39    (N=1000, nz=5000)
LU              Mflops:  3908.97    (M=100, N=100)

Looks like my changes do not affects this test
Comment 23 GSkachkov 2018-02-23 12:55:06 PST
Created attachment 334544 [details]
Patch
Comment 24 JF Bastien 2018-02-23 14:44:02 PST
Running the non-Fortran part of spec2k, I see roughly 5% throughput win on x86-64. Pretty nice!
Comment 25 JF Bastien 2018-02-23 14:44:38 PST
Comment on attachment 334544 [details]
Patch

r=me
Comment 26 GSkachkov 2018-02-23 15:07:08 PST
(In reply to JF Bastien from comment #24)
> Running the non-Fortran part of spec2k, I see roughly 5% throughput win on
> x86-64. Pretty nice!

Cool! Thanks! 
Where I can download spec2k? I only found by line https://www.spec.org/order.html and I'm not sure what exact version I should download to run tests? CPU2006 V1.2?
Comment 27 JF Bastien 2018-02-23 15:09:53 PST
(In reply to GSkachkov from comment #26)
> (In reply to JF Bastien from comment #24)
> > Running the non-Fortran part of spec2k, I see roughly 5% throughput win on
> > x86-64. Pretty nice!
> 
> Cool! Thanks! 
> Where I can download spec2k? I only found by line
> https://www.spec.org/order.html and I'm not sure what exact version I should
> download to run tests? CPU2006 V1.2?

We have a access to it internally, but cannot make it public because of licensing. I'm using CPU2000 because 2006 uses well over 1GiB of memory on many tests (unless you benchmark the training set).
Comment 28 WebKit Commit Bot 2018-02-23 15:17:03 PST
Comment on attachment 334544 [details]
Patch

Clearing flags on attachment: 334544

Committed r228966: <https://trac.webkit.org/changeset/228966>
Comment 29 WebKit Commit Bot 2018-02-23 15:17:05 PST
All reviewed patches have been landed.  Closing bug.
Comment 30 Radar WebKit Bug Importer 2018-02-23 15:18:25 PST
<rdar://problem/37841252>