177305 – WebAssembly: cache memory address / size on instance

JF Bastien

Reported 2017-09-21 09:31:55 PDT

(From a FIXME that didn't have a bug) In Wasm::Memory, we should cache address / size to the instances to avoid a load on instance->instance calls. This will require updating all the instances when grow is called.

GSkachkov

Comment 1 2017-12-04 12:11:16 PST

I can try to fix this issue. Wasm::Instance will have to add property cachedMemory and cachedSize, but Wasm::Memory should contain links to Wasm::Instance where Wasm::Memory is used to iterate within `grow`, is it correct understating of the issue?

JF Bastien

Comment 2 2017-12-04 17:05:41 PST

(In reply to GSkachkov from comment #1) > I can try to fix this issue. Cool! > Wasm::Instance will have to add property cachedMemory and cachedSize, Yes, and use it in the JIT instead of doing the indirection. > Wasm::Memory should contain links to Wasm::Instance where Wasm::Memory is > used to iterate within `grow`, is it correct understating of the issue? Wasm::Memory can be used by multiple Wasm::Instance at the same time, so the Wasm::Memory has to know all of the Wasm::Instances that refer to it. The Instance owns the memory, so the back-pointers need to be weak references. Then yes, you would update all of them from grow.

GSkachkov

Comment 3 2017-12-15 14:10:15 PST

Created attachment 329518 [details] Patch WiP. Caching memory size & addess in wasm::Instance

JF Bastien

Comment 4 2017-12-15 14:11:03 PST

(In reply to GSkachkov from comment #3) > Created attachment 329518 [details] > Patch > > WiP. Caching memory size & addess in wasm::Instance You've marked as "review?". Is it WIP or ready for review?

GSkachkov

Comment 5 2017-12-15 14:15:14 PST

(In reply to JF Bastien from comment #4) > (In reply to GSkachkov from comment #3) > > Created attachment 329518 [details] > > Patch > > > > WiP. Caching memory size & addess in wasm::Instance > > You've marked as "review?". Is it WIP or ready for review? Sorry, it is WiP and I removed review flag, but I would like to have some feedback if my changes related to what is expected to do in this task.

GSkachkov

Comment 6 2017-12-16 11:30:31 PST

Created attachment 329580 [details] Patch Patch for review

GSkachkov

Comment 7 2017-12-17 10:59:47 PST

Created attachment 329610 [details] Patch Patch for review + small clearence

JF Bastien

Comment 8 2017-12-18 09:12:52 PST

Comment on attachment 329610 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=329610&action=review Very cool! That's the right direction, but I have a few comments. Also for testing you might want to look at the memory fuzz test I wrote in JSTests/wasm/fuzz > Source/JavaScriptCore/wasm/WasmB3IRGenerator.cpp:-1284 > - ASSERT(sizeRegs[0].sizeRegister != baseMemory); I think we should keep this assert. > Source/JavaScriptCore/wasm/WasmB3IRGenerator.cpp:1281 > ASSERT(!sizeRegs[0].sizeOffset); Can you also assert that sizeRegs.size() == 1 ? The code below assumes it is, and would be wrong if that weren't the case. You can like to FIXME https://bugs.webkit.org/show_bug.cgi?id=162952 > Source/JavaScriptCore/wasm/WasmBinding.cpp:-68 > - ASSERT(!sizeRegs[0].sizeOffset); // The following code assumes we start at 0, and calculates subsequent size registers relative to 0. Same here on keeping the assert, and adding one for https://bugs.webkit.org/show_bug.cgi?id=162952 > Source/JavaScriptCore/wasm/WasmInstance.h:77 > + updateChechedMemory(); Typo "cached". > Source/JavaScriptCore/wasm/WasmInstance.h:79 > + void updateChechedMemory() Typo. > Source/JavaScriptCore/wasm/WasmInstance.h:147 > + WeakPtrFactory<Instance> m_weakPtrFactory; I would try to organize members so that the ones accessed by the JIT are together, at the start. That way we need only small immediates, and they'll be in the same cacheline when accessed. So the cached values would go after m_context, and the weak ptr factory after the call frame callback. > Source/JavaScriptCore/wasm/js/JSToWasm.cpp:193 > + jit.loadWasmContextInstance(baseMemory); This now happens twice if the memory wasn't signaling and we use fast TLS. It might be easier to have 4 code paths without any sharing, for all combinations of signal/no + fast TLS/no. > JSTests/wasm/function-tests/memory-reuse.js:4 > +{ This scope doesn't seem useful? > JSTests/wasm/function-tests/memory-reuse.js:8 > + function createWasmModule(memory) { This returns instances, so I'd call it "createInstance". > JSTests/wasm/function-tests/memory-reuse.js:41 > + for (let i = 0; i < buf.length/4; ++i) { Spaces between divide. Same thing for other binary operators below. > JSTests/wasm/function-tests/memory-reuse.js:56 > + const pageSize = 64 * 1024; Hoist this to the top of the file, and remove the other one above. > JSTests/wasm/function-tests/memory-reuse.js:58 > + const modules = []; As above, I'd rename to "instances". > JSTests/wasm/function-tests/memory-reuse.js:68 > + } I think the start of the test is good, but you then need to grow the memory. There are two ways to do so: as the grow_memory opcode directly in WebAssembly IR, or through the JS API on WebAssembly.Memory. You should then re-run doCheck(). The difficulty in doing this test is that you want them memory's base void* to have moved (hard!), and the size to change (easy!). You're not really testing size because signaling memory does that implicitly. I think you want a VM primitive for testing that allows you to allocate a WebAssembly.Memory that is never signaling. That way to can test size by going out-of-bounds, catching the trap, then growing, and same out-of-bounds should not trap anymore. Now with a non-signaling memory you know that enough growth will cause us to re-allocate and memmove then void* memory too! After doing so, the tests should kill instances and force-invoke the GC to try to make sure they die, and then re-run the checks again.

GSkachkov

Comment 9 2017-12-20 12:11:48 PST

Created attachment 329942 [details] Patch Fix comments

GSkachkov

Comment 10 2017-12-20 12:15:03 PST

Comment on attachment 329610 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=329610&action=review >> Source/JavaScriptCore/wasm/WasmB3IRGenerator.cpp:-1284 >> - ASSERT(sizeRegs[0].sizeRegister != baseMemory); > > I think we should keep this assert. Done >> Source/JavaScriptCore/wasm/WasmB3IRGenerator.cpp:1281 >> ASSERT(!sizeRegs[0].sizeOffset); > > Can you also assert that sizeRegs.size() == 1 ? The code below assumes it is, and would be wrong if that weren't the case. You can like to FIXME https://bugs.webkit.org/show_bug.cgi?id=162952 As I can see we have this assert, line 1283/1279 >> Source/JavaScriptCore/wasm/WasmBinding.cpp:-68 >> - ASSERT(!sizeRegs[0].sizeOffset); // The following code assumes we start at 0, and calculates subsequent size registers relative to 0. > > Same here on keeping the assert, and adding one for https://bugs.webkit.org/show_bug.cgi?id=162952 Done >> Source/JavaScriptCore/wasm/WasmInstance.h:77 >> + updateChechedMemory(); > > Typo "cached". Fixed >> Source/JavaScriptCore/wasm/WasmInstance.h:79 >> + void updateChechedMemory() > > Typo. Fixed >> Source/JavaScriptCore/wasm/WasmInstance.h:147 >> + WeakPtrFactory<Instance> m_weakPtrFactory; > > I would try to organize members so that the ones accessed by the JIT are together, at the start. That way we need only small immediates, and they'll be in the same cacheline when accessed. > > So the cached values would go after m_context, and the weak ptr factory after the call frame callback. Done >> Source/JavaScriptCore/wasm/js/JSToWasm.cpp:193 >> + jit.loadWasmContextInstance(baseMemory); > > This now happens twice if the memory wasn't signaling and we use fast TLS. > > It might be easier to have 4 code paths without any sharing, for all combinations of signal/no + fast TLS/no. I've tried rewrite this, hope I did not lost something. >> JSTests/wasm/function-tests/memory-reuse.js:4 >> +{ > > This scope doesn't seem useful? Removed >> JSTests/wasm/function-tests/memory-reuse.js:8 >> + function createWasmModule(memory) { > > This returns instances, so I'd call it "createInstance". Fixed >> JSTests/wasm/function-tests/memory-reuse.js:41 >> + for (let i = 0; i < buf.length/4; ++i) { > > Spaces between divide. Same thing for other binary operators below. Fixed >> JSTests/wasm/function-tests/memory-reuse.js:56 >> + const pageSize = 64 * 1024; > > Hoist this to the top of the file, and remove the other one above. Done >> JSTests/wasm/function-tests/memory-reuse.js:58 >> + const modules = []; > > As above, I'd rename to "instances". Done >> JSTests/wasm/function-tests/memory-reuse.js:68 >> + } > > I think the start of the test is good, but you then need to grow the memory. There are two ways to do so: as the grow_memory opcode directly in WebAssembly IR, or through the JS API on WebAssembly.Memory. You should then re-run doCheck(). > > The difficulty in doing this test is that you want them memory's base void* to have moved (hard!), and the size to change (easy!). You're not really testing size because signaling memory does that implicitly. I think you want a VM primitive for testing that allows you to allocate a WebAssembly.Memory that is never signaling. That way to can test size by going out-of-bounds, catching the trap, then growing, and same out-of-bounds should not trap anymore. Now with a non-signaling memory you know that enough growth will cause us to re-allocate and memmove then void* memory too! > > After doing so, the tests should kill instances and force-invoke the GC to try to make sure they die, and then re-run the checks again. Rewritten, I hope it is close to what is expected

JF Bastien

Comment 11 2018-01-04 09:01:46 PST

This patch will conflict with a not-yet-committed but important patch. I don't think the conflict will be very bad, but if you don't mind I'd like to hold off committing this until the other patch can land. Once it's in I'll do the other round of reviews, because this is a really neat improvement that we definitely want!

GSkachkov

Comment 12 2018-01-04 10:02:30 PST

(In reply to JF Bastien from comment #11) > This patch will conflict with a not-yet-committed but important patch. I > don't think the conflict will be very bad, but if you don't mind I'd like to > hold off committing this until the other patch can land. > > Once it's in I'll do the other round of reviews, because this is a really > neat improvement that we definitely want! Yes sure, let me know when I need to rebase.

JF Bastien

Comment 13 2018-01-05 20:30:11 PST

(In reply to GSkachkov from comment #12) > (In reply to JF Bastien from comment #11) > > This patch will conflict with a not-yet-committed but important patch. I > > don't think the conflict will be very bad, but if you don't mind I'd like to > > hold off committing this until the other patch can land. > > > > Once it's in I'll do the other round of reviews, because this is a really > > neat improvement that we definitely want! > > Yes sure, let me know when I need to rebase. Here's the patch: https://bugs.webkit.org/show_bug.cgi?id=181313 There's now also an index mask that needs to be cached in the same way as memory and size.

GSkachkov

Comment 14 2018-01-07 02:00:20 PST

Created attachment 330654 [details] Patch Rebased patch & using cachedIndexingMask property

GSkachkov

Comment 15 2018-01-07 03:03:45 PST

(In reply to JF Bastien from comment #13) > (In reply to GSkachkov from comment #12) > > (In reply to JF Bastien from comment #11) > > > This patch will conflict with a not-yet-committed but important patch. I > > > don't think the conflict will be very bad, but if you don't mind I'd like to > > > hold off committing this until the other patch can land. > > > > > > Once it's in I'll do the other round of reviews, because this is a really > > > neat improvement that we definitely want! > > > > Yes sure, let me know when I need to rebase. > > Here's the patch: > https://bugs.webkit.org/show_bug.cgi?id=181313 > > There's now also an index mask that needs to be cached in the same way as > memory and size. There is a new rebased patch and I added caching of the index mask in the same way as did for other properties.

GSkachkov

Comment 16 2018-01-11 14:14:14 PST

Created attachment 331117 [details] Patch Fix error in added tests

GSkachkov

Comment 17 2018-01-25 13:54:55 PST

JF Bastien, Are there any performance tests that I can run for Wasm and check any difference with my patch and without?

JF Bastien

Comment 18 2018-01-25 13:56:58 PST

(In reply to GSkachkov from comment #17) > JF Bastien, Are there any performance tests that I can run for Wasm and > check any difference with my patch and without? We don't have public ones because of licenses. The WebAssembly paper used: - PolyBenchC-3.2 - SciMark2.1 It's also fairly easy to compile SPEC2k (non-Fortran parts) and run the training sets. Otherwise Emscripten has benchmarks as well.

GSkachkov

Comment 19 2018-01-26 12:56:58 PST

(In reply to JF Bastien from comment #18) > (In reply to GSkachkov from comment #17) > > JF Bastien, Are there any performance tests that I can run for Wasm and > > check any difference with my patch and without? > > We don't have public ones because of licenses. The WebAssembly paper used: > - PolyBenchC-3.2 > - SciMark2.1 > > It's also fairly easy to compile SPEC2k (non-Fortran parts) and run the > training sets. > > Otherwise Emscripten has benchmarks as well. Thanks for pointing to tests, will prepare some figures during the week.

JF Bastien

Comment 20 2018-02-20 14:31:44 PST

Comment on attachment 331117 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=331117&action=review This looks pretty good, with a few comments I'm ready to r=me. Also, did you run the benchmarks you said you would? > Source/JavaScriptCore/wasm/WasmMemory.cpp:403 > + } We never remove weak pointers that are now nullptr, correct? That's a (very slow) leak because m_instances never shrinks. I don't think grow() is the right place to remove null entries, though. Maybe in registerInstance you could first traverse m_instances looking for a null entry, reuse the entry if there's one, and if you can't find one then append? > Source/JavaScriptCore/wasm/js/JSToWasm.cpp:181 > + GPRReg memoryGRP = Context::useFastTLS() ? baseMemory : wasmContextInstanceGPR; I'd rename memoryGPR to currentInstanceGPR, since we're using the basemMemory register to hold the instance.

GSkachkov

Comment 21 2018-02-21 12:43:49 PST

(In reply to JF Bastien from comment #20) > Comment on attachment 331117 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=331117&action=review > > This looks pretty good, with a few comments I'm ready to r=me. Also, did you > run the benchmarks you said you would? > > > Source/JavaScriptCore/wasm/WasmMemory.cpp:403 > > + } > > We never remove weak pointers that are now nullptr, correct? That's a (very > slow) leak because m_instances never shrinks. I don't think grow() is the > right place to remove null entries, though. Maybe in registerInstance you > could first traverse m_instances looking for a null entry, reuse the entry > if there's one, and if you can't find one then append? > > > Source/JavaScriptCore/wasm/js/JSToWasm.cpp:181 > > + GPRReg memoryGRP = Context::useFastTLS() ? baseMemory : wasmContextInstanceGPR; > > I'd rename memoryGPR to currentInstanceGPR, since we're using the > basemMemory register to hold the instance. Thanks for the feedback, will try to fix comments soon. Ohh yes, I forget about perf tests. I downloaded tests, but had some troubles with running then switch to another task. I'll try provide figures with next patch.

GSkachkov

Comment 22 2018-02-23 10:16:34 PST

(In reply to JF Bastien from comment #20) > Comment on attachment 331117 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=331117&action=review > > This looks pretty good, with a few comments I'm ready to r=me. Also, did you > run the benchmarks you said you would? > There is a result for scimark2_1c: test was build with following parameters: emcc -O *.c -s WASM=1 -s ALLOW_MEMORY_GROWTH=1 -s ABORTING_MALLOC=0 -o scimark2_1c.html and checked by following url: https://gskachkov.github.io/wasm-perf-tests/scimark2_1c/scimark2_1c.html Result for Safari with Patch ** ** ** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark ** ** for details. (Results can be submitted to pozo@nist.gov) ** ** ** Using 2.00 seconds min time per kenel. Composite Score: 1186.26 FFT Mflops: 1281.59 (N=1024) SOR Mflops: 1269.40 (100 x 100) MonteCarlo: Mflops: 184.56 Sparse matmult Mflops: 1254.28 (N=1000, nz=5000) LU Mflops: 1941.46 (M=100, N=100) Result for Safari without Patch: ** ** ** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark ** ** for details. (Results can be submitted to pozo@nist.gov) ** ** ** Using 2.00 seconds min time per kenel. Composite Score: 1182.51 FFT Mflops: 1310.47 (N=1024) SOR Mflops: 1251.27 (100 x 100) MonteCarlo: Mflops: 184.94 Sparse matmult Mflops: 1231.30 (N=1000, nz=5000) LU Mflops: 1934.58 (M=100, N=100) Result from terminal: ** ** ** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark ** ** for details. (Results can be submitted to pozo@nist.gov) ** ** ** Using 2.00 seconds min time per kenel. Composite Score: 1879.40 FFT Mflops: 1681.28 (N=1024) SOR Mflops: 1253.96 (100 x 100) MonteCarlo: Mflops: 488.40 Sparse matmult Mflops: 2064.39 (N=1000, nz=5000) LU Mflops: 3908.97 (M=100, N=100) Looks like my changes do not affects this test

GSkachkov

Comment 23 2018-02-23 12:55:06 PST

Created attachment 334544 [details] Patch

JF Bastien

Comment 24 2018-02-23 14:44:02 PST

Running the non-Fortran part of spec2k, I see roughly 5% throughput win on x86-64. Pretty nice!

JF Bastien

Comment 25 2018-02-23 14:44:38 PST

Comment on attachment 334544 [details] Patch r=me

GSkachkov

Comment 26 2018-02-23 15:07:08 PST

(In reply to JF Bastien from comment #24) > Running the non-Fortran part of spec2k, I see roughly 5% throughput win on > x86-64. Pretty nice! Cool! Thanks! Where I can download spec2k? I only found by line https://www.spec.org/order.html and I'm not sure what exact version I should download to run tests? CPU2006 V1.2?

JF Bastien

Comment 27 2018-02-23 15:09:53 PST

(In reply to GSkachkov from comment #26) > (In reply to JF Bastien from comment #24) > > Running the non-Fortran part of spec2k, I see roughly 5% throughput win on > > x86-64. Pretty nice! > > Cool! Thanks! > Where I can download spec2k? I only found by line > https://www.spec.org/order.html and I'm not sure what exact version I should > download to run tests? CPU2006 V1.2? We have a access to it internally, but cannot make it public because of licensing. I'm using CPU2000 because 2006 uses well over 1GiB of memory on many tests (unless you benchmark the training set).

WebKit Commit Bot

Comment 28 2018-02-23 15:17:03 PST

Comment on attachment 334544 [details] Patch Clearing flags on attachment: 334544 Committed r228966: <https://trac.webkit.org/changeset/228966>

WebKit Commit Bot

Comment 29 2018-02-23 15:17:05 PST

All reviewed patches have been landed. Closing bug.

Radar WebKit Bug Importer

Comment 30 2018-02-23 15:18:25 PST

<rdar://problem/37841252>

Attachments
Patch (15.85 KB, patch) 2017-12-15 14:10 PST, GSkachkov	no flags	Details Formatted Diff Diff
Patch (19.10 KB, patch) 2017-12-16 11:30 PST, GSkachkov	no flags	Details Formatted Diff Diff
Patch (19.95 KB, patch) 2017-12-17 10:59 PST, GSkachkov	no flags	Details Formatted Diff Diff
Patch (20.91 KB, patch) 2017-12-20 12:11 PST, GSkachkov	no flags	Details Formatted Diff Diff
Patch (22.39 KB, patch) 2018-01-07 02:00 PST, GSkachkov	no flags	Details Formatted Diff Diff
Patch (22.48 KB, patch) 2018-01-11 14:14 PST, GSkachkov	no flags	Details Formatted Diff Diff
Patch (22.73 KB, patch) 2018-02-23 12:55 PST, GSkachkov	no flags	Details Formatted Diff Diff
Show Obsolete (6) View All Add attachment proposed patch, testcase, etc.