(From a FIXME that didn't have a bug) In Wasm::Memory, we should cache address / size to the instances to avoid a load on instance->instance calls. This will require updating all the instances when grow is called.
I can try to fix this issue. Wasm::Instance will have to add property cachedMemory and cachedSize, but Wasm::Memory should contain links to Wasm::Instance where Wasm::Memory is used to iterate within `grow`, is it correct understating of the issue?
(In reply to GSkachkov from comment #1) > I can try to fix this issue. Cool! > Wasm::Instance will have to add property cachedMemory and cachedSize, Yes, and use it in the JIT instead of doing the indirection. > Wasm::Memory should contain links to Wasm::Instance where Wasm::Memory is > used to iterate within `grow`, is it correct understating of the issue? Wasm::Memory can be used by multiple Wasm::Instance at the same time, so the Wasm::Memory has to know all of the Wasm::Instances that refer to it. The Instance owns the memory, so the back-pointers need to be weak references. Then yes, you would update all of them from grow.
Created attachment 329518 [details] Patch WiP. Caching memory size & addess in wasm::Instance
(In reply to GSkachkov from comment #3) > Created attachment 329518 [details] > Patch > > WiP. Caching memory size & addess in wasm::Instance You've marked as "review?". Is it WIP or ready for review?
(In reply to JF Bastien from comment #4) > (In reply to GSkachkov from comment #3) > > Created attachment 329518 [details] > > Patch > > > > WiP. Caching memory size & addess in wasm::Instance > > You've marked as "review?". Is it WIP or ready for review? Sorry, it is WiP and I removed review flag, but I would like to have some feedback if my changes related to what is expected to do in this task.
Created attachment 329580 [details] Patch Patch for review
Created attachment 329610 [details] Patch Patch for review + small clearence
Comment on attachment 329610 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=329610&action=review Very cool! That's the right direction, but I have a few comments. Also for testing you might want to look at the memory fuzz test I wrote in JSTests/wasm/fuzz > Source/JavaScriptCore/wasm/WasmB3IRGenerator.cpp:-1284 > - ASSERT(sizeRegs[0].sizeRegister != baseMemory); I think we should keep this assert. > Source/JavaScriptCore/wasm/WasmB3IRGenerator.cpp:1281 > ASSERT(!sizeRegs[0].sizeOffset); Can you also assert that sizeRegs.size() == 1 ? The code below assumes it is, and would be wrong if that weren't the case. You can like to FIXME https://bugs.webkit.org/show_bug.cgi?id=162952 > Source/JavaScriptCore/wasm/WasmBinding.cpp:-68 > - ASSERT(!sizeRegs[0].sizeOffset); // The following code assumes we start at 0, and calculates subsequent size registers relative to 0. Same here on keeping the assert, and adding one for https://bugs.webkit.org/show_bug.cgi?id=162952 > Source/JavaScriptCore/wasm/WasmInstance.h:77 > + updateChechedMemory(); Typo "cached". > Source/JavaScriptCore/wasm/WasmInstance.h:79 > + void updateChechedMemory() Typo. > Source/JavaScriptCore/wasm/WasmInstance.h:147 > + WeakPtrFactory<Instance> m_weakPtrFactory; I would try to organize members so that the ones accessed by the JIT are together, at the start. That way we need only small immediates, and they'll be in the same cacheline when accessed. So the cached values would go after m_context, and the weak ptr factory after the call frame callback. > Source/JavaScriptCore/wasm/js/JSToWasm.cpp:193 > + jit.loadWasmContextInstance(baseMemory); This now happens twice if the memory wasn't signaling and we use fast TLS. It might be easier to have 4 code paths without any sharing, for all combinations of signal/no + fast TLS/no. > JSTests/wasm/function-tests/memory-reuse.js:4 > +{ This scope doesn't seem useful? > JSTests/wasm/function-tests/memory-reuse.js:8 > + function createWasmModule(memory) { This returns instances, so I'd call it "createInstance". > JSTests/wasm/function-tests/memory-reuse.js:41 > + for (let i = 0; i < buf.length/4; ++i) { Spaces between divide. Same thing for other binary operators below. > JSTests/wasm/function-tests/memory-reuse.js:56 > + const pageSize = 64 * 1024; Hoist this to the top of the file, and remove the other one above. > JSTests/wasm/function-tests/memory-reuse.js:58 > + const modules = []; As above, I'd rename to "instances". > JSTests/wasm/function-tests/memory-reuse.js:68 > + } I think the start of the test is good, but you then need to grow the memory. There are two ways to do so: as the grow_memory opcode directly in WebAssembly IR, or through the JS API on WebAssembly.Memory. You should then re-run doCheck(). The difficulty in doing this test is that you want them memory's base void* to have moved (hard!), and the size to change (easy!). You're not really testing size because signaling memory does that implicitly. I think you want a VM primitive for testing that allows you to allocate a WebAssembly.Memory that is never signaling. That way to can test size by going out-of-bounds, catching the trap, then growing, and same out-of-bounds should not trap anymore. Now with a non-signaling memory you know that enough growth will cause us to re-allocate and memmove then void* memory too! After doing so, the tests should kill instances and force-invoke the GC to try to make sure they die, and then re-run the checks again.
Created attachment 329942 [details] Patch Fix comments
Comment on attachment 329610 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=329610&action=review >> Source/JavaScriptCore/wasm/WasmB3IRGenerator.cpp:-1284 >> - ASSERT(sizeRegs[0].sizeRegister != baseMemory); > > I think we should keep this assert. Done >> Source/JavaScriptCore/wasm/WasmB3IRGenerator.cpp:1281 >> ASSERT(!sizeRegs[0].sizeOffset); > > Can you also assert that sizeRegs.size() == 1 ? The code below assumes it is, and would be wrong if that weren't the case. You can like to FIXME https://bugs.webkit.org/show_bug.cgi?id=162952 As I can see we have this assert, line 1283/1279 >> Source/JavaScriptCore/wasm/WasmBinding.cpp:-68 >> - ASSERT(!sizeRegs[0].sizeOffset); // The following code assumes we start at 0, and calculates subsequent size registers relative to 0. > > Same here on keeping the assert, and adding one for https://bugs.webkit.org/show_bug.cgi?id=162952 Done >> Source/JavaScriptCore/wasm/WasmInstance.h:77 >> + updateChechedMemory(); > > Typo "cached". Fixed >> Source/JavaScriptCore/wasm/WasmInstance.h:79 >> + void updateChechedMemory() > > Typo. Fixed >> Source/JavaScriptCore/wasm/WasmInstance.h:147 >> + WeakPtrFactory<Instance> m_weakPtrFactory; > > I would try to organize members so that the ones accessed by the JIT are together, at the start. That way we need only small immediates, and they'll be in the same cacheline when accessed. > > So the cached values would go after m_context, and the weak ptr factory after the call frame callback. Done >> Source/JavaScriptCore/wasm/js/JSToWasm.cpp:193 >> + jit.loadWasmContextInstance(baseMemory); > > This now happens twice if the memory wasn't signaling and we use fast TLS. > > It might be easier to have 4 code paths without any sharing, for all combinations of signal/no + fast TLS/no. I've tried rewrite this, hope I did not lost something. >> JSTests/wasm/function-tests/memory-reuse.js:4 >> +{ > > This scope doesn't seem useful? Removed >> JSTests/wasm/function-tests/memory-reuse.js:8 >> + function createWasmModule(memory) { > > This returns instances, so I'd call it "createInstance". Fixed >> JSTests/wasm/function-tests/memory-reuse.js:41 >> + for (let i = 0; i < buf.length/4; ++i) { > > Spaces between divide. Same thing for other binary operators below. Fixed >> JSTests/wasm/function-tests/memory-reuse.js:56 >> + const pageSize = 64 * 1024; > > Hoist this to the top of the file, and remove the other one above. Done >> JSTests/wasm/function-tests/memory-reuse.js:58 >> + const modules = []; > > As above, I'd rename to "instances". Done >> JSTests/wasm/function-tests/memory-reuse.js:68 >> + } > > I think the start of the test is good, but you then need to grow the memory. There are two ways to do so: as the grow_memory opcode directly in WebAssembly IR, or through the JS API on WebAssembly.Memory. You should then re-run doCheck(). > > The difficulty in doing this test is that you want them memory's base void* to have moved (hard!), and the size to change (easy!). You're not really testing size because signaling memory does that implicitly. I think you want a VM primitive for testing that allows you to allocate a WebAssembly.Memory that is never signaling. That way to can test size by going out-of-bounds, catching the trap, then growing, and same out-of-bounds should not trap anymore. Now with a non-signaling memory you know that enough growth will cause us to re-allocate and memmove then void* memory too! > > After doing so, the tests should kill instances and force-invoke the GC to try to make sure they die, and then re-run the checks again. Rewritten, I hope it is close to what is expected
This patch will conflict with a not-yet-committed but important patch. I don't think the conflict will be very bad, but if you don't mind I'd like to hold off committing this until the other patch can land. Once it's in I'll do the other round of reviews, because this is a really neat improvement that we definitely want!
(In reply to JF Bastien from comment #11) > This patch will conflict with a not-yet-committed but important patch. I > don't think the conflict will be very bad, but if you don't mind I'd like to > hold off committing this until the other patch can land. > > Once it's in I'll do the other round of reviews, because this is a really > neat improvement that we definitely want! Yes sure, let me know when I need to rebase.
(In reply to GSkachkov from comment #12) > (In reply to JF Bastien from comment #11) > > This patch will conflict with a not-yet-committed but important patch. I > > don't think the conflict will be very bad, but if you don't mind I'd like to > > hold off committing this until the other patch can land. > > > > Once it's in I'll do the other round of reviews, because this is a really > > neat improvement that we definitely want! > > Yes sure, let me know when I need to rebase. Here's the patch: https://bugs.webkit.org/show_bug.cgi?id=181313 There's now also an index mask that needs to be cached in the same way as memory and size.
Created attachment 330654 [details] Patch Rebased patch & using cachedIndexingMask property
(In reply to JF Bastien from comment #13) > (In reply to GSkachkov from comment #12) > > (In reply to JF Bastien from comment #11) > > > This patch will conflict with a not-yet-committed but important patch. I > > > don't think the conflict will be very bad, but if you don't mind I'd like to > > > hold off committing this until the other patch can land. > > > > > > Once it's in I'll do the other round of reviews, because this is a really > > > neat improvement that we definitely want! > > > > Yes sure, let me know when I need to rebase. > > Here's the patch: > https://bugs.webkit.org/show_bug.cgi?id=181313 > > There's now also an index mask that needs to be cached in the same way as > memory and size. There is a new rebased patch and I added caching of the index mask in the same way as did for other properties.
Created attachment 331117 [details] Patch Fix error in added tests
JF Bastien, Are there any performance tests that I can run for Wasm and check any difference with my patch and without?
(In reply to GSkachkov from comment #17) > JF Bastien, Are there any performance tests that I can run for Wasm and > check any difference with my patch and without? We don't have public ones because of licenses. The WebAssembly paper used: - PolyBenchC-3.2 - SciMark2.1 It's also fairly easy to compile SPEC2k (non-Fortran parts) and run the training sets. Otherwise Emscripten has benchmarks as well.
(In reply to JF Bastien from comment #18) > (In reply to GSkachkov from comment #17) > > JF Bastien, Are there any performance tests that I can run for Wasm and > > check any difference with my patch and without? > > We don't have public ones because of licenses. The WebAssembly paper used: > - PolyBenchC-3.2 > - SciMark2.1 > > It's also fairly easy to compile SPEC2k (non-Fortran parts) and run the > training sets. > > Otherwise Emscripten has benchmarks as well. Thanks for pointing to tests, will prepare some figures during the week.
Comment on attachment 331117 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=331117&action=review This looks pretty good, with a few comments I'm ready to r=me. Also, did you run the benchmarks you said you would? > Source/JavaScriptCore/wasm/WasmMemory.cpp:403 > + } We never remove weak pointers that are now nullptr, correct? That's a (very slow) leak because m_instances never shrinks. I don't think grow() is the right place to remove null entries, though. Maybe in registerInstance you could first traverse m_instances looking for a null entry, reuse the entry if there's one, and if you can't find one then append? > Source/JavaScriptCore/wasm/js/JSToWasm.cpp:181 > + GPRReg memoryGRP = Context::useFastTLS() ? baseMemory : wasmContextInstanceGPR; I'd rename memoryGPR to currentInstanceGPR, since we're using the basemMemory register to hold the instance.
(In reply to JF Bastien from comment #20) > Comment on attachment 331117 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=331117&action=review > > This looks pretty good, with a few comments I'm ready to r=me. Also, did you > run the benchmarks you said you would? > > > Source/JavaScriptCore/wasm/WasmMemory.cpp:403 > > + } > > We never remove weak pointers that are now nullptr, correct? That's a (very > slow) leak because m_instances never shrinks. I don't think grow() is the > right place to remove null entries, though. Maybe in registerInstance you > could first traverse m_instances looking for a null entry, reuse the entry > if there's one, and if you can't find one then append? > > > Source/JavaScriptCore/wasm/js/JSToWasm.cpp:181 > > + GPRReg memoryGRP = Context::useFastTLS() ? baseMemory : wasmContextInstanceGPR; > > I'd rename memoryGPR to currentInstanceGPR, since we're using the > basemMemory register to hold the instance. Thanks for the feedback, will try to fix comments soon. Ohh yes, I forget about perf tests. I downloaded tests, but had some troubles with running then switch to another task. I'll try provide figures with next patch.
(In reply to JF Bastien from comment #20) > Comment on attachment 331117 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=331117&action=review > > This looks pretty good, with a few comments I'm ready to r=me. Also, did you > run the benchmarks you said you would? > There is a result for scimark2_1c: test was build with following parameters: emcc -O *.c -s WASM=1 -s ALLOW_MEMORY_GROWTH=1 -s ABORTING_MALLOC=0 -o scimark2_1c.html and checked by following url: https://gskachkov.github.io/wasm-perf-tests/scimark2_1c/scimark2_1c.html Result for Safari with Patch ** ** ** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark ** ** for details. (Results can be submitted to pozo@nist.gov) ** ** ** Using 2.00 seconds min time per kenel. Composite Score: 1186.26 FFT Mflops: 1281.59 (N=1024) SOR Mflops: 1269.40 (100 x 100) MonteCarlo: Mflops: 184.56 Sparse matmult Mflops: 1254.28 (N=1000, nz=5000) LU Mflops: 1941.46 (M=100, N=100) Result for Safari without Patch: ** ** ** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark ** ** for details. (Results can be submitted to pozo@nist.gov) ** ** ** Using 2.00 seconds min time per kenel. Composite Score: 1182.51 FFT Mflops: 1310.47 (N=1024) SOR Mflops: 1251.27 (100 x 100) MonteCarlo: Mflops: 184.94 Sparse matmult Mflops: 1231.30 (N=1000, nz=5000) LU Mflops: 1934.58 (M=100, N=100) Result from terminal: ** ** ** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark ** ** for details. (Results can be submitted to pozo@nist.gov) ** ** ** Using 2.00 seconds min time per kenel. Composite Score: 1879.40 FFT Mflops: 1681.28 (N=1024) SOR Mflops: 1253.96 (100 x 100) MonteCarlo: Mflops: 488.40 Sparse matmult Mflops: 2064.39 (N=1000, nz=5000) LU Mflops: 3908.97 (M=100, N=100) Looks like my changes do not affects this test
Created attachment 334544 [details] Patch
Running the non-Fortran part of spec2k, I see roughly 5% throughput win on x86-64. Pretty nice!
Comment on attachment 334544 [details] Patch r=me
(In reply to JF Bastien from comment #24) > Running the non-Fortran part of spec2k, I see roughly 5% throughput win on > x86-64. Pretty nice! Cool! Thanks! Where I can download spec2k? I only found by line https://www.spec.org/order.html and I'm not sure what exact version I should download to run tests? CPU2006 V1.2?
(In reply to GSkachkov from comment #26) > (In reply to JF Bastien from comment #24) > > Running the non-Fortran part of spec2k, I see roughly 5% throughput win on > > x86-64. Pretty nice! > > Cool! Thanks! > Where I can download spec2k? I only found by line > https://www.spec.org/order.html and I'm not sure what exact version I should > download to run tests? CPU2006 V1.2? We have a access to it internally, but cannot make it public because of licensing. I'm using CPU2000 because 2006 uses well over 1GiB of memory on many tests (unless you benchmark the training set).
Comment on attachment 334544 [details] Patch Clearing flags on attachment: 334544 Committed r228966: <https://trac.webkit.org/changeset/228966>
All reviewed patches have been landed. Closing bug.
<rdar://problem/37841252>