When JIT compiling, we create AssemblerBuffers to store the generated code before linking and copying to executable memory. Those buffers start out with a built in capacity of 128 bytes. Every time we need to grow, we realloc() which is usually a malloc() followed by a copy. This causes malloc churn, possible cache locality issues and other performance hits. If we keep a thread local cache, it should allow JIT worker threads to reuse a prior buffer with little or no realloc()'s.
Created attachment 400649 [details] Patch
<rdar://problem/63784637>
Created attachment 400668 [details] Patch to fix build failure
Comment on attachment 400668 [details] Patch to fix build failure View in context: https://bugs.webkit.org/attachment.cgi?id=400668&action=review > Source/JavaScriptCore/assembler/AssemblerBuffer.h:202 > + *threadSpecific = WTFMove(m_storage); Worth trying just WTFMoving if the destination is clear. Or letting the bigger buffer win. > Source/JavaScriptCore/assembler/AssemblerBuffer.h:324 > + void* ptr = static_cast<AssemblerData*>(threadSpecific); I’m not sure that this placement new thing is the canonical way to initialize a thread specific?
Created attachment 400696 [details] Updated patch > > Source/JavaScriptCore/assembler/AssemblerBuffer.h:202 > > + *threadSpecific = WTFMove(m_storage); > > Worth trying just WTFMoving if the destination is clear. Or letting the > bigger buffer win. Made that change and it appears to improve performance. > > Source/JavaScriptCore/assembler/AssemblerBuffer.h:324 > > + void* ptr = static_cast<AssemblerData*>(threadSpecific); > > I’m not sure that this placement new thing is the canonical way to > initialize a thread specific? Copied this from a similar recent patch. Speculative fix for the JSC i386 build failure.
Committed r262362: <https://trac.webkit.org/changeset/262362>
Comment on attachment 400696 [details] Updated patch View in context: https://bugs.webkit.org/attachment.cgi?id=400696&action=review > Source/JavaScriptCore/wasm/WasmWorklist.cpp:123 > + clearAssembleDataThreadSpecificCache(); Maybe clear it out on VM destruction? What about the case when a GC finishes linking some code. What thread is that done on? Maybe we should have a way of clearing that out too.
Comment on attachment 400696 [details] Updated patch LGTM too, but I believe you're currently leaking these for web workers. I wonder if we should just say VM destruction releases this data too.
Comment on attachment 400696 [details] Updated patch View in context: https://bugs.webkit.org/attachment.cgi?id=400696&action=review > Source/JavaScriptCore/assembler/AssemblerBuffer.h:131 > + void takeBufferIfLarger(AssemblerData&& other) nit: I would've used a reference instead of rvalue reference here, since it's not always being moved out of.
(In reply to Filip Pizlo from comment #4) > Comment on attachment 400668 [details] > Patch to fix build failure > > View in context: > https://bugs.webkit.org/attachment.cgi?id=400668&action=review > > > Source/JavaScriptCore/assembler/AssemblerBuffer.h:202 > > + *threadSpecific = WTFMove(m_storage); > > Worth trying just WTFMoving if the destination is clear. Or letting the > bigger buffer win. > > > Source/JavaScriptCore/assembler/AssemblerBuffer.h:324 > > + void* ptr = static_cast<AssemblerData*>(threadSpecific); > > I’m not sure that this placement new thing is the canonical way to > initialize a thread specific? Yeah this is how I did it recently. This seemed like a reasonable way to me to initialize non PODs, but am curious if you think there is a better way?
(In reply to Saam Barati from comment #7) > Comment on attachment 400696 [details] > Updated patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=400696&action=review > > > Source/JavaScriptCore/wasm/WasmWorklist.cpp:123 > > + clearAssembleDataThreadSpecificCache(); > > Maybe clear it out on VM destruction? > > What about the case when a GC finishes linking some code. What thread is > that done on? Maybe we should have a way of clearing that out too. It puts it into the GC's thread. I really want it to go back to the original thread's cache, but felt the locking and "owning" thread logic could slow things down.
(In reply to Saam Barati from comment #10) > (In reply to Filip Pizlo from comment #4) > > Comment on attachment 400668 [details] > > Patch to fix build failure > > > > View in context: > > https://bugs.webkit.org/attachment.cgi?id=400668&action=review > > > > > Source/JavaScriptCore/assembler/AssemblerBuffer.h:202 > > > + *threadSpecific = WTFMove(m_storage); > > > > Worth trying just WTFMoving if the destination is clear. Or letting the > > bigger buffer win. > > > > > Source/JavaScriptCore/assembler/AssemblerBuffer.h:324 > > > + void* ptr = static_cast<AssemblerData*>(threadSpecific); > > > > I’m not sure that this placement new thing is the canonical way to > > initialize a thread specific? > > Yeah this is how I did it recently. This seemed like a reasonable way to me > to initialize non PODs, but am curious if you think there is a better way? So the answer here is just to do operator* or operator-> I'm super off base here. We already automatically call ctor and dtor of T in ThreadSpecific<T>. No need for this cleanup code I'm proposing. Will fix.
Can we clear on low memory warning too?
Oh, I see: Only the compiler threads use this, and they clear right away when done compiling. Probably not profitable to clear in the middle of compiling.