Bug 212562 - Consider a Thread Specific Cache for AssemblerBuffers
Summary: Consider a Thread Specific Cache for AssemblerBuffers
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: Other
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Michael Saboff
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2020-05-29 21:37 PDT by Michael Saboff
Modified: 2020-06-01 12:38 PDT (History)
7 users (show)

See Also:


Attachments
Patch (6.99 KB, patch)
2020-05-29 21:48 PDT, Michael Saboff
no flags Details | Formatted Diff | Diff
Patch to fix build failure (6.98 KB, patch)
2020-05-30 07:57 PDT, Michael Saboff
msaboff: review?
Details | Formatted Diff | Diff
Updated patch (7.87 KB, patch)
2020-05-30 23:01 PDT, Michael Saboff
fpizlo: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Michael Saboff 2020-05-29 21:37:06 PDT
When JIT compiling, we create AssemblerBuffers to store the generated code before linking and copying to executable memory.  Those buffers start out with a built in capacity of 128 bytes.   Every time we need to grow, we realloc() which is usually a malloc() followed by a copy.  This causes malloc churn, possible cache locality issues and other performance hits.  If we keep a thread local cache, it should allow JIT worker threads to reuse a prior buffer with little or no realloc()'s.
Comment 1 Michael Saboff 2020-05-29 21:48:09 PDT
Created attachment 400649 [details]
Patch
Comment 2 Radar WebKit Bug Importer 2020-05-29 21:48:45 PDT
<rdar://problem/63784637>
Comment 3 Michael Saboff 2020-05-30 07:57:27 PDT
Created attachment 400668 [details]
Patch to fix build failure
Comment 4 Filip Pizlo 2020-05-30 12:13:20 PDT
Comment on attachment 400668 [details]
Patch to fix build failure

View in context: https://bugs.webkit.org/attachment.cgi?id=400668&action=review

> Source/JavaScriptCore/assembler/AssemblerBuffer.h:202
> +            *threadSpecific = WTFMove(m_storage);

Worth trying just WTFMoving if the destination is clear. Or letting the bigger buffer win.

> Source/JavaScriptCore/assembler/AssemblerBuffer.h:324
> +                void* ptr = static_cast<AssemblerData*>(threadSpecific);

I’m not sure that this placement new thing is the canonical way to initialize a thread specific?
Comment 5 Michael Saboff 2020-05-30 23:01:23 PDT
Created attachment 400696 [details]
Updated patch

> > Source/JavaScriptCore/assembler/AssemblerBuffer.h:202
> > +            *threadSpecific = WTFMove(m_storage);
> 
> Worth trying just WTFMoving if the destination is clear. Or letting the
> bigger buffer win.

Made that change and it appears to improve performance.

> > Source/JavaScriptCore/assembler/AssemblerBuffer.h:324
> > +                void* ptr = static_cast<AssemblerData*>(threadSpecific);
> 
> I’m not sure that this placement new thing is the canonical way to
> initialize a thread specific?

Copied this from a similar recent patch.

Speculative fix for the JSC i386 build failure.
Comment 6 Michael Saboff 2020-05-31 07:56:08 PDT
Committed r262362: <https://trac.webkit.org/changeset/262362>
Comment 7 Saam Barati 2020-06-01 10:11:10 PDT
Comment on attachment 400696 [details]
Updated patch

View in context: https://bugs.webkit.org/attachment.cgi?id=400696&action=review

> Source/JavaScriptCore/wasm/WasmWorklist.cpp:123
> +        clearAssembleDataThreadSpecificCache();

Maybe clear it out on VM destruction?

What about the case when a GC finishes linking some code. What thread is that done on? Maybe we should have a way of clearing that out too.
Comment 8 Saam Barati 2020-06-01 10:16:50 PDT
Comment on attachment 400696 [details]
Updated patch

LGTM too, but I believe you're currently leaking these for web workers. I wonder if we should just say VM destruction releases this data too.
Comment 9 Saam Barati 2020-06-01 10:18:11 PDT
Comment on attachment 400696 [details]
Updated patch

View in context: https://bugs.webkit.org/attachment.cgi?id=400696&action=review

> Source/JavaScriptCore/assembler/AssemblerBuffer.h:131
> +        void takeBufferIfLarger(AssemblerData&& other)

nit: I would've used a reference instead of rvalue reference here, since it's not always being moved out of.
Comment 10 Saam Barati 2020-06-01 10:24:02 PDT
(In reply to Filip Pizlo from comment #4)
> Comment on attachment 400668 [details]
> Patch to fix build failure
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=400668&action=review
> 
> > Source/JavaScriptCore/assembler/AssemblerBuffer.h:202
> > +            *threadSpecific = WTFMove(m_storage);
> 
> Worth trying just WTFMoving if the destination is clear. Or letting the
> bigger buffer win.
> 
> > Source/JavaScriptCore/assembler/AssemblerBuffer.h:324
> > +                void* ptr = static_cast<AssemblerData*>(threadSpecific);
> 
> I’m not sure that this placement new thing is the canonical way to
> initialize a thread specific?

Yeah this is how I did it recently. This seemed like a reasonable way to me to initialize non PODs, but am curious if you think there is a better way?
Comment 11 Michael Saboff 2020-06-01 11:14:36 PDT
(In reply to Saam Barati from comment #7)
> Comment on attachment 400696 [details]
> Updated patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=400696&action=review
> 
> > Source/JavaScriptCore/wasm/WasmWorklist.cpp:123
> > +        clearAssembleDataThreadSpecificCache();
> 
> Maybe clear it out on VM destruction?
> 
> What about the case when a GC finishes linking some code. What thread is
> that done on? Maybe we should have a way of clearing that out too.

It puts it into the GC's thread.  I really want it to go back to the original thread's cache, but felt the locking and "owning" thread logic could slow things down.
Comment 12 Saam Barati 2020-06-01 12:38:33 PDT
(In reply to Saam Barati from comment #10)
> (In reply to Filip Pizlo from comment #4)
> > Comment on attachment 400668 [details]
> > Patch to fix build failure
> > 
> > View in context:
> > https://bugs.webkit.org/attachment.cgi?id=400668&action=review
> > 
> > > Source/JavaScriptCore/assembler/AssemblerBuffer.h:202
> > > +            *threadSpecific = WTFMove(m_storage);
> > 
> > Worth trying just WTFMoving if the destination is clear. Or letting the
> > bigger buffer win.
> > 
> > > Source/JavaScriptCore/assembler/AssemblerBuffer.h:324
> > > +                void* ptr = static_cast<AssemblerData*>(threadSpecific);
> > 
> > I’m not sure that this placement new thing is the canonical way to
> > initialize a thread specific?
> 
> Yeah this is how I did it recently. This seemed like a reasonable way to me
> to initialize non PODs, but am curious if you think there is a better way?

So the answer here is just to do operator* or operator->

I'm super off base here. We already automatically call ctor and dtor of T in ThreadSpecific<T>. No need for this cleanup code I'm proposing.

Will fix.