Bug 234102 - Enable thread safe statics
Summary: Enable thread safe statics
Status: ASSIGNED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Chris Dumez
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2021-12-09 14:29 PST by Chris Dumez
Modified: 2021-12-17 10:22 PST (History)
40 users (show)

See Also:


Attachments
WIP Patch (115.97 KB, patch)
2021-12-09 14:41 PST, Chris Dumez
ews-feeder: commit-queue-
Details | Formatted Diff | Diff
WIP Patch (116.58 KB, patch)
2021-12-09 14:47 PST, Chris Dumez
ews-feeder: commit-queue-
Details | Formatted Diff | Diff
WIP Patch (117.31 KB, patch)
2021-12-09 14:58 PST, Chris Dumez
ews-feeder: commit-queue-
Details | Formatted Diff | Diff
WIP Patch (117.79 KB, patch)
2021-12-09 15:32 PST, Chris Dumez
ews-feeder: commit-queue-
Details | Formatted Diff | Diff
WIP Patch (118.48 KB, patch)
2021-12-10 07:18 PST, Chris Dumez
ews-feeder: commit-queue-
Details | Formatted Diff | Diff
WIP Patch (119.14 KB, patch)
2021-12-10 09:58 PST, Chris Dumez
no flags Details | Formatted Diff | Diff
WIP Patch (119.14 KB, patch)
2021-12-10 10:47 PST, Chris Dumez
no flags Details | Formatted Diff | Diff
WIP Patch (198.69 KB, patch)
2021-12-16 10:30 PST, Chris Dumez
ews-feeder: commit-queue-
Details | Formatted Diff | Diff
WIP Patch (198.70 KB, patch)
2021-12-16 10:44 PST, Chris Dumez
no flags Details | Formatted Diff | Diff
WIP Patch (198.70 KB, patch)
2021-12-16 15:21 PST, Chris Dumez
no flags Details | Formatted Diff | Diff
WIP Patch (198.70 KB, patch)
2021-12-16 18:03 PST, Chris Dumez
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Chris Dumez 2021-12-09 14:29:03 PST
Enable thread safe statics.
Comment 1 Chris Dumez 2021-12-09 14:41:01 PST
Created attachment 446609 [details]
WIP Patch

Doing perf testing on this.
Comment 2 Chris Dumez 2021-12-09 14:47:50 PST
Created attachment 446612 [details]
WIP Patch
Comment 3 Chris Dumez 2021-12-09 14:58:50 PST
Created attachment 446616 [details]
WIP Patch
Comment 4 Yusuke Suzuki 2021-12-09 15:31:15 PST
Can you run Speedometer2, JetStream2 etc. to ensure that this does not affect on performance?
Comment 5 Chris Dumez 2021-12-09 15:31:49 PST
(In reply to Yusuke Suzuki from comment #4)
> Can you run Speedometer2, JetStream2 etc. to ensure that this does not
> affect on performance?

Yes this is in progress, see https://bugs.webkit.org/show_bug.cgi?id=234102#c1
Comment 6 Yusuke Suzuki 2021-12-09 15:32:44 PST
(In reply to Chris Dumez from comment #5)
> (In reply to Yusuke Suzuki from comment #4)
> > Can you run Speedometer2, JetStream2 etc. to ensure that this does not
> > affect on performance?
> 
> Yes this is in progress, see
> https://bugs.webkit.org/show_bug.cgi?id=234102#c1

Great!
Comment 7 Chris Dumez 2021-12-09 15:32:47 PST
Created attachment 446621 [details]
WIP Patch
Comment 8 Chris Dumez 2021-12-10 07:18:39 PST
Created attachment 446724 [details]
WIP Patch
Comment 9 Chris Dumez 2021-12-10 09:58:26 PST
Created attachment 446747 [details]
WIP Patch
Comment 10 Darin Adler 2021-12-10 10:27:46 PST
This would be so great. Looking forward to the performance testing results.
Comment 11 Chris Dumez 2021-12-10 10:47:43 PST
Created attachment 446759 [details]
WIP Patch
Comment 12 Darin Adler 2021-12-10 10:49:29 PST
Comment on attachment 446747 [details]
WIP Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=446747&action=review

Style comments in case you want to tweak a little further.

Generally speaking, these tend to repeat the type and using auto and deduction guides could cut down on that. Without thread safe globals, we were forced to be more verbose about types, but that’s not needed any more once we make the change.

Can always do that code style cleanup after the fact. I do think it helps make things more readable and is worthwhile but doesn’t have to be done right away.

> Source/JavaScriptCore/assembler/AssemblerBuffer.cpp:37
> +    static ThreadSpecificAssemblerData* threadSpecificAssemblerDataPtr = new ThreadSpecificAssemblerData();

I would use auto and no () here.

I also think a shorter variable name would be nice, since it’s a super-local scope.

> Source/JavaScriptCore/assembler/AssemblerBuffer.cpp:44
> +    static ThreadSpecificAssemblerData* threadSpecificAssemblerHashesPtr = = new ThreadSpecificAssemblerData();

Same, but also "= =" here is a mistake.

> Source/JavaScriptCore/assembler/LinkBuffer.cpp:113
> +    static ThreadSpecificBranchCompactionLinkBuffer* threadSpecificBranchCompactionLinkBufferPtr = new ThreadSpecificBranchCompactionLinkBuffer();

Ditto.

> Source/JavaScriptCore/heap/Heap.cpp:162
> +    static HashMap<const char*, GCTypeMap<SimpleStats>>* result =  new HashMap<const char*, GCTypeMap<SimpleStats>>();

Using auto here would make this much nicer. Also extra space after "=".

> Source/JavaScriptCore/heap/HeapHelperPool.cpp:36
> +    static ParallelHelperPool* helperPool = [] {

Could use auto here. Not necessarily something everyone agrees makes this better, but I would do it. I also would just use the name "pool" inside this function. I don’t think "helperPool" makes anything better. Also, this global could be a reference, need not be a pointer.

> Source/JavaScriptCore/interpreter/Interpreter.cpp:346
> +    static NeverDestroyed<HashMap<Opcode, OpcodeID>> opcodeIDTable = [] {

We could upgrade NeverDestroyed so it has appropriate deduction guides, so we can just write:

    static NeverDestroyed table = [] {

Or could accomplish the same thing with auto and makeNeverDestroyed. Either way we avoid reiterating the type name one more time, which I think is good.

> Source/JavaScriptCore/jit/RegisterAtOffsetList.cpp:75
> +    static NeverDestroyed<RegisterAtOffsetList> result(RegisterSet::llintBaselineCalleeSaveRegisters());

Deduction guide for NeverDestroyed would be great here.

> Source/JavaScriptCore/jit/RegisterSet.cpp:188
> +    static RegisterAtOffsetList* result = new RegisterAtOffsetList(vmCalleeSaveRegisters(), RegisterAtOffsetList::ZeroBased);

Using auto could make this less repetitive.

> Source/JavaScriptCore/jsc.cpp:1960
> +    static ThreadSpecific<Worker*>* result = new ThreadSpecific<Worker*>();

I would use auto and not use (), and I would also probably use a reference rather than a pointer.

> Source/JavaScriptCore/jsc.cpp:2009
> +    static Workers* result = new Workers();

Ditto.

> Source/JavaScriptCore/llint/LLIntEntrypoint.cpp:61
> +            static DirectJITCode* jitCode = [&] {

Could use auto.

> Source/JavaScriptCore/llint/LLIntThunks.cpp:133
> +    static NeverDestroyed<MacroAssemblerCodeRef<JSEntryPtrTag>> codeRef(generateThunkWithJumpToPrologue<JSEntryPtrTag>(llint_function_for_call_prologue, "function for call"));

Seems like these really would benefit from the deduction guide.

> Source/WTF/wtf/Identified.h:98
> +        static NeverDestroyed<std::atomic<uint64_t>> currentIdentifier(0);

The (0) is not needed. This is a static, and those get zero-initialized. In fact, I think this is a case where the old idiom wasn’t needed even with -fno-threadsafe-statics

> Source/WTF/wtf/ObjectIdentifier.cpp:40
> +    static NeverDestroyed<std::atomic<uint64_t>> current(0);

Same thing, no (0) needed.

> Source/WTF/wtf/ParkingLot.cpp:448
> +    static ThreadSpecific<RefPtr<ThreadData>, CanBeGCThread::True>* threadData = new ThreadSpecific<RefPtr<ThreadData>, CanBeGCThread::True>();

auto

I’m not going to mention this every time.

> Source/WTF/wtf/TranslatedProcess.cpp:43
> +        return !!value;

No need for the !! since this is a return statement. Pretty sure it wasn’t needed before either.

> Source/WTF/wtf/URL.cpp:891
> +    static NeverDestroyed<URL> staticBlankURL(URL(), &aboutBlankString);

I am so sick of the syntax for making a URL from an absolute URL string; gotta fix that idiom, because this is so ugly.

> Source/WebCore/PAL/Configurations/Base.xcconfig:74
> -GCC_THREADSAFE_STATICS = NO;
> +GCC_THREADSAFE_STATICS = YES;

Could we just remove these instead of changing them from NO to YES? After all, YES is the default.

> Source/WebCore/fileapi/ThreadableBlobRegistry.cpp:58
> +    static ThreadSpecific<BlobUrlOriginMap>* map = new ThreadSpecific<BlobUrlOriginMap>;

auto is so good

> Source/WebCore/page/NavigatorBase.cpp:188
>          if (WTF::numberOfProcessorCores() < 8)
> -            numberOfCores = 4;
> -        else
> -            numberOfCores = 8;
> -    });
> +            return 4;
> +        return 8;

Could use ? : here and not bother with a lambda. I think it would be easier to read. Still keep the giant comment, but before the static.

> Source/WebCore/platform/audio/cocoa/AudioSampleDataSource.mm:131
> +        double frequency = 1e-9 * static_cast<double>(timebaseInfo.numer) / static_cast<double>(timebaseInfo.denom);

These casts to double aren’t needed. C++ numeric type rules will convert them to double since we are doing math with 1e-9, a double.

> Source/WebCore/platform/graphics/cg/ImageBufferUtilitiesCG.cpp:78
> +    static CFStringRef kUTTypePNG = CFSTR("public.png");
> +    static CFStringRef kUTTypeGIF = CFSTR("com.compuserve.gif");

Surprised we can’t use constexpr. We definitely could if these were @"" constants. Unfortunate to use read/write global space for these.
Comment 13 Chris Dumez 2021-12-10 10:55:15 PST
(In reply to Darin Adler from comment #12)
> Comment on attachment 446747 [details]
> WIP Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=446747&action=review
> 
> Style comments in case you want to tweak a little further.
> 
> Generally speaking, these tend to repeat the type and using auto and
> deduction guides could cut down on that. Without thread safe globals, we
> were forced to be more verbose about types, but that’s not needed any more
> once we make the change.
> 
> Can always do that code style cleanup after the fact. I do think it helps
> make things more readable and is worthwhile but doesn’t have to be done
> right away.
> 
> > Source/JavaScriptCore/assembler/AssemblerBuffer.cpp:37
> > +    static ThreadSpecificAssemblerData* threadSpecificAssemblerDataPtr = new ThreadSpecificAssemblerData();
> 
> I would use auto and no () here.
> 
> I also think a shorter variable name would be nice, since it’s a super-local
> scope.
> 
> > Source/JavaScriptCore/assembler/AssemblerBuffer.cpp:44
> > +    static ThreadSpecificAssemblerData* threadSpecificAssemblerHashesPtr = = new ThreadSpecificAssemblerData();
> 
> Same, but also "= =" here is a mistake.
> 
> > Source/JavaScriptCore/assembler/LinkBuffer.cpp:113
> > +    static ThreadSpecificBranchCompactionLinkBuffer* threadSpecificBranchCompactionLinkBufferPtr = new ThreadSpecificBranchCompactionLinkBuffer();
> 
> Ditto.
> 
> > Source/JavaScriptCore/heap/Heap.cpp:162
> > +    static HashMap<const char*, GCTypeMap<SimpleStats>>* result =  new HashMap<const char*, GCTypeMap<SimpleStats>>();
> 
> Using auto here would make this much nicer. Also extra space after "=".
> 
> > Source/JavaScriptCore/heap/HeapHelperPool.cpp:36
> > +    static ParallelHelperPool* helperPool = [] {
> 
> Could use auto here. Not necessarily something everyone agrees makes this
> better, but I would do it. I also would just use the name "pool" inside this
> function. I don’t think "helperPool" makes anything better. Also, this
> global could be a reference, need not be a pointer.
> 
> > Source/JavaScriptCore/interpreter/Interpreter.cpp:346
> > +    static NeverDestroyed<HashMap<Opcode, OpcodeID>> opcodeIDTable = [] {
> 
> We could upgrade NeverDestroyed so it has appropriate deduction guides, so
> we can just write:
> 
>     static NeverDestroyed table = [] {
> 
> Or could accomplish the same thing with auto and makeNeverDestroyed. Either
> way we avoid reiterating the type name one more time, which I think is good.
> 
> > Source/JavaScriptCore/jit/RegisterAtOffsetList.cpp:75
> > +    static NeverDestroyed<RegisterAtOffsetList> result(RegisterSet::llintBaselineCalleeSaveRegisters());
> 
> Deduction guide for NeverDestroyed would be great here.
> 
> > Source/JavaScriptCore/jit/RegisterSet.cpp:188
> > +    static RegisterAtOffsetList* result = new RegisterAtOffsetList(vmCalleeSaveRegisters(), RegisterAtOffsetList::ZeroBased);
> 
> Using auto could make this less repetitive.
> 
> > Source/JavaScriptCore/jsc.cpp:1960
> > +    static ThreadSpecific<Worker*>* result = new ThreadSpecific<Worker*>();
> 
> I would use auto and not use (), and I would also probably use a reference
> rather than a pointer.
> 
> > Source/JavaScriptCore/jsc.cpp:2009
> > +    static Workers* result = new Workers();
> 
> Ditto.
> 
> > Source/JavaScriptCore/llint/LLIntEntrypoint.cpp:61
> > +            static DirectJITCode* jitCode = [&] {
> 
> Could use auto.
> 
> > Source/JavaScriptCore/llint/LLIntThunks.cpp:133
> > +    static NeverDestroyed<MacroAssemblerCodeRef<JSEntryPtrTag>> codeRef(generateThunkWithJumpToPrologue<JSEntryPtrTag>(llint_function_for_call_prologue, "function for call"));
> 
> Seems like these really would benefit from the deduction guide.
> 
> > Source/WTF/wtf/Identified.h:98
> > +        static NeverDestroyed<std::atomic<uint64_t>> currentIdentifier(0);
> 
> The (0) is not needed. This is a static, and those get zero-initialized. In
> fact, I think this is a case where the old idiom wasn’t needed even with
> -fno-threadsafe-statics
> 
> > Source/WTF/wtf/ObjectIdentifier.cpp:40
> > +    static NeverDestroyed<std::atomic<uint64_t>> current(0);
> 
> Same thing, no (0) needed.
> 
> > Source/WTF/wtf/ParkingLot.cpp:448
> > +    static ThreadSpecific<RefPtr<ThreadData>, CanBeGCThread::True>* threadData = new ThreadSpecific<RefPtr<ThreadData>, CanBeGCThread::True>();
> 
> auto
> 
> I’m not going to mention this every time.
> 
> > Source/WTF/wtf/TranslatedProcess.cpp:43
> > +        return !!value;
> 
> No need for the !! since this is a return statement. Pretty sure it wasn’t
> needed before either.
> 
> > Source/WTF/wtf/URL.cpp:891
> > +    static NeverDestroyed<URL> staticBlankURL(URL(), &aboutBlankString);
> 
> I am so sick of the syntax for making a URL from an absolute URL string;
> gotta fix that idiom, because this is so ugly.
> 
> > Source/WebCore/PAL/Configurations/Base.xcconfig:74
> > -GCC_THREADSAFE_STATICS = NO;
> > +GCC_THREADSAFE_STATICS = YES;
> 
> Could we just remove these instead of changing them from NO to YES? After
> all, YES is the default.
> 
> > Source/WebCore/fileapi/ThreadableBlobRegistry.cpp:58
> > +    static ThreadSpecific<BlobUrlOriginMap>* map = new ThreadSpecific<BlobUrlOriginMap>;
> 
> auto is so good
> 
> > Source/WebCore/page/NavigatorBase.cpp:188
> >          if (WTF::numberOfProcessorCores() < 8)
> > -            numberOfCores = 4;
> > -        else
> > -            numberOfCores = 8;
> > -    });
> > +            return 4;
> > +        return 8;
> 
> Could use ? : here and not bother with a lambda. I think it would be easier
> to read. Still keep the giant comment, but before the static.
> 
> > Source/WebCore/platform/audio/cocoa/AudioSampleDataSource.mm:131
> > +        double frequency = 1e-9 * static_cast<double>(timebaseInfo.numer) / static_cast<double>(timebaseInfo.denom);
> 
> These casts to double aren’t needed. C++ numeric type rules will convert
> them to double since we are doing math with 1e-9, a double.
> 
> > Source/WebCore/platform/graphics/cg/ImageBufferUtilitiesCG.cpp:78
> > +    static CFStringRef kUTTypePNG = CFSTR("public.png");
> > +    static CFStringRef kUTTypeGIF = CFSTR("com.compuserve.gif");
> 
> Surprised we can’t use constexpr. We definitely could if these were @""
> constants. Unfortunate to use read/write global space for these.

Thanks for the feedback, the patch definitely wasn't meant to be "clean", just quick-and-dirty to get perf testing going. I'll do the suggested improvements if the perf results come back neutral.
Comment 14 Chris Dumez 2021-12-11 15:24:05 PST
Sadly, this seems to be a ~0.6% regression on Speedometer 2 on iPhone X (with 98% probability). It is neutral on macOS.

MotionMark 1.1 & JetStream 2.0 seems pretty much neutral on both macOS and iOS.
Comment 15 Myles C. Maxfield 2021-12-11 21:50:05 PST
Does that mean this bug is just WONTFIX? Or is there a path forward here?
Comment 16 Chris Dumez 2021-12-12 11:00:45 PST
(In reply to Myles C. Maxfield from comment #15)
> Does that mean this bug is just WONTFIX? Or is there a path forward here?

We should probably profile Speedometer to find out why it got slower and see if we can address it somehow. It is not a large regression and only on a single benchmark.
Comment 17 Chris Dumez 2021-12-13 07:44:08 PST
This may also be a small regression on Jetstream 2 on macOS (~0.5%) although not every task confirmed this.
Comment 18 Alexey Proskuryakov 2021-12-14 10:14:22 PST
(In reply to Chris Dumez from comment #14)
> Sadly, this seems to be a ~0.6% regression on Speedometer 2 on iPhone X
> (with 98% probability). It is neutral on macOS.
> 
> MotionMark 1.1 & JetStream 2.0 seems pretty much neutral on both macOS and
> iOS.

It would seem surprising if it's about macOS vs. iOS more than about Apple Silicon vs. x86_64. Is that what the tests are saying?
Comment 19 Chris Dumez 2021-12-14 10:24:16 PST
(In reply to Alexey Proskuryakov from comment #18)
> (In reply to Chris Dumez from comment #14)
> > Sadly, this seems to be a ~0.6% regression on Speedometer 2 on iPhone X
> > (with 98% probability). It is neutral on macOS.
> > 
> > MotionMark 1.1 & JetStream 2.0 seems pretty much neutral on both macOS and
> > iOS.
> 
> It would seem surprising if it's about macOS vs. iOS more than about Apple
> Silicon vs. x86_64. Is that what the tests are saying?

I can't answer this. The A/B UI doesn't make it clear which architecture the macOS bots are. I assume I tried Intel Mac hardware but I am not certain.

In the end, it doesn't really matter. We have a ~0.6% speedometer regression on iPhone X that needs to be looked into before we can consider making this change.
Comment 20 Chris Dumez 2021-12-16 10:30:56 PST
Created attachment 447368 [details]
WIP Patch
Comment 21 Chris Dumez 2021-12-16 10:44:15 PST
Created attachment 447371 [details]
WIP Patch
Comment 22 Radar WebKit Bug Importer 2021-12-16 14:50:17 PST
<rdar://problem/86597648>
Comment 23 Chris Dumez 2021-12-16 15:21:59 PST
Created attachment 447394 [details]
WIP Patch
Comment 24 Chris Dumez 2021-12-16 18:03:30 PST
Created attachment 447405 [details]
WIP Patch
Comment 25 Chris Dumez 2021-12-17 07:29:31 PST
(In reply to Chris Dumez from comment #24)
> Created attachment 447405 [details]
> WIP Patch

I removed more std::call_once and dispatch_once() calls. I also cleaned up the patch and re-ran A/B testing. Sadly the patch is still a 0.7-1% regression on iPhone X for both Speedometer and JetStream. I gathered traces and will see if something stands out.
Comment 26 Chris Dumez 2021-12-17 08:22:02 PST
(In reply to Chris Dumez from comment #25)
> (In reply to Chris Dumez from comment #24)
> > Created attachment 447405 [details]
> > WIP Patch
> 
> I removed more std::call_once and dispatch_once() calls. I also cleaned up
> the patch and re-ran A/B testing. Sadly the patch is still a 0.7-1%
> regression on iPhone X for both Speedometer and JetStream. I gathered traces
> and will see if something stands out.

Most of the regression seems to be under JSC::DFG::Plan::compileInThreadImpl().
Just a thought but we may be able to turn on thread-safe statics outside JSC with much less pain.
Comment 27 Darin Adler 2021-12-17 10:08:54 PST
(In reply to Chris Dumez from comment #26)
> Just a thought but we may be able to turn on thread-safe statics outside JSC
> with much less pain.

I like the idea of turning them on everywhere we can. Having them off in JSC is a fine first step. Then later, we can also turn them on in JSC except in the places where we need something more efficient. And in those, as I mentioned to you elsewhere, we can seek our idioms that side step the thread safety without relying on the compiler switch. Should not be too challenging once we know which globals are the costly ones.
Comment 28 Darin Adler 2021-12-17 10:09:22 PST
Will you do the A/B test that leaves out JSC next?
Comment 29 Chris Dumez 2021-12-17 10:10:12 PST
(In reply to Darin Adler from comment #28)
> Will you do the A/B test that leaves out JSC next?

Yes, it is ongoing.

Sadly, It is tricky to figure out which globals are the culprits from the traces I gathered.
Comment 30 Darin Adler 2021-12-17 10:22:17 PST
(In reply to Chris Dumez from comment #29)
> (In reply to Darin Adler from comment #28)
> > Will you do the A/B test that leaves out JSC next?
> 
> Yes, it is ongoing.
> 
> Sadly, It is tricky to figure out which globals are the culprits from the
> traces I gathered.

If nothing else works, we may need to hook up a fuzzer to toggle individual source files in combinations and A/B test for us, do a search, then maybe a long time later it will tell us which set of source files matter.