Bug 228240 - 3.5 MB system-wide footprint impact due to thread-locals in libANGLE
Summary: 3.5 MB system-wide footprint impact due to thread-locals in libANGLE
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: ANGLE (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Dean Jackson
URL:
Keywords: InRadar
Depends on:
Blocks: 220896
  Show dependency treegraph
 
Reported: 2021-07-23 11:28 PDT by Dean Jackson
Modified: 2021-07-27 14:36 PDT (History)
8 users (show)

See Also:


Attachments
WIP (8.52 KB, patch)
2021-07-23 11:32 PDT, Dean Jackson
no flags Details | Formatted Diff | Diff
Patch (10.27 KB, patch)
2021-07-23 17:23 PDT, Dean Jackson
kbr: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Dean Jackson 2021-07-23 11:28:58 PDT
From the macOS system memory folks:

Because of the way thread-locals are implemented in dyld today, they are dirty on process launch, even if the process does not use the functionality provided by your framework at all.

/System/Library/PrivateFrameworks/WebCore.framework/Frameworks/libANGLE-shared.dylib has recently added these symbols:

  100.0%: __ZN2gl20gCurrentValidContextE
  97.5%: __ZN3egl14gCurrentThreadE

(gl::gCurrentValidContext and egl::gCurrentThread)

Which are thread-local.

This is a regression in macOS Monterey.

Is it possible to switch to a different mechanism, or to use pthread TLVs manually instead of thread-locals?
Because of this, libANGLE-shared used to be dirty in 172 processes and is now dirtied in 355.

Overall it is a 3.5MB regression.
Comment 1 Dean Jackson 2021-07-23 11:29:19 PDT
rdar://79504783
Comment 2 Dean Jackson 2021-07-23 11:32:21 PDT
Created attachment 434103 [details]
WIP

Work in progress to get review. Please ignore the change to WebKit networking code - that was to help a local compile.

I'm not confident in this patch because:
- I've never used pthread_key_create before
- I'm worried about a performance regression in querying state on every access (which happens a lot)
- I want a better place to initialise and delete the keys
- This is an invasive patch that will need to be applied each time we update ANGLE

Eventually we'll fix dyld's thread_local implementation.
Comment 3 EWS Watchlist 2021-07-23 11:33:17 PDT
Note that there are important steps to take when updating ANGLE. See https://trac.webkit.org/wiki/UpdatingANGLE
Comment 4 Kenneth Russell 2021-07-23 12:02:35 PDT
Comment on attachment 434103 [details]
WIP

View in context: https://bugs.webkit.org/attachment.cgi?id=434103&action=review

Looks good overall especially if this significantly reduces binary size, though it's unfortunate we can't use the built-in mechanisms.

A few questions / comments, but to unblock this work, r+ now.

> Source/ThirdParty/ANGLE/src/libANGLE/Context.cpp:302
> +void FreeCurrentValidContext()

This and the function above are a little confusingly named. Could you consider:

  InitializeCurrentValidContextTLS
  FreeCurrentValidContextTLS
?

I also noticed that FreeCurrentValidContext isn't called from anywhere - which is fine because it can really only be called upon unloading of ANGLE or application shutdown. Should a comment be added about when it could plausibly be used?

> Source/ThirdParty/ANGLE/src/libGLESv2/global_state.cpp:49
> +    if (!initialized) {

Is it necessary to do this double-checking? The documentation indicates that the correct usage is just to call dispatch_once unconditionally.
https://developer.apple.com/documentation/dispatch/1447169-dispatch_once

> Source/ThirdParty/ANGLE/src/libGLESv2/global_state.cpp:99
> +bool InitializeCurrentThreadTLS()

The naming convention between the CurrentThread and CurrentValidContext should match. So maybe here:

InitializeCurrentThreadTLSIndex
FreeCurrentThreadTLSIndex
GetCurrentThreadTLS
SetCurrentThreadTLS

and use TLSIndex / TLS in the CurrentValidContext functions above.

> Source/ThirdParty/ANGLE/src/libGLESv2/global_state.cpp:119
> +    if (!initialized) {

Same question about the double-checked initialization, though I do see this is on the hot path.
Comment 5 Geoffrey Garen 2021-07-23 12:32:32 PDT
Comment on attachment 434103 [details]
WIP

View in context: https://bugs.webkit.org/attachment.cgi?id=434103&action=review

I appreciate Kenneth's review but I'm gonna be a little more explicit and say r- because I see a bug.

> Source/ThirdParty/ANGLE/src/libGLESv2/global_state.cpp:47
> +    static dispatch_once_t once = 0;

Better to leave off the = 0 and accept whatever the platform says is the default linker-initialized value.

>> Source/ThirdParty/ANGLE/src/libGLESv2/global_state.cpp:49
>> +    static bool initialized = false;
>> +    if (!initialized) {
> 
> Is it necessary to do this double-checking? The documentation indicates that the correct usage is just to call dispatch_once unconditionally.
> https://developer.apple.com/documentation/dispatch/1447169-dispatch_once

Kenneth is right. This is a bug. Because 'initialized' lacks any synchronization or memory barrier, it can read as true when it is still logically false.

I suggest removing the initialized variable and always calling dispatch_once. dispatch_once is fast (and inlined) in the "already dispatched" case.

>> Source/ThirdParty/ANGLE/src/libGLESv2/global_state.cpp:119
>> +    if (!initialized) {
> 
> Same question about the double-checked initialization, though I do see this is on the hot path.

Same comments as above.

> Source/WebKit/NetworkProcess/webrtc/NetworkRTCUDPSocketCocoa.mm:282
> +#if defined(NW_HAS_SHARE_LISTENER_PORT) && NW_HAS_SHARE_LISTENER_PORT && 0

Revert?
Comment 6 Dean Jackson 2021-07-23 17:07:00 PDT
Comment on attachment 434103 [details]
WIP

View in context: https://bugs.webkit.org/attachment.cgi?id=434103&action=review

>> Source/ThirdParty/ANGLE/src/libANGLE/Context.cpp:302
>> +void FreeCurrentValidContext()
> 
> This and the function above are a little confusingly named. Could you consider:
> 
>   InitializeCurrentValidContextTLS
>   FreeCurrentValidContextTLS
> ?
> 
> I also noticed that FreeCurrentValidContext isn't called from anywhere - which is fine because it can really only be called upon unloading of ANGLE or application shutdown. Should a comment be added about when it could plausibly be used?

Done!

(The reason I prefixed the thread form with TLS was because there was already a method called GetCurrentThread()).

>> Source/ThirdParty/ANGLE/src/libGLESv2/global_state.cpp:47
>> +    static dispatch_once_t once = 0;
> 
> Better to leave off the = 0 and accept whatever the platform says is the default linker-initialized value.

Done

>>> Source/ThirdParty/ANGLE/src/libGLESv2/global_state.cpp:49
>>> +    if (!initialized) {
>> 
>> Is it necessary to do this double-checking? The documentation indicates that the correct usage is just to call dispatch_once unconditionally.
>> https://developer.apple.com/documentation/dispatch/1447169-dispatch_once
> 
> Kenneth is right. This is a bug. Because 'initialized' lacks any synchronization or memory barrier, it can read as true when it is still logically false.
> 
> I suggest removing the initialized variable and always calling dispatch_once. dispatch_once is fast (and inlined) in the "already dispatched" case.

Removed (although, question... even if it was read as true, the code would still hit the dispatch_once and sync there right?)

>> Source/ThirdParty/ANGLE/src/libGLESv2/global_state.cpp:99
>> +bool InitializeCurrentThreadTLS()
> 
> The naming convention between the CurrentThread and CurrentValidContext should match. So maybe here:
> 
> InitializeCurrentThreadTLSIndex
> FreeCurrentThreadTLSIndex
> GetCurrentThreadTLS
> SetCurrentThreadTLS
> 
> and use TLSIndex / TLS in the CurrentValidContext functions above.

Done.

>>> Source/ThirdParty/ANGLE/src/libGLESv2/global_state.cpp:119
>>> +    if (!initialized) {
>> 
>> Same question about the double-checked initialization, though I do see this is on the hot path.
> 
> Same comments as above.

Done. I also moved the initialization call into the Get/Set function to make it less confusing.
Comment 7 Dean Jackson 2021-07-23 17:23:27 PDT
Created attachment 434148 [details]
Patch
Comment 8 Kenneth Russell 2021-07-23 17:43:08 PDT
Comment on attachment 434148 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=434148&action=review

This revision looks better to me. Since Geoff had concerns earlier, please sync up with him too. r+ with two small nits.

> Source/ThirdParty/ANGLE/src/libANGLE/Context.cpp:297
> +TLSIndex CurrentValidContextIndex = TLS_INVALID_INDEX;

Nit: did you want to put this in an anonymous namespace, make it static, etc. to hide the symbol's visibility?

> Source/ThirdParty/ANGLE/src/libGLESv2/global_state.cpp:94
> +TLSIndex CurrentThreadIndex = TLS_INVALID_INDEX;

Same question here.
Comment 9 Geoffrey Garen 2021-07-26 11:02:37 PDT
Comment on attachment 434148 [details]
Patch

r=me too; also agree that we should use static for CurrentValidContextIndex and CurrentThreadIndex.
Comment 10 Dean Jackson 2021-07-27 11:06:30 PDT
Comment on attachment 434148 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=434148&action=review

>> Source/ThirdParty/ANGLE/src/libANGLE/Context.cpp:297
>> +TLSIndex CurrentValidContextIndex = TLS_INVALID_INDEX;
> 
> Nit: did you want to put this in an anonymous namespace, make it static, etc. to hide the symbol's visibility?

Yes, thanks!
Comment 11 Dean Jackson 2021-07-27 11:06:31 PDT
Comment on attachment 434148 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=434148&action=review

>> Source/ThirdParty/ANGLE/src/libANGLE/Context.cpp:297
>> +TLSIndex CurrentValidContextIndex = TLS_INVALID_INDEX;
> 
> Nit: did you want to put this in an anonymous namespace, make it static, etc. to hide the symbol's visibility?

Yes, thanks!
Comment 12 Dean Jackson 2021-07-27 14:36:42 PDT
Committed r280349 (239996@main): <https://commits.webkit.org/239996@main>