Bug 200329 - [JSC] Support WebAssembly in SamplingProfiler
Summary: [JSC] Support WebAssembly in SamplingProfiler
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Yusuke Suzuki
URL:
Keywords: InRadar
Depends on:
Blocks: 200362
  Show dependency treegraph
 
Reported: 2019-07-31 18:40 PDT by Yusuke Suzuki
Modified: 2019-08-02 15:59 PDT (History)
5 users (show)

See Also:


Attachments
Patch (47.38 KB, patch)
2019-07-31 22:15 PDT, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
Patch (64.67 KB, patch)
2019-08-01 00:26 PDT, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ews115 for mac-highsierra (3.12 MB, application/zip)
2019-08-01 02:25 PDT, Build Bot
no flags Details
Patch (72.94 KB, patch)
2019-08-01 11:35 PDT, Yusuke Suzuki
sbarati: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Yusuke Suzuki 2019-07-31 18:40:35 PDT
Currently, sampling profiler does not list up wasm functions. We should do that to investigate JetStream2 wasm benchmarks performance bottlenecks.
Comment 1 Yusuke Suzuki 2019-07-31 18:49:46 PDT
(In reply to Yusuke Suzuki from comment #0)
> Currently, sampling profiler does not list up wasm functions. We should do
> that to investigate JetStream2 wasm benchmarks performance bottlenecks.

I'll use this to investigate wasm benchmark status in JetStream2 cli.js
Comment 2 Yusuke Suzuki 2019-07-31 22:15:39 PDT
Created attachment 375292 [details]
Patch

WIP
Comment 3 Build Bot 2019-07-31 22:18:20 PDT
Attachment 375292 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/wasm/WasmFaultSignalHandler.cpp:78:  Extra space before [.  [whitespace/brackets] [5]
Total errors found: 1 in 17 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 4 Yusuke Suzuki 2019-08-01 00:26:10 PDT
Created attachment 375294 [details]
Patch
Comment 5 Build Bot 2019-08-01 00:28:06 PDT
Attachment 375294 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/wasm/WasmFaultSignalHandler.cpp:78:  Extra space before [.  [whitespace/brackets] [5]
Total errors found: 1 in 22 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 6 Saam Barati 2019-08-01 01:33:19 PDT
Comment on attachment 375294 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=375294&action=review

Will look in detail tomorrow.

> Source/JavaScriptCore/runtime/SamplingProfiler.cpp:91
> +        , m_wasmCalleeLocker(wasmCalleeLocker)

Stopping Wasm and grabbing a lock is somewhat new in the sense that Wasm::Callee lives outside VM where SamplingProfiler is per VM. Do you foresee any potential for deadlocks when running two sampling profilers?

> Source/JavaScriptCore/wasm/WasmCallee.cpp:85
> +    return callees->contains(callee);

Nit: it’s a little weird to put all this singleton stuff on Callee. I think it’d be better to have something like a CalleeRegistry class.

> Source/JavaScriptCore/wasm/WasmIndexOrName.h:38
> +// Keep this class copyable the world is stopped: do not allocate any memory while copying this.

“copyable the” => “copyable when the”
Comment 7 Yusuke Suzuki 2019-08-01 02:24:21 PDT
Comment on attachment 375294 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=375294&action=review

>> Source/JavaScriptCore/runtime/SamplingProfiler.cpp:91
>> +        , m_wasmCalleeLocker(wasmCalleeLocker)
> 
> Stopping Wasm and grabbing a lock is somewhat new in the sense that Wasm::Callee lives outside VM where SamplingProfiler is per VM. Do you foresee any potential for deadlocks when running two sampling profilers?

So long as we keep a correct order of locking, the deadlock can be avoided. Since Wasm::Callee's lock is somewhat a leaf of a lock (we do not take other locks while holding this Wasm::Callee's lock), so long as we take this lock at the end of lock taking sequence, I think this is OK.

>> Source/JavaScriptCore/wasm/WasmCallee.cpp:85
>> +    return callees->contains(callee);
> 
> Nit: it’s a little weird to put all this singleton stuff on Callee. I think it’d be better to have something like a CalleeRegistry class.

The benefit of this is that we can completely hide `registerCallee` and `unregisterCallee` inside WasmCallee.cpp. The interfaces modifying this set are strictly limited: Callee::create & Callee::~Callee.
But maybe, adding CalleeRegistry is simpler. Changing.

>> Source/JavaScriptCore/wasm/WasmIndexOrName.h:38
>> +// Keep this class copyable the world is stopped: do not allocate any memory while copying this.
> 
> “copyable the” => “copyable when the”

Fixed.
Comment 8 Yusuke Suzuki 2019-08-01 02:24:54 PDT
BTW, here is sample output.


Starting JetStream2
Running gcc-loops-wasm:
    Startup: 227.273
    Run time: 0.588
    Score: 11.561


Startup: 227.273
Runtime: 0.588

Total Score:  11.561



Sampling rate: 1000.000000 microseconds
Top functions as <numSamples  'functionName:sourceID'>
  2603    '<?>.wasm-function[38]:-1'
  1494    '<?>.wasm-function[39]:-1'
  1004    '<?>.wasm-function[36]:-1'
   965    '<?>.wasm-function[35]:-1'
   325    '<?>.wasm-function[34]:-1'
   202    '<?>.wasm-function[546]:-1'
    16    '<?>.wasm-function[40]:-1'
     3    'loadString:-1'
     3    '<?>.wasm-function[37]:-1'
     2    'load:-1'
     1    'prepareToRun:2'
     1    '(unknown):-1'


Sampling rate: 1000.000000 microseconds
Hottest bytecodes as <numSamples   'functionName#hash:JITType:bytecodeIndex'>
  2603    '<?>.wasm-function[38]#<nil>:BBQ:<nil>'
  1494    '<?>.wasm-function[39]#<nil>:BBQ:<nil>'
  1004    '<?>.wasm-function[36]#<nil>:BBQ:<nil>'
   965    '<?>.wasm-function[35]#<nil>:BBQ:<nil>'
   324    '<?>.wasm-function[34]#<nil>:OMG:<nil>'
   202    '<?>.wasm-function[546]#<nil>:OMG:<nil>'
    16    '<?>.wasm-function[40]#<nil>:BBQ:<nil>'
     3    'loadString#<nil>:None:<nil>'
     3    '<?>.wasm-function[37]#<nil>:BBQ:<nil>'
     2    'load#<nil>:None:<nil>'
     1    'createStandardStreams#Asr3FY:LLInt:387'
     1    'filter#CXE8tr:Baseline:81'
     1    'newPromiseCapability#DzQl2X:Baseline:91'
     1    'prepareToRun#DqWFYU:LLInt:129'
     1    'callRuntimeCallbacks#EYZrDM:LLInt:200'
     1    'updateGlobalBufferViews#AqvIC2:LLInt:85'
     1    '(unknown)#<nil>:None:<nil>'
     1    '#<nil>:None:<nil>'
     1    '<?>.wasm-function[34]#<nil>:BBQ:<nil>'
     1    'doRun#C0pLVZ:LLInt:20191'
     1    '#A7TOIG:LLInt:22'
     1    'instantiateArrayBuffer#ALPYQ8:LLInt:22'
     1    'fulfillPromise#AuvX5q:Baseline:95'
Comment 9 Build Bot 2019-08-01 02:25:55 PDT
Comment on attachment 375294 [details]
Patch

Attachment 375294 [details] did not pass mac-debug-ews (mac):
Output: https://webkit-queues.webkit.org/results/12845657

New failing tests:
inspector/cpu-profiler/threads.html
Comment 10 Build Bot 2019-08-01 02:25:57 PDT
Created attachment 375296 [details]
Archive of layout-test-results from ews115 for mac-highsierra

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews115  Port: mac-highsierra  Platform: Mac OS X 10.13.6
Comment 11 Yusuke Suzuki 2019-08-01 11:35:47 PDT
Created attachment 375328 [details]
Patch
Comment 12 Build Bot 2019-08-01 11:39:03 PDT
Attachment 375328 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/wasm/WasmFaultSignalHandler.cpp:81:  Extra space before [.  [whitespace/brackets] [5]
Total errors found: 1 in 24 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 13 Saam Barati 2019-08-02 14:58:37 PDT
Comment on attachment 375328 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=375328&action=review

r=me

> Source/JavaScriptCore/runtime/SamplingProfiler.cpp:132
> +        if (unsafeCallee.isWasm()) {

nit: Maybe we should name this function we're in "recordJITFrame" instead of "recordJSFrame" since it now handles Wasm

> Source/JavaScriptCore/runtime/SamplingProfiler.cpp:135
> +                // At this point, Wasm::Callee would be dying (ref count is 0), but its fields are still live.

this is the other end of the memory ordering I was talking about in ~Callee()

> Source/JavaScriptCore/runtime/SamplingProfiler.cpp:136
> +                // And we can safely copy Wasm::IndexOrName even any lock is held by suspended threads.

"even any" => "even when any"

> Source/JavaScriptCore/runtime/SamplingProfiler.h:71
> +        Optional<Wasm::CompilationMode> wasmCompilationMode;

why not also put this under "#if ENABLE(WEBASSEMBLY)"?

(If it makes the code more annoying, I'd just skip doing it)

> Source/JavaScriptCore/runtime/SamplingProfiler.h:98
> +        Optional<Wasm::CompilationMode> wasmCompilationMode;

ditto

> Source/JavaScriptCore/wasm/WasmCallee.cpp:52
> +    CalleeRegistry::singleton().unregisterCallee(this);

To be super pedantic, we may need a store store memory fence after this, since theoretically, locking doesn't prevent stores below the lock is released into moving into the lock region (just stores in the lock region can't move out).

So theoretically, the compiler could move stores of the fields of |this| object before the store to remove the entry in the hash table.

Maybe we don't care about this memory ordering weirdness, but I think it might be worth making sure we don't crash if those bits are bad.

> Source/JavaScriptCore/wasm/WasmCallee.h:64
> +        return std::make_tuple(start, end);

style nit: I'd just do "return { start, end };"

> Source/JavaScriptCore/wasm/WasmCalleeRegistry.cpp:43
> +void CalleeRegistry::initialize()
> +{
> +    static std::once_flag onceKey;
> +    std::call_once(onceKey, [] {
> +        calleeRegistry.construct();
> +    });
> +}

Since this is called from initializeThreading, you don't need this synchronization to happen.

Alternatively, you could just not call this from initializeThreading and keep things this way.
Comment 14 Saam Barati 2019-08-02 14:58:55 PDT
(In reply to Yusuke Suzuki from comment #8)
> BTW, here is sample output.
> 
> 
> Starting JetStream2
> Running gcc-loops-wasm:
>     Startup: 227.273
>     Run time: 0.588
>     Score: 11.561
> 
> 
> Startup: 227.273
> Runtime: 0.588
> 
> Total Score:  11.561
> 
> 
> 
> Sampling rate: 1000.000000 microseconds
> Top functions as <numSamples  'functionName:sourceID'>
>   2603    '<?>.wasm-function[38]:-1'
>   1494    '<?>.wasm-function[39]:-1'
>   1004    '<?>.wasm-function[36]:-1'
>    965    '<?>.wasm-function[35]:-1'
>    325    '<?>.wasm-function[34]:-1'
>    202    '<?>.wasm-function[546]:-1'
>     16    '<?>.wasm-function[40]:-1'
>      3    'loadString:-1'
>      3    '<?>.wasm-function[37]:-1'
>      2    'load:-1'
>      1    'prepareToRun:2'
>      1    '(unknown):-1'
> 
> 
> Sampling rate: 1000.000000 microseconds
> Hottest bytecodes as <numSamples   'functionName#hash:JITType:bytecodeIndex'>
>   2603    '<?>.wasm-function[38]#<nil>:BBQ:<nil>'
>   1494    '<?>.wasm-function[39]#<nil>:BBQ:<nil>'
>   1004    '<?>.wasm-function[36]#<nil>:BBQ:<nil>'
>    965    '<?>.wasm-function[35]#<nil>:BBQ:<nil>'
>    324    '<?>.wasm-function[34]#<nil>:OMG:<nil>'
>    202    '<?>.wasm-function[546]#<nil>:OMG:<nil>'
>     16    '<?>.wasm-function[40]#<nil>:BBQ:<nil>'
>      3    'loadString#<nil>:None:<nil>'
>      3    '<?>.wasm-function[37]#<nil>:BBQ:<nil>'
>      2    'load#<nil>:None:<nil>'
>      1    'createStandardStreams#Asr3FY:LLInt:387'
>      1    'filter#CXE8tr:Baseline:81'
>      1    'newPromiseCapability#DzQl2X:Baseline:91'
>      1    'prepareToRun#DqWFYU:LLInt:129'
>      1    'callRuntimeCallbacks#EYZrDM:LLInt:200'
>      1    'updateGlobalBufferViews#AqvIC2:LLInt:85'
>      1    '(unknown)#<nil>:None:<nil>'
>      1    '#<nil>:None:<nil>'
>      1    '<?>.wasm-function[34]#<nil>:BBQ:<nil>'
>      1    'doRun#C0pLVZ:LLInt:20191'
>      1    '#A7TOIG:LLInt:22'
>      1    'instantiateArrayBuffer#ALPYQ8:LLInt:22'
>      1    'fulfillPromise#AuvX5q:Baseline:95'

Excellent!!
Comment 15 Yusuke Suzuki 2019-08-02 15:57:18 PDT
Comment on attachment 375328 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=375328&action=review

>> Source/JavaScriptCore/runtime/SamplingProfiler.cpp:132
>> +        if (unsafeCallee.isWasm()) {
> 
> nit: Maybe we should name this function we're in "recordJITFrame" instead of "recordJSFrame" since it now handles Wasm

Sounds really nice. Fixed.

>> Source/JavaScriptCore/runtime/SamplingProfiler.cpp:135
>> +                // At this point, Wasm::Callee would be dying (ref count is 0), but its fields are still live.
> 
> this is the other end of the memory ordering I was talking about in ~Callee()

Discussed with Saam, and we conclude that this is OK. The rationale is described in Wasm::Callee's part.

>> Source/JavaScriptCore/runtime/SamplingProfiler.cpp:136
>> +                // And we can safely copy Wasm::IndexOrName even any lock is held by suspended threads.
> 
> "even any" => "even when any"

Fixed.

>> Source/JavaScriptCore/runtime/SamplingProfiler.h:71
>> +        Optional<Wasm::CompilationMode> wasmCompilationMode;
> 
> why not also put this under "#if ENABLE(WEBASSEMBLY)"?
> 
> (If it makes the code more annoying, I'd just skip doing it)

Yeah, changing this part makes a code tricky, like, adding a new parameter based on ifdef.

(xxx param
#if ENABLE(WEBASSEMBLY)
, yyy param2
#endif
)

So, I intentionally added this here unconditionally :)
Due to the similar reason (annoying code), DFG CompilationMode is also unconditionally defined.

>> Source/JavaScriptCore/wasm/WasmCallee.cpp:52
>> +    CalleeRegistry::singleton().unregisterCallee(this);
> 
> To be super pedantic, we may need a store store memory fence after this, since theoretically, locking doesn't prevent stores below the lock is released into moving into the lock region (just stores in the lock region can't move out).
> 
> So theoretically, the compiler could move stores of the fields of |this| object before the store to remove the entry in the hash table.
> 
> Maybe we don't care about this memory ordering weirdness, but I think it might be worth making sure we don't crash if those bits are bad.

Discussed with Saam. We conclude that we do not need to have this barrier since sampling-profiler reads the state while it is holding this lock. So half-baked modification inside the lock region is not exposed to sampling-profiler. The sampling-profiler see the changes in lock-region as an atomic op.

>> Source/JavaScriptCore/wasm/WasmCallee.h:64
>> +        return std::make_tuple(start, end);
> 
> style nit: I'd just do "return { start, end };"

Nice. Fixed.

>> Source/JavaScriptCore/wasm/WasmCalleeRegistry.cpp:43
>> +}
> 
> Since this is called from initializeThreading, you don't need this synchronization to happen.
> 
> Alternatively, you could just not call this from initializeThreading and keep things this way.

Removing synchronization seems simple and it just works well :) Changed.
Comment 16 Yusuke Suzuki 2019-08-02 15:58:23 PDT
Committed r248187: <https://trac.webkit.org/changeset/248187>
Comment 17 Radar WebKit Bug Importer 2019-08-02 15:59:18 PDT
<rdar://problem/53883708>