Bug 218939

Summary: Consider AoT-compiling small (<64KB?) WebAssembly Modules immediately without interpreting?
Product: WebKit
Component: WebAssembly
Status: RESOLVED WONTFIX
Severity: Normal
Priority: P2
Version: Safari 14
Hardware: Mac
OS: macOS 10.15
Reporter: jujjyl
Assignee: Nobody <webkit-unassigned>
CC: anthony.bowker, fpizlo, justin_michaud, keith_miller, mark.lam, smoley, webkit-bug-importer, ysuzuki
Keywords: InRadar

Description jujjyl 2020-11-14 06:24:37 PST
http://clb.confined.space/tgifelse/t_shapes_looped.html runs a small Wasm Module (~6KB) in a setTimeout() loop, performing the same Wasm computation over and over.

Firefox and Chrome both give consistent performance results from the first iteration onwards.

In Safari, however, the first iteration takes ~500 ms while the code is interpreted(?), after which AoT compilation kicks in(?) and subsequent runs are fast, at ~10-20 ms.
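
As a rough illustration of the measurement described above, the TypeScript sketch below times repeated calls into a tiny Wasm module and records each run separately, so that a slow first run would stand out. It is not the code from t_shapes_looped.html: the hand-assembled `sum` module, the `timeRuns` helper, and the iteration counts are made up for this example, and the runs happen in a plain loop rather than from setTimeout() callbacks to keep the sketch short.

```typescript
// Hypothetical stand-in workload: a hand-assembled module exporting
// sum(n) = 0 + 1 + ... + (n - 1), computed with wrapping i32 arithmetic.
const moduleBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // magic, version 1
  0x01, 0x06, 0x01, 0x60, 0x01, 0x7f, 0x01, 0x7f,       // type 0: (i32) -> i32
  0x03, 0x02, 0x01, 0x00,                               // function 0 uses type 0
  0x07, 0x07, 0x01, 0x03, 0x73, 0x75, 0x6d, 0x00, 0x00, // export "sum" = function 0
  0x0a, 0x25, 0x01, 0x23,                               // code section, one 35-byte body
  0x01, 0x02, 0x7f,                                     // two i32 locals: i (1), acc (2)
  0x02, 0x40, 0x03, 0x40,                               // block, loop
  0x20, 0x01, 0x20, 0x00, 0x4e, 0x0d, 0x01,             // if (i >= n) break
  0x20, 0x02, 0x20, 0x01, 0x6a, 0x21, 0x02,             // acc = acc + i
  0x20, 0x01, 0x41, 0x01, 0x6a, 0x21, 0x01,             // i = i + 1
  0x0c, 0x00, 0x0b, 0x0b,                               // continue; end loop; end block
  0x20, 0x02, 0x0b,                                     // return acc
]);

// Looked up via globalThis so the sketch type-checks with or without DOM typings.
const wasm = (globalThis as any).WebAssembly;
const instance = new wasm.Instance(new wasm.Module(moduleBytes), {});
const sum = instance.exports.sum as (n: number) => number;

// Runs `work` `runs` times and returns the wall-clock duration of each run in ms.
function timeRuns(work: () => void, runs: number): number[] {
  const timings: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = Date.now();
    work();
    timings.push(Date.now() - start);
  }
  return timings;
}

const timings = timeRuns(() => { sum(10000000); }, 5);
console.log(timings.map((ms, i) => `run ${i}: ${ms} ms`).join("\n"));
```

On an engine that starts Wasm code in a slower tier, the first entry would be expected to be the largest. No timings are shown here because they depend on the browser and the machine.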

Would it make sense to skip interpreting small (<64KB?) Wasm Modules altogether and immediately/eagerly AoT-compile them instead?
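
To make the proposal concrete, the heuristic could look roughly like the TypeScript sketch below. It only models the decision and is not how JavaScriptCore picks a tier: `chooseStartupStrategy`, the strategy names, and the 64 KB constant (taken from the guess in this report) are all hypothetical.

```typescript
// Hypothetical names for the two startup behaviours discussed in this report.
type StartupStrategy = "interpret-first" | "compile-eagerly";

// The "<64KB?" guess from the report, not a measured or recommended value.
const SMALL_MODULE_THRESHOLD_BYTES = 64 * 1024;

// Picks a startup strategy from the module's byte size alone.
function chooseStartupStrategy(
  moduleByteLength: number,
  thresholdBytes: number = SMALL_MODULE_THRESHOLD_BYTES,
): StartupStrategy {
  return moduleByteLength < thresholdBytes ? "compile-eagerly" : "interpret-first";
}

// The ~6KB module from the test page would fall under the threshold.
console.log(chooseStartupStrategy(6 * 1024));
```

A size-only rule like this ignores how hot the code actually gets, which is the kind of trade-off an engine's tiering policy has to weigh.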
Comment 1 Radar WebKit Bug Importer 2020-11-16 18:35:24 PST
<rdar://problem/71468658>
Comment 2 Justin Michaud 2022-11-30 13:09:27 PST
Do real applications do this? I think this decision should be made based on benchmarking a real application.

If I remember correctly, there was an x86 emulator that worked by JIT-generating Wasm modules. Maybe that would shed some light on what behaviour would be best.
Comment 3 jujjyl 2022-11-30 13:40:31 PST
I guess the rationale for this might be for when people write really small synthetic benchmarks that run synchronously.

E.g. maybe wasm counterparts of https://jsben.ch/ or https://jsbench.me/ types of pages.

Or otherwise when one is testing how some small thing X works.

In the example of t_shapes_looped.html, I was originally writing a test case for a programming contest where the performance of the code was measured. I was getting bad timings in Safari. I first noticed that a) Chrome seemed to be much faster (so my first thought was "Safari is really slow"), but then observed that b) I could get Safari to be as fast as Chrome if I "primed" the code by running it several times in a setTimeout loop.
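
The priming workaround can be sketched in TypeScript as follows. This is an assumption-laden illustration rather than the contest code: `primeThenMeasure` and its parameters are hypothetical, and how many priming runs (if any) an engine needs before it switches to faster code is not something this report establishes.

```typescript
// Hypothetical helper: runs `work` `primingRuns` times, yielding to the event
// loop via setTimeout between runs, then resolves with the duration in ms of
// one final, measured run.
function primeThenMeasure(
  work: () => void,
  primingRuns: number,
  gapMs: number = 0,
): Promise<number> {
  return new Promise<number>((resolve) => {
    let remaining = primingRuns;
    const step = (): void => {
      if (remaining > 0) {
        remaining -= 1;
        work();                  // unmeasured priming run
        setTimeout(step, gapMs); // let the engine do other work between runs
        return;
      }
      const start = Date.now();
      work();                    // measured run
      resolve(Date.now() - start);
    };
    step();
  });
}
```

A caller would pass the function under test, for example `primeThenMeasure(runBenchmark, 5).then((ms) => console.log(ms))`, where `runBenchmark` is whatever synchronous workload is being timed.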

So if choosing between AOT compiling and interpreting small modules is an "all the same" kind of decision, then a heuristic like this might help in these types of situations?

Naturally, if some complication or other rationale trumps this and makes it infeasible, then it is definitely not worth it.
Comment 4 Mark Lam 2022-11-30 13:49:09 PST
I think you mean JIT, not AOT.  AOT means something completely different, and none of the engines use AOT.
Comment 5 jujjyl 2022-12-01 02:02:17 PST
By AOT I mean "Ahead of Time" compilation, not JIT. I.e. the process of not using runtime profiling information to compile optimized code, but compiling it with static information.

Assuming https://webkit.org/blog/7691/webassembly/ is still up to date(?), then I suppose I am referring to compiling with OMG tier up front instead of the BBQ tier.

The article says "Since WebAssembly does not require any type speculations, we only use tiering to conserve compile time.", which suggests that compiling with OMG could be done without collecting any profiling data (in other words, compiling with OMG "ahead of time").
Comment 6 Mark Lam 2023-08-09 13:48:24 PDT
We will not be doing this in JSC.  Eagerly JIT'ing Wasm is a recipe for slow responsiveness.  Ahead of time compiling is also not a viable strategy for web workloads that are highly variable.

The better strategy would be to implement a faster interpreter, and tier up faster on the JITs as well as tune their compilation policy.