Bug 218939

Summary:	Consider AoT-compiling small (<64KB?) WebAssembly Modules immediately without interpreting?
Product:	WebKit	Reporter:	jujjyl
Component:	WebAssembly	Assignee:	Nobody <webkit-unassigned>
Status:	RESOLVED WONTFIX
Severity:	Normal	CC:	anthony.bowker, fpizlo, justin_michaud, keith_miller, mark.lam, smoley, webkit-bug-importer, ysuzuki
Priority:	P2	Keywords:	InRadar
Version:	Safari 14
Hardware:	Mac
OS:	macOS 10.15

jujjyl

Reported 2020-11-14 06:24:37 PST

http://clb.confined.space/tgifelse/t_shapes_looped.html runs a small Wasm Module (~6KB) in a setTimeout() loop, performing the same Wasm computation over and over. Firefox and Chrome both give consistent performance results from the first iteration onwards. In Safari, however, the first iteration takes ~500 msecs while the code is interpreted(?), after which AoT compilation kicks in(?) and the subsequent runs are fast, at ~10-20 msecs. Would it make sense to skip interpreting small (<64KB?) Wasm Modules altogether and immediately/eagerly AoT-compile them?
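The first-iteration versus later-iteration gap described above can be measured with a small harness like the one below. This is only a sketch, not the code of the linked test page: `timeIterations`, `IterationTimings`, and the injectable `now` clock are hypothetical names, and the loop is synchronous rather than driven by setTimeout(). The `work` callback stands in for whatever exported Wasm function is being called.

```typescript
// Hypothetical helper: times each call of `work` separately so that a slow
// first iteration (warm-up) can be told apart from later iterations.
type Clock = () => number;

interface IterationTimings {
  firstMs: number;        // duration of the first iteration
  steadyMedianMs: number; // median duration of all later iterations
  allMs: number[];        // every per-iteration duration, in call order
}

function timeIterations(
  work: () => void,
  iterations: number,
  now: Clock = () => Date.now(), // assumption: millisecond resolution is enough
): IterationTimings {
  if (iterations < 2) {
    throw new Error("need at least 2 iterations to compare first vs. later runs");
  }
  const allMs: number[] = [];
  for (let i = 0; i < iterations; i++) {
    const start = now();
    work();
    allMs.push(now() - start);
  }
  // Median of everything except the first run, so one outlier does not skew it.
  const later = allMs.slice(1).sort((a, b) => a - b);
  const mid = Math.floor(later.length / 2);
  const steadyMedianMs =
    later.length % 2 === 1 ? later[mid] : (later[mid - 1] + later[mid]) / 2;
  return { firstMs: allMs[0], steadyMedianMs, allMs };
}
```

Reporting `firstMs` next to `steadyMedianMs` makes a warm-up cost visible instead of folding it into a single average.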

Radar WebKit Bug Importer

Comment 1 2020-11-16 18:35:24 PST

<rdar://problem/71468658>

Justin Michaud

Comment 2 2022-11-30 13:09:27 PST

Do real applications do this? I think this decision should be made based on benchmarking a real application. If I remember correctly, there was an x86 emulator that worked by JIT-generating wasm modules. Maybe that would shed some light on what behaviour would be best.

jujjyl

Comment 3 2022-11-30 13:40:31 PST

I guess the rationale for this might be for when people write really small synthetic benchmarks that run synchronously. E.g. maybe wasm counterparts of https://jsben.ch/ or https://jsbench.me/ types of pages. Or otherwise when one is testing how some small thing X works. In the example of t_shapes_looped.html, I was originally writing a test case for a programming contest where performance of the code was measured. I was getting bad time results with Safari, before I realized that a) Chrome seemed to be much faster (so I first thought "Safari is really slow"), but then observed that b) I could get Safari to be as fast as Chrome if I "primed" the code by running it several times in a setTimeout loop. So if this were an "all the same" kind of decision between AOT and interpreting small modules, then these types of situations might be helped by a heuristic like this? Naturally, if there is some complication or other rationale that trumps this and makes it infeasible, then it is definitely not worth it.

Mark Lam

Comment 4 2022-11-30 13:49:09 PST

I think you mean JIT, not AOT. AOT means something completely different, and none of the engines use AOT.

jujjyl

Comment 5 2022-12-01 02:02:17 PST

By AOT I mean "Ahead of Time" compilation, not JIT. I.e. the process of not using runtime profiling information to compile optimized code, but compiling it with static information. Assuming https://webkit.org/blog/7691/webassembly/ is still up to date(?), then I suppose I am referring to compiling with OMG tier up front instead of the BBQ tier. The article writes "Since WebAssembly does not require any type speculations, we only use tiering to conserve compile time.", suggesting that the compilation with OMG could be done without needing to collect any profiling data. (in other words, compile with OMG "ahead of time")

Mark Lam

Comment 6 2023-08-09 13:48:24 PDT

We will not be doing this in JSC. Eagerly JIT'ing Wasm is a recipe for slow responsiveness. Ahead of time compiling is also not a viable strategy for web workloads that are highly variable. The better strategy would be to implement a faster interpreter, and tier up faster on the JITs as well as tune their compilation policy.
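The policy question debated in this thread can be written down as a toy model. The sketch below is not JavaScriptCore's actual tiering logic: the tier names are taken from the thread and the linked article, while `TierPolicy`, `chooseTier`, and every threshold are hypothetical and exist only to show where a size-based eager-compile heuristic would sit next to call-count-based tier-up.

```typescript
// Toy model only: tier names follow the thread (interpreter, BBQ, OMG).
// All field names and thresholds are assumptions, not JavaScriptCore values.
type Tier = "interpreter" | "BBQ" | "OMG";

interface TierPolicy {
  bbqAfterCalls: number;    // hypothetical call count before the baseline JIT
  omgAfterCalls: number;    // hypothetical call count before the optimizing JIT
  eagerModuleBytes: number; // hypothetical size cutoff from the report; 0 disables it
}

function chooseTier(moduleBytes: number, calls: number, policy: TierPolicy): Tier {
  // Heuristic proposed in the report: small modules go straight to the top tier.
  if (moduleBytes < policy.eagerModuleBytes) return "OMG";
  // Otherwise tier up by call count, which is what "tier up faster" would tune.
  if (calls >= policy.omgAfterCalls) return "OMG";
  if (calls >= policy.bbqAfterCalls) return "BBQ";
  return "interpreter";
}

// Example: the ~6KB module from the report under two made-up policies.
const withEagerCutoff: TierPolicy = { bbqAfterCalls: 10, omgAfterCalls: 1000, eagerModuleBytes: 64 * 1024 };
const callCountOnly: TierPolicy = { bbqAfterCalls: 10, omgAfterCalls: 1000, eagerModuleBytes: 0 };
console.log(chooseTier(6 * 1024, 0, withEagerCutoff)); // OMG
console.log(chooseTier(6 * 1024, 0, callCountOnly));   // interpreter
```

In this model, lowering `bbqAfterCalls` and `omgAfterCalls` corresponds to the "tier up faster" direction from the final comment, while a non-zero `eagerModuleBytes` corresponds to the rejected proposal.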
