Bug 163171
Summary: | B3 should efficiently emit great code for WebAssembly property accesses and bounds checks | ||
---|---|---|---|
Product: | WebKit | Reporter: | Filip Pizlo <fpizlo> |
Component: | JavaScriptCore | Assignee: | Nobody <webkit-unassigned> |
Status: | NEW | ||
Severity: | Normal | CC: | keith_miller |
Priority: | P2 | ||
Version: | WebKit Nightly Build | ||
Hardware: | All | ||
OS: | All | ||
Bug Depends on: | 163465, 163469, 163172, 163175, 163234, 163246, 163394 | ||
Bug Blocks: |
Filip Pizlo
This implies a strategy where:
- The base and size of wasm memory are pinned to registers.
- B3 understands the meaning of a memory access or bounds check, so that existing optimizations can be used to remove them and new specialized optimizations can be added *easily*.
- Air understands the meaning of a pinned register, and for these registers, it understands *why* they are pinned.
- B3->Air lowering easily does all of the same fusion as now, and has new "fusions" for wasm memory accesses (basically, it will only fuze the offset and rely on lea for all else).
- B3->Air lowering should be able to select complex leas! I think we forgot to implement that, but it will be super important now.
- Strength reduction knows about the whole guard page thingy. It's gotta have some way of querying whether that's on or not, and how big that page is. When it's non-zero, strength reduction has got to be able to do smart things with offset computations.
- The amount of IR used to represent a memory access should be minimal.
One way or another we'll need to write WebAssembly-specific compiler code that optimizes these accesses. I think that code might as well live in B3, so B3 might as well know about WebAssembly.
I think we should do all of this with a Wasm flag and a WasmBoundsCheck opcode.
The most important is the Wasm flag, as in:
Load<Wasm>(@ptr, offset = 0, guardedBytes = 0)
This zero-extends @ptr, adds it to the base register, then adds the given offset, and performs the access on the resulting pointer. The Wasm flag may be applied to any memory access operation. The guardedBytes tells you how many bytes of guard page are available. You can safely increase the offset by up to that amount. The offset and guardedBytes must be balanced: if you add to offset, you subtract from guardedBytes. You are not allowed to reduce offset.
So, you can turn this:
Load<Wasm>(Add(@ptr, 666), offset = 0, guardedBytes = 4096)
into this:
Load<Wasm>(@ptr, offset = 666, guardedBytes = 3430)
And to perform bounds checks:
WasmBoundsCheck(@ptr, guardedBytes = 0)
The guardedBytes tells you how many bytes of guard page are available. This means that you can turn this:
WasmBoundsCheck(Add(@ptr, 666), guardedBytes = 4096)
into this:
WasmBoundsCheck(@ptr, guardedBytes = 3430)
The reason for adding such Wasm-specific things is twofold: (1) it reduces the IR for an access and (2) it allows B3 to perform the guard page optimization for fusing an offset computing Add into a memory access. I don't think B3 would have been able to do that optimization without deep knowledge of WebAssembly, and I don't think that we should invent a whole new compiler just to do Wasm optimizations - especially since the B3 IR is such a good fit for Wasm already.
Awkwardly, we need a way of tracking the fact that any call may cause a heap resize, which could change both base and size. I think that the best way to achieve this is to simultaneously add a first-class pinned register support. This means that Effects will have a RegisterSet. Actually two: one for reads and one for writes. Wow that'll be fun.
Attachments | ||
---|---|---|
Add attachment proposed patch, testcase, etc. |
Filip Pizlo
The downside of having a Wasm flag on memory accesses is that it requires a new MemoryValue subclass, and we don't want to add too many of those. It's worth considering if there is a way to do this without adding MemoryValue subclasses.
What if we had a WasmAddress(@ptr, reg, guardedBytes) thing?
I think that will work better.
Keith Miller
I think your strategy does not work with the design of wasm. You cannot sink WasmBoundsCheck(Add(ptr, const), gpr, offset) to WasmBoundsCheck(ptr, gpr, offset - const) because that would change the trapping semantics of the language. For example if the user wrote the following wasm:
ptr = i32.Add(p, 2024)
i32.Load(ptr)
if p = 0 - 1024 then this code should not trap but if we sunk that addition into the guard page size we would trap since the pointer would appear to be out of bounds.
Filip Pizlo
(In reply to comment #2)
> I think your strategy does not work with the design of wasm. You cannot sink
> WasmBoundsCheck(Add(ptr, const), gpr, offset) to WasmBoundsCheck(ptr, gpr,
> offset - const) because that would change the trapping semantics of the
> language. For example if the user wrote the following wasm:
>
> ptr = i32.Add(p, 2024)
> i32.Load(ptr)
>
> if p = 0 - 1024 then this code should not trap but if we sunk that addition
> into the guard page size we would trap since the pointer would appear to be
> out of bounds.
Word. I think that if we wanted to do this, we would do it by versioning the code in B3.
When using bounds checks, we'd emit the check that we want (with the sunken offset), but as an actual branch. The failing path would have a different version of the code in which we do the precise bounds checks. We would duplicate (i.e. version) some path following the check, taking into consideration code size versus profitability heuristics. For example, we would probably always version small loops that contained repeated accesses to some pointer at different offsets that were all within the guard page size.
When using trapping, we could introduce the idea of a trapping MemoryValue block terminal with two successors. The normal successor would be taken when the instruction did not trap, and the other successor would be taken if we trapped (the trap handler would just fiddle the PC to take you to the other path).
We don't have to do any of that now, but boy would it be fun!