Bug 169815
Summary: | WebAssembly: eliminate redundant ARM64 TLS load | ||
---|---|---|---|
Product: | WebKit | Reporter: | JF Bastien <jfbastien> |
Component: | JavaScriptCore | Assignee: | Nobody <webkit-unassigned> |
Status: | RESOLVED DUPLICATE | ||
Severity: | Normal | CC: | fpizlo, jfbastien, keith_miller, mark.lam, msaboff, saam |
Priority: | P2 | ||
Version: | WebKit Nightly Build | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Bug Depends on: | 169611 | ||
Bug Blocks: | 159775 |
JF Bastien
This is a small optimization; I'm not sure it'll pay off much, but it's neat.
As part of bug #169611 we're moving the WebAssembly context to a TLS slot. On x86 that's a single load / store off the segment register, but on ARM64 it takes mrs + mask + {load,store}. The `mrs TPIDRRO_EL0` instruction, coupled with the mask and the address generation, simply returns the location of our TLS slot (the offset is defined as WTF_WASM_CONTEXT_KEY in wtf/FastTls.h). That value is idempotent as long as we're executing in the same thread, and that's an invariant of WebAssembly: different instances are set in that context but the location is the same per thread.
Right now this mrs+mask+memory combo is generated as a single unit by the ARM64 macro assembler, so every access to the context recomputes the slot address. We could instead teach the compiler about the idempotent part (i.e. "get TLS slot #x") and then split off the load / store from that slot. For x86 that could mean combining both operations after the fact or keeping the same model we have now. For ARM64 that would allow us to eliminate redundant mrs+mask pairs when profitable, or to rematerialize them under register pressure.
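A rough C++ sketch of the proposed split, to make the idea concrete. It is not JavaScriptCore code: `tlsBlock`, `kContextSlot`, `contextSlotAddress` and the other names are hypothetical, and a `thread_local` array stands in for the real TLS block that ARM64 reaches via mrs + mask. The counter only shows how often the invariant address computation runs in each model.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch {

// Hypothetical stand-ins: a per-thread block of slots and the slot index
// that would play the role of WTF_WASM_CONTEXT_KEY.
constexpr std::size_t kSlotCount = 8;
constexpr std::size_t kContextSlot = 3;

thread_local void* tlsBlock[kSlotCount] = {};
thread_local int addressComputations = 0;

// The invariant part ("get TLS slot #x"). On ARM64 this models mrs + mask +
// address generation; the result is the same for the lifetime of the thread.
inline void** contextSlotAddress()
{
    ++addressComputations;
    return &tlsBlock[kContextSlot];
}

// Fused model: every access recomputes the slot address.
inline void* loadContextFused() { return *contextSlotAddress(); }
inline void storeContextFused(void* context) { *contextSlotAddress() = context; }

// Split model: the memory operation takes an already computed address.
inline void* loadContext(void** slot) { return *slot; }
inline void storeContext(void** slot, void* context) { *slot = context; }

// Swap the context with fused accesses; returns how many times the slot
// address was computed.
inline int swapFused(void* next, void** previousOut)
{
    int before = addressComputations;
    *previousOut = loadContextFused();
    storeContextFused(next);
    return addressComputations - before;
}

// Same swap with the address computed once and shared by load and store.
inline int swapSplit(void* next, void** previousOut)
{
    int before = addressComputations;
    void** slot = contextSlotAddress();
    assert(slot);
    *previousOut = loadContext(slot);
    storeContext(slot, next);
    return addressComputations - before;
}

} // namespace sketch
```

In this model the fused swap computes the slot address twice and the split swap computes it once. A compiler that sees the address computation as a separate pure operation could do the same sharing on its own, or recompute the address instead of spilling it.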
JF Bastien
Fil thinks we just want to pin a register on ARM because the optimization I propose will likely do the same thing by hoisting the redundant load to the top of each function. May as well get rid of the load entirely.
Let's just do it as part of bug #169773 then.
*** This bug has been marked as a duplicate of bug 169773 ***