Bug 241905

Summary: Add ldp and stp support to ARM64 and ARM64E offlineasm.
Product: WebKit Reporter: Mark Lam <mark.lam>
Component: JavaScriptCore Assignee: Mark Lam <mark.lam>
Status: RESOLVED FIXED
Severity: Normal CC: ews-watchlist, keith_miller, msaboff, saam, tzagallo, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build
Hardware: Unspecified
OS: Unspecified
Attachments: EWS testing (flags: ews-feeder: commit-queue-)

Description Mark Lam 2022-06-22 22:44:50 PDT
offlineasm used to emit this LLInt code:
    ".loc 1 996\n"        "ldr x19, [x0] \n"                   // LowLevelInterpreter.asm:996
    ".loc 1 997\n"        "ldr x20, [x0, #8] \n"               // LowLevelInterpreter.asm:997
    ".loc 1 998\n"        "ldr x21, [x0, #16] \n"              // LowLevelInterpreter.asm:998
    ".loc 1 999\n"        "ldr x22, [x0, #24] \n"              // LowLevelInterpreter.asm:999
    ...
    ".loc 1 1006\n"       "ldr d8, [x0, #80] \n"               // LowLevelInterpreter.asm:1006
    ".loc 1 1007\n"       "ldr d9, [x0, #88] \n"               // LowLevelInterpreter.asm:1007
    ".loc 1 1008\n"       "ldr d10, [x0, #96] \n"              // LowLevelInterpreter.asm:1008
    ".loc 1 1009\n"       "ldr d11, [x0, #104] \n"             // LowLevelInterpreter.asm:1009
    ...

Now, it emits this:
    ".loc 1 996\n"        "ldp x19, x20, [x0, #0] \n"          // LowLevelInterpreter.asm:996
    ".loc 1 997\n"        "ldp x21, x22, [x0, #16] \n"         // LowLevelInterpreter.asm:997
    ...
    ".loc 1 1001\n"       "ldp d8, d9, [x0, #80] \n"           // LowLevelInterpreter.asm:1001
    ".loc 1 1002\n"       "ldp d10, d11, [x0, #96] \n"         // LowLevelInterpreter.asm:1002
    ...
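The pairing above relies on the ARM64 LDP/STP immediate-offset form. Both registers must be the same kind (two X or two D registers), and the second register is loaded from offset + 8. The immediate is a signed 7-bit value scaled by 8, so it must be a multiple of 8 in [-512, 504]. The Python sketch below shows how an emitter could validate and format such a pair load under those rules. The function names (`reg_kind`, `ldp_offset_encodable`, `emit_load_pair`) are hypothetical illustrations, not offlineasm's actual Ruby code.

```python
from __future__ import annotations

import re

# 64-bit general-purpose registers x0-x30 and 64-bit FP registers d0-d31.
_REG = re.compile(r"^(x|d)(\d+)$")


def reg_kind(reg: str) -> str:
    """Return 'x' or 'd' for a 64-bit register name, rejecting anything else."""
    m = _REG.match(reg)
    if m is None:
        raise ValueError(f"unsupported register: {reg}")
    kind, num = m.group(1), int(m.group(2))
    limit = 30 if kind == "x" else 31
    if num > limit:
        raise ValueError(f"register out of range: {reg}")
    return kind


def ldp_offset_encodable(offset: int) -> bool:
    """64-bit LDP/STP take a signed 7-bit immediate scaled by 8: -512..504."""
    return offset % 8 == 0 and -512 <= offset <= 504


def emit_load_pair(rt1: str, rt2: str, base: str, offset: int) -> str:
    """Hypothetical lowering of rt1 <- [base + offset], rt2 <- [base + offset + 8]."""
    if reg_kind(rt1) != reg_kind(rt2):
        raise ValueError("ldp needs two registers of the same kind")
    if rt1 == rt2:
        raise ValueError("ldp with identical destination registers is unpredictable")
    if not ldp_offset_encodable(offset):
        raise ValueError(f"offset {offset} is not encodable in ldp")
    return f"ldp {rt1}, {rt2}, [{base}, #{offset}]"
```

For example, `emit_load_pair("x19", "x20", "x0", 0)` produces the first `ldp` in the new output above. The sketch rejects offsets that do not fit the encoding. A real emitter would likely need a fallback for them, such as two `ldr` instructions or a materialized base.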

Also, some code kept recomputing the same base address before every instruction in a sequence of loads/stores. For example:
    ".loc 6 902\n"        "add x13, sp, x10, lsl #3 \n"        // WebAssembly.asm:902
                          "ldr x0, [x13, #48] \n"
                          "add x13, sp, x10, lsl #3 \n"
                          "ldr x1, [x13, #56] \n"
                          "add x13, sp, x10, lsl #3 \n"
                          "ldr x2, [x13, #64] \n"
                          "add x13, sp, x10, lsl #3 \n"
                          "ldr x3, [x13, #72] \n"
    ...

In such places, the base address is the same for every load/store in the sequence, so we now precompute it once in the LLInt asm source to help offlineasm. This lets offlineasm emit this more efficient code instead:
    ".loc 6 896\n"        "add x10, sp, x10, lsl #3 \n"        // WebAssembly.asm:896
    ".loc 6 898\n"        "ldp x0, x1, [x10, #48] \n"          // WebAssembly.asm:898
                          "ldp x2, x3, [x10, #64] \n"
    ...
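The rewrite is valid because every load in the sequence uses the same base, sp + (x10 << 3), and nothing between the loads changes sp or x10. The new code writes the base back into x10 itself, so it presumably assumes the original index value is no longer needed; the report does not say this explicitly. The Python sketch below models both address computations with made-up register values. It checks that they produce the same effective addresses and counts the instructions. The helper names and values are hypothetical.

```python
from __future__ import annotations

MASK64 = (1 << 64) - 1


def add_lsl(base: int, index: int, shift: int) -> int:
    """Model of 'add xd, base, index, lsl #shift' with 64-bit wraparound."""
    return (base + (index << shift)) & MASK64


def old_sequence(sp: int, x10: int, offsets: list[int]) -> tuple[list[int], int]:
    """Recompute x13 before every ldr. Returns (load addresses, instruction count)."""
    addrs: list[int] = []
    count = 0
    for off in offsets:
        x13 = add_lsl(sp, x10, 3)           # add x13, sp, x10, lsl #3
        addrs.append((x13 + off) & MASK64)  # ldr xN, [x13, #off]
        count += 2
    return addrs, count


def new_sequence(sp: int, x10: int, pair_offsets: list[int]) -> tuple[list[int], int]:
    """Compute the base once, then issue one ldp per pair of adjacent 8-byte slots."""
    x10 = add_lsl(sp, x10, 3)  # add x10, sp, x10, lsl #3 (overwrites the index)
    addrs: list[int] = []
    count = 1
    for off in pair_offsets:
        # ldp xA, xB, [x10, #off] loads from off and off + 8.
        addrs.append((x10 + off) & MASK64)
        addrs.append((x10 + off + 8) & MASK64)
        count += 1
    return addrs, count


old_addrs, old_count = old_sequence(0x7FFEE000, 5, [48, 56, 64, 72])
new_addrs, new_count = new_sequence(0x7FFEE000, 5, [48, 64])
```

For the four loads shown above, both models touch the same addresses. The instruction count drops from eight (four `add`, four `ldr`) to three (one `add`, two `ldp`).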
Comment 1 Mark Lam 2022-06-22 23:15:05 PDT
Pull request: https://github.com/WebKit/WebKit/pull/1716
Comment 2 Mark Lam 2022-06-23 10:03:52 PDT
Created attachment 460449 [details]
EWS testing
Comment 3 EWS 2022-06-23 13:25:43 PDT
Committed 251799@main (79eb5e92d4fc): <https://commits.webkit.org/251799@main>

Reviewed commits have been landed. Closing PR #1716 and removing active labels.
Comment 4 Radar WebKit Bug Importer 2022-06-23 13:26:13 PDT
<rdar://problem/95802476>