126474 – CStack Branch: ARM64 needs push/pop pair macro assembler instructions

RESOLVED FIXED 126474

CStack Branch: ARM64 needs push/pop pair macro assembler instructions

https://bugs.webkit.org/show_bug.cgi?id=126474

Summary CStack Branch: ARM64 needs push/pop pair macro assembler instructions

Michael Saboff

Reported 2014-01-03 17:02:57 PST

On the ARM64 processor, register pushes and pops to/from the stack need to be done with in register pairs using the stp (store pair) and ldp (load pair) instructions. These are needed for saving and restoring the frame pointer register and link register in function prologues and epilogues.

Attachments
Patch (14.22 KB, patch) 2014-01-03 17:10 PST, Michael Saboff	mark.lam: review+	Details Formatted Diff Diff
View All Add attachment proposed patch, testcase, etc.

Michael Saboff

Comment 1 2014-01-03 17:10:42 PST

Created attachment 220355 [details] Patch

Mark Lam

Comment 2 2014-01-04 07:02:28 PST

Comment on attachment 220355 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=220355&action=review > Source/JavaScriptCore/ChangeLog:14 > + Verified that we corectly generate the new instructions and that they disassemble typo: “corectly” ==> “correctly”. > Source/JavaScriptCore/assembler/ARM64Assembler.h:872 > + enum MemPairOpSize { > + MemPairOp_32, > + MemPairOp_LoadSigned_64, > + MemPairOp_V64 = 1, > + MemPairOp_64, > + MemPairOp_V128 = 2 > + }; I don’t have access to the instruction encoding spec right now, and so I’m uncertain about this. Can you please confirm if MemPairOp_64 is indeed supposed to have the value 2? I could be wrong, but it seems strange to me that the instruction encoding (as used in loadStoreRegisterPairPostIndex() and loadStoreRegisterPairPreIndex()) would have MemPairOp_64 as 2 while MemPairOp_LoadSigned_64 and MemPairOp_V64 are 1. If MemPairOp_64 is supposed to have the value 1 as I suspect, then I suggest expressing the above as: // MemPairOpSize values are computed as (log2(number of bits / 8) - 2)). enum MemPairOpSize { MemPairOp_32 = 0, MemPairOp_LoadSigned_64 = 1, MemPairOp_V64 = 1, MemPairOp_64 = 1, MemPairOp_V128 = 2 }; Spelling out their values explicitly makes it clearer that you intended these enums to have the values of (log2(number of bits / 8) - 2) ... at least clearer than before. I also add the comment indicating that relationship (log2(number of bits / 8) - 2) since the enum is defined here, but the relationship isn’t clear unless we happened to read the implementation of memPairOffsetShift() below. If I’m wrong and MemPairOp_64 is supposed to be 2, then I suggest expressing the enum list as follows: enum MemPairOpSize { MemPairOp_32 = 0, MemPairOp_LoadSigned_64 = 1, MemPairOp_64 = 2, MemPairOp_V64 = 1, MemPairOp_V128 = 2 }; I broke out the V ops from the non-V ops based on "ASSERT(V || (size != MemPairOp_LoadSigned_64) || (opc == MemOp_LOAD))” which I read in loadStoreRegisterPairPostIndex(). IMHO, I think this would better show the relationship between how these enums increment in value. > Source/JavaScriptCore/assembler/ARM64Assembler.h:887 > + static unsigned memPairOffsetShift(bool V, MemPairOpSize size) I presume by “V”, you mean “isVectorMempairOp”. Can you use a more descriptive name like that if the one I suggested is not appropriate? Or maybe add the "“V” means Vector” comment that you use for loadStoreRegisterPairPostIndex() below. > Source/JavaScriptCore/assembler/ARM64Assembler.h:892 > + // return the log2 of the size in bytes, e.g. 64 bit size returns 3 > + if (V) > + return size + 2; > + return (size >> 1) + 2; If MemPairOp_64 is supposed to be 1 instead of 2 (see comment above regarding the MemPairOpSize enum list), then this should reduce to "return size + 2;”. Otherwise, please disregard this comment. > Source/JavaScriptCore/assembler/ARM64Assembler.h:3488 > + ALWAYS_INLINE static int loadStoreRegisterPairPreIndex(MemPairOpSize size, bool V, MemOp opc, int immediate, RegisterID rn, FPRegisterID rt, FPRegisterID rt2) > + { > + ASSERT(size < 3); > + ASSERT(opc == (opc & 1)); // Only load or store, load signed 64 is handled via size. > + ASSERT(V || (size != MemPairOp_LoadSigned_64) || (opc == MemOp_LOAD)); // There isn't an integer store signed. > + unsigned immedShiftAmount = memPairOffsetShift(V, size); > + int imm7 = immediate >> immedShiftAmount; > + ASSERT((imm7 << immedShiftAmount) == immediate && isInt7(imm7)); > + return (0x29800000 | size << 30 | V << 26 | opc << 22 | (imm7 & 0x7f) << 15 | rt2 << 10 | xOrSp(rn) << 5 | rt); > + } This appears to be identical to loadStoreRegisterPairPostIndex() except for a one bit difference in the instruction opcode. I suggest refactoring these 2 functions to be expressed as follows to minimize the amount of cut and paste: ALWAYS_INLINE static int loadStoreRegisterPair(int opcodeBits, MemPairOpSize size, bool V, MemOp opc, int immediate, RegisterID rn, FPRegisterID rt, FPRegisterID rt2) { ... } ALWAYS_INLINE static int loadStoreRegisterPairPostIndex(MemPairOpSize size, bool V, MemOp opc, int immediate, RegisterID rn, FPRegisterID rt, FPRegisterID rt2) { return loadStoreRegisterPair(0x28800000, size, V, opc, immediate, rn, rt, rt2); } ALWAYS_INLINE static int loadStoreRegisterPairPreIndex(MemPairOpSize size, bool V, MemOp opc, int immediate, RegisterID rn, FPRegisterID rt, FPRegisterID rt2) { return loadStoreRegisterPair(0x29800000, size, V, opc, immediate, rn, rt, rt2); } > Source/JavaScriptCore/disassembler/ARM64/A64DOpcode.cpp:186 > + Please remove these empty spaces.

Michael Saboff

Comment 3 2014-01-04 09:39:51 PST

(In reply to comment #2) > (From update of attachment 220355 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=220355&action=review > > > Source/JavaScriptCore/ChangeLog:14 > > + Verified that we corectly generate the new instructions and that they disassemble > > typo: “corectly” ==> “correctly”. I will fix. > > Source/JavaScriptCore/assembler/ARM64Assembler.h:872 > > + enum MemPairOpSize { > > + MemPairOp_32, > > + MemPairOp_LoadSigned_64, > > + MemPairOp_V64 = 1, > > + MemPairOp_64, > > + MemPairOp_V128 = 2 > > + }; > > I don’t have access to the instruction encoding spec right now, and so I’m uncertain about this. Can you please confirm if MemPairOp_64 is indeed supposed to have the value 2? I could be wrong, but it seems strange to me that the instruction encoding (as used in loadStoreRegisterPairPostIndex() and loadStoreRegisterPairPreIndex()) would have MemPairOp_64 as 2 while MemPairOp_LoadSigned_64 and MemPairOp_V64 are 1. It is indeed strange, but mostly true. The falsehood is that MemPairOp_LoadSigned_64 should really be MemPairOp_LoadSigned_32 as it loads 32 bit values and then sign extends them. The rest of the names and values match the instruction encodings. Here is a relevant table from the instruction manual (opc is what we call size): Decode fields Instruction Details opc V L 00 0 0 STP — 32-bit 00 0 1 LDP — 32-bit 00 1 0 STP (SIMD&FP) — 32-bit 00 1 1 LDP (SIMD&FP) — 32-bit 01 0 0 UNALLOCATED 01 0 1 LDPSW <= This is Load Pair Signed 01 1 0 STP (SIMD&FP) — 64-bit 01 1 1 LDP (SIMD&FP) — 64-bit 10 0 0 STP — 64-bit 10 0 1 LDP — 64-bit 10 1 0 STP (SIMD&FP) — 128-bit 10 1 1 LDP (SIMD&FP) — 128-bit 11 UNALLOCATED > If MemPairOp_64 is supposed to have the value 1 as I suspect, then I suggest expressing the above as: > > // MemPairOpSize values are computed as (log2(number of bits / 8) - 2)). > enum MemPairOpSize { > MemPairOp_32 = 0, > MemPairOp_LoadSigned_64 = 1, > MemPairOp_V64 = 1, > MemPairOp_64 = 1, > MemPairOp_V128 = 2 > }; > > Spelling out their values explicitly makes it clearer that you intended these enums to have the values of (log2(number of bits / 8) - 2) ... at least clearer than before. I also add the comment indicating that relationship (log2(number of bits / 8) - 2) since the enum is defined here, but the relationship isn’t clear unless we happened to read the implementation of memPairOffsetShift() below. > > If I’m wrong and MemPairOp_64 is supposed to be 2, then I suggest expressing the enum list as follows: > > enum MemPairOpSize { > MemPairOp_32 = 0, > MemPairOp_LoadSigned_64 = 1, > MemPairOp_64 = 2, > > MemPairOp_V64 = 1, > MemPairOp_V128 = 2 > }; I will make these explicit. > I broke out the V ops from the non-V ops based on "ASSERT(V || (size != MemPairOp_LoadSigned_64) || (opc == MemOp_LOAD))” which I read in loadStoreRegisterPairPostIndex(). IMHO, I think this would better show the relationship between how these enums increment in value. This is complicated a little by MemPairOp_32 being the same for scalar and vector registers. > > Source/JavaScriptCore/assembler/ARM64Assembler.h:887 > > + static unsigned memPairOffsetShift(bool V, MemPairOpSize size) > > I presume by “V”, you mean “isVectorMempairOp”. Can you use a more descriptive name like that if the one I suggested is not appropriate? Or maybe add the "“V” means Vector” comment that you use for loadStoreRegisterPairPostIndex() below. The variable name V is used throughout for the V bit in the instruction encodings. It signifies that the instruction is a vector variant. > > Source/JavaScriptCore/assembler/ARM64Assembler.h:892 > > + // return the log2 of the size in bytes, e.g. 64 bit size returns 3 > > + if (V) > > + return size + 2; > > + return (size >> 1) + 2; > > If MemPairOp_64 is supposed to be 1 instead of 2 (see comment above regarding the MemPairOpSize enum list), then this should reduce to "return size + 2;”. Otherwise, please disregard this comment. > > > Source/JavaScriptCore/assembler/ARM64Assembler.h:3488 > > + ALWAYS_INLINE static int loadStoreRegisterPairPreIndex(MemPairOpSize size, bool V, MemOp opc, int immediate, RegisterID rn, FPRegisterID rt, FPRegisterID rt2) > > + { > > + ASSERT(size < 3); > > + ASSERT(opc == (opc & 1)); // Only load or store, load signed 64 is handled via size. > > + ASSERT(V || (size != MemPairOp_LoadSigned_64) || (opc == MemOp_LOAD)); // There isn't an integer store signed. > > + unsigned immedShiftAmount = memPairOffsetShift(V, size); > > + int imm7 = immediate >> immedShiftAmount; > > + ASSERT((imm7 << immedShiftAmount) == immediate && isInt7(imm7)); > > + return (0x29800000 | size << 30 | V << 26 | opc << 22 | (imm7 & 0x7f) << 15 | rt2 << 10 | xOrSp(rn) << 5 | rt); > > + } > > This appears to be identical to loadStoreRegisterPairPostIndex() except for a one bit difference in the instruction opcode. I suggest refactoring these 2 functions to be expressed as follows to minimize the amount of cut and paste: > > ALWAYS_INLINE static int loadStoreRegisterPair(int opcodeBits, MemPairOpSize size, bool V, MemOp opc, int immediate, RegisterID rn, FPRegisterID rt, FPRegisterID rt2) > { > ... > } > > ALWAYS_INLINE static int loadStoreRegisterPairPostIndex(MemPairOpSize size, bool V, MemOp opc, int immediate, RegisterID rn, FPRegisterID rt, FPRegisterID rt2) > { > return loadStoreRegisterPair(0x28800000, size, V, opc, immediate, rn, rt, rt2); > } > > ALWAYS_INLINE static int loadStoreRegisterPairPreIndex(MemPairOpSize size, bool V, MemOp opc, int immediate, RegisterID rn, FPRegisterID rt, FPRegisterID rt2) > { > return loadStoreRegisterPair(0x29800000, size, V, opc, immediate, rn, rt, rt2); > } The function names loadStoreRegisterPairPostIndex() and loadStoreRegisterPairPreIndex() follow the convention in the rest of the file where the function name matches one of the instruction group names. For example there is the group name Load/store register (immediate post-indexed) and the corresponding existing function loadStoreRegisterPostIndex(). I followed the same pattern for the two groups Load/store register pair (post-indexed) and Load/store register pair (pre-indexed). > > Source/JavaScriptCore/disassembler/ARM64/A64DOpcode.cpp:186 > > + > > Please remove these empty spaces. I'll remove them.

Mark Lam

Comment 4 2014-01-04 22:36:14 PST

(In reply to comment #3) > (In reply to comment #2) > > > Source/JavaScriptCore/assembler/ARM64Assembler.h:872 > > > + enum MemPairOpSize { > > > + MemPairOp_32, > > > + MemPairOp_LoadSigned_64, > > > + MemPairOp_V64 = 1, > > > + MemPairOp_64, > > > + MemPairOp_V128 = 2 > > > + }; > > > > I don’t have access to the instruction encoding spec right now, and so I’m uncertain about this. Can you please confirm if MemPairOp_64 is indeed supposed to have the value 2? I could be wrong, but it seems strange to me that the instruction encoding (as used in loadStoreRegisterPairPostIndex() and loadStoreRegisterPairPreIndex()) would have MemPairOp_64 as 2 while MemPairOp_LoadSigned_64 and MemPairOp_V64 are 1. > > It is indeed strange, but mostly true. The falsehood is that MemPairOp_LoadSigned_64 should really be MemPairOp_LoadSigned_32 as it loads 32 bit values and then sign extends them. The rest of the names and values match the instruction encodings. Here is a relevant table from the instruction manual (opc is what we call size): > Decode fields Instruction Details > opc V L > 00 0 0 STP — 32-bit > 00 0 1 LDP — 32-bit > 00 1 0 STP (SIMD&FP) — 32-bit > 00 1 1 LDP (SIMD&FP) — 32-bit > 01 0 0 UNALLOCATED > 01 0 1 LDPSW <= This is Load Pair Signed > 01 1 0 STP (SIMD&FP) — 64-bit > 01 1 1 LDP (SIMD&FP) — 64-bit > 10 0 0 STP — 64-bit > 10 0 1 LDP — 64-bit > 10 1 0 STP (SIMD&FP) — 128-bit > 10 1 1 LDP (SIMD&FP) — 128-bit > 11 UNALLOCATED Funky, but it is what it is. > > If I’m wrong and MemPairOp_64 is supposed to be 2, then I suggest expressing the enum list as follows: > > > > enum MemPairOpSize { > > MemPairOp_32 = 0, > > MemPairOp_LoadSigned_64 = 1, > > MemPairOp_64 = 2, > > > > MemPairOp_V64 = 1, > > MemPairOp_V128 = 2 > > }; > > I will make these explicit. > > > I broke out the V ops from the non-V ops based on "ASSERT(V || (size != MemPairOp_LoadSigned_64) || (opc == MemOp_LOAD))” which I read in loadStoreRegisterPairPostIndex(). IMHO, I think this would better show the relationship between how these enums increment in value. > > This is complicated a little by MemPairOp_32 being the same for scalar and vector registers. In light of the encoding details you provided, I agree that there isn’t an ideal case in terms of grouping. How about: enum MemPairOpSize { MemPairOp_32 = 0, MemPairOp_LoadSigned_64 = 1, MemPairOp_64 = 2, MemPairOp_V32 = MemPairOp_32, MemPairOp_V64 = 1, MemPairOp_V128 = 2 }; That still provides the Vector vs non-Vector distinction while explicitly spelling out that the V32 variant uses the same encoding as the non-Vector 32 bit variant. > The function names loadStoreRegisterPairPostIndex() and loadStoreRegisterPairPreIndex() follow the convention in the rest of the file where the function name matches one of the instruction group names. For example there is the group name Load/store register (immediate post-indexed) and the corresponding > existing function loadStoreRegisterPostIndex(). I followed the same pattern for the two groups Load/store register pair (post-indexed) and Load/store register pair (pre-indexed). Even though there is precedence, I still don’t like the fact that we’re opting for cutting and pasting instead of using a common function. However, granted that these functions are highly unlikely to be mutated in the future (given that the hardware spec that they are based on is also high unlikely to change), I can live with it if no one else objects. With that, I’m satisfied enough with this patch. r=me with agreed changes / fixes applied.

Michael Saboff

Comment 5 2014-01-04 23:25:41 PST

Committed r161318: <http://trac.webkit.org/changeset/161318>

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware All

OS All

Product WebKit

Component JavaScriptCore

Assignee

Michael Saboff

Reported

2014-01-03 17:02 PST

Modified

2014-01-04 23:25 PST History

CC List

5 users Show

URL

Keywords

Depends on

Blocks