Bug 126474 - CStack Branch: ARM64 needs push/pop pair macro assembler instructions
Summary: CStack Branch: ARM64 needs push/pop pair macro assembler instructions
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Michael Saboff
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-01-03 17:02 PST by Michael Saboff
Modified: 2014-01-04 23:25 PST (History)
5 users (show)

See Also:


Attachments
Patch (14.22 KB, patch)
2014-01-03 17:10 PST, Michael Saboff
mark.lam: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Michael Saboff 2014-01-03 17:02:57 PST
On the ARM64 processor, register pushes and pops to/from the stack need to be done with in register pairs using the stp (store pair) and ldp (load pair) instructions.  These are needed for saving and restoring the frame pointer register and link register in function prologues and epilogues.
Comment 1 Michael Saboff 2014-01-03 17:10:42 PST
Created attachment 220355 [details]
Patch
Comment 2 Mark Lam 2014-01-04 07:02:28 PST
Comment on attachment 220355 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=220355&action=review

> Source/JavaScriptCore/ChangeLog:14
> +        Verified that we corectly generate the new instructions and that they disassemble

typo: “corectly” ==> “correctly”.

> Source/JavaScriptCore/assembler/ARM64Assembler.h:872
> +    enum MemPairOpSize {
> +        MemPairOp_32,
> +        MemPairOp_LoadSigned_64,
> +        MemPairOp_V64 = 1,
> +        MemPairOp_64,
> +        MemPairOp_V128 = 2
> +    };

I don’t have access to the instruction encoding spec right now, and so I’m uncertain about this.  Can you please confirm if MemPairOp_64 is indeed supposed to have the value 2?  I could be wrong, but it seems strange to me that the instruction encoding (as used in loadStoreRegisterPairPostIndex() and loadStoreRegisterPairPreIndex()) would have MemPairOp_64 as 2 while MemPairOp_LoadSigned_64 and MemPairOp_V64 are 1.

If MemPairOp_64 is supposed to have the value 1 as I suspect, then I suggest expressing the above as:

    // MemPairOpSize values are computed as (log2(number of bits / 8) - 2)).
    enum MemPairOpSize {
        MemPairOp_32 = 0,
        MemPairOp_LoadSigned_64 = 1,
        MemPairOp_V64 = 1,
        MemPairOp_64 = 1,
        MemPairOp_V128 = 2
    };

Spelling out their values explicitly makes it clearer that you intended these enums to have the values of (log2(number of bits / 8) - 2) ... at least clearer than before.  I also add the comment indicating that relationship (log2(number of bits / 8) - 2) since the enum is defined here, but the relationship isn’t clear unless we happened to read the implementation of memPairOffsetShift() below.

If I’m wrong and MemPairOp_64 is supposed to be 2, then I suggest expressing the enum list as follows:

    enum MemPairOpSize {
        MemPairOp_32 = 0,
        MemPairOp_LoadSigned_64 = 1,
        MemPairOp_64 = 2,

        MemPairOp_V64 = 1,
        MemPairOp_V128 = 2
    };

I broke out the V ops from the non-V ops based on "ASSERT(V || (size != MemPairOp_LoadSigned_64) || (opc == MemOp_LOAD))” which I read in loadStoreRegisterPairPostIndex().  IMHO, I think this would better show the relationship between how these enums  increment in value.

> Source/JavaScriptCore/assembler/ARM64Assembler.h:887
> +    static unsigned memPairOffsetShift(bool V, MemPairOpSize size)

I presume by “V”, you mean “isVectorMempairOp”.  Can you use a more descriptive name like that if the one I suggested is not appropriate?  Or maybe add the "“V” means Vector” comment that you use for loadStoreRegisterPairPostIndex() below.

> Source/JavaScriptCore/assembler/ARM64Assembler.h:892
> +        // return the log2 of the size in bytes, e.g. 64 bit size returns 3
> +        if (V)
> +            return size + 2;
> +        return (size >> 1) + 2;

If MemPairOp_64 is supposed to be 1 instead of 2 (see comment above regarding the MemPairOpSize enum list), then this should reduce to "return size + 2;”.  Otherwise, please disregard this comment.

> Source/JavaScriptCore/assembler/ARM64Assembler.h:3488
> +    ALWAYS_INLINE static int loadStoreRegisterPairPreIndex(MemPairOpSize size, bool V, MemOp opc, int immediate, RegisterID rn, FPRegisterID rt, FPRegisterID rt2)
> +    {
> +        ASSERT(size < 3);
> +        ASSERT(opc == (opc & 1)); // Only load or store, load signed 64 is handled via size.
> +        ASSERT(V || (size != MemPairOp_LoadSigned_64) || (opc == MemOp_LOAD)); // There isn't an integer store signed.
> +        unsigned immedShiftAmount = memPairOffsetShift(V, size);
> +        int imm7 = immediate >> immedShiftAmount;
> +        ASSERT((imm7 << immedShiftAmount) == immediate && isInt7(imm7));
> +        return (0x29800000 | size << 30 | V << 26 | opc << 22 | (imm7 & 0x7f) << 15 | rt2 << 10 | xOrSp(rn) << 5 | rt);
> +    }

This appears to be identical to loadStoreRegisterPairPostIndex() except for a one bit difference in the instruction opcode.  I suggest refactoring these 2 functions to be expressed as follows to minimize the amount of cut and paste:

ALWAYS_INLINE static int loadStoreRegisterPair(int opcodeBits, MemPairOpSize size, bool V, MemOp opc, int immediate, RegisterID rn, FPRegisterID rt, FPRegisterID rt2)
{
    ...
}

ALWAYS_INLINE static int loadStoreRegisterPairPostIndex(MemPairOpSize size, bool V, MemOp opc, int immediate, RegisterID rn, FPRegisterID rt, FPRegisterID rt2)
{
    return loadStoreRegisterPair(0x28800000, size, V, opc, immediate, rn, rt, rt2);
}

ALWAYS_INLINE static int loadStoreRegisterPairPreIndex(MemPairOpSize size, bool V, MemOp opc, int immediate, RegisterID rn, FPRegisterID rt, FPRegisterID rt2)
{
    return loadStoreRegisterPair(0x29800000, size, V, opc, immediate, rn, rt, rt2);
}

> Source/JavaScriptCore/disassembler/ARM64/A64DOpcode.cpp:186
> +    

Please remove these empty spaces.
Comment 3 Michael Saboff 2014-01-04 09:39:51 PST
(In reply to comment #2)
> (From update of attachment 220355 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=220355&action=review
> 
> > Source/JavaScriptCore/ChangeLog:14
> > +        Verified that we corectly generate the new instructions and that they disassemble
> 
> typo: “corectly” ==> “correctly”.

I will fix.

> > Source/JavaScriptCore/assembler/ARM64Assembler.h:872
> > +    enum MemPairOpSize {
> > +        MemPairOp_32,
> > +        MemPairOp_LoadSigned_64,
> > +        MemPairOp_V64 = 1,
> > +        MemPairOp_64,
> > +        MemPairOp_V128 = 2
> > +    };
> 
> I don’t have access to the instruction encoding spec right now, and so I’m uncertain about this.  Can you please confirm if MemPairOp_64 is indeed supposed to have the value 2?  I could be wrong, but it seems strange to me that the instruction encoding (as used in loadStoreRegisterPairPostIndex() and loadStoreRegisterPairPreIndex()) would have MemPairOp_64 as 2 while MemPairOp_LoadSigned_64 and MemPairOp_V64 are 1.

It is indeed strange, but mostly true.  The falsehood is that MemPairOp_LoadSigned_64 should really be MemPairOp_LoadSigned_32 as it loads 32 bit values and then sign extends them.  The rest of the names and values match the instruction encodings. Here is a relevant table from the instruction manual (opc is what we call size):
Decode fields	 Instruction Details
opc	V	L
00	0	0	STP — 32-bit
00	0	1	LDP — 32-bit
00	1	0	STP (SIMD&FP) — 32-bit
00	1	1	LDP (SIMD&FP) — 32-bit
01	0	0	UNALLOCATED
01	0	1	LDPSW   <= This is Load Pair Signed
01	1	0	STP (SIMD&FP) — 64-bit
01	1	1	LDP (SIMD&FP) — 64-bit
10	0	0	STP — 64-bit
10	0	1	LDP — 64-bit
10	1	0	STP (SIMD&FP) — 128-bit
10	1	1	LDP (SIMD&FP) — 128-bit
11			UNALLOCATED

> If MemPairOp_64 is supposed to have the value 1 as I suspect, then I suggest expressing the above as:
> 
>     // MemPairOpSize values are computed as (log2(number of bits / 8) - 2)).
>     enum MemPairOpSize {
>         MemPairOp_32 = 0,
>         MemPairOp_LoadSigned_64 = 1,
>         MemPairOp_V64 = 1,
>         MemPairOp_64 = 1,
>         MemPairOp_V128 = 2
>     };
> 
> Spelling out their values explicitly makes it clearer that you intended these enums to have the values of (log2(number of bits / 8) - 2) ... at least clearer than before.  I also add the comment indicating that relationship (log2(number of bits / 8) - 2) since the enum is defined here, but the relationship isn’t clear unless we happened to read the implementation of memPairOffsetShift() below.
> 
> If I’m wrong and MemPairOp_64 is supposed to be 2, then I suggest expressing the enum list as follows:
> 
>     enum MemPairOpSize {
>         MemPairOp_32 = 0,
>         MemPairOp_LoadSigned_64 = 1,
>         MemPairOp_64 = 2,
> 
>         MemPairOp_V64 = 1,
>         MemPairOp_V128 = 2
>     };

I will make these explicit.

> I broke out the V ops from the non-V ops based on "ASSERT(V || (size != MemPairOp_LoadSigned_64) || (opc == MemOp_LOAD))” which I read in loadStoreRegisterPairPostIndex().  IMHO, I think this would better show the relationship between how these enums  increment in value.

This is complicated a little by MemPairOp_32 being the same for scalar and vector registers.

> > Source/JavaScriptCore/assembler/ARM64Assembler.h:887
> > +    static unsigned memPairOffsetShift(bool V, MemPairOpSize size)
> 
> I presume by “V”, you mean “isVectorMempairOp”.  Can you use a more descriptive name like that if the one I suggested is not appropriate?  Or maybe add the "“V” means Vector” comment that you use for loadStoreRegisterPairPostIndex() below.

The variable name V is used throughout for the V bit in the instruction encodings.  It signifies that the instruction is a vector variant.

> > Source/JavaScriptCore/assembler/ARM64Assembler.h:892
> > +        // return the log2 of the size in bytes, e.g. 64 bit size returns 3
> > +        if (V)
> > +            return size + 2;
> > +        return (size >> 1) + 2;
> 
> If MemPairOp_64 is supposed to be 1 instead of 2 (see comment above regarding the MemPairOpSize enum list), then this should reduce to "return size + 2;”.  Otherwise, please disregard this comment.
> 
> > Source/JavaScriptCore/assembler/ARM64Assembler.h:3488
> > +    ALWAYS_INLINE static int loadStoreRegisterPairPreIndex(MemPairOpSize size, bool V, MemOp opc, int immediate, RegisterID rn, FPRegisterID rt, FPRegisterID rt2)
> > +    {
> > +        ASSERT(size < 3);
> > +        ASSERT(opc == (opc & 1)); // Only load or store, load signed 64 is handled via size.
> > +        ASSERT(V || (size != MemPairOp_LoadSigned_64) || (opc == MemOp_LOAD)); // There isn't an integer store signed.
> > +        unsigned immedShiftAmount = memPairOffsetShift(V, size);
> > +        int imm7 = immediate >> immedShiftAmount;
> > +        ASSERT((imm7 << immedShiftAmount) == immediate && isInt7(imm7));
> > +        return (0x29800000 | size << 30 | V << 26 | opc << 22 | (imm7 & 0x7f) << 15 | rt2 << 10 | xOrSp(rn) << 5 | rt);
> > +    }
> 
> This appears to be identical to loadStoreRegisterPairPostIndex() except for a one bit difference in the instruction opcode.  I suggest refactoring these 2 functions to be expressed as follows to minimize the amount of cut and paste:
> 
> ALWAYS_INLINE static int loadStoreRegisterPair(int opcodeBits, MemPairOpSize size, bool V, MemOp opc, int immediate, RegisterID rn, FPRegisterID rt, FPRegisterID rt2)
> {
>     ...
> }
> 
> ALWAYS_INLINE static int loadStoreRegisterPairPostIndex(MemPairOpSize size, bool V, MemOp opc, int immediate, RegisterID rn, FPRegisterID rt, FPRegisterID rt2)
> {
>     return loadStoreRegisterPair(0x28800000, size, V, opc, immediate, rn, rt, rt2);
> }
> 
> ALWAYS_INLINE static int loadStoreRegisterPairPreIndex(MemPairOpSize size, bool V, MemOp opc, int immediate, RegisterID rn, FPRegisterID rt, FPRegisterID rt2)
> {
>     return loadStoreRegisterPair(0x29800000, size, V, opc, immediate, rn, rt, rt2);
> }

The function names loadStoreRegisterPairPostIndex() and loadStoreRegisterPairPreIndex() follow the convention in the rest of the file where the function name matches one of the instruction group names.  For example there is the group name Load/store register (immediate post-indexed) and the corresponding 
existing function loadStoreRegisterPostIndex().  I followed the same pattern for the two groups Load/store register pair (post-indexed) and Load/store register pair (pre-indexed).

> > Source/JavaScriptCore/disassembler/ARM64/A64DOpcode.cpp:186
> > +    
> 
> Please remove these empty spaces.

I'll remove them.
Comment 4 Mark Lam 2014-01-04 22:36:14 PST
(In reply to comment #3)
> (In reply to comment #2)
> > > Source/JavaScriptCore/assembler/ARM64Assembler.h:872
> > > +    enum MemPairOpSize {
> > > +        MemPairOp_32,
> > > +        MemPairOp_LoadSigned_64,
> > > +        MemPairOp_V64 = 1,
> > > +        MemPairOp_64,
> > > +        MemPairOp_V128 = 2
> > > +    };
> > 
> > I don’t have access to the instruction encoding spec right now, and so I’m uncertain about this.  Can you please confirm if MemPairOp_64 is indeed supposed to have the value 2?  I could be wrong, but it seems strange to me that the instruction encoding (as used in loadStoreRegisterPairPostIndex() and loadStoreRegisterPairPreIndex()) would have MemPairOp_64 as 2 while MemPairOp_LoadSigned_64 and MemPairOp_V64 are 1.
> 
> It is indeed strange, but mostly true.  The falsehood is that MemPairOp_LoadSigned_64 should really be MemPairOp_LoadSigned_32 as it loads 32 bit values and then sign extends them.  The rest of the names and values match the instruction encodings. Here is a relevant table from the instruction manual (opc is what we call size):
> Decode fields     Instruction Details
> opc    V    L
> 00    0    0    STP — 32-bit
> 00    0    1    LDP — 32-bit
> 00    1    0    STP (SIMD&FP) — 32-bit
> 00    1    1    LDP (SIMD&FP) — 32-bit
> 01    0    0    UNALLOCATED
> 01    0    1    LDPSW   <= This is Load Pair Signed
> 01    1    0    STP (SIMD&FP) — 64-bit
> 01    1    1    LDP (SIMD&FP) — 64-bit
> 10    0    0    STP — 64-bit
> 10    0    1    LDP — 64-bit
> 10    1    0    STP (SIMD&FP) — 128-bit
> 10    1    1    LDP (SIMD&FP) — 128-bit
> 11            UNALLOCATED

Funky, but it is what it is.

> > If I’m wrong and MemPairOp_64 is supposed to be 2, then I suggest expressing the enum list as follows:
> > 
> >     enum MemPairOpSize {
> >         MemPairOp_32 = 0,
> >         MemPairOp_LoadSigned_64 = 1,
> >         MemPairOp_64 = 2,
> > 
> >         MemPairOp_V64 = 1,
> >         MemPairOp_V128 = 2
> >     };
> 
> I will make these explicit.
> 
> > I broke out the V ops from the non-V ops based on "ASSERT(V || (size != MemPairOp_LoadSigned_64) || (opc == MemOp_LOAD))” which I read in loadStoreRegisterPairPostIndex().  IMHO, I think this would better show the relationship between how these enums  increment in value.
> 
> This is complicated a little by MemPairOp_32 being the same for scalar and vector registers.

In light of the encoding details you provided, I agree that there isn’t an ideal case in terms of grouping.  How about:

    enum MemPairOpSize {
        MemPairOp_32 = 0,
        MemPairOp_LoadSigned_64 = 1,
        MemPairOp_64 = 2,

        MemPairOp_V32 = MemPairOp_32,
        MemPairOp_V64 = 1,
        MemPairOp_V128 = 2
    };

That still provides the Vector vs non-Vector distinction while explicitly spelling out that the V32 variant uses the same encoding as the non-Vector 32 bit variant.

> The function names loadStoreRegisterPairPostIndex() and loadStoreRegisterPairPreIndex() follow the convention in the rest of the file where the function name matches one of the instruction group names.  For example there is the group name Load/store register (immediate post-indexed) and the corresponding 
> existing function loadStoreRegisterPostIndex().  I followed the same pattern for the two groups Load/store register pair (post-indexed) and Load/store register pair (pre-indexed).

Even though there is precedence, I still don’t like the fact that we’re opting for cutting and pasting instead of using a common function.  However, granted that these functions are highly unlikely to be mutated in the future (given that the hardware spec that they are based on is also high unlikely to change), I can live with it if no one else objects.

With that, I’m satisfied enough with this patch.  r=me with agreed changes / fixes applied.
Comment 5 Michael Saboff 2014-01-04 23:25:41 PST
Committed r161318: <http://trac.webkit.org/changeset/161318>