Bug 170215

Summary: WebAssembly: Air::Inst::generate crashes on large binary on A64
Product: WebKit Reporter: JF Bastien <jfbastien>
Component: JavaScriptCoreAssignee: JF Bastien <jfbastien>
Status: RESOLVED FIXED    
Severity: Normal CC: commit-queue, fpizlo, jfbastien, keith_miller, mark.lam, msaboff, saam, zan
Priority: P2    
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 171278    
Bug Blocks: 159775, 171387, 171799    
Attachments:
Description Flags
Air dump
none
Air dump of the function at O1
none
quick hack
jfbastien: review-, jfbastien: commit-queue-
patch
fpizlo: review-
path
none
patch
none
patch none

Description JF Bastien 2017-03-28 17:07:11 PDT
I've got a large binary that crashes on A64 when generating move -> loadPtr -> load64 because getCachedMemoryTempRegisterIDAndInvalidate is sad. We definitely fail to say we'll use a scratch register, and yet we try to sneak a scratch register usage in there.

My current theory is that we're dumb about huge stack frames, and assume they never have giant immediate. This code will not like huge frames if we don't give it a scratch register:

    void load64(ImplicitAddress address, RegisterID dest)
    {
        if (tryLoadWithOffset<64>(dest, address.base, address.offset))
            return;

        signExtend32ToPtr(TrustedImm32(address.offset), getCachedMemoryTempRegisterIDAndInvalidate());
        m_assembler.ldr<64>(dest, address.base, memoryTempRegister);
    }

It could be something else than a huge frame, in any case it's something with a huge immediate.
Comment 1 JF Bastien 2017-03-28 17:08:16 PDT
Created attachment 305677 [details]
Air dump

Here's the Air dump for that one function from the huge wasm program.
Comment 2 JF Bastien 2017-03-28 17:13:13 PDT
The conditions that need to fail are:


ALWAYS_INLINE bool isInt9(int32_t value)
{
    return value == ((value << 23) >> 23);
}

and:

template<int datasize>
ALWAYS_INLINE bool isValidScaledUImm12(int32_t offset)
{
    int32_t maxPImm = 4095 * (datasize / 8);
    if (offset < 0)
        return false;
    if (offset > maxPImm)
        return false;
    if (offset & ((datasize / 8) - 1))
        return false;
    return true;
}

with datasize = 64.
Comment 3 JF Bastien 2017-04-25 09:24:30 PDT
Back at this... Function #94671 is the problem.

The problem is with:
    move Address(31, 1380) -> Address(31, 2008), scratch: 0

1380 isn't a valid A64 offset. Won't fit a scaled 12-bit immediate.

The first occurrence of 1380 is after allocateStack (that’s in the log, before Fil split this for O1/O2):
    Move32 %r0, 1380(%sp), @1845

After Fil’s change this occurs even with JSC_webAssemblyB3OptimizationLevel=1, but on an Add64 generated because of addPtr. The frame size is then 4688.

So both allocateRegistersAndStackByLinearScan and allocateStackByGraphColoring can generate huge frames, and then lowerStackArgs derps too-big immediate offsets and A64 is sad. Luckily there’s a FIXME for that:
    // FIXME: This may produce addresses that aren't valid if we end up with a ginormous stack frame.
    // We would have to scavenge for temporaries if this happened. Fortunately, this case will be
    // extremely rare so we can do crazy things when it arises.
    // https://bugs.webkit.org/show_bug.cgi?id=152530

Is scavenging even worth it? Could we detect, at O1 and O2, that the frame was bigger than valid immediate encodings, and then in that rare case re-allocate with the macro scratch reserved? It'll be ugly, but big functions call for dumb outcomes...
Comment 4 JF Bastien 2017-04-25 09:25:01 PDT
Created attachment 308106 [details]
Air dump of the function at O1
Comment 5 Filip Pizlo 2017-04-25 10:17:09 PDT
I wonder if there would be any perf regression to unconditionally reserving a register during RA for this purpose.  We'd only do this on ARM64, which has a ton of registers.  I'd be surprised if you could measure a perf regression from deducting one of them.
Comment 6 JF Bastien 2017-04-25 13:22:16 PDT
(In reply to Filip Pizlo from comment #5)
> I wonder if there would be any perf regression to unconditionally reserving
> a register during RA for this purpose.  We'd only do this on ARM64, which
> has a ton of registers.  I'd be surprised if you could measure a perf
> regression from deducting one of them.

I'll try this out first and will report with benchmark results on ARM, for JS as well as wasm.


Earlier Fil suggested the following, which I'll leave here if we need to revisit:

----

  1) in the loop where it currently does the lowering, if you get a null stackAddr, set aside some information about the Arg* or Inst* that has the problem
  2) if after that loop there are one or more things that had problems, run RegLiveness
  3) use RegLiveness to compute a mapping from Insts that have problems to the RegisterSet of live registers that that inst
  4) loop over the code again, and now when you encounter an unlowered address you'll have a RegisterSet of live registers available if no register is live, you'll have to pick some register that this instruction does not use and spill it.

Also, we should have some smarts for stackmaps/patchpoints. If the stack address is meant to be an argument to a stackmap, then we don't need it to be isValidForm
The point is that it's unnecessary for lowerStackAddrs to do the isValidForm check in certain cases.  stackmaps and patchpoints are examples of this. Currently there is no way to communicate this information.

----

I mentioned re-running regalloc in this case. I thought we didn't need to copy anything, but Fil says we'd have to copy the entire Code. That may be cheap, and could be better than the suggestion above.
Comment 7 JF Bastien 2017-04-26 09:14:31 PDT
Created attachment 308255 [details]
quick hack

Here's a hack which fixes part of the issue by reserving x15 on ARM64. I'm not super confident about:
 - The reservation occurs in reservedHardwareRegisters. Is that weird?
 - Not giving it another name when I use it for this purpose.
 - How far away stackAddrImpl is from AirLowerStackArgs.cpp, yet they're now intertwined (if the former fails, the later does similar work on top).
 - I don't think the #else clause needs to be there anymore? x86 simply can't fail, right?

Note that x15 is only explicitly used in the ExecutableAllocator's jitWriteThunkGenerator, and this is a fine place to use it AFAICT.

There's another bug at O1, very similar but this time in add64. It's trying to materialize the frame size, it's too big, and the same assert about scratch register fires. I assume it's in the same post-RA code area. I'll fix in this bug as well.
Comment 8 JF Bastien 2017-04-26 09:25:20 PDT
During lowerStackArgs the code generated with a huge frame will transform a bunch of:

    Move (spill337), (spill201), %r0, @8735

Into something like this:

    Move 1416(%sp), 2016(%sp), %r0, @8735
    Move $1404, %r15, @8736
    Add64 %sp, %r15, @8736
    Move (%r15), 2032(%sp), %r0, @8736
    Move 1400(%sp), 2024(%sp), %r0, @8737
    Move 16(%r15), 2160(%sp), %r0, @8738
    Move 360(%r15), 2280(%sp), %r0, @8739
    Move 1376(%sp), 2080(%sp), %r0, @8740
    Move 24(%r15), 2104(%sp), %r0, @8741
    Move 1040(%sp), -208(%fp), %r0, @8742
    Move 1360(%sp), 2200(%sp), %r0, @8743
    Move 8(%r15), 2048(%sp), %r0, @8744
    Move 1048(%sp), -248(%fp), %r0, @8745
    Move 1328(%sp), -200(%fp), %r0, @8746
    Move 1392(%sp), 2176(%sp), %r0, @8747
    Move 1384(%sp), 2088(%sp), %r0, @8748
    Move 32(%r15), 2096(%sp), %r0, @8749
    Move 1344(%sp), 2328(%sp), %r0, @8750
    Move 40(%r15), 2120(%sp), %r0, @8751
    Move 56(%r15), 2240(%sp), %r0, @8752
    Move 1312(%sp), -216(%fp), %r0, @8753
    Move 1320(%sp), -232(%fp), %r0, @8754
    Move 1336(%sp), 2248(%sp), %r0, @8755
    Move 64(%r15), 2224(%sp), %r0, @8756
    Move32 1356(%sp), %r1, @8757

Note that r15 is only set once, and we index off it afterwards (unless it too is out of range, then we generate a new r15).
Comment 9 Michael Saboff 2017-04-26 09:42:35 PDT
Comment on attachment 308255 [details]
quick hack

View in context: https://bugs.webkit.org/attachment.cgi?id=308255&action=review

This looks sound.

> Source/JavaScriptCore/b3/air/AirLowerStackArgs.cpp:69
> +        // if what's already in a register will do, instead just offset from the latest offset.

Maybe it would be a little clearer to say "offset from the prior offset".

> Source/JavaScriptCore/b3/air/AirLowerStackArgs.cpp:103
> +                        dataLog("FATAL: Could not create stack reference for offset = ", offset, " and width = ", width, "\n");

Why put the error message twice?
Comment 10 JF Bastien 2017-04-26 09:53:04 PDT
Comment on attachment 308255 [details]
quick hack

View in context: https://bugs.webkit.org/attachment.cgi?id=308255&action=review

>> Source/JavaScriptCore/b3/air/AirLowerStackArgs.cpp:103
>> +                        dataLog("FATAL: Could not create stack reference for offset = ", offset, " and width = ", width, "\n");
> 
> Why put the error message twice?

That's how the code was before. It's kinda nice: dataLog(code) can spew a BUNCH of stuff, so having it twice is convenient.

I'm wondering if that entire #else is dead though: it'll never happen on ARM64 with my change, and IIUC it already couldn't happen on x86. So maybe I can just delete it (and leave the RELEASE_ASSERT_NOT_REACHED).
Comment 11 Michael Saboff 2017-04-26 09:54:57 PDT
Comment on attachment 308255 [details]
quick hack

View in context: https://bugs.webkit.org/attachment.cgi?id=308255&action=review

>>> Source/JavaScriptCore/b3/air/AirLowerStackArgs.cpp:103
>>> +                        dataLog("FATAL: Could not create stack reference for offset = ", offset, " and width = ", width, "\n");
>> 
>> Why put the error message twice?
> 
> That's how the code was before. It's kinda nice: dataLog(code) can spew a BUNCH of stuff, so having it twice is convenient.
> 
> I'm wondering if that entire #else is dead though: it'll never happen on ARM64 with my change, and IIUC it already couldn't happen on x86. So maybe I can just delete it (and leave the RELEASE_ASSERT_NOT_REACHED).

I'm fine eliminating the #else and replacing with a RELEASE_ASSERT_NOT_REACHED.
Comment 12 Filip Pizlo 2017-04-26 10:28:03 PDT
Comment on attachment 308255 [details]
quick hack

View in context: https://bugs.webkit.org/attachment.cgi?id=308255&action=review

> Source/JavaScriptCore/b3/air/AirLowerStackArgs.cpp:70
> +        int32_t lastLargeOffsetSP = 0;

There are a lot of things wrong with how you do this optimization.  For starters, you're relying on x15 not being clobbered ever, which is totally wrong.  I recommend removing this optimization.

> Source/JavaScriptCore/b3/air/AirLowerStackArgs.cpp:81
> +                        Air::Tmp tmp = Air::Tmp(ARM64Registers::x15);

You should define x15 as a scratch register somewhere.  If possible, you should use the address scratch register for this.  You should not refer to the regsiter name directly.

> Source/JavaScriptCore/jit/RegisterSet.cpp:49
> -    return RegisterSet(ARM64Registers::x18, ARM64Registers::lr);
> +    return RegisterSet(ARM64Registers::x15, ARM64Registers::x18, ARM64Registers::lr);

This is not how we reserve registers.

I think that you want to pin x18.  lowerStackArgs *must* assume that x18 can get clobbered.
Comment 13 Saam Barati 2017-04-26 12:43:43 PDT
Why don't we need this for x86?
Comment 14 JF Bastien 2017-04-26 13:48:18 PDT
(In reply to Saam Barati from comment #13)
> Why don't we need this for x86?

Unless I'm wrong, x86 can encode any offset from SP when performing spills.
Comment 15 Filip Pizlo 2017-04-26 13:49:31 PDT
(In reply to JF Bastien from comment #14)
> (In reply to Saam Barati from comment #13)
> > Why don't we need this for x86?
> 
> Unless I'm wrong, x86 can encode any offset from SP when performing spills.

Correct.  isValidForm always returns true on x86.
Comment 16 Saam Barati 2017-04-26 14:50:46 PDT
(In reply to JF Bastien from comment #14)
> (In reply to Saam Barati from comment #13)
> > Why don't we need this for x86?
> 
> Unless I'm wrong, x86 can encode any offset from SP when performing spills.

πŸ‘πŸΎ
Comment 17 JF Bastien 2017-04-28 16:54:23 PDT
Created attachment 308618 [details]
patch

Here's a cleaned up patch. It passes all the JSC tests locally, and the huge binary isn't crashing on ARM64 anymore (or at least, not for that reason).
Comment 18 Filip Pizlo 2017-05-01 10:02:57 PDT
Comment on attachment 308618 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=308618&action=review

> Source/JavaScriptCore/assembler/MacroAssemblerARM64.h:297
> +    void add64(TrustedImm32 imm, RegisterID src, RegisterID dest, RegisterID scratch)
> +    {
> +        if (isUInt12(imm.m_value)) {
> +            m_assembler.add<64>(dest, src, UInt12(imm.m_value));
> +            return;
> +        }
> +        if (isUInt12(-imm.m_value)) {
> +            m_assembler.sub<64>(dest, src, UInt12(-imm.m_value));
> +            return;
> +        }
> +
> +        signExtend32ToPtr(imm, scratch);
> +        m_assembler.add<64>(dest, src, scratch);
> +    }
> +

This method is completely unnecessary.  It's anti-idiomatic in a number of crucial ways:

- MacroAssembler methods shouldn't take explicit scratch registers.  Anything that high-level belongs in AssemblyHelpers.  The only exception are methods we added to support pseudo-opcodes in Air.  This isn't an Air opcode (nor should it be - in Air we would have wanted to reveal the signExtend32ToPtr as a separate Inst).

- It's a copy-paste of the other add64() functions.  It would be a lot better to consolidate.

- There is no need for this function to exist, since the only caller of this function could just call the other add64() that uses the data temp for the immediate.

> Source/JavaScriptCore/b3/air/AirGenerate.cpp:221
> +                jit.addPtr(CCallHelpers::TrustedImm32(-code.frameSize()), MacroAssembler::stackPointerRegister, MacroAssembler::stackPointerRegister, *pinnedExtendedOffsetAddrRegister());

I think it would be a lot better if you just inlined that stuff from addPtr here, or if you just called the normal addPtr.  Note that you can AllowMacroScratchRegisterUsage around this addPtr because you're in the prologue and so the scratch registers are definitely available (macro scratches are not callee-save).
Comment 19 JF Bastien 2017-05-02 02:08:58 PDT
Created attachment 308810 [details]
path

Address Fil's comments.

There's a new crash which I think comes after my latest rebase:

ASSERTION FAILED: m_allowScratchRegister
/Source/JavaScriptCore/assembler/MacroAssemblerARM64.h(3830) : RegisterID JSC::MacroAssemblerARM64::getCachedDataTempRegisterIDAndInvalidate()
1   0x105771280 WTFCrash
2   0x1045fa30c JSC::MacroAssemblerARM64::getCachedDataTempRegisterIDAndInvalidate()
3   0x1045f6f0c JSC::MacroAssemblerARM64::call(JSC::AbstractMacroAssembler<JSC::ARM64Assembler, JSC::MacroAssemblerARM64>::Address)
4   0x1045f6cb0 JSC::B3::Air::CCallSpecial::generate(JSC::B3::Air::Inst&, JSC::CCallHelpers&, JSC::B3::Air::GenerationContext&)
5   0x10468ecb0 JSC::B3::Air::PatchCustom::generate(JSC::B3::Air::Inst&, JSC::CCallHelpers&, JSC::B3::Air::GenerationContext&)
6   0x104686c94 JSC::B3::Air::Inst::generate(JSC::CCallHelpers&, JSC::B3::Air::GenerationContext&)
7   0x10463d228 JSC::B3::Air::generate(JSC::B3::Air::Code&, JSC::CCallHelpers&)
8   0x104765400 JSC::B3::generate(JSC::B3::Procedure&, JSC::CCallHelpers&)
9   0x105625368 JSC::Wasm::parseAndCompile(JSC::Wasm::CompilationContext&, unsigned char const*, unsigned long, JSC::Wasm::Signature const&, WTF::Vector<JSC::Wasm::UnlinkedWasmToWasmCall, 0ul, WTF::CrashOnOverflow, 16ul>&, JSC::Wasm::ModuleInformation const&, JSC::Wasm::MemoryMode, JSC::Wasm::CompilationMode, unsigned int, JSC::Wasm::TierUpCount*)
10  0x1047294f4 JSC::Wasm::BBQPlan::compileFunctions(JSC::Wasm::Plan::CompilationEffort)


I think it's similar to what I'm fixing here, but may be separate and fixable as a follow-up. I'll send repro instructions.
Comment 20 JF Bastien 2017-05-02 02:11:29 PDT
This new crash is weird, because I don't expect call+address to be anything special like huge offsets are. They're just something we do... Nothing particular about huge binaries / functions.
Comment 21 Saam Barati 2017-05-02 09:18:46 PDT
Maybe related to:
https://bugs.webkit.org/show_bug.cgi?id=170672
Comment 22 Saam Barati 2017-05-02 22:29:37 PDT
Comment on attachment 308810 [details]
path

View in context: https://bugs.webkit.org/attachment.cgi?id=308810&action=review

quick comments. Need to keep reading.

> Source/JavaScriptCore/b3/B3Common.cpp:76
> +    return static_cast<GPRReg>(+MacroAssembler::dataTempRegister);

Why the +?

> Source/JavaScriptCore/b3/air/AirCCallSpecial.cpp:128
> +bool CCallSpecial::admitsExtendedOffsetAddr(Inst&, unsigned argIndex)
> +{
> +    // The callee can be on the stack.
> +    if (argIndex == calleeArgOffset)
> +        return true;
> +
> +    return false;
> +}

I think this is outdated w.r.t ToT, where this should be false on ARM for the callee.

> Source/JavaScriptCore/b3/air/AirCCallSpecial.h:64
> +    void dumpImpl(PrintStream&) const final;

I think it's webkit style to just use override even if it's final, but I'm not 100% sure

> Source/JavaScriptCore/b3/air/AirGenerate.cpp:221
> +                AllowMacroScratchRegisterUsage allowScratch(jit);

Shouldn't this be AllowScratchIf(arm64())?
Comment 23 JF Bastien 2017-05-03 00:01:25 PDT
Created attachment 308890 [details]
patch

> > Source/JavaScriptCore/b3/B3Common.cpp:76
> > +    return static_cast<GPRReg>(+MacroAssembler::dataTempRegister);
> 
> Why the +?

You found the πŸ’Ž gem πŸ’Ž hidden in this patch! πŸŽ‰

dataTempRegister in the header is actually a *declaration* and not a *definition*. Even adding constexpr (as I did) doesn't make it a definition. This causes the linker to crap itself on missing dataTempRegister  symbol. Sad indeed! Why does it work elsewhere? ✨ The magic of C++ ✨ One fix is to switch to C++17 and use inline variables, another is to just declare dataTempRegister in the header and then add a definition in the .cpp file which is super ugly and **is deprecated in C++17** in favor of inline variables. But the other fix... the beautiful fix... is to turn the use of dataTempRegister into an rvalue! This beautiful usage of unary plus magically makes the problem go away because now we're not asking the linker to get this value from oh-so-far away it would surely need a relocation! No, we're just rvalue-ing, nothing to see here linker. So there you have it, unary plus isn't totally useless. It should be useless, but here it's useful in compensating for something ridiculous.

Fun related fact! Unary plus is also valid (and useful!) for C++ lambdas:
  +[](){}


#cplusplus #awesome #hashtag


> > Source/JavaScriptCore/b3/air/AirCCallSpecial.cpp:128
> > +bool CCallSpecial::admitsExtendedOffsetAddr(Inst&, unsigned argIndex)
> > +{
> > +    // The callee can be on the stack.
> > +    if (argIndex == calleeArgOffset)
> > +        return true;
> > +
> > +    return false;
> > +}
> 
> I think this is outdated w.r.t ToT, where this should be false on ARM for
> the callee.

Ah right, fixed. I rebased and the issue is indeed gone!


> > Source/JavaScriptCore/b3/air/AirCCallSpecial.h:64
> > +    void dumpImpl(PrintStream&) const final;
> 
> I think it's webkit style to just use override even if it's final, but I'm
> not 100% sure

Seems super weird. I added it because, reading the code, this wasn't obvious to me. There's a tiny optimization argument to be made too, but it coms with the world's tiniest violin for those missed cycles.


> > Source/JavaScriptCore/b3/air/AirGenerate.cpp:221
> > +                AllowMacroScratchRegisterUsage allowScratch(jit);
> 
> Shouldn't this be AllowScratchIf(arm64())?

Yeah that's good, done.
Comment 24 Saam Barati 2017-05-03 19:23:53 PDT
(In reply to JF Bastien from comment #23)
> Created attachment 308890 [details]
> patch
> 
> > > Source/JavaScriptCore/b3/B3Common.cpp:76
> > > +    return static_cast<GPRReg>(+MacroAssembler::dataTempRegister);
> > 
> > Why the +?
> 
> You found the πŸ’Ž gem πŸ’Ž hidden in this patch! πŸŽ‰
> 
> dataTempRegister in the header is actually a *declaration* and not a
> *definition*. Even adding constexpr (as I did) doesn't make it a definition.
> This causes the linker to crap itself on missing dataTempRegister  symbol.
> Sad indeed! Why does it work elsewhere? ✨ The magic of C++ ✨ One fix is to
> switch to C++17 and use inline variables, another is to just declare
> dataTempRegister in the header and then add a definition in the .cpp file
> which is super ugly and **is deprecated in C++17** in favor of inline
> variables. But the other fix... the beautiful fix... is to turn the use of
> dataTempRegister into an rvalue! This beautiful usage of unary plus
> magically makes the problem go away because now we're not asking the linker
> to get this value from oh-so-far away it would surely need a relocation! No,
> we're just rvalue-ing, nothing to see here linker. So there you have it,
> unary plus isn't totally useless. It should be useless, but here it's useful
> in compensating for something ridiculous. 
I don't get why constexpr wouldn't fix this.
What about a "using" declaration?

> Fun related fact! Unary plus is also valid (and useful!) for C++ lambdas:
>   +[](){}
> 
> 
> #cplusplus #awesome #hashtag
> 
> 
> > > Source/JavaScriptCore/b3/air/AirCCallSpecial.cpp:128
> > > +bool CCallSpecial::admitsExtendedOffsetAddr(Inst&, unsigned argIndex)
> > > +{
> > > +    // The callee can be on the stack.
> > > +    if (argIndex == calleeArgOffset)
> > > +        return true;
> > > +
> > > +    return false;
> > > +}
> > 
> > I think this is outdated w.r.t ToT, where this should be false on ARM for
> > the callee.
> 
> Ah right, fixed. I rebased and the issue is indeed gone!
> 
> 
> > > Source/JavaScriptCore/b3/air/AirCCallSpecial.h:64
> > > +    void dumpImpl(PrintStream&) const final;
> > 
> > I think it's webkit style to just use override even if it's final, but I'm
> > not 100% sure
> 
> Seems super weird. I added it because, reading the code, this wasn't obvious
> to me. There's a tiny optimization argument to be made too, but it coms with
> the world's tiniest violin for those missed cycles.
> 
> 
> > > Source/JavaScriptCore/b3/air/AirGenerate.cpp:221
> > > +                AllowMacroScratchRegisterUsage allowScratch(jit);
> > 
> > Shouldn't this be AllowScratchIf(arm64())?
> 
> Yeah that's good, done.
Comment 25 Saam Barati 2017-05-03 19:47:40 PDT
Comment on attachment 308890 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=308890&action=review

A few questions/comments, but I think Filip will be the best reviewer for this.

> Source/JavaScriptCore/ChangeLog:21
> +        We now unconditionally pin the dataTempRegister on ARM64, and use
> +        it when immediates don't fit.

Have you measured perf on WasmBench?

> Source/JavaScriptCore/ChangeLog:38
> +        Number 2. is more complex: not all Inst can receive a stack
> +        argument whose base register isn't SP or FP. Specifically,
> +        Patchpoints and Stackmaps get very sad because they just want to
> +        know the offset value, but when we materialize the offset as
> +        follows:
> +
> +            Move (spill337), (spill201), %r0, @8735
> +
> +        Becomes (where %r16 is dataTempRegister):
> +            Move $1404, %r16, @8736
> +            Add64 %sp, %r16, @8736
> +            Move (%r16), 2032(%sp), %r0, @8736
> +

This example confuses me. Why can't we encoded 1404 but we can encode 2032?

> Source/JavaScriptCore/ChangeLog:44
> +        generate them, otherwise we generate Addr as shown above.

What if they expect the base of the address to be SP/FP and they don't accept ExtendedOffsetAddr?

> Source/JavaScriptCore/b3/B3CheckSpecial.cpp:147
> +    return admitsStackImpl(numB3Args(inst), m_numCheckArgs + 1, inst, argIndex);

Why admitsStackImpl here? Maybe  the function needs a new name?

> Source/JavaScriptCore/b3/B3PatchpointSpecial.cpp:136
> +bool PatchpointSpecial::admitsExtendedOffsetAddr(Inst&, unsigned)
> +{
> +    return true;
> +}

Why is this just true, but admitsStack above is not unconditional? Seems wrong to me, especially given the above checks if things need registers, etc.
Comment 26 Saam Barati 2017-05-03 19:48:29 PDT
Also, can you please add some tests for this?

It seems like you could trigger this code inside testb3 either with some carefully crafted code, or perhaps just using O0.
Comment 27 JF Bastien 2017-05-03 23:59:19 PDT
Created attachment 309022 [details]
patch

> > Source/JavaScriptCore/ChangeLog:21
> > +        We now unconditionally pin the dataTempRegister on ARM64, and use
> > +        it when immediates don't fit.
> 
> Have you measured perf on WasmBench?

WasmBench crashes without my patch :(
Works fine with my patch... but the numbers are meaningless on their own!

TitzerBench is perf neutral before / after.


> > Source/JavaScriptCore/ChangeLog:38
> > +        Number 2. is more complex: not all Inst can receive a stack
> > +        argument whose base register isn't SP or FP. Specifically,
> > +        Patchpoints and Stackmaps get very sad because they just want to
> > +        know the offset value, but when we materialize the offset as
> > +        follows:
> > +
> > +            Move (spill337), (spill201), %r0, @8735
> > +
> > +        Becomes (where %r16 is dataTempRegister):
> > +            Move $1404, %r16, @8736
> > +            Add64 %sp, %r16, @8736
> > +            Move (%r16), 2032(%sp), %r0, @8736
> > +
> 
> This example confuses me. Why can't we encoded 1404 but we can encode 2032?

Because of isValidScaledUImm12, this clause: offset & ((datasize / 8) - 1)


> > Source/JavaScriptCore/ChangeLog:44
> > +        generate them, otherwise we generate Addr as shown above.
> 
> What if they expect the base of the address to be SP/FP and they don't
> accept ExtendedOffsetAddr?

Then it shouldn't be a ExtendedOffsetAddr? I'm not sure I understand.


> > Source/JavaScriptCore/b3/B3CheckSpecial.cpp:147
> > +    return admitsStackImpl(numB3Args(inst), m_numCheckArgs + 1, inst, argIndex);
> 
> Why admitsStackImpl here? Maybe  the function needs a new name?

My mental model may be wrong, but my thinking was: if it admits a stack value then it could maybe admit an extended address. This case first disallows extended address for hidden branches, but lets others through as admitsStack would.

Maybe I should just call admitsStack here?


> > Source/JavaScriptCore/b3/B3PatchpointSpecial.cpp:136
> > +bool PatchpointSpecial::admitsExtendedOffsetAddr(Inst&, unsigned)
> > +{
> > +    return true;
> > +}
> 
> Why is this just true, but admitsStack above is not unconditional? Seems
> wrong to me, especially given the above checks if things need registers, etc.

Right, you're expressing my mental model above... My understanding was that I could relax it here because Patchpoint won't generate an Addr for a register. But re-reading the code that seems wrong because admitsStackImpl on Void origin does more than this.

So I'll make it just call admitsStack.

(In reply to Saam Barati from comment #26)
> Also, can you please add some tests for this?
> 
> It seems like you could trigger this code inside testb3 either with some
> carefully crafted code, or perhaps just using O0.

The code does trigger, even on x86! It's not specific to huge frames on purpose, so it's not a corner case. Extended address is there super often.


(In reply to Saam Barati from comment #24)
> (In reply to JF Bastien from comment #23)
> > Created attachment 308890 [details]
> > patch
> > 
> > > > Source/JavaScriptCore/b3/B3Common.cpp:76
> > > > +    return static_cast<GPRReg>(+MacroAssembler::dataTempRegister);
> > > 
> > > Why the +?
> > 
> > You found the πŸ’Ž gem πŸ’Ž hidden in this patch! πŸŽ‰
> > 
> > dataTempRegister in the header is actually a *declaration* and not a
> > *definition*. Even adding constexpr (as I did) doesn't make it a definition.
> > This causes the linker to crap itself on missing dataTempRegister  symbol.
> > Sad indeed! Why does it work elsewhere? ✨ The magic of C++ ✨ One fix is to
> > switch to C++17 and use inline variables, another is to just declare
> > dataTempRegister in the header and then add a definition in the .cpp file
> > which is super ugly and **is deprecated in C++17** in favor of inline
> > variables. But the other fix... the beautiful fix... is to turn the use of
> > dataTempRegister into an rvalue! This beautiful usage of unary plus
> > magically makes the problem go away because now we're not asking the linker
> > to get this value from oh-so-far away it would surely need a relocation! No,
> > we're just rvalue-ing, nothing to see here linker. So there you have it,
> > unary plus isn't totally useless. It should be useless, but here it's useful
> > in compensating for something ridiculous. 
> I don't get why constexpr wouldn't fix this.
> What about a "using" declaration?

See last page of http://wg21.link/P0386R2
It's a declaration, not a definition, until C++17.
Comment 28 Zan Dobersek 2017-05-04 03:46:39 PDT
*** Bug 171563 has been marked as a duplicate of this bug. ***
Comment 29 Filip Pizlo 2017-05-05 20:28:32 PDT
Comment on attachment 309022 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=309022&action=review

This is cool!  I like how this turned out.  Sorry I didn't get around to reviewing it sooner.

> Source/JavaScriptCore/b3/B3Common.cpp:76
> +    return static_cast<GPRReg>(+MacroAssembler::dataTempRegister);

Why isn't dataTempRegister a GPRReg already?  I think that GPRReg is a typedef for MacroAssembler::RegisterID.

> Source/JavaScriptCore/b3/air/AirCustom.h:90
> +    static bool admitsExtendedOffsetAddr(Inst& inst, unsigned argIndex)
> +    {
> +        if (!argIndex)
> +            return false;
> +        return inst.args[0].special()->admitsExtendedOffsetAddr(inst, argIndex);
> +    }
> +

You don't need to thread this through AirCustom if it's only for Patch.  You could have Inst just assert that it's a Patch, or check if it is and return false if it's not, and then do the special()->admintsBlahBlah thing directly.

However, this isn't bad.  I'll allow it. ;-)

> Source/JavaScriptCore/b3/air/opcode_generator.rb:232
> -    token =~ /\A((Tmp)|(Imm)|(BigImm)|(BitImm)|(BitImm64)|(SimpleAddr)|(Addr)|(Index)|(RelCond)|(ResCond)|(DoubleCond)|(StatusCond))\Z/
> +    token =~ /\A((Tmp)|(Imm)|(BigImm)|(BitImm)|(BitImm64)|(SimpleAddr)|(Addr)|(ExtendedOffsetAddr)|(Index)|(RelCond)|(ResCond)|(DoubleCond)|(StatusCond))\Z/

Again, don't really need to teach opcode_generator about it if it's just for Patch.

But, this kind of generality isn't a bad thing and there's precedent for making things a bit more general than strictly necessary, so it's OK to do it this way.
Comment 30 Filip Pizlo 2017-05-05 20:43:16 PDT
Comment on attachment 309022 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=309022&action=review

> Source/JavaScriptCore/b3/air/AirLowerStackArgs.cpp:80
> +                        if (isPatch && inst.admitsExtendedOffsetAddr(arg)) {

I would drop the isPatch check
Comment 31 WebKit Commit Bot 2017-05-05 20:57:46 PDT
Comment on attachment 309022 [details]
patch

Clearing flags on attachment: 309022

Committed r216306: <http://trac.webkit.org/changeset/216306>
Comment 32 WebKit Commit Bot 2017-05-05 20:57:48 PDT
All reviewed patches have been landed.  Closing bug.