RESOLVED FIXED Bug 170215
WebAssembly: Air::Inst::generate crashes on large binary on A64
https://bugs.webkit.org/show_bug.cgi?id=170215
Summary WebAssembly: Air::Inst::generate crashes on large binary on A64
JF Bastien
Reported 2017-03-28 17:07:11 PDT
I've got a large binary that crashes on A64 when generating move -> loadPtr -> load64 because getCachedMemoryTempRegisterIDAndInvalidate is sad. We definitely fail to say we'll use a scratch register, and yet we try to sneak a scratch register usage in there. My current theory is that we're dumb about huge stack frames, and assume they never have giant immediate. This code will not like huge frames if we don't give it a scratch register: void load64(ImplicitAddress address, RegisterID dest) { if (tryLoadWithOffset<64>(dest, address.base, address.offset)) return; signExtend32ToPtr(TrustedImm32(address.offset), getCachedMemoryTempRegisterIDAndInvalidate()); m_assembler.ldr<64>(dest, address.base, memoryTempRegister); } It could be something else than a huge frame, in any case it's something with a huge immediate.
Attachments
Air dump (4.12 MB, application/octet-stream)
2017-03-28 17:08 PDT, JF Bastien
no flags
Air dump of the function at O1 (3.00 MB, text/plain)
2017-04-25 09:25 PDT, JF Bastien
no flags
quick hack (5.75 KB, patch)
2017-04-26 09:14 PDT, JF Bastien
jfbastien: review-
jfbastien: commit-queue-
patch (40.63 KB, patch)
2017-04-28 16:54 PDT, JF Bastien
fpizlo: review-
path (39.51 KB, patch)
2017-05-02 02:08 PDT, JF Bastien
no flags
patch (39.38 KB, patch)
2017-05-03 00:01 PDT, JF Bastien
no flags
patch (39.38 KB, patch)
2017-05-03 23:59 PDT, JF Bastien
no flags
JF Bastien
Comment 1 2017-03-28 17:08:16 PDT
Created attachment 305677 [details] Air dump Here's the Air dump for that one function from the huge wasm program.
JF Bastien
Comment 2 2017-03-28 17:13:13 PDT
The conditions that need to fail are: ALWAYS_INLINE bool isInt9(int32_t value) { return value == ((value << 23) >> 23); } and: template<int datasize> ALWAYS_INLINE bool isValidScaledUImm12(int32_t offset) { int32_t maxPImm = 4095 * (datasize / 8); if (offset < 0) return false; if (offset > maxPImm) return false; if (offset & ((datasize / 8) - 1)) return false; return true; } with datasize = 64.
JF Bastien
Comment 3 2017-04-25 09:24:30 PDT
Back at this... Function #94671 is the problem. The problem is with: move Address(31, 1380) -> Address(31, 2008), scratch: 0 1380 isn't a valid A64 offset. Won't fit a scaled 12-bit immediate. The first occurrence of 1380 is after allocateStack (that’s in the log, before Fil split this for O1/O2): Move32 %r0, 1380(%sp), @1845 After Fil’s change this occurs even with JSC_webAssemblyB3OptimizationLevel=1, but on an Add64 generated because of addPtr. The frame size is then 4688. So both allocateRegistersAndStackByLinearScan and allocateStackByGraphColoring can generate huge frames, and then lowerStackArgs derps too-big immediate offsets and A64 is sad. Luckily there’s a FIXME for that: // FIXME: This may produce addresses that aren't valid if we end up with a ginormous stack frame. // We would have to scavenge for temporaries if this happened. Fortunately, this case will be // extremely rare so we can do crazy things when it arises. // https://bugs.webkit.org/show_bug.cgi?id=152530 Is scavenging even worth it? Could we detect, at O1 and O2, that the frame was bigger than valid immediate encodings, and then in that rare case re-allocate with the macro scratch reserved? It'll be ugly, but big functions call for dumb outcomes...
JF Bastien
Comment 4 2017-04-25 09:25:01 PDT
Created attachment 308106 [details] Air dump of the function at O1
Filip Pizlo
Comment 5 2017-04-25 10:17:09 PDT
I wonder if there would be any perf regression to unconditionally reserving a register during RA for this purpose. We'd only do this on ARM64, which has a ton of registers. I'd be surprised if you could measure a perf regression from deducting one of them.
JF Bastien
Comment 6 2017-04-25 13:22:16 PDT
(In reply to Filip Pizlo from comment #5) > I wonder if there would be any perf regression to unconditionally reserving > a register during RA for this purpose. We'd only do this on ARM64, which > has a ton of registers. I'd be surprised if you could measure a perf > regression from deducting one of them. I'll try this out first and will report with benchmark results on ARM, for JS as well as wasm. Earlier Fil suggested the following, which I'll leave here if we need to revisit: ---- 1) in the loop where it currently does the lowering, if you get a null stackAddr, set aside some information about the Arg* or Inst* that has the problem 2) if after that loop there are one or more things that had problems, run RegLiveness 3) use RegLiveness to compute a mapping from Insts that have problems to the RegisterSet of live registers that that inst 4) loop over the code again, and now when you encounter an unlowered address you'll have a RegisterSet of live registers available if no register is live, you'll have to pick some register that this instruction does not use and spill it. Also, we should have some smarts for stackmaps/patchpoints. If the stack address is meant to be an argument to a stackmap, then we don't need it to be isValidForm The point is that it's unnecessary for lowerStackAddrs to do the isValidForm check in certain cases. stackmaps and patchpoints are examples of this. Currently there is no way to communicate this information. ---- I mentioned re-running regalloc in this case. I thought we didn't need to copy anything, but Fil says we'd have to copy the entire Code. That may be cheap, and could be better than the suggestion above.
JF Bastien
Comment 7 2017-04-26 09:14:31 PDT
Created attachment 308255 [details] quick hack Here's a hack which fixes part of the issue by reserving x15 on ARM64. I'm not super confident about: - The reservation occurs in reservedHardwareRegisters. Is that weird? - Not giving it another name when I use it for this purpose. - How far away stackAddrImpl is from AirLowerStackArgs.cpp, yet they're now intertwined (if the former fails, the later does similar work on top). - I don't think the #else clause needs to be there anymore? x86 simply can't fail, right? Note that x15 is only explicitly used in the ExecutableAllocator's jitWriteThunkGenerator, and this is a fine place to use it AFAICT. There's another bug at O1, very similar but this time in add64. It's trying to materialize the frame size, it's too big, and the same assert about scratch register fires. I assume it's in the same post-RA code area. I'll fix in this bug as well.
JF Bastien
Comment 8 2017-04-26 09:25:20 PDT
During lowerStackArgs the code generated with a huge frame will transform a bunch of: Move (spill337), (spill201), %r0, @8735 Into something like this: Move 1416(%sp), 2016(%sp), %r0, @8735 Move $1404, %r15, @8736 Add64 %sp, %r15, @8736 Move (%r15), 2032(%sp), %r0, @8736 Move 1400(%sp), 2024(%sp), %r0, @8737 Move 16(%r15), 2160(%sp), %r0, @8738 Move 360(%r15), 2280(%sp), %r0, @8739 Move 1376(%sp), 2080(%sp), %r0, @8740 Move 24(%r15), 2104(%sp), %r0, @8741 Move 1040(%sp), -208(%fp), %r0, @8742 Move 1360(%sp), 2200(%sp), %r0, @8743 Move 8(%r15), 2048(%sp), %r0, @8744 Move 1048(%sp), -248(%fp), %r0, @8745 Move 1328(%sp), -200(%fp), %r0, @8746 Move 1392(%sp), 2176(%sp), %r0, @8747 Move 1384(%sp), 2088(%sp), %r0, @8748 Move 32(%r15), 2096(%sp), %r0, @8749 Move 1344(%sp), 2328(%sp), %r0, @8750 Move 40(%r15), 2120(%sp), %r0, @8751 Move 56(%r15), 2240(%sp), %r0, @8752 Move 1312(%sp), -216(%fp), %r0, @8753 Move 1320(%sp), -232(%fp), %r0, @8754 Move 1336(%sp), 2248(%sp), %r0, @8755 Move 64(%r15), 2224(%sp), %r0, @8756 Move32 1356(%sp), %r1, @8757 Note that r15 is only set once, and we index off it afterwards (unless it too is out of range, then we generate a new r15).
Michael Saboff
Comment 9 2017-04-26 09:42:35 PDT
Comment on attachment 308255 [details] quick hack View in context: https://bugs.webkit.org/attachment.cgi?id=308255&action=review This looks sound. > Source/JavaScriptCore/b3/air/AirLowerStackArgs.cpp:69 > + // if what's already in a register will do, instead just offset from the latest offset. Maybe it would be a little clearer to say "offset from the prior offset". > Source/JavaScriptCore/b3/air/AirLowerStackArgs.cpp:103 > + dataLog("FATAL: Could not create stack reference for offset = ", offset, " and width = ", width, "\n"); Why put the error message twice?
JF Bastien
Comment 10 2017-04-26 09:53:04 PDT
Comment on attachment 308255 [details] quick hack View in context: https://bugs.webkit.org/attachment.cgi?id=308255&action=review >> Source/JavaScriptCore/b3/air/AirLowerStackArgs.cpp:103 >> + dataLog("FATAL: Could not create stack reference for offset = ", offset, " and width = ", width, "\n"); > > Why put the error message twice? That's how the code was before. It's kinda nice: dataLog(code) can spew a BUNCH of stuff, so having it twice is convenient. I'm wondering if that entire #else is dead though: it'll never happen on ARM64 with my change, and IIUC it already couldn't happen on x86. So maybe I can just delete it (and leave the RELEASE_ASSERT_NOT_REACHED).
Michael Saboff
Comment 11 2017-04-26 09:54:57 PDT
Comment on attachment 308255 [details] quick hack View in context: https://bugs.webkit.org/attachment.cgi?id=308255&action=review >>> Source/JavaScriptCore/b3/air/AirLowerStackArgs.cpp:103 >>> + dataLog("FATAL: Could not create stack reference for offset = ", offset, " and width = ", width, "\n"); >> >> Why put the error message twice? > > That's how the code was before. It's kinda nice: dataLog(code) can spew a BUNCH of stuff, so having it twice is convenient. > > I'm wondering if that entire #else is dead though: it'll never happen on ARM64 with my change, and IIUC it already couldn't happen on x86. So maybe I can just delete it (and leave the RELEASE_ASSERT_NOT_REACHED). I'm fine eliminating the #else and replacing with a RELEASE_ASSERT_NOT_REACHED.
Filip Pizlo
Comment 12 2017-04-26 10:28:03 PDT
Comment on attachment 308255 [details] quick hack View in context: https://bugs.webkit.org/attachment.cgi?id=308255&action=review > Source/JavaScriptCore/b3/air/AirLowerStackArgs.cpp:70 > + int32_t lastLargeOffsetSP = 0; There are a lot of things wrong with how you do this optimization. For starters, you're relying on x15 not being clobbered ever, which is totally wrong. I recommend removing this optimization. > Source/JavaScriptCore/b3/air/AirLowerStackArgs.cpp:81 > + Air::Tmp tmp = Air::Tmp(ARM64Registers::x15); You should define x15 as a scratch register somewhere. If possible, you should use the address scratch register for this. You should not refer to the regsiter name directly. > Source/JavaScriptCore/jit/RegisterSet.cpp:49 > - return RegisterSet(ARM64Registers::x18, ARM64Registers::lr); > + return RegisterSet(ARM64Registers::x15, ARM64Registers::x18, ARM64Registers::lr); This is not how we reserve registers. I think that you want to pin x18. lowerStackArgs *must* assume that x18 can get clobbered.
Saam Barati
Comment 13 2017-04-26 12:43:43 PDT
Why don't we need this for x86?
JF Bastien
Comment 14 2017-04-26 13:48:18 PDT
(In reply to Saam Barati from comment #13) > Why don't we need this for x86? Unless I'm wrong, x86 can encode any offset from SP when performing spills.
Filip Pizlo
Comment 15 2017-04-26 13:49:31 PDT
(In reply to JF Bastien from comment #14) > (In reply to Saam Barati from comment #13) > > Why don't we need this for x86? > > Unless I'm wrong, x86 can encode any offset from SP when performing spills. Correct. isValidForm always returns true on x86.
Saam Barati
Comment 16 2017-04-26 14:50:46 PDT
(In reply to JF Bastien from comment #14) > (In reply to Saam Barati from comment #13) > > Why don't we need this for x86? > > Unless I'm wrong, x86 can encode any offset from SP when performing spills. 👍🏾
JF Bastien
Comment 17 2017-04-28 16:54:23 PDT
Created attachment 308618 [details] patch Here's a cleaned up patch. It passes all the JSC tests locally, and the huge binary isn't crashing on ARM64 anymore (or at least, not for that reason).
Filip Pizlo
Comment 18 2017-05-01 10:02:57 PDT
Comment on attachment 308618 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=308618&action=review > Source/JavaScriptCore/assembler/MacroAssemblerARM64.h:297 > + void add64(TrustedImm32 imm, RegisterID src, RegisterID dest, RegisterID scratch) > + { > + if (isUInt12(imm.m_value)) { > + m_assembler.add<64>(dest, src, UInt12(imm.m_value)); > + return; > + } > + if (isUInt12(-imm.m_value)) { > + m_assembler.sub<64>(dest, src, UInt12(-imm.m_value)); > + return; > + } > + > + signExtend32ToPtr(imm, scratch); > + m_assembler.add<64>(dest, src, scratch); > + } > + This method is completely unnecessary. It's anti-idiomatic in a number of crucial ways: - MacroAssembler methods shouldn't take explicit scratch registers. Anything that high-level belongs in AssemblyHelpers. The only exception are methods we added to support pseudo-opcodes in Air. This isn't an Air opcode (nor should it be - in Air we would have wanted to reveal the signExtend32ToPtr as a separate Inst). - It's a copy-paste of the other add64() functions. It would be a lot better to consolidate. - There is no need for this function to exist, since the only caller of this function could just call the other add64() that uses the data temp for the immediate. > Source/JavaScriptCore/b3/air/AirGenerate.cpp:221 > + jit.addPtr(CCallHelpers::TrustedImm32(-code.frameSize()), MacroAssembler::stackPointerRegister, MacroAssembler::stackPointerRegister, *pinnedExtendedOffsetAddrRegister()); I think it would be a lot better if you just inlined that stuff from addPtr here, or if you just called the normal addPtr. Note that you can AllowMacroScratchRegisterUsage around this addPtr because you're in the prologue and so the scratch registers are definitely available (macro scratches are not callee-save).
JF Bastien
Comment 19 2017-05-02 02:08:58 PDT
Created attachment 308810 [details] path Address Fil's comments. There's a new crash which I think comes after my latest rebase: ASSERTION FAILED: m_allowScratchRegister /Source/JavaScriptCore/assembler/MacroAssemblerARM64.h(3830) : RegisterID JSC::MacroAssemblerARM64::getCachedDataTempRegisterIDAndInvalidate() 1 0x105771280 WTFCrash 2 0x1045fa30c JSC::MacroAssemblerARM64::getCachedDataTempRegisterIDAndInvalidate() 3 0x1045f6f0c JSC::MacroAssemblerARM64::call(JSC::AbstractMacroAssembler<JSC::ARM64Assembler, JSC::MacroAssemblerARM64>::Address) 4 0x1045f6cb0 JSC::B3::Air::CCallSpecial::generate(JSC::B3::Air::Inst&, JSC::CCallHelpers&, JSC::B3::Air::GenerationContext&) 5 0x10468ecb0 JSC::B3::Air::PatchCustom::generate(JSC::B3::Air::Inst&, JSC::CCallHelpers&, JSC::B3::Air::GenerationContext&) 6 0x104686c94 JSC::B3::Air::Inst::generate(JSC::CCallHelpers&, JSC::B3::Air::GenerationContext&) 7 0x10463d228 JSC::B3::Air::generate(JSC::B3::Air::Code&, JSC::CCallHelpers&) 8 0x104765400 JSC::B3::generate(JSC::B3::Procedure&, JSC::CCallHelpers&) 9 0x105625368 JSC::Wasm::parseAndCompile(JSC::Wasm::CompilationContext&, unsigned char const*, unsigned long, JSC::Wasm::Signature const&, WTF::Vector<JSC::Wasm::UnlinkedWasmToWasmCall, 0ul, WTF::CrashOnOverflow, 16ul>&, JSC::Wasm::ModuleInformation const&, JSC::Wasm::MemoryMode, JSC::Wasm::CompilationMode, unsigned int, JSC::Wasm::TierUpCount*) 10 0x1047294f4 JSC::Wasm::BBQPlan::compileFunctions(JSC::Wasm::Plan::CompilationEffort) I think it's similar to what I'm fixing here, but may be separate and fixable as a follow-up. I'll send repro instructions.
JF Bastien
Comment 20 2017-05-02 02:11:29 PDT
This new crash is weird, because I don't expect call+address to be anything special like huge offsets are. They're just something we do... Nothing particular about huge binaries / functions.
Saam Barati
Comment 21 2017-05-02 09:18:46 PDT
Saam Barati
Comment 22 2017-05-02 22:29:37 PDT
Comment on attachment 308810 [details] path View in context: https://bugs.webkit.org/attachment.cgi?id=308810&action=review quick comments. Need to keep reading. > Source/JavaScriptCore/b3/B3Common.cpp:76 > + return static_cast<GPRReg>(+MacroAssembler::dataTempRegister); Why the +? > Source/JavaScriptCore/b3/air/AirCCallSpecial.cpp:128 > +bool CCallSpecial::admitsExtendedOffsetAddr(Inst&, unsigned argIndex) > +{ > + // The callee can be on the stack. > + if (argIndex == calleeArgOffset) > + return true; > + > + return false; > +} I think this is outdated w.r.t ToT, where this should be false on ARM for the callee. > Source/JavaScriptCore/b3/air/AirCCallSpecial.h:64 > + void dumpImpl(PrintStream&) const final; I think it's webkit style to just use override even if it's final, but I'm not 100% sure > Source/JavaScriptCore/b3/air/AirGenerate.cpp:221 > + AllowMacroScratchRegisterUsage allowScratch(jit); Shouldn't this be AllowScratchIf(arm64())?
JF Bastien
Comment 23 2017-05-03 00:01:25 PDT
Created attachment 308890 [details] patch > > Source/JavaScriptCore/b3/B3Common.cpp:76 > > + return static_cast<GPRReg>(+MacroAssembler::dataTempRegister); > > Why the +? You found the 💎 gem 💎 hidden in this patch! 🎉 dataTempRegister in the header is actually a *declaration* and not a *definition*. Even adding constexpr (as I did) doesn't make it a definition. This causes the linker to crap itself on missing dataTempRegister symbol. Sad indeed! Why does it work elsewhere? ✨ The magic of C++ ✨ One fix is to switch to C++17 and use inline variables, another is to just declare dataTempRegister in the header and then add a definition in the .cpp file which is super ugly and **is deprecated in C++17** in favor of inline variables. But the other fix... the beautiful fix... is to turn the use of dataTempRegister into an rvalue! This beautiful usage of unary plus magically makes the problem go away because now we're not asking the linker to get this value from oh-so-far away it would surely need a relocation! No, we're just rvalue-ing, nothing to see here linker. So there you have it, unary plus isn't totally useless. It should be useless, but here it's useful in compensating for something ridiculous. Fun related fact! Unary plus is also valid (and useful!) for C++ lambdas: +[](){} #cplusplus #awesome #hashtag > > Source/JavaScriptCore/b3/air/AirCCallSpecial.cpp:128 > > +bool CCallSpecial::admitsExtendedOffsetAddr(Inst&, unsigned argIndex) > > +{ > > + // The callee can be on the stack. > > + if (argIndex == calleeArgOffset) > > + return true; > > + > > + return false; > > +} > > I think this is outdated w.r.t ToT, where this should be false on ARM for > the callee. Ah right, fixed. I rebased and the issue is indeed gone! > > Source/JavaScriptCore/b3/air/AirCCallSpecial.h:64 > > + void dumpImpl(PrintStream&) const final; > > I think it's webkit style to just use override even if it's final, but I'm > not 100% sure Seems super weird. I added it because, reading the code, this wasn't obvious to me. There's a tiny optimization argument to be made too, but it coms with the world's tiniest violin for those missed cycles. > > Source/JavaScriptCore/b3/air/AirGenerate.cpp:221 > > + AllowMacroScratchRegisterUsage allowScratch(jit); > > Shouldn't this be AllowScratchIf(arm64())? Yeah that's good, done.
Saam Barati
Comment 24 2017-05-03 19:23:53 PDT
(In reply to JF Bastien from comment #23) > Created attachment 308890 [details] > patch > > > > Source/JavaScriptCore/b3/B3Common.cpp:76 > > > + return static_cast<GPRReg>(+MacroAssembler::dataTempRegister); > > > > Why the +? > > You found the 💎 gem 💎 hidden in this patch! 🎉 > > dataTempRegister in the header is actually a *declaration* and not a > *definition*. Even adding constexpr (as I did) doesn't make it a definition. > This causes the linker to crap itself on missing dataTempRegister symbol. > Sad indeed! Why does it work elsewhere? ✨ The magic of C++ ✨ One fix is to > switch to C++17 and use inline variables, another is to just declare > dataTempRegister in the header and then add a definition in the .cpp file > which is super ugly and **is deprecated in C++17** in favor of inline > variables. But the other fix... the beautiful fix... is to turn the use of > dataTempRegister into an rvalue! This beautiful usage of unary plus > magically makes the problem go away because now we're not asking the linker > to get this value from oh-so-far away it would surely need a relocation! No, > we're just rvalue-ing, nothing to see here linker. So there you have it, > unary plus isn't totally useless. It should be useless, but here it's useful > in compensating for something ridiculous. I don't get why constexpr wouldn't fix this. What about a "using" declaration? > Fun related fact! Unary plus is also valid (and useful!) for C++ lambdas: > +[](){} > > > #cplusplus #awesome #hashtag > > > > > Source/JavaScriptCore/b3/air/AirCCallSpecial.cpp:128 > > > +bool CCallSpecial::admitsExtendedOffsetAddr(Inst&, unsigned argIndex) > > > +{ > > > + // The callee can be on the stack. > > > + if (argIndex == calleeArgOffset) > > > + return true; > > > + > > > + return false; > > > +} > > > > I think this is outdated w.r.t ToT, where this should be false on ARM for > > the callee. > > Ah right, fixed. I rebased and the issue is indeed gone! > > > > > Source/JavaScriptCore/b3/air/AirCCallSpecial.h:64 > > > + void dumpImpl(PrintStream&) const final; > > > > I think it's webkit style to just use override even if it's final, but I'm > > not 100% sure > > Seems super weird. I added it because, reading the code, this wasn't obvious > to me. There's a tiny optimization argument to be made too, but it coms with > the world's tiniest violin for those missed cycles. > > > > > Source/JavaScriptCore/b3/air/AirGenerate.cpp:221 > > > + AllowMacroScratchRegisterUsage allowScratch(jit); > > > > Shouldn't this be AllowScratchIf(arm64())? > > Yeah that's good, done.
Saam Barati
Comment 25 2017-05-03 19:47:40 PDT
Comment on attachment 308890 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=308890&action=review A few questions/comments, but I think Filip will be the best reviewer for this. > Source/JavaScriptCore/ChangeLog:21 > + We now unconditionally pin the dataTempRegister on ARM64, and use > + it when immediates don't fit. Have you measured perf on WasmBench? > Source/JavaScriptCore/ChangeLog:38 > + Number 2. is more complex: not all Inst can receive a stack > + argument whose base register isn't SP or FP. Specifically, > + Patchpoints and Stackmaps get very sad because they just want to > + know the offset value, but when we materialize the offset as > + follows: > + > + Move (spill337), (spill201), %r0, @8735 > + > + Becomes (where %r16 is dataTempRegister): > + Move $1404, %r16, @8736 > + Add64 %sp, %r16, @8736 > + Move (%r16), 2032(%sp), %r0, @8736 > + This example confuses me. Why can't we encoded 1404 but we can encode 2032? > Source/JavaScriptCore/ChangeLog:44 > + generate them, otherwise we generate Addr as shown above. What if they expect the base of the address to be SP/FP and they don't accept ExtendedOffsetAddr? > Source/JavaScriptCore/b3/B3CheckSpecial.cpp:147 > + return admitsStackImpl(numB3Args(inst), m_numCheckArgs + 1, inst, argIndex); Why admitsStackImpl here? Maybe the function needs a new name? > Source/JavaScriptCore/b3/B3PatchpointSpecial.cpp:136 > +bool PatchpointSpecial::admitsExtendedOffsetAddr(Inst&, unsigned) > +{ > + return true; > +} Why is this just true, but admitsStack above is not unconditional? Seems wrong to me, especially given the above checks if things need registers, etc.
Saam Barati
Comment 26 2017-05-03 19:48:29 PDT
Also, can you please add some tests for this? It seems like you could trigger this code inside testb3 either with some carefully crafted code, or perhaps just using O0.
JF Bastien
Comment 27 2017-05-03 23:59:19 PDT
Created attachment 309022 [details] patch > > Source/JavaScriptCore/ChangeLog:21 > > + We now unconditionally pin the dataTempRegister on ARM64, and use > > + it when immediates don't fit. > > Have you measured perf on WasmBench? WasmBench crashes without my patch :( Works fine with my patch... but the numbers are meaningless on their own! TitzerBench is perf neutral before / after. > > Source/JavaScriptCore/ChangeLog:38 > > + Number 2. is more complex: not all Inst can receive a stack > > + argument whose base register isn't SP or FP. Specifically, > > + Patchpoints and Stackmaps get very sad because they just want to > > + know the offset value, but when we materialize the offset as > > + follows: > > + > > + Move (spill337), (spill201), %r0, @8735 > > + > > + Becomes (where %r16 is dataTempRegister): > > + Move $1404, %r16, @8736 > > + Add64 %sp, %r16, @8736 > > + Move (%r16), 2032(%sp), %r0, @8736 > > + > > This example confuses me. Why can't we encoded 1404 but we can encode 2032? Because of isValidScaledUImm12, this clause: offset & ((datasize / 8) - 1) > > Source/JavaScriptCore/ChangeLog:44 > > + generate them, otherwise we generate Addr as shown above. > > What if they expect the base of the address to be SP/FP and they don't > accept ExtendedOffsetAddr? Then it shouldn't be a ExtendedOffsetAddr? I'm not sure I understand. > > Source/JavaScriptCore/b3/B3CheckSpecial.cpp:147 > > + return admitsStackImpl(numB3Args(inst), m_numCheckArgs + 1, inst, argIndex); > > Why admitsStackImpl here? Maybe the function needs a new name? My mental model may be wrong, but my thinking was: if it admits a stack value then it could maybe admit an extended address. This case first disallows extended address for hidden branches, but lets others through as admitsStack would. Maybe I should just call admitsStack here? > > Source/JavaScriptCore/b3/B3PatchpointSpecial.cpp:136 > > +bool PatchpointSpecial::admitsExtendedOffsetAddr(Inst&, unsigned) > > +{ > > + return true; > > +} > > Why is this just true, but admitsStack above is not unconditional? Seems > wrong to me, especially given the above checks if things need registers, etc. Right, you're expressing my mental model above... My understanding was that I could relax it here because Patchpoint won't generate an Addr for a register. But re-reading the code that seems wrong because admitsStackImpl on Void origin does more than this. So I'll make it just call admitsStack. (In reply to Saam Barati from comment #26) > Also, can you please add some tests for this? > > It seems like you could trigger this code inside testb3 either with some > carefully crafted code, or perhaps just using O0. The code does trigger, even on x86! It's not specific to huge frames on purpose, so it's not a corner case. Extended address is there super often. (In reply to Saam Barati from comment #24) > (In reply to JF Bastien from comment #23) > > Created attachment 308890 [details] > > patch > > > > > > Source/JavaScriptCore/b3/B3Common.cpp:76 > > > > + return static_cast<GPRReg>(+MacroAssembler::dataTempRegister); > > > > > > Why the +? > > > > You found the 💎 gem 💎 hidden in this patch! 🎉 > > > > dataTempRegister in the header is actually a *declaration* and not a > > *definition*. Even adding constexpr (as I did) doesn't make it a definition. > > This causes the linker to crap itself on missing dataTempRegister symbol. > > Sad indeed! Why does it work elsewhere? ✨ The magic of C++ ✨ One fix is to > > switch to C++17 and use inline variables, another is to just declare > > dataTempRegister in the header and then add a definition in the .cpp file > > which is super ugly and **is deprecated in C++17** in favor of inline > > variables. But the other fix... the beautiful fix... is to turn the use of > > dataTempRegister into an rvalue! This beautiful usage of unary plus > > magically makes the problem go away because now we're not asking the linker > > to get this value from oh-so-far away it would surely need a relocation! No, > > we're just rvalue-ing, nothing to see here linker. So there you have it, > > unary plus isn't totally useless. It should be useless, but here it's useful > > in compensating for something ridiculous. > I don't get why constexpr wouldn't fix this. > What about a "using" declaration? See last page of http://wg21.link/P0386R2 It's a declaration, not a definition, until C++17.
Zan Dobersek
Comment 28 2017-05-04 03:46:39 PDT
*** Bug 171563 has been marked as a duplicate of this bug. ***
Filip Pizlo
Comment 29 2017-05-05 20:28:32 PDT
Comment on attachment 309022 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=309022&action=review This is cool! I like how this turned out. Sorry I didn't get around to reviewing it sooner. > Source/JavaScriptCore/b3/B3Common.cpp:76 > + return static_cast<GPRReg>(+MacroAssembler::dataTempRegister); Why isn't dataTempRegister a GPRReg already? I think that GPRReg is a typedef for MacroAssembler::RegisterID. > Source/JavaScriptCore/b3/air/AirCustom.h:90 > + static bool admitsExtendedOffsetAddr(Inst& inst, unsigned argIndex) > + { > + if (!argIndex) > + return false; > + return inst.args[0].special()->admitsExtendedOffsetAddr(inst, argIndex); > + } > + You don't need to thread this through AirCustom if it's only for Patch. You could have Inst just assert that it's a Patch, or check if it is and return false if it's not, and then do the special()->admintsBlahBlah thing directly. However, this isn't bad. I'll allow it. ;-) > Source/JavaScriptCore/b3/air/opcode_generator.rb:232 > - token =~ /\A((Tmp)|(Imm)|(BigImm)|(BitImm)|(BitImm64)|(SimpleAddr)|(Addr)|(Index)|(RelCond)|(ResCond)|(DoubleCond)|(StatusCond))\Z/ > + token =~ /\A((Tmp)|(Imm)|(BigImm)|(BitImm)|(BitImm64)|(SimpleAddr)|(Addr)|(ExtendedOffsetAddr)|(Index)|(RelCond)|(ResCond)|(DoubleCond)|(StatusCond))\Z/ Again, don't really need to teach opcode_generator about it if it's just for Patch. But, this kind of generality isn't a bad thing and there's precedent for making things a bit more general than strictly necessary, so it's OK to do it this way.
Filip Pizlo
Comment 30 2017-05-05 20:43:16 PDT
Comment on attachment 309022 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=309022&action=review > Source/JavaScriptCore/b3/air/AirLowerStackArgs.cpp:80 > + if (isPatch && inst.admitsExtendedOffsetAddr(arg)) { I would drop the isPatch check
WebKit Commit Bot
Comment 31 2017-05-05 20:57:46 PDT
Comment on attachment 309022 [details] patch Clearing flags on attachment: 309022 Committed r216306: <http://trac.webkit.org/changeset/216306>
WebKit Commit Bot
Comment 32 2017-05-05 20:57:48 PDT
All reviewed patches have been landed. Closing bug.
Note You need to log in before you can comment on or make changes to this bug.