REOPENED 226375
Put the Baseline JIT prologue and op_loop_hint code in JIT thunks.
https://bugs.webkit.org/show_bug.cgi?id=226375
Summary Put the Baseline JIT prologue and op_loop_hint code in JIT thunks.
Mark Lam
Reported 2021-05-28 08:41:51 PDT
And also optimize op_enter a bit more.
Attachments
wip. (45.86 KB, patch)
2021-05-28 08:49 PDT, Mark Lam
ews-feeder: commit-queue-
wip. (45.88 KB, patch)
2021-05-28 09:09 PDT, Mark Lam
ews-feeder: commit-queue-
work in progress. (45.89 KB, patch)
2021-05-28 10:11 PDT, Mark Lam
ews-feeder: commit-queue-
work in progress. (43.83 KB, patch)
2021-05-29 17:39 PDT, Mark Lam
ews-feeder: commit-queue-
work in progress. (43.83 KB, patch)
2021-05-29 20:31 PDT, Mark Lam
no flags
work in progress. (43.57 KB, patch)
2021-05-30 01:28 PDT, Mark Lam
no flags
proposed patch. (48.74 KB, patch)
2021-06-03 17:58 PDT, Mark Lam
no flags
proposed patch. (45.39 KB, patch)
2021-06-04 15:41 PDT, Mark Lam
ews-feeder: commit-queue-
proposed patch. (49.87 KB, patch)
2021-06-04 17:09 PDT, Mark Lam
ews-feeder: commit-queue-
proposed patch. (52.80 KB, patch)
2021-06-06 11:41 PDT, Mark Lam
ews-feeder: commit-queue-
proposed patch. (52.75 KB, patch)
2021-06-06 11:45 PDT, Mark Lam
no flags
proposed patch. (51.13 KB, patch)
2021-06-06 23:32 PDT, Mark Lam
no flags
proposed patch. (52.67 KB, patch)
2021-06-07 08:03 PDT, Mark Lam
keith_miller: review+
patch for landing. (53.90 KB, patch)
2021-06-07 09:44 PDT, Mark Lam
no flags
Mark Lam
Comment 1 2021-05-28 08:49:28 PDT
Mark Lam
Comment 2 2021-05-28 09:09:10 PDT
Mark Lam
Comment 3 2021-05-28 10:11:42 PDT
Created attachment 430023 [details] work in progress.
Mark Lam
Comment 4 2021-05-29 17:39:15 PDT
Created attachment 430119 [details] work in progress.
Mark Lam
Comment 5 2021-05-29 20:31:00 PDT
Created attachment 430127 [details] work in progress.
Mark Lam
Comment 6 2021-05-30 01:28:48 PDT
Created attachment 430133 [details] work in progress.
Mark Lam
Comment 7 2021-06-03 17:58:03 PDT
Created attachment 430522 [details] proposed patch. Let's test this on the EWS first.
Radar WebKit Bug Importer
Comment 8 2021-06-04 08:42:18 PDT
Mark Lam
Comment 9 2021-06-04 15:41:31 PDT
Created attachment 430623 [details] proposed patch.
Mark Lam
Comment 10 2021-06-04 17:09:22 PDT
Created attachment 430632 [details] proposed patch.
Mark Lam
Comment 11 2021-06-06 11:41:43 PDT
Created attachment 430691 [details] proposed patch.
Mark Lam
Comment 12 2021-06-06 11:45:49 PDT
Created attachment 430692 [details] proposed patch.
Robin Morisset
Comment 13 2021-06-06 23:04:38 PDT
Comment on attachment 430692 [details] proposed patch. View in context: https://bugs.webkit.org/attachment.cgi?id=430692&action=review > Source/JavaScriptCore/assembler/MacroAssemblerARM64.h:2238 > + void negate64(RegisterID dest) Any reason not to do sub64(ARM64Registers::zr, dest, dest)? > Source/JavaScriptCore/assembler/MacroAssemblerARMv7.h:1809 > + // we'll need to reload it with the high bits of the address afterwards. I don't see where you do this reloading? > Source/JavaScriptCore/assembler/MacroAssemblerX86_64.h:921 > + void negate64(RegisterID dest) There seems to be an opcode exactly for this: https://www.aldeid.com/wiki/X86-assembly/Instructions/neg > Source/JavaScriptCore/jit/JIT.cpp:894 > + static constexpr ThunkGenerator generators[] = { Isn't it already defined a few dozen lines above? How do they differ? More general the entire block that follows appear duplicated, and I don't understand why. > Source/JavaScriptCore/jit/JIT.cpp:966 > + // We'll be imminently returing with a retab in the normal path, which will do validation. returing -> returning ? retab -> retag ?
Mark Lam
Comment 14 2021-06-06 23:19:16 PDT
Comment on attachment 430692 [details] proposed patch. View in context: https://bugs.webkit.org/attachment.cgi?id=430692&action=review >> Source/JavaScriptCore/assembler/MacroAssemblerARM64.h:2238 >> + void negate64(RegisterID dest) > > Any reason not to do sub64(ARM64Registers::zr, dest, dest)? No good reason. It's only because I forgot about ARM64Registers::zr. Will fix. >> Source/JavaScriptCore/assembler/MacroAssemblerARMv7.h:1809 >> + // we'll need to reload it with the high bits of the address afterwards. > > I don't see where you do this reloading? Stale comment copied from `Jump branchAdd32(ResultCondition cond, TrustedImm32 imm, AbsoluteAddress dest)`. Does not apply here because we are free to use addressTempRegister in this function. I'll remove the comment. >> Source/JavaScriptCore/assembler/MacroAssemblerX86_64.h:921 >> + void negate64(RegisterID dest) > > There seems to be an opcode exactly for this: https://www.aldeid.com/wiki/X86-assembly/Instructions/neg k, thanks. I'll look into it. >> Source/JavaScriptCore/jit/JIT.cpp:894 >> + static constexpr ThunkGenerator generators[] = { > > Isn't it already defined a few dozen lines above? How do they differ? > More general the entire block that follows appear duplicated, and I don't understand why. The table above is for the normal prologue thunks. This table is for the arity fixup prologue thunks. >> Source/JavaScriptCore/jit/JIT.cpp:966 >> + // We'll be imminently returing with a retab in the normal path, which will do validation. > > returing -> returning ? > retab -> retag ? Will fix "returing". "retab" is not a typo. It's the ARM64E instruction for "return with Authentication using the B key". The ARM64E MacroAssembler automatically emits retab in place of ret when you see the `ret()` call below.
Mark Lam
Comment 15 2021-06-06 23:26:18 PDT
Comment on attachment 430692 [details] proposed patch. View in context: https://bugs.webkit.org/attachment.cgi?id=430692&action=review >>> Source/JavaScriptCore/assembler/MacroAssemblerARM64.h:2238 >>> + void negate64(RegisterID dest) >> >> Any reason not to do sub64(ARM64Registers::zr, dest, dest)? > > No good reason. It's only because I forgot about ARM64Registers::zr. Will fix. Turns out, our MacroAssembler already has the neg64() emitter. I just didn't think to search for neg64 instead of negate64. Will remove these redundant negate64 emitters.
Mark Lam
Comment 16 2021-06-06 23:32:05 PDT
Created attachment 430714 [details] proposed patch.
Mark Lam
Comment 17 2021-06-07 08:03:13 PDT
Created attachment 430746 [details] proposed patch.
Keith Miller
Comment 18 2021-06-07 08:43:35 PDT
Comment on attachment 430746 [details] proposed patch. View in context: https://bugs.webkit.org/attachment.cgi?id=430746&action=review r=me with some nits. > Source/JavaScriptCore/ChangeLog:45 > + LinkBuffer size results: Are these number from a specific workload? > Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:2313 > + Jump branchAdd32(ResultCondition cond, TrustedImm32 imm, ImplicitAddress destAddress) I don't know anything about MIPS so I'm just gonna assume this is correct if tests pass. > Source/JavaScriptCore/jit/JIT.cpp:799 > + unsigned startArgument = m_codeBlock->isConstructor() ? 1 : 0; Nit, I think you don't need the ternary here. True implicitly converts to unsigned 1. > Source/JavaScriptCore/jit/JIT.cpp:834 > + unsigned generatorSelector = doesProfiling << 2 | isConstructor << 1 | hasHugeFrame << 0; Nit: Maybe this should be an inline function next to the macro for future code changes? > Source/JavaScriptCore/jit/JIT.cpp:898 > + static constexpr ThunkGenerator generators[] = { > +#define USE_PROLOGUE_GENERATOR(doesProfiling, isConstructor, hasHugeFrame, name, arityFixupName) arityFixupName, > + FOR_EACH_PROLOGUE_GENERATOR(USE_PROLOGUE_GENERATOR) > +#undef USE_PROLOGUE_GENERATOR > + }; How does this not conflict with the generators definition on line 825? O.o > Source/JavaScriptCore/jit/JIT.cpp:943 > + store64(codeBlockGPR, addressFor(CallFrameSlot::codeBlock)); Nit: I think this should be storePtr. > Source/JavaScriptCore/jit/JIT.cpp:966 > + // We'll be imminently returning with a retab in the normal path, which will do validation. Nit: retab => retag. > Source/JavaScriptCore/jit/JIT.cpp:1067 > +#endif Maybe add an else with a #error "unsupported architecture" for each of these? > Source/JavaScriptCore/jit/JIT.cpp:1069 > + store64(codeBlockGPR, addressFor(CallFrameSlot::codeBlock)); Ditto on storePtr.
Keith Miller
Comment 19 2021-06-07 08:44:42 PDT
Comment on attachment 430746 [details] proposed patch. View in context: https://bugs.webkit.org/attachment.cgi?id=430746&action=review > Source/JavaScriptCore/jit/JIT.cpp:696 > + v( doesProfiling, !isConstructor, !hasHugeFrame, prologueGenerator4, arityFixup_prologueGenerator4) \ > + v( doesProfiling, !isConstructor, hasHugeFrame, prologueGenerator5, arityFixup_prologueGenerator5) \ > + v( doesProfiling, isConstructor, !hasHugeFrame, prologueGenerator6, arityFixup_prologueGenerator6) \ > + v( doesProfiling, isConstructor, hasHugeFrame, prologueGenerator7, arityFixup_prologueGenerator7) Nit: Style comment seems somewhat relevant here.
Mark Lam
Comment 20 2021-06-07 09:37:20 PDT
Comment on attachment 430746 [details] proposed patch. View in context: https://bugs.webkit.org/attachment.cgi?id=430746&action=review >> Source/JavaScriptCore/ChangeLog:45 >> + LinkBuffer size results: > > Are these number from a specific workload? This is from a single run of Speedometer2. I will update the ChangeLog to say so. >> Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:2313 >> + Jump branchAdd32(ResultCondition cond, TrustedImm32 imm, ImplicitAddress destAddress) > > I don't know anything about MIPS so I'm just gonna assume this is correct if tests pass. Ditto. FWIW, this implementation is based on `Jump branchAdd32(ResultCondition cond, TrustedImm32 imm, AbsoluteAddress destAddress)` but tweaked based on the implementation of `void load32(ImplicitAddress address, RegisterID dest)` and `void store32(RegisterID src, ImplicitAddress address)` plus a little googled knowledge: (1) MIPS (32-bit) instructions are always 4 bytes in size, and (2) the integers in the 2nd argument of conditional branches are the number of instructions between the branch instruction and the target. >> Source/JavaScriptCore/jit/JIT.cpp:696 >> + v( doesProfiling, isConstructor, hasHugeFrame, prologueGenerator7, arityFixup_prologueGenerator7) > > Nit: Style comment seems somewhat relevant here. I deliberately kept the columns aligned (in violation of the style) because this makes it easier to verify (both for me and future readers of the code) that I got the permutations correct. I'll leave it as is. >> Source/JavaScriptCore/jit/JIT.cpp:799 >> + unsigned startArgument = m_codeBlock->isConstructor() ? 1 : 0; > > Nit, I think you don't need the ternary here. True implicitly converts to unsigned 1. That's true, but I think that this makes it clear that we want to start from argument 1 if isConstructor and argument 0 if otherwise, and not require the cognitive leap to realize that true == 1 (small as the cognitive leap may be). I'm going to leave this as is. >> Source/JavaScriptCore/jit/JIT.cpp:834 >> + unsigned generatorSelector = doesProfiling << 2 | isConstructor << 1 | hasHugeFrame << 0; > > Nit: Maybe this should be an inline function next to the macro for future code changes? Sounds good. Will do. >> Source/JavaScriptCore/jit/JIT.cpp:898 >> + }; > > How does this not conflict with the generators definition on line 825? O.o Nope. This one is inside an if statement block i.e. nested scope. >> Source/JavaScriptCore/jit/JIT.cpp:943 >> + store64(codeBlockGPR, addressFor(CallFrameSlot::codeBlock)); > > Nit: I think this should be storePtr. Will fix. >> Source/JavaScriptCore/jit/JIT.cpp:966 >> + // We'll be imminently returning with a retab in the normal path, which will do validation. > > Nit: retab => retag. Nope. retab is ARM64E's "return with Authentication using the B key". This is the 2nd time someone got confused about this (Robin asked first). I will update the comment to clarify. Here's the updated comment: // We'll be imminently returning with a `retab` (ARM64E's return with authentication // using the B key) in the normal path (see MacroAssemblerARM64E's implementation of // ret()), which will do validation. So, extra validation here is redundant and unnecessary. >> Source/JavaScriptCore/jit/JIT.cpp:1067 >> +#endif > > Maybe add an else with a #error "unsupported architecture" for each of these? If we do this, there will be many many places where we'll have to add this, and make the code unsightly. As a compromise, I'll add the following at the top of JIT.cpp to alert the port developer: #if ENABLE(EXTRA_CTI_THUNKS) #if CPU(ARM64) || (CPU(X86_64) && !OS(WINDOWS)) // These are supported ports. #else // This is a courtesy reminder (and warning) that the implementation of EXTRA_CTI_THUNKS can // use up to 6 argument registers and/or 6/7 temp registers, and make use of ARM64 like // features. Hence, it may not work for many other ports without significant work. If you // plan on adding EXTRA_CTI_THUNKS support for your port, please remember to search the // EXTRA_CTI_THUNKS code for CPU(ARM64) and CPU(X86_64) conditional code, and add support // for your port there as well. #error "unsupported architecture" #endif #endif // ENABLE(EXTRA_CTI_THUNKS) >> Source/JavaScriptCore/jit/JIT.cpp:1069 >> + store64(codeBlockGPR, addressFor(CallFrameSlot::codeBlock)); > > Ditto on storePtr. Will fix.
Mark Lam
Comment 21 2021-06-07 09:44:00 PDT
Created attachment 430756 [details] patch for landing.
Mark Lam
Comment 22 2021-06-07 15:52:31 PDT
Thanks for the reviews. Landed in r278576: <http://trac.webkit.org/r278576>.
Mark Lam
Comment 23 2021-06-19 01:26:09 PDT
Reverted patch in r279049: <http://trac.webkit.org/r279049> dues to suspected Speedometer2 regression.
Note You need to log in before you can comment on or make changes to this bug.