RESOLVED FIXED 228543
[JSC] load/store with BaseIndex is inefficient in ARM64
https://bugs.webkit.org/show_bug.cgi?id=228543
Summary [JSC] load/store with BaseIndex is inefficient in ARM64
Yusuke Suzuki
Reported 2021-07-28 01:49:56 PDT
[JSC] load/store with BaseIndex is inefficient in ARM64
Attachments
Patch (27.33 KB, patch)
2021-07-28 01:51 PDT, Yusuke Suzuki
no flags
Patch (27.56 KB, patch)
2021-07-28 10:32 PDT, Yusuke Suzuki
mark.lam: review+
ews-feeder: commit-queue-
Yusuke Suzuki
Comment 1 2021-07-28 01:51:21 PDT
Mark Lam
Comment 2 2021-07-28 09:08:09 PDT
Comment on attachment 434411 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=434411&action=review > Source/JavaScriptCore/assembler/MacroAssemblerARM64.h:1433 > + if (!address.scale || address.scale == 3) { Can you use the Scale enum values instead here? if (address.scale == TimesOne || address.scale == TimesEight) { > Source/JavaScriptCore/assembler/MacroAssemblerARM64.h:1435 > + if (auto baseGPR = tryFoldBaseAndOffsetPart(address)) { > + m_assembler.ldr<64>(dest, baseGPR.value(), address.index, indexExtendType(address), address.scale); I think you can do better: if (scale == TimesEight && isUInt12(address.offset / 8)), then you can use the `ALWAYS_INLINE void ldr(RegisterID rt, RegisterID rn, unsigned pimm)` form and just issue 1 instruction to do this load. See also section C6.2.101 of the spec. See also LDUR in section C6.2.136 if scale == TimesOne.
Mark Lam
Comment 3 2021-07-28 09:10:38 PDT
Comment on attachment 434411 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=434411&action=review >> Source/JavaScriptCore/assembler/MacroAssemblerARM64.h:1435 >> + m_assembler.ldr<64>(dest, baseGPR.value(), address.index, indexExtendType(address), address.scale); > > I think you can do better: if (scale == TimesEight && isUInt12(address.offset / 8)), then you can use the `ALWAYS_INLINE void ldr(RegisterID rt, RegisterID rn, unsigned pimm)` form and just issue 1 instruction to do this load. See also section C6.2.101 of the spec. > > See also LDUR in section C6.2.136 if scale == TimesOne. I should clarify: my point is that you may be able to do this with just 1 instruction instead of 2. Ditto for other cases below. So, I think it's worth checking the spec manual for available instructions.
Yusuke Suzuki
Comment 4 2021-07-28 10:31:13 PDT
Comment on attachment 434411 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=434411&action=review >> Source/JavaScriptCore/assembler/MacroAssemblerARM64.h:1433 >> + if (!address.scale || address.scale == 3) { > > Can you use the Scale enum values instead here? > if (address.scale == TimesOne || address.scale == TimesEight) { Changed. >>> Source/JavaScriptCore/assembler/MacroAssemblerARM64.h:1435 >>> + m_assembler.ldr<64>(dest, baseGPR.value(), address.index, indexExtendType(address), address.scale); >> >> I think you can do better: if (scale == TimesEight && isUInt12(address.offset / 8)), then you can use the `ALWAYS_INLINE void ldr(RegisterID rt, RegisterID rn, unsigned pimm)` form and just issue 1 instruction to do this load. See also section C6.2.101 of the spec. >> >> See also LDUR in section C6.2.136 if scale == TimesOne. > > I should clarify: my point is that you may be able to do this with just 1 instruction instead of 2. Ditto for other cases below. So, I think it's worth checking the spec manual for available instructions. I don't think it is available since here there is base and index registers (since this is BaseIndex). But `ALWAYS_INLINE void ldr(RegisterID rt, RegisterID rn, unsigned pimm)` only takes base register. So if we would like to use `ALWAYS_INLINE void ldr(RegisterID rt, RegisterID rn, unsigned pimm)`, then we first need to fold index into base. This means that it requires 2 instructions anyway.
Yusuke Suzuki
Comment 5 2021-07-28 10:32:09 PDT
Mark Lam
Comment 6 2021-07-28 11:35:32 PDT
Comment on attachment 434434 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=434434&action=review r=me with suggestions. > Source/JavaScriptCore/assembler/testmasm.cpp:4224 > + uint64_t array[] = { 1, 2, 3, 4, 5, }; > + CHECK_EQ(invoke<uint64_t>(test, array, static_cast<UCPURegister>(3)), 3); nit: can you add INT64_MAX to each of the values in array so that this test will also catch bugs where the load instruction doesn't load the full 8 bytes? > Source/JavaScriptCore/assembler/testmasm.cpp:4234 > + uint64_t array[] = { 1, 2, 3, 4, 5, }; > + CHECK_EQ(invoke<uint64_t>(test, array, static_cast<UCPURegister>(3)), 5); Ditto. > Source/JavaScriptCore/assembler/testmasm.cpp:4247 > + uint32_t array[] = { 1, 2, 3, 4, 5, }; > + CHECK_EQ(invoke<uint32_t>(test, array, static_cast<UCPURegister>(3)), 3); Ditto: add INT32_MAX. > Source/JavaScriptCore/assembler/testmasm.cpp:4257 > + uint32_t array[] = { 1, 2, 3, 4, 5, }; > + CHECK_EQ(invoke<uint32_t>(test, array, static_cast<UCPURegister>(3)), 5); Ditto. > Source/JavaScriptCore/assembler/testmasm.cpp:4269 > + uint16_t array[] = { 1, 2, 3, 4, 5, }; > + CHECK_EQ(invoke<uint32_t>(test, array, static_cast<UCPURegister>(3)), 3); Ditto: add INT16_MAX. > Source/JavaScriptCore/assembler/testmasm.cpp:4279 > + uint16_t array[] = { 1, 2, 3, 4, static_cast<uint16_t>(-1), }; > + CHECK_EQ(invoke<uint32_t>(test, array, static_cast<UCPURegister>(3)), 0xffff); Ditto. > Source/JavaScriptCore/assembler/testmasm.cpp:4291 > + uint16_t array[] = { 1, 2, 3, 4, 5, }; > + CHECK_EQ(invoke<uint32_t>(test, array, static_cast<UCPURegister>(3)), 3); nit: add (INT16_MAX - 10)? > Source/JavaScriptCore/assembler/testmasm.cpp:4301 > + uint16_t array[] = { 1, 2, 3, 4, static_cast<uint16_t>(-1), }; > + CHECK_EQ(invoke<uint32_t>(test, array, static_cast<UCPURegister>(3)), static_cast<uint32_t>(-1)); nit: add (INT16_MAX - 10) but make leave the last one as -1? > Source/JavaScriptCore/assembler/testmasm.cpp:4406 > + invoke<void>(test, array, 3, 42); > + CHECK_EQ(array[2], 42); nit: Add INT64_MAX to 42. > Source/JavaScriptCore/assembler/testmasm.cpp:4417 > + invoke<void>(test, array, 3, 42); > + CHECK_EQ(array[4], 42); Ditto. > Source/JavaScriptCore/assembler/testmasm.cpp:4431 > + invoke<void>(test, array, 3, 42); > + CHECK_EQ(array[2], 42); nit: Add INT32_MAX to 42. > Source/JavaScriptCore/assembler/testmasm.cpp:4442 > + invoke<void>(test, array, 3, 42); > + CHECK_EQ(array[4], 42); Ditto. > Source/JavaScriptCore/assembler/testmasm.cpp:4455 > + invoke<void>(test, array, 3, 42); > + CHECK_EQ(array[2], 42); nit: Add INT16_MAX to 42. > Source/JavaScriptCore/assembler/testmasm.cpp:4466 > + invoke<void>(test, array, 3, 42); > + CHECK_EQ(array[4], 42); Ditto.
Yusuke Suzuki
Comment 7 2021-07-28 11:47:08 PDT
Comment on attachment 434434 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=434434&action=review Thanks! >> Source/JavaScriptCore/assembler/testmasm.cpp:4224 >> + CHECK_EQ(invoke<uint64_t>(test, array, static_cast<UCPURegister>(3)), 3); > > nit: can you add INT64_MAX to each of the values in array so that this test will also catch bugs where the load instruction doesn't load the full 8 bytes? Changed. >> Source/JavaScriptCore/assembler/testmasm.cpp:4234 >> + CHECK_EQ(invoke<uint64_t>(test, array, static_cast<UCPURegister>(3)), 5); > > Ditto. Changed. >> Source/JavaScriptCore/assembler/testmasm.cpp:4247 >> + CHECK_EQ(invoke<uint32_t>(test, array, static_cast<UCPURegister>(3)), 3); > > Ditto: add INT32_MAX. Changed. >> Source/JavaScriptCore/assembler/testmasm.cpp:4257 >> + CHECK_EQ(invoke<uint32_t>(test, array, static_cast<UCPURegister>(3)), 5); > > Ditto. Changed. >> Source/JavaScriptCore/assembler/testmasm.cpp:4279 >> + CHECK_EQ(invoke<uint32_t>(test, array, static_cast<UCPURegister>(3)), 0xffff); > > Ditto. Changed. >> Source/JavaScriptCore/assembler/testmasm.cpp:4291 >> + CHECK_EQ(invoke<uint32_t>(test, array, static_cast<UCPURegister>(3)), 3); > > nit: add (INT16_MAX - 10)? Changed. >> Source/JavaScriptCore/assembler/testmasm.cpp:4301 >> + CHECK_EQ(invoke<uint32_t>(test, array, static_cast<UCPURegister>(3)), static_cast<uint32_t>(-1)); > > nit: add (INT16_MAX - 10) but make leave the last one as -1? Changed. >> Source/JavaScriptCore/assembler/testmasm.cpp:4406 >> + CHECK_EQ(array[2], 42); > > nit: Add INT64_MAX to 42. Changed. >> Source/JavaScriptCore/assembler/testmasm.cpp:4417 >> + CHECK_EQ(array[4], 42); > > Ditto. Changed. >> Source/JavaScriptCore/assembler/testmasm.cpp:4431 >> + CHECK_EQ(array[2], 42); > > nit: Add INT32_MAX to 42. Changed. >> Source/JavaScriptCore/assembler/testmasm.cpp:4442 >> + CHECK_EQ(array[4], 42); > > Ditto. Changed. >> Source/JavaScriptCore/assembler/testmasm.cpp:4455 >> + CHECK_EQ(array[2], 42); > > nit: Add INT16_MAX to 42. Changed. >> Source/JavaScriptCore/assembler/testmasm.cpp:4466 >> + CHECK_EQ(array[4], 42); > > Ditto. Changed.
Yusuke Suzuki
Comment 8 2021-07-28 11:49:36 PDT
Radar WebKit Bug Importer
Comment 9 2021-07-28 11:50:19 PDT
Yusuke Suzuki
Comment 10 2021-07-28 13:23:08 PDT
There is 0% chance that https://ews-build.webkit.org/#/builders/56/builds/12242 failures are related to this patch since 1. This is x64 EWS 2. This patch does nothing on x64. It only affects on ARM64.
Yusuke Suzuki
Comment 11 2021-07-28 13:24:20 PDT
(In reply to Yusuke Suzuki from comment #10) > There is 0% chance that > https://ews-build.webkit.org/#/builders/56/builds/12242 failures are related > to this patch since > > 1. This is x64 EWS > 2. This patch does nothing on x64. It only affects on ARM64. Looks like checking-out is failing. So, it is EWS issue.
Note You need to log in before you can comment on or make changes to this bug.