RESOLVED FIXED 183243
[MIPS] Optimize generated JIT code for loads/stores
https://bugs.webkit.org/show_bug.cgi?id=183243
Summary [MIPS] Optimize generated JIT code for loads/stores
Stanislav Ocovaj
Reported 2018-03-01 05:47:59 PST
JIT currently generates three MIPS instructions for a load/store from/to an absolute address: lui adrTmpReg, address >> 16 ori adrTmpReg, address & 0xffff lw dataReg, 0(adrTmpReg) Since load/store instructions on MIPS have a 16-bit offset, lower 16 bits of the address can be encoded into the load/store and ori can be removed: lui adrTmpReg, (address + 0x8000) >> 16 lw dataReg, (address & 0xffff)(adrTmpReg)
Attachments
Patch (18.54 KB, patch)
2018-03-01 06:08 PST, Stanislav Ocovaj
no flags
Patch (20.66 KB, patch)
2018-03-07 02:23 PST, Stanislav Ocovaj
no flags
Patch (27.73 KB, patch)
2018-03-15 08:07 PDT, Stanislav Ocovaj
ysuzuki: review-
Patch (27.73 KB, patch)
2018-03-15 09:13 PDT, Stanislav Ocovaj
no flags
Patch (26.56 KB, patch)
2018-03-15 09:15 PDT, Stanislav Ocovaj
no flags
Stanislav Ocovaj
Comment 1 2018-03-01 06:08:52 PST
Created attachment 334812 [details] Patch The patch that implements the proposed optimizations. This improves the benchmark results by ~1-2%
Petar Jovanovic
Comment 2 2018-03-06 08:27:39 PST
Can someone look at this patch and review it?
Yusuke Suzuki
Comment 3 2018-03-06 08:54:40 PST
(In reply to Petar Jovanovic from comment #2) > Can someone look at this patch and review it? Please describe the details of this change in ChangeLog file, and set r? flag :)
Stanislav Ocovaj
Comment 4 2018-03-07 02:23:41 PST
Created attachment 335180 [details] Patch Rebased and added details to ChangeLog
Adrian Perez
Comment 5 2018-03-07 10:32:10 PST
Comment on attachment 335180 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=335180&action=review Hi there! Though I am not officially a reviewer, I have been doing informal reviews on MIPS-related patches. This patch looks good to me, and even when shaving one instruction per load/store may not look like a lot, it probably makes for measurable savings as there will be many load/store operations in typical generated code (By th way: do you have any performance numbers with and without the patch? — I am sure this is something we want to merge, I am just being curious myself :-D). I have just a small comment about using “uintptr_t” instead of “uint32_t”; but it's not something that should block landing the patch. Thanks a lot for your contribution! > Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:239 > + uint32_t adr = reinterpret_cast<uint32_t>(address.m_ptr); Small nit: I would use “uintptr_t” here. Technically it will be equivalent, but IMHO it makes the intention of storing a pointer as an integer value clearer. > Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:242 > + if (imm.m_value >= -32768 && imm.m_value <= 32767) I know that this statement was just moved around and was already written this way, but maybe it would be nicer to use “INT16_MIN” and “INT16_MAX” instead of writing the not-so-magic numbers in the condition check. I am fine leaving the patch as-is currently, and if we want to use the constant names I would change that in a separate patch. At any rate, I am just thinking out loud, and I am not sure whether this is worth caring about — what do other reviewers think about this? > Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:268 > + uint32_t adr = reinterpret_cast<uint32_t>(address.m_ptr); Same here about using “uintptr_t”. > Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:453 > + uint32_t adr = reinterpret_cast<uint32_t>(dest.m_ptr); Same here about using “uintptr_t”. > Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:507 > + uint32_t adr = reinterpret_cast<uint32_t>(dest.m_ptr); Same here about using “uintptr_t”. > Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:693 > + uint32_t adr = reinterpret_cast<uint32_t>(address.m_ptr); Same here about using “uintptr_t”. > Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:879 > + uint32_t adr = reinterpret_cast<uint32_t>(address); Same here about using “uintptr_t”. > Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:945 > + uint32_t adr = reinterpret_cast<uint32_t>(address); Same here about using “uintptr_t”. > Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:1111 > + uint32_t adr = reinterpret_cast<uint32_t>(address); Same here about using “uintptr_t”. > Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:1271 > + uint32_t adr = reinterpret_cast<uint32_t>(address); Same here about using “uintptr_t”. > Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:1290 > + uint32_t adr = reinterpret_cast<uint32_t>(address); Ditto. > Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:1478 > + uint32_t adr = reinterpret_cast<uint32_t>(address); Ditto. > Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:1496 > + uint32_t adr = reinterpret_cast<uint32_t>(address); Ditto. > Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:2091 > + uint32_t adr = reinterpret_cast<uint32_t>(dest.m_ptr); Ditto. > Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:2714 > + uint32_t adr = reinterpret_cast<uint32_t>(address.m_value); Ditto. > Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:2870 > + uint32_t adr = reinterpret_cast<uint32_t>(address.m_value); Ditto.
Stanislav Ocovaj
Comment 6 2018-03-15 08:07:21 PDT
Stanislav Ocovaj
Comment 7 2018-03-15 08:14:57 PDT
I agree about the use of uintptr_t instead of uint32_t, I changed that. As for constants like 32767, they are used in many places in the file, unrelated to this patch, so I guess it would be better to fix that in a separate patch. I also added an optimization for 8-bit and 32-bit loads and stores with BaseIndex addressing, where a left shift instruction can be omitted if address.scale is 0.
Yusuke Suzuki
Comment 8 2018-03-15 09:00:54 PDT
Comment on attachment 335846 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=335846&action=review Overall, looks good to me. But r- because I found one bug in this patch. > Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:269 > + if ((adr >> 15) == ((adr+4) >> 15)) { Style nits: use `(adr + 4)` instead of `(adr+4)`. > Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:280 > + m_assembler.lw(dataTempRegister, addrTempRegister, (adr+4) & 0xffff); Ditto. > Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:284 > + m_assembler.sw(dataTempRegister, addrTempRegister, (adr+4) & 0xffff); Ditto. > Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:709 > + li addrTemp, address > + li immTemp, imm > + lw dataTemp, 0(addrTemp) > + subu dataTemp, dataTemp, immTemp > + sw dataTemp, 0(addrTemp) We need to fix this to align to the actual code. > Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:807 > + } It is wrong. ConvertibleLoadLabel's instructions need to be convertible to addPtr instruction sequence Please check MIPSAssembler::replaceWithLoad too.
Stanislav Ocovaj
Comment 9 2018-03-15 09:13:51 PDT
Stanislav Ocovaj
Comment 10 2018-03-15 09:15:14 PDT
Yusuke Suzuki
Comment 11 2018-03-15 09:35:37 PDT
Comment on attachment 335853 [details] Patch r=me, nice!
EWS Watchlist
Comment 12 2018-03-15 10:31:14 PDT
Comment on attachment 335853 [details] Patch Attachment 335853 [details] did not pass jsc-ews (mac): Output: http://webkit-queues.webkit.org/results/6965799 New failing tests: stress/ftl-put-by-id-setter-exception-interesting-live-state.js.no-cjit-collect-continuously
Stanislav Ocovaj
Comment 13 2018-03-16 07:23:55 PDT
Just for the record, with these optimizations I got the following results on a mips32r1 platform: Benchmark Original Optimized Improvement Octane2 845 861 1.9% Kraken 3306 ms 3247 ms 1.8% JetStream 7.13 7.27 2.0%
Adrian Perez
Comment 14 2018-03-16 13:15:07 PDT
(In reply to Stanislav Ocovaj from comment #13) > Just for the record, with these optimizations I got the following results on > a mips32r1 platform: > Benchmark Original Optimized Improvement > Octane2 845 861 1.9% > Kraken 3306 ms 3247 ms 1.8% > JetStream 7.13 7.27 2.0% That's pretty good for such a relatively small change, good work!
Petar Jovanovic
Comment 15 2018-03-20 09:31:56 PDT
How do we get this change in?
Adrian Perez
Comment 16 2018-03-20 10:27:56 PDT
Comment on attachment 335853 [details] Patch The test failed in the EWS has to be unrelated to this patch, because this only touches “MacroAssemblerMIPS.h” and the EWS is not running on a MIPS box, so I'll just cq+ to get this landed. I'll keep an eye on the bots but, as said, anything else than MIPS should be affected.
Adrian Perez
Comment 17 2018-03-20 10:32:49 PDT
(In reply to Petar Jovanovic from comment #15) > How do we get this change in? There is a positive review (r+) from Yusuke, and JSC is his area of expertise, that's perfect. The EWS bot cancelled the commit-queue flag due to a test case failure, so I have set it back to cq+ because it seems completely unrelated. The commit-queue bot will apply the patch, so you don't need to do anything else. (Thanks for pinging in the bug, though; many of us are busy and the reminder from Bugzilla usually works. Alternatively, you can try to ping people in the IRC Freenode #webkit channel :D)
WebKit Commit Bot
Comment 18 2018-03-20 10:54:04 PDT
Comment on attachment 335853 [details] Patch Clearing flags on attachment: 335853 Committed r229766: <https://trac.webkit.org/changeset/229766>
WebKit Commit Bot
Comment 19 2018-03-20 10:54:05 PDT
All reviewed patches have been landed. Closing bug.
Radar WebKit Bug Importer
Comment 20 2018-03-20 10:56:24 PDT
Note You need to log in before you can comment on or make changes to this bug.