JIT currently generates three MIPS instructions for a load/store from/to an absolute address: lui adrTmpReg, address >> 16 ori adrTmpReg, address & 0xffff lw dataReg, 0(adrTmpReg) Since load/store instructions on MIPS have a 16-bit offset, lower 16 bits of the address can be encoded into the load/store and ori can be removed: lui adrTmpReg, (address + 0x8000) >> 16 lw dataReg, (address & 0xffff)(adrTmpReg)
Created attachment 334812 [details] Patch The patch that implements the proposed optimizations. This improves the benchmark results by ~1-2%
Can someone look at this patch and review it?
(In reply to Petar Jovanovic from comment #2) > Can someone look at this patch and review it? Please describe the details of this change in ChangeLog file, and set r? flag :)
Created attachment 335180 [details] Patch Rebased and added details to ChangeLog
Comment on attachment 335180 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=335180&action=review Hi there! Though I am not officially a reviewer, I have been doing informal reviews on MIPS-related patches. This patch looks good to me, and even when shaving one instruction per load/store may not look like a lot, it probably makes for measurable savings as there will be many load/store operations in typical generated code (By th way: do you have any performance numbers with and without the patch? — I am sure this is something we want to merge, I am just being curious myself :-D). I have just a small comment about using “uintptr_t” instead of “uint32_t”; but it's not something that should block landing the patch. Thanks a lot for your contribution! > Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:239 > + uint32_t adr = reinterpret_cast<uint32_t>(address.m_ptr); Small nit: I would use “uintptr_t” here. Technically it will be equivalent, but IMHO it makes the intention of storing a pointer as an integer value clearer. > Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:242 > + if (imm.m_value >= -32768 && imm.m_value <= 32767) I know that this statement was just moved around and was already written this way, but maybe it would be nicer to use “INT16_MIN” and “INT16_MAX” instead of writing the not-so-magic numbers in the condition check. I am fine leaving the patch as-is currently, and if we want to use the constant names I would change that in a separate patch. At any rate, I am just thinking out loud, and I am not sure whether this is worth caring about — what do other reviewers think about this? > Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:268 > + uint32_t adr = reinterpret_cast<uint32_t>(address.m_ptr); Same here about using “uintptr_t”. > Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:453 > + uint32_t adr = reinterpret_cast<uint32_t>(dest.m_ptr); Same here about using “uintptr_t”. > Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:507 > + uint32_t adr = reinterpret_cast<uint32_t>(dest.m_ptr); Same here about using “uintptr_t”. > Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:693 > + uint32_t adr = reinterpret_cast<uint32_t>(address.m_ptr); Same here about using “uintptr_t”. > Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:879 > + uint32_t adr = reinterpret_cast<uint32_t>(address); Same here about using “uintptr_t”. > Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:945 > + uint32_t adr = reinterpret_cast<uint32_t>(address); Same here about using “uintptr_t”. > Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:1111 > + uint32_t adr = reinterpret_cast<uint32_t>(address); Same here about using “uintptr_t”. > Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:1271 > + uint32_t adr = reinterpret_cast<uint32_t>(address); Same here about using “uintptr_t”. > Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:1290 > + uint32_t adr = reinterpret_cast<uint32_t>(address); Ditto. > Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:1478 > + uint32_t adr = reinterpret_cast<uint32_t>(address); Ditto. > Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:1496 > + uint32_t adr = reinterpret_cast<uint32_t>(address); Ditto. > Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:2091 > + uint32_t adr = reinterpret_cast<uint32_t>(dest.m_ptr); Ditto. > Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:2714 > + uint32_t adr = reinterpret_cast<uint32_t>(address.m_value); Ditto. > Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:2870 > + uint32_t adr = reinterpret_cast<uint32_t>(address.m_value); Ditto.
Created attachment 335846 [details] Patch
I agree about the use of uintptr_t instead of uint32_t, I changed that. As for constants like 32767, they are used in many places in the file, unrelated to this patch, so I guess it would be better to fix that in a separate patch. I also added an optimization for 8-bit and 32-bit loads and stores with BaseIndex addressing, where a left shift instruction can be omitted if address.scale is 0.
Comment on attachment 335846 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=335846&action=review Overall, looks good to me. But r- because I found one bug in this patch. > Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:269 > + if ((adr >> 15) == ((adr+4) >> 15)) { Style nits: use `(adr + 4)` instead of `(adr+4)`. > Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:280 > + m_assembler.lw(dataTempRegister, addrTempRegister, (adr+4) & 0xffff); Ditto. > Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:284 > + m_assembler.sw(dataTempRegister, addrTempRegister, (adr+4) & 0xffff); Ditto. > Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:709 > + li addrTemp, address > + li immTemp, imm > + lw dataTemp, 0(addrTemp) > + subu dataTemp, dataTemp, immTemp > + sw dataTemp, 0(addrTemp) We need to fix this to align to the actual code. > Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:807 > + } It is wrong. ConvertibleLoadLabel's instructions need to be convertible to addPtr instruction sequence Please check MIPSAssembler::replaceWithLoad too.
Created attachment 335852 [details] Patch
Created attachment 335853 [details] Patch
Comment on attachment 335853 [details] Patch r=me, nice!
Comment on attachment 335853 [details] Patch Attachment 335853 [details] did not pass jsc-ews (mac): Output: http://webkit-queues.webkit.org/results/6965799 New failing tests: stress/ftl-put-by-id-setter-exception-interesting-live-state.js.no-cjit-collect-continuously
Just for the record, with these optimizations I got the following results on a mips32r1 platform: Benchmark Original Optimized Improvement Octane2 845 861 1.9% Kraken 3306 ms 3247 ms 1.8% JetStream 7.13 7.27 2.0%
(In reply to Stanislav Ocovaj from comment #13) > Just for the record, with these optimizations I got the following results on > a mips32r1 platform: > Benchmark Original Optimized Improvement > Octane2 845 861 1.9% > Kraken 3306 ms 3247 ms 1.8% > JetStream 7.13 7.27 2.0% That's pretty good for such a relatively small change, good work!
How do we get this change in?
Comment on attachment 335853 [details] Patch The test failed in the EWS has to be unrelated to this patch, because this only touches “MacroAssemblerMIPS.h” and the EWS is not running on a MIPS box, so I'll just cq+ to get this landed. I'll keep an eye on the bots but, as said, anything else than MIPS should be affected.
(In reply to Petar Jovanovic from comment #15) > How do we get this change in? There is a positive review (r+) from Yusuke, and JSC is his area of expertise, that's perfect. The EWS bot cancelled the commit-queue flag due to a test case failure, so I have set it back to cq+ because it seems completely unrelated. The commit-queue bot will apply the patch, so you don't need to do anything else. (Thanks for pinging in the bug, though; many of us are busy and the reminder from Bugzilla usually works. Alternatively, you can try to ping people in the IRC Freenode #webkit channel :D)
Comment on attachment 335853 [details] Patch Clearing flags on attachment: 335853 Committed r229766: <https://trac.webkit.org/changeset/229766>
All reviewed patches have been landed. Closing bug.
<rdar://problem/38670859>