Bug 183130

Summary: [MIPS] Optimize generated JIT code for branches
Product: WebKit Reporter: Stanislav Ocovaj <stanislav.ocovaj>
Component: JavaScriptCoreAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: aperez, commit-queue, ews-watchlist, guijemont, keith_miller, mark.lam, mips32r2, msaboff, saam, stanislav.ocovaj, webkit-bug-importer, ysuzuki
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Other   
OS: Linux   
Attachments:
Description Flags
MIPS branches
none
MIPS branches
ysuzuki: review-
Patch
none
Patch none

Description Stanislav Ocovaj 2018-02-26 08:18:16 PST
There are a couple of places where a more efficient code can be generated for branches on a MIPS platform.
Firstly, the patch https://bugs.webkit.org/show_bug.cgi?id=101328 added two nop instructions to branchEqual() and branchNotEqual() in order to prevent the beq/bne instruction in a patchable branch from being overwritten by replaceWithJump() in case when we have to jump outside the 256MB-aligned block. However, this adds a significant overhead for all other types of branches. Since these nop's protect the code that comes after the code generated by moveWithPatch(), this function seems like a better place to add them.
Secondly, in branches with immediate operand, the immediate value is 0 in many cases, and in these cases we can use the zero register directly without generating a move (pseudo)instruction.
Comment 1 Stanislav Ocovaj 2018-02-26 08:29:19 PST
Created attachment 334617 [details]
MIPS branches

This patch implements the proposed optimizations
Comment 2 Stanislav Ocovaj 2018-02-26 08:31:46 PST
The uploaded patch also fixes a bug in revertJumpToMove() where an incorrect pointer is passed to cacheFlush().
The implemented optimizations improve the benchmark results by ~5%
Comment 3 Petar Jovanovic 2018-03-05 09:39:34 PST
Can anyone take a look at this change?
Comment 4 Guillaume Emont 2018-03-06 18:15:27 PST
The patch should be flagged as "r?" to attract the attention of reviewers.
Comment 5 Mark Lam 2018-03-06 21:14:49 PST
(In reply to Guillaume Emont from comment #4)
> The patch should be flagged as "r?" to attract the attention of reviewers.

Also, this patch is also all in MIPS code.  I think it is better for someone who knows MIPS to do the review.  Are there any reviewers with MIPS expertise?
Comment 6 Stanislav Ocovaj 2018-03-07 00:17:49 PST
Created attachment 335174 [details]
MIPS branches

Rebased and added details to ChangeLog
Comment 7 Petar Jovanovic 2018-03-20 09:51:29 PDT
Anyone?
Comment 8 Yusuke Suzuki 2018-03-20 10:54:18 PDT
Comment on attachment 335174 [details]
MIPS branches

View in context: https://bugs.webkit.org/attachment.cgi?id=335174&action=review

> Source/JavaScriptCore/ChangeLog:15
> +        on a MIPS platform. Firstly, the patch https://bugs.webkit.org/show_bug.cgi?id=101328
> +        added two nop instructions to branchEqual() and branchNotEqual() in order to prevent
> +        the beq/bne instruction in a patchable branch from being overwritten by replaceWithJump()
> +        in case when we have to jump outside the 256MB-aligned block. However, this adds a significant
> +        overhead for all other types of branches. Since these nop's protect the code that comes after
> +        the code generated by moveWithPatch(), this function seems like a better place to add them.
> +        Secondly, in branches with immediate operand, the immediate value is 0 in many cases, and in

What happens if we just use patchableBranch64 and repatch it?
I don't think this is safe if it becomes broken.
How about separating patchableBranchPtr, patchableBranchPtrWithPatch etc. from branchPtr implementation and emitting nops in patchable versions?

> Source/JavaScriptCore/assembler/MIPSAssembler.h:1016
> +        cacheFlush(insn - 2, codeSize);

Lol, nice. I think `cacheFlush(instructionStart, codeSize);` is nicer.
BTW, replaceWithLoad and replaceWithAddressComputation should be fixed at least. Could you review all the other replacing functions too?

> Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:1834
>              */

Let's change the comment.

> Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:1887
>              */

Ditto.

> Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:1961
>              if (imm.m_value >= -32768 && imm.m_value  <= 32767 && !m_fixedWidth) {

Ditto.

> Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:2025
>              m_assembler.mfhi(dataTempRegister);

Ditto.

> Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:2070
>              m_assembler.mfhi(dataTempRegister);

Ditto.

> Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:2120
>              */

Ditto.

> Source/JavaScriptCore/assembler/MacroAssemblerMIPS.h:2179
>              */

Ditto.
Comment 9 Yusuke Suzuki 2018-03-20 10:55:09 PDT
Comment on attachment 335174 [details]
MIPS branches

View in context: https://bugs.webkit.org/attachment.cgi?id=335174&action=review

>> Source/JavaScriptCore/ChangeLog:15
>> +        Secondly, in branches with immediate operand, the immediate value is 0 in many cases, and in
> 
> What happens if we just use patchableBranch64 and repatch it?
> I don't think this is safe if it becomes broken.
> How about separating patchableBranchPtr, patchableBranchPtrWithPatch etc. from branchPtr implementation and emitting nops in patchable versions?

I mean, using replacing function with PatchableJump which does not come after moveWithPatch.
Comment 10 Stanislav Ocovaj 2018-03-21 02:41:48 PDT
Comment on attachment 335174 [details]
MIPS branches

View in context: https://bugs.webkit.org/attachment.cgi?id=335174&action=review

>> Source/JavaScriptCore/assembler/MIPSAssembler.h:1016
>> +        cacheFlush(insn - 2, codeSize);
> 
> Lol, nice. I think `cacheFlush(instructionStart, codeSize);` is nicer.
> BTW, replaceWithLoad and replaceWithAddressComputation should be fixed at least. Could you review all the other replacing functions too?

replaceWithLoad and replaceWithAddressComputation are OK because they only modify the last instruction. Other functions are also correct. It think repatchInt32 could be modified too, to use cacheFlush(from... instead of cacheFlush(insn... and remove the insn-- statement.
Comment 11 Stanislav Ocovaj 2018-03-21 04:36:48 PDT
Sorry, I think my explanation was not clear enough. The patch https://bugs.webkit.org/show_bug.cgi?id=101328 added two nop's to branchEqual() and branchNotEqual() in order to fix the following issue.
Without the nop's, branchPtrWithPatch() generates a block of instructions like this:

  lui immTempRegister, initialRightValue >> 16
  ori immTempRegister, immTempRegister, initialRightValue & 0xffff
  beq left, immTempRegister, label
  nop
  beq zero, zero, 1
  nop
  nop
  nop
1:

Then, at some point, this code may be changed by replaceWithJump() to

  lui t9, newLabel >> 16
  ori t9, newLabel & 0xffff
  jr  t9
  nop
  beq zero, zero, 1
  nop
  nop
  nop
1:

which overwrites the original beq instruction. Later on, the JIT may want to revert the jump back to branchPtrWithPatch by calling revertJumpReplacementToBranchPtrWithPatch(), which results with the following block:

  lui immTempRegister, initialRightValue >> 16
  ori immTempRegister, immTempRegister, initialRightValue & 0xffff
  nop
  nop
  beq zero, zero, 1
  nop
  nop
  nop
1:

Since the original beq instruction is lost, this code won't work anymore. So, the added nop's have nothing to do with patchable branches actually, they were added in orded to allow the code generated by branchPtrWithPatch() to be reverted back to branchPtrWithPatch after replacing it with a 4-instruction jump.
Repatching branches only modifies the last 6 instructions of the branch (see relinkJump() in MIPSAssembler.h), so that should not be affected.
Comment 12 Stanislav Ocovaj 2018-03-21 07:45:52 PDT
Created attachment 336192 [details]
Patch

I added the requested changes and rebased the patch. I removed the optimizations for branches with immediate operand as they are now included in another patch (https://bugs.webkit.org/show_bug.cgi?id=183740). I also moved the additional nop's from moveWithPatch to branchPtrWithPatch since that is the only place where they are really needed.
Comment 13 Stanislav Ocovaj 2018-03-22 08:44:55 PDT
Just for the record, with this optimization I got the following results on a mips32r1 platform:
Benchmark  Original  Optimized  Improvement
Octane2    845       897        6.2%
Kraken     3306 ms   3133 ms    5.5%
JetStream  7.13      7.56       6.0%
Comment 14 Petar Jovanovic 2018-03-29 08:33:57 PDT
Ping.
Comment 15 Stanislav Ocovaj 2018-04-04 07:31:49 PDT
Created attachment 337166 [details]
Patch

Rebased
Comment 16 Petar Jovanovic 2018-04-05 06:37:20 PDT
Yusuke Suzuki, are you OK with the change?
Comment 17 Yusuke Suzuki 2018-04-05 11:45:23 PDT
Comment on attachment 337166 [details]
Patch

OK, r=me.
Comment 18 WebKit Commit Bot 2018-04-05 12:12:05 PDT
Comment on attachment 337166 [details]
Patch

Clearing flags on attachment: 337166

Committed r230310: <https://trac.webkit.org/changeset/230310>
Comment 19 WebKit Commit Bot 2018-04-05 12:12:07 PDT
All reviewed patches have been landed.  Closing bug.
Comment 20 Radar WebKit Bug Importer 2018-04-05 12:13:26 PDT
<rdar://problem/39215379>