177100 – [DFG] Introduces fused compare and jump

Yusuke Suzuki

Reported 2017-09-18 13:28:04 PDT

Octane/zlib shows, CompareLess(Int32:UInt32ToNumber(@1), Int32:UInt32ToNumber(@2)). This can be changed to CompareBelow(Int32:@1, Int32:@2).

Yusuke Suzuki

Comment 1 2017-09-18 13:35:47 PDT

I think adhoc optimization is enough. if it is more likely used as uint32, dfg should extend it to int52.

Yusuke Suzuki

Comment 2 2017-09-20 12:21:11 PDT

Created attachment 321342 [details] Patch

Yusuke Suzuki

Comment 3 2017-09-20 12:25:44 PDT

Comment on attachment 321342 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=321342&action=review > Source/JavaScriptCore/dfg/DFGMovHintRemovalPhase.cpp:103 > + Node* terminal = block->terminal(); > + if (terminal->op() == Branch) { > + for (unsigned i = 0; i < terminal->numSuccessors(); ++i) { > + BasicBlock* successor = terminal->successor(i); > + m_graph.forAllLiveInBytecode( > + successor->at(0)->origin.forExit, > + [&] (VirtualRegister reg) { > + m_state.operand(reg) = currentEpoch; > + }); > + } > + } else { > + m_graph.forAllLiveInBytecode( > + terminal->origin.forExit, > + [&] (VirtualRegister reg) { > + m_state.operand(reg) = currentEpoch; > + }); > + } I would like to generalize and clean up this part. The problem is that we have fused compare & jump bytecodes like op_jless. In this case, the bytecode includes registers used for comparison. Ideally, if we know that this comparison does not cause any exit, we should be able to drop MovHint for these registers. However, currently we mark live registers for this `op_jless` here. Therefore, used registers are considered live at this point. This effectively prevents us from removing a specific MovHint in Octane/zlib. And it prevents us from dropping unnecessary UInt32ToNumber. With this change, we can see 6% improvement in Octane/zlib.

Build Bot

Comment 4 2017-09-20 13:04:13 PDT

Comment on attachment 321342 [details] Patch Attachment 321342 [details] did not pass jsc-ews (mac): Output: http://webkit-queues.webkit.org/results/4606415 New failing tests: stress/typedarray-functions-with-neutered.js.ftl-eager-no-cjit-b3o1 stress/branch-check-int32-on-boolean-to-number-untyped.js.ftl-no-cjit-small-pool stress/branch-check-int32-on-boolean-to-number-untyped.js.ftl-no-cjit-no-inline-validate jsc-layout-tests.yaml/js/script-tests/dfg-branch-logical-not-peephole-around-osr-exit.js.layout-ftl-eager-no-cjit jsc-layout-tests.yaml/js/script-tests/dfg-branch-not-fail.js.layout-ftl-eager-no-cjit stress/branch-check-int32-on-boolean-to-number-untyped.js.ftl-no-cjit-validate-sampling-profiler jsc-layout-tests.yaml/js/script-tests/dfg-branch-logical-not-peephole-around-osr-exit.js.layout-ftl-no-cjit jsc-layout-tests.yaml/js/script-tests/parser-syntax-check.js.layout-ftl-eager-no-cjit jsc-layout-tests.yaml/js/script-tests/dfg-branch-not-fail.js.layout stress/branch-check-int32-on-boolean-to-number-untyped.js.ftl-no-cjit-b3o1 stress/branch-check-int32-on-boolean-to-number-untyped.js.ftl-eager-no-cjit stress/branch-check-int32-on-boolean-to-number-untyped.js.ftl-eager jsc-layout-tests.yaml/js/script-tests/dfg-branch-not-fail.js.layout-ftl-no-cjit jsc-layout-tests.yaml/js/script-tests/arrowfunction-syntax-errors.js.layout-ftl-eager-no-cjit stress/branch-check-int32-on-boolean-to-number-untyped.js.ftl-eager-no-cjit-b3o1 jsc-layout-tests.yaml/js/script-tests/get-by-pname.js.layout-ftl-eager-no-cjit jsc-layout-tests.yaml/js/script-tests/dfg-branch-logical-not-peephole-around-osr-exit.js.layout stress/branch-check-int32-on-boolean-to-number-untyped.js.ftl-no-cjit-no-put-stack-validate

Filip Pizlo

Comment 5 2017-09-20 13:15:22 PDT

(In reply to Yusuke Suzuki from comment #3) > Comment on attachment 321342 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=321342&action=review > > > Source/JavaScriptCore/dfg/DFGMovHintRemovalPhase.cpp:103 > > + Node* terminal = block->terminal(); > > + if (terminal->op() == Branch) { > > + for (unsigned i = 0; i < terminal->numSuccessors(); ++i) { > > + BasicBlock* successor = terminal->successor(i); > > + m_graph.forAllLiveInBytecode( > > + successor->at(0)->origin.forExit, > > + [&] (VirtualRegister reg) { > > + m_state.operand(reg) = currentEpoch; > > + }); > > + } > > + } else { > > + m_graph.forAllLiveInBytecode( > > + terminal->origin.forExit, > > + [&] (VirtualRegister reg) { > > + m_state.operand(reg) = currentEpoch; > > + }); > > + } > > I would like to generalize and clean up this part. The problem is that we > have fused compare & jump bytecodes like op_jless. > In this case, the bytecode includes registers used for comparison. > Ideally, if we know that this comparison does not cause any exit, we should > be able to drop MovHint for these registers. > However, currently we mark live registers for this `op_jless` here. > Therefore, used registers are considered live at this point. > > This effectively prevents us from removing a specific MovHint in > Octane/zlib. And it prevents us from dropping unnecessary UInt32ToNumber. > With this change, we can see 6% improvement in Octane/zlib. Removing MovHints is super hard, as I'm sure you're starting to appreciate. It's basically not sound... the DFG has hardly enough information to remove MovHints reliably in this case. If you think that this is valuable, I propose an entirely different solution: fuse compare/branch in bytecode. Then, it's guaranteed that we can keep them fused in DFG. Maybe we can just remove the MovHint removal phase entirely and all of the flaky DFG IR compare/branch fusion. It's really worth noting that this isn't even the first time when the DFG compare/branch fusion was broken. It's the 3rd time or so. It's been broken for other reasons in the past, and when I did the OSR exit refactoring that gave us the current MovHint situation, I decided not to fix the problem anymore because we have FTL and I was tired of dealing with the horrible bug fallout from the previous times that someone tried to make branch fusion work. So, if this is important (it seems like it is), then lets finally make our bytecode have both compare opcodes and branch opcodes. This means more redundant-looking bytecode opcodes, but that's an OK price to pay - since the alternative is either less perf or MovHint removal. More opcodes is better than either of those other things.

Build Bot

Comment 6 2017-09-20 13:21:38 PDT

Comment on attachment 321342 [details] Patch Attachment 321342 [details] did not pass mac-debug-ews (mac): Output: http://webkit-queues.webkit.org/results/4606568 Number of test failures exceeded the failure limit.

Build Bot

Comment 7 2017-09-20 13:21:39 PDT

Created attachment 321350 [details] Archive of layout-test-results from ews117 for mac-elcapitan The attached test failures were seen while running run-webkit-tests on the mac-debug-ews. Bot: ews117 Port: mac-elcapitan Platform: Mac OS X 10.11.6

Yusuke Suzuki

Comment 8 2017-09-20 14:05:14 PDT

Comment on attachment 321342 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=321342&action=review >>> Source/JavaScriptCore/dfg/DFGMovHintRemovalPhase.cpp:103 >>> + } >> >> I would like to generalize and clean up this part. The problem is that we have fused compare & jump bytecodes like op_jless. >> In this case, the bytecode includes registers used for comparison. >> Ideally, if we know that this comparison does not cause any exit, we should be able to drop MovHint for these registers. >> However, currently we mark live registers for this `op_jless` here. Therefore, used registers are considered live at this point. >> >> This effectively prevents us from removing a specific MovHint in Octane/zlib. And it prevents us from dropping unnecessary UInt32ToNumber. >> With this change, we can see 6% improvement in Octane/zlib. > > Removing MovHints is super hard, as I'm sure you're starting to appreciate. It's basically not sound... the DFG has hardly enough information to remove MovHints reliably in this case. > > If you think that this is valuable, I propose an entirely different solution: fuse compare/branch in bytecode. Then, it's guaranteed that we can keep them fused in DFG. Maybe we can just remove the MovHint removal phase entirely and all of the flaky DFG IR compare/branch fusion. It's really worth noting that this isn't even the first time when the DFG compare/branch fusion was broken. It's the 3rd time or so. It's been broken for other reasons in the past, and when I did the OSR exit refactoring that gave us the current MovHint situation, I decided not to fix the problem anymore because we have FTL and I was tired of dealing with the horrible bug fallout from the previous times that someone tried to make branch fusion work. > > So, if this is important (it seems like it is), then lets finally make our bytecode have both compare opcodes and branch opcodes. This means more redundant-looking bytecode opcodes, but that's an OK price to pay - since the alternative is either less perf or MovHint removal. More opcodes is better than either of those other things. Oh, maybe, my comment is misleading, I'll explain the details. We have the DFG nodes like the following in Octane/zlib. 297:<!2:loc41> UInt32ToNumber(Int32:@212, Number|MustGen|UseAsOther, Int32, CheckOverflow, Exits, bc#341, ExitValid) 298:<!0:-> MovHint(Untyped:@297, MustGen, loc36, W:SideState, ClobbersExit, bc#341, ExitValid) 301:<!0:-> MovHint(Untyped:@288, MustGen, loc37, W:SideState, ClobbersExit, bc#344, ExitInvalid) 303:<!2:loc39> UInt32ToNumber(Int32:@288, Number|MustGen|UseAsOther, Int32, CheckOverflow, Exits, bc#348, ExitValid) 304:<!0:-> MovHint(Untyped:@303, MustGen, loc37, W:SideState, ClobbersExit, bc#348, ExitValid) 306:< 1:loc39> CompareGreater(Int32:@297, Int32:@303, Boolean|UseAsOther, Bool, Exits, bc#351, ExitValid) 307:<!0:-> Branch(KnownBoolean:@306, MustGen, T:#1/w:1.000000, F:#2/w:1.000000, W:SideState, bc#351, ExitInvalid) While UInt32ToNumber here seems not costly, it becomes a problem if this is *so* heavily executed in the loop. The interesting thing is that the results of UInt32ToNumber are always within uint32 area. If it is used in CompareGreater, we could have a way to improve it by changing it to UInt32 aware op, (CompareAbove). Performing the conversion from CompareGreater(UInt32ToNumber(Int32:@1), UInt32ToNumber(Int32:@2)) to CompareAbove(Int32:@1, Int32:@2). 297:<!2:loc41> UInt32ToNumber(Int32:@212, Number|MustGen|UseAsOther, Int32, CheckOverflow, Exits, bc#341, ExitValid) 298:<!0:-> MovHint(Untyped:@297, MustGen, loc36, W:SideState, ClobbersExit, bc#341, ExitValid) 301:<!0:-> MovHint(Untyped:@288, MustGen, loc37, W:SideState, ClobbersExit, bc#344, ExitInvalid) 303:<!2:loc39> UInt32ToNumber(Int32:@288, Number|MustGen|UseAsOther, Int32, CheckOverflow, Exits, bc#348, ExitValid) 304:<!0:-> MovHint(Untyped:@303, MustGen, loc37, W:SideState, ClobbersExit, bc#348, ExitValid) 306:< 1:loc39> CompareAbove(Int32:@212, Int32:@288, Boolean|UseAsOther, Bool, Exits, bc#351, ExitValid) 307:<!0:-> Branch(KnownBoolean:@306, MustGen, T:#1/w:1.000000, F:#2/w:1.000000, W:SideState, bc#351, ExitInvalid) Then, we have a chance to drop UInt32ToNumber if it is not used. Actually, it is not used in any other places. Only @298, @303 MovHint are using them. The problem here is that MovHint is not removed by this phase since op_jgreater holds the registers to them. When starting this MovHint removal phase, we set epoch like, m_graph.forAllLiveInBytecode( block->terminal()->origin.forExit, [&] (VirtualRegister reg) { m_state.operand(reg) = currentEpoch; }); As a result, op_jgreater's live registers are marked as currentEpoch here even though they are not used after op_jgreater. I'm not sure fusing DFG Compare and Branch solves this problem. Even if introducing a DFG node like BranchCompareGreater(), still op_jgreater tells us that these registers are live at this point, right? On the other hand, IIRC, the second suggestion, decoupling op_jgreater into op_branch and op_greater can solve this issue. This is because the terminal node's exit site becomes op_branch and it does not say these registers are live. But it enlarges a bytecode a bit (it may affect inlining) and I'm worried about the performance of LLInt's jgreater/jless etc.'s jumps.

Yusuke Suzuki

Comment 9 2017-09-20 14:12:04 PDT

Comment on attachment 321342 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=321342&action=review >>>> Source/JavaScriptCore/dfg/DFGMovHintRemovalPhase.cpp:103 >>>> + } >>> >>> I would like to generalize and clean up this part. The problem is that we have fused compare & jump bytecodes like op_jless. >>> In this case, the bytecode includes registers used for comparison. >>> Ideally, if we know that this comparison does not cause any exit, we should be able to drop MovHint for these registers. >>> However, currently we mark live registers for this `op_jless` here. Therefore, used registers are considered live at this point. >>> >>> This effectively prevents us from removing a specific MovHint in Octane/zlib. And it prevents us from dropping unnecessary UInt32ToNumber. >>> With this change, we can see 6% improvement in Octane/zlib. >> >> Removing MovHints is super hard, as I'm sure you're starting to appreciate. It's basically not sound... the DFG has hardly enough information to remove MovHints reliably in this case. >> >> If you think that this is valuable, I propose an entirely different solution: fuse compare/branch in bytecode. Then, it's guaranteed that we can keep them fused in DFG. Maybe we can just remove the MovHint removal phase entirely and all of the flaky DFG IR compare/branch fusion. It's really worth noting that this isn't even the first time when the DFG compare/branch fusion was broken. It's the 3rd time or so. It's been broken for other reasons in the past, and when I did the OSR exit refactoring that gave us the current MovHint situation, I decided not to fix the problem anymore because we have FTL and I was tired of dealing with the horrible bug fallout from the previous times that someone tried to make branch fusion work. >> >> So, if this is important (it seems like it is), then lets finally make our bytecode have both compare opcodes and branch opcodes. This means more redundant-looking bytecode opcodes, but that's an OK price to pay - since the alternative is either less perf or MovHint removal. More opcodes is better than either of those other things. > > Oh, maybe, my comment is misleading, I'll explain the details. > We have the DFG nodes like the following in Octane/zlib. > > 297:<!2:loc41> UInt32ToNumber(Int32:@212, Number|MustGen|UseAsOther, Int32, CheckOverflow, Exits, bc#341, ExitValid) > 298:<!0:-> MovHint(Untyped:@297, MustGen, loc36, W:SideState, ClobbersExit, bc#341, ExitValid) > 301:<!0:-> MovHint(Untyped:@288, MustGen, loc37, W:SideState, ClobbersExit, bc#344, ExitInvalid) > 303:<!2:loc39> UInt32ToNumber(Int32:@288, Number|MustGen|UseAsOther, Int32, CheckOverflow, Exits, bc#348, ExitValid) > 304:<!0:-> MovHint(Untyped:@303, MustGen, loc37, W:SideState, ClobbersExit, bc#348, ExitValid) > 306:< 1:loc39> CompareGreater(Int32:@297, Int32:@303, Boolean|UseAsOther, Bool, Exits, bc#351, ExitValid) > 307:<!0:-> Branch(KnownBoolean:@306, MustGen, T:#1/w:1.000000, F:#2/w:1.000000, W:SideState, bc#351, ExitInvalid) > > While UInt32ToNumber here seems not costly, it becomes a problem if this is *so* heavily executed in the loop. > The interesting thing is that the results of UInt32ToNumber are always within uint32 area. If it is used in CompareGreater, we could have a way to improve it by changing it to UInt32 aware op, (CompareAbove). > Performing the conversion from CompareGreater(UInt32ToNumber(Int32:@1), UInt32ToNumber(Int32:@2)) to CompareAbove(Int32:@1, Int32:@2). > > 297:<!2:loc41> UInt32ToNumber(Int32:@212, Number|MustGen|UseAsOther, Int32, CheckOverflow, Exits, bc#341, ExitValid) > 298:<!0:-> MovHint(Untyped:@297, MustGen, loc36, W:SideState, ClobbersExit, bc#341, ExitValid) > 301:<!0:-> MovHint(Untyped:@288, MustGen, loc37, W:SideState, ClobbersExit, bc#344, ExitInvalid) > 303:<!2:loc39> UInt32ToNumber(Int32:@288, Number|MustGen|UseAsOther, Int32, CheckOverflow, Exits, bc#348, ExitValid) > 304:<!0:-> MovHint(Untyped:@303, MustGen, loc37, W:SideState, ClobbersExit, bc#348, ExitValid) > 306:< 1:loc39> CompareAbove(Int32:@212, Int32:@288, Boolean|UseAsOther, Bool, Exits, bc#351, ExitValid) > 307:<!0:-> Branch(KnownBoolean:@306, MustGen, T:#1/w:1.000000, F:#2/w:1.000000, W:SideState, bc#351, ExitInvalid) > > Then, we have a chance to drop UInt32ToNumber if it is not used. Actually, it is not used in any other places. > Only @298, @303 MovHint are using them. > > The problem here is that MovHint is not removed by this phase since op_jgreater holds the registers to them. > When starting this MovHint removal phase, we set epoch like, > > m_graph.forAllLiveInBytecode( > block->terminal()->origin.forExit, > [&] (VirtualRegister reg) { > m_state.operand(reg) = currentEpoch; > }); > > As a result, op_jgreater's live registers are marked as currentEpoch here even though they are not used after op_jgreater. > I'm not sure fusing DFG Compare and Branch solves this problem. Even if introducing a DFG node like BranchCompareGreater(), still op_jgreater tells us that these registers are live at this point, right? > > On the other hand, IIRC, the second suggestion, decoupling op_jgreater into op_branch and op_greater can solve this issue. This is because the terminal node's exit site becomes op_branch and it does not say these registers are live. > But it enlarges a bytecode a bit (it may affect inlining) and I'm worried about the performance of LLInt's jgreater/jless etc.'s jumps. Note that I removed ComapreAbove's Exits carefully (It is only shown if binary use kind is Int32Use). And I also removed UInt32ToNumber's MustGenerate here. 297:< 1:-> UInt32ToNumber(Int32:@212, Number|UseAsOther, Int32, CheckOverflow, Exits, bc#341, ExitValid) 204:<!0:-> KillStack(MustGen, loc36, W:SideState, ClobbersExit, bc#341, ExitValid) 298:<!0:-> MovHint(Untyped:Kill:@297, MustGen, loc36, W:SideState, ClobbersExit, bc#341, ExitInvalid) 1060:<!0:-> KillStack(MustGen, loc37, W:SideState, ClobbersExit, bc#344, ExitInvalid) 301:<!0:-> MovHint(Untyped:@288, MustGen, loc37, W:SideState, ClobbersExit, bc#344, ExitInvalid) 303:< 1:-> UInt32ToNumber(Int32:@288, Number|UseAsOther, Int32, CheckOverflow, Exits, bc#348, ExitValid) 437:<!0:-> KillStack(MustGen, loc37, W:SideState, ClobbersExit, bc#348, ExitValid) 304:<!0:-> MovHint(Untyped:Kill:@303, MustGen, loc37, W:SideState, ClobbersExit, bc#348, ExitInvalid) 306:< 1:-> CompareAbove(Int32:@212, Int32:@288, Boolean|UseAsOther, Bool, bc#351, ExitValid) 307:<!0:-> Branch(KnownBoolean:Kill:@306, MustGen, T:#2/w:1.000000, F:#3/w:1.000000, W:SideState, bc#351, ExitInvalid)

Yusuke Suzuki

Comment 10 2017-09-20 14:14:04 PDT

Comment on attachment 321342 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=321342&action=review >>>>> Source/JavaScriptCore/dfg/DFGMovHintRemovalPhase.cpp:103 >>>>> + } >>>> >>>> I would like to generalize and clean up this part. The problem is that we have fused compare & jump bytecodes like op_jless. >>>> In this case, the bytecode includes registers used for comparison. >>>> Ideally, if we know that this comparison does not cause any exit, we should be able to drop MovHint for these registers. >>>> However, currently we mark live registers for this `op_jless` here. Therefore, used registers are considered live at this point. >>>> >>>> This effectively prevents us from removing a specific MovHint in Octane/zlib. And it prevents us from dropping unnecessary UInt32ToNumber. >>>> With this change, we can see 6% improvement in Octane/zlib. >>> >>> Removing MovHints is super hard, as I'm sure you're starting to appreciate. It's basically not sound... the DFG has hardly enough information to remove MovHints reliably in this case. >>> >>> If you think that this is valuable, I propose an entirely different solution: fuse compare/branch in bytecode. Then, it's guaranteed that we can keep them fused in DFG. Maybe we can just remove the MovHint removal phase entirely and all of the flaky DFG IR compare/branch fusion. It's really worth noting that this isn't even the first time when the DFG compare/branch fusion was broken. It's the 3rd time or so. It's been broken for other reasons in the past, and when I did the OSR exit refactoring that gave us the current MovHint situation, I decided not to fix the problem anymore because we have FTL and I was tired of dealing with the horrible bug fallout from the previous times that someone tried to make branch fusion work. >>> >>> So, if this is important (it seems like it is), then lets finally make our bytecode have both compare opcodes and branch opcodes. This means more redundant-looking bytecode opcodes, but that's an OK price to pay - since the alternative is either less perf or MovHint removal. More opcodes is better than either of those other things. >> >> Oh, maybe, my comment is misleading, I'll explain the details. >> We have the DFG nodes like the following in Octane/zlib. >> >> 297:<!2:loc41> UInt32ToNumber(Int32:@212, Number|MustGen|UseAsOther, Int32, CheckOverflow, Exits, bc#341, ExitValid) >> 298:<!0:-> MovHint(Untyped:@297, MustGen, loc36, W:SideState, ClobbersExit, bc#341, ExitValid) >> 301:<!0:-> MovHint(Untyped:@288, MustGen, loc37, W:SideState, ClobbersExit, bc#344, ExitInvalid) >> 303:<!2:loc39> UInt32ToNumber(Int32:@288, Number|MustGen|UseAsOther, Int32, CheckOverflow, Exits, bc#348, ExitValid) >> 304:<!0:-> MovHint(Untyped:@303, MustGen, loc37, W:SideState, ClobbersExit, bc#348, ExitValid) >> 306:< 1:loc39> CompareGreater(Int32:@297, Int32:@303, Boolean|UseAsOther, Bool, Exits, bc#351, ExitValid) >> 307:<!0:-> Branch(KnownBoolean:@306, MustGen, T:#1/w:1.000000, F:#2/w:1.000000, W:SideState, bc#351, ExitInvalid) >> >> While UInt32ToNumber here seems not costly, it becomes a problem if this is *so* heavily executed in the loop. >> The interesting thing is that the results of UInt32ToNumber are always within uint32 area. If it is used in CompareGreater, we could have a way to improve it by changing it to UInt32 aware op, (CompareAbove). >> Performing the conversion from CompareGreater(UInt32ToNumber(Int32:@1), UInt32ToNumber(Int32:@2)) to CompareAbove(Int32:@1, Int32:@2). >> >> 297:<!2:loc41> UInt32ToNumber(Int32:@212, Number|MustGen|UseAsOther, Int32, CheckOverflow, Exits, bc#341, ExitValid) >> 298:<!0:-> MovHint(Untyped:@297, MustGen, loc36, W:SideState, ClobbersExit, bc#341, ExitValid) >> 301:<!0:-> MovHint(Untyped:@288, MustGen, loc37, W:SideState, ClobbersExit, bc#344, ExitInvalid) >> 303:<!2:loc39> UInt32ToNumber(Int32:@288, Number|MustGen|UseAsOther, Int32, CheckOverflow, Exits, bc#348, ExitValid) >> 304:<!0:-> MovHint(Untyped:@303, MustGen, loc37, W:SideState, ClobbersExit, bc#348, ExitValid) >> 306:< 1:loc39> CompareAbove(Int32:@212, Int32:@288, Boolean|UseAsOther, Bool, Exits, bc#351, ExitValid) >> 307:<!0:-> Branch(KnownBoolean:@306, MustGen, T:#1/w:1.000000, F:#2/w:1.000000, W:SideState, bc#351, ExitInvalid) >> >> Then, we have a chance to drop UInt32ToNumber if it is not used. Actually, it is not used in any other places. >> Only @298, @303 MovHint are using them. >> >> The problem here is that MovHint is not removed by this phase since op_jgreater holds the registers to them. >> When starting this MovHint removal phase, we set epoch like, >> >> m_graph.forAllLiveInBytecode( >> block->terminal()->origin.forExit, >> [&] (VirtualRegister reg) { >> m_state.operand(reg) = currentEpoch; >> }); >> >> As a result, op_jgreater's live registers are marked as currentEpoch here even though they are not used after op_jgreater. >> I'm not sure fusing DFG Compare and Branch solves this problem. Even if introducing a DFG node like BranchCompareGreater(), still op_jgreater tells us that these registers are live at this point, right? >> >> On the other hand, IIRC, the second suggestion, decoupling op_jgreater into op_branch and op_greater can solve this issue. This is because the terminal node's exit site becomes op_branch and it does not say these registers are live. >> But it enlarges a bytecode a bit (it may affect inlining) and I'm worried about the performance of LLInt's jgreater/jless etc.'s jumps. > > Note that I removed ComapreAbove's Exits carefully (It is only shown if binary use kind is Int32Use). And I also removed UInt32ToNumber's MustGenerate here. > > 297:< 1:-> UInt32ToNumber(Int32:@212, Number|UseAsOther, Int32, CheckOverflow, Exits, bc#341, ExitValid) > 204:<!0:-> KillStack(MustGen, loc36, W:SideState, ClobbersExit, bc#341, ExitValid) > 298:<!0:-> MovHint(Untyped:Kill:@297, MustGen, loc36, W:SideState, ClobbersExit, bc#341, ExitInvalid) > 1060:<!0:-> KillStack(MustGen, loc37, W:SideState, ClobbersExit, bc#344, ExitInvalid) > 301:<!0:-> MovHint(Untyped:@288, MustGen, loc37, W:SideState, ClobbersExit, bc#344, ExitInvalid) > 303:< 1:-> UInt32ToNumber(Int32:@288, Number|UseAsOther, Int32, CheckOverflow, Exits, bc#348, ExitValid) > 437:<!0:-> KillStack(MustGen, loc37, W:SideState, ClobbersExit, bc#348, ExitValid) > 304:<!0:-> MovHint(Untyped:Kill:@303, MustGen, loc37, W:SideState, ClobbersExit, bc#348, ExitInvalid) > 306:< 1:-> CompareAbove(Int32:@212, Int32:@288, Boolean|UseAsOther, Bool, bc#351, ExitValid) > 307:<!0:-> Branch(KnownBoolean:Kill:@306, MustGen, T:#2/w:1.000000, F:#3/w:1.000000, W:SideState, bc#351, ExitInvalid) And current this patch is buggy. My intention is like, + Node* terminal = block->terminal(); + if (terminal->op() == Branch) { + BitVector liveAfter; + liveAfter.ensureSize(m_graph.block(0)->variablesAtHead.numberOfLocals()); + for (unsigned i = 0; i < terminal->numSuccessors(); ++i) { + BasicBlock* successor = terminal->successor(i); + liveAfter.merge(m_graph.localsLiveInBytecode(successor->at(0)->origin.forExit)); + } + m_graph.forAllLiveInBytecode( + terminal->origin.forExit, + [&] (VirtualRegister reg) { + if (liveAfter.get(reg.toLocal())) + return; + m_state.operand(reg) = currentEpoch; + }); + } else { + m_graph.forAllLiveInBytecode( + terminal->origin.forExit, + [&] (VirtualRegister reg) { + m_state.operand(reg) = currentEpoch; + }); + }

Yusuke Suzuki

Comment 11 2017-09-20 14:31:06 PDT

Comment on attachment 321342 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=321342&action=review >>>>>> Source/JavaScriptCore/dfg/DFGMovHintRemovalPhase.cpp:103 >>>>>> + } >>>>> >>>>> I would like to generalize and clean up this part. The problem is that we have fused compare & jump bytecodes like op_jless. >>>>> In this case, the bytecode includes registers used for comparison. >>>>> Ideally, if we know that this comparison does not cause any exit, we should be able to drop MovHint for these registers. >>>>> However, currently we mark live registers for this `op_jless` here. Therefore, used registers are considered live at this point. >>>>> >>>>> This effectively prevents us from removing a specific MovHint in Octane/zlib. And it prevents us from dropping unnecessary UInt32ToNumber. >>>>> With this change, we can see 6% improvement in Octane/zlib. >>>> >>>> Removing MovHints is super hard, as I'm sure you're starting to appreciate. It's basically not sound... the DFG has hardly enough information to remove MovHints reliably in this case. >>>> >>>> If you think that this is valuable, I propose an entirely different solution: fuse compare/branch in bytecode. Then, it's guaranteed that we can keep them fused in DFG. Maybe we can just remove the MovHint removal phase entirely and all of the flaky DFG IR compare/branch fusion. It's really worth noting that this isn't even the first time when the DFG compare/branch fusion was broken. It's the 3rd time or so. It's been broken for other reasons in the past, and when I did the OSR exit refactoring that gave us the current MovHint situation, I decided not to fix the problem anymore because we have FTL and I was tired of dealing with the horrible bug fallout from the previous times that someone tried to make branch fusion work. >>>> >>>> So, if this is important (it seems like it is), then lets finally make our bytecode have both compare opcodes and branch opcodes. This means more redundant-looking bytecode opcodes, but that's an OK price to pay - since the alternative is either less perf or MovHint removal. More opcodes is better than either of those other things. >>> >>> Oh, maybe, my comment is misleading, I'll explain the details. >>> We have the DFG nodes like the following in Octane/zlib. >>> >>> 297:<!2:loc41> UInt32ToNumber(Int32:@212, Number|MustGen|UseAsOther, Int32, CheckOverflow, Exits, bc#341, ExitValid) >>> 298:<!0:-> MovHint(Untyped:@297, MustGen, loc36, W:SideState, ClobbersExit, bc#341, ExitValid) >>> 301:<!0:-> MovHint(Untyped:@288, MustGen, loc37, W:SideState, ClobbersExit, bc#344, ExitInvalid) >>> 303:<!2:loc39> UInt32ToNumber(Int32:@288, Number|MustGen|UseAsOther, Int32, CheckOverflow, Exits, bc#348, ExitValid) >>> 304:<!0:-> MovHint(Untyped:@303, MustGen, loc37, W:SideState, ClobbersExit, bc#348, ExitValid) >>> 306:< 1:loc39> CompareGreater(Int32:@297, Int32:@303, Boolean|UseAsOther, Bool, Exits, bc#351, ExitValid) >>> 307:<!0:-> Branch(KnownBoolean:@306, MustGen, T:#1/w:1.000000, F:#2/w:1.000000, W:SideState, bc#351, ExitInvalid) >>> >>> While UInt32ToNumber here seems not costly, it becomes a problem if this is *so* heavily executed in the loop. >>> The interesting thing is that the results of UInt32ToNumber are always within uint32 area. If it is used in CompareGreater, we could have a way to improve it by changing it to UInt32 aware op, (CompareAbove). >>> Performing the conversion from CompareGreater(UInt32ToNumber(Int32:@1), UInt32ToNumber(Int32:@2)) to CompareAbove(Int32:@1, Int32:@2). >>> >>> 297:<!2:loc41> UInt32ToNumber(Int32:@212, Number|MustGen|UseAsOther, Int32, CheckOverflow, Exits, bc#341, ExitValid) >>> 298:<!0:-> MovHint(Untyped:@297, MustGen, loc36, W:SideState, ClobbersExit, bc#341, ExitValid) >>> 301:<!0:-> MovHint(Untyped:@288, MustGen, loc37, W:SideState, ClobbersExit, bc#344, ExitInvalid) >>> 303:<!2:loc39> UInt32ToNumber(Int32:@288, Number|MustGen|UseAsOther, Int32, CheckOverflow, Exits, bc#348, ExitValid) >>> 304:<!0:-> MovHint(Untyped:@303, MustGen, loc37, W:SideState, ClobbersExit, bc#348, ExitValid) >>> 306:< 1:loc39> CompareAbove(Int32:@212, Int32:@288, Boolean|UseAsOther, Bool, Exits, bc#351, ExitValid) >>> 307:<!0:-> Branch(KnownBoolean:@306, MustGen, T:#1/w:1.000000, F:#2/w:1.000000, W:SideState, bc#351, ExitInvalid) >>> >>> Then, we have a chance to drop UInt32ToNumber if it is not used. Actually, it is not used in any other places. >>> Only @298, @303 MovHint are using them. >>> >>> The problem here is that MovHint is not removed by this phase since op_jgreater holds the registers to them. >>> When starting this MovHint removal phase, we set epoch like, >>> >>> m_graph.forAllLiveInBytecode( >>> block->terminal()->origin.forExit, >>> [&] (VirtualRegister reg) { >>> m_state.operand(reg) = currentEpoch; >>> }); >>> >>> As a result, op_jgreater's live registers are marked as currentEpoch here even though they are not used after op_jgreater. >>> I'm not sure fusing DFG Compare and Branch solves this problem. Even if introducing a DFG node like BranchCompareGreater(), still op_jgreater tells us that these registers are live at this point, right? >>> >>> On the other hand, IIRC, the second suggestion, decoupling op_jgreater into op_branch and op_greater can solve this issue. This is because the terminal node's exit site becomes op_branch and it does not say these registers are live. >>> But it enlarges a bytecode a bit (it may affect inlining) and I'm worried about the performance of LLInt's jgreater/jless etc.'s jumps. >> >> Note that I removed ComapreAbove's Exits carefully (It is only shown if binary use kind is Int32Use). And I also removed UInt32ToNumber's MustGenerate here. >> >> 297:< 1:-> UInt32ToNumber(Int32:@212, Number|UseAsOther, Int32, CheckOverflow, Exits, bc#341, ExitValid) >> 204:<!0:-> KillStack(MustGen, loc36, W:SideState, ClobbersExit, bc#341, ExitValid) >> 298:<!0:-> MovHint(Untyped:Kill:@297, MustGen, loc36, W:SideState, ClobbersExit, bc#341, ExitInvalid) >> 1060:<!0:-> KillStack(MustGen, loc37, W:SideState, ClobbersExit, bc#344, ExitInvalid) >> 301:<!0:-> MovHint(Untyped:@288, MustGen, loc37, W:SideState, ClobbersExit, bc#344, ExitInvalid) >> 303:< 1:-> UInt32ToNumber(Int32:@288, Number|UseAsOther, Int32, CheckOverflow, Exits, bc#348, ExitValid) >> 437:<!0:-> KillStack(MustGen, loc37, W:SideState, ClobbersExit, bc#348, ExitValid) >> 304:<!0:-> MovHint(Untyped:Kill:@303, MustGen, loc37, W:SideState, ClobbersExit, bc#348, ExitInvalid) >> 306:< 1:-> CompareAbove(Int32:@212, Int32:@288, Boolean|UseAsOther, Bool, bc#351, ExitValid) >> 307:<!0:-> Branch(KnownBoolean:Kill:@306, MustGen, T:#2/w:1.000000, F:#3/w:1.000000, W:SideState, bc#351, ExitInvalid) > > And current this patch is buggy. My intention is like, > > + Node* terminal = block->terminal(); > + if (terminal->op() == Branch) { > + BitVector liveAfter; > + liveAfter.ensureSize(m_graph.block(0)->variablesAtHead.numberOfLocals()); > + for (unsigned i = 0; i < terminal->numSuccessors(); ++i) { > + BasicBlock* successor = terminal->successor(i); > + liveAfter.merge(m_graph.localsLiveInBytecode(successor->at(0)->origin.forExit)); > + } > + m_graph.forAllLiveInBytecode( > + terminal->origin.forExit, > + [&] (VirtualRegister reg) { > + if (liveAfter.get(reg.toLocal())) > + return; > + m_state.operand(reg) = currentEpoch; > + }); > + } else { > + m_graph.forAllLiveInBytecode( > + terminal->origin.forExit, > + [&] (VirtualRegister reg) { > + m_state.operand(reg) = currentEpoch; > + }); > + } Ah, not correct, yeah, it's super hard :P Anyway, I would like to achieve it somehow. Fil's idea is separating op_jless into op_less and op_true, is my understanding correct?

Yusuke Suzuki

Comment 12 2017-09-20 14:49:40 PDT

Comment on attachment 321342 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=321342&action=review >>>>>>> Source/JavaScriptCore/dfg/DFGMovHintRemovalPhase.cpp:103 >>>>>>> + } >>>>>> >>>>>> I would like to generalize and clean up this part. The problem is that we have fused compare & jump bytecodes like op_jless. >>>>>> In this case, the bytecode includes registers used for comparison. >>>>>> Ideally, if we know that this comparison does not cause any exit, we should be able to drop MovHint for these registers. >>>>>> However, currently we mark live registers for this `op_jless` here. Therefore, used registers are considered live at this point. >>>>>> >>>>>> This effectively prevents us from removing a specific MovHint in Octane/zlib. And it prevents us from dropping unnecessary UInt32ToNumber. >>>>>> With this change, we can see 6% improvement in Octane/zlib. >>>>> >>>>> Removing MovHints is super hard, as I'm sure you're starting to appreciate. It's basically not sound... the DFG has hardly enough information to remove MovHints reliably in this case. >>>>> >>>>> If you think that this is valuable, I propose an entirely different solution: fuse compare/branch in bytecode. Then, it's guaranteed that we can keep them fused in DFG. Maybe we can just remove the MovHint removal phase entirely and all of the flaky DFG IR compare/branch fusion. It's really worth noting that this isn't even the first time when the DFG compare/branch fusion was broken. It's the 3rd time or so. It's been broken for other reasons in the past, and when I did the OSR exit refactoring that gave us the current MovHint situation, I decided not to fix the problem anymore because we have FTL and I was tired of dealing with the horrible bug fallout from the previous times that someone tried to make branch fusion work. >>>>> >>>>> So, if this is important (it seems like it is), then lets finally make our bytecode have both compare opcodes and branch opcodes. This means more redundant-looking bytecode opcodes, but that's an OK price to pay - since the alternative is either less perf or MovHint removal. More opcodes is better than either of those other things. >>>> >>>> Oh, maybe, my comment is misleading, I'll explain the details. >>>> We have the DFG nodes like the following in Octane/zlib. >>>> >>>> 297:<!2:loc41> UInt32ToNumber(Int32:@212, Number|MustGen|UseAsOther, Int32, CheckOverflow, Exits, bc#341, ExitValid) >>>> 298:<!0:-> MovHint(Untyped:@297, MustGen, loc36, W:SideState, ClobbersExit, bc#341, ExitValid) >>>> 301:<!0:-> MovHint(Untyped:@288, MustGen, loc37, W:SideState, ClobbersExit, bc#344, ExitInvalid) >>>> 303:<!2:loc39> UInt32ToNumber(Int32:@288, Number|MustGen|UseAsOther, Int32, CheckOverflow, Exits, bc#348, ExitValid) >>>> 304:<!0:-> MovHint(Untyped:@303, MustGen, loc37, W:SideState, ClobbersExit, bc#348, ExitValid) >>>> 306:< 1:loc39> CompareGreater(Int32:@297, Int32:@303, Boolean|UseAsOther, Bool, Exits, bc#351, ExitValid) >>>> 307:<!0:-> Branch(KnownBoolean:@306, MustGen, T:#1/w:1.000000, F:#2/w:1.000000, W:SideState, bc#351, ExitInvalid) >>>> >>>> While UInt32ToNumber here seems not costly, it becomes a problem if this is *so* heavily executed in the loop. >>>> The interesting thing is that the results of UInt32ToNumber are always within uint32 area. If it is used in CompareGreater, we could have a way to improve it by changing it to UInt32 aware op, (CompareAbove). >>>> Performing the conversion from CompareGreater(UInt32ToNumber(Int32:@1), UInt32ToNumber(Int32:@2)) to CompareAbove(Int32:@1, Int32:@2). >>>> >>>> 297:<!2:loc41> UInt32ToNumber(Int32:@212, Number|MustGen|UseAsOther, Int32, CheckOverflow, Exits, bc#341, ExitValid) >>>> 298:<!0:-> MovHint(Untyped:@297, MustGen, loc36, W:SideState, ClobbersExit, bc#341, ExitValid) >>>> 301:<!0:-> MovHint(Untyped:@288, MustGen, loc37, W:SideState, ClobbersExit, bc#344, ExitInvalid) >>>> 303:<!2:loc39> UInt32ToNumber(Int32:@288, Number|MustGen|UseAsOther, Int32, CheckOverflow, Exits, bc#348, ExitValid) >>>> 304:<!0:-> MovHint(Untyped:@303, MustGen, loc37, W:SideState, ClobbersExit, bc#348, ExitValid) >>>> 306:< 1:loc39> CompareAbove(Int32:@212, Int32:@288, Boolean|UseAsOther, Bool, Exits, bc#351, ExitValid) >>>> 307:<!0:-> Branch(KnownBoolean:@306, MustGen, T:#1/w:1.000000, F:#2/w:1.000000, W:SideState, bc#351, ExitInvalid) >>>> >>>> Then, we have a chance to drop UInt32ToNumber if it is not used. Actually, it is not used in any other places. >>>> Only @298, @303 MovHint are using them. >>>> >>>> The problem here is that MovHint is not removed by this phase since op_jgreater holds the registers to them. >>>> When starting this MovHint removal phase, we set epoch like, >>>> >>>> m_graph.forAllLiveInBytecode( >>>> block->terminal()->origin.forExit, >>>> [&] (VirtualRegister reg) { >>>> m_state.operand(reg) = currentEpoch; >>>> }); >>>> >>>> As a result, op_jgreater's live registers are marked as currentEpoch here even though they are not used after op_jgreater. >>>> I'm not sure fusing DFG Compare and Branch solves this problem. Even if introducing a DFG node like BranchCompareGreater(), still op_jgreater tells us that these registers are live at this point, right? >>>> >>>> On the other hand, IIRC, the second suggestion, decoupling op_jgreater into op_branch and op_greater can solve this issue. This is because the terminal node's exit site becomes op_branch and it does not say these registers are live. >>>> But it enlarges a bytecode a bit (it may affect inlining) and I'm worried about the performance of LLInt's jgreater/jless etc.'s jumps. >>> >>> Note that I removed ComapreAbove's Exits carefully (It is only shown if binary use kind is Int32Use). And I also removed UInt32ToNumber's MustGenerate here. >>> >>> 297:< 1:-> UInt32ToNumber(Int32:@212, Number|UseAsOther, Int32, CheckOverflow, Exits, bc#341, ExitValid) >>> 204:<!0:-> KillStack(MustGen, loc36, W:SideState, ClobbersExit, bc#341, ExitValid) >>> 298:<!0:-> MovHint(Untyped:Kill:@297, MustGen, loc36, W:SideState, ClobbersExit, bc#341, ExitInvalid) >>> 1060:<!0:-> KillStack(MustGen, loc37, W:SideState, ClobbersExit, bc#344, ExitInvalid) >>> 301:<!0:-> MovHint(Untyped:@288, MustGen, loc37, W:SideState, ClobbersExit, bc#344, ExitInvalid) >>> 303:< 1:-> UInt32ToNumber(Int32:@288, Number|UseAsOther, Int32, CheckOverflow, Exits, bc#348, ExitValid) >>> 437:<!0:-> KillStack(MustGen, loc37, W:SideState, ClobbersExit, bc#348, ExitValid) >>> 304:<!0:-> MovHint(Untyped:Kill:@303, MustGen, loc37, W:SideState, ClobbersExit, bc#348, ExitInvalid) >>> 306:< 1:-> CompareAbove(Int32:@212, Int32:@288, Boolean|UseAsOther, Bool, bc#351, ExitValid) >>> 307:<!0:-> Branch(KnownBoolean:Kill:@306, MustGen, T:#2/w:1.000000, F:#3/w:1.000000, W:SideState, bc#351, ExitInvalid) >> >> And current this patch is buggy. My intention is like, >> >> + Node* terminal = block->terminal(); >> + if (terminal->op() == Branch) { >> + BitVector liveAfter; >> + liveAfter.ensureSize(m_graph.block(0)->variablesAtHead.numberOfLocals()); >> + for (unsigned i = 0; i < terminal->numSuccessors(); ++i) { >> + BasicBlock* successor = terminal->successor(i); >> + liveAfter.merge(m_graph.localsLiveInBytecode(successor->at(0)->origin.forExit)); >> + } >> + m_graph.forAllLiveInBytecode( >> + terminal->origin.forExit, >> + [&] (VirtualRegister reg) { >> + if (liveAfter.get(reg.toLocal())) >> + return; >> + m_state.operand(reg) = currentEpoch; >> + }); >> + } else { >> + m_graph.forAllLiveInBytecode( >> + terminal->origin.forExit, >> + [&] (VirtualRegister reg) { >> + m_state.operand(reg) = currentEpoch; >> + }); >> + } > > Ah, not correct, yeah, it's super hard :P > Anyway, I would like to achieve it somehow. > > Fil's idea is separating op_jless into op_less and op_true, is my understanding correct? BTW, for this specific case, I think we can optimize cases by doing simple bytecode generator level optimization, like, For the given case, a>>>0 comparison b >>>0 currently, we emit op_unsigned(op_urshift(a)) comparison op_unsigned(op_urshift(b)) by changing it to op_urshift(a) unsignedComparison op_urshift(b) we can achieve optimized result. I think it would be good since it's very simple, and it's suitable for the actual use cases. Basically, >>>0 is used to convert the given value to uint32. But signedness (int32 v.s. uint32) is typically considered when using comparison. So the above case would be frequently shown.

Filip Pizlo

Comment 13 2017-09-20 15:45:40 PDT

I think I was a bit confused. Bytecode already does the right thing in most places. My idea is making DFG IR have fused compare brand opcodes, like BranchLessThan. Then combine this with the bytecode generator using fused opcodes everywhere it matters (I think that’s almost true today). I think that would massively simplicity things.

Yusuke Suzuki

Comment 14 2017-09-20 16:44:55 PDT

(In reply to Filip Pizlo from comment #13) > I think I was a bit confused. Bytecode already does the right thing in most > places. > > My idea is making DFG IR have fused compare brand opcodes, like > BranchLessThan. Then combine this with the bytecode generator using fused > opcodes everywhere it matters (I think that’s almost true today). > > I think that would massively simplicity things. Aha, OK. I think there are two separated things in this issue. First one is ungood MovHint handlings. I think introducing fused DFG bytecodes make it simple. And we can drop bunch of peephole fusings, which is not working right now. Second one is the issue found in this patch, optimizing Octane/zlib. After considering the current issues, I think even the first issue is fixed, Octane/zlib cannot get perf improvement because of UInt32ToNumber's checks. I think the second one should be rather simply fixed in the bytecode compiler layer. I'll change this issue to handle fused DFG compare & branch. And I'll open a bug for the second issue.

Yusuke Suzuki

Comment 15 2018-03-20 09:09:26 PDT

After seeing so frequent use of == / === / != / !== with jump, I believe we should introduce fused bytecodes as well as we do for < / > / <= / >= etc. As we discussed here with Fil, we do not want to complicate DFG MovHint removal further. Just doing it in bytecode generator is the simplest (And it offers good consistency with jless jgreater etc.).

Yusuke Suzuki

Comment 16 2018-03-22 01:23:17 PDT

Performance impact seems ok. Benchmark report for Octane and Kraken on hinako-note (MacBookPro11,4). VMs tested: "baseline" at /Users/yusukesuzuki/dev/WebKit/WebKitBuild/fused-bytecodes-tot/Release/jsc "patched" at /Users/yusukesuzuki/dev/WebKit/WebKitBuild/fused-bytecodes/Release/jsc Collected 10 samples per benchmark/VM, with 10 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. baseline patched Octane: encrypt 0.18213+-0.00209 0.18101+-0.00123 decrypt 3.26492+-0.01790 3.26353+-0.02301 deltablue x2 0.17262+-0.00761 0.16982+-0.00176 might be 1.0164x faster earley 0.35352+-0.00231 0.35318+-0.00184 boyer 6.18038+-0.05002 ^ 6.06990+-0.04176 ^ definitely 1.0182x faster navier-stokes x2 5.53909+-0.03123 ? 5.54093+-0.03980 ? raytrace x2 0.84667+-0.00701 0.84106+-0.00588 richards x2 0.09137+-0.00113 ? 0.09188+-0.00165 ? splay x2 0.37905+-0.00223 0.37752+-0.00176 regexp x2 19.87391+-0.36430 19.85331+-0.50770 pdfjs x2 41.08605+-0.39294 41.00525+-0.40552 mandreel x2 52.35994+-0.62308 51.73654+-0.33236 might be 1.0120x faster gbemu x2 37.53180+-0.28675 37.39952+-0.37394 closure 0.56499+-0.00439 ? 0.56546+-0.00212 ? jquery 7.45761+-0.04372 7.42978+-0.02304 box2d x2 9.83831+-0.06194 ? 9.89235+-0.09460 ? zlib x2 352.39444+-2.12288 350.40908+-1.93405 typescript x2 720.48181+-4.60443 718.85740+-10.25924 splay-latency 2.28818+-0.02927 2.25106+-0.04024 might be 1.0165x faster <geometric> 5.60128+-0.02407 5.57838+-0.01607 might be 1.0041x faster baseline patched Kraken: ai-astar 98.990+-1.312 ? 100.684+-2.597 ? might be 1.0171x slower audio-beat-detection 47.885+-0.353 ? 50.853+-6.487 ? might be 1.0620x slower audio-dft 110.699+-2.853 110.513+-2.224 audio-fft 37.217+-0.873 ? 37.545+-0.793 ? audio-oscillator 58.774+-0.875 58.446+-0.157 imaging-darkroom 71.650+-4.357 69.776+-1.194 might be 1.0269x faster imaging-desaturate 61.692+-1.612 ? 62.078+-1.521 ? imaging-gaussian-blur 83.674+-4.373 ? 84.398+-4.647 ? json-parse-financial 39.173+-1.480 ? 40.793+-1.779 ? might be 1.0413x slower json-stringify-tinderbox 22.466+-1.092 21.606+-0.099 might be 1.0398x faster stanford-crypto-aes 43.842+-1.293 43.707+-0.994 stanford-crypto-ccm 38.109+-2.125 ? 38.397+-2.491 ? stanford-crypto-pbkdf2 72.941+-1.667 72.887+-1.642 stanford-crypto-sha256-iterative 23.511+-1.447 22.296+-0.286 might be 1.0545x faster <arithmetic> 57.902+-0.628 ? 58.141+-0.581 ? might be 1.0041x slower baseline patched Geomean of preferred means: <scaled-result> 18.0084+-0.1055 ? 18.0088+-0.0847 ? might be 1.0000x slower

Yusuke Suzuki

Comment 17 2018-03-22 01:24:54 PDT

Created attachment 336264 [details] Patch

Yusuke Suzuki

Comment 18 2018-03-22 04:58:58 PDT

Created attachment 336271 [details] Patch

Yusuke Suzuki

Comment 19 2018-03-22 05:02:23 PDT

*** Bug 149713 has been marked as a duplicate of this bug. ***

Yusuke Suzuki

Comment 20 2018-03-22 05:09:46 PDT

Created attachment 336272 [details] Patch

Yusuke Suzuki

Comment 21 2018-03-22 05:27:08 PDT

Created attachment 336275 [details] Patch

Mark Lam

Comment 22 2018-03-24 14:16:15 PDT

Comment on attachment 336275 [details] Patch r=me

Yusuke Suzuki

Comment 23 2018-03-24 14:43:28 PDT

Committed r229957: <https://trac.webkit.org/changeset/229957>

Saam Barati

Comment 24 2018-03-25 22:17:32 PDT

Comment on attachment 336275 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=336275&action=review > Source/JavaScriptCore/ChangeLog:18 > + 3. It offers the way to fuse check and jump in DFG code generation. Since > + we have MovHint between Branch and CompareEq/CompareStrictEq previously, > + we cannot do this optimization. It reduces the machine code size in DFG too. I wonder how many bugs this is going to reveal.

Radar WebKit Bug Importer

Comment 25 2018-03-25 22:17:48 PDT

<rdar://problem/38854297>

Attachments
Patch (20.62 KB, patch) 2017-09-20 12:21 PDT, Yusuke Suzuki	no flags	Details Formatted Diff Diff
Archive of layout-test-results from ews117 for mac-elcapitan (660.08 KB, application/zip) 2017-09-20 13:21 PDT, Build Bot	no flags	Details
Patch (33.47 KB, patch) 2018-03-22 01:24 PDT, Yusuke Suzuki	no flags	Details Formatted Diff Diff
Patch (49.77 KB, patch) 2018-03-22 04:58 PDT, Yusuke Suzuki	no flags	Details Formatted Diff Diff
Patch (50.44 KB, patch) 2018-03-22 05:09 PDT, Yusuke Suzuki	no flags	Details Formatted Diff Diff
Patch (55.43 KB, patch) 2018-03-22 05:27 PDT, Yusuke Suzuki	mark.lam: review+	Details Formatted Diff Diff
Show Obsolete (5) View All Add attachment proposed patch, testcase, etc.