Consider that we have a program like:
return a + b
Currently we'll emit code like:
movl %ecx, %eax
addl %edx, %eax
Note the movl. That's there because the OSR exit slow path will want both a and b, and so the result of the addition must go into a different register than either of the inputs.
But we could fix that if the OSR exit path undid the addition:
%result = @llvm.sadd.with.overflow(%a, %b)
if (extract %result, 1)
    %sum = extract %result, 0
    exit(%a, %sum - %a)
In this case, we can do an addl that destroys %b.
I've written some code at the FTL lowering level that does this, and it does remove a bunch of movls from the code for V8v7/crypto's am3() function, but it's not an overall speed-up. I suspect this optimization would be better done inside LLVM; doing it at LLVM IR generation time is pretty awkward.
Created attachment 219271
I don't think this works yet.
*** This bug has been marked as a duplicate of bug 126545 ***