Consider that we have a program like:
return a + b
Currently we'll emit code like:
movl %ecx, %eax
addl %edx, %eax
Note the movl. That's there because the OSR exit slow path will want both a and b, and so the result of the addition must go into a different register than either of the inputs.
But we could fix that if the OSR exit path undid the addition:
%result = @llvm.sadd.with.overflow(%a, %b)
if (extract %result, 1)
    %sum = extract %result, 0
    exit(%a, %sum - %a)
In this case, we can do an addl that destroys %b.
I've written some code at the FTL lowering level that does this, and it does remove a bunch of movls from the code for V8v7/crypto's am3() function, but it's not an overall speed-up. I suspect this optimization would be better done inside LLVM; doing it at LLVM IR generation time is pretty awkward.
Created attachment 219271
I don't think this works yet.
*** This bug has been marked as a duplicate of bug 126545 ***