This function currently computes the inverse transformation using a generic algorithm for 4x4 matrices.

Created attachment 407822 [details] Patch

Comment on attachment 407822 [details] Patch Fun. Our of curiosity, did you try / benchmark using something like simd_inverse(simd_double4x4) (or try rolling your own simd inverse, there are few good resources on this - https://lxjk.github.io/2017/09/03/Fast-4x4-Matrix-Inverse-with-SSE-SIMD-Explained.html, https://github.com/niswegmann/small-matrix-inverse) for the general case?

(In reply to Sam Weinig from comment #2) > Comment on attachment 407822 [details] > Patch > > Fun. Our of curiosity, did you try / benchmark using something like > simd_inverse(simd_double4x4) (or try rolling your own simd inverse, there > are few good resources on this - > https://lxjk.github.io/2017/09/03/Fast-4x4-Matrix-Inverse-with-SSE-SIMD- > Explained.html, https://github.com/niswegmann/small-matrix-inverse) for the > general case? Ah, so I did look into doing something similar to this briefly (like we do in TransformationMatrix::multiply), but decided to stick with this because it was platform/architecture-agnostic, simpler to implement, and non-affine transformation matrices are relatively uncommon vs. affine transformation matrices. That said, as I continue hunting for progression opportunities, I think we can apply these techniques to some other hot codepaths around the vicinity (e.g. TransformationMatrix::map*()).

(In reply to Wenson Hsieh from comment #3) > (In reply to Sam Weinig from comment #2) > > Comment on attachment 407822 [details] > > Patch > > > > Fun. Our of curiosity, did you try / benchmark using something like > > simd_inverse(simd_double4x4) (or try rolling your own simd inverse, there > > are few good resources on this - > > https://lxjk.github.io/2017/09/03/Fast-4x4-Matrix-Inverse-with-SSE-SIMD- > > Explained.html, https://github.com/niswegmann/small-matrix-inverse) for the > > general case? > > Ah, so I did look into doing something similar to this briefly (like we do > in TransformationMatrix::multiply), but decided to stick with this because > it was platform/architecture-agnostic, simpler to implement, and non-affine > transformation matrices are relatively uncommon vs. affine transformation > matrices. > > That said, as I continue hunting for progression opportunities, I think we > can apply these techniques to some other hot codepaths around the vicinity > (e.g. TransformationMatrix::map*()). Ok. We are going to need to start thinking more about how we want to work with simd types in WebCore over time, as we are definitely leaving perf on the table. Doing what TransformationMatrix::multiply() does (inlining #ifdef'd assembly and intrinsics) is probably not a very scalable or reusable solution. I think we will will likely want to build out a bunch of primitives (similar to what Darwin's simd provides) and have good fallback. More related to this change, can the new logic be used to optimize TransformationMatrix::isInvertible() for the affine case?

(In reply to Sam Weinig from comment #4) > (In reply to Wenson Hsieh from comment #3) > > (In reply to Sam Weinig from comment #2) > > > Comment on attachment 407822 [details] > > > Patch > > > > > > Fun. Our of curiosity, did you try / benchmark using something like > > > simd_inverse(simd_double4x4) (or try rolling your own simd inverse, there > > > are few good resources on this - > > > https://lxjk.github.io/2017/09/03/Fast-4x4-Matrix-Inverse-with-SSE-SIMD- > > > Explained.html, https://github.com/niswegmann/small-matrix-inverse) for the > > > general case? > > > > Ah, so I did look into doing something similar to this briefly (like we do > > in TransformationMatrix::multiply), but decided to stick with this because > > it was platform/architecture-agnostic, simpler to implement, and non-affine > > transformation matrices are relatively uncommon vs. affine transformation > > matrices. > > > > That said, as I continue hunting for progression opportunities, I think we > > can apply these techniques to some other hot codepaths around the vicinity > > (e.g. TransformationMatrix::map*()). > > Ok. We are going to need to start thinking more about how we want to work > with simd types in WebCore over time, as we are definitely leaving perf on > the table. Doing what TransformationMatrix::multiply() does (inlining > #ifdef'd assembly and intrinsics) is probably not a very scalable or > reusable solution. I think we will will likely want to build out a bunch of > primitives (similar to what Darwin's simd provides) and have good fallback. This sounds great! I plan to experiment more with this. > > More related to this change, can the new logic be used to optimize > TransformationMatrix::isInvertible() for the affine case? Good point — I think that is very likely, though this only showed up a little bit in traces. I suspect that is because it only computes the determinant and not the adjoint matrix, the latter of which contains the bulk of the expense (~720 floating point multiplies and adds/subtracts, and the former with about 187). I should be able to bring this down to just 2 multiplications and 1 subtraction for the affine transform case. Some quick experiments showed that it is called ~7-8 million times in this test (roughly as many times as invert()), so it seems like it would be worth optimizing as well. I’ll do some perf testing and see!

(In reply to Wenson Hsieh from comment #5) > (In reply to Sam Weinig from comment #4) > > (In reply to Wenson Hsieh from comment #3) > > > (In reply to Sam Weinig from comment #2) > > > > Comment on attachment 407822 [details] > > > > Patch > > > > > > > > Fun. Our of curiosity, did you try / benchmark using something like > > > > simd_inverse(simd_double4x4) (or try rolling your own simd inverse, there > > > > are few good resources on this - > > > > https://lxjk.github.io/2017/09/03/Fast-4x4-Matrix-Inverse-with-SSE-SIMD- > > > > Explained.html, https://github.com/niswegmann/small-matrix-inverse) for the > > > > general case? > > > > > > Ah, so I did look into doing something similar to this briefly (like we do > > > in TransformationMatrix::multiply), but decided to stick with this because > > > it was platform/architecture-agnostic, simpler to implement, and non-affine > > > transformation matrices are relatively uncommon vs. affine transformation > > > matrices. > > > > > > That said, as I continue hunting for progression opportunities, I think we > > > can apply these techniques to some other hot codepaths around the vicinity > > > (e.g. TransformationMatrix::map*()). > > > > Ok. We are going to need to start thinking more about how we want to work > > with simd types in WebCore over time, as we are definitely leaving perf on > > the table. Doing what TransformationMatrix::multiply() does (inlining > > #ifdef'd assembly and intrinsics) is probably not a very scalable or > > reusable solution. I think we will will likely want to build out a bunch of > > primitives (similar to what Darwin's simd provides) and have good fallback. > > This sounds great! I plan to experiment more with this. > > > > > More related to this change, can the new logic be used to optimize > > TransformationMatrix::isInvertible() for the affine case? > > Good point — I think that is very likely, though this only showed up a > little bit in traces. I suspect that is because it only computes the > determinant and not the adjoint matrix, the latter of which contains the > bulk of the expense (~720 floating point multiplies and adds/subtracts, and > the former with about 187). I should be able to bring this down to just 2 > multiplications and 1 subtraction for the affine transform case. > > Some quick experiments showed that it is called ~7-8 million times in this > test (roughly as many times as invert()), so it seems like it would be worth > optimizing as well. I’ll do some perf testing and see! Results are more or less the same compared to just optimizing TransformationMatrix::inverse(), but I’ll include isInvertible in this patch as well, as it does seem to reduce the amount of time spent in that function (from looking at resulting artraces).

Created attachment 407851 [details] Patch

Committed r266513: <https://trac.webkit.org/changeset/266513> All reviewed patches have been landed. Closing bug and clearing flags on attachment 407851 [details].

<rdar://problem/68277416>