216101 – Make TransformationMatrix::inverse() faster in the case of affine transformation matrices

RESOLVED FIXED 216101

Make TransformationMatrix::inverse() faster in the case of affine transformation matrices

https://bugs.webkit.org/show_bug.cgi?id=216101

Summary Make TransformationMatrix::inverse() faster in the case of affine transformat...

Wenson Hsieh

Reported 2020-09-02 14:53:31 PDT

This function currently computes the inverse transformation using a generic algorithm for 4x4 matrices.

Attachments
Patch (5.60 KB, patch) 2020-09-02 15:23 PDT, Wenson Hsieh	no flags	Details Formatted Diff Diff
Patch (6.43 KB, patch) 2020-09-02 19:36 PDT, Wenson Hsieh	no flags	Details Formatted Diff Diff
Show Obsolete (1) View All Add attachment proposed patch, testcase, etc.

Wenson Hsieh

Comment 1 2020-09-02 15:23:05 PDT

Created attachment 407822 [details] Patch

Sam Weinig

Comment 2 2020-09-02 16:12:46 PDT

Comment on attachment 407822 [details] Patch Fun. Our of curiosity, did you try / benchmark using something like simd_inverse(simd_double4x4) (or try rolling your own simd inverse, there are few good resources on this - https://lxjk.github.io/2017/09/03/Fast-4x4-Matrix-Inverse-with-SSE-SIMD-Explained.html, https://github.com/niswegmann/small-matrix-inverse) for the general case?

Wenson Hsieh

Comment 3 2020-09-02 16:26:43 PDT

(In reply to Sam Weinig from comment #2) > Comment on attachment 407822 [details] > Patch > > Fun. Our of curiosity, did you try / benchmark using something like > simd_inverse(simd_double4x4) (or try rolling your own simd inverse, there > are few good resources on this - > https://lxjk.github.io/2017/09/03/Fast-4x4-Matrix-Inverse-with-SSE-SIMD- > Explained.html, https://github.com/niswegmann/small-matrix-inverse) for the > general case? Ah, so I did look into doing something similar to this briefly (like we do in TransformationMatrix::multiply), but decided to stick with this because it was platform/architecture-agnostic, simpler to implement, and non-affine transformation matrices are relatively uncommon vs. affine transformation matrices. That said, as I continue hunting for progression opportunities, I think we can apply these techniques to some other hot codepaths around the vicinity (e.g. TransformationMatrix::map*()).

Sam Weinig

Comment 4 2020-09-02 17:54:38 PDT

(In reply to Wenson Hsieh from comment #3) > (In reply to Sam Weinig from comment #2) > > Comment on attachment 407822 [details] > > Patch > > > > Fun. Our of curiosity, did you try / benchmark using something like > > simd_inverse(simd_double4x4) (or try rolling your own simd inverse, there > > are few good resources on this - > > https://lxjk.github.io/2017/09/03/Fast-4x4-Matrix-Inverse-with-SSE-SIMD- > > Explained.html, https://github.com/niswegmann/small-matrix-inverse) for the > > general case? > > Ah, so I did look into doing something similar to this briefly (like we do > in TransformationMatrix::multiply), but decided to stick with this because > it was platform/architecture-agnostic, simpler to implement, and non-affine > transformation matrices are relatively uncommon vs. affine transformation > matrices. > > That said, as I continue hunting for progression opportunities, I think we > can apply these techniques to some other hot codepaths around the vicinity > (e.g. TransformationMatrix::map*()). Ok. We are going to need to start thinking more about how we want to work with simd types in WebCore over time, as we are definitely leaving perf on the table. Doing what TransformationMatrix::multiply() does (inlining #ifdef'd assembly and intrinsics) is probably not a very scalable or reusable solution. I think we will will likely want to build out a bunch of primitives (similar to what Darwin's simd provides) and have good fallback. More related to this change, can the new logic be used to optimize TransformationMatrix::isInvertible() for the affine case?

Wenson Hsieh

Comment 5 2020-09-02 18:29:16 PDT

(In reply to Sam Weinig from comment #4) > (In reply to Wenson Hsieh from comment #3) > > (In reply to Sam Weinig from comment #2) > > > Comment on attachment 407822 [details] > > > Patch > > > > > > Fun. Our of curiosity, did you try / benchmark using something like > > > simd_inverse(simd_double4x4) (or try rolling your own simd inverse, there > > > are few good resources on this - > > > https://lxjk.github.io/2017/09/03/Fast-4x4-Matrix-Inverse-with-SSE-SIMD- > > > Explained.html, https://github.com/niswegmann/small-matrix-inverse) for the > > > general case? > > > > Ah, so I did look into doing something similar to this briefly (like we do > > in TransformationMatrix::multiply), but decided to stick with this because > > it was platform/architecture-agnostic, simpler to implement, and non-affine > > transformation matrices are relatively uncommon vs. affine transformation > > matrices. > > > > That said, as I continue hunting for progression opportunities, I think we > > can apply these techniques to some other hot codepaths around the vicinity > > (e.g. TransformationMatrix::map*()). > > Ok. We are going to need to start thinking more about how we want to work > with simd types in WebCore over time, as we are definitely leaving perf on > the table. Doing what TransformationMatrix::multiply() does (inlining > #ifdef'd assembly and intrinsics) is probably not a very scalable or > reusable solution. I think we will will likely want to build out a bunch of > primitives (similar to what Darwin's simd provides) and have good fallback. This sounds great! I plan to experiment more with this. > > More related to this change, can the new logic be used to optimize > TransformationMatrix::isInvertible() for the affine case? Good point — I think that is very likely, though this only showed up a little bit in traces. I suspect that is because it only computes the determinant and not the adjoint matrix, the latter of which contains the bulk of the expense (~720 floating point multiplies and adds/subtracts, and the former with about 187). I should be able to bring this down to just 2 multiplications and 1 subtraction for the affine transform case. Some quick experiments showed that it is called ~7-8 million times in this test (roughly as many times as invert()), so it seems like it would be worth optimizing as well. I’ll do some perf testing and see!

Wenson Hsieh

Comment 6 2020-09-02 19:29:04 PDT

(In reply to Wenson Hsieh from comment #5) > (In reply to Sam Weinig from comment #4) > > (In reply to Wenson Hsieh from comment #3) > > > (In reply to Sam Weinig from comment #2) > > > > Comment on attachment 407822 [details] > > > > Patch > > > > > > > > Fun. Our of curiosity, did you try / benchmark using something like > > > > simd_inverse(simd_double4x4) (or try rolling your own simd inverse, there > > > > are few good resources on this - > > > > https://lxjk.github.io/2017/09/03/Fast-4x4-Matrix-Inverse-with-SSE-SIMD- > > > > Explained.html, https://github.com/niswegmann/small-matrix-inverse) for the > > > > general case? > > > > > > Ah, so I did look into doing something similar to this briefly (like we do > > > in TransformationMatrix::multiply), but decided to stick with this because > > > it was platform/architecture-agnostic, simpler to implement, and non-affine > > > transformation matrices are relatively uncommon vs. affine transformation > > > matrices. > > > > > > That said, as I continue hunting for progression opportunities, I think we > > > can apply these techniques to some other hot codepaths around the vicinity > > > (e.g. TransformationMatrix::map*()). > > > > Ok. We are going to need to start thinking more about how we want to work > > with simd types in WebCore over time, as we are definitely leaving perf on > > the table. Doing what TransformationMatrix::multiply() does (inlining > > #ifdef'd assembly and intrinsics) is probably not a very scalable or > > reusable solution. I think we will will likely want to build out a bunch of > > primitives (similar to what Darwin's simd provides) and have good fallback. > > This sounds great! I plan to experiment more with this. > > > > > More related to this change, can the new logic be used to optimize > > TransformationMatrix::isInvertible() for the affine case? > > Good point — I think that is very likely, though this only showed up a > little bit in traces. I suspect that is because it only computes the > determinant and not the adjoint matrix, the latter of which contains the > bulk of the expense (~720 floating point multiplies and adds/subtracts, and > the former with about 187). I should be able to bring this down to just 2 > multiplications and 1 subtraction for the affine transform case. > > Some quick experiments showed that it is called ~7-8 million times in this > test (roughly as many times as invert()), so it seems like it would be worth > optimizing as well. I’ll do some perf testing and see! Results are more or less the same compared to just optimizing TransformationMatrix::inverse(), but I’ll include isInvertible in this patch as well, as it does seem to reduce the amount of time spent in that function (from looking at resulting artraces).

Wenson Hsieh

Comment 7 2020-09-02 19:36:50 PDT

Created attachment 407851 [details] Patch

EWS

Comment 8 2020-09-03 07:20:30 PDT

Committed r266513: <https://trac.webkit.org/changeset/266513> All reviewed patches have been landed. Closing bug and clearing flags on attachment 407851 [details].

Radar WebKit Bug Importer

Comment 9 2020-09-03 07:21:18 PDT

<rdar://problem/68277416>

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version WebKit Nightly Build

Hardware Unspecified

OS Unspecified

Product WebKit

Component WebCore Misc.

Assignee

Wenson Hsieh

Reported

2020-09-02 14:53 PDT

Modified

2020-09-03 07:21 PDT History

CC List

6 users Show

URL

Keywords InRadar

Depends on

Blocks