Bug 216101 - Make TransformationMatrix::inverse() faster in the case of affine transformation matrices
Summary: Make TransformationMatrix::inverse() faster in the case of affine transformat...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Wenson Hsieh
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2020-09-02 14:53 PDT by Wenson Hsieh
Modified: 2020-09-03 07:21 PDT (History)
6 users (show)

See Also:


Attachments
Patch (5.60 KB, patch)
2020-09-02 15:23 PDT, Wenson Hsieh
no flags Details | Formatted Diff | Diff
Patch (6.43 KB, patch)
2020-09-02 19:36 PDT, Wenson Hsieh
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Wenson Hsieh 2020-09-02 14:53:31 PDT
This function currently computes the inverse transformation using a generic algorithm for 4x4 matrices.
Comment 1 Wenson Hsieh 2020-09-02 15:23:05 PDT
Created attachment 407822 [details]
Patch
Comment 2 Sam Weinig 2020-09-02 16:12:46 PDT
Comment on attachment 407822 [details]
Patch

Fun. Our of curiosity, did you try / benchmark using something like simd_inverse(simd_double4x4) (or try rolling your own simd inverse, there are few good resources on this - https://lxjk.github.io/2017/09/03/Fast-4x4-Matrix-Inverse-with-SSE-SIMD-Explained.html, https://github.com/niswegmann/small-matrix-inverse) for the general case?
Comment 3 Wenson Hsieh 2020-09-02 16:26:43 PDT
(In reply to Sam Weinig from comment #2)
> Comment on attachment 407822 [details]
> Patch
> 
> Fun. Our of curiosity, did you try / benchmark using something like
> simd_inverse(simd_double4x4) (or try rolling your own simd inverse, there
> are few good resources on this -
> https://lxjk.github.io/2017/09/03/Fast-4x4-Matrix-Inverse-with-SSE-SIMD-
> Explained.html, https://github.com/niswegmann/small-matrix-inverse) for the
> general case?

Ah, so I did look into doing something similar to this briefly (like we do in TransformationMatrix::multiply), but decided to stick with this because it was platform/architecture-agnostic, simpler to implement, and non-affine transformation matrices are relatively uncommon vs. affine transformation matrices.

That said, as I continue hunting for progression opportunities, I think we can apply these techniques to some other hot codepaths around the vicinity (e.g. TransformationMatrix::map*()).
Comment 4 Sam Weinig 2020-09-02 17:54:38 PDT
(In reply to Wenson Hsieh from comment #3)
> (In reply to Sam Weinig from comment #2)
> > Comment on attachment 407822 [details]
> > Patch
> > 
> > Fun. Our of curiosity, did you try / benchmark using something like
> > simd_inverse(simd_double4x4) (or try rolling your own simd inverse, there
> > are few good resources on this -
> > https://lxjk.github.io/2017/09/03/Fast-4x4-Matrix-Inverse-with-SSE-SIMD-
> > Explained.html, https://github.com/niswegmann/small-matrix-inverse) for the
> > general case?
> 
> Ah, so I did look into doing something similar to this briefly (like we do
> in TransformationMatrix::multiply), but decided to stick with this because
> it was platform/architecture-agnostic, simpler to implement, and non-affine
> transformation matrices are relatively uncommon vs. affine transformation
> matrices.
> 
> That said, as I continue hunting for progression opportunities, I think we
> can apply these techniques to some other hot codepaths around the vicinity
> (e.g. TransformationMatrix::map*()).

Ok. We are going to need to start thinking more about how we want to work with simd types in WebCore over time, as we are definitely leaving perf on the table. Doing what TransformationMatrix::multiply() does (inlining #ifdef'd assembly and intrinsics) is probably not a very scalable or reusable solution. I think we will will likely want to build out a bunch of primitives (similar to what Darwin's simd provides) and have good fallback. 

More related to this change, can the new logic be used to optimize TransformationMatrix::isInvertible() for the affine case?
Comment 5 Wenson Hsieh 2020-09-02 18:29:16 PDT
(In reply to Sam Weinig from comment #4)
> (In reply to Wenson Hsieh from comment #3)
> > (In reply to Sam Weinig from comment #2)
> > > Comment on attachment 407822 [details]
> > > Patch
> > > 
> > > Fun. Our of curiosity, did you try / benchmark using something like
> > > simd_inverse(simd_double4x4) (or try rolling your own simd inverse, there
> > > are few good resources on this -
> > > https://lxjk.github.io/2017/09/03/Fast-4x4-Matrix-Inverse-with-SSE-SIMD-
> > > Explained.html, https://github.com/niswegmann/small-matrix-inverse) for the
> > > general case?
> > 
> > Ah, so I did look into doing something similar to this briefly (like we do
> > in TransformationMatrix::multiply), but decided to stick with this because
> > it was platform/architecture-agnostic, simpler to implement, and non-affine
> > transformation matrices are relatively uncommon vs. affine transformation
> > matrices.
> > 
> > That said, as I continue hunting for progression opportunities, I think we
> > can apply these techniques to some other hot codepaths around the vicinity
> > (e.g. TransformationMatrix::map*()).
> 
> Ok. We are going to need to start thinking more about how we want to work
> with simd types in WebCore over time, as we are definitely leaving perf on
> the table. Doing what TransformationMatrix::multiply() does (inlining
> #ifdef'd assembly and intrinsics) is probably not a very scalable or
> reusable solution. I think we will will likely want to build out a bunch of
> primitives (similar to what Darwin's simd provides) and have good fallback. 

This sounds great! I plan to experiment more with this.

> 
> More related to this change, can the new logic be used to optimize
> TransformationMatrix::isInvertible() for the affine case?

Good point — I think that is very likely, though this only showed up a little bit in traces. I suspect that is because it only computes the determinant and not the adjoint matrix, the latter of which contains the bulk of the expense (~720 floating point multiplies and adds/subtracts, and the former with about 187). I should be able to bring this down to just 2 multiplications and 1 subtraction for the affine transform case.

Some quick experiments showed that it is called ~7-8 million times in this test (roughly as many times as invert()), so it seems like it would be worth optimizing as well. I’ll do some perf testing and see!
Comment 6 Wenson Hsieh 2020-09-02 19:29:04 PDT
(In reply to Wenson Hsieh from comment #5)
> (In reply to Sam Weinig from comment #4)
> > (In reply to Wenson Hsieh from comment #3)
> > > (In reply to Sam Weinig from comment #2)
> > > > Comment on attachment 407822 [details]
> > > > Patch
> > > > 
> > > > Fun. Our of curiosity, did you try / benchmark using something like
> > > > simd_inverse(simd_double4x4) (or try rolling your own simd inverse, there
> > > > are few good resources on this -
> > > > https://lxjk.github.io/2017/09/03/Fast-4x4-Matrix-Inverse-with-SSE-SIMD-
> > > > Explained.html, https://github.com/niswegmann/small-matrix-inverse) for the
> > > > general case?
> > > 
> > > Ah, so I did look into doing something similar to this briefly (like we do
> > > in TransformationMatrix::multiply), but decided to stick with this because
> > > it was platform/architecture-agnostic, simpler to implement, and non-affine
> > > transformation matrices are relatively uncommon vs. affine transformation
> > > matrices.
> > > 
> > > That said, as I continue hunting for progression opportunities, I think we
> > > can apply these techniques to some other hot codepaths around the vicinity
> > > (e.g. TransformationMatrix::map*()).
> > 
> > Ok. We are going to need to start thinking more about how we want to work
> > with simd types in WebCore over time, as we are definitely leaving perf on
> > the table. Doing what TransformationMatrix::multiply() does (inlining
> > #ifdef'd assembly and intrinsics) is probably not a very scalable or
> > reusable solution. I think we will will likely want to build out a bunch of
> > primitives (similar to what Darwin's simd provides) and have good fallback. 
> 
> This sounds great! I plan to experiment more with this.
> 
> > 
> > More related to this change, can the new logic be used to optimize
> > TransformationMatrix::isInvertible() for the affine case?
> 
> Good point — I think that is very likely, though this only showed up a
> little bit in traces. I suspect that is because it only computes the
> determinant and not the adjoint matrix, the latter of which contains the
> bulk of the expense (~720 floating point multiplies and adds/subtracts, and
> the former with about 187). I should be able to bring this down to just 2
> multiplications and 1 subtraction for the affine transform case.
> 
> Some quick experiments showed that it is called ~7-8 million times in this
> test (roughly as many times as invert()), so it seems like it would be worth
> optimizing as well. I’ll do some perf testing and see!

Results are more or less the same compared to just optimizing TransformationMatrix::inverse(), but I’ll include isInvertible in this patch as well, as it does seem to reduce the amount of time spent in that function (from looking at resulting artraces).
Comment 7 Wenson Hsieh 2020-09-02 19:36:50 PDT
Created attachment 407851 [details]
Patch
Comment 8 EWS 2020-09-03 07:20:30 PDT
Committed r266513: <https://trac.webkit.org/changeset/266513>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 407851 [details].
Comment 9 Radar WebKit Bug Importer 2020-09-03 07:21:18 PDT
<rdar://problem/68277416>