Bug 219546

Summary: Improve vectorization in SincResampler
Product: WebKit Reporter: Chris Dumez <cdumez>
Component: Web AudioAssignee: Chris Dumez <cdumez>
Status: RESOLVED FIXED    
Severity: Normal CC: cdumez, darin, eric.carlson, ews-watchlist, ggaren, glenn, jer.noble, philipj, psaavedra, sergio, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Patch
none
Patch none

Description Chris Dumez 2020-12-04 11:29:39 PST
Improve vectorization in SincResampler.
Comment 1 Chris Dumez 2020-12-04 11:42:49 PST
Created attachment 415438 [details]
Patch
Comment 2 Chris Dumez 2020-12-04 12:12:20 PST
Created attachment 415442 [details]
Patch
Comment 3 Darin Adler 2020-12-04 12:24:01 PST
Comment on attachment 415442 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=415442&action=review

How did you test all four versions?

> Source/WebCore/platform/audio/SincResampler.cpp:348
> +    vDSP_dotpr(inputP, 1, k1, 1, &sum1, kernelSize);
> +    vDSP_dotpr(inputP, 1, k2, 1, &sum2, kernelSize);

Is there a reason that we don’t write a 4 different implementations of a "dot product" function instead of having the algorithm differ a level above that?

> Source/WebCore/platform/audio/SincResampler.cpp:351
> +    return static_cast<float>((1.0 - kernelInterpolationFactor) * sum1 + kernelInterpolationFactor * sum2);

Is it important do do this as double and then convert back to float? I notice we do that in both versions.
Comment 4 Chris Dumez 2020-12-04 12:32:40 PST
(In reply to Darin Adler from comment #3)
> Comment on attachment 415442 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=415442&action=review
> 
> How did you test all four versions?

I have an Intel Mac so I was able to test ACCELERATE, SSE and slow implementations.
The only implementation I could not test was NEON but this implementation is a straight import from Chromium.

CPU(ARM_NEON) does not appear to be true on iPhone unless I tested wrong.

> 
> > Source/WebCore/platform/audio/SincResampler.cpp:348
> > +    vDSP_dotpr(inputP, 1, k1, 1, &sum1, kernelSize);
> > +    vDSP_dotpr(inputP, 1, k2, 1, &sum2, kernelSize);
> 
> Is there a reason that we don’t write a 4 different implementations of a
> "dot product" function instead of having the algorithm differ a level above
> that?

I guess we could do that. I was trying to stay in sync with Chromium here. ACCELERATE() is the only implementation that is not present in Chromium but that I chose to add for better performance on Cocoa platforms.

> 
> > Source/WebCore/platform/audio/SincResampler.cpp:351
> > +    return static_cast<float>((1.0 - kernelInterpolationFactor) * sum1 + kernelInterpolationFactor * sum2);
> 
> Is it important do do this as double and then convert back to float? I
> notice we do that in both versions.

Probably not since our vectorized versions only work with floats. I will test and fix.
Comment 5 Chris Dumez 2020-12-04 13:11:27 PST
Created attachment 415453 [details]
Patch
Comment 6 EWS 2020-12-04 14:21:45 PST
Committed r270457: <https://trac.webkit.org/changeset/270457>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 415453 [details].
Comment 7 Radar WebKit Bug Importer 2020-12-04 14:22:19 PST
<rdar://problem/71993191>
Comment 8 Pablo Saavedra 2020-12-09 03:45:44 PST
Maybe related: https://bugs.webkit.org/show_bug.cgi?id=219676?