Summary: | SSE optimization for FFTFrameFFMPEG.cpp::multiply() | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Xingnan Wang <xingnan.wang> | ||||||||||||
Component: | Web Audio | Assignee: | Nobody <webkit-unassigned> | ||||||||||||
Status: | RESOLVED FIXED | ||||||||||||||
Severity: | Normal | CC: | crogers, dglazkov, hclam, kbr, webkit.review.bot | ||||||||||||
Priority: | P2 | ||||||||||||||
Version: | 528+ (Nightly build) | ||||||||||||||
Hardware: | Unspecified | ||||||||||||||
OS: | Unspecified | ||||||||||||||
Attachments: |
|
Description
Xingnan Wang
2011-12-19 00:10:33 PST
Created attachment 119829 [details]
Patch
Thanks for doing this. You got to it first before I could recommend it to you! One thing I would suggest is to take this code out of FFTFrameFFMPEG and move it to a place where other FFTFrame implementations could potentially use it (or other code needing complex multiply). Maybe it makes sense to make a function for this in VectorMath? In FFTFrameMac.cpp there's a call to a vecLib function called vDSP_zvmul(), so zvmul()?? (In reply to comment #2) > Thanks for doing this. You got to it first before I could recommend it to you! > > One thing I would suggest is to take this code out of FFTFrameFFMPEG and move it to a place where other FFTFrame implementations could potentially use it (or other code needing complex multiply). Maybe it makes sense to make a function for this in VectorMath? In FFTFrameMac.cpp there's a call to a vecLib function called vDSP_zvmul(), so zvmul()?? Hi Roger, I agree that we should add a zvmul into VectorMath, and in my mind it`s also very meaning to add more basic vector calculation functions such as vDSP_deq22D in Biquad.cpp to VectorMath and thus we can get more benefits from SSE optimization in audio processing. If so I am very glad to contribute more such optimization codes to enlarge the VectorMath. For the zvmul, I would like to add two with different types: //1. To be consistent with vDSP_zvmul void zvmul(const float* source1P, int sourceStride1, const float* source2P, int sourceStride2, float* destP, int destStride, size_t framesToProcess, int conjugate); //2. Used in FFTFrame, avoiding splitting and merging of complex_data void zvmul(const float* real1P, const float* imag1P, const float* real2P, const* imag2P, float* realDestP, float* imagDestP, size_t framesToProcess); They have different implementations of SSE optimization. If the two function`s definitions are OK, patch coming soon. Thanks for any comments. great to hear about the Biquad optimization! I'll have to look more closely, but I'm pretty sure we only need the zvmul() version which operates on non-interleaved (planar) complex data like we use in FFTFrameFFMPEG. I *thought* that was the same as the one used in FFTFrameMac. (In reply to comment #4) > great to hear about the Biquad optimization! > Now I am doing some work on the Biquad optimization, I will submit another patch when finishing it. > I'll have to look more closely, but I'm pretty sure we only need the zvmul() version which operates on non-interleaved (planar) complex data like we use in FFTFrameFFMPEG. I *thought* that was the same as the one used in FFTFrameMac. Yes you are right , sorry for misunderstanding the vDSP_zvmul, which surely uses non-interleaved complex data. I`ll update the patch soon. BTW, do you think it makes sense to add two functions like vDSP_ctoz and vDSP_ztoc in VectorMath? Maybe we can improve the performance of packing and unpacking complex data after some optimization on them. vDSP_ctoz and vDSP_ztoc might be useful. But the FFT algorithm itself is *really* the hotspot. Do you think it would make sense to create an FFTFrameIPP which calls into an appropriate Intel IPP library FFT function? We already have one for MKL (FFTFrameMKL), but I've since learned that the IPP library may be more appropriate for media/audio applications. (In reply to comment #6) > vDSP_ctoz and vDSP_ztoc might be useful. But the FFT algorithm itself is *really* the hotspot. Do you think it would make sense to create an FFTFrameIPP which calls into an appropriate Intel IPP library FFT function? We already have one for MKL (FFTFrameMKL), but I've since learned that the IPP library may be more appropriate for media/audio applications. Agree, IPP is more suitable than MKL for FFT in audio processing and it makes sense to leverage the IPP to do FFT. So I could have a try to add an FFTFrameIPP and enable the library. Created attachment 120142 [details]
Patch
Patch updated as comments.
Add zvmul in VectorMath and use zvmul and vsmul in multiply().
Thanks.
Comment on attachment 120142 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=120142&action=review Looks good overall, with a few comments: > Source/WebCore/platform/audio/VectorMath.cpp:301 > +void zvmul(const float* real1P, const float* imag1P, const float* real2P, const float* imag2P, float* realDestP, float* imagDestP, size_t framesToProcess) We need to provide a version of this function in the DARWIN section using vDSP_zvmul. It shouldn't be too hard. You can look in FFTFrameMac.cpp for an example usage. It will involve using three local variables of type "DSPSplitComplex" /* Complex-split vector multiply, single-precision.*/ /* * vDSP_zvmul() * * Availability: * Mac OS X: in version 10.0 and later in vecLib.framework * CarbonLib: not in Carbon, but vecLib is compatible with CarbonLib * Non-Carbon CFM: in vecLib 1.0 and later */ extern void vDSP_zvmul( const DSPSplitComplex *__vDSP_input1, vDSP_Stride __vDSP_stride1, const DSPSplitComplex *__vDSP_input2, vDSP_Stride __vDSP_stride2, const DSPSplitComplex *__vDSP_result, vDSP_Stride __vDSP_strideResult, vDSP_Length __vDSP_size, int __vDSP_conjugate) AVAILABLE_MAC_OS_X_VERSION_10_0_AND_LATER; > Source/WebCore/platform/audio/VectorMath.cpp:305 > + // Only handle the very common case that all addresses are 16-byte aligned. I would elaborate on this comment a little by saying something like: // Only use the SSE optimization in the very common case that all addresses are 16-byte aligned. // Otherwise, fall through to the scalar code below. As the comment is right now, it seems to imply that we don't handle "at all" the non-aligned case. > Source/WebCore/platform/audio/ffmpeg/FFTFrameFFMPEG.cpp:-138 > - } I would restore the comment: // Multiply the packed DC/nyquist component Created attachment 120259 [details]
Patch
Update the patch as comments.
Change the FFTFrameMac::multiply() to the same way of FFTFrameFFMPEG, due to the real1P, imag1P, real2P and imag2P are all 16-byte aligned and it`s better to start process from 0 of buffer than 1.
Comment on attachment 120259 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=120259&action=review > Source/WebCore/platform/audio/VectorMath.cpp:79 > +#else You need to invoke ::zvmul exactly with the same arguments as vDSP_zvmul() and move lines 77:79 down to just after line 88 In other words ::zvmul() also needs to take DSPSplitComplex arguments (as opposed to our new VectorMath::zvmul() function...) > Source/WebCore/platform/audio/ffmpeg/FFTFrameFFMPEG.cpp:122 > float scale = 0.5f; Please move lines 121:122 to just before first use, which is now just before line 134 > Source/WebCore/platform/audio/mac/FFTFrameMac.cpp:113 > float scale = 0.5f; Please move lines 112:113 to just right before first use, which is now just before line 126 Comment on attachment 120259 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=120259&action=review >> Source/WebCore/platform/audio/VectorMath.cpp:79 >> +#else > > You need to invoke ::zvmul exactly with the same arguments as vDSP_zvmul() > > and move lines 77:79 down to just after line 88 > > In other words ::zvmul() also needs to take DSPSplitComplex arguments (as opposed to our new VectorMath::zvmul() function...) Not quite understand it, do you mean that the ::zvmul should be re-defined as VDSP_zvmul and import DSPSplitComplex into non-OS(DARWIN)? or add another zvmul definition below in VectorMath.h? #if OS(DARWIN) void vDSP_zvmul(const DSPSplitComplex *, vDSP_Stride ,const DSPSplitComplex *, vDSP_Stride, const DSPSplitComplex *, vDSP_Stride, vDSP_Length , int ); #endif (In reply to comment #12) > Not quite understand it, do you mean that the ::zvmul should be re-defined as VDSP_zvmul and import DSPSplitComplex into non-OS(DARWIN)? > or > add another zvmul definition below in VectorMath.h? > #if OS(DARWIN) > void vDSP_zvmul(const DSPSplitComplex *, vDSP_Stride ,const DSPSplitComplex *, vDSP_Stride, const DSPSplitComplex *, vDSP_Stride, vDSP_Length , int ); > #endif It's easier if I just show you how the function should be defined: void zvmul(const float* real1P, const float* imag1P, const float* real2P, const float* imag2P, float* realDestP, float* imagDestP, size_t framesToProcess) { DSPSplitComplex sc1; DSPSplitComplex sc2; DSPSplitComplex dest; sc1.realp = real1P; sc1.imagp = imag1P; sc2.realp = real2P; sc2.imagp = imag2P; dest.realp = realDestP; dest.imagp = imagDestP; #if defined(__ppc__) || defined(__i386__) ::zvmul(&sc1, 1, &sc2, 1, &dest, 1, framesToProcess, 1); #else vDSP_zvmul(&sc1, 1, &sc2, 1, &dest, 1, framesToProcess, 1); #endif } It's too bad we have to do this. It's really just a hack/work-around for the way some macros are defined in <Accelerate/Accelerat.h> The reason is explained in this comment in VectorMath.cpp: // On the Mac we use the highly optimized versions in Accelerate.framework // In 32-bit mode (__ppc__ or __i386__) <Accelerate/Accelerate.h> includes <vecLib/vDSP_translate.h> which defines macros of the same name as // our namespaced function names, so we must handle this case differently. Other architectures (64bit, ARM, etc.) do not include this header file. Created attachment 120270 [details]
Patch
(In reply to comment #14) > It's too bad we have to do this. It's really just a hack/work-around for the way some macros are defined in <Accelerate/Accelerat.h> The reason is explained in this comment in VectorMath.cpp: > > // On the Mac we use the highly optimized versions in Accelerate.framework > // In 32-bit mode (__ppc__ or __i386__) <Accelerate/Accelerate.h> includes <vecLib/vDSP_translate.h> which defines macros of the same name as > // our namespaced function names, so we must handle this case differently. Other architectures (64bit, ARM, etc.) do not include this header file. Got it. Thanks for explanation. Patch is updated. Comment on attachment 120270 [details] Patch Attachment 120270 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/10986368 New failing tests: fast/workers/storage/use-same-database-in-page-and-workers.html (In reply to comment #17) > (From update of attachment 120270 [details]) > Attachment 120270 [details] did not pass chromium-ews (chromium-xvfb): > Output: http://queues.webkit.org/results/10986368 > > New failing tests: > fast/workers/storage/use-same-database-in-page-and-workers.html Hi Roger, My patch is failed to pass the tests in buildbots, but I am confused that these tests seems not have not too much relation with the change from my patch. I am not familiar with the chromium test framework so could you help me to check what`s the root cause of the building failure? Thank you very much. It's unlikely that it's related to your patch, and is probably some other recent change. You can try re-uploading your patch to have it try again. Created attachment 121031 [details]
Patch
Hi Roger,
I re-upload the patch, please help to submit to commit-queue again, thank you~
Looks good. Comment on attachment 121031 [details]
Patch
rs=me based on Chris's review.
Comment on attachment 121031 [details] Patch Clearing flags on attachment: 121031 Committed r104143: <http://trac.webkit.org/changeset/104143> All reviewed patches have been landed. Closing bug. Looks like this patch breaks the tree. See the following compilation error: /Volumes/Data/slave/lion-intel-leaks/build/Source/WebCore/platform/audio/VectorMath.cpp:80:15:{80:17-80:23}: error: assigning to 'float *' from incompatible type 'const float *' [3] sc1.realp = real1P; ^ ~~~~~~ /Volumes/Data/slave/lion-intel-leaks/build/Source/WebCore/platform/audio/VectorMath.cpp:81:15:{81:17-81:23}: error: assigning to 'float *' from incompatible type 'const float *' [3] sc1.imagp = imag1P; ^ ~~~~~~ /Volumes/Data/slave/lion-intel-leaks/build/Source/WebCore/platform/audio/VectorMath.cpp:82:15:{82:17-82:23}: error: assigning to 'float *' from incompatible type 'const float *' [3] sc2.realp = real2P; ^ ~~~~~~ /Volumes/Data/slave/lion-intel-leaks/build/Source/WebCore/platform/audio/VectorMath.cpp:83:15:{83:17-83:23}: error: assigning to 'float *' from incompatible type 'const float *' [3] sc2.imagp = imag2P; ^ ~~~~~~ Any fix for this? (In reply to comment #25) > Looks like this patch breaks the tree. See the following compilation error: > > /Volumes/Data/slave/lion-intel-leaks/build/Source/WebCore/platform/audio/VectorMath.cpp:80:15:{80:17-80:23}: error: assigning to 'float *' from incompatible type 'const float *' [3] > sc1.realp = real1P; > ^ ~~~~~~ > /Volumes/Data/slave/lion-intel-leaks/build/Source/WebCore/platform/audio/VectorMath.cpp:81:15:{81:17-81:23}: error: assigning to 'float *' from incompatible type 'const float *' [3] > sc1.imagp = imag1P; > ^ ~~~~~~ > /Volumes/Data/slave/lion-intel-leaks/build/Source/WebCore/platform/audio/VectorMath.cpp:82:15:{82:17-82:23}: error: assigning to 'float *' from incompatible type 'const float *' [3] > sc2.realp = real2P; > ^ ~~~~~~ > /Volumes/Data/slave/lion-intel-leaks/build/Source/WebCore/platform/audio/VectorMath.cpp:83:15:{83:17-83:23}: error: assigning to 'float *' from incompatible type 'const float *' [3] > sc2.imagp = imag2P; > ^ ~~~~~~ > > Any fix for this? Alpha: thanks for patching up these compilation failures in http://trac.webkit.org/changeset/104147 . (In reply to comment #25) > Looks like this patch breaks the tree. See the following compilation error: > > /Volumes/Data/slave/lion-intel-leaks/build/Source/WebCore/platform/audio/VectorMath.cpp:80:15:{80:17-80:23}: error: assigning to 'float *' from incompatible type 'const float *' [3] > sc1.realp = real1P; > ^ ~~~~~~ > /Volumes/Data/slave/lion-intel-leaks/build/Source/WebCore/platform/audio/VectorMath.cpp:81:15:{81:17-81:23}: error: assigning to 'float *' from incompatible type 'const float *' [3] > sc1.imagp = imag1P; > ^ ~~~~~~ > /Volumes/Data/slave/lion-intel-leaks/build/Source/WebCore/platform/audio/VectorMath.cpp:82:15:{82:17-82:23}: error: assigning to 'float *' from incompatible type 'const float *' [3] > sc2.realp = real2P; > ^ ~~~~~~ > /Volumes/Data/slave/lion-intel-leaks/build/Source/WebCore/platform/audio/VectorMath.cpp:83:15:{83:17-83:23}: error: assigning to 'float *' from incompatible type 'const float *' [3] > sc2.imagp = imag2P; > ^ ~~~~~~ > > Any fix for this? Thanks for your patch. |