Full Text Bug Listing

Xingnan Wang

Reported 2011-12-19 00:10:33 PST

We will get a better performance with SSE2 optimization due to it is a vector related multiply in FFTFrameFFMPEG.cpp::multiply().

Xingnan Wang

Comment 1 2011-12-19 00:18:42 PST

Created attachment 119829 [details] Patch

Chris Rogers

Comment 2 2011-12-19 10:31:48 PST

Thanks for doing this. You got to it first before I could recommend it to you! One thing I would suggest is to take this code out of FFTFrameFFMPEG and move it to a place where other FFTFrame implementations could potentially use it (or other code needing complex multiply). Maybe it makes sense to make a function for this in VectorMath? In FFTFrameMac.cpp there's a call to a vecLib function called vDSP_zvmul(), so zvmul()??

Xingnan Wang

Comment 3 2011-12-20 01:44:55 PST

(In reply to comment #2) > Thanks for doing this. You got to it first before I could recommend it to you! > > One thing I would suggest is to take this code out of FFTFrameFFMPEG and move it to a place where other FFTFrame implementations could potentially use it (or other code needing complex multiply). Maybe it makes sense to make a function for this in VectorMath? In FFTFrameMac.cpp there's a call to a vecLib function called vDSP_zvmul(), so zvmul()?? Hi Roger, I agree that we should add a zvmul into VectorMath, and in my mind it`s also very meaning to add more basic vector calculation functions such as vDSP_deq22D in Biquad.cpp to VectorMath and thus we can get more benefits from SSE optimization in audio processing. If so I am very glad to contribute more such optimization codes to enlarge the VectorMath. For the zvmul, I would like to add two with different types: //1. To be consistent with vDSP_zvmul void zvmul(const float* source1P, int sourceStride1, const float* source2P, int sourceStride2, float* destP, int destStride, size_t framesToProcess, int conjugate); //2. Used in FFTFrame, avoiding splitting and merging of complex_data void zvmul(const float* real1P, const float* imag1P, const float* real2P, const* imag2P, float* realDestP, float* imagDestP, size_t framesToProcess); They have different implementations of SSE optimization. If the two function`s definitions are OK, patch coming soon. Thanks for any comments.

Chris Rogers

Comment 4 2011-12-20 10:28:41 PST

great to hear about the Biquad optimization! I'll have to look more closely, but I'm pretty sure we only need the zvmul() version which operates on non-interleaved (planar) complex data like we use in FFTFrameFFMPEG. I *thought* that was the same as the one used in FFTFrameMac.

Xingnan Wang

Comment 5 2011-12-20 18:14:37 PST

(In reply to comment #4) > great to hear about the Biquad optimization! > Now I am doing some work on the Biquad optimization, I will submit another patch when finishing it. > I'll have to look more closely, but I'm pretty sure we only need the zvmul() version which operates on non-interleaved (planar) complex data like we use in FFTFrameFFMPEG. I *thought* that was the same as the one used in FFTFrameMac. Yes you are right , sorry for misunderstanding the vDSP_zvmul, which surely uses non-interleaved complex data. I`ll update the patch soon. BTW, do you think it makes sense to add two functions like vDSP_ctoz and vDSP_ztoc in VectorMath? Maybe we can improve the performance of packing and unpacking complex data after some optimization on them.

Chris Rogers

Comment 6 2011-12-20 18:33:26 PST

vDSP_ctoz and vDSP_ztoc might be useful. But the FFT algorithm itself is *really* the hotspot. Do you think it would make sense to create an FFTFrameIPP which calls into an appropriate Intel IPP library FFT function? We already have one for MKL (FFTFrameMKL), but I've since learned that the IPP library may be more appropriate for media/audio applications.

Xingnan Wang

Comment 7 2011-12-20 19:05:59 PST

(In reply to comment #6) > vDSP_ctoz and vDSP_ztoc might be useful. But the FFT algorithm itself is *really* the hotspot. Do you think it would make sense to create an FFTFrameIPP which calls into an appropriate Intel IPP library FFT function? We already have one for MKL (FFTFrameMKL), but I've since learned that the IPP library may be more appropriate for media/audio applications. Agree, IPP is more suitable than MKL for FFT in audio processing and it makes sense to leverage the IPP to do FFT. So I could have a try to add an FFTFrameIPP and enable the library.

Xingnan Wang

Comment 8 2011-12-20 21:52:57 PST

Created attachment 120142 [details] Patch Patch updated as comments. Add zvmul in VectorMath and use zvmul and vsmul in multiply(). Thanks.

Chris Rogers

Comment 9 2011-12-21 15:44:28 PST

Comment on attachment 120142 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=120142&action=review Looks good overall, with a few comments: > Source/WebCore/platform/audio/VectorMath.cpp:301 > +void zvmul(const float* real1P, const float* imag1P, const float* real2P, const float* imag2P, float* realDestP, float* imagDestP, size_t framesToProcess) We need to provide a version of this function in the DARWIN section using vDSP_zvmul. It shouldn't be too hard. You can look in FFTFrameMac.cpp for an example usage. It will involve using three local variables of type "DSPSplitComplex" /* Complex-split vector multiply, single-precision.*/ /* * vDSP_zvmul() * * Availability: * Mac OS X: in version 10.0 and later in vecLib.framework * CarbonLib: not in Carbon, but vecLib is compatible with CarbonLib * Non-Carbon CFM: in vecLib 1.0 and later */ extern void vDSP_zvmul( const DSPSplitComplex *__vDSP_input1, vDSP_Stride __vDSP_stride1, const DSPSplitComplex *__vDSP_input2, vDSP_Stride __vDSP_stride2, const DSPSplitComplex *__vDSP_result, vDSP_Stride __vDSP_strideResult, vDSP_Length __vDSP_size, int __vDSP_conjugate) AVAILABLE_MAC_OS_X_VERSION_10_0_AND_LATER; > Source/WebCore/platform/audio/VectorMath.cpp:305 > + // Only handle the very common case that all addresses are 16-byte aligned. I would elaborate on this comment a little by saying something like: // Only use the SSE optimization in the very common case that all addresses are 16-byte aligned. // Otherwise, fall through to the scalar code below. As the comment is right now, it seems to imply that we don't handle "at all" the non-aligned case. > Source/WebCore/platform/audio/ffmpeg/FFTFrameFFMPEG.cpp:-138 > - } I would restore the comment: // Multiply the packed DC/nyquist component

Xingnan Wang

Comment 10 2011-12-21 19:23:39 PST

Created attachment 120259 [details] Patch Update the patch as comments. Change the FFTFrameMac::multiply() to the same way of FFTFrameFFMPEG, due to the real1P, imag1P, real2P and imag2P are all 16-byte aligned and it`s better to start process from 0 of buffer than 1.

Chris Rogers

Comment 11 2011-12-21 20:35:52 PST

Comment on attachment 120259 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=120259&action=review > Source/WebCore/platform/audio/VectorMath.cpp:79 > +#else You need to invoke ::zvmul exactly with the same arguments as vDSP_zvmul() and move lines 77:79 down to just after line 88 In other words ::zvmul() also needs to take DSPSplitComplex arguments (as opposed to our new VectorMath::zvmul() function...) > Source/WebCore/platform/audio/ffmpeg/FFTFrameFFMPEG.cpp:122 > float scale = 0.5f; Please move lines 121:122 to just before first use, which is now just before line 134 > Source/WebCore/platform/audio/mac/FFTFrameMac.cpp:113 > float scale = 0.5f; Please move lines 112:113 to just right before first use, which is now just before line 126

Xingnan Wang

Comment 12 2011-12-21 21:04:35 PST

Comment on attachment 120259 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=120259&action=review >> Source/WebCore/platform/audio/VectorMath.cpp:79 >> +#else > > You need to invoke ::zvmul exactly with the same arguments as vDSP_zvmul() > > and move lines 77:79 down to just after line 88 > > In other words ::zvmul() also needs to take DSPSplitComplex arguments (as opposed to our new VectorMath::zvmul() function...) Not quite understand it, do you mean that the ::zvmul should be re-defined as VDSP_zvmul and import DSPSplitComplex into non-OS(DARWIN)? or add another zvmul definition below in VectorMath.h? #if OS(DARWIN) void vDSP_zvmul(const DSPSplitComplex *, vDSP_Stride ,const DSPSplitComplex *, vDSP_Stride, const DSPSplitComplex *, vDSP_Stride, vDSP_Length , int ); #endif

Chris Rogers

Comment 13 2011-12-21 21:29:00 PST

(In reply to comment #12) > Not quite understand it, do you mean that the ::zvmul should be re-defined as VDSP_zvmul and import DSPSplitComplex into non-OS(DARWIN)? > or > add another zvmul definition below in VectorMath.h? > #if OS(DARWIN) > void vDSP_zvmul(const DSPSplitComplex *, vDSP_Stride ,const DSPSplitComplex *, vDSP_Stride, const DSPSplitComplex *, vDSP_Stride, vDSP_Length , int ); > #endif It's easier if I just show you how the function should be defined: void zvmul(const float* real1P, const float* imag1P, const float* real2P, const float* imag2P, float* realDestP, float* imagDestP, size_t framesToProcess) { DSPSplitComplex sc1; DSPSplitComplex sc2; DSPSplitComplex dest; sc1.realp = real1P; sc1.imagp = imag1P; sc2.realp = real2P; sc2.imagp = imag2P; dest.realp = realDestP; dest.imagp = imagDestP; #if defined(__ppc__) || defined(__i386__) ::zvmul(&sc1, 1, &sc2, 1, &dest, 1, framesToProcess, 1); #else vDSP_zvmul(&sc1, 1, &sc2, 1, &dest, 1, framesToProcess, 1); #endif }

Chris Rogers

Comment 14 2011-12-21 21:30:53 PST

It's too bad we have to do this. It's really just a hack/work-around for the way some macros are defined in <Accelerate/Accelerat.h> The reason is explained in this comment in VectorMath.cpp: // On the Mac we use the highly optimized versions in Accelerate.framework // In 32-bit mode (__ppc__ or __i386__) <Accelerate/Accelerate.h> includes <vecLib/vDSP_translate.h> which defines macros of the same name as // our namespaced function names, so we must handle this case differently. Other architectures (64bit, ARM, etc.) do not include this header file.

Xingnan Wang

Comment 15 2011-12-21 21:50:29 PST

Created attachment 120270 [details] Patch

Xingnan Wang

Comment 16 2011-12-21 21:51:38 PST

(In reply to comment #14) > It's too bad we have to do this. It's really just a hack/work-around for the way some macros are defined in <Accelerate/Accelerat.h> The reason is explained in this comment in VectorMath.cpp: > > // On the Mac we use the highly optimized versions in Accelerate.framework > // In 32-bit mode (__ppc__ or __i386__) <Accelerate/Accelerate.h> includes <vecLib/vDSP_translate.h> which defines macros of the same name as > // our namespaced function names, so we must handle this case differently. Other architectures (64bit, ARM, etc.) do not include this header file. Got it. Thanks for explanation. Patch is updated.

WebKit Review Bot

Comment 17 2011-12-22 00:39:14 PST

Comment on attachment 120270 [details] Patch Attachment 120270 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/10986368 New failing tests: fast/workers/storage/use-same-database-in-page-and-workers.html

Xingnan Wang

Comment 18 2011-12-22 02:11:10 PST

(In reply to comment #17) > (From update of attachment 120270 [details]) > Attachment 120270 [details] did not pass chromium-ews (chromium-xvfb): > Output: http://queues.webkit.org/results/10986368 > > New failing tests: > fast/workers/storage/use-same-database-in-page-and-workers.html Hi Roger, My patch is failed to pass the tests in buildbots, but I am confused that these tests seems not have not too much relation with the change from my patch. I am not familiar with the chromium test framework so could you help me to check what`s the root cause of the building failure? Thank you very much.

Chris Rogers

Comment 19 2011-12-22 11:50:51 PST

It's unlikely that it's related to your patch, and is probably some other recent change. You can try re-uploading your patch to have it try again.

Xingnan Wang

Comment 20 2012-01-03 18:06:47 PST

Created attachment 121031 [details] Patch Hi Roger, I re-upload the patch, please help to submit to commit-queue again, thank you~

Chris Rogers

Comment 21 2012-01-04 17:40:39 PST

Looks good.

Kenneth Russell

Comment 22 2012-01-04 17:42:22 PST

Comment on attachment 121031 [details] Patch rs=me based on Chris's review.

WebKit Review Bot

Comment 23 2012-01-05 04:54:20 PST

Comment on attachment 121031 [details] Patch Clearing flags on attachment: 121031 Committed r104143: <http://trac.webkit.org/changeset/104143>

WebKit Review Bot

Comment 24 2012-01-05 04:54:25 PST

All reviewed patches have been landed. Closing bug.

Hin-Chung Lam

Comment 25 2012-01-05 05:27:37 PST

Looks like this patch breaks the tree. See the following compilation error: /Volumes/Data/slave/lion-intel-leaks/build/Source/WebCore/platform/audio/VectorMath.cpp:80:15:{80:17-80:23}: error: assigning to 'float *' from incompatible type 'const float *' [3] sc1.realp = real1P; ^ ~~~~~~ /Volumes/Data/slave/lion-intel-leaks/build/Source/WebCore/platform/audio/VectorMath.cpp:81:15:{81:17-81:23}: error: assigning to 'float *' from incompatible type 'const float *' [3] sc1.imagp = imag1P; ^ ~~~~~~ /Volumes/Data/slave/lion-intel-leaks/build/Source/WebCore/platform/audio/VectorMath.cpp:82:15:{82:17-82:23}: error: assigning to 'float *' from incompatible type 'const float *' [3] sc2.realp = real2P; ^ ~~~~~~ /Volumes/Data/slave/lion-intel-leaks/build/Source/WebCore/platform/audio/VectorMath.cpp:83:15:{83:17-83:23}: error: assigning to 'float *' from incompatible type 'const float *' [3] sc2.imagp = imag2P; ^ ~~~~~~ Any fix for this?

Kenneth Russell

Comment 26 2012-01-05 10:18:10 PST

(In reply to comment #25) > Looks like this patch breaks the tree. See the following compilation error: > > /Volumes/Data/slave/lion-intel-leaks/build/Source/WebCore/platform/audio/VectorMath.cpp:80:15:{80:17-80:23}: error: assigning to 'float *' from incompatible type 'const float *' [3] > sc1.realp = real1P; > ^ ~~~~~~ > /Volumes/Data/slave/lion-intel-leaks/build/Source/WebCore/platform/audio/VectorMath.cpp:81:15:{81:17-81:23}: error: assigning to 'float *' from incompatible type 'const float *' [3] > sc1.imagp = imag1P; > ^ ~~~~~~ > /Volumes/Data/slave/lion-intel-leaks/build/Source/WebCore/platform/audio/VectorMath.cpp:82:15:{82:17-82:23}: error: assigning to 'float *' from incompatible type 'const float *' [3] > sc2.realp = real2P; > ^ ~~~~~~ > /Volumes/Data/slave/lion-intel-leaks/build/Source/WebCore/platform/audio/VectorMath.cpp:83:15:{83:17-83:23}: error: assigning to 'float *' from incompatible type 'const float *' [3] > sc2.imagp = imag2P; > ^ ~~~~~~ > > Any fix for this? Alpha: thanks for patching up these compilation failures in http://trac.webkit.org/changeset/104147 .

Xingnan Wang

Comment 27 2012-01-05 17:27:21 PST

(In reply to comment #25) > Looks like this patch breaks the tree. See the following compilation error: > > /Volumes/Data/slave/lion-intel-leaks/build/Source/WebCore/platform/audio/VectorMath.cpp:80:15:{80:17-80:23}: error: assigning to 'float *' from incompatible type 'const float *' [3] > sc1.realp = real1P; > ^ ~~~~~~ > /Volumes/Data/slave/lion-intel-leaks/build/Source/WebCore/platform/audio/VectorMath.cpp:81:15:{81:17-81:23}: error: assigning to 'float *' from incompatible type 'const float *' [3] > sc1.imagp = imag1P; > ^ ~~~~~~ > /Volumes/Data/slave/lion-intel-leaks/build/Source/WebCore/platform/audio/VectorMath.cpp:82:15:{82:17-82:23}: error: assigning to 'float *' from incompatible type 'const float *' [3] > sc2.realp = real2P; > ^ ~~~~~~ > /Volumes/Data/slave/lion-intel-leaks/build/Source/WebCore/platform/audio/VectorMath.cpp:83:15:{83:17-83:23}: error: assigning to 'float *' from incompatible type 'const float *' [3] > sc2.imagp = imag2P; > ^ ~~~~~~ > > Any fix for this? Thanks for your patch.

Attachments
Patch (2.88 KB, patch) 2011-12-19 00:18 PST, Xingnan Wang	no flags	Details Formatted Diff Diff
Patch (4.81 KB, patch) 2011-12-20 21:52 PST, Xingnan Wang	no flags	Details Formatted Diff Diff
Patch (7.52 KB, patch) 2011-12-21 19:23 PST, Xingnan Wang	no flags	Details Formatted Diff Diff
Patch (8.11 KB, patch) 2011-12-21 21:50 PST, Xingnan Wang	no flags	Details Formatted Diff Diff
Patch (8.15 KB, patch) 2012-01-03 18:06 PST, Xingnan Wang	no flags	Details Formatted Diff Diff
Show Obsolete (4) View All Add attachment proposed patch, testcase, etc.

Bug 74842