RESOLVED FIXED 98131
Add ARM-NEON support to VectorMath in WebAudio
https://bugs.webkit.org/show_bug.cgi?id=98131
Summary Add ARM-NEON support to VectorMath in WebAudio
Gabor Rapcsanyi
Reported 2012-10-02 02:47:47 PDT
Add ARM-NEON support to VectorMath in WebAudio.
Attachments
proposed patch (7.86 KB, patch)
2012-10-02 06:48 PDT, Gabor Rapcsanyi
no flags
proposed patch 2 (7.84 KB, patch)
2012-10-03 02:21 PDT, Gabor Rapcsanyi
no flags
proposed patch 3 (7.57 KB, patch)
2012-10-05 05:15 PDT, Gabor Rapcsanyi
no flags
Gabor Rapcsanyi
Comment 1 2012-10-02 06:48:37 PDT
Created attachment 166684 [details] proposed patch
Chris Rogers
Comment 2 2012-10-02 10:16:10 PDT
Ray, could you please have a look at this?
Zoltan Herczeg
Comment 3 2012-10-02 10:33:27 PDT
Comment on attachment 166684 [details] proposed patch View in context: https://bugs.webkit.org/attachment.cgi?id=166684&action=review > Source/WebCore/platform/audio/VectorMath.cpp:632 > + max = std::max(max, groupMax[0]); > + max = std::max(max, groupMax[1]); > + max = std::max(max, groupMax[2]); > + max = std::max(max, groupMax[3]); Can't we use vmax here?
Raymond Toy
Comment 4 2012-10-02 11:18:26 PDT
Comment on attachment 166684 [details] proposed patch View in context: https://bugs.webkit.org/attachment.cgi?id=166684&action=review Looks good. Do you have any measurements to show what kind of improvements these instrinsics make? I'm just curious. > Source/WebCore/platform/audio/VectorMath.cpp:493 > + float32x4_t imagResult = vaddq_f32(vmulq_f32(real1, imag2), vmulq_f32(imag1, real2)); Can the vmla and vmls instrinsics be used here to speed things up slightly?
Gabor Rapcsanyi
Comment 5 2012-10-03 01:54:38 PDT
(In reply to comment #3) > (From update of attachment 166684 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=166684&action=review > > > Source/WebCore/platform/audio/VectorMath.cpp:632 > > + max = std::max(max, groupMax[0]); > > + max = std::max(max, groupMax[1]); > > + max = std::max(max, groupMax[2]); > > + max = std::max(max, groupMax[3]); > > Can't we use vmax here? Unfortunately we can't because vmax is working on vectors and the result will be a vector as well. There is no function which gives you the highest value from a vector.
Gabor Rapcsanyi
Comment 6 2012-10-03 02:17:38 PDT
(In reply to comment #4) > (From update of attachment 166684 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=166684&action=review > > Looks good. Do you have any measurements to show what kind of improvements these instrinsics make? I'm just curious. > Yes I have made independent tests on the functions. The vector size was 1000 with float numbers and I ran it 100000 times. The average results: vsma: REF elasped time is 3747 ms NEON elasped time is 3642 ms vsmul: REF elasped time is 2654 ms NEON elasped time is 2375 ms vadd: REF elasped time is 3457 ms NEON elasped time is 2631 ms vmul: REF elasped time is 3528 ms NEON elasped time is 2635 ms vmaxmgv: REF elasped time is 5203 ms NEON elasped time is 2201 ms vsvesq: REF elasped time is 2234 ms NEON elasped time is 1840 ms zvmul: (with vmla and vmls) REF elasped time is 8778 ms NEON elasped time is 7963 ms > > Source/WebCore/platform/audio/VectorMath.cpp:493 > > + float32x4_t imagResult = vaddq_f32(vmulq_f32(real1, imag2), vmulq_f32(imag1, real2)); > > Can the vmla and vmls instrinsics be used here to speed things up slightly? Yes you're right. We can use it here I will change this.
Gabor Rapcsanyi
Comment 7 2012-10-03 02:21:04 PDT
Created attachment 166832 [details] proposed patch 2
Raymond Toy
Comment 8 2012-10-03 13:18:38 PDT
(In reply to comment #6) > (In reply to comment #4) > > (From update of attachment 166684 [details] [details]) > > View in context: https://bugs.webkit.org/attachment.cgi?id=166684&action=review > > > > Looks good. Do you have any measurements to show what kind of improvements these instrinsics make? I'm just curious. > > > > Yes I have made independent tests on the functions. > The vector size was 1000 with float numbers and I ran it 100000 times. > The average results: > > vsma: > REF elasped time is 3747 ms > NEON elasped time is 3642 ms Thank you very much for the measurements. There wasn't much difference here. Do you have any ideas on why there was very little improvement? > > vsmul: > REF elasped time is 2654 ms > NEON elasped time is 2375 ms > > vadd: > REF elasped time is 3457 ms > NEON elasped time is 2631 ms > > vmul: > REF elasped time is 3528 ms > NEON elasped time is 2635 ms > > vmaxmgv: > REF elasped time is 5203 ms > NEON elasped time is 2201 ms > > vsvesq: > REF elasped time is 2234 ms > NEON elasped time is 1840 ms > > zvmul: (with vmla and vmls) > REF elasped time is 8778 ms > NEON elasped time is 7963 ms > > > > Source/WebCore/platform/audio/VectorMath.cpp:493 > > > + float32x4_t imagResult = vaddq_f32(vmulq_f32(real1, imag2), vmulq_f32(imag1, real2)); > > > > Can the vmla and vmls instrinsics be used here to speed things up slightly? > > Yes you're right. We can use it here I will change this.
Raymond Toy
Comment 9 2012-10-03 13:25:17 PDT
Comment on attachment 166832 [details] proposed patch 2 This looks fine.
Gabor Rapcsanyi
Comment 10 2012-10-04 04:03:34 PDT
(In reply to comment #8) > (In reply to comment #6) > > (In reply to comment #4) > > > (From update of attachment 166684 [details] [details] [details]) > > > View in context: https://bugs.webkit.org/attachment.cgi?id=166684&action=review > > > > > > Looks good. Do you have any measurements to show what kind of improvements these instrinsics make? I'm just curious. > > > > > > > Yes I have made independent tests on the functions. > > The vector size was 1000 with float numbers and I ran it 100000 times. > > The average results: > > > > vsma: > > REF elasped time is 3747 ms > > NEON elasped time is 3642 ms > > Thank you very much for the measurements. > > There wasn't much difference here. Do you have any ideas on why there was very little improvement? > Yes that was strange for me as well so I played with it a little and than I realized that I forgot to use -O2 when I compiled it. So the results with -O2 on the same tests: vsma: REF elasped time is 1106 ms NEON elasped time is 458 ms vsmul: REF elasped time is 803 ms NEON elasped time is 573 ms vadd: REF elasped time is 712 ms NEON elasped time is 361 ms vmaxmgv: REF elasped time is 803 ms NEON elasped time is 217 ms vsvesq: REF elasped time is 501 ms NEON elasped time is 250 ms vmul: REF elasped time is 805 ms NEON elasped time is 363 ms zvmul: REF elasped time is 1657 ms NEON elasped time is 953 ms Sorry for the confusion.
Raymond Toy
Comment 11 2012-10-04 12:41:13 PDT
(In reply to comment #10) > (In reply to comment #8) > > (In reply to comment #6) > > > (In reply to comment #4) > > > > (From update of attachment 166684 [details] [details] [details] [details]) > > > > View in context: https://bugs.webkit.org/attachment.cgi?id=166684&action=review > > > > > > > > Looks good. Do you have any measurements to show what kind of improvements these instrinsics make? I'm just curious. > > > > > > > > > > Yes I have made independent tests on the functions. > > > The vector size was 1000 with float numbers and I ran it 100000 times. > > > The average results: > > > > > > vsma: > > > REF elasped time is 3747 ms > > > NEON elasped time is 3642 ms > > > > Thank you very much for the measurements. > > > > There wasn't much difference here. Do you have any ideas on why there was very little improvement? > > > > Yes that was strange for me as well so I played with it a little and than I realized that I forgot to use -O2 when I compiled it. > So the results with -O2 on the same tests: Thanks for rerunning the tests. The results look very nice now, with about 2x improvement everywhere. > > vsma: > REF elasped time is 1106 ms > NEON elasped time is 458 ms > > vsmul: > REF elasped time is 803 ms > NEON elasped time is 573 ms > > vadd: > REF elasped time is 712 ms > NEON elasped time is 361 ms > > vmaxmgv: > REF elasped time is 803 ms > NEON elasped time is 217 ms > > vsvesq: > REF elasped time is 501 ms > NEON elasped time is 250 ms > > vmul: > REF elasped time is 805 ms > NEON elasped time is 363 ms > > zvmul: > REF elasped time is 1657 ms > NEON elasped time is 953 ms > > > Sorry for the confusion.
Zoltan Herczeg
Comment 12 2012-10-04 15:00:03 PDT
(In reply to comment #5) > (In reply to comment #3) > > (From update of attachment 166684 [details] [details]) > > View in context: https://bugs.webkit.org/attachment.cgi?id=166684&action=review > > > > > Source/WebCore/platform/audio/VectorMath.cpp:632 > > > + max = std::max(max, groupMax[0]); > > > + max = std::max(max, groupMax[1]); > > > + max = std::max(max, groupMax[2]); > > > + max = std::max(max, groupMax[3]); > > > > Can't we use vmax here? > > Unfortunately we can't because vmax is working on vectors and the result will be a vector as well. There is no function which gives you the highest value from a vector. Hm true, but still you can reduce it by one max: max(a,b,c,d): t1 = max(a,b); t2 = max(c,d); return max(t1,t2)
Gabor Rapcsanyi
Comment 13 2012-10-05 05:15:10 PDT
Created attachment 167312 [details] proposed patch 3 I put some more optimizations into vsvesq() and vmaxmgv() as Zoltan suggested.
Zoltan Herczeg
Comment 14 2012-10-05 05:42:13 PDT
Comment on attachment 167312 [details] proposed patch 3 r=me
WebKit Review Bot
Comment 15 2012-10-05 05:47:32 PDT
Comment on attachment 167312 [details] proposed patch 3 Clearing flags on attachment: 167312 Committed r130497: <http://trac.webkit.org/changeset/130497>
WebKit Review Bot
Comment 16 2012-10-05 05:47:36 PDT
All reviewed patches have been landed. Closing bug.
Note You need to log in before you can comment on or make changes to this bug.