98131 – Add ARM-NEON support to VectorMath in WebAudio

RESOLVED FIXED 98131

Add ARM-NEON support to VectorMath in WebAudio

https://bugs.webkit.org/show_bug.cgi?id=98131

Summary Add ARM-NEON support to VectorMath in WebAudio

Gabor Rapcsanyi

Reported 2012-10-02 02:47:47 PDT

Add ARM-NEON support to VectorMath in WebAudio.

Attachments
proposed patch (7.86 KB, patch) 2012-10-02 06:48 PDT, Gabor Rapcsanyi	no flags	Details Formatted Diff Diff
proposed patch 2 (7.84 KB, patch) 2012-10-03 02:21 PDT, Gabor Rapcsanyi	no flags	Details Formatted Diff Diff
proposed patch 3 (7.57 KB, patch) 2012-10-05 05:15 PDT, Gabor Rapcsanyi	no flags	Details Formatted Diff Diff
Show Obsolete (2) View All Add attachment proposed patch, testcase, etc.

Gabor Rapcsanyi

Comment 1 2012-10-02 06:48:37 PDT

Created attachment 166684 [details] proposed patch

Chris Rogers

Comment 2 2012-10-02 10:16:10 PDT

Ray, could you please have a look at this?

Zoltan Herczeg

Comment 3 2012-10-02 10:33:27 PDT

Comment on attachment 166684 [details] proposed patch View in context: https://bugs.webkit.org/attachment.cgi?id=166684&action=review > Source/WebCore/platform/audio/VectorMath.cpp:632 > + max = std::max(max, groupMax[0]); > + max = std::max(max, groupMax[1]); > + max = std::max(max, groupMax[2]); > + max = std::max(max, groupMax[3]); Can't we use vmax here?

Raymond Toy

Comment 4 2012-10-02 11:18:26 PDT

Comment on attachment 166684 [details] proposed patch View in context: https://bugs.webkit.org/attachment.cgi?id=166684&action=review Looks good. Do you have any measurements to show what kind of improvements these instrinsics make? I'm just curious. > Source/WebCore/platform/audio/VectorMath.cpp:493 > + float32x4_t imagResult = vaddq_f32(vmulq_f32(real1, imag2), vmulq_f32(imag1, real2)); Can the vmla and vmls instrinsics be used here to speed things up slightly?

Gabor Rapcsanyi

Comment 5 2012-10-03 01:54:38 PDT

(In reply to comment #3) > (From update of attachment 166684 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=166684&action=review > > > Source/WebCore/platform/audio/VectorMath.cpp:632 > > + max = std::max(max, groupMax[0]); > > + max = std::max(max, groupMax[1]); > > + max = std::max(max, groupMax[2]); > > + max = std::max(max, groupMax[3]); > > Can't we use vmax here? Unfortunately we can't because vmax is working on vectors and the result will be a vector as well. There is no function which gives you the highest value from a vector.

Gabor Rapcsanyi

Comment 6 2012-10-03 02:17:38 PDT

(In reply to comment #4) > (From update of attachment 166684 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=166684&action=review > > Looks good. Do you have any measurements to show what kind of improvements these instrinsics make? I'm just curious. > Yes I have made independent tests on the functions. The vector size was 1000 with float numbers and I ran it 100000 times. The average results: vsma: REF elasped time is 3747 ms NEON elasped time is 3642 ms vsmul: REF elasped time is 2654 ms NEON elasped time is 2375 ms vadd: REF elasped time is 3457 ms NEON elasped time is 2631 ms vmul: REF elasped time is 3528 ms NEON elasped time is 2635 ms vmaxmgv: REF elasped time is 5203 ms NEON elasped time is 2201 ms vsvesq: REF elasped time is 2234 ms NEON elasped time is 1840 ms zvmul: (with vmla and vmls) REF elasped time is 8778 ms NEON elasped time is 7963 ms > > Source/WebCore/platform/audio/VectorMath.cpp:493 > > + float32x4_t imagResult = vaddq_f32(vmulq_f32(real1, imag2), vmulq_f32(imag1, real2)); > > Can the vmla and vmls instrinsics be used here to speed things up slightly? Yes you're right. We can use it here I will change this.

Gabor Rapcsanyi

Comment 7 2012-10-03 02:21:04 PDT

Created attachment 166832 [details] proposed patch 2

Raymond Toy

Comment 8 2012-10-03 13:18:38 PDT

(In reply to comment #6) > (In reply to comment #4) > > (From update of attachment 166684 [details] [details]) > > View in context: https://bugs.webkit.org/attachment.cgi?id=166684&action=review > > > > Looks good. Do you have any measurements to show what kind of improvements these instrinsics make? I'm just curious. > > > > Yes I have made independent tests on the functions. > The vector size was 1000 with float numbers and I ran it 100000 times. > The average results: > > vsma: > REF elasped time is 3747 ms > NEON elasped time is 3642 ms Thank you very much for the measurements. There wasn't much difference here. Do you have any ideas on why there was very little improvement? > > vsmul: > REF elasped time is 2654 ms > NEON elasped time is 2375 ms > > vadd: > REF elasped time is 3457 ms > NEON elasped time is 2631 ms > > vmul: > REF elasped time is 3528 ms > NEON elasped time is 2635 ms > > vmaxmgv: > REF elasped time is 5203 ms > NEON elasped time is 2201 ms > > vsvesq: > REF elasped time is 2234 ms > NEON elasped time is 1840 ms > > zvmul: (with vmla and vmls) > REF elasped time is 8778 ms > NEON elasped time is 7963 ms > > > > Source/WebCore/platform/audio/VectorMath.cpp:493 > > > + float32x4_t imagResult = vaddq_f32(vmulq_f32(real1, imag2), vmulq_f32(imag1, real2)); > > > > Can the vmla and vmls instrinsics be used here to speed things up slightly? > > Yes you're right. We can use it here I will change this.

Raymond Toy

Comment 9 2012-10-03 13:25:17 PDT

Comment on attachment 166832 [details] proposed patch 2 This looks fine.

Gabor Rapcsanyi

Comment 10 2012-10-04 04:03:34 PDT

(In reply to comment #8) > (In reply to comment #6) > > (In reply to comment #4) > > > (From update of attachment 166684 [details] [details] [details]) > > > View in context: https://bugs.webkit.org/attachment.cgi?id=166684&action=review > > > > > > Looks good. Do you have any measurements to show what kind of improvements these instrinsics make? I'm just curious. > > > > > > > Yes I have made independent tests on the functions. > > The vector size was 1000 with float numbers and I ran it 100000 times. > > The average results: > > > > vsma: > > REF elasped time is 3747 ms > > NEON elasped time is 3642 ms > > Thank you very much for the measurements. > > There wasn't much difference here. Do you have any ideas on why there was very little improvement? > Yes that was strange for me as well so I played with it a little and than I realized that I forgot to use -O2 when I compiled it. So the results with -O2 on the same tests: vsma: REF elasped time is 1106 ms NEON elasped time is 458 ms vsmul: REF elasped time is 803 ms NEON elasped time is 573 ms vadd: REF elasped time is 712 ms NEON elasped time is 361 ms vmaxmgv: REF elasped time is 803 ms NEON elasped time is 217 ms vsvesq: REF elasped time is 501 ms NEON elasped time is 250 ms vmul: REF elasped time is 805 ms NEON elasped time is 363 ms zvmul: REF elasped time is 1657 ms NEON elasped time is 953 ms Sorry for the confusion.

Raymond Toy

Comment 11 2012-10-04 12:41:13 PDT

(In reply to comment #10) > (In reply to comment #8) > > (In reply to comment #6) > > > (In reply to comment #4) > > > > (From update of attachment 166684 [details] [details] [details] [details]) > > > > View in context: https://bugs.webkit.org/attachment.cgi?id=166684&action=review > > > > > > > > Looks good. Do you have any measurements to show what kind of improvements these instrinsics make? I'm just curious. > > > > > > > > > > Yes I have made independent tests on the functions. > > > The vector size was 1000 with float numbers and I ran it 100000 times. > > > The average results: > > > > > > vsma: > > > REF elasped time is 3747 ms > > > NEON elasped time is 3642 ms > > > > Thank you very much for the measurements. > > > > There wasn't much difference here. Do you have any ideas on why there was very little improvement? > > > > Yes that was strange for me as well so I played with it a little and than I realized that I forgot to use -O2 when I compiled it. > So the results with -O2 on the same tests: Thanks for rerunning the tests. The results look very nice now, with about 2x improvement everywhere. > > vsma: > REF elasped time is 1106 ms > NEON elasped time is 458 ms > > vsmul: > REF elasped time is 803 ms > NEON elasped time is 573 ms > > vadd: > REF elasped time is 712 ms > NEON elasped time is 361 ms > > vmaxmgv: > REF elasped time is 803 ms > NEON elasped time is 217 ms > > vsvesq: > REF elasped time is 501 ms > NEON elasped time is 250 ms > > vmul: > REF elasped time is 805 ms > NEON elasped time is 363 ms > > zvmul: > REF elasped time is 1657 ms > NEON elasped time is 953 ms > > > Sorry for the confusion.

Zoltan Herczeg

Comment 12 2012-10-04 15:00:03 PDT

(In reply to comment #5) > (In reply to comment #3) > > (From update of attachment 166684 [details] [details]) > > View in context: https://bugs.webkit.org/attachment.cgi?id=166684&action=review > > > > > Source/WebCore/platform/audio/VectorMath.cpp:632 > > > + max = std::max(max, groupMax[0]); > > > + max = std::max(max, groupMax[1]); > > > + max = std::max(max, groupMax[2]); > > > + max = std::max(max, groupMax[3]); > > > > Can't we use vmax here? > > Unfortunately we can't because vmax is working on vectors and the result will be a vector as well. There is no function which gives you the highest value from a vector. Hm true, but still you can reduce it by one max: max(a,b,c,d): t1 = max(a,b); t2 = max(c,d); return max(t1,t2)

Gabor Rapcsanyi

Comment 13 2012-10-05 05:15:10 PDT

Created attachment 167312 [details] proposed patch 3 I put some more optimizations into vsvesq() and vmaxmgv() as Zoltan suggested.

Zoltan Herczeg

Comment 14 2012-10-05 05:42:13 PDT

Comment on attachment 167312 [details] proposed patch 3 r=me

WebKit Review Bot

Comment 15 2012-10-05 05:47:32 PDT

Comment on attachment 167312 [details] proposed patch 3 Clearing flags on attachment: 167312 Committed r130497: <http://trac.webkit.org/changeset/130497>

WebKit Review Bot

Comment 16 2012-10-05 05:47:36 PDT

All reviewed patches have been landed. Closing bug.

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware Unspecified

OS Unspecified

Product WebKit

Component Web Audio

Assignee

Nobody

Reported

2012-10-02 02:47 PDT

Modified

2013-02-25 18:04 PST History

CC List

9 users Show

URL

Keywords

Depends on

Blocks