WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
98131
Add ARM-NEON support to VectorMath in WebAudio
https://bugs.webkit.org/show_bug.cgi?id=98131
Summary
Add ARM-NEON support to VectorMath in WebAudio
Gabor Rapcsanyi
Reported
2012-10-02 02:47:47 PDT
Add ARM-NEON support to VectorMath in WebAudio.
Attachments
proposed patch
(7.86 KB, patch)
2012-10-02 06:48 PDT
,
Gabor Rapcsanyi
no flags
Details
Formatted Diff
Diff
proposed patch 2
(7.84 KB, patch)
2012-10-03 02:21 PDT
,
Gabor Rapcsanyi
no flags
Details
Formatted Diff
Diff
proposed patch 3
(7.57 KB, patch)
2012-10-05 05:15 PDT
,
Gabor Rapcsanyi
no flags
Details
Formatted Diff
Diff
Show Obsolete
(2)
View All
Add attachment
proposed patch, testcase, etc.
Gabor Rapcsanyi
Comment 1
2012-10-02 06:48:37 PDT
Created
attachment 166684
[details]
proposed patch
Chris Rogers
Comment 2
2012-10-02 10:16:10 PDT
Ray, could you please have a look at this?
Zoltan Herczeg
Comment 3
2012-10-02 10:33:27 PDT
Comment on
attachment 166684
[details]
proposed patch View in context:
https://bugs.webkit.org/attachment.cgi?id=166684&action=review
> Source/WebCore/platform/audio/VectorMath.cpp:632 > + max = std::max(max, groupMax[0]); > + max = std::max(max, groupMax[1]); > + max = std::max(max, groupMax[2]); > + max = std::max(max, groupMax[3]);
Can't we use vmax here?
Raymond Toy
Comment 4
2012-10-02 11:18:26 PDT
Comment on
attachment 166684
[details]
proposed patch View in context:
https://bugs.webkit.org/attachment.cgi?id=166684&action=review
Looks good. Do you have any measurements to show what kind of improvements these instrinsics make? I'm just curious.
> Source/WebCore/platform/audio/VectorMath.cpp:493 > + float32x4_t imagResult = vaddq_f32(vmulq_f32(real1, imag2), vmulq_f32(imag1, real2));
Can the vmla and vmls instrinsics be used here to speed things up slightly?
Gabor Rapcsanyi
Comment 5
2012-10-03 01:54:38 PDT
(In reply to
comment #3
)
> (From update of
attachment 166684
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=166684&action=review
> > > Source/WebCore/platform/audio/VectorMath.cpp:632 > > + max = std::max(max, groupMax[0]); > > + max = std::max(max, groupMax[1]); > > + max = std::max(max, groupMax[2]); > > + max = std::max(max, groupMax[3]); > > Can't we use vmax here?
Unfortunately we can't because vmax is working on vectors and the result will be a vector as well. There is no function which gives you the highest value from a vector.
Gabor Rapcsanyi
Comment 6
2012-10-03 02:17:38 PDT
(In reply to
comment #4
)
> (From update of
attachment 166684
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=166684&action=review
> > Looks good. Do you have any measurements to show what kind of improvements these instrinsics make? I'm just curious. >
Yes I have made independent tests on the functions. The vector size was 1000 with float numbers and I ran it 100000 times. The average results: vsma: REF elasped time is 3747 ms NEON elasped time is 3642 ms vsmul: REF elasped time is 2654 ms NEON elasped time is 2375 ms vadd: REF elasped time is 3457 ms NEON elasped time is 2631 ms vmul: REF elasped time is 3528 ms NEON elasped time is 2635 ms vmaxmgv: REF elasped time is 5203 ms NEON elasped time is 2201 ms vsvesq: REF elasped time is 2234 ms NEON elasped time is 1840 ms zvmul: (with vmla and vmls) REF elasped time is 8778 ms NEON elasped time is 7963 ms
> > Source/WebCore/platform/audio/VectorMath.cpp:493 > > + float32x4_t imagResult = vaddq_f32(vmulq_f32(real1, imag2), vmulq_f32(imag1, real2)); > > Can the vmla and vmls instrinsics be used here to speed things up slightly?
Yes you're right. We can use it here I will change this.
Gabor Rapcsanyi
Comment 7
2012-10-03 02:21:04 PDT
Created
attachment 166832
[details]
proposed patch 2
Raymond Toy
Comment 8
2012-10-03 13:18:38 PDT
(In reply to
comment #6
)
> (In reply to
comment #4
) > > (From update of
attachment 166684
[details]
[details]) > > View in context:
https://bugs.webkit.org/attachment.cgi?id=166684&action=review
> > > > Looks good. Do you have any measurements to show what kind of improvements these instrinsics make? I'm just curious. > > > > Yes I have made independent tests on the functions. > The vector size was 1000 with float numbers and I ran it 100000 times. > The average results: > > vsma: > REF elasped time is 3747 ms > NEON elasped time is 3642 ms
Thank you very much for the measurements. There wasn't much difference here. Do you have any ideas on why there was very little improvement?
> > vsmul: > REF elasped time is 2654 ms > NEON elasped time is 2375 ms > > vadd: > REF elasped time is 3457 ms > NEON elasped time is 2631 ms > > vmul: > REF elasped time is 3528 ms > NEON elasped time is 2635 ms > > vmaxmgv: > REF elasped time is 5203 ms > NEON elasped time is 2201 ms > > vsvesq: > REF elasped time is 2234 ms > NEON elasped time is 1840 ms > > zvmul: (with vmla and vmls) > REF elasped time is 8778 ms > NEON elasped time is 7963 ms > > > > Source/WebCore/platform/audio/VectorMath.cpp:493 > > > + float32x4_t imagResult = vaddq_f32(vmulq_f32(real1, imag2), vmulq_f32(imag1, real2)); > > > > Can the vmla and vmls instrinsics be used here to speed things up slightly? > > Yes you're right. We can use it here I will change this.
Raymond Toy
Comment 9
2012-10-03 13:25:17 PDT
Comment on
attachment 166832
[details]
proposed patch 2 This looks fine.
Gabor Rapcsanyi
Comment 10
2012-10-04 04:03:34 PDT
(In reply to
comment #8
)
> (In reply to
comment #6
) > > (In reply to
comment #4
) > > > (From update of
attachment 166684
[details]
[details] [details]) > > > View in context:
https://bugs.webkit.org/attachment.cgi?id=166684&action=review
> > > > > > Looks good. Do you have any measurements to show what kind of improvements these instrinsics make? I'm just curious. > > > > > > > Yes I have made independent tests on the functions. > > The vector size was 1000 with float numbers and I ran it 100000 times. > > The average results: > > > > vsma: > > REF elasped time is 3747 ms > > NEON elasped time is 3642 ms > > Thank you very much for the measurements. > > There wasn't much difference here. Do you have any ideas on why there was very little improvement? >
Yes that was strange for me as well so I played with it a little and than I realized that I forgot to use -O2 when I compiled it. So the results with -O2 on the same tests: vsma: REF elasped time is 1106 ms NEON elasped time is 458 ms vsmul: REF elasped time is 803 ms NEON elasped time is 573 ms vadd: REF elasped time is 712 ms NEON elasped time is 361 ms vmaxmgv: REF elasped time is 803 ms NEON elasped time is 217 ms vsvesq: REF elasped time is 501 ms NEON elasped time is 250 ms vmul: REF elasped time is 805 ms NEON elasped time is 363 ms zvmul: REF elasped time is 1657 ms NEON elasped time is 953 ms Sorry for the confusion.
Raymond Toy
Comment 11
2012-10-04 12:41:13 PDT
(In reply to
comment #10
)
> (In reply to
comment #8
) > > (In reply to
comment #6
) > > > (In reply to
comment #4
) > > > > (From update of
attachment 166684
[details]
[details] [details] [details]) > > > > View in context:
https://bugs.webkit.org/attachment.cgi?id=166684&action=review
> > > > > > > > Looks good. Do you have any measurements to show what kind of improvements these instrinsics make? I'm just curious. > > > > > > > > > > Yes I have made independent tests on the functions. > > > The vector size was 1000 with float numbers and I ran it 100000 times. > > > The average results: > > > > > > vsma: > > > REF elasped time is 3747 ms > > > NEON elasped time is 3642 ms > > > > Thank you very much for the measurements. > > > > There wasn't much difference here. Do you have any ideas on why there was very little improvement? > > > > Yes that was strange for me as well so I played with it a little and than I realized that I forgot to use -O2 when I compiled it. > So the results with -O2 on the same tests:
Thanks for rerunning the tests. The results look very nice now, with about 2x improvement everywhere.
> > vsma: > REF elasped time is 1106 ms > NEON elasped time is 458 ms > > vsmul: > REF elasped time is 803 ms > NEON elasped time is 573 ms > > vadd: > REF elasped time is 712 ms > NEON elasped time is 361 ms > > vmaxmgv: > REF elasped time is 803 ms > NEON elasped time is 217 ms > > vsvesq: > REF elasped time is 501 ms > NEON elasped time is 250 ms > > vmul: > REF elasped time is 805 ms > NEON elasped time is 363 ms > > zvmul: > REF elasped time is 1657 ms > NEON elasped time is 953 ms > > > Sorry for the confusion.
Zoltan Herczeg
Comment 12
2012-10-04 15:00:03 PDT
(In reply to
comment #5
)
> (In reply to
comment #3
) > > (From update of
attachment 166684
[details]
[details]) > > View in context:
https://bugs.webkit.org/attachment.cgi?id=166684&action=review
> > > > > Source/WebCore/platform/audio/VectorMath.cpp:632 > > > + max = std::max(max, groupMax[0]); > > > + max = std::max(max, groupMax[1]); > > > + max = std::max(max, groupMax[2]); > > > + max = std::max(max, groupMax[3]); > > > > Can't we use vmax here? > > Unfortunately we can't because vmax is working on vectors and the result will be a vector as well. There is no function which gives you the highest value from a vector.
Hm true, but still you can reduce it by one max: max(a,b,c,d): t1 = max(a,b); t2 = max(c,d); return max(t1,t2)
Gabor Rapcsanyi
Comment 13
2012-10-05 05:15:10 PDT
Created
attachment 167312
[details]
proposed patch 3 I put some more optimizations into vsvesq() and vmaxmgv() as Zoltan suggested.
Zoltan Herczeg
Comment 14
2012-10-05 05:42:13 PDT
Comment on
attachment 167312
[details]
proposed patch 3 r=me
WebKit Review Bot
Comment 15
2012-10-05 05:47:32 PDT
Comment on
attachment 167312
[details]
proposed patch 3 Clearing flags on attachment: 167312 Committed
r130497
: <
http://trac.webkit.org/changeset/130497
>
WebKit Review Bot
Comment 16
2012-10-05 05:47:36 PDT
All reviewed patches have been landed. Closing bug.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug