Bug 98131 - Add ARM-NEON support to VectorMath in WebAudio
Summary: Add ARM-NEON support to VectorMath in WebAudio
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Web Audio (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-10-02 02:47 PDT by Gabor Rapcsanyi
Modified: 2013-02-25 18:04 PST (History)
9 users (show)

See Also:


Attachments
proposed patch (7.86 KB, patch)
2012-10-02 06:48 PDT, Gabor Rapcsanyi
no flags Details | Formatted Diff | Diff
proposed patch 2 (7.84 KB, patch)
2012-10-03 02:21 PDT, Gabor Rapcsanyi
no flags Details | Formatted Diff | Diff
proposed patch 3 (7.57 KB, patch)
2012-10-05 05:15 PDT, Gabor Rapcsanyi
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Gabor Rapcsanyi 2012-10-02 02:47:47 PDT
Add ARM-NEON support to VectorMath in WebAudio.
Comment 1 Gabor Rapcsanyi 2012-10-02 06:48:37 PDT
Created attachment 166684 [details]
proposed patch
Comment 2 Chris Rogers 2012-10-02 10:16:10 PDT
Ray, could you please have a look at this?
Comment 3 Zoltan Herczeg 2012-10-02 10:33:27 PDT
Comment on attachment 166684 [details]
proposed patch

View in context: https://bugs.webkit.org/attachment.cgi?id=166684&action=review

> Source/WebCore/platform/audio/VectorMath.cpp:632
> +        max = std::max(max, groupMax[0]);
> +        max = std::max(max, groupMax[1]);
> +        max = std::max(max, groupMax[2]);
> +        max = std::max(max, groupMax[3]);

Can't we use vmax here?
Comment 4 Raymond Toy 2012-10-02 11:18:26 PDT
Comment on attachment 166684 [details]
proposed patch

View in context: https://bugs.webkit.org/attachment.cgi?id=166684&action=review

Looks good.  Do you have any measurements to show what kind of improvements these instrinsics make?  I'm just curious.

> Source/WebCore/platform/audio/VectorMath.cpp:493
> +            float32x4_t imagResult = vaddq_f32(vmulq_f32(real1, imag2), vmulq_f32(imag1, real2));

Can the vmla and vmls instrinsics be used here to speed things up slightly?
Comment 5 Gabor Rapcsanyi 2012-10-03 01:54:38 PDT
(In reply to comment #3)
> (From update of attachment 166684 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=166684&action=review
> 
> > Source/WebCore/platform/audio/VectorMath.cpp:632
> > +        max = std::max(max, groupMax[0]);
> > +        max = std::max(max, groupMax[1]);
> > +        max = std::max(max, groupMax[2]);
> > +        max = std::max(max, groupMax[3]);
> 
> Can't we use vmax here?

Unfortunately we can't because vmax is working on vectors and the result will be a vector as well. There is no function which gives you the highest value from a vector.
Comment 6 Gabor Rapcsanyi 2012-10-03 02:17:38 PDT
(In reply to comment #4)
> (From update of attachment 166684 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=166684&action=review
> 
> Looks good.  Do you have any measurements to show what kind of improvements these instrinsics make?  I'm just curious.
> 

Yes I have made independent tests on the functions.
The vector size was 1000 with float numbers and I ran it 100000 times.
The average results:

vsma:
REF  elasped time is 3747 ms
NEON elasped time is 3642 ms

vsmul:
REF  elasped time is 2654 ms
NEON elasped time is 2375 ms

vadd:
REF  elasped time is 3457 ms
NEON elasped time is 2631 ms

vmul:
REF  elasped time is 3528 ms
NEON elasped time is 2635 ms

vmaxmgv:
REF  elasped time is 5203 ms
NEON elasped time is 2201 ms

vsvesq:
REF  elasped time is 2234 ms
NEON elasped time is 1840 ms

zvmul: (with vmla and vmls)
REF  elasped time is 8778 ms
NEON elasped time is 7963 ms

> > Source/WebCore/platform/audio/VectorMath.cpp:493
> > +            float32x4_t imagResult = vaddq_f32(vmulq_f32(real1, imag2), vmulq_f32(imag1, real2));
> 
> Can the vmla and vmls instrinsics be used here to speed things up slightly?

Yes you're right. We can use it here I will change this.
Comment 7 Gabor Rapcsanyi 2012-10-03 02:21:04 PDT
Created attachment 166832 [details]
proposed patch 2
Comment 8 Raymond Toy 2012-10-03 13:18:38 PDT
(In reply to comment #6)
> (In reply to comment #4)
> > (From update of attachment 166684 [details] [details])
> > View in context: https://bugs.webkit.org/attachment.cgi?id=166684&action=review
> > 
> > Looks good.  Do you have any measurements to show what kind of improvements these instrinsics make?  I'm just curious.
> > 
> 
> Yes I have made independent tests on the functions.
> The vector size was 1000 with float numbers and I ran it 100000 times.
> The average results:
> 
> vsma:
> REF  elasped time is 3747 ms
> NEON elasped time is 3642 ms

Thank you very much for the measurements.

There wasn't much difference here.  Do you have any ideas on why there was very little improvement?

> 
> vsmul:
> REF  elasped time is 2654 ms
> NEON elasped time is 2375 ms
> 
> vadd:
> REF  elasped time is 3457 ms
> NEON elasped time is 2631 ms
> 
> vmul:
> REF  elasped time is 3528 ms
> NEON elasped time is 2635 ms
> 
> vmaxmgv:
> REF  elasped time is 5203 ms
> NEON elasped time is 2201 ms
> 
> vsvesq:
> REF  elasped time is 2234 ms
> NEON elasped time is 1840 ms
> 
> zvmul: (with vmla and vmls)
> REF  elasped time is 8778 ms
> NEON elasped time is 7963 ms
> 
> > > Source/WebCore/platform/audio/VectorMath.cpp:493
> > > +            float32x4_t imagResult = vaddq_f32(vmulq_f32(real1, imag2), vmulq_f32(imag1, real2));
> > 
> > Can the vmla and vmls instrinsics be used here to speed things up slightly?
> 
> Yes you're right. We can use it here I will change this.
Comment 9 Raymond Toy 2012-10-03 13:25:17 PDT
Comment on attachment 166832 [details]
proposed patch 2

This looks fine.
Comment 10 Gabor Rapcsanyi 2012-10-04 04:03:34 PDT
(In reply to comment #8)
> (In reply to comment #6)
> > (In reply to comment #4)
> > > (From update of attachment 166684 [details] [details] [details])
> > > View in context: https://bugs.webkit.org/attachment.cgi?id=166684&action=review
> > > 
> > > Looks good.  Do you have any measurements to show what kind of improvements these instrinsics make?  I'm just curious.
> > > 
> > 
> > Yes I have made independent tests on the functions.
> > The vector size was 1000 with float numbers and I ran it 100000 times.
> > The average results:
> > 
> > vsma:
> > REF  elasped time is 3747 ms
> > NEON elasped time is 3642 ms
> 
> Thank you very much for the measurements.
> 
> There wasn't much difference here.  Do you have any ideas on why there was very little improvement?
> 

Yes that was strange for me as well so I played with it a little and than I realized that I forgot to use -O2 when I compiled it.
So the results with -O2 on the same tests:

vsma:
REF  elasped time is 1106 ms
NEON elasped time is 458 ms

vsmul:
REF  elasped time is 803 ms
NEON elasped time is 573 ms

vadd:
REF  elasped time is 712 ms
NEON elasped time is 361 ms

vmaxmgv:
REF  elasped time is 803 ms
NEON elasped time is 217 ms

vsvesq:
REF  elasped time is 501 ms
NEON elasped time is 250 ms

vmul:
REF  elasped time is 805 ms
NEON elasped time is 363 ms

zvmul:
REF  elasped time is 1657 ms
NEON elasped time is 953 ms


Sorry for the confusion.
Comment 11 Raymond Toy 2012-10-04 12:41:13 PDT
(In reply to comment #10)
> (In reply to comment #8)
> > (In reply to comment #6)
> > > (In reply to comment #4)
> > > > (From update of attachment 166684 [details] [details] [details] [details])
> > > > View in context: https://bugs.webkit.org/attachment.cgi?id=166684&action=review
> > > > 
> > > > Looks good.  Do you have any measurements to show what kind of improvements these instrinsics make?  I'm just curious.
> > > > 
> > > 
> > > Yes I have made independent tests on the functions.
> > > The vector size was 1000 with float numbers and I ran it 100000 times.
> > > The average results:
> > > 
> > > vsma:
> > > REF  elasped time is 3747 ms
> > > NEON elasped time is 3642 ms
> > 
> > Thank you very much for the measurements.
> > 
> > There wasn't much difference here.  Do you have any ideas on why there was very little improvement?
> > 
> 
> Yes that was strange for me as well so I played with it a little and than I realized that I forgot to use -O2 when I compiled it.
> So the results with -O2 on the same tests:

Thanks for rerunning the tests. The results look very nice now, with about 2x improvement everywhere.


> 
> vsma:
> REF  elasped time is 1106 ms
> NEON elasped time is 458 ms
> 
> vsmul:
> REF  elasped time is 803 ms
> NEON elasped time is 573 ms
> 
> vadd:
> REF  elasped time is 712 ms
> NEON elasped time is 361 ms
> 
> vmaxmgv:
> REF  elasped time is 803 ms
> NEON elasped time is 217 ms
> 
> vsvesq:
> REF  elasped time is 501 ms
> NEON elasped time is 250 ms
> 
> vmul:
> REF  elasped time is 805 ms
> NEON elasped time is 363 ms
> 
> zvmul:
> REF  elasped time is 1657 ms
> NEON elasped time is 953 ms
> 
> 
> Sorry for the confusion.
Comment 12 Zoltan Herczeg 2012-10-04 15:00:03 PDT
(In reply to comment #5)
> (In reply to comment #3)
> > (From update of attachment 166684 [details] [details])
> > View in context: https://bugs.webkit.org/attachment.cgi?id=166684&action=review
> > 
> > > Source/WebCore/platform/audio/VectorMath.cpp:632
> > > +        max = std::max(max, groupMax[0]);
> > > +        max = std::max(max, groupMax[1]);
> > > +        max = std::max(max, groupMax[2]);
> > > +        max = std::max(max, groupMax[3]);
> > 
> > Can't we use vmax here?
> 
> Unfortunately we can't because vmax is working on vectors and the result will be a vector as well. There is no function which gives you the highest value from a vector.

Hm true, but still you can reduce it by one max:

max(a,b,c,d):
  t1 = max(a,b);
  t2 = max(c,d);
  return max(t1,t2)
Comment 13 Gabor Rapcsanyi 2012-10-05 05:15:10 PDT
Created attachment 167312 [details]
proposed patch 3

I put some more optimizations into vsvesq() and vmaxmgv() as Zoltan suggested.
Comment 14 Zoltan Herczeg 2012-10-05 05:42:13 PDT
Comment on attachment 167312 [details]
proposed patch 3

r=me
Comment 15 WebKit Review Bot 2012-10-05 05:47:32 PDT
Comment on attachment 167312 [details]
proposed patch 3

Clearing flags on attachment: 167312

Committed r130497: <http://trac.webkit.org/changeset/130497>
Comment 16 WebKit Review Bot 2012-10-05 05:47:36 PDT
All reviewed patches have been landed.  Closing bug.