217766 – Use std::fill_n() instead of for loops in AudioParam::calculateFinalValues()

RESOLVED FIXED 217766

Use std::fill_n() instead of for loops in AudioParam::calculateFinalValues()

https://bugs.webkit.org/show_bug.cgi?id=217766

Summary Use std::fill_n() instead of for loops in AudioParam::calculateFinalValues()

Chris Dumez

Reported 2020-10-15 10:43:05 PDT

Use std::fill_n() instead of for loops in AudioParam::calculateFinalValues().

Attachments
Patch (2.06 KB, patch) 2020-10-15 10:44 PDT, Chris Dumez	no flags	Details Formatted Diff Diff
View All Add attachment proposed patch, testcase, etc.

Chris Dumez

Comment 1 2020-10-15 10:44:53 PDT

Created attachment 411460 [details] Patch

Geoffrey Garen

Comment 2 2020-10-15 10:47:16 PDT

Comment on attachment 411460 [details] Patch r=me

Darin Adler

Comment 3 2020-10-15 11:03:17 PDT

Comment on attachment 411460 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=411460&action=review > Source/WebCore/ChangeLog:8 > + Use std::fill_n() instead of for loops in AudioParam::calculateFinalValues(). Does this get us any vectorization or parallelization? I know that’s the long term ambition of the C++ library, they even have std::execution::seq/par/par_unseq/unseq that you can pass as the first argument.

Chris Dumez

Comment 4 2020-10-15 11:11:47 PDT

(In reply to Darin Adler from comment #3) > Comment on attachment 411460 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=411460&action=review > > > Source/WebCore/ChangeLog:8 > > + Use std::fill_n() instead of for loops in AudioParam::calculateFinalValues(). > > Does this get us any vectorization or parallelization? I know that’s the > long term ambition of the C++ library, they even have > std::execution::seq/par/par_unseq/unseq that you can pass as the first > argument. I actually don't know. I figured the code was more concise with std::fill and we *might* get more optimized code. I guess I could write a simple benchmark to compare the 2. Also note that we could very easily add a VectorMath function that uses vDSP_fill() [1] to guarantee we get vectorization. What do you think? [1] https://developer.apple.com/documentation/accelerate/1450501-vdsp_vfill?language=objc

Darin Adler

Comment 5 2020-10-15 11:18:00 PDT

Comment on attachment 411460 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=411460&action=review >>> Source/WebCore/ChangeLog:8 >>> + Use std::fill_n() instead of for loops in AudioParam::calculateFinalValues(). >> >> Does this get us any vectorization or parallelization? I know that’s the long term ambition of the C++ library, they even have std::execution::seq/par/par_unseq/unseq that you can pass as the first argument. > > I actually don't know. I figured the code was more concise with std::fill and we *might* get more optimized code. I guess I could write a simple benchmark to compare the 2. > > Also note that we could very easily add a VectorMath function that uses vDSP_fill() [1] to guarantee we get vectorization. What do you think? > > [1] https://developer.apple.com/documentation/accelerate/1450501-vdsp_vfill?language=objc Like most optimization situations, I think: 1) We should optimize if it makes a measurable difference. 2) Outside of that, we should choose an idiom that is both easy to understand and reasonably optimized by default. I think that using std::fill_n already accomplishes (2) and we could go further if we measure something that shows an optimization opportunity.

Sam Weinig

Comment 6 2020-10-15 11:18:27 PDT

(In reply to Chris Dumez from comment #4) > (In reply to Darin Adler from comment #3) > > Comment on attachment 411460 [details] > > Patch > > > > View in context: > > https://bugs.webkit.org/attachment.cgi?id=411460&action=review > > > > > Source/WebCore/ChangeLog:8 > > > + Use std::fill_n() instead of for loops in AudioParam::calculateFinalValues(). > > > > Does this get us any vectorization or parallelization? I know that’s the > > long term ambition of the C++ library, they even have > > std::execution::seq/par/par_unseq/unseq that you can pass as the first > > argument. > > I actually don't know. I figured the code was more concise with std::fill > and we *might* get more optimized code. I guess I could write a simple > benchmark to compare the 2. > > Also note that we could very easily add a VectorMath function that uses > vDSP_fill() [1] to guarantee we get vectorization. What do you think? > > [1] > https://developer.apple.com/documentation/accelerate/1450501- > vdsp_vfill?language=objc I think using accelerate as much as possible in VectorMath is what we should be doing. I really see it as a "platform" abstraction around it. I still think using std::fill_n here for the non-HAVE(ACCELERATE) case it the right way to go. It doesn't guarantee any vectorization, but since it is closer the the compiler, if autovectorization is in the compiler, it is more likely the compiler will ensure it works with standard library idioms.

Chris Dumez

Comment 7 2020-10-15 11:20:01 PDT

(In reply to Sam Weinig from comment #6) > (In reply to Chris Dumez from comment #4) > > (In reply to Darin Adler from comment #3) > > > Comment on attachment 411460 [details] > > > Patch > > > > > > View in context: > > > https://bugs.webkit.org/attachment.cgi?id=411460&action=review > > > > > > > Source/WebCore/ChangeLog:8 > > > > + Use std::fill_n() instead of for loops in AudioParam::calculateFinalValues(). > > > > > > Does this get us any vectorization or parallelization? I know that’s the > > > long term ambition of the C++ library, they even have > > > std::execution::seq/par/par_unseq/unseq that you can pass as the first > > > argument. > > > > I actually don't know. I figured the code was more concise with std::fill > > and we *might* get more optimized code. I guess I could write a simple > > benchmark to compare the 2. > > > > Also note that we could very easily add a VectorMath function that uses > > vDSP_fill() [1] to guarantee we get vectorization. What do you think? > > > > [1] > > https://developer.apple.com/documentation/accelerate/1450501- > > vdsp_vfill?language=objc > > I think using accelerate as much as possible in VectorMath is what we should > be doing. I really see it as a "platform" abstraction around it. > > I still think using std::fill_n here for the non-HAVE(ACCELERATE) case it > the right way to go. It doesn't guarantee any vectorization, but since it is > closer the the compiler, if autovectorization is in the compiler, it is more > likely the compiler will ensure it works with standard library idioms. Yes, I think this is a good idea. I will still benchmark std::fill_n() and vDSP_fill() because I am curious now :)

Chris Dumez

Comment 8 2020-10-15 11:55:36 PDT

(In reply to Chris Dumez from comment #7) > (In reply to Sam Weinig from comment #6) > > (In reply to Chris Dumez from comment #4) > > > (In reply to Darin Adler from comment #3) > > > > Comment on attachment 411460 [details] > > > > Patch > > > > > > > > View in context: > > > > https://bugs.webkit.org/attachment.cgi?id=411460&action=review > > > > > > > > > Source/WebCore/ChangeLog:8 > > > > > + Use std::fill_n() instead of for loops in AudioParam::calculateFinalValues(). > > > > > > > > Does this get us any vectorization or parallelization? I know that’s the > > > > long term ambition of the C++ library, they even have > > > > std::execution::seq/par/par_unseq/unseq that you can pass as the first > > > > argument. > > > > > > I actually don't know. I figured the code was more concise with std::fill > > > and we *might* get more optimized code. I guess I could write a simple > > > benchmark to compare the 2. > > > > > > Also note that we could very easily add a VectorMath function that uses > > > vDSP_fill() [1] to guarantee we get vectorization. What do you think? > > > > > > [1] > > > https://developer.apple.com/documentation/accelerate/1450501- > > > vdsp_vfill?language=objc > > > > I think using accelerate as much as possible in VectorMath is what we should > > be doing. I really see it as a "platform" abstraction around it. > > > > I still think using std::fill_n here for the non-HAVE(ACCELERATE) case it > > the right way to go. It doesn't guarantee any vectorization, but since it is > > closer the the compiler, if autovectorization is in the compiler, it is more > > likely the compiler will ensure it works with standard library idioms. > > Yes, I think this is a good idea. I will still benchmark std::fill_n() and > vDSP_fill() because I am curious now :) Interestingly, std::fill_n() seems consistently faster for large arrays than vDSP_fill() on my MacBook Pro: std::fill_n() took 1.0884ms vDSP::vfill() took 1.20753ms Benchmark: // clang++ -O2 -std=c++14 -framework Accelerate fill_benchmark.cpp -o fill_benchmark #include <algorithm> #include <chrono> #include <iostream> #include <Accelerate/Accelerate.h> int main() { constexpr unsigned N = 524288; const float pi = 3.1415926535; float array1[N]; float array2[N]; auto start = std::chrono::steady_clock::now(); std::fill_n(array1, N, pi); auto end = std::chrono::steady_clock::now(); std::chrono::duration<double> diff = end - start; std::cout << "std::fill_n() took " << diff.count() * 1000 << "ms\n"; start = std::chrono::steady_clock::now(); vDSP_vfill(&pi, array2, 1, N); end = std::chrono::steady_clock::now(); diff = end - start; std::cout << "vDSP::vfill() took " << diff.count() * 1000 << "ms\n"; std::cout << array1[2000] << " " << array2[2000] << "\n"; return 0; }

Sam Weinig

Comment 9 2020-10-15 13:59:37 PDT

(In reply to Chris Dumez from comment #8) > (In reply to Chris Dumez from comment #7) > > (In reply to Sam Weinig from comment #6) > > > (In reply to Chris Dumez from comment #4) > > > > (In reply to Darin Adler from comment #3) > > > > > Comment on attachment 411460 [details] > > > > > Patch > > > > > > > > > > View in context: > > > > > https://bugs.webkit.org/attachment.cgi?id=411460&action=review > > > > > > > > > > > Source/WebCore/ChangeLog:8 > > > > > > + Use std::fill_n() instead of for loops in AudioParam::calculateFinalValues(). > > > > > > > > > > Does this get us any vectorization or parallelization? I know that’s the > > > > > long term ambition of the C++ library, they even have > > > > > std::execution::seq/par/par_unseq/unseq that you can pass as the first > > > > > argument. > > > > > > > > I actually don't know. I figured the code was more concise with std::fill > > > > and we *might* get more optimized code. I guess I could write a simple > > > > benchmark to compare the 2. > > > > > > > > Also note that we could very easily add a VectorMath function that uses > > > > vDSP_fill() [1] to guarantee we get vectorization. What do you think? > > > > > > > > [1] > > > > https://developer.apple.com/documentation/accelerate/1450501- > > > > vdsp_vfill?language=objc > > > > > > I think using accelerate as much as possible in VectorMath is what we should > > > be doing. I really see it as a "platform" abstraction around it. > > > > > > I still think using std::fill_n here for the non-HAVE(ACCELERATE) case it > > > the right way to go. It doesn't guarantee any vectorization, but since it is > > > closer the the compiler, if autovectorization is in the compiler, it is more > > > likely the compiler will ensure it works with standard library idioms. > > > > Yes, I think this is a good idea. I will still benchmark std::fill_n() and > > vDSP_fill() because I am curious now :) > > Interestingly, std::fill_n() seems consistently faster for large arrays than > vDSP_fill() on my MacBook Pro: > > std::fill_n() took 1.0884ms > vDSP::vfill() took 1.20753ms > > Benchmark: > > // clang++ -O2 -std=c++14 -framework Accelerate fill_benchmark.cpp -o > fill_benchmark > > #include <algorithm> > #include <chrono> > #include <iostream> > #include <Accelerate/Accelerate.h> > > int main() > { > constexpr unsigned N = 524288; > const float pi = 3.1415926535; > float array1[N]; > float array2[N]; > > auto start = std::chrono::steady_clock::now(); > std::fill_n(array1, N, pi); > auto end = std::chrono::steady_clock::now(); > > std::chrono::duration<double> diff = end - start; > std::cout << "std::fill_n() took " << diff.count() * 1000 << "ms\n"; > > start = std::chrono::steady_clock::now(); > vDSP_vfill(&pi, array2, 1, N); > end = std::chrono::steady_clock::now(); > > diff = end - start; > std::cout << "vDSP::vfill() took " << diff.count() * 1000 << "ms\n"; > > std::cout << array1[2000] << " " << array2[2000] << "\n"; > > return 0; > } Interesting. Probably worth a radar to the Accelerate folks.

Chris Dumez

Comment 10 2020-10-15 14:12:47 PDT

(In reply to Sam Weinig from comment #9) > (In reply to Chris Dumez from comment #8) > > (In reply to Chris Dumez from comment #7) > > > (In reply to Sam Weinig from comment #6) > > > > (In reply to Chris Dumez from comment #4) > > > > > (In reply to Darin Adler from comment #3) > > > > > > Comment on attachment 411460 [details] > > > > > > Patch > > > > > > > > > > > > View in context: > > > > > > https://bugs.webkit.org/attachment.cgi?id=411460&action=review > > > > > > > > > > > > > Source/WebCore/ChangeLog:8 > > > > > > > + Use std::fill_n() instead of for loops in AudioParam::calculateFinalValues(). > > > > > > > > > > > > Does this get us any vectorization or parallelization? I know that’s the > > > > > > long term ambition of the C++ library, they even have > > > > > > std::execution::seq/par/par_unseq/unseq that you can pass as the first > > > > > > argument. > > > > > > > > > > I actually don't know. I figured the code was more concise with std::fill > > > > > and we *might* get more optimized code. I guess I could write a simple > > > > > benchmark to compare the 2. > > > > > > > > > > Also note that we could very easily add a VectorMath function that uses > > > > > vDSP_fill() [1] to guarantee we get vectorization. What do you think? > > > > > > > > > > [1] > > > > > https://developer.apple.com/documentation/accelerate/1450501- > > > > > vdsp_vfill?language=objc > > > > > > > > I think using accelerate as much as possible in VectorMath is what we should > > > > be doing. I really see it as a "platform" abstraction around it. > > > > > > > > I still think using std::fill_n here for the non-HAVE(ACCELERATE) case it > > > > the right way to go. It doesn't guarantee any vectorization, but since it is > > > > closer the the compiler, if autovectorization is in the compiler, it is more > > > > likely the compiler will ensure it works with standard library idioms. > > > > > > Yes, I think this is a good idea. I will still benchmark std::fill_n() and > > > vDSP_fill() because I am curious now :) > > > > Interestingly, std::fill_n() seems consistently faster for large arrays than > > vDSP_fill() on my MacBook Pro: > > > > std::fill_n() took 1.0884ms > > vDSP::vfill() took 1.20753ms > > > > Benchmark: > > > > // clang++ -O2 -std=c++14 -framework Accelerate fill_benchmark.cpp -o > > fill_benchmark > > > > #include <algorithm> > > #include <chrono> > > #include <iostream> > > #include <Accelerate/Accelerate.h> > > > > int main() > > { > > constexpr unsigned N = 524288; > > const float pi = 3.1415926535; > > float array1[N]; > > float array2[N]; > > > > auto start = std::chrono::steady_clock::now(); > > std::fill_n(array1, N, pi); > > auto end = std::chrono::steady_clock::now(); > > > > std::chrono::duration<double> diff = end - start; > > std::cout << "std::fill_n() took " << diff.count() * 1000 << "ms\n"; > > > > start = std::chrono::steady_clock::now(); > > vDSP_vfill(&pi, array2, 1, N); > > end = std::chrono::steady_clock::now(); > > > > diff = end - start; > > std::cout << "vDSP::vfill() took " << diff.count() * 1000 << "ms\n"; > > > > std::cout << array1[2000] << " " << array2[2000] << "\n"; > > > > return 0; > > } > > Interesting. Probably worth a radar to the Accelerate folks. Ok, rdar://problem/70351530.

EWS

Comment 11 2020-10-15 14:15:23 PDT

Committed r268553: <https://trac.webkit.org/changeset/268553> All reviewed patches have been landed. Closing bug and clearing flags on attachment 411460 [details].

Radar WebKit Bug Importer

Comment 12 2020-10-15 14:16:20 PDT

<rdar://problem/70351684>

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version WebKit Nightly Build

Hardware Unspecified

OS Unspecified

Product WebKit

Component Web Audio

Assignee

Chris Dumez

Reported

2020-10-15 10:43 PDT

Modified

2020-10-15 14:16 PDT History

CC List

11 users Show

URL

Keywords InRadar

Depends on

Blocks

212611

Dependencies

tree graph