Bug 217766 - RESOLVED FIXED
Use std::fill_n() instead of for loops in AudioParam::calculateFinalValues()
https://bugs.webkit.org/show_bug.cgi?id=217766
Chris Dumez
Reported 2020-10-15 10:43:05 PDT
Use std::fill_n() instead of for loops in AudioParam::calculateFinalValues().
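For context, here is a minimal sketch of the kind of change described above (not the actual WebKit patch; the helper name and parameters below are hypothetical):

// Hypothetical illustration: replacing a manual fill loop with std::fill_n().
#include <algorithm>
#include <cstddef>

void fillWithConstant(float* values, size_t framesToProcess, float value)
{
    // Before: explicit loop.
    // for (size_t i = 0; i < framesToProcess; ++i)
    //     values[i] = value;

    // After: same semantics, more concise, and a pattern compilers commonly
    // recognize and lower to a vectorized or memset-like fill.
    std::fill_n(values, framesToProcess, value);
}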
Attachments
Patch (2.06 KB, patch)
2020-10-15 10:44 PDT, Chris Dumez
no flags
Chris Dumez
Comment 1 2020-10-15 10:44:53 PDT
Created attachment 411460 [details]
Patch
Geoffrey Garen
Comment 2 2020-10-15 10:47:16 PDT
Comment on attachment 411460 [details]
Patch

r=me
Darin Adler
Comment 3 2020-10-15 11:03:17 PDT
Comment on attachment 411460 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=411460&action=review

> Source/WebCore/ChangeLog:8
> +        Use std::fill_n() instead of for loops in AudioParam::calculateFinalValues().

Does this get us any vectorization or parallelization? I know that’s the long-term ambition of the C++ library; they even have std::execution::seq/par/par_unseq/unseq that you can pass as the first argument.
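For reference, the C++17 parallel-algorithm overloads mentioned here look like the sketch below. This is illustrative only: standard-library support for <execution> varied across toolchains at the time, and nothing in this bug relies on these overloads.

#include <algorithm>
#include <execution> // C++17; availability depends on the standard library
#include <vector>

void fillBuffer(std::vector<float>& buffer, float value)
{
    // Plain call: sequential by default.
    std::fill_n(buffer.begin(), buffer.size(), value);

    // Execution-policy overload: the implementation may vectorize and/or
    // parallelize the fill if it supports the requested policy.
    std::fill_n(std::execution::par_unseq, buffer.begin(), buffer.size(), value);
}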
Chris Dumez
Comment 4 2020-10-15 11:11:47 PDT
(In reply to Darin Adler from comment #3)
> Does this get us any vectorization or parallelization? I know that’s the
> long-term ambition of the C++ library; they even have
> std::execution::seq/par/par_unseq/unseq that you can pass as the first
> argument.

I actually don't know. I figured the code was more concise with std::fill_n() and we *might* get more optimized code. I guess I could write a simple benchmark to compare the two.

Also note that we could very easily add a VectorMath function that uses vDSP_vfill() [1] to guarantee we get vectorization. What do you think?

[1] https://developer.apple.com/documentation/accelerate/1450501-vdsp_vfill?language=objc
Darin Adler
Comment 5 2020-10-15 11:18:00 PDT
Comment on attachment 411460 [details]
Patch

(In reply to Chris Dumez from comment #4)
> Also note that we could very easily add a VectorMath function that uses
> vDSP_vfill() [1] to guarantee we get vectorization. What do you think?

Like most optimization situations, I think:

1) We should optimize if it makes a measurable difference.

2) Outside of that, we should choose an idiom that is both easy to understand and reasonably optimized by default.

I think that using std::fill_n() already accomplishes (2), and we could go further if we measure something that shows an optimization opportunity.
Sam Weinig
Comment 6 2020-10-15 11:18:27 PDT
(In reply to Chris Dumez from comment #4)
> Also note that we could very easily add a VectorMath function that uses
> vDSP_vfill() [1] to guarantee we get vectorization. What do you think?

I think using Accelerate as much as possible in VectorMath is what we should be doing. I really see it as a "platform" abstraction around it.

I still think using std::fill_n() here for the non-HAVE(ACCELERATE) case is the right way to go. It doesn't guarantee any vectorization, but since it is closer to the compiler, if autovectorization is in the compiler, it is more likely the compiler will ensure it works with standard library idioms.
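A rough sketch of the kind of VectorMath-style wrapper discussed here is shown below. The function name and signature are illustrative assumptions, not the actual WebKit API, and the platform guard stands in for WebKit's HAVE(ACCELERATE) convention:

#include <cstddef>
// WebKit would gate this on HAVE(ACCELERATE); __APPLE__ is used here so the
// sketch stands alone.
#if defined(__APPLE__)
#include <Accelerate/Accelerate.h>
#else
#include <algorithm>
#endif

namespace VectorMath {

// Fill `count` floats starting at `destination` with `value`.
inline void fill(float value, float* destination, size_t count)
{
#if defined(__APPLE__)
    // vDSP_vfill takes the value by pointer, the destination, a stride, and a count.
    vDSP_vfill(&value, destination, 1, count);
#else
    // Portable fallback: rely on the standard library and the compiler's autovectorizer.
    std::fill_n(destination, count, value);
#endif
}

} // namespace VectorMath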
Chris Dumez
Comment 7 2020-10-15 11:20:01 PDT
(In reply to Sam Weinig from comment #6)
> I still think using std::fill_n() here for the non-HAVE(ACCELERATE) case is
> the right way to go.

Yes, I think this is a good idea. I will still benchmark std::fill_n() and vDSP_vfill() because I am curious now :)
Chris Dumez
Comment 8 2020-10-15 11:55:36 PDT
(In reply to Chris Dumez from comment #7)
> Yes, I think this is a good idea. I will still benchmark std::fill_n() and
> vDSP_vfill() because I am curious now :)

Interestingly, std::fill_n() seems consistently faster for large arrays than vDSP_vfill() on my MacBook Pro:

std::fill_n() took 1.0884ms
vDSP::vfill() took 1.20753ms

Benchmark:

// clang++ -O2 -std=c++14 -framework Accelerate fill_benchmark.cpp -o fill_benchmark

#include <algorithm>
#include <chrono>
#include <iostream>
#include <Accelerate/Accelerate.h>

int main()
{
    constexpr unsigned N = 524288;
    const float pi = 3.1415926535;
    float array1[N];
    float array2[N];

    auto start = std::chrono::steady_clock::now();
    std::fill_n(array1, N, pi);
    auto end = std::chrono::steady_clock::now();

    std::chrono::duration<double> diff = end - start;
    std::cout << "std::fill_n() took " << diff.count() * 1000 << "ms\n";

    start = std::chrono::steady_clock::now();
    vDSP_vfill(&pi, array2, 1, N);
    end = std::chrono::steady_clock::now();

    diff = end - start;
    std::cout << "vDSP::vfill() took " << diff.count() * 1000 << "ms\n";

    std::cout << array1[2000] << " " << array2[2000] << "\n";

    return 0;
}
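One caveat about the benchmark above: each fill is timed exactly once on large stack arrays, so the numbers are sensitive to one-shot effects such as cold caches and page faults. A variant that uses heap buffers and averages many iterations (a sketch along the same lines, not from this bug) could look like:

// clang++ -O2 -std=c++14 -framework Accelerate fill_benchmark2.cpp -o fill_benchmark2
#include <algorithm>
#include <chrono>
#include <iostream>
#include <vector>
#include <Accelerate/Accelerate.h>

int main()
{
    constexpr unsigned N = 524288;
    constexpr unsigned iterations = 100;
    const float pi = 3.1415926535f;
    std::vector<float> array1(N);
    std::vector<float> array2(N);

    auto start = std::chrono::steady_clock::now();
    for (unsigned i = 0; i < iterations; ++i)
        std::fill_n(array1.data(), N, pi);
    auto end = std::chrono::steady_clock::now();
    std::cout << "std::fill_n(): "
              << std::chrono::duration<double, std::milli>(end - start).count() / iterations
              << " ms/iteration\n";

    start = std::chrono::steady_clock::now();
    for (unsigned i = 0; i < iterations; ++i)
        vDSP_vfill(&pi, array2.data(), 1, N);
    end = std::chrono::steady_clock::now();
    std::cout << "vDSP_vfill(): "
              << std::chrono::duration<double, std::milli>(end - start).count() / iterations
              << " ms/iteration\n";

    // Read back one element so the fills cannot be optimized away.
    std::cout << array1[2000] << " " << array2[2000] << "\n";
    return 0;
}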
Sam Weinig
Comment 9 2020-10-15 13:59:37 PDT
(In reply to Chris Dumez from comment #8)
> Interestingly, std::fill_n() seems consistently faster for large arrays than
> vDSP_vfill() on my MacBook Pro:
>
> std::fill_n() took 1.0884ms
> vDSP::vfill() took 1.20753ms

Interesting. Probably worth a radar to the Accelerate folks.
Chris Dumez
Comment 10 2020-10-15 14:12:47 PDT
(In reply to Sam Weinig from comment #9)
> Interesting. Probably worth a radar to the Accelerate folks.

Ok, rdar://problem/70351530.
EWS
Comment 11 2020-10-15 14:15:23 PDT
Committed r268553: <https://trac.webkit.org/changeset/268553>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 411460 [details].
Radar WebKit Bug Importer
Comment 12 2020-10-15 14:16:20 PDT