WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
217766
Use std::fill_n() instead of for loops in AudioParam::calculateFinalValues()
https://bugs.webkit.org/show_bug.cgi?id=217766
Summary
Use std::fill_n() instead of for loops in AudioParam::calculateFinalValues()
Chris Dumez
Reported
2020-10-15 10:43:05 PDT
Use std::fill_n() instead of for loops in AudioParam::calculateFinalValues().
Attachments
Patch
(2.06 KB, patch)
2020-10-15 10:44 PDT
,
Chris Dumez
no flags
Details
Formatted Diff
Diff
View All
Add attachment
proposed patch, testcase, etc.
Chris Dumez
Comment 1
2020-10-15 10:44:53 PDT
Created
attachment 411460
[details]
Patch
Geoffrey Garen
Comment 2
2020-10-15 10:47:16 PDT
Comment on
attachment 411460
[details]
Patch r=me
Darin Adler
Comment 3
2020-10-15 11:03:17 PDT
Comment on
attachment 411460
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=411460&action=review
> Source/WebCore/ChangeLog:8 > + Use std::fill_n() instead of for loops in AudioParam::calculateFinalValues().
Does this get us any vectorization or parallelization? I know that’s the long term ambition of the C++ library, they even have std::execution::seq/par/par_unseq/unseq that you can pass as the first argument.
Chris Dumez
Comment 4
2020-10-15 11:11:47 PDT
(In reply to Darin Adler from
comment #3
)
> Comment on
attachment 411460
[details]
> Patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=411460&action=review
> > > Source/WebCore/ChangeLog:8 > > + Use std::fill_n() instead of for loops in AudioParam::calculateFinalValues(). > > Does this get us any vectorization or parallelization? I know that’s the > long term ambition of the C++ library, they even have > std::execution::seq/par/par_unseq/unseq that you can pass as the first > argument.
I actually don't know. I figured the code was more concise with std::fill and we *might* get more optimized code. I guess I could write a simple benchmark to compare the 2. Also note that we could very easily add a VectorMath function that uses vDSP_fill() [1] to guarantee we get vectorization. What do you think? [1]
https://developer.apple.com/documentation/accelerate/1450501-vdsp_vfill?language=objc
Darin Adler
Comment 5
2020-10-15 11:18:00 PDT
Comment on
attachment 411460
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=411460&action=review
>>> Source/WebCore/ChangeLog:8 >>> + Use std::fill_n() instead of for loops in AudioParam::calculateFinalValues(). >> >> Does this get us any vectorization or parallelization? I know that’s the long term ambition of the C++ library, they even have std::execution::seq/par/par_unseq/unseq that you can pass as the first argument. > > I actually don't know. I figured the code was more concise with std::fill and we *might* get more optimized code. I guess I could write a simple benchmark to compare the 2. > > Also note that we could very easily add a VectorMath function that uses vDSP_fill() [1] to guarantee we get vectorization. What do you think? > > [1]
https://developer.apple.com/documentation/accelerate/1450501-vdsp_vfill?language=objc
Like most optimization situations, I think: 1) We should optimize if it makes a measurable difference. 2) Outside of that, we should choose an idiom that is both easy to understand and reasonably optimized by default. I think that using std::fill_n already accomplishes (2) and we could go further if we measure something that shows an optimization opportunity.
Sam Weinig
Comment 6
2020-10-15 11:18:27 PDT
(In reply to Chris Dumez from
comment #4
)
> (In reply to Darin Adler from
comment #3
) > > Comment on
attachment 411460
[details]
> > Patch > > > > View in context: > >
https://bugs.webkit.org/attachment.cgi?id=411460&action=review
> > > > > Source/WebCore/ChangeLog:8 > > > + Use std::fill_n() instead of for loops in AudioParam::calculateFinalValues(). > > > > Does this get us any vectorization or parallelization? I know that’s the > > long term ambition of the C++ library, they even have > > std::execution::seq/par/par_unseq/unseq that you can pass as the first > > argument. > > I actually don't know. I figured the code was more concise with std::fill > and we *might* get more optimized code. I guess I could write a simple > benchmark to compare the 2. > > Also note that we could very easily add a VectorMath function that uses > vDSP_fill() [1] to guarantee we get vectorization. What do you think? > > [1] >
https://developer.apple.com/documentation/accelerate/1450501
- > vdsp_vfill?language=objc
I think using accelerate as much as possible in VectorMath is what we should be doing. I really see it as a "platform" abstraction around it. I still think using std::fill_n here for the non-HAVE(ACCELERATE) case it the right way to go. It doesn't guarantee any vectorization, but since it is closer the the compiler, if autovectorization is in the compiler, it is more likely the compiler will ensure it works with standard library idioms.
Chris Dumez
Comment 7
2020-10-15 11:20:01 PDT
(In reply to Sam Weinig from
comment #6
)
> (In reply to Chris Dumez from
comment #4
) > > (In reply to Darin Adler from
comment #3
) > > > Comment on
attachment 411460
[details]
> > > Patch > > > > > > View in context: > > >
https://bugs.webkit.org/attachment.cgi?id=411460&action=review
> > > > > > > Source/WebCore/ChangeLog:8 > > > > + Use std::fill_n() instead of for loops in AudioParam::calculateFinalValues(). > > > > > > Does this get us any vectorization or parallelization? I know that’s the > > > long term ambition of the C++ library, they even have > > > std::execution::seq/par/par_unseq/unseq that you can pass as the first > > > argument. > > > > I actually don't know. I figured the code was more concise with std::fill > > and we *might* get more optimized code. I guess I could write a simple > > benchmark to compare the 2. > > > > Also note that we could very easily add a VectorMath function that uses > > vDSP_fill() [1] to guarantee we get vectorization. What do you think? > > > > [1] > >
https://developer.apple.com/documentation/accelerate/1450501
- > > vdsp_vfill?language=objc > > I think using accelerate as much as possible in VectorMath is what we should > be doing. I really see it as a "platform" abstraction around it. > > I still think using std::fill_n here for the non-HAVE(ACCELERATE) case it > the right way to go. It doesn't guarantee any vectorization, but since it is > closer the the compiler, if autovectorization is in the compiler, it is more > likely the compiler will ensure it works with standard library idioms.
Yes, I think this is a good idea. I will still benchmark std::fill_n() and vDSP_fill() because I am curious now :)
Chris Dumez
Comment 8
2020-10-15 11:55:36 PDT
(In reply to Chris Dumez from
comment #7
)
> (In reply to Sam Weinig from
comment #6
) > > (In reply to Chris Dumez from
comment #4
) > > > (In reply to Darin Adler from
comment #3
) > > > > Comment on
attachment 411460
[details]
> > > > Patch > > > > > > > > View in context: > > > >
https://bugs.webkit.org/attachment.cgi?id=411460&action=review
> > > > > > > > > Source/WebCore/ChangeLog:8 > > > > > + Use std::fill_n() instead of for loops in AudioParam::calculateFinalValues(). > > > > > > > > Does this get us any vectorization or parallelization? I know that’s the > > > > long term ambition of the C++ library, they even have > > > > std::execution::seq/par/par_unseq/unseq that you can pass as the first > > > > argument. > > > > > > I actually don't know. I figured the code was more concise with std::fill > > > and we *might* get more optimized code. I guess I could write a simple > > > benchmark to compare the 2. > > > > > > Also note that we could very easily add a VectorMath function that uses > > > vDSP_fill() [1] to guarantee we get vectorization. What do you think? > > > > > > [1] > > >
https://developer.apple.com/documentation/accelerate/1450501
- > > > vdsp_vfill?language=objc > > > > I think using accelerate as much as possible in VectorMath is what we should > > be doing. I really see it as a "platform" abstraction around it. > > > > I still think using std::fill_n here for the non-HAVE(ACCELERATE) case it > > the right way to go. It doesn't guarantee any vectorization, but since it is > > closer the the compiler, if autovectorization is in the compiler, it is more > > likely the compiler will ensure it works with standard library idioms. > > Yes, I think this is a good idea. I will still benchmark std::fill_n() and > vDSP_fill() because I am curious now :)
Interestingly, std::fill_n() seems consistently faster for large arrays than vDSP_fill() on my MacBook Pro: std::fill_n() took 1.0884ms vDSP::vfill() took 1.20753ms Benchmark: // clang++ -O2 -std=c++14 -framework Accelerate fill_benchmark.cpp -o fill_benchmark #include <algorithm> #include <chrono> #include <iostream> #include <Accelerate/Accelerate.h> int main() { constexpr unsigned N = 524288; const float pi = 3.1415926535; float array1[N]; float array2[N]; auto start = std::chrono::steady_clock::now(); std::fill_n(array1, N, pi); auto end = std::chrono::steady_clock::now(); std::chrono::duration<double> diff = end - start; std::cout << "std::fill_n() took " << diff.count() * 1000 << "ms\n"; start = std::chrono::steady_clock::now(); vDSP_vfill(&pi, array2, 1, N); end = std::chrono::steady_clock::now(); diff = end - start; std::cout << "vDSP::vfill() took " << diff.count() * 1000 << "ms\n"; std::cout << array1[2000] << " " << array2[2000] << "\n"; return 0; }
Sam Weinig
Comment 9
2020-10-15 13:59:37 PDT
(In reply to Chris Dumez from
comment #8
)
> (In reply to Chris Dumez from
comment #7
) > > (In reply to Sam Weinig from
comment #6
) > > > (In reply to Chris Dumez from
comment #4
) > > > > (In reply to Darin Adler from
comment #3
) > > > > > Comment on
attachment 411460
[details]
> > > > > Patch > > > > > > > > > > View in context: > > > > >
https://bugs.webkit.org/attachment.cgi?id=411460&action=review
> > > > > > > > > > > Source/WebCore/ChangeLog:8 > > > > > > + Use std::fill_n() instead of for loops in AudioParam::calculateFinalValues(). > > > > > > > > > > Does this get us any vectorization or parallelization? I know that’s the > > > > > long term ambition of the C++ library, they even have > > > > > std::execution::seq/par/par_unseq/unseq that you can pass as the first > > > > > argument. > > > > > > > > I actually don't know. I figured the code was more concise with std::fill > > > > and we *might* get more optimized code. I guess I could write a simple > > > > benchmark to compare the 2. > > > > > > > > Also note that we could very easily add a VectorMath function that uses > > > > vDSP_fill() [1] to guarantee we get vectorization. What do you think? > > > > > > > > [1] > > > >
https://developer.apple.com/documentation/accelerate/1450501
- > > > > vdsp_vfill?language=objc > > > > > > I think using accelerate as much as possible in VectorMath is what we should > > > be doing. I really see it as a "platform" abstraction around it. > > > > > > I still think using std::fill_n here for the non-HAVE(ACCELERATE) case it > > > the right way to go. It doesn't guarantee any vectorization, but since it is > > > closer the the compiler, if autovectorization is in the compiler, it is more > > > likely the compiler will ensure it works with standard library idioms. > > > > Yes, I think this is a good idea. I will still benchmark std::fill_n() and > > vDSP_fill() because I am curious now :) > > Interestingly, std::fill_n() seems consistently faster for large arrays than > vDSP_fill() on my MacBook Pro: > > std::fill_n() took 1.0884ms > vDSP::vfill() took 1.20753ms > > Benchmark: > > // clang++ -O2 -std=c++14 -framework Accelerate fill_benchmark.cpp -o > fill_benchmark > > #include <algorithm> > #include <chrono> > #include <iostream> > #include <Accelerate/Accelerate.h> > > int main() > { > constexpr unsigned N = 524288; > const float pi = 3.1415926535; > float array1[N]; > float array2[N]; > > auto start = std::chrono::steady_clock::now(); > std::fill_n(array1, N, pi); > auto end = std::chrono::steady_clock::now(); > > std::chrono::duration<double> diff = end - start; > std::cout << "std::fill_n() took " << diff.count() * 1000 << "ms\n"; > > start = std::chrono::steady_clock::now(); > vDSP_vfill(&pi, array2, 1, N); > end = std::chrono::steady_clock::now(); > > diff = end - start; > std::cout << "vDSP::vfill() took " << diff.count() * 1000 << "ms\n"; > > std::cout << array1[2000] << " " << array2[2000] << "\n"; > > return 0; > }
Interesting. Probably worth a radar to the Accelerate folks.
Chris Dumez
Comment 10
2020-10-15 14:12:47 PDT
(In reply to Sam Weinig from
comment #9
)
> (In reply to Chris Dumez from
comment #8
) > > (In reply to Chris Dumez from
comment #7
) > > > (In reply to Sam Weinig from
comment #6
) > > > > (In reply to Chris Dumez from
comment #4
) > > > > > (In reply to Darin Adler from
comment #3
) > > > > > > Comment on
attachment 411460
[details]
> > > > > > Patch > > > > > > > > > > > > View in context: > > > > > >
https://bugs.webkit.org/attachment.cgi?id=411460&action=review
> > > > > > > > > > > > > Source/WebCore/ChangeLog:8 > > > > > > > + Use std::fill_n() instead of for loops in AudioParam::calculateFinalValues(). > > > > > > > > > > > > Does this get us any vectorization or parallelization? I know that’s the > > > > > > long term ambition of the C++ library, they even have > > > > > > std::execution::seq/par/par_unseq/unseq that you can pass as the first > > > > > > argument. > > > > > > > > > > I actually don't know. I figured the code was more concise with std::fill > > > > > and we *might* get more optimized code. I guess I could write a simple > > > > > benchmark to compare the 2. > > > > > > > > > > Also note that we could very easily add a VectorMath function that uses > > > > > vDSP_fill() [1] to guarantee we get vectorization. What do you think? > > > > > > > > > > [1] > > > > >
https://developer.apple.com/documentation/accelerate/1450501
- > > > > > vdsp_vfill?language=objc > > > > > > > > I think using accelerate as much as possible in VectorMath is what we should > > > > be doing. I really see it as a "platform" abstraction around it. > > > > > > > > I still think using std::fill_n here for the non-HAVE(ACCELERATE) case it > > > > the right way to go. It doesn't guarantee any vectorization, but since it is > > > > closer the the compiler, if autovectorization is in the compiler, it is more > > > > likely the compiler will ensure it works with standard library idioms. > > > > > > Yes, I think this is a good idea. I will still benchmark std::fill_n() and > > > vDSP_fill() because I am curious now :) > > > > Interestingly, std::fill_n() seems consistently faster for large arrays than > > vDSP_fill() on my MacBook Pro: > > > > std::fill_n() took 1.0884ms > > vDSP::vfill() took 1.20753ms > > > > Benchmark: > > > > // clang++ -O2 -std=c++14 -framework Accelerate fill_benchmark.cpp -o > > fill_benchmark > > > > #include <algorithm> > > #include <chrono> > > #include <iostream> > > #include <Accelerate/Accelerate.h> > > > > int main() > > { > > constexpr unsigned N = 524288; > > const float pi = 3.1415926535; > > float array1[N]; > > float array2[N]; > > > > auto start = std::chrono::steady_clock::now(); > > std::fill_n(array1, N, pi); > > auto end = std::chrono::steady_clock::now(); > > > > std::chrono::duration<double> diff = end - start; > > std::cout << "std::fill_n() took " << diff.count() * 1000 << "ms\n"; > > > > start = std::chrono::steady_clock::now(); > > vDSP_vfill(&pi, array2, 1, N); > > end = std::chrono::steady_clock::now(); > > > > diff = end - start; > > std::cout << "vDSP::vfill() took " << diff.count() * 1000 << "ms\n"; > > > > std::cout << array1[2000] << " " << array2[2000] << "\n"; > > > > return 0; > > } > > Interesting. Probably worth a radar to the Accelerate folks.
Ok,
rdar://problem/70351530
.
EWS
Comment 11
2020-10-15 14:15:23 PDT
Committed
r268553
: <
https://trac.webkit.org/changeset/268553
> All reviewed patches have been landed. Closing bug and clearing flags on
attachment 411460
[details]
.
Radar WebKit Bug Importer
Comment 12
2020-10-15 14:16:20 PDT
<
rdar://problem/70351684
>
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug