Bug 217766 - Use std::fill_n() instead of for loops in AudioParam::calculateFinalValues()
Reported: 2020-10-15 10:43:05 -0700 by cdumez
Modified: 2020-10-15 14:16:20 -0700
Product: WebKit
Component: Web Audio
Version: WebKit Nightly Build
Hardware/OS: Unspecified / Unspecified
Status: RESOLVED FIXED
Keywords: InRadar
Priority/Severity: P2 Normal
Assignee: cdumez
Related bug: 212611
CC: cdumez, darin, eric.carlson, ews-watchlist, ggaren, glenn, jer.noble, philipj, sam, sergio, webkit-bug-importer

--- Comment #0 from cdumez, 2020-10-15 10:43:05 -0700 ---
Use std::fill_n() instead of for loops in AudioParam::calculateFinalValues().

--- Comment #1 from cdumez, 2020-10-15 10:44:53 -0700 ---
Created attachment 411460
Patch

--- Comment #2 from ggaren, 2020-10-15 10:47:16 -0700 ---
Comment on attachment 411460
Patch

r=me

--- Comment #3 from darin, 2020-10-15 11:03:17 -0700 ---
Comment on attachment 411460
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=411460&action=review

> Source/WebCore/ChangeLog:8
> +        Use std::fill_n() instead of for loops in AudioParam::calculateFinalValues().

Does this get us any vectorization or parallelization? I know that's the long-term ambition of the C++ library; they even have std::execution::seq/par/par_unseq/unseq policies that you can pass as the first argument.

--- Comment #4 from cdumez, 2020-10-15 11:11:47 -0700 ---
(In reply to Darin Adler from comment #3)

I actually don't know. I figured the code was more concise with std::fill_n() and we *might* get more optimized code. I guess I could write a simple benchmark to compare the two.
Also note that we could very easily add a VectorMath function that uses vDSP_vfill() [1] to guarantee we get vectorization. What do you think?

[1] https://developer.apple.com/documentation/accelerate/1450501-vdsp_vfill?language=objc

--- Comment #5 from darin, 2020-10-15 11:18:00 -0700 ---
Comment on attachment 411460
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=411460&action=review

As with most optimization decisions, I think:
1) We should optimize if it makes a measurable difference.
2) Outside of that, we should choose an idiom that is both easy to understand and reasonably optimized by default.

I think that using std::fill_n already accomplishes (2), and we could go further if we measure something that shows an optimization opportunity.

--- Comment #6 from sam, 2020-10-15 11:18:27 -0700 ---
(In reply to Chris Dumez from comment #4)
I think using Accelerate as much as possible in VectorMath is what we should be doing. I really see VectorMath as a "platform" abstraction around it.

I still think using std::fill_n here for the non-HAVE(ACCELERATE) case is the right way to go. It doesn't guarantee any vectorization, but since it is closer to the compiler, if the compiler does autovectorization, it is more likely that it will work with standard library idioms.

--- Comment #7 from cdumez, 2020-10-15 11:20:01 -0700 ---
(In reply to Sam Weinig from comment #6)
Yes, I think this is a good idea. I will still benchmark std::fill_n() and vDSP_vfill() because I am curious now :)

--- Comment #8 from cdumez, 2020-10-15 11:55:36 -0700 ---
(In reply to Chris Dumez from comment #7)
Interestingly, std::fill_n() seems consistently faster than vDSP_vfill() for large arrays on my MacBook Pro:

std::fill_n() took 1.0884ms
vDSP::vfill() took 1.20753ms

Benchmark:

// clang++ -O2 -std=c++14 -framework Accelerate fill_benchmark.cpp -o fill_benchmark

#include <algorithm>
#include <chrono>
#include <iostream>
#include <Accelerate/Accelerate.h>

int main()
{
    constexpr unsigned N = 524288;
    const float pi = 3.1415926535;
    float array1[N]; // 2 MiB each, on the stack.
    float array2[N];

    // Time std::fill_n() over the first array.
    auto start = std::chrono::steady_clock::now();
    std::fill_n(array1, N, pi);
    auto end = std::chrono::steady_clock::now();

    std::chrono::duration<double> diff = end - start;
    std::cout << "std::fill_n() took " << diff.count() * 1000 << "ms\n";

    // Time vDSP_vfill() (stride 1) over the second array.
    start = std::chrono::steady_clock::now();
    vDSP_vfill(&pi, array2, 1, N);
    end = std::chrono::steady_clock::now();

    diff = end - start;
    std::cout << "vDSP::vfill() took " << diff.count() * 1000 << "ms\n";

    // Read back one element of each array so the fills are not optimized away.
    std::cout << array1[2000] << " " << array2[2000] << "\n";

    return 0;
}

--- Comment #9 from sam, 2020-10-15 13:59:37 -0700 ---
(In reply to Chris Dumez from comment #8)
Interesting. Probably worth a radar to the Accelerate folks.

--- Comment #10 from cdumez, 2020-10-15 14:12:47 -0700 ---
(In reply to Sam Weinig from comment #9)
Ok, rdar://problem/70351530.

--- Comment #11 from ews-feeder, 2020-10-15 14:15:23 -0700 ---
Committed r268553: <https://trac.webkit.org/changeset/268553>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 411460.
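Comments #4 and #6 converge on a design: a VectorMath helper that calls Accelerate's vDSP_vfill() where Accelerate is available and falls back to std::fill_n() elsewhere. A minimal sketch of that shape follows. The name VectorMath::fill(), its parameter order, and the use of __has_include(<Accelerate/Accelerate.h>) in place of WebKit's HAVE(ACCELERATE) macro are all assumptions for illustration; r268553 itself calls std::fill_n() directly.

```cpp
// Hypothetical sketch of a VectorMath-style fill helper; not WebKit's actual API.
#include <algorithm>
#include <cstddef>

#if defined(__APPLE__) && __has_include(<Accelerate/Accelerate.h>)
#include <Accelerate/Accelerate.h>
#define SKETCH_HAVE_ACCELERATE 1 // Stands in for WebKit's HAVE(ACCELERATE).
#else
#define SKETCH_HAVE_ACCELERATE 0
#endif

namespace VectorMath {

// Writes value into destination[0], ..., destination[count - 1].
inline void fill(float* destination, float value, std::size_t count)
{
#if SKETCH_HAVE_ACCELERATE
    // vDSP_vfill(const float* A, float* C, vDSP_Stride IC, vDSP_Length N),
    // here with a stride of 1 (contiguous output).
    vDSP_vfill(&value, destination, 1, count);
#else
    // Portable fallback: any vectorization is left to the compiler.
    std::fill_n(destination, count, value);
#endif
}

} // namespace VectorMath
```

A call site such as calculateFinalValues() would then read VectorMath::fill(values, m_value, numberOfValues) on every platform. Whether the vDSP path is worth it is exactly what comment #8 measured, and those numbers suggest it may not be for large fills.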
--- Comment #12 from webkit-bug-importer, 2020-10-15 14:16:20 -0700 ---
<rdar://problem/70351684>

Attachment 411460: Patch (bug-217766-20201015104452.patch, text/plain, 2113 bytes)
Created by cdumez: 2020-10-15 10:44:53 -0700; last modified: 2020-10-15 14:15:24 -0700

Subversion Revision: 268526
diff --git a/Source/WebCore/ChangeLog b/Source/WebCore/ChangeLog
index 6a21d99ebda1ed608572adabf2a5b82133a3be95..e9e44e383c76555c64408492ef7edb4c364c5670 100644
--- a/Source/WebCore/ChangeLog
+++ b/Source/WebCore/ChangeLog
@@ -1,3 +1,17 @@
+2020-10-15  Chris Dumez  <cdumez@apple.com>
+
+        Use std::fill_n() instead of for loops in AudioParam::calculateFinalValues()
+        https://bugs.webkit.org/show_bug.cgi?id=217766
+
+        Reviewed by NOBODY (OOPS!).
+
+        Use std::fill_n() instead of for loops in AudioParam::calculateFinalValues().
+
+        No new tests, no Web-facing behavior change.
+
+        * Modules/webaudio/AudioParam.cpp:
+        (WebCore::AudioParam::calculateFinalValues):
+
 2020-10-15  Chris Dumez  <cdumez@apple.com>
 
         Vectorize StereoPanner's panToTargetValue()
diff --git a/Source/WebCore/Modules/webaudio/AudioParam.cpp b/Source/WebCore/Modules/webaudio/AudioParam.cpp
index 2254c65862e86a12a2bb7a37ae6e2ad98b9e1bd8..a1420bbab963654b3608cdfa14dc09f0922b1021 100644
--- a/Source/WebCore/Modules/webaudio/AudioParam.cpp
+++ b/Source/WebCore/Modules/webaudio/AudioParam.cpp
@@ -266,9 +266,7 @@ void AudioParam::calculateFinalValues(float* values, unsigned numberOfValues, bo
 
         if (timelineValue)
             m_value = *timelineValue;
-
-        for (unsigned i = 0; i < numberOfValues; ++i)
-            values[i] = m_value;
+        std::fill_n(values, numberOfValues, m_value);
     }
 
     if (!numberOfRenderingConnections())
@@ -294,10 +292,8 @@ void AudioParam::calculateFinalValues(float* values, unsigned numberOfValues, bo
     }
 
     // If we're not sample accurate, duplicate the first element of |values| to all of the elements.
-    if (!sampleAccurate) {
-        for (unsigned i = 1; i < numberOfValues; ++i)
-            values[i] = values[0];
-    }
+    if (!sampleAccurate)
+        std::fill_n(values + 1, numberOfValues - 1, values[0]);
 
     // Clamp values based on range allowed by AudioParam's min and max values.
     VectorMath::clamp(values, minValue(), maxValue(), values, numberOfValues);
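The two AudioParam.cpp hunks in attachment 411460 swap index loops for std::fill_n(). As a sketch, here they are restated as free functions so the rewrites can be exercised in isolation; fillConstant() and duplicateFirstSample() are hypothetical names, not WebKit code. One difference between the old and new forms of the second hunk is worth noting: numberOfValues is unsigned, so when it is 0, numberOfValues - 1 wraps around to UINT_MAX and values[0] is read out of bounds, whereas the old loop starting at i = 1 simply did nothing. The patch presumably relies on numberOfValues never being 0 at that point (an assumption this sketch does not verify), so the sketch adds an explicit guard.

```cpp
#include <algorithm>

// Hunk 1: every sample takes the parameter's current value.
// Before: for (unsigned i = 0; i < numberOfValues; ++i) values[i] = value;
inline void fillConstant(float* values, unsigned numberOfValues, float value)
{
    std::fill_n(values, numberOfValues, value);
}

// Hunk 2: when not sample accurate, copy values[0] into every other slot.
// Before: for (unsigned i = 1; i < numberOfValues; ++i) values[i] = values[0];
inline void duplicateFirstSample(float* values, unsigned numberOfValues)
{
    // Guard added for this sketch: with numberOfValues == 0, the unsigned
    // expression numberOfValues - 1 would wrap to UINT_MAX and values[0]
    // would be read out of bounds.
    if (!numberOfValues)
        return;
    std::fill_n(values + 1, numberOfValues - 1, values[0]);
}
```

With the guard, the rewrite matches the old loop for every numberOfValues, including 1, where std::fill_n() receives a count of 0 and writes nothing.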