RESOLVED FIXED 226358
DelayDSPKernel::process() is slow
https://bugs.webkit.org/show_bug.cgi?id=226358
Summary DelayDSPKernel::process() is slow
Chris Dumez
Reported 2021-05-27 15:17:51 PDT
When I profiled the demo at https://jsfiddle.net/KrisJohnson/s5vL24o1/123/, I noticed that 20% of the CPU time was spent under DelayDSPKernel::process(). We should vectorize DelayDSPKernel::process() whenever possible to speed things up.
Attachments
Patch (20.21 KB, patch)
2021-05-27 15:29 PDT, Chris Dumez
no flags
Patch (20.32 KB, patch)
2021-05-27 15:47 PDT, Chris Dumez
no flags
Patch (21.20 KB, patch)
2021-05-28 16:26 PDT, Chris Dumez
no flags
Chris Dumez
Comment 1 2021-05-27 15:29:32 PDT
Chris Dumez
Comment 2 2021-05-27 15:47:14 PDT
Sam Weinig
Comment 3 2021-05-27 17:35:49 PDT
This is not really related to the specifics of this change, but for something like this, why are we writing our own kernels rather than trying to utilize something like an existing audio unit effect unit (e.g. kAudioUnitSubType_Delay, kAudioUnitSubType_LowPassFilter, etc.) at least on supported platforms? Do the existing audio units not work here?
Chris Dumez
Comment 4 2021-05-27 18:27:40 PDT
(In reply to Sam Weinig from comment #3) > This is not really related to the specifics of this change, but for > something like this, why are we writing our own kernels rather than trying > to utilize something like an existing audio unit effect unit (e.g. > kAudioUnitSubType_Delay, kAudioUnitSubType_LowPassFilter, etc.) at least on > supported platforms? Do the existing audio units not work here? That's a good question and I am actually not sure :) I am hacking the WebAudio code but I know very little about our platform's media libraries. Jer or Eric may know the answer to that. I don't think any of the audio code is using such platform functionality in either WebKit or Blink (AFAICT). There MIGHT also be some restrictions due to this code running in the WebProcess but us talking to CoreAudio in the GPUProcess but I am really not sure.
Peng Liu
Comment 5 2021-05-28 13:14:52 PDT
Comment on attachment 429943 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=429943&action=review > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:39 > + // The algorithm below depends on this being true because we don't expect to have to fill the entire buffer more than once. Do we have any protection for the bad case? Or do we need the protection? :-)
Peng Liu
Comment 6 2021-05-28 13:14:53 PDT
Comment on attachment 429943 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=429943&action=review > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:39 > + // The algorithm below depends on this being true because we don't expect to have to fill the entire buffer more than once. Do we have any protection for the bad case? Or do we need the protection? :-)
Chris Dumez
Comment 7 2021-05-28 13:26:21 PDT
Comment on attachment 429943 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=429943&action=review >>> Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:39 >>> + // The algorithm below depends on this being true because we don't expect to have to fill the entire buffer more than once. >> >> Do we have any protection for the bad case? Or do we need the protection? :-) > > Do we have any protection for the bad case? Or do we need the protection? :-) The bufferLength is computed using `bufferLengthForDelay(maxDelayTime, sampleRate)`: ``` size_t DelayDSPKernel::bufferLengthForDelay(double maxDelayTime, double sampleRate) const { // Compute the length of the buffer needed to handle a max delay of |maxDelayTime|. Add an additional render quantum frame size so we can // vectorize the delay processing. The extra space is needed so that writes to the buffer won't overlap reads from the buffer. return AudioUtilities::renderQuantumSize + AudioUtilities::timeToSampleFrame(maxDelayTime, sampleRate, AudioUtilities::SampleFrameRounding::Up); } ``` In WebAudio, framesToProcess is never greater than AudioUtilities::renderQuantumSize. Therefore, I believe this is safe.
Peng Liu
Comment 8 2021-05-28 14:11:15 PDT
Comment on attachment 429943 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=429943&action=review > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:128 > + float interpolationFactor = readPosition - readIndex1; Just curious, is there accuracy concern here? The same to other changes from "double" to "float".
Chris Dumez
Comment 9 2021-05-28 14:12:39 PDT
Comment on attachment 429943 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=429943&action=review >> Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:128 >> + float interpolationFactor = readPosition - readIndex1; > > Just curious, is there accuracy concern here? The same to other changes from "double" to "float". I am not too worried since Blink is also using float here. Using float results in faster arithmetic.
Darin Adler
Comment 10 2021-05-28 15:07:47 PDT
Comment on attachment 429943 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=429943&action=review > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:40 > + ASSERT(bufferLength >= framesToProcess); I’m sure this is indeed safe. But Wwe could also just use std::clamp, std::min, or std::max to be robust against this mistake instead o asserting. > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:44 > + float* writePointer = &buffer[writeIndex]; auto? > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:45 > + int remainder = bufferLength - writeIndex; If we want a signed type so make the negative number part work, maybe ptrdiff_t would be a better choice. > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:48 > + memcpy(writePointer, source, sizeof(float) * std::min(static_cast<int>(framesToProcess), remainder)); Might be better to use ptrdiff_t than int here. I think that sizeof(*writePointer) would be even better than sizeof(float). Could even create template function that guarantees this is done right, like a memcpy that takes a number of items to copy instead of a number of bytes, and does the multiplication with overflow checking done carefully. This relies on framesToProcess/reminder not being huge numbers that would overflow an int when multiplied by 4. That’s probably not a risk, but did you think about where that overflow guarantee comes from? > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:49 > + memcpy(buffer, source + remainder, sizeof(float) * std::max(0, static_cast<int>(framesToProcess) - remainder)); Same issues. > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:97 > + if (UNLIKELY(!m_buffer.size() || !source || !destination)) isEmpty better than calling size? > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:110 > + int bufferLength = m_buffer.size(); Seems like a bad patten to use int for sizes. I’d like to see auto here to guarantee I am not chopping a bigger size down to a smaller one, or size_t because that’s a natural for a size. > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:141 > + int bufferLength = m_buffer.size(); Ditto.
Chris Dumez
Comment 11 2021-05-28 15:29:55 PDT
(In reply to Darin Adler from comment #10) > Comment on attachment 429943 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=429943&action=review > > > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:40 > > + ASSERT(bufferLength >= framesToProcess); > > I’m sure this is indeed safe. But Wwe could also just use std::clamp, > std::min, or std::max to be robust against this mistake instead o asserting. > > > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:44 > > + float* writePointer = &buffer[writeIndex]; > > auto? > > > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:45 > > + int remainder = bufferLength - writeIndex; > > If we want a signed type so make the negative number part work, maybe > ptrdiff_t would be a better choice. > > > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:48 > > + memcpy(writePointer, source, sizeof(float) * std::min(static_cast<int>(framesToProcess), remainder)); > > Might be better to use ptrdiff_t than int here. > > I think that sizeof(*writePointer) would be even better than sizeof(float). > Could even create template function that guarantees this is done right, like > a memcpy that takes a number of items to copy instead of a number of bytes, > and does the multiplication with overflow checking done carefully. > > This relies on framesToProcess/reminder not being huge numbers that would > overflow an int when multiplied by 4. That’s probably not a risk, but did > you think about where that overflow guarantee comes from? > > > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:49 > > + memcpy(buffer, source + remainder, sizeof(float) * std::max(0, static_cast<int>(framesToProcess) - remainder)); > > Same issues. > > > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:97 > > + if (UNLIKELY(!m_buffer.size() || !source || !destination)) > > isEmpty better than calling size? OK, but then I am going to have to add a isEmpty() function to AudioArray :) > > > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:110 > > + int bufferLength = m_buffer.size(); > > Seems like a bad patten to use int for sizes. I’d like to see auto here to > guarantee I am not chopping a bigger size down to a smaller one, or size_t > because that’s a natural for a size. > > > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:141 > > + int bufferLength = m_buffer.size(); > > Ditto.
Chris Dumez
Comment 12 2021-05-28 15:49:26 PDT
FYI, we are not dealing with large numbers here because frameToProcess is always 128 (per WebAudio spec) and we have a cap on the delay at 180, as well as the sample rate (384000). BufferLength is computed like so: size_t DelayDSPKernel::bufferLengthForDelay(double maxDelayTime, double sampleRate) const { // Compute the length of the buffer needed to handle a max delay of |maxDelayTime|. Add an additional render quantum frame size so we can // vectorize the delay processing. The extra space is needed so that writes to the buffer won't overlap reads from the buffer. return AudioUtilities::renderQuantumSize + AudioUtilities::timeToSampleFrame(maxDelayTime, sampleRate, AudioUtilities::SampleFrameRounding::Up); } maxDelayTime cannot be larger than 180. sampleRate cannot be larger than 384000 and AudioUtilities::renderQuantumSize is 128. So the BufferLength cannot be larger than: `128 + timeToSampleFrame(128, 384000, AudioUtilities::SampleFrameRounding::Up)` timeToSampleFrame() would end up being: ``` constexpr double oversampleFactor = 1024; double frame = std::ceil(std::round(time * sampleRate * oversampleFactor) / oversampleFactor); ``` So value returned by timeToSampleFrame() would be 49152000. So max buffer length would be 49152000 + 128 = 49152128. INT_MAX on 32 bit would be 2147483647.
Chris Dumez
Comment 13 2021-05-28 16:26:46 PDT
Darin Adler
Comment 14 2021-05-28 16:27:25 PDT
All sounds great. All those maximums and guarantees are so far away from this code. Reading this one function it’s really hard to know this is all safe. But I accept your explanation that this is not risky and overflow is far from possibility and don’t mind if we leave it like this. Unless there is some way to make it easier to see it’s all safe.
Chris Dumez
Comment 15 2021-05-28 16:34:27 PDT
(In reply to Darin Adler from comment #14) > All sounds great. All those maximums and guarantees are so far away from > this code. Reading this one function it’s really hard to know this is all > safe. But I accept your explanation that this is not risky and overflow is > far from possibility and don’t mind if we leave it like this. > > Unless there is some way to make it easier to see it’s all safe. I use size_t consistently in my latest iteration and introduce a very simple function that returns `a <= b ? 0 : (a - b)` to make sure I end up with positive values (instead of ptrdiff_t / int + std::max(0, result).
Darin Adler
Comment 16 2021-05-28 16:43:52 PDT
(In reply to Chris Dumez from comment #15) > introduce a very simple > function that returns `a <= b ? 0 : (a - b)` to make sure I end up with > positive values (instead of ptrdiff_t / int + std::max(0, result). Sounds like a function that we would put into SaturatedArithmetic.h.
EWS
Comment 17 2021-05-28 17:52:32 PDT
Committed r278231 (238268@main): <https://commits.webkit.org/238268@main> All reviewed patches have been landed. Closing bug and clearing flags on attachment 430070 [details].
Radar WebKit Bug Importer
Comment 18 2021-05-28 17:53:18 PDT
Note You need to log in before you can comment on or make changes to this bug.