When I profiled the demo at https://jsfiddle.net/KrisJohnson/s5vL24o1/123/, I noticed that 20% of the CPU time was spent under DelayDSPKernel::process(). We should vectorize DelayDSPKernel::process() whenever possible to speed things up.
Created attachment 429939 [details] Patch
Created attachment 429943 [details] Patch
This is not really related to the specifics of this change, but for something like this, why are we writing our own kernels rather than trying to utilize something like an existing audio unit effect unit (e.g. kAudioUnitSubType_Delay, kAudioUnitSubType_LowPassFilter, etc.) at least on supported platforms? Do the existing audio units not work here?
(In reply to Sam Weinig from comment #3) > This is not really related to the specifics of this change, but for > something like this, why are we writing our own kernels rather than trying > to utilize something like an existing audio unit effect unit (e.g. > kAudioUnitSubType_Delay, kAudioUnitSubType_LowPassFilter, etc.) at least on > supported platforms? Do the existing audio units not work here? That's a good question and I am actually not sure :) I am hacking the WebAudio code but I know very little about our platform's media libraries. Jer or Eric may know the answer to that. I don't think any of the audio code is using such platform functionality in either WebKit or Blink (AFAICT). There MIGHT also be some restrictions due to this code running in the WebProcess but us talking to CoreAudio in the GPUProcess but I am really not sure.
Comment on attachment 429943 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=429943&action=review > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:39 > + // The algorithm below depends on this being true because we don't expect to have to fill the entire buffer more than once. Do we have any protection for the bad case? Or do we need the protection? :-)
Comment on attachment 429943 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=429943&action=review >>> Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:39 >>> + // The algorithm below depends on this being true because we don't expect to have to fill the entire buffer more than once. >> >> Do we have any protection for the bad case? Or do we need the protection? :-) > > Do we have any protection for the bad case? Or do we need the protection? :-) The bufferLength is computed using `bufferLengthForDelay(maxDelayTime, sampleRate)`: ``` size_t DelayDSPKernel::bufferLengthForDelay(double maxDelayTime, double sampleRate) const { // Compute the length of the buffer needed to handle a max delay of |maxDelayTime|. Add an additional render quantum frame size so we can // vectorize the delay processing. The extra space is needed so that writes to the buffer won't overlap reads from the buffer. return AudioUtilities::renderQuantumSize + AudioUtilities::timeToSampleFrame(maxDelayTime, sampleRate, AudioUtilities::SampleFrameRounding::Up); } ``` In WebAudio, framesToProcess is never greater than AudioUtilities::renderQuantumSize. Therefore, I believe this is safe.
Comment on attachment 429943 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=429943&action=review > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:128 > + float interpolationFactor = readPosition - readIndex1; Just curious, is there accuracy concern here? The same to other changes from "double" to "float".
Comment on attachment 429943 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=429943&action=review >> Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:128 >> + float interpolationFactor = readPosition - readIndex1; > > Just curious, is there accuracy concern here? The same to other changes from "double" to "float". I am not too worried since Blink is also using float here. Using float results in faster arithmetic.
Comment on attachment 429943 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=429943&action=review > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:40 > + ASSERT(bufferLength >= framesToProcess); I’m sure this is indeed safe. But Wwe could also just use std::clamp, std::min, or std::max to be robust against this mistake instead o asserting. > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:44 > + float* writePointer = &buffer[writeIndex]; auto? > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:45 > + int remainder = bufferLength - writeIndex; If we want a signed type so make the negative number part work, maybe ptrdiff_t would be a better choice. > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:48 > + memcpy(writePointer, source, sizeof(float) * std::min(static_cast<int>(framesToProcess), remainder)); Might be better to use ptrdiff_t than int here. I think that sizeof(*writePointer) would be even better than sizeof(float). Could even create template function that guarantees this is done right, like a memcpy that takes a number of items to copy instead of a number of bytes, and does the multiplication with overflow checking done carefully. This relies on framesToProcess/reminder not being huge numbers that would overflow an int when multiplied by 4. That’s probably not a risk, but did you think about where that overflow guarantee comes from? > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:49 > + memcpy(buffer, source + remainder, sizeof(float) * std::max(0, static_cast<int>(framesToProcess) - remainder)); Same issues. > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:97 > + if (UNLIKELY(!m_buffer.size() || !source || !destination)) isEmpty better than calling size? > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:110 > + int bufferLength = m_buffer.size(); Seems like a bad patten to use int for sizes. I’d like to see auto here to guarantee I am not chopping a bigger size down to a smaller one, or size_t because that’s a natural for a size. > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:141 > + int bufferLength = m_buffer.size(); Ditto.
(In reply to Darin Adler from comment #10) > Comment on attachment 429943 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=429943&action=review > > > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:40 > > + ASSERT(bufferLength >= framesToProcess); > > I’m sure this is indeed safe. But Wwe could also just use std::clamp, > std::min, or std::max to be robust against this mistake instead o asserting. > > > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:44 > > + float* writePointer = &buffer[writeIndex]; > > auto? > > > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:45 > > + int remainder = bufferLength - writeIndex; > > If we want a signed type so make the negative number part work, maybe > ptrdiff_t would be a better choice. > > > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:48 > > + memcpy(writePointer, source, sizeof(float) * std::min(static_cast<int>(framesToProcess), remainder)); > > Might be better to use ptrdiff_t than int here. > > I think that sizeof(*writePointer) would be even better than sizeof(float). > Could even create template function that guarantees this is done right, like > a memcpy that takes a number of items to copy instead of a number of bytes, > and does the multiplication with overflow checking done carefully. > > This relies on framesToProcess/reminder not being huge numbers that would > overflow an int when multiplied by 4. That’s probably not a risk, but did > you think about where that overflow guarantee comes from? > > > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:49 > > + memcpy(buffer, source + remainder, sizeof(float) * std::max(0, static_cast<int>(framesToProcess) - remainder)); > > Same issues. > > > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:97 > > + if (UNLIKELY(!m_buffer.size() || !source || !destination)) > > isEmpty better than calling size? OK, but then I am going to have to add a isEmpty() function to AudioArray :) > > > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:110 > > + int bufferLength = m_buffer.size(); > > Seems like a bad patten to use int for sizes. I’d like to see auto here to > guarantee I am not chopping a bigger size down to a smaller one, or size_t > because that’s a natural for a size. > > > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:141 > > + int bufferLength = m_buffer.size(); > > Ditto.
FYI, we are not dealing with large numbers here because frameToProcess is always 128 (per WebAudio spec) and we have a cap on the delay at 180, as well as the sample rate (384000). BufferLength is computed like so: size_t DelayDSPKernel::bufferLengthForDelay(double maxDelayTime, double sampleRate) const { // Compute the length of the buffer needed to handle a max delay of |maxDelayTime|. Add an additional render quantum frame size so we can // vectorize the delay processing. The extra space is needed so that writes to the buffer won't overlap reads from the buffer. return AudioUtilities::renderQuantumSize + AudioUtilities::timeToSampleFrame(maxDelayTime, sampleRate, AudioUtilities::SampleFrameRounding::Up); } maxDelayTime cannot be larger than 180. sampleRate cannot be larger than 384000 and AudioUtilities::renderQuantumSize is 128. So the BufferLength cannot be larger than: `128 + timeToSampleFrame(128, 384000, AudioUtilities::SampleFrameRounding::Up)` timeToSampleFrame() would end up being: ``` constexpr double oversampleFactor = 1024; double frame = std::ceil(std::round(time * sampleRate * oversampleFactor) / oversampleFactor); ``` So value returned by timeToSampleFrame() would be 49152000. So max buffer length would be 49152000 + 128 = 49152128. INT_MAX on 32 bit would be 2147483647.
Created attachment 430070 [details] Patch
All sounds great. All those maximums and guarantees are so far away from this code. Reading this one function it’s really hard to know this is all safe. But I accept your explanation that this is not risky and overflow is far from possibility and don’t mind if we leave it like this. Unless there is some way to make it easier to see it’s all safe.
(In reply to Darin Adler from comment #14) > All sounds great. All those maximums and guarantees are so far away from > this code. Reading this one function it’s really hard to know this is all > safe. But I accept your explanation that this is not risky and overflow is > far from possibility and don’t mind if we leave it like this. > > Unless there is some way to make it easier to see it’s all safe. I use size_t consistently in my latest iteration and introduce a very simple function that returns `a <= b ? 0 : (a - b)` to make sure I end up with positive values (instead of ptrdiff_t / int + std::max(0, result).
(In reply to Chris Dumez from comment #15) > introduce a very simple > function that returns `a <= b ? 0 : (a - b)` to make sure I end up with > positive values (instead of ptrdiff_t / int + std::max(0, result). Sounds like a function that we would put into SaturatedArithmetic.h.
Committed r278231 (238268@main): <https://commits.webkit.org/238268@main> All reviewed patches have been landed. Closing bug and clearing flags on attachment 430070 [details].
<rdar://problem/78638796>