226358 – DelayDSPKernel::process() is slow

RESOLVED FIXED 226358

DelayDSPKernel::process() is slow

https://bugs.webkit.org/show_bug.cgi?id=226358

Summary DelayDSPKernel::process() is slow

Chris Dumez

Reported 2021-05-27 15:17:51 PDT

When I profiled the demo at https://jsfiddle.net/KrisJohnson/s5vL24o1/123/, I noticed that 20% of the CPU time was spent under DelayDSPKernel::process(). We should vectorize DelayDSPKernel::process() whenever possible to speed things up.

Attachments
Patch (20.21 KB, patch) 2021-05-27 15:29 PDT, Chris Dumez	no flags	Details Formatted Diff Diff
Patch (20.32 KB, patch) 2021-05-27 15:47 PDT, Chris Dumez	no flags	Details Formatted Diff Diff
Patch (21.20 KB, patch) 2021-05-28 16:26 PDT, Chris Dumez	no flags	Details Formatted Diff Diff
Show Obsolete (2) View All Add attachment proposed patch, testcase, etc.

Chris Dumez

Comment 1 2021-05-27 15:29:32 PDT

Created attachment 429939 [details] Patch

Chris Dumez

Comment 2 2021-05-27 15:47:14 PDT

Created attachment 429943 [details] Patch

Sam Weinig

Comment 3 2021-05-27 17:35:49 PDT

This is not really related to the specifics of this change, but for something like this, why are we writing our own kernels rather than trying to utilize something like an existing audio unit effect unit (e.g. kAudioUnitSubType_Delay, kAudioUnitSubType_LowPassFilter, etc.) at least on supported platforms? Do the existing audio units not work here?

Chris Dumez

Comment 4 2021-05-27 18:27:40 PDT

(In reply to Sam Weinig from comment #3) > This is not really related to the specifics of this change, but for > something like this, why are we writing our own kernels rather than trying > to utilize something like an existing audio unit effect unit (e.g. > kAudioUnitSubType_Delay, kAudioUnitSubType_LowPassFilter, etc.) at least on > supported platforms? Do the existing audio units not work here? That's a good question and I am actually not sure :) I am hacking the WebAudio code but I know very little about our platform's media libraries. Jer or Eric may know the answer to that. I don't think any of the audio code is using such platform functionality in either WebKit or Blink (AFAICT). There MIGHT also be some restrictions due to this code running in the WebProcess but us talking to CoreAudio in the GPUProcess but I am really not sure.

Peng Liu

Comment 5 2021-05-28 13:14:52 PDT

Comment on attachment 429943 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=429943&action=review > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:39 > + // The algorithm below depends on this being true because we don't expect to have to fill the entire buffer more than once. Do we have any protection for the bad case? Or do we need the protection? :-)

Peng Liu

Comment 6 2021-05-28 13:14:53 PDT

Comment on attachment 429943 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=429943&action=review > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:39 > + // The algorithm below depends on this being true because we don't expect to have to fill the entire buffer more than once. Do we have any protection for the bad case? Or do we need the protection? :-)

Chris Dumez

Comment 7 2021-05-28 13:26:21 PDT

Comment on attachment 429943 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=429943&action=review >>> Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:39 >>> + // The algorithm below depends on this being true because we don't expect to have to fill the entire buffer more than once. >> >> Do we have any protection for the bad case? Or do we need the protection? :-) > > Do we have any protection for the bad case? Or do we need the protection? :-) The bufferLength is computed using `bufferLengthForDelay(maxDelayTime, sampleRate)`: ``` size_t DelayDSPKernel::bufferLengthForDelay(double maxDelayTime, double sampleRate) const { // Compute the length of the buffer needed to handle a max delay of |maxDelayTime|. Add an additional render quantum frame size so we can // vectorize the delay processing. The extra space is needed so that writes to the buffer won't overlap reads from the buffer. return AudioUtilities::renderQuantumSize + AudioUtilities::timeToSampleFrame(maxDelayTime, sampleRate, AudioUtilities::SampleFrameRounding::Up); } ``` In WebAudio, framesToProcess is never greater than AudioUtilities::renderQuantumSize. Therefore, I believe this is safe.

Peng Liu

Comment 8 2021-05-28 14:11:15 PDT

Comment on attachment 429943 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=429943&action=review > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:128 > + float interpolationFactor = readPosition - readIndex1; Just curious, is there accuracy concern here? The same to other changes from "double" to "float".

Chris Dumez

Comment 9 2021-05-28 14:12:39 PDT

Comment on attachment 429943 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=429943&action=review >> Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:128 >> + float interpolationFactor = readPosition - readIndex1; > > Just curious, is there accuracy concern here? The same to other changes from "double" to "float". I am not too worried since Blink is also using float here. Using float results in faster arithmetic.

Darin Adler

Comment 10 2021-05-28 15:07:47 PDT

Comment on attachment 429943 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=429943&action=review > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:40 > + ASSERT(bufferLength >= framesToProcess); I’m sure this is indeed safe. But Wwe could also just use std::clamp, std::min, or std::max to be robust against this mistake instead o asserting. > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:44 > + float* writePointer = &buffer[writeIndex]; auto? > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:45 > + int remainder = bufferLength - writeIndex; If we want a signed type so make the negative number part work, maybe ptrdiff_t would be a better choice. > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:48 > + memcpy(writePointer, source, sizeof(float) * std::min(static_cast<int>(framesToProcess), remainder)); Might be better to use ptrdiff_t than int here. I think that sizeof(*writePointer) would be even better than sizeof(float). Could even create template function that guarantees this is done right, like a memcpy that takes a number of items to copy instead of a number of bytes, and does the multiplication with overflow checking done carefully. This relies on framesToProcess/reminder not being huge numbers that would overflow an int when multiplied by 4. That’s probably not a risk, but did you think about where that overflow guarantee comes from? > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:49 > + memcpy(buffer, source + remainder, sizeof(float) * std::max(0, static_cast<int>(framesToProcess) - remainder)); Same issues. > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:97 > + if (UNLIKELY(!m_buffer.size() || !source || !destination)) isEmpty better than calling size? > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:110 > + int bufferLength = m_buffer.size(); Seems like a bad patten to use int for sizes. I’d like to see auto here to guarantee I am not chopping a bigger size down to a smaller one, or size_t because that’s a natural for a size. > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:141 > + int bufferLength = m_buffer.size(); Ditto.

Chris Dumez

Comment 11 2021-05-28 15:29:55 PDT

(In reply to Darin Adler from comment #10) > Comment on attachment 429943 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=429943&action=review > > > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:40 > > + ASSERT(bufferLength >= framesToProcess); > > I’m sure this is indeed safe. But Wwe could also just use std::clamp, > std::min, or std::max to be robust against this mistake instead o asserting. > > > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:44 > > + float* writePointer = &buffer[writeIndex]; > > auto? > > > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:45 > > + int remainder = bufferLength - writeIndex; > > If we want a signed type so make the negative number part work, maybe > ptrdiff_t would be a better choice. > > > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:48 > > + memcpy(writePointer, source, sizeof(float) * std::min(static_cast<int>(framesToProcess), remainder)); > > Might be better to use ptrdiff_t than int here. > > I think that sizeof(*writePointer) would be even better than sizeof(float). > Could even create template function that guarantees this is done right, like > a memcpy that takes a number of items to copy instead of a number of bytes, > and does the multiplication with overflow checking done carefully. > > This relies on framesToProcess/reminder not being huge numbers that would > overflow an int when multiplied by 4. That’s probably not a risk, but did > you think about where that overflow guarantee comes from? > > > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:49 > > + memcpy(buffer, source + remainder, sizeof(float) * std::max(0, static_cast<int>(framesToProcess) - remainder)); > > Same issues. > > > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:97 > > + if (UNLIKELY(!m_buffer.size() || !source || !destination)) > > isEmpty better than calling size? OK, but then I am going to have to add a isEmpty() function to AudioArray :) > > > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:110 > > + int bufferLength = m_buffer.size(); > > Seems like a bad patten to use int for sizes. I’d like to see auto here to > guarantee I am not chopping a bigger size down to a smaller one, or size_t > because that’s a natural for a size. > > > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:141 > > + int bufferLength = m_buffer.size(); > > Ditto.

Chris Dumez

Comment 12 2021-05-28 15:49:26 PDT

FYI, we are not dealing with large numbers here because frameToProcess is always 128 (per WebAudio spec) and we have a cap on the delay at 180, as well as the sample rate (384000). BufferLength is computed like so: size_t DelayDSPKernel::bufferLengthForDelay(double maxDelayTime, double sampleRate) const { // Compute the length of the buffer needed to handle a max delay of |maxDelayTime|. Add an additional render quantum frame size so we can // vectorize the delay processing. The extra space is needed so that writes to the buffer won't overlap reads from the buffer. return AudioUtilities::renderQuantumSize + AudioUtilities::timeToSampleFrame(maxDelayTime, sampleRate, AudioUtilities::SampleFrameRounding::Up); } maxDelayTime cannot be larger than 180. sampleRate cannot be larger than 384000 and AudioUtilities::renderQuantumSize is 128. So the BufferLength cannot be larger than: `128 + timeToSampleFrame(128, 384000, AudioUtilities::SampleFrameRounding::Up)` timeToSampleFrame() would end up being: ``` constexpr double oversampleFactor = 1024; double frame = std::ceil(std::round(time * sampleRate * oversampleFactor) / oversampleFactor); ``` So value returned by timeToSampleFrame() would be 49152000. So max buffer length would be 49152000 + 128 = 49152128. INT_MAX on 32 bit would be 2147483647.

Chris Dumez

Comment 13 2021-05-28 16:26:46 PDT

Created attachment 430070 [details] Patch

Darin Adler

Comment 14 2021-05-28 16:27:25 PDT

All sounds great. All those maximums and guarantees are so far away from this code. Reading this one function it’s really hard to know this is all safe. But I accept your explanation that this is not risky and overflow is far from possibility and don’t mind if we leave it like this. Unless there is some way to make it easier to see it’s all safe.

Chris Dumez

Comment 15 2021-05-28 16:34:27 PDT

(In reply to Darin Adler from comment #14) > All sounds great. All those maximums and guarantees are so far away from > this code. Reading this one function it’s really hard to know this is all > safe. But I accept your explanation that this is not risky and overflow is > far from possibility and don’t mind if we leave it like this. > > Unless there is some way to make it easier to see it’s all safe. I use size_t consistently in my latest iteration and introduce a very simple function that returns `a <= b ? 0 : (a - b)` to make sure I end up with positive values (instead of ptrdiff_t / int + std::max(0, result).

Darin Adler

Comment 16 2021-05-28 16:43:52 PDT

(In reply to Chris Dumez from comment #15) > introduce a very simple > function that returns `a <= b ? 0 : (a - b)` to make sure I end up with > positive values (instead of ptrdiff_t / int + std::max(0, result). Sounds like a function that we would put into SaturatedArithmetic.h.

EWS

Comment 17 2021-05-28 17:52:32 PDT

Committed r278231 (238268@main): <https://commits.webkit.org/238268@main> All reviewed patches have been landed. Closing bug and clearing flags on attachment 430070 [details].

Radar WebKit Bug Importer

Comment 18 2021-05-28 17:53:18 PDT

<rdar://problem/78638796>

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version WebKit Nightly Build

Hardware Unspecified

OS Unspecified

Product WebKit

Component Web Audio

Assignee

Chris Dumez

Reported

2021-05-27 15:17 PDT

Modified

2021-05-28 17:53 PDT History

CC List

11 users Show

URL

Keywords InRadar

Depends on

Blocks