Bug 226358 - DelayDSPKernel::process() is slow
Summary: DelayDSPKernel::process() is slow
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Web Audio (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Chris Dumez
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2021-05-27 15:17 PDT by Chris Dumez
Modified: 2021-05-28 17:53 PDT (History)
11 users (show)

See Also:


Attachments
Patch (20.21 KB, patch)
2021-05-27 15:29 PDT, Chris Dumez
no flags Details | Formatted Diff | Diff
Patch (20.32 KB, patch)
2021-05-27 15:47 PDT, Chris Dumez
no flags Details | Formatted Diff | Diff
Patch (21.20 KB, patch)
2021-05-28 16:26 PDT, Chris Dumez
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Chris Dumez 2021-05-27 15:17:51 PDT
When I profiled the demo at https://jsfiddle.net/KrisJohnson/s5vL24o1/123/, I noticed that 20% of the CPU time was spent under DelayDSPKernel::process().
We should vectorize DelayDSPKernel::process() whenever possible to speed things up.
Comment 1 Chris Dumez 2021-05-27 15:29:32 PDT
Created attachment 429939 [details]
Patch
Comment 2 Chris Dumez 2021-05-27 15:47:14 PDT
Created attachment 429943 [details]
Patch
Comment 3 Sam Weinig 2021-05-27 17:35:49 PDT
This is not really related to the specifics of this change, but for something like this, why are we writing our own kernels rather than trying to utilize something like an existing audio unit effect unit (e.g. kAudioUnitSubType_Delay, kAudioUnitSubType_LowPassFilter, etc.) at least on supported platforms? Do the existing audio units not work here?
Comment 4 Chris Dumez 2021-05-27 18:27:40 PDT
(In reply to Sam Weinig from comment #3)
> This is not really related to the specifics of this change, but for
> something like this, why are we writing our own kernels rather than trying
> to utilize something like an existing audio unit effect unit (e.g.
> kAudioUnitSubType_Delay, kAudioUnitSubType_LowPassFilter, etc.) at least on
> supported platforms? Do the existing audio units not work here?

That's a good question and I am actually not sure :) I am hacking the WebAudio code but I know very little about our platform's media libraries.
Jer or Eric may know the answer to that. I don't think any of the audio code is using such platform functionality in either WebKit or Blink (AFAICT).

There MIGHT also be some restrictions due to this code running in the WebProcess but us talking to CoreAudio in the GPUProcess but I am really not sure.
Comment 5 Peng Liu 2021-05-28 13:14:52 PDT
Comment on attachment 429943 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=429943&action=review

> Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:39
> +    // The algorithm below depends on this being true because we don't expect to have to fill the entire buffer more than once.

Do we have any protection for the bad case? Or do we need the protection? :-)
Comment 6 Peng Liu 2021-05-28 13:14:53 PDT
Comment on attachment 429943 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=429943&action=review

> Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:39
> +    // The algorithm below depends on this being true because we don't expect to have to fill the entire buffer more than once.

Do we have any protection for the bad case? Or do we need the protection? :-)
Comment 7 Chris Dumez 2021-05-28 13:26:21 PDT
Comment on attachment 429943 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=429943&action=review

>>> Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:39
>>> +    // The algorithm below depends on this being true because we don't expect to have to fill the entire buffer more than once.
>> 
>> Do we have any protection for the bad case? Or do we need the protection? :-)
> 
> Do we have any protection for the bad case? Or do we need the protection? :-)

The bufferLength is computed using `bufferLengthForDelay(maxDelayTime, sampleRate)`:
```
size_t DelayDSPKernel::bufferLengthForDelay(double maxDelayTime, double sampleRate) const
{
    // Compute the length of the buffer needed to handle a max delay of |maxDelayTime|. Add an additional render quantum frame size so we can
    // vectorize the delay processing. The extra space is needed so that writes to the buffer won't overlap reads from the buffer.
    return AudioUtilities::renderQuantumSize + AudioUtilities::timeToSampleFrame(maxDelayTime, sampleRate, AudioUtilities::SampleFrameRounding::Up);
}
```

In WebAudio, framesToProcess is never greater than AudioUtilities::renderQuantumSize. Therefore, I believe this is safe.
Comment 8 Peng Liu 2021-05-28 14:11:15 PDT
Comment on attachment 429943 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=429943&action=review

> Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:128
> +        float interpolationFactor = readPosition - readIndex1;

Just curious, is there accuracy concern here? The same to other changes from "double" to "float".
Comment 9 Chris Dumez 2021-05-28 14:12:39 PDT
Comment on attachment 429943 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=429943&action=review

>> Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:128
>> +        float interpolationFactor = readPosition - readIndex1;
> 
> Just curious, is there accuracy concern here? The same to other changes from "double" to "float".

I am not too worried since Blink is also using float here. Using float results in faster arithmetic.
Comment 10 Darin Adler 2021-05-28 15:07:47 PDT
Comment on attachment 429943 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=429943&action=review

> Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:40
> +    ASSERT(bufferLength >= framesToProcess);

I’m sure this is indeed safe. But Wwe could also just use std::clamp, std::min, or std::max to be robust against this mistake instead o asserting.

> Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:44
> +    float* writePointer = &buffer[writeIndex];

auto?

> Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:45
> +    int remainder = bufferLength - writeIndex;

If we want a signed type so make the negative number part work, maybe ptrdiff_t would be a better choice.

> Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:48
> +    memcpy(writePointer, source, sizeof(float) * std::min(static_cast<int>(framesToProcess), remainder));

Might be better to use ptrdiff_t than int here.

I think that sizeof(*writePointer) would be even better than sizeof(float). Could even create template function that guarantees this is done right, like a memcpy that takes a number of items to copy instead of a number of bytes, and does the multiplication with overflow checking done carefully.

This relies on framesToProcess/reminder not being huge numbers that would overflow an int when multiplied by 4. That’s probably not a risk, but did you think about where that overflow guarantee comes from?

> Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:49
> +    memcpy(buffer, source + remainder, sizeof(float) * std::max(0, static_cast<int>(framesToProcess) - remainder));

Same issues.

> Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:97
> +    if (UNLIKELY(!m_buffer.size() || !source || !destination))

isEmpty better than calling size?

> Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:110
> +    int bufferLength = m_buffer.size();

Seems like a bad patten to use int for sizes. I’d like to see auto here to guarantee I am not chopping a bigger size down to a smaller one, or size_t because that’s a natural for a size.

> Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:141
> +    int bufferLength = m_buffer.size();

Ditto.
Comment 11 Chris Dumez 2021-05-28 15:29:55 PDT
(In reply to Darin Adler from comment #10)
> Comment on attachment 429943 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=429943&action=review
> 
> > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:40
> > +    ASSERT(bufferLength >= framesToProcess);
> 
> I’m sure this is indeed safe. But Wwe could also just use std::clamp,
> std::min, or std::max to be robust against this mistake instead o asserting.
> 
> > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:44
> > +    float* writePointer = &buffer[writeIndex];
> 
> auto?
> 
> > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:45
> > +    int remainder = bufferLength - writeIndex;
> 
> If we want a signed type so make the negative number part work, maybe
> ptrdiff_t would be a better choice.
> 
> > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:48
> > +    memcpy(writePointer, source, sizeof(float) * std::min(static_cast<int>(framesToProcess), remainder));
> 
> Might be better to use ptrdiff_t than int here.
> 
> I think that sizeof(*writePointer) would be even better than sizeof(float).
> Could even create template function that guarantees this is done right, like
> a memcpy that takes a number of items to copy instead of a number of bytes,
> and does the multiplication with overflow checking done carefully.
> 
> This relies on framesToProcess/reminder not being huge numbers that would
> overflow an int when multiplied by 4. That’s probably not a risk, but did
> you think about where that overflow guarantee comes from?
> 
> > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:49
> > +    memcpy(buffer, source + remainder, sizeof(float) * std::max(0, static_cast<int>(framesToProcess) - remainder));
> 
> Same issues.
> 
> > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:97
> > +    if (UNLIKELY(!m_buffer.size() || !source || !destination))
> 
> isEmpty better than calling size?

OK, but then I am going to have to add a isEmpty() function to AudioArray :)

> 
> > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:110
> > +    int bufferLength = m_buffer.size();
> 
> Seems like a bad patten to use int for sizes. I’d like to see auto here to
> guarantee I am not chopping a bigger size down to a smaller one, or size_t
> because that’s a natural for a size.
> 
> > Source/WebCore/Modules/webaudio/DelayDSPKernel.cpp:141
> > +    int bufferLength = m_buffer.size();
> 
> Ditto.
Comment 12 Chris Dumez 2021-05-28 15:49:26 PDT
FYI, we are not dealing with large numbers here because frameToProcess is always 128 (per WebAudio spec) and we have a cap on the delay at 180, as well as the sample rate (384000).

BufferLength is computed like so:
size_t DelayDSPKernel::bufferLengthForDelay(double maxDelayTime, double sampleRate) const
{
    // Compute the length of the buffer needed to handle a max delay of |maxDelayTime|. Add an additional render quantum frame size so we can
    // vectorize the delay processing. The extra space is needed so that writes to the buffer won't overlap reads from the buffer.
    return AudioUtilities::renderQuantumSize + AudioUtilities::timeToSampleFrame(maxDelayTime, sampleRate, AudioUtilities::SampleFrameRounding::Up);
}

maxDelayTime cannot be larger than 180. sampleRate cannot be larger than 384000 and AudioUtilities::renderQuantumSize is 128.

So the BufferLength cannot be larger than:
`128 + timeToSampleFrame(128, 384000, AudioUtilities::SampleFrameRounding::Up)`

timeToSampleFrame() would end up being:
```
constexpr double oversampleFactor = 1024;
double frame = std::ceil(std::round(time * sampleRate * oversampleFactor) / oversampleFactor);
```

So value returned by timeToSampleFrame() would be 49152000.

So max buffer length would be 49152000 + 128 = 49152128. INT_MAX on 32 bit would be 2147483647.
Comment 13 Chris Dumez 2021-05-28 16:26:46 PDT
Created attachment 430070 [details]
Patch
Comment 14 Darin Adler 2021-05-28 16:27:25 PDT
All sounds great. All those maximums and guarantees are so far away from this code. Reading this one function it’s really hard to know this is all safe. But I accept your explanation that this is not risky and overflow is far from possibility and don’t mind if we leave it like this.

Unless there is some way to make it easier to see it’s all safe.
Comment 15 Chris Dumez 2021-05-28 16:34:27 PDT
(In reply to Darin Adler from comment #14)
> All sounds great. All those maximums and guarantees are so far away from
> this code. Reading this one function it’s really hard to know this is all
> safe. But I accept your explanation that this is not risky and overflow is
> far from possibility and don’t mind if we leave it like this.
> 
> Unless there is some way to make it easier to see it’s all safe.

I use size_t consistently in my latest iteration and introduce a very simple function that returns `a <= b ? 0 : (a - b)` to make sure I end up with positive values (instead of ptrdiff_t / int + std::max(0, result).
Comment 16 Darin Adler 2021-05-28 16:43:52 PDT
(In reply to Chris Dumez from comment #15)
> introduce a very simple
> function that returns `a <= b ? 0 : (a - b)` to make sure I end up with
> positive values (instead of ptrdiff_t / int + std::max(0, result).

Sounds like a function that we would put into SaturatedArithmetic.h.
Comment 17 EWS 2021-05-28 17:52:32 PDT
Committed r278231 (238268@main): <https://commits.webkit.org/238268@main>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 430070 [details].
Comment 18 Radar WebKit Bug Importer 2021-05-28 17:53:18 PDT
<rdar://problem/78638796>