Bug 219715

Summary:

[GPUProcess] Cut in half amount of IPC needed to do WebAudio

Product:

WebKit

Reporter:

Chris Dumez <cdumez>

Component:

Web Audio

Assignee:

Chris Dumez <cdumez>

Status:

RESOLVED FIXED

Severity:

Normal

CC:

cdumez, darin, eric.carlson, ews-watchlist, ggaren, glenn, jer.noble, peng.liu6, philipj, sergio, webkit-bug-importer, youennf

Priority:

Keywords:

InRadar

Version:

WebKit Nightly Build

Hardware:

Unspecified

OS:

Unspecified

See Also:

https://bugs.webkit.org/show_bug.cgi?id=217969

Attachments:

Description	Flags
Patch	none
Patch	ews-feeder: commit-queue-
Patch	none
Patch	none
Patch	none
Patch	none
Patch	none
Patch	none

Chris Dumez

Reported 2020-12-09 15:41:54 PST

Cut in half amount of IPC needed to do WebAudio.

Attachments
Patch (24.20 KB, patch) 2020-12-09 16:04 PST, Chris Dumez	no flags	Details Formatted Diff Diff
Patch (24.20 KB, patch) 2020-12-09 16:05 PST, Chris Dumez	ews-feeder: commit-queue-	Details Formatted Diff Diff
Patch (24.16 KB, patch) 2020-12-09 16:17 PST, Chris Dumez	no flags	Details Formatted Diff Diff
Patch (24.16 KB, patch) 2020-12-09 16:26 PST, Chris Dumez	no flags	Details Formatted Diff Diff
Patch (55.52 KB, patch) 2020-12-10 13:22 PST, Chris Dumez	no flags	Details Formatted Diff Diff
Patch (51.78 KB, patch) 2020-12-10 13:54 PST, Chris Dumez	no flags	Details Formatted Diff Diff
Patch (51.78 KB, patch) 2020-12-10 14:01 PST, Chris Dumez	no flags	Details Formatted Diff Diff
Patch (51.76 KB, patch) 2020-12-10 14:37 PST, Chris Dumez	no flags	Details Formatted Diff Diff
Show Obsolete (7) View All Add attachment proposed patch, testcase, etc.

Chris Dumez

Comment 1 2020-12-09 16:04:14 PST

Created attachment 415805 [details] Patch

Chris Dumez

Comment 2 2020-12-09 16:05:41 PST

Created attachment 415806 [details] Patch

Geoffrey Garen

Comment 3 2020-12-09 16:17:27 PST

GTK failure: ../../Source/WebKit/WebProcess/GPU/media/RemoteAudioDestinationProxy.cpp:35:10: fatal error: SharedRingBufferFrameBounds.h: No such file or directory

Chris Dumez

Comment 4 2020-12-09 16:17:36 PST

Created attachment 415809 [details] Patch

Geoffrey Garen

Comment 5 2020-12-09 16:24:49 PST

> Source/WebKit/ChangeLog:14 > + This patch gets rid of IPC 2 why storing the RingBuffer bounds in shared why -> by > Source/WebKit/ChangeLog:20 > + This is still too much IPC but this is a step in the right directly. We directly -> direction > Source/WebKit/GPUProcess/media/RemoteAudioDestinationManager.cpp:157 > + if (auto ringBufferBounds = m_ringBufferFrameBounds->tryGet()) > + m_ringBuffer->setCurrentFrameBounds(ringBufferBounds->first, ringBufferBounds->second); I'm not super familiar with this code. In the case where the opportunistic tryGet() fails, what ensures that we'll eventually tryGet() again and succeed?

Chris Dumez

Comment 6 2020-12-09 16:26:43 PST

Created attachment 415811 [details] Patch

Chris Dumez

Comment 7 2020-12-09 16:30:14 PST

(In reply to Geoffrey Garen from comment #5) > > Source/WebKit/ChangeLog:14 > > + This patch gets rid of IPC 2 why storing the RingBuffer bounds in shared > > why -> by > > > Source/WebKit/ChangeLog:20 > > + This is still too much IPC but this is a step in the right directly. We > > directly -> direction > > > Source/WebKit/GPUProcess/media/RemoteAudioDestinationManager.cpp:157 > > + if (auto ringBufferBounds = m_ringBufferFrameBounds->tryGet()) > > + m_ringBuffer->setCurrentFrameBounds(ringBufferBounds->first, ringBufferBounds->second); > > I'm not super familiar with this code. In the case where the opportunistic > tryGet() fails, what ensures that we'll eventually tryGet() again and > succeed? Are you worried about the case where we get very bad luck and every time the GPUProcess needs sample the WebProcess happens to writing the frame bounds (and thus holding the lock). I guess this is a possibility but it sounds not very likely? I used tryGet() because: 1. It is generally frowned upon to have the audio thread waiting on a lock 2. Even though we don't have the latest bounds, the RingBuffer may already have enough samples to provide an answer. That said, if our media people say otherwise, I can switch to a get() and actually wait for the lock.

Chris Dumez

Comment 8 2020-12-09 17:00:18 PST

(In reply to Chris Dumez from comment #7) > (In reply to Geoffrey Garen from comment #5) > > > Source/WebKit/ChangeLog:14 > > > + This patch gets rid of IPC 2 why storing the RingBuffer bounds in shared > > > > why -> by > > > > > Source/WebKit/ChangeLog:20 > > > + This is still too much IPC but this is a step in the right directly. We > > > > directly -> direction > > > > > Source/WebKit/GPUProcess/media/RemoteAudioDestinationManager.cpp:157 > > > + if (auto ringBufferBounds = m_ringBufferFrameBounds->tryGet()) > > > + m_ringBuffer->setCurrentFrameBounds(ringBufferBounds->first, ringBufferBounds->second); > > > > I'm not super familiar with this code. In the case where the opportunistic > > tryGet() fails, what ensures that we'll eventually tryGet() again and > > succeed? > > Are you worried about the case where we get very bad luck and every time the > GPUProcess needs sample the WebProcess happens to writing the frame bounds > (and thus holding the lock). I guess this is a possibility but it sounds not > very likely? > > I used tryGet() because: > 1. It is generally frowned upon to have the audio thread waiting on a lock > 2. Even though we don't have the latest bounds, the RingBuffer may already > have enough samples to provide an answer. > > That said, if our media people say otherwise, I can switch to a get() and > actually wait for the lock. Jer confirmed on Slack that we should NEVER grab a lock on the CA audio thread (try lock is ok though).

Peng Liu

Comment 9 2020-12-09 20:11:19 PST

I believe this patch works, but it may not be the best approach. To move audio samples from a web process to the GPU process, we use a pair of CARingBuffers with shared memory (SharedRingBufferStorage) that stores audio samples. We need to set bounds information in the GPU Process (reader) because the shared memory only stores audio samples, but not the internal data structure to implement the lockless operations. That works for two CARingBuffers in one process but different threads. But when the two CARingBuffers are in different processes, we need to synchronize the internal data structures in the reader. Fortunately, only synchronizing the bounds information will make the CARingBuffer in the GPU process functional. If we can modify the CARingBuffer implementation to store other information in the shared memory in addition to audio samples, we may be able to use the CARingBuffer in two processes as in two threads (don't need to set the bounds information in a reader). Since audio capture also uses CARingBuffer in the same manner, Youenn and Eric may have some comments.

Darin Adler

Comment 10 2020-12-10 08:40:33 PST

Comment on attachment 415811 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=415811&action=review > Source/WebKit/Shared/Cocoa/SharedRingBufferFrameBounds.cpp:36 > +static inline void yield() This will almost certainly get inlined without the inline keyword. I’m not sure we should use the keyword in cases like this. > Source/WebKit/Shared/Cocoa/SharedRingBufferFrameBounds.h:38 > + WTF_MAKE_FAST_ALLOCATED; > + WTF_MAKE_NONCOPYABLE(SharedRingBufferFrameBounds) Not sure why the first has a semicolon and the second does not.

Chris Dumez

Comment 11 2020-12-10 08:47:50 PST

(In reply to Peng Liu from comment #9) > I believe this patch works, but it may not be the best approach. > > To move audio samples from a web process to the GPU process, we use a pair > of CARingBuffers with shared memory (SharedRingBufferStorage) that stores > audio samples. We need to set bounds information in the GPU Process (reader) > because the shared memory only stores audio samples, but not the internal > data structure to implement the lockless operations. That works for two > CARingBuffers in one process but different threads. But when the two > CARingBuffers are in different processes, we need to synchronize the > internal data structures in the reader. Fortunately, only synchronizing the > bounds information will make the CARingBuffer in the GPU process functional. I agree it would be nicer if CARingBuffer did it for us. I will investigate to see how much work that would be.

Geoffrey Garen

Comment 12 2020-12-10 10:03:58 PST

> Are you worried about the case where we get very bad luck and every time the > GPUProcess needs sample the WebProcess happens to writing the frame bounds > (and thus holding the lock). I guess this is a possibility but it sounds not > very likely? Yes, I guess that's my worry. Or, at least, not *every* time, but enough times in a row that we run out of samples. I agree that this case is unlikely, but it does seem possible. A related worry is that we would fail the tryGet() on the very last update, possibly cutting off the end of the audio stream. (I'm not sure how much we would cut off in that case.) How frequently will the audio thread run the tryGet() operation? What happens if tryGet() fails on the last attempt? > I used tryGet() because: > 1. It is generally frowned upon to have the audio thread waiting on a lock > 2. Even though we don't have the latest bounds, the RingBuffer may already > have enough samples to provide an answer. > > That said, if our media people say otherwise, I can switch to a get() and > actually wait for the lock. Definitely agree that we should not lock on the audio thread. Another option is, instead of { isLocked, pair }, you can use { pair*, std::array<pair>, 2> }. When the writing thread wants to update the bounds, it writes the new pair value to the un-pointed-to pair and then atomically swaps the pair* (with a release memory barrier). When the reading thread wants to read the bounds, it reads using the pair* (with an acquire memory barrier). This ensures no locking, and also no pathological failure. However, it is still possible to miss the very last update if you never get another chance to check.

Chris Dumez

Comment 13 2020-12-10 10:14:53 PST

(In reply to Geoffrey Garen from comment #12) > > Are you worried about the case where we get very bad luck and every time the > > GPUProcess needs sample the WebProcess happens to writing the frame bounds > > (and thus holding the lock). I guess this is a possibility but it sounds not > > very likely? > > Yes, I guess that's my worry. Or, at least, not *every* time, but enough > times in a row that we run out of samples. > > I agree that this case is unlikely, but it does seem possible. > > A related worry is that we would fail the tryGet() on the very last update, > possibly cutting off the end of the audio stream. (I'm not sure how much we > would cut off in that case.) > > How frequently will the audio thread run the tryGet() operation? That depends on the sample rate. A usual sample rate is 44khz and render is called for 128 samples each time. I think this means render would get called ~340 per seconds in general. Media people can confirm. > > What happens if tryGet() fails on the last attempt? I am personally not too worried about this part. If Audio is still audible by the time by stop rendering then it will always be cut off in general. Since requests to stop come on the main thread and since audio is being rendered on the audio thread, the exact point we stop rendering is always undefined. > > > I used tryGet() because: > > 1. It is generally frowned upon to have the audio thread waiting on a lock > > 2. Even though we don't have the latest bounds, the RingBuffer may already > > have enough samples to provide an answer. > > > > That said, if our media people say otherwise, I can switch to a get() and > > actually wait for the lock. > > Definitely agree that we should not lock on the audio thread. > > Another option is, instead of { isLocked, pair }, you can use { pair*, > std::array<pair>, 2> }. When the writing thread wants to update the bounds, > it writes the new pair value to the un-pointed-to pair and then atomically > swaps the pair* (with a release memory barrier). When the reading thread > wants to read the bounds, it reads using the pair* (with an acquire memory > barrier). This ensures no locking, and also no pathological failure. > However, it is still possible to miss the very last update if you never get > another chance to check. This sounds like a good idea. I will see if I can make it work.

Chris Dumez

Comment 14 2020-12-10 13:22:09 PST

Created attachment 415921 [details] Patch

Chris Dumez

Comment 15 2020-12-10 13:24:18 PST

(In reply to Chris Dumez from comment #14) > Created attachment 415921 [details] > Patch New approach based on feedback from Peng and Geoff: 1. I updated the CARingBuffer class so that it now stores its frame bounds in its memory backing (which may be shared memory). This way, we no longer need a separate SharedRingBufferFrameBounds class to get this functionality. 2. I no longer use a lock for the frame bounds in shared memory and instead use the approach suggested by Geoff (buffer of bounds + atomic index).

Chris Dumez

Comment 16 2020-12-10 13:42:42 PST

Comment on attachment 415921 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=415921&action=review > Source/WebKit/GPUProcess/media/RemoteAudioDestinationManager.cpp:82 > + auto memory = SharedMemory::map(ipcHandle.handle, SharedMemory::Protection::ReadWrite); Now that I no longer have a lock, I can keep this ReadOnly. Will update the patch accordingly.

Chris Dumez

Comment 17 2020-12-10 13:54:09 PST

Created attachment 415925 [details] Patch

Chris Dumez

Comment 18 2020-12-10 14:01:58 PST

Created attachment 415927 [details] Patch

Geoffrey Garen

Comment 19 2020-12-10 14:33:33 PST

Comment on attachment 415927 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=415927&action=review r=me > Source/WebCore/platform/audio/cocoa/CARingBuffer.cpp:332 > + // When the RingBuffer is backed by shared memory, make sure we pull frame bounds from shared memory before fetching. > + updateFrameBounds(); > + > + uint64_t start, end; > + getCurrentFrameBoundsWithoutUpdate(start, end); Should this just be getCurrentFrameBounds()? Seems like the behavior would be equivalent. > Source/WebKit/Shared/Cocoa/SharedRingBufferStorage.h:73 > + static constexpr unsigned boundsBufferSize { 32 }; > + struct FrameBounds { > + std::pair<uint64_t, uint64_t> boundsBuffer[boundsBufferSize]; > + Atomic<unsigned> boundsBufferIndex { 0 }; > + }; The logic of this data structure and its related code seem right to me, but I don't fully understand why it's OK for this class and the related CARingBuffer class to assume that the reader will not fall behind by more than 32 frames. I get it that falling behind is very rare and will definitely cause audio distortion. But this logic (and the pre-existing kGeneralRingTimeBoundsQueueSize in CARingBuffer) seem to take the audio distortion bug and embiggen it into a buffer overflow bug. Seems like something we should follow up on with our audio experts. Maybe they can explain why 32 is always enough. Or maybe we need to adjust this logic in both places. (We could use a strategy similar to your previous tryLock() strategy; or we could have the *reader* update the boundsBufferIndex and have the writer keep writing to the same index until the reader is ready for the next one).

Chris Dumez

Comment 20 2020-12-10 14:37:02 PST

Created attachment 415931 [details] Patch

Chris Dumez

Comment 21 2020-12-10 14:40:56 PST

(In reply to Geoffrey Garen from comment #19) > Comment on attachment 415927 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=415927&action=review > > r=me > > > Source/WebCore/platform/audio/cocoa/CARingBuffer.cpp:332 > > + // When the RingBuffer is backed by shared memory, make sure we pull frame bounds from shared memory before fetching. > > + updateFrameBounds(); > > + > > + uint64_t start, end; > > + getCurrentFrameBoundsWithoutUpdate(start, end); > > Should this just be getCurrentFrameBounds()? Seems like the behavior would > be equivalent. Yes, I was trying to be more explicit about the update but I made the change. > > Source/WebKit/Shared/Cocoa/SharedRingBufferStorage.h:73 > > + static constexpr unsigned boundsBufferSize { 32 }; > > + struct FrameBounds { > > + std::pair<uint64_t, uint64_t> boundsBuffer[boundsBufferSize]; > > + Atomic<unsigned> boundsBufferIndex { 0 }; > > + }; > > The logic of this data structure and its related code seem right to me, but > I don't fully understand why it's OK for this class and the related > CARingBuffer class to assume that the reader will not fall behind by more > than 32 frames. I get it that falling behind is very rare and will > definitely cause audio distortion. But this logic (and the pre-existing > kGeneralRingTimeBoundsQueueSize in CARingBuffer) seem to take the audio > distortion bug and embiggen it into a buffer overflow bug. Seems like > something we should follow up on with our audio experts. Maybe they can > explain why 32 is always enough. Or maybe we need to adjust this logic in > both places. (We could use a strategy similar to your previous tryLock() > strategy; or we could have the *reader* update the boundsBufferIndex and > have the writer keep writing to the same index until the reader is ready for > the next one). Yes, I am discussing this with Jer. It seem CARingBuffer::frameOffset() always make sure we read inside the RingBuffer by doing a modulo BufferSize.

EWS

Comment 22 2020-12-10 18:15:08 PST

Committed r270663: <https://trac.webkit.org/changeset/270663> All reviewed patches have been landed. Closing bug and clearing flags on attachment 415931 [details].

Radar WebKit Bug Importer

Comment 23 2020-12-10 18:16:20 PST

<rdar://problem/72206083>

Note You need to log in before you can comment on or make changes to this bug.