RESOLVED FIXED 219715
[GPUProcess] Cut in half amount of IPC needed to do WebAudio
https://bugs.webkit.org/show_bug.cgi?id=219715
Summary [GPUProcess] Cut in half amount of IPC needed to do WebAudio
Chris Dumez
Reported 2020-12-09 15:41:54 PST
Cut in half amount of IPC needed to do WebAudio.
Attachments
Patch (24.20 KB, patch)
2020-12-09 16:04 PST, Chris Dumez
no flags
Patch (24.20 KB, patch)
2020-12-09 16:05 PST, Chris Dumez
ews-feeder: commit-queue-
Patch (24.16 KB, patch)
2020-12-09 16:17 PST, Chris Dumez
no flags
Patch (24.16 KB, patch)
2020-12-09 16:26 PST, Chris Dumez
no flags
Patch (55.52 KB, patch)
2020-12-10 13:22 PST, Chris Dumez
no flags
Patch (51.78 KB, patch)
2020-12-10 13:54 PST, Chris Dumez
no flags
Patch (51.78 KB, patch)
2020-12-10 14:01 PST, Chris Dumez
no flags
Patch (51.76 KB, patch)
2020-12-10 14:37 PST, Chris Dumez
no flags
Chris Dumez
Comment 1 2020-12-09 16:04:14 PST
Chris Dumez
Comment 2 2020-12-09 16:05:41 PST
Geoffrey Garen
Comment 3 2020-12-09 16:17:27 PST
GTK failure: ../../Source/WebKit/WebProcess/GPU/media/RemoteAudioDestinationProxy.cpp:35:10: fatal error: SharedRingBufferFrameBounds.h: No such file or directory
Chris Dumez
Comment 4 2020-12-09 16:17:36 PST
Geoffrey Garen
Comment 5 2020-12-09 16:24:49 PST
> Source/WebKit/ChangeLog:14 > + This patch gets rid of IPC 2 why storing the RingBuffer bounds in shared why -> by > Source/WebKit/ChangeLog:20 > + This is still too much IPC but this is a step in the right directly. We directly -> direction > Source/WebKit/GPUProcess/media/RemoteAudioDestinationManager.cpp:157 > + if (auto ringBufferBounds = m_ringBufferFrameBounds->tryGet()) > + m_ringBuffer->setCurrentFrameBounds(ringBufferBounds->first, ringBufferBounds->second); I'm not super familiar with this code. In the case where the opportunistic tryGet() fails, what ensures that we'll eventually tryGet() again and succeed?
Chris Dumez
Comment 6 2020-12-09 16:26:43 PST
Chris Dumez
Comment 7 2020-12-09 16:30:14 PST
(In reply to Geoffrey Garen from comment #5) > > Source/WebKit/ChangeLog:14 > > + This patch gets rid of IPC 2 why storing the RingBuffer bounds in shared > > why -> by > > > Source/WebKit/ChangeLog:20 > > + This is still too much IPC but this is a step in the right directly. We > > directly -> direction > > > Source/WebKit/GPUProcess/media/RemoteAudioDestinationManager.cpp:157 > > + if (auto ringBufferBounds = m_ringBufferFrameBounds->tryGet()) > > + m_ringBuffer->setCurrentFrameBounds(ringBufferBounds->first, ringBufferBounds->second); > > I'm not super familiar with this code. In the case where the opportunistic > tryGet() fails, what ensures that we'll eventually tryGet() again and > succeed? Are you worried about the case where we get very bad luck and every time the GPUProcess needs sample the WebProcess happens to writing the frame bounds (and thus holding the lock). I guess this is a possibility but it sounds not very likely? I used tryGet() because: 1. It is generally frowned upon to have the audio thread waiting on a lock 2. Even though we don't have the latest bounds, the RingBuffer may already have enough samples to provide an answer. That said, if our media people say otherwise, I can switch to a get() and actually wait for the lock.
Chris Dumez
Comment 8 2020-12-09 17:00:18 PST
(In reply to Chris Dumez from comment #7) > (In reply to Geoffrey Garen from comment #5) > > > Source/WebKit/ChangeLog:14 > > > + This patch gets rid of IPC 2 why storing the RingBuffer bounds in shared > > > > why -> by > > > > > Source/WebKit/ChangeLog:20 > > > + This is still too much IPC but this is a step in the right directly. We > > > > directly -> direction > > > > > Source/WebKit/GPUProcess/media/RemoteAudioDestinationManager.cpp:157 > > > + if (auto ringBufferBounds = m_ringBufferFrameBounds->tryGet()) > > > + m_ringBuffer->setCurrentFrameBounds(ringBufferBounds->first, ringBufferBounds->second); > > > > I'm not super familiar with this code. In the case where the opportunistic > > tryGet() fails, what ensures that we'll eventually tryGet() again and > > succeed? > > Are you worried about the case where we get very bad luck and every time the > GPUProcess needs sample the WebProcess happens to writing the frame bounds > (and thus holding the lock). I guess this is a possibility but it sounds not > very likely? > > I used tryGet() because: > 1. It is generally frowned upon to have the audio thread waiting on a lock > 2. Even though we don't have the latest bounds, the RingBuffer may already > have enough samples to provide an answer. > > That said, if our media people say otherwise, I can switch to a get() and > actually wait for the lock. Jer confirmed on Slack that we should NEVER grab a lock on the CA audio thread (try lock is ok though).
Peng Liu
Comment 9 2020-12-09 20:11:19 PST
I believe this patch works, but it may not be the best approach. To move audio samples from a web process to the GPU process, we use a pair of CARingBuffers with shared memory (SharedRingBufferStorage) that stores audio samples. We need to set bounds information in the GPU Process (reader) because the shared memory only stores audio samples, but not the internal data structure to implement the lockless operations. That works for two CARingBuffers in one process but different threads. But when the two CARingBuffers are in different processes, we need to synchronize the internal data structures in the reader. Fortunately, only synchronizing the bounds information will make the CARingBuffer in the GPU process functional. If we can modify the CARingBuffer implementation to store other information in the shared memory in addition to audio samples, we may be able to use the CARingBuffer in two processes as in two threads (don't need to set the bounds information in a reader). Since audio capture also uses CARingBuffer in the same manner, Youenn and Eric may have some comments.
Darin Adler
Comment 10 2020-12-10 08:40:33 PST
Comment on attachment 415811 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=415811&action=review > Source/WebKit/Shared/Cocoa/SharedRingBufferFrameBounds.cpp:36 > +static inline void yield() This will almost certainly get inlined without the inline keyword. I’m not sure we should use the keyword in cases like this. > Source/WebKit/Shared/Cocoa/SharedRingBufferFrameBounds.h:38 > + WTF_MAKE_FAST_ALLOCATED; > + WTF_MAKE_NONCOPYABLE(SharedRingBufferFrameBounds) Not sure why the first has a semicolon and the second does not.
Chris Dumez
Comment 11 2020-12-10 08:47:50 PST
(In reply to Peng Liu from comment #9) > I believe this patch works, but it may not be the best approach. > > To move audio samples from a web process to the GPU process, we use a pair > of CARingBuffers with shared memory (SharedRingBufferStorage) that stores > audio samples. We need to set bounds information in the GPU Process (reader) > because the shared memory only stores audio samples, but not the internal > data structure to implement the lockless operations. That works for two > CARingBuffers in one process but different threads. But when the two > CARingBuffers are in different processes, we need to synchronize the > internal data structures in the reader. Fortunately, only synchronizing the > bounds information will make the CARingBuffer in the GPU process functional. I agree it would be nicer if CARingBuffer did it for us. I will investigate to see how much work that would be.
Geoffrey Garen
Comment 12 2020-12-10 10:03:58 PST
> Are you worried about the case where we get very bad luck and every time the > GPUProcess needs sample the WebProcess happens to writing the frame bounds > (and thus holding the lock). I guess this is a possibility but it sounds not > very likely? Yes, I guess that's my worry. Or, at least, not *every* time, but enough times in a row that we run out of samples. I agree that this case is unlikely, but it does seem possible. A related worry is that we would fail the tryGet() on the very last update, possibly cutting off the end of the audio stream. (I'm not sure how much we would cut off in that case.) How frequently will the audio thread run the tryGet() operation? What happens if tryGet() fails on the last attempt? > I used tryGet() because: > 1. It is generally frowned upon to have the audio thread waiting on a lock > 2. Even though we don't have the latest bounds, the RingBuffer may already > have enough samples to provide an answer. > > That said, if our media people say otherwise, I can switch to a get() and > actually wait for the lock. Definitely agree that we should not lock on the audio thread. Another option is, instead of { isLocked, pair }, you can use { pair*, std::array<pair>, 2> }. When the writing thread wants to update the bounds, it writes the new pair value to the un-pointed-to pair and then atomically swaps the pair* (with a release memory barrier). When the reading thread wants to read the bounds, it reads using the pair* (with an acquire memory barrier). This ensures no locking, and also no pathological failure. However, it is still possible to miss the very last update if you never get another chance to check.
Chris Dumez
Comment 13 2020-12-10 10:14:53 PST
(In reply to Geoffrey Garen from comment #12) > > Are you worried about the case where we get very bad luck and every time the > > GPUProcess needs sample the WebProcess happens to writing the frame bounds > > (and thus holding the lock). I guess this is a possibility but it sounds not > > very likely? > > Yes, I guess that's my worry. Or, at least, not *every* time, but enough > times in a row that we run out of samples. > > I agree that this case is unlikely, but it does seem possible. > > A related worry is that we would fail the tryGet() on the very last update, > possibly cutting off the end of the audio stream. (I'm not sure how much we > would cut off in that case.) > > How frequently will the audio thread run the tryGet() operation? That depends on the sample rate. A usual sample rate is 44khz and render is called for 128 samples each time. I think this means render would get called ~340 per seconds in general. Media people can confirm. > > What happens if tryGet() fails on the last attempt? I am personally not too worried about this part. If Audio is still audible by the time by stop rendering then it will always be cut off in general. Since requests to stop come on the main thread and since audio is being rendered on the audio thread, the exact point we stop rendering is always undefined. > > > I used tryGet() because: > > 1. It is generally frowned upon to have the audio thread waiting on a lock > > 2. Even though we don't have the latest bounds, the RingBuffer may already > > have enough samples to provide an answer. > > > > That said, if our media people say otherwise, I can switch to a get() and > > actually wait for the lock. > > Definitely agree that we should not lock on the audio thread. > > Another option is, instead of { isLocked, pair }, you can use { pair*, > std::array<pair>, 2> }. When the writing thread wants to update the bounds, > it writes the new pair value to the un-pointed-to pair and then atomically > swaps the pair* (with a release memory barrier). When the reading thread > wants to read the bounds, it reads using the pair* (with an acquire memory > barrier). This ensures no locking, and also no pathological failure. > However, it is still possible to miss the very last update if you never get > another chance to check. This sounds like a good idea. I will see if I can make it work.
Chris Dumez
Comment 14 2020-12-10 13:22:09 PST
Chris Dumez
Comment 15 2020-12-10 13:24:18 PST
(In reply to Chris Dumez from comment #14) > Created attachment 415921 [details] > Patch New approach based on feedback from Peng and Geoff: 1. I updated the CARingBuffer class so that it now stores its frame bounds in its memory backing (which may be shared memory). This way, we no longer need a separate SharedRingBufferFrameBounds class to get this functionality. 2. I no longer use a lock for the frame bounds in shared memory and instead use the approach suggested by Geoff (buffer of bounds + atomic index).
Chris Dumez
Comment 16 2020-12-10 13:42:42 PST
Comment on attachment 415921 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=415921&action=review > Source/WebKit/GPUProcess/media/RemoteAudioDestinationManager.cpp:82 > + auto memory = SharedMemory::map(ipcHandle.handle, SharedMemory::Protection::ReadWrite); Now that I no longer have a lock, I can keep this ReadOnly. Will update the patch accordingly.
Chris Dumez
Comment 17 2020-12-10 13:54:09 PST
Chris Dumez
Comment 18 2020-12-10 14:01:58 PST
Geoffrey Garen
Comment 19 2020-12-10 14:33:33 PST
Comment on attachment 415927 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=415927&action=review r=me > Source/WebCore/platform/audio/cocoa/CARingBuffer.cpp:332 > + // When the RingBuffer is backed by shared memory, make sure we pull frame bounds from shared memory before fetching. > + updateFrameBounds(); > + > + uint64_t start, end; > + getCurrentFrameBoundsWithoutUpdate(start, end); Should this just be getCurrentFrameBounds()? Seems like the behavior would be equivalent. > Source/WebKit/Shared/Cocoa/SharedRingBufferStorage.h:73 > + static constexpr unsigned boundsBufferSize { 32 }; > + struct FrameBounds { > + std::pair<uint64_t, uint64_t> boundsBuffer[boundsBufferSize]; > + Atomic<unsigned> boundsBufferIndex { 0 }; > + }; The logic of this data structure and its related code seem right to me, but I don't fully understand why it's OK for this class and the related CARingBuffer class to assume that the reader will not fall behind by more than 32 frames. I get it that falling behind is very rare and will definitely cause audio distortion. But this logic (and the pre-existing kGeneralRingTimeBoundsQueueSize in CARingBuffer) seem to take the audio distortion bug and embiggen it into a buffer overflow bug. Seems like something we should follow up on with our audio experts. Maybe they can explain why 32 is always enough. Or maybe we need to adjust this logic in both places. (We could use a strategy similar to your previous tryLock() strategy; or we could have the *reader* update the boundsBufferIndex and have the writer keep writing to the same index until the reader is ready for the next one).
Chris Dumez
Comment 20 2020-12-10 14:37:02 PST
Chris Dumez
Comment 21 2020-12-10 14:40:56 PST
(In reply to Geoffrey Garen from comment #19) > Comment on attachment 415927 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=415927&action=review > > r=me > > > Source/WebCore/platform/audio/cocoa/CARingBuffer.cpp:332 > > + // When the RingBuffer is backed by shared memory, make sure we pull frame bounds from shared memory before fetching. > > + updateFrameBounds(); > > + > > + uint64_t start, end; > > + getCurrentFrameBoundsWithoutUpdate(start, end); > > Should this just be getCurrentFrameBounds()? Seems like the behavior would > be equivalent. Yes, I was trying to be more explicit about the update but I made the change. > > Source/WebKit/Shared/Cocoa/SharedRingBufferStorage.h:73 > > + static constexpr unsigned boundsBufferSize { 32 }; > > + struct FrameBounds { > > + std::pair<uint64_t, uint64_t> boundsBuffer[boundsBufferSize]; > > + Atomic<unsigned> boundsBufferIndex { 0 }; > > + }; > > The logic of this data structure and its related code seem right to me, but > I don't fully understand why it's OK for this class and the related > CARingBuffer class to assume that the reader will not fall behind by more > than 32 frames. I get it that falling behind is very rare and will > definitely cause audio distortion. But this logic (and the pre-existing > kGeneralRingTimeBoundsQueueSize in CARingBuffer) seem to take the audio > distortion bug and embiggen it into a buffer overflow bug. Seems like > something we should follow up on with our audio experts. Maybe they can > explain why 32 is always enough. Or maybe we need to adjust this logic in > both places. (We could use a strategy similar to your previous tryLock() > strategy; or we could have the *reader* update the boundsBufferIndex and > have the writer keep writing to the same index until the reader is ready for > the next one). Yes, I am discussing this with Jer. It seem CARingBuffer::frameOffset() always make sure we read inside the RingBuffer by doing a modulo BufferSize.
EWS
Comment 22 2020-12-10 18:15:08 PST
Committed r270663: <https://trac.webkit.org/changeset/270663> All reviewed patches have been landed. Closing bug and clearing flags on attachment 415931 [details].
Radar WebKit Bug Importer
Comment 23 2020-12-10 18:16:20 PST
Note You need to log in before you can comment on or make changes to this bug.