WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
219715
[GPUProcess] Cut in half amount of IPC needed to do WebAudio
https://bugs.webkit.org/show_bug.cgi?id=219715
Summary
[GPUProcess] Cut in half amount of IPC needed to do WebAudio
Chris Dumez
Reported
2020-12-09 15:41:54 PST
Cut in half amount of IPC needed to do WebAudio.
Attachments
Patch
(24.20 KB, patch)
2020-12-09 16:04 PST
,
Chris Dumez
no flags
Details
Formatted Diff
Diff
Patch
(24.20 KB, patch)
2020-12-09 16:05 PST
,
Chris Dumez
ews-feeder
: commit-queue-
Details
Formatted Diff
Diff
Patch
(24.16 KB, patch)
2020-12-09 16:17 PST
,
Chris Dumez
no flags
Details
Formatted Diff
Diff
Patch
(24.16 KB, patch)
2020-12-09 16:26 PST
,
Chris Dumez
no flags
Details
Formatted Diff
Diff
Patch
(55.52 KB, patch)
2020-12-10 13:22 PST
,
Chris Dumez
no flags
Details
Formatted Diff
Diff
Patch
(51.78 KB, patch)
2020-12-10 13:54 PST
,
Chris Dumez
no flags
Details
Formatted Diff
Diff
Patch
(51.78 KB, patch)
2020-12-10 14:01 PST
,
Chris Dumez
no flags
Details
Formatted Diff
Diff
Patch
(51.76 KB, patch)
2020-12-10 14:37 PST
,
Chris Dumez
no flags
Details
Formatted Diff
Diff
Show Obsolete
(7)
View All
Add attachment
proposed patch, testcase, etc.
Chris Dumez
Comment 1
2020-12-09 16:04:14 PST
Created
attachment 415805
[details]
Patch
Chris Dumez
Comment 2
2020-12-09 16:05:41 PST
Created
attachment 415806
[details]
Patch
Geoffrey Garen
Comment 3
2020-12-09 16:17:27 PST
GTK failure: ../../Source/WebKit/WebProcess/GPU/media/RemoteAudioDestinationProxy.cpp:35:10: fatal error: SharedRingBufferFrameBounds.h: No such file or directory
Chris Dumez
Comment 4
2020-12-09 16:17:36 PST
Created
attachment 415809
[details]
Patch
Geoffrey Garen
Comment 5
2020-12-09 16:24:49 PST
> Source/WebKit/ChangeLog:14 > + This patch gets rid of IPC 2 why storing the RingBuffer bounds in shared
why -> by
> Source/WebKit/ChangeLog:20 > + This is still too much IPC but this is a step in the right directly. We
directly -> direction
> Source/WebKit/GPUProcess/media/RemoteAudioDestinationManager.cpp:157 > + if (auto ringBufferBounds = m_ringBufferFrameBounds->tryGet()) > + m_ringBuffer->setCurrentFrameBounds(ringBufferBounds->first, ringBufferBounds->second);
I'm not super familiar with this code. In the case where the opportunistic tryGet() fails, what ensures that we'll eventually tryGet() again and succeed?
Chris Dumez
Comment 6
2020-12-09 16:26:43 PST
Created
attachment 415811
[details]
Patch
Chris Dumez
Comment 7
2020-12-09 16:30:14 PST
(In reply to Geoffrey Garen from
comment #5
)
> > Source/WebKit/ChangeLog:14 > > + This patch gets rid of IPC 2 why storing the RingBuffer bounds in shared > > why -> by > > > Source/WebKit/ChangeLog:20 > > + This is still too much IPC but this is a step in the right directly. We > > directly -> direction > > > Source/WebKit/GPUProcess/media/RemoteAudioDestinationManager.cpp:157 > > + if (auto ringBufferBounds = m_ringBufferFrameBounds->tryGet()) > > + m_ringBuffer->setCurrentFrameBounds(ringBufferBounds->first, ringBufferBounds->second); > > I'm not super familiar with this code. In the case where the opportunistic > tryGet() fails, what ensures that we'll eventually tryGet() again and > succeed?
Are you worried about the case where we get very bad luck and every time the GPUProcess needs sample the WebProcess happens to writing the frame bounds (and thus holding the lock). I guess this is a possibility but it sounds not very likely? I used tryGet() because: 1. It is generally frowned upon to have the audio thread waiting on a lock 2. Even though we don't have the latest bounds, the RingBuffer may already have enough samples to provide an answer. That said, if our media people say otherwise, I can switch to a get() and actually wait for the lock.
Chris Dumez
Comment 8
2020-12-09 17:00:18 PST
(In reply to Chris Dumez from
comment #7
)
> (In reply to Geoffrey Garen from
comment #5
) > > > Source/WebKit/ChangeLog:14 > > > + This patch gets rid of IPC 2 why storing the RingBuffer bounds in shared > > > > why -> by > > > > > Source/WebKit/ChangeLog:20 > > > + This is still too much IPC but this is a step in the right directly. We > > > > directly -> direction > > > > > Source/WebKit/GPUProcess/media/RemoteAudioDestinationManager.cpp:157 > > > + if (auto ringBufferBounds = m_ringBufferFrameBounds->tryGet()) > > > + m_ringBuffer->setCurrentFrameBounds(ringBufferBounds->first, ringBufferBounds->second); > > > > I'm not super familiar with this code. In the case where the opportunistic > > tryGet() fails, what ensures that we'll eventually tryGet() again and > > succeed? > > Are you worried about the case where we get very bad luck and every time the > GPUProcess needs sample the WebProcess happens to writing the frame bounds > (and thus holding the lock). I guess this is a possibility but it sounds not > very likely? > > I used tryGet() because: > 1. It is generally frowned upon to have the audio thread waiting on a lock > 2. Even though we don't have the latest bounds, the RingBuffer may already > have enough samples to provide an answer. > > That said, if our media people say otherwise, I can switch to a get() and > actually wait for the lock.
Jer confirmed on Slack that we should NEVER grab a lock on the CA audio thread (try lock is ok though).
Peng Liu
Comment 9
2020-12-09 20:11:19 PST
I believe this patch works, but it may not be the best approach. To move audio samples from a web process to the GPU process, we use a pair of CARingBuffers with shared memory (SharedRingBufferStorage) that stores audio samples. We need to set bounds information in the GPU Process (reader) because the shared memory only stores audio samples, but not the internal data structure to implement the lockless operations. That works for two CARingBuffers in one process but different threads. But when the two CARingBuffers are in different processes, we need to synchronize the internal data structures in the reader. Fortunately, only synchronizing the bounds information will make the CARingBuffer in the GPU process functional. If we can modify the CARingBuffer implementation to store other information in the shared memory in addition to audio samples, we may be able to use the CARingBuffer in two processes as in two threads (don't need to set the bounds information in a reader). Since audio capture also uses CARingBuffer in the same manner, Youenn and Eric may have some comments.
Darin Adler
Comment 10
2020-12-10 08:40:33 PST
Comment on
attachment 415811
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=415811&action=review
> Source/WebKit/Shared/Cocoa/SharedRingBufferFrameBounds.cpp:36 > +static inline void yield()
This will almost certainly get inlined without the inline keyword. I’m not sure we should use the keyword in cases like this.
> Source/WebKit/Shared/Cocoa/SharedRingBufferFrameBounds.h:38 > + WTF_MAKE_FAST_ALLOCATED; > + WTF_MAKE_NONCOPYABLE(SharedRingBufferFrameBounds)
Not sure why the first has a semicolon and the second does not.
Chris Dumez
Comment 11
2020-12-10 08:47:50 PST
(In reply to Peng Liu from
comment #9
)
> I believe this patch works, but it may not be the best approach. > > To move audio samples from a web process to the GPU process, we use a pair > of CARingBuffers with shared memory (SharedRingBufferStorage) that stores > audio samples. We need to set bounds information in the GPU Process (reader) > because the shared memory only stores audio samples, but not the internal > data structure to implement the lockless operations. That works for two > CARingBuffers in one process but different threads. But when the two > CARingBuffers are in different processes, we need to synchronize the > internal data structures in the reader. Fortunately, only synchronizing the > bounds information will make the CARingBuffer in the GPU process functional.
I agree it would be nicer if CARingBuffer did it for us. I will investigate to see how much work that would be.
Geoffrey Garen
Comment 12
2020-12-10 10:03:58 PST
> Are you worried about the case where we get very bad luck and every time the > GPUProcess needs sample the WebProcess happens to writing the frame bounds > (and thus holding the lock). I guess this is a possibility but it sounds not > very likely?
Yes, I guess that's my worry. Or, at least, not *every* time, but enough times in a row that we run out of samples. I agree that this case is unlikely, but it does seem possible. A related worry is that we would fail the tryGet() on the very last update, possibly cutting off the end of the audio stream. (I'm not sure how much we would cut off in that case.) How frequently will the audio thread run the tryGet() operation? What happens if tryGet() fails on the last attempt?
> I used tryGet() because: > 1. It is generally frowned upon to have the audio thread waiting on a lock > 2. Even though we don't have the latest bounds, the RingBuffer may already > have enough samples to provide an answer. > > That said, if our media people say otherwise, I can switch to a get() and > actually wait for the lock.
Definitely agree that we should not lock on the audio thread. Another option is, instead of { isLocked, pair }, you can use { pair*, std::array<pair>, 2> }. When the writing thread wants to update the bounds, it writes the new pair value to the un-pointed-to pair and then atomically swaps the pair* (with a release memory barrier). When the reading thread wants to read the bounds, it reads using the pair* (with an acquire memory barrier). This ensures no locking, and also no pathological failure. However, it is still possible to miss the very last update if you never get another chance to check.
Chris Dumez
Comment 13
2020-12-10 10:14:53 PST
(In reply to Geoffrey Garen from
comment #12
)
> > Are you worried about the case where we get very bad luck and every time the > > GPUProcess needs sample the WebProcess happens to writing the frame bounds > > (and thus holding the lock). I guess this is a possibility but it sounds not > > very likely? > > Yes, I guess that's my worry. Or, at least, not *every* time, but enough > times in a row that we run out of samples. > > I agree that this case is unlikely, but it does seem possible. > > A related worry is that we would fail the tryGet() on the very last update, > possibly cutting off the end of the audio stream. (I'm not sure how much we > would cut off in that case.) > > How frequently will the audio thread run the tryGet() operation?
That depends on the sample rate. A usual sample rate is 44khz and render is called for 128 samples each time. I think this means render would get called ~340 per seconds in general. Media people can confirm.
> > What happens if tryGet() fails on the last attempt?
I am personally not too worried about this part. If Audio is still audible by the time by stop rendering then it will always be cut off in general. Since requests to stop come on the main thread and since audio is being rendered on the audio thread, the exact point we stop rendering is always undefined.
> > > I used tryGet() because: > > 1. It is generally frowned upon to have the audio thread waiting on a lock > > 2. Even though we don't have the latest bounds, the RingBuffer may already > > have enough samples to provide an answer. > > > > That said, if our media people say otherwise, I can switch to a get() and > > actually wait for the lock. > > Definitely agree that we should not lock on the audio thread. > > Another option is, instead of { isLocked, pair }, you can use { pair*, > std::array<pair>, 2> }. When the writing thread wants to update the bounds, > it writes the new pair value to the un-pointed-to pair and then atomically > swaps the pair* (with a release memory barrier). When the reading thread > wants to read the bounds, it reads using the pair* (with an acquire memory > barrier). This ensures no locking, and also no pathological failure. > However, it is still possible to miss the very last update if you never get > another chance to check.
This sounds like a good idea. I will see if I can make it work.
Chris Dumez
Comment 14
2020-12-10 13:22:09 PST
Created
attachment 415921
[details]
Patch
Chris Dumez
Comment 15
2020-12-10 13:24:18 PST
(In reply to Chris Dumez from
comment #14
)
> Created
attachment 415921
[details]
> Patch
New approach based on feedback from Peng and Geoff: 1. I updated the CARingBuffer class so that it now stores its frame bounds in its memory backing (which may be shared memory). This way, we no longer need a separate SharedRingBufferFrameBounds class to get this functionality. 2. I no longer use a lock for the frame bounds in shared memory and instead use the approach suggested by Geoff (buffer of bounds + atomic index).
Chris Dumez
Comment 16
2020-12-10 13:42:42 PST
Comment on
attachment 415921
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=415921&action=review
> Source/WebKit/GPUProcess/media/RemoteAudioDestinationManager.cpp:82 > + auto memory = SharedMemory::map(ipcHandle.handle, SharedMemory::Protection::ReadWrite);
Now that I no longer have a lock, I can keep this ReadOnly. Will update the patch accordingly.
Chris Dumez
Comment 17
2020-12-10 13:54:09 PST
Created
attachment 415925
[details]
Patch
Chris Dumez
Comment 18
2020-12-10 14:01:58 PST
Created
attachment 415927
[details]
Patch
Geoffrey Garen
Comment 19
2020-12-10 14:33:33 PST
Comment on
attachment 415927
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=415927&action=review
r=me
> Source/WebCore/platform/audio/cocoa/CARingBuffer.cpp:332 > + // When the RingBuffer is backed by shared memory, make sure we pull frame bounds from shared memory before fetching. > + updateFrameBounds(); > + > + uint64_t start, end; > + getCurrentFrameBoundsWithoutUpdate(start, end);
Should this just be getCurrentFrameBounds()? Seems like the behavior would be equivalent.
> Source/WebKit/Shared/Cocoa/SharedRingBufferStorage.h:73 > + static constexpr unsigned boundsBufferSize { 32 }; > + struct FrameBounds { > + std::pair<uint64_t, uint64_t> boundsBuffer[boundsBufferSize]; > + Atomic<unsigned> boundsBufferIndex { 0 }; > + };
The logic of this data structure and its related code seem right to me, but I don't fully understand why it's OK for this class and the related CARingBuffer class to assume that the reader will not fall behind by more than 32 frames. I get it that falling behind is very rare and will definitely cause audio distortion. But this logic (and the pre-existing kGeneralRingTimeBoundsQueueSize in CARingBuffer) seem to take the audio distortion bug and embiggen it into a buffer overflow bug. Seems like something we should follow up on with our audio experts. Maybe they can explain why 32 is always enough. Or maybe we need to adjust this logic in both places. (We could use a strategy similar to your previous tryLock() strategy; or we could have the *reader* update the boundsBufferIndex and have the writer keep writing to the same index until the reader is ready for the next one).
Chris Dumez
Comment 20
2020-12-10 14:37:02 PST
Created
attachment 415931
[details]
Patch
Chris Dumez
Comment 21
2020-12-10 14:40:56 PST
(In reply to Geoffrey Garen from
comment #19
)
> Comment on
attachment 415927
[details]
> Patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=415927&action=review
> > r=me > > > Source/WebCore/platform/audio/cocoa/CARingBuffer.cpp:332 > > + // When the RingBuffer is backed by shared memory, make sure we pull frame bounds from shared memory before fetching. > > + updateFrameBounds(); > > + > > + uint64_t start, end; > > + getCurrentFrameBoundsWithoutUpdate(start, end); > > Should this just be getCurrentFrameBounds()? Seems like the behavior would > be equivalent.
Yes, I was trying to be more explicit about the update but I made the change.
> > Source/WebKit/Shared/Cocoa/SharedRingBufferStorage.h:73 > > + static constexpr unsigned boundsBufferSize { 32 }; > > + struct FrameBounds { > > + std::pair<uint64_t, uint64_t> boundsBuffer[boundsBufferSize]; > > + Atomic<unsigned> boundsBufferIndex { 0 }; > > + }; > > The logic of this data structure and its related code seem right to me, but > I don't fully understand why it's OK for this class and the related > CARingBuffer class to assume that the reader will not fall behind by more > than 32 frames. I get it that falling behind is very rare and will > definitely cause audio distortion. But this logic (and the pre-existing > kGeneralRingTimeBoundsQueueSize in CARingBuffer) seem to take the audio > distortion bug and embiggen it into a buffer overflow bug. Seems like > something we should follow up on with our audio experts. Maybe they can > explain why 32 is always enough. Or maybe we need to adjust this logic in > both places. (We could use a strategy similar to your previous tryLock() > strategy; or we could have the *reader* update the boundsBufferIndex and > have the writer keep writing to the same index until the reader is ready for > the next one).
Yes, I am discussing this with Jer. It seem CARingBuffer::frameOffset() always make sure we read inside the RingBuffer by doing a modulo BufferSize.
EWS
Comment 22
2020-12-10 18:15:08 PST
Committed
r270663
: <
https://trac.webkit.org/changeset/270663
> All reviewed patches have been landed. Closing bug and clearing flags on
attachment 415931
[details]
.
Radar WebKit Bug Importer
Comment 23
2020-12-10 18:16:20 PST
<
rdar://problem/72206083
>
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug