WebGL GPU process IPC should be faster than current WebKit IPC Most likely design involves putting the remote graphics context method calls into a shared memory until a synchronisation point such as synchronised method call is invoked.
Created attachment 415640 [details] Patch
Created attachment 415641 [details] Patch
Seems like this is trying to solve a very similar problem to the problem Wenson is solving for DisplayList IPC. Rather than having two divergent implementations it would be much better to unify on a single set of primitives. (Even more ideally, if it turns out we can improve all IPC using a different model than we currently use, we should look into that, but it may be that for most uses the ergonomics of the existing approach make more sense).
Created attachment 415993 [details] Patch
> Rather than having two divergent > implementations it would be much better to unify on a single set of > primitives. What, specifically, do you think is the single set of primitives that would be much better? How, specifically, do you think we should evaluate whether that set of primitives is, in fact, much better?
(In reply to Geoffrey Garen from comment #5) > > Rather than having two divergent > > implementations it would be much better to unify on a single set of > > primitives. > > What, specifically, do you think is the single set of primitives that would > be much better? That's a really tough question, and one I am not able to answer. I am raising a concern here that what I see looks like a lot of code that is duplicating logic that is particularly tricky to get to right, and that feels worth bringing up. Given this seems to be about WebGL, which has relatively similar characteristics to a 2d graphics context, I think a good starting point would be trying to generalize Wenson and others' shared memory display list work. > How, specifically, do you think we should evaluate whether that set of > primitives is, in fact, much better? This is would be subjective, but in my experience, reducing the number of ways to do similar things, and breaking them down into manageable re-usable components makes for an easier time building new things on top of them. And when it seems like we are duplicating things, taking a step back, and seeing if there is a collaboration that can be done can be useful. There are of course times when multiple similar data structures or approaches are the right decision, whether for specialized memory or performance or compatibility reasons, but those, again in my experience, tend to be the exception and not the rule.
> > What, specifically, do you think is the single set of primitives that would > > be much better? > > That's a really tough question, and one I am not able to answer. I am > raising a concern here that what I see looks like a lot of code that is > duplicating logic that is particularly tricky to get to right, and that > feels worth bringing up. What, specifically, has been duplicated? How, specifically, should it be de-duplicated? > Given this seems to be about WebGL, which has relatively similar > characteristics to a 2d graphics context, I think a good starting point > would be trying to generalize Wenson and others' shared memory display list > work. 'staring point' seems to imply that nothing else comes before it. That would mean setting aside this patch. A few problems with that proposal: (1) Setting aside (mostly) working WebGL code has the concrete downside that it moves progress backwards on WebGL. (2) The shared memory display list is currently a performance regression. This bug report is about getting a speedup. Standardizing on a regression is not a good way to get a speedup. (3) MessageStreamReceiver, being lower level and more generic, is more likely in my opinion than the shared memory display list to be a reusable abstraction. So, the proposal may defeat its own goal statement. (4) The proposal is not falsifiable. It does not include any condition under which we could change course and choose not to share code, nor any test for such a condition. In my experience, software and software engineering practices are more successful when they are tested. (5) Sharing does not include any stated upside. "easier time building new things on top of them" comes close, but doesn't specify what else we need to build on top, or why it matters. The obvious counter-argument is YAGNI.
Comment on attachment 415993 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=415993&action=review > Source/WebKit/ChangeLog:28 > + stream andm processing will continue once the out-of-line and > Source/WebKit/ChangeLog:36 > + - Simple things work > + - Buggy in certain conditions (asserts/crashes) Seems like this is not ready to land yet? I'll mark it r- for now to indicate it is not ready. > Source/WebKit/ChangeLog:48 > + No new tests, tested by existing tests. Do we have preliminary perf data from this patch? > Source/WebKit/Platform/IPC/Decoder.cpp:195 > - > + Whitespace change > Source/WebKit/Platform/IPC/Decoder.h:58 > + static constexpr size_t streamMessageAlignment = alignof(uint64_t); What are we trying to achieve with this alignment setting? Is the goal to enforce a minimum alignment sufficient to read any type? > Source/WebKit/Platform/IPC/Decoder.h:59 > + static constexpr size_t outOfLineStreamMessageSize = sizeof(uint64_t) * 2; Why 2? > Source/WebKit/Platform/IPC/Decoder.h:161 > - > + Whitespace change > Source/WebKit/Platform/IPC/DestinationStream.h:39 > +class DestinationStreamBase { I think this might read more clearly as a data member named DestinationStream (or just Stream), rather than a base clase. A stream reader/writer has a stream to read/write, but is not itself a stream. > Source/WebKit/Platform/IPC/DestinationStream.h:51 > + uint64_t cacheLinePadding[2]; > + Atomic<size_t> writeOffset; I think you can use alignas instead of cacheLinePadding. > Source/WebKit/Platform/IPC/DestinationStream.h:57 > + size_t adjustOffset(size_t offset) { This function could use a more specific name. I think the adjustment being performed here is wrapping, right? > Source/WebKit/Platform/IPC/DestinationStream.h:71 > + static constexpr size_t kReadLimitSleepTag = 1 << 31; I think this indicates that the reader is sleeping, so I would call it kReaderIsSleepingTag instead. But also: commitWrite() doesn't seem to consult this tag. Is that a bug? > Source/WebKit/Platform/IPC/DestinationStream.h:107 > + void write(uint8_t*& ptr, size_t& capacity) > + { > + size_t writeLimit = std::min(header().readOffset.load(std::memory_order_acquire), m_dataSize); > + alignedSpan(ptr, capacity, m_writeOffset, writeLimit); > + } > + void writep(uint8_t*& ptr, size_t& capacity) > + { > + size_t writeLimit = std::min(header().readOffset.load(std::memory_order_acquire), m_dataSize); > + alignedSpan(ptr, capacity, m_writeOffset, writeLimit); > + } What's the difference between these two functions? > Source/WebKit/Platform/IPC/DestinationStream.h:120 > + if (readLimit != expectedReadLimit) > + return WakeUpReader::Yes; > + return WakeUpReader::No; Should we check kReadLimitSleepTag here? > Source/WebKit/Platform/IPC/Encoder.h:148 > - > + Whitespace change > Source/WebKit/Platform/IPC/MessageStreamReceiver.h:51 > + ASSERT(decoder.isValid()); Probably need a stronger check here. > Source/WebKit/Platform/IPC/MessageStreamSender.h:52 > + Thread::yield(); This is probably OK for now, but I think we should switch to using a semaphore (for both reading and writing). See discussion in https://bugs.webkit.org/show_bug.cgi?id=219586. > Source/WebKit/Platform/IPC/MessageStreamSender.h:64 > + auto wakeupResult = m_stream.commitWrite(Encoder::outOfLineStreamMessageSize); Since the StreamMessageOutOfLine message is a flag, why does it need size at all? Are we trying to overload the read operation in some way? If so, can we do something more explicit instead?
Created attachment 416239 [details] Patch
Created attachment 416240 [details] Patch
<rdar://problem/72340651>
Comment on attachment 416239 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=416239&action=review Very cool > Source/WebKit/Platform/IPC/DestinationStream.h:50 > +// DestinationStreamSender and DestinationStremaReceiver use StreamBuffer to communicate. typo 'DestinationStremaReceiver' > Source/WebKit/Platform/IPC/DestinationStream.h:214 > + size_t len(size_t offset, size_t limit) WebKit generally prefers full words, length instead len > Source/WebKit/Platform/IPC/DestinationStream.h:299 > + size_t len(size_t offset, size_t limit) here too > Source/WebKit/Platform/IPC/Encoder.cpp:225 > - > - std::memset(m_buffer + m_bufferSize, 0, alignedSize - m_bufferSize); > - > +#if 0 > + if (m_bufferSize < alignedSize) > + std::memset(m_buffer + m_bufferSize, 0, alignedSize - m_bufferSize); > +#endif some disabled code here > Source/WebKit/Platform/IPC/MessageStreamSender.h:61 > + encoder = makeUnique<Encoder>(Messages::RemoteGraphicsContextGL::StreamWakeUp::name(), destinationID); I guess RemoteGraphicsContextGL here should be something more generic in the future.
(In reply to Geoffrey Garen from comment #8) > > Source/WebKit/ChangeLog:48 > > + No new tests, tested by existing tests. > > Do we have preliminary perf data from this patch? I added couple of examples. IIRC current ToT MotionMark for me is 800 pts, and this new one is 4500pts and non-pup is 8800 pts. > > Source/WebKit/Platform/IPC/Decoder.h:58 > > + static constexpr size_t streamMessageAlignment = alignof(uint64_t); > > What are we trying to achieve with this alignment setting? Is the goal to > enforce a minimum alignment sufficient to read any type? That's the sort-of-goal of the Encoder/Decoder when encoding a normal message. A stream message must be encoded with similar restrictions, since if the message does not fit to the stream, it must be sent out of line. The current Encoder is not able to "stop", so we must convert it on-the-fly to encode a normal message. The current "coders" e.g. the code using Encoder, are not polymorphic, so we cannot create a StreamEncoder and "normal" Encoder. It's not ideal. Added this as a hand-wavy comment in Encoder.h > > Source/WebKit/Platform/IPC/Decoder.h:59 > > + static constexpr size_t outOfLineStreamMessageSize = sizeof(uint64_t) * 2; > > Why 2? This is the message header size. Added this as a comment in the Encoder.h > > Source/WebKit/Platform/IPC/MessageStreamSender.h:64 > > + auto wakeupResult = m_stream.commitWrite(Encoder::outOfLineStreamMessageSize); > > Since the StreamMessageOutOfLine message is a flag, why does it need size at > all? Are we trying to overload the read operation in some way? If so, can we > do something more explicit instead? Items in the normal message are: [message header][message data] So items in the stream buffer are: [message header][message data] "Out of line message" in the stream buffer is: [message header with OOL bit on] The reason for the stream message having the message header encoded is explained above: it's not easy currently to encode the message so that the format would be different in stream message than in normal message. The encoding part is a bit bad, so indeed that might be something to refine. > > > Source/WebKit/Platform/IPC/Decoder.h:161 > > - > > + > > Whitespace change > > > Source/WebKit/Platform/IPC/DestinationStream.h:39 > > +class DestinationStreamBase { > > I think this might read more clearly as a data member named > DestinationStream (or just Stream), rather than a base clase. A stream > reader/writer has a stream to read/write, but is not itself a stream. Done. (It makes it a bit clunkier in some ways, but might be remedied if the implementation is changed to a different circular buffer scheme, see the FIXME.) > > > Source/WebKit/Platform/IPC/DestinationStream.h:51 > > + uint64_t cacheLinePadding[2]; > > + Atomic<size_t> writeOffset; > > I think you can use alignas instead of cacheLinePadding. So the idea is not to align, if that was what you thought the idea was? (The idea was that if the two variables are adjacent in memory, they may fall into same cache line. I seem to recall this may create (a very slight?) bottleneck: even if the variables are distinct, they are shared from the processors perspective) or did you mean I could use something like "alignas(uint64_t[2])" ? I did not think of that, I'll see if that can be used.. > > > Source/WebKit/Platform/IPC/DestinationStream.h:57 > > + size_t adjustOffset(size_t offset) { > > This function could use a more specific name. I think the adjustment being > performed here is wrapping, right? Changed to wrapOffset. > > > Source/WebKit/Platform/IPC/DestinationStream.h:71 > > + static constexpr size_t kReadLimitSleepTag = 1 << 31; > > I think this indicates that the reader is sleeping, so I would call it > kReaderIsSleepingTag instead. > > But also: commitWrite() doesn't seem to consult this tag. Is that a bug? It does, but in a bit subtle way. The sender knows what it wrote, so it just checks if the value is still there. I added an comment. > > > Source/WebKit/Platform/IPC/DestinationStream.h:107 > > + void write(uint8_t*& ptr, size_t& capacity) > > + void writep(uint8_t*& ptr, size_t& capacity) > > What's the difference between these two functions? Debugging leftover, sorry. Removed. > > > Source/WebKit/Platform/IPC/DestinationStream.h:120 > > + if (readLimit != expectedReadLimit) > > + return WakeUpReader::Yes; > > + return WakeUpReader::No; > > Should we check kReadLimitSleepTag here? So it's the "if readLimit is different than what I wrote there previously". > > Source/WebKit/Platform/IPC/MessageStreamReceiver.h:51 > > + ASSERT(decoder.isValid()); > > Probably need a stronger check here. Done.
Created attachment 416242 [details] Patch
Created attachment 416533 [details] Patch
FYI, as of https://trac.webkit.org/changeset/270938/webkit, you can now use wtf/cocoa/MachSemaphore.h to establish a cross-process semaphore. I think this means that the sender shouldn't have to live-yield anymore. Also, maybe the receiver can wait for more data by using a semaphore instead of requiring a new IPC to wake it up (at least in the common case).
(In reply to Geoffrey Garen from comment #16) > FYI, as of https://trac.webkit.org/changeset/270938/webkit, you can now use > wtf/cocoa/MachSemaphore.h to establish a cross-process semaphore. > > I think this means that the sender shouldn't have to live-yield anymore. > > Also, maybe the receiver can wait for more data by using a semaphore instead > of requiring a new IPC to wake it up (at least in the common case). Ok.
Created attachment 418406 [details] Patch
Created attachment 418915 [details] Patch
(In reply to Geoffrey Garen from comment #16) > FYI, as of https://trac.webkit.org/changeset/270938/webkit, you can now use > wtf/cocoa/MachSemaphore.h to establish a cross-process semaphore. > > I think this means that the sender shouldn't have to live-yield anymore. > > Also, maybe the receiver can wait for more data by using a semaphore instead > of requiring a new IPC to wake it up (at least in the common case). The new patch addresses this. Unfortunately the ChangeLog is a bit lacking, but I'll work on it more. Just uploading this in case there's interest in giving feedback on the code part.
Created attachment 418916 [details] Patch
Comment on attachment 418916 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=418916&action=review > Source/WebKit/Platform/IPC/ArgumentCoders.cpp:-114 > - Some stray whitespace changes in this file. > Source/WebKit/Platform/IPC/Connection.cpp:389 > + auto messageFilterLocker = holdLock(m_messageFiltersLock); > + m_messageFilters.addFilter(messageFilter, receiverName, destinationID); Would be good to add the message ordering comment here too. > Source/WebKit/Platform/IPC/Connection.cpp:390 > + auto locker = holdLock(m_incomingMessagesMutex); What is the case where we intend to hold m_incomingMessagesMutex without holding m_messageFiltersLock, or vice versa? Maybe we can reduce to just one lock? > Source/WebKit/Platform/IPC/Connection.cpp:396 > + m_incomingMessages.remove(it); > + end = m_incomingMessages.end(); I think this is a bug. remove() invalidates 'it', so re-reading end() here is not sufficient. Also, it's kind of expensive to do O(n) removal each time through the loop. Also, I think it would be better to use an explicit return value to indicate message consumption, rather than testing for null. C++ does not require that moved-from values be null. > Source/WebKit/Platform/IPC/Connection.cpp:447 > - > + Some whitespace changes in this file. > Source/WebKit/Platform/IPC/Connection.cpp:1154 > + // FIXME: This has a message ordering bug where the early messages dispatched here would arrive > + // later than later messages that get dispatched by the message processing queue. This should be fixed by making WorkQueueReceiver > + // a message filter. Do we need to fix this before landing? (Seems like probably?)
Thanks for the review. I'll fix the whitespace and move some commits away from the patch. (In reply to Geoffrey Garen from comment #22) > Comment on attachment 418916 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=418916&action=review > > > Source/WebKit/Platform/IPC/ArgumentCoders.cpp:-114 > > - > > Some stray whitespace changes in this file. > > > Source/WebKit/Platform/IPC/Connection.cpp:389 > > + auto messageFilterLocker = holdLock(m_messageFiltersLock); > > + m_messageFilters.addFilter(messageFilter, receiverName, destinationID); > > Would be good to add the message ordering comment here too. > > > Source/WebKit/Platform/IPC/Connection.cpp:390 > > + auto locker = holdLock(m_incomingMessagesMutex); > > What is the case where we intend to hold m_incomingMessagesMutex without > holding m_messageFiltersLock, or vice versa? Maybe we can reduce to just one > lock? > > > Source/WebKit/Platform/IPC/Connection.cpp:396 > > + m_incomingMessages.remove(it); > > + end = m_incomingMessages.end(); > > I think this is a bug. remove() invalidates 'it', so re-reading end() here > is not sufficient. Right, I'll do it another way. As far as I understood looking at the implementation it didn't invalidate the iterator but yes, it's sketchy. > Also, it's kind of expensive to do O(n) removal each time through the loop. Right, I'll do it another way. > Also, I think it would be better to use an explicit return value to indicate > message consumption, rather than testing for null. C++ does not require that > moved-from values be null. Sure. I'll change the contract.. FWIW, C++ does require moved-away unique_ptr to be null, which is being used here. https://en.cppreference.com/w/cpp/memory/unique_ptr/unique_ptr unique_ptr( unique_ptr&& u ) noexcept; (5) 5) Constructs a unique_ptr by transferring ownership from u to *this and stores the. null pointer in u. > > Source/WebKit/Platform/IPC/Connection.cpp:447 > > - > > + > > Some whitespace changes in this file. > > > Source/WebKit/Platform/IPC/Connection.cpp:1154 > > + // FIXME: This has a message ordering bug where the early messages dispatched here would arrive > > + // later than later messages that get dispatched by the message processing queue. This should be fixed by making WorkQueueReceiver > > + // a message filter. > > Do we need to fix this before landing? (Seems like probably?) I can do the fix when moving the code you commented to another patch, but it's a bit peripheral. For this patch it's not needed.
Created attachment 419427 [details] rebased to tot, not fully split
Created attachment 419705 [details] removed ipc message receive queue
Moved the code commented below into bug 221560. (In reply to Geoffrey Garen from comment #22) > Some stray whitespace changes in this file. Done. > What is the case where we intend to hold m_incomingMessagesMutex without > holding m_messageFiltersLock, or vice versa? Maybe we can reduce to just one > lock? Done in other patch, uses m_incomingMessagesMutex. > I think this is a bug. remove() invalidates 'it', so re-reading end() here > is not sufficient. > Also, it's kind of expensive to do O(n) removal each time through the loop. Done in the other patch. It willl construct a new list with the remaining items and swap. > Also, I think it would be better to use an explicit return value to indicate > message consumption, rather than testing for null. Done in other patch. The API now always transfers the decoder to the client, no programmatic opt out. > Some whitespace changes in this file. Done > Do we need to fix this before landing? (Seems like probably?) Done in the other patch. Only few examples, not all call sites.
Created attachment 419710 [details] ready for review
(In reply to Kimmo Kinnunen from comment #26) > Moved the code commented below into bug 221560. > > (In reply to Geoffrey Garen from comment #22) > > Some stray whitespace changes in this file. > > Done. Comments like these are fine, but not necessary.
Created attachment 420612 [details] Patch
Created attachment 420623 [details] Patch
Created attachment 420625 [details] Patch
Created attachment 420627 [details] Patch
Created attachment 420671 [details] Patch
Created attachment 420677 [details] Patch
Created attachment 420684 [details] Patch
Created attachment 420687 [details] Patch
Created attachment 420802 [details] Patch
Created attachment 420824 [details] Patch
Comment on attachment 420824 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=420824&action=review r=me I wasn't able to review the synchronization details as thoroughly as I'd like, but I guess we can refine more once this is in-tree. EWS failures seem like they were caused by an unrelated transient bug in media streams. Still, I'd suggest re-uploading and hitting cq+ to make sure this lands cleanly. > Source/WebKit/Platform/IPC/StreamClientConnection.h:167 > + // Not notifying on wake up since the out-of-stream message will do that. > + (void) release(encoder.size()); WebKit style is to use the UNUSED_VARIABLE macro here. > Source/WebKit/Platform/IPC/StreamClientConnection.h:193 > + return WTF::nullopt; Are there any valid conditions under which we would receive some data, but not enough data to be a valid message? If so, let's document them here. If not, let's ASSERT. > Source/WebKit/Platform/IPC/StreamClientConnection.h:205 > + if (receiverLimit != expectedReceiverLimit) > + return WakeUpReceiver::Yes; I think this would be clearer as "if (receiverLimit & senderOffsetReceiverIsSleepingTag)". Maybe also add an ASSERT that (receiverLimit & ~senderOffsetReceiverIsSleepingTag) == expectedReceiverLimit. > Source/WebKit/Platform/IPC/StreamServerConnection.cpp:89 > + size_t oldReceiverOffset = sharedReceiverOffset().exchange(receiverOffset, std::memory_order_acq_rel); > + // If the sender wrote over receiverOffset, it means the sender is waiting. > + if (oldReceiverOffset != m_receiverOffset) I think this would be clearer as "if (oldReceiverOffset & receiverOffsetSenderIsWaitingTag)". Same comment about adding an assert.
> Still, I'd suggest re-uploading and hitting cq+ to make sure this lands cleanly. ... since that will get the benefit of an up-to-date EWS run.
Created attachment 420941 [details] Patch
Created attachment 420956 [details] Patch
Created attachment 420959 [details] Patch
Created attachment 421080 [details] Patch
Created attachment 421097 [details] Patch for landing
Comment on attachment 421097 [details] Patch for landing Clearing flags on attachment: 421097 Committed r273204 (234390@main): <https://commits.webkit.org/234390@main>
All reviewed patches have been landed. Closing bug.