Make it possible for the web process to write display list items while the GPU process is reading items from shared memory.
Created attachment 413231 [details] Patch
Created attachment 413367 [details] Depends on #218588 and #218406
Created attachment 413510 [details] Rebase on trunk
Created attachment 413525 [details] Patch
Created attachment 413526 [details] Tweak ChangeLog wording
Comment on attachment 413526 [details] Tweak ChangeLog wording View in context: https://bugs.webkit.org/attachment.cgi?id=413526&action=review > Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:146 > + DisplayList::ItemBufferReadingClient* itemBufferClient; > if (imageBuffer->isAccelerated()) > - displayList->setItemBufferClient(static_cast<AcceleratedRemoteImageBuffer*>(imageBuffer)); > + itemBufferClient = static_cast<AcceleratedRemoteImageBuffer*>(imageBuffer); > else > - displayList->setItemBufferClient(static_cast<UnacceleratedRemoteImageBuffer*>(imageBuffer)); > + itemBufferClient = static_cast<UnacceleratedRemoteImageBuffer*>(imageBuffer); What is the purpose of these casts?
Comment on attachment 413526 [details] Tweak ChangeLog wording View in context: https://bugs.webkit.org/attachment.cgi?id=413526&action=review >> Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:146 >> + itemBufferClient = static_cast<UnacceleratedRemoteImageBuffer*>(imageBuffer); > > What is the purpose of these casts? These casts are currently here because RemoteImageBuffer is the thing that implements ItemBufferReadingClient. (That said, perhaps it would be cleaner to make RemoteRenderingBackend the client, since none of the logic for decoding display list items is specific to RemoteImageBuffer anyways. If I did that, then I would be able to remove these casts)
(In reply to Wenson Hsieh from comment #7) > Comment on attachment 413526 [details] > Tweak ChangeLog wording > > View in context: > https://bugs.webkit.org/attachment.cgi?id=413526&action=review > > >> Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:146 > >> + itemBufferClient = static_cast<UnacceleratedRemoteImageBuffer*>(imageBuffer); > > > > What is the purpose of these casts? > > These casts are currently here because RemoteImageBuffer is the thing that > implements ItemBufferReadingClient. > > (That said, perhaps it would be cleaner to make RemoteRenderingBackend the > client, since none of the logic for decoding display list items is specific > to RemoteImageBuffer anyways. If I did that, then I would be able to remove > these casts) I see. That does sound cleaner.
Created attachment 413534 [details] Depends on #218689
<rdar://problem/71167220>
Comment on attachment 413534 [details] Depends on #218689 View in context: https://bugs.webkit.org/attachment.cgi?id=413534&action=review > Source/WebKit/GPUProcess/graphics/DisplayListReaderHandle.cpp:34 > + unreadBytesHandle() -= amount; Check for underflow? > Source/WebKit/GPUProcess/graphics/DisplayListReaderHandle.h:42 > + std::unique_ptr<WebCore::DisplayList::DisplayList> createDisplayList(size_t offset, size_t capacity, WebCore::DisplayList::ItemBufferReadingClient&) const; Maybe readDisplayListFromHandle() to not make it sound like it creates a fresh, empty DisplayList. > Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:128 > +void RemoteRenderingBackend::wakeUpAndProcessDisplayList(WebCore::DisplayList::ItemBufferIdentifier initialIdentifier, uint64_t initialOffset, RenderingResourceIdentifier renderingResourceIdentifier) renderingResourceIdentifier -> destinationBufferIdentifier > Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:130 > + auto imageBuffer = m_remoteResourceCache.cachedImageBuffer(renderingResourceIdentifier); Why is this getting a designation buffer from the source cache? Resouce cache buffers are readonly in the GPU process. > Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:132 > // FIXME: Add a message check to terminate the web process. I think we need to do this very soon. > Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:146 > + SharedDisplayListHandle::Lock locker { *sharedHandle }; The cool kids do something like auto locker = SharedDisplayListHandle::Lock { ... } > Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:151 > + imageBuffer->flushDisplayList(*sharedHandle->createDisplayList(offset, sizeToRead, *this)); We need to call this something other than "flushDisplayList". It's more "render display list". > Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:153 > + offset += sizeToRead; Do we need overflow checks? > Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:166 > + sharedHandle = m_sharedDisplayListHandles.get(nextIdentifier); I think I'd feel more comfortable if you didn't change the sharedHandle in this loop, since sharedHandle and sizeToRead go together and it would be easy to introduce mistakes in future. Can you refractor the code to read from a single handle into its own function? > Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:203 > +void RemoteRenderingBackend::didCreateSharedDisplayListHandle(DisplayList::ItemBufferIdentifier identifier, const SharedMemory::IPCHandle& handle, RenderingResourceIdentifier remoteResourceIdentifier) remoteResourceIdentifier -> destinationBufferIdentifier > Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:205 > + if (auto sharedMemory = SharedMemory::map(handle.handle, SharedMemory::Protection::ReadWrite)) Why are we mapping ReadWrite in the GPUP? Is this just to be able to write unreadBytes? Should that instead be in its own little shared memory handle? > Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.h:105 > + void didCreateSharedDisplayListHandle(WebCore::DisplayList::ItemBufferIdentifier, const SharedMemory::IPCHandle&, WebCore::RenderingResourceIdentifier); Give RenderingResourceIdentifier a name > Source/WebKit/Shared/SharedDisplayListHandle.h:38 > + static constexpr auto reservedCapacityAtStart = sizeof(bool) + sizeof(size_t); Maybe round this up for better alignment? > Source/WebKit/Shared/SharedDisplayListHandle.h:42 > + size_t unreadBytes() const { return *reinterpret_cast<size_t*>(data() + sizeof(bool)); } Rather than this manual offsetting, how about describing the buffer contents via a struct. > Source/WebKit/Shared/SharedDisplayListHandle.h:58 > + auto& atomicValue = m_handle.atomicLockValue(); > + while (true) { > + std::this_thread::yield(); > + bool unlocked = false; > + if (atomicValue.compare_exchange_weak(unlocked, true)) > + break; Please check with a locking/scheduling/priority-donating expert. > Source/WebKit/WebProcess/GPU/graphics/DisplayListWriterHandle.cpp:35 > + m_writableOffset += amount; > + unreadBytesHandle() += amount; Check for overflow? > Source/WebKit/WebProcess/GPU/graphics/DisplayListWriterHandle.cpp:40 > + return sharedMemory().size() - writableOffset(); Check for underflow? > Source/WebKit/WebProcess/GPU/graphics/DisplayListWriterHandle.cpp:55 > + ASSERT(m_writableOffset == SharedDisplayListHandle::reservedCapacityAtStart); RELEASE_ASSERT? > Source/WebKit/WebProcess/GPU/graphics/DisplayListWriterHandle.cpp:59 > + Lock locker { *this }; auto locker = > Source/WebKit/WebProcess/GPU/graphics/RemoteRenderingBackendProxy.cpp:129 > + SharedDisplayListHandle::Lock locker { *sharedHandle }; auto locker =
Comment on attachment 413534 [details] Depends on #218689 View in context: https://bugs.webkit.org/attachment.cgi?id=413534&action=review >> Source/WebKit/GPUProcess/graphics/DisplayListReaderHandle.cpp:34 >> + unreadBytesHandle() -= amount; > > Check for underflow? Done! Made this robust against underflow. >> Source/WebKit/GPUProcess/graphics/DisplayListReaderHandle.h:42 >> + std::unique_ptr<WebCore::DisplayList::DisplayList> createDisplayList(size_t offset, size_t capacity, WebCore::DisplayList::ItemBufferReadingClient&) const; > > Maybe readDisplayListFromHandle() to not make it sound like it creates a fresh, empty DisplayList. I do agree `readDisplayListFromHandle` is better than my current name, but at the same time `readDisplayListFromHandle` sounds like it's performing an action (namely, reading a display list). I'll change this to `displayListForReadingFromHandle`. >> Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:128 >> +void RemoteRenderingBackend::wakeUpAndProcessDisplayList(WebCore::DisplayList::ItemBufferIdentifier initialIdentifier, uint64_t initialOffset, RenderingResourceIdentifier renderingResourceIdentifier) > > renderingResourceIdentifier -> destinationBufferIdentifier Changed to destinationBufferIdentifier. >> Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:130 >> + auto imageBuffer = m_remoteResourceCache.cachedImageBuffer(renderingResourceIdentifier); > > Why is this getting a designation buffer from the source cache? Resouce cache buffers are readonly in the GPU process. The current status quo is that remote image buffers that back canvases are tracked in the GPU process as cached resources (see RemoteRenderingBackend::createImageBuffer, above). I agree it's a bit confusing, but this might have benefits (e.g. when painting from canvas to canvas). Perhaps RemoteResourceCache and its cacheImageBuffer method should be renamed to something more general? (For example, RemoteResourceStore::addImageBuffer or RemoteResourceRegistry::registerImageBuffer). >> Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:132 >> // FIXME: Add a message check to terminate the web process. > > I think we need to do this very soon. Yes! >> Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:146 >> + SharedDisplayListHandle::Lock locker { *sharedHandle }; > > The cool kids do something like auto locker = SharedDisplayListHandle::Lock { ... } 👍🏻 >> Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:151 >> + imageBuffer->flushDisplayList(*sharedHandle->createDisplayList(offset, sizeToRead, *this)); > > We need to call this something other than "flushDisplayList". It's more "render display list". `renderDisplayList` sounds good to me. I think I'll split this renaming out into a different patch. >> Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:153 >> + offset += sizeToRead; > > Do we need overflow checks? Added! >> Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:166 >> + sharedHandle = m_sharedDisplayListHandles.get(nextIdentifier); > > I think I'd feel more comfortable if you didn't change the sharedHandle in this loop, since sharedHandle and sizeToRead go together and it would be easy to introduce mistakes in future. Can you refractor the code to read from a single handle into its own function? Yeah, that makes sense — I will split out the logic for reading from a single shared handle into a new private helper. >> Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:203 >> +void RemoteRenderingBackend::didCreateSharedDisplayListHandle(DisplayList::ItemBufferIdentifier identifier, const SharedMemory::IPCHandle& handle, RenderingResourceIdentifier remoteResourceIdentifier) > > remoteResourceIdentifier -> destinationBufferIdentifier Fixed! >> Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:205 >> + if (auto sharedMemory = SharedMemory::map(handle.handle, SharedMemory::Protection::ReadWrite)) > > Why are we mapping ReadWrite in the GPUP? Is this just to be able to write unreadBytes? Should that instead be in its own little shared memory handle? Yes, it is so that we can update the "unread bytes" count. I recall discussing it with Geoff, and he suggested that we should just stick it in the first bit of each shared memory chunk instead of taking up a whole new page for an atomic. >> Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.h:105 >> + void didCreateSharedDisplayListHandle(WebCore::DisplayList::ItemBufferIdentifier, const SharedMemory::IPCHandle&, WebCore::RenderingResourceIdentifier); > > Give RenderingResourceIdentifier a name 👍🏻 "destinationBufferIdentifier" >> Source/WebKit/Shared/SharedDisplayListHandle.h:38 >> + static constexpr auto reservedCapacityAtStart = sizeof(bool) + sizeof(size_t); > > Maybe round this up for better alignment? Done! (changed to 16 bytes) >> Source/WebKit/Shared/SharedDisplayListHandle.h:58 >> + break; > > Please check with a locking/scheduling/priority-donating expert. I came up with this part with some help from Geoff; perhaps he has some more thoughts on this! One thing we should investigate is changing this to use a single atomic size_t counter instead of an atomic lock. When I tried doing this, however, it ended up hurting performance for (as-of-yet) unknown reasons... >> Source/WebKit/WebProcess/GPU/graphics/DisplayListWriterHandle.cpp:35 >> + unreadBytesHandle() += amount; > > Check for overflow? Fixed! >> Source/WebKit/WebProcess/GPU/graphics/DisplayListWriterHandle.cpp:40 >> + return sharedMemory().size() - writableOffset(); > > Check for underflow? Fixed! >> Source/WebKit/WebProcess/GPU/graphics/DisplayListWriterHandle.cpp:55 >> + ASSERT(m_writableOffset == SharedDisplayListHandle::reservedCapacityAtStart); > > RELEASE_ASSERT? Changed to RELEASE_ASSERT. >> Source/WebKit/WebProcess/GPU/graphics/DisplayListWriterHandle.cpp:59 >> + Lock locker { *this }; > > auto locker = Fixed! >> Source/WebKit/WebProcess/GPU/graphics/RemoteRenderingBackendProxy.cpp:129 >> + SharedDisplayListHandle::Lock locker { *sharedHandle }; > > auto locker = Fixed!
Created attachment 413701 [details] Patch
Created attachment 413729 [details] Use CheckedSize instead of Checked<>
Comment on attachment 413729 [details] Use CheckedSize instead of Checked<> View in context: https://bugs.webkit.org/attachment.cgi?id=413729&action=review > Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:136 > + auto locker = SharedDisplayListHandle::Lock { handle }; > + sizeToRead = handle.unreadBytes(); This seems error prone. Can't we just add the lock to unreadBytes() instead? > Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:167 > + auto locker = SharedDisplayListHandle::Lock { handle }; > + handle.advance(sizeToRead); > + sizeToRead = handle.unreadBytes(); Ditto about moving the lock into advance. Maybe advance can return the new unread bytes value. > Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:181 > + auto initialHandle = m_sharedDisplayListHandles.get(initialIdentifier); > + if (UNLIKELY(!initialHandle)) { As we discussed, it's a bit dangerous to rely on the fact the map won't be mutated while we're accessing this handle. We should either make the handle ref-counted or add some kind of message check to make sure we don't remove stuff from the map. > Source/WebKit/Shared/SharedDisplayListHandle.h:57 > + std::this_thread::yield(); Why are we yielding before checking the value at least once? That seems rather inefficient. Also, use Thread::yield? > Source/WebKit/Shared/SharedDisplayListHandle.h:61 > + } We should probably limit ourselves from spinning forever. We probably need an equivalent of parking lot cross-process. e.g. Use the OS-level lock feature for example. We can do that in a follow up patch though. > Source/WebKit/Shared/SharedDisplayListHandle.h:83 > + struct DisplayListHandleStructure { I would have called this DisplayListSharedMemoryHeader or something. "Structure" sounds rather vague & sounds like something related to JSC::Structure, which it is not. > Source/WebKit/Shared/SharedDisplayListHandle.h:88 > + std::atomic<uint64_t>& atomicLockValue() { return structure().lock; } Maybe use WTF::Atomic instead? > Source/WebKit/WebProcess/GPU/graphics/RemoteRenderingBackendProxy.cpp:132 > + auto locker = SharedDisplayListHandle::Lock { *sharedHandle }; > + > + bool unreadCountWasEmpty = !sharedHandle->unreadBytes(); > + sharedHandle->advance(handle.capacity); It seems like this is the only part we need to put under lock? Can we move the lock into advance? Maybe advance can return the old & new values of unread bytes.
Comment on attachment 413729 [details] Use CheckedSize instead of Checked<> View in context: https://bugs.webkit.org/attachment.cgi?id=413729&action=review > Source/WebKit/WebProcess/GPU/graphics/DisplayListWriterHandle.cpp:81 > + auto locker = Lock { *this }; > + if (unreadBytes()) We should move the lock to unreadBytes(). There is no need to keep holding onto the lock when we're updating m_writableOffset.
Comment on attachment 413729 [details] Use CheckedSize instead of Checked<> View in context: https://bugs.webkit.org/attachment.cgi?id=413729&action=review >> Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:136 >> + sizeToRead = handle.unreadBytes(); > > This seems error prone. Can't we just add the lock to unreadBytes() instead? Good call — done! Also made the inner Lock class protected, since the code using these handles won't need to manually lock anymore. >> Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:167 >> + sizeToRead = handle.unreadBytes(); > > Ditto about moving the lock into advance. Maybe advance can return the new unread bytes value. Added to advance() too, and made it return the new value. >> Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:181 >> + if (UNLIKELY(!initialHandle)) { > > As we discussed, it's a bit dangerous to rely on the fact the map won't be mutated while we're accessing this handle. > We should either make the handle ref-counted or add some kind of message check to make sure we don't remove stuff from the map. Yes — made SharedDisplayListHandle (and its reader/writer subclasses) ref-counted, and also added an early return in `RemoteRenderingBackend::didCreateSharedDisplayListHandle`, in the case where we're attempting to re-register an ID, with a FIXME to addionally turn it into a MESSAGE_CHECK. In an upcoming patch, I'm going to make all of these returns MESSAGE_CHECK and kill the web content process. >> Source/WebKit/Shared/SharedDisplayListHandle.h:57 >> + std::this_thread::yield(); > > Why are we yielding before checking the value at least once? That seems rather inefficient. > Also, use Thread::yield? Whoops, good catch. Changed to use Thread::yield, and also moved it after the break; statement so we'll only yield if we were unsuccessful in grabbing the lock the first time. >> Source/WebKit/Shared/SharedDisplayListHandle.h:61 >> + } > > We should probably limit ourselves from spinning forever. > We probably need an equivalent of parking lot cross-process. > e.g. Use the OS-level lock feature for example. > We can do that in a follow up patch though. Added a FIXME for this. >> Source/WebKit/Shared/SharedDisplayListHandle.h:83 >> + struct DisplayListHandleStructure { > > I would have called this DisplayListSharedMemoryHeader or something. > "Structure" sounds rather vague & sounds like something related to JSC::Structure, which it is not. Sounds good — renamed to DisplayListSharedMemoryHeader (and renamed the structure() method to header()). >> Source/WebKit/Shared/SharedDisplayListHandle.h:88 >> + std::atomic<uint64_t>& atomicLockValue() { return structure().lock; } > > Maybe use WTF::Atomic instead? Done! >> Source/WebKit/WebProcess/GPU/graphics/DisplayListWriterHandle.cpp:81 >> + if (unreadBytes()) > > We should move the lock to unreadBytes(). > There is no need to keep holding onto the lock when we're updating m_writableOffset. Fixed! >> Source/WebKit/WebProcess/GPU/graphics/RemoteRenderingBackendProxy.cpp:132 >> + sharedHandle->advance(handle.capacity); > > It seems like this is the only part we need to put under lock? > Can we move the lock into advance? Maybe advance can return the old & new values of unread bytes. Done!
Created attachment 413767 [details] For EWS
Created attachment 413775 [details] For EWS
Committed r269682: <https://trac.webkit.org/changeset/269682> All reviewed patches have been landed. Closing bug and clearing flags on attachment 413775 [details].