Bug 218426

Summary: [Concurrent display lists] Add an initial implementation of concurrent display list rendering
Product: WebKit Reporter: Wenson Hsieh <wenson_hsieh>
Component: WebKit2Assignee: Wenson Hsieh <wenson_hsieh>
Status: RESOLVED FIXED    
Severity: Normal CC: ggaren, rniwa, saam, sabouhallawa, sam, simon.fraser, thorton, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 218568, 218588, 218661, 218689, 218720    
Bug Blocks: 218614    
Attachments:
Description Flags
Patch
none
Depends on #218588 and #218406
none
Rebase on trunk
ews-feeder: commit-queue-
Patch
none
Tweak ChangeLog wording
none
Depends on #218689
none
Patch
none
Use CheckedSize instead of Checked<>
none
For EWS
none
For EWS none

Wenson Hsieh
Reported 2020-11-01 11:28:54 PST
Make it possible for the web process to write display list items while the GPU process is reading items from shared memory.
Attachments
Patch (57.92 KB, patch)
2020-11-04 17:28 PST, Wenson Hsieh
no flags
Depends on #218588 and #218406 (57.44 KB, patch)
2020-11-05 16:16 PST, Wenson Hsieh
no flags
Rebase on trunk (57.78 KB, patch)
2020-11-06 20:42 PST, Wenson Hsieh
ews-feeder: commit-queue-
Patch (57.78 KB, patch)
2020-11-07 09:37 PST, Wenson Hsieh
no flags
Tweak ChangeLog wording (57.79 KB, patch)
2020-11-07 09:40 PST, Wenson Hsieh
no flags
Depends on #218689 (57.52 KB, patch)
2020-11-07 13:16 PST, Wenson Hsieh
no flags
Patch (60.25 KB, patch)
2020-11-10 08:52 PST, Wenson Hsieh
no flags
Use CheckedSize instead of Checked<> (61.73 KB, patch)
2020-11-10 13:20 PST, Wenson Hsieh
no flags
For EWS (62.14 KB, patch)
2020-11-10 20:47 PST, Wenson Hsieh
no flags
For EWS (62.18 KB, patch)
2020-11-10 21:33 PST, Wenson Hsieh
no flags
Wenson Hsieh
Comment 1 2020-11-04 17:28:56 PST Comment hidden (obsolete)
Wenson Hsieh
Comment 2 2020-11-05 16:16:31 PST Comment hidden (obsolete)
Wenson Hsieh
Comment 3 2020-11-06 20:42:25 PST Comment hidden (obsolete)
Wenson Hsieh
Comment 4 2020-11-07 09:37:39 PST Comment hidden (obsolete)
Wenson Hsieh
Comment 5 2020-11-07 09:40:32 PST
Created attachment 413526 [details] Tweak ChangeLog wording
Sam Weinig
Comment 6 2020-11-07 10:28:14 PST
Comment on attachment 413526 [details] Tweak ChangeLog wording View in context: https://bugs.webkit.org/attachment.cgi?id=413526&action=review > Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:146 > + DisplayList::ItemBufferReadingClient* itemBufferClient; > if (imageBuffer->isAccelerated()) > - displayList->setItemBufferClient(static_cast<AcceleratedRemoteImageBuffer*>(imageBuffer)); > + itemBufferClient = static_cast<AcceleratedRemoteImageBuffer*>(imageBuffer); > else > - displayList->setItemBufferClient(static_cast<UnacceleratedRemoteImageBuffer*>(imageBuffer)); > + itemBufferClient = static_cast<UnacceleratedRemoteImageBuffer*>(imageBuffer); What is the purpose of these casts?
Wenson Hsieh
Comment 7 2020-11-07 11:28:43 PST
Comment on attachment 413526 [details] Tweak ChangeLog wording View in context: https://bugs.webkit.org/attachment.cgi?id=413526&action=review >> Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:146 >> + itemBufferClient = static_cast<UnacceleratedRemoteImageBuffer*>(imageBuffer); > > What is the purpose of these casts? These casts are currently here because RemoteImageBuffer is the thing that implements ItemBufferReadingClient. (That said, perhaps it would be cleaner to make RemoteRenderingBackend the client, since none of the logic for decoding display list items is specific to RemoteImageBuffer anyways. If I did that, then I would be able to remove these casts)
Sam Weinig
Comment 8 2020-11-07 11:48:27 PST
(In reply to Wenson Hsieh from comment #7) > Comment on attachment 413526 [details] > Tweak ChangeLog wording > > View in context: > https://bugs.webkit.org/attachment.cgi?id=413526&action=review > > >> Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:146 > >> + itemBufferClient = static_cast<UnacceleratedRemoteImageBuffer*>(imageBuffer); > > > > What is the purpose of these casts? > > These casts are currently here because RemoteImageBuffer is the thing that > implements ItemBufferReadingClient. > > (That said, perhaps it would be cleaner to make RemoteRenderingBackend the > client, since none of the logic for decoding display list items is specific > to RemoteImageBuffer anyways. If I did that, then I would be able to remove > these casts) I see. That does sound cleaner.
Wenson Hsieh
Comment 9 2020-11-07 13:16:19 PST
Created attachment 413534 [details] Depends on #218689
Radar WebKit Bug Importer
Comment 10 2020-11-08 11:29:15 PST
Simon Fraser (smfr)
Comment 11 2020-11-09 10:40:34 PST
Comment on attachment 413534 [details] Depends on #218689 View in context: https://bugs.webkit.org/attachment.cgi?id=413534&action=review > Source/WebKit/GPUProcess/graphics/DisplayListReaderHandle.cpp:34 > + unreadBytesHandle() -= amount; Check for underflow? > Source/WebKit/GPUProcess/graphics/DisplayListReaderHandle.h:42 > + std::unique_ptr<WebCore::DisplayList::DisplayList> createDisplayList(size_t offset, size_t capacity, WebCore::DisplayList::ItemBufferReadingClient&) const; Maybe readDisplayListFromHandle() to not make it sound like it creates a fresh, empty DisplayList. > Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:128 > +void RemoteRenderingBackend::wakeUpAndProcessDisplayList(WebCore::DisplayList::ItemBufferIdentifier initialIdentifier, uint64_t initialOffset, RenderingResourceIdentifier renderingResourceIdentifier) renderingResourceIdentifier -> destinationBufferIdentifier > Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:130 > + auto imageBuffer = m_remoteResourceCache.cachedImageBuffer(renderingResourceIdentifier); Why is this getting a designation buffer from the source cache? Resouce cache buffers are readonly in the GPU process. > Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:132 > // FIXME: Add a message check to terminate the web process. I think we need to do this very soon. > Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:146 > + SharedDisplayListHandle::Lock locker { *sharedHandle }; The cool kids do something like auto locker = SharedDisplayListHandle::Lock { ... } > Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:151 > + imageBuffer->flushDisplayList(*sharedHandle->createDisplayList(offset, sizeToRead, *this)); We need to call this something other than "flushDisplayList". It's more "render display list". > Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:153 > + offset += sizeToRead; Do we need overflow checks? > Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:166 > + sharedHandle = m_sharedDisplayListHandles.get(nextIdentifier); I think I'd feel more comfortable if you didn't change the sharedHandle in this loop, since sharedHandle and sizeToRead go together and it would be easy to introduce mistakes in future. Can you refractor the code to read from a single handle into its own function? > Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:203 > +void RemoteRenderingBackend::didCreateSharedDisplayListHandle(DisplayList::ItemBufferIdentifier identifier, const SharedMemory::IPCHandle& handle, RenderingResourceIdentifier remoteResourceIdentifier) remoteResourceIdentifier -> destinationBufferIdentifier > Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:205 > + if (auto sharedMemory = SharedMemory::map(handle.handle, SharedMemory::Protection::ReadWrite)) Why are we mapping ReadWrite in the GPUP? Is this just to be able to write unreadBytes? Should that instead be in its own little shared memory handle? > Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.h:105 > + void didCreateSharedDisplayListHandle(WebCore::DisplayList::ItemBufferIdentifier, const SharedMemory::IPCHandle&, WebCore::RenderingResourceIdentifier); Give RenderingResourceIdentifier a name > Source/WebKit/Shared/SharedDisplayListHandle.h:38 > + static constexpr auto reservedCapacityAtStart = sizeof(bool) + sizeof(size_t); Maybe round this up for better alignment? > Source/WebKit/Shared/SharedDisplayListHandle.h:42 > + size_t unreadBytes() const { return *reinterpret_cast<size_t*>(data() + sizeof(bool)); } Rather than this manual offsetting, how about describing the buffer contents via a struct. > Source/WebKit/Shared/SharedDisplayListHandle.h:58 > + auto& atomicValue = m_handle.atomicLockValue(); > + while (true) { > + std::this_thread::yield(); > + bool unlocked = false; > + if (atomicValue.compare_exchange_weak(unlocked, true)) > + break; Please check with a locking/scheduling/priority-donating expert. > Source/WebKit/WebProcess/GPU/graphics/DisplayListWriterHandle.cpp:35 > + m_writableOffset += amount; > + unreadBytesHandle() += amount; Check for overflow? > Source/WebKit/WebProcess/GPU/graphics/DisplayListWriterHandle.cpp:40 > + return sharedMemory().size() - writableOffset(); Check for underflow? > Source/WebKit/WebProcess/GPU/graphics/DisplayListWriterHandle.cpp:55 > + ASSERT(m_writableOffset == SharedDisplayListHandle::reservedCapacityAtStart); RELEASE_ASSERT? > Source/WebKit/WebProcess/GPU/graphics/DisplayListWriterHandle.cpp:59 > + Lock locker { *this }; auto locker = > Source/WebKit/WebProcess/GPU/graphics/RemoteRenderingBackendProxy.cpp:129 > + SharedDisplayListHandle::Lock locker { *sharedHandle }; auto locker =
Wenson Hsieh
Comment 12 2020-11-09 11:48:54 PST
Comment on attachment 413534 [details] Depends on #218689 View in context: https://bugs.webkit.org/attachment.cgi?id=413534&action=review >> Source/WebKit/GPUProcess/graphics/DisplayListReaderHandle.cpp:34 >> + unreadBytesHandle() -= amount; > > Check for underflow? Done! Made this robust against underflow. >> Source/WebKit/GPUProcess/graphics/DisplayListReaderHandle.h:42 >> + std::unique_ptr<WebCore::DisplayList::DisplayList> createDisplayList(size_t offset, size_t capacity, WebCore::DisplayList::ItemBufferReadingClient&) const; > > Maybe readDisplayListFromHandle() to not make it sound like it creates a fresh, empty DisplayList. I do agree `readDisplayListFromHandle` is better than my current name, but at the same time `readDisplayListFromHandle` sounds like it's performing an action (namely, reading a display list). I'll change this to `displayListForReadingFromHandle`. >> Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:128 >> +void RemoteRenderingBackend::wakeUpAndProcessDisplayList(WebCore::DisplayList::ItemBufferIdentifier initialIdentifier, uint64_t initialOffset, RenderingResourceIdentifier renderingResourceIdentifier) > > renderingResourceIdentifier -> destinationBufferIdentifier Changed to destinationBufferIdentifier. >> Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:130 >> + auto imageBuffer = m_remoteResourceCache.cachedImageBuffer(renderingResourceIdentifier); > > Why is this getting a designation buffer from the source cache? Resouce cache buffers are readonly in the GPU process. The current status quo is that remote image buffers that back canvases are tracked in the GPU process as cached resources (see RemoteRenderingBackend::createImageBuffer, above). I agree it's a bit confusing, but this might have benefits (e.g. when painting from canvas to canvas). Perhaps RemoteResourceCache and its cacheImageBuffer method should be renamed to something more general? (For example, RemoteResourceStore::addImageBuffer or RemoteResourceRegistry::registerImageBuffer). >> Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:132 >> // FIXME: Add a message check to terminate the web process. > > I think we need to do this very soon. Yes! >> Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:146 >> + SharedDisplayListHandle::Lock locker { *sharedHandle }; > > The cool kids do something like auto locker = SharedDisplayListHandle::Lock { ... } 👍🏻 >> Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:151 >> + imageBuffer->flushDisplayList(*sharedHandle->createDisplayList(offset, sizeToRead, *this)); > > We need to call this something other than "flushDisplayList". It's more "render display list". `renderDisplayList` sounds good to me. I think I'll split this renaming out into a different patch. >> Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:153 >> + offset += sizeToRead; > > Do we need overflow checks? Added! >> Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:166 >> + sharedHandle = m_sharedDisplayListHandles.get(nextIdentifier); > > I think I'd feel more comfortable if you didn't change the sharedHandle in this loop, since sharedHandle and sizeToRead go together and it would be easy to introduce mistakes in future. Can you refractor the code to read from a single handle into its own function? Yeah, that makes sense — I will split out the logic for reading from a single shared handle into a new private helper. >> Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:203 >> +void RemoteRenderingBackend::didCreateSharedDisplayListHandle(DisplayList::ItemBufferIdentifier identifier, const SharedMemory::IPCHandle& handle, RenderingResourceIdentifier remoteResourceIdentifier) > > remoteResourceIdentifier -> destinationBufferIdentifier Fixed! >> Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:205 >> + if (auto sharedMemory = SharedMemory::map(handle.handle, SharedMemory::Protection::ReadWrite)) > > Why are we mapping ReadWrite in the GPUP? Is this just to be able to write unreadBytes? Should that instead be in its own little shared memory handle? Yes, it is so that we can update the "unread bytes" count. I recall discussing it with Geoff, and he suggested that we should just stick it in the first bit of each shared memory chunk instead of taking up a whole new page for an atomic. >> Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.h:105 >> + void didCreateSharedDisplayListHandle(WebCore::DisplayList::ItemBufferIdentifier, const SharedMemory::IPCHandle&, WebCore::RenderingResourceIdentifier); > > Give RenderingResourceIdentifier a name 👍🏻 "destinationBufferIdentifier" >> Source/WebKit/Shared/SharedDisplayListHandle.h:38 >> + static constexpr auto reservedCapacityAtStart = sizeof(bool) + sizeof(size_t); > > Maybe round this up for better alignment? Done! (changed to 16 bytes) >> Source/WebKit/Shared/SharedDisplayListHandle.h:58 >> + break; > > Please check with a locking/scheduling/priority-donating expert. I came up with this part with some help from Geoff; perhaps he has some more thoughts on this! One thing we should investigate is changing this to use a single atomic size_t counter instead of an atomic lock. When I tried doing this, however, it ended up hurting performance for (as-of-yet) unknown reasons... >> Source/WebKit/WebProcess/GPU/graphics/DisplayListWriterHandle.cpp:35 >> + unreadBytesHandle() += amount; > > Check for overflow? Fixed! >> Source/WebKit/WebProcess/GPU/graphics/DisplayListWriterHandle.cpp:40 >> + return sharedMemory().size() - writableOffset(); > > Check for underflow? Fixed! >> Source/WebKit/WebProcess/GPU/graphics/DisplayListWriterHandle.cpp:55 >> + ASSERT(m_writableOffset == SharedDisplayListHandle::reservedCapacityAtStart); > > RELEASE_ASSERT? Changed to RELEASE_ASSERT. >> Source/WebKit/WebProcess/GPU/graphics/DisplayListWriterHandle.cpp:59 >> + Lock locker { *this }; > > auto locker = Fixed! >> Source/WebKit/WebProcess/GPU/graphics/RemoteRenderingBackendProxy.cpp:129 >> + SharedDisplayListHandle::Lock locker { *sharedHandle }; > > auto locker = Fixed!
Wenson Hsieh
Comment 13 2020-11-10 08:52:46 PST Comment hidden (obsolete)
Wenson Hsieh
Comment 14 2020-11-10 13:20:10 PST
Created attachment 413729 [details] Use CheckedSize instead of Checked<>
Ryosuke Niwa
Comment 15 2020-11-10 16:53:13 PST
Comment on attachment 413729 [details] Use CheckedSize instead of Checked<> View in context: https://bugs.webkit.org/attachment.cgi?id=413729&action=review > Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:136 > + auto locker = SharedDisplayListHandle::Lock { handle }; > + sizeToRead = handle.unreadBytes(); This seems error prone. Can't we just add the lock to unreadBytes() instead? > Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:167 > + auto locker = SharedDisplayListHandle::Lock { handle }; > + handle.advance(sizeToRead); > + sizeToRead = handle.unreadBytes(); Ditto about moving the lock into advance. Maybe advance can return the new unread bytes value. > Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:181 > + auto initialHandle = m_sharedDisplayListHandles.get(initialIdentifier); > + if (UNLIKELY(!initialHandle)) { As we discussed, it's a bit dangerous to rely on the fact the map won't be mutated while we're accessing this handle. We should either make the handle ref-counted or add some kind of message check to make sure we don't remove stuff from the map. > Source/WebKit/Shared/SharedDisplayListHandle.h:57 > + std::this_thread::yield(); Why are we yielding before checking the value at least once? That seems rather inefficient. Also, use Thread::yield? > Source/WebKit/Shared/SharedDisplayListHandle.h:61 > + } We should probably limit ourselves from spinning forever. We probably need an equivalent of parking lot cross-process. e.g. Use the OS-level lock feature for example. We can do that in a follow up patch though. > Source/WebKit/Shared/SharedDisplayListHandle.h:83 > + struct DisplayListHandleStructure { I would have called this DisplayListSharedMemoryHeader or something. "Structure" sounds rather vague & sounds like something related to JSC::Structure, which it is not. > Source/WebKit/Shared/SharedDisplayListHandle.h:88 > + std::atomic<uint64_t>& atomicLockValue() { return structure().lock; } Maybe use WTF::Atomic instead? > Source/WebKit/WebProcess/GPU/graphics/RemoteRenderingBackendProxy.cpp:132 > + auto locker = SharedDisplayListHandle::Lock { *sharedHandle }; > + > + bool unreadCountWasEmpty = !sharedHandle->unreadBytes(); > + sharedHandle->advance(handle.capacity); It seems like this is the only part we need to put under lock? Can we move the lock into advance? Maybe advance can return the old & new values of unread bytes.
Ryosuke Niwa
Comment 16 2020-11-10 16:54:45 PST
Comment on attachment 413729 [details] Use CheckedSize instead of Checked<> View in context: https://bugs.webkit.org/attachment.cgi?id=413729&action=review > Source/WebKit/WebProcess/GPU/graphics/DisplayListWriterHandle.cpp:81 > + auto locker = Lock { *this }; > + if (unreadBytes()) We should move the lock to unreadBytes(). There is no need to keep holding onto the lock when we're updating m_writableOffset.
Wenson Hsieh
Comment 17 2020-11-10 20:14:40 PST
Comment on attachment 413729 [details] Use CheckedSize instead of Checked<> View in context: https://bugs.webkit.org/attachment.cgi?id=413729&action=review >> Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:136 >> + sizeToRead = handle.unreadBytes(); > > This seems error prone. Can't we just add the lock to unreadBytes() instead? Good call — done! Also made the inner Lock class protected, since the code using these handles won't need to manually lock anymore. >> Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:167 >> + sizeToRead = handle.unreadBytes(); > > Ditto about moving the lock into advance. Maybe advance can return the new unread bytes value. Added to advance() too, and made it return the new value. >> Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:181 >> + if (UNLIKELY(!initialHandle)) { > > As we discussed, it's a bit dangerous to rely on the fact the map won't be mutated while we're accessing this handle. > We should either make the handle ref-counted or add some kind of message check to make sure we don't remove stuff from the map. Yes — made SharedDisplayListHandle (and its reader/writer subclasses) ref-counted, and also added an early return in `RemoteRenderingBackend::didCreateSharedDisplayListHandle`, in the case where we're attempting to re-register an ID, with a FIXME to addionally turn it into a MESSAGE_CHECK. In an upcoming patch, I'm going to make all of these returns MESSAGE_CHECK and kill the web content process. >> Source/WebKit/Shared/SharedDisplayListHandle.h:57 >> + std::this_thread::yield(); > > Why are we yielding before checking the value at least once? That seems rather inefficient. > Also, use Thread::yield? Whoops, good catch. Changed to use Thread::yield, and also moved it after the break; statement so we'll only yield if we were unsuccessful in grabbing the lock the first time. >> Source/WebKit/Shared/SharedDisplayListHandle.h:61 >> + } > > We should probably limit ourselves from spinning forever. > We probably need an equivalent of parking lot cross-process. > e.g. Use the OS-level lock feature for example. > We can do that in a follow up patch though. Added a FIXME for this. >> Source/WebKit/Shared/SharedDisplayListHandle.h:83 >> + struct DisplayListHandleStructure { > > I would have called this DisplayListSharedMemoryHeader or something. > "Structure" sounds rather vague & sounds like something related to JSC::Structure, which it is not. Sounds good — renamed to DisplayListSharedMemoryHeader (and renamed the structure() method to header()). >> Source/WebKit/Shared/SharedDisplayListHandle.h:88 >> + std::atomic<uint64_t>& atomicLockValue() { return structure().lock; } > > Maybe use WTF::Atomic instead? Done! >> Source/WebKit/WebProcess/GPU/graphics/DisplayListWriterHandle.cpp:81 >> + if (unreadBytes()) > > We should move the lock to unreadBytes(). > There is no need to keep holding onto the lock when we're updating m_writableOffset. Fixed! >> Source/WebKit/WebProcess/GPU/graphics/RemoteRenderingBackendProxy.cpp:132 >> + sharedHandle->advance(handle.capacity); > > It seems like this is the only part we need to put under lock? > Can we move the lock into advance? Maybe advance can return the old & new values of unread bytes. Done!
Wenson Hsieh
Comment 18 2020-11-10 20:47:47 PST
Wenson Hsieh
Comment 19 2020-11-10 21:33:59 PST
EWS
Comment 20 2020-11-11 07:07:16 PST
Committed r269682: <https://trac.webkit.org/changeset/269682> All reviewed patches have been landed. Closing bug and clearing flags on attachment 413775 [details].
Note You need to log in before you can comment on or make changes to this bug.