RESOLVED FIXED 230860
[GPU Process] Use IPC streaming for concurrent display list processing
https://bugs.webkit.org/show_bug.cgi?id=230860
Summary [GPU Process] Use IPC streaming for concurrent display list processing
Wenson Hsieh
Reported 2021-09-27 15:10:23 PDT
.
Attachments
Extremely hacky WIP - no resource caching (193.12 KB, patch)
2021-09-27 15:15 PDT, Wenson Hsieh
no flags
WIP - supports cached images (324.15 KB, patch)
2021-10-01 17:17 PDT, Wenson Hsieh
no flags
WIP - clip to mask, get|putImageData, draw glyphs, paint media (359.45 KB, patch)
2021-10-03 16:30 PDT, Wenson Hsieh
no flags
WIP - rebase on trunk (354.75 KB, patch)
2021-10-04 18:10 PDT, Wenson Hsieh
no flags
Final piece (still blocked on subtasks) (98.99 KB, patch)
2021-10-11 08:52 PDT, Wenson Hsieh
no flags
Rebase on trunk (101.37 KB, patch)
2021-10-11 17:38 PDT, Wenson Hsieh
no flags
Rebase again (100.79 KB, patch)
2021-10-12 09:39 PDT, Wenson Hsieh
no flags
Rebase again + fix test failures (100.28 KB, patch)
2021-10-12 15:37 PDT, Wenson Hsieh
no flags
Wenson Hsieh
Comment 1 2021-09-27 15:15:01 PDT
Created attachment 439399 [details] Extremely hacky WIP - no resource caching
Kimmo Kinnunen
Comment 2 2021-09-28 00:23:24 PDT
Comment on attachment 439399 [details] Extremely hacky WIP - no resource caching View in context: https://bugs.webkit.org/attachment.cgi?id=439399&action=review > Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:705 > + MESSAGE_CHECK(m_renderingDestinationImageBuffer, "No destination image buffer specified."); So this is a bit highlighting the pondering issue: The "Save" is a message to the context of the destination image buffer. How can we be in the middle of dispatching "save" to non-existing object? The implementation would be: void RemoteGraphicsContext::save(DisplayList::Save&& item) { item.apply(m_context); } > Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.messages.in:25 > +messages -> RemoteRenderingBackend NotRefCounted Stream { I know this is WIP and I see the value of having it this way due to existing code. However, I want to put this out there so you can ponder about it. (Also I don't mean to say you didn't already) Nomenclature: ImageBuffer == drawing target == drawing context == display list context == roughly GraphicsContext Now you have structure of: GPUConnectionToWebProcess: - Map of RemoteRenderingBackends RemoteRenderingBackend: - Map of ImageBuffers So identity reference (id) gatekeeping is done at two levels: - I want to send message to RemoteRenderingBackend by this id (CreateImageBuffer) - I want to draw to this ImageBuffer ( HandleMetaCommandChangeDestinationImageBuffer ) The first level of gatekeeping is redundant. Each individual RemoteRenderingBackend do not _really_ contribute to the algorithm. So if you think of C++, you wouldn't necessarily have API like: MyRenderingBackend::handleSave(ImageBuffer, DisplayList::Save); Instead, you would have MyImageBuffer::handleSave(DisplayList::Save) Which we already actually have: GraphicsContext::save(...); The current strangeness is visible in the API, since following can happen: HandleSave HandleMetaCommandChangeDestinationImageBuffer HandleFillRect E.g. it is obvious that the ImageBuffers carry the state that is modified through the RemoteRenderingBackend, while RemoteRenderingBackend cannot carry that. So then the question is: why isn't the command sent to the ImageBuffer? So the structure could be: GPUConnectionToWebProcess: - Map of RemoteImageBuffers / RemoteGraphicsContext / RemoteDisplayListContext GPUConnectionToWebProcess::CreateImageBuffer The "stream" signifies the sequence of ordered operations on a group of objects. So the stream actually already has implemented `HandleMetaCommandChangeDestinationImageBuffer`, it's IPC control message `SetStreamDestinationID`. > Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.messages.in:28 > + HandleSave(WebCore::DisplayList::Save item) Also this I know might be just because of how the WIP is organized, but mentioning just in case. If there's value in having the args already in DisplayList items this makes sense. E.g. if we anyway construct the lists. If we don' t construct the lists, then this wraps a message inside a message. So these could be just plain GraphicsContext args. .messages.in: HandleFillRect(FloatRect rect) proxy: void RemoteGraphicsContextProxy::fillRect(FloatRect rect) { m_streamConnection->send(Messages::HandleFillRect { rect } ); } remote: void RemoteGraphicsContext::handleFillRect(Messages::HandleFillRect&& msg) { m_context.fillRect(msg.rect) }
Kimmo Kinnunen
Comment 3 2021-09-28 00:48:02 PDT
Comment on attachment 439399 [details] Extremely hacky WIP - no resource caching View in context: https://bugs.webkit.org/attachment.cgi?id=439399&action=review > Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.messages.in:39 > + HandleSetLineCap(WebCore::DisplayList::SetLineCap item) So one example where the logic of "use DisplayList:: items so we can memcpy things around and that's that" is flawed is that the IPC layer was built inherently for the reason of security, and as such the messages need to be verified by the machinery. Currently the DisplayList items are insecure.
Wenson Hsieh
Comment 4 2021-09-28 09:07:55 PDT
(In reply to Kimmo Kinnunen from comment #3) > Comment on attachment 439399 [details] > Extremely hacky WIP - no resource caching > > View in context: > https://bugs.webkit.org/attachment.cgi?id=439399&action=review > > > Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.messages.in:39 > > + HandleSetLineCap(WebCore::DisplayList::SetLineCap item) > > So one example where the logic of "use DisplayList:: items so we can memcpy > things around and that's that" is flawed is that > the IPC layer was built inherently for the reason of security, and as such > the messages need to be verified by the > machinery. Currently the DisplayList items are insecure. Note that this patch drops the `isValid()` checks for each of the inline display list items, which we currently have in trunk. Once we add these back in, DisplayList items sent via SimpleArgumentCoder will no longer be insecure (and validation failure should result in a MESSAGE_CHECK).
Wenson Hsieh
Comment 5 2021-09-28 09:57:19 PDT
Comment on attachment 439399 [details] Extremely hacky WIP - no resource caching View in context: https://bugs.webkit.org/attachment.cgi?id=439399&action=review >> Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:705 >> + MESSAGE_CHECK(m_renderingDestinationImageBuffer, "No destination image buffer specified."); > > So this is a bit highlighting the pondering issue: > The "Save" is a message to the context of the destination image buffer. > How can we be in the middle of dispatching "save" to non-existing object? > > The implementation would be: > > void RemoteGraphicsContext::save(DisplayList::Save&& item) { > item.apply(m_context); > } This is my eventual goal (in this patch, I even included empty RemoteGraphicsContext/RemoteGraphicsContextProxy files that I added with this intention, but did not build on, since adding logic to the existing RRB/RRBP pipeline was more convenient). I plan to make both RRB and the new RemoteGraphicsContext streaming IPC message receivers that share the same StreamServerConnection; then, RemoteGraphicsContextProxy in the web process would be able to directly message RemoteGraphicsContext.
Wenson Hsieh
Comment 6 2021-09-28 10:08:29 PDT
(In reply to Wenson Hsieh from comment #4) > (In reply to Kimmo Kinnunen from comment #3) > > Comment on attachment 439399 [details] > > Extremely hacky WIP - no resource caching > > > > View in context: > > https://bugs.webkit.org/attachment.cgi?id=439399&action=review > > > > > Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.messages.in:39 > > > + HandleSetLineCap(WebCore::DisplayList::SetLineCap item) > > > > So one example where the logic of "use DisplayList:: items so we can memcpy > > things around and that's that" is flawed is that > > the IPC layer was built inherently for the reason of security, and as such > > the messages need to be verified by the > > machinery. Currently the DisplayList items are insecure. > > Note that this patch drops the `isValid()` checks for each of the inline > display list items, which we currently have in trunk. Once we add these back > in, DisplayList items sent via SimpleArgumentCoder will no longer be > insecure (and validation failure should result in a MESSAGE_CHECK). I would also add that we know for a fact that very fast, low-overhead encoding/decoding of display list items and their arguments is necessary in order to hit our performance goals, particularly on the Canvas subtests. In theory, we could use normal IPC encoding/decoding and validation instead, but this would require us to make equivalent progressions elsewhere to mitigate the performance hit.
Kimmo Kinnunen
Comment 7 2021-09-29 10:06:27 PDT
(In reply to Wenson Hsieh from comment #6) > I would also add that we know for a fact that very fast, low-overhead > encoding/decoding of display list items and their arguments is necessary in > order to hit our performance goals, particularly on the Canvas subtests. > > In theory, we could use normal IPC encoding/decoding and validation instead, > but this would require us to make equivalent progressions elsewhere to > mitigate the performance hit. This statement is just not logical. Encoding of IPC messages filled with display list items: - Put draw args to display list item - memcpy item to message - send message - receive message - memcpy item from message - extract args from display list item - call with args Encoding of IPC messages filled with draw args: - Put draw args to message - send message - receive message - extract args from display list item - call with args All other things being equal, this does not make sense. The only way it would make sense is we already had the display list items. Think of it the other way: Putting FloatRect into display list item is *superior* to putting it to plain IPC message directly. Why would we use inferior method for IPC messages? Where is the magic that encodes FloatRects faster than IPC encoder? What is the fundamental reason that magic is not the default mode of the IPC encoder?
Kimmo Kinnunen
Comment 8 2021-09-29 10:07:14 PDT
Err, it should have been: Encoding of IPC messages filled with draw args: - Put draw args to message - send message - receive message - extract args from message - call with args
Wenson Hsieh
Comment 9 2021-09-29 10:23:22 PDT
(In reply to Kimmo Kinnunen from comment #7) > (In reply to Wenson Hsieh from comment #6) > > > I would also add that we know for a fact that very fast, low-overhead > > encoding/decoding of display list items and their arguments is necessary in > > order to hit our performance goals, particularly on the Canvas subtests. > > > > In theory, we could use normal IPC encoding/decoding and validation instead, > > but this would require us to make equivalent progressions elsewhere to > > mitigate the performance hit. > > This statement is just not logical. > Encoding of IPC messages filled with display list items: > - Put draw args to display list item > - memcpy item to message > - send message > - receive message > - memcpy item from message > - extract args from display list item > - call with args > > Encoding of IPC messages filled with draw args: > - Put draw args to message > - send message > - receive message > - extract args from display list item > - call with args This is not what I'm comparing against. "normal IPC encoding/decoding" was referring to the established IPC argument encoding technique of allocating a new chunk of memory and encoding the arguments (one fixed-length piece of data at a time), resizing the buffer and copying out the contents of buffer as needed. I would hope that this: > Encoding of IPC messages filled with display list items: ..and this: > Encoding of IPC messages filled with draw args: ..would yield about the same performance in IPC streams, though the latter will make multiple calls to `memcpy` whereas the former would just make a single call. > > > All other things being equal, this does not make sense. The only way it > would make sense is we already had the display list items. > > Think of it the other way: > Putting FloatRect into display list item is *superior* to putting it to > plain IPC message directly. > Why would we use inferior method for IPC messages? > Where is the magic that encodes FloatRects faster than IPC encoder? I'm not sure I totally understand what you mean, but FloatRect encoding is optimized to use SimpleArgumentCoder, which performs a single `memcpy`. > What is the fundamental reason that magic is not the default mode of the IPC > encoder? (..following from above, SimpleArgumentCoder is not the default mode if the IPC encoder because we can't assume objects consist only of memcpy-able data with no pointers by default).
Kimmo Kinnunen
Comment 10 2021-09-29 10:30:08 PDT
Yeah, you're probably right that the dl items will be in practice faster due to the nature of how the IPC encoders iterate field by field. Maybe someday we could just memcpy the messages that have trivially copyable fields.
Wenson Hsieh
Comment 11 2021-10-01 17:17:13 PDT
Created attachment 439939 [details] WIP - supports cached images
Wenson Hsieh
Comment 12 2021-10-03 16:30:59 PDT
Created attachment 440023 [details] WIP - clip to mask, get|putImageData, draw glyphs, paint media
Kimmo Kinnunen
Comment 13 2021-10-04 11:06:26 PDT
Comment on attachment 440023 [details] WIP - clip to mask, get|putImageData, draw glyphs, paint media View in context: https://bugs.webkit.org/attachment.cgi?id=440023&action=review Such an amazingly clean patch for a WIP. Some small comment, did not go through all of it yet. > Source/WebKit/GPUProcess/graphics/RemoteImageBuffer.h:61 > + m_remoteRenderingBackend.streamConnection().startReceivingMessages(m_context.get(), Messages::RemoteGraphicsContext::messageReceiverName(), renderingResourceIdentifier.toUInt64()); You need the startReceivingIPC - stopReceivingIPC pattern, since you will start receiving vfunc calls immediately after this method call, and the vtable ptr is not yet set up. (or initialize-stopReceivingIPC, cannot remember which). > Source/WebKit/GPUProcess/graphics/RemoteImageBuffer.h:67 > + m_remoteRenderingBackend.streamConnection().stopReceivingMessages(Messages::RemoteGraphicsContext::messageReceiverName(), m_renderingResourceIdentifier.toUInt64()); The object lifetime has already ended, and you might still receive messages until this method returns. So the initialize - stopReceivingIPC is needed. > Source/WebKit/Platform/IPC/Timeout.h:37 > + : m_deadline(std::isinf(timeDelta) ? MonotonicTime::infinity() : MonotonicTime::now() + timeDelta) Great addition! Overall, I think passing infinity generally should be a bug, so in production something else is needed for the call sites? Either that approximate time or then maybe we could compute the deadline upon first would-block call? E.g. explain in timeout that it's not strictly accurate timeout, rather it would start approximately during the first blocking call..
Wenson Hsieh
Comment 14 2021-10-04 11:48:01 PDT
Comment on attachment 440023 [details] WIP - clip to mask, get|putImageData, draw glyphs, paint media View in context: https://bugs.webkit.org/attachment.cgi?id=440023&action=review >> Source/WebKit/GPUProcess/graphics/RemoteImageBuffer.h:61 >> + m_remoteRenderingBackend.streamConnection().startReceivingMessages(m_context.get(), Messages::RemoteGraphicsContext::messageReceiverName(), renderingResourceIdentifier.toUInt64()); > > You need the startReceivingIPC - stopReceivingIPC pattern, since you will start receiving vfunc calls immediately after this method call, and the vtable ptr is not yet set up. > (or initialize-stopReceivingIPC, cannot remember which). Good catch! Definitely going to fix this for my next WIP iteration. >> Source/WebKit/GPUProcess/graphics/RemoteImageBuffer.h:67 >> + m_remoteRenderingBackend.streamConnection().stopReceivingMessages(Messages::RemoteGraphicsContext::messageReceiverName(), m_renderingResourceIdentifier.toUInt64()); > > The object lifetime has already ended, and you might still receive messages until this method returns. So the initialize - stopReceivingIPC is needed. 👍🏻 >> Source/WebKit/Platform/IPC/Timeout.h:37 >> + : m_deadline(std::isinf(timeDelta) ? MonotonicTime::infinity() : MonotonicTime::now() + timeDelta) > > Great addition! > > Overall, I think passing infinity generally should be a bug, so in production something else is needed for the call sites? > Either that approximate time or then maybe we could compute the deadline upon first would-block call? > E.g. explain in timeout that it's not strictly accurate timeout, rather it would start approximately during the first blocking call.. Ah, yes — using ∞ and adding the optimization here was just meant to be temporary, until I could adopt a faster, coarse-granularity alternative to `MonotonicTime::now()`. Yusuke recently added `WTF::ApproximateTime` in r283161, so my next step was to adopt that, change the ∞ timeouts in this patch to something reasonable, and finally re-run Canvas Lines to verify that it's sufficient to address the performance regression on this subtest.
Radar WebKit Bug Importer
Comment 15 2021-10-04 15:11:21 PDT Comment hidden (obsolete)
Wenson Hsieh
Comment 16 2021-10-04 16:09:05 PDT
Comment on attachment 440023 [details] WIP - clip to mask, get|putImageData, draw glyphs, paint media View in context: https://bugs.webkit.org/attachment.cgi?id=440023&action=review >>> Source/WebKit/Platform/IPC/Timeout.h:37 >>> + : m_deadline(std::isinf(timeDelta) ? MonotonicTime::infinity() : MonotonicTime::now() + timeDelta) >> >> Great addition! >> >> Overall, I think passing infinity generally should be a bug, so in production something else is needed for the call sites? >> Either that approximate time or then maybe we could compute the deadline upon first would-block call? >> E.g. explain in timeout that it's not strictly accurate timeout, rather it would start approximately during the first blocking call.. > > Ah, yes — using ∞ and adding the optimization here was just meant to be temporary, until I could adopt a faster, coarse-granularity alternative to `MonotonicTime::now()`. > > Yusuke recently added `WTF::ApproximateTime` in r283161, so my next step was to adopt that, change the ∞ timeouts in this patch to something reasonable, and finally re-run Canvas Lines to verify that it's sufficient to address the performance regression on this subtest. Hm…using `ApproximateTime::now()` does partially mitigate the regression, but using ∞ and skipping the `mach_(approximate|absolute)_time` call altogether is still significantly faster. I might stick with an indefinite wait for now, with a FIXME to refactor it to speed this up. I suspect that we might be able to cache an IPC timeout once per frame, and use that instead.
Wenson Hsieh
Comment 17 2021-10-04 18:10:40 PDT Comment hidden (obsolete)
Jon Lee
Comment 18 2021-10-06 14:27:18 PDT
Wenson Hsieh
Comment 19 2021-10-11 08:52:26 PDT Comment hidden (obsolete)
Wenson Hsieh
Comment 20 2021-10-11 17:38:17 PDT Comment hidden (obsolete)
Wenson Hsieh
Comment 21 2021-10-12 09:39:00 PDT
Created attachment 440949 [details] Rebase again
Tim Horton
Comment 22 2021-10-12 12:28:51 PDT
Comment on attachment 440949 [details] Rebase again View in context: https://bugs.webkit.org/attachment.cgi?id=440949&action=review > Source/WebKit/WebProcess/GPU/graphics/RemoteRenderingBackendProxy.cpp:78 > + m_streamConnection = makeUnique<IPC::StreamClientConnection>(gpuProcessConnection.connection(), 1 << 21); What in the world is this 1 << 21
Wenson Hsieh
Comment 23 2021-10-12 12:34:07 PDT
Comment on attachment 440949 [details] Rebase again View in context: https://bugs.webkit.org/attachment.cgi?id=440949&action=review >> Source/WebKit/WebProcess/GPU/graphics/RemoteRenderingBackendProxy.cpp:78 >> + m_streamConnection = makeUnique<IPC::StreamClientConnection>(gpuProcessConnection.connection(), 1 << 21); > > What in the world is this 1 << 21 …good point! I'll pull this out into a `constexpr` variable. > Source/WebKit/WebProcess/GPU/graphics/RemoteRenderingBackendProxy.cpp:204 > + sendSyncToStream(Messages::RemoteRenderingBackend::GetDataURLForImageBuffer(mimeType, quality, preserveResolution, renderingResourceIdentifier), Messages::RemoteRenderingBackend::GetDataURLForImageBuffer::Reply(urlString), 3_s); Also going to change these timeouts back to 1_s from 3_s, as we discussed. I think I changed these when debugging `get/putImageData` and forgot to change them back.
Wenson Hsieh
Comment 24 2021-10-12 15:37:46 PDT
Created attachment 441004 [details] Rebase again + fix test failures
EWS
Comment 25 2021-10-12 22:50:43 PDT
Committed r284079 (242907@main): <https://commits.webkit.org/242907@main> All reviewed patches have been landed. Closing bug and clearing flags on attachment 441004 [details].
Note You need to log in before you can comment on or make changes to this bug.