230860 – [GPU Process] Use IPC streaming for concurrent display list processing

RESOLVED FIXED 230860

[GPU Process] Use IPC streaming for concurrent display list processing

https://bugs.webkit.org/show_bug.cgi?id=230860

Summary [GPU Process] Use IPC streaming for concurrent display list processing

Wenson Hsieh

Reported 2021-09-27 15:10:23 PDT

Attachments
Extremely hacky WIP - no resource caching (193.12 KB, patch) 2021-09-27 15:15 PDT, Wenson Hsieh	no flags	Details Formatted Diff Diff
WIP - supports cached images (324.15 KB, patch) 2021-10-01 17:17 PDT, Wenson Hsieh	no flags	Details Formatted Diff Diff
WIP - clip to mask, get\|putImageData, draw glyphs, paint media (359.45 KB, patch) 2021-10-03 16:30 PDT, Wenson Hsieh	no flags	Details Formatted Diff Diff
WIP - rebase on trunk (354.75 KB, patch) 2021-10-04 18:10 PDT, Wenson Hsieh	no flags	Details Formatted Diff Diff
Final piece (still blocked on subtasks) (98.99 KB, patch) 2021-10-11 08:52 PDT, Wenson Hsieh	no flags	Details Formatted Diff Diff
Rebase on trunk (101.37 KB, patch) 2021-10-11 17:38 PDT, Wenson Hsieh	no flags	Details Formatted Diff Diff
Rebase again (100.79 KB, patch) 2021-10-12 09:39 PDT, Wenson Hsieh	no flags	Details Formatted Diff Diff
Rebase again + fix test failures (100.28 KB, patch) 2021-10-12 15:37 PDT, Wenson Hsieh	no flags	Details Formatted Diff Diff
Show Obsolete (7) View All Add attachment proposed patch, testcase, etc.

Wenson Hsieh

Comment 1 2021-09-27 15:15:01 PDT

Created attachment 439399 [details] Extremely hacky WIP - no resource caching

Kimmo Kinnunen

Comment 2 2021-09-28 00:23:24 PDT

Comment on attachment 439399 [details] Extremely hacky WIP - no resource caching View in context: https://bugs.webkit.org/attachment.cgi?id=439399&action=review > Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:705 > + MESSAGE_CHECK(m_renderingDestinationImageBuffer, "No destination image buffer specified."); So this is a bit highlighting the pondering issue: The "Save" is a message to the context of the destination image buffer. How can we be in the middle of dispatching "save" to non-existing object? The implementation would be: void RemoteGraphicsContext::save(DisplayList::Save&& item) { item.apply(m_context); } > Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.messages.in:25 > +messages -> RemoteRenderingBackend NotRefCounted Stream { I know this is WIP and I see the value of having it this way due to existing code. However, I want to put this out there so you can ponder about it. (Also I don't mean to say you didn't already) Nomenclature: ImageBuffer == drawing target == drawing context == display list context == roughly GraphicsContext Now you have structure of: GPUConnectionToWebProcess: - Map of RemoteRenderingBackends RemoteRenderingBackend: - Map of ImageBuffers So identity reference (id) gatekeeping is done at two levels: - I want to send message to RemoteRenderingBackend by this id (CreateImageBuffer) - I want to draw to this ImageBuffer ( HandleMetaCommandChangeDestinationImageBuffer ) The first level of gatekeeping is redundant. Each individual RemoteRenderingBackend do not _really_ contribute to the algorithm. So if you think of C++, you wouldn't necessarily have API like: MyRenderingBackend::handleSave(ImageBuffer, DisplayList::Save); Instead, you would have MyImageBuffer::handleSave(DisplayList::Save) Which we already actually have: GraphicsContext::save(...); The current strangeness is visible in the API, since following can happen: HandleSave HandleMetaCommandChangeDestinationImageBuffer HandleFillRect E.g. it is obvious that the ImageBuffers carry the state that is modified through the RemoteRenderingBackend, while RemoteRenderingBackend cannot carry that. So then the question is: why isn't the command sent to the ImageBuffer? So the structure could be: GPUConnectionToWebProcess: - Map of RemoteImageBuffers / RemoteGraphicsContext / RemoteDisplayListContext GPUConnectionToWebProcess::CreateImageBuffer The "stream" signifies the sequence of ordered operations on a group of objects. So the stream actually already has implemented `HandleMetaCommandChangeDestinationImageBuffer`, it's IPC control message `SetStreamDestinationID`. > Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.messages.in:28 > + HandleSave(WebCore::DisplayList::Save item) Also this I know might be just because of how the WIP is organized, but mentioning just in case. If there's value in having the args already in DisplayList items this makes sense. E.g. if we anyway construct the lists. If we don' t construct the lists, then this wraps a message inside a message. So these could be just plain GraphicsContext args. .messages.in: HandleFillRect(FloatRect rect) proxy: void RemoteGraphicsContextProxy::fillRect(FloatRect rect) { m_streamConnection->send(Messages::HandleFillRect { rect } ); } remote: void RemoteGraphicsContext::handleFillRect(Messages::HandleFillRect&& msg) { m_context.fillRect(msg.rect) }

Kimmo Kinnunen

Comment 3 2021-09-28 00:48:02 PDT

Comment on attachment 439399 [details] Extremely hacky WIP - no resource caching View in context: https://bugs.webkit.org/attachment.cgi?id=439399&action=review > Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.messages.in:39 > + HandleSetLineCap(WebCore::DisplayList::SetLineCap item) So one example where the logic of "use DisplayList:: items so we can memcpy things around and that's that" is flawed is that the IPC layer was built inherently for the reason of security, and as such the messages need to be verified by the machinery. Currently the DisplayList items are insecure.

Wenson Hsieh

Comment 4 2021-09-28 09:07:55 PDT

(In reply to Kimmo Kinnunen from comment #3) > Comment on attachment 439399 [details] > Extremely hacky WIP - no resource caching > > View in context: > https://bugs.webkit.org/attachment.cgi?id=439399&action=review > > > Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.messages.in:39 > > + HandleSetLineCap(WebCore::DisplayList::SetLineCap item) > > So one example where the logic of "use DisplayList:: items so we can memcpy > things around and that's that" is flawed is that > the IPC layer was built inherently for the reason of security, and as such > the messages need to be verified by the > machinery. Currently the DisplayList items are insecure. Note that this patch drops the `isValid()` checks for each of the inline display list items, which we currently have in trunk. Once we add these back in, DisplayList items sent via SimpleArgumentCoder will no longer be insecure (and validation failure should result in a MESSAGE_CHECK).

Wenson Hsieh

Comment 5 2021-09-28 09:57:19 PDT

Comment on attachment 439399 [details] Extremely hacky WIP - no resource caching View in context: https://bugs.webkit.org/attachment.cgi?id=439399&action=review >> Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:705 >> + MESSAGE_CHECK(m_renderingDestinationImageBuffer, "No destination image buffer specified."); > > So this is a bit highlighting the pondering issue: > The "Save" is a message to the context of the destination image buffer. > How can we be in the middle of dispatching "save" to non-existing object? > > The implementation would be: > > void RemoteGraphicsContext::save(DisplayList::Save&& item) { > item.apply(m_context); > } This is my eventual goal (in this patch, I even included empty RemoteGraphicsContext/RemoteGraphicsContextProxy files that I added with this intention, but did not build on, since adding logic to the existing RRB/RRBP pipeline was more convenient). I plan to make both RRB and the new RemoteGraphicsContext streaming IPC message receivers that share the same StreamServerConnection; then, RemoteGraphicsContextProxy in the web process would be able to directly message RemoteGraphicsContext.

Wenson Hsieh

Comment 6 2021-09-28 10:08:29 PDT

(In reply to Wenson Hsieh from comment #4) > (In reply to Kimmo Kinnunen from comment #3) > > Comment on attachment 439399 [details] > > Extremely hacky WIP - no resource caching > > > > View in context: > > https://bugs.webkit.org/attachment.cgi?id=439399&action=review > > > > > Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.messages.in:39 > > > + HandleSetLineCap(WebCore::DisplayList::SetLineCap item) > > > > So one example where the logic of "use DisplayList:: items so we can memcpy > > things around and that's that" is flawed is that > > the IPC layer was built inherently for the reason of security, and as such > > the messages need to be verified by the > > machinery. Currently the DisplayList items are insecure. > > Note that this patch drops the `isValid()` checks for each of the inline > display list items, which we currently have in trunk. Once we add these back > in, DisplayList items sent via SimpleArgumentCoder will no longer be > insecure (and validation failure should result in a MESSAGE_CHECK). I would also add that we know for a fact that very fast, low-overhead encoding/decoding of display list items and their arguments is necessary in order to hit our performance goals, particularly on the Canvas subtests. In theory, we could use normal IPC encoding/decoding and validation instead, but this would require us to make equivalent progressions elsewhere to mitigate the performance hit.

Kimmo Kinnunen

Comment 7 2021-09-29 10:06:27 PDT

(In reply to Wenson Hsieh from comment #6) > I would also add that we know for a fact that very fast, low-overhead > encoding/decoding of display list items and their arguments is necessary in > order to hit our performance goals, particularly on the Canvas subtests. > > In theory, we could use normal IPC encoding/decoding and validation instead, > but this would require us to make equivalent progressions elsewhere to > mitigate the performance hit. This statement is just not logical. Encoding of IPC messages filled with display list items: - Put draw args to display list item - memcpy item to message - send message - receive message - memcpy item from message - extract args from display list item - call with args Encoding of IPC messages filled with draw args: - Put draw args to message - send message - receive message - extract args from display list item - call with args All other things being equal, this does not make sense. The only way it would make sense is we already had the display list items. Think of it the other way: Putting FloatRect into display list item is *superior* to putting it to plain IPC message directly. Why would we use inferior method for IPC messages? Where is the magic that encodes FloatRects faster than IPC encoder? What is the fundamental reason that magic is not the default mode of the IPC encoder?

Kimmo Kinnunen

Comment 8 2021-09-29 10:07:14 PDT

Err, it should have been: Encoding of IPC messages filled with draw args: - Put draw args to message - send message - receive message - extract args from message - call with args

Wenson Hsieh

Comment 9 2021-09-29 10:23:22 PDT

(In reply to Kimmo Kinnunen from comment #7) > (In reply to Wenson Hsieh from comment #6) > > > I would also add that we know for a fact that very fast, low-overhead > > encoding/decoding of display list items and their arguments is necessary in > > order to hit our performance goals, particularly on the Canvas subtests. > > > > In theory, we could use normal IPC encoding/decoding and validation instead, > > but this would require us to make equivalent progressions elsewhere to > > mitigate the performance hit. > > This statement is just not logical. > Encoding of IPC messages filled with display list items: > - Put draw args to display list item > - memcpy item to message > - send message > - receive message > - memcpy item from message > - extract args from display list item > - call with args > > Encoding of IPC messages filled with draw args: > - Put draw args to message > - send message > - receive message > - extract args from display list item > - call with args This is not what I'm comparing against. "normal IPC encoding/decoding" was referring to the established IPC argument encoding technique of allocating a new chunk of memory and encoding the arguments (one fixed-length piece of data at a time), resizing the buffer and copying out the contents of buffer as needed. I would hope that this: > Encoding of IPC messages filled with display list items: ..and this: > Encoding of IPC messages filled with draw args: ..would yield about the same performance in IPC streams, though the latter will make multiple calls to `memcpy` whereas the former would just make a single call. > > > All other things being equal, this does not make sense. The only way it > would make sense is we already had the display list items. > > Think of it the other way: > Putting FloatRect into display list item is *superior* to putting it to > plain IPC message directly. > Why would we use inferior method for IPC messages? > Where is the magic that encodes FloatRects faster than IPC encoder? I'm not sure I totally understand what you mean, but FloatRect encoding is optimized to use SimpleArgumentCoder, which performs a single `memcpy`. > What is the fundamental reason that magic is not the default mode of the IPC > encoder? (..following from above, SimpleArgumentCoder is not the default mode if the IPC encoder because we can't assume objects consist only of memcpy-able data with no pointers by default).

Kimmo Kinnunen

Comment 10 2021-09-29 10:30:08 PDT

Yeah, you're probably right that the dl items will be in practice faster due to the nature of how the IPC encoders iterate field by field. Maybe someday we could just memcpy the messages that have trivially copyable fields.

Wenson Hsieh

Comment 11 2021-10-01 17:17:13 PDT

Created attachment 439939 [details] WIP - supports cached images

Wenson Hsieh

Comment 12 2021-10-03 16:30:59 PDT

Created attachment 440023 [details] WIP - clip to mask, get|putImageData, draw glyphs, paint media

Kimmo Kinnunen

Comment 13 2021-10-04 11:06:26 PDT

Comment on attachment 440023 [details] WIP - clip to mask, get|putImageData, draw glyphs, paint media View in context: https://bugs.webkit.org/attachment.cgi?id=440023&action=review Such an amazingly clean patch for a WIP. Some small comment, did not go through all of it yet. > Source/WebKit/GPUProcess/graphics/RemoteImageBuffer.h:61 > + m_remoteRenderingBackend.streamConnection().startReceivingMessages(m_context.get(), Messages::RemoteGraphicsContext::messageReceiverName(), renderingResourceIdentifier.toUInt64()); You need the startReceivingIPC - stopReceivingIPC pattern, since you will start receiving vfunc calls immediately after this method call, and the vtable ptr is not yet set up. (or initialize-stopReceivingIPC, cannot remember which). > Source/WebKit/GPUProcess/graphics/RemoteImageBuffer.h:67 > + m_remoteRenderingBackend.streamConnection().stopReceivingMessages(Messages::RemoteGraphicsContext::messageReceiverName(), m_renderingResourceIdentifier.toUInt64()); The object lifetime has already ended, and you might still receive messages until this method returns. So the initialize - stopReceivingIPC is needed. > Source/WebKit/Platform/IPC/Timeout.h:37 > + : m_deadline(std::isinf(timeDelta) ? MonotonicTime::infinity() : MonotonicTime::now() + timeDelta) Great addition! Overall, I think passing infinity generally should be a bug, so in production something else is needed for the call sites? Either that approximate time or then maybe we could compute the deadline upon first would-block call? E.g. explain in timeout that it's not strictly accurate timeout, rather it would start approximately during the first blocking call..

Wenson Hsieh

Comment 14 2021-10-04 11:48:01 PDT

Comment on attachment 440023 [details] WIP - clip to mask, get|putImageData, draw glyphs, paint media View in context: https://bugs.webkit.org/attachment.cgi?id=440023&action=review >> Source/WebKit/GPUProcess/graphics/RemoteImageBuffer.h:61 >> + m_remoteRenderingBackend.streamConnection().startReceivingMessages(m_context.get(), Messages::RemoteGraphicsContext::messageReceiverName(), renderingResourceIdentifier.toUInt64()); > > You need the startReceivingIPC - stopReceivingIPC pattern, since you will start receiving vfunc calls immediately after this method call, and the vtable ptr is not yet set up. > (or initialize-stopReceivingIPC, cannot remember which). Good catch! Definitely going to fix this for my next WIP iteration. >> Source/WebKit/GPUProcess/graphics/RemoteImageBuffer.h:67 >> + m_remoteRenderingBackend.streamConnection().stopReceivingMessages(Messages::RemoteGraphicsContext::messageReceiverName(), m_renderingResourceIdentifier.toUInt64()); > > The object lifetime has already ended, and you might still receive messages until this method returns. So the initialize - stopReceivingIPC is needed. 👍🏻 >> Source/WebKit/Platform/IPC/Timeout.h:37 >> + : m_deadline(std::isinf(timeDelta) ? MonotonicTime::infinity() : MonotonicTime::now() + timeDelta) > > Great addition! > > Overall, I think passing infinity generally should be a bug, so in production something else is needed for the call sites? > Either that approximate time or then maybe we could compute the deadline upon first would-block call? > E.g. explain in timeout that it's not strictly accurate timeout, rather it would start approximately during the first blocking call.. Ah, yes — using ∞ and adding the optimization here was just meant to be temporary, until I could adopt a faster, coarse-granularity alternative to `MonotonicTime::now()`. Yusuke recently added `WTF::ApproximateTime` in r283161, so my next step was to adopt that, change the ∞ timeouts in this patch to something reasonable, and finally re-run Canvas Lines to verify that it's sufficient to address the performance regression on this subtest.

Radar WebKit Bug Importer

Comment 15 2021-10-04 15:11:21 PDT Comment hidden (obsolete)

<rdar://problem/83857897>

Wenson Hsieh

Comment 16 2021-10-04 16:09:05 PDT

Comment on attachment 440023 [details] WIP - clip to mask, get|putImageData, draw glyphs, paint media View in context: https://bugs.webkit.org/attachment.cgi?id=440023&action=review >>> Source/WebKit/Platform/IPC/Timeout.h:37 >>> + : m_deadline(std::isinf(timeDelta) ? MonotonicTime::infinity() : MonotonicTime::now() + timeDelta) >> >> Great addition! >> >> Overall, I think passing infinity generally should be a bug, so in production something else is needed for the call sites? >> Either that approximate time or then maybe we could compute the deadline upon first would-block call? >> E.g. explain in timeout that it's not strictly accurate timeout, rather it would start approximately during the first blocking call.. > > Ah, yes — using ∞ and adding the optimization here was just meant to be temporary, until I could adopt a faster, coarse-granularity alternative to `MonotonicTime::now()`. > > Yusuke recently added `WTF::ApproximateTime` in r283161, so my next step was to adopt that, change the ∞ timeouts in this patch to something reasonable, and finally re-run Canvas Lines to verify that it's sufficient to address the performance regression on this subtest. Hm…using `ApproximateTime::now()` does partially mitigate the regression, but using ∞ and skipping the `mach_(approximate|absolute)_time` call altogether is still significantly faster. I might stick with an indefinite wait for now, with a FIXME to refactor it to speed this up. I suspect that we might be able to cache an IPC timeout once per frame, and use that instead.

Wenson Hsieh

Comment 17 2021-10-04 18:10:40 PDT Comment hidden (obsolete)

Created attachment 440135 [details] WIP - rebase on trunk

Jon Lee

Comment 18 2021-10-06 14:27:18 PDT

rdar://83438038

Wenson Hsieh

Comment 19 2021-10-11 08:52:26 PDT Comment hidden (obsolete)

Created attachment 440798 [details] Final piece (still blocked on subtasks)

Wenson Hsieh

Comment 20 2021-10-11 17:38:17 PDT Comment hidden (obsolete)

Created attachment 440866 [details] Rebase on trunk

Wenson Hsieh

Comment 21 2021-10-12 09:39:00 PDT

Created attachment 440949 [details] Rebase again

Tim Horton

Comment 22 2021-10-12 12:28:51 PDT

Comment on attachment 440949 [details] Rebase again View in context: https://bugs.webkit.org/attachment.cgi?id=440949&action=review > Source/WebKit/WebProcess/GPU/graphics/RemoteRenderingBackendProxy.cpp:78 > + m_streamConnection = makeUnique<IPC::StreamClientConnection>(gpuProcessConnection.connection(), 1 << 21); What in the world is this 1 << 21

Wenson Hsieh

Comment 23 2021-10-12 12:34:07 PDT

Comment on attachment 440949 [details] Rebase again View in context: https://bugs.webkit.org/attachment.cgi?id=440949&action=review >> Source/WebKit/WebProcess/GPU/graphics/RemoteRenderingBackendProxy.cpp:78 >> + m_streamConnection = makeUnique<IPC::StreamClientConnection>(gpuProcessConnection.connection(), 1 << 21); > > What in the world is this 1 << 21 …good point! I'll pull this out into a `constexpr` variable. > Source/WebKit/WebProcess/GPU/graphics/RemoteRenderingBackendProxy.cpp:204 > + sendSyncToStream(Messages::RemoteRenderingBackend::GetDataURLForImageBuffer(mimeType, quality, preserveResolution, renderingResourceIdentifier), Messages::RemoteRenderingBackend::GetDataURLForImageBuffer::Reply(urlString), 3_s); Also going to change these timeouts back to 1_s from 3_s, as we discussed. I think I changed these when debugging `get/putImageData` and forgot to change them back.

Wenson Hsieh

Comment 24 2021-10-12 15:37:46 PDT

Created attachment 441004 [details] Rebase again + fix test failures

EWS

Comment 25 2021-10-12 22:50:43 PDT

Committed r284079 (242907@main): <https://commits.webkit.org/242907@main> All reviewed patches have been landed. Closing bug and clearing flags on attachment 441004 [details].

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version WebKit Nightly Build

Hardware Unspecified

OS Unspecified

Product WebKit

Component WebKit2

Assignee

Wenson Hsieh

Reported

2021-09-27 15:10 PDT

Modified

2021-10-13 15:06 PDT History

CC List

7 users Show

URL

Keywords InRadar

Depends on

230714 231263 231296 231301 231304 231305 231319 231326 231335 231447 231483 231484 231532 231584

Blocks

231596

Dependencies

tree graph