Bug 231681 - REGRESSION (r284079): fast/canvas/gradient-with-clip.html and fast/canvas/gradient-text-with-shadow.html are flaky failures
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Canvas
Version: WebKit Nightly Build
Hardware: Unspecified
OS: Unspecified
Priority: P2
Severity: Normal
Assignee: Wenson Hsieh
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2021-10-13 09:39 PDT by ayumi_kojima
Modified: 2022-03-25 10:47 PDT
CC: 11 users

See Also:


Attachments
Patch (11.17 KB, patch)
2021-10-17 15:13 PDT, Wenson Hsieh, no flags
Initial approach (11.96 KB, patch)
2021-10-17 15:24 PDT, Wenson Hsieh, no flags
Workaround to keep tests passing (13.09 KB, patch)
2021-10-18 08:53 PDT, Wenson Hsieh, no flags
Async approach, v2 (15.04 KB, patch)
2021-10-19 10:38 PDT, Wenson Hsieh, no flags
Async approach, v2 (14.34 KB, patch)
2021-10-19 10:45 PDT, Wenson Hsieh, no flags

Description ayumi_kojima 2021-10-13 09:39:29 PDT
fast/canvas/gradient-text-with-shadow.html
fast/canvas/gradient-with-clip.html

are flaky failures on iOS Release and macOS Release WK2.

History: https://results.webkit.org/?suite=layout-tests&suite=layout-tests&test=fast%2Fcanvas%2Fgradient-text-with-shadow.html&test=fast%2Fcanvas%2Fgradient-with-clip.html

Result page: https://build.webkit.org/results/Apple-iOS-15-Simulator-Release-WK2-Tests/r284084%20(149)/results.html#

Diff:

--- /Volumes/Data/worker/ios-simulator-15-release-tests-wk2/build/layout-test-results/fast/canvas/gradient-text-with-shadow-expected.txt
+++ /Volumes/Data/worker/ios-simulator-15-release-tests-wk2/build/layout-test-results/fast/canvas/gradient-text-with-shadow-actual.txt
@@ -1 +1 @@
-PASSED
+FAILED

Diff:

--- /Volumes/Data/worker/ios-simulator-15-release-tests-wk2/build/layout-test-results/fast/canvas/gradient-with-clip-expected.txt
+++ /Volumes/Data/worker/ios-simulator-15-release-tests-wk2/build/layout-test-results/fast/canvas/gradient-with-clip-actual.txt
@@ -1,10 +1,11 @@
+CONSOLE MESSAGE: Unable to get image data from canvas. Requested size was 1 x 1
+CONSOLE MESSAGE: InvalidStateError: The object is in an invalid state.
 Test for canvas regression where gradient clips were not cleared https://bugs.webkit.org/show_bug.cgi?id=21498
 
 On success, you will see a series of "PASS" messages, followed by "TEST COMPLETE".
 
 
-PASS pixel 20, 20 is [0, 128, 0, 255]
-PASS successfullyParsed is true
+FAIL successfullyParsed should be true (of type boolean). Was undefined (of type undefined).
 
 TEST COMPLETE
Comment 1 Radar WebKit Bug Importer 2021-10-13 09:40:44 PDT
<rdar://problem/84202478>
Comment 2 ayumi_kojima 2021-10-13 10:05:15 PDT
Marked test expectations https://trac.webkit.org/changeset/284100/webkit
Comment 3 Alexey Proskuryakov 2021-10-14 16:54:55 PDT
Wenson, could this have been caused by r284079? The first failure is r284081, and these are pretty frequent.
Comment 4 Wenson Hsieh 2021-10-14 16:56:10 PDT
(In reply to Alexey Proskuryakov from comment #3)
> Wenson, could this have been caused by r284079? The first failure is
> r284081, and these are pretty frequent.

Potentially — let me take a look.
Comment 5 Wenson Hsieh 2021-10-14 17:53:12 PDT
(In reply to Wenson Hsieh from comment #4)
> (In reply to Alexey Proskuryakov from comment #3)
> > Wenson, could this have been caused by r284079? The first failure is
> > r284081, and these are pretty frequent.
> 
> Potentially — let me take a look.

I have been unable to reproduce either test failure locally by running both for 500 iterations each. That said, I am on macOS Monterey, and it seems like the bots are on older macOS releases. It's also possible that this is dependent on test order (although I wasn't able to reproduce by running all the tests in fast/canvas either).

I'll try iOS simulator next; if I still can't reproduce, I'll try to repro in EWS by uploading patches.
Comment 6 Wenson Hsieh 2021-10-15 08:18:08 PDT
Okay, I can occasionally reproduce in iOS simulator by running `fast/canvas` (at least, the `fast/canvas/gradient-with-clip.html` failure).

Taking a look...
Comment 7 Wenson Hsieh 2021-10-15 13:26:14 PDT
(In reply to Wenson Hsieh from comment #6)
> Okay, I can occasionally reproduce in iOS simulator by running `fast/canvas`
> (at least, the `fast/canvas/gradient-with-clip.html` failure).
> 
> Taking a look...

This seems to be a relatively consistent method of reproducing the flaky failure:

run-webkit-tests --ios-simulator --exit-after-n-failures 1 --iterations 100 fast/canvas/getPutImageDataPairTest.html fast/canvas/gradient-text-with-shadow.html fast/canvas/gradient-with-clip.html
Comment 8 Wenson Hsieh 2021-10-15 15:13:56 PDT
Some logging in the application (WKTR), GPU, and web processes demonstrates that when this test fails, the smoking gun is the fact that we fail to receive the GetPixelBuffer stream message. Occasionally, the GPU Process seems to drop IPC stream messages in the middle of processing...

```
(UI) Running /Users/whsieh/work/OpenSource/LayoutTests/fast/canvas/gradient-with-clip.html
...
(WEB) Sending: RemoteRenderingBackend_FinalizeRenderingUpdate
                                                                  (GPU) Received: RemoteRenderingBackend_FinalizeRenderingUpdate
(WEB) Sending: RemoteRenderingBackend_CreateImageBuffer
(WEB) Sending: RemoteDisplayListRecorder_Save
(WEB) Sending: RemoteDisplayListRecorder_Save
(WEB) Sending: RemoteDisplayListRecorder_SetState
(WEB) Sending: RemoteDisplayListRecorder_FillRect
(WEB) Sending: RemoteDisplayListRecorder_Restore
(WEB) Sending: RemoteDisplayListRecorder_GetPixelBuffer
(WEB) Sending: RemoteDisplayListRecorder_FlushContext
                                                                  (GPU) Received: RemoteRenderingBackend_CreateImageBuffer
                                                                  (GPU) Received: RemoteDisplayListRecorder_Save
                                                                  (GPU) Received: RemoteDisplayListRecorder_Save
                                                                  (GPU) Received: RemoteDisplayListRecorder_SetState
                                                                  (GPU) Received: RemoteDisplayListRecorder_FillRect
                                                                  (GPU) Received: RemoteDisplayListRecorder_Restore
                                                                  (GPU) Received: RemoteDisplayListRecorder_GetPixelBuffer
(WEB) Sending: RemoteRenderingBackend_CreateImageBuffer
                                                                  (GPU) Received: RemoteDisplayListRecorder_FlushContext
(WEB) Sending: RemoteDisplayListRecorder_Save
(WEB) Sending: RemoteDisplayListRecorder_SetState
(WEB) Sending: RemoteDisplayListRecorder_FillPath
(WEB) Sending: RemoteDisplayListRecorder_SetInlineFillColor
(WEB) Sending: RemoteDisplayListRecorder_FillRect
(WEB) Sending: RemoteDisplayListRecorder_GetPixelBuffer
(WEB) Sending: RemoteDisplayListRecorder_FlushContext
                                                                  (GPU) Received: RemoteRenderingBackend_CreateImageBuffer
                                                                  (GPU) Received: RemoteDisplayListRecorder_Save
(WEB) FAILED TO WAIT FOR GetPixelBuffer!
(WEB) DESTROYING GET PIXEL BUFFER SHARED MEMORY.
```
Comment 9 Wenson Hsieh 2021-10-15 17:06:56 PDT
I think I'm beginning to get some grasp of what's happening:

```
(UI) Running /Users/whsieh/work/OpenSource/LayoutTests/fast/canvas/gradient-with-clip.html
(WEB) Waking up the GPU process.
(WEB) Sent in stream: RemoteRenderingBackend_FinalizeRenderingUpdate @[2096960]
                                                                                (GPU) Received: RemoteRenderingBackend_FinalizeRenderingUpdate
                                                                                (GPU) Waiting for more messages...
(WEB) Waking up the GPU process.
(WEB) Sent in stream: RemoteRenderingBackend_CreateImageBuffer @[2096976]
(WEB) Sent in stream: RemoteDisplayListRecorder_Save @[2097064]
(WEB) Sent in stream: RemoteDisplayListRecorder_Save @[2097074]
(WEB) Sent in stream: RemoteDisplayListRecorder_SetState @[2097084]
(WEB) Resetting stream buffer capacity (size=22 bytes, capacity=14 bytes)
(WEB) Sent out of stream: RemoteDisplayListRecorder_FillRect
(WEB) Sent in stream: RemoteDisplayListRecorder_Restore @[0]
(WEB) Sent in stream: RemoteDisplayListRecorder_GetPixelBuffer @[10]
(WEB) Sent in stream: RemoteDisplayListRecorder_FlushContext @[66]
                                                                                (GPU) Received: RemoteRenderingBackend_CreateImageBuffer
                                                                                (GPU) Received: RemoteDisplayListRecorder_Save
                                                                                (GPU) Received: RemoteDisplayListRecorder_Save
                                                                                (GPU) Received: RemoteDisplayListRecorder_SetState
                                                                                (GPU) No out of stream messages to process?
                                                                                (GPU) Waiting for more messages...
                                                                                (GPU) No out of stream messages to process?
                                                                                (GPU) Waiting for more messages...
(WEB) FAILED TO WAIT FOR GetPixelBuffer!
```

...when we exhaust the available (2 MB) capacity in the stream buffer, the last message that would've caused us to run off the end of the buffer is instead dispatched to the GPU Process as an out-of-stream message, and a `ProcessOutOfStreamMessage` marker is left at the end of the buffer. The write offset then returns to the start, and subsequent IPC stream messages are sent in the stream, starting at offset 0.

For some reason, we never wake up in the GPU Process to process the last stream message that was sent before exhausting capacity. Investigating further to discern why that is the case.
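The wrap-and-divert behavior described above can be sketched as a toy writer; all names here are illustrative, not WebKit's actual stream connection API:

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Toy model of the behavior described above: a message that would run off the
// end of the stream buffer is diverted out-of-stream, and the write offset
// wraps back to the start for subsequent in-stream messages.
struct StreamWriter {
    size_t capacity = 0;
    size_t writeOffset = 0;
    std::vector<std::string> outOfStreamMessages; // messages that didn't fit

    // Returns the in-stream offset the message was written at, or nullopt if
    // the message was diverted out-of-stream (which wraps the offset to 0).
    std::optional<size_t> send(const std::string& name, size_t messageSize)
    {
        if (writeOffset + messageSize <= capacity) {
            size_t offset = writeOffset;
            writeOffset += messageSize;
            return offset;
        }
        outOfStreamMessages.push_back(name); // sent out of stream instead
        writeOffset = 0;                     // subsequent messages start at offset 0
        return std::nullopt;
    }
};
```

In the log above, `FillRect` is the diverted message and `Restore` then lands back at offset 0; the failure is that the GPU Process side never wakes up to consume the diverted message.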
Comment 10 Wenson Hsieh 2021-10-16 12:52:55 PDT
I understand what's happening now:

```
(UI) Running /Users/whsieh/work/OpenSource/LayoutTests/fast/canvas/gradient-with-clip.html
(WEB) Sent in stream: RemoteRenderingBackend_FinalizeRenderingUpdate @[2096800]
                                                                                (GPU) Dispatching in stream: RemoteRenderingBackend_FinalizeRenderingUpdate
(WEB) Sent in stream: RemoteRenderingBackend_CreateImageBuffer @[2096816]
                                                                                (GPU) Dispatching in stream: RemoteRenderingBackend_CreateImageBuffer
(WEB) Sent in stream: RemoteDisplayListRecorder_Save @[2096904]
(WEB) Sent in stream: RemoteDisplayListRecorder_Save @[2096914]
(WEB) Sent in stream: RemoteDisplayListRecorder_SetState @[2096924]
(WEB) Sent in stream: RemoteDisplayListRecorder_FillRect @[2096946]
(WEB) Sent in stream: RemoteDisplayListRecorder_Restore @[2096968]
(WEB) Sent in stream: RemoteDisplayListRecorder_GetPixelBuffer @[2096978]
(WEB) Sent in stream: RemoteDisplayListRecorder_FlushContext @[2097034]
                                                                                (GPU) Start receiving messages for RemoteDisplayListRecorder(#20899)
                                                                                (GPU) Dispatching in stream: RemoteDisplayListRecorder_Save
                                                                                (GPU) Dispatching in stream: RemoteDisplayListRecorder_Save
                                                                                (GPU) Dispatching in stream: RemoteDisplayListRecorder_SetState
                                                                                (GPU) Dispatching in stream: RemoteDisplayListRecorder_FillRect
                                                                                (GPU) Dispatching in stream: RemoteDisplayListRecorder_Restore
                                                                                (GPU) Dispatching in stream: RemoteDisplayListRecorder_GetPixelBuffer
                                                                                (GPU) Dispatching in stream: RemoteDisplayListRecorder_FlushContext
(WEB) Sent out of stream: RemoteRenderingBackend_CreateImageBuffer
(WEB) Sent in stream: RemoteDisplayListRecorder_Save @[2097088]
                                                                                (GPU) Dispatching out-of-stream: RemoteRenderingBackend_CreateImageBuffer
(WEB) Sent out of stream: RemoteDisplayListRecorder_SetState
                                                                                (GPU) Processing incoming message: RemoteDisplayListRecorder_SetState for RemoteDisplayListRecorder(#20901) 0x0
(WEB) Sent out of stream: RemoteDisplayListRecorder_FillPath
(WEB) Sent in stream: RemoteDisplayListRecorder_SetInlineFillColor @[0]
                                                                                (GPU) Processing incoming message: RemoteDisplayListRecorder_FillPath for RemoteDisplayListRecorder(#20901) 0x0
(WEB) Sent in stream: RemoteDisplayListRecorder_FillRect @[10]
(WEB) Sent in stream: RemoteDisplayListRecorder_GetPixelBuffer @[32]
(WEB) Sent in stream: RemoteDisplayListRecorder_FlushContext @[90]
                                                                                (GPU) Start receiving messages for RemoteDisplayListRecorder(#20901)
                                                                                (GPU) Dispatching in stream: RemoteDisplayListRecorder_Save
(WEB) FAILED TO WAIT FOR GetPixelBuffer!
```

When we are about to run off the end of the stream buffer and (as a result) send out-of-stream messages, it's possible for these OOS IPC messages to be received on the IPC::Connection via the IPC thread, right before we add the RemoteDisplayListRecorder as an IPC receive queue in `StreamServerConnection::startReceivingMessages` when creating a new remote image buffer:

```
void Connection::processIncomingMessage(std::unique_ptr<Decoder> message)
{
    …

    Locker incomingMessagesLocker { m_incomingMessagesLock };
    if (auto* receiveQueue = m_receiveQueues.get(*message)) {
        receiveQueue->enqueueMessage(*this, WTFMove(message));
        return;
    }
```

…as a result, `receiveQueue` above ends up being null, and since there is no mechanism for queueing IPC messages for a receive queue that will be added in the future, we end up missing the OOS message altogether and stop processing stream messages.

In order to address this, we need to somehow fix the race condition between:
1. Adding RemoteDisplayListRecorder as an IPC receive queue, and
2. Receiving out of stream messages on the IPC thread
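A minimal single-threaded model of this race's effect (hypothetical names, not the actual `Connection` code): a stream message that arrives before `startReceivingMessages` has registered its receive queue finds nothing in the map, and since nothing re-dispatches it later, it is effectively lost:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Models the failure mode: messages are routed by destination ID, and a
// message whose receive queue hasn't been registered yet is dropped, since
// registration doesn't re-scan previously received messages.
struct ReceiveQueueMap {
    std::map<uint64_t, std::vector<std::string>> queues; // registered receivers
    std::vector<std::string> lost;                       // what fell through

    void processIncomingMessage(uint64_t destinationID, std::string message)
    {
        auto it = queues.find(destinationID);
        if (it != queues.end()) {
            it->second.push_back(std::move(message));
            return;
        }
        // No receive queue yet: in the real bug, the stream receiver then
        // waits forever for a message it will never see.
        lost.push_back(std::move(message));
    }

    void startReceivingMessages(uint64_t destinationID)
    {
        queues.emplace(destinationID, std::vector<std::string>());
    }
};
```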
Comment 11 Wenson Hsieh 2021-10-17 15:13:22 PDT
Created attachment 441552 [details]
Patch
Comment 12 Wenson Hsieh 2021-10-17 15:24:59 PDT
Created attachment 441553 [details]
Initial approach
Comment 13 Kimmo Kinnunen 2021-10-18 01:18:16 PDT
Comment on attachment 441553 [details]
Initial approach

View in context: https://bugs.webkit.org/attachment.cgi?id=441553&action=review

> Source/WebKit/ChangeLog:35
> +        receiver on `m_incomingMessages` instead of falling through the early return. This allows us to enqueue the

The message comes as the first message in m_incomingMessages.
If it doesn't have a receiver, this puts the message back as the last message in m_incomingMessages.
As far as I understand, this reorders messages, which is not a desirable characteristic of an IPC system.

If you consider a message scenario like 
 Save()
 Restore()

Suppose both get sent as OOS. 
Suppose following sequence:
1) Process Save. Notice no receive queue. Requeue Save as the last message.
2) RRB installs receive queue for the new RemoteImageBuffer.
3) Process Restore. Notice the receive queue. Send it for processing.
4) Process Save. Notice the receive queue. Send it for processing.

So this makes Restore happen before Save.
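This reordering can be replayed with a toy simulation of the requeue-at-tail behavior (hypothetical code, not the actual patch):

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Replays the four steps above: Save is dequeued while no receive queue
// exists and is requeued at the tail, so once the queue is installed the
// receiver sees Restore first. Returns the order the receiver observes.
std::vector<std::string> replayRequeueAtTail()
{
    std::deque<std::string> incomingMessages { "Save", "Restore" };
    std::vector<std::string> received;
    bool receiveQueueInstalled = false;

    // 1) Process Save: no receive queue, so requeue it as the last message.
    std::string first = incomingMessages.front();
    incomingMessages.pop_front();
    if (!receiveQueueInstalled)
        incomingMessages.push_back(first);

    // 2) RRB installs the receive queue for the new RemoteImageBuffer.
    receiveQueueInstalled = true;

    // 3) and 4) Remaining messages now reach the receiver in queue order.
    while (!incomingMessages.empty()) {
        received.push_back(incomingMessages.front());
        incomingMessages.pop_front();
    }
    return received;
}
```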
Comment 14 Kimmo Kinnunen 2021-10-18 02:00:25 PDT
Great debugging!

The root cause of this issue is that the install trigger is on one stream, whereas the payload messages are in another.

So the design of the stream system is spelled out in one of the headers, namely
- Messages in one stream for one destination are guaranteed to be processed in order.
- There are no guarantees between messages across different streams or different destinations.

This didn't explicitly change even though you changed it so that the underlying stream buffer can be used to deliver many different streams.

The unwritten requirement is that the install trigger be a normal, main-thread-processed message. I forgot this completely when you mentioned that both IPC receivers would be streams.

This is an implementation flaw of the stream system that was inherited from the way the IPC system works.

For this particular problem, one of the following could be a better solution
a) Make RRB a normal IPC receiver in main thread (how does this relate to the threading model for the other messages? Maybe not feasible?)

b) Make StreamConnectionWorkQueue thread safe, and register RDLR in the RRB+RDLR thread. Connection should already be thread safe for this use-case, as well as StreamServerConnection.

c) Add concept which allows for StreamServerConnection to register "all RDLR messages go to this receive queue". There's already support for this in Connection. For the StreamServerConnection implementation it's a bit verbose and adds a mode, so it'd be elaborate.

d) Make sure OOS messages are not sent until the sender knows the receiver is ready, e.g. the sender semaphore is received (or some other signal). Since in the current system this is a message-interface-specific operation, it would need to be a virtual function that waits on the particular message response; this seems quite elaborate too.

--

Long term, I think a workable solution would be to define that:
1) message receivers should receive the messages in send order (currently only guaranteed for stream messages)
2) message order across receivers is not preserved
3) explicit messaging synchronisation primitives can be used to express cross-receiver order

Then the implementation would be to have one explicit queue per receiver with a hard cap on the queued messages, where exceeding the cap would be cause for termination for untrusted connections. The queue should probably be a pair (mach message port, vector).
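The per-receiver queue with a hard cap might look roughly like this (illustrative names; cap value arbitrary; the mach-message-port half of the pair is elided):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <map>
#include <string>

// Sketch of the proposed long-term shape: one explicit queue per receiver,
// preserving per-receiver send order, with a hard cap whose violation is
// grounds for terminating an untrusted connection.
class PerReceiverQueues {
public:
    explicit PerReceiverQueues(size_t hardCap)
        : m_hardCap(hardCap) { }

    // Returns false when the cap is exceeded, signaling that the connection
    // should be terminated rather than allowed to queue unboundedly.
    bool enqueue(uint64_t receiverID, std::string message)
    {
        auto& queue = m_queues[receiverID];
        if (queue.size() >= m_hardCap)
            return false;
        queue.push_back(std::move(message));
        return true;
    }

    size_t queuedCount(uint64_t receiverID) const
    {
        auto it = m_queues.find(receiverID);
        return it == m_queues.end() ? 0 : it->second.size();
    }

private:
    size_t m_hardCap;
    std::map<uint64_t, std::deque<std::string>> m_queues;
};
```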
Comment 15 Kimmo Kinnunen 2021-10-18 02:11:42 PDT
> b) Make StreamConnectionWorkQueue thread safe, and register RDLR in the RRB+RDLR thread.

Scratch this, this doesn't fix anything..

e) createImageBuffer could be synchronous (for the time being, to unblock?)


Maybe e), then c), then undo e) sounds best for short-term fixing from my perspective.
Comment 16 Wenson Hsieh 2021-10-18 08:04:02 PDT
(In reply to Kimmo Kinnunen from comment #13)
> Comment on attachment 441553 [details]
> Initial approach
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=441553&action=review
> 
> > Source/WebKit/ChangeLog:35
> > +        receiver on `m_incomingMessages` instead of falling through the early return. This allows us to enqueue the
> 
> The message comes as the first message in m_incomingMessages.
> If it doesn't have a receiver, this puts the message back as the last
> message in m_incomingMessages.
> As far as I understand, this reorders messages, which is not a desirable
> characteristic of a IPC system.
> 
> If you consider message scenario like 
>  Save()
>  Restore()
> 
> Suppose both get sent as OOS. 
> Suppose following sequence:
> 1) Process Save. Notice no receive queue. Requeue Save as the last message.
> 2) RRB installs receive queue for the new RemoteImageBuffer.
> 3) Process Restore. Notice the receive queue. Send it for processing.
> 4) Process Save. Notice the receive queue. Send it for processing.
> 
> So this makes Restore happen before Save.

So in this scenario, when (2) happens, `enqueueMatchingMessagesToMessageReceiveQueue` will move the queued Save in `m_incomingMessages` to the new receive queue, so that we end up processing the Save before the Restore and don't end up with out-of-order messages.
Comment 17 Wenson Hsieh 2021-10-18 08:08:19 PDT
(In reply to Kimmo Kinnunen from comment #15)
> > b) Make StreamConnectionWorkQueue thread safe, and register RDLR in the RRB+RDLR thread.
> 
> Scratch this, this doesn't fix anything..
> 
> e) createImageBuffer could be synchronous (for the time being, to unblock?)
> 
> 
> Maybe e), c), undo e) sound the best for short-term fixing from my
> perspective.

Thank you for your thoughts!

I think (e) seems...maybe okay as a temporary workaround, but I'm concerned about the potential performance impact on canvas images. I agree that (c) sounds like a good plan, so I'll work towards that!
Comment 18 Wenson Hsieh 2021-10-18 08:53:46 PDT
Created attachment 441608 [details]
Workaround to keep tests passing
Comment 19 Wenson Hsieh 2021-10-18 09:47:14 PDT
Comment on attachment 441553 [details]
Initial approach

(Re-marking this patch for r?, since I don't _think_ this should result in out of order messages within the same IPC stream...)
Comment 20 Wenson Hsieh 2021-10-19 09:54:55 PDT
Comment on attachment 441553 [details]
Initial approach

Kimmo pointed out that there is still the possibility of reordered messages in the following scenario:

1. Receive stream IPC message (A) for an unknown receiver
2. Receive normal IPC message (B)
3. Receive stream IPC message (C) for an unknown receiver

It's possible that (2) causes us to try and dispatch an incoming message, which (in this patch) will take (A) and append it to the end. Later, when the receive queue is actually added, we'll end up moving messages (C) and (A) into the new receive queue, in that order.

I think we could adjust the patch to be robust against this possibility by adding receiver-less stream messages to a separate queue that is not `m_incomingMessages`, and taking messages from that separate queue underneath `enqueueMatchingMessagesToMessageReceiveQueue`.
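That adjustment might be sketched like this (hypothetical names, not the landed patch): receiver-less stream messages are parked in their own queue, and adding a receive queue drains them in arrival order under the same lock, so neither normal-message dispatch nor later stream messages can overtake them:

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <map>
#include <mutex>
#include <string>
#include <vector>

struct Message {
    uint64_t destinationID;
    std::string name;
};

// Sketch of the separate-queue approach: messages whose receive queue isn't
// registered yet go to pendingForFutureQueues (not m_incomingMessages), and
// addMessageReceiveQueue drains that queue in arrival order under the lock.
struct ConnectionModel {
    std::mutex lock;
    std::map<uint64_t, std::vector<Message>> receiveQueues;
    std::deque<Message> pendingForFutureQueues;

    void processIncomingMessage(Message message)
    {
        std::lock_guard<std::mutex> locker(lock);
        auto it = receiveQueues.find(message.destinationID);
        if (it != receiveQueues.end()) {
            it->second.push_back(std::move(message));
            return;
        }
        // Park the message instead of dropping it or mixing it back into the
        // main incoming queue (which is what caused the reordering hazard).
        pendingForFutureQueues.push_back(std::move(message));
    }

    void addMessageReceiveQueue(uint64_t destinationID)
    {
        std::lock_guard<std::mutex> locker(lock);
        auto& queue = receiveQueues[destinationID];
        for (auto it = pendingForFutureQueues.begin(); it != pendingForFutureQueues.end();) {
            if (it->destinationID == destinationID) {
                queue.push_back(std::move(*it));
                it = pendingForFutureQueues.erase(it);
            } else
                ++it;
        }
    }
};
```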
Comment 21 Wenson Hsieh 2021-10-19 10:38:32 PDT Comment hidden (obsolete)
Comment 22 Wenson Hsieh 2021-10-19 10:45:53 PDT
Created attachment 441755 [details]
Async approach, v2
Comment 23 Kimmo Kinnunen 2021-10-19 11:08:18 PDT
Comment on attachment 441755 [details]
Async approach, v2

View in context: https://bugs.webkit.org/attachment.cgi?id=441755&action=review

> Source/WebKit/Platform/IPC/Connection.cpp:362
> +    enqueueMatchingMessagesToMessageReceiveQueue(receiveQueue, m_incomingReceiveQueueMessages, receiverName, destinationID);

Receive queue is not a stream-specific thing.
Everything that receives messages off main thread is a receive queue.
This now dispatches only the pending stream messages to the newly added receive queues.

> Source/WebKit/Platform/IPC/Connection.cpp:772
> +    if (isStreamMessageReceiver(message->messageReceiverName())) {

I don't think generally this sort of special casing of "is it stream or is it not" is warranted.

The root of the problem is that there are two different message receivers.
These message receivers are not supposed to be sequential, and nothing guarantees that they are, so we shouldn't try to find stop-gap measures to make them so.

If we want two message receivers be sequential, we need to define the mechanism as an API guarantee which makes it happen. Then we need to implement that.
Comment 24 Wenson Hsieh 2021-10-19 11:18:42 PDT
Comment on attachment 441755 [details]
Async approach, v2

View in context: https://bugs.webkit.org/attachment.cgi?id=441755&action=review

>> Source/WebKit/Platform/IPC/Connection.cpp:362
>> +    enqueueMatchingMessagesToMessageReceiveQueue(receiveQueue, m_incomingReceiveQueueMessages, receiverName, destinationID);
> 
> Receive queue is not a stream-specific thing.
> Everything that receives messages off main thread is a receive queue.
> This now dispatches only the pending stream messages to the newly added receive queues.

I think it's true that stream messages should only ever be enqueued on receive queues, though? (In other words, I don't think there should ever be `m_incomingMessages` that are destined for a `MessageReceiveQueue`).

>> Source/WebKit/Platform/IPC/Connection.cpp:772
>> +    if (isStreamMessageReceiver(message->messageReceiverName())) {
> 
> I don't think generally this sort of special casing of "is it stream or is it not" is warranted.
> 
> The root of the problem is that there are two different message receivers.
> These message receivers are not supposed to be sequential, nothing guarantees it.
> These are not guaranteed to be sequential, so we shouldn't  try to find stop-gap measures to make them so.
> 
> If we want two message receivers be sequential, we need to define the mechanism as an API guarantee which makes it happen. Then we need to implement that.

I agree that it's a bit fragile, but IIUC `isStreamMessageReceiver()` implies that the message should end up in a receive queue, currently.

Instead of checking for `isStreamMessageReceiver`, we could instead either:
1. Add a mechanism for the owner of the Connection (e.g. `GPUConnectionToWebProcess`) to indicate that certain ReceiverNames are expected to always end up in receive queues, or
2. Bake this into the grammar of `*.messages.in` as a new receiver attribute

Until one of these alternatives is in place, could we temporarily address the flaky test failures by landing the other mitigation (making CreateImageBuffer sync) first?
Comment 25 Kimmo Kinnunen 2021-10-19 11:23:04 PDT
Comment on attachment 441608 [details]
Workaround to keep tests passing

Looks good as far as I understand RRB.
I think the message could return the handle, but then it'd reply out of stream, and I understand you want to preserve the didCreate.. message so that one day we can make the sync call async again.
Comment 26 Wenson Hsieh 2021-10-19 11:26:21 PDT
(In reply to Kimmo Kinnunen from comment #25)
> Comment on attachment 441608 [details]
> Workaround to keep tests passing
> 
> Looks good as far as I understand RRB.
> I think the message could return the handle, but then it'd reply out of
> stream and I understand you want to preserve the didCreate.. message so that
> one day we can make the sync call async again

Indeed — it will make it easier to restore the async codepath in a followup.

Filed https://bugs.webkit.org/show_bug.cgi?id=231970 to track this. Thanks!
Comment 27 Kimmo Kinnunen 2021-10-19 11:36:16 PDT
(In reply to Wenson Hsieh from comment #24)
> I think it's true that stream messages should only ever be enqueued on
> receive queues, though? (In other words, I don't think there should ever be
> `m_incomingMessages` that are destined for a `MessageReceiveQueue`).

(How I read the statements, what is not in parenthesis talks about stream messages and what is in parenthesis talks about non-stream messages, but the statements claimed do not overlap, and as such binding "in other words" is not correct.)

There are normal IPC messages that are sent to MessageReceiveQueues or Connection::Client (e.g. "to main thread message receiver").
So I think this counters the claim :
> "I don't think there should ever be `m_incomingMessages` that are destined for a `MessageReceiveQueue`"


There are stream IPC messages that are sent to MessageReceiveQueues.
Conceivably there could be "main thread message receiver" that would process stream messages and as such could receive (OOS) stream messages through main thread dispatch mechanism, but there are none today and it wouldn't be too useful.
So I think this discusses the practically true claim:
>I think it's true that stream messages should only ever be enqueued on
> receive queues, though?
Comment 28 Wenson Hsieh 2021-10-19 12:00:21 PDT
(In reply to Kimmo Kinnunen from comment #27)
> (In reply to Wenson Hsieh from comment #24)
> > I think it's true that stream messages should only ever be enqueued on
> > receive queues, though? (In other words, I don't think there should ever be
> > `m_incomingMessages` that are destined for a `MessageReceiveQueue`).
> 
> (How I read the statements, what is not in parenthesis talks about stream
> messages and what is in parenthesis talks about non-stream messages, but the
> statements claimed do not overlap, and as such binding "in other words" is
> not correct.)
> 
> There are normal IPC messages that are sent to MessageReceiveQueues or
> Connection::Client (e.g. "to main thread message receiver").
> So I think this counters the claim :
> > "I don't think there should ever be `m_incomingMessages` that are destined for a `MessageReceiveQueue`"

Sorry, I should've been more explicit — what I mean is that `Connection::addMessageReceiveQueue` is only ever called in the context of streaming IPC, so in the previous patch, this line:

```
enqueueMatchingMessagesToMessageReceiveQueue(receiveQueue, m_incomingReceiveQueueMessages, receiverName, destinationID);
```

…is not incorrect, since the `receiveQueue` passed into `addMessageReceiveQueue` is only ever going to receive stream messages. That said, I do agree that making this contract explicit (through some new mechanism on Connection) is a better approach, since it makes the code less fragile.

> 
> 
> There are stream IPC messages that are sent to MessageReceiveQueues.
> Conceivably there could be "main thread message receiver" that would process
> stream messages and as such could receive (OOS) stream messages through main
> thread dispatch mechanism, but there are none today and it wouldn't be too
> useful.
> So I think this discusses the practically true claim:
> >I think it's true that stream messages should only ever be enqueued on
> > receive queues, though?
Comment 29 EWS 2021-10-19 12:09:49 PDT
Committed r284476 (243235@main): <https://commits.webkit.org/243235@main>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 441608 [details].