WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
224566
REGRESSION(
r275921
-
r275922
): [ wk2 ] http/tests/security/webgl-remote-read-remote-image-allowed-with-credentials.html is flakey crashing
https://bugs.webkit.org/show_bug.cgi?id=224566
Summary
REGRESSION(r275921-r275922): [ wk2 ] http/tests/security/webgl-remote-read-re...
Robert Jenner
Reported
2021-04-14 11:39:17 PDT
http/tests/security/webgl-remote-read-remote-image-allowed-with-credentials.html is flakey crashing in wk2 on iOS 14 Simulator, Catalina and BigSur. HISTORY:
https://results.webkit.org/?suite=layout-tests&test=http%2Ftests%2Fsecurity%2Fwebgl-remote-read-remote-image-allowed-with-credentials.html
No crashlog was generated for this test. However, I was able to find a STDERR. STDERR TEXT: stderr: 2021-04-14 07:23:46.915 com.apple.WebKit.Networking.Development[91141:40671502] DiskCookieStorage changing policy from 0 to 2, cookie file: file:///Users/buildbot/Library/HTTPStorages/com.apple.WebKit.WebKitTestRunner.binarycookies LEAK: 1 WebPageProxy
Attachments
Patch
(10.59 KB, patch)
2021-04-14 23:08 PDT
,
Chris Dumez
ews-feeder
: commit-queue-
Details
Formatted Diff
Diff
View All
Add attachment
proposed patch, testcase, etc.
Radar WebKit Bug Importer
Comment 1
2021-04-14 11:40:48 PDT
<
rdar://problem/76657007
>
Robert Jenner
Comment 2
2021-04-14 11:49:04 PDT
Updated test expectations to Pass Crash here:
https://trac.webkit.org/changeset/275955/webkit
Updated before investigating because this test was causing issues for EWS and commit-queue.
Robert Jenner
Comment 3
2021-04-14 15:12:25 PDT
(In reply to Robert Jenner from
comment #2
)
> Updated test expectations to Pass Crash here: > >
https://trac.webkit.org/changeset/275955/webkit
> > Updated before investigating because this test was causing issues for EWS > and commit-queue.
Forgot to add expectations to Mac wk2. Updated here:
https://trac.webkit.org/changeset/275974/webkit
Chris Dumez
Comment 4
2021-04-14 15:52:58 PDT
No crash log? Weird.
Chris Dumez
Comment 5
2021-04-14 16:29:36 PDT
I was able to reproduce the crash locally on macOS with the command that was provided. I see this: 16:24:02.300 6056 com.apple.WebKit.WebContent.Development crash, pid = 6113 16:24:02.318 6056 killed pid 6059 16:26:17.392 6056 worker/26 http/tests/security/webgl-remote-read-remote-image-allowed-with-credentials.html crashed, (stderr lines): 16:26:17.392 6056 LEAK: 1 WebPageProxy 16:26:17.394 5731 [2000/2000] http/tests/security/webgl-remote-read-remote-image-allowed-with-credentials.html failed unexpectedly (com.apple.WebKit.WebContent.Development crashed [pid=6113]) But again, no generated crash log.
Chris Dumez
Comment 6
2021-04-14 16:33:24 PDT
Looking at the system logs I found: com.apple.WebKit IPC fault 16:32:20.017530-0700 9721 com.apple.WebKit.GPU.Development Exiting: Received XPC event type: XPC_ERROR_CONNECTION_INVALID com.apple.WebKit IPC fault 16:32:20.447207-0700 9728 com.apple.WebKit.GPU.Development Exiting: Received XPC event type: XPC_ERROR_CONNECTION_INVALID com.apple.WebKit IPC fault 16:32:20.592596-0700 9729 com.apple.WebKit.GPU.Development Exiting: Received XPC event type: XPC_ERROR_CONNECTION_INVALID com.apple.WebKit IPC fault 16:32:20.593085-0700 9658 com.apple.WebKit.GPU.Development Exiting: Received XPC event type: XPC_ERROR_CONNECTION_INVALID com.apple.WebKit IPC fault 16:32:20.593305-0700 9790 com.apple.WebKit.WebContent.Development Exiting: Received XPC event type: XPC_ERROR_CONNECTION_INVALID
Chris Dumez
Comment 7
2021-04-14 16:39:46 PDT
It is super odd that
r275921
would have regressed this on macOS given that it was an iOS patch (VISIBILITY_PROPAGATION_VIEW is only 1 on iOS).
Chris Dumez
Comment 8
2021-04-14 16:50:51 PDT
On the flakiness dashboard, we can see a crash as early as
r273654
. That said, it is true the the crash became super frequent recently (
r275922
).
r275922
is not platform-specific and is actually more likely to have caused this.
Chris Dumez
Comment 9
2021-04-14 16:55:53 PDT
As I suspected, the regression point is wrong. I can still reproduce the crash after reverting
r275921
locally. I will try reverting
r275922
next.
Chris Dumez
Comment 10
2021-04-14 17:05:15 PDT
(In reply to Chris Dumez from
comment #9
)
> As I suspected, the regression point is wrong. I can still reproduce the > crash after reverting
r275921
locally. I will try reverting
r275922
next.
I can still reproduce after reverting
r275922
. I am thus not sure what caused this.
Chris Dumez
Comment 11
2021-04-14 17:16:13 PDT
Could be just flakiness but it looks like reverting both
r275921
and
r275922
makes the flaky crash go away. Both patches may impact when we decide to launch to GPUProcess so the issue could be timing-related. I'll investigate more tomorrow.
Chris Dumez
Comment 12
2021-04-14 17:23:42 PDT
(In reply to Chris Dumez from
comment #11
)
> Could be just flakiness but it looks like reverting both
r275921
and
r275922
> makes the flaky crash go away. Both patches may impact when we decide to > launch to GPUProcess so the issue could be timing-related. I'll investigate > more tomorrow.
I think this has to do with when we decide to initiate the connection to the GPUProcess.
r275921
is mostly an iOS patch but it does one thing on macOS: - It delays the construction of the RemoteRenderingBackendProxy until we actually need it. It used to get constructed as soon as we constructed the WebPage, now it only gets constructed when the GPUProcess launches or when we start doing canvas rendering.
r275922
is related too. Before
r275922
, constructing the RemoteRenderingBackendProxy would initiate the GPUProcess connection right away (in the constructor). After
r275922
, eagerly constructing a RemoteRenderingBackendProxy does not actually initiate the GPUProcess connection. We only launch the GPUProcess when we actually do canvas rendering via the RemoteRenderingBackendProxy. I think this is why reverting
r275921
without reverting
r275922
does not fix the crash. That said, now I have to figure out why launching the GPUProcess when we start canvas drawing instead of on WebPage creation actually causes this flaky crash. Even weirder, it is not a real crash but an XPC error (XPC_ERROR_CONNECTION_INVALID).
Robert Jenner
Comment 13
2021-04-14 18:58:43 PDT
I found two more tests that fail at the
r275921
and
r275922
revisions, but not
r275920
: imported/w3c/web-platform-tests/webmessaging/with-ports/011.html webgl/1.0.3/conformance/glsl/matrices/glsl-mat3-construction.html As such, I have added these tests to the previous test expectations I set:
https://trac.webkit.org/changeset/275987/webkit
Chris Dumez
Comment 14
2021-04-14 22:05:40 PDT
I am not sure if this is the root cause but there is definitely an IPC ordering issue. I see the WebProcess sending to the GPUProcess: 1. RemoteRenderingBackend::CreateImageBuffer 2. RemoteRenderingBackend::GetDataURLForImageBuffer And the GPUProcess receiving then in the opposite order. As a result, the second IPC handling is unable to find the image buffer. The first message is async and the second one is async so this is likely where the ordering issue comes from. Oddly though, we are using the IPC::SendOption::DispatchMessageEvenWhenWaitingForSyncReply flag when sending the CreateImageBuffer IPC, which *should* maintain ordering with sync messages. That said, the receiver seems to be a threaded message receiver, which is not super common so it is possible we have a bug with this configuration.
Chris Dumez
Comment 15
2021-04-14 22:33:07 PDT
Adding Kimmo in cc as this seems to be using his new MessageReceiveQueue code (since RemoteRenderingBackend is a WorkQueueMessageReceiver). With some local debugging, I can see there there seems to be an issue here in void Connection::processIncomingMessage(std::unique_ptr<Decoder> message): ``` if (auto* receiveQueue = m_receiveQueues.get(*message)) { receiveQueue->enqueueMessage(*this, WTFMove(message)); return; } ``` I would expect all RemoteRenderingBackend IPC messages to use this code path, since RemoteRenderingBackend is a WorkQueueMessageReceiver. However, I can see locally that the first couple of RemoteRenderingBackend IPC messages get a receiveQueue that is null. Starting with the 3rd RemoteRenderingBackend message, receiveQueue becomes non-null. Seems like we may have a race somewhere if my logging is correct.
Chris Dumez
Comment 16
2021-04-14 22:43:36 PDT
My patches did not actually change the timing of the GPUProcess launch in the context of the tests. There is something during the window.internals initialization that ends up launching the GPUProcess. What did change with my patches is the timing of when the GPUConnectionToWebProcess::CreateRenderingBackend IPC get sent. It used to get sent as soon as the RemoteRenderingBackendProxy got created and now it only gets send right before the first IPC from the RemoteRenderingBackendProxy to the GPUProcess. I think we've had a IPC synchronization issue on GPUProcess side that became obvious after my patches because the IPCs are now so close together. In particular, we send a GPUConnectionToWebProcess::CreateRenderingBackend async IPC that creates the RemoteRenderingBackend on the GPUProcess side, on the main thread. When constructed, the RemoteRenderingBackend registers itself as a WorkQueueMessageReceiver. Right after the WebProcess sends IPC from the WebProcess to the RemoteRenderingBackend. When we get those IPC on the IPC thread, the RemoteRenderingBackend may not have registered itself as a WorkQueueMessageReceiver yet on the main thread. As a result, RemoteRenderingBackend IPC messages may get processed out of order because some of them use the MessageReceiveQueue and some early IPC messages don't.
Chris Dumez
Comment 17
2021-04-14 23:08:25 PDT
Created
attachment 426076
[details]
Patch
Kimmo Kinnunen
Comment 18
2021-04-14 23:16:33 PDT
Yes, good debugging. That's the reason for the reordering. There was multiple of these reordering bugs. The series of changes related to MessageReceiveQueue fixed most of them, but this one remains as the FIXME mentions. So basically the Connection implementation is still such that messages are delivered in random order. I have a wip patch locally that fixes this reordering. However, it's not entirely simple to solve since the Connection behavior when dispatching messages while waiting wrt all global Connection instances is a bit underspecified. E.g. it's a complex patch to solve if we want to preserve the behavior that *ALL* Connections get their messages dispatched in main thread while one connection waits on a particular message.
Kimmo Kinnunen
Comment 19
2021-04-14 23:22:18 PDT
The FIXME in: // FIXME: The matches here come from // m_messagesToDispatchWhileWaitingForSyncReply.
Chris Dumez
Comment 20
2021-04-14 23:42:31 PDT
(In reply to Kimmo Kinnunen from
comment #19
)
> The FIXME in: > // FIXME: The matches here come from > // m_messagesToDispatchWhileWaitingForSyncReply.
Just focusing on this particular WorkQueueMessageReceiver issue, I think one possible fix at the IPC::Connection layer might be in Connection::processIncomingMessage(std::unique_ptr<Decoder> message): ``` if (auto* receiveQueue = m_receiveQueues.get(*message)) { receiveQueue->enqueueMessage(*this, WTFMove(message)); return; } ``` In the if scope, we should check if there are any pending messages in m_incomingMessages for this receiver name, if there are, then we could remove them from m_incomingMessages and then enqueue them on the receiveQueue *before* the current message. I think this would preserve ordering for async messages at least. We'd likely need a similar fix for sync messages (maybe checking m_messagesToDispatchWhileWaitingForSyncReply instead of m_incomingMessages). Let me know what you think. Again though, I'd want to land the current patch (simple fix make the IPC sync) first to address crashes quickly. We can follow-up with a better / more generic solution.
EWS
Comment 21
2021-04-15 00:37:13 PDT
Committed
r276007
(
236559@main
): <
https://commits.webkit.org/236559@main
> All reviewed patches have been landed. Closing bug and clearing flags on
attachment 426076
[details]
.
Kimmo Kinnunen
Comment 22
2021-04-15 01:25:35 PDT
(In reply to Chris Dumez from
comment #20
)
> In the if scope, we should check if there are any pending messages in > m_incomingMessages for this receiver name, if there are, then we could > remove them from m_incomingMessages and then enqueue them on the > receiveQueue *before* the current message.
I interpret your suggestion to mean the normal operation, e.g. the server is not waiting on a specific message. So the normal operation is already covered, or should be covered. If you're seeing the behavior in normal delivery, then I think something is still missing there and could be addressed simply. However, I tried to make this covered, so I don't know if this is the case. In normal operation, if messages are scheduled to m_incomingMessages *before* the work queue is attached, this is business-as-usual. The m_incomingMessages are delivered when the work queue is attached, right at that moment, holding a lock. So if there are pending messages in m_incomingMessages, they should be processed at work queue install time and no additional changes should be needed in Connection::processIncomingMessage. (as per the intention, there could of course be bugs) The reordering happens when server starts waiting for messages. 1) Normal operation queues messages to m_incomingMessages (regardless of options) 2) Wait starts 3a) Messages during wait are queued to m_incomingMessages, if they don't have options 3b) Messages during wait are queued to m_messagesToDispatchWhileWaitingForSyncReply if they have the specific option or if they're sync messages So it is impossible to order messages correctly, if they pseudo-randomly get queued to two different queues. So even the messages with DispatchMessageEvenWhenWaitingForSyncReply
Kimmo Kinnunen
Comment 23
2021-04-15 01:52:00 PDT
(In reply to Chris Dumez from
comment #14
)
> I am not sure if this is the root cause but there is definitely an IPC > ordering issue. > > I see the WebProcess sending to the GPUProcess: > 1. RemoteRenderingBackend::CreateImageBuffer > 2. RemoteRenderingBackend::GetDataURLForImageBuffer > > And the GPUProcess receiving then in the opposite order
This interpretation is a bit suspicious and perhaps warrants a double check. The normal logical operation of creating a image buffer in Web Content process side is currently: 1) send(CreateImageBuffer()) 2) waitForDidCreateImageBufferBackend() 3) do other operations If we apply this logic to GetDataURL, without looking at the implementation it should be: 1) send(CreateImageBuffer()) 2) waitForDidCreateImageBufferBackend() 3) send(GetDataURLForImageBuffer) If this is the case, it should simply not be possible to observe any reordering, as the Web Content process will not send the GetDataURL message before GPU Process has gotten and replied the CreateImageBuffer. However, there's no such requirement for CreateRemoteRenderingBackend + CreateImageBuffer, so I could imagine those being reordered.
Chris Dumez
Comment 24
2021-04-15 07:43:35 PDT
(In reply to Kimmo Kinnunen from
comment #22
)
> (In reply to Chris Dumez from
comment #20
) > > In the if scope, we should check if there are any pending messages in > > m_incomingMessages for this receiver name, if there are, then we could > > remove them from m_incomingMessages and then enqueue them on the > > receiveQueue *before* the current message. > > I interpret your suggestion to mean the normal operation, e.g. the server is > not waiting on a specific message. > > So the normal operation is already covered, or should be covered. If you're > seeing the behavior in normal delivery, then I think something is still > missing there and could be addressed simply. However, I tried to make this > covered, so I don't know if this is the case. > > In normal operation, if messages are scheduled to m_incomingMessages > *before* the work queue is attached, this is business-as-usual. The > m_incomingMessages are delivered when the work queue is attached, right at > that moment, holding a lock.
I guess you are talking about this code: ``` void Connection::addMessageReceiveQueue(MessageReceiveQueue& receiveQueue, ReceiverName receiverName, uint64_t destinationID) { auto locker = holdLock(m_incomingMessagesMutex); m_incomingMessages = filterWithMessageReceiveQueue(*this, receiveQueue, receiverName, destinationID, WTFMove(m_incomingMessages)); m_receiveQueues.add(receiveQueue, receiverName, destinationID); } ``` In particular the `filterWithMessageReceiveQueue()` part. I did not realize we already did that and it is great. However, the RemoteRenderingBackend::CreateImageBuffer IPC gets sent with the IPC::SendOption::DispatchMessageEvenWhenWaitingForSyncReply flag. As a result, the message does not get appended to m_incomingMessages. Instead, it gets appended to m_messagesToDispatchWhileWaitingForSyncReply :) As I mentioned earlier in one of my comments, we have to take care of both regular async messages (in m_incomingMessages) and sync messages / async messages with DispatchMessageEvenWhenWaitingForSyncReply flag (in m_messagesToDispatchWhileWaitingForSyncReply). Seems like our code currently only deals with regular async messages?
Chris Dumez
Comment 25
2021-04-15 14:00:22 PDT
(In reply to Chris Dumez from
comment #24
)
> (In reply to Kimmo Kinnunen from
comment #22
) > > (In reply to Chris Dumez from
comment #20
) > > > In the if scope, we should check if there are any pending messages in > > > m_incomingMessages for this receiver name, if there are, then we could > > > remove them from m_incomingMessages and then enqueue them on the > > > receiveQueue *before* the current message. > > > > I interpret your suggestion to mean the normal operation, e.g. the server is > > not waiting on a specific message. > > > > So the normal operation is already covered, or should be covered. If you're > > seeing the behavior in normal delivery, then I think something is still > > missing there and could be addressed simply. However, I tried to make this > > covered, so I don't know if this is the case. > > > > In normal operation, if messages are scheduled to m_incomingMessages > > *before* the work queue is attached, this is business-as-usual. The > > m_incomingMessages are delivered when the work queue is attached, right at > > that moment, holding a lock. > > I guess you are talking about this code: > ``` > void Connection::addMessageReceiveQueue(MessageReceiveQueue& receiveQueue, > ReceiverName receiverName, uint64_t destinationID) > { > auto locker = holdLock(m_incomingMessagesMutex); > m_incomingMessages = filterWithMessageReceiveQueue(*this, receiveQueue, > receiverName, destinationID, WTFMove(m_incomingMessages)); > m_receiveQueues.add(receiveQueue, receiverName, destinationID); > } > ``` > > In particular the `filterWithMessageReceiveQueue()` part. I did not realize > we already did that and it is great. > > However, the RemoteRenderingBackend::CreateImageBuffer IPC gets sent with > the IPC::SendOption::DispatchMessageEvenWhenWaitingForSyncReply flag. As a > result, the message does not get appended to m_incomingMessages. Instead, it > gets appended to m_messagesToDispatchWhileWaitingForSyncReply :) > > As I mentioned earlier in one of my comments, we have to take care of both > regular async messages (in m_incomingMessages) and sync messages / async > messages with DispatchMessageEvenWhenWaitingForSyncReply flag (in > m_messagesToDispatchWhileWaitingForSyncReply). > Seems like our code currently only deals with regular async messages?
Looks like I figured out a proper way to fix the ordering issue. I will address via
https://bugs.webkit.org/show_bug.cgi?id=224623
and make the CreateImageBuffer IPC async again.
Alexey Proskuryakov
Comment 26
2021-04-20 09:44:25 PDT
***
Bug 224612
has been marked as a duplicate of this bug. ***
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug