Another crash due to backend creation failure: Termination Signal: Segmentation fault: 11 Termination Reason: Namespace SIGNAL, Code 0xb Terminating Process: exc handler [2237] Triggered by Thread: 0 Thread 0 name: Dispatch queue: com.apple.main-thread Thread 0 Crashed: 0 WebKit 0x000000010352857c WebKit::ImageBufferShareableIOSurfaceBackend::createImageBufferBackendHandle() const + 24 1 WebKit 0x0000000103206474 WebKit::RemoteLayerBackingStore::encode(IPC::Encoder&) const + 316 2 WebKit 0x0000000103206474 WebKit::RemoteLayerBackingStore::encode(IPC::Encoder&) const + 316 3 WebKit 0x000000010320a050 WebKit::RemoteLayerTreeTransaction::LayerProperties::encode(IPC::Encoder&) const + 1184 4 WebKit 0x000000010320ac84 WebKit::RemoteLayerTreeTransaction::encode(IPC::Encoder&) const + 164 5 WebKit 0x000000010308fee4 WebKit::RemoteLayerTreeDrawingArea::updateRendering() + 552 6 WebCore 0x00000001091d42e4 WebCore::ThreadTimers::sharedTimerFiredInternal() + 224 7 WebCore 0x00000001091faa34 WebCore::timerFired(__CFRunLoopTimer*, void*) + 32 8 CoreFoundation 0x00000001a782734c __CFRUNLOOP_IS_CALLING_OUT_TO_A_TIMER_CALLBACK_FUNCTION__ + 32 9 CoreFoundation 0x00000001a7826f48 __CFRunLoopDoTimer + 1076 10 CoreFoundation 0x00000001a7826398 __CFRunLoopDoTimers + 328 11 CoreFoundation 0x00000001a782014c __CFRunLoopRun + 1944 12 CoreFoundation 0x00000001a781f480 CFRunLoopRunSpecific + 600 13 Foundation 0x00000001a8ceccac -[NSRunLoop(NSRunLoop) runMode:beforeDate:] + 232 14 Foundation 0x00000001a8d203b8 -[NSRunLoop(NSRunLoop) run] + 92 15 libxpc.dylib 0x00000001f560e2cc _xpc_objc_main + 688 16 libxpc.dylib 0x00000001f56105e8 xpc_main + 180 17 WebKit 0x00000001031ce0b4 WebKit::XPCServiceMain(int, char const**) + 124 18 libdyld.dylib 0x00000001a74dd38c start + 4
Created attachment 416124 [details] Patch
Comment on attachment 416124 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=416124&action=review > Source/WebKit/ChangeLog:21 > + B subsequently finishes waiting for backend creation because it incorrectly believes that its backend has been > + initialized, and we end up crashing downstream in `RemoteLayerBackingStore::encode()` because > + `ensureBackendCreated()` for B returned null. Seems like we will still have this issue, just perhaps less often, since we will still return null from ensureBackendCreated() if enough callers call ensureBackendCreated() at the same time, or am I missing something? > Source/WebKit/WebProcess/GPU/graphics/RemoteImageBufferProxy.h:140 > + static constexpr unsigned maximumNumberOfTimeouts = 3; > + unsigned numberOfTimeouts = 0; > + while (!m_backend && numberOfTimeouts < maximumNumberOfTimeouts) { > + if (!m_remoteRenderingBackendProxy->waitForDidCreateImageBufferBackend()) > + ++numberOfTimeouts; > + } It seems a bit odd to call these timeouts if they also represent other sync messages that have been delivered in the interim (which is what I think you are saying they do). Can we differentiate between actual timeouts and other things that would cause waitForDidCreateImageBufferBackend to return prematurely?
Comment on attachment 416124 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=416124&action=review >> Source/WebKit/ChangeLog:21 >> + `ensureBackendCreated()` for B returned null. > > Seems like we will still have this issue, just perhaps less often, since we will still return null from ensureBackendCreated() if enough callers call ensureBackendCreated() at the same time, or am I missing something? After this change, we'll only return null here if we time out 3 times. As explained in the changelog entry below, we only increment the counter upon IPC timeout (or any other kind of IPC communication failure). So if we receive, say, 10 `RemoteRenderingBackendProxy::DidCreateImageBufferBackend` responses where the 10th response confirms our RemoteImageBufferProxy, we'll spin in the while loop a total of 10 times and then return successfully. However, if the GPU process, say, hits an infinite loop, we'll subsequently loop 3 times here and then give up and return null. I think I also want to refactor `ensureBackendCreated` so that it doesn't have a max timeout limit and instead returns an `ImageBufferBackend&`, but I was planning to do that separately. >> Source/WebKit/WebProcess/GPU/graphics/RemoteImageBufferProxy.h:140 >> + } > > It seems a bit odd to call these timeouts if they also represent other sync messages that have been delivered in the interim (which is what I think you are saying they do). Can we differentiate between actual timeouts and other things that would cause waitForDidCreateImageBufferBackend to return prematurely? Hm…is this addressed by my comment above?
(In reply to Sam Weinig from comment #2) > Comment on attachment 416124 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=416124&action=review > > > Source/WebKit/ChangeLog:21 > > + B subsequently finishes waiting for backend creation because it incorrectly believes that its backend has been > > + initialized, and we end up crashing downstream in `RemoteLayerBackingStore::encode()` because > > + `ensureBackendCreated()` for B returned null. > > Seems like we will still have this issue, just perhaps less often, since we > will still return null from ensureBackendCreated() if enough callers call > ensureBackendCreated() at the same time, or am I missing something? I should also clarify — the DidCreateImageBufferBackend message is both sent by the GPU process asynchronously and received by the web process asynchronously. We only hit this sync codepath in the web process (ensureBackendCreated) in the case where we definitely require the backend to have been created in the web process. Since all callers of `ensureBackendCreated()` are on the main thread, there should only ever be one caller of `ensureBackendCreated()` at a time in the web process — the issue is that there could be multiple `DidCreateImageBufferBackend` messages that have been dispatched by the GPU process, and are waiting to be handled in the web process at the point where we call `ensureBackendCreated()`. > > > Source/WebKit/WebProcess/GPU/graphics/RemoteImageBufferProxy.h:140 > > + static constexpr unsigned maximumNumberOfTimeouts = 3; > > + unsigned numberOfTimeouts = 0; > > + while (!m_backend && numberOfTimeouts < maximumNumberOfTimeouts) { > > + if (!m_remoteRenderingBackendProxy->waitForDidCreateImageBufferBackend()) > > + ++numberOfTimeouts; > > + } > > It seems a bit odd to call these timeouts if they also represent other sync > messages that have been delivered in the interim (which is what I think you > are saying they do). Can we differentiate between actual timeouts and other > things that would cause waitForDidCreateImageBufferBackend to return > prematurely?
(In reply to Wenson Hsieh from comment #3) > Comment on attachment 416124 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=416124&action=review > > >> Source/WebKit/ChangeLog:21 > >> + `ensureBackendCreated()` for B returned null. > > > > Seems like we will still have this issue, just perhaps less often, since we will still return null from ensureBackendCreated() if enough callers call ensureBackendCreated() at the same time, or am I missing something? > > After this change, we'll only return null here if we time out 3 times. As > explained in the changelog entry below, we only increment the counter upon > IPC timeout (or any other kind of IPC communication failure). > > So if we receive, say, 10 > `RemoteRenderingBackendProxy::DidCreateImageBufferBackend` responses where > the 10th response confirms our RemoteImageBufferProxy, we'll spin in the > while loop a total of 10 times and then return successfully. However, if the > GPU process, say, hits an infinite loop, we'll subsequently loop 3 times > here and then give up and return null. I'm having trouble following the logic in the code, but I think I get it. If RemoteRenderingBackendProxy::waitForDidCreateImageBufferBackend() returns true, which happens when a sync message is received (and not a timeout) we just loop. I think the real issue I am having is that the function RemoteRenderingBackendProxy::waitForDidCreateImageBufferBackend() returns a bool and gives no indication what that bool means (thought this seems to extend down in Connection::waitForAndDispatchImmediately as well). Ideally we would only return bools from functions that were clearly predicates. Can we change RemoteRenderingBackendProxy::waitForDidCreateImageBufferBackend() to return an enum to make it make it clear what it is saying. Otherwise, this code is just mysterious.
(In reply to Sam Weinig from comment #5) > (In reply to Wenson Hsieh from comment #3) > > Comment on attachment 416124 [details] > > Patch > > > > View in context: > > https://bugs.webkit.org/attachment.cgi?id=416124&action=review > > > > >> Source/WebKit/ChangeLog:21 > > >> + `ensureBackendCreated()` for B returned null. > > > > > > Seems like we will still have this issue, just perhaps less often, since we will still return null from ensureBackendCreated() if enough callers call ensureBackendCreated() at the same time, or am I missing something? > > > > After this change, we'll only return null here if we time out 3 times. As > > explained in the changelog entry below, we only increment the counter upon > > IPC timeout (or any other kind of IPC communication failure). > > > > So if we receive, say, 10 > > `RemoteRenderingBackendProxy::DidCreateImageBufferBackend` responses where > > the 10th response confirms our RemoteImageBufferProxy, we'll spin in the > > while loop a total of 10 times and then return successfully. However, if the > > GPU process, say, hits an infinite loop, we'll subsequently loop 3 times > > here and then give up and return null. > > I'm having trouble following the logic in the code, but I think I get it. If > RemoteRenderingBackendProxy::waitForDidCreateImageBufferBackend() returns > true, which happens when a sync message is received (and not a timeout) we > just loop. Ah, yes — this was indeed the key bit. > > I think the real issue I am having is that the function > RemoteRenderingBackendProxy::waitForDidCreateImageBufferBackend() returns a > bool and gives no indication what that bool means (thought this seems to > extend down in Connection::waitForAndDispatchImmediately as well). Ideally > we would only return bools from functions that were clearly predicates. Can > we change RemoteRenderingBackendProxy::waitForDidCreateImageBufferBackend() > to return an enum to make it make it clear what it is saying. Otherwise, > this code is just mysterious. Yep, that sounds good! Will change this to return something along the lines of: enum class DidReceiveBackendCreationMessage : bool { No, Yes };
(In reply to Wenson Hsieh from comment #6) > (In reply to Sam Weinig from comment #5) > > (In reply to Wenson Hsieh from comment #3) > > > Comment on attachment 416124 [details] > > > Patch > > > > > > View in context: > > > https://bugs.webkit.org/attachment.cgi?id=416124&action=review > > > > > > >> Source/WebKit/ChangeLog:21 > > > >> + `ensureBackendCreated()` for B returned null. > > > > > > > > Seems like we will still have this issue, just perhaps less often, since we will still return null from ensureBackendCreated() if enough callers call ensureBackendCreated() at the same time, or am I missing something? > > > > > > After this change, we'll only return null here if we time out 3 times. As > > > explained in the changelog entry below, we only increment the counter upon > > > IPC timeout (or any other kind of IPC communication failure). > > > > > > So if we receive, say, 10 > > > `RemoteRenderingBackendProxy::DidCreateImageBufferBackend` responses where > > > the 10th response confirms our RemoteImageBufferProxy, we'll spin in the > > > while loop a total of 10 times and then return successfully. However, if the > > > GPU process, say, hits an infinite loop, we'll subsequently loop 3 times > > > here and then give up and return null. > > > > I'm having trouble following the logic in the code, but I think I get it. If > > RemoteRenderingBackendProxy::waitForDidCreateImageBufferBackend() returns > > true, which happens when a sync message is received (and not a timeout) we > > just loop. > > Ah, yes — this was indeed the key bit. > > > > > I think the real issue I am having is that the function > > RemoteRenderingBackendProxy::waitForDidCreateImageBufferBackend() returns a > > bool and gives no indication what that bool means (thought this seems to > > extend down in Connection::waitForAndDispatchImmediately as well). Ideally > > we would only return bools from functions that were clearly predicates. Can > > we change RemoteRenderingBackendProxy::waitForDidCreateImageBufferBackend() > > to return an enum to make it make it clear what it is saying. Otherwise, > > this code is just mysterious. > > Yep, that sounds good! Will change this to return something along the lines > of: > > enum class DidReceiveBackendCreationMessage : bool { > No, > Yes > }; I think it would be even better if the reason could be in there. Perhaps something like: enum class DidReceiveBackendCreationResult : bool { Success, TimeoutOrIPCFailure }; I am not super happy with "Success", since it might be success for a different backend, though I am not sure which word would indicate the truth succinctly. (Sorry for not being more specific before).
Created attachment 416135 [details] Patch
Comment on attachment 416135 [details] Patch Thanks for the review!
Committed r270775: <https://trac.webkit.org/changeset/270775> All reviewed patches have been landed. Closing bug and clearing flags on attachment 416135 [details].
<rdar://problem/72302200>