Bug 228665 - [GPU Process] REGRESSION: iCloud Photos Web app may crash WebProcess once the GPUProcess is relaunched
Summary: [GPU Process] REGRESSION: iCloud Photos Web app may crash WebProcess once the...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Canvas (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Said Abou-Hallawa
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2021-07-30 15:50 PDT by Said Abou-Hallawa
Modified: 2021-08-10 14:27 PDT (History)
5 users (show)

See Also:


Attachments
Patch (7.06 KB, patch)
2021-07-30 16:42 PDT, Said Abou-Hallawa
no flags Details | Formatted Diff | Diff
Patch (3.57 KB, patch)
2021-08-03 20:27 PDT, Said Abou-Hallawa
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Said Abou-Hallawa 2021-07-30 15:50:28 PDT
If the WebProcess has sent a RemoteRenderingBackend.ReleaseRemoteResource message but the GPUProcess was terminated an has just been relaunched. The new relaunched GPUProcess will process this message for a resource which it has not recached it yet. In this scenario, the WebProcess will be mistakenly terminated.
Comment 1 Said Abou-Hallawa 2021-07-30 16:41:30 PDT
<rdar://80857877>
Comment 2 Said Abou-Hallawa 2021-07-30 16:42:54 PDT
Created attachment 434673 [details]
Patch
Comment 3 Sam Weinig 2021-08-01 10:51:09 PDT
(In reply to Said Abou-Hallawa from comment #0)
> If the WebProcess has sent a RemoteRenderingBackend.ReleaseRemoteResource
> message but the GPUProcess was terminated an has just been relaunched. The
> new relaunched GPUProcess will process this message for a resource which it
> has not recached it yet. In this scenario, the WebProcess will be mistakenly
> terminated.

Why not just make the WebProcess not send the message in this case?
Comment 4 Said Abou-Hallawa 2021-08-02 12:56:04 PDT
I put some logging and I tried it on iOS device by opening iCloud/Photos/Albums/Live Photos and I kept moving between scrolling the album and swiping the between the images after clicking on one of them and here is what I found:

com.apple.WebKit.GPU: (JavaScriptCore) shallawa: this = 0x106d5c030, in RemoteResourceCache::RemoteResourceCache()
...
com.apple.WebKit.WebContent: (JavaScriptCore) shallawa: this = 0x10109c4d0, in RemoteRenderingBackendProxy::cacheNativeImage() renderingResourceIdentifier = 1070
com.apple.WebKit.GPU: (JavaScriptCore) shallawa: this = 0x106d5c030, in RemoteResourceCache::cacheNativeImage() renderingResourceIdentifier = 1070
...
com.apple.WebKit.GPU: (JavaScriptCore) shallawa: this = 0x104944030, in RemoteResourceCache::RemoteResourceCache()
...
com.apple.WebKit.WebContent: (JavaScriptCore) shallawa: this = 0x10109c4d0, in RemoteRenderingBackendProxy::releaseRemoteResource() renderingResourceIdentifier = 1070
...
com.apple.WebKit.GPU: (JavaScriptCore) shallawa: this = 0x104944030, in RemoteResourceCache::releaseRemoteResource() resource was not found in m_resourceUseCounters renderingResourceIdentifier = 1070

And this is the explanation of these messages:

(message 1) (GPUProcess) Create a RemoteResourceCache (0x106d5c030)
(message 2) (WebProcess) Cache the NativeImage (1070) in RemoteRenderingBackendProxy (0x10109c4d0)
(message 3) (GPUProcess) Cache the NativeImage (1070) in RemoteResourceCache (0x106d5c030)
(message 4) (GPUProcess) Create a new RemoteResourceCache (0x104944030)
(message 5) (WebProcess) Release the NativeImage (1070) in RemoteRenderingBackendProxy (0x10109c4d0)
(message 6) (GPUProcess) The NativeImage (1070) could not be found in RemoteResourceCache (0x104944030)

It is clear from (message 1) and (message 4) the RemoteResourceCache (0x106d5c030) has crashed/terminated and a new one was created 0x104944030). (message 2) and (message 6) show that the NativeImage was cached in a RemoteResourceCache (0x106d5c030) but was requested to be released from another RemoteResourceCache (0x104944030). And this is why RemoteResourceCache::releaseRemoteResource() returns false in this case and terminates the WebProcess.
Comment 5 Sam Weinig 2021-08-02 14:10:28 PDT
Seems like message 5 is the issue then? 

When the GPU Process crashes, the resources all need to know they are no longer being used and therefore don't need to message for release.
Comment 6 Said Abou-Hallawa 2021-08-03 20:27:06 PDT
Created attachment 434881 [details]
Patch
Comment 7 Said Abou-Hallawa 2021-08-04 10:35:41 PDT
I think the failures on mac-AS bot are unrelated.
Comment 8 EWS 2021-08-04 10:43:23 PDT
Committed r280639 (240252@main): <https://commits.webkit.org/240252@main>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 434881 [details].