Bug 238558 - REGRESSION(r288847): GPU Process crash under GPUProcess::canExitUnderMemoryPressure()
Summary: REGRESSION(r288847): GPU Process crash under GPUProcess::canExitUnderMemoryPr...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKit Process Model (show other bugs)
Version: Safari Technology Preview
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Said Abou-Hallawa
URL:
Keywords: InRadar
Depends on:
Blocks: 236508
  Show dependency treegraph
 
Reported: 2022-03-30 10:18 PDT by Simon Fraser (smfr)
Modified: 2022-05-13 15:12 PDT (History)
11 users (show)

See Also:


Attachments
Crash log (19.91 KB, text/plain)
2022-03-30 10:18 PDT, Simon Fraser (smfr)
no flags Details
Patch (5.14 KB, patch)
2022-04-08 18:02 PDT, Said Abou-Hallawa
no flags Details | Formatted Diff | Diff
Patch (7.35 KB, patch)
2022-04-11 17:48 PDT, Said Abou-Hallawa
no flags Details | Formatted Diff | Diff
Alternative (6.92 KB, patch)
2022-04-13 05:17 PDT, Kimmo Kinnunen
no flags Details | Formatted Diff | Diff
Another alternative (6.83 KB, patch)
2022-04-13 21:47 PDT, Said Abou-Hallawa
no flags Details | Formatted Diff | Diff
Patch (10.05 KB, patch)
2022-04-14 19:47 PDT, Said Abou-Hallawa
no flags Details | Formatted Diff | Diff
Patch (13.79 KB, patch)
2022-04-15 15:13 PDT, Said Abou-Hallawa
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Simon Fraser (smfr) 2022-03-30 10:18:13 PDT
Created attachment 456139 [details]
Crash log

iOS layout tests show a crash with this backtrace:

Thread 0 Crashed::   Dispatch queue: com.apple.main-thread
0   com.apple.JavaScriptCore      	       0x136c83ace    WTFCrash
1   com.apple.WebKit              	       0x11903c278    WTFCrashWithInfo(int, char const*, char const*, int)
2   com.apple.WebKit              	       0x119c90970    WebKit::QualifiedResourceHeap::checkInvariants() const
3   com.apple.WebKit              	       0x119c996f6    WebKit::QualifiedResourceHeap::hasImageBuffer() const
4   com.apple.WebKit              	       0x119c83f76    WebKit::RemoteResourceCache::hasActiveDrawables() const
5   com.apple.WebKit              	       0x119c71346    WebKit::RemoteRenderingBackend::allowsExitUnderMemoryPressure() const
6   com.apple.WebKit              	       0x119c70dc0    WebKit::GPUConnectionToWebProcess::allowsExitUnderMemoryPressure() const
7   com.apple.WebKit              	       0x119c76640    WebKit::GPUProcess::canExitUnderMemoryPressure() const
8   com.apple.WebKit              	       0x119c756fc    WebKit::GPUProcess::tryExitIfUnused()
9   com.apple.WebKit              	       0x119cb10fc    decltype(*(std::__1::forward<WebKit::GPUProcess*&>(fp0)).*fp()) std::__1::__invoke<void (WebKit::GPUProcess::*&)(), WebKit::GPUProcess*&, void>(void (WebKit::GPUProcess::*&)(), WebKit::GPUProcess*&)
10  com.apple.WebKit              	       0x119cb107a    std::__1::__bind_return<void (WebKit::GPUProcess::*)(), std::__1::tuple<WebKit::GPUProcess*>, std::__1::tuple<>, __is_valid_bind_return<void (WebKit::GPUProcess::*)(), std::__1::tuple<WebKit::GPUProcess*>, std::__1::tuple<> >::value>::type std::__1::__apply_functor<void (WebKit::GPUProcess::*)(), std::__1::tuple<WebKit::GPUProcess*>, 0ul, std::__1::tuple<> >(void (WebKit::GPUProcess::*&)(), std::__1::tuple<WebKit::GPUProcess*>&, std::__1::__tuple_indices<0ul>, std::__1::tuple<>&&)
11  com.apple.WebKit              	       0x119cb102e    std::__1::__bind_return<void (WebKit::GPUProcess::*)(), std::__1::tuple<WebKit::GPUProcess*>, std::__1::tuple<>, __is_valid_bind_return<void (WebKit::GPUProcess::*)(), std::__1::tuple<WebKit::GPUProcess*>, std::__1::tuple<> >::value>::type std::__1::__bind<void (WebKit::GPUProcess::*&)(), WebKit::GPUProcess*>::operator()<>()
12  com.apple.WebKit              	       0x119cb0fc6    WTF::Detail::CallableWrapper<std::__1::__bind<void (WebKit::GPUProcess::*&)(), WebKit::GPUProcess*>, void>::call()
13  com.apple.WebKit              	       0x11909b0c0    WTF::Function<void ()>::operator()() const
14  com.apple.WebKit              	       0x1195f2876    WebCore::Timer::fired()
15  com.apple.WebCore             	       0x1637dd2f4    WebCore::ThreadTimers::sharedTimerFiredInternal()
16  com.apple.WebCore             	       0x1637e772e    WebCore::ThreadTimers::setSharedTimer(WebCore::SharedTimer*)::$_0::operator()() const
17  com.apple.WebCore             	       0x1637e76c6    WTF::Detail::CallableWrapper<WebCore::ThreadTimers::setSharedTimer(WebCore::SharedTimer*)::$_0, void>::call()
18  com.apple.WebCore             	       0x15f0206f0    WTF::Function<void ()>::operator()() const
19  com.apple.WebCore             	       0x16377521e    WebCore::MainThreadSharedTimer::fired()


https://build.webkit.org/results/Apple-iOS-15-Simulator-Debug-WK2-Tests/r292101%20(2082)/results.html
Comment 1 Radar WebKit Bug Importer 2022-03-30 10:18:36 PDT
<rdar://problem/91052033>
Comment 2 Simon Fraser (smfr) 2022-03-30 10:24:45 PDT
Looks like a case where we access the QualifiedResourceHeap on the main thread; it should only be accessed from the thread that RemoteRenderingBackend lives on.
Comment 3 Said Abou-Hallawa 2022-04-08 18:02:20 PDT
Created attachment 457131 [details]
Patch
Comment 4 Chris Dumez 2022-04-09 23:39:37 PDT
Comment on attachment 457131 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=457131&action=review

> Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:533
> +    m_workQueue->dispatchSync([&] {

We already tried that and it caused hangs:
https://trac.webkit.org/changeset/282693/webkit

Has the code changed in ways that it is now OK?
Comment 5 Chris Dumez 2022-04-09 23:42:47 PDT
(In reply to Chris Dumez from comment #4)
> Comment on attachment 457131 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=457131&action=review
> 
> > Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:533
> > +    m_workQueue->dispatchSync([&] {
> 
> We already tried that and it caused hangs:
> https://trac.webkit.org/changeset/282693/webkit
> 
> Has the code changed in ways that it is now OK?

I don't remember all the details but I think the issue had to do with the fact that the rendering backend thread was stuck on an IPC Semaphore (because that's how we were passing display lists to the GPU process, at least at the time, not sure if this changed). As a result, the background thread would not process the dispatch_sync().
Comment 6 Kimmo Kinnunen 2022-04-11 06:10:36 PDT
Comment on attachment 457131 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=457131&action=review

> Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:534
> +        hasActiveDrawables = m_remoteResourceCache.hasActiveDrawables();

Yeah, this is not really sound in many ways.


I think a better approach would be that:
1) RemoteResourceCache acquires ScopedRenderingResourcesRequest every time it starts having active drawables.
2) RemoteRenderingBackend::allowsExitUnderMemoryPressure() is removed altogether
3) GPUConnectionToWebProcess::allowsExitUnderMemoryPressure() checks for new function ScopedRenderingResourcesRequest::hasActiveUsers() or similar

Additionally ScopedRenderingResourcesRequest could be renamed to something else, like ScopedGPUProcessActivity or something.

ScopedRenderingResourcesRequest could also be modified to not purge anything but just maintain s_requests
Comment 7 Kimmo Kinnunen 2022-04-11 11:05:06 PDT
Comment on attachment 457131 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=457131&action=review

> Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:535
> +    });

A better thread safe implementation of this would be to have the dispatched function capture a normal binarysemaphore, see the implementation at WorkQueue::dispatchSync.
(for your interest, it probably isn't good to do this sync call)
Comment 8 Said Abou-Hallawa 2022-04-11 13:50:55 PDT
Comment on attachment 457131 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=457131&action=review

>> Source/WebKit/GPUProcess/graphics/RemoteRenderingBackend.cpp:535
>> +    });
> 
> A better thread safe implementation of this would be to have the dispatched function capture a normal binarysemaphore, see the implementation at WorkQueue::dispatchSync.
> (for your interest, it probably isn't good to do this sync call)

It seems this sync call might lead to deadlock. For instance RemoteDisplayListRecorder::paintFrameForMedia(), which runs on the "RemoteRenderingBackend work queue", makes a media call using callOnMainRunLoopAndWait(). So if RemoteRenderingBackend::allowsExitUnderMemoryPressure() blocks the execution on the main run loop till the "RemoteRenderingBackend work queue" executes the call to RemoteResourceCache::hasActiveDrawables() and there is a PaintFrameForMedia message which was dispatched to the StreamConnectionWorkQueue but has not been processed yet, a deadlock will happen. The main run loop and the "RemoteRenderingBackend work queue" will both be blocked; each is waiting for the other to finish.
Comment 9 Said Abou-Hallawa 2022-04-11 17:48:14 PDT
Created attachment 457303 [details]
Patch
Comment 10 Said Abou-Hallawa 2022-04-11 19:05:27 PDT
Although this patch may fix the assertion, I am not convinced RemoteRenderingBackend::allowsExitUnderMemoryPressure() has the right implementation. This function returns a value which is not guaranteed to stay the same even while it is used by the caller GPUConnectionToWebProcess::allowsExitUnderMemoryPressure().

The "RemoteRenderingBackend work queue" can change the state of RemoteRenderingBackend::m_remoteResourceCache while GPUConnectionToWebProcess::allowsExitUnderMemoryPressure() makes a decision based on out-dated state.
Comment 11 Chris Dumez 2022-04-11 19:12:00 PDT
(In reply to Said Abou-Hallawa from comment #10)
> Although this patch may fix the assertion, I am not convinced
> RemoteRenderingBackend::allowsExitUnderMemoryPressure() has the right
> implementation. This function returns a value which is not guaranteed to
> stay the same even while it is used by the caller
> GPUConnectionToWebProcess::allowsExitUnderMemoryPressure().
> 
> The "RemoteRenderingBackend work queue" can change the state of
> RemoteRenderingBackend::m_remoteResourceCache while
> GPUConnectionToWebProcess::allowsExitUnderMemoryPressure() makes a decision
> based on out-dated state.

allowsExitUnderMemoryPressure() is known to be racy, that is fine. It is always racy anyway due to IPC. Worse case scenario, we'll exit the GPUProcess and the WebProcess will get notified it exited and it is needs it, it will relaunch it.

Note that eager exit on memory pressure is no longer useful once we enable DOM rendering in the GPUProcess I think (since the GPUProcess will always be needed for DOM rendering). As a result, we should probably consider disabling this idle exit logic altogether when DOM rendering in GPUProcess is enabled.
Comment 12 Chris Dumez 2022-04-12 11:23:43 PDT
API test failure looks related.
Comment 13 Kimmo Kinnunen 2022-04-13 05:17:29 PDT
Created attachment 457526 [details]
Alternative
Comment 14 Said Abou-Hallawa 2022-04-13 19:29:11 PDT
Comment on attachment 457526 [details]
Alternative

View in context: https://bugs.webkit.org/attachment.cgi?id=457526&action=review

> Source/WebKit/ChangeLog:18
> +        the count is now counted by RemoteMediaPlayerProxy creation, destruction.

I think this solution is a good alternative except it does not consider the native image an ActiveDrawable resource as the current code does. RemoteRenderingBackend calls updateRenderingResourceRequest() from RemoteRenderingBackend::createImageBufferWithQualifiedIdentifier() and RemoteRenderingBackend::releaseRemoteResourceWithQualifiedIdentifier(). It does not call it from RemoteRenderingBackend::cacheNativeImageWithQualifiedIdentifier(). 

Do we need to call it from cacheNativeImageWithQualifiedIdentifier()? 
Do we even need to consider the NativeImage an ActiveDrawable resource? 
Do we need to consider the Font also as an ActiveDrawable resource? 
What makes a resource be ActiveDrawable?

RemoteRenderingBackend::allowsExitUnderMemoryPressure() was added in r276148. This is the original version of it:

bool RemoteRenderingBackend::allowsExitUnderMemoryPressure() const
{
    return m_remoteResourceCache.imageBuffers().isEmpty() && m_remoteResourceCache.nativeImages().isEmpty();
}
Comment 15 Said Abou-Hallawa 2022-04-13 21:31:18 PDT
In fact this bug is a regression of r288847. This revision removes r282754 which was fixing exactly the same bug.
Comment 16 Said Abou-Hallawa 2022-04-13 21:47:33 PDT
Created attachment 457594 [details]
Another alternative
Comment 17 Kimmo Kinnunen 2022-04-14 03:08:49 PDT
(In reply to Said Abou-Hallawa from comment #14)
> I think this solution is a good alternative except it does not consider the
> native image an ActiveDrawable resource as the current code does.
> RemoteRenderingBackend calls updateRenderingResourceRequest() from
> RemoteRenderingBackend::createImageBufferWithQualifiedIdentifier() and
> RemoteRenderingBackend::releaseRemoteResourceWithQualifiedIdentifier(). It
> does not call it from
> RemoteRenderingBackend::cacheNativeImageWithQualifiedIdentifier(). 

Right, but that's just a bug about updateRenderingResourceRequest().
Comment 18 Said Abou-Hallawa 2022-04-14 19:47:45 PDT
Created attachment 457664 [details]
Patch
Comment 19 Said Abou-Hallawa 2022-04-14 20:48:09 PDT
I talked to Simon and we agreed on counting the ImageBuffers only when allowing or disallowing the GPUP to exit under the memory pressure. Counting the NativeImages should not affect this decision because they are remotely cached only when they are drawn on the RemoteImageBuffers. The life cycle of the NativeImage in GPUP is different from the life cycle of the ImageBuffer. The NativeImages in GPUP are released only when the counterpart NativeImages in WebProcess are released. 

So allowing the GPUP process to exit when no ImageBuffer is cached but there are NativeImages cached should be fine.
Comment 20 Simon Fraser (smfr) 2022-04-15 14:10:37 PDT
Comment on attachment 457664 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=457664&action=review

> Source/WebKit/ChangeLog:19
> +        Remove RemoteMediaPlayerManagerProxy::allowsExitUnderMemoryPressure(),
> +        the count is now counted by RemoteMediaPlayerProxy creation, destruction.

I don't understand this part. Where does RemoteMediaPlayerManagerProxy use ScopedRenderingResourcesRequest?
Comment 21 Said Abou-Hallawa 2022-04-15 15:13:41 PDT
Created attachment 457727 [details]
Patch
Comment 22 Said Abou-Hallawa 2022-04-15 15:18:39 PDT
(In reply to Said Abou-Hallawa from comment #21)
> Created attachment 457727 [details]
> Patch

In this patch I added a nicer approach which Simon suggested. RemoteImageBuffer will have ScopedRenderingResourcesRequest. So we are going to count the active ImageBuffers and not the RemoteRenderingBackends which have active ImageBuffers as we used to do before.
Comment 23 EWS 2022-04-18 11:31:24 PDT
Committed r292961 (249726@main): <https://commits.webkit.org/249726@main>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 457727 [details].