Bug 226426 - [iOS] UI process crashes when deallocating WKWebView in a script message handler during an active touch event
Summary: [iOS] UI process crashes when deallocating WKWebView in a script message hand...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: UI Events (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Wenson Hsieh
URL:
Keywords: InRadar
: 227088 (view as bug list)
Depends on:
Blocks:
 
Reported: 2021-05-29 15:50 PDT by Wenson Hsieh
Modified: 2021-06-16 13:26 PDT (History)
9 users (show)

See Also:


Attachments
Patch (19.80 KB, patch)
2021-05-29 16:14 PDT, Wenson Hsieh
no flags Details | Formatted Diff | Diff
Patch (20.33 KB, patch)
2021-05-30 08:04 PDT, Wenson Hsieh
darin: review+
Details | Formatted Diff | Diff
Patch for landing (20.38 KB, patch)
2021-05-30 11:23 PDT, Wenson Hsieh
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Wenson Hsieh 2021-05-29 15:50:57 PDT
rdar://75425319
Comment 1 Wenson Hsieh 2021-05-29 16:14:28 PDT
Created attachment 430114 [details]
Patch
Comment 2 Darin Adler 2021-05-29 22:22:25 PDT
Comment on attachment 430114 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=430114&action=review

> Source/WebKit/ChangeLog:22
> +        This is because the async replay block inside `WebPageProxy::handlePreventableTouchEvent()` strongly captures a
> +        reference to the `WebPageProxy`, thus keeping it alive; however, the `WebPageProxy`'s weak pointer to the page
> +        client is nulled out, which causes `WebPageProxy::pageClient()` to crash with a null dereference.
> +
> +        To fix this, we weakly capture `WebPageProxy` instead and return early if it has already been destroyed by the
> +        time we process the completion handler. Note that it's unnecessary to call into `doneDeferringTouch(Start|End)`
> +        to unblock any deferred gestures here, because the process of destroying the content view will call
> +        `-cleanUpInteraction` and remove all deferring gestures from the view, regardless of whether they're still in
> +        Possible state.

This doesn’t seem like it would fix the problem described above reliably. All we need for the crash to happen again is for someone to keep WebPageProxy alive by holding a ref to it after WebPageProxy’s weak pointer to the page client is null.

To fix the problem described above, we need to check the page client for null, not the WebPageProxy.

I don’t think this indirect fix is nearly as good. However, the idea that we should only capture a weak reference seems good in and of itself. It’s just insufficient to fix the "web page proxy could outlive the page client" issue.
Comment 3 Wenson Hsieh 2021-05-30 07:59:31 PDT
Comment on attachment 430114 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=430114&action=review

>> Source/WebKit/ChangeLog:22
>> +        Possible state.
> 
> This doesn’t seem like it would fix the problem described above reliably. All we need for the crash to happen again is for someone to keep WebPageProxy alive by holding a ref to it after WebPageProxy’s weak pointer to the page client is null.
> 
> To fix the problem described above, we need to check the page client for null, not the WebPageProxy.
> 
> I don’t think this indirect fix is nearly as good. However, the idea that we should only capture a weak reference seems good in and of itself. It’s just insufficient to fix the "web page proxy could outlive the page client" issue.

Makes sense — I'll keep the weak capture, but add a null check for `m_pageClient` as well.
Comment 4 Wenson Hsieh 2021-05-30 08:04:58 PDT
Created attachment 430135 [details]
Patch
Comment 5 Darin Adler 2021-05-30 10:55:26 PDT
Comment on attachment 430135 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=430135&action=review

> Source/WebKit/UIProcess/WebPageProxy.cpp:3141
> +            if (!m_pageClient)
> +                return;

I understand why we skip the calls to the page client if it’s null. Very mechanical and direct.

It’s less clear to me why it’s good to skip the call to didReceiveEvent. The reason is that it apparently depends on a non-null page client, and also that all the things it does are unimportant if the page client has already been destroyed.

I am starting to get the impression that there are a lot of calls to pageClient() that are just hoping they are never called after the underlying page has been destroyed, and none of them have a solid guarantee this is the case. Thus this problem is not specific to this particular async reply. For example, take the didReceiveEvent function itself. If it does any calls out to the app that is using WebKit, the perhaps that app can cause the page to be destroyed partway through that function. If so, a check at the top of the function isn’t good enough.

I think there’s a relatively deep problem with this class making assumptions about destruction not happening at an inopportune time. This is probably fixable, but not with changes that are just local to this function.

The patch is OK as is. I also think we are seeing that WebPageProxy’s assumptions about the WebCore::Page lifetime underlying it are flawed, and we could fix this up more completely so it doesn’t bite us in future.
Comment 6 Darin Adler 2021-05-30 10:56:05 PDT
> and none of them have a solid guarantee

and many of them do not have a solid guarantee
Comment 7 Wenson Hsieh 2021-05-30 11:20:10 PDT
Thanks for the review!

Yeah, it wasn't clear to me exactly when it's safe to assume the existence of m_pageClient inside WebPageProxy — it looks like there are only two other places that are robust to a null PageClient in WebPageProxy at the moment. After a bit of archaeology, it looks like m_pageClient was turned into a WeakPtr in trac.webkit.org/r235903, which suggests that the WebPageProxy wasn't originally intended to outlive its PageClient, as it was previously just a PageClient&. From code inspection, the only way this could happen is if the WebPageProxy is kept alive via strong reference (as was done here prior to the fix).

> It’s less clear to me why it’s good to skip the call to didReceiveEvent.

That's true — there isn't really a compelling reason for this, since didReceiveEvent() does not require a non-null m_pageClient in the case where we've handled a touch event. I will move the null check down by 1 line (i.e. after the didReceiveEvent() call) before landing.
Comment 8 Wenson Hsieh 2021-05-30 11:23:31 PDT
Created attachment 430138 [details]
Patch for landing
Comment 9 Darin Adler 2021-05-30 11:38:13 PDT
(In reply to Wenson Hsieh from comment #7)
> From code inspection, the only way this could happen is
> if the WebPageProxy is kept alive via strong reference

OK, but it’s normal for a reference-counted object to be kept alive via a reference. If for this class, it’s not normal, then the right thing to do could be to remove the reference counting from WebPageProxy and use a single owning pointer and WeakPtr instead.
Comment 10 EWS 2021-05-30 12:19:03 PDT
Committed r278254 (238291@main): <https://commits.webkit.org/238291@main>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 430138 [details].
Comment 11 ma 2021-06-11 04:03:48 PDT
Dear 

I found that it still crash even though my codes releated to messageHandlers have been removed. it really happened with message handlers?
Comment 12 Wenson Hsieh 2021-06-11 07:25:30 PDT
(In reply to ma from comment #11)
> Dear 
> 
> I found that it still crash even though my codes releated to messageHandlers
> have been removed. it really happened with message handlers?

Hi,

If you're seeing crashes underneath the WebKit framework, I suggest that you file a Bugzilla bug with a crash report (and, if possible, a simple test app as well and/or repro steps).

Thanks!
Comment 13 Konstantinos Natsios 2021-06-15 01:07:23 PDT
Is this patch going live in the next iOS version?
If so how can we deal with the crashes that we currently have in iOS 15?
Comment 14 Darin Adler 2021-06-15 09:02:41 PDT
(In reply to Konstantinos Natsios from comment #13)
> Is this patch going live in the next iOS version?

Yes. This fix is included in the iOS 15 developer beta.

> If so how can we deal with the crashes that we currently have in iOS 15?

I’m not sure I understand the question.

If you are seeing crashes you could report the bugs. As Wenson mentioned, to have the highest chance people will understand and be able to fix the bug from your report, we will want a crash report, a simple test app, and steps to reproduce.

Please don’t assume that any crash you see is this same bug.
Comment 15 Ali Juma 2021-06-16 13:26:26 PDT
*** Bug 227088 has been marked as a duplicate of this bug. ***