Bug 222336

Summary: [Cocoa] Make it possible to release a WKWebView on a non-main thread without a crash due to WKScriptMessage race
Product: WebKit Reporter: Joshua Wise <joshua.wise>
Component: WebKit APIAssignee: Darin Adler <darin>
Status: RESOLVED FIXED    
Severity: Normal CC: achristensen, beidson, benjamin, cdumez, cmarcelo, darin, eric.carlson, ews-watchlist, ggaren, glenn, jer.noble, jiewen_tan, joshua.wise, keith_miller, mark.lam, msaboff, nathan, philipj, saam, sergio, timothy, tzagallo, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: Safari 14   
Hardware: iPhone / iPad   
OS: iOS 14   
See Also: https://bugs.webkit.org/show_bug.cgi?id=223013
https://bugs.webkit.org/show_bug.cgi?id=188955
Attachments:
Description Flags
Minimized reproducer for WKScriptMessage _initWithBody / WKWebView release race.
none
Patch
ews-feeder: commit-queue-
Patch
none
Patch
none
Patch cdumez: review+

Description Joshua Wise 2021-02-23 16:15:41 PST
Created attachment 421367 [details]
Minimized reproducer for WKScriptMessage _initWithBody / WKWebView release race.

If a `WKWebView` is destroyed while the main thread is processing a `WKScriptMessage` that is sent from the JavaScript to the embedder, then the weak reference to the `WKWebView` as part of `[WKScriptMessage _initWithBody]` can fall apart, with the following stack trace:

* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
    frame #0: 0x00000001c6fe21f0 libsystem_kernel.dylib`__abort_with_payload + 8
    frame #1: 0x00000001c6fe69c0 libsystem_kernel.dylib`abort_with_payload_wrapper_internal + 100
    frame #2: 0x00000001c6fe695c libsystem_kernel.dylib`abort_with_reason + 28
    frame #3: 0x00000001af7ca4cc libobjc.A.dylib`_objc_fatalv(unsigned long long, unsigned long long, char const*, char*) + 104
    frame #4: 0x00000001af7ca464 libobjc.A.dylib`_objc_fatal(char const*, ...) + 28
    frame #5: 0x00000001af7c61ec libobjc.A.dylib`weak_register_no_lock + 344
    frame #6: 0x00000001af7c7ff0 libobjc.A.dylib`objc_storeWeak + 348
    frame #7: 0x00000001a72d3ed4 WebKit`-[WKScriptMessage _initWithBody:webView:frameInfo:name:world:] + 136
    frame #8: 0x00000001a72de3fc WebKit`ScriptMessageHandlerDelegate::didPostMessage(WebKit::WebPageProxy&, WebKit::FrameInfoData&&, API::ContentWorld&, WebCore::SerializedScriptValue&) + 204
    frame #9: 0x00000001a74cbeb8 WebKit`WebKit::WebUserContentControllerProxy::didPostMessage(WTF::ObjectIdentifier<WebKit::WebPageProxyIdentifierType>, WebKit::FrameInfoData&&, unsigned long long, IPC::DataReference const&, WTF::CompletionHandler<void (IPC::DataReference const&, WTF::String const&)>&&) + 556
    frame #10: 0x00000001a76f07c8 WebKit`void IPC::handleMessageAsync<Messages::WebUserContentControllerProxy::DidPostMessage, WebKit::WebUserContentControllerProxy, void (WebKit::WebUserContentControllerProxy::*)(WTF::ObjectIdentifier<WebKit::WebPageProxyIdentifierType>, WebKit::FrameInfoData&&, unsigned long long, IPC::DataReference const&, WTF::CompletionHandler<void (IPC::DataReference const&, WTF::String const&)>&&)>(IPC::Connection&, IPC::Decoder&, WebKit::WebUserContentControllerProxy*, void (WebKit::WebUserContentControllerProxy::*)(WTF::ObjectIdentifier<WebKit::WebPageProxyIdentifierType>, WebKit::FrameInfoData&&, unsigned long long, IPC::DataReference const&, WTF::CompletionHandler<void (IPC::DataReference const&, WTF::String const&)>&&)) + 3232
    frame #11: 0x00000001a76efb14 WebKit`WebKit::WebUserContentControllerProxy::didReceiveMessage(IPC::Connection&, IPC::Decoder&) + 76
    frame #12: 0x00000001a70e666c WebKit`IPC::MessageReceiverMap::dispatchMessage(IPC::Connection&, IPC::Decoder&) + 116
    frame #13: 0x00000001a73f4584 WebKit`WebKit::WebProcessProxy::didReceiveMessage(IPC::Connection&, IPC::Decoder&) + 36
    frame #14: 0x00000001a70ca218 WebKit`IPC::Connection::dispatchMessage(std::__1::unique_ptr<IPC::Decoder, std::__1::default_delete<IPC::Decoder> >) + 596
    frame #15: 0x00000001a70c9b5c WebKit`IPC::Connection::dispatchIncomingMessages() + 492
    frame #16: 0x00000001a4f38bcc JavaScriptCore`WTF::RunLoop::performWork() + 528
    frame #17: 0x00000001a4f39718 JavaScriptCore`WTF::RunLoop::performWork(void*) + 32
    frame #18: 0x000000019b3a4bf0 CoreFoundation`__CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ + 24

It is generally considered to be bad etiquette in the UIKit world to call a function on a UIView without being on the main thread.  However, I would argue that this is something of an exception: in this case, `[WKWebView release]` is the function that races, which means that not only is it unsafe to call a method on a WKWebView, but it's unsafe even to hold a strong reference to a WKWebView (or to an object that holds a strong reference to a WKWebView deep in its hierarchy!) and do nothing with it.  This probably should be something that is safe to do.

Attached is a minimized reproducer for this crash.  On my iPod Touch (7th Generation), it crashes in about 5 to 10 seconds.
Comment 1 Radar WebKit Bug Importer 2021-03-02 16:16:24 PST
<rdar://problem/74955196>
Comment 2 Chris Dumez 2021-03-09 16:07:56 PST
The test case seems to be using the WKWebView API off the main thread, which is not supported. The WKWebView API is main thread only. I am not convinced there is a bug here.
Comment 3 Chris Dumez 2021-03-09 16:09:18 PST
Please let me know if I misread the test case or misunderstood the issue.
Comment 4 Joshua Wise 2021-03-17 01:01:47 PDT
Hi Chris --

In general, I agree with you: the UIKit API (and the WKWebView API) shouldn't be called off the main thread.  I think that `release`ing -- and, by extension, `dealloc`ing -- a WKWebView needs to be a special exception to this rule, though.  It is very difficult to avoid accidentally causing a `[WKWebView dealloc]` off of the main thread: although in this minimal reproducer case, a `WKWebView` is directly held off the main thread, consider the case in which another object deeply holds a strong reference to a `WKWebView`, and another thread happens to hold a strong reference to this object.

It seems like an exceptionally non-obvious pitfall that this can go wrong, along three axes of surprise for a programmer:

1) There is no textual representation in the source that a disallowed use of `WKWebView` will occur: even if a block does not directly make mention of a `WKWebView`, the failure can occur.  (Phrased another way, it would be expected that it is unsafe to use a `WKWebView` -- but it's not expected that it's unsafe to even "think about" a `WKWebView`.)

2) Although this is disallowed behavior for other `UIKit` APIs (and objects that derive from `UIView`), in practice, deallocation from off the main thread does not seem to fail for other `UIKit` APIs.  (This is, admittedly, weak justification.  But programmers are, if nothing else, pattern recognition machines...)

3) There is a violation of an otherwise-reasonable assumption in a programmer's mental model that ARC-managed objects should be, in general, responsible for avoiding wild pointers (i.e., that an ARC-managed object, if it needs to have a longer retain lifecycle than a local scope allows, should be managed by the holder of that pointer).

Although `WKWebView` is technically *allowed* to fail in this fashion by its specification, it is probably not a good idea for it to do so nondeterministically: it should either be tolerant of this misuse, or reliably crash or emit a diagnostic on this misuse [1].

Technically, it is probably not a bug, if the defect metric is "does WKWebView correctly implement the specification".  But if the question is "does WKWebView present an interface that avoids footguns", I would probably classify this as a defect.

Thanks for your consideration --
joshua

[1] One way that `WKWebView` could emit a diagnostic is particularly unpleasant: it could implement `[WKWebView release]` on its own (eventually calling into `[NSObject release]`), and call into the particularly-reviled `[self retainCount]` to determine whether the system is about to deallocate the object.  If it is, and the `release` operation is happening off the main thread -- i.e., `[self retainCount] == 1 && ![NSThread isMainThread]` -- then WKWebView could subsequently assert, or minimally, emit a warning.

This relies on additional undefined and unpleasant behavior -- `[NSObject retainCount]` is specified to be unspecified.  But not only does it *happen* to work for this application, examination of the Objective-C runtime reveals that it *will* work for this application, too.  Tests of this by swizzling `[WKWebView release]` reliably catch this behavior in user applications.  (Astute observers may note that this cannot be 100% reliable, and that there is another -- smaller -- race condition here.  That's true, but the diagnostic now occurs when we 'win' the race, not when we 'lose', and as a best-effort method to help users avoid API misuse, that's probably good enough.)

All that said, doing that is not pleasant.  It may just be better to make WKWebView tolerant of this misuse :-)
Comment 5 Chris Dumez 2021-03-17 07:32:06 PDT
(In reply to Joshua Wise from comment #4)
> Hi Chris --
> 
> In general, I agree with you: the UIKit API (and the WKWebView API)
> shouldn't be called off the main thread.  I think that `release`ing -- and,
> by extension, `dealloc`ing -- a WKWebView needs to be a special exception to
> this rule, though.  It is very difficult to avoid accidentally causing a
> `[WKWebView dealloc]` off of the main thread: although in this minimal
> reproducer case, a `WKWebView` is directly held off the main thread,
> consider the case in which another object deeply holds a strong reference to
> a `WKWebView`, and another thread happens to hold a strong reference to this
> object.
> 
> It seems like an exceptionally non-obvious pitfall that this can go wrong,
> along three axes of surprise for a programmer:
> 
> 1) There is no textual representation in the source that a disallowed use of
> `WKWebView` will occur: even if a block does not directly make mention of a
> `WKWebView`, the failure can occur.  (Phrased another way, it would be
> expected that it is unsafe to use a `WKWebView` -- but it's not expected
> that it's unsafe to even "think about" a `WKWebView`.)
> 
> 2) Although this is disallowed behavior for other `UIKit` APIs (and objects
> that derive from `UIView`), in practice, deallocation from off the main
> thread does not seem to fail for other `UIKit` APIs.  (This is, admittedly,
> weak justification.  But programmers are, if nothing else, pattern
> recognition machines...)
> 
> 3) There is a violation of an otherwise-reasonable assumption in a
> programmer's mental model that ARC-managed objects should be, in general,
> responsible for avoiding wild pointers (i.e., that an ARC-managed object, if
> it needs to have a longer retain lifecycle than a local scope allows, should
> be managed by the holder of that pointer).
> 
> Although `WKWebView` is technically *allowed* to fail in this fashion by its
> specification, it is probably not a good idea for it to do so
> nondeterministically: it should either be tolerant of this misuse, or
> reliably crash or emit a diagnostic on this misuse [1].
> 
> Technically, it is probably not a bug, if the defect metric is "does
> WKWebView correctly implement the specification".  But if the question is
> "does WKWebView present an interface that avoids footguns", I would probably
> classify this as a defect.
> 
> Thanks for your consideration --
> joshua
> 
> [1] One way that `WKWebView` could emit a diagnostic is particularly
> unpleasant: it could implement `[WKWebView release]` on its own (eventually
> calling into `[NSObject release]`), and call into the particularly-reviled
> `[self retainCount]` to determine whether the system is about to deallocate
> the object.  If it is, and the `release` operation is happening off the main
> thread -- i.e., `[self retainCount] == 1 && ![NSThread isMainThread]` --
> then WKWebView could subsequently assert, or minimally, emit a warning.
> 
> This relies on additional undefined and unpleasant behavior -- `[NSObject
> retainCount]` is specified to be unspecified.  But not only does it *happen*
> to work for this application, examination of the Objective-C runtime reveals
> that it *will* work for this application, too.  Tests of this by swizzling
> `[WKWebView release]` reliably catch this behavior in user applications. 
> (Astute observers may note that this cannot be 100% reliable, and that there
> is another -- smaller -- race condition here.  That's true, but the
> diagnostic now occurs when we 'win' the race, not when we 'lose', and as a
> best-effort method to help users avoid API misuse, that's probably good
> enough.)
> 
> All that said, doing that is not pleasant.  It may just be better to make
> WKWebView tolerant of this misuse :-)

Deallocating a WKWebView (and other WK API objects) on a non-main thread was made safe very recently via Bug 223013.
Comment 6 Joshua Wise 2021-03-17 10:34:44 PDT
Hi Chris --

*Very* interesting!

Although I don't have an environment to test this locally, I strongly suspect that if you run this reproducer against a WebKit that has been patched with Bug 223013, however, you will discover that the crash still occurs.  When I tried the thing I mentioned in [1] in comment #4 below, I initially implemented this as a swizzle on `[WKWebView dealloc]` that teleported the `dealloc` to the main thread (much as Bug 223013 does), rather than `[WKWebView release]`, and I found that I would still crash inside of `weak_register_no_lock`.

The reason why is that deallocation of an ARC or MRC managed object actually happens in two phases in Objective-C (much to my consternation...).  The base implementation of `[NSObject release]` is implemented as the C++ function `objc_object::rootRelease`: https://github.com/opensource-apple/objc4/blob/cd5e62a5597ea7a31dccef089317abb3a661c154/runtime/objc-object.h#L567 .  But, as you note in the linked line, just *before* it calls into `[self dealloc]`, it sets a flag in the object's internal state, `deallocating`!

Unfortunately, the core implementation of a weak pointer store -- weak_register_no_lock -- checks for this `deallocating` flag as set by `release`, not as set by `dealloc`: https://github.com/opensource-apple/objc4/blob/master/runtime/objc-weak.mm#L402

So teleporting `dealloc` to the main thread is not sufficient, and instead we need to teleport `release`.

I will make reference to this in Bug 223013 too.

Thanks,
joshua
Comment 7 Geoffrey Garen 2021-03-17 11:00:37 PDT
Thanks for clarifying.

I think this issue is more narrow than general -release thread safety. I think this issue might be the use of __unsafe_unretained here:

static HashMap<WebKit::WebPageProxy*, __unsafe_unretained WKWebView *>& pageToViewMap()
{
    static NeverDestroyed<HashMap<WebKit::WebPageProxy*, __unsafe_unretained WKWebView *>> map;
    return map;
}

-[WKScriptMessage _initWithBody...] is correctly signaling the error that we're storing a raw pointer into a weak pointer after the pointed-to object has initiated destruction.

The trivial solution would be to store a weak pointer in the map. Maybe that would just work. But I'm not sure why we chose __unsafe_unretained in the first place.
Comment 8 Chris Dumez 2021-03-17 11:04:45 PDT
Reopening since it appears some issues persist after Bug 223013.
Comment 9 Darin Adler 2021-03-17 11:07:01 PDT
We definitely will *not* fix this by "teleporting" a release to the main thread!

I think we should consider using an NSMapTable for the pageToViewMap, to fix this.
Comment 10 Darin Adler 2021-03-17 11:10:44 PDT
This is not a bug, exactly.

Historically, the modern WebKit has *not* supported releasing its objects on non-main threads. Any app running into this can and should fix this by doing release of WebKit objects only on the main thread.

However, it would be nice to allow releasing WebKit objects on any thread. That's why we made the changes in bug 223013, and we *can* fix this issue as well. It would be nice to make WebKit programming easier.
Comment 11 Darin Adler 2021-03-17 11:32:42 PDT
Retitled this to reflect the enhancement aspect of the bug.
Comment 12 Darin Adler 2021-03-17 11:34:16 PDT
Releasing an object off the main thread is *not* an exception to the threading rule at this time, but in a future release we’d like it to be, as an enhancement.

It’s possible and practical to avoid releasing a WKWebView on a non-main thread and any program expecting to work correctly with today’s iOS must do that.
Comment 13 Joshua Wise 2021-03-17 12:31:35 PDT
Thanks, all, for the considered feedback.

(In reply to Geoffrey Garen from comment #7)
> -[WKScriptMessage _initWithBody...] is correctly signaling the error that
> we're storing a raw pointer into a weak pointer after the pointed-to object
> has initiated destruction.
> 
> The trivial solution would be to store a weak pointer in the map. Maybe that
> would just work. But I'm not sure why we chose __unsafe_unretained in the
> first place.

One of the reasons why I did not propose a patch to fix this is that I didn't really understand the semantics of thread safety of weak pointers *in general* in Objective-C.  When I was reading through the Objective-C runtime, it wasn't obvious to me that, for instance:

  __weak id foo1; /* already assigned to something */
  __weak id foo2 = foo1;

cannot race with the destruction of p in a way that could cause a very similar crash.

I could not find a reference anywhere as to what the semantics of this are supposed to be...

(In reply to Darin Adler from comment #9)
> We definitely will *not* fix this by "teleporting" a release to the main
> thread!

I agree :) I actually do "teleport" in a swizzle now to keep misbehaving customer apps from crashing.  (Sorry.)  But I would rather not.

(In reply to Darin Adler from comment #12)
> It’s possible and practical to avoid releasing a WKWebView on a non-main
> thread and any program expecting to work correctly with today’s iOS must do
> that.

This is probably the most important thing here.  This is a fix that will make things better going forward, but it won't fix issues on shipping iOS -- and, as a result, applications need to be careful of this *anyway* for the foreseeable future.

The genesis of this is that I have a framework that can inject WKScriptMessages into existing WKWebViews -- and, as a result, applications that don't crash normally (but still rely on this undefined and disallowed behavior) begin crashing with this race condition, and the customers of this framework blame me :)

The fix that I think might be more palatable to you would be to detect this condition in -release, and rather than "teleporting" -release, instantly abort() when it's called off the main thread.  This allows end users to quickly detect in development when something is about to go wrong (and fix it), and, more importantly, means they'll stop blaming me :)
Comment 14 Chris Dumez 2021-03-17 14:14:50 PDT
The test project indeed still crashes with WebKit trunk:
Exception Type:  EXC_CRASH (SIGABRT)
Exception Codes: 0x0000000000000000, 0x0000000000000000
Termination Reason: Namespace OBJC, Code 0x1
Termination Description: OBJC, Cannot form weak reference to instance (0x109820c00) of class WKWebView. It is possible that this object was over-released, or is in the process of deallocation.
Triggered by Thread:  0

Thread 0 name:  Dispatch queue: com.apple.main-thread
Thread 0 Crashed:
0   libsystem_kernel.dylib        	0x00000001beb5dfd8 __abort_with_payload + 8
1   libsystem_kernel.dylib        	0x00000001beb62848 abort_with_payload_wrapper_internal + 100
2   libsystem_kernel.dylib        	0x00000001beb627e4 abort_with_payload_wrapper_internal + 0
3   libobjc.A.dylib               	0x00000001a4ebc7b4 _objc_fatalv(unsigned long long, unsigned long long, char const*, char*) + 104
4   libobjc.A.dylib               	0x00000001a4ebc74c _objc_fatalv(unsigned long long, unsigned long long, char const*, char*) + 0
5   libobjc.A.dylib               	0x00000001a4eb8cec weak_clear_no_lock + 0
6   libobjc.A.dylib               	0x00000001a4ebaacc objc_storeWeak + 344
7   WebKit                        	0x0000000106a1979c -[WKScriptMessage _initWithBody:webView:frameInfo:name:world:] + 136
8   WebKit                        	0x0000000106a1b888 ScriptMessageHandlerDelegate::didPostMessage(WebKit::WebPageProxy&, WebKit::FrameInfoData&&, API::ContentWorld&, WebCore::SerializedScriptValue&) + 196
9   WebKit                        	0x0000000106c23ddc WebKit::WebUserContentControllerProxy::didPostMessage(WTF::ObjectIdentifier<WebKit::WebPageProxyIdentifierType>, WebKit::FrameInfoData&&, unsigned long long, IPC::ArrayReference<unsigned char, 18446744073709551615ul> const&, WTF::CompletionHandler<void (IPC::ArrayReference<unsigned char, 18446744073709551615ul> const&, WTF::String const&)>&&) + 364
10  WebKit                        	0x0000000106efaeb4 void IPC::handleMessageAsync<Messages::WebUserContentControllerProxy::DidPostMessage, WebKit::WebUserContentControllerProxy, void (WebKit::WebUserContentControllerProxy::*)(WTF::ObjectIdentifier<WebKit::WebPageProxyIdentifierType>, WebKit::FrameInfoData&&, unsigned long long, IPC::ArrayReference<unsigned char, 18446744073709551615ul> const&, WTF::CompletionHandler<void (IPC::ArrayReference<unsigned char, 18446744073709551615ul> const&, WTF::String const&)>&&)>(IPC::Connection&, IPC::Decoder&, WebKit::WebUserContentControllerProxy*, void (WebKit::WebUserContentControllerProxy::*)(WTF::ObjectIdentifier<WebKit::WebPageProxyIdentifierType>, WebKit::FrameInfoData&&, unsigned long long, IPC::ArrayReference<unsigned char, 18446744073709551615ul> const&, WTF::CompletionHandler<void (IPC::ArrayReference<unsigned char, 18446744073709551615ul> const&, WTF::String const&)>&&)) + 200
11  WebKit                        	0x0000000106efadc4 WebKit::WebUserContentControllerProxy::didReceiveMessage(IPC::Connection&, IPC::Decoder&) + 72
12  WebKit                        	0x00000001067d3a64 IPC::MessageReceiverMap::dispatchMessage(IPC::Connection&, IPC::Decoder&) + 260
13  WebKit                        	0x0000000106b704ec WebKit::WebProcessProxy::didReceiveMessage(IPC::Connection&, IPC::Decoder&) + 32
14  WebKit                        	0x00000001067b27d0 IPC::Connection::dispatchMessage(std::__1::unique_ptr<IPC::Decoder, std::__1::default_delete<IPC::Decoder> >) + 352
15  WebKit                        	0x00000001067b1e08 IPC::Connection::dispatchIncomingMessages() + 396
16  JavaScriptCore                	0x000000010a15aa34 WTF::RunLoop::performWork() + 368
17  JavaScriptCore                	0x000000010a15b30c WTF::RunLoop::performWork(void*) + 36
18  CoreFoundation                	0x000000018f3e5808 __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ + 24
19  CoreFoundation                	0x000000018f3e5704 __CFRunLoopDoSource0 + 204
20  CoreFoundation                	0x000000018f3e4a78 __CFRunLoopDoSources0 + 256
21  CoreFoundation                	0x000000018f3def68 __CFRunLoopRun + 768
22  CoreFoundation                	0x000000018f3de724 CFRunLoopRunSpecific + 572
23  GraphicsServices              	0x00000001a7a29be4 GSEventRunModal + 160
24  UIKitCore                     	0x0000000192101d50 -[UIApplication _run] + 1064
25  UIKitCore                     	0x000000019210817c UIApplicationMain + 1904
26  WKScriptMessage race          	0x00000001044bddb4 0x1044b8000 + 23988
27  dyld                          	0x00000001045c5248 start + 564
Comment 15 Darin Adler 2021-03-17 16:43:48 PDT
To fix this we will change fromWebPageProxy so it returns a RetainPtr<WKWebView> and so that it can return nil. And as Geoff suggested, use weak object pointers under the hood rather than a hand-implemented alternative with different timing (later in the dealloc method rather than before dealloc is called). Therefore, during the dealloc process, when dealloc has begun but not finished yet, we'll return nil. Callers will all need to be revised to be able to deal with nil.

Separately, we will need to audit to see if we have any other cases like this with back-pointers from C++ code to Objective-C objects that might be in the process of deallocation. I’m hoping this is the only one.
Comment 16 Darin Adler 2021-03-17 20:54:41 PDT
Created attachment 423559 [details]
Patch
Comment 17 Darin Adler 2021-03-17 21:16:09 PDT
Created attachment 423561 [details]
Patch
Comment 18 Chris Dumez 2021-03-18 08:31:22 PDT
Looks like the URLSchemeHandler.SyncXHR API test may have regressed.
Comment 19 Darin Adler 2021-03-18 09:41:40 PDT
Yes, noticed that last night. I’ll be debugging it to find out why when I have time later today.
Comment 20 Darin Adler 2021-03-18 12:54:08 PDT
This failure is an interesting case. The test requires that our WKURLSchemeHandler gets a -webView:stopURLSchemeTask: method called during the process of destroying a WKWebView. It's called because the WKWebView stops loading. The test does not tell the WKWebView to stop loading; instead it just releases the last reference to the WKWebView, and it's the deallocation of the WKWebView that causes the loading to stop. During deallocation, weak references to a WKWebView are already nil. The old code would pass this "zombie" WKWebView pointer to the -webView:stopURLSchemeTask: method. If anyone tried to retain the WKWebView, I believe we would see the same kind of crash as the one described in this bug report, even without a multi-thread race.

So we have choices:

1) Decide that not calling -webView:stopURLSchemeTask: in this case is OK. One risk of this strategy is that this could break real apps if they rely on this kind of behavior as the test does. If we take this option, then I'd need to update the test, perhaps I could use the -[WKWebView stopLoading] method instead of relying on deallocation.

2) Decide that we should call -webView:stopURLSchemeTask: with a WKWebView of nil. This would work fine for the test and likely would also be OK for most real clients too; very few are likely to actually use the WKWebView pointer and we could document this behavior.

3) Decide that we need to preserve this "pass the zombie pointer" behavior for this delegate. We can't do that for cases like WKScriptMessage that retain the WKWebView pointer, but we could do it for this case. Annoying to implement, but not impossibly difficult.
Comment 21 Darin Adler 2021-03-18 18:16:59 PDT
Chose option (1) after talking to some other WebKit contributors in the internal Apple Slack. Geoff Garen also pointed out that we should not throw exceptions from WKURLSchemeHandler if calls are made after a stopURLSchemeTask that was never delivered, so I will make that change too.
Comment 22 Darin Adler 2021-03-19 12:21:28 PDT
Created attachment 423758 [details]
Patch
Comment 23 Chris Dumez 2021-03-19 14:09:22 PDT
It appears URLSchemeHandler.SyncXHR is still failing?
Comment 24 Darin Adler 2021-03-19 18:22:00 PDT
That’s so annoying; it passed locally!
Comment 25 Darin Adler 2021-03-19 18:22:39 PDT
I was able to reproduce the failure locally, and then made the change and it started passing. But I’ll double check my work and figure it out.
Comment 26 Darin Adler 2021-03-19 18:32:43 PDT
I was running the test wrong. Had to use _close instead of stopLoading!
Comment 27 Darin Adler 2021-03-19 18:37:34 PDT
Created attachment 423796 [details]
Patch
Comment 28 Darin Adler 2021-03-22 08:33:38 PDT
OK, a relief. That test is fixed now.
Comment 29 Chris Dumez 2021-03-22 08:45:13 PDT
Comment on attachment 423796 [details]
Patch

r=me
Comment 30 Darin Adler 2021-03-22 11:47:15 PDT
Committed r274764 (235574@main): <https://commits.webkit.org/235574@main>