|Summary:||Crash in NSURLConnectionInternal, messaging released WebView|
|Product:||WebKit||Reporter:||Daniel Jalkut <jalkut>|
|Component:||Page Loading||Assignee:||Nobody <webkit-unassigned>|
|Severity:||Normal||CC:||andersca, ap, buildbot, commit-queue, darin, rniwa|
|Version:||528+ (Nightly build)|
Description Daniel Jalkut 2014-11-05 15:06:13 PST
Created attachment 241058 [details] Crash log for the typical crashing case described here I've been looking into a crash that affects real-world users of my app, MarsEdit. The nuance of this bug is that a customer experiences crashes in my app (which features a WebView-based rich editor) only when they embed HTML code from Microsoft's "OneDrive" service. I have confirmed that the crash is not specific to my app by reproducing it easily with a trunk build of WebKit, running MiniBrowser in WebKitLegacy mode: 1. Build WebKit from the command line: ./Tools/Scripts/build-webkit --debug 2. Run ./Tools/Scripts/run-minibrowser --debug 3. Open a new window in MiniBrowser and load this URL: https://onedrive.live.com/embed?cid=481F46B257143886&resid=481F46B257143886%21125&authkey=ADg4zT2NXK6xtIw&em=2 3. Close the window while the view is loading or shortly after it has completed. The bug reproduces pretty easily upon closing the window, but to make it even more likely to occur, try closing it in the midst of the view's content loading. I'm attaching a crash log that shows the issue is typically that the WebView associated with the closed window is deallocated by the time an active NSURLConnection's delegate dispatch attempts to call back to it. I'm a little confused actually because the crash often appears to occur upon release after a retain by the NSURLConnection, so it may come down to an issue in NSURLConnection, but I saw a few things in ResourceHandleMac that had me concerned: 1. In ResourceHandle::~ResourceHandle, cancel() is not called. It seems like it should be to ensure that any pending NSURLConnection tracked by m_connection is stopped. 2. I suspect that beyond calling cancel, it may be necessary to mirror the calls to scheduleInRunLoop:... from ResourceHandle::start, with calls to unscheduleFromRunLoop:... documentation in NSURLConnection is a little contradictory but I think the safest best is to unschedule the connection from runloops it was scheduled to, to avoid any unwanted delegate dispatch after the delegate may be gone. 3. I noticed that ResourceHandleMac implements ResourceHandle::schedule and ResourceHandle::unschedule, but these methods are not reached in typical cases I've run in MiniBrowser. I wonder if these are vestigial from a past implementation that didn't rely upon the scheduling that occurs in ::start? If so they can probably be removed to help clarify the obligation of ResourceHandleMac to unschedule m_connection only from the runloops it scheduled in ::start. Now the bad news: even after adding to ResourceHandle::~ResourceHandle a call to close() and a mirror of the code from ::start that schedules the connection on the runloop, except to unschedule from the run loop, the crashing behavior is still occurring for me. So while I think the analysis above has some truth to it, I clearly don't understand the situation well enough to completely remedy the problem. I hope my initial work helps somebody more experienced with ResourceHandleMac to speculate about what's going on in this crash and hopefully come up with a fix. If you have advice for me to pursue creation of a patch on my own I'd be happy to help with that.
Comment 1 Daniel Jalkut 2014-11-05 15:08:01 PST
I forgot to mention as far as reproducing the bug: try opening a new window if it doesn't crash right away. In the worst case, try opening a new window and loading the same URL again, closing, repeat with subtly different timing until it crashes.
Comment 2 Alexey Proskuryakov 2014-11-05 22:17:04 PST
I'll need to refresh my memory on how ResourceHandleMac destruction works. In the meanwhile, it appears that there is an over-release of WebView in MiniBrowser that's unrelated to resource loading. See what happens with zombies: $ NSZombieEnabled=YES run-minibrowser 2014-11-05 22:13:32.630 MiniBrowser[35650:3221420] *** -[WebView release]: message sent to deallocated instance 0x169bb4f60 #0 0x00007fff8ac34980 in ___forwarding___ () #1 0x00007fff8ac345f8 in __forwarding_prep_0___ () #2 0x00007fff8c63a91f in (anonymous namespace)::AutoreleasePoolPage::pop(void*) () #3 0x00007fff8abc2272 in _CFAutoreleasePoolPop () #4 0x00007fff9323914f in -[NSAutoreleasePool drain] () #5 0x00007fff9200c041 in -[NSApplication run] () #6 0x00007fff91ff7424 in NSApplicationMain () #7 0x00000001019e4c02 in main at /Users/ap/Safari/OpenSource/Tools/MiniBrowser/mac/main.m:30
Comment 3 Alexey Proskuryakov 2014-11-06 00:13:20 PST
Comment 4 Alexey Proskuryakov 2014-11-06 00:16:37 PST
We can of course add a boolean data member for "isBeingDeallocated", and skip dangerous work in -_close if it's set, but that would result in not sending a notification when closing a MiniBrowser window.
Comment 5 Daniel Jalkut 2014-11-06 07:08:42 PST
Thanks for taking a look at this, Alexey. Indeed, the crashes I'm having in MarsEdit are also caused when closing windows. When I saw the same crash in MiniBrowser it encouraged me that it's an issue internal to WebView, WebCore, or maybe NSURLConnection/CFNetwork. But you're right that the zombie element of the equation is perplexing. I don't think the over-releasing of the WebView happens though unless the problematic web content is loaded into the frame. And I am able to reproduce the problem using a new document-based Cocoa app whose document window has a xib-instantiated WebView that has the problematic content loaded in at nibDidLoad time. This makes me think that something in the WebKit realm or below is over-releasing the WebView, and maybe the perceived messiness in ResourceHandleMac is not really a problem so long as the WebView sticks around for as long as it should.
Comment 6 Daniel Jalkut 2014-11-06 07:19:46 PST
What's most suspicious about an Instruments "Zombie" audit of the crash is the WebView in question is logged as being released twice by the "CFNetwork" library, but it is never credited for any retains. They are the last two recorded events for the object, culminating in the overrelease, each coming from: [NSURLConnectionInternal _withConnectionAndDelegate:onlyActive:]
Comment 7 Alexey Proskuryakov 2014-11-06 09:07:30 PST
> [NSURLConnectionInternal _withConnectionAndDelegate:onlyActive:] The history view of Instruments is misleading, one needs to look at actual stack traces in its right pane. Specifically, this release that is credited to NSURLConnectionInternal is actually just draining an autorelease pool, and the -autorelease call happens in the expected correct place.
Comment 8 Daniel Jalkut 2014-11-06 09:15:44 PST
Yeah, but looking at the code in NSURLConnectionInternal, it's draining an AutoRelease pool that it pushed on the stack itself. It seems to be trying to manage the retain/release of itself and the delegate. I'm going to keep poking around and see if I can wrap my head around the numerous retain/releases and pair them off manually.
Comment 9 Daniel Jalkut 2014-11-06 12:12:38 PST
Created attachment 241122 [details] Simple patch to wrap WebView's dealloc method with an autorelease pool
Comment 10 Daniel Jalkut 2014-11-06 12:13:41 PST
Comment 11 Daniel Jalkut 2014-11-06 12:14:14 PST
Created attachment 241124 [details] Stack trace showing how the same dealloc method series of events leads to autorelease the WebView
Comment 12 Daniel Jalkut 2014-11-06 12:20:29 PST
Comment 13 Daniel Jalkut 2014-11-06 12:41:01 PST
Created attachment 241125 [details] Wrap the bulk of WebView's dealloc method with an autorelease pool to prevent self being autoreleased after dealloc
Comment 14 Daniel Jalkut 2014-11-06 12:58:38 PST
Created attachment 241127 [details] Proposed fix including ChangeLog update I am amenable to the idea of writing an API test for this but I have never done so and ran into a snag trying to get TestWebKitAPI to build. I'm probably done for the day but I may find the energy to tackle that tomorrow.
Comment 15 WebKit Commit Bot 2014-11-06 18:11:15 PST
Comment on attachment 241127 [details] Proposed fix including ChangeLog update Clearing flags on attachment: 241127 Committed r175733: <http://trac.webkit.org/changeset/175733>
Comment 16 WebKit Commit Bot 2014-11-06 18:11:20 PST
All reviewed patches have been landed. Closing bug.
Comment 17 Darin Adler 2014-11-07 09:00:45 PST
This is a fine workaround, but I am pretty sure it’s illegal to retain an object that is already inside its dealloc method. So making it non-fatal with this autorelease pool is one thing, but another even better thing to do would be to fix the code path inside dealloc that ends up retaining the WebView. We could debug this by adding an override of retain, an "in dealloc" flag, and asserting that we are not in dealloc inside the retain before calling super. Then find a adjust the code so it doesn't retain/release itself.
Comment 18 Alexey Proskuryakov 2014-11-07 09:59:16 PST
Comment 19 Daniel Jalkut 2014-11-07 11:05:08 PST
Comment 20 Darin Adler 2014-11-07 15:28:27 PST
The real point is that these call out to the app, which could then retain the WebView. The app has no way of knowing that the WebView is not already retained. I suggest we find a way to change the semantics of the WebView in the future so that if a client doesn’t explicitly call close the WebViewWillCloseNotification is not sent, and the unload event is not sent to the webpage either. Then this bug could still come up, but only for old clients, and eventually we can phase the behavior out entirely.
Comment 21 Anders Carlsson 2014-11-11 08:47:48 PST
(In reply to comment #20) > The real point is that these call out to the app, which could then retain > the WebView. The app has no way of knowing that the WebView is not already > retained. > > I suggest we find a way to change the semantics of the WebView in the future > so that if a client doesn’t explicitly call close the > WebViewWillCloseNotification is not sent, and the unload event is not sent > to the webpage either. We should also figure out why WebViewWillCloseNotification was added, and if anyone is still using it (it's SPI). > > Then this bug could still come up, but only for old clients, and eventually > we can phase the behavior out entirely. Agreed. We should also figure out what our story should be for the modern API. Right now we don't have an explicit close method, we just invalidate everything on dealloc. (FWIW, I think other classes such as NSWindow have hacks in place to handle being retained from inside dealloc).