Bug 46579

Summary: Reproducible crash in appcache code when closing pgatour.com
Product: WebKit Reporter: Jeff Johnson <opendarwin>
Component: Page LoadingAssignee: Nate Chapin <japhet>
Status: VERIFIED FIXED    
Severity: Critical CC: ap, beidson, ddkilzer, fishd, japhet, laszlo.gombos, yael
Priority: P1 Keywords: InRadar
Version: 528+ (Nightly build)   
Hardware: Mac (Intel)   
OS: OS X 10.6   
URL: http://www.pgatour.com/
Attachments:
Description Flags
Backtrace from gdb
none
Safari crash log
none
webarchive of http://pgatour.com/
none
Webarchive converted to "web page, complete" format
none
patch
ap: review-, ap: commit-queue-
stack trace for starting the load
none
Patch that was discussed offline but never uploaded due to test-case reduction.
none
another test
none
patch - reorder calls in FrameLoader::detachFromParent() ap: review+

Description Jeff Johnson 2010-09-26 11:53:31 PDT
Overview:
When I load http://www.pgatour.com/ in Safari and close the window or tab containing the page, Safari crashes.

Steps to reproduce:
1) Launch Safari
2) Load http://www.pgatour.com/
3) Close window (either command-w or clicking the close widget)

Build Date & Platform:
Mac OS X 10.6.4 Build 10F2521, Safari 5.0.2 (6533.18.5), Mac Pro 3.33 GHz 6-core

Additional Builds and Platforms:
WebKit Debug build from source at svn r68344.

Additional Information:
Attached crash log from Safari and backtrace from gdb.
Comment 1 Jeff Johnson 2010-09-26 11:54:02 PDT
Created attachment 68852 [details]
Backtrace from gdb
Comment 2 Jeff Johnson 2010-09-26 11:55:07 PDT
Created attachment 68853 [details]
Safari crash log
Comment 3 Jeff Johnson 2010-09-26 12:21:44 PDT
It's crashing here:

    return m_page ? m_page->settings() : 0

obviously because m_page has already been deallocated. That happened at WebView.mm:1140 (delete page).

Here are some values in gdb from the backtrace in connection:didFailWithError:

(gdb) po connection
<NSURLConnection: 0x13245f470, http://0.0.0.0/del.gif?A295924_14:14:03.828_%5B19%5DIn_Cleanup_for_ad_295924,_sending_T_ping_and_cleaning_up>
(gdb) po error
Error Domain=NSURLErrorDomain Code=-1004 UserInfo=0x131e8a420 "Could not connect to the server." Underlying Error=(Error Domain=kCFErrorDomainCFNetwork Code=-1004 UserInfo=0x131e8de30 "Could not connect to the server.")
Comment 4 Jeff Johnson 2010-09-26 12:49:00 PDT
Created attachment 68855 [details]
webarchive of http://pgatour.com/
Comment 5 Jeff Johnson 2010-09-26 12:50:09 PDT
I've saved a webarchive of the page. I can reproduce the crash using the webarchive, even without an internet connection.
Comment 6 Alexey Proskuryakov 2010-09-27 10:11:04 PDT
Confirmed with Safari/WebKit 5.0.2.
Comment 7 Alexey Proskuryakov 2010-09-27 10:11:49 PDT
<rdar://problem/8481581>
Comment 8 Alexey Proskuryakov 2010-09-28 08:43:49 PDT
Someone investigated this issue, and found that this request is started from onunload. But we're not using the new PingLoader for it for some reason.

Why didn't PingLoader get used? And in fact, does it perhaps have the same problem as ImageLoader here?
Comment 9 Nate Chapin 2010-09-28 08:52:01 PDT
(In reply to comment #8)
> Someone investigated this issue, and found that this request is started from onunload. But we're not using the new PingLoader for it for some reason.
> 
> Why didn't PingLoader get used? And in fact, does it perhaps have the same problem as ImageLoader here?

PingLoader only gets used in a couple of explicit cases at the moment, not as a general rule for all requests from unload handlers.  I don't think there's anything stopping us from changing that.

As far as I know, PingLoader shouldn't have the same problem, because its connection to the Frame should be severed once the PingLoader's constructor returns.  The didFail() callback would be received, and PingLoader would just cancel itself without doing any other work.
Comment 10 Alexey Proskuryakov 2010-09-28 11:46:27 PDT
FWIW, the site was clearly trying to make a ping (it's changed already, and we don't crash on it any more - thankfully, we have the webarchive). I haven't looked at how exactly it was making the request - but the stack trace I've been given was:

#9    0x101542304 in WebCore::CachedResourceLoader::requestImage at CachedResourceLoader.cpp:137
#10  0x10199e13e in WebCore::ImageLoader::updateFromElement at ImageLoader.cpp:172
#11  0x10199e35c in WebCore::ImageLoader::updateFromElementIgnoringPreviousError at ImageLoader.cpp:210
#12  0x101918e81 in WebCore::HTMLImageElement::parseMappedAttribute at HTMLImageElement.cpp:108
...
#17  0x101b56a6f in WebCore::setJSHTMLImageElementSrc at JSHTMLImageElement.cpp:393
...
<onunload handler>
Comment 11 David Kilzer (:ddkilzer) 2010-09-29 14:46:33 PDT
Created attachment 69251 [details]
Webarchive converted to "web page, complete" format

Converted Attachment #68855 [details] to "web page, complete" format.
Comment 12 Nate Chapin 2010-10-27 14:42:12 PDT
Created attachment 72089 [details]
patch

This was also reported to chromium in http://crbug.com/52772.

The attached patch continues to use m_pageDismissalEventBeingDispatched in the same way internally to FrameLoader, but changes the way it is exposed to check whether any frame in the tree is dispatching a page dismissal event. This ensures that we use PingLoader for all loads from unload handlers, even if the load is being handled in a different frame.

Note that as a part of this patch, I've also extended PingLoader to be triggered for all CachedResource loads, not just images. If this isn't desired, it can easily be removed.
Comment 13 Alexey Proskuryakov 2010-10-27 15:05:48 PDT
Comment on attachment 72089 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=72089&action=review

Is fixing the crasher in ImageLoader code path something that would make sense to do? Can it be triggered in any other cases?

> LayoutTests/http/tests/navigation/resources/image-load-in-subframe-unload-handler-helper.html:6
> +        img.src = "redirect302.pl";

I think that this warrants a comment. Also, why not use 0.0.0.0, as in the original problem?
Comment 14 Alexey Proskuryakov 2010-10-27 15:09:00 PDT
I now see that it's been a redirect in the case you investigated. Maybe both should be tested?
Comment 15 Nate Chapin 2010-10-27 15:38:21 PDT
(In reply to comment #13)
> (From update of attachment 72089 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=72089&action=review
> 
> Is fixing the crasher in ImageLoader code path something that would make sense to do? Can it be triggered in any other cases?

I assume that you're referring to the stack in Comment #10? If so, I'd need to see the top of the stack (if it's still available) to be sure, but I think this will fix it.

> 
> > LayoutTests/http/tests/navigation/resources/image-load-in-subframe-unload-handler-helper.html:6
> > +        img.src = "redirect302.pl";
> 
> I think that this warrants a comment. Also, why not use 0.0.0.0, as in the original problem?

(In reply to comment #14)
> I now see that it's been a redirect in the case you investigated. Maybe both should be tested?

From what I can tell, chromium doesn't crash in ApplicationCacheHost because their implementations of those callbacks are dummies.  Only certain ResourceLoader callbacks (e.g., willSendRequest) reach into the members of Frame that are assumed to be non-null (but are once the Frame is detached) regardless of whether appcache is enabled.  So I just picked a case that consistently crashed every port I could build.   I'm happy to add additional cases if you think it's worthwhile.
Comment 16 Alexey Proskuryakov 2010-10-27 16:13:06 PDT
Created attachment 72105 [details]
stack trace for starting the load
Comment 17 Alexey Proskuryakov 2010-10-27 16:17:44 PDT
> I assume that you're referring to the stack in Comment #10? If so, I'd need to see the 
> top of the stack (if it's still available) to be sure, but I think this will fix it.

Attached the full stack trace.

But my question was a bit different - can we enter the same crashing code path via some other means? Basically, do we also need a dumb null check or something like that, to be safe in other potentially problematic cases (if any exist)?
Comment 18 Nate Chapin 2010-10-27 16:44:38 PDT
(In reply to comment #17)
> > I assume that you're referring to the stack in Comment #10? If so, I'd need to see the 
> > top of the stack (if it's still available) to be sure, but I think this will fix it.
> 
> Attached the full stack trace.
> 
> But my question was a bit different - can we enter the same crashing code path via some other means? Basically, do we also need a dumb null check or something like that, to be safe in other potentially problematic cases (if any exist)?

Ah, sorry for my misunderstanding. Let me make sure I understand that stack trace, and I'll get back to you.
Comment 19 Nate Chapin 2010-10-28 13:03:49 PDT
(In reply to comment #18)
> (In reply to comment #17)
> > > I assume that you're referring to the stack in Comment #10? If so, I'd need to see the 
> > > top of the stack (if it's still available) to be sure, but I think this will fix it.
> > 
> > Attached the full stack trace.
> > 
> > But my question was a bit different - can we enter the same crashing code path via some other means? Basically, do we also need a dumb null check or something like that, to be safe in other potentially problematic cases (if any exist)?
> 
> Ah, sorry for my misunderstanding. Let me make sure I understand that stack trace, and I'll get back to you.

I've thus far been unable to crash when starting the load.  Is there a known way to trigger it?
Comment 20 Ramesh Bapanapalli 2010-10-29 02:34:07 PDT
Created attachment 72303 [details]
Patch that was discussed offline but never uploaded due to test-case reduction.

I had created a patch (3 in fact) and discussed offline with Webkit expert(s). The the reason I am attaching this path is for one good reason.

First thing first, Nate's path is certainly better than the path I am attaching now w.r.t following points:
1) Is addressing all Subloads not just the ImageSubloads.
2) He has a testcase added to regression, which I was unable to do it; due to time limitation and new to Webkit contribution.

So to the point, the only reason I am uploading this patch is, my proposed patch is to store the page dismissal event
(m_childsPageDismissalEventBeingDispatched) at the FrameLoader of MainFrame. There by it is possible to avoid traversal 
of FrameLoader loop and possible to save CPU cycles at the expense of an extra bool variable on FrameLoader.

Just thought of sharing my opinion, if any one including Nate agree this is not such a bad idea, then I would propose merging both solutions.

Just my 2 cents.
Comment 21 Adam Barth 2010-10-31 18:17:37 PDT
Comment on attachment 72089 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=72089&action=review

> WebCore/loader/FrameLoader.cpp:1219
> +bool FrameLoader::pageDismissalEventBeingDispatchedInAnyFrame() const
> +{
> +    for (Frame* frame = m_frame->page()->mainFrame(); frame; frame = frame->tree()->traverseNext()) {
> +        if (frame->loader()->m_pageDismissalEventBeingDispatched)
> +            return true;
> +    }
> +    return false;
> +}

Sigh.  Grow FrameLoader!  Grow!

> WebCore/loader/CachedResource.h:90
> +    static ResourceRequest::TargetType resourceRequestTargetType(Type);

ap thinks we should get rid of ResourceRequest::TargetType.

> WebCore/loader/FrameLoader.h:331
> -    bool pageDismissalEventBeingDispatched() const { return m_pageDismissalEventBeingDispatched; }
> +    bool pageDismissalEventBeingDispatchedInAnyFrame() const;

Maybe better on Page?
Comment 22 Nate Chapin 2010-11-01 10:22:05 PDT
(In reply to comment #21)
> (From update of attachment 72089 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=72089&action=review
> 
> > WebCore/loader/FrameLoader.cpp:1219
> > +bool FrameLoader::pageDismissalEventBeingDispatchedInAnyFrame() const
> > +{
> > +    for (Frame* frame = m_frame->page()->mainFrame(); frame; frame = frame->tree()->traverseNext()) {
> > +        if (frame->loader()->m_pageDismissalEventBeingDispatched)
> > +            return true;
> > +    }
> > +    return false;
> > +}
> 
> Sigh.  Grow FrameLoader!  Grow!
> 
> > WebCore/loader/CachedResource.h:90
> > +    static ResourceRequest::TargetType resourceRequestTargetType(Type);
> 
> ap thinks we should get rid of ResourceRequest::TargetType.

Let's do it.  I'd be happy to take it on, though I suspect I'll need some help determining how it's used in each platform.

> 
> > WebCore/loader/FrameLoader.h:331
> > -    bool pageDismissalEventBeingDispatched() const { return m_pageDismissalEventBeingDispatched; }
> > +    bool pageDismissalEventBeingDispatchedInAnyFrame() const;
> 
> Maybe better on Page?

Good point.  Any preference on whether I just fix before landing or upload for review again?
Comment 23 Alexey Proskuryakov 2010-11-01 10:30:41 PDT
Please don't land yet, I'd like to experiment with this fix a little (today).
Comment 24 Nate Chapin 2010-11-01 10:31:44 PDT
(In reply to comment #23)
> Please don't land yet, I'd like to experiment with this fix a little (today).

No problem, let me know if there's anything I can do to help.
Comment 25 Alexey Proskuryakov 2010-11-01 12:43:02 PDT
Created attachment 72542 [details]
another test

Just adding a test that I now made, it's quite similar. Not sure if it's worth landing both.

We can't go to 0.0.0.0 from a regression test, because DumpRenderTree blocks other hosts. Here, I just used a non-existing resource.
Comment 26 Alexey Proskuryakov 2010-11-01 12:56:00 PDT
Comment on attachment 72089 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=72089&action=review

> WebCore/ChangeLog:5
> +        If we're in an unload/beforeunload/pagetransition event in any frame

+        If we're in an unload/beforeunload/pagetransition event in any frame

Page transition events include pagehide, pageshow, and manually created events. I don't think this changes, or should change behavior for the latter two.

> WebCore/ChangeLog:7
> +        and we start loading a subresource, load it with PingLoader rather than
> +        SubresourceLoader. Otherwise, we won't cancel it and the resource will

This new behavior sounds overly restrictive. Why shouldn't we be able to add subresources to main frame while processing onunload in a subframe? The main frame may not be closing at all - perhaps it wants to replace its subframe with an image, for example!

I think that the root cause of this crash is different. Somehow, we fail to cancel these resource loads when a frame goes away. It's not clear why the resource load isn't canceled, and when else that can happen besides this multi-frame setup. That needs to be investigated.

Changing PingLoader to work on pgatour.com is probably good, but it's separate from actual crash fix.
Comment 27 Alexey Proskuryakov 2010-11-01 13:00:40 PDT
Or maybe resource load isn't supposed to be canceled when frame goes away, and we just need a null check to make sure we don't crash?
Comment 28 Nate Chapin 2010-11-01 13:51:34 PDT
(In reply to comment #26)
> (From update of attachment 72089 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=72089&action=review
> 
> > WebCore/ChangeLog:5
> > +        If we're in an unload/beforeunload/pagetransition event in any frame
> 
> +        If we're in an unload/beforeunload/pagetransition event in any frame
> 
> Page transition events include pagehide, pageshow, and manually created events. I don't think this changes, or should change behavior for the latter two.

You're right, I misread some FrameLoader code just before I wrote that summary.

> 
> > WebCore/ChangeLog:7
> > +        and we start loading a subresource, load it with PingLoader rather than
> > +        SubresourceLoader. Otherwise, we won't cancel it and the resource will
> 
> This new behavior sounds overly restrictive. Why shouldn't we be able to add subresources to main frame while processing onunload in a subframe? The main frame may not be closing at all - perhaps it wants to replace its subframe with an image, for example!
> 
> I think that the root cause of this crash is different. Somehow, we fail to cancel these resource loads when a frame goes away. It's not clear why the resource load isn't canceled, and when else that can happen besides this multi-frame setup. That needs to be investigated.

We used to call FrameLoader::stopLoading() multiple times in the process of detaching the Frame.  This meant that stopLoading() for a Frame was called both before and after its child frames ran their unload events, but now we only call before.  I think we changed this in r61801.

The possible solutions that come to my mind are:
1. Ensure all unload events are handled through PingLoader (loses the ability to have them affect the DOM, as you noted)
2. Cancel all loaders later in the process of detaching the frame
3. Null check m_frame->page() in every ResourceLoader callback

I think #2 seems like the least bad of these options, but there may be unintended side of effects that I haven't discovered yet :)

> 
> Changing PingLoader to work on pgatour.com is probably good, but it's separate from actual crash fix.

Ok, so maybe we should keep PingLoader image-only, but extend it to unload event for any frame?
Comment 29 Alexey Proskuryakov 2010-11-01 13:59:50 PDT
> I think we changed this in r61801.

Interesting! Does reverting that patch fix the crash? I thought that I only removed calls that had no effect ever.

> I think #2 seems like the least bad of these options

Agreed.

> > Changing PingLoader to work on pgatour.com is probably good, but it's separate from actual crash fix.
> Ok, so maybe we should keep PingLoader image-only, but extend it to unload event for any frame?

I don't know - perhaps let's start a separate bug for that? We know that something an actual site uses for tracking doesn't work, but it's not clear to me how to fix that.
Comment 30 Nate Chapin 2010-11-01 15:28:46 PDT
(In reply to comment #29)
> > I think we changed this in r61801.
> 
> Interesting! Does reverting that patch fix the crash? I thought that I only removed calls that had no effect ever.

I believe just reverting the changes to FrameLoader::stopLoading() stop the crash.

I didn't try reverting the full patch, and I also didn't check whether just changing stopLoading() broke the test added in r61801. :)

> 
> > I think #2 seems like the least bad of these options
> 
> Agreed.
> 
> > > Changing PingLoader to work on pgatour.com is probably good, but it's separate from actual crash fix.
> > Ok, so maybe we should keep PingLoader image-only, but extend it to unload event for any frame?
> 
> I don't know - perhaps let's start a separate bug for that? We know that something an actual site uses for tracking doesn't work, but it's not clear to me how to fix that.

Yeah, I think it's fine to deal with it in a separate bug.  

It seems reasonable to me to assume that, if we're loading an image in an unload handler, even if that image is being loaded into another frame, that the intent is a ping.  There may, of course, be a case I'm not thinking about.
Comment 31 Alexey Proskuryakov 2010-11-01 15:45:18 PDT
> I believe just reverting the changes to FrameLoader::stopLoading() stop the crash.

OK. My explanation in r61801 was that "stopLoading() will be called for subframes via
FrameLoader::detachFromParent() and closeURL()." Why doesn't that happen in this case? Should we roll out FrameLoader::stopLoading() part of my change, or make sure that this comment becomes accurate?
Comment 32 Nate Chapin 2010-11-01 16:37:27 PDT
(In reply to comment #31)
> > I believe just reverting the changes to FrameLoader::stopLoading() stop the crash.
> 
> OK. My explanation in r61801 was that "stopLoading() will be called for subframes via
> FrameLoader::detachFromParent() and closeURL()." Why doesn't that happen in this case? Should we roll out FrameLoader::stopLoading() part of my change, or make sure that this comment becomes accurate?

I believe the original sequence was:
1. Frame X calls its own stopLoading(), triggering its unload event
2. stopLoading() recursively calls stopLoading on all its subframes, including Frame Y, which has an unload handler that starts a load in Frame X.
3. Frame X calls stopAllLoaders(), which stops all of its loads (including the load started by Frame Y)
4. Frame X calls detachChildren()

The sequence is now:
1. Frame X calls its own stopLoading(), triggering its unload event
2. Frame X calls stopAllLoaders()
3. Frame X calls detachChildren, which eventually calls Frame Y's stopLoading(), triggering the load in Frame X
4. Frame X never calls stopAllLoaders() again, so the load triggered from Frame Y's unload event continues in a detached Frame.

So your comment is correct, but the problem is that the child frame isn't call stopLoading() on its parents (and the parents have already called it on themselves).  I'm going to trying reordering the steps of FrameLoader::detachFromParent() and see if that fixes the problem.
Comment 33 Alexey Proskuryakov 2010-11-01 17:05:37 PDT
> 3. Frame X calls detachChildren, which eventually calls Frame Y's stopLoading(), triggering the load in Frame X

Sounds like this load should fail to start, but Frame X is insufficiently stopped for that.
Comment 34 Nate Chapin 2010-11-02 12:41:13 PDT
Created attachment 72714 [details]
patch - reorder calls in FrameLoader::detachFromParent()

This patch just ensures that we stopAllLoaders after any child unload event handlers have run. Note that we immediately detach this frame after calling stopAllLoaders(), so non-child frames shouldn't have a chance to start a load in this frame either.
Comment 35 Alexey Proskuryakov 2010-11-02 13:05:26 PDT
Comment on attachment 72714 [details]
patch - reorder calls in FrameLoader::detachFromParent()

View in context: https://bugs.webkit.org/attachment.cgi?id=72714&action=review

Looks great, thanks for getting to the bottom of this!

You can consider also landing my test - it's different in that it closes top frame, not an intermediate one.

> LayoutTests/http/tests/navigation/resources/image-load-in-subframe-unload-handler-helper.html:9
> +        var img = new Image(1, 1);
> +        // We're using a redirect here because it guarantees that if we're
> +        // receiving callbacks in a detached Frame, we'll acceess members
> +        // that are now invalid (e.g., DocumentLoaders).
> +        img.src = "redirect302.pl";

Maybe append the image to body, to make sure the test doesn't rely on whether detached images are loaded?
Comment 36 Alexey Proskuryakov 2010-11-03 12:20:31 PDT
This was landed in <http://trac.webkit.org/changeset/71256>.