Bug 38928 - Repro crash at http://www.sears.com
Summary: Repro crash at http://www.sears.com
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Page Loading (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Brady Eidson
URL: http://www.sears.com
Keywords: InRadar
Depends on: 43840
Blocks:
  Show dependency treegraph
 
Reported: 2010-05-11 12:44 PDT by Brady Eidson
Modified: 2010-09-02 12:22 PDT (History)
5 users (show)

See Also:


Attachments
Reduction with description of the problem. (1.38 KB, text/html)
2010-05-11 12:44 PDT, Brady Eidson
no flags Details
Fix + layout test (11.21 KB, patch)
2010-05-12 12:32 PDT, Brady Eidson
beidson: commit-queue-
Details | Formatted Diff | Diff
Rename m_unloadEventBeingDispatched to m_pageDismissalEventBeingDispatched (4.77 KB, patch)
2010-05-13 10:37 PDT, Brady Eidson
sullivan: review+
beidson: commit-queue-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Brady Eidson 2010-05-11 12:44:19 PDT
1. Navigate to http://www.sears.com. 
2. Select a product category (ex. Health & Wellness > Air Treatments).
3. Press List View.
4. Select first and second product.
5. Press Compare Now.
6. Press back button.
7. Press Compare Now again.
--Observe: Nothing happens; product comparison not displayed (unexpected).
8. Select Sign In link.

Deref'ing a null DocumentLoader

#0	0x102720ba8 in WTF::OwnPtr<WebCore::ArchiveResourceCollection>::operator WebCore::ArchiveResourceCollection* WTF::OwnPtr<WebCore::ArchiveResourceCollection>::* at OwnPtr.h:69
#1	0x10271ea2e in WebCore::DocumentLoader::popArchiveForSubframe at DocumentLoader.cpp:441
#2	0x1028556ba in WebCore::FrameLoader::loadURLIntoChildFrame at FrameLoader.cpp:1082
#3	0x101fca3ab in WebFrameLoaderClient::createFrame at WebFrameLoaderClient.mm:1364
#4	0x1028559e6 in WebCore::FrameLoader::loadSubframe at FrameLoader.cpp:410
#5	0x102855cf0 in WebCore::FrameLoader::requestFrame at FrameLoader.cpp:381
#6	0x1028f1dc1 in WebCore::HTMLFrameElementBase::openURL at HTMLFrameElementBase.cpp:107
#7	0x1028f2399 in WebCore::HTMLFrameElementBase::setNameAndOpenURL at HTMLFrameElementBase.cpp:166
#8	0x1028f23b1 in WebCore::HTMLFrameElementBase::setNameAndOpenURLCallback at HTMLFrameElementBase.cpp:171
#9	0x1025cd61c in WebCore::ContainerNode::dispatchPostAttachCallbacks at ContainerNode.cpp:611
#10	0x1025cd764 in WebCore::ContainerNode::resumePostAttachCallbacks at ContainerNode.cpp:583
#11	0x1027eb223 in WebCore::Element::attach at Element.cpp:837
#12	0x1028f1964 in WebCore::HTMLFrameElementBase::attach at HTMLFrameElementBase.cpp:212
#13	0x102caa8ce in WebCore::Node::lazyAttach at Node.cpp:748
#14	0x1025cdcc1 in WebCore::ContainerNode::appendChild at ContainerNode.cpp:525
#15	0x102b03b37 in WebCore::JSNode::appendChild at JSNodeCustom.cpp:165
#16	0x102b023b6 in WebCore::jsNodePrototypeFunctionAppendChild at JSNode.cpp:471
...
Comment 1 Brady Eidson 2010-05-11 12:44:55 PDT
Created attachment 55738 [details]
Reduction with description of the problem.
Comment 2 Brady Eidson 2010-05-11 12:48:35 PDT
Sears.com scripts have bizarre Safari specific code paths that do this.

The anchor element they create within their onbeforeunload handler and then dispatch a click to actually activates the policy delegate to consult about the navigation.

When the policy delegate says "use", we call onbeforeunload again, which does the same thing with the anchor, consulting the policy delegate, etc etc etc.

All the way till we blow out the stack and javascript aborts.

Here's a cut of a single iteration of the stack blowing out:

#0	0x1026c550e in WebCore::FrameLoader::continueLoadAfterNavigationPolicy at FrameLoader.cpp:3425
#1	0x1026c561c in WebCore::FrameLoader::callContinueLoadAfterNavigationPolicy at FrameLoader.cpp:3366
#2	0x102b6e063 in WebCore::PolicyCallback::call at PolicyCallback.cpp:101
#3	0x102b6ebb3 in WebCore::PolicyChecker::continueAfterNavigationPolicy at PolicyChecker.cpp:160
#4	0x101e3b999 in WebFrameLoaderClient::receivedPolicyDecison at WebFrameLoaderClient.mm:1271
#5	0x101e3ba2e in -[WebFramePolicyListener receivedPolicyDecision:] at WebFrameLoaderClient.mm:1864
#6	0x101e38066 in -[WebFramePolicyListener use] at WebFrameLoaderClient.mm:1879
#7	Browser policy delegate
#8	Browser policy delegate
#9	Browser policy delegate
#10	0x7fff88026d8c in __invoking___
#11	0x7fff88026c5d in -[NSInvocation invoke]
#12	0x7fff88042a71 in -[NSInvocation invokeWithTarget:]
#13	0x101ed9f3e in -[_WebSafeForwarder forwardInvocation:] at WebView.mm:2467
#14	0x7fff88023dac in ___forwarding___
#15	0x7fff8801fe88 in __forwarding_prep_0___
#16	0x101e3cc30 in WebFrameLoaderClient::dispatchDecidePolicyForNavigationAction at WebFrameLoaderClient.mm:750
#17	0x102b6f130 in WebCore::PolicyChecker::checkNavigationPolicy at PolicyChecker.cpp:88
#18	0x1026c5a34 in WebCore::FrameLoader::loadWithDocumentLoader at FrameLoader.cpp:1992
#19	0x1026c680a in WebCore::FrameLoader::loadWithNavigationAction at FrameLoader.cpp:1916
#20	0x1026c7c9e in WebCore::FrameLoader::loadURL at FrameLoader.cpp:1859
#21	0x1026c819d in WebCore::FrameLoader::loadFrameRequest at FrameLoader.cpp:1795
#22	0x1026c8545 in WebCore::FrameLoader::urlSelected at FrameLoader.cpp:361
#23	0x10273842d in WebCore::HTMLAnchorElement::defaultEventHandler at HTMLAnchorElement.cpp:199
#24	0x102b193a9 in WebCore::Node::dispatchGenericEvent at Node.cpp:2683
#25	0x102b19607 in WebCore::Node::dispatchEvent at Node.cpp:2567
#26	0x102678278 in WebCore::EventTarget::dispatchEvent at EventTarget.cpp:268
#27	0x102974c23 in WebCore::jsNodePrototypeFunctionDispatchEvent at JSNode.cpp:664
#28	0x33828e4001b4 in ??
#29	0x1017acca4 in JSC::JITCode::execute at JITCode.h:77
#30	0x1017974f0 in JSC::Interpreter::execute at Interpreter.cpp:758
#31	0x1017f2298 in JSC::JSFunction::call at JSFunction.cpp:139
#32	0x10173f45b in JSC::call at CallData.cpp:39
#33	0x1028e8392 in WebCore::JSEventListener::handleEvent at JSEventListener.cpp:116
#34	0x10267818a in WebCore::EventTarget::fireEventListeners at EventTarget.cpp:329
#35	0x102678877 in WebCore::EventTarget::fireEventListeners at EventTarget.cpp:290
#36	0x102630f44 in WebCore::DOMWindow::dispatchEvent at DOMWindow.cpp:1444
#37	0x1026ad413 in WebCore::Frame::shouldClose at Frame.cpp:1684
#38	0x1026c52ff in WebCore::FrameLoader::continueLoadAfterNavigationPolicy at FrameLoader.cpp:3383
Comment 3 Brady Eidson 2010-05-11 12:52:15 PDT
When the stack blows out and javascript aborts, the FrameLoader is left in a bizarre state where it thinks it is the middle of FrameLoadStateProvisional, but doesn't have a provisional documentloader.

This allows for the later crash, but also makes the frame un-navigable in the meantime.
Comment 4 Brady Eidson 2010-05-11 12:52:31 PDT
<rdar://problem/7965182>
Comment 5 Brady Eidson 2010-05-11 12:56:21 PDT
The most obvious and glaring problem here is that we re-enter the onbeforeunload event.  Running the same test in Firefox, it's clear they don't re-enter.  Modifying it for IE, they seem to enter it twice (once for the original navigation, once for the first dispatched click) but then they navigate to the first site as expected.

I plan to protect the beforeunload event the same way we protect against unload re-entrancy, with a simple wrapper flag.

Another problem is what happens to the FrameLoader when onbeforeunload aborts (as JSCore is actually aborting the javascript execution when the stack reaches its limit).  I plan to at least add an ASSERT in ::activeDocumentLoader to catch the case that should never occur, and might explore the javascript-aborts aspect further, as well.
Comment 6 Brady Eidson 2010-05-11 12:57:48 PDT
BTW, this isn't a (recent) regression.  Happens in ToT but also in Safari 4.
Comment 7 Brady Eidson 2010-05-11 13:23:46 PDT
Debugging a bit, it's not immediately clear why the aborted javascript due to the excessive stack results in the FrameLoadState and m_provisionalDocumentLoader getting out of sync.

I'm going to focus on preventing the recursion in the onbeforeunload event handler and adding the ASSERT, then might revisit the out-of-syncness later.
Comment 8 Brady Eidson 2010-05-12 12:32:42 PDT
Created attachment 55886 [details]
Fix + layout test

Guarding beforeunload with the same flag as we do pagehide and unload makes sense.

But renaming the flag also seemed prudent.
Comment 9 Darin Adler 2010-05-12 14:42:18 PDT
Comment on attachment 55886 [details]
Fix + layout test

Instead of making setPageDismissalEventBeingDispatched public, could we instead refactor part of the Frame::shouldClose function so it can go inside FrameLoader?
Comment 10 Darin Adler 2010-05-12 14:42:41 PDT
Comment on attachment 55886 [details]
Fix + layout test

Lets land the renaming half of the patch first, separately. Sound OK?
Comment 11 Brady Eidson 2010-05-12 14:59:00 PDT
(In reply to comment #10)
> (From update of attachment 55886 [details])
> Lets land the renaming half of the patch first, separately. Sound OK?

Sounds fine.  r+ on doing that?
Comment 12 Brady Eidson 2010-05-12 15:00:10 PDT
(In reply to comment #9)
> (From update of attachment 55886 [details])
> Instead of making setPageDismissalEventBeingDispatched public, could we instead refactor part of the Frame::shouldClose function so it can go inside FrameLoader?

Frame::shouldClose has a direct API/SPI equivalent in WebKit on WebFrame.

I'm not saying it *couldn't* be done, but it might be much messier than one might think at a glance.

Would you like me to pursue that?
Comment 13 Brady Eidson 2010-05-13 10:37:00 PDT
Created attachment 55995 [details]
Rename m_unloadEventBeingDispatched to m_pageDismissalEventBeingDispatched
Comment 14 Brady Eidson 2010-05-13 10:37:13 PDT
Split off the rename, per Darin's request.
Comment 15 Brady Eidson 2010-05-13 11:18:04 PDT
Rename landed in http://trac.webkit.org/changeset/59374
Comment 16 Brady Eidson 2010-05-13 13:36:47 PDT
Not too different from the original patch, I landed http://trac.webkit.org/changeset/59384 with an in-person review from Darin.
Comment 17 WebKit Review Bot 2010-05-13 14:02:54 PDT
http://trac.webkit.org/changeset/59384 might have broken Qt Linux Release
Comment 18 Brady Eidson 2010-05-13 15:03:55 PDT
(In reply to comment #17)
> http://trac.webkit.org/changeset/59384 might have broken Qt Linux Release

I don't see a broken build on Qt Linux Release, nor did this automated message give me a link to something that shows the alleged breakage.
Comment 19 Brady Eidson 2010-05-13 15:04:23 PDT
BTW, I know I caused a failure on the Tiger bot and am getting new results ready for it as I type.
Comment 20 Brady Eidson 2010-05-13 15:13:29 PDT
http://trac.webkit.org/changeset/59389 should fix the Tiger layouttests
Comment 21 Adam Barth 2010-05-13 15:16:39 PDT
> I don't see a broken build on Qt Linux Release, nor did this automated message give me a link to something that shows the alleged breakage.

http://build.webkit.org/waterfall?show=Qt%20Linux%20Release
http://build.webkit.org/results/Qt%20Linux%20Release/r59384%20(11726)/results.html

Tests that timed out:

fast/loader/recursive-before-unload-crash.html

... which is a test added by your patch...
Comment 22 Brady Eidson 2010-05-13 15:31:48 PDT
(In reply to comment #21)
> > I don't see a broken build on Qt Linux Release, nor did this automated message give me a link to something that shows the alleged breakage.
> 
> http://build.webkit.org/waterfall?show=Qt%20Linux%20Release
> http://build.webkit.org/results/Qt%20Linux%20Release/r59384%20(11726)/results.html
> 
> Tests that timed out:
> 
> fast/loader/recursive-before-unload-crash.html
> 
> ... which is a test added by your patch...

Someone else pointed that out to me.

Did the waterfall "View Results" links used to link *DIRECTLY* to results.html?

I seem to recall not needing to click results.html in the past.  Since I was giving a directory listing and not a page that actually described results, I clicked through to fast/loader and saw nothing, and didn't know what the complaint was.

Could the error message posted in bugzilla be... more useful?

What it told me:
http://trac.webkit.org/changeset/59384 might have broken Qt Linux Release

When I read that, I first thought "Oh, I broke the build," and was actually confused when I saw the build was fine.

What it COULD have told me:
http://trac.webkit.org/changeset/59384 might have broken Qt Linux Release layout tests.  Refer to http://build.webkit.org/results/Qt%20Linux%20Release/r59384%20(11726)/results.html to see what I'm talking about.
Comment 23 Brady Eidson 2010-05-13 15:33:38 PDT
Looking at the diffs for GTK, it's clear that their DRT isn't ready for these types of resource load callback dumping.  I'll skip it there.

And the Qt hang is something I'm not equipped to explore, and the tool isn't giving any sort of info like a sample that might let me explore blindly, so I'll skip that also.
Comment 24 Brady Eidson 2010-05-13 15:40:10 PDT
Skipped for both Qt and GTK in r59392
Comment 25 WebKit Review Bot 2010-05-13 15:56:05 PDT
http://trac.webkit.org/changeset/59389 might have broken Qt Windows 32-bit Release
The following changes are on the blame list:
http://trac.webkit.org/changeset/59387
http://trac.webkit.org/changeset/59388
http://trac.webkit.org/changeset/59389
Comment 26 James Robinson 2010-05-13 17:42:53 PDT
fast/loader/recursive-before-unload.html failed again on Tiger:

--- /Volumes/Data/WebKit-BuildSlave/tiger-intel-release/build/layout-test-results/fast/loader/recursive-before-unload-crash-expected.txt	2010-05-13 17:13:21.000000000 -0700
+++ /Volumes/Data/WebKit-BuildSlave/tiger-intel-release/build/layout-test-results/fast/loader/recursive-before-unload-crash-actual.txt	2010-05-13 17:13:21.000000000 -0700
@@ -6,8 +6,8 @@
 main frame - didStartProvisionalLoadForFrame
 ALERT: Adding iframe
 frame "<!--framePath //<!--frame0-->-->" - didStartProvisionalLoadForFrame
-frame "<!--framePath //<!--frame0-->-->" - didFailProvisionalLoadWithError
 main frame - didCancelClientRedirectForFrame
+frame "<!--framePath //<!--frame0-->-->" - didFailProvisionalLoadWithError
 main frame - didFailProvisionalLoadWithError
 This test demonstrates a problem with our handling of the beforeunload event.
 If a script manages to try and navigate the frame from beforeunload - when a navigation is already pending - we end up blowing out the stack by recursively consulting the policy delegate then running onbeforeunload repeatedly.

Looks like the test itself is racy (at least on Tiger), those two lines have flipped around a few times since the test was added.
Comment 27 Brady Eidson 2010-05-13 18:09:59 PDT
I think I'm going to nuke the frame load callbacks from the test.  I thought they were a nice bonus, but if they're unreliable then it's not worth the hassle - the alert logging and lack of crashing is the important thing.
Comment 28 Robert Hogan 2010-06-08 11:09:27 PDT
(In reply to comment #23)
> Looking at the diffs for GTK, it's clear that their DRT isn't ready for these types of resource load callback dumping.  I'll skip it there.
> 
> And the Qt hang is something I'm not equipped to explore, and the tool isn't giving any sort of info like a sample that might let me explore blindly, so I'll skip that also.

I think the timeout is related to the issue tracked here: https://bugs.webkit.org/show_bug.cgi?id=39160
Comment 29 Eric Seidel (no email) 2010-06-23 16:08:32 PDT
Have seen fast/loader/recursive-before-unload-crash.html fail a couple times on the leopard commit bot.  Might be flaky.
Comment 30 Eric Seidel (no email) 2010-09-02 12:22:09 PDT
This test is in fact very flaky.  See bug 43840.