Bug 22562

Summary: REGRESSION (r37971): events not firing after going back in back/forward cache
Product: WebKit Reporter: Gustavo Delfino <gdelfino>
Component: Page LoadingAssignee: Cameron Zwarich (cpst) <zwarich>
Status: RESOLVED FIXED    
Severity: Major CC: ap, darin, ggaren
Priority: P1 Keywords: InRadar, Regression
Version: 528+ (Nightly build)   
Hardware: Mac   
OS: OS X 10.5   
Attachments:
Description Flags
Same code as shown in description.
none
Proposed patch
darin: review-
Revised patch
none
Proposed patch
none
Revised proposed patch darin: review+

Description Gustavo Delfino 2008-11-30 04:21:44 PST
When you set onclick='location.href="some url"' on an element, the link works only the first time you click it.

Steps to Reproduce:

1) Save the following code into a file:

<html>
    <head>
        <title>test</title>
    </head>
    <body>
        <p onclick='location.href="http://www.apple.com"' style="cursor: pointer">
            Click here. Then hit the back button and click again. It follows the link only the first time.
        </p>    
    </body>
</html>

2) Open in WebKit and click anywhere on the text

3) After apple.com opens, hit the back button

4) Click again anywhere on the text (nothing happens).

Expected Results:

In step 4 the browser should go to apple.com just lke in step 2

Build Date & Platform:

r38826 Intel Mac OS X 10.5.5
Comment 1 Gustavo Delfino 2008-11-30 04:22:57 PST
Created attachment 25607 [details]
Same code as shown in description.
Comment 2 Alexey Proskuryakov 2008-12-03 05:48:43 PST
Confirmed as a regression with r38930.

See also: bug 22625, bug 22194.
Comment 3 Alexey Proskuryakov 2008-12-03 05:50:02 PST
<rdar://problem/6414593>
Comment 4 Cameron Zwarich (cpst) 2008-12-11 22:28:07 PST
This regressed in r37971:

http://trac.webkit.org/changeset/37971
Comment 5 Cameron Zwarich (cpst) 2008-12-11 22:29:33 PST
The problem here is not with Alexey's patch, it is that the Frame's DOMWindow is not restored to its previous value when going back via the back/forward cache.
Comment 6 Cameron Zwarich (cpst) 2008-12-11 22:29:51 PST
*** Bug 22625 has been marked as a duplicate of this bug. ***
Comment 7 Cameron Zwarich (cpst) 2008-12-12 08:31:41 PST
Created attachment 25981 [details]
Proposed patch

The fix is pretty simple. I suppose the only question is whether this should be done explicitly or it should happen implicitly via some other piece of the back/forward logic.
Comment 8 Darin Adler 2008-12-12 09:16:34 PST
Comment on attachment 25981 [details]
Proposed patch

This is not quite the right place for this. If you look in FrameLoader::open you'll see that before calling CachedPage::restore, there are a couple calls on m_frame that change fields in the domWindow. So it's too late to change the DOMWindow after that code has already run. We probably should set the DOMWindow at almost exactly the same time we set the document. I don't really have an opinion about what code should go inside CachedPage::restore vs. what code goes in FrameLoader::open -- CachedPage::restore is only called from that one function, and the two work together without a clear division of responsibilities -- but clearly there's an order dependency that we have to get right.

Also, if you add a function the name should be setDOMWindow, not setDomWindow.

I think it's important in the longer term to have some regression tests for this, as you have mentioned before.
Comment 9 Cameron Zwarich (cpst) 2008-12-12 09:42:29 PST
(In reply to comment #8)
> (From update of attachment 25981 [details] [review])
> This is not quite the right place for this. If you look in FrameLoader::open
> you'll see that before calling CachedPage::restore, there are a couple calls on
> m_frame that change fields in the domWindow. So it's too late to change the
> DOMWindow after that code has already run. We probably should set the DOMWindow
> at almost exactly the same time we set the document. I don't really have an
> opinion about what code should go inside CachedPage::restore vs. what code goes
> in FrameLoader::open -- CachedPage::restore is only called from that one
> function, and the two work together without a clear division of
> responsibilities -- but clearly there's an order dependency that we have to get
> right.

Thanks. This is exactly the kind of feedback I wanted. I'll post a new patch shortly.

> Also, if you add a function the name should be setDOMWindow, not setDomWindow.

Sorry about that. Just a typo on my part.

> I think it's important in the longer term to have some regression tests for
> this, as you have mentioned before.

This is the straw that broke the camel's back, at least for me. I think it is okay to fix this regression before getting b/f cache layout tests working. However, I don't think it is an acceptable risk to enable the b/f cache for more pages without getting tests working.
Comment 10 Cameron Zwarich (cpst) 2008-12-12 10:39:54 PST
Created attachment 25983 [details]
Revised patch

Of course, now that I have done the right thing, the patch starts crashing in weird places. ;-) It seems that quitting Safari after navigating via the b/f cache will crash for any number of reasons.
Comment 11 Darin Adler 2008-12-13 15:36:59 PST
Cameron, I applied your patch and I'm not seeing any crash on quit. Everything seems to be working perfectly.
Comment 12 Cameron Zwarich (cpst) 2008-12-13 20:32:53 PST
(In reply to comment #11)
> Cameron, I applied your patch and I'm not seeing any crash on quit. Everything
> seems to be working perfectly.

Here are my repro steps, which work perfectly.

1) Set your home page in Safari to a blank page.

2) Launch Safari with my patch applied to WebKit.

3) Navigate to http://www.google.com/.

4) Navigate to http://www.cnn.com/.

5) Go back.

6) Close the window.

This leads to either a DOMWindow or a JSDOMWindow using the deleted Frame from a destructor. The Frame is being deleted at the same time as before, and JSDOMWindow always checks if it has a valid impl before calling out to the impl's Frame, so the problem must be that a DOMWindow from http://www.cnn.com/ is being kept alive too long after my patch. I should be able to fix this tonight.
Comment 13 Cameron Zwarich (cpst) 2008-12-13 23:14:50 PST
The backtrace for the creation of the offending DOMWindow is pretty strange:

#0  WebCore::DOMWindow::DOMWindow (this=0x1e7db4a0, frame=0x9c4a00) at /Users/Cameron/WebKit/WebCore/page/DOMWindow.cpp:148
#1  0x03649299 in WebCore::DOMWindow::create (frame=0x9c4a00) at DOMWindow.h:69
#2  0x036450ff in WebCore::Frame::domWindow (this=0x9c4a00) at /Users/Cameron/WebKit/WebCore/page/Frame.cpp:1631
#3  0x03b3ccca in WebCore::ScriptController::clearWindowShell (this=0x68101a8) at /Users/Cameron/WebKit/WebCore/bindings/js/ScriptController.cpp:137
#4  0x03652db1 in WebCore::FrameLoader::clear (this=0x680fe24, clearWindowProperties=true, clearScriptObjects=true) at /Users/Cameron/WebKit/WebCore/loader/FrameLoader.cpp:830
#5  0x036592e0 in WebCore::FrameLoader::open (this=0x680fe24, cachedPage=@0x19f6d600) at /Users/Cameron/WebKit/WebCore/loader/FrameLoader.cpp:2929

It is being done by this code:

    // Do this after detaching the document so that the unload event works.
    if (clearWindowProperties) {
        m_frame->clearDOMWindow();
        m_frame->script()->clearWindowShell();
    }

Calling Frame::clearDOMWindow() nulls out the Frame::m_domWindow, but ScriptController::clearWindowShell() does this:

    m_windowShell->setWindow(m_frame->domWindow());

Since m_frame's m_domWindow is 0, a new DOMWindow gets created by Frame::domWindow() for this purpose. This DOMWindow quickly becomes irrelevant, because later in CachedPage::restore() the window shell's window gets set to the correct value. It seems to get destroyed when its JS wrapper gets garbage collected, which happens when the window is closed. If I collect on every allocation, this stray DOMWindow gets destroyed while the Frame is still alive and no crash occurs.

The reason it crashes is that my implementation of Frame::setDOMWindow() does not add the current DOMWindow to the list of previous live DOMWindows like Frame::clearDOMWindow() does. If I fix this by doing the same thing as clearDOMWindow(), this problem goes away, although it is still a bit strange we create a meaningless DOMWindow and immediately replace it.
Comment 14 Cameron Zwarich (cpst) 2008-12-13 23:26:04 PST
Created attachment 26010 [details]
Proposed patch

One question: now that Frame::clearDOMWindow() can just be setDOMWindow(0), should I implement it this way?
Comment 15 Darin Adler 2008-12-14 08:07:37 PST
Comment on attachment 26010 [details]
Proposed patch

> +    return m_window ? m_window.get()->impl() : 0;

You shouldn't need a get() here.

r=me
Comment 16 Darin Adler 2008-12-14 08:08:14 PST
(In reply to comment #14)
> One question: now that Frame::clearDOMWindow() can just be setDOMWindow(0),
> should I implement it this way?

Lets leave that alone for now. It's unfortunate we have to have a setDOMWindow at all; maybe we can eliminate it at some point on the future.
Comment 17 Cameron Zwarich (cpst) 2008-12-14 12:06:04 PST
Comment on attachment 26010 [details]
Proposed patch

Sigh, this patch has the same DOMWindow lifetime issue as the previous patch. The above repro steps work if I also type "Cameron" in the Google search field before I close the window when I go back.

I am clearing the review flag.
Comment 18 Cameron Zwarich (cpst) 2008-12-14 14:56:09 PST
The backtrace of the crash:

#0  0x0345b8f9 in WTF::RefPtr<WebCore::StringImpl>::operator! (this=0x4) at RefPtr.h:62
#1  0x03b6dadb in WebCore::String::length (this=0x4) at /Users/Cameron/WebKit/WebCore/platform/text/String.cpp:215
#2  0x03657b09 in WebCore::FrameLoader::shouldTreatSchemeAsLocal (scheme=@0x4) at /Users/Cameron/WebKit/WebCore/loader/FrameLoader.cpp:5026
#3  0x03b543dc in WebCore::SecurityOrigin::isLocal (this=0x0) at /Users/Cameron/WebKit/WebCore/page/SecurityOrigin.cpp:187
#4  0x03b543f3 in WebCore::SecurityOrigin::canAccess (this=0x0, other=0x1a3a4ee0) at /Users/Cameron/WebKit/WebCore/page/SecurityOrigin.cpp:120
#5  0x03796c54 in WebCore::JSDOMWindowBase::allowsAccessFromPrivate (this=0x1e954260, other=0x1e954260) at JSDOMWindowCustom.h:197
#6  0x037b0dd6 in WebCore::JSDOMWindowBase::allowsAccessFrom (this=0x1e954260, other=0x1e954260) at JSDOMWindowCustom.h:153
#7  0x03bac099 in -[WebScriptObject _isSafeScript] (self=0x1bbdfeb0, _cmd=0x4662911) at /Users/Cameron/WebKit/WebCore/bindings/objc/WebScriptObject.mm:223
#8  0x03bad49a in -[WebScriptObject valueForKey:] (self=0x1bbdfeb0, _cmd=0x94ac8028, key=0x13a14c) at /Users/Cameron/WebKit/WebCore/bindings/objc/WebScriptObject.mm:378

I sometimes saw this crash before, instead of the one from the JSDOMWindow destructor that I fixed with the new version of my patch. The m_securityOrigin member variable of the stray DOMWindow is never initialized.

I should figure out why the Objective C bindings are using the JS wrapper for this stray DOMWindow.
Comment 19 Cameron Zwarich (cpst) 2008-12-14 15:25:46 PST
The RootObject for the bad DOMWindow is created by ScriptController::bindingRootObject(), which is called from ScriptController::clearPlatformScriptObjects(), which is ultimately called from FrameLoader::clear().

Darin, do have any suggestions on what I should do? I think we should never create the stray DOMWindow when going back, because of problems like this, but I am not quite sure how to rearrange the code to achieve this goal. An optional DOMWindow* parameter for FrameLoader::clear() seems like a bad idea. The body of FrameLoader::code() could be split into multiple pieces that could be reused between FrameLoader::clear() and code for the specific b/f cache case. I am not even sure how critical the order of all of this code really is. Clearly,

    // Do this after detaching the document so that the unload event works.
    if (clearWindowProperties) {
        m_frame->clearDOMWindow();
        m_frame->script()->clearWindowShell();
    }

should be before

    if (clearScriptObjects)
        m_frame->script()->clearScriptObjects();

However, these can't both be put at the end of FrameLoader::load() because of this:

    // Do not drop the document before the ScriptController and view are cleared
    // as some destructors might still try to access the document.
    m_frame->setDocument(0);
Comment 20 Darin Adler 2008-12-14 18:11:35 PST
(In reply to comment #19)
> The RootObject for the bad DOMWindow is created by
> ScriptController::bindingRootObject(), which is called from
> ScriptController::clearPlatformScriptObjects(), which is ultimately called from
> FrameLoader::clear().

It seems bad that clearPlatformScriptObjects doesn't actually do any "clearing" at all. It connects the WindowScriptObject to a new global object. That's what I'd try to fix; clearly that connection can't be done until later in the navigation process.

I think we could try making clearPlatformScriptObjects actually set the root objects to 0. Then later we could set them up pointing to the new window with some separate ScriptController call other than clear(). Unfortunately, without a bigger design change we are stuck making the same WindowScriptObject work with a single window shell and multiple window objects, and yet it has a pointer to the root object which has a pointer to the window object inside. So we are going to have to point the WindowScriptObject at the new global object, and we're going to have to do so before any Objective-C code would have a chance to run.
Comment 21 Darin Adler 2008-12-14 18:12:06 PST
Cc'ing Geoff since he worked with the Objective-C bindings in days gone by.
Comment 22 Cameron Zwarich (cpst) 2008-12-14 18:21:33 PST
That sounds like a good fix, Darin. It has the added benefit of not changing a lot of loader code. I'll try to implement it tonight.
Comment 23 Cameron Zwarich (cpst) 2008-12-15 02:00:11 PST
Created attachment 26024 [details]
Revised proposed patch
Comment 24 Darin Adler 2008-12-15 05:59:40 PST
Comment on attachment 26024 [details]
Revised proposed patch

The naming isn't quite right here.

The "platform" calls are part of the "back end" of the ScriptController object, not designed to be called directly. So the right way to do this would be to have a new function, platform independent, that calls updatePlatformScriptObjects. That would be one more level of function call, but it would preserve the design.

I also think the words clear and update here are terribly confusing. That name is left over from when we had a Window object and we called "clear" to clear state out of the Window object and get it ready for re-use. Now the "update" function is used to connect script objects to a new DOM window, and yet the name doesn't say that.

I'm going to say review+ because I like how this works, but I really don't like the names.

r=me
Comment 25 Cameron Zwarich (cpst) 2008-12-15 11:44:03 PST
Landed in r39306. I'll make a separate bug to address the problem of finding better names.