Bug 63483 - [WebKit2] Crash loading page that adds/removes frame in DOMContentLoaded/loaded
Summary: [WebKit2] Crash loading page that adds/removes frame in DOMContentLoaded/loaded
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Page Loading (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P1 Normal
Assignee: Darin Adler
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2011-06-27 14:42 PDT by Darin Adler
Modified: 2011-06-29 13:46 PDT (History)
7 users (show)

See Also:


Attachments
Test case (515 bytes, text/html)
2011-06-27 14:42 PDT, Darin Adler
no flags Details
Improved test case (1012 bytes, text/html)
2011-06-27 15:06 PDT, Darin Adler
no flags Details
Patch (4.09 KB, patch)
2011-06-27 17:56 PDT, Darin Adler
no flags Details | Formatted Diff | Diff
Patch (7.35 KB, patch)
2011-06-28 17:49 PDT, Darin Adler
no flags Details | Formatted Diff | Diff
Patch (4.42 KB, patch)
2011-06-29 12:52 PDT, Darin Adler
no flags Details | Formatted Diff | Diff
Patch (7.39 KB, patch)
2011-06-29 12:58 PDT, Darin Adler
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Darin Adler 2011-06-27 14:42:23 PDT
When looking for a way to reproduce one of the most-reported crashes from the field, a crash in DocumentWriter::end <rdar://problem/9598940>, we got a pointer to this page: <http://www.sniqueaway.com/events/#none>.

Testing with the page and looking at the crash I was sometimes able to reproduce, I was able to produce a test page that demonstrates the problem. Now, I would like to know:

    1) Is this a regression? If so, which change caused it?
    2) How can we fix it.

Diagnosing this crash I have seen multiple crazy things going on in the loader. I just can’t decide which to treat as the proximate cause of the crash.
Comment 1 Darin Adler 2011-06-27 14:42:41 PDT
Created attachment 98788 [details]
Test case
Comment 2 Darin Adler 2011-06-27 14:49:47 PDT
Here are two problems I can see:

    1) When initializing the empty document for the new frame, we end up calling begin twice, and so we end up doing clear. This second clear ends up triggering a load event for the parent frame at a bad time. The first clear doesn’t do any work because of the m_needsClear optimization.

    2) When sending a DOMContentLoaded event, it’s not good that a loaded event might get sent during the processing. The loaded event should only be sent after DOMContentLoaded has been dispatched. Something in the code should enforce this rule. At the moment, if a subframe finishes loaded during DOMContentLoaded handling then the load event will fire. Even if we fix (1) I can imagine this happening if we do something like showModalDialog in a DOMContentLoaded handler.
Comment 3 Alexey Proskuryakov 2011-06-27 15:03:46 PDT
I'm wondering what should happen if stop() is called in a DOMContentLoaded handler. In Safari 5, I see two DOMContentLoaded events, followed by a load event. In ToT, there is only a DOMContentLoaded, and no load.
Comment 4 Darin Adler 2011-06-27 15:06:06 PDT
Created attachment 98797 [details]
Improved test case
Comment 5 Darin Adler 2011-06-27 15:07:04 PDT
(In reply to comment #3)
> I'm wondering what should happen if stop() is called in a DOMContentLoaded handler. In Safari 5, I see two DOMContentLoaded events, followed by a load event. In ToT, there is only a DOMContentLoaded, and no load.

What is stop()? Is it a function that JavaScript could call?
Comment 6 Adam Barth 2011-06-27 15:07:56 PDT
> What is stop()? Is it a function that JavaScript could call?

There's a window.stop(), if that's what Alexey means.
Comment 7 Darin Adler 2011-06-27 16:55:06 PDT
Looks like this crash is WebKit2-specific.

The problem seems to be in WebFrameLoaderClient::finishedLoading, which calls WebFrameLoaderClient::committedLoad. Non-WebKit2 platforms don’t do anything like that. That results in the document being created. This means that in WebKit2 we end up calling DocumentWriter::begin twice and create the document twice.

I am not sure what the right way to fix this is.

My test case shows that the out-of-order delivery of DOMContentLoaded and loaded is not WebKit2-specific, though.
Comment 8 Darin Adler 2011-06-27 16:55:43 PDT
My guess is that this is not a regression, and has always been true in WebKit2.
Comment 9 Darin Adler 2011-06-27 17:56:21 PDT
Created attachment 98832 [details]
Patch
Comment 10 Darin Adler 2011-06-27 17:57:57 PDT
My idea of how to move forward on this is as follows:

1) Discuss with at least one other engineer whether to fix this in WebKit2 code or land this WebCore change. Act on that decision.

2) Make a patch that includes the test case and the fix. The test case will still show the out-of-order event problem, but will demonstrate that the crash is gone.

3) After landing that patch, file a new bug about the out-of-order event problem.
Comment 11 Adam Barth 2011-06-27 17:59:11 PDT
Comment on attachment 98832 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=98832&action=review

> Source/WebCore/loader/FrameLoader.cpp:230
> +    if (!m_frame->document()) {

You should be able to use the m_stateMachine to make this decision rather than looking at m_frame->document().  Using the state machine might be slightly better, but we should add ASSERTs to cross-check in any case.
Comment 12 Adam Barth 2011-06-27 18:00:09 PDT
I haven't looking into this deeply enough to have an opinion about whether this should be fixed in WebCore or WebKit2.
Comment 13 Darin Adler 2011-06-27 18:07:22 PDT
(In reply to comment #11)
> View in context: https://bugs.webkit.org/attachment.cgi?id=98832&action=review
> 
> > Source/WebCore/loader/FrameLoader.cpp:230
> > +    if (!m_frame->document()) {
> 
> You should be able to use the m_stateMachine to make this decision rather than looking at m_frame->document(). Using the state machine might be slightly better, but we should add ASSERTs to cross-check in any case.

OK. What I liked about checking document for 0 is that the goal of begin/end is to create the document, so it seemed natural to run it only if there was no document.
Comment 14 Adam Barth 2011-06-27 18:11:01 PDT
> OK. What I liked about checking document for 0 is that the goal of begin/end is to create the document, so it seemed natural to run it only if there was no document.

That's fine, but then you should ASSERT that the state machine is in the correct state.
Comment 15 Darin Adler 2011-06-28 14:54:20 PDT
(In reply to comment #14)
> > OK. What I liked about checking document for 0 is that the goal of begin/end is to create the document, so it seemed natural to run it only if there was no document.
> 
> That's fine, but then you should ASSERT that the state machine is in the correct state.

What state are you expecting in the two cases? It's unclear to me what would have changed the state.
Comment 16 Adam Barth 2011-06-28 15:24:08 PDT
> What state are you expecting in the two cases? It's unclear to me what would have changed the state.

The main question is whether the document that current exists in the frame is the initial document, which case we should be in the FrameLoaderStateMachine::DisplayingInitialEmptyDocument state.  If it's a real document, I would expect to be either in the CommittedFirstRealLoad or the FirstLayoutDone state.

If it's a real document, then it's dangerous to execute this line of code:

m_stateMachine.advanceTo(FrameLoaderStateMachine::DisplayingInitialEmptyDocument);

because that would be going backwards in the state machine and being confused about whether we've got a initial document or a real document will screw up the back/forward list and (possibly) some security checks.

Long story short, I hope we're in the DisplayingInitialEmptyDocument state if we've got a non-null document at this program point.
Comment 17 Darin Adler 2011-06-28 15:26:39 PDT
(In reply to comment #16)
> Long story short, I hope we're in the DisplayingInitialEmptyDocument state if we've got a non-null document at this program point.

This is the function that sets the state to DisplayingInitialEmptyDocument, so we won’t yet be in that state.
Comment 18 Adam Barth 2011-06-28 16:31:32 PDT
> This is the function that sets the state to DisplayingInitialEmptyDocument, so we won’t yet be in that state.

Maybe I'm confused about where this document is coming from then.  Are we not re-entering this function?  If we're getting a real document before the initial document, that's going to break a lot of invariants and we should fix this in the WebKit2 layer.
Comment 19 Darin Adler 2011-06-28 17:37:09 PDT
(In reply to comment #18)
> Maybe I'm confused about where this document is coming from then. Are we not re-entering this function? If we're getting a real document before the initial document, that's going to break a lot of invariants and we should fix this in the WebKit2 layer.

It’s not from reentering this function. I can’t remember exactly how it’s getting created. I seem to recall it was a pretty normal code path for loading data, and seemed more natural than making an explicit begin/end call, but you may be right that it does some other unwanted work in some way.
Comment 20 Darin Adler 2011-06-28 17:49:28 PDT
Created attachment 99012 [details]
Patch
Comment 21 Adam Barth 2011-06-28 17:56:33 PDT
Comment on attachment 99012 [details]
Patch

It seems from this discussion that this patch isn't correct.  If there's a non-initial document in the frame, then we surely don't want to execute line 233/238, which will get the state machine out of sync with what's actually going on in the Frame.

However, if your patch works in Debug, then something very strange is going on because line 233/238 ASSERTs that we're doing a valid state transition, which means we must be in the CreatingInitialEmptyDocument state.  However, that doesn't make any sense.  Something is going wrong more deeply than this patch suggests.
Comment 22 Darin Adler 2011-06-28 17:58:40 PDT
(In reply to comment #21)
> If there's a non-initial document in the frame

There is an empty document in the frame. I don’t know why you call it non-initial.

> However, if your patch works in Debug, then something very strange is going on because line 233/238 ASSERTs that we're doing a valid state transition, which means we must be in the CreatingInitialEmptyDocument state.

Yes, we haven’t moved on to another state.

> Something is going wrong more deeply than this patch suggests.

OK.

But keep in mind that this something that is going wrong has been going wrong already without the patch. The patch certainly isn’t making it worse.
Comment 23 Darin Adler 2011-06-28 18:01:40 PDT
And yes, the patch works fine in debug builds.
Comment 24 Adam Barth 2011-06-28 18:06:59 PDT
> > If there's a non-initial document in the frame
> 
> There is an empty document in the frame. I don’t know why you call it non-initial.

The "initial document" is a special document we create in FrameLoader::init so that Frame::document() is always non-null.  There a bunch of dancing we do to make that document mostly invisible, e.g., to the history system.

Given that we're not creating a document in FrameLoader::init (and that we're not re-entering FrameLoader::init), this document shouldn't be there.  If we're in the initial document state, we'll apply all the initial document hiding magic to this document, which likely isn't what we want.

Maybe the message from the WebProcess to load a document in the Frame should be buffered until the Frame has finished initializing?  Loading a real document into a partially initialized Frame sounds like a bad idea.

> But keep in mind that this something that is going wrong has been going wrong already without the patch. The patch certainly isn’t making it worse.

I'm starting to suspect that the final patch with ASSERT that document() is null immediately before calling begin() here (instead of having the branch that you've added).  In that sense, the patch appears to be going in the wrong direction.
Comment 25 Adam Barth 2011-06-28 18:08:57 PDT
> The problem seems to be in WebFrameLoaderClient::finishedLoading, which calls WebFrameLoaderClient::committedLoad. Non-WebKit2 platforms don’t do anything like that.

Why is WebFrameLoaderClient::finishedLoading calling WebFrameLoaderClient::committedLoad ?  That doesn't seem to make much sense.  Usually when navigating a Frame, we commit to a load and then sometime later the load finishes.
Comment 26 Darin Adler 2011-06-28 18:10:50 PDT
(In reply to comment #24)
> > > If there's a non-initial document in the frame
> > 
> > There is an empty document in the frame. I don’t know why you call it non-initial.
> 
> The "initial document" is a special document we create in FrameLoader::init so that Frame::document() is always non-null.  There a bunch of dancing we do to make that document mostly invisible, e.g., to the history system.

I understand your concern, but the document that is created is indeed still invisible to the history system.

Given your theory that something is going wrong I should be able to find a concrete problem. Since I can’t, I suspect your preference to do this differently is more a matter of taste than anything else. Regardless, though, I’m willing to do it.

> Maybe the message from the WebProcess to load a document in the Frame should be buffered until the Frame has finished initializing?

There is no message from the web process involved here. It’s all inside the web process.

> Loading a real document into a partially initialized Frame sounds like a bad idea.

That's a straw man. We are not doing that.
Comment 27 Darin Adler 2011-06-28 18:11:23 PDT
(In reply to comment #25)
> Why is WebFrameLoaderClient::finishedLoading calling WebFrameLoaderClient::committedLoad ? That doesn't seem to make much sense. Usually when navigating a Frame, we commit to a load and then sometime later the load finishes.

I’ll have to ask Anders that.
Comment 28 Adam Barth 2011-06-28 18:17:37 PDT
As an example, here are a couple places where we do something different depending on whether we're displaying the initial document:

http://trac.webkit.org/browser/trunk/Source/WebCore/loader/FrameLoader.cpp#L396
http://trac.webkit.org/browser/trunk/Source/WebCore/loader/DocumentWriter.cpp#L126

There's a risk that if that piece of state is incorrect, we'll end up not clearing out some JavaScript state during navigation that we should, which might lead to a security problem.  There are a bunch of preconditions to that happening, but it's one reason why this state is somewhat sensitive.
Comment 29 Darin Adler 2011-06-28 19:02:54 PDT
(In reply to comment #28)
> There's a risk that if that piece of state is incorrect, we'll end up not clearing out some JavaScript state during navigation that we should, which might lead to a security problem. There are a bunch of preconditions to that happening, but it's one reason why this state is somewhat sensitive.

I am having trouble understanding what this concern is in concrete terms. Those call sites check isDisplayingInitialEmptyDocument, and that state is true in the WebKit2 code path in question.
Comment 30 Adam Barth 2011-06-28 20:30:35 PDT
> I am having trouble understanding what this concern is in concrete terms. Those call sites check isDisplayingInitialEmptyDocument, and that state is true in the WebKit2 code path in question.

Correct.  When isDisplayingInitialEmptyDocument, we intentionally leak some JavaScript objects from document to another.  In general, leaking JavaScript objects between objects isn't safe.  It's only safe (and needed for web compat) in the specific case of the initial document.  In this code path, there's some other document (i.e., not created by Frame::init) that's playing the role of the initial document.  The consequences of that are unknown and potentially dangerous.
Comment 31 Darin Adler 2011-06-29 08:30:51 PDT
(In reply to comment #30)
> > I am having trouble understanding what this concern is in concrete terms. Those call sites check isDisplayingInitialEmptyDocument, and that state is true in the WebKit2 code path in question.
> 
> Correct.  When isDisplayingInitialEmptyDocument, we intentionally leak some JavaScript objects from document to another.  In general, leaking JavaScript objects between objects isn't safe.  It's only safe (and needed for web compat) in the specific case of the initial document.

This *is* the initial document.

> In this code path, there's some other document (i.e., not created by Frame::init) that's playing the role of the initial document. The consequences of that are unknown and potentially dangerous.

Hard to argue with the fact that “consequences are unknown”, but I think it borders on FUD. This other document *is* created by Frame::init, just in a slightly different code path, and not by the begin call inside FrameLoader::init.

But, anyway, this debate is getting us nowhere. I will propose another fix. I haven’t yet because I haven’t had any time at a computer.
Comment 32 Darin Adler 2011-06-29 11:06:50 PDT
I found the reason that this behavior is different on various platforms. It's this code in FrameLoader::finishedLoadingDocument:

    // FIXME: Platforms shouldn't differ here!
#if PLATFORM(WIN) || PLATFORM(CHROMIUM)
    if (m_stateMachine.creatingInitialEmptyDocument())
        return;
#endif

To fit in with the other platforms, WebKit2 needs this early return. WebKit1 on Mac does not need the early return, but the reason why it does not is that the finishedLoading function eventually encounters a null pointer and so ends up doing nothing.

Most other platforms have patterned their finishedLoading functions after the one from Windows, yet are not included in this #if. It seems likely that some of them suffer from the same problem as Mac WebKit2.

It’s easy to fix this for any given platform by adding it to this white list. That means that I can fix this for Mac/WK2 by adding PLATFORM(MAC) to the list. That leaves elf, gtk, haiku, qt, wince, and wx.

I think this “Windows/Chromium” code path eventually should be on for all platforms. I’m thinking we could make it a black list instead of a white list and remove platforms one at a time as they are tested.
Comment 33 Darin Adler 2011-06-29 11:08:56 PDT
(In reply to comment #25)
> > The problem seems to be in WebFrameLoaderClient::finishedLoading, which calls WebFrameLoaderClient::committedLoad. Non-WebKit2 platforms don’t do anything like that.
> 
> Why is WebFrameLoaderClient::finishedLoading calling WebFrameLoaderClient::committedLoad? That doesn't seem to make much sense. Usually when navigating a Frame, we commit to a load and then sometime later the load finishes.

Two comments:

1) This function call is the same in the Windows port, not specific to WebKit2, so probably the same in most WebKit client implementations.

2) The finishedLoading function calls the committedLoad to share code. It’s not saying that we committed the load, but rather that when finish loading we need to do all the same things we did when committed the load. It’s possible that there is some extra work here we could eliminate. But cleaning that up seems outside the scope of this bug.
Comment 34 Adam Barth 2011-06-29 11:12:09 PDT
Switching to a blacklist sounds like a good idea.  That stanza has always been super mysterious to me.
Comment 35 Darin Adler 2011-06-29 11:24:35 PDT
On reflection, I don’t think we really need a blacklist. There is no evidence that this is helpful on any platform, so I will instead remove the #if entirely. We’ll have to test, but we’d have to test in any case.
Comment 36 Darin Adler 2011-06-29 12:52:19 PDT
Created attachment 99130 [details]
Patch
Comment 37 Darin Adler 2011-06-29 12:58:58 PDT
Created attachment 99131 [details]
Patch
Comment 38 Adam Barth 2011-06-29 13:30:51 PDT
Comment on attachment 99131 [details]
Patch

Thanks for iterating on this patch.  The final version came out very well.
Comment 39 Joseph Pecoraro 2011-06-29 13:33:50 PDT
Comment on attachment 99131 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=99131&action=review

> LayoutTests/fast/loader/create-frame-in-DOMContentLoaded-expected.txt:5
> +A) Entered DOMContentLoaded event handler function.
> +C) Entered load event handler function.
> +D) Exiting load event handler function.
> +Test passed if messages A, B, C, D were all in order and there was no crash.
> +B) Exiting DOMContentLoaded event handler function.

This may be correct but based on the wording it doesn't sounds correct. The
results show "A, C, D, B" not "A, B, C, D". Was this expected?
Comment 40 Darin Adler 2011-06-29 13:42:22 PDT
(In reply to comment #39)
> (From update of attachment 99131 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=99131&action=review
> 
> > LayoutTests/fast/loader/create-frame-in-DOMContentLoaded-expected.txt:5
> > +A) Entered DOMContentLoaded event handler function.
> > +C) Entered load event handler function.
> > +D) Exiting load event handler function.
> > +Test passed if messages A, B, C, D were all in order and there was no crash.
> > +B) Exiting DOMContentLoaded event handler function.
> 
> This may be correct but based on the wording it doesn't sounds correct. The
> results show "A, C, D, B" not "A, B, C, D". Was this expected?

Yes, this test shows an expected failure, not the crash, but another bug. I’ll be filing a new bug report about that failure when I land this patch.

I mentioned this in comment #10 above.
Comment 41 WebKit Review Bot 2011-06-29 13:46:46 PDT
Comment on attachment 99131 [details]
Patch

Clearing flags on attachment: 99131

Committed r90038: <http://trac.webkit.org/changeset/90038>
Comment 42 WebKit Review Bot 2011-06-29 13:46:53 PDT
All reviewed patches have been landed.  Closing bug.