Bug 67941

Summary: occasional crash in Chromium in dispatching keyEvent
Product: WebKit Reporter: Scott Graham <scottmg>
Component: UI EventsAssignee: Scott Graham <scottmg>
Status: RESOLVED FIXED    
Severity: Normal CC: ap, fishd, jamesr, tony, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
more informative assert
none
Patch
none
test case reduced from gmail bug report
none
Patch
none
Patch
none
update per review none

Description Scott Graham 2011-09-12 10:03:41 PDT
Chromium bug: http://code.google.com/p/chromium/issues/detail?id=96045


The crash is caused by the Document having been detached when processing the KeyEvent.

The invalid access is at 0x324 which is the offset of the m_focusedNode member of WebCore::Document. So, m_doc of Frame has been nulled at this point. I don't know yet why the ordering or detaching or whatever has changed. WebViewImpl.cpp:~657

    if (handler->keyEvent(evt)) {
        if (WebInputEvent::RawKeyDown == event.type) {
            // Suppress the next keypress event unless the focused node is a plug-in node.
            // (Flash needs these keypress events to handle non-US keyboards.)
--->        Node* node = frame->document()->focusedNode();
            if (!node || !node->renderer() || !node->renderer()->isEmbeddedObject())
                m_suppressNextKeypressEvent = true;
        }
        return true;
    }
Comment 1 Scott Graham 2011-09-12 10:10:45 PDT
Created attachment 107060 [details]
Patch
Comment 2 Scott Graham 2011-09-12 10:35:21 PDT
Created attachment 107064 [details]
more informative assert
Comment 3 James Robinson 2011-09-12 15:11:31 PDT
Seems reasonable to me, the final call would be Darin F's
Comment 4 Darin Fisher (:fishd, Google) 2011-09-13 20:57:01 PDT
Comment on attachment 107064 [details]
more informative assert

View in context: https://bugs.webkit.org/attachment.cgi?id=107064&action=review

> Source/WebKit/chromium/src/WebViewImpl.cpp:657
> +            ASSERT_WITH_MESSAGE(frame->document(),

Are you trying to catch this in production builds?  Have you considered
writing these values to the stack and then crashing (even in release
builds), so that we can get more informative crash reports?

Often you are not the one who will see this assertion, and when other
developers see the assertion, they'll most likely ignore it unless
they suspect their local changes led to the assertion.

This is why you don't see ASSERT_WITH_MESSAGE used much.  Getting the
info into a crash dump tends to be more valuable.

> Source/WebKit/chromium/src/WebViewImpl.cpp:661
>              if (!node || !node->renderer() || !node->renderer()->isEmbeddedObject())

because of this null check, we will now stop crashing in production.  won't that
mean that we lose information about this failure?

it seems really bad for the document of the focused node to not exist.  we shouldn't
have to be prepared for it to be null.  this tells me that this cannot be the right
fix.  it is just a band-aid fix.  the real fix is probably to somehow resolve why
document is null and make it not be null.
Comment 5 Darin Fisher (:fishd, Google) 2011-09-13 20:58:49 PDT
Actually, I re-read comment #0.  Do you have a test case that reproduces this?  It sounds like you are saying that the document disappearing could be a valid side-effect of dispatching the event.  I agree.

This tells me that we should probably grab the focusedNode before calling keyEvent.  We should retain the Node on the stack using a RefPtr.
Comment 6 Scott Graham 2011-09-14 13:56:33 PDT
(In reply to comment #5)
> Actually, I re-read comment #0.  Do you have a test case that reproduces this?  It sounds like you are saying that the document disappearing could be a valid side-effect of dispatching the event.  I agree.
> 
> This tells me that we should probably grab the focusedNode before calling keyEvent.  We should retain the Node on the stack using a RefPtr.

Unfortunately I haven't been able to reproduce in DRT. The patch was indeed intended as a band-aid fix (as it's a top-15-crash for renderer in dev) with the vague hope that a fuzz test might turn up the assert at some point.

Note that it's not the focusedNode that's lost its document, it's that the frame has lost its document. I'm not sure how bad of a state that is, but there is other code which simply guards against it. (Not saying that's necessarily a good thing).


More random information:

I'm not clear how we've gotten to this state. Document::detach() shuts down the event queue (so the key event wouldn't get processed), and detach() is almost always called by Frame::setDocument which in turn is the only way for the document to be null (as it is in the crash).

The only suspicious code is related to the document being in PageCache as then detach() won't be called by setDocument(), so the event queue wouldn't be closed, but I haven't found a way to exercise that path.
Comment 7 James Robinson 2011-09-14 14:01:04 PDT
We don't use the page cache in chromium (never have).
Comment 8 Scott Graham 2011-09-16 17:19:19 PDT
Created attachment 107750 [details]
Patch
Comment 9 Scott Graham 2011-09-16 17:20:45 PDT
I dug into this more. My best suspicion for the cause is the use of FocusController::focusedOrMainFrame() in focusedWebCoreFrame().

When RenderView receives the key message from the browser, it confirms that mainFrame is still there, and accesses document() w/o a null check. But, it looks like in WebViewImpl that a different frame could be used based on using focusedOrMainFrame().

Other callsites in WebViewImpl use focusedWebCoreNode() rather than manually doing something almost equivalent as keyEvent does currently. That function does two things differently:

1. uses focusedFrame() rather than focusedOrMainFrame(). 
2. checks for the document of the returned frame being null.

Users of this include autocompleteHandleKeyEvent() which is called earlier in keyEvent().

So, I think using that function instead is reasonable. Unfortunately I still can't explain the specific cause of why this started happening due to a very large search space (started on first release of m15 branch) and no repro.
Comment 10 Scott Graham 2011-09-23 12:20:11 PDT
Any objections to the most recent patch?
Comment 11 Darin Fisher (:fishd, Google) 2011-09-23 14:03:09 PDT
Comment on attachment 107750 [details]
Patch

OK, makes sense.  I'm sad that it is still unclear what is really going on, but that shouldn't hold up this patch.
Comment 12 WebKit Review Bot 2011-09-23 14:47:07 PDT
Comment on attachment 107750 [details]
Patch

Clearing flags on attachment: 107750

Committed r95861: <http://trac.webkit.org/changeset/95861>
Comment 13 WebKit Review Bot 2011-09-23 14:47:12 PDT
All reviewed patches have been landed.  Closing bug.
Comment 14 Tony Chang 2011-09-28 11:06:29 PDT
There is a repro with gmail and some labs enabled here: http://b/issue?id=5386206

Maybe we can turn that into a layout test?
Comment 15 Scott Graham 2011-09-28 11:09:06 PDT
(In reply to comment #14)
> There is a repro with gmail and some labs enabled here: http://b/issue?id=5386206
> 
> Maybe we can turn that into a layout test?

Thanks, that's great! I'll see if I can minimize it into a test.
Comment 16 Scott Graham 2011-09-30 12:46:04 PDT
(In reply to comment #15)
> (In reply to comment #14)
> > There is a repro with gmail and some labs enabled here: http://b/issue?id=5386206
> > 
> > Maybe we can turn that into a layout test?
> 
> Thanks, that's great! I'll see if I can minimize it into a test.

I've minimized this from the reported gmail bug. It seems slightly different than the crash reports, though certainly related.

In the gmail case, the entire Frame is disappearing during the keyEvent handler. When the Frame is retrieved at WebViewImpl.cpp:627, it's stored in a raw pointer, and in certain circumstances, the handler(s) below can remove all other counted references to the Frame. (The sneakiest part was that mouse handling hangs on to another reference so it won't happen if the cursor is still in the frame!)

So, I think there needs to be a guard on the whole Frame.

Any thoughts? Is there any reason not to ref the Frame here?

I believe it's still possible for only the document to be cleanly removed also, so I intend to leave a null check there. (no repro for that though)

The crash is not currently reproducible in at least chromium DRT because random things (AXObjectCache via AccessibilityController in this case) are adding additional refs to the Frame. (Booooo! Any suggestions on improving that?)
Comment 17 Scott Graham 2011-09-30 12:47:29 PDT
Created attachment 109328 [details]
test case reduced from gmail bug report
Comment 18 Scott Graham 2011-10-03 16:31:02 PDT
Created attachment 109548 [details]
Patch
Comment 19 Scott Graham 2011-10-03 16:33:31 PDT
I think this is worth committing, even though I can't add a LayoutTest.

Changing DRT locally to remove other refcounts that Chromium doesn't have to the Frame does exhibit the crash&fix before&after this patch.
Comment 20 Tony Chang 2011-10-04 11:53:44 PDT
(In reply to comment #16)
> The crash is not currently reproducible in at least chromium DRT because random things (AXObjectCache via AccessibilityController in this case) are adding additional refs to the Frame. (Booooo! Any suggestions on improving that?)

I'm not sure I understand the problem, but changing DRT to be more like Chromium would be a good thing.  I would be happy to review such a change.
Comment 21 Scott Graham 2011-10-04 12:03:52 PDT
(In reply to comment #20)
> (In reply to comment #16)
> > The crash is not currently reproducible in at least chromium DRT because random things (AXObjectCache via AccessibilityController in this case) are adding additional refs to the Frame. (Booooo! Any suggestions on improving that?)
> 
> I'm not sure I understand the problem, but changing DRT to be more like Chromium would be a good thing.  I would be happy to review such a change.

Thanks,

The specific problem in this case is that DRT has testing code for accessibility functionality. This causes additional references to Frames which avoids the crash. The crash won't happen in Chromium if some particular accessibility features are turned on, but that's not the default state.

I'm not sure if there's a great way to make Chromium and DRT more similar. I guess I could add some test flags that allow disabling the accessibility objects to make this case crash, but it's going to be fragile because it's due to ref counting mess. (i.e. why should accessibility be disabled on this test? well, because at some point in change history it was the last reference... but in the future it could easily be some other random code that causes a bonus reference count to avoid the crash)
Comment 22 Tony Chang 2011-10-04 13:26:45 PDT
(In reply to comment #21)
> I'm not sure if there's a great way to make Chromium and DRT more similar. I guess I could add some test flags that allow disabling the accessibility objects to make this case crash, but it's going to be fragile because it's due to ref counting mess. (i.e. why should accessibility be disabled on this test? well, because at some point in change history it was the last reference... but in the future it could easily be some other random code that causes a bonus reference count to avoid the crash)

I was thinking the reverse, disable the accessibility objects in DRT by default and only enable it for the tests that need it.  That's probably pretty tricky, we should do that on a separate bug.

I think the change is fine.  I would go ahead and add the LayoutTest even if DRT chromium doesn't trigger the crash.  It should work if someone were to open it in Chromium (if it doesn't require DRT specific features) and it might catch a regression some day.
Comment 23 Scott Graham 2011-10-04 14:04:15 PDT
Created attachment 109681 [details]
Patch
Comment 24 Scott Graham 2011-10-04 14:08:41 PDT
(In reply to comment #22)
> I was thinking the reverse, disable the accessibility objects in DRT by default and only enable it for the tests that need it.  That's probably pretty tricky, we should do that on a separate bug.

Filed https://bugs.webkit.org/show_bug.cgi?id=69375

> I think the change is fine.  I would go ahead and add the LayoutTest even if DRT chromium doesn't trigger the crash.  It should work if someone were to open it in Chromium (if it doesn't require DRT specific features) and it might catch a regression some day.

Thanks, updated the patch.
Comment 25 Tony Chang 2011-10-04 14:22:55 PDT
Comment on attachment 109681 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=109681&action=review

> LayoutTests/fast/events/keyevent-iframe-removed-crash.html:1
> +<script>

Nit: <!DOCTYPE html>
<html>

> LayoutTests/fast/events/keyevent-iframe-removed-crash.html:29
> +<body onload="go()">
> +    PASS
> +</body>

Nit: Please add some text on how to run manually (press any key) and say the test passes if it does not crash.

> LayoutTests/fast/events/keyevent-iframe-removed-crash.html:30
> +

Nit: </html>
Comment 26 Scott Graham 2011-10-04 14:30:26 PDT
Created attachment 109693 [details]
update per review
Comment 27 WebKit Review Bot 2011-10-04 15:29:55 PDT
Comment on attachment 109693 [details]
update per review

Clearing flags on attachment: 109693

Committed r96654: <http://trac.webkit.org/changeset/96654>
Comment 28 WebKit Review Bot 2011-10-04 15:30:00 PDT
All reviewed patches have been landed.  Closing bug.