Bug 41507 - WebSocket: Crash caused by calling close() within onmessage handler
Summary: WebSocket: Crash caused by calling close() within onmessage handler
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC OS X 10.5
: P2 Normal
Assignee: Yuta Kitamura
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-07-01 23:01 PDT by Yuta Kitamura
Modified: 2010-07-12 19:42 PDT (History)
3 users (show)

See Also:


Attachments
Fix (8.43 KB, patch)
2010-07-01 23:16 PDT, Yuta Kitamura
no flags Details | Formatted Diff | Diff
a simpler test (733 bytes, application/x-javascript)
2010-07-07 14:09 PDT, Alexey Proskuryakov
no flags Details
Fix v2 (Simplify test, add comments) (8.66 KB, patch)
2010-07-09 02:44 PDT, Yuta Kitamura
no flags Details | Formatted Diff | Diff
Fix v3 (Improve comments as per Alexey's suggestion) (8.79 KB, patch)
2010-07-12 06:41 PDT, Yuta Kitamura
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Yuta Kitamura 2010-07-01 23:01:01 PDT
If you create a WebSocket object during onclose() handler of another WebSocket object, the browser will crash.

This crash may be related to http://crbug.com/44792.
Comment 1 Yuta Kitamura 2010-07-01 23:16:13 PDT
Created attachment 60341 [details]
Fix
Comment 2 Fumitoshi Ukai 2010-07-02 00:35:41 PDT
Comment on attachment 60341 [details]
Fix

LGTM
Comment 3 Alexey Proskuryakov 2010-07-02 10:29:58 PDT
Comment on attachment 60341 [details]
Fix

The code change looks reasonable, but there are some logical steps missing from the explanation. Why does creating a WebSocket object during onclose() handler of another WebSocket object cause this crash? They have separate channels and bridges.

+    while (m_workerContext && clientWrapper && !clientWrapper->syncMethodDone() && result != MessageQueueTerminated) {
+        result = runLoop.runInMode(m_workerContext.get(), m_taskMode); // May dereference this bridge and m_workerContext.
         clientWrapper = m_workerClientWrapper.get();
     }

What is the exact mechanism that dereferences worker context while waiting here? Is this part of the change covered by the test?
Comment 4 Yuta Kitamura 2010-07-06 22:52:00 PDT
(In reply to comment #3)
> (From update of attachment 60341 [details])
> The code change looks reasonable, but there are some logical steps missing from the explanation. Why does creating a WebSocket object during onclose() handler of another WebSocket object cause this crash? They have separate channels and bridges.
> 
> +    while (m_workerContext && clientWrapper && !clientWrapper->syncMethodDone() && result != MessageQueueTerminated) {
> +        result = runLoop.runInMode(m_workerContext.get(), m_taskMode); // May dereference this bridge and m_workerContext.
>          clientWrapper = m_workerClientWrapper.get();
>      }
> 
> What is the exact mechanism that dereferences worker context while waiting here? Is this part of the change covered by the test?

I found the test case first, and then looked into the failure. So I'm not sure about the exact mechanism of why this test causes the failure, but it is 100% reproducible with this test.

The source of the failure is that calling WorkerRunLoop::runInMode() may call WebSocket::didClose() (which frees everything). The stack trace is like the following:

    WebSocket::close() is called (in ws1.close())
 -> WorkerThreadableWebSocketChannel::bufferedAmount()
 -> WorkerThreadableWebSocketChannel::Bridge::bufferedAmount()
 -> WorkerThreadableWebSocketChannel::Bridge::waitForMethodCompletion()
 -> WorkerRunLoop::runInMode()
 -> ... -> workerContextDidClose()
 -> ThreadableWebSocketChannelClientWrapper::didClose()
 -> ThreadableWebSocketChannelClientWrapper::processPendingEvents()
 -> WebSocket::didClose()

This causes the channel and other things to get closed, while WebSocket::close() does not think such thing should happen.

> +        result = runLoop.runInMode(m_workerContext.get(), m_taskMode); // May dereference this bridge and m_workerContext.

This comment was a bit inaccurate. Since m_workerContext may become null in Bridge::disconnect(), the comment should be "May causes the bridge to be disconnected, which makes m_workerContext become null." I will update the comment in a new patch.
Comment 5 Alexey Proskuryakov 2010-07-07 14:09:45 PDT
Created attachment 60776 [details]
a simpler test

I'll need to think about this patch more deeply, we may be violating some more general ownership rules here. It's awesome that we have a reproducible test case!

>     m_bufferedAmountAfterClose = m_channel->bufferedAmount(); // May dereference m_channel.
>     if (m_channel)
>         m_channel->close();

This comment at least is wrong - we don't care about m_channel being dereferenced - all we care about to make this call is m_channel being non-null. WebSocket holds its own reference to channel.

I have a simpler reproducible case, one that doesn't even need to create another WebSocket. It crashes in several different places.
Comment 6 Alexey Proskuryakov 2010-07-07 17:16:33 PDT
Comment on attachment 60341 [details]
Fix

The title of the bug isn't great - there are multiple crash locations being fixed, not just waitForMethodCompletion(). And this isn't about creating a WebSocket object from onclose(), as the original description said.

I think that this patch is fine, and just needs better comments.
Comment 7 Yuta Kitamura 2010-07-09 02:44:40 PDT
Created attachment 61023 [details]
Fix v2 (Simplify test, add comments)
Comment 8 Yuta Kitamura 2010-07-09 02:47:42 PDT
Apologies for delay in response. In the new patch, I tried to:
- Add more comments and clarify the situation.
- Simplify the test according to Alexey's proposal.

Could you take another look?
Comment 9 Alexey Proskuryakov 2010-07-09 10:23:30 PDT
Comment on attachment 61023 [details]
Fix v2 (Simplify test, add comments)

 +      m_bufferedAmountAfterClose = m_channel->bufferedAmount(); // May dereference m_channel.

This comment is still misleading, we don't care about it being dereferenced. WebSocket holds its own reference to m_channel.

I thin that the real explanation is that a didClose notification may be already queued, which we will inadvertently process while waiting for bufferedAmount() to return.
Comment 10 Yuta Kitamura 2010-07-12 06:41:46 PDT
Created attachment 61222 [details]
Fix v3 (Improve comments as per Alexey's suggestion)
Comment 11 WebKit Commit Bot 2010-07-12 19:42:50 PDT
Comment on attachment 61222 [details]
Fix v3 (Improve comments as per Alexey's suggestion)

Clearing flags on attachment: 61222

Committed r63159: <http://trac.webkit.org/changeset/63159>
Comment 12 WebKit Commit Bot 2010-07-12 19:42:54 PDT
All reviewed patches have been landed.  Closing bug.