Bug 25043 - Race condition in MessagePort::clone()
Summary: Race condition in MessagePort::clone()
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore JavaScript (show other bugs)
Version: 528+ (Nightly build)
Hardware: Mac OS X 10.5
: P2 Normal
Assignee: David Levin
Depends on:
Reported: 2009-04-03 17:20 PDT by Andrew Wilson
Modified: 2009-06-21 15:56 PDT (History)
2 users (show)

See Also:

proposed patch (72.93 KB, patch)
2009-06-07 11:54 PDT, Andrew Wilson
levin: review-
Details | Formatted Diff | Diff
revised patch (78.28 KB, patch)
2009-06-09 17:38 PDT, Andrew Wilson
levin: review-
Details | Formatted Diff | Diff
Proposed patch addressing Levin's comments (82.78 KB, patch)
2009-06-16 22:32 PDT, Andrew Wilson
no flags Details | Formatted Diff | Diff
Reflecting Levin's latest comment's (82.78 KB, patch)
2009-06-17 10:46 PDT, Andrew Wilson
levin: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Andrew Wilson 2009-04-03 17:20:05 PDT
MessagePort::clone() has a race condition - if someone is posting a message to a MessagePort while it is being cloned by another thread, it's possible for that message to get left on the now-defunct port which means it is essentially dropped.

There's a comment in MessagePort::clone() describing where this happens.
Comment 1 Andrew Wilson 2009-06-07 11:54:25 PDT
Created attachment 31033 [details]
proposed patch

Refactored the code (again) to enable a multi-process implementation and to pass messages/do GC in a thread-safe manner.

Updated the API to bring it close to the current spec (we still don't support passing an array of ports to postMessage()).
Comment 2 David Levin 2009-06-09 11:29:49 PDT
Comment on attachment 31033 [details]
proposed patch

Two initial comments:
1. This patch does a lot more than what the bug says.
2. It is massive.  It would be easier (faster) to review if the individual things were separate patches instead of doing so much in one patch.

One other metanote if you can hide the constructor of classes and instead making them have a create method which passes back PassOwnPtr (or PassRefPtr if that makes sense), that is preferred.

I think it would be good to deal with the MessagePortProxy not deriving from MessagePortProxyBase isse before I look at it further so r- for that issue for now.

Here's an initial pass just about style issues (I'll do a more in depth review to understand it and make comments at that level later):

> diff --git a/JavaScriptCore/ChangeLog b/JavaScriptCore/ChangeLog
> index e3c19af..5affab7 100644
> --- a/JavaScriptCore/ChangeLog
> +++ b/JavaScriptCore/ChangeLog
> @@ -1,3 +1,13 @@
> +2009-06-07  Drew Wilson  <atwilson@google.com>
> +
> +        Reviewed by NOBODY (OOPS!).
> +
> +        Added support for multi-threaded MessagePorts.
> +

Needs link to bug.

> diff --git a/JavaScriptCore/wtf/MessageQueue.h b/JavaScriptCore/wtf/MessageQueue.h
>          MessageQueue() : m_killed(false) {}
Ideally add space { }

> +        bool appendAndCheckEmpty(const DataType&);

This name sounds awkward.  Are there any similar methods in wtf (or WebKit) that could be used as inspiration?

> +    template<typename DataType>
> +    inline bool MessageQueue<DataType>::appendAndCheckEmpty(const DataType& message)
> +    {
> +        MutexLocker lock(m_mutex);
> +        bool empty = m_queue.isEmpty();
wasEmpty seems better especially when it is returned.

> diff --git a/LayoutTests/ChangeLog b/LayoutTests/ChangeLog
> index 85cc97f..add365a 100644
> --- a/LayoutTests/ChangeLog
> +++ b/LayoutTests/ChangeLog
> @@ -1,3 +1,24 @@
> +2009-06-07  Drew Wilson  <atwilson@google.com>
> +
> +        Reviewed by NOBODY (OOPS!).
> +
> +        Updated MessageChannel/MessagePorts tests to reflect latest spec (close event has been removed).
> +        Added more tests of port cloning.

Link to bug.

Also some of these "Added" seem like they are copied from other files.  It would be good to acknowledge that.

> diff --git a/LayoutTests/fast/events/message-channel-gc-2.html-disabled b/LayoutTests/fast/events/message-channel-gc-2.html-disabled
> @@ -6,7 +6,7 @@ function gc()
>  {
>      if (window.GCController)
>          return GCController.collect();
> -
> + 
Remove added whitespace.

>      for (var i = 0; i < 10000; i++) { // > force garbage collection (FF requires about 9K allocations before a collect)
I don't understand the ">" in the comment. (I see this was copied from another place, but it still would be good to fix it if you don't understand it either.)

> diff --git a/WebCore/ChangeLog b/WebCore/ChangeLog
> index 19da779..24b3860 100644
> --- a/WebCore/ChangeLog
> +++ b/WebCore/ChangeLog
> @@ -1,3 +1,87 @@
> +2009-06-07  Drew Wilson  <atwilson@google.com>
> +
> +        Reviewed by NOBODY (OOPS!).
> +
> +        Removed obsolete MessagePort.startConversation(), active and onclose APIs.

Bug link

> +
> +        Refactored MessagePortProxy into MessagePortProxyBase and a platform-dependent MessagePortProxy implementation. Modified APIs to simplify cross-process implementations by moving the messaging code entirely into the platform-dependent proxy.

Usually changelog comments are wrapped to something shorter (see other comments in there).

> diff --git a/WebCore/dom/MessagePort.cpp b/WebCore/dom/MessagePort.cpp
>  MessagePort::MessagePort(ScriptExecutionContext* scriptExecutionContext)
>      : m_entangledPort(0)
>      , m_queueIsOpen(false)
> -    , m_scriptExecutionContext(scriptExecutionContext)
> -    , m_pendingCloseEvent(false)
>  {
> -    if (scriptExecutionContext)
> -        scriptExecutionContext->createdMessagePort(this);
> +    ASSERT(scriptExecutionContext);
> +    m_scriptExecutionContext = scriptExecutionContext;

Prefer the " , m_scriptExecutionContext(scriptExecutionContext)" style of initialization where possible.
If a variable can never be NULL, use the & style of declaration.

> @@ -128,47 +63,44 @@ void MessagePort::postMessage(const String& message, ExceptionCode& ec)
>  void MessagePort::postMessage(const String& message, MessagePort* dataPort, ExceptionCode& ec)
>  {
> -    if (!m_entangledPort || !m_scriptExecutionContext)
> +    ASSERT(m_scriptExecutionContext);
> +    if (!m_entangledPort)
>          return;
> -    RefPtr<MessagePort> newMessagePort;
> +    PassOwnPtr<MessagePortProxyBase> proxy;
Is it possible to use a OwnPtr here?  (For PassRefPtr they would never be declarated like this.  You would use a RefPtr instead.)

> diff --git a/WebCore/dom/MessagePortProxyBase.h b/WebCore/dom/MessagePortProxyBase.h
> +    class MessagePortProxyBase {

Do you wish to allow this class to be copied?  If no, privately derive from Noncopyable (wtf/Noncopyable.h).  (Think of this just like DISALLOW_COPY_AND_ASSIGN.)

> +    public:
> +
Typically there isn't a blank line here in WK code.

> +
> +        MessagePortProxyBase(PassRefPtr<MessagePortProxy>);
> +        ~MessagePortProxyBase();
Seems like these should be "protected:"
Typically there *is* a blank line here in WK code (right before private:/protected:).

> +    private:
> +
Typically there isn't a blank line here in WK code.

> diff --git a/WebCore/dom/default/MessagePortProxy.cpp b/WebCore/dom/default/MessagePortProxy.cpp

> +void MessagePortProxy::setRemotePort(MessagePort* port)
> +{
> +    MutexLocker lock(m_mutex);
> +    // Should never set port if it is already set.
> +    if (port)
> +        ASSERT(!m_remotePort);
"if assert" looks wierd to me at least.

What about
  ASSERT(!port || !m_remotePort);   

> +
> +void MessagePortProxy::postMessageToRemote(PassRefPtr<MessagePortProxyBase::EventData> msg)
"msg" -> "message"  avoid abbreviations.

> +bool MessagePortProxy::isProxyFor(MessagePort* port)
> +{
> +    RefPtr<MessagePortProxy> remote = entangledProxy();
> +    if (!remote)
> +        return false;
> +    return port == remote->remotePort();
  return remote && port == remote->remotePort()

your choice.

> +void MessagePortProxy::close()
> +{
> +    RefPtr<MessagePortProxy> remote = entangledProxy();
> +    if (remote) {
> +        closeInternal();
> +        remote->closeInternal();
> +    }

Consider using an early return pattern (which is more preferred in general in WK).

    if (!remote)

> diff --git a/WebCore/dom/default/MessagePortProxy.h b/WebCore/dom/default/MessagePortProxy.h
> +
> +    class MessagePort;
> +
> +    // MessagePortProxy is a platform-dependent interface to the remote side of a message channel.
> +    // This default implementation supports multiple threads running within a single process. Implementations for multi-process platforms should define these public APIs in their own platform-specific MessagePortProxy file.
> +    // The goal of this implementation is to eliminate contention except when cloning or closing the port, so each side of the channel has its own separate mutex.
> +    class MessagePortProxy : public ThreadSafeShared<MessagePortProxy> {

When there is a FooBase, Foo derives from FooBase but that pattern isn't followed here. Why not? Do the names need to be changed?

> +    public:
> +        // Creates a channel entangling two ports.
> +        static void createChannel(PassRefPtr<MessagePort>, PassRefPtr<MessagePort>);
Nice to add blank line here.

> +        class MessagePortQueue : public ThreadSafeShared<MessagePortQueue>
> +        {
{ is placed incorrectly.

> +        private:
> +            MessagePortQueue() {}
{ } add space.
Comment 3 Andrew Wilson 2009-06-09 17:38:40 PDT
Created attachment 31118 [details]
revised patch

Cleaned up style nits and renamed MessagePortProxyBase->MessagePortProxyWrapper to better reflect its purpose.

Also added a missing layout test which inexplicably didn't make it into the patch, but *was* in the ChangeLog.
Comment 4 Andrew Wilson 2009-06-09 17:42:09 PDT
Apologies for the size of this patch, btw. The API changes required to provide a threadsafe implementation were not really possible to make in an incremental fashion.
Comment 5 David Levin 2009-06-16 15:19:28 PDT
As discussed, a better name for MessagePortProxyWrapper would be MesssagePortChannel and MessagePortProxy could be PlatformMessagePortChannel.

A misc comments as I work my way through this:
* You'll need to fix up other makefiles in addition to the osx one.

> diff --git a/WebCore/dom/MessageChannel.cpp b/WebCore/dom/MessageChannel.cpp
>  MessageChannel::MessageChannel(ScriptExecutionContext* context)

* This instance also cannot be given a null context so it should take a ScriptExecutionContext& as well.

> WebCore/dom/default/MessagePortProxy.h
I don't think dom is the right place for platform specific items.  Is that where you're going with the "default" directory?

So (using the new name), PlatformMessagePortChannel should be in a different directory but perhaps we should ask about this on #webkit.

> void MessagePortProxy::createChannel(PassRefPtr<MessagePort> port1, PassRefPtr<MessagePort> port2)
>     RefPtr<MessagePortProxy> proxy1 = adoptRef(new MessagePortProxy(queue1, queue2));
Avoid "new Foo()" and instead use create methods as much as possible.

Should be MessagePortProxy::create.

> class MessagePort
>        ScriptExecutionContext* m_scriptExecutionContext;
I would make this a ScriptExecutionContext& since it can never be 0 and you can get rid of a bunch of asserts.

>          class MessagePortQueue ...
>             MessageQueue<RefPtr<MessagePortProxyWrapper::EventData> > m_queue;
RefPtr in a cross thread queue for an object that is RefCounted (a non thread safe ref counting) makes me nervous.

From my cursory inspection, it looks like this could be changed to OwnPtr and EventData would no longer be RefCounter and           EventData::create would return a PassOwnPtr<EventData>.

I'll have more comments coming but you wanted them streaming as much as possible.

Comment 6 David Levin 2009-06-16 16:18:31 PDT
> tryGetMessageFromRemote(RefPtr<MessagePortProxyWrapper::EventData>& result)

In MessagePort.cpp, it is odd that m_queueIsOpen is not set to false after MessagePort::close is called.  Perhaps it should have a different name: m_started?  or even better m_enabled? (which is terminology from the spec).

Other minor very minor nits:
In LayoutTests/fast/events/message-port-clone.html-disabled:
* it would be good to add periods to your comments (in log messages too).
* it switches from indenting by 4 to indenting by 2 spaces.  
* the formatting for things like "channel.port2.onmessage = function(evt)" is inconsistent.  (Sometimes you put the { at the end of the line and sometimes at the beginning of the next line.  I've seen both in these js files.  It seems like at the beginning of the next line is more consistent with WebKit in general.)
* var msgIndex = 1;  I'd change it to messageIndex to avoid abbreviations.

General style:
1. Blank line between functions.
2. No blank line after public:, private: but one blank line before them when they are in the middle of the class.

Following these:
Before void MessagePort::contextDestroyed(), there is an extra blank line.
In class MessagePortProxyWrapper, there is an extra blank line after public:
In class MessagePortProxyWrapper::EventData, it would be nice to add a blank line before private:
In class MessagePortProxy::MessagePortQueue, it would be nice to add a blank line before private:

In class MessagePortProxy::MessagePortQueue, the "inline" before each function defined in the class is redundant, so I'd just remove it.

So far the functionality looks great.  I'm almost done.

Comment 7 David Levin 2009-06-16 16:46:19 PDT
In WebCore/dom/MessagePortProxyWrapper.cpp, there is an extra blank before the "} // namespace WebCore"

Since I asked, I'll note it that the GC for jsc is handled in JSDOMBinding.cpp in markActiveObjectsForContext.
Comment 8 David Levin 2009-06-16 17:18:51 PDT
> LayoutTests/fast/events/message-port-clone.html-disabled

Why not post one of the messages before sending along the port? i.e. swap these two lines:
  61 channel.port1.postMessage("msg", channel2.port1);
  62 channel2.port2.postMessage("1");
Comment 9 David Levin 2009-06-16 17:25:35 PDT
Comment on attachment 31118 [details]
revised patch

Last comment:

> LayoutTests/fast/events/message-port-clone.html-disabled
 var timer = setTimeout(function() {
     log("FAIL: timeout waiting for message delivery");
 }, 1000);
I don't think it is necessary to set a timeout as the layout test mechanisms will do a timeout.  (I just don't want the test to fail if someone is running it on heavily loaded machine and has adjusted the timeout in the script accordingly.)

There enough things to address that I'm marking r- for now, but in general it looks really good to me.
Comment 10 Andrew Wilson 2009-06-16 17:57:25 PDT
>>  MessageChannel::MessageChannel(ScriptExecutionContext* context)
>* This instance also cannot be given a null context so it should take a
>ScriptExecutionContext& as well.

This is pre-existing code. We can address this in a followup patch.

>> WebCore/dom/default/MessagePortProxy.h
>I don't think dom is the right place for platform specific items.  Is that
>where you're going with the "default" directory?

I think this is OK - I'm following the pattern under WebCore/page.

>>        ScriptExecutionContext* m_scriptExecutionContext;
>I would make this a ScriptExecutionContext& since it can never be 0 and you can
>get rid of a bunch of asserts.

It can't be zero at construction time, true, but my intention is that it *can* be set to zero later (I have a pending patch that does more optimal GC which detaches the port from the ScriptExecutionContext when it is cloned to enable it to be GC'd).

>In MessagePort.cpp, it is odd that m_queueIsOpen is not set to false after
>MessagePort::close is called.  Perhaps it should have a different name:
>m_started?  or even better m_enabled? (which is terminology from the spec).

Agreed that "started" is a better name - renamed this API (enabled is not particularly descriptive). I do not set m_started to false when close() is called, because the spec states that existing messages continue to be delivered even though the port has been closed.

* the formatting for things like "channel.port2.onmessage = function(evt)" is
inconsistent.  (Sometimes you put the { at the end of the line and sometimes at
the beginning of the next line.  I've seen both in these js files.  It seems
like at the beginning of the next line is more consistent with WebKit in
I wasn't sure what the style was originally, so I looked through other examples. It seems that the WebKit style differs depending on how the function is defined.

When defining a top-level function, the brace should be on a line by itself:

function foo()
   ...do stuff...

When defining an inline function, it should be on the same line
  setTimeout(function() {
      ...do stuff...
  }, 100);

This is just what I've gathered from other test files and I've tried to mimic that style. So I think the code is correctly formatted, but I'm happy to change it if I'm wrong.

I've addressed the other comments locally - I'll upload a patch shortly
Comment 11 Andrew Wilson 2009-06-16 22:32:39 PDT
Created attachment 31400 [details]
Proposed patch addressing Levin's comments
Comment 12 David Levin 2009-06-16 23:55:12 PDT
This looks good to me.  I have only 5 very trivial comments.

It looks like you put spaces in WebCore/GNUmakefile.am where there should be tabs.

157         RefPtr<Event> evt = MessageEvent::create(eventData->message(), "", "", 0, port);
It would be just slightly better to to port.release().

It looks like there is an extra blank line before void MessagePort::contextDestroyed().

In WebCore/dom/default/PlatformMessagePortChannel.h
  bool tryGetMessage(OwnPtr<MessagePortChannel::EventData>& result)

I think "message" would be better than" result".

Two parts:
  In WebCore/page/DOMWindow.cpp,
    PassRefPtr<MessageEvent> event(ScriptExecutionContext* document)
I'd recommend "context" instead of "document" since you don't seem to rely on it being a document.

        PassRefPtr<MessagePort> messagePort;
should be
       RefPtr<MessagePort> messagePort;
(and then do a release() like in #2 when creating the MessageEvent).
Comment 13 Andrew Wilson 2009-06-17 10:46:58 PDT
Created attachment 31420 [details]
Reflecting Levin's latest comment's
Comment 14 Andrew Wilson 2009-06-18 10:31:41 PDT
Also, I have a patch which restores the optimal same-thread-GC behavior (https://bugs.webkit.org/show_bug.cgi?id=26448). If it's possible to land both patches simultaneously I'd appreciate it, as it makes the Chromium merge much simpler.
Comment 15 David Levin 2009-06-18 15:57:04 PDT
Comment on attachment 31420 [details]
Reflecting Levin's latest comment's

Looks good. 

This change has progressed hugely, and I believe addresses the former concerns about lock contention.

As discussed with ap, if he has any comments in the future, you'll address them then.
Comment 16 David Levin 2009-06-18 15:57:22 PDT
Assigned to levin for landing.
Comment 17 David Levin 2009-06-21 14:34:32 PDT
Committed as http://trac.webkit.org/changeset/44915