Bug 23705 - Messages from workers can take over main thread event loop
Summary: Messages from workers can take over main thread event loop
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: 528+ (Nightly build)
Hardware: Mac OS X 10.5
: P1 Normal
Assignee: Dmitry Titov
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-02-03 02:48 PST by Alexey Proskuryakov
Modified: 2009-02-11 23:59 PST (History)
1 user (show)

See Also:


Attachments
Proposed patch (8.19 KB, patch)
2009-02-05 20:18 PST, Dmitry Titov
ap: review-
Details | Formatted Diff | Diff
Updated patch. (11.17 KB, patch)
2009-02-06 13:13 PST, Dmitry Titov
no flags Details | Formatted Diff | Diff
Updated patch, v3 (11.38 KB, patch)
2009-02-07 22:25 PST, Dmitry Titov
no flags Details | Formatted Diff | Diff
Proposed patch (9.80 KB, patch)
2009-02-10 21:18 PST, Dmitry Titov
ap: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Alexey Proskuryakov 2009-02-03 02:48:54 PST
Steps to reproduce:
1. Open fast/workers/worker-timeout.html
2. Wait a few seconds for the length of log to become large.
3. Try to use Safari UI (e.g. close the window).

Result: the UI is blocked.

This happens because messages are delivered faster than they can be served. We need to give the main event loop a chance to run - and we also need to ensure that queued events don't use up all free memory.
Comment 1 Dmitry Titov 2009-02-03 09:32:13 PST
Relevant WHATWG discussion: http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2008-November/017327.html
Comment 2 Dmitry Titov 2009-02-05 20:16:14 PST
I think there are 2 independent issues here:
1. UI freeze
2. Eventual OOM when message queue gets too large.

I think #1 is more urgent to address. When user can't even close window it's bad.
The problem is caused by 2 factors:
- we use [NSObject performSelectorOnMainThread] to schedule the callback on the main thread. This is internally implemented by creating a timer and posting it to the main thread's run loop. Unfortunately, the run loop picks the timers first and user input second, so in certain conditions user input is never processed.
- once we get to to the work item queue on the main thread, we process all the items and it can take very long time if there are a lot of them. The UI is frozen for that time as well.
Comment 3 Dmitry Titov 2009-02-05 20:18:26 PST
Created attachment 27376 [details]
Proposed patch

This patch addresses the issue for Mac platform. I will verify Win as well later.
Comment 4 Alexey Proskuryakov 2009-02-06 02:02:59 PST
Comment on attachment 27376 [details]
Proposed patch

+// 0.1 sec delays in UI is approximate threashold when they become noticeable.

Typo: should be threshold (repeated several times).

+static Mutex& creationMutex()
+{
+    DEFINE_STATIC_LOCAL(WTF::Mutex, staticMutex, ());
+    return staticMutex;
+}

I think that this has a race condition. Can you just use AtomicallyInitializedStatic in +[WTFMainThreadCaller mainThreadCaller]? Another option (which I like more) is to initialize WTFMainThreadCaller from WTF::initializeThreading.

+        // If we are running accumulated functions for too long so UI may become unresponsive, we need to
+        // yield so the user input can be processed. Otherwise user may not be able to even close the window.
+        // This code has effect only in case the scheduleDispatchFunctionsOnMainThread() is implemented in a way that
+        // allows input events to be processed before we are back here.

The timeout check happens before fetching the next item from the queue, so we may re-schedule even if the queue is already empty. But this doesn't seem to affect correctness.

+    // This method can be called from worker threads that do not have autorelease pool. 
+    NSAutoreleasePool*  pool = [[NSAutoreleasePool alloc] init]; 

Is there a reason not to use CFMachPort? Is the behavior we are relying upon documented for either? If CFMachPort is available in Windows CFNetwork, we could even have a common implementation for Mac and Windows.

I'm going to say r- due to WTFMainThreadCaller initialization issues - but this looks really good!
Comment 5 Dmitry Titov 2009-02-06 13:12:13 PST
(In reply to comment #4)
> (From update of attachment 27376 [details] [review])
> +// 0.1 sec delays in UI is approximate threashold when they become noticeable.
> 
> Typo: should be threshold (repeated several times).

Fixed.

> +static Mutex& creationMutex()
> +{
> +    DEFINE_STATIC_LOCAL(WTF::Mutex, staticMutex, ());
> +    return staticMutex;
> +}
> 
> I think that this has a race condition. Can you just use
> AtomicallyInitializedStatic in +[WTFMainThreadCaller mainThreadCaller]? Another
> option (which I like more) is to initialize WTFMainThreadCaller from
> WTF::initializeThreading.

Ooops, good catch! :-)
Turns out we already have initializemainThread() called from WTF::initializeThreading. So I split it inplatform-independent part and initializeMainThreadPlatform - especially since WIN implementation already goes this route with #ifdef PLATFORM(WIN). So now I don't really need a mutex in MainThreadMac!

> The timeout check happens before fetching the next item from the queue, so we
> may re-schedule even if the queue is already empty. But this doesn't seem to
> affect correctness.

It seems that unless we reschedule under a lock, there is always a potential race condition here. I left it as is since it'll over-schedule at some cases which is harmless.

> +    NSAutoreleasePool*  pool = [[NSAutoreleasePool alloc] init]; 
> 
> Is there a reason not to use CFMachPort? Is the behavior we are relying upon
> documented for either? If CFMachPort is available in Windows CFNetwork, we
> could even have a common implementation for Mac and Windows.

I could use CFMachPort. CF-Win seems to expose them... However, the WIN implementation is currently Win32-based and it's probably makes sense to keep it this way, for the reason of not splitting it into CF and WIN32 and considering the main message loop in some embedders will be Win32 (not sure CFMachPort will work right without CFRunLoop actuallu being used).
Please let me know what you think about this one.

Updated patch is coming.
Comment 6 Dmitry Titov 2009-02-06 13:13:39 PST
Created attachment 27412 [details]
Updated patch.

Addressed Alexey's review comments.
Comment 7 Alexey Proskuryakov 2009-02-07 03:20:55 PST
Comment on attachment 27412 [details]
Updated patch.

I don't know how CFMachPort is implemented - it's possible that it doesn't work with native Windows message loop indeed. Getting rid of autorelease pools seems good performance-wise, but it's not worth doing without actually measuring the effect (checking/optimizing Worker messaging performance is something we should definitely do one day).

+    NSPortMessage* messageObj = [[NSPortMessage alloc] initWithSendPort:m_port receivePort:nil components:nil]; 
+    [messageObj sendBeforeDate:[NSDate date]]; 

Is NSPortMessage destroyed automagically, or does it leak?

+    NSAutoreleasePool*  pool = [[NSAutoreleasePool alloc] init]; 

In Objective-C code, we place the stars in a different way.

+// We use NSPort here instead of more obvious [NSObject performSelectorOnMainThread] because the latter is implemented
+// by creating and posting a timer to the main thread's CFRunLoop. The timers are always fetced first before any other
+// types of messages

I was already going to say r=me, but then I re-read this comment, and realized that the same problem can happen without workers:

---------------------
function test()
{
  setTimeout(test, 0);
  setTimeout(test, 0);
}
setTimeout(test, 0);
---------------------

Should we investigate that separately, or make sure that timers don't take over the event loop, and fix both cases at once?
Comment 8 Dmitry Titov 2009-02-07 22:25:09 PST
Created attachment 27458 [details]
Updated patch, v3

Even better now (thanks for a thorough review!)

- of course the message object is leaking. Fixed.
- moved the '*' to the other side
- got rid of autorelease pool! If the only methods used are alloc-init-release, none is needed.

I thought a bit about investigating main thread timers... If possible, my proposal would be to consider it separately (create another bug?) The reasons are:
- timers (TimerBase objects) are used in WebCore for 'delayed work' very widely. Allowing user input to slip in between them can break the world in many subtle ways. For example, if there was a codepath that changes the tree and calls Timer::startOneShot(0) to do layout, then if the user moves the mouse across the page we currently have a guarantee that hittest happens only on a valid layout. startOneShot(0) is used in many places and this change in assumptions may affect some of them.
- it may change behavior for some sites. For example, a page could schedule 2 timers with the close fire time and accidentally depend on the fact that user input will not be dispatched in between - such pages can be broken if we allow user input to go in between close timer firings.
- because of the above, the solution likely will be different for main thread - perhaps we'll need 'urgent' and 'less urgent' timers.

Let me know what you think! :-)
Comment 9 Alexey Proskuryakov 2009-02-08 00:54:27 PST
Comment on attachment 27458 [details]
Updated patch, v3

r=me
Comment 10 Dmitry Titov 2009-02-09 14:04:43 PST
I think this is ready for landing. 

Separately, we'll need to see how Windows Safari behaves when we have a version of Windows Safari which works with ToT WebKit correctly. In version 3.2.1, the callOnMainThread() is not working - the MainThreadWin creates a window and uses PostMessage but the windowproc of that window is never called, indicating some problem with main thread's message loop that is in Safari itself. So addressing similar issue (or even confirming it) on Windows will need to wait.
Comment 11 Alexey Proskuryakov 2009-02-10 03:39:39 PST
> In version 3.2.1, the callOnMainThread() is not working

Do the tests in fast/workers not work for you in Windows nightly builds? They work for me.
Comment 12 Dmitry Titov 2009-02-10 20:31:15 PST
Comment on attachment 27458 [details]
Updated patch, v3

Removing r+ since I've learned more about the problem and there is a better fix. Patch forthcoming.
Comment 13 Dmitry Titov 2009-02-10 21:18:01 PST
Created attachment 27550 [details]
Proposed patch

I've removed NSPort and NSPortMessage. After re-doing experiments looking at bug 23865, I've realized the input events are dispatched in between timers. So the MainThreadMac.mm practically reverted.

The change that actually helps (and which remains here) is avoiding processing all the worker events accumulated in a FunctionQueue at once - since at some point it could take many seconds while run loop is not running. Instead measure the time and exit/reschedule if needed.

Also, I keep the initializeMainThreadPlatform to avoid platform ifdefs in MainThread.cpp. 
Also, resolved against current ToT.
Comment 14 Alexey Proskuryakov 2009-02-10 23:49:36 PST
Comment on attachment 27550 [details]
Proposed patch

r=me

+static WTFMainThreadCaller* staticMainThreadCaller = nil;

We usually rely on the fact that static variables are initialized to zero automagically - but I don't see a problem with explicitly initializing them.
Comment 15 Alexey Proskuryakov 2009-02-11 23:59:04 PST
Committed revision 40888.