Bug 57090 - Simplifying Worker termination sequence (removing unnecessary mutex)
Summary: Simplifying Worker termination sequence (removing unnecessary mutex)
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore JavaScript (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC OS X 10.5
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-03-25 00:25 PDT by Kinuko Yasuda
Modified: 2011-04-14 15:14 PDT (History)
5 users (show)

See Also:


Attachments
Patch (2.27 KB, patch)
2011-03-25 00:27 PDT, Kinuko Yasuda
no flags Details | Formatted Diff | Diff
Patch (17.34 KB, patch)
2011-03-29 20:36 PDT, Dmitry Titov
no flags Details | Formatted Diff | Diff
Patch (21.48 KB, patch)
2011-03-30 15:57 PDT, Dmitry Titov
no flags Details | Formatted Diff | Diff
Patch, added ENABLE(WORKERS) per EWS. (21.54 KB, patch)
2011-04-01 15:52 PDT, Dmitry Titov
no flags Details | Formatted Diff | Diff
Patch, another EWS fix. (21.54 KB, patch)
2011-04-01 16:37 PDT, Dmitry Titov
levin: review+
levin: commit-queue-
Details | Formatted Diff | Diff
Patch, fixed flaky test (22.31 KB, patch)
2011-04-13 14:50 PDT, Dmitry Titov
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Kinuko Yasuda 2011-03-25 00:25:13 PDT
While I was trying to fix issue 56138 I found WorkerScriptController is causing another tsan data race warnings.
I'm not sure if it's false positive but it looks plausible.

WARNING: Possible data race during write of size 1 at 0x62BF478: {{{
   T0 (L{L393, L401}):
    #0  WebCore::WorkerScriptController::forbidExecution(WebCore::WorkerScriptController::ForbidExecutionOption) third_party/WebKit/Source/WebCore/bindings/v8/WorkerScriptController.cpp:93
    #1  WebCore::WorkerThread::stop() third_party/WebKit/Source/WebCore/workers/WorkerThread.cpp:231
    #2  WebKit::WebWorkerBase::stopWorkerThread() third_party/WebKit/Source/WebKit/chromium/src/WebWorkerBase.cpp:164
    #3  WebKit::WebWorkerImpl::terminateWorkerContext() third_party/WebKit/Source/WebKit/chromium/src/WebWorkerImpl.cpp:113
    #4  WebWorkerStub::OnTerminateWorkerContext() content/worker/webworker_stub.cc:56
    #5  bool IPC::Message::Dispatch<WebWorkerStub, WebWorkerStub>(IPC::Message const*, WebWorkerStub*, WebWorkerStub*, void (WebWorkerStub::*)()) ./ipc/ipc_message.h:136
    #6  WebWorkerStub::OnMessageReceived(IPC::Message const&) content/worker/webworker_stub.cc:45
    ...
  Concurrent read(s) happened at (OR AFTER) these points:
   T2 (L{}):
    #0  WebCore::WorkerScriptController::proxy() third_party/WebKit/Source/WebCore/bindings/v8/WorkerScriptController.h:51
    #1  WebCore::toV8Context(WebCore::ScriptExecutionContext*, WebCore::WorldContextHandle const&) third_party/WebKit/Source/WebCore/bindings/v8/V8Proxy.cpp:835
    #2  WebCore::V8CustomVoidCallback::handleEvent() third_party/WebKit/Source/WebCore/bindings/v8/custom/V8CustomVoidCallback.cpp:56
    #3  WebCore::VoidCallbacks::didSucceed() third_party/WebKit/Source/WebCore/fileapi/FileSystemCallbacks.cpp:312
    #4  WebKit::WebFileSystemCallbacksImpl::didSucceed() third_party/WebKit/Source/WebKit/chromium/src/WebFileSystemCallbacksImpl.cpp:65
    #5  WebKit::WorkerFileSystemCallbacksBridge::didSucceedOnWorkerThread(WebCore::ScriptExecutionContext*, WebKit::WorkerFileSystemCallbacksBridge*) third_party/WebKit/Source/WebKit/chromium/src/WorkerFileSystemCallbacksBridge.cpp:327
    #6  WebCore::CrossThreadTask1<WebKit::WorkerFileSystemCallbacksBridge*, WebKit::WorkerFileSystemCallbacksBridge*>::performTask(WebCore::ScriptExecutionContext*) third_party/WebKit/Source/WebCore/dom/CrossThreadTask.h:81
    ...
  Location 0x62BF478 is 56 bytes inside a block starting at 0x62BF440 of size 64 allocated by T2 from heap:
    #0  operator new(unsigned long) /tmp/valgrind-src/tsan/tsan/ts_valgrind_intercepts.c:418
    #1  WebCore::WorkerContext::WorkerContext(WebCore::KURL const&, WTF::String const&, WebCore::WorkerThread*) third_party/WebKit/Source/WebCore/workers/WorkerContext.cpp:109
    #2  WebCore::DedicatedWorkerContext::DedicatedWorkerContext(WebCore::KURL const&, WTF::String const&, WebCore::DedicatedWorkerThread*) third_party/WebKit/Source/WebCore/workers/DedicatedWorkerContext.cpp:45
    #3  WebCore::DedicatedWorkerContext::create(WebCore::KURL const&, WTF::String const&, WebCore::DedicatedWorkerThread*) third_party/WebKit/Source/WebCore/workers/DedicatedWorkerContext.h:48
    #4  WebCore::DedicatedWorkerThread::createWorkerContext(WebCore::KURL const&, WTF::String const&) third_party/WebKit/Source/WebCore/workers/DedicatedWorkerThread.cpp:59
    #5  WebCore::WorkerThread::workerThread() third_party/WebKit/Source/WebCore/workers/WorkerThread.cpp:125
    #6  WebCore::WorkerThread::workerThreadStart(void*) third_party/WebKit/Source/WebCore/workers/WorkerThread.cpp:118
    ...
  Locks involved in this report (reporting last lock sites): {L393, L401}
   L393 (0x6239C10)
    #0  pthread_mutex_lock /tmp/valgrind-src/tsan/tsan/ts_valgrind_intercepts.c:891
    #1  WTF::Mutex::lock() third_party/WebKit/Source/JavaScriptCore/wtf/ThreadingPthreads.cpp:269
    #2  WTF::Locker<WTF::Mutex>::Locker(WTF::Mutex&) third_party/WebKit/Source/JavaScriptCore/wtf/Locker.h:38
    #3  WebCore::WorkerThread::stop() third_party/WebKit/Source/WebCore/workers/WorkerThread.cpp:227
    #4  WebKit::WebWorkerBase::stopWorkerThread() third_party/WebKit/Source/WebKit/chromium/src/WebWorkerBase.cpp:164
    #5  WebKit::WebWorkerImpl::terminateWorkerContext() third_party/WebKit/Source/WebKit/chromium/src/WebWorkerImpl.cpp:113
    #6  WebWorkerStub::OnTerminateWorkerContext() content/worker/webworker_stub.cc:56
    ...
   L401 (0x62BF450)
    #0  pthread_mutex_lock /tmp/valgrind-src/tsan/tsan/ts_valgrind_intercepts.c:891
    #1  WTF::Mutex::lock() third_party/WebKit/Source/JavaScriptCore/wtf/ThreadingPthreads.cpp:269
    #2  WTF::Locker<WTF::Mutex>::Locker(WTF::Mutex&) third_party/WebKit/Source/JavaScriptCore/wtf/Locker.h:38
    #3  WebCore::WorkerScriptController::forbidExecution(WebCore::WorkerScriptController::ForbidExecutionOption) third_party/WebKit/Source/WebCore/bindings/v8/WorkerScriptController.cpp:92
    #4  WebCore::WorkerThread::stop() third_party/WebKit/Source/WebCore/workers/WorkerThread.cpp:231
    #5  WebKit::WebWorkerBase::stopWorkerThread() third_party/WebKit/Source/WebKit/chromium/src/WebWorkerBase.cpp:164
    #6  WebKit::WebWorkerImpl::terminateWorkerContext() third_party/WebKit/Source/WebKit/chromium/src/WebWorkerImpl.cpp:113
   Race verifier data: 0x2107DEC,0x20EE364
}}}
Comment 1 Kinuko Yasuda 2011-03-25 00:27:58 PDT
Created attachment 86904 [details]
Patch
Comment 2 Dmitry Titov 2011-03-25 08:48:36 PDT
Adding a lock to this accessor does not prevent  the possibility of the flag being flipped right after the exit from the accessor. In this case using the v8 proxy returned would have same results as not having a lock.
Let me take a better look, i think this code is ok without locking, need to verify.
Comment 3 David Levin 2011-03-25 09:35:09 PDT
(In reply to comment #2)
> Adding a lock to this accessor does not prevent  the possibility of the flag being flipped right after the exit from the accessor. In this case using the v8 proxy returned would have same results as not having a lock.
> Let me take a better look, i think this code is ok without locking, need to verify.

Looking at http://trac.webkit.org/browser/trunk/Source/WebCore/bindings/v8/WorkerScriptController.cpp, I don't understand why the mutex is ever necessary (by similar reasoning to the above).
Comment 4 Dmitry Titov 2011-03-25 11:52:16 PDT
(In reply to comment #3)
> (In reply to comment #2)
> Looking at http://trac.webkit.org/browser/trunk/Source/WebCore/bindings/v8/WorkerScriptController.cpp, I don't understand why the mutex is ever necessary (by similar reasoning to the above).

Good point. The lock was grandfathered (euphemism for "just copied by me"), it was added by Alexey for JSC: http://trac.webkit.org/changeset/38689

However, termination code was changed and I think it is not needed now, at least in V8 case.

Here is how it works:
- V8::TerminateExecution() is invoked from any thread. It takes internal lock in V8 and sets a flag.
- On a worker thread, we can be either in v8 code already or in C++ WebCore code. The latter case may eventually enter V8 code as well. It's not a problem, since it's ok to enter the v8 code once after the termination flag was set
- V8 discovers a termination flag (after taking an internal lock) and raises the internal exception that unwinds us out of V8 code into C++ WebCore frame. If there are more v8 frames on the stack, the C++ code must check the result and if we have uncatchable exception unwinding (v8::TryCatch::CanContinue() == false), it should gracefully exit to allow the exception to unwind to the bottom. During this unwinding, V8 fast-returns with trivial results from all V8 APIs if they are called anyways.
- After we get out of the last V8 frame, it is not legal to call into V8 again. This is the only place where reentry into V8 is harmful. We should terminate worker then.

One change that makes sense here is to stop setting m_executionForbidden on a non-worker thread, and instead set it on worker thread, the moment we return from v8 frame unwinding termination exception. This will prevent us from re-entering v8 and make this flag set/examined from the single thread.

Removing r? flag from Kinuko's patch for now. Give me a day or two I'll upload updated one after running some tests and looking at the code, since threading is tricky :-)
Comment 5 Dmitry Titov 2011-03-29 20:36:58 PDT
Created attachment 87455 [details]
Patch

Work in progress. It applies to both JSC and V8, so I'm removing "[v8]" in the title of the bug

The mutex is gone! It should not be there, since guarding the entry into JS engine is only possible from the same thread, otherwise the whole "entry into JS" would have to be under a lock.

The patch makes the bit preventing the reentry to be only set/checked on a single thread and so the mutex is no longer needed. The logic of termination:

1. If Worker is merely "Closed", as in WorkerGlobalScope.Close() - the WorkerContext::m_isClosing is set and that eventually drains Task queue and terminates the worker
2. If Worker.terminate() is called or parent page closes - the WorkerScriptController::scheduleExecutionTermination() is called. This will eventually (so no locking needed) propagate to the worker thread and raise termination exception. Upon exit from JSC or V8 with that exception, we use WorkerScriptController::forbidExecution() to set up a flag.
3. After the flag is set, we avoid reentering JS in EventListener and WorkerScriptController code.

The only difference between JSC and V8 is that JSC's Terminator is not guaranteed to immediately raise termination exception, so additional check is added for the Terminator::shouldTerminate() in addition to exception check.

Tests run fine. I will run V8 tests and update patch with ChanegLogs etc.
Comment 6 Alexey Proskuryakov 2011-03-29 21:38:12 PDT
> This will eventually (so no locking needed) propagate to the worker thread and raise termination exception.

Did you check svn history for the lock around m_executionForbidden? I vaguely recall thinking the same when introducing it originally, but then there was a specific race condition that made the lock in WorkerScriptController::evaluate necessary.
Comment 7 Dmitry Titov 2011-03-29 22:04:11 PDT
(In reply to comment #6)
> > This will eventually (so no locking needed) propagate to the worker thread and raise termination exception.
> 
> Did you check svn history for the lock around m_executionForbidden? I vaguely recall thinking the same when introducing it originally, but then there was a specific race condition that made the lock in WorkerScriptController::evaluate necessary.

I looked, yes. It was created here: http://trac.webkit.org/changeset/38689.

There was originally a timeout of 1ms... Before Terminator. 

I think you might needed a mutex to exclude situation when timeout aborted the JS execution, but m_executionForbidden on worker thread is still 'false' - that can happen w/o mutex. So in that case JS could exit by timeout, and immediately enter again.

I think checking the exception result on exit from JS and setting m_executionForbidden on the worker thread achieves the same result, w/o mutex.

Does it match your recollection?
Comment 8 Alexey Proskuryakov 2011-03-29 22:21:53 PDT
Yes, that was I've been seeing. Thanks!
Comment 9 Dmitry Titov 2011-03-30 15:57:37 PDT
Created attachment 87632 [details]
Patch
Comment 10 Dmitry Titov 2011-04-01 15:52:34 PDT
Created attachment 87935 [details]
Patch, added ENABLE(WORKERS) per EWS.
Comment 11 Early Warning System Bot 2011-04-01 16:25:13 PDT
Attachment 87935 [details] did not build on qt:
Build output: http://queues.webkit.org/results/8321313
Comment 12 Dmitry Titov 2011-04-01 16:37:44 PDT
Created attachment 87940 [details]
Patch, another EWS fix.
Comment 13 David Levin 2011-04-08 15:21:57 PDT
Comment on attachment 87940 [details]
Patch, another EWS fix.

View in context: https://bugs.webkit.org/attachment.cgi?id=87940&action=review

> Source/WebCore/bindings/v8/WorkerScriptController.cpp:72
> +      return ScriptValue();

indent off.
Comment 14 Dmitry Titov 2011-04-13 14:34:09 PDT
I didn't land this patch because it exposed a race condition in fast/workers/shared-worker-frame-lifecycle.html that caused it to occasionally fail when run like this:
run-webkit-tests LayoutTests/fast/workers/ --iterations 100 --random

The culprit happens to be document.write(), called mostly directly from inline script (in this case it appends to the document) and sometimes deferred via setTimeout() (when not all worker threads from previous test happen to exit yet) - in latter case it overwrites the whole document and makes the test fail.

Added the change from document.write() to using appendChild(), verified we don't use document.write in other places in worker tests.

Updated patch is coming.
Comment 15 Dmitry Titov 2011-04-13 14:50:32 PDT
Created attachment 89468 [details]
Patch, fixed flaky test

Same code change, plus a LayoutTests/fast/workers/shared-worker-frame-lifecycle.html update to remove race condition.
Comment 16 WebKit Commit Bot 2011-04-14 02:52:45 PDT
Comment on attachment 89468 [details]
Patch, fixed flaky test

Rejecting attachment 89468 [details] from commit-queue.

Failed to run "['./Tools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '--bot-id=cr-jail-8', 'apply-..." exit_code: 2

Last 500 characters of output:
ile Source/WebCore/bindings/v8/WorkerScriptController.h
patching file Source/WebCore/dom/Document.h
patching file Source/WebCore/dom/MessagePort.cpp
patching file Source/WebCore/dom/ScriptExecutionContext.h
patching file Source/WebCore/workers/WorkerContext.cpp
patching file Source/WebCore/workers/WorkerContext.h
patching file Source/WebCore/workers/WorkerThread.cpp

Failed to run "[u'/mnt/git/webkit-commit-queue/Tools/Scripts/svn-apply', u'--reviewer', u'David Levin', u'--force']" exit_code: 1

Full output: http://queues.webkit.org/results/8404575
Comment 17 Dmitry Titov 2011-04-14 14:05:13 PDT
Landing manually...
Comment 18 Dmitry Titov 2011-04-14 15:14:47 PDT
Comment on attachment 89468 [details]
Patch, fixed flaky test

Landed: http://trac.webkit.org/changeset/83900