The worker-lifecycle layout test crashes intermittently on Snow Leopard.
The stderr shows:
ASSERTION FAILED: !result
pthread_mutex_lock() is returning an error. This seems to imply a deadlock of some kind, but I don't see any way for that to happen given the simple uses of mutexes in the worker code.
> pthread_mutex_lock() is returning an error
When I saw errors from this function, it usually meant that a garbage mutex object was passed in (likely because its containing object was already deallocated).
I was able to repro this locally on Leopard (not Snow Leopard) by running the test in a tight loop; it seems to happen every few hundred iterations.
The race condition seems to be this:
WorkerContext::close() invokes m_workerThread->stop(). This starts the process of shutting down the worker thread by calling runLoop.terminate(). Once the thread of execution unwinds and the runloop exits, the following actions occur:
// The below assignment will destroy the context, which will in turn notify messaging proxy.
// We cannot let any objects survive past thread exit, because no other thread will run GC or otherwise destroy them.
m_workerContext = 0;
Note that we call clearScript() before freeing the workerContext - I'm not certain why, but I'm guessing it's to try to untangle some kind of circular reference.
Now, while those steps are executing, imagine that the main thread decides to GC the Worker object (it can do this, because the WorkerThread went idle as part of shutting down). This invokes WorkerMessagingProxy::workerObjectDestroyed():
m_workerObject = 0;
Since the WorkerMessagingProxy doesn't know that the workerContext is being destroyed (the destruction hasn't happened yet), it happily invokes terminateWorkerContext(), which invokes WorkerThread::stop() again.
WorkerThread::stop() accesses the workerContext and WorkerScriptController to call forbidExecution() (to force the thread to exit), while those objects may be in the middle of being freed by the already-exiting worker thread, which generates this failure.
Looks like only two pieces of WorkerThread are accessed cross-thread: the runLoop and the workerContext.
Since runLoop is already thread-safe, we're OK with accessing that across threads.
WorkerThread::stop() currently grabs the m_threadCreationMutex while doing its work. I'm wondering if we should also grab that mutex while cleaning up after the run loop exits, to protect access to workerContext.
I'll try that fix locally and see how it works. ap - what do you think?
I think that WorkerContext.close() shouldn't close the worker directly - it should send this request to the proxy, which should close the worker. All information about worker state should be current in the proxy; workers themselves shouldn't have the freedom to stop.
I don't think that's a viable solution.
The semantics of WorkerGlobalScope::close() are such that no further async events (timers, message port messages, network requests, etc.) should be processed once close() has been processed.
Routing close requests through the proxy would mean we'd have to jump through hoops to short-circuit any further event activity/generation. Possibly doable, but it's so much simpler to just call m_runLoop.terminate().
Also, forcing the parent to get involved has some nasty implications with nested SharedWorkers where the parent worker may not be around any more, although we could probably address that by routing close requests through the repository instead of the parent worker object.
What's the downside to allowing workers to shut themselves down (other than this bug, of course :)?
See also: bug 25902 comment 3 (so, there was a race condition, after all!)
I think this proves that the current solution is difficult to understand - the code smells wrong, but we couldn't find the bug even after detailed analysis, not until it was exposed by a test. Simplifying cross-thread communication by not letting worker contexts close without proxy consent should be much more maintainable long-term.
Created attachment 38769 [details]
Per my earlier conversation with ap - we no longer invoke WorkerThread::stop() when WorkerContext::close() is called. Instead, the WorkerMessagingProxy and SharedWorkerProxy shut down the thread themselves, to avoid race conditions with the thread shutting down while the main thread is still accessing it.
Created attachment 38770 [details]
patch with fix for typo in comment
Had a typo in a comment so figured I'd fix it up.
Comment on attachment 38770 [details]
patch with fix for typo in comment
+ Error events should still be delivered even if the worker thread is closing.
This sounds like something worth a comment in code, ideally with an explanation of why they should be delivered. Is this something covered by existing tests?
+ Also fixed problem where error events were not dispatched if error handlers were added via addEventListener().
Please add a test for this bug fix.
So, this function can now be called repeatedly. Is it OK to call m_thread->stop() twice?
r=me on the condition that there's a good answer to this last question.
(In reply to comment #10)
> (From update of attachment 38770 [details])
> + Error events should still be delivered even if the worker thread is
> This sounds like something worth a comment in code, ideally with an explanation
> of why they should be delivered. Is this something covered by existing tests?
Yes, it's covered by existing tests (worker_close.html), which is why I made this change (the tests started breaking once I routed WorkerContext::close() through the WorkerObjectProxy, because now the m_askToTerminate flag is set).
I will add a comment, although it's weird to have a disembodied comment in the code about *why* you aren't introducing buggy behavior with unnecessary checks, which is why I put it in the ChangeLog instead :)
> + Also fixed problem where error events were not dispatched if error
> handlers were added via addEventListener().
> Please add a test for this bug fix.
> void SharedWorkerProxy::close()
> - ASSERT(!isClosing());
> So, this function can now be called repeatedly. Is it OK to call
> m_thread->stop() twice?
It is technically OK to call m_thread->stop() twice in this case (the case where it gets called twice is when the parent document closes while the worker thread is executing, and then the worker thread calls WorkerContext::close()). The second call to m_thread->stop() would happen on the worker thread itself, which is safe.
However, this seems fragile so I'm going to make the following change to clean it up:
1) Make SharedWorkerProxy::close() private (should have been already).
2) Re-institute the ASSERT(!isClosing()) to SharedWorkerProxy::close().
3) Add the following to workerContextClosed() to avoid calling close() on an already-closed worker:
> r=me on a condition there's a good answer to this last question.
> I will add a comment, although it's weird to have a disembodied comment in the
> code about *why* you aren't introducing buggy behavior with unnecessary checks,
> which is why I put it in the ChangeLog instead :)
This is a good way to think about it. But this change makes this version of performTask() different from MessageWorkerTask::performTask(), which can be confusing, and that was the reason for me to suggest having a comment in code.
> However, this seems fragile so I'm going to make the following change to clean
> it up:
Sounds good to me.
Committed as r47914.