ASSERT(registration || isTerminating()) hit in SWServerWorker::skipWaiting() on the bots sometimes.
<rdar://problem/36594564>
Created attachment 331541 [details] Patch
*** Bug 181763 has been marked as a duplicate of this bug. ***
Comment on attachment 331541 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=331541&action=review > Source/WebCore/workers/service/context/SWContextManager.cpp:113 > + serviceWorker->thread().runLoop().postTask([identifier, serviceWorker = WTFMove(serviceWorker), completionHandler = WTFMove(completionHandler)](ScriptExecutionContext& context) mutable { Isn't there a risk that the service worker thread might be kept busy (by JS say), will never execute that task and thus will never stop? We could try making ServiceWorkerContainer::ensureSWClientConnection() asynchronous. CacheStorageConnection is using the same pattern though so maybe this deadlock can also happen with regular workers using Cache API. It seems to me we should try to not be service worker specific and handle this at WorkerThread level. WorkerThread::start has a callback so we might be able to know if the worker is started and use that information to stop it now or just after being started. The current callback is called after evaluation of the script which might not be good in case of runaway script. Maybe we could do a callOnMainThread shortly after creation of the worker global scope to check whether worker thread is to be stopped? > Source/WebCore/workers/service/server/SWServerRegistration.cpp:250 > + } Can there be cases where there is more than one such worker per registration? If not, we could break the loop when we found one. Or maybe it is not too complex to keep a pointer to this worker in SWServerRegistration?
(In reply to youenn fablet from comment #4) > Comment on attachment 331541 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=331541&action=review > > > Source/WebCore/workers/service/context/SWContextManager.cpp:113 > > + serviceWorker->thread().runLoop().postTask([identifier, serviceWorker = WTFMove(serviceWorker), completionHandler = WTFMove(completionHandler)](ScriptExecutionContext& context) mutable { > > Isn't there a risk that the service worker thread might be kept busy (by JS > say), will never execute that task and thus will never stop? > > We could try making ServiceWorkerContainer::ensureSWClientConnection() > asynchronous. > CacheStorageConnection is using the same pattern though so maybe this > deadlock can also happen with regular workers using Cache API. Looking at it again, Cache API will probably not block the worker thread at creation of the global scope. Making ServiceWorkerContainer::ensureSWClientConnection() asynchronous or a related refactoring might so the trick.
(In reply to youenn fablet from comment #5) > (In reply to youenn fablet from comment #4) > > Comment on attachment 331541 [details] > > Patch > > > > View in context: > > https://bugs.webkit.org/attachment.cgi?id=331541&action=review > > > > > Source/WebCore/workers/service/context/SWContextManager.cpp:113 > > > + serviceWorker->thread().runLoop().postTask([identifier, serviceWorker = WTFMove(serviceWorker), completionHandler = WTFMove(completionHandler)](ScriptExecutionContext& context) mutable { > > > > Isn't there a risk that the service worker thread might be kept busy (by JS > > say), will never execute that task and thus will never stop? > > This is why I uploaded https://bugs.webkit.org/show_bug.cgi?id=181563. > > We could try making ServiceWorkerContainer::ensureSWClientConnection() > > asynchronous. > > CacheStorageConnection is using the same pattern though so maybe this > > deadlock can also happen with regular workers using Cache API. > > Looking at it again, Cache API will probably not block the worker thread at > creation of the global scope. > Making ServiceWorkerContainer::ensureSWClientConnection() asynchronous or a > related refactoring might so the trick. Making it async may not be trivial though. I'd have to check. (In reply to youenn fablet from comment #4) > Comment on attachment 331541 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=331541&action=review > > > Source/WebCore/workers/service/context/SWContextManager.cpp:113 > > + serviceWorker->thread().runLoop().postTask([identifier, serviceWorker = WTFMove(serviceWorker), completionHandler = WTFMove(completionHandler)](ScriptExecutionContext& context) mutable { > > Isn't there a risk that the service worker thread might be kept busy (by JS > say), will never execute that task and thus will never stop? > > We could try making ServiceWorkerContainer::ensureSWClientConnection() > asynchronous. > CacheStorageConnection is using the same pattern though so maybe this > deadlock can also happen with regular workers using Cache API. > It seems to me we should try to not be service worker specific and handle > this at WorkerThread level. > > WorkerThread::start has a callback so we might be able to know if the worker > is started and use that information to stop it now or just after being > started. > The current callback is called after evaluation of the script which might > not be good in case of runaway script. > Maybe we could do a callOnMainThread shortly after creation of the worker > global scope to check whether worker thread is to be stopped? > > > Source/WebCore/workers/service/server/SWServerRegistration.cpp:250 > > + } > > Can there be cases where there is more than one such worker per > registration? > If not, we could break the loop when we found one. > Or maybe it is not too complex to keep a pointer to this worker in > SWServerRegistration? No, I don't think there could be more than 1 such worker. I could either break from the loop when finding one or store that worker on the registration somehow. I'll look into it.
> > > Isn't there a risk that the service worker thread might be kept busy (by JS > > > say), will never execute that task and thus will never stop? > > > > > This is why I uploaded https://bugs.webkit.org/show_bug.cgi?id=181563. I believe this is well handled for regular workers. If there is something special in service workers that would make this exception-throwing-to-stop-spinning strategy to fail, I would hope that we can realign service workers with workers.
(In reply to youenn fablet from comment #7) > > > > Isn't there a risk that the service worker thread might be kept busy (by JS > > > > say), will never execute that task and thus will never stop? > > > > > > > > This is why I uploaded https://bugs.webkit.org/show_bug.cgi?id=181563. > > I believe this is well handled for regular workers. > If there is something special in service workers that would make this > exception-throwing-to-stop-spinning strategy to fail, I would hope that we > can realign service workers with workers. From what I can see from regular workers, WorkerThread::stop() is called from terminateWorkerGlobalScope(), which is always called on the background thread. Call sites look like: m_scriptExecutionContext->postTask([this] (ScriptExecutionContext&) { m_mayBeDestroyed = true; if (m_workerThread) terminateWorkerGlobalScope(); else workerGlobalScopeDestroyedInternal(); });
Created attachment 331648 [details] Patch
Improved approach based on offline discussion with Youenn and Brady.
Comment on attachment 331648 [details] Patch Attachment 331648 [details] did not pass mac-debug-ews (mac): Output: http://webkit-queues.webkit.org/results/6124606 New failing tests: fast/workers/worker-terminate-forever.html
Created attachment 331655 [details] Archive of layout-test-results from ews116 for mac-sierra The attached test failures were seen while running run-webkit-tests on the mac-debug-ews. Bot: ews116 Port: mac-sierra Platform: Mac OS X 10.12.6
Comment on attachment 331648 [details] Patch Well, this is unfortunate.
Comment on attachment 331648 [details] Patch Attachment 331648 [details] did not pass win-ews (win): Output: http://webkit-queues.webkit.org/results/6124993 New failing tests: fast/workers/worker-terminate-forever.html
Created attachment 331659 [details] Archive of layout-test-results from ews206 for win-future The attached test failures were seen while running run-webkit-tests on the win-ews. Bot: ews206 Port: win-future Platform: CYGWIN_NT-6.1-2.9.0-0.318-5-3-x86_64-64bit
I looked into making ensureSWConnection() async and this is a larger change and it may introduce bugs, which we really do not want at this point. I think I'll go back to the original dispatching of the WorkerThread::stop() to the bg thread.
Created attachment 331670 [details] Patch
Created attachment 331674 [details] Patch
The deadlock shown in Bug 181763 is probably fixed by making ServiceWorkerGlobalScope::m_registration a RefPtr<> and lazily create it outside the WorkerThread::start lock section, in WorkerScriptController::initScript() for instance. Can we do that instead?
(In reply to youenn fablet from comment #19) > The deadlock shown in Bug 181763 is probably fixed by making > ServiceWorkerGlobalScope::m_registration a RefPtr<> and lazily create it > outside the WorkerThread::start lock section, in > WorkerScriptController::initScript() for instance. > Can we do that instead? I thought about this but if I am outside the locked section then the workerglobalscope may get destroyed at any point no? Because stop() is able to lock the mutex at this point.
WorkerScriptController::initScript expects a valid global scope. IIRC, it is called outside the lock section and before evaluating the script.
(In reply to youenn fablet from comment #21) > WorkerScriptController::initScript expects a valid global scope. > IIRC, it is called outside the lock section and before evaluating the script. I guess it works out because WorkerThread::stop() post a task to destroy the workerglobalscope on the background thread, using m_runLoop.postTaskAndTerminate(). Still, seems odd to add new infrastructure (new virtual function and call after the lock is released), just to work around the fact that we are calling callOnMainThreadAndWait(). I plan to get rid of callOnMainThreadAndWait() a little bit longer term anyway. Also, dispatching to the bg thread the stop() request, as the current patch does seems fine to me. What's the drawback? If the thread is busy looping and does not process the request within 10 seconds, we will kill the SW process.
Created attachment 331745 [details] Patch
Comment on attachment 331745 [details] Patch Clearing flags on attachment: 331745 Committed r227221: <https://trac.webkit.org/changeset/227221>
All reviewed patches have been landed. Closing bug.