RESOLVED FIXED 181761
ASSERT(registration || isTerminating()) hit in SWServerWorker::skipWaiting()
https://bugs.webkit.org/show_bug.cgi?id=181761
Summary ASSERT(registration || isTerminating()) hit in SWServerWorker::skipWaiting()
Chris Dumez
Reported 2018-01-17 13:29:11 PST
ASSERT(registration || isTerminating()) hit in SWServerWorker::skipWaiting() on the bots sometimes.
Attachments
Patch (6.43 KB, patch)
2018-01-17 14:17 PST, Chris Dumez
no flags
Patch (10.01 KB, patch)
2018-01-18 12:09 PST, Chris Dumez
no flags
Archive of layout-test-results from ews116 for mac-sierra (2.92 MB, application/zip)
2018-01-18 13:42 PST, EWS Watchlist
no flags
Archive of layout-test-results from ews206 for win-future (11.97 MB, application/zip)
2018-01-18 14:05 PST, EWS Watchlist
no flags
Patch (9.80 KB, patch)
2018-01-18 16:39 PST, Chris Dumez
no flags
Patch (10.01 KB, patch)
2018-01-18 17:01 PST, Chris Dumez
no flags
Patch (8.84 KB, patch)
2018-01-19 09:42 PST, Chris Dumez
no flags
Radar WebKit Bug Importer
Comment 1 2018-01-17 13:29:56 PST
Chris Dumez
Comment 2 2018-01-17 14:17:23 PST
Chris Dumez
Comment 3 2018-01-17 14:18:11 PST
*** Bug 181763 has been marked as a duplicate of this bug. ***
youenn fablet
Comment 4 2018-01-18 01:18:56 PST
Comment on attachment 331541 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=331541&action=review > Source/WebCore/workers/service/context/SWContextManager.cpp:113 > + serviceWorker->thread().runLoop().postTask([identifier, serviceWorker = WTFMove(serviceWorker), completionHandler = WTFMove(completionHandler)](ScriptExecutionContext& context) mutable { Isn't there a risk that the service worker thread might be kept busy (by JS say), will never execute that task and thus will never stop? We could try making ServiceWorkerContainer::ensureSWClientConnection() asynchronous. CacheStorageConnection is using the same pattern though so maybe this deadlock can also happen with regular workers using Cache API. It seems to me we should try to not be service worker specific and handle this at WorkerThread level. WorkerThread::start has a callback so we might be able to know if the worker is started and use that information to stop it now or just after being started. The current callback is called after evaluation of the script which might not be good in case of runaway script. Maybe we could do a callOnMainThread shortly after creation of the worker global scope to check whether worker thread is to be stopped? > Source/WebCore/workers/service/server/SWServerRegistration.cpp:250 > + } Can there be cases where there is more than one such worker per registration? If not, we could break the loop when we found one. Or maybe it is not too complex to keep a pointer to this worker in SWServerRegistration?
youenn fablet
Comment 5 2018-01-18 01:32:15 PST
(In reply to youenn fablet from comment #4) > Comment on attachment 331541 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=331541&action=review > > > Source/WebCore/workers/service/context/SWContextManager.cpp:113 > > + serviceWorker->thread().runLoop().postTask([identifier, serviceWorker = WTFMove(serviceWorker), completionHandler = WTFMove(completionHandler)](ScriptExecutionContext& context) mutable { > > Isn't there a risk that the service worker thread might be kept busy (by JS > say), will never execute that task and thus will never stop? > > We could try making ServiceWorkerContainer::ensureSWClientConnection() > asynchronous. > CacheStorageConnection is using the same pattern though so maybe this > deadlock can also happen with regular workers using Cache API. Looking at it again, Cache API will probably not block the worker thread at creation of the global scope. Making ServiceWorkerContainer::ensureSWClientConnection() asynchronous or a related refactoring might so the trick.
Chris Dumez
Comment 6 2018-01-18 09:07:09 PST
(In reply to youenn fablet from comment #5) > (In reply to youenn fablet from comment #4) > > Comment on attachment 331541 [details] > > Patch > > > > View in context: > > https://bugs.webkit.org/attachment.cgi?id=331541&action=review > > > > > Source/WebCore/workers/service/context/SWContextManager.cpp:113 > > > + serviceWorker->thread().runLoop().postTask([identifier, serviceWorker = WTFMove(serviceWorker), completionHandler = WTFMove(completionHandler)](ScriptExecutionContext& context) mutable { > > > > Isn't there a risk that the service worker thread might be kept busy (by JS > > say), will never execute that task and thus will never stop? > > This is why I uploaded https://bugs.webkit.org/show_bug.cgi?id=181563. > > We could try making ServiceWorkerContainer::ensureSWClientConnection() > > asynchronous. > > CacheStorageConnection is using the same pattern though so maybe this > > deadlock can also happen with regular workers using Cache API. > > Looking at it again, Cache API will probably not block the worker thread at > creation of the global scope. > Making ServiceWorkerContainer::ensureSWClientConnection() asynchronous or a > related refactoring might so the trick. Making it async may not be trivial though. I'd have to check. (In reply to youenn fablet from comment #4) > Comment on attachment 331541 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=331541&action=review > > > Source/WebCore/workers/service/context/SWContextManager.cpp:113 > > + serviceWorker->thread().runLoop().postTask([identifier, serviceWorker = WTFMove(serviceWorker), completionHandler = WTFMove(completionHandler)](ScriptExecutionContext& context) mutable { > > Isn't there a risk that the service worker thread might be kept busy (by JS > say), will never execute that task and thus will never stop? > > We could try making ServiceWorkerContainer::ensureSWClientConnection() > asynchronous. > CacheStorageConnection is using the same pattern though so maybe this > deadlock can also happen with regular workers using Cache API. > It seems to me we should try to not be service worker specific and handle > this at WorkerThread level. > > WorkerThread::start has a callback so we might be able to know if the worker > is started and use that information to stop it now or just after being > started. > The current callback is called after evaluation of the script which might > not be good in case of runaway script. > Maybe we could do a callOnMainThread shortly after creation of the worker > global scope to check whether worker thread is to be stopped? > > > Source/WebCore/workers/service/server/SWServerRegistration.cpp:250 > > + } > > Can there be cases where there is more than one such worker per > registration? > If not, we could break the loop when we found one. > Or maybe it is not too complex to keep a pointer to this worker in > SWServerRegistration? No, I don't think there could be more than 1 such worker. I could either break from the loop when finding one or store that worker on the registration somehow. I'll look into it.
youenn fablet
Comment 7 2018-01-18 09:17:19 PST
> > > Isn't there a risk that the service worker thread might be kept busy (by JS > > > say), will never execute that task and thus will never stop? > > > > > This is why I uploaded https://bugs.webkit.org/show_bug.cgi?id=181563. I believe this is well handled for regular workers. If there is something special in service workers that would make this exception-throwing-to-stop-spinning strategy to fail, I would hope that we can realign service workers with workers.
Chris Dumez
Comment 8 2018-01-18 09:21:51 PST
(In reply to youenn fablet from comment #7) > > > > Isn't there a risk that the service worker thread might be kept busy (by JS > > > > say), will never execute that task and thus will never stop? > > > > > > > > This is why I uploaded https://bugs.webkit.org/show_bug.cgi?id=181563. > > I believe this is well handled for regular workers. > If there is something special in service workers that would make this > exception-throwing-to-stop-spinning strategy to fail, I would hope that we > can realign service workers with workers. From what I can see from regular workers, WorkerThread::stop() is called from terminateWorkerGlobalScope(), which is always called on the background thread. Call sites look like: m_scriptExecutionContext->postTask([this] (ScriptExecutionContext&) { m_mayBeDestroyed = true; if (m_workerThread) terminateWorkerGlobalScope(); else workerGlobalScopeDestroyedInternal(); });
Chris Dumez
Comment 9 2018-01-18 12:09:08 PST
Chris Dumez
Comment 10 2018-01-18 12:09:33 PST
Improved approach based on offline discussion with Youenn and Brady.
EWS Watchlist
Comment 11 2018-01-18 13:42:43 PST
Comment on attachment 331648 [details] Patch Attachment 331648 [details] did not pass mac-debug-ews (mac): Output: http://webkit-queues.webkit.org/results/6124606 New failing tests: fast/workers/worker-terminate-forever.html
EWS Watchlist
Comment 12 2018-01-18 13:42:44 PST
Created attachment 331655 [details] Archive of layout-test-results from ews116 for mac-sierra The attached test failures were seen while running run-webkit-tests on the mac-debug-ews. Bot: ews116 Port: mac-sierra Platform: Mac OS X 10.12.6
Chris Dumez
Comment 13 2018-01-18 13:53:50 PST
Comment on attachment 331648 [details] Patch Well, this is unfortunate.
EWS Watchlist
Comment 14 2018-01-18 14:05:35 PST
Comment on attachment 331648 [details] Patch Attachment 331648 [details] did not pass win-ews (win): Output: http://webkit-queues.webkit.org/results/6124993 New failing tests: fast/workers/worker-terminate-forever.html
EWS Watchlist
Comment 15 2018-01-18 14:05:45 PST
Created attachment 331659 [details] Archive of layout-test-results from ews206 for win-future The attached test failures were seen while running run-webkit-tests on the win-ews. Bot: ews206 Port: win-future Platform: CYGWIN_NT-6.1-2.9.0-0.318-5-3-x86_64-64bit
Chris Dumez
Comment 16 2018-01-18 16:37:50 PST
I looked into making ensureSWConnection() async and this is a larger change and it may introduce bugs, which we really do not want at this point. I think I'll go back to the original dispatching of the WorkerThread::stop() to the bg thread.
Chris Dumez
Comment 17 2018-01-18 16:39:22 PST
Chris Dumez
Comment 18 2018-01-18 17:01:07 PST
youenn fablet
Comment 19 2018-01-19 00:54:17 PST
The deadlock shown in Bug 181763 is probably fixed by making ServiceWorkerGlobalScope::m_registration a RefPtr<> and lazily create it outside the WorkerThread::start lock section, in WorkerScriptController::initScript() for instance. Can we do that instead?
Chris Dumez
Comment 20 2018-01-19 07:07:33 PST
(In reply to youenn fablet from comment #19) > The deadlock shown in Bug 181763 is probably fixed by making > ServiceWorkerGlobalScope::m_registration a RefPtr<> and lazily create it > outside the WorkerThread::start lock section, in > WorkerScriptController::initScript() for instance. > Can we do that instead? I thought about this but if I am outside the locked section then the workerglobalscope may get destroyed at any point no? Because stop() is able to lock the mutex at this point.
youenn fablet
Comment 21 2018-01-19 07:25:20 PST
WorkerScriptController::initScript expects a valid global scope. IIRC, it is called outside the lock section and before evaluating the script.
Chris Dumez
Comment 22 2018-01-19 08:57:13 PST
(In reply to youenn fablet from comment #21) > WorkerScriptController::initScript expects a valid global scope. > IIRC, it is called outside the lock section and before evaluating the script. I guess it works out because WorkerThread::stop() post a task to destroy the workerglobalscope on the background thread, using m_runLoop.postTaskAndTerminate(). Still, seems odd to add new infrastructure (new virtual function and call after the lock is released), just to work around the fact that we are calling callOnMainThreadAndWait(). I plan to get rid of callOnMainThreadAndWait() a little bit longer term anyway. Also, dispatching to the bg thread the stop() request, as the current patch does seems fine to me. What's the drawback? If the thread is busy looping and does not process the request within 10 seconds, we will kill the SW process.
Chris Dumez
Comment 23 2018-01-19 09:42:16 PST
Chris Dumez
Comment 24 2018-01-19 11:13:59 PST
Comment on attachment 331745 [details] Patch Clearing flags on attachment: 331745 Committed r227221: <https://trac.webkit.org/changeset/227221>
Chris Dumez
Comment 25 2018-01-19 11:14:01 PST
All reviewed patches have been landed. Closing bug.
Note You need to log in before you can comment on or make changes to this bug.