181761 – ASSERT(registration || isTerminating()) hit in SWServerWorker::skipWaiting()

RESOLVED FIXED 181761

ASSERT(registration || isTerminating()) hit in SWServerWorker::skipWaiting()

https://bugs.webkit.org/show_bug.cgi?id=181761

Summary ASSERT(registration || isTerminating()) hit in SWServerWorker::skipWaiting()

Chris Dumez

Reported 2018-01-17 13:29:11 PST

ASSERT(registration || isTerminating()) hit in SWServerWorker::skipWaiting() on the bots sometimes.

Attachments
Patch (6.43 KB, patch) 2018-01-17 14:17 PST, Chris Dumez	no flags	Details Formatted Diff Diff
Patch (10.01 KB, patch) 2018-01-18 12:09 PST, Chris Dumez	no flags	Details Formatted Diff Diff
Archive of layout-test-results from ews116 for mac-sierra (2.92 MB, application/zip) 2018-01-18 13:42 PST, EWS Watchlist	no flags	Details
Archive of layout-test-results from ews206 for win-future (11.97 MB, application/zip) 2018-01-18 14:05 PST, EWS Watchlist	no flags	Details
Patch (9.80 KB, patch) 2018-01-18 16:39 PST, Chris Dumez	no flags	Details Formatted Diff Diff
Patch (10.01 KB, patch) 2018-01-18 17:01 PST, Chris Dumez	no flags	Details Formatted Diff Diff
Patch (8.84 KB, patch) 2018-01-19 09:42 PST, Chris Dumez	no flags	Details Formatted Diff Diff
Show Obsolete (6) View All Add attachment proposed patch, testcase, etc.

Radar WebKit Bug Importer

Comment 1 2018-01-17 13:29:56 PST

<rdar://problem/36594564>

Chris Dumez

Comment 2 2018-01-17 14:17:23 PST

Created attachment 331541 [details] Patch

Chris Dumez

Comment 3 2018-01-17 14:18:11 PST

*** Bug 181763 has been marked as a duplicate of this bug. ***

youenn fablet

Comment 4 2018-01-18 01:18:56 PST

Comment on attachment 331541 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=331541&action=review > Source/WebCore/workers/service/context/SWContextManager.cpp:113 > + serviceWorker->thread().runLoop().postTask([identifier, serviceWorker = WTFMove(serviceWorker), completionHandler = WTFMove(completionHandler)](ScriptExecutionContext& context) mutable { Isn't there a risk that the service worker thread might be kept busy (by JS say), will never execute that task and thus will never stop? We could try making ServiceWorkerContainer::ensureSWClientConnection() asynchronous. CacheStorageConnection is using the same pattern though so maybe this deadlock can also happen with regular workers using Cache API. It seems to me we should try to not be service worker specific and handle this at WorkerThread level. WorkerThread::start has a callback so we might be able to know if the worker is started and use that information to stop it now or just after being started. The current callback is called after evaluation of the script which might not be good in case of runaway script. Maybe we could do a callOnMainThread shortly after creation of the worker global scope to check whether worker thread is to be stopped? > Source/WebCore/workers/service/server/SWServerRegistration.cpp:250 > + } Can there be cases where there is more than one such worker per registration? If not, we could break the loop when we found one. Or maybe it is not too complex to keep a pointer to this worker in SWServerRegistration?

youenn fablet

Comment 5 2018-01-18 01:32:15 PST

(In reply to youenn fablet from comment #4) > Comment on attachment 331541 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=331541&action=review > > > Source/WebCore/workers/service/context/SWContextManager.cpp:113 > > + serviceWorker->thread().runLoop().postTask([identifier, serviceWorker = WTFMove(serviceWorker), completionHandler = WTFMove(completionHandler)](ScriptExecutionContext& context) mutable { > > Isn't there a risk that the service worker thread might be kept busy (by JS > say), will never execute that task and thus will never stop? > > We could try making ServiceWorkerContainer::ensureSWClientConnection() > asynchronous. > CacheStorageConnection is using the same pattern though so maybe this > deadlock can also happen with regular workers using Cache API. Looking at it again, Cache API will probably not block the worker thread at creation of the global scope. Making ServiceWorkerContainer::ensureSWClientConnection() asynchronous or a related refactoring might so the trick.

Chris Dumez

Comment 6 2018-01-18 09:07:09 PST

(In reply to youenn fablet from comment #5) > (In reply to youenn fablet from comment #4) > > Comment on attachment 331541 [details] > > Patch > > > > View in context: > > https://bugs.webkit.org/attachment.cgi?id=331541&action=review > > > > > Source/WebCore/workers/service/context/SWContextManager.cpp:113 > > > + serviceWorker->thread().runLoop().postTask([identifier, serviceWorker = WTFMove(serviceWorker), completionHandler = WTFMove(completionHandler)](ScriptExecutionContext& context) mutable { > > > > Isn't there a risk that the service worker thread might be kept busy (by JS > > say), will never execute that task and thus will never stop? > > This is why I uploaded https://bugs.webkit.org/show_bug.cgi?id=181563. > > We could try making ServiceWorkerContainer::ensureSWClientConnection() > > asynchronous. > > CacheStorageConnection is using the same pattern though so maybe this > > deadlock can also happen with regular workers using Cache API. > > Looking at it again, Cache API will probably not block the worker thread at > creation of the global scope. > Making ServiceWorkerContainer::ensureSWClientConnection() asynchronous or a > related refactoring might so the trick. Making it async may not be trivial though. I'd have to check. (In reply to youenn fablet from comment #4) > Comment on attachment 331541 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=331541&action=review > > > Source/WebCore/workers/service/context/SWContextManager.cpp:113 > > + serviceWorker->thread().runLoop().postTask([identifier, serviceWorker = WTFMove(serviceWorker), completionHandler = WTFMove(completionHandler)](ScriptExecutionContext& context) mutable { > > Isn't there a risk that the service worker thread might be kept busy (by JS > say), will never execute that task and thus will never stop? > > We could try making ServiceWorkerContainer::ensureSWClientConnection() > asynchronous. > CacheStorageConnection is using the same pattern though so maybe this > deadlock can also happen with regular workers using Cache API. > It seems to me we should try to not be service worker specific and handle > this at WorkerThread level. > > WorkerThread::start has a callback so we might be able to know if the worker > is started and use that information to stop it now or just after being > started. > The current callback is called after evaluation of the script which might > not be good in case of runaway script. > Maybe we could do a callOnMainThread shortly after creation of the worker > global scope to check whether worker thread is to be stopped? > > > Source/WebCore/workers/service/server/SWServerRegistration.cpp:250 > > + } > > Can there be cases where there is more than one such worker per > registration? > If not, we could break the loop when we found one. > Or maybe it is not too complex to keep a pointer to this worker in > SWServerRegistration? No, I don't think there could be more than 1 such worker. I could either break from the loop when finding one or store that worker on the registration somehow. I'll look into it.

youenn fablet

Comment 7 2018-01-18 09:17:19 PST

> > > Isn't there a risk that the service worker thread might be kept busy (by JS > > > say), will never execute that task and thus will never stop? > > > > > This is why I uploaded https://bugs.webkit.org/show_bug.cgi?id=181563. I believe this is well handled for regular workers. If there is something special in service workers that would make this exception-throwing-to-stop-spinning strategy to fail, I would hope that we can realign service workers with workers.

Chris Dumez

Comment 8 2018-01-18 09:21:51 PST

(In reply to youenn fablet from comment #7) > > > > Isn't there a risk that the service worker thread might be kept busy (by JS > > > > say), will never execute that task and thus will never stop? > > > > > > > > This is why I uploaded https://bugs.webkit.org/show_bug.cgi?id=181563. > > I believe this is well handled for regular workers. > If there is something special in service workers that would make this > exception-throwing-to-stop-spinning strategy to fail, I would hope that we > can realign service workers with workers. From what I can see from regular workers, WorkerThread::stop() is called from terminateWorkerGlobalScope(), which is always called on the background thread. Call sites look like: m_scriptExecutionContext->postTask([this] (ScriptExecutionContext&) { m_mayBeDestroyed = true; if (m_workerThread) terminateWorkerGlobalScope(); else workerGlobalScopeDestroyedInternal(); });

Chris Dumez

Comment 9 2018-01-18 12:09:08 PST

Created attachment 331648 [details] Patch

Chris Dumez

Comment 10 2018-01-18 12:09:33 PST

Improved approach based on offline discussion with Youenn and Brady.

EWS Watchlist

Comment 11 2018-01-18 13:42:43 PST

Comment on attachment 331648 [details] Patch Attachment 331648 [details] did not pass mac-debug-ews (mac): Output: http://webkit-queues.webkit.org/results/6124606 New failing tests: fast/workers/worker-terminate-forever.html

EWS Watchlist

Comment 12 2018-01-18 13:42:44 PST

Created attachment 331655 [details] Archive of layout-test-results from ews116 for mac-sierra The attached test failures were seen while running run-webkit-tests on the mac-debug-ews. Bot: ews116 Port: mac-sierra Platform: Mac OS X 10.12.6

Chris Dumez

Comment 13 2018-01-18 13:53:50 PST

Comment on attachment 331648 [details] Patch Well, this is unfortunate.

EWS Watchlist

Comment 14 2018-01-18 14:05:35 PST

Comment on attachment 331648 [details] Patch Attachment 331648 [details] did not pass win-ews (win): Output: http://webkit-queues.webkit.org/results/6124993 New failing tests: fast/workers/worker-terminate-forever.html

EWS Watchlist

Comment 15 2018-01-18 14:05:45 PST

Created attachment 331659 [details] Archive of layout-test-results from ews206 for win-future The attached test failures were seen while running run-webkit-tests on the win-ews. Bot: ews206 Port: win-future Platform: CYGWIN_NT-6.1-2.9.0-0.318-5-3-x86_64-64bit

Chris Dumez

Comment 16 2018-01-18 16:37:50 PST

I looked into making ensureSWConnection() async and this is a larger change and it may introduce bugs, which we really do not want at this point. I think I'll go back to the original dispatching of the WorkerThread::stop() to the bg thread.

Chris Dumez

Comment 17 2018-01-18 16:39:22 PST

Created attachment 331670 [details] Patch

Chris Dumez

Comment 18 2018-01-18 17:01:07 PST

Created attachment 331674 [details] Patch

youenn fablet

Comment 19 2018-01-19 00:54:17 PST

The deadlock shown in Bug 181763 is probably fixed by making ServiceWorkerGlobalScope::m_registration a RefPtr<> and lazily create it outside the WorkerThread::start lock section, in WorkerScriptController::initScript() for instance. Can we do that instead?

Chris Dumez

Comment 20 2018-01-19 07:07:33 PST

(In reply to youenn fablet from comment #19) > The deadlock shown in Bug 181763 is probably fixed by making > ServiceWorkerGlobalScope::m_registration a RefPtr<> and lazily create it > outside the WorkerThread::start lock section, in > WorkerScriptController::initScript() for instance. > Can we do that instead? I thought about this but if I am outside the locked section then the workerglobalscope may get destroyed at any point no? Because stop() is able to lock the mutex at this point.

youenn fablet

Comment 21 2018-01-19 07:25:20 PST

WorkerScriptController::initScript expects a valid global scope. IIRC, it is called outside the lock section and before evaluating the script.

Chris Dumez

Comment 22 2018-01-19 08:57:13 PST

(In reply to youenn fablet from comment #21) > WorkerScriptController::initScript expects a valid global scope. > IIRC, it is called outside the lock section and before evaluating the script. I guess it works out because WorkerThread::stop() post a task to destroy the workerglobalscope on the background thread, using m_runLoop.postTaskAndTerminate(). Still, seems odd to add new infrastructure (new virtual function and call after the lock is released), just to work around the fact that we are calling callOnMainThreadAndWait(). I plan to get rid of callOnMainThreadAndWait() a little bit longer term anyway. Also, dispatching to the bg thread the stop() request, as the current patch does seems fine to me. What's the drawback? If the thread is busy looping and does not process the request within 10 seconds, we will kill the SW process.

Chris Dumez

Comment 23 2018-01-19 09:42:16 PST

Created attachment 331745 [details] Patch

Chris Dumez

Comment 24 2018-01-19 11:13:59 PST

Comment on attachment 331745 [details] Patch Clearing flags on attachment: 331745 Committed r227221: <https://trac.webkit.org/changeset/227221>

Chris Dumez

Comment 25 2018-01-19 11:14:01 PST

All reviewed patches have been landed. Closing bug.

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version WebKit Nightly Build

Hardware Unspecified

OS Unspecified

Product WebKit

Component Service Workers

Assignee

Chris Dumez

Reported

2018-01-17 13:29 PST

Modified

2018-01-19 11:14 PST History

CC List

7 users Show

URL

Keywords InRadar

Duplicates (1)

181763 View as bug list

Depends on

Blocks