WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
181761
ASSERT(registration || isTerminating()) hit in SWServerWorker::skipWaiting()
https://bugs.webkit.org/show_bug.cgi?id=181761
Summary
ASSERT(registration || isTerminating()) hit in SWServerWorker::skipWaiting()
Chris Dumez
Reported
2018-01-17 13:29:11 PST
ASSERT(registration || isTerminating()) hit in SWServerWorker::skipWaiting() on the bots sometimes.
Attachments
Patch
(6.43 KB, patch)
2018-01-17 14:17 PST
,
Chris Dumez
no flags
Details
Formatted Diff
Diff
Patch
(10.01 KB, patch)
2018-01-18 12:09 PST
,
Chris Dumez
no flags
Details
Formatted Diff
Diff
Archive of layout-test-results from ews116 for mac-sierra
(2.92 MB, application/zip)
2018-01-18 13:42 PST
,
EWS Watchlist
no flags
Details
Archive of layout-test-results from ews206 for win-future
(11.97 MB, application/zip)
2018-01-18 14:05 PST
,
EWS Watchlist
no flags
Details
Patch
(9.80 KB, patch)
2018-01-18 16:39 PST
,
Chris Dumez
no flags
Details
Formatted Diff
Diff
Patch
(10.01 KB, patch)
2018-01-18 17:01 PST
,
Chris Dumez
no flags
Details
Formatted Diff
Diff
Patch
(8.84 KB, patch)
2018-01-19 09:42 PST
,
Chris Dumez
no flags
Details
Formatted Diff
Diff
Show Obsolete
(6)
View All
Add attachment
proposed patch, testcase, etc.
Radar WebKit Bug Importer
Comment 1
2018-01-17 13:29:56 PST
<
rdar://problem/36594564
>
Chris Dumez
Comment 2
2018-01-17 14:17:23 PST
Created
attachment 331541
[details]
Patch
Chris Dumez
Comment 3
2018-01-17 14:18:11 PST
***
Bug 181763
has been marked as a duplicate of this bug. ***
youenn fablet
Comment 4
2018-01-18 01:18:56 PST
Comment on
attachment 331541
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=331541&action=review
> Source/WebCore/workers/service/context/SWContextManager.cpp:113 > + serviceWorker->thread().runLoop().postTask([identifier, serviceWorker = WTFMove(serviceWorker), completionHandler = WTFMove(completionHandler)](ScriptExecutionContext& context) mutable {
Isn't there a risk that the service worker thread might be kept busy (by JS say), will never execute that task and thus will never stop? We could try making ServiceWorkerContainer::ensureSWClientConnection() asynchronous. CacheStorageConnection is using the same pattern though so maybe this deadlock can also happen with regular workers using Cache API. It seems to me we should try to not be service worker specific and handle this at WorkerThread level. WorkerThread::start has a callback so we might be able to know if the worker is started and use that information to stop it now or just after being started. The current callback is called after evaluation of the script which might not be good in case of runaway script. Maybe we could do a callOnMainThread shortly after creation of the worker global scope to check whether worker thread is to be stopped?
> Source/WebCore/workers/service/server/SWServerRegistration.cpp:250 > + }
Can there be cases where there is more than one such worker per registration? If not, we could break the loop when we found one. Or maybe it is not too complex to keep a pointer to this worker in SWServerRegistration?
youenn fablet
Comment 5
2018-01-18 01:32:15 PST
(In reply to youenn fablet from
comment #4
)
> Comment on
attachment 331541
[details]
> Patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=331541&action=review
> > > Source/WebCore/workers/service/context/SWContextManager.cpp:113 > > + serviceWorker->thread().runLoop().postTask([identifier, serviceWorker = WTFMove(serviceWorker), completionHandler = WTFMove(completionHandler)](ScriptExecutionContext& context) mutable { > > Isn't there a risk that the service worker thread might be kept busy (by JS > say), will never execute that task and thus will never stop? > > We could try making ServiceWorkerContainer::ensureSWClientConnection() > asynchronous. > CacheStorageConnection is using the same pattern though so maybe this > deadlock can also happen with regular workers using Cache API.
Looking at it again, Cache API will probably not block the worker thread at creation of the global scope. Making ServiceWorkerContainer::ensureSWClientConnection() asynchronous or a related refactoring might so the trick.
Chris Dumez
Comment 6
2018-01-18 09:07:09 PST
(In reply to youenn fablet from
comment #5
)
> (In reply to youenn fablet from
comment #4
) > > Comment on
attachment 331541
[details]
> > Patch > > > > View in context: > >
https://bugs.webkit.org/attachment.cgi?id=331541&action=review
> > > > > Source/WebCore/workers/service/context/SWContextManager.cpp:113 > > > + serviceWorker->thread().runLoop().postTask([identifier, serviceWorker = WTFMove(serviceWorker), completionHandler = WTFMove(completionHandler)](ScriptExecutionContext& context) mutable { > > > > Isn't there a risk that the service worker thread might be kept busy (by JS > > say), will never execute that task and thus will never stop? > >
This is why I uploaded
https://bugs.webkit.org/show_bug.cgi?id=181563
.
> > We could try making ServiceWorkerContainer::ensureSWClientConnection() > > asynchronous. > > CacheStorageConnection is using the same pattern though so maybe this > > deadlock can also happen with regular workers using Cache API. > > Looking at it again, Cache API will probably not block the worker thread at > creation of the global scope. > Making ServiceWorkerContainer::ensureSWClientConnection() asynchronous or a > related refactoring might so the trick.
Making it async may not be trivial though. I'd have to check. (In reply to youenn fablet from
comment #4
)
> Comment on
attachment 331541
[details]
> Patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=331541&action=review
> > > Source/WebCore/workers/service/context/SWContextManager.cpp:113 > > + serviceWorker->thread().runLoop().postTask([identifier, serviceWorker = WTFMove(serviceWorker), completionHandler = WTFMove(completionHandler)](ScriptExecutionContext& context) mutable { > > Isn't there a risk that the service worker thread might be kept busy (by JS > say), will never execute that task and thus will never stop? > > We could try making ServiceWorkerContainer::ensureSWClientConnection() > asynchronous. > CacheStorageConnection is using the same pattern though so maybe this > deadlock can also happen with regular workers using Cache API. > It seems to me we should try to not be service worker specific and handle > this at WorkerThread level. > > WorkerThread::start has a callback so we might be able to know if the worker > is started and use that information to stop it now or just after being > started. > The current callback is called after evaluation of the script which might > not be good in case of runaway script. > Maybe we could do a callOnMainThread shortly after creation of the worker > global scope to check whether worker thread is to be stopped? > > > Source/WebCore/workers/service/server/SWServerRegistration.cpp:250 > > + } > > Can there be cases where there is more than one such worker per > registration? > If not, we could break the loop when we found one. > Or maybe it is not too complex to keep a pointer to this worker in > SWServerRegistration?
No, I don't think there could be more than 1 such worker. I could either break from the loop when finding one or store that worker on the registration somehow. I'll look into it.
youenn fablet
Comment 7
2018-01-18 09:17:19 PST
> > > Isn't there a risk that the service worker thread might be kept busy (by JS > > > say), will never execute that task and thus will never stop? > > > > > This is why I uploaded
https://bugs.webkit.org/show_bug.cgi?id=181563
.
I believe this is well handled for regular workers. If there is something special in service workers that would make this exception-throwing-to-stop-spinning strategy to fail, I would hope that we can realign service workers with workers.
Chris Dumez
Comment 8
2018-01-18 09:21:51 PST
(In reply to youenn fablet from
comment #7
)
> > > > Isn't there a risk that the service worker thread might be kept busy (by JS > > > > say), will never execute that task and thus will never stop? > > > > > > > > This is why I uploaded
https://bugs.webkit.org/show_bug.cgi?id=181563
. > > I believe this is well handled for regular workers. > If there is something special in service workers that would make this > exception-throwing-to-stop-spinning strategy to fail, I would hope that we > can realign service workers with workers.
From what I can see from regular workers, WorkerThread::stop() is called from terminateWorkerGlobalScope(), which is always called on the background thread. Call sites look like: m_scriptExecutionContext->postTask([this] (ScriptExecutionContext&) { m_mayBeDestroyed = true; if (m_workerThread) terminateWorkerGlobalScope(); else workerGlobalScopeDestroyedInternal(); });
Chris Dumez
Comment 9
2018-01-18 12:09:08 PST
Created
attachment 331648
[details]
Patch
Chris Dumez
Comment 10
2018-01-18 12:09:33 PST
Improved approach based on offline discussion with Youenn and Brady.
EWS Watchlist
Comment 11
2018-01-18 13:42:43 PST
Comment on
attachment 331648
[details]
Patch
Attachment 331648
[details]
did not pass mac-debug-ews (mac): Output:
http://webkit-queues.webkit.org/results/6124606
New failing tests: fast/workers/worker-terminate-forever.html
EWS Watchlist
Comment 12
2018-01-18 13:42:44 PST
Created
attachment 331655
[details]
Archive of layout-test-results from ews116 for mac-sierra The attached test failures were seen while running run-webkit-tests on the mac-debug-ews. Bot: ews116 Port: mac-sierra Platform: Mac OS X 10.12.6
Chris Dumez
Comment 13
2018-01-18 13:53:50 PST
Comment on
attachment 331648
[details]
Patch Well, this is unfortunate.
EWS Watchlist
Comment 14
2018-01-18 14:05:35 PST
Comment on
attachment 331648
[details]
Patch
Attachment 331648
[details]
did not pass win-ews (win): Output:
http://webkit-queues.webkit.org/results/6124993
New failing tests: fast/workers/worker-terminate-forever.html
EWS Watchlist
Comment 15
2018-01-18 14:05:45 PST
Created
attachment 331659
[details]
Archive of layout-test-results from ews206 for win-future The attached test failures were seen while running run-webkit-tests on the win-ews. Bot: ews206 Port: win-future Platform: CYGWIN_NT-6.1-2.9.0-0.318-5-3-x86_64-64bit
Chris Dumez
Comment 16
2018-01-18 16:37:50 PST
I looked into making ensureSWConnection() async and this is a larger change and it may introduce bugs, which we really do not want at this point. I think I'll go back to the original dispatching of the WorkerThread::stop() to the bg thread.
Chris Dumez
Comment 17
2018-01-18 16:39:22 PST
Created
attachment 331670
[details]
Patch
Chris Dumez
Comment 18
2018-01-18 17:01:07 PST
Created
attachment 331674
[details]
Patch
youenn fablet
Comment 19
2018-01-19 00:54:17 PST
The deadlock shown in
Bug 181763
is probably fixed by making ServiceWorkerGlobalScope::m_registration a RefPtr<> and lazily create it outside the WorkerThread::start lock section, in WorkerScriptController::initScript() for instance. Can we do that instead?
Chris Dumez
Comment 20
2018-01-19 07:07:33 PST
(In reply to youenn fablet from
comment #19
)
> The deadlock shown in
Bug 181763
is probably fixed by making > ServiceWorkerGlobalScope::m_registration a RefPtr<> and lazily create it > outside the WorkerThread::start lock section, in > WorkerScriptController::initScript() for instance. > Can we do that instead?
I thought about this but if I am outside the locked section then the workerglobalscope may get destroyed at any point no? Because stop() is able to lock the mutex at this point.
youenn fablet
Comment 21
2018-01-19 07:25:20 PST
WorkerScriptController::initScript expects a valid global scope. IIRC, it is called outside the lock section and before evaluating the script.
Chris Dumez
Comment 22
2018-01-19 08:57:13 PST
(In reply to youenn fablet from
comment #21
)
> WorkerScriptController::initScript expects a valid global scope. > IIRC, it is called outside the lock section and before evaluating the script.
I guess it works out because WorkerThread::stop() post a task to destroy the workerglobalscope on the background thread, using m_runLoop.postTaskAndTerminate(). Still, seems odd to add new infrastructure (new virtual function and call after the lock is released), just to work around the fact that we are calling callOnMainThreadAndWait(). I plan to get rid of callOnMainThreadAndWait() a little bit longer term anyway. Also, dispatching to the bg thread the stop() request, as the current patch does seems fine to me. What's the drawback? If the thread is busy looping and does not process the request within 10 seconds, we will kill the SW process.
Chris Dumez
Comment 23
2018-01-19 09:42:16 PST
Created
attachment 331745
[details]
Patch
Chris Dumez
Comment 24
2018-01-19 11:13:59 PST
Comment on
attachment 331745
[details]
Patch Clearing flags on attachment: 331745 Committed
r227221
: <
https://trac.webkit.org/changeset/227221
>
Chris Dumez
Comment 25
2018-01-19 11:14:01 PST
All reviewed patches have been landed. Closing bug.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug