Bug 180045 - Flaky crash in WebCore::DOMGuardedObject::clear() during service worker tests
Summary: Flaky crash in WebCore::DOMGuardedObject::clear() during service worker tests
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKit Misc. (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: youenn fablet
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2017-11-27 10:48 PST by Chris Dumez
Modified: 2017-11-29 21:48 PST (History)
8 users (show)

See Also:


Attachments
Patch (6.17 KB, patch)
2017-11-28 18:09 PST, youenn fablet
no flags Details | Formatted Diff | Diff
Patch (4.23 KB, patch)
2017-11-29 11:02 PST, youenn fablet
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Chris Dumez 2017-11-27 10:48:47 PST
Flaky crash in WebCore::DOMGuardedObject::clear() during service worker tests:
Thread 7 Crashed:: WebCore: Worker
0   com.apple.JavaScriptCore      	0x0000000121066234 WTFCrash + 36 (Assertions.cpp:270)
1   com.apple.WebCore             	0x0000000113600c18 WebCore::DOMGuardedObject::clear() + 104 (JSDOMGuardedObject.cpp:50)
2   com.apple.WebCore             	0x0000000113600d78 WebCore::DOMGuardedObject::contextDestroyed() + 40 (JSDOMGuardedObject.cpp:62)
3   com.apple.WebCore             	0x0000000113bf6008 WebCore::ScriptExecutionContext::~ScriptExecutionContext() + 184 (ScriptExecutionContext.cpp:124)
4   com.apple.WebCore             	0x0000000114f04081 WebCore::WorkerGlobalScope::~WorkerGlobalScope() + 977 (WorkerGlobalScope.cpp:96)
5   com.apple.WebCore             	0x0000000114f50662 WebCore::ServiceWorkerGlobalScope::~ServiceWorkerGlobalScope() + 98 (ServiceWorkerGlobalScope.cpp:43)
6   com.apple.WebCore             	0x0000000114f50685 WebCore::ServiceWorkerGlobalScope::~ServiceWorkerGlobalScope() + 21 (ServiceWorkerGlobalScope.cpp:43)
7   com.apple.WebCore             	0x0000000114f506c9 WebCore::ServiceWorkerGlobalScope::~ServiceWorkerGlobalScope() + 25 (ServiceWorkerGlobalScope.cpp:43)
8   com.apple.WebCore             	0x0000000112582081 WTF::RefCounted<WebCore::WorkerGlobalScope>::deref() const + 81 (RefCounted.h:145)
9   com.apple.WebCore             	0x0000000113035347 void WTF::derefIfNotNull<WebCore::WorkerGlobalScope>(WebCore::WorkerGlobalScope*) + 55 (RefPtr.h:46)
10  com.apple.WebCore             	0x0000000113510d8b WTF::RefPtr<WebCore::WorkerGlobalScope>::operator=(std::nullptr_t) + 91 (RefPtr.h:152)
11  com.apple.WebCore             	0x0000000114f1f475 WebCore::WorkerThread::workerThread() + 1989 (WorkerThread.cpp:232)
12  com.apple.WebCore             	0x0000000114f25e28 WebCore::WorkerThread::start(WTF::Function<void (WTF::String const&)>&&)::$_12::operator()() const + 24 (WorkerThread.cpp:145)
13  com.apple.WebCore             	0x0000000114f25de9 WTF::Function<void ()>::CallableWrapper<WebCore::WorkerThread::start(WTF::Function<void (WTF::String const&)>&&)::$_12>::call() + 25 (Function.h:101)
14  com.apple.JavaScriptCore      	0x00000001210a17ab WTF::Function<void ()>::operator()() const + 139 (Function.h:56)
15  com.apple.JavaScriptCore      	0x00000001210ef6ef WTF::Thread::entryPoint(WTF::Thread::NewThreadContext*) + 351 (Threading.cpp:129)
16  com.apple.JavaScriptCore      	0x00000001210f4db5 WTF::wtfThreadEntryPoint(void*) + 21 (ThreadingPthreads.cpp:223)
17  libsystem_pthread.dylib       	0x00007fff8dd8699d _pthread_body + 131
18  libsystem_pthread.dylib       	0x00007fff8dd8691a _pthread_start + 168
19  libsystem_pthread.dylib       	0x00007fff8dd84351 thread_start + 13

See https://build.webkit.org/results/Apple%20El%20Capitan%20Debug%20WK2%20(Tests)/r225150%20(4315)/com.apple.WebKit.WebContent.Development-27977-crash-log.txt

The symptom is that a lot of SW tests time out during a layout test run because the SW context process has crashed.
Comment 1 Chris Dumez 2017-11-27 10:56:21 PST
Assertion in DOMGuardedObject::clear() is:
    ASSERT(!m_guarded || m_globalObject);

I am not familiar with this code, not sure why this is bad.
Comment 2 Chris Dumez 2017-11-27 10:56:58 PST
Seems to happen on Service worker termination, when we destroy the ServiceWorkerGlobalScope object.
Comment 3 Chris Dumez 2017-11-27 14:55:36 PST
We believe this is likely the promise for a fetch() in a SW that stays alive longer than the ServiceWorkerGlobalScope object.
Comment 4 Chris Dumez 2017-11-27 15:00:25 PST
well, I guess it could be the promises for DOMCache API as well :/
Comment 5 youenn fablet 2017-11-27 15:14:30 PST
One possibility might be that we should clean the WorkerCacheStorageConnection callbacks.
I will take a look and try to reproduce the crash with regular workers.
Comment 6 Alexey Proskuryakov 2017-11-28 11:02:24 PST
Duplicate of bug 178345?
Comment 7 Chris Dumez 2017-11-28 11:44:01 PST
(In reply to Alexey Proskuryakov from comment #6)
> Duplicate of bug 178345?

bug 178345 is interesting because it is slightly different. In particular, it occurs when destroying a DedicatedWorkerGlobalScope, not a ServiceWorkerGlobalScope.
This is likely the same underlying bug though. I guess this answers Youenn's question about this being specific to service workers or not.
Comment 8 Alexey Proskuryakov 2017-11-28 17:05:56 PST
I can reproduce 100% with an ASan build:

run-webkit-tests --no-retry --no-sample --repeat 2 http/tests/workers/service/basic-unregister-then-register-again-reuse.html
Comment 9 Radar WebKit Bug Importer 2017-11-28 17:10:23 PST
<rdar://problem/35737288>
Comment 10 youenn fablet 2017-11-28 17:26:44 PST
We discussed this a bit with Chris.
One possibility is that DOMGuardedObjects are cleared too late in the game for Workers.
We might need to clear them explicitly as part of stopping workers.

I'll try to upload a patch in that sense.
Comment 11 youenn fablet 2017-11-28 18:09:30 PST
Created attachment 327812 [details]
Patch
Comment 12 Alexey Proskuryakov 2017-11-28 18:29:33 PST
Also reproduces with http/tests/workers/service/service-worker-cache-api.https.html.
Comment 13 youenn fablet 2017-11-29 09:06:29 PST
With the patch, I am no longer experiencing any crash on service worker tests with ASAN build.

I am still not sure why the same kind of issues would not happen in regular environment.
Comment 14 youenn fablet 2017-11-29 09:10:41 PST
Chris suggested to clear this in DOMGlobalObject destructor, which would fix my concern related to window environment as well.
Will give it a try.
Comment 15 youenn fablet 2017-11-29 11:02:37 PST
Created attachment 327876 [details]
Patch
Comment 16 youenn fablet 2017-11-29 11:08:23 PST
(In reply to youenn fablet from comment #15)
> Created attachment 327876 [details]
> Patch

Moved the clearing of dom guarded objects just before we remove the strong reference to the worker global object from worker script controller.
Comment 17 WebKit Commit Bot 2017-11-29 13:39:59 PST
Comment on attachment 327876 [details]
Patch

Clearing flags on attachment: 327876

Committed r225291: <https://trac.webkit.org/changeset/225291>
Comment 18 WebKit Commit Bot 2017-11-29 13:40:01 PST
All reviewed patches have been landed.  Closing bug.
Comment 19 Alexey Proskuryakov 2017-11-29 21:48:33 PST
(In reply to youenn fablet from comment #13)
> With the patch, I am no longer experiencing any crash on service worker
> tests with ASAN build.

Confirmed, I also cannot reproduce any more. I guess it remains to be seen if the same crash can happen for a different reason.

> I am still not sure why the same kind of issues would not happen in regular
> environment.

I believe it does happen - the original report is about a non-ASan build.