Bug 222682 - [WinCairo] deadlock in WTF::Thread::ThreadHolder::~ThreadHolder while WebKitWebProcess.exe is shutting down
Summary: [WinCairo] deadlock in WTF::Thread::ThreadHolder::~ThreadHolder while WebKitW...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Web Template Framework (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Fujii Hironori
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-03-03 13:26 PST by Fujii Hironori
Modified: 2021-03-09 11:59 PST (History)
3 users (show)

See Also:


Attachments
Patch (3.01 KB, patch)
2021-03-06 11:50 PST, Fujii Hironori
don.olmstead: review+
Details | Formatted Diff | Diff
Patch (17.15 KB, patch)
2021-03-08 18:10 PST, Fujii Hironori
no flags Details | Formatted Diff | Diff
window-open-crash-log.txt (81.48 KB, text/plain)
2021-03-08 19:48 PST, Fujii Hironori
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Fujii Hironori 2021-03-03 13:26:39 PST
[WinCairo] dead lock in WTF::Thread::ThreadHolder::~ThreadHolder while WebKitWebProcess.exe is shutting down

I'm recently infrequently observing WebKitWebProcess.exe's dead lock while it is quiting by running WinCairo WebKit2 layout tests.
I think this problem has started since late February 2021.

Callstack:

> ntdll.dll!NtWaitForAlertByThreadId()	Unknown
> ntdll.dll!RtlSleepConditionVariableSRW()	Unknown
> KernelBase.dll!SleepConditionVariableSRW()	Unknown
> WTF.dll!WTF::ThreadCondition::timedWait(WTF::Mutex & mutex, WTF::WallTime absoluteTime) Line 390	C++
> WTF.dll!WTF::ParkingLot::parkConditionallyImpl(const void * address, const WTF::ScopedLambda<bool ()> & validation, const WTF::ScopedLambda<void ()> & beforeSleep, const WTF::TimeWithDynamicClockType & timeout) Line 602	C++
> [Inline Frame] WTF.dll!WTF::ParkingLot::parkConditionally(const void * address, const WTF::ParkingLot::compareAndPark<unsigned char,unsigned char>::<unnamed-tag> & validation, const WTF::ParkingLot::compareAndPark<unsigned char,unsigned char>::<unnamed-tag> & beforeSleep, const WTF::TimeWithDynamicClockType & timeout) Line 82	C++
> [Inline Frame] WTF.dll!WTF::ParkingLot::compareAndPark(const WTF::Atomic<unsigned char> * address, unsigned char expected) Line 94	C++
> WTF.dll!WTF::LockAlgorithm<unsigned char,'\x01','\x02',WTF::EmptyLockHooks<unsigned char>>::lockSlow(WTF::Atomic<unsigned char> & lock) Line 85	C++
> [Inline Frame] WTF.dll!WTF::Lock::lock() Line 59	C++
> [Inline Frame] WTF.dll!WTF::Locker<WTF::Lock>::lock() Line 128	C++
> [Inline Frame] WTF.dll!WTF::Locker<WTF::Lock>::Locker(WTF::Lock & lockable) Line 60	C++
> [Inline Frame] WTF.dll!WTF::holdLock(WTF::Lock & lock) Line 144	C++
> WTF.dll!WTF::RunLoop::threadWillExit() Line 179	C++
> [Inline Frame] WTF.dll!WTF::RunLoop::Holder::~Holder() Line 51	C++
> [Inline Frame] WTF.dll!WTF::ThreadSpecific<WTF::RunLoop::Holder,WTF::CanBeGCThread::False>::Data::~Data() Line 90	C++
> WTF.dll!WTF::ThreadSpecific<WTF::RunLoop::Holder,WTF::CanBeGCThread::False>::destroy(void * ptr) Line 178	C++
> [Inline Frame] WTF.dll!WTF::Thread::SpecificStorage::destroySlots() Line 330	C++
> [Inline Frame] WTF.dll!WTF::Thread::ThreadHolder::~ThreadHolder() Line 278	C++
> WTF.dll!WTF::`dynamic atexit destructor for 's_threadHolder'() Line 286	C++
> WTF.dll!__dyn_tls_dtor(void * __formal, const unsigned long dwReason, void * __formal) Line 119	C++
> ntdll.dll!LdrpCallInitRoutine()	Unknown
> ntdll.dll!LdrpCallTlsInitializers()	Unknown
> ntdll.dll!LdrShutdownProcess()	Unknown
> ntdll.dll!RtlExitUserProcess()	Unknown
> kernel32.dll!ExitProcessImplementation()	Unknown
> ucrtbase.dll!exit_or_terminate_process()	Unknown
> ucrtbase.dll!common_exit()	Unknown
> WebKit2.dll!WebKit::AuxiliaryProcess::didClose(IPC::Connection &) Line 60	C++
> WebKit2.dll!IPC::Connection::connectionDidClose() Line 849	C++
> WebKit2.dll!IPC::Connection::readEventHandler() Line 158	C++
> [Inline Frame] WTF.dll!WTF::Function<void ()>::operator()() Line 83	C++
> WTF.dll!WTF::RunLoop::performWork() Line 129	C++
> [Inline Frame] WTF.dll!WTF::RunLoop::wndProc(HWND__ * hWnd, unsigned int message, unsigned __int64 wParam, __int64 lParam) Line 56	C++
> WTF.dll!WTF::RunLoop::RunLoopWndProc(HWND__ * hWnd, unsigned int message, unsigned __int64 wParam, __int64 lParam) Line 39	C++
> user32.dll!UserCallWinProcCheckWow()	Unknown
> user32.dll!DispatchMessageWorker()	Unknown
> WTF.dll!WTF::RunLoop::run() Line 73	C++
> [Inline Frame] WTF.dll!WTF::Function<void ()>::operator()() Line 83	C++
> WTF.dll!WTF::Thread::entryPoint(WTF::Thread::NewThreadContext * newThreadContext) Line 182	C++
> WTF.dll!WTF::wtfThreadEntryPoint(void * data) Line 153	C++
> ucrtbase.dll!thread_start<unsigned int (__cdecl*)(void *),1>()	Unknown
> kernel32.dll!BaseThreadInitThunk()	Unknown
> ntdll.dll!RtlUserThreadStart()	Unknown


I fixed a similar bug in 10 months ago.
Bug 210955 – [Win] Deadlock in WTF::Thread::didExit() while WebKitNetworkProcess.exe is exiting
Comment 1 Fujii Hironori 2021-03-03 21:05:24 PST
_exit() is called on the IPC thread. However, Bug 210955 fixed
only the case of exiting program from the main thread.

There can be some solutions:

1. Call _exit() on the main thread
2. Return from the main function
3. Use TerminateProcess API instead of _exit() (Bug 210955 Comment 3)
4. Skip the destruction of ~ThreadHolder in the thread invoking _exit() by adding a flag
Comment 2 Fujii Hironori 2021-03-05 18:46:07 PST
The callstack of comment#0 is came from Release build, thus it is confusing.
Here is a callsck of Debug build.

> ntdll.dll!NtWaitForAlertByThreadId()	Unknown
> ntdll.dll!RtlSleepConditionVariableSRW()	Unknown
> KernelBase.dll!SleepConditionVariableSRW()	Unknown
> WTF.dll!WTF::ThreadCondition::timedWait(WTF::Mutex & mutex, WTF::WallTime absoluteTime) Line 390	C++
> WTF.dll!WTF::ParkingLot::parkConditionallyImpl(const void * address, const WTF::ScopedLambda<bool __cdecl(void)> & validation, const WTF::ScopedLambda<void __cdecl(void)> & beforeSleep, const WTF::TimeWithDynamicClockType & timeout) Line 602	C++
> WTF.dll!WTF::ParkingLot::parkConditionally<bool <lambda>(void),void <lambda>(void)>(const void * address, const WTF::ParkingLot::compareAndPark::__l2::bool <lambda>(void) & validation, const WTF::ParkingLot::compareAndPark::__l2::void <lambda>(void) & beforeSleep, const WTF::TimeWithDynamicClockType & timeout) Line 82	C++
> WTF.dll!WTF::ParkingLot::compareAndPark<unsigned char,unsigned char>(const WTF::Atomic<unsigned char> * address, unsigned char expected) Line 94	C++
> WTF.dll!WTF::LockAlgorithm<unsigned char,1,2,WTF::EmptyLockHooks<unsigned char>>::lockSlow(WTF::Atomic<unsigned char> & lock) Line 83	C++
> WTF.dll!WTF::Lock::lockSlow() Line 41	C++
> WTF.dll!WTF::Lock::lock() Line 60	C++
> WTF.dll!WTF::Locker<WTF::Lock>::lock() Line 131	C++
> WTF.dll!WTF::Locker<WTF::Lock>::Locker<WTF::Lock>(WTF::Lock & lockable) Line 60	C++
> WTF.dll!WTF::holdLock<WTF::Lock>(WTF::Lock & lock) Line 144	C++
> WTF.dll!WTF::RunLoop::threadWillExit() Line 179	C++
> WTF.dll!WTF::RunLoop::Holder::~Holder() Line 52	C++
> WTF.dll!WTF::RunLoop::Holder::`scalar deleting destructor'(unsigned int)	C++
> WTF.dll!WTF::ThreadSpecific<WTF::RunLoop::Holder,0>::Data::~Data() Line 91	C++
> WTF.dll!WTF::ThreadSpecific<WTF::RunLoop::Holder,0>::Data::`scalar deleting destructor'(unsigned int)	C++
> WTF.dll!WTF::ThreadSpecific<WTF::RunLoop::Holder,0>::destroy(void * ptr) Line 178	C++
> WTF.dll!WTF::Thread::SpecificStorage::destroySlots() Line 331	C++
> WTF.dll!WTF::Thread::ThreadHolder::~ThreadHolder() Line 279	C++
> WTF.dll!WTF::`dynamic atexit destructor for 's_threadHolder''()	C++
> WTF.dll!__dyn_tls_dtor(void * __formal, const unsigned long dwReason, void * __formal) Line 119	C++
> ntdll.dll!LdrpCallInitRoutine()	Unknown
> ntdll.dll!LdrpCallTlsInitializers()	Unknown
> ntdll.dll!LdrShutdownProcess()	Unknown
> ntdll.dll!RtlExitUserProcess()	Unknown
> kernel32.dll!ExitProcessImplementation()	Unknown
> ucrtbase.dll!exit_or_terminate_process()	Unknown
> ucrtbase.dll!common_exit()	Unknown
> WebKit2.dll!WebKit::callExit(IPC::Connection * __formal) Line 236	C++
> WebKit2.dll!IPC::Connection::connectionDidClose() Line 849	C++
> WebKit2.dll!IPC::Connection::readEventHandler() Line 159	C++
> WebKit2.dll!IPC::Connection::invokeReadEventHandler::__l2::<lambda>() Line 238	C++
> WebKit2.dll!WTF::Detail::CallableWrapper<void <lambda>(void),void>::call() Line 52	C++
> WTF.dll!WTF::Function<void __cdecl(void)>::operator()() Line 84	C++
> WTF.dll!WTF::WorkQueue::dispatch::__l2::<lambda>() Line 62	C++
> WTF.dll!WTF::Detail::CallableWrapper<void <lambda>(void),void>::call() Line 52	C++
> WTF.dll!WTF::Function<void __cdecl(void)>::operator()() Line 84	C++
> WTF.dll!WTF::RunLoop::performWork() Line 129	C++
> WTF.dll!WTF::RunLoop::wndProc(HWND__ * hWnd, unsigned int message, unsigned __int64 wParam, __int64 lParam) Line 57	C++
> WTF.dll!WTF::RunLoop::RunLoopWndProc(HWND__ * hWnd, unsigned int message, unsigned __int64 wParam, __int64 lParam) Line 39	C++
> user32.dll!UserCallWinProcCheckWow()	Unknown
> user32.dll!DispatchMessageWorker()	Unknown
> WTF.dll!WTF::RunLoop::run() Line 74	C++
> WTF.dll!WTF::WorkQueue::platformInitialize::__l2::<lambda>() Line 42	C++
> WTF.dll!WTF::Detail::CallableWrapper<void <lambda>(void),void>::call() Line 52	C++
> WTF.dll!WTF::Function<void __cdecl(void)>::operator()() Line 84	C++
> WTF.dll!WTF::Thread::entryPoint(WTF::Thread::NewThreadContext * newThreadContext) Line 182	C++
> WTF.dll!WTF::wtfThreadEntryPoint(void * data) Line 153	C++
> ucrtbase.dll!thread_start<unsigned int (__cdecl*)(void *),1>()	Unknown
> kernel32.dll!BaseThreadInitThunk()	Unknown
> ntdll.dll!RtlUserThreadStart()	Unknown

Connection::connectionDidClose calls WebKit::callExit.
Comment 3 Fujii Hironori 2021-03-05 18:51:31 PST
(In reply to Fujii Hironori from comment #1)
> There can be some solutions:
> 
> 1. Call _exit() on the main thread
> 2. Return from the main function

When the main thread is unresponsive, this approach isn't nice.
Comment 4 Fujii Hironori 2021-03-06 11:50:14 PST
Created attachment 422502 [details]
Patch
Comment 5 Yusuke Suzuki 2021-03-06 21:50:57 PST
Comment on attachment 422502 [details]
Patch

How about defining it in WTF? Like,

void exitProcess(int code)
{
#if OS(WINDOWS)
    // Calling _exit in non-main threads may cause a deadlock in WTF::Thread::ThreadHolder::~ThreadHolder.
    TerminateProcess(GetCurrentProcess(), code);
#else
    _exit(code);
#endif
}
Comment 6 Yusuke Suzuki 2021-03-06 23:25:34 PST
(In reply to Yusuke Suzuki from comment #5)
> Comment on attachment 422502 [details]
> Patch
> 
> How about defining it in WTF? Like,
> 
> void exitProcess(int code)
> {
> #if OS(WINDOWS)
>     // Calling _exit in non-main threads may cause a deadlock in
> WTF::Thread::ThreadHolder::~ThreadHolder.
>     TerminateProcess(GetCurrentProcess(), code);
> #else
>     _exit(code);
> #endif
> }

And let's call it in JSC shell too :)
Comment 7 Fujii Hironori 2021-03-07 12:31:13 PST
Which header is a good place to add?
Process.h can conflict with process.h on Windows (Bug 193944).
Adding ProcessUtility.h?

I'm not sure WTF is the best place because terminating a process
forcibly isn't a good practice. In the following the Microsoft
blog article, they recommend exiting a program after shutting
down all worker threads.

During process termination, the gates are now electrified | The Old New Thing
https://devblogs.microsoft.com/oldnewthing/20100122-00/?p=15193
Comment 8 Fujii Hironori 2021-03-08 12:56:35 PST
I can add 'exitProcess' class method to WebKit::AuxiliaryProcess class.

And, if I remove Connection::setDidCloseOnConnectionWorkQueueCallback and add Connection::Client::didCloseOnConnectionWorkQueue,
WebProcess and NetworkProcess will override it like the following.

void WebProcess::didCloseOnConnectionWorkQueue(Connection&)
{
    exitProcess();
}

void NetworkProcess::didCloseOnConnectionWorkQueue(Connection&)
{
    auto watchdogDelay = 10_s;
    WorkQueue::create("com.apple.WebKit.NetworkProcess.WatchDogQueue")->dispatchAfter(watchdogDelay, [] {
       exitProcess();
    });
}

Does this look nice?
Comment 9 Yusuke Suzuki 2021-03-08 12:58:32 PST
(In reply to Fujii Hironori from comment #8)
> I can add 'exitProcess' class method to WebKit::AuxiliaryProcess class.
> 
> And, if I remove Connection::setDidCloseOnConnectionWorkQueueCallback and
> add Connection::Client::didCloseOnConnectionWorkQueue,
> WebProcess and NetworkProcess will override it like the following.
> 
> void WebProcess::didCloseOnConnectionWorkQueue(Connection&)
> {
>     exitProcess();
> }
> 
> void NetworkProcess::didCloseOnConnectionWorkQueue(Connection&)
> {
>     auto watchdogDelay = 10_s;
>    
> WorkQueue::create("com.apple.WebKit.NetworkProcess.WatchDogQueue")-
> >dispatchAfter(watchdogDelay, [] {
>        exitProcess();
>     });
> }
> 
> Does this look nice?

I think this is OK for this WebKit processes. But why doesn't exit (in JSC shell) cause the same problem?
Comment 10 Fujii Hironori 2021-03-08 13:13:09 PST
(In reply to Yusuke Suzuki from comment #9)
> I think this is OK for this WebKit processes. But why doesn't exit (in JSC
> shell) cause the same problem?

JSC shell calls _exit in #if OS(UNIX).
https://github.com/WebKit/WebKit/blob/7b1d9f7d5910d7929b070565c29bcfb81d5f0a32/Source/JavaScriptCore/jsc.cpp#L3232
The deadlock is Windows specific.
Comment 11 Yusuke Suzuki 2021-03-08 13:23:23 PST
(In reply to Fujii Hironori from comment #10)
> (In reply to Yusuke Suzuki from comment #9)
> > I think this is OK for this WebKit processes. But why doesn't exit (in JSC
> > shell) cause the same problem?
> 
> JSC shell calls _exit in #if OS(UNIX).
> https://github.com/WebKit/WebKit/blob/
> 7b1d9f7d5910d7929b070565c29bcfb81d5f0a32/Source/JavaScriptCore/jsc.cpp#L3232
> The deadlock is Windows specific.

Ah, I mean, doesn't jscExit cause the same problem?
https://github.com/WebKit/WebKit/blob/main/Source/JavaScriptCore/jsc.cpp#L167-L183
Comment 12 Fujii Hironori 2021-03-08 13:49:06 PST
(In reply to Yusuke Suzuki from comment #11)
> Ah, I mean, doesn't jscExit cause the same problem?

Is jscExit called only from the main thread?
If so, it is not a problem because r260911 (Bug 210955) skips the destruction in ~ThreadHolder only in the main thread.

Yes, I know you don't like the idea (Bug 204192 Comment 25). 😅
Comment 13 Fujii Hironori 2021-03-08 18:10:41 PST
Created attachment 422648 [details]
Patch
Comment 14 Fujii Hironori 2021-03-08 19:48:14 PST
Created attachment 422658 [details]
window-open-crash-log.txt

mac-wk2 testing bot reported plugins/window-open.html crashed.

Thread 8 Crashed:: Dispatch queue: com.apple.IPC.ReceiveQueue
0   ???                           	000000000000000000 0 + 0
1   com.apple.WebKit              	0x0000000103f05bff IPC::Connection::connectionDidClose() + 191
2   libdispatch.dylib             	0x00007fff7284d658 _dispatch_client_callout + 8
3   libdispatch.dylib             	0x00007fff7284f818 _dispatch_continuation_pop + 414
4   libdispatch.dylib             	0x00007fff7285f4be _dispatch_source_invoke + 2084
5   libdispatch.dylib             	0x00007fff72852af6 _dispatch_lane_serial_drain + 263
6   libdispatch.dylib             	0x00007fff728535d6 _dispatch_lane_invoke + 363
7   libdispatch.dylib             	0x00007fff7285cc09 _dispatch_workloop_worker_thread + 596
8   libsystem_pthread.dylib       	0x00007fff72aaba3d _pthread_wqthread + 290
9   libsystem_pthread.dylib       	0x00007fff72aaab77 start_wqthread + 15

It seems that m_client was already destructed when IPC::Connection::connectionDidClose.
This seems Bug 51115 that is solved by r77874.
Comment 15 Fujii Hironori 2021-03-08 20:25:07 PST
Hmm, I can find no good solutions. I'd like to land the first simple patch at the moment. Is it acceptable?
Comment 16 Fujii Hironori 2021-03-09 11:58:10 PST
Committed r274163 (235084@main): <https://commits.webkit.org/235084@main>