[WinCairo] dead lock in Thread::destructTLS By running run-webkit-tests, sometimes WebKitNetworkProcess.exe process remains alive. > python ./Tools/Scripts/run-webkit-tests --wincairo --debug --no-new-test-results --no-retry-failures --no-show-results -vf fast By attaching a debugger, I can get the callstack. There is only the main thread. Callstack: > [External Code] > WTF.dll!WTF::WordLock::lockSlow() Line 148 C++ > WTF.dll!WTF::WordLock::lock() Line 62 C++ > WTF.dll!WTF::Locker<WTF::WordLock>::lock() Line 114 C++ > WTF.dll!WTF::Locker<WTF::WordLock>::Locker(WTF::WordLock & lockable) Line 55 C++ > WTF.dll!WTF::holdLock<WTF::WordLock>(WTF::WordLock & lock) Line 127 C++ > WTF.dll!WTF::Thread::destructTLS(void * data) Line 343 C++ > WTF.dll!WTF::Thread::destructTLS::ThreadExitCallback::~ThreadExitCallback() Line 334 C++ > WTF.dll!??__Fcallback@?1??destructTLS@Thread@WTF@@KAXPEAX@Z@YAXXZ() Line 324 C++ > [External Code]
Created attachment 383546 [details] screenshot of debugger
Created attachment 383549 [details] screenshot of debugger (WebKitTestRunner.exe) I observed a case that WebKitTestRunner.exe didn't die.
Created attachment 383593 [details] screenshot of debugger (in WordLock::lockSlow) Waited in the parking, but no other threads were alive.
Created attachment 383602 [details] Patch
Comment on attachment 383602 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=383602&action=review > Source/WTF/wtf/win/ThreadingWin.cpp:268 > +static Mutex threadMapMutex; What happens if you use std::mutex? I'm guessing this is the issue.
Created attachment 383607 [details] Patch to use std::mutex I confirmed using std::mutex also seems solving the issue.
How about the idea implementing ThreadSpecific in WTF::Thread? It'd be easy to ensure deref-ing WTF::Thread in TLS after destructing other ThreadSpecific. We won't need to defer deref-ing WTF::Thread in TLS.
(In reply to Fujii Hironori from comment #7) > How about the idea implementing ThreadSpecific in WTF::Thread? > It'd be easy to ensure deref-ing WTF::Thread in TLS after destructing other > ThreadSpecific. > We won't need to defer deref-ing WTF::Thread in TLS. For Windows, I think it is OK. Windows TLS / FLS has very weird behavior on destructor calls, reviving the TLS entry when destroying the other TLS etc. We are doing weird workaround for now. Maybe, implementing TLS in WTF::Thread would improve stability, and it would be improving performance too (b/c we are using FLS in Windows, and IIRC, it is slow). For non-Windows (like, Darwin), I think the performance problem needs to be solved. And in Darwin / Linux pthread TLS, it seems that it behaves as we want. So I think we can just use pthread TLS on Darwin / Linux.
(In reply to Yusuke Suzuki from comment #8) > > For Windows, I think it is OK. Windows TLS / FLS has very weird behavior on > destructor calls, reviving the TLS entry when destroying the other TLS etc. > We are doing weird workaround for now. Maybe, implementing TLS in > WTF::Thread would improve stability, and it would be improving performance > too (b/c we are using FLS in Windows, and IIRC, it is slow). > > For non-Windows (like, Darwin), I think the performance problem needs to be > solved. > And in Darwin / Linux pthread TLS, it seems that it behaves as we want. So I > think we can just use pthread TLS on Darwin / Linux. IIRC, we saw some performance problem in our WTF::Lock / JSLock in Windows due to slowness of Windows FLS.
(In reply to Fujii Hironori from comment #6) > Created attachment 383607 [details] > Patch to use std::mutex > > I confirmed using std::mutex also seems solving the issue. This sounds very strange to me. WTF::WordLock is intentionally avoiding dependency to TLS to make it usable when destroying threads. Internally, it only uses stack, atomics, std::mutex, and std::condition_variable. If using std::mutex solved the problem, my guess is that root cause of this problem is not in WordLock, rather in std::condition_variable in Windows, which is used in WordLock. Given that, can you try replacing std::mutex in WordLock to Mutex (and functions), and replacing std::condition_variable in WordLock to ThreadCondition (and functions)? If it solves the problem, I think this is much better solution.
Now, my guess is that using std::condition_variable / ThreadCondition after destroying some TLS is the problem.
(In reply to Yusuke Suzuki from comment #11) > Now, my guess is that using std::condition_variable / ThreadCondition after > destroying some TLS is the problem. Can we know what implementation in Windows STL std::mutex / std::condition_variable is used? They recently released their STL code: https://github.com/microsoft/STL and it includes three implementations, vista, win7, and other one. If win7 version is used, I think the behavior is completely same between std::mutex and Mutex, std::condition_variable and ThreadCondition.
(In reply to Yusuke Suzuki from comment #12) > (In reply to Yusuke Suzuki from comment #11) > > Now, my guess is that using std::condition_variable / ThreadCondition after > > destroying some TLS is the problem. > > Can we know what implementation in Windows STL std::mutex / > std::condition_variable is used? They recently released their STL code: > https://github.com/microsoft/STL and it includes three implementations, > vista, win7, and other one. If win7 version is used, I think the behavior is > completely same between std::mutex and Mutex, std::condition_variable and > ThreadCondition. But if CONCRT implementation is used, maybe it would be different. And if std::mutex / Mutex, and std::condition_variable / ThreadCondition implementations are different, it would be possible that using std::condition_variable when destroying TLS is the problem. Anyway, if we can fix the problem by replacing std::mutex to Mutex and replacing std::condition_variable to ThreadCondition in WordLock, it would be very nice since we can just continue using WordLock when such a weird edge case happens. And I think implementing TLS in WTF::Thread for Windows is nice idea.
Created attachment 383608 [details] Patch to use Mutex and ThreadCondition in WordLock Unfortunately, using Mutex and ThreadCondition in WordLock doesn't solve the issue.
Created attachment 383609 [details] screenshot of debugger (with the patch to use Mutex and ThreadCondition in WordLock)
(In reply to Fujii Hironori from comment #15) > Created attachment 383609 [details] > screenshot of debugger (with the patch to use Mutex and ThreadCondition in > WordLock) Interesting. I think, this means that std::condition_variable is the root cause of this problem.
Comment on attachment 383602 [details] Patch Attachment 383602 [details] did not pass win-ews (win): Output: https://webkit-queues.webkit.org/results/13257920 New failing tests: animations/no-style-recalc-during-accelerated-animation.html
Created attachment 383610 [details] Archive of layout-test-results from ews214 for win-future The attached test failures were seen while running run-webkit-tests on the win-ews. Bot: ews214 Port: win-future Platform: CYGWIN_NT-10.0-17763-3.0.5-338.x86_64-x86_64-64bit
Comment on attachment 383602 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=383602&action=review r=me >> Source/WTF/wtf/win/ThreadingWin.cpp:268 >> +static Mutex threadMapMutex; > > What happens if you use std::mutex? I'm guessing this is the issue. Can you add a comment that describes why we are using Mutex instead of WordLock here?
It seems using stl_condition_variable_win7. > msvcp140.dll!Concurrency::details::stl_condition_variable_win7::wait(Concurrency::details::stl_critical_section_interface * lock) Line 180 C++ > msvcp140.dll!do_wait(_Cnd_internal_imp_t * cond, _Mtx_internal_imp_t * mtx, const xtime * target) Line 58 C++ > WTF.dll!std::condition_variable::wait(std::unique_lock<std::mutex> & _Lck) Line 603 C++ > WTF.dll!WTF::WordLock::lockSlow() Line 149 C++ > WTF.dll!WTF::WordLock::lock() Line 62 C++ > WTF.dll!WTF::`anonymous namespace'::enqueue<WTF::`anonymous-namespace'::ThreadData * <lambda>(void)>(const void * address, const WTF::ParkingLot::parkConditionallyImpl::__l2::WTF::`anonymous-namespace'::ThreadData * <lambda>(void) & functor) Line 492 C++ > WTF.dll!WTF::ParkingLot::parkConditionallyImpl(const void * address, const WTF::ScopedLambda<bool __cdecl(void)> & validation, const WTF::ScopedLambda<void __cdecl(void)> & beforeSleep, const WTF::TimeWithDynamicClockType & timeout) Line 578 C++ > WTF.dll!WTF::ParkingLot::parkConditionally<bool <lambda>(void),void <lambda>(void)>(const void * address, const WTF::ParkingLot::compareAndPark::__l2::bool <lambda>(void) & validation, const WTF::ParkingLot::compareAndPark::__l2::void <lambda>(void) & beforeSleep, const WTF::TimeWithDynamicClockType & timeout) Line 82 C++ > WTF.dll!WTF::ParkingLot::compareAndPark<unsigned char,unsigned char>(const WTF::Atomic<unsigned char> * address, unsigned char expected) Line 94 C++ > WTF.dll!WTF::LockAlgorithm<unsigned char,1,2,WTF::EmptyLockHooks<unsigned char>>::lockSlow(WTF::Atomic<unsigned char> & lock) Line 83 C++ > WTF.dll!WTF::Lock::lockSlow() Line 41 C++ > WTF.dll!WTF::Lock::lock() Line 60 C++ > WTF.dll!WTF::Locker<WTF::Lock>::lock() Line 114 C++ > WTF.dll!WTF::Locker<WTF::Lock>::Locker<WTF::Lock>(WTF::Lock & lockable) Line 55 C++ > WTF.dll!WTF::holdLock<WTF::Lock>(WTF::Lock & lock) Line 127 C++ > WTF.dll!WTF::WorkQueue::dispatch(WTF::Function<void __cdecl(void)> && function) Line 104 C++ > WebKit2.dll!IPC::Connection::invokeReadEventHandler() Line 233 C++ > WebKit2.dll!IPC::Connection::open::__l2::<lambda>() Line 253 C++ > WebKit2.dll!WTF::Detail::CallableWrapper<void <lambda>(void),void>::call() Line 52 C++ > WebKit2.dll!WTF::Function<void __cdecl(void)>::operator()() Line 80 C++ > WebKit2.dll!IPC::Connection::EventListener::callback(void * data, unsigned char timerOrWaitFired) Line 338 C++ > [External Code]
Is it possible to kill a thread which is locking the mutex?
(In reply to Fujii Hironori from comment #21) > Is it possible to kill a thread which is locking the mutex? Not sure.
This is my wild guess: 1. A detached thread is looking the WordLock 2. the main thread is going to exit 3. The detached thread is killed while holding the lock 4. Windows reclaims locked mutexes the killed thread was holding, but the WordLock's atomic is not a OS-managed mutex. 5. ~ThreadExitCallback is executed in the main thread.
(In reply to Fujii Hironori from comment #23) > This is my wild guess: > > 1. A detached thread is looking the WordLock > 2. the main thread is going to exit > 3. The detached thread is killed while holding the lock > 4. Windows reclaims locked mutexes the killed thread was holding, but the > WordLock's atomic is not a OS-managed mutex. > 5. ~ThreadExitCallback is executed in the main thread. Interesting. Several questions, 1. Can the main thread run after the system kills other threads? 2. Is stucking thread the main thread? If your guess is true, I think the best way is checking `isMainThread()` status in Thread::destructTLS to avoid execution. Wrap `Thread::destructTLS` with a function, which checks `isMainThread()`. (And rename the current destructTLS to destructTLSImpl etc.).
(In reply to Yusuke Suzuki from comment #24) > (In reply to Fujii Hironori from comment #23) > > This is my wild guess: > > > > 1. A detached thread is looking the WordLock > > 2. the main thread is going to exit > > 3. The detached thread is killed while holding the lock > > 4. Windows reclaims locked mutexes the killed thread was holding, but the > > WordLock's atomic is not a OS-managed mutex. > > 5. ~ThreadExitCallback is executed in the main thread. > > Interesting. Several questions, > > 1. Can the main thread run after the system kills other threads? > 2. Is stucking thread the main thread? > > If your guess is true, I think the best way is checking `isMainThread()` > status in Thread::destructTLS to avoid execution. Wrap `Thread::destructTLS` > with a function, which checks `isMainThread()`. (And rename the current > destructTLS to destructTLSImpl etc.). BTW, I really want to remove MainThread abstraction in WTF and move it to WebCore/PAL since this MainThread is meaningless if we are using JavaScriptCore.framework. I think we should invent some way to get system main thread information from Windows system (like, pthread_main_np()) instead of recording the main thread... Like, bool isSystemMainThread()
Created attachment 383730 [details] a test program to test my hypothesis I created a test program to test my hypothesis (Comment 23). > TestWTF.exe --gtest_filter=WTF_Lock.Test1 The test program also stuck with the same callstack. And, the test program exited promptly if I replace WordLock with Mutex.
The old-fashioned theory on how processes exit | The Old New Thing https://devblogs.microsoft.com/oldnewthing/20070502-00/?p=27023 Quick overview of how processes exit on Windows XP | The Old New Thing https://devblogs.microsoft.com/oldnewthing/20070503-00/?p=27003 During process termination, the gates are now electrified | The Old New Thing https://devblogs.microsoft.com/oldnewthing/20100122-00/?p=15193 ExitProcess function (processthreadsapi.h) - Win32 apps | Microsoft Docs https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-exitprocess Terminating a Process - Win32 apps | Microsoft Docs https://docs.microsoft.com/en-us/windows/win32/procthread/terminating-a-process DllMain entry point (Process.h) - Win32 apps | Microsoft Docs https://docs.microsoft.com/en-us/windows/win32/dlls/dllmain Here are my understanding: * TLS are destructed on DLL_PROCESS_DETACH and DLL_THREAD_DETACH * ExitProcess terminates all other threads withouth notifying DLL_THREAD_DETACH (TLS of those thread aren't destructed) * ExitProcess sends DLL_PROCESS_DETACH (TLS of the thread are destructed)
(In reply to Fujii Hironori from comment #27) > Here are my understanding: This seems wrong. Ignore it.
Created attachment 383737 [details] screenshot of debugger I found a option to show call frames of external module in Visual Studio. This callstack looks better.
(In reply to Yusuke Suzuki from comment #13) > And I think implementing TLS in WTF::Thread for Windows is nice idea. Filed: Bug 204341 – [Win] Implement WTF::ThreadSpecific in WTF::Thread
r252687 removed threadMapMutex. I don't see any issue now. Closed.
(In reply to Fujii Hironori from comment #31) > r252687 removed threadMapMutex. I don't see any issue now. Closed. That’s nice. So we could run JSC tests on Windows, right?
Oh, I just meant I don't observe any other deadlock-during-shutdown issues being described in the article (Comment 27). JSC Windows port still has the suspended thread issue (Bug 191285).