WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
204192
[WinCairo] dead lock in Thread::destructTLS
https://bugs.webkit.org/show_bug.cgi?id=204192
Summary
[WinCairo] dead lock in Thread::destructTLS
Fujii Hironori
Reported
2019-11-13 23:35:38 PST
[WinCairo] dead lock in Thread::destructTLS By running run-webkit-tests, sometimes WebKitNetworkProcess.exe process remains alive.
> python ./Tools/Scripts/run-webkit-tests --wincairo --debug --no-new-test-results --no-retry-failures --no-show-results -vf fast
By attaching a debugger, I can get the callstack. There is only the main thread. Callstack:
> [External Code] > WTF.dll!WTF::WordLock::lockSlow() Line 148 C++ > WTF.dll!WTF::WordLock::lock() Line 62 C++ > WTF.dll!WTF::Locker<WTF::WordLock>::lock() Line 114 C++ > WTF.dll!WTF::Locker<WTF::WordLock>::Locker(WTF::WordLock & lockable) Line 55 C++ > WTF.dll!WTF::holdLock<WTF::WordLock>(WTF::WordLock & lock) Line 127 C++ > WTF.dll!WTF::Thread::destructTLS(void * data) Line 343 C++ > WTF.dll!WTF::Thread::destructTLS::ThreadExitCallback::~ThreadExitCallback() Line 334 C++ > WTF.dll!??__Fcallback@?1??destructTLS@Thread@WTF@@KAXPEAX@Z@YAXXZ() Line 324 C++ > [External Code]
Attachments
screenshot of debugger
(249.46 KB, image/png)
2019-11-13 23:56 PST
,
Fujii Hironori
no flags
Details
screenshot of debugger (WebKitTestRunner.exe)
(221.74 KB, image/png)
2019-11-14 01:43 PST
,
Fujii Hironori
no flags
Details
screenshot of debugger (in WordLock::lockSlow)
(279.38 KB, image/png)
2019-11-14 19:06 PST
,
Fujii Hironori
no flags
Details
Patch
(1.58 KB, patch)
2019-11-14 23:06 PST
,
Fujii Hironori
ysuzuki
: review+
ews-watchlist
: commit-queue-
Details
Formatted Diff
Diff
Patch to use std::mutex
(685 bytes, patch)
2019-11-15 01:28 PST
,
Fujii Hironori
no flags
Details
Formatted Diff
Diff
Patch to use Mutex and ThreadCondition in WordLock
(1.68 KB, patch)
2019-11-15 02:37 PST
,
Fujii Hironori
no flags
Details
Formatted Diff
Diff
screenshot of debugger (with the patch to use Mutex and ThreadCondition in WordLock)
(252.26 KB, image/png)
2019-11-15 02:38 PST
,
Fujii Hironori
no flags
Details
Archive of layout-test-results from ews214 for win-future
(14.83 MB, application/zip)
2019-11-15 02:43 PST
,
EWS Watchlist
no flags
Details
a test program to test my hypothesis
(900 bytes, patch)
2019-11-17 21:23 PST
,
Fujii Hironori
no flags
Details
Formatted Diff
Diff
screenshot of debugger
(326.99 KB, image/png)
2019-11-18 03:54 PST
,
Fujii Hironori
no flags
Details
Show Obsolete
(6)
View All
Add attachment
proposed patch, testcase, etc.
Fujii Hironori
Comment 1
2019-11-13 23:56:51 PST
Created
attachment 383546
[details]
screenshot of debugger
Fujii Hironori
Comment 2
2019-11-14 01:43:36 PST
Created
attachment 383549
[details]
screenshot of debugger (WebKitTestRunner.exe) I observed a case that WebKitTestRunner.exe didn't die.
Fujii Hironori
Comment 3
2019-11-14 19:06:12 PST
Created
attachment 383593
[details]
screenshot of debugger (in WordLock::lockSlow) Waited in the parking, but no other threads were alive.
Fujii Hironori
Comment 4
2019-11-14 23:06:01 PST
Created
attachment 383602
[details]
Patch
Yusuke Suzuki
Comment 5
2019-11-15 00:12:06 PST
Comment on
attachment 383602
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=383602&action=review
> Source/WTF/wtf/win/ThreadingWin.cpp:268 > +static Mutex threadMapMutex;
What happens if you use std::mutex? I'm guessing this is the issue.
Fujii Hironori
Comment 6
2019-11-15 01:28:36 PST
Created
attachment 383607
[details]
Patch to use std::mutex I confirmed using std::mutex also seems solving the issue.
Fujii Hironori
Comment 7
2019-11-15 01:40:21 PST
How about the idea implementing ThreadSpecific in WTF::Thread? It'd be easy to ensure deref-ing WTF::Thread in TLS after destructing other ThreadSpecific. We won't need to defer deref-ing WTF::Thread in TLS.
Yusuke Suzuki
Comment 8
2019-11-15 01:54:15 PST
(In reply to Fujii Hironori from
comment #7
)
> How about the idea implementing ThreadSpecific in WTF::Thread? > It'd be easy to ensure deref-ing WTF::Thread in TLS after destructing other > ThreadSpecific. > We won't need to defer deref-ing WTF::Thread in TLS.
For Windows, I think it is OK. Windows TLS / FLS has very weird behavior on destructor calls, reviving the TLS entry when destroying the other TLS etc. We are doing weird workaround for now. Maybe, implementing TLS in WTF::Thread would improve stability, and it would be improving performance too (b/c we are using FLS in Windows, and IIRC, it is slow). For non-Windows (like, Darwin), I think the performance problem needs to be solved. And in Darwin / Linux pthread TLS, it seems that it behaves as we want. So I think we can just use pthread TLS on Darwin / Linux.
Yusuke Suzuki
Comment 9
2019-11-15 01:55:56 PST
(In reply to Yusuke Suzuki from
comment #8
)
> > For Windows, I think it is OK. Windows TLS / FLS has very weird behavior on > destructor calls, reviving the TLS entry when destroying the other TLS etc. > We are doing weird workaround for now. Maybe, implementing TLS in > WTF::Thread would improve stability, and it would be improving performance > too (b/c we are using FLS in Windows, and IIRC, it is slow). > > For non-Windows (like, Darwin), I think the performance problem needs to be > solved. > And in Darwin / Linux pthread TLS, it seems that it behaves as we want. So I > think we can just use pthread TLS on Darwin / Linux.
IIRC, we saw some performance problem in our WTF::Lock / JSLock in Windows due to slowness of Windows FLS.
Yusuke Suzuki
Comment 10
2019-11-15 02:00:48 PST
(In reply to Fujii Hironori from
comment #6
)
> Created
attachment 383607
[details]
> Patch to use std::mutex > > I confirmed using std::mutex also seems solving the issue.
This sounds very strange to me. WTF::WordLock is intentionally avoiding dependency to TLS to make it usable when destroying threads. Internally, it only uses stack, atomics, std::mutex, and std::condition_variable. If using std::mutex solved the problem, my guess is that root cause of this problem is not in WordLock, rather in std::condition_variable in Windows, which is used in WordLock. Given that, can you try replacing std::mutex in WordLock to Mutex (and functions), and replacing std::condition_variable in WordLock to ThreadCondition (and functions)? If it solves the problem, I think this is much better solution.
Yusuke Suzuki
Comment 11
2019-11-15 02:06:30 PST
Now, my guess is that using std::condition_variable / ThreadCondition after destroying some TLS is the problem.
Yusuke Suzuki
Comment 12
2019-11-15 02:08:05 PST
(In reply to Yusuke Suzuki from
comment #11
)
> Now, my guess is that using std::condition_variable / ThreadCondition after > destroying some TLS is the problem.
Can we know what implementation in Windows STL std::mutex / std::condition_variable is used? They recently released their STL code:
https://github.com/microsoft/STL
and it includes three implementations, vista, win7, and other one. If win7 version is used, I think the behavior is completely same between std::mutex and Mutex, std::condition_variable and ThreadCondition.
Yusuke Suzuki
Comment 13
2019-11-15 02:11:51 PST
(In reply to Yusuke Suzuki from
comment #12
)
> (In reply to Yusuke Suzuki from
comment #11
) > > Now, my guess is that using std::condition_variable / ThreadCondition after > > destroying some TLS is the problem. > > Can we know what implementation in Windows STL std::mutex / > std::condition_variable is used? They recently released their STL code: >
https://github.com/microsoft/STL
and it includes three implementations, > vista, win7, and other one. If win7 version is used, I think the behavior is > completely same between std::mutex and Mutex, std::condition_variable and > ThreadCondition.
But if CONCRT implementation is used, maybe it would be different. And if std::mutex / Mutex, and std::condition_variable / ThreadCondition implementations are different, it would be possible that using std::condition_variable when destroying TLS is the problem. Anyway, if we can fix the problem by replacing std::mutex to Mutex and replacing std::condition_variable to ThreadCondition in WordLock, it would be very nice since we can just continue using WordLock when such a weird edge case happens. And I think implementing TLS in WTF::Thread for Windows is nice idea.
Fujii Hironori
Comment 14
2019-11-15 02:37:19 PST
Created
attachment 383608
[details]
Patch to use Mutex and ThreadCondition in WordLock Unfortunately, using Mutex and ThreadCondition in WordLock doesn't solve the issue.
Fujii Hironori
Comment 15
2019-11-15 02:38:11 PST
Created
attachment 383609
[details]
screenshot of debugger (with the patch to use Mutex and ThreadCondition in WordLock)
Yusuke Suzuki
Comment 16
2019-11-15 02:41:53 PST
(In reply to Fujii Hironori from
comment #15
)
> Created
attachment 383609
[details]
> screenshot of debugger (with the patch to use Mutex and ThreadCondition in > WordLock)
Interesting. I think, this means that std::condition_variable is the root cause of this problem.
EWS Watchlist
Comment 17
2019-11-15 02:43:31 PST
Comment on
attachment 383602
[details]
Patch
Attachment 383602
[details]
did not pass win-ews (win): Output:
https://webkit-queues.webkit.org/results/13257920
New failing tests: animations/no-style-recalc-during-accelerated-animation.html
EWS Watchlist
Comment 18
2019-11-15 02:43:34 PST
Created
attachment 383610
[details]
Archive of layout-test-results from ews214 for win-future The attached test failures were seen while running run-webkit-tests on the win-ews. Bot: ews214 Port: win-future Platform: CYGWIN_NT-10.0-17763-3.0.5-338.x86_64-x86_64-64bit
Yusuke Suzuki
Comment 19
2019-11-15 02:46:01 PST
Comment on
attachment 383602
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=383602&action=review
r=me
>> Source/WTF/wtf/win/ThreadingWin.cpp:268 >> +static Mutex threadMapMutex; > > What happens if you use std::mutex? I'm guessing this is the issue.
Can you add a comment that describes why we are using Mutex instead of WordLock here?
Fujii Hironori
Comment 20
2019-11-15 03:16:19 PST
It seems using stl_condition_variable_win7.
> msvcp140.dll!Concurrency::details::stl_condition_variable_win7::wait(Concurrency::details::stl_critical_section_interface * lock) Line 180 C++ > msvcp140.dll!do_wait(_Cnd_internal_imp_t * cond, _Mtx_internal_imp_t * mtx, const xtime * target) Line 58 C++ > WTF.dll!std::condition_variable::wait(std::unique_lock<std::mutex> & _Lck) Line 603 C++ > WTF.dll!WTF::WordLock::lockSlow() Line 149 C++ > WTF.dll!WTF::WordLock::lock() Line 62 C++ > WTF.dll!WTF::`anonymous namespace'::enqueue<WTF::`anonymous-namespace'::ThreadData * <lambda>(void)>(const void * address, const WTF::ParkingLot::parkConditionallyImpl::__l2::WTF::`anonymous-namespace'::ThreadData * <lambda>(void) & functor) Line 492 C++ > WTF.dll!WTF::ParkingLot::parkConditionallyImpl(const void * address, const WTF::ScopedLambda<bool __cdecl(void)> & validation, const WTF::ScopedLambda<void __cdecl(void)> & beforeSleep, const WTF::TimeWithDynamicClockType & timeout) Line 578 C++ > WTF.dll!WTF::ParkingLot::parkConditionally<bool <lambda>(void),void <lambda>(void)>(const void * address, const WTF::ParkingLot::compareAndPark::__l2::bool <lambda>(void) & validation, const WTF::ParkingLot::compareAndPark::__l2::void <lambda>(void) & beforeSleep, const WTF::TimeWithDynamicClockType & timeout) Line 82 C++ > WTF.dll!WTF::ParkingLot::compareAndPark<unsigned char,unsigned char>(const WTF::Atomic<unsigned char> * address, unsigned char expected) Line 94 C++ > WTF.dll!WTF::LockAlgorithm<unsigned char,1,2,WTF::EmptyLockHooks<unsigned char>>::lockSlow(WTF::Atomic<unsigned char> & lock) Line 83 C++ > WTF.dll!WTF::Lock::lockSlow() Line 41 C++ > WTF.dll!WTF::Lock::lock() Line 60 C++ > WTF.dll!WTF::Locker<WTF::Lock>::lock() Line 114 C++ > WTF.dll!WTF::Locker<WTF::Lock>::Locker<WTF::Lock>(WTF::Lock & lockable) Line 55 C++ > WTF.dll!WTF::holdLock<WTF::Lock>(WTF::Lock & lock) Line 127 C++ > WTF.dll!WTF::WorkQueue::dispatch(WTF::Function<void __cdecl(void)> && function) Line 104 C++ > WebKit2.dll!IPC::Connection::invokeReadEventHandler() Line 233 C++ > WebKit2.dll!IPC::Connection::open::__l2::<lambda>() Line 253 C++ > WebKit2.dll!WTF::Detail::CallableWrapper<void <lambda>(void),void>::call() Line 52 C++ > WebKit2.dll!WTF::Function<void __cdecl(void)>::operator()() Line 80 C++ > WebKit2.dll!IPC::Connection::EventListener::callback(void * data, unsigned char timerOrWaitFired) Line 338 C++ > [External Code]
Fujii Hironori
Comment 21
2019-11-15 03:19:54 PST
Is it possible to kill a thread which is locking the mutex?
Yusuke Suzuki
Comment 22
2019-11-15 11:56:26 PST
(In reply to Fujii Hironori from
comment #21
)
> Is it possible to kill a thread which is locking the mutex?
Not sure.
Fujii Hironori
Comment 23
2019-11-15 16:03:36 PST
This is my wild guess: 1. A detached thread is looking the WordLock 2. the main thread is going to exit 3. The detached thread is killed while holding the lock 4. Windows reclaims locked mutexes the killed thread was holding, but the WordLock's atomic is not a OS-managed mutex. 5. ~ThreadExitCallback is executed in the main thread.
Yusuke Suzuki
Comment 24
2019-11-15 16:40:04 PST
(In reply to Fujii Hironori from
comment #23
)
> This is my wild guess: > > 1. A detached thread is looking the WordLock > 2. the main thread is going to exit > 3. The detached thread is killed while holding the lock > 4. Windows reclaims locked mutexes the killed thread was holding, but the > WordLock's atomic is not a OS-managed mutex. > 5. ~ThreadExitCallback is executed in the main thread.
Interesting. Several questions, 1. Can the main thread run after the system kills other threads? 2. Is stucking thread the main thread? If your guess is true, I think the best way is checking `isMainThread()` status in Thread::destructTLS to avoid execution. Wrap `Thread::destructTLS` with a function, which checks `isMainThread()`. (And rename the current destructTLS to destructTLSImpl etc.).
Yusuke Suzuki
Comment 25
2019-11-15 16:49:19 PST
(In reply to Yusuke Suzuki from
comment #24
)
> (In reply to Fujii Hironori from
comment #23
) > > This is my wild guess: > > > > 1. A detached thread is looking the WordLock > > 2. the main thread is going to exit > > 3. The detached thread is killed while holding the lock > > 4. Windows reclaims locked mutexes the killed thread was holding, but the > > WordLock's atomic is not a OS-managed mutex. > > 5. ~ThreadExitCallback is executed in the main thread. > > Interesting. Several questions, > > 1. Can the main thread run after the system kills other threads? > 2. Is stucking thread the main thread? > > If your guess is true, I think the best way is checking `isMainThread()` > status in Thread::destructTLS to avoid execution. Wrap `Thread::destructTLS` > with a function, which checks `isMainThread()`. (And rename the current > destructTLS to destructTLSImpl etc.).
BTW, I really want to remove MainThread abstraction in WTF and move it to WebCore/PAL since this MainThread is meaningless if we are using JavaScriptCore.framework. I think we should invent some way to get system main thread information from Windows system (like, pthread_main_np()) instead of recording the main thread... Like, bool isSystemMainThread()
Fujii Hironori
Comment 26
2019-11-17 21:23:46 PST
Created
attachment 383730
[details]
a test program to test my hypothesis I created a test program to test my hypothesis (
Comment 23
).
> TestWTF.exe --gtest_filter=WTF_Lock.Test1
The test program also stuck with the same callstack. And, the test program exited promptly if I replace WordLock with Mutex.
Fujii Hironori
Comment 27
2019-11-17 23:54:52 PST
The old-fashioned theory on how processes exit | The Old New Thing
https://devblogs.microsoft.com/oldnewthing/20070502-00/?p=27023
Quick overview of how processes exit on Windows XP | The Old New Thing
https://devblogs.microsoft.com/oldnewthing/20070503-00/?p=27003
During process termination, the gates are now electrified | The Old New Thing
https://devblogs.microsoft.com/oldnewthing/20100122-00/?p=15193
ExitProcess function (processthreadsapi.h) - Win32 apps | Microsoft Docs
https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-exitprocess
Terminating a Process - Win32 apps | Microsoft Docs
https://docs.microsoft.com/en-us/windows/win32/procthread/terminating-a-process
DllMain entry point (Process.h) - Win32 apps | Microsoft Docs
https://docs.microsoft.com/en-us/windows/win32/dlls/dllmain
Here are my understanding: * TLS are destructed on DLL_PROCESS_DETACH and DLL_THREAD_DETACH * ExitProcess terminates all other threads withouth notifying DLL_THREAD_DETACH (TLS of those thread aren't destructed) * ExitProcess sends DLL_PROCESS_DETACH (TLS of the thread are destructed)
Fujii Hironori
Comment 28
2019-11-18 03:05:13 PST
(In reply to Fujii Hironori from
comment #27
)
> Here are my understanding:
This seems wrong. Ignore it.
Fujii Hironori
Comment 29
2019-11-18 03:54:54 PST
Created
attachment 383737
[details]
screenshot of debugger I found a option to show call frames of external module in Visual Studio. This callstack looks better.
Fujii Hironori
Comment 30
2019-11-18 21:47:05 PST
(In reply to Yusuke Suzuki from
comment #13
)
> And I think implementing TLS in WTF::Thread for Windows is nice idea.
Filed:
Bug 204341
– [Win] Implement WTF::ThreadSpecific in WTF::Thread
Fujii Hironori
Comment 31
2020-01-07 22:56:15 PST
r252687
removed threadMapMutex. I don't see any issue now. Closed.
Yusuke Suzuki
Comment 32
2020-01-08 11:23:28 PST
(In reply to Fujii Hironori from
comment #31
)
>
r252687
removed threadMapMutex. I don't see any issue now. Closed.
That’s nice. So we could run JSC tests on Windows, right?
Fujii Hironori
Comment 33
2020-01-08 17:38:13 PST
Oh, I just meant I don't observe any other deadlock-during-shutdown issues being described in the article (
Comment 27
). JSC Windows port still has the suspended thread issue (
Bug 191285
).
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug