Bug 174081

Summary: [WTF] Implement WTF::ThreadGroup
Product: WebKit Reporter: Yusuke Suzuki <ysuzuki>
Component: JavaScriptCoreAssignee: Yusuke Suzuki <ysuzuki>
Status: RESOLVED FIXED    
Severity: Normal CC: bburg, buildbot, commit-queue, jlewis3, mark.lam, rniwa
Priority: P2    
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugs.webkit.org/show_bug.cgi?id=174262
Bug Depends on: 173975, 174265    
Bug Blocks:    
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Archive of layout-test-results from ews112 for mac-elcapitan
none
Patch
none
Patch
none
Archive of layout-test-results from ews116 for mac-elcapitan
none
Archive of layout-test-results from ews104 for mac-elcapitan-wk2
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch mark.lam: review+

Description Yusuke Suzuki 2017-07-02 18:10:40 PDT
Currently, JSC has MachineThreads, which is a thread group.
However, it's implementation is a bit hacky: removing a dead thread from this group depends on per-MachineThreads' TLS destructor.
But it is not good. It exhausts so many TLS if we have many MachineThreads, which is not considered in Windows environment right now.

Let's move this to WTF.
Comment 1 Yusuke Suzuki 2017-07-03 02:45:44 PDT
Created attachment 314468 [details]
Patch

WIP
Comment 2 Yusuke Suzuki 2017-07-03 03:09:06 PDT
(In reply to Yusuke Suzuki from comment #0)
> Currently, JSC has MachineThreads, which is a thread group.
> However, it's implementation is a bit hacky: removing a dead thread from
> this group depends on per-MachineThreads' TLS destructor.
> But it is not good. It exhausts so many TLS if we have many MachineThreads,
> which is not considered in Windows environment right now.
> 
> Let's move this to WTF.

I guess this hacky mechanism causes some errors which is investigated in bug 173427.
Comment 3 Yusuke Suzuki 2017-07-03 04:16:05 PDT
Created attachment 314471 [details]
Patch
Comment 4 Yusuke Suzuki 2017-07-03 08:05:20 PDT
Created attachment 314483 [details]
Patch
Comment 5 Build Bot 2017-07-03 09:31:15 PDT
Comment on attachment 314483 [details]
Patch

Attachment 314483 [details] did not pass mac-debug-ews (mac):
Output: http://webkit-queues.webkit.org/results/4044852

Number of test failures exceeded the failure limit.
Comment 6 Build Bot 2017-07-03 09:31:16 PDT
Created attachment 314489 [details]
Archive of layout-test-results from ews112 for mac-elcapitan

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews112  Port: mac-elcapitan  Platform: Mac OS X 10.11.6
Comment 7 Yusuke Suzuki 2017-07-03 09:38:11 PDT
Created attachment 314490 [details]
Patch
Comment 8 Yusuke Suzuki 2017-07-03 09:47:02 PDT
Created attachment 314492 [details]
Patch
Comment 9 Build Bot 2017-07-03 11:14:04 PDT
Comment on attachment 314492 [details]
Patch

Attachment 314492 [details] did not pass mac-debug-ews (mac):
Output: http://webkit-queues.webkit.org/results/4045529

Number of test failures exceeded the failure limit.
Comment 10 Build Bot 2017-07-03 11:14:06 PDT
Created attachment 314499 [details]
Archive of layout-test-results from ews116 for mac-elcapitan

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews116  Port: mac-elcapitan  Platform: Mac OS X 10.11.6
Comment 11 Build Bot 2017-07-03 11:45:42 PDT
Comment on attachment 314492 [details]
Patch

Attachment 314492 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.webkit.org/results/4045734

New failing tests:
workers/bomb.html
Comment 12 Build Bot 2017-07-03 11:45:43 PDT
Created attachment 314502 [details]
Archive of layout-test-results from ews104 for mac-elcapitan-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews104  Port: mac-elcapitan-wk2  Platform: Mac OS X 10.11.6
Comment 13 Yusuke Suzuki 2017-07-03 11:46:40 PDT
Created attachment 314503 [details]
Patch
Comment 14 Mark Lam 2017-07-05 17:59:23 PDT
Comment on attachment 314503 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=314503&action=review

Some comments for now while I continue to review this.  Also, please re-land the patch for https://bugs.webkit.org/show_bug.cgi?id=173975, and re-base after that.

> Source/JavaScriptCore/ChangeLog:3
> +        [WTF] Implemet WTF::ThreadGroup

typo: /Implemet/Implement/.

> Source/WTF/ChangeLog:26
> +        If we take a design having two fine grained in WTF::Thread and WTF::ThreadGroup, we easily encounter

I think you meant to say "using two fine grain locks" instead of "having two fine grained".

> Source/JavaScriptCore/heap/MachineStackMarker.cpp:156
> +        unsigned index = 0;
> +        for (auto* thread : threads) {
> +            if (*thread != currentThread) {
> +                auto result = thread->suspend();
> +                if (result)
> +                    isSuspended.set(index);

Here's you're relying on the iteration order of the HashSet being deterministic (in the same order) with the assumption that locking the threadGroup will ensure that this is so.  While locking the threadGroup is a necessary condition, my gut feeling says that this is a brittle assumption base on how HashSet is implemented.  It'll probably be ok as it is unlikely that HashSet will be re-implemented in anyway that will break this assumption, but I wonder if we can make / should this less brittle.

> Source/JavaScriptCore/heap/MachineStackMarker.cpp:159
> +                    // These threads will be removed from the ThreadGroup. Thus, we do not anything here except for reporting.

typo: /do not anything/do not do anything/.

> Source/JavaScriptCore/heap/MachineStackMarker.cpp:174
> +            if (*thread != currentThread) {

You don't need this check because the currentThread is guaranteed to not be in the isSuspended set.

> Source/JavaScriptCore/heap/MachineStackMarker.cpp:185
> +            if (*thread != currentThread) {

Ditto.  You can remove this check.
Comment 15 Yusuke Suzuki 2017-07-05 19:46:08 PDT
Comment on attachment 314503 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=314503&action=review

Thanks!

>> Source/JavaScriptCore/ChangeLog:3
>> +        [WTF] Implemet WTF::ThreadGroup
> 
> typo: /Implemet/Implement/.

Oops, fixed.

>> Source/WTF/ChangeLog:26
>> +        If we take a design having two fine grained in WTF::Thread and WTF::ThreadGroup, we easily encounter
> 
> I think you meant to say "using two fine grain locks" instead of "having two fine grained".

Yeah, thanks. Fixed.

>> Source/JavaScriptCore/heap/MachineStackMarker.cpp:156
>> +                    isSuspended.set(index);
> 
> Here's you're relying on the iteration order of the HashSet being deterministic (in the same order) with the assumption that locking the threadGroup will ensure that this is so.  While locking the threadGroup is a necessary condition, my gut feeling says that this is a brittle assumption base on how HashSet is implemented.  It'll probably be ok as it is unlikely that HashSet will be re-implemented in anyway that will break this assumption, but I wonder if we can make / should this less brittle.

I think we should use ListHashSet here! It ensures that the order of the iteration is insertion order. And it means that order is not changed during multiple iterations if we do not modify this set. Fixed.

>> Source/JavaScriptCore/heap/MachineStackMarker.cpp:159
>> +                    // These threads will be removed from the ThreadGroup. Thus, we do not anything here except for reporting.
> 
> typo: /do not anything/do not do anything/.

Thanks, fixed.

>> Source/JavaScriptCore/heap/MachineStackMarker.cpp:174
>> +            if (*thread != currentThread) {
> 
> You don't need this check because the currentThread is guaranteed to not be in the isSuspended set.

Ah, right. Fixed.

>> Source/JavaScriptCore/heap/MachineStackMarker.cpp:185
>> +            if (*thread != currentThread) {
> 
> Ditto.  You can remove this check.

Fixed.
Comment 16 Yusuke Suzuki 2017-07-05 19:46:52 PDT
Created attachment 314685 [details]
Patch
Comment 17 Mark Lam 2017-07-06 14:19:24 PDT
Comment on attachment 314685 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=314685&action=review

r=me with issues addressed.

> Source/WTF/wtf/ThreadGroup.cpp:61
> +bool ThreadGroup::add(Thread& thread)
> +{
> +    auto destructionLocker = holdLock(destructionMutex());
> +    auto locker = holdLock(m_lock);
> +    if (!thread.canAddToThreadGroup(destructionLocker))
> +        return false;
> +    if (m_threads.add(&thread).isNewEntry)
> +        thread.addToThreadGroup(destructionLocker, *this);
> +    return true;
> +}
> +
> +bool ThreadGroup::addCurrentThread()
> +{
> +    return add(Thread::current());
> +}

Do we really need an API for add() other than addCurrentThread()?  I think this logic is correct, but if we don't need add(), we can reduce the complexity by only implementing addCurrentThread(), and removing all the canAddToThreadGroup logic.  I'm ok with keeping add() too if you think we may need it soon.

Regardless, addCurrentThread() should always succeed.  You can assert that instead of returning a boolean.

> Source/WTF/wtf/ThreadGroup.h:56
> +    void removeCurrentThread(const AbstractLocker&, Thread&);

Since there are 2 different mutex at play in this code, I recommend naming the AbstractLocker that should be passed in e.g. destructionMutexLocker (or destructionLocker to be consistent with how you name it elsewhere ... I prefer destructionMutexLocker because it explicitly spells out which mutex we want locked).  I think this documents the code's intent better.

> Source/WTF/wtf/Threading.cpp:152
> +        m_canAddToThreadGroup = false;

This flag feels redundant because we have m_didExit = true above.  The only reason we aren't allowed to add the thread is because it's exiting, right?  Or is there another reason?

One option is you can remove the m_canAddToThreadGroup flag and just have ThreadGroup::add() check hasExited() instead.  However, will need to hold the destructionMutex in the  block above which sets m_didExit.  I think Thread::m_mutex is never held while calling back to higher level functions.  Hence, you can just hoist the destructionLocker to the top of this function.

> Source/WTF/wtf/Threading.cpp:158
> +    ASSERT_WITH_MESSAGE(Thread::current() == *this, "This function is only called from the current thread itself.");

nit: maybe rephrase the error message as "This function should only be called from the current thread itself."  If the assertion fails, then clearly, this function isn't called from the current thread and contradicts the error message.

> Source/WTF/wtf/Threading.h:185
> +    bool canAddToThreadGroup(const AbstractLocker&) { return m_canAddToThreadGroup; }
> +    void addToThreadGroup(const AbstractLocker&, ThreadGroup&);
> +    void removeFromThreadGroup(const AbstractLocker&, ThreadGroup&);

I also recommend naming the AbstractLocker& arguments here as destructionMutexLocker.

> Tools/TestWebKitAPI/Tests/WTF/ThreadGroup.cpp:58
> +            EXPECT_EQ(threadGroup->add(*thread), true);
> +            threads.append(thread);

You tested ThreadGroup::add() (adding from another thread) here but didn't test addCurrentThread() which is actually used in our code base.  I think you should add a case that tests addCurrentThread().  Maybe factor this test out and parameterized it to add the thread in the 1 of the 2 ways.
Comment 18 Yusuke Suzuki 2017-07-06 20:01:09 PDT
Comment on attachment 314685 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=314685&action=review

Thanks!

>> Source/WTF/wtf/ThreadGroup.cpp:61
>> +}
> 
> Do we really need an API for add() other than addCurrentThread()?  I think this logic is correct, but if we don't need add(), we can reduce the complexity by only implementing addCurrentThread(), and removing all the canAddToThreadGroup logic.  I'm ok with keeping add() too if you think we may need it soon.
> 
> Regardless, addCurrentThread() should always succeed.  You can assert that instead of returning a boolean.

After refactoring, this add API just reuses `m_didExit`. So it does not cause much complication, and it makes tests easier.
So, I think having this API would be nice.

For addCurrentThread() case, I agree to you that we should make this `void` and insert assertion. Changed.

>> Source/WTF/wtf/ThreadGroup.h:56
>> +    void removeCurrentThread(const AbstractLocker&, Thread&);
> 
> Since there are 2 different mutex at play in this code, I recommend naming the AbstractLocker that should be passed in e.g. destructionMutexLocker (or destructionLocker to be consistent with how you name it elsewhere ... I prefer destructionMutexLocker because it explicitly spells out which mutex we want locked).  I think this documents the code's intent better.

OK, I picked "destructionMutexLocker".

>> Source/WTF/wtf/Threading.cpp:152
>> +        m_canAddToThreadGroup = false;
> 
> This flag feels redundant because we have m_didExit = true above.  The only reason we aren't allowed to add the thread is because it's exiting, right?  Or is there another reason?
> 
> One option is you can remove the m_canAddToThreadGroup flag and just have ThreadGroup::add() check hasExited() instead.  However, will need to hold the destructionMutex in the  block above which sets m_didExit.  I think Thread::m_mutex is never held while calling back to higher level functions.  Hence, you can just hoist the destructionLocker to the top of this function.

Right. We should reuse m_didExit. I changed it to your design and reuse m_didExit.

>> Source/WTF/wtf/Threading.cpp:158
>> +    ASSERT_WITH_MESSAGE(Thread::current() == *this, "This function is only called from the current thread itself.");
> 
> nit: maybe rephrase the error message as "This function should only be called from the current thread itself."  If the assertion fails, then clearly, this function isn't called from the current thread and contradicts the error message.

Nice. Changed.

>> Source/WTF/wtf/Threading.h:185
>> +    void removeFromThreadGroup(const AbstractLocker&, ThreadGroup&);
> 
> I also recommend naming the AbstractLocker& arguments here as destructionMutexLocker.

Sounds fine. Changed.

>> Tools/TestWebKitAPI/Tests/WTF/ThreadGroup.cpp:58
>> +            threads.append(thread);
> 
> You tested ThreadGroup::add() (adding from another thread) here but didn't test addCurrentThread() which is actually used in our code base.  I think you should add a case that tests addCurrentThread().  Maybe factor this test out and parameterized it to add the thread in the 1 of the 2 ways.

OK, changed.
Comment 19 Yusuke Suzuki 2017-07-06 21:56:32 PDT
Committed r219238: <http://trac.webkit.org/changeset/219238>
Comment 20 Yusuke Suzuki 2017-07-06 22:08:27 PDT
Committed r219239: <http://trac.webkit.org/changeset/219239>
Comment 21 Yusuke Suzuki 2017-07-06 22:19:02 PDT
Committed r219241: <http://trac.webkit.org/changeset/219241>
Comment 22 Matt Lewis 2017-07-07 10:38:43 PDT
Looks like the revision and supplementary fix seem to have caused fast/workers/dedicated-worker-lifecycle.html
to become a flaky timeout.
Comment 23 Yusuke Suzuki 2017-07-07 11:25:15 PDT
(In reply to Matt Lewis from comment #22)
> Looks like the revision and supplementary fix seem to have caused
> fast/workers/dedicated-worker-lifecycle.html
> to become a flaky timeout.

Let's roll out it now and investigate it. I'm now preparing rolling out.
Comment 24 WebKit Commit Bot 2017-07-07 11:26:24 PDT
Re-opened since this is blocked by bug 174265
Comment 25 Mark Lam 2017-07-07 13:16:26 PDT
For the record, this patch was rolled out in r219260: <http://trac.webkit.org/changeset/219260> for https://bugs.webkit.org/show_bug.cgi?id=174265.
Comment 26 Yusuke Suzuki 2017-07-08 00:05:32 PDT
(In reply to Mark Lam from comment #25)
> For the record, this patch was rolled out in r219260:
> <http://trac.webkit.org/changeset/219260> for
> https://bugs.webkit.org/show_bug.cgi?id=174265.

Interestingly, this flakiness does not happen in GTK port.
I guess this is due to std::mutex's slowness.
Since ThreadGroup and Thread can hold lock during deallocating TLS, we use std::mutex instead of WTF::Lock, but it may be slow.
Previously, we used WTF::Lock. But it is wrong.
Comment 27 Yusuke Suzuki 2017-07-08 00:06:28 PDT
(In reply to Yusuke Suzuki from comment #26)
> (In reply to Mark Lam from comment #25)
> > For the record, this patch was rolled out in r219260:
> > <http://trac.webkit.org/changeset/219260> for
> > https://bugs.webkit.org/show_bug.cgi?id=174265.
> 
> Interestingly, this flakiness does not happen in GTK port.
> I guess this is due to std::mutex's slowness.
> Since ThreadGroup and Thread can hold lock during deallocating TLS, we use
> std::mutex instead of WTF::Lock, but it may be slow.
> Previously, we used WTF::Lock. But it is wrong.

And I've reproduced this slowness in macOS debug build.
Comment 28 Yusuke Suzuki 2017-07-08 01:08:36 PDT
(In reply to Yusuke Suzuki from comment #27)
> (In reply to Yusuke Suzuki from comment #26)
> > (In reply to Mark Lam from comment #25)
> > > For the record, this patch was rolled out in r219260:
> > > <http://trac.webkit.org/changeset/219260> for
> > > https://bugs.webkit.org/show_bug.cgi?id=174265.
> > 
> > Interestingly, this flakiness does not happen in GTK port.
> > I guess this is due to std::mutex's slowness.
> > Since ThreadGroup and Thread can hold lock during deallocating TLS, we use
> > std::mutex instead of WTF::Lock, but it may be slow.
> > Previously, we used WTF::Lock. But it is wrong.
> 
> And I've reproduced this slowness in macOS debug build.

OK, I think I found the issue.
When stopping Thread, we hold a global destruction lock.
At that time, if the thread is suspended, it holds a global lock super long time! It becomes bottleneck of destroying so many threads like this test.

One super simple way to avoid this is just holding destructionLock in suspend() and resume() too. But still I'm considering the way to alleviate this situation.
Comment 29 Yusuke Suzuki 2017-07-08 02:57:52 PDT
I'm now reconstructing this patch with a bit more fancy design.
Comment 30 Yusuke Suzuki 2017-07-08 03:24:14 PDT
Created attachment 314917 [details]
Patch
Comment 31 Yusuke Suzuki 2017-07-08 03:26:06 PDT
Comment on attachment 314917 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=314917&action=review

> Source/WTF/wtf/Threading.cpp:167
> +    if (shouldRemoveThreadFromThreadGroup()) {
> +        Vector<std::shared_ptr<ThreadGroup>> threadGroups;
> +        {
> +            std::lock_guard<std::mutex> locker(m_mutex);
> +            for (auto& threadGroup : m_threadGroups) {
> +                // If ThreadGroup is just being destroyed,
> +                // we do not need to perform unregistering.
> +                if (auto retained = threadGroup.lock())
> +                    threadGroups.append(WTFMove(retained));
> +            }
> +            m_canAddToThreadGroup = false;
> +        }
> +        for (auto& threadGroup : threadGroups) {
> +            std::lock_guard<std::mutex> threadGroupLocker(threadGroup->getLock());
> +            std::lock_guard<std::mutex> locker(m_mutex);
> +            threadGroup->m_threads.remove(this);
> +        }
> +    }
> +    // We would like to say "thread is exited" after unregistering threads from thread groups.
> +    // So we need to separate m_canAddToThreadGroup from m_didExit.

This is the most fancy part. We use std::weak_ptr<>::lock to get std::shared_ptr.
After getting them, even if we release TH's lock, ThreadGroup does not die since we retain them with std::shared_ptr.

We intentionally use std::shared_ptr. Since WeakPtr in WTF is not thread safe.
Comment 32 Yusuke Suzuki 2017-07-08 06:40:23 PDT
Comment on attachment 314917 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=314917&action=review

>> Source/WTF/wtf/Threading.cpp:167
>> +    // So we need to separate m_canAddToThreadGroup from m_didExit.
> 
> This is the most fancy part. We use std::weak_ptr<>::lock to get std::shared_ptr.
> After getting them, even if we release TH's lock, ThreadGroup does not die since we retain them with std::shared_ptr.
> 
> We intentionally use std::shared_ptr. Since WeakPtr in WTF is not thread safe.

The key insight behind this code is that we do not need to unregister threads from thread group when the thread group is dying ;)
It means that we do not need to guard this didExit execution from destructing thread groups! We attempt to retain thread groups by using weak_ptr::lock. But if it returns nullptr, ok, let's just ignore that!
Comment 33 Yusuke Suzuki 2017-07-08 07:36:05 PDT
Comment on attachment 314917 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=314917&action=review

> Source/WTF/wtf/ThreadGroup.h:64
> +};

I'll change this to ListHashSet<Ref<Thread>> to keep threads while ThreadGroup is dying.
Thread may ignore ThreadGroup unregistration if the thread group is just dying.
In that case, if it is Thread*, Thread can die faster than ThreadGroup which iterating these threads in its destructor.
Comment 34 Yusuke Suzuki 2017-07-08 08:27:58 PDT
Created attachment 314919 [details]
Patch
Comment 35 Yusuke Suzuki 2017-07-08 08:29:49 PDT
Comment on attachment 314919 [details]
Patch

Still investigating.
Comment 36 Yusuke Suzuki 2017-07-08 08:59:15 PDT
(In reply to Yusuke Suzuki from comment #35)
> Comment on attachment 314919 [details]
> Patch
> 
> Still investigating.

Ah, ok. I find the another issue.
When suspending and resuming, we just take the lock of the target thread.
It is wrong. Consider the case, like, the target thread is executing didExit.
Inside it, the target thread holds the lock of the other thread!
If this thread is suspended. And after that, if the issuer attempts to take the lock of the thread which lock is held by the suspended thread, it causes dead lock.
Comment 37 Yusuke Suzuki 2017-07-08 09:24:52 PDT
Created attachment 314920 [details]
Patch
Comment 38 Yusuke Suzuki 2017-07-08 09:29:41 PDT
Comment on attachment 314920 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=314920&action=review

> Source/WTF/ChangeLog:46
> +        If we use per-thread lock, the suspended thread can hold the lock of the other threads. It causes dead lock.

This is the main reason why the test is timing out.
Comment 39 Yusuke Suzuki 2017-07-08 10:26:58 PDT
Created attachment 314922 [details]
Patch
Comment 40 Mark Lam 2017-07-09 09:31:15 PDT
Comment on attachment 314922 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=314922&action=review

A question and comment while continue reading.

> Source/WTF/wtf/Threading.h:184
> +    bool addToThreadGroup(const std::lock_guard<std::mutex>& threadGroupLocker, ThreadGroup&);
> +    void removeFromThreadGroup(const std::lock_guard<std::mutex>& threadGroupLocker, ThreadGroup&);

Again, I recommend giving a name to the lock_guard (since we're dealing with 2 mutexes) to better document which mutex we intend to have locked on entry here.  I suggest "threadGroupLockGuard" or "threadGroupLocker".

> Source/WTF/wtf/ThreadingPthreads.cpp:431
> -    std::lock_guard<std::mutex> locker(m_mutex);
> +    LockHolder locker(globalSuspendLock);

Why is this change needed?
Comment 41 Yusuke Suzuki 2017-07-09 14:44:56 PDT
Comment on attachment 314922 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=314922&action=review

> Source/WTF/ChangeLog:46
> +        If we use per-thread lock, the suspended thread can hold the lock of the other threads. It causes dead lock.

This is why I changed the suspending and resuming locks.
When suspending and resuming, we just take the lock of the target thread.
It is wrong. Consider the case, like, the target thread is executing didExit (of course, other operations taking m_mutex is the same).
Inside it, the target thread holds the lock of the other thread.
If this thread is suspended. And after that, if the issuer attempts to take the lock of the thread which lock is held by the suspended thread, it causes dead lock.

>> Source/WTF/wtf/ThreadingPthreads.cpp:431
>> +    LockHolder locker(globalSuspendLock);
> 
> Why is this change needed?

This is important to fix the dead lock observed in LayoutTests.
See ChangeLog for the details :)
Comment 42 Yusuke Suzuki 2017-07-09 19:39:21 PDT
Comment on attachment 314922 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=314922&action=review

> Source/WTF/wtf/Threading.cpp:172
> +bool Thread::addToThreadGroup(const std::lock_guard<std::mutex>&, ThreadGroup& threadGroup)

Maybe, in C++ side?
I've just add the name. And use UNUSED_PARAM to supress compiler warnings with unused parameters.

>> Source/WTF/wtf/Threading.h:184
>> +    void removeFromThreadGroup(const std::lock_guard<std::mutex>& threadGroupLocker, ThreadGroup&);
> 
> Again, I recommend giving a name to the lock_guard (since we're dealing with 2 mutexes) to better document which mutex we intend to have locked on entry here.  I suggest "threadGroupLockGuard" or "threadGroupLocker".

Oh, I think this lock_guard is named as `threadGroupLocker`.
Comment 43 Yusuke Suzuki 2017-07-09 19:44:14 PDT
Created attachment 314961 [details]
Patch
Comment 44 Mark Lam 2017-07-11 15:40:21 PDT
Comment on attachment 314961 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=314961&action=review

> Source/WTF/wtf/ThreadGroup.h:53
> +    ThreadGroup() = default;

Any reason why the constructor is now public?

> Source/WTF/wtf/Threading.cpp:156
> +                if (auto retained = threadGroup.lock())
> +                    threadGroups.append(WTFMove(retained));

How does std::enable_shared_from_this work?  It is ref counted, and is it thread safe?  Is there a race here with invoking threadGroup.lock() from this Thread which may be a different thread than then one that the VM (and therefore ThreadGroup) is running on?  I didn't see anything about thread safety in http://en.cppreference.com/w/cpp/memory/enable_shared_from_this and http://en.cppreference.com/w/cpp/memory/weak_ptr.

> Source/WTF/wtf/Threading.cpp:188
> +    if (!m_canAddToThreadGroup)
> +        return;

This reads a bit weird that removeFromThreadGroup() is dependent on m_canAddToThreadGroup.  This checking of m_canAddToThreadGroup here made me feel that it needs a better name, which led me to the conclusion that m_canAddToThreadGroup is the same as m_didExit.

I suggest inverting m_canAddToThreadGroup and renaming it to m_isShuttingDown or m_isDetachingFromThreadGroups (or something like that).  I think it can be combined with m_didExit, but we can leave that for another patch at a later time as it involves more refactoring to make it work.  m_didExit doesn't work perfectly as a name either.
Comment 45 Mark Lam 2017-07-11 16:07:44 PDT
Comment on attachment 314961 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=314961&action=review

r=me with remaining issues addressed.

>> Source/WTF/wtf/Threading.cpp:156
>> +                    threadGroups.append(WTFMove(retained));
> 
> How does std::enable_shared_from_this work?  It is ref counted, and is it thread safe?  Is there a race here with invoking threadGroup.lock() from this Thread which may be a different thread than then one that the VM (and therefore ThreadGroup) is running on?  I didn't see anything about thread safety in http://en.cppreference.com/w/cpp/memory/enable_shared_from_this and http://en.cppreference.com/w/cpp/memory/weak_ptr.

I looked into the implementation of std::enable_shared_from_this.  It uses atomics, and is therefore thread-safe.  No issue here.
Comment 46 Yusuke Suzuki 2017-07-19 01:21:35 PDT
Comment on attachment 314961 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=314961&action=review

>> Source/WTF/wtf/ThreadGroup.h:53
>> +    ThreadGroup() = default;
> 
> Any reason why the constructor is now public?

This is due to make_shared. Since we do not know the implementation of make_shared, we cannot use `friend std::make_shared` way (if make_shared calls something like make_shared_impl and constructs ThreadGroup inside it, this friend function specifier does not avoid compile errors.)

>> Source/WTF/wtf/Threading.cpp:188
>> +        return;
> 
> This reads a bit weird that removeFromThreadGroup() is dependent on m_canAddToThreadGroup.  This checking of m_canAddToThreadGroup here made me feel that it needs a better name, which led me to the conclusion that m_canAddToThreadGroup is the same as m_didExit.
> 
> I suggest inverting m_canAddToThreadGroup and renaming it to m_isShuttingDown or m_isDetachingFromThreadGroups (or something like that).  I think it can be combined with m_didExit, but we can leave that for another patch at a later time as it involves more refactoring to make it work.  m_didExit doesn't work perfectly as a name either.

OK, let's use `m_isShuttingDown`. And, yeah, I think combining these flags should be done in a separate refactoring patch.
Comment 47 Yusuke Suzuki 2017-07-19 01:44:04 PDT
Committed r219653: <http://trac.webkit.org/changeset/219653>