Bug 169819 - [JSC] MachineThreads does not consider situation that one thread has multiple VMs
Summary: [JSC] MachineThreads does not consider situation that one thread has multiple...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Yusuke Suzuki
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-03-17 11:04 PDT by Yusuke Suzuki
Modified: 2017-03-23 14:55 PDT (History)
12 users (show)

See Also:


Attachments
Patch (17.58 KB, patch)
2017-03-17 11:04 PDT, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
Patch (18.77 KB, patch)
2017-03-17 11:11 PDT, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ews107 for mac-elcapitan-wk2 (1.00 MB, application/zip)
2017-03-17 12:19 PDT, Build Bot
no flags Details
Patch (19.16 KB, patch)
2017-03-18 07:28 PDT, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
Patch (19.13 KB, patch)
2017-03-21 14:47 PDT, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
Patch (17.41 KB, patch)
2017-03-23 08:17 PDT, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
Patch (17.74 KB, patch)
2017-03-23 14:03 PDT, Yusuke Suzuki
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Yusuke Suzuki 2017-03-17 11:04:02 PDT
[JSC] MachineThreads does not consider situation that one thread has multiple VMs
Comment 1 Yusuke Suzuki 2017-03-17 11:04:20 PDT
Created attachment 304797 [details]
Patch
Comment 2 Yusuke Suzuki 2017-03-17 11:11:10 PDT
Created attachment 304800 [details]
Patch
Comment 3 Build Bot 2017-03-17 12:19:39 PDT
Comment on attachment 304800 [details]
Patch

Attachment 304800 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.webkit.org/results/3350965

New failing tests:
webrtc/peer-connection-audio-mute2.html
Comment 4 Build Bot 2017-03-17 12:19:43 PDT
Created attachment 304808 [details]
Archive of layout-test-results from ews107 for mac-elcapitan-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews107  Port: mac-elcapitan-wk2  Platform: Mac OS X 10.11.6
Comment 5 Yusuke Suzuki 2017-03-18 02:37:17 PDT
I think this patch should include futher changes: Start suspending threads in predefined manner to avoid deadlock.
Comment 6 Yusuke Suzuki 2017-03-18 07:28:11 PDT
Created attachment 304867 [details]
Patch
Comment 7 Yusuke Suzuki 2017-03-18 07:28:37 PDT
(In reply to comment #5)
> I think this patch should include futher changes: Start suspending threads
> in predefined manner to avoid deadlock.

As a first step, this patch just separates ThreadData from Thread.
Comment 8 Yusuke Suzuki 2017-03-21 14:47:11 PDT
Created attachment 305030 [details]
Patch
Comment 9 Yusuke Suzuki 2017-03-22 10:59:02 PDT
Ping?
Comment 10 Mark Lam 2017-03-22 21:43:19 PDT
Comment on attachment 305030 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=305030&action=review

> Source/JavaScriptCore/ChangeLog:10
> +        While MachineThreads::Thread is created for every MachineThreads. Thus, one thread
> +        can have multiple MachineThreads::Thread if the thread has multiple VMs.
> +        It actually causes deadlock in GTK port DatabaseProcess.

I think I understand what the issue is now.  It's not so much that every VM allocates its own MachineThreads::Thread per thread it runs on, but that the Linux implementation of thread suspend/resume relies on having a thread specific singleton that houses the semaphore needed to communicate with the originating thread that requested the suspend/resume.  As a result, when there's potentially more than one VM running on the same thread, MachineThreads::Thread cannot serve as that thread specific singleton that the Linux port needs.  Hence, you're factoring out the ThreadData part as the thread specific singleton.  Did I understand the issue correctly?

> Source/JavaScriptCore/heap/MachineStackMarker.cpp:73
> -thread_local static std::atomic<JSC::MachineThreads::Thread*> threadLocalCurrentThread;
> +static std::atomic<JSC::MachineThreads::ThreadData*> globalThreadDataToSuspend { nullptr };

Why is this needed?  Shouldn't there only be 1 ThreadData per thread, and hence, you can use something like:
    thread_local static std::atomic<JSC::MachineThreads::ThreadData*> threadLocalCurrentThreadData;

... and revert the conversion to using globalThreadDataToSuspend below?

> Source/JavaScriptCore/heap/MachineStackMarker.cpp:207
> +    static NeverDestroyed<ThreadSpecific<MachineThreads::ThreadData, CanBeGCThread::True>> threadData;
> +    return threadData.get();

I'm not too clear on the interaction of NeverDestroyed with ThreadSpecific here.  My concern is whether this will leak memory for every thread that is started and ended.  For example, let's say you have the following scenario:

1. Thread T1 creates a VM.
2. Thread T2 enters the VM, thereby creating a ThreadSpecific<ThreadData> for T2.
3. Thread T2 exits the VM, and terminates itself.
4. Thread T3 now enters the VM, and creates a ThreadSpecific<ThreadData> for T3.
5. Thread T3 exits the VM, and terminated itself.
6. Repeat steps 2 and 3 for thread T4, T5, etc. just like we did for thread T3 in steps 4 and 5.

On thread destruction, I know that ThreadSpecific will take care of destructing ThreadData and freeing its memory.
What I suspect is that NeverDestroyed will allocated a memory for ThreadSpecific, but not free it.  As a result, for every thread that is created and dies, we'll end up leaking sizeof(ThreadSpecific).

Can you test and confirm that we won't be leaking memory here?

> Source/JavaScriptCore/heap/MachineStackMarker.h:67
> +    class ThreadData {

This needs to be WTF_MAKE_FAST_ALLOCATED because ThreadSpecific::destroy() will call fastFree() on it.

> Source/JavaScriptCore/heap/MachineStackMarker.h:147
> +        bool suspend()
> +        {
> +            return data->suspend();
> +        }
> +        void resume()
> +        {
> +            data->resume();
> +        }
> +        size_t getRegisters(Registers& regs)
> +        {
> +            return data->getRegisters(regs);
> +        }
> +        void freeRegisters(Registers& regs)
> +        {
> +            data->freeRegisters(regs);
> +        }
> +        std::pair<void*, size_t> captureStack(void* stackTop)
> +        {
> +            return data->captureStack(stackTop);
> +        }

nit: since these only have a single statement in their body, you can make them all single line just like the other methods on Thread.
Comment 11 Yusuke Suzuki 2017-03-23 06:04:31 PDT
Comment on attachment 305030 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=305030&action=review

>> Source/JavaScriptCore/ChangeLog:10
>> +        It actually causes deadlock in GTK port DatabaseProcess.
> 
> I think I understand what the issue is now.  It's not so much that every VM allocates its own MachineThreads::Thread per thread it runs on, but that the Linux implementation of thread suspend/resume relies on having a thread specific singleton that houses the semaphore needed to communicate with the originating thread that requested the suspend/resume.  As a result, when there's potentially more than one VM running on the same thread, MachineThreads::Thread cannot serve as that thread specific singleton that the Linux port needs.  Hence, you're factoring out the ThreadData part as the thread specific singleton.  Did I understand the issue correctly?

Right.

>> Source/JavaScriptCore/heap/MachineStackMarker.cpp:207
>> +    return threadData.get();
> 
> I'm not too clear on the interaction of NeverDestroyed with ThreadSpecific here.  My concern is whether this will leak memory for every thread that is started and ended.  For example, let's say you have the following scenario:
> 
> 1. Thread T1 creates a VM.
> 2. Thread T2 enters the VM, thereby creating a ThreadSpecific<ThreadData> for T2.
> 3. Thread T2 exits the VM, and terminates itself.
> 4. Thread T3 now enters the VM, and creates a ThreadSpecific<ThreadData> for T3.
> 5. Thread T3 exits the VM, and terminated itself.
> 6. Repeat steps 2 and 3 for thread T4, T5, etc. just like we did for thread T3 in steps 4 and 5.
> 
> On thread destruction, I know that ThreadSpecific will take care of destructing ThreadData and freeing its memory.
> What I suspect is that NeverDestroyed will allocated a memory for ThreadSpecific, but not free it.  As a result, for every thread that is created and dies, we'll end up leaking sizeof(ThreadSpecific).
> 
> Can you test and confirm that we won't be leaking memory here?

Oooooops! Nice catch. Right, NeverDestroyed will not call the destructor.
I'll offer another way to allocate thread local storage for that, this should be something like PerThread in bmalloc.

>> Source/JavaScriptCore/heap/MachineStackMarker.h:67
>> +    class ThreadData {
> 
> This needs to be WTF_MAKE_FAST_ALLOCATED because ThreadSpecific::destroy() will call fastFree() on it.

Thanks! Fixed.

>> Source/JavaScriptCore/heap/MachineStackMarker.h:147
>> +        }
> 
> nit: since these only have a single statement in their body, you can make them all single line just like the other methods on Thread.

OK, fixed.
Comment 12 Yusuke Suzuki 2017-03-23 07:59:36 PDT
Comment on attachment 305030 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=305030&action=review

>>> Source/JavaScriptCore/heap/MachineStackMarker.cpp:207
>>> +    return threadData.get();
>> 
>> I'm not too clear on the interaction of NeverDestroyed with ThreadSpecific here.  My concern is whether this will leak memory for every thread that is started and ended.  For example, let's say you have the following scenario:
>> 
>> 1. Thread T1 creates a VM.
>> 2. Thread T2 enters the VM, thereby creating a ThreadSpecific<ThreadData> for T2.
>> 3. Thread T2 exits the VM, and terminates itself.
>> 4. Thread T3 now enters the VM, and creates a ThreadSpecific<ThreadData> for T3.
>> 5. Thread T3 exits the VM, and terminated itself.
>> 6. Repeat steps 2 and 3 for thread T4, T5, etc. just like we did for thread T3 in steps 4 and 5.
>> 
>> On thread destruction, I know that ThreadSpecific will take care of destructing ThreadData and freeing its memory.
>> What I suspect is that NeverDestroyed will allocated a memory for ThreadSpecific, but not free it.  As a result, for every thread that is created and dies, we'll end up leaking sizeof(ThreadSpecific).
>> 
>> Can you test and confirm that we won't be leaking memory here?
> 
> Oooooops! Nice catch. Right, NeverDestroyed will not call the destructor.
> I'll offer another way to allocate thread local storage for that, this should be something like PerThread in bmalloc.

Ah, no. NeverDestroyed<> is created as static. And its storage for ThreadSpecific itself is allocated per process (by static).
Thus, I think it does not cause any memory leaks.

static NeverDestroyed<ThreadSpecific<T>> t;

In the above case, ThreadSpecific<T>'s storage itself is statically allocated by NeverDestroyed<> per process.
And T()'s memory is allocated per thread in ThreadSpecific. And this allocated per thread memory should be deleted by the registered destructor by using pthread_key_create.
See ThreadSpecific<T>::destroy(void* ptr) function. It finally calls `delete ptr`.
Comment 13 Yusuke Suzuki 2017-03-23 08:06:36 PDT
Comment on attachment 305030 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=305030&action=review

>> Source/JavaScriptCore/heap/MachineStackMarker.cpp:73
>> +static std::atomic<JSC::MachineThreads::ThreadData*> globalThreadDataToSuspend { nullptr };
> 
> Why is this needed?  Shouldn't there only be 1 ThreadData per thread, and hence, you can use something like:
>     thread_local static std::atomic<JSC::MachineThreads::ThreadData*> threadLocalCurrentThreadData;
> 
> ... and revert the conversion to using globalThreadDataToSuspend below?

I'm a bit worried about using thread_local inside the signal handler.
But seeing the wiki https://sourceware.org/glibc/wiki/TLSandSignals, it seems that thread_local is async signal safe.
So, we can just use thread_local. Reverting the changes about globalThreadDataToSuspend.
Comment 14 Yusuke Suzuki 2017-03-23 08:17:00 PDT
Created attachment 305196 [details]
Patch
Comment 15 Mark Lam 2017-03-23 10:07:21 PDT
Comment on attachment 305196 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=305196&action=review

> Source/JavaScriptCore/ChangeLog:10
> +        While MachineThreads::Thread is created for every MachineThreads. Thus, one thread
> +        can have multiple MachineThreads::Thread if the thread has multiple VMs.
> +        It actually causes deadlock in GTK port DatabaseProcess.

I suggest adding a little bit of context here like this:

"The Linux port of PlatformThread suspend/resume mechanism relies on having a thread specific singleton thread data, and was relying on MachineThreads::Thread to be this thread specific singleton.  But because MachineThreads::Thread is not a thread specific singleton, we can get a deadlock in the GTK port's DatabaseProcess."

> Source/JavaScriptCore/ChangeLog:13
> +        As a first step, this patch simply moves per thread data from MachineThreads::Thread
> +        to MachineThreads::ThreadData.

This patch is no longer just a first step, right?  It actually solves the issue, right?  How about rephrasing this as:

"This patch fixes this issue by moving per thread data from MachineThreads::Thread to MachineThreads::ThreadData, where there will only be one instance of MachineThreads::ThreadData per thread.  Each MachineThreads::Thread will now point to the same MachineThreads::ThreadData for any given thread."

> Source/JavaScriptCore/heap/MachineStackMarker.cpp:207
> +    static NeverDestroyed<ThreadSpecific<MachineThreads::ThreadData, CanBeGCThread::True>> threadData;
> +    return threadData.get();

Looking at the implementation of ThreadSpecific<T>, it appears that it does not instantiate T until you invoke operator T*() on it.  Invoking get() returns null if it's not initialized.  Can you assert that threadData.get() is actually returning what you expect?

> Source/JavaScriptCore/heap/MachineStackMarker.h:141
> +    class Thread {
> +        WTF_MAKE_FAST_ALLOCATED;
> +        Thread(ThreadData*);
> +
> +    public:
> +        using Registers = ThreadData::Registers;
> +
> +        static Thread* createForCurrentThread();
> +
> +        bool operator==(const PlatformThread& other) const;
> +        bool operator!=(const PlatformThread& other) const { return !(*this == other); }
> +
> +        bool suspend() { return data->suspend(); }
> +        void resume() { data->resume(); }
> +        size_t getRegisters(Registers& regs) { return data->getRegisters(regs); }
> +        void freeRegisters(Registers& regs) { data->freeRegisters(regs); }
> +        std::pair<void*, size_t> captureStack(void* stackTop) { return data->captureStack(stackTop); }
> +
> +        const PlatformThread& platformThread() { return data->platformThread; }
> +        void* stackBase() const { return data->stackBase; }
> +        void* stackEnd() const { return data->stackEnd; }
> +
> +        Thread* next;
> +        ThreadData* data;
> +    };

Thinking about this some more, I realized that this Thread class now only serves one purpose: to be a SinglyLinkedList.  I was thinking you could get rid of it and just change MachineThread::m_registeredThreads to a WTF::SinglyLinkedList<ThreadData>, but found out that our WTF::SinglyLinkedList implementation does not yet support the iteration protocol.  I suppose we can beef up the SinglyLinkedList impl later, and refactor this to use this to use it.  The operator== and operator!= will need to move back into ThreadData.  I think that that would make the code cleaner.  You can even forego the renaming of MachineThreads::Thread to MachineThreads::ThreadData, and having to create all these wrapper functions.  I do like the ThreadData name better, and style-wise, I prefer accessor functions but that's just a refactoring that can be applied later if needed.

What do you think?  If it's too much work, I'm ok with landing this patch for now and filing a separate bug to use a SinglyLinkedList later.
Comment 16 Yusuke Suzuki 2017-03-23 13:59:21 PDT
Comment on attachment 305196 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=305196&action=review

Thanks!

>> Source/JavaScriptCore/ChangeLog:10
>> +        It actually causes deadlock in GTK port DatabaseProcess.
> 
> I suggest adding a little bit of context here like this:
> 
> "The Linux port of PlatformThread suspend/resume mechanism relies on having a thread specific singleton thread data, and was relying on MachineThreads::Thread to be this thread specific singleton.  But because MachineThreads::Thread is not a thread specific singleton, we can get a deadlock in the GTK port's DatabaseProcess."

OK, fixed.

>> Source/JavaScriptCore/ChangeLog:13
>> +        to MachineThreads::ThreadData.
> 
> This patch is no longer just a first step, right?  It actually solves the issue, right?  How about rephrasing this as:
> 
> "This patch fixes this issue by moving per thread data from MachineThreads::Thread to MachineThreads::ThreadData, where there will only be one instance of MachineThreads::ThreadData per thread.  Each MachineThreads::Thread will now point to the same MachineThreads::ThreadData for any given thread."

Sounds nice! Fixed.

>> Source/JavaScriptCore/heap/MachineStackMarker.cpp:207
>> +    return threadData.get();
> 
> Looking at the implementation of ThreadSpecific<T>, it appears that it does not instantiate T until you invoke operator T*() on it.  Invoking get() returns null if it's not initialized.  Can you assert that threadData.get() is actually returning what you expect?

No. This get() is NeverDestoyed<T>::get(). It returns ThreadSpecific<>&.
Then, ThreadSpecific<T>::operator T* is invoked here, which includes initializing logic.
And ThreadSpecific<T>::get() is private function. Thus it cannot be accessed from outside of ThreadSpecific<>.

But, inserting assertion is good idea. I've added ASSERT to MachineThreads::Thread constructor.

>> Source/JavaScriptCore/heap/MachineStackMarker.h:141
>> +    };
> 
> Thinking about this some more, I realized that this Thread class now only serves one purpose: to be a SinglyLinkedList.  I was thinking you could get rid of it and just change MachineThread::m_registeredThreads to a WTF::SinglyLinkedList<ThreadData>, but found out that our WTF::SinglyLinkedList implementation does not yet support the iteration protocol.  I suppose we can beef up the SinglyLinkedList impl later, and refactor this to use this to use it.  The operator== and operator!= will need to move back into ThreadData.  I think that that would make the code cleaner.  You can even forego the renaming of MachineThreads::Thread to MachineThreads::ThreadData, and having to create all these wrapper functions.  I do like the ThreadData name better, and style-wise, I prefer accessor functions but that's just a refactoring that can be applied later if needed.
> 
> What do you think?  If it's too much work, I'm ok with landing this patch for now and filing a separate bug to use a SinglyLinkedList later.

Yeah. While using SinglyLinkedList is nice, I think we still need Thread separeted from ThreadData.
Consider the situation,

There are two VMs A and B. And there are two user threads i, ii, and iii.

i use A. And later use B (with JSLock).
ii use A. And later use B.
iii use A.

In that case, A's thread list should include i, ii, iii while B's thread list should include i and ii.
If we move `Thread* next` to ThreadData, it breaks the above situation.

Several clean up can be considered.

1. Creating SinglyListNode.

Define the class like, class SinglyListNode<ThreadData>;
It has operator-> to access ThreadData's accessor. And SinglyListNode adds m_next feature.
And SinglyList<ThreadData> will be managed by this node. BTW, I think this is the same to the STL's std::list<ThreadData>.

2. Introduce IntrusiveList.

Create the IntrusiveList class. And use it for the Thread. Like, Thread : public IntrusiveListNode<Thread>.
I think it's a bit more generic approach. And maybe, it is useful.

BTW, I think the above changes require some code addition to WTF. So I like doing this in the separate patch :)
Comment 17 Yusuke Suzuki 2017-03-23 14:03:16 PDT
Created attachment 305225 [details]
Patch
Comment 18 Mark Lam 2017-03-23 14:04:44 PDT
Comment on attachment 305196 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=305196&action=review

>>> Source/JavaScriptCore/heap/MachineStackMarker.cpp:207
>>> +    return threadData.get();
>> 
>> Looking at the implementation of ThreadSpecific<T>, it appears that it does not instantiate T until you invoke operator T*() on it.  Invoking get() returns null if it's not initialized.  Can you assert that threadData.get() is actually returning what you expect?
> 
> No. This get() is NeverDestoyed<T>::get(). It returns ThreadSpecific<>&.
> Then, ThreadSpecific<T>::operator T* is invoked here, which includes initializing logic.
> And ThreadSpecific<T>::get() is private function. Thus it cannot be accessed from outside of ThreadSpecific<>.
> 
> But, inserting assertion is good idea. I've added ASSERT to MachineThreads::Thread constructor.

Ah yes.  My mistake.

>>> Source/JavaScriptCore/heap/MachineStackMarker.h:141
>>> +    };
>> 
>> Thinking about this some more, I realized that this Thread class now only serves one purpose: to be a SinglyLinkedList.  I was thinking you could get rid of it and just change MachineThread::m_registeredThreads to a WTF::SinglyLinkedList<ThreadData>, but found out that our WTF::SinglyLinkedList implementation does not yet support the iteration protocol.  I suppose we can beef up the SinglyLinkedList impl later, and refactor this to use this to use it.  The operator== and operator!= will need to move back into ThreadData.  I think that that would make the code cleaner.  You can even forego the renaming of MachineThreads::Thread to MachineThreads::ThreadData, and having to create all these wrapper functions.  I do like the ThreadData name better, and style-wise, I prefer accessor functions but that's just a refactoring that can be applied later if needed.
>> 
>> What do you think?  If it's too much work, I'm ok with landing this patch for now and filing a separate bug to use a SinglyLinkedList later.
> 
> Yeah. While using SinglyLinkedList is nice, I think we still need Thread separeted from ThreadData.
> Consider the situation,
> 
> There are two VMs A and B. And there are two user threads i, ii, and iii.
> 
> i use A. And later use B (with JSLock).
> ii use A. And later use B.
> iii use A.
> 
> In that case, A's thread list should include i, ii, iii while B's thread list should include i and ii.
> If we move `Thread* next` to ThreadData, it breaks the above situation.
> 
> Several clean up can be considered.
> 
> 1. Creating SinglyListNode.
> 
> Define the class like, class SinglyListNode<ThreadData>;
> It has operator-> to access ThreadData's accessor. And SinglyListNode adds m_next feature.
> And SinglyList<ThreadData> will be managed by this node. BTW, I think this is the same to the STL's std::list<ThreadData>.
> 
> 2. Introduce IntrusiveList.
> 
> Create the IntrusiveList class. And use it for the Thread. Like, Thread : public IntrusiveListNode<Thread>.
> I think it's a bit more generic approach. And maybe, it is useful.
> 
> BTW, I think the above changes require some code addition to WTF. So I like doing this in the separate patch :)

Or just use WTF::DoublyLinkedList.  But you can do that in a separate patch.
Comment 19 Mark Lam 2017-03-23 14:05:33 PDT
Comment on attachment 305225 [details]
Patch

r=me
Comment 20 Yusuke Suzuki 2017-03-23 14:26:47 PDT
Comment on attachment 305225 [details]
Patch

Thanks!
Comment 21 WebKit Commit Bot 2017-03-23 14:55:02 PDT
Comment on attachment 305225 [details]
Patch

Clearing flags on attachment: 305225

Committed r214319: <http://trac.webkit.org/changeset/214319>
Comment 22 WebKit Commit Bot 2017-03-23 14:55:08 PDT
All reviewed patches have been landed.  Closing bug.