Bug 175187

Summary: [WTF] ThreadSpecific should not introduce additional indirection
Product: WebKit Reporter: Yusuke Suzuki <ysuzuki>
Component: New BugsAssignee: Yusuke Suzuki <ysuzuki>
Status: RESOLVED FIXED    
Severity: Normal CC: benjamin, buildbot, cdumez, cmarcelo, dbates, keith_miller, mark.lam, msaboff, sbarati, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch mark.lam: review+

Description Yusuke Suzuki 2017-08-04 07:34:08 PDT
[WTF] ThreadSpecific should not introduce additional indirection
Comment 1 Yusuke Suzuki 2017-08-04 07:37:15 PDT
Created attachment 317242 [details]
Patch
Comment 2 Build Bot 2017-08-04 07:39:39 PDT
Attachment 317242 [details] did not pass style-queue:


ERROR: Source/WTF/wtf/ThreadSpecific.h:107:  Should be indented on a separate line, with the colon or comma first on that line.  [whitespace/indent] [4]
Total errors found: 1 in 5 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 3 Yusuke Suzuki 2017-08-04 07:44:12 PDT
Comment on attachment 317242 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=317242&action=review

> Source/WTF/wtf/ThreadSpecific.h:170
>      RELEASE_ASSERT(canBeGCThread == CanBeGCThread::True || !mayBeGCThread());

I'm not sure this release assert is good here.
get() is frequently called from operator*, operator->. And the problematic case is that when we create the Data in GC thread (with CanBeGCThread::False).
So is it enough to put this in ::set() function?

If we can drop this RELEASE_ASSERT, this get() function potentially becomes super fast. I intentionally put the storage in Data's first member.
Thus, data->storagePointer()'s pointer value would be the same to data. In that case, this function would becomes,

return static_cast<Data*>(pthread_getspecific(m_key));
Comment 4 Yusuke Suzuki 2017-08-04 07:44:42 PDT
(In reply to Yusuke Suzuki from comment #3)
> Comment on attachment 317242 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=317242&action=review
> 
> > Source/WTF/wtf/ThreadSpecific.h:170
> >      RELEASE_ASSERT(canBeGCThread == CanBeGCThread::True || !mayBeGCThread());
> 
> I'm not sure this release assert is good here.
> get() is frequently called from operator*, operator->. And the problematic
> case is that when we create the Data in GC thread (with
> CanBeGCThread::False).
> So is it enough to put this in ::set() function?
> 
> If we can drop this RELEASE_ASSERT, this get() function potentially becomes
> super fast. I intentionally put the storage in Data's first member.
> Thus, data->storagePointer()'s pointer value would be the same to data. In
> that case, this function would becomes,
> 
> return static_cast<Data*>(pthread_getspecific(m_key));

By the optimizer
Comment 5 Yusuke Suzuki 2017-08-04 08:05:22 PDT
Created attachment 317243 [details]
Patch
Comment 6 Yusuke Suzuki 2017-08-04 08:12:17 PDT
Created attachment 317244 [details]
Patch
Comment 7 Yusuke Suzuki 2017-08-04 08:27:02 PDT
Created attachment 317245 [details]
Patch
Comment 8 Yusuke Suzuki 2017-08-04 08:28:09 PDT
Created attachment 317246 [details]
Patch
Comment 9 Yusuke Suzuki 2017-08-04 08:46:16 PDT
Created attachment 317249 [details]
Patch
Comment 10 Yusuke Suzuki 2017-08-04 10:02:50 PDT
Created attachment 317252 [details]
Patch
Comment 11 Mark Lam 2017-08-09 13:19:35 PDT
Comment on attachment 317252 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=317252&action=review

I'm going to propose a change.  Let me know if you think this is not a good idea.

I also think there's a bug with using ThreadSpecific unique_ptrs to point to the ThreadGlobalData instance.

> Source/WTF/wtf/ThreadSpecific.h:111
> +        void construct()
> +        {
> +            new (NotNull, storagePointer()) T;
> +        }
>  
> -        T* value;
> +        void destroy()
> +        {
> +            storagePointer()->~T();
> +        }
> +
> +        Data(ThreadSpecific<T, canBeGCThread>* owner)
> +            : owner(owner)
> +        {
> +        }

In the old idiom, it is possible for the T constructor to call ThreadSpecific<T...>::get() and get a nullptr.  With your new idiom, the T constructor will get its partially initialized self if it calls ThreadSpecific<T...>::get().  I think it doesn't make sense for the T constructor to call ThreadSpecific<T...>::get() anyway.  I understand that there is a concern of infinite recursion but that should not be possible.  The old code does things in this order: 1. construct T, 2. constructs Data, 3. set Data in TLS.  Because this is the case, the T constructor will always see that ThreadSpecific<T...>::get() returns null.  If this triggers recursion now, then it would have done so before.

So, how about changing the Data constructor to initialize T instead:

    Data(ThreadSpecific<T, canBeGCThread>* owner)
        : owner(owner)
    {
        new (NotNull, storagePointer()) T;
    }

    ~Data()
    {
        storagePointer()->~T();
    }

Get rid of construct() and destroy().  If we do it this way, we have the same behavior as before i.e. if T constructor calls ThreadSpecific<T...>::get(), it gets a nullptr.

> Source/WTF/wtf/ThreadSpecific.h:120
> +    Data* set();

Rename set to constructAndSet.

> Source/WTF/wtf/ThreadSpecific.h:178
> +inline auto ThreadSpecific<T, canBeGCThread>::set() -> Data*

Rename set to constructAndSet.

> Source/WTF/wtf/ThreadSpecific.h:255
> +inline auto ThreadSpecific<T, canBeGCThread>::set() -> Data*

Rename set to constructAndSet.

> Source/WTF/wtf/ThreadSpecific.h:279
> -    data->value->~T();
> -    fastFree(data->value);
> +    data->destroy();

I suggest changing this to:
     auto* owner = data->owner;
     data->~Data();

In the old idiom, data destructors are able to call get() to get the instance of T, but the T instance may be partially destructed.  In your new idiom, this is no different.  Also, get() does not rely on any fields in Data itself (it only offsets into it to get T).  Hence, there's no issue with just destructing Data here.

You will also need to change the "delete data" below to "fastFree(data)" instead.

> Source/WTF/wtf/ThreadSpecific.h:282
> +    pthread_setspecific(data->owner->m_key, nullptr);

I suggest changing this to:
    pthread_setspecific(owner->m_key, nullptr);

> Source/WTF/wtf/ThreadSpecific.h:284
> +    FlsSetValue(flsKeys()[data->owner->m_index], nullptr);

I suggest changing this to:
    FlsSetValue(flsKeys()[owner->m_index], nullptr);

> Source/WTF/wtf/ThreadSpecific.h:308
> +    Data* data = set();
> +    data->construct();

I suggest changing this to:
    Data* data = constructAndSet();

> Source/WebCore/platform/ThreadGlobalData.cpp:94
> +    // While we store the same pointer to std::unique_ptr, it is ok because the web thread never finishes.
> +    (**staticData).reset(sharedMainThreadStaticData);

I think using unique_ptr for staticData is problematic.  Since both the main thread and the web thread has an instance of a std::unique_ptr, both pointing to the same ThreadGlobalData, if the main thread exits but the web thread does not, then the main thread's ThreadSpecific will delete its unique_ptr, which in turn deletes the ThreadGlobalData.  However, the web thread still thinks it's alive.  Using unique_ptr this way is also wrong because it communicates that the owner of the unique_ptr has unique and exclusive ownership of the object, but it obviously does not since you're having more than one unique_ptr point (per thread) to the same ThreadGlobalData.

I think what you want is:
static ThreadSpecific<ThreadGlobalData*>* staticData { nullptr };

> Source/WebCore/platform/ThreadGlobalData.h:44
> +    WTF_MAKE_NONCOPYABLE(ThreadGlobalData);
> +    WTF_MAKE_FAST_ALLOCATED;

These should be indented by 4 more spaces.
Comment 12 Yusuke Suzuki 2017-08-09 19:11:35 PDT
Comment on attachment 317252 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=317252&action=review

>> Source/WTF/wtf/ThreadSpecific.h:111
>> +        }
> 
> In the old idiom, it is possible for the T constructor to call ThreadSpecific<T...>::get() and get a nullptr.  With your new idiom, the T constructor will get its partially initialized self if it calls ThreadSpecific<T...>::get().  I think it doesn't make sense for the T constructor to call ThreadSpecific<T...>::get() anyway.  I understand that there is a concern of infinite recursion but that should not be possible.  The old code does things in this order: 1. construct T, 2. constructs Data, 3. set Data in TLS.  Because this is the case, the T constructor will always see that ThreadSpecific<T...>::get() returns null.  If this triggers recursion now, then it would have done so before.
> 
> So, how about changing the Data constructor to initialize T instead:
> 
>     Data(ThreadSpecific<T, canBeGCThread>* owner)
>         : owner(owner)
>     {
>         new (NotNull, storagePointer()) T;
>     }
> 
>     ~Data()
>     {
>         storagePointer()->~T();
>     }
> 
> Get rid of construct() and destroy().  If we do it this way, we have the same behavior as before i.e. if T constructor calls ThreadSpecific<T...>::get(), it gets a nullptr.

According to the comment in operator T* and the old code,

// Set up thread-specific value's memory pointer before invoking constructor, in case any function it calls
// needs to access the value, to avoid recursion.
ptr = static_cast<T*>(fastZeroedMalloc(sizeof(T)));
set(ptr);
new (NotNull, ptr) T;

So, the old code explicitly set the pointer to the Data in TLS first and construct this after that. Thus,

Data(ThreadSpecific<T, canBeGCThread>* owner) : owner(owner) { new (NotNull, storagePointer()) T; }

will break the current semantics. Is it correct? I think we need to separate Data constructor and T construction to keep the current semantics.

>> Source/WTF/wtf/ThreadSpecific.h:120
>> +    Data* set();
> 
> Rename set to constructAndSet.

Sounds nice. To represent the current oder, I renamed it to setAndConstruct.

>> Source/WTF/wtf/ThreadSpecific.h:178
>> +inline auto ThreadSpecific<T, canBeGCThread>::set() -> Data*
> 
> Rename set to constructAndSet.

Fixed.

>> Source/WTF/wtf/ThreadSpecific.h:255
>> +inline auto ThreadSpecific<T, canBeGCThread>::set() -> Data*
> 
> Rename set to constructAndSet.

Fixed.

>> Source/WTF/wtf/ThreadSpecific.h:279
>> +    data->destroy();
> 
> I suggest changing this to:
>      auto* owner = data->owner;
>      data->~Data();
> 
> In the old idiom, data destructors are able to call get() to get the instance of T, but the T instance may be partially destructed.  In your new idiom, this is no different.  Also, get() does not rely on any fields in Data itself (it only offsets into it to get T).  Hence, there's no issue with just destructing Data here.
> 
> You will also need to change the "delete data" below to "fastFree(data)" instead.

OK, sounds fine. Fixed.

>> Source/WTF/wtf/ThreadSpecific.h:282
>> +    pthread_setspecific(data->owner->m_key, nullptr);
> 
> I suggest changing this to:
>     pthread_setspecific(owner->m_key, nullptr);

Fixed.

>> Source/WTF/wtf/ThreadSpecific.h:284
>> +    FlsSetValue(flsKeys()[data->owner->m_index], nullptr);
> 
> I suggest changing this to:
>     FlsSetValue(flsKeys()[owner->m_index], nullptr);

Fixed.

>> Source/WTF/wtf/ThreadSpecific.h:308
>> +    data->construct();
> 
> I suggest changing this to:
>     Data* data = constructAndSet();

I've changed this to return T* from setAndConstruct().

>> Source/WebCore/platform/ThreadGlobalData.cpp:94
>> +    (**staticData).reset(sharedMainThreadStaticData);
> 
> I think using unique_ptr for staticData is problematic.  Since both the main thread and the web thread has an instance of a std::unique_ptr, both pointing to the same ThreadGlobalData, if the main thread exits but the web thread does not, then the main thread's ThreadSpecific will delete its unique_ptr, which in turn deletes the ThreadGlobalData.  However, the web thread still thinks it's alive.  Using unique_ptr this way is also wrong because it communicates that the owner of the unique_ptr has unique and exclusive ownership of the object, but it obviously does not since you're having more than one unique_ptr point (per thread) to the same ThreadGlobalData.
> 
> I think what you want is:
> static ThreadSpecific<ThreadGlobalData*>* staticData { nullptr };

However, is this the same semantics to the current code? In the current code, we just use ThreadSpecific<ThreadGlobalData>. And in web thread, we use ThreadSpecific<ThreadGlobalData>::replace. But ThreadSpecific<ThreadGlobalData>::replace does not allocate a new data. So, if the main thread exits in the current code, it will destroy ThreadGlobalData shared with the web thread.
Since we should take serious care about threading code, I think we should keep the current semantics as much as possible.

>> Source/WebCore/platform/ThreadGlobalData.h:44
>> +    WTF_MAKE_FAST_ALLOCATED;
> 
> These should be indented by 4 more spaces.

Fixed.
Comment 13 Yusuke Suzuki 2017-08-09 19:25:36 PDT
Created attachment 317775 [details]
Patch
Comment 14 Mark Lam 2017-08-10 09:47:20 PDT
Comment on attachment 317252 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=317252&action=review

>>> Source/WebCore/platform/ThreadGlobalData.cpp:94
>>> +    (**staticData).reset(sharedMainThreadStaticData);
>> 
>> I think using unique_ptr for staticData is problematic.  Since both the main thread and the web thread has an instance of a std::unique_ptr, both pointing to the same ThreadGlobalData, if the main thread exits but the web thread does not, then the main thread's ThreadSpecific will delete its unique_ptr, which in turn deletes the ThreadGlobalData.  However, the web thread still thinks it's alive.  Using unique_ptr this way is also wrong because it communicates that the owner of the unique_ptr has unique and exclusive ownership of the object, but it obviously does not since you're having more than one unique_ptr point (per thread) to the same ThreadGlobalData.
>> 
>> I think what you want is:
>> static ThreadSpecific<ThreadGlobalData*>* staticData { nullptr };
> 
> However, is this the same semantics to the current code? In the current code, we just use ThreadSpecific<ThreadGlobalData>. And in web thread, we use ThreadSpecific<ThreadGlobalData>::replace. But ThreadSpecific<ThreadGlobalData>::replace does not allocate a new data. So, if the main thread exits in the current code, it will destroy ThreadGlobalData shared with the web thread.
> Since we should take serious care about threading code, I think we should keep the current semantics as much as possible.

Good point!  I still think that the code is fragile, but I agree with you that it is equivalent to present semantics.  Hence, it is ok, and safe to go with your use of unique_ptr.  What this means is that we probably don't expect the main thread to ever exit as well.
Comment 15 Mark Lam 2017-08-10 10:15:11 PDT
Comment on attachment 317775 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=317775&action=review

r=me with suggestions.

> Source/WTF/ChangeLog:15
> +        This patch adds storage to Data and initializes T in Data.
> +        And we drop ThreadSpecific::replace support, which is only used
> +        by Web thread. Web thread should use ThreadSpecific<std::unique_ptr<T>> instead.

I suggest rephrasing this as:

This patch adds storage in Data in order to embed the instance of T. The constructor for Data will invoke the constructor for T on the embedded storage. We also drop ThreadSpecific::replace which is only used by the web thread to set its thread specific ThreadGlobalData to the one shared from the main thread. The existing implementation relies on the main thread and the web thread never exiting in order for the shared ThreadGlobalData to stay alive. We can achieve the same semantics by using a ThreadSpecific<std::unique_ptr<T>> to hold the ThreadGlobalData instance instead.

> Source/WTF/wtf/ThreadSpecific.h:101
> +        void construct()
> +        {
> +            new (NotNull, storagePointer()) T;
> +        }

You can delete this.  Not needed anymore.  See constructor below.

> Source/WTF/wtf/ThreadSpecific.h:106
> +        Data(ThreadSpecific<T, canBeGCThread>* owner)
> +            : owner(owner)
> +        {
> +        }

In the body of the constructor, add:
    // Set up thread-specific value's memory pointer before invoking T's constructor (in case any function it calls
    // needs to access the value) to avoid recursion.
    set(data);
    new (NotNull, storagePointer()) T;

This ensures that T is initialize as part of constructing Data, and is not reliant on the client to do the work correctly.  It also makes the code more symmetrical: Data's constructor calls T's constructor, and Data's destructor calls T's destructor.

> Source/WTF/wtf/ThreadSpecific.h:121
> +    T* setAndConstruct();
> +    void set(Data*);

This is just a suggestion: rename "setAndConstruct" to just "set" (to mirror "get").  The fact that "set"ting involves allocating Data and constructing T is just an implementation detail.  Accordingly, rename "set" to "setInTLS" or "setInternal".  What do you think?

> Source/WTF/wtf/ThreadSpecific.h:290
> +    // Set up thread-specific value's memory pointer before invoking constructor, in case any function it calls
> +    // needs to access the value, to avoid recursion.

Move this into the Data constructor (see above).

> Source/WTF/wtf/ThreadSpecific.h:295
> +    set(data);
> +    data->construct();

Since we only even create instances of Data that we want to set into TLS, we can make this cleaner and just do this work in the constructor (see above).

Also add a comment near the "new Data()" line: "// Data will set itself into TLS."  Or something to that effect.
Alternatively, you can just assert it:
    ASSERT(get() == data->storagePointer());
or add both the assert and the comment.

> Source/WebCore/platform/ThreadGlobalData.cpp:93
> +    // While we store the same pointer to std::unique_ptr, it is ok because the web thread never finishes.

I suggest rephrasing this as:
// The web thread never finishes, and we expect the main thread to also never finish. Hence, it is safe to store the same ThreadGlobalData pointer in a thread specific std::unique_ptr.

I also suggest adding a FIXME comment here with a bug url to change ThreadGlobalData to be ThreadSafeRefCounted (and staticData to be ThreadSpecific<RefPtr<ThreadGlobalData>>*) later.  I think that will removes the need to assume that the main thread never exits, and makes the code less fragile.  On the other hand, I think there is a lot of expectation that the "main" thread paired with the web thread is always the main thread (because we rely on pthread_main_np() to be true for the main thread).  Hence, it is expected in many places that the main thread never exits, and this issue is unlikely to ever manifest.  I'll let you decide if you want to do this step or not.
Comment 16 Yusuke Suzuki 2017-08-10 13:53:58 PDT
Comment on attachment 317775 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=317775&action=review

>> Source/WTF/ChangeLog:15
>> +        by Web thread. Web thread should use ThreadSpecific<std::unique_ptr<T>> instead.
> 
> I suggest rephrasing this as:
> 
> This patch adds storage in Data in order to embed the instance of T. The constructor for Data will invoke the constructor for T on the embedded storage. We also drop ThreadSpecific::replace which is only used by the web thread to set its thread specific ThreadGlobalData to the one shared from the main thread. The existing implementation relies on the main thread and the web thread never exiting in order for the shared ThreadGlobalData to stay alive. We can achieve the same semantics by using a ThreadSpecific<std::unique_ptr<T>> to hold the ThreadGlobalData instance instead.

Sounds very fine. Fixed.

>> Source/WTF/wtf/ThreadSpecific.h:101
>> +        }
> 
> You can delete this.  Not needed anymore.  See constructor below.

Deleted.

>> Source/WTF/wtf/ThreadSpecific.h:106
>> +        }
> 
> In the body of the constructor, add:
>     // Set up thread-specific value's memory pointer before invoking T's constructor (in case any function it calls
>     // needs to access the value) to avoid recursion.
>     set(data);
>     new (NotNull, storagePointer()) T;
> 
> This ensures that T is initialize as part of constructing Data, and is not reliant on the client to do the work correctly.  It also makes the code more symmetrical: Data's constructor calls T's constructor, and Data's destructor calls T's destructor.

Nice, changed.

>> Source/WTF/wtf/ThreadSpecific.h:121
>> +    void set(Data*);
> 
> This is just a suggestion: rename "setAndConstruct" to just "set" (to mirror "get").  The fact that "set"ting involves allocating Data and constructing T is just an implementation detail.  Accordingly, rename "set" to "setInTLS" or "setInternal".  What do you think?

Renamed to `set`. And renamed this `set` to `setInTLS` :)

> Source/WTF/wtf/ThreadSpecific.h:281
>  #if USE(PTHREADS)
> -    pthread_setspecific(data->owner->m_key, 0);
> +    pthread_setspecific(owner->m_key, nullptr);
>  #elif OS(WINDOWS)
> -    FlsSetValue(flsKeys()[data->owner->m_index], 0);
> +    FlsSetValue(flsKeys()[owner->m_index], nullptr);
>  #else
>  #error ThreadSpecific is not implemented for this platform.
>  #endif

I think moving this to ~Data() is fine (it makes Data and ~Data more symmetrical). And we can use `delete` instead of fastFree.

>> Source/WTF/wtf/ThreadSpecific.h:290
>> +    // needs to access the value, to avoid recursion.
> 
> Move this into the Data constructor (see above).

OK, moved.

>> Source/WTF/wtf/ThreadSpecific.h:295
>> +    data->construct();
> 
> Since we only even create instances of Data that we want to set into TLS, we can make this cleaner and just do this work in the constructor (see above).
> 
> Also add a comment near the "new Data()" line: "// Data will set itself into TLS."  Or something to that effect.
> Alternatively, you can just assert it:
>     ASSERT(get() == data->storagePointer());
> or add both the assert and the comment.

Make sense. I added the comment and assert.

>> Source/WebCore/platform/ThreadGlobalData.cpp:93
>> +    // While we store the same pointer to std::unique_ptr, it is ok because the web thread never finishes.
> 
> I suggest rephrasing this as:
> // The web thread never finishes, and we expect the main thread to also never finish. Hence, it is safe to store the same ThreadGlobalData pointer in a thread specific std::unique_ptr.
> 
> I also suggest adding a FIXME comment here with a bug url to change ThreadGlobalData to be ThreadSafeRefCounted (and staticData to be ThreadSpecific<RefPtr<ThreadGlobalData>>*) later.  I think that will removes the need to assume that the main thread never exits, and makes the code less fragile.  On the other hand, I think there is a lot of expectation that the "main" thread paired with the web thread is always the main thread (because we rely on pthread_main_np() to be true for the main thread).  Hence, it is expected in many places that the main thread never exits, and this issue is unlikely to ever manifest.  I'll let you decide if you want to do this step or not.

Changed. And filed the bug (and added FIXME comment).
Comment 17 Yusuke Suzuki 2017-08-10 14:00:53 PDT
Created attachment 317840 [details]
Patch
Comment 18 Yusuke Suzuki 2017-08-10 14:19:58 PDT
Committed r220548: <http://trac.webkit.org/changeset/220548>
Comment 19 Radar WebKit Bug Importer 2017-08-10 14:21:07 PDT
<rdar://problem/33835435>