Bug 147162 - Simplify the WTF thread creation mechanism
Summary: Simplify the WTF thread creation mechanism
Status: ASSIGNED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Mark Lam
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-07-21 13:28 PDT by Mark Lam
Modified: 2015-07-22 14:47 PDT (History)
4 users (show)

See Also:


Attachments
the fix. (18.86 KB, patch)
2015-07-21 13:41 PDT, Mark Lam
no flags Details | Formatted Diff | Diff
patch 2: added missing #include, and #if out unused function in NDEBUG build. (19.46 KB, patch)
2015-07-21 14:54 PDT, Mark Lam
ggaren: review-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Mark Lam 2015-07-21 13:28:29 PDT
The current thread creation mechanism does the following:
   1. There are 2 wrapper thread entry points:
      a. threadEntryPoint() in Threading.cpp
      b. wtfThreadEntryPoint() in ThreadingPthreads.cpp / ThreadingWin.cpp

   2. There are 2 wrapper data structures for thread data:
      a. One version of createThread() wraps entryPoint and data in a std::function<void()>.
      b. A second version of createThread() further wraps the std::function<void()> in a NewThreadContext.
      c. Next, createThreadInternal() wraps the NewThreadContext in a ThreadFunctionInvocation.

      As a result:
      d. wtfThreadEntryPoint() needs to unwrap the ThreadFunctionInvocation, and invokes its function, taking us to ...
      e. threadEntryPoint() which unwraps the NewThreadContext, and invokes its entryPoint, taking us to ...
      f. the std::function<void()> which invokes its captured entryPoint.

   3. There are complicated life cycle management for these data structures:
      a. ownership of the std::function<void()> closure is forwarded (using WTF::move()) to
         a local in threadEntryPoint() where it gets destructed on return.
      b. the NewThreadContext is new'd in createThread() on the creating thread, and
         delete'd on the created thread before invoking the entryPoint.
      c. the ThreadFunctionInvocation is allocated using std::make_unique on the creating thread, and
         ownership is transferred to the created thread, where
         it destructs it on return from wtfThreadEntryPoint.

   4. Threading.cpp's threadEntryPoint calls initializeCurrentThreadInternal() to initialize the
      created thread's name.  The underlying mechanism for setting the thread name is platform
      specific anyway.  There's no reason why we can't just pass the name down to createThreadInternal()
      and have the specific platform implementation take care of it. 

This is a lot of malloc/frees and wrapping and un-wrappings.  And there are 3 representations of the thread entry point.  Also, ...

   5. The ThreadIdentifier of the created thread is computed as follows:
      a. The creating thread locks a mutex.
      b. The creating thread starts the new thread.
      c. The new thread blocks on the mutex.
      d. The creating thread establishes the ThreadIdentifier of the new thread.
      e. The creating thread unlocks the mutex.
      f. The new thread unblocks and caches the ThreadIdentifier in its thread local storage.
      g. The new thread releases the mutex and moves on with life.

This does not work when we want to use std::thread::id as the implementation of ThreadIdentifier later.  That requires the ThreadIdentifier to be established on the created thread instead because there is no way std::thread::get_id() that can be invoked on a thread that is not the current one.

With all that in mind, we can simplify the thread creation mechanism as follows:

1. Only have 1 wrapper thread entry point in ThreadingPthreads.cpp / ThreadingWin.cpp.
   We'll name it threadEntryPoint().

2. Only have 2 wrapper data structures for thread data:
   a. createThread() can wrap entryPoint and data in a std::function<void()> if needed.
   b. createThreadInternal() will use a platform specific ThreadData struct that allows the created thread to pass initialization data to the created thread.

3. Remove unnecessary mallocs:
   a. ownership of the std::function<void()> closure continues to be forwarded (using WTF::move()) to
      a local in the platform specific threadEntryPoint() where it gets destructed on return.
   b. ThreadData will be allocated on the stack in createThreadInternal().  It is only needed while
      creating the new thread and its life-cycle is localized there.

4. We will always compute the ThreadIdentifier in the created thread and pass it back to the creating thread.

5. The ThreadIdentifier of the create thread will be computed on the created thread as follows:
   a. The creating thread locks a mutex.
   b. The creating thread starts the new thread.
   c. The new thread blocks on the mutex.
   d. The creating thread waits on a condition variable and releases the mutex.
   e. The new thread unblocks, and establishes its ThreadIdentifier, and sets it in the ThreadData shared from the creating thread.
   f. The new thread notifies all waiting on the condition variable, and releases the mutex.
   g. The creating thread wakes up, and retrieves the ThreadIdentifier from the ThreadData.
   h. The creating thread releases the mutex and moves on with life.

Note that the life-cycle of the ThreadData struct is limited to the scope of createThreadInternal().  The new thread's notification to the creating thread is the signal that it is done with thread initialization.  Hence, the ThreadIdentifier is available at that time, and the ThreadData struct may be released by the creating thread.
Comment 1 Mark Lam 2015-07-21 13:41:22 PDT
Created attachment 257201 [details]
the fix.

Uploading to test on the EWS bots, while I'm running perf and regression tests locally.
Comment 2 Mark Lam 2015-07-21 14:54:37 PDT
Created attachment 257207 [details]
patch 2: added missing #include, and #if out unused function in NDEBUG build.
Comment 3 Mark Lam 2015-07-22 14:17:53 PDT
All tests (jsc, api, and layout tests) show no regressions.
Benchmarks (jsc, jetstream, speedstream) shows that this patch is perf neutral.
Comment 4 Geoffrey Garen 2015-07-22 14:39:53 PDT
Comment on attachment 257207 [details]
patch 2: added missing #include, and #if out unused function in NDEBUG build.

View in context: https://bugs.webkit.org/attachment.cgi?id=257207&action=review

> Source/WTF/ChangeLog:51
> +        This does not work when we want to use std::thread::id as the implementation of ThreadIdentifier later.
> +        That requires the ThreadIdentifier to be established on the created thread instead because there is no
> +        way std::thread::get_id() that can be invoked on a thread that is not the current one.

I'm not certain what you mean by this, but I think you've missed something. See http://en.cppreference.com/w/cpp/thread/thread/get_id.
Comment 5 Geoffrey Garen 2015-07-22 14:47:08 PDT
Comment on attachment 257207 [details]
patch 2: added missing #include, and #if out unused function in NDEBUG build.

View in context: https://bugs.webkit.org/attachment.cgi?id=257207&action=review

I don't think this patch is a simplification. Perhaps a new patch, inspired by some of the ideas in this patch, would be a simplification.

> Source/WTF/wtf/ThreadingPthreads.cpp:180
> +    volatile ThreadIdentifier id { 0 };

"volatile" is wrong here.

> Source/WTF/wtf/ThreadingPthreads.cpp:188
> +    // The contents of the ThreadData is only guaranteed to be valid before we signal the creating
> +    // thread below. The signaling tells the creating thread that we're done initializing and implies
> +    // that it may dispose of the ThreadData. Hence, we should grab the info we need out of it
> +    // before then.

This is a strange dance. Why not have the creating thread transfer owner of ThreadData to the created thread? That's a much easier model to work with.

> Source/WTF/wtf/ThreadingPthreads.cpp:225
> +    while (!data.id)
> +        data.conditionVariable.wait(lock);

Why do we want the creating thread to wait for the created thread to come into existence? That seems potentially slow, and a regression relative to what the old code did.

> Source/WTF/wtf/ThreadingPthreads.cpp:226
> +    ASSERT(data.id);

I don't like this ASSERT, since it duplicates the while loop just above it.

> Source/WTF/wtf/ThreadingWin.cpp:212
> +    volatile ThreadIdentifier id { 0 };

"volatile" is wrong here.

> Source/WTF/wtf/ThreadingWin.cpp:231
> +#if !defined(NDEBUG)
> +        data->id = currentThread();
> +#endif
> +        data->done = true;

Why not use id to indicate done-ness, as we do in pthreads?

> Source/WTF/wtf/ThreadingWin.cpp:264
> +    while (!data.done)
> +        data.conditionVariable.wait(lock);

Why do we want the creating thread to wait for the created thread to come into existence? That seems potentially slow, and a regression relative to what the old code did.