Bug 30922
Summary: | ThreadIdentifier abstraction is unnecessarily slow | ||
---|---|---|---|
Product: | WebKit | Reporter: | Jens Alfke <jens> |
Component: | JavaScriptCore | Assignee: | Jens Alfke <jens> |
Status: | RESOLVED DUPLICATE | ||
Severity: | Enhancement | ||
Priority: | P2 | ||
Version: | 528+ (Nightly build) | ||
Hardware: | Mac | ||
OS: | All |
Jens Alfke
Threading.h uses a 'ThreadIdentifier' type to abstract away platform-specific thread types like pthread_t. But the implementation of this type for pthreads, found in ThreadingPthreads.cpp, is suboptimal. This is causing measurable performance impact in Chrome*; I haven't checked Safari.
ThreadIdentifiers are consecutive integers which are mapped from pthread_t's via a HashMap. This means that looking up the identifier of a thread (including the current thread) means first acquiring a mutex, then a hashtable lookup. And looking up a thread by its identifier requires a mutex plus a linear scan of the table.
I can think of two much faster ways to do this:
(1) Define ThreadIdentifier as a typedef of pthread_t, eliminating the need for the mapping altogether. Drawback is that pthread_t is probably a pointer, making it 64 bits wide in a 64-bit process, which increases the size of a ThreadIdentifier. Does this matter? We probably don't store enough of these to make a difference.
(2) Use a pthread_t[ ] array instead of a dictionary. The ThreadIdentifier is just the array index where that pthread_t is found. This makes lookup quite fast. If the array is fixed-size then no locking is needed, you can do an atomic check-and-test to append to the array. Drawback is that the array is fixed-size and thread IDs are never removed, so the array could overflow if a lot of threads are created.
(3) Use pthread_getspecific et al to associate the identifier. I don't know how well this performs compared to what exists.
Option 1 seems like a winner to me. It's very fast and allows at least a page of code to be ripped out of ThreadingPthreads.cpp.
* Here's some analysis from Shark, running the Dromaeo core DOM benchmarks, focused on the V8 GC entry point (so it shows up as 100%). Note that 9.6% of the GC time is spent in WTF::currentThread, becuase DOMData::ensureDeref has to check whether the object is being released on its native thread or not.
0.0% 100.0% Chromium Framework v8::internal::Heap::CollectGarbage(int, v8::internal::AllocationSpace)
0.0% 100.0% Chromium Framework v8::internal::Heap::PerformGarbageCollection(v8::internal::AllocationSpace, v8::internal::GarbageCollector, v8::internal::GCTracer*)
10.8% 63.8% Chromium Framework v8::internal::GlobalHandles::PostGarbageCollectionProcessing()
0.2% 41.6% Chromium Framework WebCore::DOMDataStore::weakDOMObjectCallback(v8::Persistent<v8::Value>, void*)
1.6% 40.4% Chromium Framework void WebCore::DOMData::handleWeakObject<void>(WebCore::DOMDataStore::DOMWrapperMapType, v8::Handle<v8::Object>, void*)
7.1% 19.7% Chromium Framework WebCore::DOMData::ensureDeref(WebCore::V8ClassIndex::V8WrapperType, void*)
0.3% 9.6% Chromium Framework WTF::currentThread()
5.7% 9.1% Chromium Framework _ZN3WTFL25identifierByPthreadHandleERKP17_opaque_pthread_t
Attachments | ||
---|---|---|
Add attachment proposed patch, testcase, etc. |
Mark Rowe (bdash)
*** This bug has been marked as a duplicate of bug 25348 ***