Bug 30922

Summary: ThreadIdentifier abstraction is unnecessarily slow
Product: WebKit Reporter: Jens Alfke <jens>
Component: JavaScriptCoreAssignee: Jens Alfke <jens>
Status: RESOLVED DUPLICATE    
Severity: Enhancement    
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Mac   
OS: All   

Jens Alfke
Reported 2009-10-29 13:14:59 PDT
Threading.h uses a 'ThreadIdentifier' type to abstract away platform-specific thread types like pthread_t. But the implementation of this type for pthreads, found in ThreadingPthreads.cpp, is suboptimal. This is causing measurable performance impact in Chrome*; I haven't checked Safari. ThreadIdentifiers are consecutive integers which are mapped from pthread_t's via a HashMap. This means that looking up the identifier of a thread (including the current thread) means first acquiring a mutex, then a hashtable lookup. And looking up a thread by its identifier requires a mutex plus a linear scan of the table. I can think of two much faster ways to do this: (1) Define ThreadIdentifier as a typedef of pthread_t, eliminating the need for the mapping altogether. Drawback is that pthread_t is probably a pointer, making it 64 bits wide in a 64-bit process, which increases the size of a ThreadIdentifier. Does this matter? We probably don't store enough of these to make a difference. (2) Use a pthread_t[ ] array instead of a dictionary. The ThreadIdentifier is just the array index where that pthread_t is found. This makes lookup quite fast. If the array is fixed-size then no locking is needed, you can do an atomic check-and-test to append to the array. Drawback is that the array is fixed-size and thread IDs are never removed, so the array could overflow if a lot of threads are created. (3) Use pthread_getspecific et al to associate the identifier. I don't know how well this performs compared to what exists. Option 1 seems like a winner to me. It's very fast and allows at least a page of code to be ripped out of ThreadingPthreads.cpp. * Here's some analysis from Shark, running the Dromaeo core DOM benchmarks, focused on the V8 GC entry point (so it shows up as 100%). Note that 9.6% of the GC time is spent in WTF::currentThread, becuase DOMData::ensureDeref has to check whether the object is being released on its native thread or not. 0.0% 100.0% Chromium Framework v8::internal::Heap::CollectGarbage(int, v8::internal::AllocationSpace) 0.0% 100.0% Chromium Framework v8::internal::Heap::PerformGarbageCollection(v8::internal::AllocationSpace, v8::internal::GarbageCollector, v8::internal::GCTracer*) 10.8% 63.8% Chromium Framework v8::internal::GlobalHandles::PostGarbageCollectionProcessing() 0.2% 41.6% Chromium Framework WebCore::DOMDataStore::weakDOMObjectCallback(v8::Persistent<v8::Value>, void*) 1.6% 40.4% Chromium Framework void WebCore::DOMData::handleWeakObject<void>(WebCore::DOMDataStore::DOMWrapperMapType, v8::Handle<v8::Object>, void*) 7.1% 19.7% Chromium Framework WebCore::DOMData::ensureDeref(WebCore::V8ClassIndex::V8WrapperType, void*) 0.3% 9.6% Chromium Framework WTF::currentThread() 5.7% 9.1% Chromium Framework _ZN3WTFL25identifierByPthreadHandleERKP17_opaque_pthread_t
Attachments
Mark Rowe (bdash)
Comment 1 2009-10-29 13:33:18 PDT
*** This bug has been marked as a duplicate of bug 25348 ***
Note You need to log in before you can comment on or make changes to this bug.