RESOLVED FIXED 109597
Cache timer heap pointer to timers
https://bugs.webkit.org/show_bug.cgi?id=109597
Summary Cache timer heap pointer to timers
Antti Koivisto
Reported 2013-02-12 10:48:31 PST
Accessing timer heap through thread global storage is slow (~0.1% in PLT3). We can cache the heap pointer to each TimerBase. There are not huge numbers of timers around so memory is not an issue and many timers are heavily reused.
Attachments
patch (5.13 KB, patch)
2013-02-12 10:51 PST, Antti Koivisto
kling: review+
Antti Koivisto
Comment 1 2013-02-12 10:51:31 PST
Andreas Kling
Comment 2 2013-02-12 10:57:09 PST
Comment on attachment 187896 [details] patch Neat hack! r=me
Antti Koivisto
Comment 3 2013-02-12 12:25:04 PST
Alexey Proskuryakov
Comment 4 2013-02-12 18:36:31 PST
Comment on attachment 187896 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=187896&action=review It is worrisome that these functions are so incredibly hot that TLS access - a very fast operation - shows up on the profile. > Source/WebCore/platform/Timer.h:85 > + const Vector<TimerBase*>& timerHeap() const { ASSERT(m_cachedThreadGlobalTimerHeap); return *m_cachedThreadGlobalTimerHeap; } > + Vector<TimerBase*>& timerHeap() { ASSERT(m_cachedThreadGlobalTimerHeap); return *m_cachedThreadGlobalTimerHeap; } I don't think that differentiating between the two is meaningful. The first function casts away constness from "this" anyway, so const safety is violated despite having a separate const version of this function. I suggest always returning a non-const reference, the timer heap is never immutable.
Antti Koivisto
Comment 5 2013-02-13 06:20:36 PST
(In reply to comment #4) > (From update of attachment 187896 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=187896&action=review > > It is worrisome that these functions are so incredibly hot that TLS access - a very fast operation - shows up on the profile. Part of the problem was that timerHeap() is accessed from many places during a singe call. Here is the profile before the patch: Running Time Self Symbol Name 134.0ms 0.5% 9.0 WebCore::TimerBase::setNextFireTime(double) 47.0ms 0.1% 2.0 WebCore::TimerBase::heapPopMin() 41.0ms 0.1% 31.0 void std::__1::__push_heap_front<WebCore::TimerHeapLessThanFunction&, WebCore::TimerHeapIterator>(WebCore::TimerHeapIterator, WebCore::TimerHeapIterator, WebCore::TimerHeapLessThanFunction&, std::__1::iterator_traits<WebCore::TimerHeapIterator>::difference_type) 10.0ms 0.0% 5.0 WebCore::timerHeap() 3.0ms 0.0% 2.0 WTF::ThreadSpecific<WebCore::ThreadGlobalData>::operator WebCore::ThreadGlobalData*() 1.0ms 0.0% 1.0 pthread_getspecific 2.0ms 0.0% 2.0 DYLD-STUB$$pthread_getspecific 4.0ms 0.0% 1.0 WebCore::timerHeap() 3.0ms 0.0% 3.0 WTF::ThreadSpecific<WebCore::ThreadGlobalData>::operator WebCore::ThreadGlobalData*() 34.0ms 0.1% 0.0 WebCore::setSharedTimerFireInterval(double) 26.0ms 0.0% 20.0 void std::__1::__push_heap_back<WebCore::TimerHeapLessThanFunction&, WebCore::TimerHeapIterator>(WebCore::TimerHeapIterator, WebCore::TimerHeapIterator, WebCore::TimerHeapLessThanFunction&, std::__1::iterator_traits<WebCore::TimerHeapIterator>::difference_type) 6.0ms 0.0% 1.0 WebCore::timerHeap() 4.0ms 0.0% 3.0 WTF::ThreadSpecific<WebCore::ThreadGlobalData>::operator WebCore::ThreadGlobalData*() 1.0ms 0.0% 1.0 pthread_getspecific 1.0ms 0.0% 1.0 DYLD-STUB$$pthread_getspecific 10.0ms 0.0% 2.0 WebCore::timerHeap() 8.0ms 0.0% 6.0 WTF::ThreadSpecific<WebCore::ThreadGlobalData>::operator WebCore::ThreadGlobalData*() 2.0ms 0.0% 2.0 pthread_getspecific 3.0ms 0.0% 0.0 WebCore::stopSharedTimer() 2.0ms 0.0% 2.0 WebCore::TimerBase::alignedFireTime(double) const 2.0ms 0.0% 2.0 WTF::ThreadSpecific<WebCore::ThreadGlobalData>::operator WebCore::ThreadGlobalData*() 1.0ms 0.0% 0.0 WebCore::DOMTimer::alignedFireTime(double) const Focusing on timerHeap(): Running Time Self Symbol Name 32.0ms 0.1% 11.0 WebCore::timerHeap() 18.0ms 0.0% 14.0 WTF::ThreadSpecific<WebCore::ThreadGlobalData>::operator WebCore::ThreadGlobalData*() 4.0ms 0.0% 4.0 pthread_getspecific 3.0ms 0.0% 3.0 DYLD-STUB$$pthread_getspecific Call to pthread_getspecific is indeed pretty fast in itself. Assembly level profile shows that time is spent in indirection and branchiness related to fetching the right piece of data. > I suggest always returning a non-const reference, the timer heap is never immutable. You are right.
Note You need to log in before you can comment on or make changes to this bug.