Bug 109597

Summary: Cache timer heap pointer to timers
Product: WebKit Reporter: Antti Koivisto <koivisto>
Component: PlatformAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: ap, darin, kling, mjs
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
patch kling: review+

Description Antti Koivisto 2013-02-12 10:48:31 PST
Accessing timer heap through thread global storage is slow (~0.1% in PLT3). We can cache the heap pointer to each TimerBase. There are not huge numbers of timers around so memory is not an issue and many timers are heavily reused.
Comment 1 Antti Koivisto 2013-02-12 10:51:31 PST
Created attachment 187896 [details]
patch
Comment 2 Andreas Kling 2013-02-12 10:57:09 PST
Comment on attachment 187896 [details]
patch

Neat hack! r=me
Comment 3 Antti Koivisto 2013-02-12 12:25:04 PST
http://trac.webkit.org/changeset/142652
Comment 4 Alexey Proskuryakov 2013-02-12 18:36:31 PST
Comment on attachment 187896 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=187896&action=review

It is worrisome that these functions are so incredibly hot that TLS access - a very fast operation - shows up on the profile.

> Source/WebCore/platform/Timer.h:85
> +    const Vector<TimerBase*>& timerHeap() const { ASSERT(m_cachedThreadGlobalTimerHeap); return *m_cachedThreadGlobalTimerHeap; }
> +    Vector<TimerBase*>& timerHeap() { ASSERT(m_cachedThreadGlobalTimerHeap); return *m_cachedThreadGlobalTimerHeap; }

I don't think that differentiating between the two is meaningful. The first function casts away constness from "this" anyway, so const safety is violated despite having a separate const version of this function.

I suggest always returning a non-const reference, the timer heap is never immutable.
Comment 5 Antti Koivisto 2013-02-13 06:20:36 PST
(In reply to comment #4)
> (From update of attachment 187896 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=187896&action=review
> 
> It is worrisome that these functions are so incredibly hot that TLS access - a very fast operation - shows up on the profile.

Part of the problem was that timerHeap() is accessed from many places during a singe call. Here is the profile before the patch:

Running Time	Self		Symbol Name
134.0ms    0.5%	9.0	 	 WebCore::TimerBase::setNextFireTime(double)
47.0ms    0.1%	2.0	 	  WebCore::TimerBase::heapPopMin()
41.0ms    0.1%	31.0	 	   void std::__1::__push_heap_front<WebCore::TimerHeapLessThanFunction&, WebCore::TimerHeapIterator>(WebCore::TimerHeapIterator, WebCore::TimerHeapIterator, WebCore::TimerHeapLessThanFunction&, std::__1::iterator_traits<WebCore::TimerHeapIterator>::difference_type)
10.0ms    0.0%	5.0	 	    WebCore::timerHeap()
3.0ms    0.0%	2.0	 	     WTF::ThreadSpecific<WebCore::ThreadGlobalData>::operator WebCore::ThreadGlobalData*()
1.0ms    0.0%	1.0	 	      pthread_getspecific
2.0ms    0.0%	2.0	 	     DYLD-STUB$$pthread_getspecific
4.0ms    0.0%	1.0	 	   WebCore::timerHeap()
3.0ms    0.0%	3.0	 	    WTF::ThreadSpecific<WebCore::ThreadGlobalData>::operator WebCore::ThreadGlobalData*()
34.0ms    0.1%	0.0	 	  WebCore::setSharedTimerFireInterval(double)
26.0ms    0.0%	20.0	 	  void std::__1::__push_heap_back<WebCore::TimerHeapLessThanFunction&, WebCore::TimerHeapIterator>(WebCore::TimerHeapIterator, WebCore::TimerHeapIterator, WebCore::TimerHeapLessThanFunction&, std::__1::iterator_traits<WebCore::TimerHeapIterator>::difference_type)
6.0ms    0.0%	1.0	 	   WebCore::timerHeap()
4.0ms    0.0%	3.0	 	    WTF::ThreadSpecific<WebCore::ThreadGlobalData>::operator WebCore::ThreadGlobalData*()
1.0ms    0.0%	1.0	 	     pthread_getspecific
1.0ms    0.0%	1.0	 	    DYLD-STUB$$pthread_getspecific
10.0ms    0.0%	2.0	 	  WebCore::timerHeap()
8.0ms    0.0%	6.0	 	   WTF::ThreadSpecific<WebCore::ThreadGlobalData>::operator WebCore::ThreadGlobalData*()
2.0ms    0.0%	2.0	 	    pthread_getspecific
3.0ms    0.0%	0.0	 	  WebCore::stopSharedTimer()
2.0ms    0.0%	2.0	 	  WebCore::TimerBase::alignedFireTime(double) const
2.0ms    0.0%	2.0	 	  WTF::ThreadSpecific<WebCore::ThreadGlobalData>::operator WebCore::ThreadGlobalData*()
1.0ms    0.0%	0.0	 	  WebCore::DOMTimer::alignedFireTime(double) const

Focusing on timerHeap():

Running Time	Self		Symbol Name
32.0ms    0.1%	11.0	 	 WebCore::timerHeap()
18.0ms    0.0%	14.0	 	  WTF::ThreadSpecific<WebCore::ThreadGlobalData>::operator WebCore::ThreadGlobalData*()
4.0ms    0.0%	4.0	 	   pthread_getspecific
3.0ms    0.0%	3.0	 	  DYLD-STUB$$pthread_getspecific

Call to pthread_getspecific is indeed pretty fast in itself. Assembly level profile shows that time is spent in indirection and branchiness related to fetching the right piece of data.

> I suggest always returning a non-const reference, the timer heap is never immutable.

You are right.