Bug 173789

Summary: Add assertions to RefCounted and DeferrableRefCounted to catch thread-safety issues
Product: WebKit Reporter: David Kilzer (:ddkilzer) <ddkilzer>
Component: Web Template FrameworkAssignee: David Kilzer (:ddkilzer) <ddkilzer>
Status: RESOLVED WONTFIX    
Severity: Normal CC: bfulgham, cdumez, darin
Priority: P2    
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugs.webkit.org/show_bug.cgi?id=31639
https://bugs.webkit.org/show_bug.cgi?id=127004
Attachments:
Description Flags
Attempt #1 (doesn't work) none

Description David Kilzer (:ddkilzer) 2017-06-23 13:52:26 PDT
While reading the committed fix for Bug 173753 (<https://trac.webkit.org/r218734>), I wondered if there was a way to catch thread-unsafe use of RefCounted objects in Debug builds.

After talking to Chris Dumez, he pointed me to the fix for Bug 173693 (<https://trac.webkit.org/r218705>), where a thread ID debug assertion was added to WebKit::GenericCallback.

So this bug represents adding a similar debug assertion to RefCounted and DeferrableRefCounted to check for places where RefCountedBase::m_refCount and DeferrableRefCountedBase::m_refCount are being accessed on a different thread than the thread where the object was created.  This also catches situations where ref() and deref() are called on different threads.
Comment 1 Alexey Proskuryakov 2017-06-23 21:13:10 PDT
> check for places where RefCountedBase::m_refCount and DeferrableRefCountedBase::m_refCount are being accessed on a 

This is legal with locking.
Comment 2 David Kilzer (:ddkilzer) 2017-06-24 03:27:01 PDT
(In reply to Alexey Proskuryakov from comment #1)
> > check for places where RefCountedBase::m_refCount and DeferrableRefCountedBase::m_refCount are being accessed on a 
> 
> This is legal with locking.

Do you know of any place where we actually lock like this?  Seems like locking would be a fairly heavy(?) compared to using atomic operations for m_refCount in ThreadSafeRefCounted.

The other place this is legal is when doing a WTFMove() of an object from one thread to another, such as in a lambda.

I guess if I can't get this to work I should focus on making it easy to build and run tests with ThreadSanitizer instead, which would catch the same issues, but work in the locking case and maybe(?) the WTFMove() case.
Comment 3 David Kilzer (:ddkilzer) 2017-06-24 18:26:07 PDT
Created attachment 313792 [details]
Attempt #1 (doesn't work)
Comment 4 David Kilzer (:ddkilzer) 2017-06-24 18:31:42 PDT
Comment on attachment 313792 [details]
Attempt #1 (doesn't work)

View in context: https://bugs.webkit.org/attachment.cgi?id=313792&action=review

> Source/WTF/wtf/RefPtr.h:113
>      RefPtr copyRef() const & WARN_UNUSED_RETURN { return RefPtr(m_ptr); }
> +    template<typename V> RefPtr<RefCounted<V>> copyRef() const & WARN_UNUSED_RETURN {
> +        m_ptr->resetOriginThreadID();
> +        return RefPtr<RefCounted<V>>(m_ptr);
> +    }

This specialization doesn't work, but I'm not sure why yet.

The patch compiles, but I still get some spurious assertions in Debug builds when running tests like this:

./Tools/Scripts/run-webkit-tests --debug --no-build -1 --no-retry --no-show-results --no-timeout --no-sample compositing/backface-visibility/backface-visibility-image.html

I think it's calling the copyRef() on the line above instead of the new method since m_originThreadID doesn't get reset.

The assert happens in Source/WebCore/loader/SubresourceLoader.cpp on this line in buffer.copyRef():

    ResourceLoader::didReceiveDataOrBuffer(data, length, buffer.copyRef(), encodedDataLength, dataPayloadType);

The reason that m_originThreadID is reset is that we have cases where we pass an object from one thread to another, then issues  copyRef() (or similar), which then destroys the old object and creates a new one on the current thread.  If we don't do this, then we hit false-positive asserts.  (Or at least that's the current thinking.)

Anyway, I'll look into it more on Monday.
Comment 5 Darin Adler 2017-06-25 10:16:46 PDT
(In reply to David Kilzer (:ddkilzer) from comment #0)
> So this bug represents adding a similar debug assertion to RefCounted and
> DeferrableRefCounted to check for places where RefCountedBase::m_refCount
> and DeferrableRefCountedBase::m_refCount are being accessed on a different
> thread than the thread where the object was created.  This also catches
> situations where ref() and deref() are called on different threads.

An example of how we do this intentionally is with isolatedCopy. We make a copy on one thread and then transfer it to another, using the appropriate correct locking and using isolatedCopy to guarantee other clients aren’t manipulating the reference count on the same object. To add some kind of assertion like this we will have to come up with a scheme to distinguish intentional correct use from incorrect. The object goes into a “ready to go to another thread” state as far as the programmer is concerned, but RefCounted, Ref, and RefPtr have no idea this is going on.
Comment 6 Darin Adler 2017-06-25 10:22:33 PDT
The last attempt at this was in 2011, and the details of how we started on it are in bug 31639, but then there was lots of follow-on to try to make it work. We eventually decided not to pursue it and removed it with bug 127004.
Comment 7 David Kilzer (:ddkilzer) 2017-06-25 14:49:21 PDT
(In reply to Darin Adler from comment #6)
> The last attempt at this was in 2011, and the details of how we started on
> it are in bug 31639, but then there was lots of follow-on to try to make it
> work. We eventually decided not to pursue it and removed it with bug 127004.

Okay, thanks for the info.  Sounds like investigating ThreadSanitizer would probably be more useful here.
Comment 8 Darin Adler 2017-06-25 16:15:17 PDT
Given we had a hard time getting this to work globally last time, one idea would be to add something that is activated only for certain classes, certain code paths, or  certain objects and slowly turn it on for more and more use cases. Some day it could be on by default.