Bug 162702 - Fix race condition in StringView's UnderlyingString lifecycle management.
Summary: Fix race condition in StringView's UnderlyingString lifecycle management.
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Web Template Framework (show other bugs)
Version: WebKit Local Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Mark Lam
URL:
Keywords:
Depends on:
Blocks: 160384
  Show dependency treegraph
 
Reported: 2016-09-28 14:04 PDT by Mark Lam
Modified: 2016-09-29 17:01 PDT (History)
8 users (show)

See Also:


Attachments
proposed patch. (5.96 KB, patch)
2016-09-28 14:11 PDT, Mark Lam
ggaren: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Mark Lam 2016-09-28 14:04:01 PDT
There 2 relevant functions at play:

void StringView::setUnderlyingString(const StringImpl* string)
{
    UnderlyingString* underlyingString;
    if (!string)
        underlyingString = nullptr;
    else {
        std::lock_guard<StaticLock> lock(underlyingStringsMutex);
        auto result = underlyingStrings().add(string, nullptr);
        if (result.isNewEntry)
            result.iterator->value = new UnderlyingString(*string);
        else
            ++result.iterator->value->refCount;
        underlyingString = result.iterator->value; // Point P2.
    }
    adoptUnderlyingString(underlyingString); // Point P5.
}

... and ...

void StringView::adoptUnderlyingString(UnderlyingString* underlyingString)
{
    if (m_underlyingString) {
        // Point P0.
        if (!--m_underlyingString->refCount) {
            if (m_underlyingString->isValid) { // Point P1.
                std::lock_guard<StaticLock> lock(underlyingStringsMutex);
                underlyingStrings().remove(&m_underlyingString->string); // Point P3.
            }
            delete m_underlyingString; // Point P4.
        }
    }
    m_underlyingString = underlyingString;
}

Imagine the following scenario:

1. Thread T1 has been using an UnderlyingString U1, and is now done with it.
   T1 runs up to point P1 in adoptUnderlyingString(), and is blocked waiting for
   the underlyingStringsMutex (which is currently being held by Thread T2).
2. Context switch to Thread T2.
   T2 wants to use UnderlyingString U1, and runs up to point P2 in setUnderlyingString()
   and releases the underlyingStringsMutex.
   Note: T2 thinks it has successfully refCounted U1, and therefore U1 is safe to use.
3. Context switch to Thread T1.
   T1 acquires the underlyingStringsMutex, and proceeds to remove it from the
   underlyingStrings() map (see Point P3).  It thinks it has successfully done so
   and proceeds to delete U1 (see Point P4).
4. Context switch to Thread T2.
   T2 proceeds to use U1 (see Point P5 in setUnderlyingString()).
   Note: U1 has already been freed.  This is a use after free.

The fix is to acquire the underlyingStringsMutex at Point P0 in adoptUnderlyingString() instead of after P1.  This ensures that the decrementing of the UnderlyingString refCount and its removal from the underlyingStrings() map is done as an atomic unit.

Note: If you look in StringView.cpp, you see another setUnderlyingString() which takes a StringView otherString.  This version of setUnderlyingString() can only be called from within the same thread that created the other StringView.  As a result, here, we are guaranteed that the UnderlyingString refCount is never zero, and there's no other threat of another thread trying to delete the UnderlyingString while we adopt it.  Hence, we don't need to acquire the underlyingStringsMutex here.
Comment 1 Mark Lam 2016-09-28 14:11:18 PDT
Created attachment 290119 [details]
proposed patch.
Comment 2 Geoffrey Garen 2016-09-28 14:29:17 PDT
Comment on attachment 290119 [details]
proposed patch.

View in context: https://bugs.webkit.org/attachment.cgi?id=290119&action=review

r=me

> Source/WTF/wtf/text/StringView.cpp:272
> +    // StringView instances are not passed across threads. Hence, since we have a
> +    // reference to otherString, we are guaranteed that its underlyingString refCount
> +    // will remain non-zero for us to increment. We don't need to grab the
> +    // underlyingStringsMutex.

It's a general principle that any caller that passes a const& object should not pass garbage memory, so I don't think this comment adds anything -- and it might harm readability by suggesting that anything more special than that is happening here.
Comment 3 Mark Lam 2016-09-28 14:33:53 PDT
Thanks for the review.  I removed the comment in setUnderlyingString().

Landed in r206552: <http://trac.webkit.org/r206552>.
Comment 4 Darin Adler 2016-09-29 12:43:05 PDT
Comment on attachment 290119 [details]
proposed patch.

View in context: https://bugs.webkit.org/attachment.cgi?id=290119&action=review

> Source/WTF/wtf/text/StringView.cpp:240
> +        std::lock_guard<StaticLock> lock(underlyingStringsMutex);
>          if (!--m_underlyingString->refCount) {

Can we make UnderlyingString::refCount an atomic instead? I am concerned that the locking will slow things down too much.
Comment 5 Darin Adler 2016-09-29 12:55:40 PDT
Not that the performance here is critical, but even in debug builds this is hot enough that it might be nice to find a way to make it faster.
Comment 6 Darin Adler 2016-09-29 12:59:21 PDT
I also suspect that in the original case where this was failing, there is actually a thread safety problem in the client code, and the root cause is not the lack of thread safety in StringView itself. After all, the reference counts of the strings themselves are not guarded against access from multiple threads, so it’s not clear to me how use of a StringView of one of those strings could be safe.

I would like to look at that case at some point.
Comment 7 Mark Lam 2016-09-29 13:18:26 PDT
(In reply to comment #4)
> Comment on attachment 290119 [details]
> proposed patch.
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=290119&action=review
> 
> > Source/WTF/wtf/text/StringView.cpp:240
> > +        std::lock_guard<StaticLock> lock(underlyingStringsMutex);
> >          if (!--m_underlyingString->refCount) {
> 
> Can we make UnderlyingString::refCount an atomic instead? I am concerned
> that the locking will slow things down too much.

UnderlyingString::refCount is already an atomic.  The issue is that the adjustment of refCount and the insertion/removal from the underlyingStrings map needs to be an atomic unit.  

If the 2 are not in an atomic unit, then another thread can inc the refCount while we're waiting for the underlyingStringsMutex for the deletion.  I supposed I can check the refCount again after acquiring the lock to decide whether to commence with removal from the map and deletion, or to skip them.  I'll look into that.

(In reply to comment #6)
> I also suspect that in the original case where this was failing, there is
> actually a thread safety problem in the client code, and the root cause is
> not the lack of thread safety in StringView itself. After all, the reference
> counts of the strings themselves are not guarded against access from
> multiple threads, so it’s not clear to me how use of a StringView of one of
> those strings could be safe.
> 
> I would like to look at that case at some point.

I'm not sure that this is true.  The refCount of the UnderlyingString record is not tied to the refCount of the StringImpl that this StringView is associated with.  In the test cases that were failing, the relevant StringImpl is that of a StringSourceProvider (i.e. very long lived relative to the StringViews).

It's just that StringViews were being constructed and destructed many times from different threads, and with that comes the creation and deletion of the associated UnderlyingString record.  Based on the evidence I've seen so far, it seems clear to me that the issue really is in StringView::adoptUnderlyingString() management of the UnderlyingString record, and does not have anything to do with the associated StringImpl.
Comment 8 Mark Lam 2016-09-29 14:13:43 PDT
(In reply to comment #7)
> If the 2 are not in an atomic unit, then another thread can inc the refCount
> while we're waiting for the underlyingStringsMutex for the deletion.  I
> supposed I can check the refCount again after acquiring the lock to decide
> whether to commence with removal from the map and deletion, or to skip them.
> I'll look into that.

This won't work.  Imagine this scenario:

1. Thread T1 acquires and holds on to the underlyingStringsMutex.

2. Thread T2 has a StringView associated with StringImpl S2 and UnderlyingString U2, and is done with the StringView.
   T2 decrements UnderlyingString U2's refCount to 0 and wants to delete it, but blocks (in adoptUnderlyingString()) waiting for the underlyingStringsMutex owned by T1.

3. Thread T3 wants a StringView associated with StringImpl S2, but blocks (in setUnderlyingString()) waiting for the underlyingStringsMutex owned by T1.

4. T1 releases the underlyingStringsMutex.

5. T3 acquires the underlyingStringsMutex.

6. T3 queries the map and finds U2 associated with S2 (because T2 is still blocked and hasn't removed U2 from the map yet).
   Note: U2's refCount is already 0 because decremented it to 0 back in step 2 above.

7. T3 increments U2's refCount.
   T3 releases the underlyingStringsMutex.
   T3 is done with its StringView and destructs it.
   T3 acquires the underlyingStringsMutex.
   T3 decrements U2's refCount to 0, and deletes U2.
   T3 releases the underlyingStringsMutex.

8. T2 finally unblocks and acquires the underlyingStringsMutex.
   T2 checks U2's refCount and sees that it is 0 (because T3 decrement it back to 0 too) <==== Use after free right here.
   T2 tries to remove U2 from the map but fails because U2 is not in the map).
   T2 deleted U2.  <======= Double delete.

Another variation on this is Thread T4 shows up between step 7 and 8, and allocates a new UnderlyingString U4.  But since T3 already deleted U2, the memory for U2 gets re-allocated as U4 now, and gets re-inserted to the map.  Later in step 8, T2 actually delete U4 (because it has the same memory address as the deleted U2).  Eventually, T4 tries to delete U4 and crashes.

The only way to avoid this is to ensure that if ever decrement an UnderlyingString's refCount to 0, that UnderlyingString must also be removed from the map (atomically with the refCount decrement) to prevent it from being used by another thread.

Oh wait, we could change StringView::setUnderlyingString() to check if the found UnderlyingString in the may has a refCount of 0.  If so, this means it is in the process of being removed and deleted.  As such, we can remove if from the map there and make a new one as if we didn't find it in the map.  Let me try that.
Comment 9 Mark Lam 2016-09-29 14:25:18 PDT
(In reply to comment #8)
> Oh wait, we could change StringView::setUnderlyingString() to check if the
> found UnderlyingString in the may has a refCount of 0.  If so, this means it
> is in the process of being removed and deleted.  As such, we can remove if
> from the map there and make a new one as if we didn't find it in the map. 
> Let me try that.

No, that won't work either because the pending deletion and removal will do the removal from using the StringImpl* as the key.  By then, the entry in the map can be for a different UnderlyingString.  We could continue this dance and first check if the would be removed UnderlyingString pointer value from the map is the same as the one we expect.  If it's the same, then we can proceed with the removal.

What all this means is: if we want to avoid that lock on deletion, we'll have to:
1. make StringView::adoptUnderlyingString() second guess whether it can really delete the UnderlyingString record, and
2. make StringView::setUnderlyingString() second guess whether it has really found an existing UnderlyingString record in the map.

This code is a lot more tricky.  All it does is make the refCount decrements faster for refCounts greater than 1.  It also makes all removal of the UnderlyingString record from the map slower because we're now making it do 2 map lookups: one to check if the record has been replaced, one to remove it (if not replaced).  The setUnderlyingString() side will also be slower when UnderlyingString record already exists in the map.

Hence, the code gets a lot more tricky, and I'm not sure that it is a net win.
Comment 10 Darin Adler 2016-09-29 17:01:57 PDT
OK, I guess we shouldn’t spend more time on this if it’s fast enough to be bearable.