Bug 152675

Summary: ResourceHandleCFURLConnectionDelegateWithOperationQueue delegate methods don't NULL-check m_handle->client()
Product: WebKit Reporter: David Kilzer (:ddkilzer) <ddkilzer>
Component: Page LoadingAssignee: David Kilzer (:ddkilzer) <ddkilzer>
Status: RESOLVED FIXED    
Severity: Normal CC: achristensen, aestes, ap, beidson, bfulgham, commit-queue, dbates, psolanki, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch v1 none

Description David Kilzer (:ddkilzer) 2016-01-03 16:27:47 PST
ResourceHandleCFURLConnectionDelegateWithOperationQueue delegate methods don't NULL-check m_handle->client() in Source/WebCore/platform/network/cf/ResourceHandleCFURLConnectionDelegateWithOperationQueue.cpp.

This has lead to a number of NULL-pointer dereference crashes in the com.apple.WebKit.Networking process on iOS.
Comment 1 David Kilzer (:ddkilzer) 2016-01-03 16:28:14 PST
<rdar://problem/24034044>
Comment 2 David Kilzer (:ddkilzer) 2016-01-03 16:40:27 PST
Created attachment 268156 [details]
Patch v1
Comment 3 Alexey Proskuryakov 2016-01-03 19:12:19 PST
Comment on attachment 268156 [details]
Patch v1

View in context: https://bugs.webkit.org/attachment.cgi?id=268156&action=review

> Source/WebCore/ChangeLog:7
> +        Reviewed by NOBODY (OOPS!).

What tests is this covered by?

> Source/WebCore/platform/network/cf/ResourceHandleCFURLConnectionDelegateWithOperationQueue.cpp:191
> +        if (!protector->hasHandle() || !m_handle->client())

I wonder if the design is to have protector's handle zeroed out when removing the client, but we have a race. If so, adding a null check will not fix the crashes.
Comment 4 David Kilzer (:ddkilzer) 2016-01-03 20:54:57 PST
(In reply to comment #3)
> Comment on attachment 268156 [details]
> Patch v1
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=268156&action=review
> 
> > Source/WebCore/ChangeLog:7
> > +        Reviewed by NOBODY (OOPS!).
> 
> What tests is this covered by?

I don't think it's possible to write tests for this.  However, I'm not an expert in this area of code, so perhaps you have some ideas?

> > Source/WebCore/platform/network/cf/ResourceHandleCFURLConnectionDelegateWithOperationQueue.cpp:191
> > +        if (!protector->hasHandle() || !m_handle->client())
> 
> I wonder if the design is to have protector's handle zeroed out when
> removing the client, but we have a race. If so, adding a null check will not
> fix the crashes.

I'll try disassembling the method to see if I can figure out which dereference is causing the crash (m_handle-> or client()->).
Comment 5 Alexey Proskuryakov 2016-01-04 09:17:19 PST
> I don't think it's possible to write tests for this.  However, I'm not an expert in this area of code, so perhaps you have some ideas?

To answer this question, I'd need to know more about what circumstances this crash occurs in.

> I'll try disassembling the method to see if I can figure out which dereference is causing the crash (m_handle-> or client()->).

Not sure if that will answer my question though. What I'm asking is whether m_handle->client() can become null after the null check that you add, and before the subsequent dereference.
Comment 6 Brent Fulgham 2016-01-13 11:12:13 PST
Comment on attachment 268156 [details]
Patch v1

View in context: https://bugs.webkit.org/attachment.cgi?id=268156&action=review

>>> Source/WebCore/platform/network/cf/ResourceHandleCFURLConnectionDelegateWithOperationQueue.cpp:191
>>> +        if (!protector->hasHandle() || !m_handle->client())
>> 
>> I wonder if the design is to have protector's handle zeroed out when removing the client, but we have a race. If so, adding a null check will not fix the crashes.
> 
> I'll try disassembling the method to see if I can figure out which dereference is causing the crash (m_handle-> or client()->).

Is this still planned? Should we move forward with this patch as-is?
Comment 7 David Kilzer (:ddkilzer) 2016-01-16 04:38:14 PST
Comment on attachment 268156 [details]
Patch v1

View in context: https://bugs.webkit.org/attachment.cgi?id=268156&action=review

>>>> Source/WebCore/platform/network/cf/ResourceHandleCFURLConnectionDelegateWithOperationQueue.cpp:191
>>>> +        if (!protector->hasHandle() || !m_handle->client())
>>> 
>>> I wonder if the design is to have protector's handle zeroed out when removing the client, but we have a race. If so, adding a null check will not fix the crashes.
>> 
>> I'll try disassembling the method to see if I can figure out which dereference is causing the crash (m_handle-> or client()->).
> 
> Is this still planned? Should we move forward with this patch as-is?

I haven't had a chance to investigate what object is being referenced yet, but I'm not sure why the design of this class would differ from Source/WebCore/platform/network/mac/WebCoreResourceHandleAsOperationQueueDelegate.mm, which has the same NULL checks.

Alexey, why do you think this class differs in behavior from WebCoreResourceHandleAsOperationQueueDelegate.mm?
Comment 8 Alexey Proskuryakov 2016-01-16 19:33:44 PST
To clarify, my comment was asking for fairly straightforward code inspection. I don't expect that checking disassembly would be very relevant.
Comment 9 Brent Fulgham 2016-01-20 16:59:06 PST
Comment on attachment 268156 [details]
Patch v1

View in context: https://bugs.webkit.org/attachment.cgi?id=268156&action=review

Based on the other uses of this class pattern, and the way other OperationQueue classes in WebKit are written, this seems like the right approach. r=me.

>>>>> Source/WebCore/platform/network/cf/ResourceHandleCFURLConnectionDelegateWithOperationQueue.cpp:191
>>>>> +        if (!protector->hasHandle() || !m_handle->client())
>>>> 
>>>> I wonder if the design is to have protector's handle zeroed out when removing the client, but we have a race. If so, adding a null check will not fix the crashes.
>>> 
>>> I'll try disassembling the method to see if I can figure out which dereference is causing the crash (m_handle-> or client()->).
>> 
>> Is this still planned? Should we move forward with this patch as-is?
> 
> I haven't had a chance to investigate what object is being referenced yet, but I'm not sure why the design of this class would differ from Source/WebCore/platform/network/mac/WebCoreResourceHandleAsOperationQueueDelegate.mm, which has the same NULL checks.
> 
> Alexey, why do you think this class differs in behavior from WebCoreResourceHandleAsOperationQueueDelegate.mm?

These null-checks are the same as the ones Alexey wrote in WebCoreResourceHandleAsOperationQueueDelegate, so I'm not sure why we're spending so much time debating this.

If there is a race condition, we'll see that in further testing so why not just proceed with this change?
Comment 10 Brent Fulgham 2016-01-20 17:25:42 PST
(In reply to comment #5)
> Not sure if that will answer my question though. What I'm asking is whether
> m_handle->client() can become null after the null check that you add, and
> before the subsequent dereference.

It doesn't seem likely.

The client is assigned to the ResourceHandleInternal object at construction time, and can be cleared by ResourceHandle::clearClient().

I see clearClient being called in just a handful of places:
1. The NetworkLoad destructor.
2. ResourceLoader's "finishNetworkLoad()" method.
3. ApplicationCacheGroup's "stopLoading()" method.

The ResourceLoader 'finishNetworkLoad' call happens when it is releasing its resources, which happens after the load is complete (or terminated).

The ApplicationCacheGroup 'stopLoading()' call happens as cleanup when ending loads.

I don't think there are expected cases where the m_client member is cleared while ongoing resource loads are happening.
Comment 11 WebKit Commit Bot 2016-01-20 17:49:23 PST
Comment on attachment 268156 [details]
Patch v1

Clearing flags on attachment: 268156

Committed r195393: <http://trac.webkit.org/changeset/195393>
Comment 12 WebKit Commit Bot 2016-01-20 17:49:28 PST
All reviewed patches have been landed.  Closing bug.
Comment 13 Alexey Proskuryakov 2016-01-20 20:36:41 PST
> I don't think there are expected cases where the m_client member is cleared while ongoing resource loads are happening.

I'm not sure how this crash can happen then - we have a null client in a callback, i.e. while the load is ongoing. We certainly didn't start with a null client.
Comment 14 Brent Fulgham 2016-01-21 09:49:02 PST
(In reply to comment #13)
> > I don't think there are expected cases where the m_client member is cleared while ongoing resource loads are happening.
> 
> I'm not sure how this crash can happen then - we have a null client in a
> callback, i.e. while the load is ongoing. We certainly didn't start with a
> null client.

I admit that there may be cases where a load is cancelled (or fails) where cleanup happens while asynchronous callbacks are enqueued. It would be good to understand how this happens, and fix things at that point.

However, the conditions under which this might happen are not obvious from reading the source code. This simple change will make it less likely for users to experience a crash under these conditions.
Comment 15 David Kilzer (:ddkilzer) 2016-01-23 10:39:10 PST
(In reply to comment #3)
> Comment on attachment 268156 [details]
> Patch v1
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=268156&action=review
> 
> > Source/WebCore/ChangeLog:7
> > +        Reviewed by NOBODY (OOPS!).
> 
> What tests is this covered by?
> 
> > Source/WebCore/platform/network/cf/ResourceHandleCFURLConnectionDelegateWithOperationQueue.cpp:191
> > +        if (!protector->hasHandle() || !m_handle->client())
> 
> I wonder if the design is to have protector's handle zeroed out when
> removing the client, but we have a race. If so, adding a null check will not
> fix the crashes.

I see ResourceHandle::clearClient() is called in ResourceLoader::finishNetworkLoad(), and then derefs m_handle by assigning nullptr in ResourceLoader.cpp:

void ResourceLoader::finishNetworkLoad()
{
    platformStrategies()->loaderStrategy()->remove(this);

    if (m_handle) {
        ASSERT(m_handle->client() == this);
        m_handle->clearClient();
        m_handle = nullptr;
    }
}

Note that the new-ish NetworkLoad::~NetworkLoad() does not clear m_handle, but the destructor would take care of deref-ing the ResourceHandle in NetworkHandle.cpp:

NetworkLoad::~NetworkLoad()
{
    ASSERT(RunLoop::isMain());
#if USE(NETWORK_SESSION)
    if (m_responseCompletionHandler)
        m_responseCompletionHandler(PolicyIgnore);
#else
    if (m_handle)
        m_handle->clearClient();
#endif
}

However, I don't see any code that calls clearClient() and then clears m_handle using releaseHandle().

--

I do think ResourceHandleCFURLConnectionDelegate should probably set m_handle to nullptr, though, since it's a pointer (filed Bug 153401):

ResourceHandleCFURLConnectionDelegate::~ResourceHandleCFURLConnectionDelegate()
{
}

--

I also noticed that while ResourceHandleCFURLConnectionDelegate is ThreadSafeRefCounted, and that ResourceHandleCFURLConnectionDelegateWithOperationQueue is, too, by inheritance, ResourceHandle is NOT ThreadSafeRefCounted, only RefCounted.

Now ResourceHandleCFURLConnectionDelegate only holds a pointer to its ResourceHandle, but I wonder if there are any cases where we rely on reference counting for a ResourceHandle to keep it alive on a background thread (either in NSURLConnection or CFURLConnection code paths)?