Bug 164220

Summary: NetworkSession: Network process crash when converting main resource to download
Product: WebKit Reporter: Carlos Garcia Campos <cgarcia>
Component: WebKit2Assignee: Carlos Garcia Campos <cgarcia>
Status: RESOLVED FIXED    
Severity: Normal CC: achristensen, bugs-noreply, cdumez, commit-queue, ryanhaddad
Priority: P2 Keywords: Soup
Version: WebKit Local Build   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 164631    
Bug Blocks: 164458    
Attachments:
Description Flags
Patch
achristensen: review+
Patch
achristensen: review+
New patch
none
Patch none

Description Carlos Garcia Campos 2016-10-31 03:49:38 PDT
Right after the main resource load is converted to a download, the web process deletes the ResourceLoader which sends the RemoveLoadIdentifier to the network process and the load is aborted. Sometimes it happens that NetworkResourceLoader::abort() is called while the NetworkLoad is still deciding the destination of the download. In such case, NetworkResourceLoader::didConvertToDownload() has already been called, but not NetworkResourceLoader::didBecomeDownload(). In NetworkResourceLoader::abort() we already handle the case of having a NetworkLoad after NetworkResourceLoader::didConvertToDownload() has been called, to avoid canceling the load in such case, however cleanup() is always called unconditionally and the NetworkLoad is deleted before NetworkResourceLoader::didBecomeDownload() is called. When the NetworkLoad is destroyed the NetworkDataTask client becomes nullptr, leaving it in a state where both the client is nullptr and the download hasn't been created yet. That's not expected to happen and when the response completion handler is called in the NetworkDataTask it tries to use either the client or the download and it crashes. In the GTK+ port we have unit test for this use case, but the find destination happens so fasta that RemoveLoadIdentifier message is always received after etworkResourceLoader::didBecomeDownload() is called.
Comment 1 Carlos Garcia Campos 2016-10-31 04:25:12 PDT
Created attachment 293415 [details]
Patch
Comment 2 Michael Catanzaro 2016-10-31 04:30:23 PDT
Comment on attachment 293415 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=293415&action=review

> Tools/TestWebKitAPI/Tests/WebKit2Gtk/TestDownloads.cpp:545
> +        if (test->m_shouldDelayDecideDestination)
> +            g_usleep(0.2 * G_USEC_PER_SEC);

There's no way to know when the load has been aborted? :/
Comment 3 Carlos Garcia Campos 2016-10-31 04:40:32 PDT
(In reply to comment #2)
> Comment on attachment 293415 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=293415&action=review
> 
> > Tools/TestWebKitAPI/Tests/WebKit2Gtk/TestDownloads.cpp:545
> > +        if (test->m_shouldDelayDecideDestination)
> > +            g_usleep(0.2 * G_USEC_PER_SEC);
> 
> There's no way to know when the load has been aborted? :/

No, because it's an internal abort, the load is converted to a download, which means that a Download is created to continue with the load, and the ResourceLoader is destroyed. So, the load is not actually aborted, it's the NetworkResourceLoader. The problem is that we were deleting the NetworkResourceLoader before the network load finished to become a download. From the API point of view there isn't any abort, after we decide the load is a download, we use the WebKitDownload object and forget about the loader client.
Comment 4 Carlos Garcia Campos 2016-10-31 10:00:07 PDT
Committed r208154: <http://trac.webkit.org/changeset/208154>
Comment 5 Ryan Haddad 2016-10-31 14:34:12 PDT
Reverted r208154 for reason:

This change caused an assertion failure during API tests on macOS.

Committed r208174: <http://trac.webkit.org/changeset/208174>
Comment 6 Ryan Haddad 2016-10-31 14:34:54 PDT
*** Bug 164237 has been marked as a duplicate of this bug. ***
Comment 7 Carlos Garcia Campos 2016-11-01 03:20:03 PDT
Created attachment 293544 [details]
Patch

Updated patch to fix the assert. The abort is no longer cleaning up the resource loader when becoming a download.
Comment 8 Carlos Garcia Campos 2016-11-02 10:33:43 PDT
Alex, could you review this again please? this is causing network process to crash on every single download started by a web page in epiphany :-P
Comment 9 Alex Christensen 2016-11-02 11:27:38 PDT
http://trac.webkit.org/changeset/208293
Comment 11 Alex Christensen 2016-11-04 15:11:07 PDT
When we make a NetworkResourceLoader become a download, it needs to not be owned by the NetworkConnectionToWebProcess any more if we want it to stay alive when a NetworkConnectionToWebProcess is destroyed.  We need a new container for the NetworkResourceLoaders.
Comment 12 Carlos Garcia Campos 2016-11-05 04:42:45 PDT
Created attachment 293992 [details]
New patch

This is a new approach that ensures the NetworkResourceLoader is cleaned up and destroyed right after becoming a download, but the NetworkLoad keeps alive until the NetworkDataTask becomes a download, by using PendingDownload to adopt the NetworkLoad. I think this also makes the downloads life cycle easier to follow now, since we will always have a PendingDownload, and we no longer need didBecomeDownload() callbacks.
Comment 13 Alex Christensen 2016-11-07 14:12:51 PST
Comment on attachment 293992 [details]
New patch

View in context: https://bugs.webkit.org/attachment.cgi?id=293992&action=review

> Source/WebKit2/NetworkProcess/NetworkLoad.h:134
> -    NetworkLoadClient& m_client;
> +    NetworkLoadClient* m_client;

This should be a std::reference_wrapper, which is reassignable but never null.
Comment 14 Chris Dumez 2016-11-09 19:52:41 PST
Created attachment 294325 [details]
Patch
Comment 15 Chris Dumez 2016-11-09 19:53:55 PST
Committing this as I suspect this will fix the assertion being hit by the patch at Bug 164458.
Comment 16 WebKit Commit Bot 2016-11-09 20:52:14 PST
Comment on attachment 294325 [details]
Patch

Clearing flags on attachment: 294325

Committed r208521: <http://trac.webkit.org/changeset/208521>
Comment 17 WebKit Commit Bot 2016-11-09 20:52:19 PST
All reviewed patches have been landed.  Closing bug.
Comment 18 Carlos Garcia Campos 2016-11-10 00:29:46 PST
(In reply to comment #15)
> Committing this as I suspect this will fix the assertion being hit by the
> patch at Bug 164458.

Perfect, thank you!
Comment 19 Chris Dumez 2016-11-10 07:35:00 PST
Why was this reopened?
Comment 20 Carlos Garcia Campos 2016-11-10 07:43:51 PST
(In reply to comment #19)
> Why was this reopened?

History says it was me, but I really don't know how I did it, sorry.
Comment 21 Chris Dumez 2016-11-18 11:42:50 PST
(In reply to comment #20)
> (In reply to comment #19)
> > Why was this reopened?
> 
> History says it was me, but I really don't know how I did it, sorry.

From C++11 standard:
"""
Additionally, u can, upon request, transfer ownership to another unique pointer u2. Upon completion of such a transfer, the following postconditions hold:
— u2.p is equal to the pre-transfer u.p,
— u.p is equal to nullptr, and
"""

So my understanding is that the unique_ptr member is guaranteed to be null after you WTFMove() it out to another unique_ptr.

There's got to be something else going on here. Maybe the method is called on a non-main-thread and we're accessing m_networkLoad from 2 threads?