Bug 183479 - http/tests/workers/service/service-worker-download.https.html times out with async policy delegates
Summary: http/tests/workers/service/service-worker-download.https.html times out with ...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Page Loading (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Ali Juma
URL:
Keywords: InRadar
Depends on:
Blocks: 180568
  Show dependency treegraph
 
Reported: 2018-03-08 14:03 PST by Ali Juma
Modified: 2018-03-12 17:25 PDT (History)
10 users (show)

See Also:


Attachments
Patch (5.15 KB, patch)
2018-03-09 14:10 PST, Ali Juma
no flags Details | Formatted Diff | Diff
Patch (5.46 KB, patch)
2018-03-09 15:33 PST, Ali Juma
no flags Details | Formatted Diff | Diff
Patch (5.90 KB, patch)
2018-03-12 07:44 PDT, Ali Juma
no flags Details | Formatted Diff | Diff
Patch (6.06 KB, patch)
2018-03-12 08:30 PDT, Ali Juma
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ews105 for mac-sierra-wk2 (2.56 MB, application/zip)
2018-03-12 09:40 PDT, EWS Watchlist
no flags Details
Archive of layout-test-results from ews123 for ios-simulator-wk2 (2.19 MB, application/zip)
2018-03-12 10:06 PDT, EWS Watchlist
no flags Details
Patch (6.35 KB, patch)
2018-03-12 12:12 PDT, Ali Juma
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Ali Juma 2018-03-08 14:03:47 PST
This times out because data for the iframe load is received before the response policy decision is received. The problem is that we try to commit the load, which cancels the in-progress policy check, which cancels the load. So the iframe never finishes loading, and the test hangs waiting for the iframe onload event.

More details:
ServiceWorkerClientFetch::didReceiveData happens after FrameLoader::checkContentPolicy but before we get a response to that check. So then we get to DocumentLoader::commitLoad, which leads to FrameLoader::transitionToCommitted, which leads to PolicyChecker::stopCheck in FrameLoader::stopLoading, which causes DocumentLoader::continueAfterContentPolicy to get called with decision Policy::Ignore. This leads to the frame loader getting destroyed without ever getting to dispatch a load event.

We need to wait for a policy decision before trying to call DocumentLoader::commitLoad.

Is there an obvious place for DocumentLoader::dataReceived to stash away the data (rather than immediately call DocumentLoader::commitLoad) so that it can be used later when we receive the policy response? Or would it make more sense to somehow defer handling the ServiceWorkerClientFetch::didReceiveData IPC until we're ready for it (though I don't think there's anything SW-specific about this bug, other than maybe the load happens to complete fast enough to beat the policy response).
Comment 1 Ali Juma 2018-03-09 10:59:50 PST
(In reply to Ali Juma from comment #0)
> (though I don't think there's anything
> SW-specific about this bug, other than maybe the load happens to complete
> fast enough to beat the policy response).

Actually, this *does* seem to be SW-specific. The non-SW version of this bug was fixed in bug 182720, where the NetworkResourceLoader::ContinueWillSendRequest message is now delayed until a policy response is received. So it looks like we need an equivalent way to delay receiving data from ServiceWorkers.
Comment 2 youenn fablet 2018-03-09 12:23:39 PST
It might be a dupe of bug 183110.
Comment 3 Ali Juma 2018-03-09 14:10:55 PST
Created attachment 335460 [details]
Patch
Comment 4 Ali Juma 2018-03-09 14:12:34 PST
(In reply to youenn fablet from comment #2)
> It might be a dupe of bug 183110.

Thanks for the pointer! It's the same underlying issue, but looks like there's a race in the setting/reading of m_isCheckingResponse from that bug.
Comment 5 youenn fablet 2018-03-09 14:21:06 PST
(In reply to Ali Juma from comment #4)
> (In reply to youenn fablet from comment #2)
> > It might be a dupe of bug 183110.
> 
> Thanks for the pointer! It's the same underlying issue, but looks like
> there's a race in the setting/reading of m_isCheckingResponse from that bug.

There should be no race since all that code is run on the main thread.
I agree this code is misleading, see bug 181392 for more information on why we have these calls.
Ideally, in a not too far future, we will be able to remove them.

We could document this in the code or maybe add some ASSERT(isMainThread()) in case we want to start receiving data out of the main thread.

Test looks good.
Comment 6 Ali Juma 2018-03-09 14:51:43 PST
(In reply to youenn fablet from comment #5)
> (In reply to Ali Juma from comment #4)
> > (In reply to youenn fablet from comment #2)
> > > It might be a dupe of bug 183110.
> > 
> > Thanks for the pointer! It's the same underlying issue, but looks like
> > there's a race in the setting/reading of m_isCheckingResponse from that bug.
> 
> There should be no race since all that code is run on the main thread.
> I agree this code is misleading, see bug 181392 for more information on why
> we have these calls.

Ah, so it's not a race between threads, but rather than the didReceiveResponse and didReceiveData IPCs are sent almost one after the other in ServiceWorkerFetch::processRepsonse.

So when ServiceWorkerClientFetch::didReceiveResponse is executing, it's possible that the task for handling the didReceiveData message has already been added to the runloop, so the code that gets scheduled on the runloop (via callOnMainThread) in didReceiveResponse will execute *after* didReceiveData executes, and that's why m_isCheckingResponse is still false in didReceiveData.

Moving the check of m_isCheckingResponse into the code that's inside callOnMainThread in didReceiveData gives us the correct ordering, since that code will run *after* the code that's inside callOnMainThread in didReceiveResponse.
Comment 7 Ali Juma 2018-03-09 15:33:06 PST
Created attachment 335478 [details]
Patch

Update changelog description of the race
Comment 8 youenn fablet 2018-03-09 16:09:20 PST
Oh I see, it might be better to set m_isCheckingResponse to true in ServiceWorkerClientFetch::didReceiveResponse before calling callOnMainThread then.
Comment 9 Ali Juma 2018-03-12 07:44:35 PDT
Created attachment 335593 [details]
Patch
Comment 10 Ali Juma 2018-03-12 07:58:54 PDT
Comment on attachment 335593 [details]
Patch

Need to figure out why this version makes imported/w3c/web-platform-tests/service-workers/service-worker/redirected-response.https.html timeout
Comment 11 Ali Juma 2018-03-12 08:30:03 PDT
Created attachment 335597 [details]
Patch
Comment 12 Ali Juma 2018-03-12 08:32:56 PDT
(In reply to Ali Juma from comment #10)
> Comment on attachment 335593 [details]
> Patch
> 
> Need to figure out why this version makes
> imported/w3c/web-platform-tests/service-workers/service-worker/redirected-
> response.https.html timeout

Figured it out -- when clearing m_isCheckingResponse in the code posted to the runloop by didReceiveResponse, we also need to call continueLoadingAfterCheckingResponse to deal with any calls that earlied-out while m_isCheckingResponse was true.
Comment 13 youenn fablet 2018-03-12 09:04:23 PDT
Comment on attachment 335597 [details]
Patch

r=me.

To handle the early exit, you can probably use makeScopeExit and release the scope in case it uses the regular path.
See for instance RegistrationDatabase::openSQLiteDatabase.
Comment 14 EWS Watchlist 2018-03-12 09:40:37 PDT
Comment on attachment 335597 [details]
Patch

Attachment 335597 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.webkit.org/results/6916049

New failing tests:
imported/w3c/web-platform-tests/service-workers/service-worker/redirected-response.https.html
Comment 15 EWS Watchlist 2018-03-12 09:40:38 PDT
Created attachment 335598 [details]
Archive of layout-test-results from ews105 for mac-sierra-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews105  Port: mac-sierra-wk2  Platform: Mac OS X 10.12.6
Comment 16 EWS Watchlist 2018-03-12 10:06:17 PDT
Comment on attachment 335597 [details]
Patch

Attachment 335597 [details] did not pass ios-sim-ews (ios-simulator-wk2):
Output: http://webkit-queues.webkit.org/results/6916071

New failing tests:
imported/w3c/web-platform-tests/service-workers/service-worker/redirected-response.https.html
Comment 17 EWS Watchlist 2018-03-12 10:06:19 PDT
Created attachment 335602 [details]
Archive of layout-test-results from ews123 for ios-simulator-wk2

The attached test failures were seen while running run-webkit-tests on the ios-sim-ews.
Bot: ews123  Port: ios-simulator-wk2  Platform: Mac OS X 10.12.6
Comment 18 youenn fablet 2018-03-12 11:07:40 PDT
Comment on attachment 335597 [details]
Patch


View in context: https://bugs.webkit.org/attachment.cgi?id=335597&action=review

> Source/WebKit/WebProcess/Storage/ServiceWorkerClientFetch.cpp:107
> +            continueLoadingAfterCheckingResponse();

continueLoadingAfterCheckingResponse is a no-op in that case.

> Source/WebKit/WebProcess/Storage/ServiceWorkerClientFetch.cpp:113
> +            continueLoadingAfterCheckingResponse();

We probably do not want to call continueLoadingAfterCheckingResponse() since we will fail the loader just after.

> Source/WebKit/WebProcess/Storage/ServiceWorkerClientFetch.cpp:124
> +            continueLoadingAfterCheckingResponse();

In that case, we probably want to set m_isCheckingResponse back to false but we do not need to call continueLoadingAfterCheckingResponse().
That will ensure that when following the redirection and getting the error, we will report the error.
Comment 19 Ali Juma 2018-03-12 12:12:34 PDT
Created attachment 335621 [details]
Patch
Comment 20 Ali Juma 2018-03-12 12:22:22 PDT
Comment on attachment 335597 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=335597&action=review

>> Source/WebKit/WebProcess/Storage/ServiceWorkerClientFetch.cpp:107
>> +            continueLoadingAfterCheckingResponse();
> 
> continueLoadingAfterCheckingResponse is a no-op in that case.

Removed.

>> Source/WebKit/WebProcess/Storage/ServiceWorkerClientFetch.cpp:113
>> +            continueLoadingAfterCheckingResponse();
> 
> We probably do not want to call continueLoadingAfterCheckingResponse() since we will fail the loader just after.

Removed.

>> Source/WebKit/WebProcess/Storage/ServiceWorkerClientFetch.cpp:124
>> +            continueLoadingAfterCheckingResponse();
> 
> In that case, we probably want to set m_isCheckingResponse back to false but we do not need to call continueLoadingAfterCheckingResponse().
> That will ensure that when following the redirection and getting the error, we will report the error.

The reason we need continueLoadAfterCheckingResponse here is that m_redirectionStatus is set to RedirectionStatus::Receiving below, and only a call to didFinish can advance that to Received, so if we already earlied-out a call to didFinish we need to make sure it gets called again. There's another related issue with redirects: we need to reset the value of m_didFinish in start(), so that a didFinish that happens before the redirection doesn't confuse a future call to continueLoadingAfterCheckingResponse following the redirection.
Comment 21 Chris Dumez 2018-03-12 15:32:33 PDT
Is this ready to land?
Comment 22 WebKit Commit Bot 2018-03-12 15:59:00 PDT
Comment on attachment 335621 [details]
Patch

Clearing flags on attachment: 335621

Committed r229558: <https://trac.webkit.org/changeset/229558>
Comment 23 WebKit Commit Bot 2018-03-12 15:59:01 PDT
All reviewed patches have been landed.  Closing bug.
Comment 24 Radar WebKit Bug Importer 2018-03-12 17:25:23 PDT
<rdar://problem/38397893>