Bug 150927 - Inconsistencies in main resource load delegates when loading from history
Summary: Inconsistencies in main resource load delegates when loading from history
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Page Loading (show other bugs)
Version: WebKit Local Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks: 142375
  Show dependency treegraph
 
Reported: 2015-11-05 05:14 PST by Carlos Garcia Campos
Modified: 2016-01-12 00:24 PST (History)
12 users (show)

See Also:


Attachments
Patch (6.06 KB, patch)
2015-11-05 05:23 PST, Carlos Garcia Campos
no flags Details | Formatted Diff | Diff
Updated patch (10.29 KB, patch)
2015-11-05 05:28 PST, Carlos Garcia Campos
buildbot: commit-queue-
Details | Formatted Diff | Diff
Archive of layout-test-results from ews104 for mac-mavericks-wk2 (767.03 KB, application/zip)
2015-11-05 06:00 PST, Build Bot
no flags Details
Archive of layout-test-results from ews100 for mac-mavericks (647.16 KB, application/zip)
2015-11-05 06:12 PST, Build Bot
no flags Details
Archive of layout-test-results from ews112 for mac-yosemite (779.05 KB, application/zip)
2015-11-05 06:19 PST, Build Bot
no flags Details
Updated patch (11.65 KB, patch)
2015-11-05 06:22 PST, Carlos Garcia Campos
mcatanzaro: review+
Details | Formatted Diff | Diff
Patch for landing (12.32 KB, patch)
2016-01-11 23:45 PST, Carlos Garcia Campos
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Carlos Garcia Campos 2015-11-05 05:14:19 PST
The order in which load delegates are called when loading a resource from history is different if the page cache is enabled or not. When the page cache is disabled, the order is the same to any other resource loaded from the network, but when the resource is loaded from the page cache, all the delegates are called after the load is committed. This has always been a problem for the GTK+ port (see bugs #91482 and #91478, for example), and I ended up trying to workaround the issue with messy code in WebKitWebView that is still causing us a lot of problems, see bug #142375. So, basically we are trying to ensure we always have a main resource with a response when the load is committed, that's something our API claims, and what applications expect. We have tried to solve this problem without affecting the WebCore behaviour, but it has caused more problems, and I think WebCore loader should be consistent in any case, so i think it's time to fix it in WebCore and remove all our hacks in GTK+ API layer.
Comment 1 Carlos Garcia Campos 2015-11-05 05:23:16 PST
Created attachment 264855 [details]
Patch
Comment 2 Carlos Garcia Campos 2015-11-05 05:28:27 PST
Created attachment 264856 [details]
Updated patch

I forgot to git add the -expected.txt
Comment 3 Build Bot 2015-11-05 06:00:05 PST
Comment on attachment 264856 [details]
Updated patch

Attachment 264856 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.webkit.org/results/385800

New failing tests:
loader/go-back-cached-main-resource.html
http/tests/loading/main-resource-delegates-on-back-navigation.html
Comment 4 Build Bot 2015-11-05 06:00:10 PST
Created attachment 264858 [details]
Archive of layout-test-results from ews104 for mac-mavericks-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews104  Port: mac-mavericks-wk2  Platform: Mac OS X 10.9.5
Comment 5 Carlos Garcia Campos 2015-11-05 06:08:59 PST
(In reply to comment #3)
> Comment on attachment 264856 [details]
> Updated patch
> 
> Attachment 264856 [details] did not pass mac-wk2-ews (mac-wk2):
> Output: http://webkit-queues.webkit.org/results/385800
> 
> New failing tests:
> loader/go-back-cached-main-resource.html
> http/tests/loading/main-resource-delegates-on-back-navigation.html

The problem seems to be that in mac comparing the responses to skip the main resource when sending the notifications for the subresources doesn't work. I guess there's something in platform data that is different. So, I need a different way to skip the main resource, it's probably safe to assume the first response is the main resource one, since they are appended.
Comment 6 Build Bot 2015-11-05 06:12:48 PST
Comment on attachment 264856 [details]
Updated patch

Attachment 264856 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.webkit.org/results/385839

New failing tests:
loader/go-back-cached-main-resource.html
Comment 7 Build Bot 2015-11-05 06:12:53 PST
Created attachment 264860 [details]
Archive of layout-test-results from ews100 for mac-mavericks

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews100  Port: mac-mavericks  Platform: Mac OS X 10.9.5
Comment 8 Build Bot 2015-11-05 06:18:58 PST
Comment on attachment 264856 [details]
Updated patch

Attachment 264856 [details] did not pass mac-debug-ews (mac):
Output: http://webkit-queues.webkit.org/results/385836

New failing tests:
loader/go-back-cached-main-resource.html
Comment 9 Build Bot 2015-11-05 06:19:02 PST
Created attachment 264861 [details]
Archive of layout-test-results from ews112 for mac-yosemite

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews112  Port: mac-yosemite  Platform: Mac OS X 10.10.5
Comment 10 Carlos Garcia Campos 2015-11-05 06:22:40 PST
Created attachment 264862 [details]
Updated patch

Try to fix the tests. loader/go-back-cached-main-resource.html also failed here, but the difference look like an improvement to me, so I've updated the expected.
Comment 11 Michael Catanzaro 2015-11-05 12:09:40 PST
Comment on attachment 264862 [details]
Updated patch

View in context: https://bugs.webkit.org/attachment.cgi?id=264862&action=review

Thanks Carlos; as you're well-aware, this has been a big problem for us....

> Source/WebCore/loader/FrameLoader.cpp:1872
> +        // Main resource delegates were already sent, se we skip the first response here.

(typo: se -> so)
Comment 12 Alexey Proskuryakov 2015-11-05 21:22:15 PST
Comment on attachment 264862 [details]
Updated patch

View in context: https://bugs.webkit.org/attachment.cgi?id=264862&action=review

> LayoutTests/ChangeLog:9
> +        Add test to check that load delegates are the same when loading
> +        from history with the page cache enabled and disabled.

Why is this right? There is no loading when restoring a page from page cache, so it's not clear to me why delegates should be the same.

Also, the behavior is different when restoring in that all state is preserved.
Comment 13 Carlos Garcia Campos 2015-11-05 22:46:42 PST
(In reply to comment #12)
> Comment on attachment 264862 [details]
> Updated patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=264862&action=review
> 
> > LayoutTests/ChangeLog:9
> > +        Add test to check that load delegates are the same when loading
> > +        from history with the page cache enabled and disabled.
> 
> Why is this right? There is no loading when restoring a page from page
> cache, so it's not clear to me why delegates should be the same.
> 
> Also, the behavior is different when restoring in that all state is
> preserved.

Also when restoring from page cache there isn't any response form the server nor data received and we call those delegates to notify the API layer about it. So, if we do call the delegates why not being consistent? It doesn't make much sense to me that the load is committed but there's no main resource loading at all. Ideally, from the API layer point of view, restoring from the cache should be transparent, like it is when loading from the disk cache, for example.
Comment 14 Anders Carlsson 2015-11-06 09:57:46 PST
I kinda agree with Alexey here.
Comment 15 Michael Catanzaro 2015-11-06 11:34:39 PST
(In reply to comment #12)
> > LayoutTests/ChangeLog:9
> > +        Add test to check that load delegates are the same when loading
> > +        from history with the page cache enabled and disabled.
> 
> Why is this right? There is no loading when restoring a page from page
> cache, so it's not clear to me why delegates should be the same.

Applications get very, very confused when load events happen out of order. To make it easier to write applications, we have an API guarantee that events will only occur in the order STARTED -> REDIRECTED -> COMMITTED -> FINISHED. But we currently implement this guarantee with hacks in the GTK+ WebKitWebView object, to delay the emission of WebCore events when they arrive out of order when loading from the page cache. So when loading from the page cache, the load state exposed by our API is different than the load state exposed by WebCore. We've found this leads to $BUGS.

For example, after COMMITTED apps expect to be able to get TLS certificate info, but when loading from the page cache, we transition to the COMMITTED state for the load after WebCore has already transitioned to STARTED for the next load, and the TLS certificate info is NULL when we've guaranteed it not to be because it got cleared when WebCore transitioned to STARTED. It's going to be difficult for us to fix all such issues when discovered. I'm actually surprised that you're not having similar bugs.
Comment 16 Carlos Garcia Campos 2015-11-07 00:15:07 PST
(In reply to comment #14)
> I kinda agree with Alexey here.

Could you elaborate? I also agree with ap that there's no actual load when restoring from the page cache, but we are calling the load delegates in that case already. So, the only change here is that they are called in the same order as when the page cache is disabled. I think this has a minimum impact (if any) in other ports, but a huge impact in GTK+. We could get rid of several hacks that have been causing issues and crashes since I added them.
Comment 17 Michael Catanzaro 2015-11-12 10:56:14 PST
I talked to Alexey about this yesterday. I believe his concern is that reordering the load delegates has the potential to break WK1 applications.

We might need some setting to choose between the different behaviors?
Comment 18 Alexey Proskuryakov 2015-11-12 11:00:34 PST
Another thing that we discussed was that it sounds like an actual bug if SSL certificate is forgotten in UI process before the next loads commits (which was what prompted this investigation).
Comment 19 Carlos Garcia Campos 2015-11-12 11:05:21 PST
(In reply to comment #17)
> I talked to Alexey about this yesterday. I believe his concern is that
> reordering the load delegates has the potential to break WK1 applications.
> 
> We might need some setting to choose between the different behaviors?

The order of the delegates is the same, set id, did receive response, data, finish. The difference is that set id and did receive response happens before the frame load is committed, like for all others loads. Do wk1 applications have special code to deal with history loads when page cache is enabled?
Comment 20 Carlos Garcia Campos 2015-11-12 11:07:49 PST
(In reply to comment #18)
> Another thing that we discussed was that it sounds like an actual bug if SSL
> certificate is forgotten in UI process before the next loads commits (which
> was what prompted this investigation).

The actual problem is that we don't have a main resource when load is committed. It's not that the certificate is lost, it's that we use the response of the main resource to get the certificate info, and we say in our docs that when the load is committed is the moment to ask for the certificate. Because for a load to be committed we need to have a main resource load and response from the server. That's true in all the cases expect for history loads with page cache enabled.
Comment 21 Carlos Garcia Campos 2015-11-12 11:09:28 PST
In any case we can use platform ifdefs and make it GTK only to ensure we don't break any other port. I don't like adding platform ifdefs to the loading code, but it's already full of IOS ifdefs anyway.
Comment 22 Carlos Garcia Campos 2015-11-17 02:29:56 PST
Would you agree with adding platform ifdefs for this for now? I personally don't like it, but I don't see any other way of unblocking this.
Comment 23 Alexey Proskuryakov 2015-11-17 11:08:46 PST
It's still my understanding that the delivery order is wrong, but red herring, and that the real issue is that some level of code doesn't understand what provisional load vs. committed load means. Sorry, I will not have the time to look into this deeper.
Comment 24 Michael Catanzaro 2015-11-17 13:48:48 PST
I just found bug #59111 via the FIXME in WebPageProxy::didCommitLoadForFrame... it is a bit discouraging.

(In reply to comment #23)
> It's still my understanding that the delivery order is wrong, but red
> herring, and that the real issue is that some level of code doesn't
> understand what provisional load vs. committed load means. Sorry, I will not
> have the time to look into this deeper.

To be clear, I was completely wrong; delivery order is not the problem, I only thought it was because I debugged this issue half a year ago and forgot why we had delayed load events. It's my fault for confusing you. The only problem here is that the page cache code sends didCommitLoad too soon, before a main resource is available, like Carlos says.
Comment 25 Carlos Garcia Campos 2015-11-21 02:52:45 PST
(In reply to comment #23)
> It's still my understanding that the delivery order is wrong, but red
> herring, and that the real issue is that some level of code doesn't
> understand what provisional load vs. committed load means. Sorry, I will not
> have the time to look into this deeper.

I'll try to start from scratch again, please forget about the TLS certificate issues, because that's a problem of our workaround in the UI process to solve this issue, but not the problem itself.

When a load starts the way load delegates are called is usually the following one:

1 - didStartProvisionalLoadForFrame
2 - willSendRequest (main resource)
3 - didReceiveResponse (main resource)
4 - didCommitLoadForFrame
didReceiveData (main resource)
other resource start loading as well
didFinishLoading (main resource)
didFinishLoadForFrame

I stopped using numbers after step 4 because the order can change from that step on, but not before, because the load is never committed until the main resource has started loading and a response has received from the server. At that point the load is no longer provisional and it's committed. Because of this fact, our public API claims that you can call webkit_web_view_get_main_resource() once the load has been committed and you will get an object, and not only that but also that you can call webkit_web_resource_get_response() to get the response. What triggered this investigation (see bugs #91482 and #91478) was that we were getting null pointers in same cases. Then we realized that it was only when loading from history, and I tried to fix it on our side instead of fixing the real issue as I'm doing now. My attempts to workaround the issue ended up being hack after hack, that never actually fixed the issue, but made the code a lot more complex and fragile. 

What happens when loading from the history cache *and the page cache is enabled*, is the following:

1 - didStartProvisionalLoadForFrame
2 - didCommitLoadForFrame
3 - willSendRequest (main resource)
4 - didReceiveResponse (main resource)
5 - didReceiveData (main resource)
6 - didFinishLoading (main resource)
other resource start loading as well
didFinishLoadForFrame

So, when the load is committed there's no main resource from the API point of view and of course there isn't a response either. When restoring from page cache, there's no actual load, but we still call all the load delegates, because this should be transparent from the client point of view, but it's not actually transparent, because of these inconsistencies. So, what this patch does is ensuring that willSendRequest and didReceiveResponse for the main resource are called before didCommitLoadForFrame. I don't see how this can break existing apps, unless there are applications having different ways to handle load delegates when the page is loaded from the page cache.

I don't know what else I can do to unblock this. The safest way as I said would be add platform ifdefs, but I wouldn't like it at all. We won't know if this actually breaks anything until we try, and we can just roll it out in such case.
Comment 26 Carlos Garcia Campos 2015-12-03 01:45:24 PST
So, since ap doesn't have time to look at this in detail, could any other reviewer review this patch, please?
Comment 27 Carlos Garcia Campos 2015-12-07 02:20:39 PST
I have attached to bug #142375 the patch we could land after this one, to fix the crash and clean up all the hacks added to workaround this.
Comment 28 Michael Catanzaro 2016-01-02 11:20:28 PST
Comment on attachment 264862 [details]
Updated patch

View in context: https://bugs.webkit.org/attachment.cgi?id=264862&action=review

No further objections -> r=me

As a bonus, I think this might fix bug #59111; someone with a Mac ought to check.

> LayoutTests/http/tests/loading/main-resource-delegates-on-back-navigation.html:7
> +//window.jsTestIsAsync = true;

Don't forget to fix these stale comments.

>> Source/WebCore/loader/FrameLoader.cpp:1872
>> +        // Main resource delegates were already sent, se we skip the first response here.
> 
> (typo: se -> so)

Reminder: se -> so
Comment 29 Carlos Garcia Campos 2016-01-11 23:45:01 PST
Created attachment 268748 [details]
Patch for landing

Agreed in IRC with ap to land this with a rephrased ChangeLog.
Comment 30 Carlos Garcia Campos 2016-01-12 00:24:01 PST
Committed r194888: <http://trac.webkit.org/changeset/194888>