Bug 41813 - XHRs get unnecessary Cache-Control http header after reload
Summary: XHRs get unnecessary Cache-Control http header after reload
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC OS X 10.5
: P2 Normal
Assignee: Nate Chapin
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-07-07 16:59 PDT by Nate Chapin
Modified: 2010-07-13 04:32 PDT (History)
6 users (show)

See Also:


Attachments
Patch (6.74 KB, patch)
2010-07-07 17:08 PDT, Nate Chapin
fishd: review-
Details | Formatted Diff | Diff
patch #2 (9.56 KB, patch)
2010-07-09 14:41 PDT, Nate Chapin
fishd: review+
fishd: commit-queue-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Nate Chapin 2010-07-07 16:59:04 PDT
When a page is reloaded, it verifies all items in the cache with the server instead of just loading from cache if permitted.  The state that controls this action is not cleared when the load is completed (it is cleared during the next navigation), so any subsequent loads (e.g., XHRs) will be given a Cache-Control header, even if the reload "action" has completed.

I believe the state should be cleared after the load event rather than always waiting until the next navigation.

(This causes a lot of extra http requests that get 304 responses on sites like Facebook, where a bunch of static content is loaded via async XHR after the "page has loaded".  Initial report at http://code.google.com/p/chromium/issues/detail?id=7819)
Comment 1 Nate Chapin 2010-07-07 17:08:38 PDT
Created attachment 60807 [details]
Patch
Comment 2 Darin Fisher (:fishd, Google) 2010-07-07 23:13:06 PDT
Comment on attachment 60807 [details]
Patch

WebCore/loader/FrameLoader.cpp:2606
> +          m_loadType = FrameLoadTypeStandard;
I think we also want to reset the load type to standard after performing
a back/forward navigation.  Are there any cases when we wouldn't want to
reset it?

WebCore/loader/SubresourceLoader.cpp:96
> +      else if (!fl->activeDocumentLoader()->isLoadingInAPISense())
I would expect subresources to only care about the state of the documentLoader.
It seems wrong to care if there is a new provisional navigation starting.

Are both of these changes necessary?  It seems like the change to
SubresourceLoader.cpp should be sufficient.  However, resetting m_loadType
certainly seems reasonable to me.
Comment 3 Darin Fisher (:fishd, Google) 2010-07-07 23:13:50 PDT
Comment on attachment 60807 [details]
Patch

R- because of the activeDocumentLoader thing.
Comment 4 Alexey Proskuryakov 2010-07-08 10:01:39 PDT
This seems intentional. What's the difference between content loaded before onload and after onload? If a user initiates a reload, they want to refresh content, and technical details of whether it's loaded via HTML tags or XMLHttpRequest are irrelevant.

In fact, many people complain about the opposite behavior - we sometimes DON'T revalidate resources after reload, see bug 30862 and related.
Comment 5 Darin Fisher (:fishd, Google) 2010-07-08 10:47:28 PDT
(In reply to comment #4)
> This seems intentional.

This behavior differs from Mozilla and IE, and it means that WebKit based browsers generate nearly 3x as many network requests as those browsers on Facebook!  Any site that navigates using reference fragments (or history.pushState) to reuse the existing document will perform poorly at the network level after the user hits reload once on a particular "page".


> What's the difference between content loaded before 
> onload and after onload?

Content loaded during page load is related to the page load.  You can see this concept applied to other aspects of page loading: a subframe navigated during page load does not create a separate back/forward entry.

Consider back/forward navigations.  When navigating back in history, we apply a rule that allows stale content to be loaded from the cache.  However, if the user then interacts with the page, and the page creates an XHR, should that XHR also inherit the rule that causes it to load stale content?  It seems like that could render a web app unusable.

I think the load flags should be temporary for the duration of a page load (navigation).  After that, they should reset to the normal state, so that future interactions will behave as they do when a page is normally loaded. 


> If a user initiates a reload, they want to refresh 
> content, and technical details of whether it's loaded via HTML tags or 
> XMLHttpRequest are irrelevant.

Agreed.  Nate's patch treats all subresource loads equally.


> In fact, many people complain about the opposite behavior - we sometimes 
> DON'T  revalidate resources after reload, see bug 30862 and related.

Bug 30862 is not contradictory to this bug.  The script element created during parsing of the body should inherit the load flags used to fetch the document.  The load event should be deferred until that script element's load event handler runs, but during that event handler if another script element is created, then the document's load event will be further deferred, and the second script element should also inherit the reload flag.  Finally, when the second script element finishes loading, we can fire the load event to the page and reset the load flags.  No problem.
Comment 6 Alexey Proskuryakov 2010-07-08 11:15:42 PDT
Matching other browsers is of course a very strong argument. However, we've been long trying to apply our own thinking to refresh functionality, and didn't just blindly follow others here.

> Agreed.  Nate's patch treats all subresource loads equally.

It's only equal in a low level technical sense. The user doesn't know which requests are made before onload, and which are made after it. Actually, it's dynamic content that's most interesting to refresh for normal people, so preventing revalidation in that case sounds like a bad thing.
Comment 7 Darin Fisher (:fishd, Google) 2010-07-08 11:36:03 PDT
(In reply to comment #6)
> Matching other browsers is of course a very strong argument. However, we've 
> been long trying to apply our own thinking to refresh functionality, and didn't 
> just blindly follow others here.

3x the number of network requests on Facebook is not in the realm of reasonable, don't you agree?  I mean, it is really bad.  There are a lot of Facebook users.

I agree with the principle of doing what's best provided it is compatible with the web.  In this case, I think the choice is pretty clear.


> > Agreed.  Nate's patch treats all subresource loads equally.
> 
> It's only equal in a low level technical sense. The user doesn't know which 
> requests are made before onload, and which are made after it. Actually, it's 
> dynamic content that's most interesting to refresh for normal people, so 
> preventing revalidation in that case sounds like a bad thing.

I guess I was confused by your argument.  I agree that the user won't know the difference for requests made close to the time of the page load stopping.  However, after the page has been quiescent for a little while, I don't think the user should expect a previous reload to impact new clicks.

I think there is some burden on the web page developer to do the right thing when using XHR.  If we apply load flags to any request made before or during onload, then the web developer can make sure to issue subresource requests during that time so that a 'reload' would apply to those requests.  This will work in Firefox and IE (see bug 30862).  Web developers can similarly know that load flags will not apply to requests issued after onload.  So, they can plan accordingly.

I know there is still the case of uninformed web developers, and that is surely the majority case, but I don't know what else to do.  The current behavior is hurting more than it is helping.  I think that's clear.
Comment 8 Darin Fisher (:fishd, Google) 2010-07-08 11:38:05 PDT
(In reply to comment #7)
> (In reply to comment #6)
> > Matching other browsers is of course a very strong argument. However, we've 
> > been long trying to apply our own thinking to refresh functionality, and didn't 
> > just blindly follow others here.
> 
> 3x the number of network requests on Facebook is not in the realm of 
> reasonable, don't you agree?

In case you were wondering, this issue was reported by Facebook developers.
The 3x number comes from their network traffic logs.
Comment 9 Alexey Proskuryakov 2010-07-08 11:51:34 PDT
Is it known why users try to reload Facebook pages? Will this change make their lives worse?

> However, after the page has been quiescent for a little while, I don't think 
> the user should expect a previous reload to impact new clicks.

I hesitate to propose it, but this almost sounds like there should be a timeout of several seconds then.
Comment 10 Darin Fisher (:fishd, Google) 2010-07-08 12:45:40 PDT
(In reply to comment #9)
> Is it known why users try to reload Facebook pages? Will this change make their 
> lives worse?

I suspect people hit the reload button to reload the page they are looking at.  I doubt they do so to reload the page that they will look at next.  You know what I mean?  On facebook, there is only one real page, but to the user it looks like many pages.  The user probably only wants to reload a single conceptual page, which is what this patch will give them.  That's my intuition anyways.


> > However, after the page has been quiescent for a little while, I don't think 
> > the user should expect a previous reload to impact new clicks.
> 
> I hesitate to propose it, but this almost sounds like there should be a timeout 
> of several seconds then.

Let's start by having a hard cut off after onload, and fix bug 30862 as I proposed.  I bet that will address most user complaints.

Do you know of any other bugs related to this topic besides bug 30862?
Comment 11 Alexey Proskuryakov 2010-07-08 13:36:17 PDT
> I suspect people hit the reload button to reload the page they are looking at.  
> I doubt they do so to reload the page that they will look at next.

Well, this just means that we should reset the load type when fragment changes, doesn't it?

My question was a bit different though. What kind of content do people want to refresh on Facebook? It's possible that fixing this bug in this way will expose some other difference in behavior, and WebKit will be the only engine  that doesn't refresh this content. That concern is of course not limited to Facebook, but we could start testing with it if we knew what scenario we're talking about.

> Let's start by having a hard cut off after onload, and fix bug 30862 as I proposed.

Fixing 30862 would be of course very welcome. I'd love to hear what other folks CC'ed here think about your proposal about this bug.

> Do you know of any other bugs related to this topic besides bug 30862?

There are several bugs mentioned in comments there. Besides those, bug 38690 may be related (it might even be related to Facebook problem - do we know for sure if it's users actually hitting Refresh, and not some other reason that makes WebKit revalidate?)
Comment 12 Antti Koivisto 2010-07-08 14:11:16 PDT
I agree with Darin about both this and 30862. The current behavior is buggy and resetting the load type on onload is the appropriate fix.

I had noticed the problem earlier when working with the loader code but never tried to fix it as there wasn't known serious harm caused by it (reloads are assumed to be relatively rare) and the fix is a risky behavior change. Since it is now known to cause real pain, the fix should be tried out.
Comment 13 Darin Fisher (:fishd, Google) 2010-07-08 14:18:03 PDT
(In reply to comment #11)
> > I suspect people hit the reload button to reload the page they are looking at.  
> > I doubt they do so to reload the page that they will look at next.
> 
> Well, this just means that we should reset the load type when fragment changes, 
> doesn't it?

That is another choice.  It seems worse to me.  Again, a request issued after
page load completes seems conceptually divorced from page load.  Once the page
loads, the page should now act like a page that was loaded normally.


> My question was a bit different though. What kind of content do people want 
> to refresh on Facebook?

I don't know... maybe to see if people have posted additional comments?


> It's possible that fixing this bug in this way will
> expose some other difference in behavior, and WebKit will be the only 
> engine that doesn't refresh this content.

The patch here is trying to make WebKit be more like other browsers.


> That concern is of course not 
> limited to Facebook, but we could start testing with it if we knew what 
> scenario we're talking about.

The scenario is given in comment #0.  Use XHR to request a resource.


> > Do you know of any other bugs related to this topic besides bug 30862?
> 
> There are several bugs mentioned in comments there. Besides those, bug 38690 
> may be related (it might even be related to Facebook problem - do we know for
> sure if it's users actually hitting Refresh, and not some other reason that 
> makes WebKit revalidate?)

I don't know for sure.  It is a guess based on observing the behavior of
reload on Facebook pages.  I don't observe the problem on Facebook unless
I hit reload first.  I didn't exhaust all possible avenues to see if there
isn't a redirect following a form submission somewhere.  It is possible,
but given the popularity of the reload button, I'm tempted to blame that.
Comment 14 Alexey Proskuryakov 2010-07-08 14:40:52 PDT
I still see this change as likely to degrade user experience, but I won't argue with both you and Antti any more.

>> It's possible that fixing this bug in this way will
>> expose some other difference in behavior, and WebKit will be the only 
>> engine that doesn't refresh this content.
>
> The patch here is trying to make WebKit be more like other browsers.

It's very common that a patch that technically makes us more like other browsers actually breaks functionality if we're still insufficiently like them (or if there are browser specific code paths). All I'm asking for is to actually test (to a reasonable degree) the use case we're supposed to be improving here. Not in the terms of XMLHttpRequest or DOM events, but in the terms of still seeing new content on Facebook.
Comment 15 Darin Fisher (:fishd, Google) 2010-07-08 16:16:07 PDT
(In reply to comment #14)
> > The patch here is trying to make WebKit be more like other browsers.
> 
> It's very common that a patch that technically makes us more like other 
> browsers actually breaks functionality if we're still insufficiently like them 
> (or if there are browser specific code paths).

Yes, this is a good thing to be concerned about.


> All I'm asking for is to 
> actually test (to a reasonable degree) the use case we're supposed to be 
> improving here. Not in the terms of XMLHttpRequest or DOM events, but in the 
> terms of still seeing new content on Facebook.

Certainly.  When we have this fix in the dev channel of Chrome, I plan on
informing the Facebook engineers so they can also confirm the fix.
Comment 16 Nate Chapin 2010-07-09 14:41:34 PDT
Created attachment 61100 [details]
patch #2

This change merges the places that are setting cache policy based on DocumentLoader's original request into addExtraFieldsToRequest.  As it currently stands, those settings often get overridden anyway, and it makes it much harder to follow the logic of the cache policy.  Hopefully this is clearer.
Comment 17 Darin Fisher (:fishd, Google) 2010-07-09 17:08:01 PDT
Comment on attachment 61100 [details]
patch #2

LayoutTests/http/tests/xmlhttprequest/cache-headers-after-reload.html:25
 +            xhr.onload = finish;
nit: indentation

R=me
Comment 18 Darin Fisher (:fishd, Google) 2010-07-09 17:11:28 PDT
By the way, Alexey... it seems we were already not observing this bug for subresources that go through the WebCore cache.  Take a look at FrameLoader::subresourceCachePolicy.  It will return early if m_isComplete is true, and that is true once we fire onload.  So, Nate's patch has the effect of making subresources like XHR that do not go through the WebCore cache behave consistently with those that do with regards to reload behavior.

We seriously need to clean up this code.  It is really wonky that there are two systems determining cache policy for different subresources.
Comment 19 Nate Chapin 2010-07-12 12:25:27 PDT
http://trac.webkit.org/changeset/63100
Comment 20 Mario Sanchez Prada 2010-07-12 17:08:33 PDT
(In reply to comment #19)
> http://trac.webkit.org/changeset/63100

Not sure of the actual reason but it seems this patch is causing an API test failing in the GTK port:

http://build.webkit.org/builders/GTK%20Linux%2032-bit%20Release/builds/1381/steps/API%20tests/logs/stdio

I found this one as the guilty commit by using git bisect with latest code from trunk, but still don't know whether this test is failing because this test broke something or because it uncovered some other problem somwhere else.

See comment #37 on bug 38648 for more details, now it's time for me to go to bed.
Comment 21 Mario Sanchez Prada 2010-07-12 17:25:10 PDT
(In reply to comment #20)
> (In reply to comment #19)
> > http://trac.webkit.org/changeset/63100
> 
> Not sure of the actual reason but it seems this patch is causing an API test failing in the GTK port:
> 
> http://build.webkit.org/builders/GTK%20Linux%2032-bit%20Release/builds/1381/steps/API%20tests/logs/stdio
> 
> I found this one as the guilty commit by using git bisect with latest code from trunk, but still don't know whether this test is failing because this test broke something or because it uncovered some other problem somwhere else.
> 
> See comment #37 on bug 38648 for more details, now it's time for me to go to bed.

Hmmm.. looks like I was too slow, as Martin already filed a bug for this: 

https://bugs.webkit.org/show_bug.cgi?id=42114

Sorry for the spam
Comment 22 Mario Sanchez Prada 2010-07-13 04:32:01 PDT
(In reply to comment #20)
> (In reply to comment #19)
> > http://trac.webkit.org/changeset/63100
> 
> Not sure of the actual reason but it seems this patch is causing an API test 
> failing in the GTK port:
> 
> http://build.webkit.org/builders/GTK%20Linux%2032-bit%20Release/builds/1381/steps/API%20tests/logs/stdio
> 
> I found this one as the guilty commit by using git bisect with latest code from 
> trunk, but still don't know whether this test is failing because this test 
> broke something or because it uncovered some other problem somwhere else.

After taking another look on this (I'm feel really curious about this stuff), I found the problem in the WebKitGTK API test comes because of this one-line change:


diff --git a/WebCore/loader/FrameLoader.cpp b/WebCore/loader/FrameLoader.cpp
index 8611ac0..40beec7 100644
--- a/WebCore/loader/FrameLoader.cpp
+++ b/WebCore/loader/FrameLoader.cpp
@@ -2605,6 +2605,9 @@ String FrameLoader::userAgent(const KURL& url) const
 void FrameLoader::handledOnloadEvents()
 {
     m_client->dispatchDidHandleOnloadEvents();
+
+    m_loadType = FrameLoadTypeStandard;
+
 #if ENABLE(OFFLINE_WEB_APPLICATIONS)
     if (documentLoader())
         documentLoader()->applicationCacheHost()->stopDeferringEvents();


I'm not an expert on this matter so I can't tell why this is causing the following assertion in the webview unit test to fail:

ERROR:../../WebKit/gtk/tests/testwebview.c:269:do_test_webkit_web_view_adjustments: assertion failed (gtk_adjustment_get_value(adjustment) == 100.0): (0 == 100)


...but somehow that's causing the problem and I wonder, out of my ignorance about this part of the code, whether the handleOnloadEvents() function is the right place to reset the m_loadTypeValue. Maybe with this piece of information someone of you could help to understand why this is happening and therefore to fix bug 42114.

Thanks in advance.