Bug 30862 - Dynamically inserted subresources aren't revalidated even when the containing document is reloaded
Summary: Dynamically inserted subresources aren't revalidated even when the containing...
Status: REOPENED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Page Loading
Version: 528+ (Nightly build)
Hardware: Mac OS X 10.5
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords: InRadar
Duplicates: 35883 43664 97044
Depends on:
Blocks:
 
Reported: 2009-10-28 08:39 PDT by Martin Wittemann
Modified: 2022-11-17 14:04 PST
CC List: 26 users

See Also:


Attachments
code to reproduce the bug (3.18 KB, application/zip)
2009-10-28 08:40 PDT, Martin Wittemann
Modified test case, load scripts dynamically next turn after 'load' event with 1000ms delay (1018 bytes, application/zip)
2012-01-27 09:11 PST, johnjbarton

Description Martin Wittemann 2009-10-28 08:39:36 PDT
Dynamically inserted script tags do not work well with the caches. In fact, as the attached example shows, a modified script keeps being served stale from the cache.

Steps to reproduce:
1. Open the Bug.html file
  - Check the log: should print "bug1", "bug2"
2. Change bug2.js
  - set the console message to "bug22"
3. Reload the Bug.html file
  - check the log again: should print "bug1", "bug22"
  - BUT prints: "bug1", "bug2"!
Comment 1 Martin Wittemann 2009-10-28 08:40:30 PDT
Created attachment 42030 [details]
code to reproduce the bug
Comment 2 Alexey Proskuryakov 2009-10-28 08:52:38 PDT
Confirmed with shipping Safari/WebKit 4.0.3 and with a local build of r50046.
Comment 3 Alexey Proskuryakov 2009-10-28 11:38:05 PDT
We should remember that the page was created by reloading, and revalidate subresources that are added dynamically, not just those requested by HTML parser. I'm surprised this doesn't work.
Comment 4 Alexey Proskuryakov 2009-10-30 13:52:33 PDT
I said that this should work, but it turns out that Firefox has the same behavior in this respect. The attached test case doesn't fail there for a different reason, but if you move test execution into an onload handler, then subresources won't be revalidated. It's just the timing of when requests are started that makes the behavior look different.

So, both Safari and Firefox forget that the document was a reload once load completes, and any resources that are added dynamically later aren't revalidated (unless they have appropriate Cache-Control headers, of course).
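The policy described in this comment (and restated in comment 15) can be modeled in a few lines. This is a hypothetical Python sketch, not WebKit or Firefox source: the `Document` class and the policy strings are invented for illustration, and which of the report's two scripts is inserted after 'load' is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical model of the behavior described above; not WebKit or Firefox source code.
USE_CACHE = "use-cache"    # reuse a cached copy if its headers say it is still fresh
REVALIDATE = "revalidate"  # send a conditional request (e.g. If-Modified-Since)


@dataclass
class Document:
    created_by_reload: bool
    load_event_fired: bool = False
    log: List[Tuple[str, str]] = field(default_factory=list)

    def fire_load_event(self) -> None:
        # Once 'load' fires, the document no longer remembers it came from a reload.
        self.load_event_fired = True

    def request_subresource(self, url: str) -> str:
        # Only requests started before 'load' inherit the reload's validation policy.
        if self.created_by_reload and not self.load_event_fired:
            policy = REVALIDATE
        else:
            policy = USE_CACHE
        self.log.append((url, policy))
        return policy


# Step 3 of the report, in the variant where bug2.js is inserted from an onload handler:
doc = Document(created_by_reload=True)
doc.request_subresource("bug1.js")  # requested while the page is still loading
doc.fire_load_event()
doc.request_subresource("bug2.js")  # inserted after 'load', so the stale copy is reused
```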

A workaround is to use the Empty Caches command from the Safari menu. I've changed the bug summary to reflect that there is a workaround that doesn't involve restarting Safari.
Comment 5 Alexey Proskuryakov 2009-11-17 13:09:47 PST
There are 30 votes on this bug. Given that reloading works in the same way as it does in Firefox, what exactly do you want us to change in WebKit?
Comment 6 Martin Wittemann 2009-11-17 22:55:12 PST
Well, I would be satisfied if it worked as it does in Firefox. Firefox detects the changes in the dynamically loaded sources and reloads them without needing to empty the cache, at least from my point of view as a web developer (see the differing behavior in the posted example).
Comment 7 Alexey Proskuryakov 2009-12-14 10:26:20 PST
See also: bug 32423.
Comment 8 Simon Fraser (smfr) 2010-02-04 15:03:41 PST
<rdar://problem/7614047>
Comment 9 Alexey Proskuryakov 2010-03-09 15:09:39 PST
See also: bug 35883.
Comment 10 Alexey Proskuryakov 2010-08-07 00:49:35 PDT
*** Bug 43664 has been marked as a duplicate of this bug. ***
Comment 11 Alexey Proskuryakov 2010-08-07 00:51:25 PDT
CC'ing Darin Fisher, who (I think) knows what the actual difference with Firefox is.
Comment 12 Kyle Simpson 2010-08-07 10:46:41 PDT
Alexey-
As noted in #32423 (which I filed), *that* bug is the opposite of this one. 32423 is about the cache not being used in a circumstance where it *should* be used. This bug is about the cache being used in a case where it *shouldn't* be.

As for this bug as well as #35883 (which I filed), I think part of the confusion is that of how and when a resource is "marked" as needing to be reloaded on the next page-refresh.

If the resource *starts* loading before onload occurs, but finishes loading *after* the page-onload fires, is it marked as needing re-validation for next shift+refresh? If the resource starts *and* completes loading *before* the page-onload, is it marked? If the resource is requested *after* page-onload and completes some time later, is it marked?

If a resource is marked as needing re-validation based on when it is loaded or completes loading, it leads to these race conditions.

But if instead a resource is not marked *until the time of reload* (meaning all currently loaded resources in a page are marked when you hit refresh), then it shouldn't matter how those resources got to the page originally, right?

If some get there through normal static load, and some get there on-demand/dynamically later, in all cases, don't you *always* want to revalidate all current page resources when the page is refreshed?
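A hypothetical Python sketch of the alternative proposed in this comment: the decision about what to revalidate is made when reload is pressed, from the set of subresources currently in the page. `Page` and its methods are invented names, and this is not a claim about how any browser works.

```python
from typing import List


class Page:
    """Hypothetical model of comment 12's proposal (not how browsers actually work)."""

    def __init__(self) -> None:
        # Every subresource currently in the page, regardless of how it got there.
        self.loaded_urls: List[str] = []

    def load(self, url: str) -> None:
        if url not in self.loaded_urls:
            self.loaded_urls.append(url)

    def urls_to_revalidate_on_reload(self) -> List[str]:
        # Decided at reload time, so load timing relative to 'onload' is irrelevant.
        return list(self.loaded_urls)


page = Page()
page.load("bug1.js")  # loaded during the initial page load
page.load("bug2.js")  # loaded on demand, long after onload
```

Comment 15 explains why WebKit does not work this way: the browser does not carry the set of loaded URLs across a reload, and the reloaded document may request a different set.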
Comment 13 Kyle Simpson 2010-08-07 10:49:43 PDT
*** Bug 35883 has been marked as a duplicate of this bug. ***
Comment 14 Kyle Simpson 2011-02-02 12:48:12 PST
Just want to ping this ticket and confirm that it is indeed not yet fixed. Just tested in a WebKit nightly.
Comment 15 Darin Fisher (:fishd, Google) 2011-02-02 13:02:30 PST
(In reply to comment #12)
> But if instead a resource is not marked *until the time of reload* (meaning all currently loaded resources in a page are marked when you hit refresh), then it shouldn't matter how those resources got to the page originally, right?

I understand why you might want reload to work that way, but that's not how things work.  The browser does not remember the set of URLs loaded in a page upon reload.  Instead, reload just means go refetch the document, and for any subresource requested by the document *before onload*, apply a similar cache validation policy.

You see, the set of subresources requested by a document can change when you reload the document.  It would be odd to only apply the validation logic to the set of subresources previously requested, which may not fully cover the set of subresources newly requested.

I think this bug report is INVALID.  WebKit is functioning as intended.
Comment 16 Kyle Simpson 2011-02-02 13:15:36 PST
With the increasing use of dynamic resource loaders, it's becoming more and more common for resources to be lazy-loaded/loaded on demand well after the page finishes its initial load. The effect is that there are a lot of resources on a page which cannot be re-requested with a shift+refresh, which is how web browsers have always worked before.

It's an extremely common problem in my development that I'm working on changes to a single JavaScript file, but that file is being dynamically loaded to the page, and so my only way to get that resource to be re-requested from the server is to manually clear the cache (instead of a shift+reload).

I'm sorry, but I've got to vehemently disagree that this should be the intentional design of the browser. Maybe it's not a "bug" in the true sense of the term, but it's certainly quite unexpected for people to shift+reload a page and only some of the resources are re-validated, while others lately sit in cache and never get re-requested.
Comment 17 Simon Fraser (smfr) 2011-02-02 13:18:24 PST
I agree that the current behavior is hugely frustrating for web developers.
Comment 18 Dan Dean 2011-02-02 13:24:02 PST
I also agree that resources like this should be reloaded.
Comment 19 Brady Eidson 2011-02-02 13:58:05 PST
(In reply to comment #16)
> I'm sorry, but I've got to vehemently disagree that this should be the intentional design of the browser. Maybe it's not a "bug" in the true sense of the term, but it's certainly quite unexpected for people to shift+reload a page and only some of the resources are re-validated, while others lately sit in cache and never get re-requested.

(In reply to comment #17)
> I agree that the current behavior is hugely frustrating for web developers.

(In reply to comment #18)
> I also agree that resources like this should be reloaded.

To be quite blunt, web browser caching is about the USER experience, not the web developer experience.  The impact on web browser users here is an order of magnitude greater than that of web developers.

I'm not saying web developers aren't important - of course they are.  

But destroying a user-centric optimization to hack in something web developers want seems wrong to me.
Comment 20 Mathias Bynens 2011-02-02 14:01:08 PST
(In reply to comment #19)
> To be quite blunt, web browser caching is about the USER experience, not the web developer experience.  The impact on web browser users here is an order of magnitude greater than that of web developers.
> 
> I'm not saying web developers aren't important - of course they are.  
> 
> But destroying a user-centric optimization to hack in something web developers want seems wrong to me.

How about differentiating between normal refreshes (⌘ + R) and hard refreshes (⌘ + ⇧ + R) then? I don’t think regular web browser users use hard refresh all that much.
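One way to read this proposal, as a hypothetical Python sketch: a normal reload keeps the current rule, while a hard reload also revalidates resources requested after 'load'. The `Reload` enum and `subresource_policy` function are invented, and the hard-refresh rule is the proposal under discussion, not existing behavior.

```python
from enum import Enum


class Reload(Enum):
    NONE = "none"      # ordinary navigation, not a reload
    NORMAL = "normal"  # Cmd+R
    HARD = "hard"      # Cmd+Shift+R


def subresource_policy(reload: Reload, load_event_fired: bool) -> str:
    """Hypothetical policy for the normal/hard refresh split proposed in comment 20."""
    if reload is Reload.HARD:
        # Proposed: a hard refresh also covers resources requested after 'load'.
        return "revalidate"
    if reload is Reload.NORMAL and not load_event_fired:
        # Current behavior: only pre-'load' requests inherit the reload policy.
        return "revalidate"
    return "use-cache"
```

In this model a hard refresh makes revalidation last for the lifetime of the page, which is the stickiness concern raised in comments 30 and 35.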
Comment 21 Alexander Romanovich 2011-02-02 14:11:14 PST
While I understand (and share) the frustration among web developers, the problem is that lazy-loaded resources don't really have anything to do with a reload of the main resource. A super-refresh solution would probably solve the issue, but it's still a strange logic.

There are JS workarounds which use query strings on the subresources when necessary to force refreshes. But it would be useful if JS executing on the page could detect that a reload was performed, and be able to lazy-load the resources with an extra header (rather than a query string) to force them to refresh.
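The query-string workaround mentioned here can be written with the Python standard library as below; `with_cache_buster` and the `_` parameter name are illustrative choices, not part of any loader's API. Because each new token yields a URL the cache has never seen, the browser performs a full download rather than a conditional revalidation, which is the drawback raised in comment 39.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_cache_buster(url: str, token: str, param: str = "_") -> str:
    """Append (or replace) a cache-busting query parameter, keeping the rest of the query."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != param]
    query.append((param, token))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Using a build or release identifier as the token, rather than a timestamp, keeps the URL stable between deployments, so the cached copy is still reused until the file actually changes.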
Comment 22 Kyle Simpson 2011-02-02 14:11:43 PST
(in reply to comment #19)
>To be quite blunt, web browser caching is about the USER experience

But the point being missed is that this IS affecting users as well as web developers. If my production site uses a script loader (which all mine do), and half a dozen JS files are loaded on-demand, half of which happen to load before the page finishes, and the other half which finish loading later... now there's half the files that are "marked" as needing re-validation, and the other half not.

Now, I as the web author change two files in a dependent way (in other words, both changes need to go out). I *should* be able to rely on the fact that the user's browser is going to do a conditional "If-Modified-Since" check on all resources.
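For reference, the server side of the conditional check mentioned here is small. This is a minimal Python sketch using only the standard library; `conditional_get_status` is an invented name. It follows the HTTP rule that a GET whose If-Modified-Since date is not earlier than the resource's Last-Modified date may be answered with 304 Not Modified.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def conditional_get_status(last_modified: datetime, if_modified_since: Optional[str]) -> int:
    """Return 304 if the client's cached copy is still current, otherwise 200.

    `last_modified` must be a timezone-aware datetime.
    """
    if if_modified_since is None:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # an unparseable date is ignored, so send the full response
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop sub-second precision.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200


# Example: bug2.js edited one second after the browser cached it.
cached_at = datetime(2009, 10, 28, 15, 39, 36, tzinfo=timezone.utc)
header = format_datetime(cached_at, usegmt=True)  # the Last-Modified value the browser echoes back
unchanged = conditional_get_status(cached_at, header)                      # 304
edited = conditional_get_status(cached_at + timedelta(seconds=1), header)  # 200
```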

So, when the user clicks the refresh button, or navigates to another page on the same site, one of the changed files will be re-validated (and thus re-pulled), while the other changed file is in this weird not-marked state, and won't be revalidated or re-requested. NOW the user's cache is in a broken state. And *they* can't fix it with a shift+refresh. They must clear their cache (something that probably a good percentage of web users know they can do, but not very many of them do it regularly, and many have no idea what it means, even when a tech support rep tells them to do it.)

> But destroying a user-centric optimization 

I'm not exactly sure what you're referring to here, but I don't buy the explanation that there's no way to reliably mark all resources that are loaded to a page (regardless of how they got there) as needing to be re-validated on the next refresh. It may not be the algorithm I specified, but I think there must be some way to do it. It seems like it must be possible to just say "list all the files in the active cache which are (or have been) loaded in A, and let's re-validate that list of files."

I'm not sure why you're insisting on making this an "us vs. them" thing between web developers and end-users. I feel like there must be a way to satisfy the needs of this use-case in an amicable way for both parties.

BUT... just for the sake of argument, let's say there really is a fundamental paradigm incompatibility between the use camps. Couldn't the browser then implement this "fully reload everything regardless" functionality in a separate way (that's not as painful as clearing the full cache), like ctrl+refresh or something? That way normal users could do shift+refresh and those of us who care can do ctrl+refresh?
Comment 23 Dan Dean 2011-02-02 14:24:08 PST
Alexander,

I don't buy that answer. Users don't care how resources get loaded into the page. If a resource is loaded into a page, and the user hits "refresh", they expect *all resources* to be refreshed. That they were lazy-loaded is an implementation detail that is meaningless to the user.
Comment 24 Brady Eidson 2011-02-02 14:33:06 PST
(In reply to comment #22)
> (in reply to comment #19)

> > But destroying a user-centric optimization 
> 
> I'm not exactly sure what you're referring to here, but I don't buy the explanation that there's no way to reliably mark all resources that are loaded to a page (regardless of how they got there) as needing to be re-validated on the next refresh. It may not be the algorithm I specified, but I think there must be some way to do it. It seems like it must be possible to just say "list all the files in the active cache which are (or have been) loaded in A, and let's re-validate that list of files."
> 
> I'm not sure why you're insisting on making this an "us vs. them" thing between web developers and end-users. I feel like there must be a way to satisfy the needs of this use-case in an amicable way for both parties.

Nowhere in my comment did I say this bug was invalid, or that we can't resolve it while also preserving a very important performance optimization for end users.

My comment was in reply to 3 comments - with an obvious "web developer" bias - who suggested that it IS worthwhile to destroy a very important end user optimization to make the life of web developers easier.

Both camps can be accommodated here. I was merely intending to put up resistance to the growing "web developers are clearly more important" momentum.
Comment 25 Alexander Romanovich 2011-02-02 14:35:29 PST
Dan, that may be, but what's more important is seeing if the desired behavior is described in the spec so that all browsers can comply with a common and reliable cross-browser solution. If WebKit is following standards with this, it'd be difficult to get the suggestion here implemented.

As a web developer, I'd also love to see this fixed. But there are some things that would need to be settled. For example, if reloading a page instructs a Javascript application to always revalidate resources it requests, that state would presumably not end. If we're talking about a long-running JS application, then every single lazy-load request would force a round trip to the server, indefinitely. This might be a particular problem with applications that routinely lazy-load images, more so than scripts.

The other point I was making is that it seems beneficial to allow your Javascript application to somehow know if a resource needs to be revalidated because the main resource was refreshed. Then you could make this decision for yourself on a resource-by-resource basis. Maybe someone with more knowledge about what behavior browsers are supposed to be following on both these points could shed more light.
Comment 26 Alexey Proskuryakov 2011-02-02 14:44:08 PST
I think that this advocacy may be misplaced.

This bug is about a very specific special case where our behavior differs from Firefox (see attached test case). Hijacking it to discuss larger changes seems counter-productive - even if the bugs are merged later, discussions should be separate.
Comment 27 Kyle Simpson 2011-02-02 14:53:04 PST
(in reply to comment #25)

> but what's more important is seeing if the desired behavior is described in the spec 

To the best of my knowledge, this is not in the spec. In fact, when I raised the issue with the W3C about getting spec standards for how resources should be loaded and cached, I got this response:

> The spec is deliberately not specific about exactly how resources are
> loaded and cached, because it's good for browsers to be able to
> innovate and compete on the algorithms they use for this.  The
> difference should not be black-box-detectable -- except for
> performance, of course.

[from http://lists.w3.org/Archives/Public/public-html/2010Dec/0165.html]

I agree there should be standards for it, so that this is reliable across browsers. Right now, behavior differs between browsers, and it causes many more headaches for web authors (especially tools devs like myself) because we can't feature-detect how a browser will handle loading, caching, and revalidating, so we have to fall back to ugly browser inferences or horrid UA sniffing. :(

But, the fact that a few of us think there should be standards doesn't mean there ever will be, nor does it mean browsers should sit around and wait for that. Indeed, browsers innovate in these non-spec'd areas all the time. Perhaps what's being suggested here (this "super refresh") may eventually get spec'd. 

But there's a very small chance that something which has long been the realm of "browser implementation detail" (according to members of the W3C) is going to suddenly become the realm of the spec, UNLESS the browsers have all implemented ideas that have centered around a common de facto standard.

As such, I'd really hate to see *this* bug sit unaddressed indefinitely in the smallest of hopes that the spec will, of its own accord, take the lead on this topic. We need browsers to do sensible things with loading, caching, and re-validation, things that make sense for both users and developers. Those things will eventually be seen as the *right* things, and time will eventually make that the "standard" (either implied or explicit).
Comment 28 Kyle Simpson 2011-02-02 15:00:30 PST
(in reply to comment #26)

> This bug is about a very specific special case where our behavior differs from Firefox

I'm sorry, I'm not sure I see how this is about a difference between Firefox and Webkit. As far as I've found in all my testing (I originally reported this in bug #35883 with my own test case), the behavior is true in both Webkit and FF, which is that resources loaded dynamically after the page finishes loading are not properly "marked" for re-validation on the next page refresh.
Comment 29 Alexey Proskuryakov 2011-02-02 16:05:56 PST
Please try the attached test case with steps to reproduce from comment 0. It has different behavior in Safari and in Firefox, for reasons that are not completely understood. This is what this bug is about.

> As far as I've found in all my testing (I originally reported this in bug #35883 with my own
> test case), the behavior is true in both Webkit and FF

Then bug 35883 should not have been marked as a duplicate; please reopen it. Surely, making the change you propose would also address this bug, but that doesn't mean we should be discussing it here.
Comment 30 Darin Fisher (:fishd, Google) 2011-02-02 17:03:56 PST
(In reply to comment #29)
> Please try the attached test case with steps to reproduce from comment 0. It has different behavior in Safari and in Firefox, for reasons that are not completely understood. This is what this bug is about.

Hmm, I did not realize we differ from Firefox in regards to this bug.  I wonder if Firefox is buggy then...

Anyways, I still stand by comment #15.  At the very least, I think there should be some kind of time based heuristic for when we stop applying the cache validation policy of the document to subresources.

We used to have a bug where the cache policy of the document would be sticky for XMLHttpRequest.  Facebook engineers noticed this as a huge disparity between WebKit-based browsers and other browsers when it comes to the frequency of If-Modified-Since requests because users do hit reload sometimes.  The fix was to make WebKit reset the cache policy of the document on load.  That was a good fix IMO.

I could maybe be persuaded to treat shift+reload differently, but even that seems fishy. I hate to have to complicate the code for it. But suppose we did. What would the new heuristic be? Certainly you would not want to neuter the cache forever? Again, remember that Facebook pages do not navigate the document. They just play games with history.pushState / fragment navigations, and then they dynamically load the page with XHR. Other pages are like that too!
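One possible reading of the time-based cutoff mentioned in this comment is a window after each reload during which requests keep the reload's validation policy. This is a hypothetical Python model; `StickyReloadPolicy` and `window_s` are invented names, and no particular window length is suggested anywhere in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StickyReloadPolicy:
    """Hypothetical: revalidate requests made within `window_s` seconds of a reload."""

    window_s: float
    reloaded_at: Optional[float] = None  # time of the last reload, if any

    def on_reload(self, now: float) -> None:
        self.reloaded_at = now

    def should_revalidate(self, now: float) -> bool:
        if self.reloaded_at is None:
            return False  # never reloaded: normal caching rules apply
        return now - self.reloaded_at <= self.window_s


policy = StickyReloadPolicy(window_s=10.0)
policy.on_reload(now=100.0)
```

The same lazy load is revalidated or served from cache depending on when it happens to run, which is why comment 31 calls this guesswork and comment 35 worries that it would make browser behavior less predictable.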
Comment 31 Alexander Romanovich 2011-02-02 17:49:56 PST
Darin echoes my concerns, but I think the super-refresh method is probably the best way to resolve this now, especially given the response the W3C gave Kyle. A time-based heuristic sounds like it would be guesswork, though. Could there not be some other rule? For instance, is there a way that WebKit could distinguish between a resource that is lazy-loaded by JavaScript that was executed immediately when the page loaded (even if running after the main resource and its subresources have all finished loading), as opposed to one that's triggered later by user interaction with the page? The former would be revalidated, and the latter loaded normally.
Comment 32 Alexey Proskuryakov 2011-02-02 19:06:13 PST
I'm worried about comments here mentioning that super refresh would be a user feature. I don't think that's acceptable, and we shouldn't do anything to encourage solutions that have any risk of getting into a state where users need to know about shift-refresh.
Comment 33 Darin Fisher (:fishd, Google) 2011-02-02 21:58:15 PST
(In reply to comment #31)
> Darin echos my concerns, but I think the super-refresh method is probably the best way to resolve this now.

Forgive my ignorance, but what is the super-refresh method?
Comment 34 Alexey Proskuryakov 2011-02-02 22:10:15 PST
Oh, I think it's just a synonym for shift-refresh.
Comment 35 Darin Fisher (:fishd, Google) 2011-02-02 22:37:02 PST
(In reply to comment #34)
> Oh, I think it's just synonym for shift-refresh.

OK :)

So, the proposal is to make shift-refresh sticky, perhaps with an expiration time.  On reflection, I'm less excited about the expiration time as I think it would make the behavior of the browser less predictable to developers.  It may be especially frustrating for developers of AJAX style applications.

I'm also concerned about making shift-refresh sticky because some users do know about that keystroke, and it is a very "costly" keystroke.

It feels like this should somehow be addressed via something that developers have to opt-into somehow, possibly as an extension that developers install.

I don't think we should muck with the default web platform behavior here.
Comment 36 Alexander Romanovich 2011-02-03 05:53:34 PST
Thanks for the time you've taken to reason through this issue, since it's such a hassle for users at the moment. Just to be clear: developers themselves (for whom a developer extension might help) are not the problem; it's the end users who are frustrated by this.

The shift-refresh suggestion aside, in terms of a feature developers can opt-into, what about the other suggestion here? If application developers could 1) detect that the main resource was refreshed and 2) force revalidation when performing a lazy-load of a resource, I think that we'd have a suitable solution that developers can work with. They could easily decide to revalidate resources they later request on a case by case basis this way. Perhaps by toggling on a revalidation mode, requesting the resources, and then toggling it off (unless there's a per-request flag that can be passed somehow).
Comment 37 Brady Eidson 2011-02-03 09:02:50 PST
(In reply to comment #36)
> Thanks for the time you've taken to reason through this issue, since it's such a hassle for users at the moment. Just to be clear: developers themselves (for which a developer extension might help) are not the problem, its the end users who are frustrated by this.

There are no end users complaining loudly in this bug, only developers.

If this bug actually affects the live delivery of a site to an end user, and the end user sees the issue, they will (correctly) not blame the browser, but rather the developer.

If a developer's particular configuration is susceptible to this issue on their live site, relying on one particular browser's special behavior seems ill-advised, as they can always use appropriate HTTP headers when delivering their resources - headers that will intentionally bypass long-standing caching rules as they desire.
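A minimal sketch of the header-based approach mentioned here, assuming the server can attach headers to the scripts in question: a `Cache-Control: no-cache` response directive lets caches store the file but requires them to revalidate it before every reuse, however the request was started. The helper names are invented for illustration, and the ETag comparison is simplified.

```python
import hashlib
from typing import Dict, Optional


def always_revalidate_headers(body: bytes) -> Dict[str, str]:
    """Response headers asking every cache to check with the server before reuse."""
    # A strong validator derived from the file contents.
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    return {"Cache-Control": "no-cache", "ETag": etag}


def is_not_modified(current_etag: str, if_none_match: Optional[str]) -> bool:
    """True if the server may answer a GET with 304 Not Modified."""
    if if_none_match is None:
        return False
    if if_none_match.strip() == "*":
        return True
    # If-None-Match uses weak comparison, so a W/ prefix is ignored.
    # Simplification: assumes entity tags contain no commas.
    candidates = [tag.strip().removeprefix("W/") for tag in if_none_match.split(",")]
    return current_etag.removeprefix("W/") in candidates
```

The cost is a round trip on every reuse, even when the answer is 304, which is the trade-off comment 41 describes for revalidating indefinitely.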

> The shift-refresh suggestion aside, in terms of a feature developers can opt-into, what about the other suggestion here? If application developers could 1) detect that the main resource was refreshed and 2) force revalidation when performing a lazy-load of a resource, I think that we'd have a suitable solution that developers can work with. They could easily decide to revalidate resources they later request on a case by case basis this way. Perhaps by toggling on a revalidation mode, requesting the resources, and then toggling it off (unless there's a per-request flag that can be passed somehow).

Fact #1 - HTTP caching is supposed to be invisible to the end user of the web platform.
Fact #2 - We all know that it actually isn't - in practice - and this bug is just one of many manifestations of this.

I think doing what you propose will take us FARTHER from the ideal of #1, not closer.

On a slight tangent, one oft-overlooked feature of the ApplicationCache is atomic updating of websites - all or nothing. No stale resources. We've heard of a lot of developers relying on this with great success...
Comment 38 Alexander Romanovich 2011-02-03 09:19:12 PST
(In reply to comment #37)
> There are no end users complaining loudly in this bug, only developers.
> 
> If this bug actually affects the live delivery of a site to an end user, and the end user sees the issue, they will (correctly) not blame the browser, but rather the developer.

To be fair, I wouldn't expect end users to be commenting in here. The requests from developers here are a result of us getting complaints from our end users until we find ways to work around this limitation. But also, I've requested a way to empower developers to solve the problem, precisely because I'm not blaming the browser, I merely feel that developers don't have adequate solutions at their disposal.

> If a developer's particular configuration is susceptible to this issue on their live site, relying on one particular browser's special behavior seems ill-adviced, as they can always use appropriate HTTP headers when delivering their resources - Headers that will intentionally bypass long standing caching rules as they desire.

I don't think controlling the headers sent with the resources in question is necessarily the best way to solve the problem (nor is anyone suggesting that this be solved by one browser's special behavior). A JavaScript author ideally wouldn't need access to, or knowledge of, those headers in order to refresh a resource.

> Fact #1 - HTTP caching is supposed to be invisible to the end user of the web platform.
> I think doing what you propose will take us FARTHER from the ideal of #1, not closer.

I'm not sure how what has been suggested exposes anything about HTTP caching to the end user. For myself, I have only requested that *developers* be able to request a resource with revalidation. As it stands, you allow anyone (end users, developers) to revalidate resources in the main document by refreshing. Why can a developer not revalidate a resource via a JS-originated request?
Comment 39 Kyle Simpson 2011-02-03 09:45:31 PST
Forgive my stubbornness, but I still do not understand why a shift+refresh should ONLY re-validate some resources (those that came in the initial page load) and not others?

I've heard several times in this thread that it simply can't be done without losing an important User-Experience optimization, but I haven't seen a single actual explanation of that technical limitation.

If a *user* says, "I want to refresh this page because things seem to not be working correctly" (or) "I want to refresh this page because the support forums/representatives of the site told me to", in what use case(s) are we saying that this user would actually only want some of the resources on that page re-validated, and not others?

BOTTOM LINE: How on earth can the browser justify doing something which doesn't make sense to an end-user? If an end-user clicks refresh, or even does shift+refresh, they are almost certainly saying "refresh everything", not expecting you to only refresh some things and not others. How does the current behavior not violate the principle of least surprise???

----------------
That question aside, for a moment...

I'm quite confused by the alternate suggestion. How is a dynamic script loader (like mine, LABjs), supposed to:

1. detect that a "refresh" of a page-view is happening; AND
2. somehow manually force all the resources I'm going to ask for to be re-validated

Adding a ?x=128945839853 type cache-buster to all the URLs is not what we want, because we don't want a full reload, we want the browser to re-validate (an If-Modified-Since request).

As it stands now, any resources which were sent with a "Last-Modified" response header will be re-validated, but ONLY so long as that resource arrived during/before initial page load. If a resource is sent with "Last-Modified" in response to a dynamic resource load after initial page-load, essentially the browser will ignore this header, because it will not re-validate the next time that resource is requested in the same manner.

Are we suggesting that every resource container expose some property that indicates whether or not the resource it contains is subject to re-validation or not? Because I'd of course only want to force my own re-validation on resources in which the browser wouldn't already be inclined to re-validate.

Also, keep in mind there are various ways that resources are loaded, from XHR to dynamic script/link tags, to iframes, <object>s, <img>s, etc. If the suggestion is that we create a mechanism by which a web author (ie, me, as a script loader author) can force a re-validation, it has to take into account all those possible request methods.

Perhaps a `rel` attribute can be added to a resource request that is like `rel="alwaysRevalidate"` or something like that?

It sure sounds like we're going down a MUCH more complicated path than something like the "super-refresh" idea. And I completely don't understand why there'd need to be a timeout on it.
Comment 40 Alexey Proskuryakov 2011-02-03 09:48:03 PST
> I'm not sure how what has been suggested exposes anything about HTTP caching to the end user. For myself, I have only requested that *developers* be able to request a resource with revalidation. As it stands, you allow anyone (end users, developers) to revalidate resources in the main document by refreshing. Why can a developer not revalidate a resource via a JS-originated request?

While I question the wisdom of such designs, there is a valid bug we have about this, see bug 51286.
Comment 41 Alexander Romanovich 2011-02-03 10:01:55 PST
(In reply to comment #39)
> I've heard several times in this thread that it simply can't be done without losing an important User-Experience optimization, but I haven't seen a single actual explanation of that techinical limitation.
> If a *user* says, "I want to refresh this page because things seem to not be working correctly" (or) "I want to refresh this page because the support forums/representatives of the site told me to", in what use-case(s) are we saying that this use would actually only want some of the resources on that page re-validated, and not others?

Because by permanently forcing all future resources to be revalidated, you are thereby sacrificing one goal of caching which is that the server should not be contacted at all until the resource is set to expire. It would be a problem to indefinitely contact the server to revalidate every single resource over the entire lifetime the JS application runs. There has to be some point at which the automatic revalidation ends. We either need the browser to revalidate resources that are immediately loaded by the application (the refresh idea), or give the developer a way to revalidate them themselves (better, in my opinion).

> I'm quite confused by the alternate suggestion. How is a dynamic script loader (like mine, LABjs), supposed to:
> 
> 1. detect that a "refresh" of a page-view is happening; AND
> 2. somehow manually force all the resources I'm going to ask for to be re-validated
> 
> Adding a ?x=128945839853 type cache-buster to all the URLs is not what we want, because we don't want a full reload, we want the browser to re-validate (an If-Modified-Since request).

Kyle, I think we're saying the same thing here. What I am saying is that there is *not* currently a way to do #1 and #2, and that it would be useful to have that ability. I also believe a query string is not a valid solution. I'd love for LABjs to be able to detect that the page was refreshed and then be able to force a refresh for particular resources it loads later.
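The difference between the two approaches in the quoted text can be sketched in TypeScript. The helper names `cacheBustedUrl` and `conditionalHeaders` are hypothetical; `If-Modified-Since` and `If-None-Match` are the standard HTTP validator headers. A cache-busted URL is a new cache key, so the old entry is never revalidated and the full body is downloaded again. A conditional request keeps the same URL and lets the server answer `304 Not Modified` when nothing has changed.

```typescript
// Hypothetical helper: appends a throwaway query parameter so the URL
// becomes a different cache key. The old entry is ignored, not revalidated,
// so the full body is downloaded every time.
function cacheBustedUrl(url: string, stamp: number): string {
  const separator = url.includes("?") ? "&" : "?";
  return `${url}${separator}x=${stamp}`;
}

// Hypothetical helper: builds the validator headers for a conditional
// request against the SAME URL. The server can reply "304 Not Modified"
// with no body if the cached copy is still current.
function conditionalHeaders(
  lastModified?: string,
  etag?: string,
): Record<string, string> {
  const headers: Record<string, string> = {};
  if (lastModified !== undefined) {
    headers["If-Modified-Since"] = lastModified;
  }
  if (etag !== undefined) {
    headers["If-None-Match"] = etag;
  }
  return headers;
}

const busted = cacheBustedUrl("/js/app.js", 128945839853);
// busted === "/js/app.js?x=128945839853"
const validators = conditionalHeaders("Wed, 02 Feb 2011 10:00:00 GMT");
// validators === { "If-Modified-Since": "Wed, 02 Feb 2011 10:00:00 GMT" }
```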

> Are we suggesting that every resource container expose some property that indicates whether or not the resource it contains is subject to re-validation or not? Because I'd of course only want to force my own re-validation on resources in which the browser wouldn't already be inclined to re-validate.
> Also, keep in mind there are various ways that resources are loaded, from XHR to dynamic script/link tags, to iframes, <object>s, <img>s, etc. If the suggestion is that we create a mechanism by which a web author (i.e., me, as a script loader author) can force a re-validation, it has to take into account all those possible request methods.

This is the reason I suggested toggling a revalidation mode on/off that would encompass any requests (from any of the methods you listed) while the revalidation mode is on.
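A rough sketch of such a toggle, written as a hypothetical page-level wrapper (nothing like `RevalidationScope` exists in WebKit). It borrows the `"no-cache"` value of the Fetch standard's `cache` request option, which means "reuse a cached copy only after a conditional request to the server", and applies it to every request described while the mode is on.

```typescript
// The two Fetch RequestCache values this sketch uses.
type CacheMode = "default" | "no-cache";

interface RequestDescriptor {
  url: string;
  cache: CacheMode;
}

// Hypothetical page-level switch: while it is on, every request the page
// describes (script, stylesheet, XHR, image...) is marked for revalidation.
class RevalidationScope {
  private active = false;

  on(): void {
    this.active = true;
  }

  off(): void {
    this.active = false;
  }

  describe(url: string): RequestDescriptor {
    // "no-cache": reuse the cached entry only after the server confirms it
    // is current. "default": normal HTTP freshness rules.
    return { url, cache: this.active ? "no-cache" : "default" };
  }
}

const scope = new RevalidationScope();
scope.on();
const boot = scope.describe("/js/app.js"); // cache: "no-cache"
scope.off();
const later = scope.describe("/img/photo-42.jpg"); // cache: "default"
```

In a browser that implements the Fetch API, a descriptor could be used as `fetch(d.url, { cache: d.cache })`. Dynamic `<script>` and `<img>` elements have no equivalent per-element option, which is part of why this thread keeps coming back to a browser-level switch.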

> Perhaps a `rel` attribute can be added to a resource request that is like `rel="alwaysRevalidate"` or something like that?

A rel attribute does not cover most methods by which the resource is loaded.
Comment 42 Alexander Romanovich 2011-02-03 10:04:44 PST
(In reply to comment #40)
> > I'm not sure how what has been suggested exposes anything about HTTP caching to the end user. For myself, I have only requested that *developers* be able to request a resource with revalidation. As it stands, you allow anyone (end users, developers) to revalidate resources in the main document by refreshing. Why can a developer not revalidate a resource via a JS-originated request?
> 
> While I question the wisdom of such designs, there is a valid bug we have about this, see bug 51286.

It would be *great* to solve bug 51286 because it gives us a reliable way (hopefully cross-browser?) to revalidate any resource we want on the fly. But what's still missing is the ability to detect that the main resource was refreshed. We'd want to know that also, so that we can then choose to re-cache additional resources via XHR.
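For the detection half, later browsers expose the navigation type through the Navigation Timing API: `performance.getEntriesByType("navigation")` returns a `PerformanceNavigationTiming` entry whose `type` is `"reload"` after a refresh (the older `performance.navigation.type === 1` is deprecated). A minimal sketch follows. It is written against a plain `{ type: string }` shape so it does not depend on a browser, and the helper name `wasReloaded` is hypothetical. It only reports that the document was reloaded, not whether the reload was a normal or a shift-reload.

```typescript
// Minimal shape of a Navigation Timing Level 2 entry. In a browser the
// entries come from performance.getEntriesByType("navigation").
interface NavigationEntryLike {
  type: string; // "navigate" | "reload" | "back_forward" | "prerender"
}

// Hypothetical helper: true when the current document was loaded by a reload.
function wasReloaded(entries: ReadonlyArray<NavigationEntryLike>): boolean {
  return entries.length > 0 && entries[0].type === "reload";
}

// Browser usage (assumes the API is supported):
//   const reloaded = wasReloaded(
//     performance.getEntriesByType("navigation") as PerformanceNavigationTiming[]);
//   if (reloaded) { /* request lazy-loaded resources with { cache: "no-cache" } */ }
```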
Comment 43 Alexey Proskuryakov 2011-02-03 10:11:35 PST
> Hmm, I did not realize we differ from Firefox in regards to this bug.  I wonder if Firefox is buggy then...

Darin, I suspect that the original bug here is that we might not correctly stick the reload state until after onload. That sounds like a reasonable thing to fix.
Comment 44 Kyle Simpson 2011-02-03 10:21:50 PST
(in reply to comment #41)

> Because by permanently forcing all future resources to be revalidated, you are thereby sacrificing one goal of caching which is that the server should not be contacted at all until the resource is set to expire. 

Again, I'm not suggesting a different model, by which resources are marked for re-validation NOT when they are added to a page, but when the user holds down the shift button and hits refresh.

I'm not suggesting anything "permanent" about the re-validation... I'm saying that it should be an on-demand thing that matches up with when the user on-demand tries to do a hard refresh.

Also, aren't resources that are loaded during page load marked as "definitely needs revalidation" immediately and for the lifetime of the page? So why shouldn't resources that are lazy loaded at a later time also be marked as "revalidate through the end of this page lifetime"?

BTW, it would appear I'm asking for the same thing that's in Bug #52153.
Comment 45 Kyle Simpson 2011-02-03 10:22:59 PST
(in correction of comment #44)

Doh. I meant to say "I *am* suggesting a different model..."
Comment 46 Alexander Romanovich 2011-02-03 10:35:04 PST
(In reply to comment #44) 
> I'm not suggesting anything "permanent" about the re-validation... I'm saying that it should be an on-demand thing that matches up with when the user on-demand tries to do a hard refresh.

Right, but all I'm saying is that what the user requested was to have the main application resources refreshed. If they click a button 3 minutes into using the application and a bunch of images are loaded in, those should not be revalidated. Consider an application that flips through a large number of such resources, all of which represent content the application loads in later and are not part of the application itself that the user was trying to reset.

I didn't mean to get anything off track here. According to comment #43, if the original issue is fixed, then any resources your dynamic script loader requests before the page's onload fires will be revalidated, but not those requested after that. I still feel we need an additional ability (i.e. the browser not forgetting that the main resource was refreshed), but I can file a separate request for that.
Comment 47 Kyle Simpson 2011-02-09 15:48:02 PST
Just checking... is this changeset the "super refresh" being implemented, as has been discussed here?

http://trac.webkit.org/changeset/77922
Comment 48 Vsevolod Vlasov 2011-07-07 09:22:19 PDT
(In reply to comment #47)
> Just checking... is this changeset the "super refresh" being implemented, as has been discussed here?
> 
> http://trac.webkit.org/changeset/77922

No, it's not. This patch implemented passing shift-refresh from the inspector to the inspected page, so that pressing shift-refresh behaves the same regardless of whether the inspector or the inspected page has focus.

This is not really related to this bug, but web developers might be interested in http://webk.it/63999, which enabled clearing the cache and cookies from the inspector (context menu in the network panel). This will be supported in Chromium very soon.

I am also working on http://webk.it/64097 that will allow to disable cache while developing.
Comment 49 Britt Selvitelle 2011-09-20 08:49:39 PDT
Britt, previously of Twitter here.

I agree with Kyle that the behavior is confusing.
I think the answer lies in the question: "Is shift-reload different from regular-reload?"

If yes, then we need to design based on that principle. It is a unique method, well known by developers, to revalidate all resources, including those loaded dynamically after the page loads.

If no, then we make reload ubiquitous and have no shift-reload keyboard shortcut, in order to not confuse users or developers, and develop alternative means for developers to clear the asset cache while working.
Comment 50 Darin Fisher (:fishd, Google) 2011-09-20 13:33:03 PDT
(In reply to comment #49)
> Britt, previously of Twitter here.
> 
> I agree with Kyle that the behavior is confusing.
> I think the answer lies in the question: "Is shift-reload different from 
> regular-reload?"
> 
> If yes, then we need to design based on that principle.

Yes, they are different.  The shift modifier causes the browser to disregard any locally cached copy of resources requested for the page load.


> It is a unique method, well known by developers, to revalidate all 
> resources, including those loaded dynamically after the page loads.

I don't think it is well known or expected that shift+reload should impact the
behavior of resources requested after the page finishes loading.  Think about
what this does to an AJAX site, which rarely performs a page load.  If the
reload behavior becomes sticky, then you would end up requesting resources
end-to-end forever on an AJAX site.  This is most likely not what developers
intend.  (WebKit used to have this bug actually.)


> If no, then we make reload ubiquitous and have no shift-reload keyboard 
> shortcut, in order to not confuse users or developers, and develop 
> alternative means for developers to clear the asset cache while working.

This doesn't make sense to me.  There is a clear difference between reload and
shift-reload, which seems valuable to me.  Reload also does not impact 
resources loaded after the page has finished loading.
Comment 51 Alexander Romanovich 2011-09-20 14:29:03 PDT
Darin, in previous comments it was discussed that the sticky nature of lazy loaded requests implementing a refresh be given a timer, so that only those requested immediately during page load would be refreshed. It would not happen forever. Check your comment #35 for example.

The idea is that developers might have to load in a required JS file, etc. at page load time, but conditionally on a cookie or something. So from the developer's point of view, it's helpful to have those immediately-lazy-loaded resources be considered packaged with the page and able to be refreshed. Given the timer suggestion, I think it's a suggestion that produces a big benefit for developers (and even end users, who will get changes to lazy-loaded resources in sync with changes to the page) without causing undue load for resources requested much later.
Comment 52 Darin Fisher (:fishd, Google) 2011-09-20 15:46:25 PDT
(In reply to comment #51)
> Darin, in previous comments it was discussed that the sticky nature of lazy loaded requests implementing a refresh be given a timer, so that only those requested immediately during page load would be refreshed. It would not happen forever. Check your comment #35 for example.

Right, I forgot about that, but I still agree with my earlier comment.  Actually, I feel a bit stronger about it.  Using a timeout for this sort of thing will likely add to confusion.  It creates a flaky / unreliable behavior, and that just doesn't seem like a good thing for the web platform.


> The idea is that developers might have to load in a required JS file, etc. at page load time, but conditionally on a cookie or something. So from the developer's point of view, it's helpful to have those immediately-lazy-loaded resources be considered packaged with the page and able to be refreshed. Given the timer suggestion, I think it's a suggestion that produces a big benefit for developers (and even end users, who will get changes to lazy-loaded resources in sync with changes to the page) without causing undue load for resources requested much later.

It is a very bad idea to have end-user observed behavior depend on a timer.  What value of a timer is good?  5 seconds?  30 seconds?  None of those are good choices.  The subresources requested after page load can be delayed a long time.  This would create a flaky / unreliable web platform.

I believe the best solution here would be to provide some special option for developers to use.  I should also remind people that it is really easy to clear browsing data in Chrome and Firefox.  Just type Ctrl+Shift+Del.
Comment 53 Britt Selvitelle 2011-09-20 19:09:16 PDT
(In reply to comment #50)
> Yes, they are different.  The shift modifier causes the browser to disregard any locally cached copy of resources requested for the page load.

Right. The problem is that what developers mean by "for the page load" has changed. Many sites load resources after the initial load but before the site is considered ready.
 
> > It is a unique method, well known by developers, to revalidate all 
> > resources, including those loaded dynamically after the page loads.
> 
> I don't think it is well known or expected that shift+reload should impact the
> behavior of resources requested after the page finishes loading.  Think about
> what this does to an AJAX site, which rarely performs a page load.  If the
> reload behavior becomes sticky, then you would end up requesting resources
> end-to-end forever on an AJAX site.  This is most likely not what developers
> intend.  (WebKit used to have this bug actually.)

You would only end up requesting resources end-to-end when you do a shift-reload, which is exactly what it's for.
Comment 54 Kyle Simpson 2011-09-20 22:22:30 PDT
It remains incredible to me that WebKit persists in the belief that it makes sense to users (or developers) for there to be an inherent race condition in the way shift+refresh and resource re-validation are currently implemented.

If I ask for 4 scripts to load dynamically, and (for any of a variety of reasons) 3 of the 4 are requested "during page load" (aka before DOM-ready or before window.onload; I'm not sure which is meant), and the 4th ends up being requested just a fraction of a second "after", then when I shift+reload the page, only 3 of the 4 scripts are re-validated. That inconsistency is inexcusable.

That situation is extremely common in my development experience, and it's inherently a race condition of which resources get requested soon enough to qualify as "during page load". It's crazy to me that we can't figure out a way to address this. Not seeing the reality of how that negatively affects users/developers shows that the people making decisions here are out of touch with real web development.

No matter what explanation is given for why resources loaded "after page load" don't get re-validated (I've read the explanations dozens of times and it's still confusing to me), there's no way around the fact that this optimization creates confusing race-condition behavior for pages that use dynamic resource loading during page load. In every sense of the phrase, that kind of racy behavior violates the "principle of least surprise".
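The race described above can be modelled in a few lines. This is a simplified sketch, not WebKit's actual logic: it assumes the reload state is dropped exactly when the `load` event fires, and the optional `graceMs` parameter models the timer proposal discussed earlier in the thread. The request times are made-up illustrations.

```typescript
interface TimedRequest {
  url: string;
  requestedAtMs: number; // time since navigation start
}

// Simplified model: after a reload, a subresource is revalidated only if it
// is requested before the reload state is dropped, i.e. before the load
// event plus an optional grace window (0 = drop it exactly at load).
function revalidatedAfterReload(
  requests: ReadonlyArray<TimedRequest>,
  loadEventAtMs: number,
  graceMs = 0,
): string[] {
  const cutoff = loadEventAtMs + graceMs;
  return requests.filter((r) => r.requestedAtMs < cutoff).map((r) => r.url);
}

// Four dynamically loaded scripts; the last one is requested just after load.
const scripts: TimedRequest[] = [
  { url: "a.js", requestedAtMs: 120 },
  { url: "b.js", requestedAtMs: 130 },
  { url: "c.js", requestedAtMs: 140 },
  { url: "d.js", requestedAtMs: 205 },
];

const onLoadOnly = revalidatedAfterReload(scripts, 200); // a.js, b.js, c.js
const withGrace = revalidatedAfterReload(scripts, 200, 50); // all four
```

A grace window only moves the boundary: in this model a script requested at 260 ms would still miss a 50 ms window, which is the objection raised against timers in comment #52.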
Comment 55 Alexander Romanovich 2011-09-21 06:13:31 PDT
Right, it's unfortunate that the perspectives of the browser architects and the web developers cannot meet here. Considering how fundamental and increasingly common this scenario has become, there really needs to be more thinking applied to find a solution. I do feel the opinions against the timer are exaggerated, considering we're really only talking about the need for a special case for requests mere milliseconds after page load. Slightly bending strict policies in place with WebKit developers would reduce a much greater amount of headache for web developers, and we'd end up with something much more reflective of the work WebKit is actually being used for. I do hope this conversation can continue until it gets us somewhere. :)
Comment 56 Vsevolod Vlasov 2011-09-21 06:26:36 PDT
> It's crazy to me that we can't figure out a way to address this. 
There are now two common ways for developers to address this.

1. There is a developer's option to disable cache both in Chrome and Safari.
2. Use a different environment for development: e.g. serve your scripts with no-cache headers.

Both of these are very easy for developers to use. Users cannot trigger them, so they address the concern raised in Darin's comment #30.

Could you explain why these approaches are not good enough for you?
Comment 57 Alexander Romanovich 2011-09-21 06:34:11 PDT
Sure.

1) My end users are not developers, so this doesn't address the problem.
2) Then none of my scripts will be cached. :)

Here's a really simple example scenario, to describe the problem. Suppose you have an application which allows for a web page to have a "logged-in" mode and a "non-logged-in" mode. If a cookie is present, we know that the user is logged in and needs an additional set of scripts and stylesheets to load immediately upon page load. Certainly I want all those resources for my logged-in state to be cached (!).

The problem arises if I update one of those resources on the server. A refresh is intended to be able to revalidate resources that the page needs, so if a change is made, the local browser cache can update its copy. However, this is not possible. All users will have to wait until a potentially long expiration time on those resources expires before they'll ever get the updates.
Comment 58 Vsevolod Vlasov 2011-09-21 06:41:29 PDT
> The problem arises if I update one of those resources on the server. A refresh is intended to be able to revalidate resources that the page needs, so if a change is made, the local browser cache can update its copy. However, this is not possible. All users will have to wait until a potentially long expiration time on those resources expires before they'll ever get the updates.

If we are talking about the production version, then there is a common approach: version your scripts. For example, load twitter.com and observe the timestamp in the name of each JS resource.

Otherwise you will run into problems anyway, because your users will not know about or use shift-reload.
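The versioning approach from comment #58 and the no-cache development setup from comment #56 can both be expressed as a small build/serve policy. This is a minimal sketch. It assumes a content hash rather than a timestamp, and it uses FNV-1a only because that fits in a few lines (a real build would more likely use a cryptographic hash). The function names and the one-year `max-age` are illustrative assumptions. Because the file name changes whenever the content does, a long-lived `Cache-Control` in production never serves stale code, while development serves everything with `no-cache` so each use is revalidated.

```typescript
// 32-bit FNV-1a over UTF-16 code units; enough to detect content changes.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Hypothetical helper: "app.js" + content -> "app.<hash>.js".
function versionedName(fileName: string, content: string): string {
  const dot = fileName.lastIndexOf(".");
  const base = dot === -1 ? fileName : fileName.slice(0, dot);
  const ext = dot === -1 ? "" : fileName.slice(dot);
  return `${base}.${fnv1a(content)}${ext}`;
}

// Hypothetical serving policy for the two environments discussed above.
function cacheControlFor(env: "development" | "production"): string {
  return env === "development"
    ? "no-cache" // revalidate on every use while developing
    : "public, max-age=31536000"; // one year; names change with content
}
```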
Comment 59 Alexander Romanovich 2011-09-21 06:46:49 PDT
But that's also a method that many developers have gripes about being forced to use. It's clearly a hack to fake revalidation, when there should be a way we can actually achieve revalidation.
Comment 60 Boris Zbarsky 2011-09-21 09:45:46 PDT
Jumping into the discussion here, since Firefox was mentioned....

Fundamentally, I think the issue is that web applications want to have it both ways: apply revalidation to some set of requests (the source of the app) but not to another set of requests (the data the app operates on). But the network library has no way to tell the two sets apart.  The browser could use some heuristics, maybe, but a <script> tag could fall into either set depending on what the page is doing with it.

The only way to really make this work sanely, I think, is via a manifest model a la appcache where there's a list of things that are part of "the app" (which therefore need to be revalidated on reload) and another list (possibly implicit by omission) that are "the data the app operates on" and does not need revalidation.
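A manifest in the spirit of this suggestion could be as small as a list of URLs that make up "the app". The sketch below is hypothetical (it is not the appcache format, and no browser reads such a list for this purpose). Anything listed is revalidated after a reload, however it is requested, and anything omitted is treated as data that follows the normal caching rules.

```typescript
// Hypothetical manifest: the resources that make up "the app".
interface AppManifest {
  appResources: ReadonlyArray<string>;
}

type Classification = "app" | "data";

// Anything listed is part of the app; anything omitted is data by default.
function classify(manifest: AppManifest, url: string): Classification {
  return manifest.appResources.indexOf(url) !== -1 ? "app" : "data";
}

// After a reload, only app resources are revalidated, whether they are
// requested by the parser, a dynamic <script>, or XHR.
function needsRevalidation(
  manifest: AppManifest,
  url: string,
  reloaded: boolean,
): boolean {
  return reloaded && classify(manifest, url) === "app";
}

const manifest: AppManifest = { appResources: ["/js/app.js", "/css/app.css"] };
```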
Comment 61 Kyle Simpson 2011-09-21 12:17:22 PDT
(In reply to comment #60)
> Fundamentally, I think the issue is that web applications want to have it both ways: apply revalidation to some set of requests (the source of the app) but not to another set of requests (the data the app operates on). 

Actually, I think the implementation detail that's really the culprit comes from an email exchange with Boris:

> Your proposed solution assumes the existence of a "list of resources the page has loaded", which simply doesn't exist....

So, if I understand correctly, is the fear that a page would have requested via XHR a bunch of data (JSON, XML), and that those Ajax data requests don't make sense (and are quite wasteful) to re-request?

If I'm correct (finally) in understanding this situation, the problem is actually that the *browser* (has nothing to do with the user or app) can't distinguish between bytes loaded for resources, and bytes loaded for transient data.

Let's explore if there's any way that the browser can in fact make a better distinction.

At first glance, it seems that XHR loading is the appropriate trigger. XHR is for data, everything else is data.

But there are (at least) two outliers that muddy the waters: data loaded as JSON-P, and resources loaded via XHR (like loading a script via XHR and then eval()'ing or injecting it).

For the case of XHR that is loading a resource (in other words, *this* XHR should get flagged for re-validation), could we have the author set a request-header like "X-Revalidatable-Resource: true" to tell the browser that this request/response is a resource and should be treated as such as far as caching, revalidation, etc?

For the case of data loaded via a script-tag (JSON-P), which is almost always done via dynamically created script elements, could the author set a flag on the resource to un-flag it from revalidation... perhaps with a property like `revalidatable` which is always defaulted to `true`, but if set to `false`, then the browser will treat it instead as transient data and not re-validate it?

I realize in both cases, we're proposing new signals that the author has to do, and if they don't do it, we'll have false-positives and false-negatives, respectively. But the upside is, we're putting the onus on the developer to do the signaling, so that the browser doesn't have to use heuristics (which are also probably flawed) or timers (which are certainly flawed).
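Combined with the corrected default from comment #62 (XHR is data, everything else is a resource), the two proposed signals amount to a small decision rule. A sketch of that rule follows. The `X-Revalidatable-Resource` header and the `revalidatable` property are the hypothetical names proposed in this comment and are not implemented by any browser.

```typescript
type Initiator = "xhr" | "script" | "link" | "img" | "iframe" | "object";

interface PendingRequest {
  initiator: Initiator;
  requestHeaders: Record<string, string>;
  // Hypothetical element property from the proposal; treated as true if absent.
  revalidatable?: boolean;
}

// Decide whether a request made after a shift-reload should be revalidated.
function shouldRevalidateOnReload(req: PendingRequest): boolean {
  if (req.initiator === "xhr") {
    // XHR is data unless the author opts in with the (hypothetical) header.
    // Header names are case-insensitive, so compare lowercased keys.
    const value = Object.keys(req.requestHeaders)
      .filter((k) => k.toLowerCase() === "x-revalidatable-resource")
      .map((k) => req.requestHeaders[k])[0];
    return value === "true";
  }
  // Everything else is a resource unless the author opts out, e.g. a
  // dynamically inserted JSON-P <script> with revalidatable = false.
  return req.revalidatable !== false;
}
```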

Is there any hope for an approach similar to this?
Comment 62 Kyle Simpson 2011-09-21 12:18:46 PDT
(In reply to comment #61)
> At first glance, it seems that XHR loading is the appropriate trigger. XHR is for data, everything else is data.

Sorry, I meant "everything else is a resource".
Comment 63 Alexander Romanovich 2011-09-21 12:52:17 PDT
The manifest idea would be useful, as a developer could make that decision up front, just once, and not have to worry about it again. That works as long as WebKit turns on revalidation for a listed resource only 1) after a refresh and 2) the first time it's requested after that refresh, until another refresh is triggered.
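The two conditions above (revalidate only after a refresh, and only the first time each listed resource is requested after it) can be tracked with a per-refresh set. This is a hypothetical sketch of the bookkeeping, not of anything WebKit does; `ReloadTracker` and the list of app resources are illustrative assumptions.

```typescript
// Hypothetical bookkeeping for a manifest-driven reload.
class ReloadTracker {
  private reloadPending = false;
  private readonly revalidated = new Set<string>();
  private readonly appResources: ReadonlySet<string>;

  constructor(appResources: ReadonlySet<string>) {
    this.appResources = appResources;
  }

  // Called when the user refreshes the page: start a fresh round.
  onReload(): void {
    this.reloadPending = true;
    this.revalidated.clear();
  }

  // True only if the page was refreshed, the URL is an app resource, and it
  // has not already been revalidated since that refresh.
  shouldRevalidate(url: string): boolean {
    if (!this.reloadPending || !this.appResources.has(url)) {
      return false;
    }
    if (this.revalidated.has(url)) {
      return false;
    }
    this.revalidated.add(url);
    return true;
  }
}

const tracker = new ReloadTracker(new Set(["/js/app.js", "/css/app.css"]));
tracker.onReload();
const first = tracker.shouldRevalidate("/js/app.js"); // true
const second = tracker.shouldRevalidate("/js/app.js"); // false until the next refresh
```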

Kyle's suggestions also merit consideration, especially in the case of XHR requests, whereby we can instruct the browser to revalidate in a more reasonable way than cache busting with a timestamp, etc. It doesn't make sense to me to keep adding "tricks" to force the browser to download a new copy of something. In the case of the manually inserted script tag, that might be a bit more complicated because there's another discussion about introducing a new attribute to the language.

Rather, I think a switch to put the browser into "revalidate-the-next-request" mode would cover both, without needing to modify the way XHR or inserted script tags work. However: I think we also previously discussed the fact that even if we had the ability to use such a mode, we don't yet have a way to know that a refresh was just performed, so our script would still not know that it needs to revalidate the lazy-loaded resources.
Comment 64 johnjbarton 2012-01-25 16:59:48 PST
I hit this problem in the following scenario:
  1. Using a web app on a server,
  2. Server updates itself,
  3. The next time I hit the server, the app is broken.
The update has new JavaScript. The JavaScript is almost all loaded dynamically (requireJS). Page loads but uses the cached JS: fail. Control+Shift+R fails. 

Now I happen to be a developer, but nothing about this scenario requires it.

Dynamic script loading is now the state of the art. Adding cache-busting version markers is just a hack, plus it's hard for small teams to implement. Manifests create a new class of bugs because they require developer action to maintain.

Three suggestions:

1) revalidate any resources that ever load in script tags. This is a much better approximation than "near load event" and much cheaper than building a manifest.

2) group resources by origin for cache handling. So force-refresh should apply to all resources from an origin, not just those loaded in a time window.

3) Decide this is a real problem. We need help here. There are only a few dynamic loaders now and a few browsers. Surely we can solve this problem.
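Suggestions 1 and 2 can be read as two alternative policies, sketched below under stated assumptions. The names and the `ResourceRecord` shape are hypothetical. "Ever loaded in a script tag" is approximated by an in-memory set, whereas a browser would have to persist that flag alongside the cache entry. The origin comparison uses the standard `URL` class.

```typescript
interface ResourceRecord {
  url: string;
  initiator: "script" | "xhr" | "img" | "link" | "other";
}

// Suggestion 1: once a URL has been loaded via a <script> element,
// always revalidate it after a forced refresh.
class ScriptHistoryPolicy {
  private readonly seenAsScript = new Set<string>();

  record(resource: ResourceRecord): void {
    if (resource.initiator === "script") {
      this.seenAsScript.add(resource.url);
    }
  }

  shouldRevalidate(url: string, forceRefresh: boolean): boolean {
    return forceRefresh && this.seenAsScript.has(url);
  }
}

// Suggestion 2: a forced refresh applies to every resource from the page's
// origin, no matter when during the page's lifetime it is requested.
function sameOriginPolicy(
  resourceUrl: string,
  pageUrl: string,
  forceRefresh: boolean,
): boolean {
  const resourceOrigin = new URL(resourceUrl, pageUrl).origin;
  return forceRefresh && resourceOrigin === new URL(pageUrl).origin;
}
```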
Comment 65 johnjbarton 2012-01-26 09:08:03 PST
I am trying to understand this issue. I hope an expert will clarify: if a response header specifies 
  Cache-Control: no-cache 
then the browser will cache the resource but use it on reload only after revalidation with the server. 

Does this apply to all resources all of the time? 

This bug report seems to indicate that revalidation is only applied to resources loaded before 'load', but perhaps this limitation only applies to resources with no Cache-Control headers?
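For reference on the question above: under HTTP caching rules (RFC 7234, since superseded by RFC 9111), `Cache-Control: no-cache` on a response means a stored copy must not be reused without successful revalidation with the server, on every use and not only on reload. `max-age` controls how long a copy can be reused without contacting the server at all. A simplified sketch of that decision follows. It ignores `Expires`, heuristic freshness, `must-revalidate`, and other directives, and the function name is hypothetical.

```typescript
type ReuseDecision = "reuse" | "revalidate";

// Simplified freshness check for a stored response, given its
// Cache-Control header and its current age in seconds.
function decideReuse(cacheControl: string, ageSeconds: number): ReuseDecision {
  const directives = cacheControl
    .split(",")
    .map((d) => d.trim().toLowerCase())
    .filter((d) => d.length > 0);

  // no-cache: the stored copy may be used only after the server validates it.
  if (directives.indexOf("no-cache") !== -1) {
    return "revalidate";
  }

  // max-age=N: reuse without contacting the server while age < N.
  for (const d of directives) {
    const match = /^max-age=(\d+)$/.exec(d);
    if (match !== null) {
      return ageSeconds < Number(match[1]) ? "reuse" : "revalidate";
    }
  }

  // No explicit freshness information: this sketch simply revalidates
  // (real caches may apply heuristic freshness instead).
  return "revalidate";
}
```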
Comment 66 johnjbarton 2012-01-27 09:11:54 PST
Created attachment 124326 [details]
Modified test case, load scripts dynamically next turn after 'load' event with 1000ms delay

The attached modification of the OP test case delays the script load until the next turn after 'load' and further delays it 1000ms.  These changes ensure that the dynamic resources are well outside of the initial load.

Using the test case on a server that sets 
  Cache-Control:no-cache
I verified that this bug is *not* observed on Chrome 18.0.1017.2 dev

So whatever problem I am having, it's not this one.
Comment 67 Jason San Jose 2012-07-31 16:17:23 PDT
I've seen workarounds posted here that involve changing HTTP headers. Are there any workarounds when the documents are local file:// URLs?
Comment 68 Florin Malita 2012-09-19 15:12:51 PDT
*** Bug 97044 has been marked as a duplicate of this bug. ***