Bug 17998 - When a resource is cached locally, WebKit should follow RFC 2616 "Specific end-to-end revalidation" instead of "Unspecified end-to-end revalidation"
Summary: When a resource is cached locally, WebKit should follow RFC 2616 "Specific en...
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: Page Loading (show other bugs)
Version: 525.x (Safari 3.1)
Hardware: Mac (Intel) OS X 10.5
: P2 Major
Assignee: Antti Koivisto
URL:
Keywords: InRadar
: 13892 (view as bug list)
Depends on: 49246
Blocks:
  Show dependency treegraph
 
Reported: 2008-03-21 18:31 PDT by Mark J. Hughes
Modified: 2018-03-21 03:50 PDT (History)
15 users (show)

See Also:


Attachments
implement Specific end-to-end revalidation for WebCore memory cache (46.00 KB, patch)
2008-08-18 18:09 PDT, Antti Koivisto
no flags Details | Formatted Diff | Diff
updated patch (46.52 KB, patch)
2008-08-18 20:44 PDT, Antti Koivisto
darin: review-
Details | Formatted Diff | Diff
more updates (51.34 KB, patch)
2008-08-25 15:16 PDT, Antti Koivisto
no flags Details | Formatted Diff | Diff
WebCore patch for enabling revalidation on reload (27.42 KB, patch)
2008-11-11 02:51 PST, Antti Koivisto
no flags Details | Formatted Diff | Diff
WebKit patch for enabling revalidation on reload (4.18 KB, patch)
2008-11-11 02:54 PST, Antti Koivisto
no flags Details | Formatted Diff | Diff
updated WebCore patch (27.58 KB, patch)
2008-11-11 14:30 PST, Antti Koivisto
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Mark J. Hughes 2008-03-21 18:31:10 PDT
Overview:
---------

After much research, WebKit seems to ignore Expire / ETag headers in a response, either for static or dynamic content. Using Apache's mod_expires as a basis point, WebKit/Safari seems unable to understand server-side caching headers, and caching accordingly.

Ultimately, this is bad on its face, but especially bad for low bandwidth, high latency, and/or resource-limited situations, such as dial-up users or the iPhone.

Steps to Reproduce:
-------------------

1) Using Mac OS X builtin web sharing, configure my userdir.conf file (under 10.5/Intel)

shell> sudo vi /etc/apache2/users/mhughes.conf

<Directory "/Users/mhughes/Sites">
	
	Options Indexes MultiViews
	AllowOverride None
	Order allow,deny
	Allow from all
	
	ExpiresActive   On
	ExpiresDefault  "access plus 7 days"

</Directory>

2) Restart Apache

shell> sudo apachectl graceful

  --or--

"System Preferences" > "Sharing" >	Uncheck "Web Sharing" > Wait a few seconds for it to shutdown > Re-check "Web Sharing"

 ** Confirm the restart by opening http://localhost/~mhughes/ - you should see "Your website here" and such.

3) Confirm Expires and ETag are being sent properly by the webserver (Apache 2.2).  Open your terminal window and issue the following command:

shell> curl -i --head http://localhost/~mhughes/images/macosxlogo.gif

Result should look similar to below:

[antbox:~] mhughes% curl -i --head http://localhost/~mhughes/images/macosxlogo.gif
HTTP/1.1 200 OK
Date: Sat, 22 Mar 2008 00:12:40 GMT
Server: Apache/2.2.6 (Unix) mod_ssl/2.2.6 OpenSSL/0.9.7l DAV/2 PHP/5.2.4
Last-Modified: Sat, 03 Nov 2007 20:12:50 GMT
ETag: "52b87-b0d-43e0be8509480"
Accept-Ranges: bytes
Content-Length: 2829
Cache-Control: max-age=604800
Expires: Sat, 29 Mar 2008 00:12:40 GMT
Content-Type: image/gif

 ** Note the ETag, Cache-Control, and Expires headers being returned.

3) Open Terminal.app and run the following:

shell> sudo tail -n 0 -f /var/log/apache2/access_log | grep macosxlogo

## The -n 0 will ensure no unwelcome previous access lines

4) Open Safari to http://localhost/~mhughes/

5) Return to your terminal window. Results should look similar to below:

::1 - - [21/Mar/2008:17:06:40 -0700] "GET /~mhughes/images/macosxlogo.gif HTTP/1.1" 200 2829

 ** Request return with a 200 response code. This is fine.

6) Return to Safari and reload the page.

7) Return to your terminal window.  Results should look similar to below:

::1 - - [21/Mar/2008:17:06:48 -0700] "GET /~mhughes/images/macosxlogo.gif HTTP/1.1" 200 2829

 ** Request still have a 200 response code, not 304 (Not Modified) as expected.

Expected Results:
-----------------

All static content should have, upon subsequent load, been cached via the numerous caching headers.

Actual Results:
---------------

All content, regardless of cache headers from the server, is re-requested and retransferred from the server. This causes much more bandwidth to be used, which should also increase local resource usage and cache churning.

Products affected:
------------------

	Safari	Version 3.1 (5525.13) 10.5 / Intel
	Safari	Version 3.0.4 (523.15) WinXP SP2
	WebKit	r31114 10.5 / Intel

Additional Builds and Platforms:
--------------------------------

Other major browsers, by comparison, perform as expected (below are the Apache logs for relevant browsers)

Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12 <---- Running on 10.5.2 / Intel

First request:          127.0.0.1 - - [21/Mar/2008:17:13:58 -0700] "GET /~mhughes/images/macosxlogo.gif HTTP/1.1" 200 2829
Second request:         127.0.0.1 - - [21/Mar/2008:17:14:04 -0700] "GET /~mhughes/images/macosxlogo.gif HTTP/1.1" 304 -

Opera/9.24 (Macintosh; Intel Mac OS X; U; en) <---- Running on 10.5.2 / Intel

First request:          127.0.0.1 - - [21/Mar/2008:17:18:15 -0700] "GET /~mhughes/images/macosxlogo.gif HTTP/1.1" 200 2829
Second request:         127.0.0.1 - - [21/Mar/2008:17:18:31 -0700] "GET /~mhughes/images/macosxlogo.gif HTTP/1.1" 304 -

Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en; rv:1.8.1.14pre) Gecko/20080321 Camino/1.6b3pre (like Firefox/2.0.0.14pre) <---- (Camino nightly) Running on 10.5.2 / Intel

First request:          127.0.0.1 - - [21/Mar/2008:17:20:04 -0700] "GET /~mhughes/images/macosxlogo.gif HTTP/1.1" 200 2829
Second request:         127.0.0.1 - - [21/Mar/2008:17:20:08 -0700] "GET /~mhughes/images/macosxlogo.gif HTTP/1.1" 304 -

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) <---- Running on WinXP SP2

First request:          10.50.3.140 - - [21/Mar/2008:17:24:53 -0700] "GET /~mhughes/images/macosxlogo.gif HTTP/1.1" 200 2829
Second request:         10.50.3.140 - - [21/Mar/2008:17:25:05 -0700] "GET /~mhughes/images/macosxlogo.gif HTTP/1.1" 304 -

Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1) <---- Running on WinXP SP2

First request:          10.50.3.140 - - [21/Mar/2008:17:28:47 -0700] "GET /~mhughes/images/macosxlogo.gif HTTP/1.1" 200 2829
Second request:         10.50.3.140 - - [21/Mar/2008:17:28:49 -0700] "GET /~mhughes/images/macosxlogo.gif HTTP/1.1" 304 -

Additional information:
-----------------------

I am not familiar with the core of the engine enough to point blame, but I did notice that even with returning:

...
Last-Modified: Sat, 03 Nov 2007 20:12:50 GMT
ETag: "52b87-b0d-43e0be8509480"
...
Cache-Control: max-age=604800
Expires: Sat, 29 Mar 2008 00:12:40 GMT
...

The subsequent request(s) from WebKit/Safari will show (by using the web inspector to dump headers):

>>> request headers >>>

Cache-Control	max-age=0
Referer			http://localhost/~mhughes/
User-Agent		Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_2; en-us) AppleWebKit/526.1+ (KHTML, like Gecko) Version/3.0.4 Safari/523.15

<<< request headers <<<

>>> response headers >>>

Accept-Ranges	bytes
Cache-Control	max-age=604800
Connection		Keep-Alive
Content-Length	2829
Content-Type	image/gif
Date			Sat, 22 Mar 2008 00:33:41 GMT
Etag			"52b87-b0d-43e0be8509480"
Expires			Sat, 29 Mar 2008 00:33:41 GMT
Keep-Alive		timeout=5, max=98
Last-Modified	Sat, 03 Nov 2007 20:12:50 GMT
Server			Apache/2.2.6 (Unix) mod_ssl/2.2.6 OpenSSL/0.9.7l DAV/2 PHP/5.2.4

<<< response headers <<<

Of note is that no matter how many cache cleans or reloads, the request always seems to contain:

	Cache-Control	max-age=0

A good example to test with:
----------------------------

 * http://www.apple.com/startpage/

Clear cache > Open Inspector > Load Page.

 *** According to network panel, all of the resources total ~770KB.

Reload page leaving inspector open.

 *** Same transfer of data. A sampling of the requests show all as pulling max-age=0 and retransferring.

Note: As a point of comparison, repeat the same thing in FireFox with FireBug installed. On reload, ONLY the HTML page is reloaded from the network. All other pages trust their cached copies, even though they do confirm with the server that their copies are good.
Comment 1 Mark Rowe (bdash) 2008-03-21 19:13:46 PDT
Please try a similar experiment but instead of hitting Reload, simply load the page again by hitting Enter in the address bar.
Comment 2 Mark J. Hughes 2008-03-22 15:02:20 PDT
(In reply to comment #1)
> Please try a similar experiment but instead of hitting Reload, simply load the
> page again by hitting Enter in the address bar.

On resubmission, or by following a link to that page, the cache is wholly respected and there is not even a request to validate the content to confirm the page's status - there is no network activity whatsoever. This is great, and even more "aggressive" than other browsers.

However, why during a reload are the linked resources themselves not trusted? I could see a reload causing the contents of the "primary" URL to be forceably reloaded/reverifed (Cache-Control: max-age=0), but if that page references resources in the cache already, wouldn't it be more sensible/efficient to let the cache work the same way as if you had reentered the URL? In this case, with Expires/Cache-control: max-age=>0 (freshness) and ETag (validator) returned in the initial response, sending the If-None-Match / If-Modified-Since headers would make the utmost sense.

In short, even if reload becomes a condition for the browser to forceably revalidate all of the resources involved, shouldn't the proper outcome of described scenario be a stream of 304s from the server in the ideal situation?
Comment 3 Mark Rowe (bdash) 2008-03-22 15:08:23 PDT
See <http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.4>:  What you describe for the current behaviour is termed "Unspecified end-to-end revalidation".  What you describe for the preferred behaviour is termed "Specific end-to-end revalidation".  The paragraph preceding the definitions suggests that specific end-to-end revalidation should be used when a local cached copy of the resource is available, which would appear to be in agreement the approach you suggest.  I suspect that the current behaviour that WebKit uses is determined by the CFNetwork stack it uses for HTTP rather than in WebKit itself.
Comment 4 Mark Rowe (bdash) 2008-03-22 15:09:02 PDT
I think the bug should be retitled to reflect the specific problem you mentioned.  At the moment it's too general and doesn't accurately describe the situation.
Comment 5 Mark J. Hughes 2008-03-22 15:55:49 PDT
Renamed per your suggestion. You are certainly correct in this, as the file is in the cache already, and therefore not a total unknown.
Comment 6 Mark Rowe (bdash) 2008-03-23 21:44:42 PDT
<rdar://problem/5815252>
Comment 7 Alexey Proskuryakov 2008-03-23 23:18:16 PDT
> Note: As a point of comparison, repeat the same thing in FireFox with FireBug
> installed. On reload, ONLY the HTML page is reloaded from the network.

In general, Safari reloading behavior should be compared to what other browsers do on Shift-reload - since we only have on flavor of reload, it has to be the more thorough one.
Comment 8 Mark J. Hughes 2008-03-24 09:02:24 PDT
> > Note: As a point of comparison, repeat the same thing in FireFox with FireBug
> > installed. On reload, ONLY the HTML page is reloaded from the network.
> 
> In general, Safari reloading behavior should be compared to what other browsers
> do on Shift-reload - since we only have on flavor of reload, it has to be the
> more thorough one.

All due respect to the many engineers who have spent their time working on this project, but that doesn't sound like a proper implementation of the mechanism. Most all browsers seem to implement "Specific end to end" on reload, as referenced earlier. That WebKit would forcibly ignore the cache on reload seems counterintuitive, and unfriendly to the types of issues I list above - low bandwidth, high latency, or resource-limited devices - the first two I have spent much time in, and I am made to shy away from WebKit/Safari when I do.

That said, what is the rationale for wholesale reloading on "reload?"
Comment 9 Alexey Proskuryakov 2008-03-24 09:22:31 PDT
Please note that my personal opinion is that we should have both reload and shift-reload. But in the absence of the latter, it appears obvious that reload should fulfill the primary purpose of reloading - which is to guarantee freshness of content, regardless of proxy misconfiguration, time zone changes, software bugs and other issues that may necessitate full reload. Saving bandwidth is a very important consideration, but a secondary one.

Just as an example, in the past I've personally experienced a situation where I couldn't reload stale content that was stuck in a proxy from within Safari - I had to open the same page in Firefox and use shift-reload; only after that, Safari would display updated content.
Comment 10 Antti Koivisto 2008-03-24 10:10:03 PDT
It is not acceptable to make user experience worse for every user to deal with exteremely rare caching problems. Supporting shift-reload is probably the right way to deal with those cases.
Comment 11 Mark J. Hughes 2008-03-24 10:19:20 PDT
> Please note that my personal opinion is that we should have both reload and
> shift-reload. But in the absence of the latter, it appears obvious that reload
> should fulfill the primary purpose of reloading - which is to guarantee
> freshness of content, regardless of proxy misconfiguration, time zone changes,
> software bugs and other issues that may necessitate full reload. Saving
> bandwidth is a very important consideration, but a secondary one.

So what would the process be to have both options available?

Clearly, it's not a situation where the ideal is one or the other, but in fact both. I would also think that there could be a client-side preference to force super-reload when connected to a known bad proxy, or in general (for web dev testing for instance), or to be OS X Network Locaton specific (so that you could say to always err on the side of trust on a cellular WiFi network).

I'd like to dig in further to the codebase and see if I could perhaps contribute this. Does anyone know if the work would be in WebKit, WebCore, CFNetwork, or is it not community changeable?

> Just as an example, in the past I've personally experienced a situation where I
> couldn't reload stale content that was stuck in a proxy from within Safari - I
> had to open the same page in Firefox and use shift-reload; only after that,
> Safari would display updated content.

As both are just browsers, I believe it would then be advisable to research if there are any differences between WebKit SuperReload'ing and FF's SuperReload'ing. If FF can force an upstream to re-cache, and WebKit cannot, that seems like WebKit needs additional headers/something to force upstreams to re-cache/re-validate content.
Comment 12 Alexey Proskuryakov 2008-03-24 15:26:21 PDT
(In reply to comment #11)
> Does anyone know if the work would be in WebKit, WebCore,
> CFNetwork, or is it not community changeable?

Depending on what the UI will be, this may be WebKit or Safari. Web Inspector is in open source WebKit, while Safari UI is closed source. Some supporting changes in lower level frameworks might be necessary, too.

> As both are just browsers, I believe it would then be advisable to research if
> there are any differences between WebKit SuperReload'ing and FF's
> SuperReload'ing. 

This was a while ago, I believe Safari's behavior has changed, and using Firefox to reset proxy wouldn't have been necessary now.
Comment 13 Alexey Proskuryakov 2008-03-25 01:08:07 PDT
(In reply to comment #12)
> This was a while ago, I believe Safari's behavior has changed, and using
> Firefox to reset proxy wouldn't have been necessary now.

Actually, I take my words back - this part hasn't changed, see bug 7414 comment 6 (the plain reload results mentioned there are obsolete).
Comment 14 Antti Koivisto 2008-08-18 18:09:59 PDT
Created attachment 22872 [details]
implement Specific end-to-end revalidation for WebCore memory cache

putting this up for comments
Comment 15 Antti Koivisto 2008-08-18 20:44:44 PDT
Created attachment 22874 [details]
updated patch

Fixes based on comments by Mark.

Added code that includes "Cache-Control: max-age=0" in reload case (which is not enabled for now) required for correct "Unspecified end-to-end revalidation" .
Comment 16 Darin Adler 2008-08-24 15:32:07 PDT
Comment on attachment 22874 [details]
updated patch

This looks great. I have a few questions about it.

How can we test this? I assume you did some testing already, but I'm curious how we can prevent changes from causing regressions in this area. I'm also thinking that the division of labor between WebCore and the underlying loader library is getting weaker.

Some of the code here should probably be using functions from ResourceHandle; maybe the constant "304" could be coming from there for example and also the "can revalidate".

 195     if (resource->resourceToRevalidate())
 196         return;

I don't like the way this if statement reads. It's in a function named revalidateResource it says "if there is a resource to revalidate, just return". This shows that the names aren't quite right. I think I understand the case where this would be true, but I'm not sure. Maybe this is a resource that already represents a revalidation? Maybe we need a comment explaining this a little more?

 201     String url = resource->url(),;

The comma here looks like a syntax error. It seems that we wouldn't be able to compile this. Also might be more efficient to use const& here.

 282     // Equivalent of deref() for all clients
 283     m_clients.clear();
 284     
 285     unsigned moveCount = clientsToMove.size();
 286     for (unsigned n = 0; n < moveCount; ++n)
 287         m_resourceToRevalidate->addClient(clientsToMove[n]);

Does this create a window where the clients could go away altogether or something along those lines? Normally doing a deref before a reref is incorrect. Maybe it's OK since it's not literally deref(). Also, it's possible your use of "deref()" here is out of date, since Hyatt renamed the functions here in the cache since they weren't quite ref/deref.

 292     return !m_loading && (!m_response.httpHeaderField("Last-Modified").isEmpty() || !m_response.httpHeaderField("ETag").isEmpty());

This assumes that ETag is implemented. It's not implemented in current versions of CFNetwork!

 48         void setResource(CachedResource* res) {

We put the braces on a separate line. And maybe a function like this shouldn't be inline. The general rule is that we want to inline functions that are completely trivial, like a single assignment, or ones where there's a clear performance win. But not others.

 57         CachedResourceHandleBase& operator=(const CachedResourceHandleBase&) { return *this; } 

Do we want this function to be defined at all? Who needs to call it?

 245             String lastModified = resourceToRevalidate->response().httpHeaderField("Last-Modified");
 246             String eTag = resourceToRevalidate->response().httpHeaderField("ETag");

These can be more efficient if they use const& so they don't have to ref/deref the string.

I'm going to say review- because of that comma (at least) and because there seem to be enough questions that at least one might cause you to change the patch.
Comment 17 Mark Rowe (bdash) 2008-08-24 15:42:48 PDT
(In reply to comment #16)
> (From update of attachment 22874 [details] [edit])
> 
>  292     return !m_loading &&
> (!m_response.httpHeaderField("Last-Modified").isEmpty() ||
> !m_response.httpHeaderField("ETag").isEmpty());
> 
> This assumes that ETag is implemented. It's not implemented in current versions
> of CFNetwork!

The revalidation is handled entirely in WebCore, with Loader::Host::servePendingRequests adding the If-None-Match header if the original response had an ETag header.  This should allow us to use ETags to revalidate from the memory cache even when using a network library that does not support revalidation via ETags.
Comment 18 Antti Koivisto 2008-08-24 17:20:45 PDT
(In reply to comment #16)
> (From update of attachment 22874 [details] [edit])
> This looks great. I have a few questions about it.
> 
> How can we test this? I assume you did some testing already, but I'm curious
> how we can prevent changes from causing regressions in this area. I'm also
> thinking that the division of labor between WebCore and the underlying loader
> library is getting weaker.

I don't have good story for that. I have been running versions of this patch in regular use over a few months but that obviously that obviously does not catch that much. Automated HTTP tests that give code coverage should be possible but good coverage of real world scenarios (involving potentially broken servers, caches, etc) is very difficult. This is true for all our networking code.

I don't think division of labor is getting weaker. That overall idea is to treat networking library cache and WebCore cache as two separate levels in HTTP cache hierarchy. The specification is written in multiple cache levels in mind and the interaction between levels is well defined. This allows us to implement functionality (like ETag, below) without caring if it is implemented in other levels.

For example if the CFNetwork level includes a cache validator to a request for an item in its own cache we never see the resulting 304 response in WebCore level. It gets transparently turned into 200 response.  We only get 304 responses for requests that include a validator from the WebCore level.
 
> Some of the code here should probably be using functions from ResourceHandle;
> maybe the constant "304" could be coming from there for example and also the
> "can revalidate".
> 
>  195     if (resource->resourceToRevalidate())
>  196         return;
> 
> I don't like the way this if statement reads. It's in a function named
> revalidateResource it says "if there is a resource to revalidate, just return".
> This shows that the names aren't quite right. I think I understand the case
> where this would be true, but I'm not sure. Maybe this is a resource that
> already represents a revalidation? Maybe we need a comment explaining this a
> little more?

I don't like the name either. I changed it a few times and this was the best I came up with.

The questions here is along the lines of "Is this a resource that represents a request that includes cache validator for an existing resource".

> 
>  201     String url = resource->url(),;
> 
> The comma here looks like a syntax error. It seems that we wouldn't be able to
> compile this. Also might be more efficient to use const& here.

Good catch. Suprisingly it compiles fine, as does

int a,;

No idea if this is a compiler bug or if it is legal C++.

> 
>  282     // Equivalent of deref() for all clients
>  283     m_clients.clear();
>  284     
>  285     unsigned moveCount = clientsToMove.size();
>  286     for (unsigned n = 0; n < moveCount; ++n)
>  287         m_resourceToRevalidate->addClient(clientsToMove[n]);
> 
> Does this create a window where the clients could go away altogether or
> something along those lines? Normally doing a deref before a reref is
> incorrect. Maybe it's OK since it's not literally deref(). Also, it's possible
> your use of "deref()" here is out of date, since Hyatt renamed the functions
> here in the cache since they weren't quite ref/deref.

Yeah, deref() comment is pre-rename.

CachedResources have weird destruction rules that involve multiple conditions (CachedResources::canDelete()), and this patch adds more (for resources in process of being revalidated) so just removing clients don't kill the objects. 

> 
>  292     return !m_loading &&
> (!m_response.httpHeaderField("Last-Modified").isEmpty() ||
> !m_response.httpHeaderField("ETag").isEmpty());
> 
> This assumes that ETag is implemented. It's not implemented in current versions
> of CFNetwork!

The implementation is entirely self contained in WebCore. We can validate WebCore cache with ETag validators even if CFNetwork can not do it with its cache.

> 
>  48         void setResource(CachedResource* res) {
> 
> We put the braces on a separate line. And maybe a function like this shouldn't
> be inline. The general rule is that we want to inline functions that are
> completely trivial, like a single assignment, or ones where there's a clear
> performance win. But not others.

Ok.

>  57         CachedResourceHandleBase& operator=(const
> CachedResourceHandleBase&) { return *this; } 
> 
> Do we want this function to be defined at all? Who needs to call it?

I made it private to ensure that no one is invoking the default version. The handle went though various iterations, that helped to catch bugs at least in some form.

> 
>  245             String lastModified =
> resourceToRevalidate->response().httpHeaderField("Last-Modified");
>  246             String eTag =
> resourceToRevalidate->response().httpHeaderField("ETag");
> 
> These can be more efficient if they use const& so they don't have to ref/deref
> the string.
> 
> I'm going to say review- because of that comma (at least) and because there
> seem to be enough questions that at least one might cause you to change the
> patch.

Thanks for the review!
Comment 19 Antti Koivisto 2008-08-25 15:16:18 PDT
Created attachment 22987 [details]
more updates

Fixes based on review comments, cleanups
- added CachedResource::mustRevalidate(CachePolicy) that takes "Cache-Control: no-cache" and "CacheControl: must-revalidate" into account
- added CachedResource::isCacheValidator()
- renamed CachedResource::canBeRevalidated() to canUseCacheValidator()
- added CachedResourceHandle.cpp
- sorted headers
- random cosmetic fixes
Comment 20 Darin Adler 2008-08-30 19:14:07 PDT
Comment on attachment 22987 [details]
more updates

Sorry, I can't remember which of these are repeating something I already asked.

 213         CachedScript *cs = pendingScripts.first().get();

Would be nice to move the star when you're tweaking this line of code.

 225     int delta = static_cast<int>(resource->size());

Does this really require an explicit cast?

 103     void revalidationSucceeded(CachedResource* revalidatingResource, const ResourceResponse& response);
 104     void revalidationFailed(CachedResource* revalidatingResource);

Seems that you don't need the argument names here.

 111     return (difftime(now, m_expirationDate) >= 0);

Extra parentheses on this line.

 88     void checkDelete();

Not a great name. This names it sound like we check deletion, which doesn't make a lot of logical sense. A better name might be deleteIfPossible() or deleteIfNotInCache(). Those are less elegant than checkDelete, but at least a little easier to understand. There's probably an even better name.

 301         return !cacheControl.isEmpty() && (cacheControl.contains("no-cache") || (isExpired() && cacheControl.contains("must-revalidate")));
 302     return isExpired() || cacheControl.contains("no-cache");

I think we'd want the case-ignoring version of contains, right?

Also, is a substring match really the correct way to read this field? That seems a little sloppy to me. What's the correct algorithm?

 134     void setResponse(const ResourceResponse& response);
 188     void setResourceToRevalidate(CachedResource* resource);

Should omit the argument names in these cases.

 31 void CachedResourceHandleBase::setResource(CachedResource* resource) {

Brace goes on a separate line.

 39         operator bool() const { return get(); }

We normally don't use operator bool. I think it would be best to imitate the technique used in our smart pointers such as RefPtr instead.

 42         CachedResourceHandleBase() : m_resource(0) {}

We use a space in these parentheses.

 46         void setResource(CachedResource* res);

Should omit the argument names in these cases.

None of these are serious concerns, so I'll say review+.
Comment 21 Antti Koivisto 2008-09-05 02:32:04 PDT
Sending        WebCore/ChangeLog
Sending        WebCore/WebCore.pro
Sending        WebCore/WebCore.vcproj/WebCore.vcproj
Sending        WebCore/WebCore.xcodeproj/project.pbxproj
Sending        WebCore/css/CSSFontFaceSource.h
Sending        WebCore/css/CSSImportRule.h
Sending        WebCore/dom/Clipboard.h
Sending        WebCore/dom/ProcessingInstruction.h
Sending        WebCore/dom/ScriptElement.h
Sending        WebCore/dom/XMLTokenizer.cpp
Sending        WebCore/dom/XMLTokenizer.h
Sending        WebCore/html/HTMLImageLoader.cpp
Sending        WebCore/html/HTMLImageLoader.h
Sending        WebCore/html/HTMLLinkElement.h
Sending        WebCore/html/HTMLTokenizer.cpp
Sending        WebCore/html/HTMLTokenizer.h
Sending        WebCore/loader/Cache.cpp
Sending        WebCore/loader/Cache.h
Sending        WebCore/loader/CachedResource.cpp
Sending        WebCore/loader/CachedResource.h
Adding         WebCore/loader/CachedResourceHandle.cpp
Adding         WebCore/loader/CachedResourceHandle.h
Sending        WebCore/loader/DocLoader.cpp
Sending        WebCore/loader/UserStyleSheetLoader.h
Sending        WebCore/loader/loader.cpp
Sending        WebCore/page/EventHandler.cpp
Sending        WebCore/rendering/RenderImage.cpp
Sending        WebCore/rendering/RenderImage.h
Sending        WebCore/rendering/style/RenderStyle.h
Sending        WebCore/rendering/style/StyleCachedImage.h
Sending        WebCore/svg/SVGFEImageElement.h
Sending        WebCore/svg/graphics/filters/SVGFEImage.h
Sending        WebCore/xml/XSLImportRule.h
Transmitting file data .................................
Committed revision 36109.
Comment 22 Antti Koivisto 2008-09-05 02:33:45 PDT
> Also, is a substring match really the correct way to read this field? That
> seems a little sloppy to me. What's the correct algorithm?

I added a FIXME. It should really be tokenized but substring match works in practice (and some http code i have seen does the same thing). 
Comment 23 Antti Koivisto 2008-09-05 02:34:28 PDT
Leaving this open for followup patch to enable it for reloading case.
Comment 24 Eric Seidel (no email) 2008-10-06 18:25:34 PDT
Comment on attachment 22987 [details]
more updates

Clearing review flag since this has landed.
Comment 25 Antti Koivisto 2008-11-11 02:51:58 PST
Created attachment 25045 [details]
WebCore patch for enabling revalidation on reload
Comment 26 Antti Koivisto 2008-11-11 02:54:29 PST
Created attachment 25046 [details]
WebKit patch for enabling revalidation on reload

This also contains hack that allows WebKit running in shipping Safari to use shift-reload to force end-to-end reload.
Comment 27 Antti Koivisto 2008-11-11 14:30:23 PST
Created attachment 25055 [details]
updated WebCore patch

Some small updates
- set request cache policy in addExtraFieldsToRequest()
- some more text in ChangeLog
Comment 28 Mark Rowe (bdash) 2008-11-13 15:48:13 PST
*** Bug 13892 has been marked as a duplicate of this bug. ***
Comment 29 Andrew 2008-11-24 10:42:46 PST
Does this bug need to be accepted as a bug (instead of having the status NEW) in order to get reviewed?
Comment 30 Darin Adler 2008-11-24 11:49:19 PST
(In reply to comment #29)
> Does this bug need to be accepted as a bug (instead of having the status NEW)
> in order to get reviewed?

No. "NEW" just means that the assignee hasn't accepted. It's more the status of the assignee than the status of the bug. We don't use it much on the WebKit project.
Comment 31 Andrew 2008-11-24 12:41:01 PST
So currently the bug is assigned to "Nobody". Should it be assigned to the person in charge of this area of the Webkit engine so that it can be moved along? I'm quite keen to see these patches get reviewed as this bug very much affects my daily usage of Webkit.
Comment 32 Darin Adler 2008-11-24 12:55:36 PST
(In reply to comment #31)
> So currently the bug is assigned to "Nobody". Should it be assigned to the
> person in charge of this area of the Webkit engine so that it can be moved
> along? I'm quite keen to see these patches get reviewed as this bug very much
> affects my daily usage of Webkit.

This patch is in the review queue. There are about 50 others in the queue too. Assignment of the bug won't affect how soon it's reviewed.
Comment 33 Darin Adler 2008-11-24 13:51:30 PST
Comment on attachment 25046 [details]
WebKit patch for enabling revalidation on reload

> +    // FIXME: This is a hack to make shift-reload work on shipping Safari. Remove when Safari has this functionality.
> +    if ([[[NSBundle mainBundle] bundleIdentifier] isEqualToString:@"com.apple.Safari"])

Normally we would do a linked-on-or-after check to make this back off automatically when the new Safari ships.
Comment 34 Darin Adler 2008-11-24 13:51:30 PST
Comment on attachment 25046 [details]
WebKit patch for enabling revalidation on reload

> +    // FIXME: This is a hack to make shift-reload work on shipping Safari. Remove when Safari has this functionality.
> +    if ([[[NSBundle mainBundle] bundleIdentifier] isEqualToString:@"com.apple.Safari"])

Normally we would do a linked-on-or-after check to make this back off automatically when the new Safari ships.
Comment 35 Darin Adler 2008-11-24 13:56:12 PST
Comment on attachment 25055 [details]
updated WebCore patch

> +    CachePolicy cachePolicy = this->cachePolicy();
> +    if (cachePolicy == CachePolicyVerify || cachePolicy == CachePolicyCache) {
> +        CachedResource* existing = cache()->resourceForURL(fullURL.string());
> +        if (existing && !existing->isPreloaded() && existing->mustRevalidate(cachePolicy)) {
> +            cache()->revalidateResource(existing, this);
> +            m_reloadedURLs.add(fullURL.string());
> +        }
> +        return;
> +    }
> +    if (cachePolicy == CachePolicyReload || cachePolicy == CachePolicyRevalidate) {
> +        CachedResource* existing = cache()->resourceForURL(fullURL.string());
> +        if (existing && !existing->isPreloaded()) {
> +            if (cachePolicy == CachePolicyReload)
> +                cache()->remove(existing);
> +            else
> +                cache()->revalidateResource(existing, this);
> +            m_reloadedURLs.add(fullURL.string());
> +        }
>      }

The above looks like it should be a switch statement.

> -    ScheduledRedirection(double redirectDelay, const String& redirectURL, bool redirectLockHistory, bool userGesture)
> +    ScheduledRedirection(double redirectDelay, const String& redirectURL, bool redirectLockHistory, bool userGesture, bool refresh)

It's a little confusing to have both the term "reload" and the term "refresh" used.

> +    ResourceRequest request(url, referrer,  refresh ? ReloadIgnoringCacheData : UseProtocolCachePolicy);

Two spaces here after the comma.

> -        scheduleRedirection(new ScheduledRedirection(delay, url, delay <= 1, false));
> +        scheduleRedirection(new ScheduledRedirection(delay, url, delay <= 1, false, false));

Yuck, I hate booleans. We should really use enums instead so we can see what "false" means.

> +    // FIXME: We don't have a mechanism to revalidate the main resource without reloadin at the moment.

Typo here. Need a "g" on the end of "reloading".

> -        void changeLocation(const String& url, const String& referrer, bool lockHistory = true, bool userGesture = false);
> -        void changeLocation(const KURL&, const String& referrer, bool lockHistory = true, bool userGesture = false);
> +        void changeLocation(const String& url, const String& referrer, bool lockHistory = true, bool userGesture = false, bool refresh = false);
> +        void changeLocation(const KURL&, const String& referrer, bool lockHistory = true, bool userGesture = false, bool refresh = false);

You added the "= false" default here, but you also added the false boolean explicitly to some call sites. I think you should omit it at the call sites.

r=me
Comment 36 Antti Koivisto 2008-12-15 07:14:08 PST
> The above looks like it should be a switch statement.

Fixed.
 
> It's a little confusing to have both the term "reload" and the term "refresh"
> used.

The confusing terminology predates this patch. At least some instances of "refresh" refer to meta refresh header. Cleaning this up left as future exercise.

> > +    ResourceRequest request(url, referrer,  refresh ? ReloadIgnoringCacheData : UseProtocolCachePolicy);
> 
> Two spaces here after the comma.

Fixed.

> Yuck, I hate booleans. We should really use enums instead so we can see what
> "false" means.

Agreed. Still keeping this bit of refactoring out from this patch.

> Typo here. Need a "g" on the end of "reloading".

Fixed.

> You added the "= false" default here, but you also added the false boolean
> explicitly to some call sites. I think you should omit it at the call sites.

Fixed.

Also did some slight additional refactoring which probably does not need review.
Comment 37 Antti Koivisto 2008-12-15 07:30:49 PST
Sending        WebCore/ChangeLog
Sending        WebCore/WebCore.base.exp
Sending        WebCore/loader/CachePolicy.h
Sending        WebCore/loader/DocLoader.cpp
Sending        WebCore/loader/DocLoader.h
Sending        WebCore/loader/DocumentLoader.cpp
Sending        WebCore/loader/FrameLoader.cpp
Sending        WebCore/loader/FrameLoader.h
Sending        WebCore/loader/FrameLoaderTypes.h
Sending        WebCore/loader/NavigationAction.cpp
Sending        WebCore/loader/SubresourceLoader.cpp
Sending        WebCore/loader/loader.cpp
Transmitting file data ............
Committed revision 39304.
Comment 38 Antti Koivisto 2008-12-15 07:32:02 PST
Sending        WebKit/mac/ChangeLog
Sending        WebKit/mac/WebView/WebFrame.h
Sending        WebKit/mac/WebView/WebFrame.mm
Sending        WebKit/mac/WebView/WebFramePrivate.h
Sending        WebKit/mac/WebView/WebView.h
Sending        WebKit/mac/WebView/WebView.mm
Transmitting file data ......
Committed revision 39305.
Comment 39 Antti Koivisto 2008-12-15 07:33:48 PST
> Normally we would do a linked-on-or-after check to make this back off
> automatically when the new Safari ships.

I will add this bit later.  
Comment 40 Antti Koivisto 2008-12-15 07:34:31 PST
This bug still remain unresolved for the top level resources.
Comment 41 Mark Rowe (bdash) 2009-01-12 23:36:55 PST
Comment on attachment 25046 [details]
WebKit patch for enabling revalidation on reload

Clearing review flag as this was landed.
Comment 42 Mark Rowe (bdash) 2009-01-12 23:37:11 PST
Comment on attachment 25055 [details]
updated WebCore patch

Clearing review flag as this was landed.
Comment 43 Andrew 2009-03-27 08:46:25 PDT
Perhaps this is bad bugzilla-etiquette but can someone review these patches please? They were added to this bug 3.5 months ago now. If I had the skill I would but I don't so I'm afraid I need to ask.

Thank you.
Comment 44 Mark Rowe (bdash) 2009-03-27 09:07:57 PDT
None of the patches attached to this bug are waiting for review.  The relevant ones have already been landed, as my two previous comments indicate.
Comment 45 Andrew 2009-03-27 09:10:35 PDT
Mark, thanks. So we're awaiting further patches from Antti in order to be able to mark this bug as resolved?
Comment 46 David Kilzer (:ddkilzer) 2010-01-04 16:44:48 PST
(In reply to comment #45)
> Mark, thanks. So we're awaiting further patches from Antti in order to be able
> to mark this bug as resolved?

Correct.  Antti?  :)
Comment 47 Alexander Romanovich 2010-10-27 09:23:59 PDT
Just to be clear: are you planning to send If-None-Match/If-Modified-Since headers with a regular reload request in WebKit?

At the moment WebKit browsers do not do this, while FF and IE do. FF and IE can then benefit from 304 Not Modified responses from the server with no body content and it would be great if WebKit could receive those too, especially in situations where end users are hammering their reload button in order to get content they're expecting to be published in the immediate future.
Comment 48 Alexey Proskuryakov 2010-10-27 10:09:12 PDT
Just to avoid confusion, this is only about top level resources. Subresources do get their conditional headers on reload, and we also have shift-reload now (there is no Cmd+Shift+R, but you can shift-click the reload button).

The top level resource being different is still a bug that needs to be fixed.
Comment 49 Antti Koivisto 2010-11-09 04:28:32 PST
I filed bug 49246 specifically for the main resource in the memory cache.
Comment 50 Alexander Romanovich 2013-01-30 06:51:45 PST
Is this any closer to being fixed for main resources now that Nate Chapin landed all the changes for main resources and the memory cache in:

Bug 49246 - Main resource should be cached in the memory cache
Bug 105667 - Enable reuse of cached main resources
Comment 51 Brady Eidson 2013-04-18 13:47:45 PDT
We assumed that once we cached main resources, main resources would get this for free.

Well, we cache main resources now and - as pointed out in 114738 - we haven't gotten it for free.

We can finish this up now.
Comment 52 Brady Eidson 2013-04-18 13:53:31 PDT
*** Bug 114738 has been marked as a duplicate of this bug. ***
Comment 53 Alexander Romanovich 2015-10-21 11:09:33 PDT
Hi all. Though there was some initial debate about this 7 years ago, the most recent comments here suggest there isn't any controversy today. Especially based on comments 48 and 51. Are any WebKit developers actively interested in pursuing this?