Bug 49672

Summary: Allow no-store resources to be used for back navigation
Product: WebKit Reporter: Joseph Pecoraro <joepeck>
Component: Page LoadingAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, ap, beidson, darin, ddkilzer, eric, joepeck, koivisto, simon.fraser, skyul, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: PC   
OS: OS X 10.5   
Attachments:
Description Flags
[PATCH] Allow no-store for history
none
[PATCH] Fixed Typos + Improved Comments
none
[PATCH] Do No Mark "https" Resources as Allowed For History
darin: review+
[PATCH] Patch that Will Land none

Description Joseph Pecoraro 2010-11-17 10:08:44 PST
The HTTP spec allows 'Cache-Control: no-store' resources to be used when used for "History", such as back/forward navigation.

Relevant parts of the spec:
http://www.ietf.org/rfc/rfc2616.txt

13.13 History Lists

    User agents often have history mechanisms, such as "Back" buttons
    and history lists, which can be used to redisplay an entity retrieved
    earlier in a session.

>   History mechanisms and caches are different. In particular history
>   mechanisms SHOULD NOT try to show a semantically transparent view
>   of the current state of a resource. Rather, a history mechanism is
>   meant to show exactly what the user saw at the time when the resource
>   was retrieved.

    By default, an expiration time does not apply to history mechanisms.
    If the entity is still in storage, a history mechanism SHOULD display
    it even if the entity has expired, unless the user has specifically
    configured the agent to refresh expired history documents.

    This is not to be construed to prohibit the history mechanism from
    telling the user that a view might be stale.


14.9.2 What May be Stored by Caches

   no-store
      The purpose of the no-store directive is to prevent the
      inadvertent release or retention of sensitive information (for
      example, on backup tapes). The no-store directive applies to the
      entire message, and MAY be sent either in a response or in a
      request. If sent in a request, a cache MUST NOT store any part of
      either this request or any response to it. If sent in a response,
      a cache MUST NOT store any part of either this response or the
      request that elicited it. This directive applies to both non-
      shared and shared caches. "MUST NOT store" in this context means
      that the cache MUST NOT intentionally store the information in
      non-volatile storage, and MUST make a best-effort attempt to
      remove the information from volatile storage as promptly as
      possible after forwarding it.

>     Even when this directive is associated with a response, users
>     might explicitly store such a response outside of the caching
>     system (e.g., with a "Save As" dialog). History buffers MAY store
>     such responses as part of their normal operation.

      The purpose of this directive is to meet the stated requirements
      of certain users and service authors who are concerned about
      accidental releases of information via unanticipated accesses to
      cache data structures. While the use of this directive might
      improve privacy in some cases, we caution that it is NOT in any
      way a reliable or sufficient mechanism for ensuring privacy. In
      particular, malicious or compromised caches might not recognize or
      obey this directive, and communications networks might be
      vulnerable to eavesdropping.
Comment 1 Joseph Pecoraro 2010-11-17 10:32:35 PST
Created attachment 74132 [details]
[PATCH] Allow no-store for history
Comment 2 Joseph Pecoraro 2010-11-17 10:37:17 PST
The test can be tested manually when the patch is applied. Just start the webkit-tests http server:

    shell> ./WebKitTools/Scripts/run-webkit-httpd

Then load the test page:

    http://127.0.0.1:8000/cache/history-only-cached-subresource-loads.html

I thought about adding a setTimeout between navigations when not in DRT, so
when run manually you can see the navigations but I decided it would make the
code look more complicated.

When run before the patch, the first part of the test will FAIL, because the back
navigation will refetch the resource.
Comment 3 Joseph Pecoraro 2010-11-17 10:49:23 PST
Created attachment 74133 [details]
[PATCH] Fixed Typos + Improved Comments
Comment 4 Alexey Proskuryakov 2010-11-17 11:56:26 PST
Does this work in Firefox?

I personally think that this is a good idea, but there is some related controversy, see bug 26777.
Comment 5 Darin Adler 2010-11-17 12:36:25 PST
Comment on attachment 74133 [details]
[PATCH] Fixed Typos + Improved Comments

View in context: https://bugs.webkit.org/attachment.cgi?id=74133&action=review

r=me, but you should change some of the names

> WebCore/loader/cache/CachedResource.h:124
> +    bool isHistoryOnly() const { return m_historyOnly; }
> +    void setHistoryOnly(bool b) { m_historyOnly = b; }

Name these isStale() and setStale().

> WebCore/loader/cache/CachedResource.h:260
> +    bool m_historyOnly : 1;

Name this m_isStale.
Comment 6 Joseph Pecoraro 2010-11-17 13:27:00 PST
(In reply to comment #5)
> (From update of attachment 74133 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=74133&action=review
> 
> r=me, but you should change some of the names
> 
> > WebCore/loader/cache/CachedResource.h:124
> > +    bool isHistoryOnly() const { return m_historyOnly; }
> > +    void setHistoryOnly(bool b) { m_historyOnly = b; }
> 
> Name these isStale() and setStale().
> 
> > WebCore/loader/cache/CachedResource.h:260
> > +    bool m_historyOnly : 1;
> 
> Name this m_isStale.

I originally had that exact naming scheme, but decided against it. My reasoning
for switching to "history only" was that I thought "stale" was too similar to isExpired().
This comes into play when a resource is received with other headers, such as "Expires",
or "Cache-Control: max-age=..." and it outlives its freshness. I wanted to clearly
indicate this new state of a resource, which is why I chose the longer name.

With this in mind, would "stale" still be a better choice? It does match more closely
to the CachePolicy name (AllowsStale). If not, should I change the "allowsStale"
variable name to MemoryCache::requestResource be changed instead?
Comment 7 Darin Adler 2010-11-17 14:45:15 PST
(In reply to comment #6)
> I originally had that exact naming scheme, but decided against it. My reasoning
> for switching to "history only" was that I thought "stale" was too similar to isExpired().
> This comes into play when a resource is received with other headers, such as "Expires",
> or "Cache-Control: max-age=..." and it outlives its freshness. I wanted to clearly
> indicate this new state of a resource, which is why I chose the longer name.
> 
> With this in mind, would "stale" still be a better choice? It does match more closely
> to the CachePolicy name (AllowsStale). If not, should I change the "allowsStale"
> variable name to MemoryCache::requestResource be changed instead?

I see that you took your term from the specification text, which refers to this as a “history buffer”.

Sadly, the word “history” is ambiguous, because back/forward is called history in the Window object API, but and global history is called history in browser user interface. So I think a name involving “history” is confusing. And also a bit of a layering violation because the fact that this data is used for a “history buffer” is not really something that belongs down at this cache level.

If we want to use the terminology “for history only” for this, then yes I think we need to change the name of the cache policy too.
Comment 8 Darin Adler 2010-11-17 14:47:05 PST
Comment on attachment 74133 [details]
[PATCH] Fixed Typos + Improved Comments

View in context: https://bugs.webkit.org/attachment.cgi?id=74133&action=review

>>> WebCore/loader/cache/CachedResource.h:124
>>> +    void setHistoryOnly(bool b) { m_historyOnly = b; }
>> 
>> Name these isStale() and setStale().
> 
> I originally had that exact naming scheme, but decided against it. My reasoning
> for switching to "history only" was that I thought "stale" was too similar to isExpired().
> This comes into play when a resource is received with other headers, such as "Expires",
> or "Cache-Control: max-age=..." and it outlives its freshness. I wanted to clearly
> indicate this new state of a resource, which is why I chose the longer name.
> 
> With this in mind, would "stale" still be a better choice? It does match more closely
> to the CachePolicy name (AllowsStale). If not, should I change the "allowsStale"
> variable name to MemoryCache::requestResource be changed instead?

The names should probably then be isForHistoryOnly, m_isForHistoryOnly, and setForHistoryOnly.

> WebCore/loader/cache/CachedResourceLoader.cpp:266
> +    bool allowStaleResources = cachePolicy() == CachePolicyAllowStale;

The CachePolicy value should probably be changed to something like CachePolicyHistoryBuffer or something along those lines.

> WebCore/loader/cache/MemoryCache.cpp:97
> +CachedResource* MemoryCache::requestResource(CachedResourceLoader* cachedResourceLoader, CachedResource::Type type, const KURL& url, const String& charset, bool requestIsPreload, bool allowStale)

This argument name should be something like forHistory.
Comment 9 Joseph Pecoraro 2010-11-17 17:02:23 PST
(In reply to comment #8)
> • The names should probably then be isForHistoryOnly, m_isForHistoryOnly, and setForHistoryOnly.
>
> • The CachePolicy value should probably be changed to something like CachePolicyHistoryBuffer or something along those lines.
> 
> • This argument name should be something like forHistory.

Done, Thanks.


(In reply to comment #4)
> Does this work in Firefox?

Good question. Yes, it looks like Firefox already does this. I manually tested Firefox/4.0b6
loading the page containing a no-store subresource, did a google search, and came back,
and the page must have used the stale subresource because the random number was the
same. Refreshing was different. Likewise, it does not cache the no-store for a main resource.

  # Manually tested the following, as they couldn't handle the window.open automation for some reason.
  http://127.0.0.1:8000/cache/resources/no-store-resource.html
Comment 10 Joseph Pecoraro 2010-11-17 17:06:23 PST
> Good question. Yes, it looks like Firefox already does this.

Actually, I take that back. Its likely that for the simple page they were using their back/forward
cache, the equivalent of our PageCache. I'm going to take a moment to more accurately check.
Comment 11 Joseph Pecoraro 2010-11-17 17:12:51 PST
(In reply to comment #10)
> > Good question. Yes, it looks like Firefox already does this.
> 
> Actually, I take that back. Its likely that for the simple page they were using their back/forward
> cache, the equivalent of our PageCache. I'm going to take a moment to more accurately check.

Firefox does not cache "no-store" resources. What I saw before was their back/forward cache.
I disabling their back/forward cache by (1) going to "about:config" and (2) setting
"browser.sessionhistory.max_total_viewers" to 0. After this change, they did decide to
reload the "no-store" resource.
Comment 12 Alexey Proskuryakov 2010-11-17 17:16:41 PST
> What I saw before was their back/forward cache.

That's surprising, because it contradicts what was said in 26777. Could you compare to what Firefox 3.6 does?
Comment 13 Alexey Proskuryakov 2010-11-17 17:18:12 PST
And as Joe himself mentioned on IRC, this needs to be carefully considered for https resources. I don't see any practical harm from caching these for history, but perhaps others will.
Comment 14 Joseph Pecoraro 2010-11-17 17:35:39 PST
(In reply to comment #12)
> > What I saw before was their back/forward cache.
> 
> That's surprising, because it contradicts what was said in 26777.
> Could you compare to what Firefox 3.6 does?

I don't think it contradicts the comments. I think those comments were specific
to https pages, or handling of "no-store" on the main resource. Here, "no-store"
was for a subresource. The original documentation Firefox provided for the
"bfcache" feature does not mention its handling of subresources, only the loose
term "pages".
https://developer.mozilla.org/En/Using_Firefox_1.5_caching

For what it is worth, the manual test of Safari 5.0.2 with the PageCache enabled
acts the same as Firefox 4. The main resource still uses the PageCache even with
a "no-store" subresource. This should be fine, as no new requests are made, we
are just jumping back to where we were. But, since (currently) we won't be using
the PageCache in https sites, I think we should not allow "forHistory" resources.
Comment 15 Joseph Pecoraro 2010-11-17 19:01:00 PST
Created attachment 74193 [details]
[PATCH] Do No Mark "https" Resources as Allowed For History

• Addressed Darin's comments and made renames.
• Do not mark https resources for history, just remove them like we used to.

Correct me if I am wrong, but I don't need to check if the main resource is https, just
the "no-cache" subresource. I thought about this for a while, and I figured that if a
sub-resource is non-secure inside a secure main resource, then the subresource is
not particularly important.

I have "Reviewed by Darin" in the patch, but I will replace that after another review.
I just forgot to change it back to OOPS.
Comment 16 Antti Koivisto 2010-11-17 19:17:46 PST
How about just directly testing for resource->response().cacheControlContainsNoStore()? I think that would be simpler and less confusing than yet another field in CachedResource. 

Response caches the cache control parse result so there is no performance penalty for calling it repeatedly.
Comment 17 WebKit Review Bot 2010-11-17 20:50:42 PST
Attachment 74193 [details] did not build on win:
Build output: http://queues.webkit.org/results/6142036
Comment 18 Darin Adler 2010-11-17 22:14:27 PST
Comment on attachment 74193 [details]
[PATCH] Do No Mark "https" Resources as Allowed For History

View in context: https://bugs.webkit.org/attachment.cgi?id=74193&action=review

r=me, but please fix that function to remove the unnecessary conversion to KURL, and try to find out why the Windows EWS build is failing.

> WebCore/loader/cache/CachedResource.cpp:259
> +            if (KURL(ParsedURLString, url()).protocolIs("https"))

There is no need to parse a URL just to check its protocol. The KURL.h header has a protocolIs function that takes a string for this reason.
Comment 19 Darin Adler 2010-11-17 22:15:07 PST
Antti’s comment also makes sense. I missed that when reviewing.
Comment 20 Eric Seidel (no email) 2010-11-18 03:20:23 PST
Comment on attachment 74133 [details]
[PATCH] Fixed Typos + Improved Comments

Cleared Darin Adler's review+ from obsolete attachment 74133 [details] so that this bug does not appear in http://webkit.org/pending-commit.
Comment 21 Joseph Pecoraro 2010-11-18 09:45:23 PST
Created attachment 74245 [details]
[PATCH] Patch that Will Land

Just attaching the patch I'll land.

  • addressed Antti's comment, remove "isForHistory" and just use the response. cacheControlContainsNoStore
  • addressed Darin's comment, use the static "protocolIs" function.
  • added some comments to clarify this in MemoryCache::requestResource
  • switched from remove() to evict() because all remove does is call evict =)

Rebasing and landing now.
Comment 22 Joseph Pecoraro 2010-11-18 10:02:34 PST
Committed r72303
	M	WebCore/ChangeLog
	M	WebCore/loader/FrameLoader.cpp
	M	WebCore/loader/cache/MemoryCache.h
	M	WebCore/loader/cache/CachedResourceLoader.cpp
	M	WebCore/loader/cache/CachedResource.cpp
	M	WebCore/loader/cache/CachePolicy.h
	M	WebCore/loader/cache/MemoryCache.cpp
	A	LayoutTests/http/tests/cache/history-only-cached-subresource-loads-expected.txt
	A	LayoutTests/http/tests/cache/history-only-cached-subresource-loads.html
	A	LayoutTests/http/tests/cache/resources/no-store-resource-forward.html
	A	LayoutTests/http/tests/cache/resources/random.cgi
	A	LayoutTests/http/tests/cache/resources/no-store-resource-next.html
	A	LayoutTests/http/tests/cache/resources/no-store-resource.html
	M	LayoutTests/ChangeLog
r72303 = a40cebe316b06142cfcc85e9aa988f2364189d69
http://trac.webkit.org/changeset/72303
Comment 23 WebKit Review Bot 2010-11-18 10:49:18 PST
http://trac.webkit.org/changeset/72303 might have broken Qt Linux Release