Bug 24656

Summary: cacheControlContainsNoCache() in ResourceResponseBase.h is wrong
Product: WebKit Reporter: Grace Kloba <klobag>
Component: Page LoadingAssignee: Alexey Proskuryakov <ap>
Status: RESOLVED FIXED    
Severity: Normal CC: ap, darin, ddkilzer, fishd, klobag, koivisto
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Mac   
OS: OS X 10.5   
Attachments:
Description Flags
patch to fix the bug
sam: review-
proposed fix ddkilzer: review+

Description Grace Kloba 2009-03-17 15:58:49 PDT
Here is the code,

    bool cacheControlContainsNoCache() const
    {
        if (!m_haveParsedCacheControl)
            parseCacheControlDirectives();
        return m_m_cacheControlContainsMustRevalidate;
    }

We should return m_cacheControlContainsNoCache instead
Comment 1 Grace Kloba 2009-03-17 18:14:40 PDT
Another bug in the cpp file, ResourceResponseBase::parseCacheControlDirectives().

        if ((equalIgnoringCase(directives[i].first, "private") || equalIgnoringCase(directives[i].first, "no-cache")) && !directives[i].second.isEmpty())
            parseCacheControlDirectiveValues(directives[i].second, directiveValues);
        else
            directiveValues.append(directives[i].second);

I think the else should append directives[i].first as directives[i].second is empty.
Comment 2 Grace Kloba 2009-03-17 19:39:54 PDT
Created attachment 28717 [details]
patch to fix the bug
Comment 3 Sam Weinig 2009-03-17 22:48:14 PDT
Comment on attachment 28717 [details]
patch to fix the bug

If you would like to put something up for review, you need to set the review flag to ?.  

This needs test cases, so r-.
Comment 4 Alexey Proskuryakov 2009-03-18 13:52:12 PDT
Hmm, this may be quite tricky to test for.

A test could use a stateful script that would return different resource content each time, see e.g. http/tests/resources/network-simulator.php.
Comment 5 Grace Kloba 2009-03-18 15:57:23 PDT
I found network-simulator.php too by searching "no-store". But I can't find the old test case when the code was added.

It is really fixing a typo.

Will work on a test case later.
Comment 6 Alexey Proskuryakov 2009-03-29 06:03:58 PDT
OK, I guess it's a bit unfair to strictly require tests, given that we've let the original patch in with no tests at all. But it also means that the tests are so much more important!

>          if ((equalIgnoringCase(directives[i].first, "private") || equalIgnoringCase(directives[i].first, "no-cache")) && !directives[i].second.isEmpty())
>              parseCacheControlDirectiveValues(directives[i].second, directiveValues);
>          else
> -            directiveValues.append(directives[i].second);
> +            directiveValues.append(directives[i].first);

This code looks busted in many more ways, and I don't think that it's very desirable to only fix one minor case. All this check does is stuff Cache-Control private and no-cache directive values into one bucket with other directives, which looks bogus. E.g., one possible use of this syntax is 'Cache-Control: no-cache="X-Foobar"', which means that 'X-Foobar' header must not be sent in a proxy response without revalidation.

According to svn blame, this code was added in <http://trac.webkit.org/changeset/38145>.
Comment 7 David Kilzer (:ddkilzer) 2009-03-29 09:46:37 PDT
(In reply to comment #6)

> >          if ((equalIgnoringCase(directives[i].first, "private") || equalIgnoringCase(directives[i].first, "no-cache")) && !directives[i].second.isEmpty())
> >              parseCacheControlDirectiveValues(directives[i].second, directiveValues);
> >          else
> > -            directiveValues.append(directives[i].second);
> > +            directiveValues.append(directives[i].first);
> 
> This code looks busted in many more ways, and I don't think that it's very
> desirable to only fix one minor case. All this check does is stuff
> Cache-Control private and no-cache directive values into one bucket with other
> directives, which looks bogus. E.g., one possible use of this syntax is
> 'Cache-Control: no-cache="X-Foobar"', which means that 'X-Foobar' header must
> not be sent in a proxy response without revalidation.

The "private" and "no-cache" directive values are NOT put into a single bucket.  You must read the second if/else statement below it to see how the directiveValues data is used.  The check for "private" or "no-cache" in the code above is because the spec says that they are the only directives that may have one or more sub-values, so they are parsed differently than other directives.  (But the code inside the outer for loop handles directives with and without sub-values, which may be why it's confusing.)

See Section 14.9 in <http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html>.
Comment 8 David Kilzer (:ddkilzer) 2009-03-29 10:05:49 PDT
(In reply to comment #7)
> (In reply to comment #6)
> 
> > >          if ((equalIgnoringCase(directives[i].first, "private") || equalIgnoringCase(directives[i].first, "no-cache")) && !directives[i].second.isEmpty())
> > >              parseCacheControlDirectiveValues(directives[i].second, directiveValues);
> > >          else
> > > -            directiveValues.append(directives[i].second);
> > > +            directiveValues.append(directives[i].first);
> > 
> > This code looks busted in many more ways, and I don't think that it's very
> > desirable to only fix one minor case. All this check does is stuff
> > Cache-Control private and no-cache directive values into one bucket with other
> > directives, which looks bogus. E.g., one possible use of this syntax is
> > 'Cache-Control: no-cache="X-Foobar"', which means that 'X-Foobar' header must
> > not be sent in a proxy response without revalidation.
> 
> The "private" and "no-cache" directive values are NOT put into a single bucket.
>  You must read the second if/else statement below it to see how the
> directiveValues data is used.  The check for "private" or "no-cache" in the
> code above is because the spec says that they are the only directives that may
> have one or more sub-values, so they are parsed differently than other
> directives.  (But the code inside the outer for loop handles directives with
> and without sub-values, which may be why it's confusing.)

I remember now, the parsing code was changed in r39383 because fully parsing the cache-control headers was a memory regression:

<http://trac.webkit.org/changeset/39383>

The original code I committed in r38145 did do correct parsing of those headers.

-            directiveValues.append(directives[i].second);
+            directiveValues.append(directives[i].first);

This change is incorrect.

-        return m_cacheControlContainsMustRevalidate;
+        return m_cacheControlContainsNoCache;

This change is correct and fixes a copy-paste error in cacheControlContainsNoCache().

If more detail is needed for Cache-Control headers (after r39383), you'll have to modify the parsing routines to check for what you're looking for and stuff it into a 1-bit bool field or return the data on an as-needed basis.
Comment 9 Alexey Proskuryakov 2009-03-29 10:08:31 PDT
Dave, please check how the current code handles e.g. "Cache-Control: private=must-revalidate" - instead of treating must-revalidate as a header name, it sets m_cacheControlContainsMustRevalidate to true.
Comment 10 David Kilzer (:ddkilzer) 2009-03-29 12:05:27 PDT
(In reply to comment #9)
> Dave, please check how the current code handles e.g. "Cache-Control:
> private=must-revalidate" - instead of treating must-revalidate as a header
> name, it sets m_cacheControlContainsMustRevalidate to true.

Yep, I agree that this broke in r39383.
Comment 11 David Kilzer (:ddkilzer) 2009-03-29 12:12:04 PDT
(In reply to comment #10)
> (In reply to comment #9)
> > Dave, please check how the current code handles e.g. "Cache-Control:
> > private=must-revalidate" - instead of treating must-revalidate as a header
> > name, it sets m_cacheControlContainsMustRevalidate to true.
> 
> Yep, I agree that this broke in r39383.

This is still the incorrect fix:

-            directiveValues.append(directives[i].second);
+            directiveValues.append(directives[i].first);

Basically, the for() loop in ResourceResponseBase::parseCacheControlDirectives() that was added in r39383 needs to be fixed so it's checking for the must-revalidate keyword instead of a sub-value of a "private" or "no-cache" directive.

I'll try to fix this soon-ish.
Comment 12 Alexey Proskuryakov 2009-04-08 06:22:43 PDT
Created attachment 29331 [details]
proposed fix

Let's get this fixed, at last.

I'm actually not sure if a test can be made for this bug.
Comment 13 David Kilzer (:ddkilzer) 2009-04-08 07:31:29 PDT
Comment on attachment 29331 [details]
proposed fix

r=me
Comment 14 Alexey Proskuryakov 2009-04-08 07:36:55 PDT
Committed <http://trac.webkit.org/changeset/42319>.
Comment 15 Darin Fisher (:fishd, Google) 2009-04-08 11:01:06 PDT
Cache-control: no-cache="set-cookie" is actually quite common.
Comment 16 David Kilzer (:ddkilzer) 2009-04-08 11:54:25 PDT
(In reply to comment #15)
> Cache-control: no-cache="set-cookie" is actually quite common.

Darin, please file a new bug.  Thanks!
Comment 17 Alexey Proskuryakov 2009-04-08 23:10:29 PDT
(In reply to comment #15)
> Cache-control: no-cache="set-cookie" is actually quite common.

What is the practical difference between no-cache and no-cache="set-cookie"? RFC 2616 seems to say that it's up to the cache to decide whether to send a cached response with Set-Cookie omitted, or to revalidate, which sounds confusing to me.
Comment 18 Darin Fisher (:fishd, Google) 2009-04-09 01:31:09 PDT
> What is the practical difference between no-cache and no-cache="set-cookie"?

no-cache applies to the response body, no-cache="set-cookie" only applies to the set-cookie response header.  practically speaking, no-cache="set-cookie" means that you should strip that cookie out when storing the response in the cache.  otherwise, the response is totally cacheable.  (in general, browser caches strip out set-cookie, but i guess there is probably some app server out there that likes to make sure... perhaps as information to proxy caches.)
Comment 19 Alexey Proskuryakov 2009-04-09 05:25:19 PDT
(In reply to comment #18)
> practically speaking, no-cache="set-cookie"
> means that you should strip that cookie out when storing the response in the
> cache. 

What should the cache return when asked for this resource? It doesn't sound like returning a response without one of the headers makes much sense.
Comment 20 Darin Fisher (:fishd, Google) 2009-04-09 10:51:51 PDT
> What should the cache return when asked for this resource? It doesn't sound
> like returning a response without one of the headers makes much sense.

That's what it means.  Think of it as a way for servers to specify that some information should be considered transient or private (can make good sense for set-cookie).  HTTP caches also strip other transient (hop-by-hop) headers like Connection, etc.
Comment 21 Darin Fisher (:fishd, Google) 2009-04-09 10:57:34 PDT
From RFC 2616 section 14.9.1:

  If the no-cache directive does specify one or more field-names,
  then a cache MAY use the response to satisfy a subsequent request,
  subject to any other restrictions on caching. However, the
  specified field-name(s) MUST NOT be sent in the response to a
  subsequent request without successful revalidation with the origin
  server. This allows an origin server to prevent the re-use of
  certain header fields in a response, while still allowing caching
  of the rest of the response.
Comment 22 Alexey Proskuryakov 2009-04-09 12:40:48 PDT
(In reply to comment #21)
> From RFC 2616 section 14.9.1:

Yes, that's exactly the part that I was talking about. What confuses me is that a response that had a Set-Cookie header when sent to the first client is one that more than likely needs to have it when sent to other clients - it should obviously be different, but not missing. Is there a use case where a missing Set-Cookie header is fine?
Comment 23 Darin Fisher (:fishd, Google) 2009-04-09 14:53:23 PDT
> obviously be different, but not missing. Is there a use case where a missing
> Set-Cookie header is fine?

Perhaps a web developer might assume that the cookie would persist in the cookie database.  One might need to combine this with cache-control: private to avoid strange proxy effects.  Of course, the cookie could be absent in the cookie db :/
Comment 24 David Kilzer (:ddkilzer) 2009-04-09 15:18:49 PDT
(In reply to comment #23)
> > obviously be different, but not missing. Is there a use case where a missing
> > Set-Cookie header is fine?
> 
> Perhaps a web developer might assume that the cookie would persist in the
> cookie database.  One might need to combine this with cache-control: private to
> avoid strange proxy effects.  Of course, the cookie could be absent in the
> cookie db :/

I don't understand how caching http proxy behavior applies to a web browser.  Once the http request hits the browser, cookies have a different set of rules determining their cached lifetime than resources do.  I don't see where no-cache="set-cookie" would have any bearing on a web browser.

Or is Chrome's cache set up like a caching http proxy such that it needs to know when it can replay certain headers to each subprocess?
Comment 25 Darin Fisher (:fishd, Google) 2009-04-09 15:36:36 PDT
no, chrome's network stack does not act like a proxy.  sorry to confuse matters.  i mentioned proxies because that is something that web developers have to sometimes think about.

i brought up the example of no-cache="set-cookie" because i have seen those in the wild.  i'm only guessing why web developers (or more likely some web app engine) might want to send that in the response headers.

most browsers do not store set-cookie response headers in the cache anyways, so this particular response header (no-cache="set-cookie") should indeed be irrelevant to browsers.

however, what i found is that because no-cache="set-cookie" is common, if you do not discern no-cache="foo" from no-cache, then you will miss out on being able to cache a lot of content on the web.
Comment 26 David Kilzer (:ddkilzer) 2009-04-09 16:10:31 PDT
(In reply to comment #25)
> however, what i found is that because no-cache="set-cookie" is common, if you
> do not discern no-cache="foo" from no-cache, then you will miss out on being
> able to cache a lot of content on the web.

I see, so no-cache="set-cookie" should not prevent content from being cached.

(In reply to comment #16)
> Darin, please file a new bug.  Thanks!

Pretty please?  :)
Comment 27 Darin Fisher (:fishd, Google) 2009-04-09 22:52:13 PDT
> I see, so no-cache="set-cookie" should not prevent content from being cached.

Right.


> (In reply to comment #16)
> > Darin, please file a new bug.  Thanks!
> 
> Pretty please?  :)

Sorry, I missed that comment before.  See bug 25129.
Comment 28 Alexey Proskuryakov 2009-04-10 02:16:49 PDT
(In reply to comment #23)
> Perhaps a web developer might assume that the cookie would persist in the
> cookie database.  One might need to combine this with cache-control: private to
> avoid strange proxy effects.

AFAICT cache-control: private also makes no-cache="set-cookie" unnecessary. So, I'm still confused about the utility of such directive, but it does seem like we should not treat it as no-cache indeed. Have you ever seen no-cache=... with other header names in practice? Set-cookie2 maybe?
Comment 29 Darin Fisher (:fishd, Google) 2009-04-10 08:34:17 PDT
> AFAICT cache-control: private also makes no-cache="set-cookie" unnecessary. So,
> I'm still confused about the utility of such directive, but it does seem like
> we should not treat it as no-cache indeed. Have you ever seen no-cache=... with
> other header names in practice? Set-cookie2 maybe?

Using a tool I know of to search the web, I found some instances of cache-control no-cache="set-cookie,set-cookie2".  I see no-cache="server" here: http://www.w3.org/Protocols/HTTP/Issues/cachedirective.txt