Bug 21493 - Resources added to WebCore::Cache which should never be cached
Summary: Resources added to WebCore::Cache which should never be cached
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: 525.x (Safari 3.1)
Hardware: Mac OS X 10.5
: P2 Normal
Assignee: David Kilzer (:ddkilzer)
URL:
Keywords:
Depends on: 21596
Blocks:
  Show dependency treegraph
 
Reported: 2008-10-08 18:52 PDT by David Kilzer (:ddkilzer)
Modified: 2011-06-13 18:12 PDT (History)
6 users (show)

See Also:


Attachments
Patch v1 (15.10 KB, patch)
2008-10-08 19:16 PDT, David Kilzer (:ddkilzer)
sam: review-
Details | Formatted Diff | Diff
Original cache behavior (57.62 KB, text/plain)
2008-10-08 19:20 PDT, David Kilzer (:ddkilzer)
no flags Details
Revised cache behavior (33.35 KB, text/plain)
2008-10-08 19:20 PDT, David Kilzer (:ddkilzer)
no flags Details
Patch v2 (27.36 KB, patch)
2008-10-13 21:17 PDT, David Kilzer (:ddkilzer)
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description David Kilzer (:ddkilzer) 2008-10-08 18:52:55 PDT
* SUMMARY
The WebCore::Cache tends to add resources that it shouldn't (like tiny "images" that are used to gather statistics) when loading a web page.  This happens because the Expires, Pragma and Cache-Control headers are not checked before the resource is added to the cache.  The result is that these items rarely (if ever) get removed from the cache because they're so tiny and the cache size is so large on modern desktop computers.

* STEPS TO REPRODUCE
1. Launch Safari with a debug build of WebKit.
2. Go to URL:  http://googlemapsmania.blogspot.com/
3. For each Google Maps web site linked on the right side of the page:
3a. Click on the link.
3b. Wait for the page to load.
3c. Click back.

* RESULTS
"Images" with URLs like the following are cached when they should not be.  (NOTE: You must add more information to dumpLRULists(true) to see the URLs for each resource.)

http://c8.statcounter.com/t.php?sc_project=903528&resolution=1920&h=1200&camefrom=&u=http%3A//googlemapsmania.blogspot.com/&t=Google%20Maps%20Mania&java=1&security=5d533acd&sc_random=0.29789417516440153

http://www.google-analytics.com/__utm.gif?utmwv=4.3&utmn=1468457514&utmhn=googlemapsmania.blogspot.com&utmcs=UTF-8&utmsr=1920x1200&utmsc=24-bit&utmul=en-us&utmje=1&utmfl=9.0%20r124&utmdt=Google%20Maps%20Mania&utmhid=1842576464&utmr=-&utmp=/&utmac=UA-269511-8&utmcc=__utma%3D109382163.219521200792548670.1222901069.1223404926.1223409971.12%3B%2B__utmz%3D109382163.1222901069.1.1.utmcsr%3D(direct)%7Cutmccn%3D(direct)%7Cutmcmd%3D(none)%3B

* REGRESSION
Unknown.  This behavior has probably existed since the cache was written.
Comment 1 David Kilzer (:ddkilzer) 2008-10-08 19:16:30 PDT
Created attachment 24214 [details]
Patch v1

Proposed fix.
Comment 2 David Kilzer (:ddkilzer) 2008-10-08 19:20:03 PDT
Created attachment 24215 [details]
Original cache behavior

This is the original cache behavior when following the Steps to Reproduce in Comment #1.
Comment 3 Mark Rowe (bdash) 2008-10-08 19:20:28 PDT
Comment on attachment 24214 [details]
Patch v1

Some of the new methods on ResourceResponseBase are rather sizable.  Early returns and help functions may help make that code easier to follow.
Comment 4 David Kilzer (:ddkilzer) 2008-10-08 19:20:35 PDT
Created attachment 24216 [details]
Revised cache behavior

This is the revised cache behavior when following the Steps to Reproduce in Comment #1 with the patch applied.
Comment 5 Sam Weinig 2008-10-08 20:47:17 PDT
Comment on attachment 24214 [details]
Patch v1

+    CacheControlDirectiveMap directives = m_response.cacheControlDirectives();
This will copy the CacheControlDirectiveMap.  Instead, passing a CacheControlDirectiveMap to build up would be more efficient.

+    CacheControlDirectiveMap cacheControlDirectives = m_response.cacheControlDirectives();
Same here.

+        Vector<pair<String, String> > directives = parseCacheHeader(pragmaHeader);
This will also cause a copy of the Vector.  There are a few more of these as well.

+        for (unsigned i = 0, max = directives.size(); i < max; ++i)
This is a bit awkward.  We usually put the max assignment on the line above.

+            result.set(directives[i].first.lower(), directives[i].second);
Instead of calling lower, perhaps a CaseFoldingHash would be better for this.

+    static Vector<pair<String, String> > parseCacheHeader(const String&);
+    static Vector<String> parseCacheControlDirectiveValues(const String&);
These don't need to be declared in the header.  Marking the function as static in the c++ will give them internal linkage.

r- for now.
Comment 6 David Kilzer (:ddkilzer) 2008-10-09 18:07:04 PDT
I need to rework the parser--it doesn't follow the spec (as white space and other characters are included as a separator, not just a comma).
Comment 7 David Kilzer (:ddkilzer) 2008-10-13 21:17:12 PDT
Created attachment 24338 [details]
Patch v2

(In reply to comment #5)
> (From update of attachment 24214 [details] [edit])
> +    CacheControlDirectiveMap directives = m_response.cacheControlDirectives();
> This will copy the CacheControlDirectiveMap.  Instead, passing a
> CacheControlDirectiveMap to build up would be more efficient.

Fixed.

> +    CacheControlDirectiveMap cacheControlDirectives =
> m_response.cacheControlDirectives();
> Same here.

Fixed.

> +        Vector<pair<String, String> > directives =
> parseCacheHeader(pragmaHeader);
> This will also cause a copy of the Vector.  There are a few more of these as
> well.

All fixed.

> +        for (unsigned i = 0, max = directives.size(); i < max; ++i)
> This is a bit awkward.  We usually put the max assignment on the line above.

I prefer this (when it doesn't make the line insanely long) since it scopes the max variable to the loop, but I changed to declare max outside the loop.

> +            result.set(directives[i].first.lower(), directives[i].second);
> Instead of calling lower, perhaps a CaseFoldingHash would be better for this.

Fixed. Thanks for the suggestion!

> +    static Vector<pair<String, String> > parseCacheHeader(const String&);
> +    static Vector<String> parseCacheControlDirectiveValues(const String&);
> These don't need to be declared in the header.  Marking the function as static
> in the c++ will give them internal linkage.

Fixed to be declared locally within the .cpp source file.

(In reply to comment #6)
> I need to rework the parser--it doesn't follow the spec (as white space and
> other characters are included as a separator, not just a comma).

The rest of the changes were to clean up the parser, including some new methods added to String and StringImpl.
Comment 8 Antti Koivisto 2008-10-14 01:23:21 PDT
While the idea is mostly right I don't think this patch gives optimal behavior. With this patch an uncacheable resource is kicked out of cache right after getting headers for it. That means that we may end up repeteadly loading a single resource that is referenced in multiple times from a document.  Previously we would load it only once since code in DocLoader::checkForReload() guarantees that the validity of a resource is checked only once for a single document. In some dynamic cases this may mean reloading an image (with bad cache headers) on every document update. 

That resources is marked uncacheable does not mean that we should load it multiple times for a single document. I'm not sure yet what is the right solution. We should think about this bit more.
Comment 9 David Kilzer (:ddkilzer) 2008-10-14 02:58:44 PDT
(In reply to comment #8)
> That resources is marked uncacheable does not mean that we should load it
> multiple times for a single document. I'm not sure yet what is the right
> solution. We should think about this bit more.

It sounds like we need a "temporary eviction list" of CachedResource objects that are dumped from the cache after a page is done loading.  This would prevent them from being loaded multiple times for a single page, and prevent us from walking the entire cache looking for them.  In practice this list would also be very small.

Should I split the patch into a separate "parsing cache header" patch and this one?  It's gotten kind of large with the String[Impl] class changes.
Comment 10 Antti Koivisto 2008-10-14 11:52:07 PDT
Yeah, I think splitting the patch is a good idea. The cache header parsing code looks good. The part that I was concerned about was really those few lines in loader.cpp.
Comment 11 David Kilzer (:ddkilzer) 2008-10-14 12:58:42 PDT
(In reply to comment #10)
> Yeah, I think splitting the patch is a good idea. The cache header parsing code
> looks good. The part that I was concerned about was really those few lines in
> loader.cpp.

See Bug 21596.
Comment 12 David Kilzer (:ddkilzer) 2008-10-14 17:28:18 PDT
Comment on attachment 24338 [details]
Patch v2

Clearing review flag since I'm blocked on Bug 21596.
Comment 13 Shane Stephens 2011-06-13 18:09:29 PDT
Sorry, wrong bug :(
Comment 14 Adam Barth 2011-06-13 18:12:14 PDT
Actually, I think this was fixed a long time ago.