Bug 21493

Summary: Resources added to WebCore::Cache which should never be cached
Product: WebKit Reporter: David Kilzer (:ddkilzer) <ddkilzer>
Component: WebCore Misc.Assignee: David Kilzer (:ddkilzer) <ddkilzer>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, ggaren, koivisto, sam, shanestephens, simon.fraser
Priority: P2    
Version: 525.x (Safari 3.1)   
Hardware: Mac   
OS: OS X 10.5   
Bug Depends on: 21596    
Bug Blocks:    
Attachments:
Description Flags
Patch v1
sam: review-
Original cache behavior
none
Revised cache behavior
none
Patch v2 none

David Kilzer (:ddkilzer)
Reported 2008-10-08 18:52:55 PDT
* SUMMARY The WebCore::Cache tends to add resources that it shouldn't (like tiny "images" that are used to gather statistics) when loading a web page. This happens because the Expires, Pragma and Cache-Control headers are not checked before the resource is added to the cache. The result is that these items rarely (if ever) get removed from the cache because they're so tiny and the cache size is so large on modern desktop computers. * STEPS TO REPRODUCE 1. Launch Safari with a debug build of WebKit. 2. Go to URL: http://googlemapsmania.blogspot.com/ 3. For each Google Maps web site linked on the right side of the page: 3a. Click on the link. 3b. Wait for the page to load. 3c. Click back. * RESULTS "Images" with URLs like the following are cached when they should not be. (NOTE: You must add more information to dumpLRULists(true) to see the URLs for each resource.) http://c8.statcounter.com/t.php?sc_project=903528&resolution=1920&h=1200&camefrom=&u=http%3A//googlemapsmania.blogspot.com/&t=Google%20Maps%20Mania&java=1&security=5d533acd&sc_random=0.29789417516440153 http://www.google-analytics.com/__utm.gif?utmwv=4.3&utmn=1468457514&utmhn=googlemapsmania.blogspot.com&utmcs=UTF-8&utmsr=1920x1200&utmsc=24-bit&utmul=en-us&utmje=1&utmfl=9.0%20r124&utmdt=Google%20Maps%20Mania&utmhid=1842576464&utmr=-&utmp=/&utmac=UA-269511-8&utmcc=__utma%3D109382163.219521200792548670.1222901069.1223404926.1223409971.12%3B%2B__utmz%3D109382163.1222901069.1.1.utmcsr%3D(direct)%7Cutmccn%3D(direct)%7Cutmcmd%3D(none)%3B * REGRESSION Unknown. This behavior has probably existed since the cache was written.
Attachments
Patch v1 (15.10 KB, patch)
2008-10-08 19:16 PDT, David Kilzer (:ddkilzer)
sam: review-
Original cache behavior (57.62 KB, text/plain)
2008-10-08 19:20 PDT, David Kilzer (:ddkilzer)
no flags
Revised cache behavior (33.35 KB, text/plain)
2008-10-08 19:20 PDT, David Kilzer (:ddkilzer)
no flags
Patch v2 (27.36 KB, patch)
2008-10-13 21:17 PDT, David Kilzer (:ddkilzer)
no flags
David Kilzer (:ddkilzer)
Comment 1 2008-10-08 19:16:30 PDT
Created attachment 24214 [details] Patch v1 Proposed fix.
David Kilzer (:ddkilzer)
Comment 2 2008-10-08 19:20:03 PDT
Created attachment 24215 [details] Original cache behavior This is the original cache behavior when following the Steps to Reproduce in Comment #1.
Mark Rowe (bdash)
Comment 3 2008-10-08 19:20:28 PDT
Comment on attachment 24214 [details] Patch v1 Some of the new methods on ResourceResponseBase are rather sizable. Early returns and help functions may help make that code easier to follow.
David Kilzer (:ddkilzer)
Comment 4 2008-10-08 19:20:35 PDT
Created attachment 24216 [details] Revised cache behavior This is the revised cache behavior when following the Steps to Reproduce in Comment #1 with the patch applied.
Sam Weinig
Comment 5 2008-10-08 20:47:17 PDT
Comment on attachment 24214 [details] Patch v1 + CacheControlDirectiveMap directives = m_response.cacheControlDirectives(); This will copy the CacheControlDirectiveMap. Instead, passing a CacheControlDirectiveMap to build up would be more efficient. + CacheControlDirectiveMap cacheControlDirectives = m_response.cacheControlDirectives(); Same here. + Vector<pair<String, String> > directives = parseCacheHeader(pragmaHeader); This will also cause a copy of the Vector. There are a few more of these as well. + for (unsigned i = 0, max = directives.size(); i < max; ++i) This is a bit awkward. We usually put the max assignment on the line above. + result.set(directives[i].first.lower(), directives[i].second); Instead of calling lower, perhaps a CaseFoldingHash would be better for this. + static Vector<pair<String, String> > parseCacheHeader(const String&); + static Vector<String> parseCacheControlDirectiveValues(const String&); These don't need to be declared in the header. Marking the function as static in the c++ will give them internal linkage. r- for now.
David Kilzer (:ddkilzer)
Comment 6 2008-10-09 18:07:04 PDT
I need to rework the parser--it doesn't follow the spec (as white space and other characters are included as a separator, not just a comma).
David Kilzer (:ddkilzer)
Comment 7 2008-10-13 21:17:12 PDT
Created attachment 24338 [details] Patch v2 (In reply to comment #5) > (From update of attachment 24214 [details] [edit]) > + CacheControlDirectiveMap directives = m_response.cacheControlDirectives(); > This will copy the CacheControlDirectiveMap. Instead, passing a > CacheControlDirectiveMap to build up would be more efficient. Fixed. > + CacheControlDirectiveMap cacheControlDirectives = > m_response.cacheControlDirectives(); > Same here. Fixed. > + Vector<pair<String, String> > directives = > parseCacheHeader(pragmaHeader); > This will also cause a copy of the Vector. There are a few more of these as > well. All fixed. > + for (unsigned i = 0, max = directives.size(); i < max; ++i) > This is a bit awkward. We usually put the max assignment on the line above. I prefer this (when it doesn't make the line insanely long) since it scopes the max variable to the loop, but I changed to declare max outside the loop. > + result.set(directives[i].first.lower(), directives[i].second); > Instead of calling lower, perhaps a CaseFoldingHash would be better for this. Fixed. Thanks for the suggestion! > + static Vector<pair<String, String> > parseCacheHeader(const String&); > + static Vector<String> parseCacheControlDirectiveValues(const String&); > These don't need to be declared in the header. Marking the function as static > in the c++ will give them internal linkage. Fixed to be declared locally within the .cpp source file. (In reply to comment #6) > I need to rework the parser--it doesn't follow the spec (as white space and > other characters are included as a separator, not just a comma). The rest of the changes were to clean up the parser, including some new methods added to String and StringImpl.
Antti Koivisto
Comment 8 2008-10-14 01:23:21 PDT
While the idea is mostly right I don't think this patch gives optimal behavior. With this patch an uncacheable resource is kicked out of cache right after getting headers for it. That means that we may end up repeteadly loading a single resource that is referenced in multiple times from a document. Previously we would load it only once since code in DocLoader::checkForReload() guarantees that the validity of a resource is checked only once for a single document. In some dynamic cases this may mean reloading an image (with bad cache headers) on every document update. That resources is marked uncacheable does not mean that we should load it multiple times for a single document. I'm not sure yet what is the right solution. We should think about this bit more.
David Kilzer (:ddkilzer)
Comment 9 2008-10-14 02:58:44 PDT
(In reply to comment #8) > That resources is marked uncacheable does not mean that we should load it > multiple times for a single document. I'm not sure yet what is the right > solution. We should think about this bit more. It sounds like we need a "temporary eviction list" of CachedResource objects that are dumped from the cache after a page is done loading. This would prevent them from being loaded multiple times for a single page, and prevent us from walking the entire cache looking for them. In practice this list would also be very small. Should I split the patch into a separate "parsing cache header" patch and this one? It's gotten kind of large with the String[Impl] class changes.
Antti Koivisto
Comment 10 2008-10-14 11:52:07 PDT
Yeah, I think splitting the patch is a good idea. The cache header parsing code looks good. The part that I was concerned about was really those few lines in loader.cpp.
David Kilzer (:ddkilzer)
Comment 11 2008-10-14 12:58:42 PDT
(In reply to comment #10) > Yeah, I think splitting the patch is a good idea. The cache header parsing code > looks good. The part that I was concerned about was really those few lines in > loader.cpp. See Bug 21596.
David Kilzer (:ddkilzer)
Comment 12 2008-10-14 17:28:18 PDT
Comment on attachment 24338 [details] Patch v2 Clearing review flag since I'm blocked on Bug 21596.
Shane Stephens
Comment 13 2011-06-13 18:09:29 PDT
Sorry, wrong bug :(
Adam Barth
Comment 14 2011-06-13 18:12:14 PDT
Actually, I think this was fixed a long time ago.
Note You need to log in before you can comment on or make changes to this bug.