* SUMMARY The WebCore::Cache tends to add resources that it shouldn't (like tiny "images" that are used to gather statistics) when loading a web page. This happens because the Expires, Pragma and Cache-Control headers are not checked before the resource is added to the cache. The result is that these items rarely (if ever) get removed from the cache because they're so tiny and the cache size is so large on modern desktop computers. * STEPS TO REPRODUCE 1. Launch Safari with a debug build of WebKit. 2. Go to URL: http://googlemapsmania.blogspot.com/ 3. For each Google Maps web site linked on the right side of the page: 3a. Click on the link. 3b. Wait for the page to load. 3c. Click back. * RESULTS "Images" with URLs like the following are cached when they should not be. (NOTE: You must add more information to dumpLRULists(true) to see the URLs for each resource.) http://c8.statcounter.com/t.php?sc_project=903528&resolution=1920&h=1200&camefrom=&u=http%3A//googlemapsmania.blogspot.com/&t=Google%20Maps%20Mania&java=1&security=5d533acd&sc_random=0.29789417516440153 http://www.google-analytics.com/__utm.gif?utmwv=4.3&utmn=1468457514&utmhn=googlemapsmania.blogspot.com&utmcs=UTF-8&utmsr=1920x1200&utmsc=24-bit&utmul=en-us&utmje=1&utmfl=9.0%20r124&utmdt=Google%20Maps%20Mania&utmhid=1842576464&utmr=-&utmp=/&utmac=UA-269511-8&utmcc=__utma%3D109382163.219521200792548670.1222901069.1223404926.1223409971.12%3B%2B__utmz%3D109382163.1222901069.1.1.utmcsr%3D(direct)%7Cutmccn%3D(direct)%7Cutmcmd%3D(none)%3B * REGRESSION Unknown. This behavior has probably existed since the cache was written.
Created attachment 24214 [details] Patch v1 Proposed fix.
Created attachment 24215 [details] Original cache behavior This is the original cache behavior when following the Steps to Reproduce in Comment #1.
Comment on attachment 24214 [details] Patch v1 Some of the new methods on ResourceResponseBase are rather sizable. Early returns and help functions may help make that code easier to follow.
Created attachment 24216 [details] Revised cache behavior This is the revised cache behavior when following the Steps to Reproduce in Comment #1 with the patch applied.
Comment on attachment 24214 [details] Patch v1 + CacheControlDirectiveMap directives = m_response.cacheControlDirectives(); This will copy the CacheControlDirectiveMap. Instead, passing a CacheControlDirectiveMap to build up would be more efficient. + CacheControlDirectiveMap cacheControlDirectives = m_response.cacheControlDirectives(); Same here. + Vector<pair<String, String> > directives = parseCacheHeader(pragmaHeader); This will also cause a copy of the Vector. There are a few more of these as well. + for (unsigned i = 0, max = directives.size(); i < max; ++i) This is a bit awkward. We usually put the max assignment on the line above. + result.set(directives[i].first.lower(), directives[i].second); Instead of calling lower, perhaps a CaseFoldingHash would be better for this. + static Vector<pair<String, String> > parseCacheHeader(const String&); + static Vector<String> parseCacheControlDirectiveValues(const String&); These don't need to be declared in the header. Marking the function as static in the c++ will give them internal linkage. r- for now.
I need to rework the parser--it doesn't follow the spec (as white space and other characters are included as a separator, not just a comma).
Created attachment 24338 [details] Patch v2 (In reply to comment #5) > (From update of attachment 24214 [details] [edit]) > + CacheControlDirectiveMap directives = m_response.cacheControlDirectives(); > This will copy the CacheControlDirectiveMap. Instead, passing a > CacheControlDirectiveMap to build up would be more efficient. Fixed. > + CacheControlDirectiveMap cacheControlDirectives = > m_response.cacheControlDirectives(); > Same here. Fixed. > + Vector<pair<String, String> > directives = > parseCacheHeader(pragmaHeader); > This will also cause a copy of the Vector. There are a few more of these as > well. All fixed. > + for (unsigned i = 0, max = directives.size(); i < max; ++i) > This is a bit awkward. We usually put the max assignment on the line above. I prefer this (when it doesn't make the line insanely long) since it scopes the max variable to the loop, but I changed to declare max outside the loop. > + result.set(directives[i].first.lower(), directives[i].second); > Instead of calling lower, perhaps a CaseFoldingHash would be better for this. Fixed. Thanks for the suggestion! > + static Vector<pair<String, String> > parseCacheHeader(const String&); > + static Vector<String> parseCacheControlDirectiveValues(const String&); > These don't need to be declared in the header. Marking the function as static > in the c++ will give them internal linkage. Fixed to be declared locally within the .cpp source file. (In reply to comment #6) > I need to rework the parser--it doesn't follow the spec (as white space and > other characters are included as a separator, not just a comma). The rest of the changes were to clean up the parser, including some new methods added to String and StringImpl.
While the idea is mostly right I don't think this patch gives optimal behavior. With this patch an uncacheable resource is kicked out of cache right after getting headers for it. That means that we may end up repeteadly loading a single resource that is referenced in multiple times from a document. Previously we would load it only once since code in DocLoader::checkForReload() guarantees that the validity of a resource is checked only once for a single document. In some dynamic cases this may mean reloading an image (with bad cache headers) on every document update. That resources is marked uncacheable does not mean that we should load it multiple times for a single document. I'm not sure yet what is the right solution. We should think about this bit more.
(In reply to comment #8) > That resources is marked uncacheable does not mean that we should load it > multiple times for a single document. I'm not sure yet what is the right > solution. We should think about this bit more. It sounds like we need a "temporary eviction list" of CachedResource objects that are dumped from the cache after a page is done loading. This would prevent them from being loaded multiple times for a single page, and prevent us from walking the entire cache looking for them. In practice this list would also be very small. Should I split the patch into a separate "parsing cache header" patch and this one? It's gotten kind of large with the String[Impl] class changes.
Yeah, I think splitting the patch is a good idea. The cache header parsing code looks good. The part that I was concerned about was really those few lines in loader.cpp.
(In reply to comment #10) > Yeah, I think splitting the patch is a good idea. The cache header parsing code > looks good. The part that I was concerned about was really those few lines in > loader.cpp. See Bug 21596.
Comment on attachment 24338 [details] Patch v2 Clearing review flag since I'm blocked on Bug 21596.
Sorry, wrong bug :(
Actually, I think this was fixed a long time ago.