RESOLVED FIXED189760
Resource Load Statistics: Relax cap on partitioned cache max age if reloads result in the same bytes
https://bugs.webkit.org/show_bug.cgi?id=189760
Summary Resource Load Statistics: Relax cap on partitioned cache max age if reloads r...
John Wilander
Reported 2018-09-19 12:17:28 PDT
We should add logic to achieve the following: 1. Set cache max age cap for prevalent resources to N days. 2. Start every new cache entry with a max age cap multiplier M = 1. 3. When a cache entry for a prevalent resource reaches the age of N*M days, discard it, but hold on to its checksum and M. 4. If the reload results in the same checksum, increase M according to some scheme such as linear or doubling, until it reaches a year.
Attachments
Patch (27.47 KB, patch)
2018-10-22 17:31 PDT, John Wilander
no flags
Patch (27.51 KB, patch)
2018-10-22 17:38 PDT, John Wilander
no flags
Patch (27.56 KB, patch)
2018-10-22 17:43 PDT, John Wilander
no flags
Patch (27.72 KB, patch)
2018-10-22 17:54 PDT, John Wilander
no flags
Patch (32.41 KB, patch)
2018-10-23 12:22 PDT, John Wilander
no flags
Patch (33.88 KB, patch)
2018-11-02 17:07 PDT, John Wilander
no flags
Patch (33.96 KB, patch)
2018-11-02 17:23 PDT, John Wilander
no flags
Patch (30.44 KB, patch)
2018-11-05 11:54 PST, John Wilander
no flags
Patch (30.30 KB, patch)
2018-11-05 11:58 PST, John Wilander
no flags
Patch for landing (30.74 KB, patch)
2018-11-06 14:28 PST, John Wilander
no flags
Patch for landing (31.09 KB, patch)
2018-11-06 14:33 PST, John Wilander
no flags
Radar WebKit Bug Importer
Comment 1 2018-09-19 12:18:03 PDT
John Wilander
Comment 2 2018-10-22 17:31:52 PDT
John Wilander
Comment 3 2018-10-22 17:33:16 PDT
Antti and Youenn, I wanted to get some feedback from you on this. The Changelog has an explanation of what's in the patch.
John Wilander
Comment 4 2018-10-22 17:36:59 PDT
It may very well be that we need to persist what's in the m_forcedReloadEntries HashMap.
John Wilander
Comment 5 2018-10-22 17:38:52 PDT
John Wilander
Comment 6 2018-10-22 17:39:18 PDT
Fixed a non-Cocoa build issue.
John Wilander
Comment 7 2018-10-22 17:43:49 PDT
John Wilander
Comment 8 2018-10-22 17:54:17 PDT
John Wilander
Comment 9 2018-10-23 12:22:08 PDT
John Wilander
Comment 10 2018-10-23 12:23:35 PDT
Corrected to the code to use ResourceResponseBase::isRedirection() instead of ResourceResponseBase::isRedirected(). Fleshed out test case to cover this functionality.
Antti Koivisto
Comment 11 2018-10-25 00:47:22 PDT
Comment on attachment 352987 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=352987&action=review > Source/WebKit/NetworkProcess/cache/NetworkCache.cpp:281 > +#if ENABLE(RESOURCE_LOAD_STATISTICS) Why is this feature called RESOURCE_LOAD_STATISTICS? It is confusing since it is not about collecting statistics. > Source/WebKit/NetworkProcess/cache/NetworkCache.cpp:284 > + auto verificationEntry = m_forcedReloadEntries.take(identifier); This mechanism only works if the network process stays alive so that m_forcedReloadEntries is around when the cache entry is rewritten. Why is this a good assumption? Especially on iOS the browser may get closed a lot. Should we persist the data in m_forcedReloadEntries, maybe to the cache entry itself? The size of the m_forcedReloadEntries is a potential concern. Wouldn't it eventually contain entries for pretty much every resource stored in a session, which could be a very large number. > Source/WebKit/NetworkProcess/cache/NetworkCacheEntry.cpp:103 > encoder.encodeChecksum(); > > +#if ENABLE(RESOURCE_LOAD_STATISTICS) > + if (m_maxAgeCap) > + encoder << m_maxAgeCap; > +#endif This needs to be encoded before the checksum. Since you are encoding an optional, this should be done unconditionally (no if branch). > Source/WebKit/NetworkProcess/cache/NetworkCacheEntry.cpp:113 > + return { m_key, m_timeStamp, header, body, { }, std::nullopt }; std::nullopt -> { } for consistency. > Source/WebKit/NetworkProcess/cache/NetworkCacheEntry.cpp:153 > +#if ENABLE(RESOURCE_LOAD_STATISTICS) > + // Optional max age cap. > + decoder.decode(entry->m_maxAgeCap); > +#endif Similarly decoding should happen before the checksum check. You'll should check the return value and return nullptr if decoding fails.
youenn fablet
Comment 12 2018-10-26 12:14:45 PDT
Comment on attachment 352987 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=352987&action=review >> Source/WebKit/NetworkProcess/cache/NetworkCache.cpp:284 >> + auto verificationEntry = m_forcedReloadEntries.take(identifier); > > This mechanism only works if the network process stays alive so that m_forcedReloadEntries is around when the cache entry is rewritten. Why is this a good assumption? Especially on iOS the browser may get closed a lot. Should we persist the data in m_forcedReloadEntries, maybe to the cache entry itself? > > The size of the m_forcedReloadEntries is a potential concern. Wouldn't it eventually contain entries for pretty much every resource stored in a session, which could be a very large number. NetworkCache is a long living object so it might be better to move the state handling to NetworkResourceLoader (the instance or some of its completion handlers). That way, there will be no risk of leaking memory by having forever stored entries in m_forcedReloadEntries (say a load is cancelled).
John Wilander
Comment 13 2018-10-26 12:50:42 PDT
(In reply to youenn fablet from comment #12) > Comment on attachment 352987 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=352987&action=review > > >> Source/WebKit/NetworkProcess/cache/NetworkCache.cpp:284 > >> + auto verificationEntry = m_forcedReloadEntries.take(identifier); > > > > This mechanism only works if the network process stays alive so that m_forcedReloadEntries is around when the cache entry is rewritten. Why is this a good assumption? Especially on iOS the browser may get closed a lot. Should we persist the data in m_forcedReloadEntries, maybe to the cache entry itself? > > > > The size of the m_forcedReloadEntries is a potential concern. Wouldn't it eventually contain entries for pretty much every resource stored in a session, which could be a very large number. > > NetworkCache is a long living object so it might be better to move the state > handling to NetworkResourceLoader (the instance or some of its completion > handlers). > That way, there will be no risk of leaking memory by having forever stored > entries in m_forcedReloadEntries (say a load is cancelled). That is a good suggestion. I'll look into it. Anything else on this patch?
John Wilander
Comment 14 2018-11-02 17:07:44 PDT
John Wilander
Comment 15 2018-11-02 17:09:33 PDT
I've moved the logic out to the NetworkResourceLoader as suggested. I've also removed the global state. I could not rely on the existing validation infrastructure since redirects are not handled by there. Instead I tie into the special functions for redirects.
John Wilander
Comment 16 2018-11-02 17:23:26 PDT
John Wilander
Comment 17 2018-11-02 17:24:07 PDT
Non-Cocoa build fixes.
youenn fablet
Comment 18 2018-11-04 18:34:55 PST
Comment on attachment 353749 [details] Patch I think the patch is going in the right direction. Some improvement comments and questions below. View in context: https://bugs.webkit.org/attachment.cgi?id=353749&action=review > Source/WebKit/NetworkProcess/NetworkResourceLoader.cpp:236 > + m_cacheEntryForValidation = WTFMove(entry); I am not sure we should directly reuse m_cacheEntryForValidation. When m_cacheEntryForValidation is not empty, we expect to send a conditional request and receive a 304. It might be clearer if we introduce m_cacheEntryForMaxAgeCapValidation. If not null, we do not have to do conditional request, check for 304 but instead to check whether the response is the same as the cached entry response. > Source/WebKit/NetworkProcess/cache/NetworkCache.cpp:347 > case UseDecision::NoDueToPrevalentResourceAgeCap: Do we still need UseDecision::NoDueToPrevalentResourceAgeCap? It seems the loader will do the check and cache is just the place that stores this information. > Source/WebKit/NetworkProcess/cache/NetworkCache.cpp:423 > +std::unique_ptr<Entry> Cache::storeRedirect(const WebCore::ResourceRequest& request, const WebCore::ResourceResponse& response, const WebCore::ResourceRequest& redirectRequest, PAL::SessionID sessionID, CheckForMaxAgeCap shouldCheckForMaxAgeCap) Instead of sessionID and shouldCheckForMaxAgeCap, we might want to directly pass an optional<Second>. That will allow decoupling NetworkCache from NetworkStorageSession. It seems NetworkStorageSession is more a loader/data task concept. > Source/WebKit/NetworkProcess/cache/NetworkCache.cpp:476 > +std::unique_ptr<Entry> Cache::replaceRedirect(const WebCore::ResourceRequest& request, const Entry& existingEntry, const WebCore::ResourceResponse& response, const WebCore::ResourceRequest& redirectRequest, PAL::SessionID sessionID, CheckForMaxAgeCap shouldCheckForMaxAgeCap) Ditto here. > Source/WebKit/NetworkProcess/cache/NetworkCacheEntry.cpp:145 > + // Optional max age cap. Not sure we need this comment. Should we also encode/decode this no matter whether RESOURCE_LOAD_STATISTICS is defined or not. The reason is that, depending on whether RESOURCE_LOAD_STATISTICS is defined, we should bump or not the network cache version. It might be simpler to just always encode it. > Source/WebKit/NetworkProcess/cache/NetworkCacheStorage.cpp:504 > + { } Antti will know more but I would have thought we either would handle maxAgeCap as bodyHash or put it in the header or body data. > LayoutTests/http/tests/resourceLoadStatistics/cap-cache-max-age-for-prevalent-resource.html:34 > + // This should not trigger a successful cache hit and create another capped cache entry. Do we have a test that ensures that the request doing the max age cap revalidation is not a conditional request?
Antti Koivisto
Comment 19 2018-11-05 08:19:13 PST
Comment on attachment 353749 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=353749&action=review >> Source/WebKit/NetworkProcess/cache/NetworkCacheStorage.cpp:504 >> + { } > > Antti will know more but I would have thought we either would handle maxAgeCap as bodyHash or put it in the header or body data. Right, the Storage layer is only concerned about storing bytes to the disk and includes no semantics. Like other similar things, maxAgeCap should be encoded as header data (in NetworkCache::Entry::encodeAsStorageRecord). Is there some good reason this patch moves it here?
John Wilander
Comment 20 2018-11-05 10:14:33 PST
(In reply to youenn fablet from comment #18) > Comment on attachment 353749 [details] > Patch > > I think the patch is going in the right direction. > Some improvement comments and questions below. > > View in context: > https://bugs.webkit.org/attachment.cgi?id=353749&action=review > > > Source/WebKit/NetworkProcess/NetworkResourceLoader.cpp:236 > > + m_cacheEntryForValidation = WTFMove(entry); > > I am not sure we should directly reuse m_cacheEntryForValidation. You are right. This is a leftover from when I tried to build this strictly on top of revalidation which turned out not to be a thing for redirects. > When m_cacheEntryForValidation is not empty, we expect to send a conditional > request and receive a 304. Absolutely true. > It might be clearer if we introduce m_cacheEntryForMaxAgeCapValidation. Yes, I will do that. > If not null, we do not have to do conditional request, check for 304 but > instead to check whether the response is the same as the cached entry > response. > > > Source/WebKit/NetworkProcess/cache/NetworkCache.cpp:347 > > case UseDecision::NoDueToPrevalentResourceAgeCap: > > Do we still need UseDecision::NoDueToPrevalentResourceAgeCap? > It seems the loader will do the check and cache is just the place that > stores this information. Yeah, you might be right. I have to think it through. > > Source/WebKit/NetworkProcess/cache/NetworkCache.cpp:423 > > +std::unique_ptr<Entry> Cache::storeRedirect(const WebCore::ResourceRequest& request, const WebCore::ResourceResponse& response, const WebCore::ResourceRequest& redirectRequest, PAL::SessionID sessionID, CheckForMaxAgeCap shouldCheckForMaxAgeCap) > > Instead of sessionID and shouldCheckForMaxAgeCap, we might want to directly > pass an optional<Second>. > That will allow decoupling NetworkCache from NetworkStorageSession. > > It seems NetworkStorageSession is more a loader/data task concept. Sounds like a good idea. > > Source/WebKit/NetworkProcess/cache/NetworkCache.cpp:476 > > +std::unique_ptr<Entry> Cache::replaceRedirect(const WebCore::ResourceRequest& request, const Entry& existingEntry, const WebCore::ResourceResponse& response, const WebCore::ResourceRequest& redirectRequest, PAL::SessionID sessionID, CheckForMaxAgeCap shouldCheckForMaxAgeCap) > > Ditto here. Mhm. > > Source/WebKit/NetworkProcess/cache/NetworkCacheEntry.cpp:145 > > + // Optional max age cap. > > Not sure we need this comment. > > Should we also encode/decode this no matter whether RESOURCE_LOAD_STATISTICS > is defined or not. > The reason is that, depending on whether RESOURCE_LOAD_STATISTICS is > defined, we should bump or not the network cache version. > It might be simpler to just always encode it. OK. > > Source/WebKit/NetworkProcess/cache/NetworkCacheStorage.cpp:504 > > + { } > > Antti will know more but I would have thought we either would handle > maxAgeCap as bodyHash or put it in the header or body data. > > > LayoutTests/http/tests/resourceLoadStatistics/cap-cache-max-age-for-prevalent-resource.html:34 > > + // This should not trigger a successful cache hit and create another capped cache entry. > > Do we have a test that ensures that the request doing the max age cap > revalidation is not a conditional request? How would we test that? You mean if the server responds with a 304? Thanks for all these comments!
John Wilander
Comment 21 2018-11-05 10:15:12 PST
(In reply to Antti Koivisto from comment #19) > Comment on attachment 353749 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=353749&action=review > > >> Source/WebKit/NetworkProcess/cache/NetworkCacheStorage.cpp:504 > >> + { } > > > > Antti will know more but I would have thought we either would handle maxAgeCap as bodyHash or put it in the header or body data. > > Right, the Storage layer is only concerned about storing bytes to the disk > and includes no semantics. Like other similar things, maxAgeCap should be > encoded as header data (in NetworkCache::Entry::encodeAsStorageRecord). Is > there some good reason this patch moves it here? Probably not. I will try to move it. Thanks!
John Wilander
Comment 22 2018-11-05 11:54:23 PST
John Wilander
Comment 23 2018-11-05 11:58:06 PST
youenn fablet
Comment 24 2018-11-06 06:10:08 PST
Comment on attachment 353896 [details] Patch LGTM with comments (I am not sure we should introduce replaceRedirect in particular). We should probably bump the network cache revision number. Since it was done one week ago, maybe this is not needed. If we do not bump it, please add a comment in the ChangeLog for why we are not doing it. As of the check of the conditional request, the server php script could check whether any of the request contains some If-Modified-Since/If-None-Match.. header. In such a case, it could generate a response that would make the test fail. View in context: https://bugs.webkit.org/attachment.cgi?id=353896&action=review > Source/WebKit/NetworkProcess/NetworkResourceLoader.cpp:601 > +#if ENABLE(RESOURCE_LOAD_STATISTICS) Maybe we could put this new code in a new method called validateCacheEntryForMaxAgeCapValidation. It would return a std::optional<Seconds> If RESOURCE_LOAD_STATISTICS is not defined, it would return nullopt. If m_cacheEntryForMaxAgeCapValidation is null, it would return networkStorageSession->maxAgeCacheCap(request) . In case of replacing the entry, it would not use replaceRedirect, but would just call remove(). The readding of the redirect would happen later on, in the regular code path. That would remove the need for replaceRedirect, which I am not sure is correct, since the new redirectResponse might not be cacheable. Some comments below if you do not like the idea of the utility function/removal of replaceRedirect. > Source/WebKit/NetworkProcess/NetworkResourceLoader.cpp:609 > + m_cacheEntryForMaxAgeCapValidation = m_cache->replaceRedirect(request, *m_cacheEntryForMaxAgeCapValidation, redirectResponse, redirectRequest, std::nullopt); No need to set m_cacheEntryForMaxAgeCapValidation since we nullify it shortly after without using it. So maybe replaceRedirect should return void. > Source/WebKit/NetworkProcess/NetworkResourceLoader.cpp:611 > + m_cache->remove(m_cacheEntryForMaxAgeCapValidation->key()); If we remove the previous entry from the cache, do we still want to store the new one? The patch does so which might be safer. > Source/WebKit/NetworkProcess/NetworkResourceLoader.cpp:617 > + if (!existingCacheEntryIsValid && redirectResponse.source() == ResourceResponse::Source::Network && canUseCachedRedirect(request)) { I would rename existingCacheEntryIsValid to shouldTryCachingEntry. That way we would write: if (shouldTryCachingEntry && ...) That seems to read better to me. > Source/WebKit/NetworkProcess/cache/NetworkCacheSpeculativeLoad.cpp:82 > +#endif I would reduce the code difference by doing: std::optional<Seconds> maxAgeCap; #if ENABLE(RESOURCE_LOAD_STATISTICS) if (auto networkStorageSession = WebCore::NetworkStorageSession::storageSession(PAL::SessionID::defaultSessionID())) maxAgeCap = networkStorageSession->maxAgeCacheCap(request); #endif m_cacheEntry = m_cache->storeRedirect(request, redirectResponse, redirectRequest, maxAgeCap);
John Wilander
Comment 25 2018-11-06 14:28:42 PST
Created attachment 354007 [details] Patch for landing
John Wilander
Comment 26 2018-11-06 14:30:59 PST
Thanks Youenn and Antti! I broke out the functionality into NetworkResourceLoader::validateCacheEntryForMaxAgeCapValidation(), removed Cache::replaceRedirect(), and made sure the test case will fail if validation headers are sent.
John Wilander
Comment 27 2018-11-06 14:33:19 PST
Created attachment 354009 [details] Patch for landing
John Wilander
Comment 28 2018-11-06 14:34:04 PST
The previous patch was missing the explainer in the change log. Now fixed.
WebKit Commit Bot
Comment 29 2018-11-06 15:10:46 PST
Comment on attachment 354009 [details] Patch for landing Clearing flags on attachment: 354009 Committed r237895: <https://trac.webkit.org/changeset/237895>
WebKit Commit Bot
Comment 30 2018-11-06 15:10:48 PST
All reviewed patches have been landed. Closing bug.
Note You need to log in before you can comment on or make changes to this bug.