WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
NEW
203058
HEAD requests are not cached
https://bugs.webkit.org/show_bug.cgi?id=203058
Summary
HEAD requests are not cached
Ben Nham
Reported
2019-10-16 15:31:34 PDT
Created
attachment 381123
[details]
test case for GET followed by HEAD The warm Amazon page load on PLT5 shows a number of static resources that aren't cached. This is due to a GET followed by a HEAD for the same resource: 1. The main page has an <img src="foo.jpg"> which causes a GET for foo.jpg. 2. Some time later, they do a HEAD request for foo.jpg via an XHR (they extract Content-Length out of the response headers to power some other logic). I've attached a reduced test case that makes this request pattern. (Note that Amazon's home page even as of today shows this request pattern, so it's not entirely theoretical.) This request pattern causes misses in both our memory and disk caches. # Memory Cache In the memory cache, the GET has a type of ImageResource, while the HEAD has a type of RawResource. The mismatched types cause the memory cache to always reload the resource, as shown in `CachedResourceLoader::determineRevalidationPolicy`: ``` // If the same URL has been loaded as a different type, we need to reload. if (existingResource->type() != type) { LOG(ResourceLoading, "CachedResourceLoader::determineRevalidationPolicy reloading due to type mismatch."); logMemoryCacheResourceRequest(frame(), DiagnosticLoggingKeys::inMemoryCacheKey(), DiagnosticLoggingKeys::unusedReasonTypeMismatchKey()); return Reload; } ``` # Disk Cache In the disk cache, the first GET is cached to disk as expected. However, `makeStoreDecision` decides to not store HEAD responses: ``` if (originalRequest.httpMethod() != "GET") return StoreDecision::NoDueToHTTPMethod; ``` In response, `Cache::store` deletes the existing GET response from the disk cache: ``` StoreDecision storeDecision = makeStoreDecision(request, response, responseData ? responseData->size() : 0); if (storeDecision != StoreDecision::Yes) { LOG(NetworkCache, "(NetworkProcess) didn't store, storeDecision=%d", static_cast<int>(storeDecision)); auto key = makeCacheKey(request); auto isSuccessfulRevalidation = response.httpStatusCode() == 304; if (!isSuccessfulRevalidation) { // Make sure we don't keep a stale entry in the cache. remove(key); } return nullptr; } ``` So as a result, when the page load is complete, there is nothing at all cached for foo.jpg. So on reload, we actually end up fetching foo.jpg from the network again, for both the GET and HEAD requests. # Other Browsers Chrome is able to downgrade a cached GET response to a HEAD response. When loading the test case in Chrome: 1. On a cold load, foo.jpg is loaded from the network for the GET, and then the disk cache cache is used to satisfy the HEAD request. 2. On a warm load, both the GET and HEAD requests are served out of disk cache.
Attachments
test case for GET followed by HEAD
(712 bytes, text/html)
2019-10-16 15:31 PDT
,
Ben Nham
no flags
Details
WIP: allow HEADs to use cached GETs
(4.96 KB, patch)
2019-10-29 15:28 PDT
,
Ben Nham
no flags
Details
Formatted Diff
Diff
WIP: cache HEADs as well as GETs
(2.47 KB, patch)
2019-10-29 15:29 PDT
,
Ben Nham
no flags
Details
Formatted Diff
Diff
View All
Add attachment
proposed patch, testcase, etc.
Radar WebKit Bug Importer
Comment 1
2019-10-16 15:32:11 PDT
<
rdar://problem/56349558
>
Chris Dumez
Comment 2
2019-10-16 15:56:15 PDT
I am surprised we are not caching responses to HEAD requests. AFAIK, it is allowed and it would not be hard to implement. That said, it would be useful to get an estimate of the impact on PLT to see if it is worth doing.
Ben Nham
Comment 3
2019-10-16 16:15:54 PDT
I do not think it will be a huge win, because we are not blocking DOMContentLoaded or FirstMeaningfulPaint on these resources loading. It would impact AllSubresourcesLoaded time only. There are 13 resources on the Amazon snapshot in the test that exhibit this behavior. The other sites in the test would be unaffected. For PLT5: 1. On the cold run, we'd have to do the 13 GETs, but potentially the 13 HEADs could be served out of cache. 2. On the warm run, all 26 requests (13 GETs and 13 HEADs) could potentially be served out of cache. We open multiple connections to the PLT5 server in parallel, so we save on 6 parallel SSL session setups to the same hostname, plus 13-26 transmission delays which are parallelized onto those 6 concurrent connections. Probably looking at something 100-200 ms saved on the warm Amazon PLT. I can try to jerry-rig up some quick and dirty patch to try to get a better estimate from the A/B bots. Another consideration is that maybe we should not bother caching HEADs directly but just support the downgrade from a cached GET to a cached HEAD.
Geoffrey Garen
Comment 4
2019-10-16 21:28:47 PDT
FWIW, in cases where the optimization is clear and not hard to do, my preference is just to do it. Even if it's no speedup in the end, it un-muddies the waters, and prevents the next person from tripping over the same issue.
Antti Koivisto
Comment 5
2019-10-16 22:31:19 PDT
We definitely should support it. Implementation is not totally trivial since you'd also want to support responding with cached GET results. The reason this wasn't done earlier is simply because we hadn't seen any interesting content using HEAD.
Ben Nham
Comment 6
2019-10-29 11:27:45 PDT
There seem to be two ways this is currently supported in the field: 1. Cache GET and HEAD requests separately without support for GET => HEAD downgrade (Firefox) 2. Don't cache HEAD requests directly, but support downgrade of a cached GET to a cached HEAD (Chrome) Obviously we could do both, but it would be a bit more complicated than doing one or the other (because we'd potentially now do two cache lookups for a HEAD request rather than just one). I've tried out a simple patch that treats GETs and HEADs separately and it doesn't have an effect on PLT even though it does properly result in cached HEADs in the warm trace. I suspected this would happen due to the requests not being on the critical path. Based on that, my vote is to do the simplest thing that adds this feature and is maintainable.
Antti Koivisto
Comment 7
2019-10-29 13:20:39 PDT
Caching HEAD itself seems pretty pointless. The Chrome behavior is probably the way to go.
Antti Koivisto
Comment 8
2019-10-29 13:25:54 PDT
In terms of implementation it should be pretty much a matter of replacing HEAD with GET in the cache key and maybe skipping the unneeded fetching of the body from the disk (though that can be later optimization if needed).
Ben Nham
Comment 9
2019-10-29 15:28:49 PDT
Created
attachment 382236
[details]
WIP: allow HEADs to use cached GETs
Ben Nham
Comment 10
2019-10-29 15:29:15 PDT
Created
attachment 382238
[details]
WIP: cache HEADs as well as GETs
Ben Nham
Comment 11
2019-10-29 15:31:19 PDT
Neither seems too complicated to implement, so I uploaded proposed patches for comment for both strategies (although it sounds like we want to go with the downgrading of cached GETs). Since I haven't written too many WK patches before I'm just looking to make sure I'm going down the right path. There aren't any test cases yet, but I'll add some test cases to the disk-cache suite soon.
Geoffrey Garen
Comment 12
2019-11-05 09:02:24 PST
Some details: - I clicked the "patch" box in the Details view so your changes now show up as patches, and can be seen as diffs and submitted to EWS. - You need a ChangeLog. I usually use the webkit-patch tool or the prepare-ChangeLog tool to help generate boilerplate, and then I fill it in with comments.
Geoffrey Garen
Comment 13
2019-11-05 09:13:36 PST
Comment on
attachment 382236
[details]
WIP: allow HEADs to use cached GETs View in context:
https://bugs.webkit.org/attachment.cgi?id=382236&action=review
> Source/WebKit/NetworkProcess/cache/NetworkCache.cpp:214 > + if (decision == UseDecision::Validate && isHead) { > + // Seems silly to validate a HEAD, so just force the real request to go out. > + return UseDecision::NoDueToHTTPMethod; > + }
I don't fully understand from this comment why you went to the extra trouble to prohibit Validate on HEAD in the cache. Does silly in this context mean that it would be slow, or incorrect, or something else? If the behavior would be surprising but still correct, it might be best just to let it happen instead of special casing it.
> Source/WebKit/NetworkProcess/cache/NetworkCache.cpp:327 > + bool isHead = request.httpMethod() == "HEAD"; > + auto retrieveDecision = makeRetrieveDecision(request, isHead);
It's inconsistent to use a boolean to identify HEAD requests and a string comparison to identify GET requests. Let's pick one way and stick with it. If we do, the overall special casing and intrusion of HEAD into the cache's design will be smaller. I think string comparison may be fine for performance, since it's just three (GET) or four (HEAD) branches. So we could just use that. That said, if we want something better than string comparison, I'd suggest a refactoring patch to add httpMethodForCache() to ResourceRequest, and have it return an enum class { Get, Head, Other }. The ResourceRequest could initialize this member in its constructor, or lazily.
Geoffrey Garen
Comment 14
2019-11-05 09:16:28 PST
Comment on
attachment 382238
[details]
WIP: cache HEADs as well as GETs This patch looks good to me, and not particularly hard to maintain going forward. Based on this, I would support doing both.
Ahmad Saleem
Comment 15
2022-06-07 14:31:57 PDT
Despite this patch not landing (can verify via Webkit Github mirror), I am seeing below mentioned behavior with attached test case in Safari 15.5 on macOS 12.4. Link -
https://github.com/WebKit/WebKit/blob/main/Source/WebKit/NetworkProcess/cache/NetworkCache.cpp
Each time reliably on running test case attached, I get "HEAD XHR succeeded!" but when I use "Network" with "Method" column enabled, I can see both GET and HEAD responses. Despite enabling and disabling caches, I see both requests. Although, "Resource Size" is zero in case of HEAD request. Just wanted to update latest behavior and if I testing it wrong or incorrect. Happy to be corrected. Thanks!
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug