203058 – HEAD requests are not cached

NEW 203058

HEAD requests are not cached

https://bugs.webkit.org/show_bug.cgi?id=203058

Summary HEAD requests are not cached

Ben Nham

Reported 2019-10-16 15:31:34 PDT

Created attachment 381123 [details] test case for GET followed by HEAD The warm Amazon page load on PLT5 shows a number of static resources that aren't cached. This is due to a GET followed by a HEAD for the same resource: 1. The main page has an <img src="foo.jpg"> which causes a GET for foo.jpg. 2. Some time later, they do a HEAD request for foo.jpg via an XHR (they extract Content-Length out of the response headers to power some other logic). I've attached a reduced test case that makes this request pattern. (Note that Amazon's home page even as of today shows this request pattern, so it's not entirely theoretical.) This request pattern causes misses in both our memory and disk caches. # Memory Cache In the memory cache, the GET has a type of ImageResource, while the HEAD has a type of RawResource. The mismatched types cause the memory cache to always reload the resource, as shown in `CachedResourceLoader::determineRevalidationPolicy`: ``` // If the same URL has been loaded as a different type, we need to reload. if (existingResource->type() != type) { LOG(ResourceLoading, "CachedResourceLoader::determineRevalidationPolicy reloading due to type mismatch."); logMemoryCacheResourceRequest(frame(), DiagnosticLoggingKeys::inMemoryCacheKey(), DiagnosticLoggingKeys::unusedReasonTypeMismatchKey()); return Reload; } ``` # Disk Cache In the disk cache, the first GET is cached to disk as expected. However, `makeStoreDecision` decides to not store HEAD responses: ``` if (originalRequest.httpMethod() != "GET") return StoreDecision::NoDueToHTTPMethod; ``` In response, `Cache::store` deletes the existing GET response from the disk cache: ``` StoreDecision storeDecision = makeStoreDecision(request, response, responseData ? responseData->size() : 0); if (storeDecision != StoreDecision::Yes) { LOG(NetworkCache, "(NetworkProcess) didn't store, storeDecision=%d", static_cast<int>(storeDecision)); auto key = makeCacheKey(request); auto isSuccessfulRevalidation = response.httpStatusCode() == 304; if (!isSuccessfulRevalidation) { // Make sure we don't keep a stale entry in the cache. remove(key); } return nullptr; } ``` So as a result, when the page load is complete, there is nothing at all cached for foo.jpg. So on reload, we actually end up fetching foo.jpg from the network again, for both the GET and HEAD requests. # Other Browsers Chrome is able to downgrade a cached GET response to a HEAD response. When loading the test case in Chrome: 1. On a cold load, foo.jpg is loaded from the network for the GET, and then the disk cache cache is used to satisfy the HEAD request. 2. On a warm load, both the GET and HEAD requests are served out of disk cache.

Attachments
test case for GET followed by HEAD (712 bytes, text/html) 2019-10-16 15:31 PDT, Ben Nham	no flags	Details
WIP: allow HEADs to use cached GETs (4.96 KB, patch) 2019-10-29 15:28 PDT, Ben Nham	no flags	Details Formatted Diff Diff
WIP: cache HEADs as well as GETs (2.47 KB, patch) 2019-10-29 15:29 PDT, Ben Nham	no flags	Details Formatted Diff Diff
View All Add attachment proposed patch, testcase, etc.

Radar WebKit Bug Importer

Comment 1 2019-10-16 15:32:11 PDT

<rdar://problem/56349558>

Chris Dumez

Comment 2 2019-10-16 15:56:15 PDT

I am surprised we are not caching responses to HEAD requests. AFAIK, it is allowed and it would not be hard to implement. That said, it would be useful to get an estimate of the impact on PLT to see if it is worth doing.

Ben Nham

Comment 3 2019-10-16 16:15:54 PDT

I do not think it will be a huge win, because we are not blocking DOMContentLoaded or FirstMeaningfulPaint on these resources loading. It would impact AllSubresourcesLoaded time only. There are 13 resources on the Amazon snapshot in the test that exhibit this behavior. The other sites in the test would be unaffected. For PLT5: 1. On the cold run, we'd have to do the 13 GETs, but potentially the 13 HEADs could be served out of cache. 2. On the warm run, all 26 requests (13 GETs and 13 HEADs) could potentially be served out of cache. We open multiple connections to the PLT5 server in parallel, so we save on 6 parallel SSL session setups to the same hostname, plus 13-26 transmission delays which are parallelized onto those 6 concurrent connections. Probably looking at something 100-200 ms saved on the warm Amazon PLT. I can try to jerry-rig up some quick and dirty patch to try to get a better estimate from the A/B bots. Another consideration is that maybe we should not bother caching HEADs directly but just support the downgrade from a cached GET to a cached HEAD.

Geoffrey Garen

Comment 4 2019-10-16 21:28:47 PDT

FWIW, in cases where the optimization is clear and not hard to do, my preference is just to do it. Even if it's no speedup in the end, it un-muddies the waters, and prevents the next person from tripping over the same issue.

Antti Koivisto

Comment 5 2019-10-16 22:31:19 PDT

We definitely should support it. Implementation is not totally trivial since you'd also want to support responding with cached GET results. The reason this wasn't done earlier is simply because we hadn't seen any interesting content using HEAD.

Ben Nham

Comment 6 2019-10-29 11:27:45 PDT

There seem to be two ways this is currently supported in the field: 1. Cache GET and HEAD requests separately without support for GET => HEAD downgrade (Firefox) 2. Don't cache HEAD requests directly, but support downgrade of a cached GET to a cached HEAD (Chrome) Obviously we could do both, but it would be a bit more complicated than doing one or the other (because we'd potentially now do two cache lookups for a HEAD request rather than just one). I've tried out a simple patch that treats GETs and HEADs separately and it doesn't have an effect on PLT even though it does properly result in cached HEADs in the warm trace. I suspected this would happen due to the requests not being on the critical path. Based on that, my vote is to do the simplest thing that adds this feature and is maintainable.

Antti Koivisto

Comment 7 2019-10-29 13:20:39 PDT

Caching HEAD itself seems pretty pointless. The Chrome behavior is probably the way to go.

Antti Koivisto

Comment 8 2019-10-29 13:25:54 PDT

In terms of implementation it should be pretty much a matter of replacing HEAD with GET in the cache key and maybe skipping the unneeded fetching of the body from the disk (though that can be later optimization if needed).

Ben Nham

Comment 9 2019-10-29 15:28:49 PDT

Created attachment 382236 [details] WIP: allow HEADs to use cached GETs

Ben Nham

Comment 10 2019-10-29 15:29:15 PDT

Created attachment 382238 [details] WIP: cache HEADs as well as GETs

Ben Nham

Comment 11 2019-10-29 15:31:19 PDT

Neither seems too complicated to implement, so I uploaded proposed patches for comment for both strategies (although it sounds like we want to go with the downgrading of cached GETs). Since I haven't written too many WK patches before I'm just looking to make sure I'm going down the right path. There aren't any test cases yet, but I'll add some test cases to the disk-cache suite soon.

Geoffrey Garen

Comment 12 2019-11-05 09:02:24 PST

Some details: - I clicked the "patch" box in the Details view so your changes now show up as patches, and can be seen as diffs and submitted to EWS. - You need a ChangeLog. I usually use the webkit-patch tool or the prepare-ChangeLog tool to help generate boilerplate, and then I fill it in with comments.

Geoffrey Garen

Comment 13 2019-11-05 09:13:36 PST

Comment on attachment 382236 [details] WIP: allow HEADs to use cached GETs View in context: https://bugs.webkit.org/attachment.cgi?id=382236&action=review > Source/WebKit/NetworkProcess/cache/NetworkCache.cpp:214 > + if (decision == UseDecision::Validate && isHead) { > + // Seems silly to validate a HEAD, so just force the real request to go out. > + return UseDecision::NoDueToHTTPMethod; > + } I don't fully understand from this comment why you went to the extra trouble to prohibit Validate on HEAD in the cache. Does silly in this context mean that it would be slow, or incorrect, or something else? If the behavior would be surprising but still correct, it might be best just to let it happen instead of special casing it. > Source/WebKit/NetworkProcess/cache/NetworkCache.cpp:327 > + bool isHead = request.httpMethod() == "HEAD"; > + auto retrieveDecision = makeRetrieveDecision(request, isHead); It's inconsistent to use a boolean to identify HEAD requests and a string comparison to identify GET requests. Let's pick one way and stick with it. If we do, the overall special casing and intrusion of HEAD into the cache's design will be smaller. I think string comparison may be fine for performance, since it's just three (GET) or four (HEAD) branches. So we could just use that. That said, if we want something better than string comparison, I'd suggest a refactoring patch to add httpMethodForCache() to ResourceRequest, and have it return an enum class { Get, Head, Other }. The ResourceRequest could initialize this member in its constructor, or lazily.

Geoffrey Garen

Comment 14 2019-11-05 09:16:28 PST

Comment on attachment 382238 [details] WIP: cache HEADs as well as GETs This patch looks good to me, and not particularly hard to maintain going forward. Based on this, I would support doing both.

Ahmad Saleem

Comment 15 2022-06-07 14:31:57 PDT

Despite this patch not landing (can verify via Webkit Github mirror), I am seeing below mentioned behavior with attached test case in Safari 15.5 on macOS 12.4. Link - https://github.com/WebKit/WebKit/blob/main/Source/WebKit/NetworkProcess/cache/NetworkCache.cpp Each time reliably on running test case attached, I get "HEAD XHR succeeded!" but when I use "Network" with "Method" column enabled, I can see both GET and HEAD responses. Despite enabling and disabling caches, I see both requests. Although, "Resource Size" is zero in case of HEAD request. Just wanted to update latest behavior and if I testing it wrong or incorrect. Happy to be corrected. Thanks!

Note You need to log in before you can comment on or make changes to this bug.

Status NEW

Resolution

Priority P2

Severity Normal

Classification Unclassified

Version WebKit Nightly Build

Hardware All

OS All

Product WebKit

Component Page Loading

Assignee

Ben Nham

Reported

2019-10-16 15:31 PDT

Modified

2022-06-07 14:31 PDT History

CC List

9 users Show

URL

Keywords InRadar

Depends on

Blocks