Bug 144259

Summary: Network Cache: Disk cache getting filled by YouTube video data
Product: WebKit Reporter: Antti Koivisto <koivisto>
Component: Page LoadingAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: aestes, ap, barraclough, cdumez, commit-queue, darin, eric.carlson, jer.noble, mitz, mjs
Priority: P2 Keywords: InRadar
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugs.webkit.org/show_bug.cgi?id=144533
Attachments:
Description Flags
patch
none
patch
none
patch
none
patch darin: review+

Description Antti Koivisto 2015-04-27 06:15:16 PDT
MSE media loads via the cache and tends to eventually fill it.
Comment 1 Antti Koivisto 2015-04-27 06:15:41 PDT
rdar://problem/20127334
Comment 2 Antti Koivisto 2015-04-27 06:31:05 PDT
Created attachment 251736 [details]
patch
Comment 3 WebKit Commit Bot 2015-04-27 06:32:42 PDT
Attachment 251736 [details] did not pass style-queue:


ERROR: Source/WebCore/platform/network/ResourceRequestBase.h:150:  enum members should use InterCaps with an initial capital letter.  [readability/enum_casing] [4]
Total errors found: 1 in 18 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 4 Antti Koivisto 2015-04-27 06:55:32 PDT
Created attachment 251738 [details]
patch
Comment 5 WebKit Commit Bot 2015-04-27 06:57:08 PDT
Attachment 251738 [details] did not pass style-queue:


ERROR: Source/WebCore/platform/network/ResourceRequestBase.h:150:  enum members should use InterCaps with an initial capital letter.  [readability/enum_casing] [4]
Total errors found: 1 in 18 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 6 Antti Koivisto 2015-04-27 07:55:32 PDT
Created attachment 251739 [details]
patch
Comment 7 WebKit Commit Bot 2015-04-27 07:57:20 PDT
Attachment 251739 [details] did not pass style-queue:


ERROR: Source/WebCore/platform/network/ResourceRequestBase.h:150:  enum members should use InterCaps with an initial capital letter.  [readability/enum_casing] [4]
Total errors found: 1 in 19 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 8 Antti Koivisto 2015-04-27 08:36:01 PDT
Created attachment 251743 [details]
patch
Comment 9 WebKit Commit Bot 2015-04-27 08:38:12 PDT
Attachment 251743 [details] did not pass style-queue:


ERROR: Source/WebCore/platform/network/ResourceRequestBase.h:150:  enum members should use InterCaps with an initial capital letter.  [readability/enum_casing] [4]
Total errors found: 1 in 19 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 10 Darin Adler 2015-04-27 09:06:33 PDT
Comment on attachment 251743 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=251743&action=review

> Source/WebCore/platform/network/ResourceRequestBase.h:231
> +        Requester m_requester { Requester::Unspecified };

The data members just above are going out of their way to use bitfields, and here for something that takes only 2 bits we are not using one.

> Source/WebKit2/NetworkProcess/cache/NetworkCache.cpp:307
> +    bool isLikelyStreamingMedia = isXHR && (response.mimeType().startsWith("video/") || response.mimeType().startsWith("audio/"));

Is MIME type guaranteed to be lower case? I think it would be best to do this comparison in a case folding way.
Comment 11 Antti Koivisto 2015-04-27 10:35:16 PDT
> The data members just above are going out of their way to use bitfields, and
> here for something that takes only 2 bits we are not using one.

Those bitfields are an over-optimization. I was planning to get rid of them separately.
Comment 12 Geoffrey Garen 2015-04-27 10:53:16 PDT
Comment on attachment 251743 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=251743&action=review

> Source/WebKit2/NetworkProcess/cache/NetworkCache.cpp:309
> +    // Media loaded via XHR is likely being used for MSE streaming (YouTube and Netflix for example).
> +    // Streaming media fills the cache quickly and is unlikely to be reused.
> +    bool isXHR = originalRequest.requester() == WebCore::ResourceRequest::Requester::XHR;
> +    bool isLikelyStreamingMedia = isXHR && (response.mimeType().startsWith("video/") || response.mimeType().startsWith("audio/"));
> +    if (isLikelyStreamingMedia)
> +        return StoreDecision::NoDueToStreamingMedia;

We don't think that people rewind, fast-forward, and/or revisit videos? I tend to do all of those things. :(
Comment 13 Alexey Proskuryakov 2015-04-27 21:14:57 PDT
We just removed "requestor" from ResourceRequest due to it being a layering violation, and deprecated isMainResourceRequest, so it's unfortunate that it's being added back. ResourceRequest is a network level concept that shouldn't know about "XMLHttpRequest" on "main resource" in my opinion.

Should we have an abstraction for networking, or just use web concepts all the way through?
Comment 14 Antti Koivisto 2015-04-28 00:19:50 PDT
In practice we need this sort of information through the stack. We can pass it via side channels. This is bug-prone and cumbersome. We could add a new high level request type that knows about the web and wraps a low level type that doesn't. This is adds yet another layer to the already too layered networking code.

In the end we are making a browser engine not a platonic networking platform. "Web concepts all the way through" is a decent description of WebKit architecture.
Comment 15 Antti Koivisto 2015-04-28 03:20:08 PDT
> We don't think that people rewind, fast-forward, and/or revisit videos? I
> tend to do all of those things. :(

At least the current YouTube player never seems to generate the same URL twice so cache entires are never hit. For rewind and fast-forward have 5min 1024p, 314MB memory cache on MSE level (bigger than the entire disk cache).

We can't currently move data from the memory cache to MSE without copying so there is no mapped file benefit either.

We should revisit this at some point. I think we want a separate more transient media cache section on disk and we should allow copy-free use of mapped buffers through the stack.
Comment 16 Antti Koivisto 2015-04-28 03:21:26 PDT
> We can't currently move data from the memory cache to MSE without copying so
> there is no mapped file benefit either.

from the disk cache I mean
Comment 17 Antti Koivisto 2015-04-28 05:51:07 PDT
https://trac.webkit.org/r183467
Comment 18 mitz 2015-04-28 07:18:04 PDT
(In reply to comment #17)
> https://trac.webkit.org/r183467

Looks like this broke the USE(CFNETWORK) iOS configuration:

Source/WebCore/platform/network/ios/ResourceRequestIOS.mm:42:7: error: member initializer 'm_mainResourceRequest' does not name a non-static data member or base class
Comment 19 Andy Estes 2015-04-28 08:54:48 PDT
Mavericks Debug build fix: http://trac.webkit.org/changeset/183478
Comment 20 Darin Adler 2015-04-28 09:26:45 PDT
Regarding Alexey and Antti’s discussion about “web concepts” at the lower level: I think what we want for the layering here is that the lower level exposes the concepts it uses to make the cache work, and the higher level adopts/implements those. There is no reason, in my opinion, that the lower level concepts need to be “non-web”, but we do want them to be as simple as possible, not simply plumbing through the entire high level. And obviously, “nothing at all” is the simplest, but may not be sufficient.
Comment 21 Alexey Proskuryakov 2015-04-28 09:56:53 PDT
It's not really a choice about platonic vs. pragmatical, it's simple layered software design.

If you want networking to implement a policy that is easy for other engineers to understand, if should have a descriptive name (not "whatever XHR needs"), and the policies should be as centralized as possible, not scattered across multiple frameworks and processes.

In other words, policy decisions are made before a ResourceRequest is passed to NetworkProcess. It should be especially straightforward in this case, where you just want to inhibit caching.
Comment 22 Antti Koivisto 2015-04-28 13:20:09 PDT
If you look at the patch you will see that the requster is used for number of non-cache related things. Those used to be handled by a side-channel and by (already) having the requester type in ResourceRequest (on iOS).

I want to keep cache decision-making in one place. I also don't want to be coy and use some other name when I really mean XHR. I would be surprised if this confused anyone. Generally less abstract code is more understandable.
Comment 23 Antti Koivisto 2015-04-28 13:35:50 PDT
Also "which layer knows about which concept" kind of layering is very soft when not reflected in code dependencies. I don't think there is any "simple layered software design" hard rules here.
Comment 24 Alexey Proskuryakov 2015-05-04 19:49:07 PDT
This caused bug 144533.

For the record, I still think that this is poor design. Even your own words demonstrate that - you say that one needs to look at the patch to find out what "XMLHttpRequest" means to the low level networking code. I do not think that any human can fully understand the relationship between explicit behaviors passed from upper levels and implicit ones triggered by ResourceRequest::Requester::XHR.