Bug 101300 - Store the initiating CachedResourceRequest
Summary: Store the initiating CachedResourceRequest
Status: RESOLVED WONTFIX
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs
Version: 528+ (Nightly build)
Hardware: Unspecified   OS: Unspecified
Importance: P2 Normal
Assignee: James Simonsen
URL:
Keywords:
Depends on: 99736
Blocks: 84883
Reported: 2012-11-05 22:00 PST by James Simonsen
Modified: 2012-12-03 14:23 PST
CC List: 14 users

See Also:


Attachments
Patch (57.41 KB, patch)
2012-11-05 22:20 PST, James Simonsen
no flags
Patch (53.90 KB, patch)
2012-11-06 21:19 PST, James Simonsen
abarth: review-
buildbot: commit-queue-

Description James Simonsen 2012-11-05 22:00:20 PST
Store the initiating CachedResourceRequest
Comment 1 James Simonsen 2012-11-05 22:20:29 PST
Created attachment 172487 [details]
Patch
Comment 2 James Simonsen 2012-11-05 22:24:00 PST
One open question: How long should this live? I really only need it until the load completes. Is it okay to clear it out then? Otherwise, I feel we risk hanging on to things way too long.
Comment 3 Adam Barth 2012-11-05 23:15:15 PST
Comment on attachment 172487 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=172487&action=review

> Source/WebCore/loader/SubresourceLoader.cpp:206
> -        // Since a subresource loader does not load multipart sections progressively, data was delivered to the loader all at once.        
> -        // After the first multipart section is complete, signal to delegates that this load is "finished" 
> +        // Since a subresource loader does not load multipart sections progressively, data was delivered to the loader all at once.
> +        // After the first multipart section is complete, signal to delegates that this load is "finished"

The only changes to this file appear to be whitespace changes.

> Source/WebCore/loader/cache/CachedResource.cpp:360
> -    
> +

Can you post a version of this patch that doesn't contain all these whitespace changes?  They're really distracting.

> Source/WebCore/loader/cache/CachedResource.h:235
> -    
> +

More noise.  :(

> Source/WebCore/loader/cache/CachedResource.h:365
> +    OwnPtr<CachedResourceRequest> m_initialRequest;

How can the CachedResource hold the request?  I thought the whole point of this approach was that many CachedResourceRequests can lead to a single CachedResource, so we decided to store the initiator state on the request instead of the resource.

> Source/WebCore/loader/cache/CachedResourceRequest.h:74
> +    RefPtr<Element> m_initiatorElement;
> +    RefPtr<Document> m_initiatorDocument;

This seems very likely to create a reference cycle.  Certainly, given that this patch has CachedResource holding the CachedResourceRequest, the MemoryCache (with its static lifetime) is transitively holding the entire document.....  I don't understand how we're managing the lifetimes correctly in this patch.  Perhaps it's worth explaining the design in the ChangeLog?
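
To make the concern concrete, here is a minimal sketch of the retain chain being described, using only the members named in the quoted patch; everything else is simplified and illustrative, not actual WebKit code:

    // Sketch only: the members come from the quoted patch hunks; the
    // surrounding classes are simplified and not real WebKit code.
    class CachedResourceRequest {
        RefPtr<Element> m_initiatorElement;    // strongly retains the element
        RefPtr<Document> m_initiatorDocument;  // strongly retains the document
    };

    class CachedResource {
        OwnPtr<CachedResourceRequest> m_initialRequest;  // resource owns the request
    };

    // MemoryCache has static lifetime and keeps CachedResources alive, so the
    // chain MemoryCache -> CachedResource -> CachedResourceRequest -> Document
    // keeps the whole document alive for as long as the resource stays cached.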
Comment 4 Adam Barth 2012-11-05 23:19:11 PST
> One open question: How long should this live? I really only need it until the load completes. Is it okay to clear it out then? Otherwise, I feel we risk hanging on to things way too long.

That is an important question.  It's not clear to me what the right answer is.
Comment 5 jochen 2012-11-06 02:00:12 PST
(In reply to comment #4)
> > One open question: How long should this live? I really only need it until the load completes. Is it okay to clear it out then? Otherwise, I feel we risk hanging on to things way too long.
> 
> That is an important question.  It's not clear to me what the right answer is.

For the purpose of notifying the embedder, we don't need to store this at all: I think it's enough to have this information available at willSendRequest() time. If the embedder wants to store some information, e.g. a pointer to the Element, the embedder can do so itself.
Comment 6 Adam Barth 2012-11-06 14:50:27 PST
It doesn't really matter if it's the embedder or WebKit that retains the element.  We still need to understand what the lifetimes are.
Comment 7 James Simonsen 2012-11-06 21:19:32 PST
Created attachment 172712 [details]
Patch
Comment 8 James Simonsen 2012-11-06 21:25:26 PST
(In reply to comment #3)
> (From update of attachment 172487 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=172487&action=review
> 
> > Source/WebCore/loader/cache/CachedResource.h:365
> > +    OwnPtr<CachedResourceRequest> m_initialRequest;
> 
> How can the CachedResource hold the request?  I thought the whole point of this approach was that many CachedResourceRequests can lead to a single CachedResource, so we decided to store the initiator state on the request instead of the resource.

For Resource Timing, we still need to hang on to the initiator throughout the duration of the load. So the first CachedResourceRequest needs to live until the load completes so that we can report to Resource Timing that it finished.

I've changed this so that the SubresourceLoader hangs on to the CachedResourceRequest instead. I hope that makes both of these points clearer.

> 
> > Source/WebCore/loader/cache/CachedResourceRequest.h:74
> > +    RefPtr<Element> m_initiatorElement;
> > +    RefPtr<Document> m_initiatorDocument;
> 
> This seems very likely to create a reference cycle.  Certainly, given that this patch has CachedResource holding the CachedResourceRequest, the MemoryCache (with its static lifetime) is transitively holding the entire document.....  I don't understand how we're managing the lifetimes correctly in this patch.  Perhaps it's worth explaining the design in the ChangeLog?

See above. Now it's owned by the loader, which is only alive throughout the load. Does that help clarify it?
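
For reference, the revised ownership in the second patch is roughly the following; only the class names mentioned above are real, the member name is illustrative:

    // Sketch of the ownership described above (member name illustrative).
    // The SubresourceLoader lives only for the duration of the load, so the
    // initiator references die with it instead of with the cached resource.
    class SubresourceLoader {
        OwnPtr<CachedResourceRequest> m_initiatingRequest;
    };
    // When the load finishes or is cancelled, the loader is destroyed and the
    // RefPtr<Element>/RefPtr<Document> inside the request are released with it,
    // so nothing with static lifetime (e.g. MemoryCache) retains the document.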
Comment 9 Build Bot 2012-11-06 21:40:42 PST
Comment on attachment 172712 [details]
Patch

Attachment 172712 [details] did not pass mac-ews (mac):
Output: http://queues.webkit.org/results/14747644
Comment 10 Adam Barth 2012-11-07 10:03:14 PST
I don't think we should push the CachedResourceRequest further into the loader pipeline.  I'm sorry if I wasn't clear earlier.  The problem is that there can be many CachedResourceRequests that result in a single CachedResource and a single ResourceRequest.

The idea behind storing the initiator on the CachedResourceRequest is to let the requester hold on to the initiator by remembering its own CachedResourceRequest.  Pushing CachedResourceRequest below CachedResourceLoader doesn't make sense given the relationship between these objects.
Comment 11 James Simonsen 2012-11-07 11:21:49 PST
(In reply to comment #10)
> The idea behind storing the initiator on the CachedResourceRequest is to let the requester hold on to the initiator by remembering its own CachedResourceRequest.  Pushing CachedResourceRequest below CachedResourceLoader doesn't make sense given the relationship between these objects.

The initiator data must be kept separate from the requester, because they have different lifetimes. For Resource Timing, the initiator is the first object to request a resource, no matter what happens to that object. If that object is destroyed before it finishes loading the resource, but another request finishes loading the resource, then *the first (destroyed) object is recorded as the initiator.*

Also, for conceptual purposes, an "initiator" is a "requester." Specifically, it's the requester that triggered a network load. For a cached resource, that's the first one. For an uncacheable resource, it's each individual request.
Comment 12 Adam Barth 2012-11-07 13:20:35 PST
> For Resource Timing, the initiator is the first object to request a resource, no matter what happens to that object.

That seems like a bad design.  That means we need to keep the first object alive for an arbitrary amount of time just in case some other object ends up loading this resource in the future.
Comment 13 James Simonsen 2012-11-07 14:16:16 PST
(In reply to comment #12)
> > For Resource Timing, the initiator is the first object to request a resource, no matter what happens to that object.
> 
> That seems like a bad design.  That means we need to keep the first object alive for an arbitrary amount of time just in case some other object ends up loading this resource in the future.

Perhaps I worded it poorly. The example I gave is correct though.

I'll try again:

The initiator is bound to the resource load, not the resource itself. If the load is stopped for any reason, we can forget about the initiator.

If the load transfers ownership while in progress, we must remember the initiator until the load completes. This is the example I gave earlier, about requester 1 being destroyed, but requester 2 still waiting. In that case, we must remember the initiator data from requester 1.

Does that help? If not, maybe read the processing model here:

http://w3c-test.org/webperf/specs/ResourceTiming/#processing-model
Comment 14 Adam Barth 2012-11-07 14:23:47 PST
"If the resource is not to be fetched from the networking layer, such as being fetched from an in-memory cache, abort the remaining steps."

It sounds like this algorithm was designed by an implementer with a memory cache that works differently from WebKit's MemoryCache.  In particular, this step will always cause us to abort the remaining steps because we fetch all these resources from an in-memory cache.

It doesn't make sense to me that two documents loading at the same time in different tabs will have their initiator information cross-talk through the MemoryCache if we happen to load them in the same process but will not if we load them in different processes.

If we decide to implement an out-of-process network stack akin to what WebKit2 is doing with a shared memory MemoryCache, it's not at all clear to me that we can implement this behavior in a sensible way.

In summary, this seems like a bad design.  The API we expose to the web should not depend on implementation details about how we decide to merge, fork, or prefetch resources.  By avoiding API commitments about how these processes work internally, we free ourselves to continue to innovate in this area.
Comment 15 James Simonsen 2012-11-07 14:41:54 PST
(In reply to comment #14)
> "If the resource is not to be fetched from the networking layer, such as being fetched from an in-memory cache, abort the remaining steps."
> 
> It sounds like this algorithm was designed by an implementer with a memory cache that works differently from WebKit's MemoryCache.  In particular, this step will always cause us to abort the remaining steps because we fetch all these resources from an in-memory cache.
> 
> It doesn't make sense to me that two documents loading at the same time in different tabs will have their initiator information cross-talk through the MemoryCache if we happen to load them in the same process but will not if we load them in different processes.
> 
> If we decide to implement an out-of-process network stack akin to what WebKit2 is doing with a shared memory MemoryCache, it's not at all clear to me that we can implement this behavior in a sensible way.
> 
> In summary, this seems like a bad design.  The API we expose to the web should not depend on implementation details about how we decide to merge, fork, or prefetch resources.  By avoiding API commitments about how these processes work internally, we free ourselves to continue to innovate in this area.

I don't think it's as bad as it sounds.

The intent is simple: any time the document does something that causes a resource to load from the network, record it.

Also note this at the top of processing model: "This section is non-normative."

Clearly the spec doesn't cover multiple tab situations. That seems like an oversight. The WG is meeting here tomorrow. I'll be happy to raise that issue in person.
Comment 16 Adam Barth 2012-11-07 14:48:31 PST
The processing model is non-normative?  That doesn't make much sense.  :(
Comment 17 Brady Eidson 2012-11-07 14:57:05 PST
(In reply to comment #15)
> (In reply to comment #14)
> Clearly the spec doesn't cover multiple tab situations. That seems like an oversight. The WG is meeting here tomorrow. I'll be happy to raise that issue in person.

Will there be IRC and/or a call-in?  Any room for observers?
Comment 18 James Simonsen 2012-11-07 14:58:46 PST
(In reply to comment #16)
> The processing model is non-normative?  That doesn't make much sense.  :(

Tony and I are on the WG. We can push for changes if you have requests.

We felt the spec serves what developers want (effectively the Inspector's network timeline, available remotely). We just need to figure out how to work it into WebKit.

In the meantime, what do we need to do to move forward? Marja is still waiting on this too.
Comment 19 James Simonsen 2012-11-07 15:00:41 PST
(In reply to comment #17)
> (In reply to comment #15)
> > (In reply to comment #14)
> > Clearly the spec doesn't cover multiple tab situations. That seems like an oversight. The WG is meeting here tomorrow. I'll be happy to raise that issue in person.
> 
> Will there be IRC and/or a call-in?  Any room for observers?

This is a workshop rather than a normal meeting, so I'm not sure. The details are here:

http://www.w3.org/2012/11/performance-workshop/

Our weekly conference calls are Wednesdays at 10am PST. See public-web-perf@w3.org for details.
Comment 20 Adam Barth 2012-11-07 15:08:43 PST
> In the meantime, what do we need to do to move forward? Marja is still waiting on this too.

Does Marja's use case involve storing the CachedResourceRequest in the CachedResource (or lower down in the loader)?  If not, it might be easier to work on her use case first.

In general, though, what we need to move forward is a design that doesn't require picking a "most special" requester among all the requesters of a resource.
Comment 21 Brady Eidson 2012-11-07 15:12:24 PST
I've been pretty quiet so far, but want to go on the record as echoing Adam's objections pretty much point for point.
Comment 22 James Simonsen 2012-11-07 15:41:38 PST
(In reply to comment #20)
> > In the meantime, what do we need to do to move forward? Marja is still waiting on this too.
> 
> Does Marja's use case involve storing the CachedResourceRequest in the CachedResource (or lower down in the loader)?  If not, it might be easier to work on her use case first.

From Jochen's reply, it sounds like they need it available whenever willSendRequest() is called. I think that implies that it sticks around for a bit.

Maybe they can chime in on exactly where they would like the information available (resource or loader).

> 
> In general, though, what we need to move forward is a design that doesn't require picking a "most special" requester among all the requesters of a resource.

Since the processing model is non-normative (even I hadn't noticed that before), I'd be fine with coming up with our own definition. Say, first active requester when the load completes. I'll then take up the issue with the WG.
Comment 23 Marja Hölttä 2012-11-08 00:49:07 PST
For our use case, we want to track events like "image element requested to load this image", not the actual load.

If two image elements request the same image, we want to get both events "img1  requested the image" and "img2 requested the image", and we don't care how these requests are served (e.g., the first one will cause a load from the network and the second one will be served from the cache).

So when the CachedResourceLoader gets the request to load the image, we'd notify the embedder about it, and that's it. The embedder can then store a reference to the Element if it wants, but our code doesn't need to store any references. Everything that happens after that point (actually loading the resource etc.) would be unaffected by our changes.
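
In other words, the hook being asked for is a fire-and-forget notification at request time, along these lines (a hypothetical sketch; none of these names are existing WebKit APIs):

    // Hypothetical sketch of the notification Marja describes; the function
    // and accessor names are illustrative, not existing WebKit APIs.
    static void notifyEmbedderOfResourceRequest(const CachedResourceRequest& request)
    {
        // Called synchronously when CachedResourceLoader receives the request,
        // before deciding whether it is served from the cache or the network.
        // The embedder may take its own reference to the Element; WebCore
        // stores nothing, and the rest of the load path is unchanged.
        embedderClient()->willRequestResource(request.initiatorElement(), request.url());
    }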
Comment 24 James Simonsen 2012-11-13 17:43:29 PST
This patch is probably dead now, but I want to follow up on the Resource Timing part...

First, the processing model *is* normative. The diagram is not. The editors will clarify that.

Second, everyone agreed the caching scenarios are not adequately defined. It's not just us. We didn't reach a conclusion during the F2F, so I started a thread here:

http://lists.w3.org/Archives/Public/public-web-perf/2012Nov/thread.html

Finally, here's another stab at moving forward (assuming the main problem is hanging on to Documents and Elements too long)...

Have a ResourceTimingNotifier singleton.

Whenever a CachedResourceRequest is sent, record the initiator's name, Document, and a pointer to its ResourceLoader (solely to identify it) in the singleton.

When a ResourceLoader finishes, notify the ResourceTimingNotifier. The ResourceTimingNotifier will populate the appropriate Documents' ResourceTiming buffers with the finished information.

If a Document is deleted, it will remove itself from the ResourceTimingNotifier.

So, the ResourceTimingNotifier only stores raw pointers, which means no cycles, and we just need to update it at the correct times. If the raw pointers are unappealing, we can assign unique IDs.
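
To make the lifetime argument concrete, such a singleton might look roughly like this (all names and signatures are hypothetical):

    // Hypothetical sketch of the ResourceTimingNotifier idea above. Only raw
    // pointers are stored, used purely as keys, so nothing here keeps a
    // Document or Element alive. All names and signatures are illustrative.
    class ResourceTimingNotifier {
    public:
        static ResourceTimingNotifier& shared();

        // Called whenever a CachedResourceRequest is sent.
        void didSendRequest(ResourceLoader* loader, Document* document, const String& initiatorName)
        {
            m_pendingLoads.set(loader, PendingLoad(document, initiatorName));
        }

        // Called when a ResourceLoader finishes; moves the entry into the
        // document's Resource Timing buffer if that document is still registered.
        void didFinishLoad(ResourceLoader*, const ResourceLoadTiming&);

        // Called from Document teardown so stale pointers are never used.
        void documentDestroyed(Document*);

    private:
        struct PendingLoad {
            PendingLoad(Document* document = 0, const String& name = String())
                : document(document), initiatorName(name) { }
            Document* document;   // identification only
            String initiatorName;
        };
        HashMap<ResourceLoader*, PendingLoad> m_pendingLoads;
    };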

Is this better? Or worse?
Comment 25 Adam Barth 2012-11-13 22:54:20 PST
Why can't we just use the normal client callbacks to do this work?  The ResourceTimingNotifier static just seems like a back door from ResourceLoader to Document, bypassing the layering in the loader.

I don't fully understand Brady's plan for moving the loader out of process in WebKit2, so I can't comment on any implications that has for the design of this feature.
Comment 26 James Simonsen 2012-11-14 17:11:49 PST
(In reply to comment #25)
> Why can't we just use the normal client callbacks to do this work?  The ResourceTimingNotifier static just seems like a back door from ResourceLoader to Document, bypassing the layering in the loader.

That seems like a reasonable start. There are a few details to work out:

How does Resource Timing register for those callbacks? There's one Resource Timing buffer per DOMWindow. Really it should only get callbacks for requests in its document.

Perhaps DocumentLoader could be that central registration point? It already keeps track of all the SubresourceLoaders. It could also track the ThreadableLoaders and (child) FrameLoaders.

Otherwise, maybe some sort of singleton where documents can register additional clients to append to all loads?

Then, we want Resource Timing to register as a client for FrameLoader, ThreadableLoader, and SubresourceLoader loads. I think the first two will need to be modified to accept multiple clients. That should be easy. Then we just need to inject the additional clients any time a load happens.

Here's another alternative building off of the DocumentLoader idea and addressing the layering issue...

Implement Resource Timing in DocumentLoader. DocumentLoader tracks the loads and populates the Resource Timing buffer. Then just have the Resource Timing API pull data from DocumentLoader as needed.
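
For comparison, the DocumentLoader variant could look roughly like this (a hypothetical sketch; ResourceTimingEntry and the callback are illustrative, not existing WebKit interfaces):

    // Hypothetical sketch of the DocumentLoader alternative described above.
    // The DocumentLoader records entries for the loads it already tracks, and
    // the Resource Timing API pulls from it; all names are illustrative.
    class DocumentLoader {
    public:
        void subresourceLoaderFinishedLoading(ResourceLoader* loader, const ResourceLoadTiming& timing)
        {
            // Per-document buffer; no static or cross-document state involved.
            m_resourceTimingEntries.append(ResourceTimingEntry(loader->url(), timing));
        }

        const Vector<ResourceTimingEntry>& resourceTimingEntries() const { return m_resourceTimingEntries; }

    private:
        Vector<ResourceTimingEntry> m_resourceTimingEntries;
    };

    // window.performance.getEntriesByType("resource") would then read the
    // buffer off document->loader() and convert it to PerformanceResourceTiming
    // objects, so timing data never has to flow upward from the loader layer.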