Bug 101300 - Store the initiating CachedResourceRequest
Summary: Store the initiating CachedResourceRequest
Status: RESOLVED WONTFIX
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs
Version: 528+ (Nightly build)
Hardware: Unspecified   OS: Unspecified
Importance: P2 Normal
Assignee: James Simonsen
URL:
Keywords:
Depends on: 99736
Blocks: 84883
Reported: 2012-11-05 22:00 PST by James Simonsen
Modified: 2012-12-03 14:23 PST
CC List: 14 users

See Also:


Attachments
Patch (57.41 KB, patch)
2012-11-05 22:20 PST, James Simonsen
no flags
Patch (53.90 KB, patch)
2012-11-06 21:19 PST, James Simonsen
abarth: review-
buildbot: commit-queue-

Description James Simonsen 2012-11-05 22:00:20 PST
Store the initiating CachedResourceRequest
Comment 1 James Simonsen 2012-11-05 22:20:29 PST
Created attachment 172487 [details]
Patch
Comment 2 James Simonsen 2012-11-05 22:24:00 PST
One open question: How long should this live? I really only need it until the load completes. Is it okay to clear it out then? Otherwise, I feel we risk hanging on to things way too long.
Comment 3 Adam Barth 2012-11-05 23:15:15 PST
Comment on attachment 172487 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=172487&action=review

> Source/WebCore/loader/SubresourceLoader.cpp:206
> -        // Since a subresource loader does not load multipart sections progressively, data was delivered to the loader all at once.        
> -        // After the first multipart section is complete, signal to delegates that this load is "finished" 
> +        // Since a subresource loader does not load multipart sections progressively, data was delivered to the loader all at once.
> +        // After the first multipart section is complete, signal to delegates that this load is "finished"

The only changes to this file appear to be whitespace changes.

> Source/WebCore/loader/cache/CachedResource.cpp:360
> -    
> +

Can you post a version of this patch that doesn't contain all these whitespace changes?  They're really distracting.

> Source/WebCore/loader/cache/CachedResource.h:235
> -    
> +

More noise.  :(

> Source/WebCore/loader/cache/CachedResource.h:365
> +    OwnPtr<CachedResourceRequest> m_initialRequest;

How can the CachedResource hold the request?  I thought the whole point of this approach was that many CachedResourceRequests can lead to a single CachedResource, so we decided to store the initiator state on the request instead of the resource.

> Source/WebCore/loader/cache/CachedResourceRequest.h:74
> +    RefPtr<Element> m_initiatorElement;
> +    RefPtr<Document> m_initiatorDocument;

This seems very likely to create a reference cycle.  Certainly, given that this patch has CachedResource holding the CachedResourceRequest, the MemoryCache (with its static lifetime) is transitively holding the entire document.....  I don't understand how we're managing the lifetimes correctly in this patch.  Perhaps it's worth explaining the design in the ChangeLog?
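
To make the concern concrete, here is a minimal sketch of the retain chain being described, using only the members named in the quoted patch; everything else is simplified and illustrative, not actual WebKit code:

    // Sketch only: the members come from the quoted patch hunks; the
    // surrounding classes are simplified and not real WebKit code.
    class CachedResourceRequest {
        RefPtr<Element> m_initiatorElement;    // strongly retains the element
        RefPtr<Document> m_initiatorDocument;  // strongly retains the document
    };

    class CachedResource {
        OwnPtr<CachedResourceRequest> m_initialRequest;  // resource owns the request
    };

    // MemoryCache has static lifetime and keeps CachedResources alive, so the
    // chain MemoryCache -> CachedResource -> CachedResourceRequest -> Document
    // keeps the whole document alive for as long as the resource stays cached.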
Comment 4 Adam Barth 2012-11-05 23:19:11 PST
> One open question: How long should this live? I really only need it until the load completes. Is it okay to clear it out then? Otherwise, I feel we risk hanging on to things way too long.

That is an important question.  It's not clear to me what the right answer is.
Comment 5 jochen 2012-11-06 02:00:12 PST
(In reply to comment #4)
> > One open question: How long should this live? I really only need it until the load completes. Is it okay to clear it out then? Otherwise, I feel we risk hanging on to things way too long.
> 
> That is an important question.  It's not clear to me what the right answer is.

For the purpose of notifying the embedder, we don't need to store this at all: I think it's enough to have this information available at willSendRequest() time. If the embedder wants to store some information, e.g. a pointer to the Element, the embedder can do so itself.
Comment 6 Adam Barth 2012-11-06 14:50:27 PST
It doesn't really matter if it's the embedder or WebKit that retains the element.  We still need to understand what the lifetimes are.
Comment 7 James Simonsen 2012-11-06 21:19:32 PST
Created attachment 172712 [details]
Patch
Comment 8 James Simonsen 2012-11-06 21:25:26 PST
(In reply to comment #3)
> (From update of attachment 172487 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=172487&action=review
> 
> > Source/WebCore/loader/cache/CachedResource.h:365
> > +    OwnPtr<CachedResourceRequest> m_initialRequest;
> 
> How can the CachedResource hold the request?  I thought the whole point of this approach was that many CachedResourceRequests can lead to a single CachedResource, so we decided to store the initiator state on the request instead of the resource.

For Resource Timing, we still need to hang on to the initiator throughout the duration of the load. So the first CachedResourceRequest needs to live until the load completes so that we can report to Resource Timing that it finished.

I've changed this so that the SubresourceLoader hangs on to the CachedResourceRequest instead. I hope that makes both of these points clearer.

> 
> > Source/WebCore/loader/cache/CachedResourceRequest.h:74
> > +    RefPtr<Element> m_initiatorElement;
> > +    RefPtr<Document> m_initiatorDocument;
> 
> This seems very likely to create a reference cycle.  Certainly, given that this patch has CachedResource holding the CachedResourceRequest, the MemoryCache (with its static lifetime) is transitively holding the entire document.....  I don't understand how we're managing the lifetimes correctly in this patch.  Perhaps it's worth explaining the design in the ChangeLog?

See above. Now it's owned by the loader, which is only alive throughout the load. Does that help clarify it?
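
For reference, the revised ownership in the second patch is roughly the following; only the class names mentioned above are real, the member name is illustrative:

    // Sketch of the ownership described above (member name illustrative).
    // The SubresourceLoader lives only for the duration of the load, so the
    // initiator references die with it instead of with the cached resource.
    class SubresourceLoader {
        OwnPtr<CachedResourceRequest> m_initiatingRequest;
    };
    // When the load finishes or is cancelled, the loader is destroyed and the
    // RefPtr<Element>/RefPtr<Document> inside the request are released with it,
    // so nothing with static lifetime (e.g. MemoryCache) retains the document.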
Comment 9 Build Bot 2012-11-06 21:40:42 PST
Comment on attachment 172712 [details]
Patch

Attachment 172712 [details] did not pass mac-ews (mac):
Output: http://queues.webkit.org/results/14747644
Comment 10 Adam Barth 2012-11-07 10:03:14 PST
I don't think we should push the CachedResourceRequest further into the loader pipeline.  I'm sorry if I wasn't clear earlier.  The problem is that there can be many CachedResourceRequests that result in a single CachedResource and a single ResourceRequest.

The idea behind storing the initiator on the CachedResourceRequest is to let the requester hold on to the initiator by remembering its own CachedResourceRequest.  Pushing CachedResourceRequest below CachedResourceLoader doesn't make sense given the relationship between these objects.
Comment 11 James Simonsen 2012-11-07 11:21:49 PST
(In reply to comment #10)
> The idea behind storing the initiator on the CachedResourceRequest is to let the requester hold on to the initiator by remembering its own CachedResourceRequest.  Pushing CachedResourceRequest below CachedResourceLoader doesn't make sense given the relationship between these objects.

The initiator data must be kept separate from the requester, because they have different lifetimes. For Resource Timing, the initiator is the first object to request a resource, no matter what happens to that object. If that object is destroyed before it finishes loading the resource, but another request finishes loading the resource, then *the first (destroyed) object is recorded as the initiator.*

Also, for conceptual purposes, an "initiator" is a "requester." Specifically, it's the requester that triggered a network load. For a cached resource, that's the first one. For an uncacheable resource, it's each individual request.
Comment 12 Adam Barth 2012-11-07 13:20:35 PST
> For Resource Timing, the initiator is the first object to request a resource, no matter what happens to that object.

That seems like a bad design.  That means we need to keep the first object alive for an arbitrary amount of time just in case some other object ends up loading this resource in the future.
Comment 13 James Simonsen 2012-11-07 14:16:16 PST
(In reply to comment #12)
> > For Resource Timing, the initiator is the first object to request a resource, no matter what happens to that object.
> 
> That seems like a bad design.  That means we need to keep the first object alive for an arbitrary amount of time just in case some other object ends up loading this resource in the future.

Perhaps I worded it poorly. The example I gave is correct though.

I'll try again:

The initiator is bound to the resource load, not the resource itself. If the load is stopped for any reason, we can forget about the initiator.

If the load transfers ownership while in progress, we must remember the initiator until the load completes. This is the example I gave earlier, about requester 1 being destroyed, but requester 2 still waiting. In that case, we must remember the initiator data from requester 1.

Does that help? If not, maybe read the processing model here:

http://w3c-test.org/webperf/specs/ResourceTiming/#processing-model
Comment 14 Adam Barth 2012-11-07 14:23:47 PST
"If the resource is not to be fetched from the networking layer, such as being fetched from an in-memory cache, abort the remaining steps."

It sounds like this algorithm was designed by an implementer with a memory cache that works differently from WebKit's MemoryCache.  In particular, this step will always cause us to abort the remaining steps because we fetch all these resources from an in-memory cache.

It doesn't make sense to me that two documents loading at the same time in different tabs will have their initiator information cross-talk through the MemoryCache if we happen to load them in the same process but will not if we load them in different processes.

If we decide to implement an out-of-process network stack akin to what WebKit2 is doing with a shared memory MemoryCache, it's not at all clear to me that we can implement this behavior in a sensible way.

In summary, this seems like a bad design.  The API we expose to the web should not depend on implementation details about how we decide to merge, fork, or prefetch resources.  By avoiding API commitments about how these processes work internally, we free ourselves to continue to innovate in this area.
Comment 15 James Simonsen 2012-11-07 14:41:54 PST
(In reply to comment #14)
> "If the resource is not to be fetched from the networking layer, such as being fetched from an in-memory cache, abort the remaining steps."
> 
> It sounds like this algorithm was designed by an implementer with a memory cache that works differently from WebKit's MemoryCache.  In particular, this step will always cause us to abort the remaining steps because we fetch all these resources from an in-memory cache.
> 
> It doesn't make sense to me that two documents loading at the same time in different tabs will have their initiator information cross-talk through the MemoryCache if we happen to load them in the same process but will not if we load them in different processes.
> 
> If we decide to implement an out-of-process network stack akin to what WebKit2 is doing with a shared memory MemoryCache, it's not at all clear to me that we can implement this behavior in a sensible way.
> 
> In summary, this seems like a bad design.  The API we expose to the web should not depend on implementation details about how we decide to merge, fork, or prefetch resources.  By avoiding API commitments about how these processes work internally, we free ourselves to continue to innovate in this area.

I don't think it's as bad as it sounds.

The intent is simple: any time the document does something that causes a resource to load from the network, record it.

Also note this at the top of processing model: "This section is non-normative."

Clearly the spec doesn't cover multiple tab situations. That seems like an oversight. The WG is meeting here tomorrow. I'll be happy to raise that issue in person.
Comment 16 Adam Barth 2012-11-07 14:48:31 PST
The processing model is non-normative?  That doesn't make much sense.  :(
Comment 17 Brady Eidson 2012-11-07 14:57:05 PST
(In reply to comment #15)
> (In reply to comment #14)
> Clearly the spec doesn't cover multiple tab situations. That seems like an oversight. The WG is meeting here tomorrow. I'll be happy to raise that issue in person.

Will there be IRC and/or a call-in?  Any room for observers?
Comment 18 James Simonsen 2012-11-07 14:58:46 PST
(In reply to comment #16)
> The processing model is non-normative?  That doesn't make much sense.  :(

Tony and I are on the WG. We can push for changes if you have requests.

We felt the spec serves what developers want (effectively the Inspector's network timeline, available remotely). We just need to figure out how to work it into WebKit.

In the meantime, what do we need to do to move forward? Marja is still waiting on this too.
Comment 19 James Simonsen 2012-11-07 15:00:41 PST
(In reply to comment #17)
> (In reply to comment #15)
> > (In reply to comment #14)
> > Clearly the spec doesn't cover multiple tab situations. That seems like an oversight. The WG is meeting here tomorrow. I'll be happy to raise that issue in person.
> 
> Will there be IRC and/or a call-in?  Any room for observers?

This is a workshop rather than a normal meeting, so I'm not sure. The details are here:

http://www.w3.org/2012/11/performance-workshop/

Our weekly conference calls are Wednesdays at 10am PST. See public-web-perf@w3.org for details.
Comment 20 Adam Barth 2012-11-07 15:08:43 PST
> In the meantime, what do we need to do to move forward? Marja is still waiting on this too.

Does Marja's use case involve storing the CachedResourceRequest in the CachedResource (or lower down in the loader)?  If not, it might be easier to work on her use case first.

In general, though, what we need to move forward is a design that doesn't require picking a "most special" requester among all the requesters of a resource.
Comment 21 Brady Eidson 2012-11-07 15:12:24 PST
I've been pretty quiet so far, but want to go on the record as echoing Adam's objections pretty much point for point.
Comment 22 James Simonsen 2012-11-07 15:41:38 PST
(In reply to comment #20)
> > In the meantime, what do we need to do to move forward? Marja is still waiting on this too.
> 
> Does Marja's use case involve storing the CachedResourceRequest in the CachedResource (or lower down in the loader)?  If not, it might be easier to work on her use case first.

From Jochen's reply, it sounds like they need it available whenever willSendRequest() is called. I think that implies that it sticks around for a bit.

Maybe they can chime in on exactly where they would like the information available (resource or loader).

> 
> In general, though, what we need to move forward is a design that doesn't require picking a "most special" requester among all the requesters of a resource.

Since the processing model is non-normative (even I hadn't noticed that before), I'd be fine with coming up with our own definition. Say, first active requester when the load completes. I'll then take up the issue with the WG.
Comment 23 Marja Hölttä 2012-11-08 00:49:07 PST
For our use case, we want to track events like "image element requested to load this image", not the actual load.

If two image elements request the same image, we want to get both events "img1  requested the image" and "img2 requested the image", and we don't care how these requests are served (e.g., the first one will cause a load from the network and the second one will be served from the cache).

So when the CachedResourceLoader gets the request to load the image, we'd notify the embedder about it, and that's it. The embedder can then store a reference to the Element if it wants, but our code doesn't need to store any references. Everything that happens after that point (actually loading the resource etc.) would be unaffected by our changes.
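
In other words, the hook being asked for is a fire-and-forget notification at request time, along these lines (a hypothetical sketch; none of these names are existing WebKit APIs):

    // Hypothetical sketch of the notification Marja describes; the function
    // and accessor names are illustrative, not existing WebKit APIs.
    static void notifyEmbedderOfResourceRequest(const CachedResourceRequest& request)
    {
        // Called synchronously when CachedResourceLoader receives the request,
        // before deciding whether it is served from the cache or the network.
        // The embedder may take its own reference to the Element; WebCore
        // stores nothing, and the rest of the load path is unchanged.
        embedderClient()->willRequestResource(request.initiatorElement(), request.url());
    }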
Comment 24 James Simonsen 2012-11-13 17:43:29 PST
This patch is probably dead now, but I want to follow up on the Resource Timing part...

First, the processing model *is* normative. The diagram is not. The editors will clarify that.

Second, everyone agreed the caching scenarios are not adequately defined. It's not just us. We didn't reach a conclusion during the F2F, so I started a thread here:

http://lists.w3.org/Archives/Public/public-web-perf/2012Nov/thread.html

Finally, here's another stab at moving forward (assuming the main problem is hanging on to Documents and Elements too long)...

Have a ResourceTimingNotifier singleton.

Whenever a CachedResourceRequest is sent, record the initiator's name, Document, and a pointer to its ResourceLoader (solely to identify it) in the singleton.

When a ResourceLoader finishes, notify the ResourceTimingNotifier. The ResourceTimingNotifier will populate the appropriate Documents' ResourceTiming buffers with the finished information.

If a Document is deleted, it will remove itself from the ResourceTimingNotifier.

So, the ResourceTimingNotifier only stores raw pointers, which means no cycles, and we just need to update it at the correct times. If the raw pointers are unappealing, we can assign unique IDs.
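
To make the lifetime argument concrete, such a singleton might look roughly like this (all names and signatures are hypothetical):

    // Hypothetical sketch of the ResourceTimingNotifier idea above. Only raw
    // pointers are stored, used purely as keys, so nothing here keeps a
    // Document or Element alive. All names and signatures are illustrative.
    class ResourceTimingNotifier {
    public:
        static ResourceTimingNotifier& shared();

        // Called whenever a CachedResourceRequest is sent.
        void didSendRequest(ResourceLoader* loader, Document* document, const String& initiatorName)
        {
            m_pendingLoads.set(loader, PendingLoad(document, initiatorName));
        }

        // Called when a ResourceLoader finishes; moves the entry into the
        // document's Resource Timing buffer if that document is still registered.
        void didFinishLoad(ResourceLoader*, const ResourceLoadTiming&);

        // Called from Document teardown so stale pointers are never used.
        void documentDestroyed(Document*);

    private:
        struct PendingLoad {
            PendingLoad(Document* document = 0, const String& name = String())
                : document(document), initiatorName(name) { }
            Document* document;   // identification only
            String initiatorName;
        };
        HashMap<ResourceLoader*, PendingLoad> m_pendingLoads;
    };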

Is this better? Or worse?
Comment 25 Adam Barth 2012-11-13 22:54:20 PST
Why can't we just use the normal client callbacks to do this work?  The ResourceTimingNotifier static just seems like a back door from ResourceLoader to Document, bypassing the layering in the loader.

I don't fully understand Brady's plan for moving the loader out of process in WebKit2, so I can't comment on any implications that has for the design of this feature.
Comment 26 James Simonsen 2012-11-14 17:11:49 PST
(In reply to comment #25)
> Why can't we just use the normal client callbacks to do this work?  The ResourceTimingNotifier static just seems like a back door from ResourceLoader to Document, bypassing the layering in the loader.

That seems like a reasonable start. There are a few details to work out:

How does Resource Timing register for those callbacks? There's one Resource Timing buffer per DOMWindow. Really it should only get callbacks for requests in its document.

Perhaps DocumentLoader could be that central registration point? It already keeps track of all the SubresourceLoaders. It could also track the ThreadableLoaders and (child) FrameLoaders.

Otherwise, maybe some sort of singleton where documents can register additional clients to append to all loads?

Then, we want Resource Timing to register as a client for FrameLoader, ThreadableLoader, and SubresourceLoader loads. I think the first two will need to be modified to accept multiple clients. That should be easy. Then we just need to inject the additional clients any time a load happens.

Here's another alternative building off of the DocumentLoader idea and addressing the layering issue...

Implement Resource Timing in DocumentLoader. DocumentLoader tracks the loads and populates the Resource Timing buffer. Then just have the Resource Timing API pull data from DocumentLoader as needed.
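
For comparison, the DocumentLoader variant could look roughly like this (a hypothetical sketch; ResourceTimingEntry and the callback are illustrative, not existing WebKit interfaces):

    // Hypothetical sketch of the DocumentLoader alternative described above.
    // The DocumentLoader records entries for the loads it already tracks, and
    // the Resource Timing API pulls from it; all names are illustrative.
    class DocumentLoader {
    public:
        void subresourceLoaderFinishedLoading(ResourceLoader* loader, const ResourceLoadTiming& timing)
        {
            // Per-document buffer; no static or cross-document state involved.
            m_resourceTimingEntries.append(ResourceTimingEntry(loader->url(), timing));
        }

        const Vector<ResourceTimingEntry>& resourceTimingEntries() const { return m_resourceTimingEntries; }

    private:
        Vector<ResourceTimingEntry> m_resourceTimingEntries;
    };

    // window.performance.getEntriesByType("resource") would then read the
    // buffer off document->loader() and convert it to PerformanceResourceTiming
    // objects, so timing data never has to flow upward from the loader layer.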