Bug 109952 - Merge MainResourceLoader's didFinishLoading and dataReceived into DocumentLoader
Summary: Merge MainResourceLoader's didFinishLoading and dataReceived into DocumentLoader
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Nate Chapin
URL:
Keywords:
Depends on:
Blocks: 104969
  Show dependency treegraph
 
Reported: 2013-02-15 10:07 PST by Nate Chapin
Modified: 2013-03-13 12:13 PDT (History)
5 users (show)

See Also:


Attachments
patch (14.07 KB, patch)
2013-02-15 10:14 PST, Nate Chapin
no flags Details | Formatted Diff | Diff
address comments (15.47 KB, patch)
2013-03-11 15:39 PDT, Nate Chapin
no flags Details | Formatted Diff | Diff
Patch for landing (15.90 KB, patch)
2013-03-13 11:42 PDT, Nate Chapin
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Nate Chapin 2013-02-15 10:07:15 PST
Also, since MainResourceLoader's content filtering is primarily implemented in these 2 functions, move the remaining references to ContentFilter as well.
Comment 1 Nate Chapin 2013-02-15 10:14:20 PST
Created attachment 188597 [details]
patch
Comment 2 Alexey Proskuryakov 2013-02-15 14:14:38 PST
I'm not sure if I understand the strategy. DocumentLoader is responsible for main resource and subresources. Why is it desirable to fold code for main resource loading into it?
Comment 3 Nate Chapin 2013-02-15 14:37:29 PST
(In reply to comment #2)
> I'm not sure if I understand the strategy. DocumentLoader is responsible for main resource and subresources. Why is it desirable to fold code for main resource loading into it?

The primary rationale is that the boundary between MainResourceLoader and DocumentLoader is very poorly defined, especially now that MainResourceLoader does not subclass ResourceLoader, and since DocumentLoader already does quite a bit of the work related to loading the main resource. It seems pretty clear (to me anyway) that we should either merge them or create a clearer sense of each class's responsibilities. I couldn't come up with clear responsibilities, so I figured merging made more sense.

Antti initially suggested the merger to me a couple months ago in #webkit. Unfortunately I appear to have lost the logs of that conversation, so I can't remember the full details of his thinking.
Comment 4 Darin Adler 2013-02-15 17:08:29 PST
Comment on attachment 188597 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=188597&action=review

Seems like it’s a good direction for refactoring, but I am not confident this does not change behavior, since it changes the order of operations and moves some calls that change internal state across frame loader calls that make their way all the way out to clients, at least on Mac with WebKit1.

I am torn because there is no obvious review- here, but there are a lot of questions. I am not going to say review+ or review-, but I wish I could do one or the other.

> Source/WebCore/loader/DocumentLoader.h:58
> +#if USE(CONTENT_FILTERING)
> +    class ContentFilter;
> +#endif

I think it’s overkill to put #if around forward declarations. Lets just include them unconditionally.

If you do want to put an #if around this, then please put it in a separate paragraph as it was in MainResourceLoader.h. We don’t mix #if in with sorted lists like this.

> Source/WebCore/loader/DocumentLoader.h:200
> +        void cancelMainResourceLoad(const ResourceError& = ResourceError());

Is this really clearer than creating an empty error at the call site? I worry about default arguments like this one.

> Source/WebCore/loader/MainResourceLoader.cpp:-518
> -    // The additional processing can do anything including possibly removing the last
> -    // reference to this object; one example of this is 3266216.
> -    RefPtr<MainResourceLoader> protect(this);

Why is this no longer needed? I understand that this function itself no longer accesses MainResourceLoader, but removing this can change the time of destruction of various objects. How did you prove it was OK?

> Source/WebCore/loader/DocumentLoader.cpp:294
> +    RefPtr<DocumentLoader> protect(this);

Seems OK to add this, but how did you know it was needed?

> Source/WebCore/loader/DocumentLoader.cpp:328
> +    m_applicationCacheHost->finishedLoadingMainResource();

This is now called before removing the item from the memory cache at the call site. Is it OK to change that?

> Source/WebCore/loader/DocumentLoader.cpp:443
> +#if USE(CFNETWORK) || PLATFORM(MAC)
> +    // Workaround for <rdar://problem/6060782>
> +    if (m_response.isNull())
> +        setResponse(ResourceResponse(KURL(), "text/html", 0, String(), String()));
> +#endif

This now happens after dispatchDidReceiveData, so it seems the new timing will be observable by clients. Why is that OK?

> Source/WebCore/loader/DocumentLoader.cpp:448
> +    ASSERT(!m_frame->page()->defersLoading());

What guarantees m_frame is non-zero?

What guarantees m_frame->page() is non-zero?

The old code didn’t rely on either of these things.

> Source/WebCore/loader/DocumentLoader.cpp:467
> +#if USE(CONTENT_FILTERING)
> +    bool loadWasBlockedBeforeFinishing = false;
> +    if (m_contentFilter && m_contentFilter->needsMoreData()) {
> +        m_contentFilter->addData(data, length);
> +
> +        if (m_contentFilter->needsMoreData()) {
> +            // Since the filter still needs more data to make a decision,
> +            // transition back to the committed state so that we don't partially
> +            // load content that might later be blocked.
> +            commitLoad(0, 0);
> +            return;
> +        }
> +
> +        data = m_contentFilter->getReplacementData(length);
> +        loadWasBlockedBeforeFinishing = m_contentFilter->didBlockData();
> +    }
> +#endif

This now happens after dispatchDidReceiveData, so it seems the new timing will be observable by clients. Why is that OK?

> Source/WebCore/loader/DocumentLoader.cpp:478
> +#if USE(CONTENT_FILTERING)
> +    if (loadWasBlockedBeforeFinishing)
> +        cancelMainResourceLoad();
> +#endif

What guarantees it’s OK to do this after calling commitLoad? Can’t the document loader be deleted as a side effect of doing that?

> Source/WebCore/loader/DocumentLoader.cpp:960
> -    finishedLoading();
> +    finishedLoading(0);

The addition of this mysterious 0 is not good. It seems that 0 is a magic value that says “please compute the load finish time”. Concept is not new to this patch, but this call site is new, and this seems unpleasantly mysterious.

Also strange to use 0 here and 0.0 elsewhere in the same source file for the same purpose.
Comment 5 Nate Chapin 2013-02-19 14:16:08 PST
I tried to do a brain dump on all of the concerns, I hope I was coherent :)

Will work on a new version based on this feedback.

(In reply to comment #4)
> (From update of attachment 188597 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=188597&action=review
> 
> Seems like it’s a good direction for refactoring, but I am not confident this does not change behavior, since it changes the order of operations and moves some calls that change internal state across frame loader calls that make their way all the way out to clients, at least on Mac with WebKit1.
> 
> I am torn because there is no obvious review- here, but there are a lot of questions. I am not going to say review+ or review-, but I wish I could do one or the other.
> 
> > Source/WebCore/loader/DocumentLoader.h:58
> > +#if USE(CONTENT_FILTERING)
> > +    class ContentFilter;
> > +#endif
> 
> I think it’s overkill to put #if around forward declarations. Lets just include them unconditionally.
> 
> If you do want to put an #if around this, then please put it in a separate paragraph as it was in MainResourceLoader.h. We don’t mix #if in with sorted lists like this.

I don't feel strongly, I had just copied the old formatting. Will fix.

> 
> > Source/WebCore/loader/DocumentLoader.h:200
> > +        void cancelMainResourceLoad(const ResourceError& = ResourceError());
> 
> Is this really clearer than creating an empty error at the call site? I worry about default arguments like this one.

Would it be better to have two versions of the function, one with 1 argument, one having 0 arguments and calling the 1-argument with a default value? That was how it was done in MainResourceLoader::cancel.

> 
> > Source/WebCore/loader/MainResourceLoader.cpp:-518
> > -    // The additional processing can do anything including possibly removing the last
> > -    // reference to this object; one example of this is 3266216.
> > -    RefPtr<MainResourceLoader> protect(this);
> 
> Why is this no longer needed? I understand that this function itself no longer accesses MainResourceLoader, but removing this can change the time of destruction of various objects. How did you prove it was OK?

The only RefPtr member of MainResourceLoader is a RefPtr<DocumentLoader>. The layout tests pass without any RefPtr here, but it's possible there's a case that would require one. If I were to add a RefPtr here, I would say protect(this) should be sufficient.

The only other member of MainResourceLoader that could theoretically cause a problem is m_resource, but given that m_resource is private to MainResourceLoader and is protected heavily in SubresourceLoader, it shouldn't get killed if it's going to still be needed.


> 
> > Source/WebCore/loader/DocumentLoader.cpp:294
> > +    RefPtr<DocumentLoader> protect(this);
> 
> Seems OK to add this, but how did you know it was needed?

This was moved from MainResourceLoader::didFinishLoading(). IIRC, I tried without it and there was a use-after-free, but I'll double-check.


> 
> > Source/WebCore/loader/DocumentLoader.cpp:328
> > +    m_applicationCacheHost->finishedLoadingMainResource();
> 
> This is now called before removing the item from the memory cache at the call site. Is it OK to change that?

I think so, but I'm not absolutely positive. The only problem would be if a main resource request for the current url was triggered from within finishedLoadingMainResource(). That is clearly impossible for chromium's ApplicationCacheHost implementation. I can't find a case where that would be possible in the WebCore/loader/appcache/ApplicationCacheHost.cpp implementation, but I might have missed something.

I will see about exposing the information required to move the memory cache removal into DocumentLoader to maintain the ordering.

> 
> > Source/WebCore/loader/DocumentLoader.cpp:443
> > +#if USE(CFNETWORK) || PLATFORM(MAC)
> > +    // Workaround for <rdar://problem/6060782>
> > +    if (m_response.isNull())
> > +        setResponse(ResourceResponse(KURL(), "text/html", 0, String(), String()));
> > +#endif
> 
> This now happens after dispatchDidReceiveData, so it seems the new timing will be observable by clients. Why is that OK?

I believe these cases are mutually excluseive. The dispatchDidReceiveData() call in MainResourceLoader only happens for SubstituteData loads and MemoryCache hits. The memory cache won't hit if its m_response is null and it will always set a response for a cache hit (see CachedResource::addClientToSet and CachedRawResource::didAddClient). SubstituteData loads also will set a response before sending data (see MainResourceLoader::handleSubstituteDataLoadNow).

> 
> > Source/WebCore/loader/DocumentLoader.cpp:448
> > +    ASSERT(!m_frame->page()->defersLoading());
> 
> What guarantees m_frame is non-zero?
> 
> What guarantees m_frame->page() is non-zero?
> 
> The old code didn’t rely on either of these things.

My thought had been that neither of these callbacks should be possible in those cases. If DocumentLoader has been detached from its Frame or the Frame from its Page, that should trigger cancellation of all resource loads, which should preclude callbacks for receiving data or finishing loading. I couldn't find any case where ResourceLoader::defersLoading() and Page::defersLoading() were able to be different, but it's possible I missed something.

> 
> > Source/WebCore/loader/DocumentLoader.cpp:467
> > +#if USE(CONTENT_FILTERING)
> > +    bool loadWasBlockedBeforeFinishing = false;
> > +    if (m_contentFilter && m_contentFilter->needsMoreData()) {
> > +        m_contentFilter->addData(data, length);
> > +
> > +        if (m_contentFilter->needsMoreData()) {
> > +            // Since the filter still needs more data to make a decision,
> > +            // transition back to the committed state so that we don't partially
> > +            // load content that might later be blocked.
> > +            commitLoad(0, 0);
> > +            return;
> > +        }
> > +
> > +        data = m_contentFilter->getReplacementData(length);
> > +        loadWasBlockedBeforeFinishing = m_contentFilter->didBlockData();
> > +    }
> > +#endif
> 
> This now happens after dispatchDidReceiveData, so it seems the new timing will be observable by clients. Why is that OK?

I think the same logic for the previous dispatchDidReceiveData reordering concern still applies.  Perhaps I should expose enough information to do the call to dispatchDidReceiveData in its current order just to remove that concern, though.

> 
> > Source/WebCore/loader/DocumentLoader.cpp:478
> > +#if USE(CONTENT_FILTERING)
> > +    if (loadWasBlockedBeforeFinishing)
> > +        cancelMainResourceLoad();
> > +#endif
> 
> What guarantees it’s OK to do this after calling commitLoad? Can’t the document loader be deleted as a side effect of doing that?

This ordering is already the case (though it's hidden by the MainResourceLoader/DocumentLoader boundary). I don't know enough about ContentFilter to know why this ordering was safe.

> 
> > Source/WebCore/loader/DocumentLoader.cpp:960
> > -    finishedLoading();
> > +    finishedLoading(0);
> 
> The addition of this mysterious 0 is not good. It seems that 0 is a magic value that says “please compute the load finish time”. Concept is not new to this patch, but this call site is new, and this seems unpleasantly mysterious.
> 
> Also strange to use 0 here and 0.0 elsewhere in the same source file for the same purpose.

0 vs 0.0 : totally right
0 vs something that isn't a magic number: Perhaps I should switch this to currentTime() and see what breaks?
Comment 6 Antti Koivisto 2013-02-19 15:02:46 PST
(In reply to comment #3)
> Antti initially suggested the merger to me a couple months ago in #webkit. Unfortunately I appear to have lost the logs of that conversation, so I can't remember the full details of his thinking.

Right, that was the reasoning. There is no point of them existing as separate entities anymore. It just creates complexity and confusion.
Comment 7 Antti Koivisto 2013-02-19 15:42:12 PST
Comment on attachment 188597 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=188597&action=review

Looks good to me.

>>> Source/WebCore/loader/DocumentLoader.h:200
>>> +        void cancelMainResourceLoad(const ResourceError& = ResourceError());
>> 
>> Is this really clearer than creating an empty error at the call site? I worry about default arguments like this one.
> 
> Would it be better to have two versions of the function, one with 1 argument, one having 0 arguments and calling the 1-argument with a default value? That was how it was done in MainResourceLoader::cancel.

Providing the empty error value explicitly is probably best. It can have informative name.

>>> Source/WebCore/loader/DocumentLoader.cpp:960
>>>      setResponse(ResourceResponse(m_request.url(), mimeType, 0, String(), String()));
>>> -    finishedLoading();
>>> +    finishedLoading(0);
>> 
>> The addition of this mysterious 0 is not good. It seems that 0 is a magic value that says “please compute the load finish time”. Concept is not new to this patch, but this call site is new, and this seems unpleasantly mysterious.
>> 
>> Also strange to use 0 here and 0.0 elsewhere in the same source file for the same purpose.
> 
> 0 vs 0.0 : totally right
> 0 vs something that isn't a magic number: Perhaps I should switch this to currentTime() and see what breaks?

monotonicallyIncreasingTime() seems like the right thing here.
Comment 8 Nate Chapin 2013-03-11 15:39:50 PDT
Created attachment 192582 [details]
address comments
Comment 9 Antti Koivisto 2013-03-13 11:11:31 PDT
Comment on attachment 192582 [details]
address comments

View in context: https://bugs.webkit.org/attachment.cgi?id=192582&action=review

> Source/WebCore/loader/DocumentLoader.cpp:330
> +    timing()->setResponseEnd(finishTime ? finishTime : (m_timeOfLastDataReceived ? m_timeOfLastDataReceived : monotonicallyIncreasingTime()));

A variable would be nice.
Comment 10 Antti Koivisto 2013-03-13 11:11:49 PDT
r=me
Comment 11 Nate Chapin 2013-03-13 11:42:23 PDT
Created attachment 192954 [details]
Patch for landing
Comment 12 WebKit Review Bot 2013-03-13 12:13:48 PDT
Comment on attachment 192954 [details]
Patch for landing

Clearing flags on attachment: 192954

Committed r145734: <http://trac.webkit.org/changeset/145734>
Comment 13 WebKit Review Bot 2013-03-13 12:13:51 PDT
All reviewed patches have been landed.  Closing bug.