Bug 30457 - Allow image requests started from unload handlers to outlive the page
Summary: Allow image requests started from unload handlers to outlive the page
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC All
: P2 Normal
Assignee: Nate Chapin
URL:
Keywords:
Depends on: 32529
Blocks:
  Show dependency treegraph
 
Reported: 2009-10-16 14:07 PDT by John Abd-El-Malek
Modified: 2010-08-24 11:11 PDT (History)
12 users (show)

See Also:


Attachments
patch (33.43 KB, patch)
2009-12-09 14:29 PST, Nate Chapin
darin: review-
Details | Formatted Diff | Diff
Attempt #2 (27.67 KB, patch)
2009-12-16 11:04 PST, Nate Chapin
no flags Details | Formatted Diff | Diff
Attempt #3 (23.54 KB, patch)
2009-12-18 13:16 PST, Nate Chapin
darin: review-
Details | Formatted Diff | Diff
Attempt #4 (23.69 KB, patch)
2009-12-18 16:32 PST, Nate Chapin
no flags Details | Formatted Diff | Diff
Patch without the crash (27.43 KB, patch)
2010-01-25 09:25 PST, Nate Chapin
no flags Details | Formatted Diff | Diff
Attempt #6 (or thereabouts) (27.37 KB, patch)
2010-04-26 16:40 PDT, Nate Chapin
no flags Details | Formatted Diff | Diff
Attempt #7 (clean up weird layering) (31.82 KB, patch)
2010-05-03 15:29 PDT, Nate Chapin
no flags Details | Formatted Diff | Diff
Merge in recent changes to FrameLoader (27.09 KB, patch)
2010-05-17 11:38 PDT, Nate Chapin
no flags Details | Formatted Diff | Diff
+ more thorough ChangeLog (28.60 KB, patch)
2010-06-15 13:54 PDT, Nate Chapin
no flags Details | Formatted Diff | Diff
Rewrite at ResourceHandleClient level (19.38 KB, patch)
2010-07-22 12:59 PDT, Nate Chapin
no flags Details | Formatted Diff | Diff
ResourceHandleClient-level solution + rewritten layout test (20.62 KB, patch)
2010-08-18 13:29 PDT, Nate Chapin
levin: review-
Details | Formatted Diff | Diff
Addressing levin's comments (21.20 KB, patch)
2010-08-23 13:26 PDT, Nate Chapin
levin: review+
levin: commit-queue-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description John Abd-El-Malek 2009-10-16 14:07:39 PDT
Creating a bug from the discussion on webkit-dev.  There's consensus for allowing images loaded in unload handlers to outlive the page, in order to maintain compatibility with IE.  This hack is used by a bunch of ad networks for tracking purposes.  Since the image doesn't currently outlive the page, they resort to simulating sleep() in JavaScript by busy looping.  By letting the image load outlive the page, the goal is that web developers won't resort to such hacks which hurt the user.
Comment 1 John Abd-El-Malek 2009-10-16 14:52:57 PDT
The webkit-dev thread: https://lists.webkit.org/pipermail/webkit-dev/2009-September/thread.html#9925
Comment 2 Nate Chapin 2009-12-04 11:08:37 PST
I've been working on this, I hope to have a patch ready for review by end of day.
Comment 3 Nate Chapin 2009-12-09 14:29:46 PST
Created attachment 44561 [details]
patch

Patch, with thanks to jam@chromium.org for starting it and giving me something to work with.
Comment 4 WebKit Review Bot 2009-12-09 14:31:58 PST
Attachment 44561 [details] did not pass style-queue:

Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1
WebCore/loader/loader.cpp:571:  Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side.  [whitespace/operators] [4]
Total errors found: 1
Comment 5 Nate Chapin 2009-12-09 14:41:39 PST
(In reply to comment #4)
> Attachment 44561 [details] did not pass style-queue:
> 
> Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1
> WebCore/loader/loader.cpp:571:  Boolean expressions that span multiple lines
> should have their operators on the left side of the line instead of the right
> side.  [whitespace/operators] [4]
> Total errors found: 1

Fixed, but I won't bother uploading a new patch for a single line style change.
Comment 6 Darin Adler 2009-12-11 10:13:55 PST
Comment on attachment 44561 [details]
patch

Thanks for taking this on. This is looking good!

I have one or two significant design comments and lots of minor naming and structure comments.

> +    , m_beforeUnloadEventBeingDispatched(false)

Best practice for boolean data and function members is to name them with an adjective phrase like this "frame loader <xxx>".

So following that, the name would be m_isDispatchingBeforeUnloadEvent.

> +    m_beforeUnloadEventBeingDispatched = true;
>      bool canContinue = shouldContinue && (!isLoadingMainFrame() || m_frame->shouldClose());
> +    m_beforeUnloadEventBeingDispatched = false;

Can this ever re-enter? Maybe we should assert !m_beforeUnloadEventBeingDispatched to catch it if it ever is in a debug build.

> +        void setOutlivePage(bool outlivePage) { m_outlivePage = outlivePage; }
> +        bool outlivePage() const { return m_outlivePage; }

Best practice for getter functions is to make sure their names are not something that sound like a verb phrase. So these should be:

    setShouldOutlivePage
    shouldOutlivePage
    m_shouldOutlivePage
    m_shouldSkipSecurityCheck

If the thing is named like a verb, then the getter function sounds like a function that will take action.

> +    , m_createdDuringUnload(isDispatchingUnloadEvent)

Should be m_wasCreatedDuringUnload and wasCreatedDuringUnload for reasons similar to those mentioned above.

> +    , m_frameForOutlivingRequests(0)

No need for this line of code. RefPtr starts out 0 without an explicit initializer.

> -    CachedImage(const String& url);
> +    CachedImage(const String& url, bool isDispatchingUnloadEvent);

I don't think this design is quite right. You might already have a CachedImage because some other page tried to load the same image. The CachedImage itself is shared among multiple requesters of the image. So it doesn't make logical sense to have that shared CachedImage object have a boolean that depends on the timing of the first requestImage call.

If we do need to keep the boolean, I think "wasFirstRequestedDuringUnload" is more precise than "createdDuringUnload" and draws attention to this curious rule in a way I think could be helpful in understanding behavior in the future.

I think the right way to do this is probably to look at the outstanding requests and answer a question about whether at least one or whether all are last-minute requests rather than storing a boolean in the CachedImage.

> -PassRefPtr<SubresourceLoader> SubresourceLoader::create(Frame* frame, SubresourceLoaderClient* client, const ResourceRequest& request, bool skipCanLoadCheck, bool sendResourceLoadCallbacks, bool shouldContentSniff)
> +PassRefPtr<SubresourceLoader> SubresourceLoader::create(Frame* frame, SubresourceLoaderClient* client, const ResourceRequest& request, bool skipSecurityCheck, bool outlivePage, bool sendResourceLoadCallbacks, bool shouldContentSniff)

Name should be "shouldOutlivePage".

> +    void incrementOutlivePageRequestCount();
> +    void decrementOutlivePageRequestCount();
> +    int outlivePageRequestCount() const { return m_outlivePageRequestCount; }
> +    Frame* frameForOutlivingRequests() { return m_frameForOutlivingRequests.get(); }

> +    int m_outlivePageRequestCount;
> +    RefPtr<Frame> m_frameForOutlivingRequests;

These names are awkward. We should try to do better. A good technique is to write out an English sentence explaining what a function does or data is, and then extract words from the sentence and use them in that order. You can't make the words "increment outlive page request count" into a sentence, because "outlive page request" doesn't work well. And I think we could find a name that doesn't require that wording.

Maybe we can come up with a term for these requests that is clearer than "outliving requests" and use that term in a lot more of this code. What's a term that in plain English would sound like this concept? Maybe something crazy like "last gasp requests"? That's not so good, but I think it's better than "outliving requests". And I think it's critical that we call them the same thing in the various functions. Here we call them "outliving requests" and "outlive page requests" side by side -- we should choose one term or the other.

I'm also concerned that the increment/decrement API here makes it easy to get off by one and leak. The code that manages the count needs to be extra clear.

> -        RefPtr<SubresourceLoader> loader = SubresourceLoader::create(docLoader->doc()->frame(),
> -            this, resourceRequest, request->shouldSkipCanLoadCheck(), request->sendResourceLoadCallbacks());
> +        bool outlivePage = request->outlivePage();
> +        Frame* frame = outlivePage ? docLoader->frameForOutlivingRequests() : docLoader->doc()->frame();
> +        RefPtr<SubresourceLoader> loader = SubresourceLoader::create(frame,
> +            this, resourceRequest, request->skipSecurityCheck(), outlivePage, request->sendResourceLoadCallbacks());

I think this would be slightly better if factored differently. The function in DocLoader would take a Request* argument and return the frame to use for the load. The policy that it's the "outliving requests" that use a different frame would be in DocLoader.

> +            if (outlivePage)
> +                docLoader->decrementOutlivePageRequestCount();

We have a minor design problem here. This Loader function is one of five different places that call the decrementOutlivePageRequestCount function. It's quite difficult to read the Loader code and spot whether there are any mismatches between increment and decrement. We should look to see if we can change the structure so that it's tighter and a little more clear that the increments and decrements will always be balanced.

Maybe the best way to do it would be to pass the Request* to increment/decrementRequestCount instead of having a separate increment/decrement for requests that will outlive the page.

> +            newImage = new CachedImage(sourceURI(attr), document->docLoader()->frame()->loader()->isDispatchingUnloadEvent());

Can any of the items in this expression be zero? For example, can the frame be zero?

> -        bool skipCanLoadCheck = false;
> -        loadRequest(request, skipCanLoadCheck);
> +        bool skipSecurityCheck = false;
> +        loadRequest(request, skipSecurityCheck);

Our best practice for this is now to use an enum rather than a boolean. This would avoid the curious local variable that is declared simply to give a new name to "false". Since you are changing all these call sites it might be nice to move from a boolean to an enum. In fact, the rename from skipCanLoadCheck to skipSecurityCheck really ought to happen in a separate patch, first. That would help this work get done more quickly, I think. Smaller patches are better.

> +    bool m_beforeUnloadEventBeingDispatched;
>      bool m_unloadEventBeingDispatched;

Your new boolean should be named m_isDipatchingBeforeUnloadEvent. And at some point the old boolean should be renamed m_isDispatchingUnloadEvent.

Given all the different code paths, I think we need more test cases. For example, I could probably remove some of the calls to decrementOutlivePageRequestCount without making the test fail. We should strive to have tests that would fail if we do the code wrong.

review- because I'd like to see the CachedImage boolean reconsidered and like to see quite a few more tests if possible

It would also be good to consider my other comments and suggestions, especially the one about doing the rename in a separate earlier patch, and the increment/decrement tightening and simplification.
Comment 7 Nate Chapin 2009-12-11 10:29:58 PST
(In reply to comment #6)
> (From update of attachment 44561 [details])
> Thanks for taking this on. This is looking good!
> 
> I have one or two significant design comments and lots of minor naming and
> structure comments.
> 
> > +    , m_beforeUnloadEventBeingDispatched(false)
> 
> Best practice for boolean data and function members is to name them with an
> adjective phrase like this "frame loader <xxx>".
> 
> So following that, the name would be m_isDispatchingBeforeUnloadEvent.
> 
> > +    m_beforeUnloadEventBeingDispatched = true;
> >      bool canContinue = shouldContinue && (!isLoadingMainFrame() || m_frame->shouldClose());
> > +    m_beforeUnloadEventBeingDispatched = false;
> 
> Can this ever re-enter? Maybe we should assert
> !m_beforeUnloadEventBeingDispatched to catch it if it ever is in a debug build.
> 
> > +        void setOutlivePage(bool outlivePage) { m_outlivePage = outlivePage; }
> > +        bool outlivePage() const { return m_outlivePage; }
> 
> Best practice for getter functions is to make sure their names are not
> something that sound like a verb phrase. So these should be:
> 
>     setShouldOutlivePage
>     shouldOutlivePage
>     m_shouldOutlivePage
>     m_shouldSkipSecurityCheck
> 
> If the thing is named like a verb, then the getter function sounds like a
> function that will take action.
> 
> > +    , m_createdDuringUnload(isDispatchingUnloadEvent)
> 
> Should be m_wasCreatedDuringUnload and wasCreatedDuringUnload for reasons
> similar to those mentioned above.
> 
> > +    , m_frameForOutlivingRequests(0)
> 
> No need for this line of code. RefPtr starts out 0 without an explicit
> initializer.
> 
> > -    CachedImage(const String& url);
> > +    CachedImage(const String& url, bool isDispatchingUnloadEvent);
> 
> I don't think this design is quite right. You might already have a CachedImage
> because some other page tried to load the same image. The CachedImage itself is
> shared among multiple requesters of the image. So it doesn't make logical sense
> to have that shared CachedImage object have a boolean that depends on the
> timing of the first requestImage call.
> 
> If we do need to keep the boolean, I think "wasFirstRequestedDuringUnload" is
> more precise than "createdDuringUnload" and draws attention to this curious
> rule in a way I think could be helpful in understanding behavior in the future.
> 
> I think the right way to do this is probably to look at the outstanding
> requests and answer a question about whether at least one or whether all are
> last-minute requests rather than storing a boolean in the CachedImage.
> 
> > -PassRefPtr<SubresourceLoader> SubresourceLoader::create(Frame* frame, SubresourceLoaderClient* client, const ResourceRequest& request, bool skipCanLoadCheck, bool sendResourceLoadCallbacks, bool shouldContentSniff)
> > +PassRefPtr<SubresourceLoader> SubresourceLoader::create(Frame* frame, SubresourceLoaderClient* client, const ResourceRequest& request, bool skipSecurityCheck, bool outlivePage, bool sendResourceLoadCallbacks, bool shouldContentSniff)
> 
> Name should be "shouldOutlivePage".
> 
> > +    void incrementOutlivePageRequestCount();
> > +    void decrementOutlivePageRequestCount();
> > +    int outlivePageRequestCount() const { return m_outlivePageRequestCount; }
> > +    Frame* frameForOutlivingRequests() { return m_frameForOutlivingRequests.get(); }
> 
> > +    int m_outlivePageRequestCount;
> > +    RefPtr<Frame> m_frameForOutlivingRequests;
> 
> These names are awkward. We should try to do better. A good technique is to
> write out an English sentence explaining what a function does or data is, and
> then extract words from the sentence and use them in that order. You can't make
> the words "increment outlive page request count" into a sentence, because
> "outlive page request" doesn't work well. And I think we could find a name that
> doesn't require that wording.
> 
> Maybe we can come up with a term for these requests that is clearer than
> "outliving requests" and use that term in a lot more of this code. What's a
> term that in plain English would sound like this concept? Maybe something crazy
> like "last gasp requests"? That's not so good, but I think it's better than
> "outliving requests". And I think it's critical that we call them the same
> thing in the various functions. Here we call them "outliving requests" and
> "outlive page requests" side by side -- we should choose one term or the other.
> 
> I'm also concerned that the increment/decrement API here makes it easy to get
> off by one and leak. The code that manages the count needs to be extra clear.
> 
> > -        RefPtr<SubresourceLoader> loader = SubresourceLoader::create(docLoader->doc()->frame(),
> > -            this, resourceRequest, request->shouldSkipCanLoadCheck(), request->sendResourceLoadCallbacks());
> > +        bool outlivePage = request->outlivePage();
> > +        Frame* frame = outlivePage ? docLoader->frameForOutlivingRequests() : docLoader->doc()->frame();
> > +        RefPtr<SubresourceLoader> loader = SubresourceLoader::create(frame,
> > +            this, resourceRequest, request->skipSecurityCheck(), outlivePage, request->sendResourceLoadCallbacks());
> 
> I think this would be slightly better if factored differently. The function in
> DocLoader would take a Request* argument and return the frame to use for the
> load. The policy that it's the "outliving requests" that use a different frame
> would be in DocLoader.
> 
> > +            if (outlivePage)
> > +                docLoader->decrementOutlivePageRequestCount();
> 
> We have a minor design problem here. This Loader function is one of five
> different places that call the decrementOutlivePageRequestCount function. It's
> quite difficult to read the Loader code and spot whether there are any
> mismatches between increment and decrement. We should look to see if we can
> change the structure so that it's tighter and a little more clear that the
> increments and decrements will always be balanced.
> 
> Maybe the best way to do it would be to pass the Request* to
> increment/decrementRequestCount instead of having a separate
> increment/decrement for requests that will outlive the page.
> 
> > +            newImage = new CachedImage(sourceURI(attr), document->docLoader()->frame()->loader()->isDispatchingUnloadEvent());
> 
> Can any of the items in this expression be zero? For example, can the frame be
> zero?
> 
> > -        bool skipCanLoadCheck = false;
> > -        loadRequest(request, skipCanLoadCheck);
> > +        bool skipSecurityCheck = false;
> > +        loadRequest(request, skipSecurityCheck);
> 
> Our best practice for this is now to use an enum rather than a boolean. This
> would avoid the curious local variable that is declared simply to give a new
> name to "false". Since you are changing all these call sites it might be nice
> to move from a boolean to an enum. In fact, the rename from skipCanLoadCheck to
> skipSecurityCheck really ought to happen in a separate patch, first. That would
> help this work get done more quickly, I think. Smaller patches are better.
> 
> > +    bool m_beforeUnloadEventBeingDispatched;
> >      bool m_unloadEventBeingDispatched;
> 
> Your new boolean should be named m_isDipatchingBeforeUnloadEvent. And at some
> point the old boolean should be renamed m_isDispatchingUnloadEvent.
> 
> Given all the different code paths, I think we need more test cases. For
> example, I could probably remove some of the calls to
> decrementOutlivePageRequestCount without making the test fail. We should strive
> to have tests that would fail if we do the code wrong.
> 
> review- because I'd like to see the CachedImage boolean reconsidered and like
> to see quite a few more tests if possible
> 
> It would also be good to consider my other comments and suggestions, especially
> the one about doing the rename in a separate earlier patch, and the
> increment/decrement tightening and simplification.

This all seems reasonable.

I had been wondering about the leak potential with the increment/decrement and considered something very similar to what you suggested, but I wasn't confident enough in my judgment to do it without outside confirmation :-)

I'll split the renaming out into a separate patch and get that on its way before I get another patch for this out again.

Thanks!
Comment 8 Darin Adler 2009-12-11 10:35:44 PST
(In reply to comment #6)
> I think the right way to do this is probably to look at the outstanding
> requests and answer a question about whether at least one or whether all are
> last-minute requests rather than storing a boolean in the CachedImage.

Since that might not be efficient in cases where there are a huge number of requests for the same image (say a spacer image with tons of separate <img> elements on a page), another possibility is for CachedImage to keep a count of the total number of requests and of requests that are the last-minute type. I can't remember enough about the design here to know exactly what to suggest. Obviously having another count that can get out of whack is not so great either.
Comment 9 Nate Chapin 2009-12-11 16:22:24 PST
> I'll split the renaming out into a separate patch and get that on its way
> before I get another patch for this out again.

Do you have a preference where I define the enum to take the place of skipSecurityCheck?  I'm not seeing an obvious place that won't require adding in new #includes in several places.
Comment 10 Darin Adler 2009-12-11 16:25:16 PST
> Do you have a preference where I define the enum to take the place of
> skipSecurityCheck?  I'm not seeing an obvious place that won't require adding
> in new #includes in several places.

If you can't find an existing header to put it in, I'd put it in one of the "types-only" headers, like FrameLoaderTypes.h.
Comment 11 Nate Chapin 2009-12-16 11:04:13 PST
Created attachment 44996 [details]
Attempt #2

I believe I either addressed or obsoleted all of your code concerns.

I renamed m_unloadEventBeingDispatched to m_isDispatchingUnloadEvent in this patch, but I can break that into a separate patch too if you like

Finally, I didn't add any new tests yet.  I wouldn't be surprised if they're still needed, but since this patch is much less spread out than the first attempt, I figured I'd ask a second opinion before actually going to the trouble of writing them.
Comment 12 WebKit Review Bot 2009-12-16 11:05:31 PST
Attachment 44996 [details] did not pass style-queue:

Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1
WebCore/loader/DocLoader.cpp:44:  Alphabetical sorting problem.  [build/include_order] [4]
Total errors found: 1
Comment 13 Darin Adler 2009-12-16 16:20:46 PST
Comment on attachment 44996 [details]
Attempt #2

> -                m_unloadEventBeingDispatched = true;
> +                m_isDispatchingUnloadEvent = true;

Maybe we should assert !m_isDispatchingUnloadEvent here?/

> -                m_unloadEventBeingDispatched = false;
> +                m_isDispatchingUnloadEvent = false;

Maybe we should assert !m_isDispatchingUnloadEvent here.

> +    m_isDispatchingBeforeUnloadEvent = false;

Maybe we should assert ! m_isDispatchingBeforeUnloadEvent here.

> -    m_unloadEventBeingDispatched = true;
> +    m_isDispatchingUnloadEvent = true;

Likewise.

> -    m_unloadEventBeingDispatched = false;
> +    m_isDispatchingUnloadEvent = false;

And again.

> -void DocLoader::incrementRequestCount()
> +void DocLoader::addRequest(Request* request)
>  {
>      ++m_requestCount;
> +    if (request->shouldOutlivePage()) {
> +        if (!m_requestsThatOutlivePageCount++)
> +            m_frameForRequestsThatOutlivePage = m_doc->frame();
> +    }
> +        
>  }

Extra blank line there.

It seems kind of strange to choose the Frame based on the first request that should outlive the page. Is it really right for all those later requests for the same resource to also share that one frame? I think the code at least needs a comment to make it clear what the policy is and why that's OK.

> +    int requestsThatOutlivePageCount() { return m_requestsThatOutlivePageCount; }

It's a little strange to expose this just so we can use it in an assertion. Maybe just expose the boolean condition instead? Or not have the assertion?

> +    OutlivePagePolicy shouldOutlivePage = resource->isImage() && docLoader && docLoader->frame() 
> +        && docLoader->frame()->loader()->isDispatchingUnloadEvent()
> +        ? OutlivePage : DoNotOutlivePage;

I think this should go in a paragraph of its own and have its own comment. The comment should explain why images have a different policy than other loads.

> +    // If we're doing a load that can outlive the current page, we don't want this request to block navigating
> +    // away from the current document.
> +    if (request->shouldOutlivePage() == DoNotOutlivePage && (priority > Low || !url.protocolInHTTPFamily() || !hadRequests)) {

The comment is a good idea, but its worded backward from the code, and so a bit hard to read. If you really wanted code that matched that comment, then you'd probably say:

    if (request->shouldOutlivePage() == DoNotOutlivePage)
        priority = Low;

> +    bool isDispatchingUnloadEvent() const { return m_isDispatchingBeforeUnloadEvent || m_isDispatchingUnloadEvent; }

I think it's confusing to have a function member and a data member with the same name, but different meanings. I think the function needs to be renamed.

I'm going to say review+ but I think you might decide to do a new patch based on at least one of my comments above
Comment 14 Nate Chapin 2009-12-18 13:16:14 PST
Created attachment 45176 [details]
Attempt #3

Added ASSERTs in FrameLoader, reworked comments, removed assert in loader::cancelRequests.

The frame for these types of requests is now stored in Request.  You made a good point, logically the frame should be decided each time a Request is created (if needed of course).  This, combined with the removal of the ASSERT in loader::cancelRequests, removes the need for any changes to DocLoader, so I reverted those changes.
Comment 15 WebKit Review Bot 2009-12-18 13:19:52 PST
Attachment 45176 [details] did not pass style-queue:

Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1
WebCore/loader/Request.cpp:25:  Found other header before a header this file implements. Should be: config.h, primary header, blank line, and then alphabetically sorted.  [build/include_order] [4]
Total errors found: 1
Comment 16 Darin Adler 2009-12-18 16:03:17 PST
Comment on attachment 45176 [details]
Attempt #3

>  #include "config.h"
> +#include "DocLoader.h"
> +#include "Frame.h"
>  #include "Request.h"
>  
>  #include "CachedResource.h"

The style-check script is correct. These new includes should be added after "CachedResource.h" rather than before "Request.h".
Comment 17 Darin Adler 2009-12-18 16:10:07 PST
Comment on attachment 45176 [details]
Attempt #3

> -PassRefPtr<SubresourceLoader> SubresourceLoader::create(Frame* frame, SubresourceLoaderClient* client, const ResourceRequest& request, SecurityCheckPolicy securityCheck, bool sendResourceLoadCallbacks, bool shouldContentSniff)
> +PassRefPtr<SubresourceLoader> SubresourceLoader::create(Frame* frame, SubresourceLoaderClient* client, const ResourceRequest& request, SecurityCheckPolicy securityCheck, OutlivePagePolicy outlivePage, bool sendResourceLoadCallbacks, bool shouldContentSniff)
>  {
>      if (!frame)
>          return 0;
>  
>      FrameLoader* fl = frame->loader();
> -    if (securityCheck == DoSecurityCheck && (fl->state() == FrameStateProvisional || fl->activeDocumentLoader()->isStopping()))
> +    if (outlivePage == DoNotOutlivePage && (fl->state() == FrameStateProvisional || fl->activeDocumentLoader()->isStopping()))

Argument name here is not great. It's not an outlive, nor a page. I think outlivePagePolicy or shouldOutlivePage would be better, even though they are long. Same for securityCheck.

> +    // If we are loading an image during an unlaod event, we want to allow the request to outlive the page

Typo here: "unlaod".

> +    // that we are leaving.  Some sites (most commonly ad networks) rely on image requests in beforeunload
> +    // or unload event handlers to track time spent on the page.  This will allow them to do the tracking

We use one space after a period, not two.

> +    , m_shouldOutlivePage(shouldOutlivePage)

I noticed that m_frameForRequestThatOutlivesPage and m_shouldOutlivePage always match.

Because of that, you could eliminate m_shouldOutlivePage, and instead have the shouldOutlivePage() function just check if m_frameForRequestThatOutlivesPage is 0 or not. I think it would be nice to save that data member and probably wouldn't hurt the clarity of the code.

m_frameForRequestThatOutlivesPage is not quite right, because it's used even when the request does not outlive the page. Maybe m_frameForRequestThatCanOutlivePage or m_frameForRequestThatShouldOutlivePage?

Maybe all the things named shouldOutlivePage should be renamed canOutlivePage.

review- just because of the header thing -- the code itself seems fine to me, but that's too blatant a (minor) style violation
Comment 18 Nate Chapin 2009-12-18 16:32:02 PST
Created attachment 45206 [details]
Attempt #4

Yeah, sorry about that.  I'm still trying to get in the habit of run check-webkit-style before uploading.
Comment 19 WebKit Review Bot 2009-12-18 16:33:28 PST
style-queue ran check-webkit-style on attachment 45206 [details] without any errors.
Comment 20 Maciej Stachowiak 2009-12-18 18:31:14 PST
That's hot.
Comment 21 Nate Chapin 2009-12-21 09:40:23 PST
http://trac.webkit.org/changeset/52446
Comment 22 Dimitri Glazkov (Google) 2010-01-22 18:07:39 PST
Rolled out in http://trac.webkit.org/changeset/52456.
Comment 23 Nate Chapin 2010-01-25 09:25:36 PST
Created attachment 47350 [details]
Patch without the crash

The additions to http://trac.webkit.org/changeset/52446 can be summarized as such:
Request.h/cpp : In addition to the Frame, maintain RefPtrs to the Document and
DocumentLoader in the case of a request that can outlive the page.
DocumentLoader.cpp / ResourceLoader.cpp : Allow for the possibility that
Frame::page() and Frame::settings() may be null (as can now be the case if the
unload event is triggered by closing rather than navigating).
FrameLoader.h/cpp : Keep a count of the number of requests that may outlive the
page that refer to this FrameLoader, and don't allow m_documentLoader to be
nulled if that count is non-zero.

I'm not sure that this is a good solution, but it does appear to stop the
crashing.  It seems to me, though, that I've allowed the possibility that, when
an OutlivePage Request is executed, FrameLoader::m_documentLoader will not be
the DocumentLoader that originally associated with the Request.  Is that a Bad
Thing?
Comment 24 Darin Adler 2010-01-25 09:55:35 PST
(In reply to comment #23)
> I'm not sure that this is a good solution, but it does appear to stop the
> crashing.  It seems to me, though, that I've allowed the possibility that, when
> an OutlivePage Request is executed, FrameLoader::m_documentLoader will not be
> the DocumentLoader that originally associated with the Request.  Is that a Bad
> Thing?

There’s no simple answer to that question. You’ll need to look at the specific state the document loader contains and figure out what specifically could go wrong. For example, there is the m_responses vector which is used for loading a page from the page cache. So there could be problems relating to “go back”. There could also be problems when a page like this is loaded from a web archive because of the m_pendingSubstituteResources map. Or problems having to do with the application cache. I’m just looking through the data members of the document loader class to come up with these.
Comment 25 Eric Seidel (no email) 2010-01-26 14:03:31 PST
Comment on attachment 44996 [details]
Attempt #2

Clearing darin's r+ from this obsolete patch.
Comment 26 Eric Seidel (no email) 2010-01-26 14:03:47 PST
Comment on attachment 45206 [details]
Attempt #4

Clearing darin's r+ from this obsolete patch.
Comment 27 Nate Chapin 2010-02-01 10:31:48 PST
(In reply to comment #24)
> (In reply to comment #23)
> > I'm not sure that this is a good solution, but it does appear to stop the
> > crashing.  It seems to me, though, that I've allowed the possibility that, when
> > an OutlivePage Request is executed, FrameLoader::m_documentLoader will not be
> > the DocumentLoader that originally associated with the Request.  Is that a Bad
> > Thing?
> 
> There’s no simple answer to that question. You’ll need to look at the specific
> state the document loader contains and figure out what specifically could go
> wrong. For example, there is the m_responses vector which is used for loading a
> page from the page cache. So there could be problems relating to “go back”.
> There could also be problems when a page like this is loaded from a web archive
> because of the m_pendingSubstituteResources map. Or problems having to do with
> the application cache. I’m just looking through the data members of the
> document loader class to come up with these.

Sorry for my silence.  I've gotten sucked into other things, but I'm hoping to return to this patch in the next day or two.
Comment 28 Eric Seidel (no email) 2010-02-17 14:43:58 PST
I'm not sure what the current state of this patch should be.  From the comments above it sounds like we should r- this patch until Nate and Darin agree this is non-harmful.
Comment 29 Nate Chapin 2010-02-17 14:46:12 PST
Comment on attachment 47350 [details]
Patch without the crash

Clearing flags.  I've once again gotten side tracked with other things.  If this bug becomes a higher priority or urgent, let me know.  Otherwise, it may be a bit before I get back to it.
Comment 30 Nate Chapin 2010-04-26 16:40:38 PDT
Created attachment 54346 [details]
Attempt #6 (or thereabouts)

Sorry for the months-long delay.

The main differences between this and the previous version are the creation of dummy objects, rather than keeping the old objects alive.  When a new Request is constructed to outlive its page, it holds a RefPtr to the relevant Frame, but it creates a dummy Document and DocLoader to use for the load.  If we try to null the main document loader of a FrameLoader that has a pending outliving page request, we replace the current document loader with a dummy instead of a null.

The layout tests should be identical to the previous version. I've tested every example I knew of that triggered the crash in the original commit and it appears to be resolved.
Comment 31 Nate Chapin 2010-05-03 15:29:38 PDT
Created attachment 54968 [details]
Attempt #7 (clean up weird layering)

In previous iterations, I was doing the check for whether we were loading an image from an unload event handler in Loader::load().  This version exposes an OutlivePagePolicy parameter on this function, so that the image-specific logic can be in CachedImage::load().

This also makes things more generic, in particular giving us the option of adding other loads that occur asynchronously through navigation.
Comment 32 Nate Chapin 2010-05-17 11:38:54 PDT
Created attachment 56254 [details]
Merge in recent changes to FrameLoader

Resolved conflicts with http://trac.webkit.org/changeset/59374.
Comment 33 Alexey Proskuryakov 2010-05-17 15:29:43 PDT
See also: bug 39209.
Comment 34 David Levin 2010-06-14 17:45:43 PDT
When I started to look at this, I realized that per function comments in the ChangeLog would make this a lot more reviewable (for me at least).

For example, a change log entry for FrameLoader::checkLoadComplete would tell me why "ASSERT(m_client->hasWebView());" was removed so I don't have to figure it out again.

Please consider adding these.
Comment 35 Nate Chapin 2010-06-15 13:54:03 PDT
Created attachment 58815 [details]
+ more thorough ChangeLog

I think I've got everything non-trivial explained in the ChangeLog now.  Hope that helps.
Comment 36 Nate Chapin 2010-07-21 13:52:59 PDT
Comment on attachment 58815 [details]
+ more thorough ChangeLog

I'm going to remove the review? flag on this patch.  I have some ideas on how to rewrite this to minimize side effects, so I don't think there's any sense in leaving this draft up.
Comment 37 Nate Chapin 2010-07-22 12:59:02 PDT
Created attachment 62328 [details]
Rewrite at ResourceHandleClient level

This patch uses a new class, PingLoader, to handle this family of loads.  PingLoader subclasses ResourceHandleClient, so it doesn't need to keep any external references (except to the ResourceHandle it created) and won't be cancelable when a FrameLoader is stopped.  The PingLoader triggers a load by creating a ResourceHandle and waits for a callback indicating that a response was received or that the load failed.  As soon as such a callback is received, the load is cancelled and the PingLoader deletes itself.

Testing this patch is a bit tricky.  Because we aren't making use of ResourceLoader, these loads are invisible to layoutTestController.dumpResourceLoadCallbacks().  The attached test creates an image load in the unload handler, and the resource loaded as an image sets a cookie.  We then wait for the document.cookie to visible to be set and pass when it does.
Comment 38 Ojan Vafai 2010-07-22 14:01:41 PDT
Comment on attachment 62328 [details]
Rewrite at ResourceHandleClient level

The C++ code looks good to me, but I don't know the resource loading pipeline well enough to r+ this.

> +++ LayoutTests/http/tests/navigation/image-load-in-unload-handler.html	(revision 0)
> +function checkCookie() {
> +    if (document.cookie) {
> +        log("PASS: Cookie found.");
> +        if (window.layoutTestController)
> +            layoutTestController.notifyDone();
> +    } else
> +        setTimeout(checkCookie, 50);
> +}

Instead of checking for a cookie, can we listen to onload/onerror on the image? setTimeouts often lead to flaky tests and should be avoided whenever possible.

Does this mean that we won't navigate away from a page until the image has loaded?
Comment 39 Nate Chapin 2010-07-22 14:12:38 PDT
(In reply to comment #38)
> (From update of attachment 62328 [details])
> The C++ code looks good to me, but I don't know the resource loading pipeline well enough to r+ this.
> 
> > +++ LayoutTests/http/tests/navigation/image-load-in-unload-handler.html	(revision 0)
> > +function checkCookie() {
> > +    if (document.cookie) {
> > +        log("PASS: Cookie found.");
> > +        if (window.layoutTestController)
> > +            layoutTestController.notifyDone();
> > +    } else
> > +        setTimeout(checkCookie, 50);
> > +}
> 
> Instead of checking for a cookie, can we listen to onload/onerror on the image? setTimeouts often lead to flaky tests and should be avoided whenever possible.

I'll look into that, sure.

> 
> Does this mean that we won't navigate away from a page until the image has loaded?

This should mean that there is absolutely no way for an image to block a navigation, since the Frame will not have any reference to the request (and therefore no way to cancel or block on it).
Comment 40 Nate Chapin 2010-07-26 16:06:56 PDT
Comment on attachment 62328 [details]
Rewrite at ResourceHandleClient level

Clearing review? flag until I get the layout test sorted out.
Comment 41 Nate Chapin 2010-08-18 13:29:16 PDT
Created attachment 64760 [details]
ResourceHandleClient-level solution + rewritten layout test

Thanks to David Levin for suggesting how to properly test this behavior.

The layout test now uses php scripts saving and checking a timestamp in a file to determine whether the load in the unload handler was actually sent to the server.
Comment 42 David Levin 2010-08-20 13:10:11 PDT
Comment on attachment 64760 [details]
ResourceHandleClient-level solution + rewritten layout test

I like the approach as it is very simple. My biggest concern is if the ResourceHandle may have an unfortunate amount of knowledge about the frame (see below).


WebCore/loader/DocLoader.cpp:126
 +          if (f->loader()->pageDismissalEventBeingDispatched()) {
Shouldn't this be after the allowImages check just below this?

It seems odd that the user would disable this and then we have a loophole here.

WebCore/loader/PingLoader.cpp:41
 +  void PingLoader::loadOutlivingImage(Frame* frame, const String& url)
How about just naming this loadImage?
(I can't parse loadOutlivingImage without already knowing what it does already.)


WebCore/loader/PingLoader.cpp:42
 +  {
Shouldn't we check to see if the url is valid? (after completing it).


WebCore/loader/PingLoader.cpp:43
 +      if (SecurityOrigin::restrictAccessToLocal() && !SecurityOrigin::canLoad(KURL(ParsedURLString, url), String(), frame->document())) {
Shouldn't the canLoad call be done using the "completeURL"?


WebCore/loader/PingLoader.cpp:48
 +      ResourceRequest r(frame->document()->completeURL(url));
Please avoid using abbreviates for variables "r".

WebCore/loader/PingLoader.cpp:54
 +      new PingLoader(frame, r);
This needs a PassOwnPtr and leakPtr around it.


WebCore/loader/PingLoader.cpp:59
 +      m_handle = ResourceHandle::create(request, this, frame, false, false);
How does this live past the lifetime of the frame? Does any platform store information from the frame that will live on past this request? And if so, how does that work?

For example WebCore/platform/network/qt/ResourceHandleQt.cpp, does this
    getInternal()->m_frame = static_cast<FrameLoaderClientQt*>(frame->loader()->client())->webFrame();

WebCore/loader/DocLoader.cpp:127
 +              PingLoader::loadOutlivingImage(f, url);
It looks like this also skips checks that would normally be done for image loading in DocLoader::canRequest.

WebCore/loader/PingLoader.h:47
 +  // and don't depend on their Frame staying alive (i.e., auditing pingbacks).
Consider:
  
Triggers asynchronous loads independent of Frame staying alive (i.e., auditing pingbacks).


WebCore/loader/PingLoader.h:58
 +      ~PingLoader() { m_handle->cancel(); }
Please move the destructor to the cpp file, and then get rid of the "#include "ResourceHandle.h"" from this header file.


LayoutTests/http/tests/navigation/resources/check-ping.php:7
 +      // This refresh header is teh sadness, but if the file doesn't
typo:teh Also, I understand what you are saying but it seems like a colloquialism and I'm not sure how well it will work for non-native speakers.

Perhaps, you might want to consider something more standard like "Unfortunately, we have to do a refresh because..."


LayoutTests/http/tests/navigation/resources/check-ping.php:19
 +  $result = ($expectedPingValue == $actualPingValue) ? "PASS" : "FAIL";
Should this be part of the "refresh" sequence? Just because the file exists does that mean that its contents have been flushed to disk? (and the unlink done after this step).
Comment 43 Nate Chapin 2010-08-20 14:00:05 PDT
Ok.

(In reply to comment #42)
> (From update of attachment 64760 [details])
> WebCore/loader/PingLoader.cpp:59
>  +      m_handle = ResourceHandle::create(request, this, frame, false, false);
> How does this live past the lifetime of the frame? Does any platform store information from the frame that will live on past this request? And if so, how does that work?
> 
> For example WebCore/platform/network/qt/ResourceHandleQt.cpp, does this
>     getInternal()->m_frame = static_cast<FrameLoaderClientQt*>(frame->loader()->client())->webFrame();

I need to look at this in depth to figure out whether this is a problem.  Most platforms use the frame during creation, then throw it away, but it's possible this will cause a problem on other platforms.

> LayoutTests/http/tests/navigation/resources/check-ping.php:7
>  +      // This refresh header is teh sadness, but if the file doesn't
> typo:teh Also, I understand what you are saying but it seems like a colloquialism and I'm not sure how well it will work for non-native speakers.
> 
> Perhaps, you might want to consider something more standard like "Unfortunately, we have to do a refresh because..."

Sorry, I was feeling a bit cranky when I wrote that comment (having just spent hours trying to figure out why my pings weren't being detected), and I forgot to edit it before upload :)


> LayoutTests/http/tests/navigation/resources/check-ping.php:19
>  +  $result = ($expectedPingValue == $actualPingValue) ? "PASS" : "FAIL";
> Should this be part of the "refresh" sequence? Just because the file exists does that mean that its contents have been flushed to disk? (and the unlink done after this step).

The load of delete-ping.php in image-load-in-unload-handler.html should have guaranteed unlink (even if a file was left around by a previous test), but if you think it would improve test stability, I'll switch it around.
Comment 44 David Levin 2010-08-20 14:26:05 PDT
(In reply to comment #43)
> (In reply to comment #42)
> > LayoutTests/http/tests/navigation/resources/check-ping.php:19
> >  +  $result = ($expectedPingValue == $actualPingValue) ? "PASS" : "FAIL";
> > Should this be part of the "refresh" sequence? Just because the file exists does that mean that its contents have been flushed to disk? (and the unlink done after this step).
> 
> The load of delete-ping.php in image-load-in-unload-handler.html should have guaranteed unlink (even if a file was left around by a previous test), but if you think it would improve test stability, I'll switch it around.

I trust that. What I don't know is when the new file is created do we know that the contents of the file is flushed to disk on creation.

i.e. Does new file existing mean that it has the full contents? (Yes it seems reasonable, but I don't know and I'd like to avoid a flaky test.)

btw, because of the delete ping at the beginning of the test, I'd be fine with just omitting the date time check (which would fix this issue).
Comment 45 Nate Chapin 2010-08-20 15:29:46 PDT
(In reply to comment #43)
> Ok.
> 
> (In reply to comment #42)
> > (From update of attachment 64760 [details] [details])
> > WebCore/loader/PingLoader.cpp:59
> >  +      m_handle = ResourceHandle::create(request, this, frame, false, false);
> > How does this live past the lifetime of the frame? Does any platform store information from the frame that will live on past this request? And if so, how does that work?
> > 
> > For example WebCore/platform/network/qt/ResourceHandleQt.cpp, does this
> >     getInternal()->m_frame = static_cast<FrameLoaderClientQt*>(frame->loader()->client())->webFrame();
> 
> I need to look at this in depth to figure out whether this is a problem.  Most platforms use the frame during creation, then throw it away, but it's possible this will cause a problem on other platforms.

qt and gtk both save the frame for later use.

gtk only uses it for triggering authentication dialogs.  I probably need to double-check with a gtk/soup expert, but I don't think that'll be relevant, since PingLoader uses ResourceHandleClient's empty functions for authentication-related callbacks.

qt appears to be ok (at least for now), because the frame is only actually used:
1. In a test that's not covering this case
2. In a synchronous call from the constructor unless the load is deferred (and the parameters we're passing into the ResourceHandle constructor guarantee it won't be deferred).

So while I think it's ok now, it's certainly possible something will change to break these loads.
Comment 46 Nate Chapin 2010-08-23 13:26:58 PDT
Created attachment 65159 [details]
Addressing levin's comments
Comment 47 David Levin 2010-08-23 13:40:30 PDT
Comment on attachment 65159 [details]
Addressing levin's comments

WebCore/loader/PingLoader.cpp:55
 +      PassOwnPtr<PingLoader> pingLoader(new PingLoader(frame, request));
When I mentioned using PassOwnPtr, I was thinking of a static create method, but given the usage I suppose that would be overkill.

Since this is a local variable, use OwnPtr instead of PassOwnPtr.
Comment 48 WebKit Review Bot 2010-08-24 10:43:39 PDT
http://trac.webkit.org/changeset/65910 might have broken Chromium Linux Release
Comment 49 Nate Chapin 2010-08-24 11:11:06 PDT
(In reply to comment #48)
> http://trac.webkit.org/changeset/65910 might have broken Chromium Linux Release

http://trac.webkit.org/changeset/65910
http://trac.webkit.org/changeset/65911