Bug 44722 - Add support for "downloadToFile" to ResourceRequest and ResourceResponse
Summary: Add support for "downloadToFile" to ResourceRequest and ResourceResponse
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: XML (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Michael Nordman
URL:
Keywords:
Depends on:
Blocks: 44721
  Show dependency treegraph
 
Reported: 2010-08-26 14:55 PDT by Michael Nordman
Modified: 2016-05-18 21:25 PDT (History)
15 users (show)

See Also:


Attachments
downloadToFile (21.22 KB, patch)
2010-08-26 15:09 PDT, Michael Nordman
no flags Details | Formatted Diff | Diff
downloadToFile (22.34 KB, patch)
2010-08-26 15:34 PDT, Michael Nordman
no flags Details | Formatted Diff | Diff
downloadToFile (21.13 KB, patch)
2010-08-26 20:20 PDT, Michael Nordman
no flags Details | Formatted Diff | Diff
downloadAsBlob (60.90 KB, patch)
2010-10-05 19:58 PDT, Michael Nordman
no flags Details | Formatted Diff | Diff
downloadAsBlob (62.17 KB, patch)
2010-10-05 20:31 PDT, Michael Nordman
no flags Details | Formatted Diff | Diff
downloadAsBlob (61.49 KB, patch)
2010-10-05 20:47 PDT, Michael Nordman
no flags Details | Formatted Diff | Diff
downloadAsBlob (64.22 KB, patch)
2010-10-06 16:28 PDT, Michael Nordman
abarth: review-
Details | Formatted Diff | Diff
downloadAsBlob (66.39 KB, patch)
2010-10-07 19:49 PDT, Michael Nordman
no flags Details | Formatted Diff | Diff
downloadAsBlob (66.40 KB, patch)
2010-10-07 20:12 PDT, Michael Nordman
no flags Details | Formatted Diff | Diff
downloadAsBlob (66.82 KB, patch)
2010-10-08 14:30 PDT, Michael Nordman
abarth: review-
Details | Formatted Diff | Diff
downloadAsBlob (66.83 KB, patch)
2010-10-08 15:31 PDT, Michael Nordman
abarth: review-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Michael Nordman 2010-08-26 14:55:58 PDT
The intent is to use that capability to implement XHR.responseBlob.
Comment 1 Michael Nordman 2010-08-26 15:09:23 PDT
Created attachment 65625 [details]
downloadToFile

not ready for review
Comment 2 Michael Nordman 2010-08-26 15:34:36 PDT
Created attachment 65631 [details]
downloadToFile

ChangeLog comments
Comment 3 Michael Nordman 2010-08-26 16:05:00 PDT
Comment on attachment 65631 [details]
downloadToFile

r? to get some feedback on the approach

* Still noodling about how to manage the lifetime of the downloaded file.
* Haven't yet looked at baking this into platform/ResourceHandle.cpp

Need to see how the blob infrastructure plays out.
Comment 4 Darin Fisher (:fishd, Google) 2010-08-26 16:55:47 PDT
Comment on attachment 65631 [details]
downloadToFile

WebCore/platform/network/ResourceRequestBase.cpp:49
 +  #if ENABLE(XHR_RESPONSE_BLOB)

i recommend not conditionalizing this code.  it is useful infrastructure
that can be used independently of XHR_RESPONSE_BLOB.  it is also quite
lightweight, so there isn't much cost to having it in webkit builds that
do not enable XHR_RESPONSE_BLOB.  i think the benefits of cleaner code
are worth it.  same goes for ResourceResponseBase and the affected 
Chromium WebKit API files.  come to think of it, perhaps this entire
patch can have the ENABLE flag excluded.
Comment 5 Michael Nordman 2010-08-26 20:20:10 PDT
Created attachment 65667 [details]
downloadToFile

removed conditional compilation guards
Comment 6 Michael Nordman 2010-09-02 15:18:49 PDT
I think i'm far enough along with the related work to feel like getting this into review and into the code base now... can somebody take a look?
Comment 7 Michael Nordman 2010-09-08 13:19:00 PDT
ping
Comment 8 Darin Fisher (:fishd, Google) 2010-09-08 13:25:31 PDT
Comment on attachment 65667 [details]
downloadToFile

View in context: https://bugs.webkit.org/attachment.cgi?id=65667&action=prettypatch

> WebCore/ChangeLog:13
> +
shouldn't there also be a change to ResourceHandleClient to similarly add a didReceiveFileData method?

> WebCore/platform/network/ResourceResponseBase.h:114
> +    // TBD: How is the lifetime of this file is managed?
nit: "file is managed" -> "file managed"

TBD -> FIXME
Comment 9 Michael Nordman 2010-09-08 14:54:32 PDT
(In reply to comment #8)
> shouldn't there also be a change to ResourceHandleClient to similarly add a didReceiveFileData method?

I guess it would be a good idea if something should actually call into this new method :)

> nit: "file is managed" -> "file managed"
> 
> TBD -> FIXME

Will do.
Comment 10 Michael Nordman 2010-09-08 15:16:47 PDT
Hey... looks like when this option is enabled in chrome, DataReceived msgs don't get sent to the renderer... we'll have to add FileDataReceived msgs for to get the progress events to work right.
Comment 11 Darin Fisher (:fishd, Google) 2010-09-08 17:01:51 PDT
(In reply to comment #10)
> Hey... looks like when this option is enabled in chrome, DataReceived msgs don't get sent to the renderer... we'll have to add FileDataReceived msgs for to get the progress events to work right.

Yeah, that makes sense.  WebURLLoaderClient already has a didDownloadData method that should get called though.  If that matches up to a ResourceHandleClient method, then we should be set.
Comment 12 Michael Nordman 2010-09-08 17:29:28 PDT
> Yeah, that makes sense.  WebURLLoaderClient already has a didDownloadData
> method that should get called though.  If that matches up to a
> ResourceHandleClient method, then we should be set.

I'll check... but i think that's probably backed by DataReceived msgs and won't actually get called in the downloadAsFile case.
Comment 13 Michael Nordman 2010-10-05 19:58:49 PDT
Created attachment 69878 [details]
downloadAsBlob
Comment 14 Michael Nordman 2010-10-05 20:31:04 PDT
Created attachment 69881 [details]
downloadAsBlob

There is no common code implementation for actually constructing a blob out of the response data in the async case, but there is a comment about how to go about doing so in platform/ResourceHandle.cpp.
Comment 15 WebKit Review Bot 2010-10-05 20:33:35 PDT
Attachment 69881 [details] did not pass style-queue:

Failed to run "['WebKitTools/Scripts/check-webkit-style']" exit_code: 1
WebKit/chromium/src/WebURLRequest.cpp:296:  Could not find a newline character at the end of the file.  [whitespace/ending_newline] [5]
WebKit/chromium/src/WebURLRequestPrivate.h:53:  Could not find a newline character at the end of the file.  [whitespace/ending_newline] [5]
WebCore/platform/network/ResourceHandle.cpp:214:  One space before end of line comments  [whitespace/comments] [5]
WebCore/loader/DocumentThreadableLoader.cpp:389:  One line control clauses should not use braces.  [whitespace/braces] [4]
WebKit/chromium/src/ResourceHandle.cpp:421:  Could not find a newline character at the end of the file.  [whitespace/ending_newline] [5]
WebKit/chromium/src/WebURLResponsePrivate.h:53:  Could not find a newline character at the end of the file.  [whitespace/ending_newline] [5]
WebCore/loader/FrameLoader.cpp:2774:  Should have a space between // and comment  [whitespace/comments] [4]
WebKit/chromium/src/WebURLResponse.cpp:396:  Could not find a newline character at the end of the file.  [whitespace/ending_newline] [5]
Total errors found: 8 in 31 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 16 Early Warning System Bot 2010-10-05 20:43:28 PDT
Attachment 69881 [details] did not build on qt:
Build output: http://queues.webkit.org/results/4180093
Comment 17 Michael Nordman 2010-10-05 20:47:47 PDT
Created attachment 69882 [details]
downloadAsBlob

Fixed the style issues.

Also if it's easier to see what's going on in a side-by-side fashion, you can look here too...
  webcore - http://codereview.chromium.org/3404026/show
  webkit - http://codereview.chromium.org/3404025/show
... the patch for review in bugzilla is only slightly modified from those.
Comment 18 Early Warning System Bot 2010-10-05 20:59:32 PDT
Attachment 69882 [details] did not build on qt:
Build output: http://queues.webkit.org/results/4202088
Comment 19 WebKit Review Bot 2010-10-06 14:40:02 PDT
Attachment 69882 [details] did not build on chromium:
Build output: http://queues.webkit.org/results/4216121
Comment 20 Michael Nordman 2010-10-06 16:28:02 PDT
Created attachment 70005 [details]
downloadAsBlob

fix some build issues
Comment 21 Early Warning System Bot 2010-10-06 16:39:21 PDT
Attachment 70005 [details] did not build on qt:
Build output: http://queues.webkit.org/results/4147110
Comment 22 Adam Barth 2010-10-06 17:02:16 PDT
Comment on attachment 70005 [details]
downloadAsBlob

View in context: https://bugs.webkit.org/attachment.cgi?id=70005&action=review

We need tests for this code.

> WebCore/loader/DocumentThreadableLoader.cpp:236
> +    // Ignore response body of preflight requests.

We can skip this comment.

> WebCore/loader/DocumentThreadableLoader.cpp:273
> +    if (m_actualRequest) {
> +        ASSERT(!m_sameOriginRequest);
> +        ASSERT(m_options.crossOriginRequestPolicy == UseAccessControl);
> +        preflightSuccess();
> +    } else

Prefer early return.

> WebCore/loader/DocumentThreadableLoader.cpp:367
> +            dataAsBlob = m_document->frame()->loader()->loadResourceSynchronouslyAsBlob(request, storedCredentials, error, response, identifier);
> +        else
> +            identifier = m_document->frame()->loader()->loadResourceSynchronously(request, storedCredentials, error, response, data);

Should these methods assert that request.downloadAsBlob is set to the right thing?

> WebCore/loader/DocumentThreadableLoader.cpp:388
> +       didFinishLoadingBlob(identifier, dataAsBlob.release());

bad indent.  prefer early return.

> WebCore/loader/FrameLoader.cpp:2740
>  unsigned long FrameLoader::loadResourceSynchronously(const ResourceRequest& request, StoredCredentials storedCredentials, ResourceError& error, ResourceResponse& response, Vector<char>& data)

This method shouldn't be on FrameLoader, but that's not your fault.

> WebCore/loader/FrameLoader.cpp:2742
> +    ASSERT(!request.downloadAsBlob());

Ah, I see it does.

> WebCore/loader/FrameLoader.cpp:2744
> +    unsigned long identifier = prepareSyncReqeust(request, error, actualRequest);

prepareSyncReqeust => prepareSyncRequest

> WebCore/loader/FrameLoader.cpp:2761
> +PassRefPtr<Blob> FrameLoader::loadResourceSynchronouslyAsBlob(const ResourceRequest& request, StoredCredentials storedCredentials, ResourceError& error, ResourceResponse& response, unsigned long& identifier)

These two methods duplicate a lot of code.  Is it possible to share more?  Perhaps pass which ResourceHandle static method to use as a template parameter?

> WebCore/loader/ResourceLoader.cpp:286
> +void ResourceLoader::didFinishLoadingBlob(double finishTime, PassRefPtr<Blob>)

It seems strange to use a PassRefPtr and then to ignore the argument.  Did you mean to pass a Blob* ?

> WebCore/loader/ResourceLoader.cpp:439
> +void ResourceLoader::didReceiveDataCommon(bool isBlobData, const char* data, int length, int lengthReceived)

Can we use an enum instead of isBlobData ?  The call sites are hard to read.

> WebCore/loader/ResourceLoader.h:83
> +        virtual void didReceiveBlobData(int) { }

Please don't declare virtual functions inline.

> WebCore/loader/ResourceLoader.h:106
> +        virtual void didFinishLoadingBlob(ResourceHandle*, double /*finishTime*/, PassRefPtr<Blob>);

There's not reason to comment out the formal parameter name.

> WebCore/loader/SubresourceLoader.cpp:195
> +void SubresourceLoader::didFinishLoadingCommon(bool asBlob, double finishTime, PassRefPtr<Blob> blob)

Same comment w.r.t. enum here.  Also, please use the same variable names for consistency.

> WebCore/loader/SubresourceLoader.cpp:206
> +            m_client->didFinishLoadingBlob(this, blob);

I'm confused.  Does didFinishLoadingBlob take a PassRefPtr?  If so, won't blob be null below?

> WebCore/loader/WorkerThreadableLoader.cpp:235
> +        createCallbackTask(&workerContextDidFinishLoadingBlob, m_workerClientWrapper, identifier, blob->url(), blob->type(), blob->size()),
> +        m_taskMode);

We don't usually put in this kind of line break.

> WebCore/loader/WorkerThreadableLoader.h:137
> +            // Only to be used on the main thread.

This comment doesn't match the one above.  Is that an intentional difference?

> WebCore/platform/network/ResourceHandle.cpp:196
> +PassRefPtr<Blob> ResourceHandle::loadResourceSynchronouslyAsBlob(
> +                                               NetworkingContext* context,
> +                                               const ResourceRequest& request,
> +                                               StoredCredentials storedCredentials,
> +                                               ResourceError& error,
> +                                               ResourceResponse& response)

This all goes on one line.  Yes, it's ugly.

> WebCore/platform/network/ResourceHandle.cpp:214
> +        blobData->appendData(CString(data.data(), data.size())); // FIXME: Don't copy the data.

Woah, that's a big memcpy.  This change is already huge, but please follow up and fix this!!!

> WebCore/platform/network/ResourceRequestBase.h:228
> +        bool m_downloadAsBlob;

There's a general problem that we can't add members to ResourceRequestBase that can't be represented in an NSURLRequest (or something like that).  I'm not sure if this issue has been resolved.  We should check with some Apple folks.

> WebCore/platform/network/chromium/ResourceRequest.h:87
> +        // If true, the response body will be provided as a file.

Remove this comment.

> WebCore/platform/network/chromium/ResourceRequest.h:89
> +        bool downloadToFile() const { return downloadAsBlob(); }
> +        void setDownloadToFile(bool download) { setDownloadAsBlob(download); }

Why are we switching names from "AsBlob" to "ToFile" ?

> WebCore/platform/network/chromium/ResourceResponse.cpp:40
> +    m_downloadFilePath  = path;

extra space before =

> WebCore/xml/XMLHttpRequest.cpp:1041
> +void XMLHttpRequest::didReceiveDataCommon(bool isBlobData, const char* data, int len)

isBlobData => enum
len => length

> WebCore/xml/XMLHttpRequest.cpp:1069
> +        if (len == -1) 
> +            len = strlen(data);

wow, that's terrible.  I had no idea this was here.

> WebKit/chromium/src/ResourceHandle.cpp:89
> +    static PassRefPtr<Blob> loadResourceSynchronously(NetworkingContext* context,
> +                                                      const ResourceRequest& request,
> +                                                      StoredCredentials storedCredentials,
> +                                                      ResourceError& error,
> +                                                      ResourceResponse& response,
> +                                                      Vector<char>* data);

one line

> WebKit/chromium/src/ResourceHandle.cpp:247
> +String ResourceHandleInternal::determineContentType(const ResourceResponse& response)
> +{
> +    if (response.isHTTP())
> +        return extractMIMETypeFromMediaType(response.httpHeaderField("Content-Type"));
> +    return response.mimeType();
> +}

Why is this implemented here instead of as a member of ResourceResponse?

> WebKit/chromium/src/ResourceHandle.cpp:255
> +PassRefPtr<Blob> ResourceHandleInternal::loadResourceSynchronously(NetworkingContext* context,
> +                                                                   const ResourceRequest& request,
> +                                                                   StoredCredentials storedCredentials,
> +                                                                   ResourceError& error,
> +                                                                   ResourceResponse& response,
> +                                                                   Vector<char>* data)

one line.

> WebKit/chromium/src/ResourceHandle.cpp:266
> +    long long lengthOut = 7777;

7777?

> WebKit/chromium/src/ResourceHandle.cpp:268
> +    // TODO(michaeln): Some way to get the lengthOut in the asFile case.

TODO(michaeln) => FIXME
Comment 23 Michael Nordman 2010-10-06 17:08:10 PDT
Thnx for looking!
Comment 24 Michael Nordman 2010-10-06 17:42:26 PDT
Just replying to particular comments, no new patch yet, I'm looking at the QT build (??).


> > WebCore/loader/FrameLoader.cpp:2761
> > +PassRefPtr<Blob> FrameLoader::loadResourceSynchronouslyAsBlob(const ResourceRequest& request, StoredCredentials storedCredentials, 
> 
> These two methods duplicate a lot of code.  Is it possible to share more?  Perhaps pass which
> ResourceHandle static method to use as a template parameter?

I tried to factor out the common parts into prepareSyncReqeust [sic]. I'll look into getting greater reuse.


> > WebCore/loader/ResourceLoader.cpp:286
> > +void ResourceLoader::didFinishLoadingBlob(double finishTime, PassRefPtr<Blob>)
> 
> It seems strange to use a PassRefPtr and then to ignore the argument.  Did you mean to pass a Blob* ?

The only reason I'm using a PassRefPtr is that I thought webcore didn't like passing raw ptrs around. Switching to raw ptrs would be fine by me, nothing hidden and unexpected going on under the covers!


> > WebCore/loader/SubresourceLoader.cpp:206
> > +            m_client->didFinishLoadingBlob(this, blob);
> 
> I'm confused.  Does didFinishLoadingBlob take a PassRefPtr?  If so, won't blob be null below?

Speaking of hidden unexpected side affects, I think its me thats confused. If i switch to raw ptrs this would work, or I'll have to allocate a RefPtr<T> on the stack in this method to satisfy both 'finish' calls with non empty PassRefPtrs. 


> > WebCore/loader/WorkerThreadableLoader.h:137
> > +            // Only to be used on the main thread.
> 
> This comment doesn't match the one above.  Is that an intentional difference?

No secret meanings. There are several similar comments phrased differently, this phrasing matches at least one of them. I'll make them all match.
> We need tests for this code.

This won't function in any port just yet. Is there any precedence to adding layout tests that aren't run yet? Any convention for how to do so? Maybe by naming them, foo_test.html.notyet?


> > WebCore/platform/network/ResourceHandle.cpp:196
> > +PassRefPtr<Blob> ResourceHandle::loadResourceSynchronouslyAsBlob(
> > +                                               NetworkingContext* context,
> > +                                               const ResourceRequest& request,
> > +                                               StoredCredentials storedCredentials,
> > +                                               ResourceError& error,
> > +                                               ResourceResponse& response)
> 
> This all goes on one line.  Yes, it's ugly.

Is there such a thing as too long of a line? I've said in jest, "no line of code ever written is too long for webkit"... but really :)

> > WebCore/platform/network/ResourceHandle.cpp:214
> > +        blobData->appendData(CString(data.data(), data.size())); // FIXME: Don't copy the data.
> 
> Woah, that's a big memcpy.  This change is already huge, but please follow up and fix this!!!

I know. That class has no good way to pass ownership of Vector full of data just yet. I think Jian is working on that.

> > WebCore/platform/network/ResourceRequestBase.h:228
> > +        bool m_downloadAsBlob;
> 
> There's a general problem that we can't add members to ResourceRequestBase that can't be represented in an NSURLRequest (or something like that).  I'm not sure if this issue has been resolved.  We should check with some Apple folks.

Hmmmm, that would be bad? I'll cc ap.

> > WebCore/platform/network/chromium/ResourceRequest.h:89
> > +        bool downloadToFile() const { return downloadAsBlob(); }
> > +        void setDownloadToFile(bool download) { setDownloadAsBlob(download); }
> 
> Why are we switching names from "AsBlob" to "ToFile" ?

For chrome, I'm implementing this in terms of it's "downloadToFile" capability (for now).


> > WebKit/chromium/src/ResourceHandle.cpp:247
> > +String ResourceHandleInternal::determineContentType(const ResourceResponse& response)
> > +{
> > +    if (response.isHTTP())
> > +        return extractMIMETypeFromMediaType(response.httpHeaderField("Content-Type"));
> > +    return response.mimeType();
> > +}
> 
> Why is this implemented here instead of as a member of ResourceResponse?

Good idea, I'll make a method on ResourceResponseBase for this.

> > WebKit/chromium/src/ResourceHandle.cpp:266
> > +    long long lengthOut = 7777;
> 
> 7777?

No special significance other than it's not evil :)
Comment 25 Michael Nordman 2010-10-06 17:44:53 PDT
ap, can you take a look at the question directed to "Apple folks" in the previous comment?
Comment 26 Adam Barth 2010-10-06 17:50:05 PDT
> > > WebCore/loader/ResourceLoader.cpp:286
> > > +void ResourceLoader::didFinishLoadingBlob(double finishTime, PassRefPtr<Blob>)
> > 
> > It seems strange to use a PassRefPtr and then to ignore the argument.  Did you mean to pass a Blob* ?
> 
> The only reason I'm using a PassRefPtr is that I thought webcore didn't like passing raw ptrs around. Switching to raw ptrs would be fine by me, nothing hidden and unexpected going on under the covers!

Oh, raw pointers are fine.  PassRefPtr is used when passing ownership of a RefCounted object.  If you haven't read this document, you might enjoy reading it:

http://webkit.org/coding/RefPtr.html

> > > WebCore/loader/WorkerThreadableLoader.h:137
> > > +            // Only to be used on the main thread.
> > 
> > This comment doesn't match the one above.  Is that an intentional difference?
> 
> No secret meanings. There are several similar comments phrased differently, this phrasing matches at least one of them. I'll make them all match.

Great.  That will make our lives easier in the future.

> > We need tests for this code.
> 
> This won't function in any port just yet. Is there any precedence to adding layout tests that aren't run yet? Any convention for how to do so? Maybe by naming them, foo_test.html.notyet?

Ideally, we land things in testable chunks.  Why doesn't this work in the Mac port?  It looks like ResourceHandle deals with creating the blob in that case.

> > > WebCore/platform/network/ResourceHandle.cpp:196
> > > +PassRefPtr<Blob> ResourceHandle::loadResourceSynchronouslyAsBlob(
> > > +                                               NetworkingContext* context,
> > > +                                               const ResourceRequest& request,
> > > +                                               StoredCredentials storedCredentials,
> > > +                                               ResourceError& error,
> > > +                                               ResourceResponse& response)
> > 
> > This all goes on one line.  Yes, it's ugly.
> 
> Is there such a thing as too long of a line? I've said in jest, "no line of code ever written is too long for webkit"... but really :)

:)

It's rare that we'll break function declarations over lines.  We'll break if-conditionals if the condition gets complicated.

> > > WebCore/platform/network/ResourceHandle.cpp:214
> > > +        blobData->appendData(CString(data.data(), data.size())); // FIXME: Don't copy the data.
> > 
> > Woah, that's a big memcpy.  This change is already huge, but please follow up and fix this!!!
> 
> I know. That class has no good way to pass ownership of Vector full of data just yet. I think Jian is working on that.

This is an example of technical debt.  We prefer not to incur these debts because its hard to get folks to pay them off.  If we land this patch, we should feel obligated to fix it soon.

> > > WebCore/platform/network/chromium/ResourceRequest.h:89
> > > +        bool downloadToFile() const { return downloadAsBlob(); }
> > > +        void setDownloadToFile(bool download) { setDownloadAsBlob(download); }
> > 
> > Why are we switching names from "AsBlob" to "ToFile" ?
> 
> For chrome, I'm implementing this in terms of it's "downloadToFile" capability (for now).

I'm not sure what that means.

> > > WebKit/chromium/src/ResourceHandle.cpp:266
> > > +    long long lengthOut = 7777;
> > 
> > 7777?
> 
> No special significance other than it's not evil :)

I don't follow.  Would zero or uninitialized be better?
Comment 27 Michael Nordman 2010-10-06 21:48:04 PDT
> Oh, raw pointers are fine.  PassRefPtr is used when passing ownership of a RefCounted object.  If you haven't read this document, you might enjoy reading it:
> 
> http://webkit.org/coding/RefPtr.html

Great, i'll switch to raw ptrs for this.

> > This won't function in any port just yet. Is there any precedence to adding layout tests that aren't run yet? Any convention for how to do so? Maybe by naming them, foo_test.html.notyet?
> 
> Ideally, we land things in testable chunks.  Why doesn't this work in the Mac port?  It looks like ResourceHandle deals with creating the blob in that case.

See comment #14. It's not handled in the async case, see the FIXME comment in network/ResourceHandle::create(). I should probably remove the loadResourceSynchronouslyAsBlob method body and replace it with a "FIXME: Implement me" for this plumbing oriented patch. And then provide impls for the sync and async cases together along with tests in a seperate patch.

> This is an example of technical debt.  We prefer not to incur these debts because its hard to get folks to pay them off.  If we land this patch, we should feel obligated to fix it soon.

I hear you. I've been calling for "please don't copy the blob data" in all patches that i can see. There's no way to pass ownership of a vector to the CString instance in the BlobData. CString is probably not the best choice for BlobData, but that's how it is right now, and this patch really isn't about fixing that. I think fixing that is on Jian's radar, he has a patch out for review that's a step in the right direction.

> > No special significance other than it's not evil :)
> 
> I don't follow.  Would zero or uninitialized be better?

Zero is fine.
Comment 28 Adam Barth 2010-10-06 22:39:18 PDT
> > This is an example of technical debt.  We prefer not to incur these debts because its hard to get folks to pay them off.  If we land this patch, we should feel obligated to fix it soon.
> 
> I hear you. I've been calling for "please don't copy the blob data" in all patches that i can see. There's no way to pass ownership of a vector to the CString instance in the BlobData. CString is probably not the best choice for BlobData, but that's how it is right now, and this patch really isn't about fixing that. I think fixing that is on Jian's radar, he has a patch out for review that's a step in the right direction.

That doesn't sound like an offer to pay the debt.  Maybe we should wait for that work to complete before landing this part of the patch?
Comment 29 Alexey Proskuryakov 2010-10-06 23:32:02 PDT
> There's a general problem that we can't add members to ResourceRequestBase that can't be represented in an NSURLRequest (or something like that).  I'm not sure if this issue has been resolved.

I think that Chromium developers were adding members to it regardless, which is only OK as long as no one is ever going to enable the features they add, or doesn't care whether they work correctly.

Why is downloading being added to ResourceHandle? I think that this is wrong. It seems that a lot of functionality was added to ResourceHandle not because it was a good idea, but because this used to be the only approachable part of the loader system. With recent developments, it is already no longer approachable or maintainable. This needs to stop, and to be undone.
Comment 30 Adam Barth 2010-10-07 10:02:53 PDT
> I think that Chromium developers were adding members to it regardless, which is only OK as long as no one is ever going to enable the features they add, or doesn't care whether they work correctly.

Presumably they both want to enable the features and they care whether they work correctly.  That leads me to conclude that it's not ok.  :)
Comment 31 Michael Nordman 2010-10-07 13:35:11 PDT
(In reply to comment #28)
> > > This is an example of technical debt.  We prefer not to incur these debts because its hard to get folks to pay them off.  If we land this patch, we should feel obligated to fix it soon.
> > 
> > I hear you. I've been calling for "please don't copy the blob data" in all patches that i can see. There's no way to pass ownership of a vector to the CString instance in the BlobData. CString is probably not the best choice for BlobData, but that's how it is right now, and this patch really isn't about fixing that. I think fixing that is on Jian's radar, he has a patch out for review that's a step in the right direction.
> 
> That doesn't sound like an offer to pay the debt.  Maybe we should wait for that work to complete before landing this part of the patch?

There is a patch out for review that pays this debt.
https://bugs.webkit.org/show_bug.cgi?id=45909
Notice the new BlobDataItem ctor that takes a PassOwnPtr<Vector<char> >.

I'll remove the impl of this sync method body for now and follow up with a later patch that
* makes use of that ctor
* provides a common code impl for the sync and async cases


(In reply to comment #29)
> > There's a general problem that we can't add members to ResourceRequestBase that can't be represented in an NSURLRequest (or something like that).  I'm not sure if this issue has been resolved.
> 
> I think that Chromium developers were adding members to it regardless, which is only OK as long as no one is ever going to enable the features they add, or doesn't care whether they work correctly.
> 
> Why is downloading being added to ResourceHandle? I think that this is wrong. It seems that a lot of functionality was added to ResourceHandle not because it was a good idea, but because this used to be the only approachable part of the loader system. With recent developments, it is already no longer approachable or maintainable. This needs to stop, and to be undone.

ap, this is the first i've heard of this constraint. What is the issue with adding data members to ResourceRequestBase that can't be represented in an NSURLRequest? Afaict the the value of that data member set in XHR makes to ResourceHandle::start() just fine?

I'm adding this capability to ResourceHandle because its a good place for it in the stack. This is in support of XHR.asBlob/.responseBlob. Conduct a regular XHR, only access the result as a blob. XHR uses ResourceHandle to do the downloading in the usual case, the only difference is we want the response in the form of an opaque handle. Defining this capability as this level makes significant optimizations possible, one example, a blob backed by data managed in the browser's HTTP cache.


(In reply to comment #30)
> > I think that Chromium developers were adding members to it regardless, which is only OK as long as no one is ever going to enable the features they add, or doesn't care whether they work correctly.
> 
> Presumably they both want to enable the features and they care whether they work correctly.  That leads me to conclude that it's not ok.  :)

What is the correctness issue? What is the source of the constraint that fields can't be added to ResourceRequestBase?
Comment 32 Adam Barth 2010-10-07 14:01:47 PDT
> > That doesn't sound like an offer to pay the debt.  Maybe we should wait for that work to complete before landing this part of the patch?
> 
> There is a patch out for review that pays this debt.
> https://bugs.webkit.org/show_bug.cgi?id=45909
> Notice the new BlobDataItem ctor that takes a PassOwnPtr<Vector<char> >.
> 
> I'll remove the impl of this sync method body for now and follow up with a later patch that
> * makes use of that ctor
> * provides a common code impl for the sync and async cases

Great.

> > > I think that Chromium developers were adding members to it regardless, which is only OK as long as no one is ever going to enable the features they add, or doesn't care whether they work correctly.
> > 
> > Presumably they both want to enable the features and they care whether they work correctly.  That leads me to conclude that it's not ok.  :)
> 
> What is the correctness issue? What is the source of the constraint that fields can't be added to ResourceRequestBase?

My understanding is that the Mac port converts ResourceHandles into NSURLRequests, bounces them around for a while, then converts them back to ResourceHandles.  The semantics for a bunch of the client APIs is that "we hand you a ResourceHandle object and you give us back a completely different object than we'll then use."

I'm not sure whether all the bouncing around is contained within WebKit or whether the Mac port hands these NSURLRequests off to the embedder.
Comment 33 Alexey Proskuryakov 2010-10-07 14:06:51 PDT
> I'm not sure whether all the bouncing around is contained within WebKit or whether the Mac port hands these NSURLRequests off to the embedder.

They are sent to the embedder. I don't think that we overwrite NSURLRequests inside WebKit.
Comment 34 Alexey Proskuryakov 2010-10-07 14:11:11 PDT
> I'm adding this capability to ResourceHandle because its a good place for it in the stack. This is in support of XHR.asBlob/.responseBlob.

This was discussed on webkit-dev a few weeks ago. I'm saying that blob support shouldn't be in ResourceHandle either. See <http://www.mail-archive.com/webkit-dev@lists.webkit.org/msg12820.html> - we didn't reach agreement then, and without such agreement, I think that it's unadvisable to put more responsibilities upon ResourceHandle.
Comment 35 Adam Barth 2010-10-07 14:26:03 PDT
(In reply to comment #34)
> > I'm adding this capability to ResourceHandle because its a good place for it in the stack. This is in support of XHR.asBlob/.responseBlob.
> 
> This was discussed on webkit-dev a few weeks ago. I'm saying that blob support shouldn't be in ResourceHandle either. See <http://www.mail-archive.com/webkit-dev@lists.webkit.org/msg12820.html> - we didn't reach agreement then, and without such agreement, I think that it's unadvisable to put more responsibilities upon ResourceHandle.

This patch is about something different than that thread.  That thread was about resolving blob URLs.  This patch is about requesting normal URL (like http URLs), but asking for the data to be returned in the form of a blob.

There seem to be two choices for how to do this:

1) Every API that wants to expose data as a blob (in this case XMLHttpRequest) first requests that data as bytes and then stuffs those bytes into a blob.

2) We add the ability for the network stack to return responses as blobs and plumb the blob responses back through the loading system.

In principle, we could restructure the loading system to always return responses as blob, since a blob is strictly more general than a sequence of bytes, but given that most callers want a sequence of bytes, that might not be the best path forward.

The advantage of (2) is that when you're requesting something that's already on disk (e.g., in the HTTP cache), you don't have to actually read the bytes off disk until later.  In some cases, you can avoid ever reading the bytes off disk if, for example, the web site shoves the blob into a persistent storage API that accepts blobs.
Comment 36 Michael Nordman 2010-10-07 19:49:29 PDT
Created attachment 70195 [details]
downloadAsBlob

New patch that addresses adam's comments.
Comment 37 WebKit Review Bot 2010-10-07 19:51:35 PDT
Attachment 70195 [details] did not pass style-queue:

Failed to run "['WebKitTools/Scripts/check-webkit-style']" exit_code: 1
WebCore/platform/network/ResourceHandle.cpp:30:  Alphabetical sorting problem.  [build/include_order] [4]
Total errors found: 1 in 34 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 38 Michael Nordman 2010-10-07 20:12:02 PDT
Created attachment 70198 [details]
downloadAsBlob

fix include order
Comment 39 Alexey Proskuryakov 2010-10-07 20:19:30 PDT
Comment on attachment 70198 [details]
downloadAsBlob

r- per comment 34.
Comment 40 Adam Barth 2010-10-07 20:37:37 PDT
> r- per comment 34.

Did you have thoughts re: comment 35?
Comment 41 Alexey Proskuryakov 2010-10-07 21:25:20 PDT
OK. I wasn't sure if this bug was a good place for such discussion, but I don't really have a better suggestion for the place. We already tried webkit-dev.

I'd say that the webkit-dev discussion was more general, even though it started from discussing the code for requesting blob URLs. At least my comments from it all apply - I prefer to have a class that encapsulates platform specific loader libraries, giving them compatible behavior with respect to buffering, authentication and other aspects that are not specific to the Web. Whether it's called ResourceHandle or something else is not important - but as of a few months ago, ResourceHandle was pretty much such a class. It's hard enough to get that working, we don't need any extras to make it harder.

This is similar to how we divided a generic stream implementation and WebSockets code.

Now, we clearly need a layer on top of that for resource loading, standardizing the supported schemes (e.g. data: may or may not be supported by underlying libraries). Such a layer could know more about WebCore internals, possibly even about the blob scheme. Or maybe blob goes even further.

As far as output format is concerned, the existing stream interface seems the most natural for a loading class. It should be easy to write an adaptor converting it into a blob, saving it to a file etc. It doesn't seem like every caller would need to know much about blobs to use such an adaptor.
Comment 42 Alexey Proskuryakov 2010-10-07 21:26:28 PDT
As a separate note, discussion in the bug seems to expect the reader to know the relationship between downloadAsBlob and downloadAsFile. I don't know how they are related.
Comment 43 Adam Barth 2010-10-07 21:35:23 PDT
(In reply to comment #42)
> As a separate note, discussion in the bug seems to expect the reader to know the relationship between downloadAsBlob and downloadAsFile. I don't know how they are related.

This appears to be specific to the Chromium WebKit API and likely something that will need to be corrected at the API level.

(In reply to comment #41)
> As far as output format is concerned, the existing stream interface seems the most natural for a loading class. It should be easy to write an adaptor converting it into a blob, saving it to a file etc. It doesn't seem like every caller would need to know much about blobs to use such an adaptor.

That precludes optimizations that avoid actually reading the bytes off disk because the bytes are forced to flow through this class.
Comment 44 Alexey Proskuryakov 2010-10-08 10:15:38 PDT
> That precludes optimizations that avoid actually reading the bytes off disk because the bytes are forced to flow through this class.

Well, the bytes are in memory when loaded from network, and they are in memory when someone uses them. There don't seem to be a lot of operations that can be performed without touching memory.
Comment 45 Adam Barth 2010-10-08 10:33:04 PDT
(In reply to comment #44)
> > That precludes optimizations that avoid actually reading the bytes off disk because the bytes are forced to flow through this class.
> 
> Well, the bytes are in memory when loaded from network, and they are in memory when someone uses them. There don't seem to be a lot of operations that can be performed without touching memory.

They're not necessarily in memory if the resource already resides in the disk cache, for example.  Also, in a multiprocess architecture, the bytes might reside in memory in one process but not in another.
Comment 46 Alexey Proskuryakov 2010-10-08 10:45:23 PDT
> They're not necessarily in memory if the resource already resides in the disk cache, for example.

Well, but if they are being loaded from the disk cache, that's presumably for a purpose, and that usually means that they need to be loaded to memory.

> Also, in a multiprocess architecture, the bytes might reside in memory in one process but not in another.

Yes, a multi-process implementation that does loading in a different process clearly needs a better split line than ResourceHandle. Passing a file handle may make sense there.

Trying to avoid larger refactoring by overloading the purpose of ResourceHandle is the root of many disputes that we have.
Comment 47 Adam Barth 2010-10-08 11:18:53 PDT
(In reply to comment #46)
> > They're not necessarily in memory if the resource already resides in the disk cache, for example.
> 
> Well, but if they are being loaded from the disk cache, that's presumably for a purpose, and that usually means that they need to be loaded to memory.

Not necessarily.  For example, suppose the web site wants to stick the blob in FileSystem.

> > Also, in a multiprocess architecture, the bytes might reside in memory in one process but not in another.
> 
> Yes, a multi-process implementation that does loading in a different process clearly needs a better split line than ResourceHandle. Passing a file handle may make sense there.

I'm not sure I understand.  A blob is the web platform equivalent of a file handle.  I agree that it makes sense to ask the network stack for a handle to the resource as opposed to the resource itself, which is what this patch lets you do.

> Trying to avoid larger refactoring by overloading the purpose of ResourceHandle is the root of many disputes that we have.

We = ?

What larger refactoring did you have in mind?  The advantage of retrieving a blob from the network stack is that you get a handle to the resource instead of the resource itself.  That seems very much in line with what ResourceHandle is all about.

As I wrote above, it would actually make sense for resource handle to always return a blob.  Currently, however, most clients would just immediately dereference the blob and grab the bytes.  In ports with network stacks that don't understand blobs, that would result in two extra copies (in and out of the blob), which seems undesirable.

Given these constraints, having an option for ResourceHandle to return a handle to the resource in the form of a blob seems to make the most sense.
Comment 48 Michael Nordman 2010-10-08 12:59:46 PDT
Comment on attachment 70198 [details]
downloadAsBlob

Resetting r? to get the try bots to run.
Comment 49 Maciej Stachowiak 2010-10-08 14:07:01 PDT
(In reply to comment #47)
> 
> What larger refactoring did you have in mind?  The advantage of retrieving a blob from the network stack is that you get a handle to the resource instead of the resource itself.  That seems very much in line with what ResourceHandle is all about.
> 
> As I wrote above, it would actually make sense for resource handle to always return a blob.  Currently, however, most clients would just immediately dereference the blob and grab the bytes.  In ports with network stacks that don't understand blobs, that would result in two extra copies (in and out of the blob), which seems undesirable.
> 
> Given these constraints, having an option for ResourceHandle to return a handle to the resource in the form of a blob seems to make the most sense.

The class "Blob" is a class that is part of the Web platform API, to the point of having its own IDL file. ResourceHandle is supposed to be ignorant of the details of DOM APIs. Making ResourceHandle use Blob directly is a layering violation. If there needs to be a lower-level concept of "file handle" that can live at the Platform layer, that is fine, but making code in the platform/ directory depend on Blob breaks the intended layering.

BlobResourceHandle is also a layering violation for the same reason, one that I have objected to and would have r-'d if I had seen it before it landed. I do not think we should go further in this direction.

I agree with Alexey that there should be separate classes that perform the roles of "wrapping the underlying network library to make the interface uniform" and "do everything needed for resource loading in the browser context, including features that are not part of the network library". It used to be the intent that these classes are ResouceHandle and ResourceLoader respectively.

We seem to be increasingly moving towards a place where there is *no* clear separatation of responsibilities between ResourceLoader and ResourceHandle. If ResourceHandle manipulates DOM objects, then what possible standard could you use to decide what functionality should go in ResourceHandle instead of ResourceLoader?

It seems the root of this trend is the fact that it's easier for Chromium to proxy things cross-process exclusively at the ResourceHandle layer, therefore functionality gets pushed there that is not solely about abstracting the platform network library. But the result of that is that platform-specific code that is all about wrapping the network library is mixed in a fairly arbitrary way with cross-platform functionality that could be on top of the low-level interface.
Comment 50 Adam Barth 2010-10-08 14:17:58 PDT
I agree that using a FileHandle abstraction in platform is a better design.  I also agree that BlobResourceHandle is not the right design.

IMHO, we could probably change ResourceHandle to always return a FileHandle without incurring a performance penalty for network libraries that always sequences of bytes.  We'd just have to teach FileHandle how to deal with external memory.

That would make ResourceHandle's job to translate from a Resource (i.e., a URL) to its representation (i.e., a sequence of bytes).  In the same way ResourceHandle is an indirect reference to a Resource.  FileHandle is an indirect reference to a sequence of bytes.  You can either get the bytes directly (like ResourceHandle gives you today) or deal with the handle.  If you keep things as a handle, you might be able to avoid ever need to materialize the bytes.
Comment 51 Michael Nordman 2010-10-08 14:30:54 PDT
Created attachment 70287 [details]
downloadAsBlob

New patch that allows the additional ResourceRequestBase data member to survivce the lossy call out to the embedder via willSendRequest(). Seems like a bug in the WebKit layer that this kind of loss occurs.

I'm not looking to put this here simply as a matter of convenience. This particular functionality seems to belong here for all the reasons Adam has articulated.

About the dependency on class Blob, which happens to have script bindings as well as being a primitive object. I think this is a minor issue. When working up this patch, I had in mind that Blob was actually defined in the platform layer (ooops). It is a rather primitive class.
* Perhaps it should be moved.
* In time, the Blob infrastructe could be refactored such that there is a BlobInternal vs Blob (scriptable) with the 'internal' class being defined in the platform layer.

About the 'adapter' being simple (ap's comments). Yes I have in mind to build support for such an adpater into ResourceHandle.cpp directly. It's a simple ResourceHandleClientWrapper.
Comment 52 Maciej Stachowiak 2010-10-08 14:33:37 PDT
(In reply to comment #50)
> I agree that using a FileHandle abstraction in platform is a better design.  I also agree that BlobResourceHandle is not the right design.
> 
> IMHO, we could probably change ResourceHandle to always return a FileHandle without incurring a performance penalty for network libraries that always sequences of bytes.  We'd just have to teach FileHandle how to deal with external memory.
> 
> That would make ResourceHandle's job to translate from a Resource (i.e., a URL) to its representation (i.e., a sequence of bytes).  In the same way ResourceHandle is an indirect reference to a Resource.  FileHandle is an indirect reference to a sequence of bytes.  You can either get the bytes directly (like ResourceHandle gives you today) or deal with the handle.  If you keep things as a handle, you might be able to avoid ever need to materialize the bytes.

That sounds like a plausible design. Though thinking about it, I am not sure FileHandle is the right name, if in some cases it may represent bytes in memory rather than an actual file on disk.

It also seems to me that clients tend to know up front if they will want immediate access to the data (as currently provided by the ResourceHandle API) or if it's sufficient to have a handle that allows potentially-async access to the data (as in the Blob case). Not sure we can make both convenient and efficient with the same API for both use cases - clients may have to state their intent and get data in different forms.
Comment 53 Maciej Stachowiak 2010-10-08 14:36:10 PDT
(In reply to comment #51)
> Created an attachment (id=70287) [details]
> downloadAsBlob
> 
> New patch that allows the additional ResourceRequestBase data member to survivce the lossy call out to the embedder via willSendRequest(). Seems like a bug in the WebKit layer that this kind of loss occurs.
> 
> I'm not looking to put this here simply as a matter of convenience. This particular functionality seems to belong here for all the reasons Adam has articulated.
> 
> About the dependency on class Blob, which happens to have script bindings as well as being a primitive object. I think this is a minor issue. When working up this patch, I had in mind that Blob was actually defined in the platform layer (ooops). It is a rather primitive class.
> * Perhaps it should be moved.
> * In time, the Blob infrastructe could be refactored such that there is a BlobInternal vs Blob (scriptable) with the 'internal' class being defined in the platform layer.
> 

I don't think it's a minor issue. It is a significant layering violation, and compounds a trend towards more of this kind of layering violation in the ResourceHandle class. It seems that Adam and Alexey agree, as of their latest comments, that ResourceHandle depending on Blob is not the best design. I think you should address this point of review feedback.
Comment 54 Adam Barth 2010-10-08 14:38:36 PDT
> That sounds like a plausible design. Though thinking about it, I am not sure FileHandle is the right name, if in some cases it may represent bytes in memory rather than an actual file on disk.

Indeed.  Perhaps ResourceRepresentationHandle would be the pedantically correct name w.r.t. the web architecture.

> It also seems to me that clients tend to know up front if they will want immediate access to the data (as currently provided by the ResourceHandle API) or if it's sufficient to have a handle that allows potentially-async access to the data (as in the Blob case). Not sure we can make both convenient and efficient with the same API for both use cases - clients may have to state their intent and get data in different forms.

The approach Micheal has in his patch is that the client says upfront which they'd prefer, and they get back a different type of object depending on what the asked for.  Another approach is a flag on the ResourceRequest that hints whether the client wants synchronous access.  The client would then get back the same type of object, but it would optimize performance in one case or the other depending on the hint.
Comment 55 Michael Nordman 2010-10-08 14:49:49 PDT
(In reply to comment #53)
> I don't think it's a minor issue. It is a significant layering violation, and compounds a trend towards more of this kind of layering violation in the ResourceHandle class. It seems that Adam and Alexey agree, as of their latest comments, that ResourceHandle depending on Blob is not the best design. I think you should address this point of review feedback.

I think we're all saying something similar. Replace my use of "BlobInternal" with adam's use of "FileHandle" in what i said.
Comment 56 Adam Barth 2010-10-08 14:53:33 PDT
Comment on attachment 70287 [details]
downloadAsBlob

Maciej is saying he'd prefer the patch not to incur a technical debt that needs to be paid off later.  Instead, we should land the dependencies in the proper order so that each patch is correctly layered / architected.
Comment 57 Michael Nordman 2010-10-08 15:31:23 PDT
Created attachment 70301 [details]
downloadAsBlob

synced and merged, i just want to see the try bot results
Comment 58 Michael Nordman 2010-10-08 15:38:06 PDT
(In reply to comment #32)
> > > That doesn't sound like an offer to pay the debt.  Maybe we should wait for that work to complete before landing this part of the patch?
> > 
> > There is a patch out for review that pays this debt.
> > https://bugs.webkit.org/show_bug.cgi?id=45909
> > Notice the new BlobDataItem ctor that takes a PassOwnPtr<Vector<char> >.
> > 
> > I'll remove the impl of this sync method body for now and follow up with a later patch that
> > * makes use of that ctor
> > * provides a common code impl for the sync and async cases
> 
> Great.

Oh... i had linked to the wrong patch.
https://bugs.webkit.org/show_bug.cgi?id=46543
Comment 59 Michael Nordman 2010-10-08 17:44:59 PDT
So... what i'm hearing in this thread is that.

* we agree in principle that ResourceHandle should have a means of providing an opaque handle to the response data

* the caller should have a means of expressing they would like the response in that form

* that opaque handle can be handed off to a blob

So i'll count that as progress.
Comment 60 WebKit Review Bot 2010-10-08 18:15:36 PDT
Attachment 70301 [details] did not build on chromium:
Build output: http://queues.webkit.org/results/4283004
Comment 61 Adam Barth 2010-10-10 12:28:28 PDT
Comment on attachment 70301 [details]
downloadAsBlob

R- per the discussion above.  We'll need some sort of platform-layer abstraction for a handle to a resource representation before landing this patch.
Comment 62 Eric Seidel (no email) 2010-10-13 15:52:02 PDT
Comment on attachment 65667 [details]
downloadToFile

Cleared Darin Fisher's review+ from obsolete attachment 65667 [details] so that this bug does not appear in http://webkit.org/pending-commit.
Comment 63 Jarred Nicholls 2011-12-06 14:32:39 PST
Any updates to this?  I'm volunteering...