Bug 85880 - [SOUP] Allow sending URI request data in chunks
Summary: [SOUP] Allow sending URI request data in chunks
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKit2 (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC Linux
: P2 Normal
Assignee: Nobody
URL:
Keywords: Gtk, Soup
Depends on:
Blocks: 84133
  Show dependency treegraph
 
Reported: 2012-05-08 06:33 PDT by Carlos Garcia Campos
Modified: 2012-05-11 01:13 PDT (History)
7 users (show)

See Also:


Attachments
Patch (21.28 KB, patch)
2012-05-08 06:44 PDT, Carlos Garcia Campos
no flags Details | Formatted Diff | Diff
Updated patch (23.18 KB, patch)
2012-05-08 10:19 PDT, Carlos Garcia Campos
no flags Details | Formatted Diff | Diff
Updated patch according to review comments (24.92 KB, patch)
2012-05-10 04:12 PDT, Carlos Garcia Campos
mrobinson: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Carlos Garcia Campos 2012-05-08 06:33:29 PDT
Currently we have a single IPC message that sends all the uri request data in a single chunk.
Comment 1 Carlos Garcia Campos 2012-05-08 06:44:33 PDT
Created attachment 140703 [details]
Patch
Comment 2 Carlos Garcia Campos 2012-05-08 07:00:26 PDT
It seems g_pollable_source_new_full() is too new.
Comment 3 Dan Winship 2012-05-08 08:12:13 PDT
Comment on attachment 140703 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=140703&action=review

> Source/WebKit2/WebProcess/soup/WebKitSoupRequestInputStream.cpp:43
> +    GRefPtr<GSource> baseSource = adoptGRef(g_timeout_source_new(0));

That means the source will always trigger immediately, and so if there's not actually any data available, the reader will spin in a tight loop. When the stream is "empty" but not finished, you have to return a source that won't trigger until after the next time webkitSoupRequestInputStreamAddData() has been called.

One way to do that would be to have a "GCancellable *waiting_for_data;" in priv, and create a source with g_cancellable_create_source(). Then do g_cancellable_cancel(priv->waiting_for_data) in webkitSoupRequestInputStreamAddData() and g_cancellable_reset(priv->waiting_for_data) in webkitSoupRequestInputStreamPollableReadNonblocking(). (Using GCancellable for this is semantically backwards, but... it works.)

> Source/WebKit2/WebProcess/soup/WebKitSoupRequestInputStream.cpp:44
> +    return g_pollable_source_new_full(stream, baseSource.get(), cancellable);

as you noted, g_pollable_source_new_full() is very new. but you can just reimplement it by hand; it's just a small wrapper around g_pollable_source_new(), which is older.

> Source/WebKit2/WebProcess/soup/WebKitSoupRequestInputStream.cpp:82
> +static void webkit_soup_request_input_stream_class_init(WebKitSoupRequestInputStreamClass* requestStreamClass)

Since you're not overriding GInputStreamClass->read_fn, it will use GMemoryInputStream's implementation, which won't deal with streaming data correctly. It *can't* deal with it correctly, but it seems like it might be better to have it either return G_IO_ERROR_NOT_SUPPORTED, or else do g_return_val_if_reached(-1);

> Source/WebKit2/WebProcess/soup/WebKitSoupRequestInputStream.cpp:108
> +    return stream->priv->bytesAdded == stream->priv->contentLength;

Hm... what about for chunked data, when the final length won't be known in advance? What does WebKit2 do in that case?
Comment 4 Carlos Garcia Campos 2012-05-08 08:24:46 PDT
(In reply to comment #3)
> (From update of attachment 140703 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=140703&action=review

Thanks for looking at it!

> > Source/WebKit2/WebProcess/soup/WebKitSoupRequestInputStream.cpp:43
> > +    GRefPtr<GSource> baseSource = adoptGRef(g_timeout_source_new(0));
> 
> That means the source will always trigger immediately, and so if there's not actually any data available, the reader will spin in a tight loop. When the stream is "empty" but not finished, you have to return a source that won't trigger until after the next time webkitSoupRequestInputStreamAddData() has been called.

Yeah, I thought about it, but assumed the ui client would send more data soon. I agree we should use a custom GSource to implement it properly. 

> One way to do that would be to have a "GCancellable *waiting_for_data;" in priv, and create a source with g_cancellable_create_source(). Then do g_cancellable_cancel(priv->waiting_for_data) in webkitSoupRequestInputStreamAddData() and g_cancellable_reset(priv->waiting_for_data) in webkitSoupRequestInputStreamPollableReadNonblocking(). (Using GCancellable for this is semantically backwards, but... it works.)

It's a bit tricky, but easier than implementing a new GSource.

> > Source/WebKit2/WebProcess/soup/WebKitSoupRequestInputStream.cpp:44
> > +    return g_pollable_source_new_full(stream, baseSource.get(), cancellable);
> 
> as you noted, g_pollable_source_new_full() is very new. but you can just reimplement it by hand; it's just a small wrapper around g_pollable_source_new(), which is older.

Yes, I was looking at the latest changes in gio.

> > Source/WebKit2/WebProcess/soup/WebKitSoupRequestInputStream.cpp:82
> > +static void webkit_soup_request_input_stream_class_init(WebKitSoupRequestInputStreamClass* requestStreamClass)
> 
> Since you're not overriding GInputStreamClass->read_fn, it will use GMemoryInputStream's implementation, which won't deal with streaming data correctly. It *can't* deal with it correctly, but it seems like it might be better to have it either return G_IO_ERROR_NOT_SUPPORTED, or else do g_return_val_if_reached(-1);

Yes, I assumed it was not going to be used, since this input stream is not a generic object in the end, and will only be used by ResourceHandleSoup.

> > Source/WebKit2/WebProcess/soup/WebKitSoupRequestInputStream.cpp:108
> > +    return stream->priv->bytesAdded == stream->priv->contentLength;
> 
> Hm... what about for chunked data, when the final length won't be known in advance? What does WebKit2 do in that case?

The API always has to provide the content_length in advance, doesn't SoupRequest also require to set content_length before finish()?
Comment 5 Carlos Garcia Campos 2012-05-08 10:19:21 PDT
Created attachment 140735 [details]
Updated patch

Use a different approach for our custom stream that should work with previous versions of glib.
Comment 6 Dan Winship 2012-05-08 11:25:59 PDT
(In reply to comment #4)
> > Hm... what about for chunked data, when the final length won't be known in advance? What does WebKit2 do in that case?
> 
> The API always has to provide the content_length in advance, doesn't SoupRequest also require to set content_length before finish()?

Yes, if it's known. But the docs explicitly say soup_request_get_content_length() can return -1 too.
Comment 7 Martin Robinson 2012-05-09 17:59:23 PDT
Comment on attachment 140735 [details]
Updated patch

View in context: https://bugs.webkit.org/attachment.cgi?id=140735&action=review

This looks pretty good to me! I have a few outstanding questions though, before I feel comfortable giving an r+.

> Source/WebKit2/ChangeLog:7
> +        Reviewed by NOBODY (OOPS!).
> +

It's probably a good idea to write a couple sentences describing this change in general, for the sake of future readers.

> Source/WebKit2/UIProcess/soup/WebSoupRequestManagerProxy.cpp:78
>      if (!m_client.didReceiveURIRequest(this, WebURL::create(uriString).get(), requestID))
> -        handleURIRequest(WebData::create(0, 0).get(), String(), requestID);
> +        handleURIRequest(WebData::create(0, 0).get(), 0, String(), requestID);

Hrm. Perhaps this should return an error?

> Source/WebKit2/WebProcess/soup/WebKitSoupRequestInputStream.cpp:45
> +    uint64_t bytesAdded;

bytesReceivedFromWebProcess or bytesReceived? bytesAdded could easily refer to "bytes added to the output."

> Source/WebKit2/WebProcess/soup/WebKitSoupRequestInputStream.cpp:48
> +    Mutex readLock;

I'm a little confused as to what exactly this mutex is protecting. Is it protecting just the data in this private structure or any operations at all?

> Source/WebKit2/WebProcess/soup/WebKitSoupRequestInputStream.cpp:81
> +        stream->priv->asyncReadData = adoptPtr(new AsyncReadData(result.get(), buffer, count, cancellable));

Won't this break if g_input_stream_read_async is called twice in a row?

> Source/WebKit2/WebProcess/soup/WebKitSoupRequestInputStream.cpp:135
> +    if (webkitSoupRequestInputStreamFinished(stream))
> +        return;

Does this need to be protected by the mutex or can the mutex be locked after this early return¿

> Source/WebKit2/WebProcess/soup/WebKitSoupRequestInputStream.cpp:137
> +    if (stream->priv->bytesAdded + dataLength > stream->priv->contentLength)
> +        dataLength = stream->priv->contentLength - stream->priv->bytesAdded;

Dan should confirm, but is it really an error to pass more data to soup than specified by the content length? What about the case where contentLength = -1 that he mentioned above?
Comment 8 Martin Robinson 2012-05-09 18:03:52 PDT
Comment on attachment 140735 [details]
Updated patch

View in context: https://bugs.webkit.org/attachment.cgi?id=140735&action=review

> Source/WebKit2/WebProcess/soup/WebSoupRequestManager.h:58
> +    void handleURIRequest(const CoreIPC::DataReference&, uint64_t contentLength, const String& mimeType, uint64_t requestID);
> +    void addDataToURIRequest(const CoreIPC::DataReference&, uint64_t requestID);

Another quick comment here... Perhaps these names should be similar to the names that WebCore uses in these situations, like handleURISchemeResponseReceived and handleURISchemeResponseData or something even better.
Comment 9 Carlos Garcia Campos 2012-05-09 23:52:45 PDT
(In reply to comment #7)
> (From update of attachment 140735 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=140735&action=review
> 
> This looks pretty good to me! I have a few outstanding questions though, before I feel comfortable giving an r+.

Thanks!

> > Source/WebKit2/ChangeLog:7
> > +        Reviewed by NOBODY (OOPS!).
> > +
> 
> It's probably a good idea to write a couple sentences describing this change in general, for the sake of future readers.

Fair enough

> > Source/WebKit2/UIProcess/soup/WebSoupRequestManagerProxy.cpp:78
> >      if (!m_client.didReceiveURIRequest(this, WebURL::create(uriString).get(), requestID))
> > -        handleURIRequest(WebData::create(0, 0).get(), String(), requestID);
> > +        handleURIRequest(WebData::create(0, 0).get(), 0, String(), requestID);
> 
> Hrm. Perhaps this should return an error?

No, this is handled in the web process, and indeed an error is shown, the same error that if you type foo:bar and foo is not handled.

> > Source/WebKit2/WebProcess/soup/WebKitSoupRequestInputStream.cpp:45
> > +    uint64_t bytesAdded;
> 
> bytesReceivedFromWebProcess or bytesReceived? bytesAdded could easily refer to "bytes added to the output."

Ok, I'll think a better name.

> > Source/WebKit2/WebProcess/soup/WebKitSoupRequestInputStream.cpp:48
> > +    Mutex readLock;
> 
> I'm a little confused as to what exactly this mutex is protecting. Is it protecting just the data in this private structure or any operations at all?

GLib will run read_async in a thread, so webkitSoupRequestInputStreamReadAsync() and webkitSoupRequestInputStreamAddData() can be run at the same time. We need to protect bytesAdded, asyncReadData, etc.

> > Source/WebKit2/WebProcess/soup/WebKitSoupRequestInputStream.cpp:81
> > +        stream->priv->asyncReadData = adoptPtr(new AsyncReadData(result.get(), buffer, count, cancellable));
> 
> Won't this break if g_input_stream_read_async is called twice in a row?

You are not supposed to call read_async twice in a row, in such case, our read_async is not called because g_input_stream_read_async() returns early with the error G_IO_ERROR_PENDING.

> > Source/WebKit2/WebProcess/soup/WebKitSoupRequestInputStream.cpp:135
> > +    if (webkitSoupRequestInputStreamFinished(stream))
> > +        return;
> 
> Does this need to be protected by the mutex or can the mutex be locked after this early return¿

That reads bytesAdded that is only written by this method, so it can probably be moved after the early return.

> > Source/WebKit2/WebProcess/soup/WebKitSoupRequestInputStream.cpp:137
> > +    if (stream->priv->bytesAdded + dataLength > stream->priv->contentLength)
> > +        dataLength = stream->priv->contentLength - stream->priv->bytesAdded;
> 
> Dan should confirm, but is it really an error to pass more data to soup than specified by the content length?

That's not actually an error, the data is simply truncated.

> What about the case where contentLength = -1 that he mentioned above?

We are not supporting such case, we need to now when more data is expected. We could implement that case by always sending a 0 bytes data message when contentLength is -1 and all data has been read from the input stream.
Comment 10 Carlos Garcia Campos 2012-05-09 23:56:38 PDT
(In reply to comment #8)
> (From update of attachment 140735 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=140735&action=review
> 
> > Source/WebKit2/WebProcess/soup/WebSoupRequestManager.h:58
> > +    void handleURIRequest(const CoreIPC::DataReference&, uint64_t contentLength, const String& mimeType, uint64_t requestID);
> > +    void addDataToURIRequest(const CoreIPC::DataReference&, uint64_t requestID);
> 
> Another quick comment here... Perhaps these names should be similar to the names that WebCore uses in these situations, like handleURISchemeResponseReceived and handleURISchemeResponseData or something even better.

Yes, I don't like the names either, but I couldn't think of a better name.
Comment 11 Carlos Garcia Campos 2012-05-10 03:54:39 PDT
(In reply to comment #10)
> (In reply to comment #8)
> > (From update of attachment 140735 [details] [details])
> > View in context: https://bugs.webkit.org/attachment.cgi?id=140735&action=review
> > 
> > > Source/WebKit2/WebProcess/soup/WebSoupRequestManager.h:58
> > > +    void handleURIRequest(const CoreIPC::DataReference&, uint64_t contentLength, const String& mimeType, uint64_t requestID);
> > > +    void addDataToURIRequest(const CoreIPC::DataReference&, uint64_t requestID);
> > 
> > Another quick comment here... Perhaps these names should be similar to the names that WebCore uses in these situations, like handleURISchemeResponseReceived and handleURISchemeResponseData or something even better.
> 
> Yes, I don't like the names either, but I couldn't think of a better name.

The names are indeed confusing, the request is handled by the UI process, so sending a message to the web process called HandleFoo makes me think the web process is supposeed to handle it. So, I think DidHandleURIRequest and DidReceiveURIREquestData make more sense.
Comment 12 Carlos Garcia Campos 2012-05-10 04:12:19 PDT
Created attachment 141142 [details]
Updated patch according to review comments

Addressed all review comments and also added support for unknown content length and mime type for compatibility with soup
Comment 13 Martin Robinson 2012-05-10 11:21:55 PDT
Comment on attachment 141142 [details]
Updated patch according to review comments

View in context: https://bugs.webkit.org/attachment.cgi?id=141142&action=review

Okay! Thanks for addressing my comments above. Before landing please consider the following simplification and clean up. I think it will make the code a lot more readable. I had some small difficulty understanding this partly because (1) I'm an idiot and (2) It's tricky to keep track of "bytes read out" and "bytes received" especially when their names both conjure images of bytes coming in.

> Source/WebKit2/WebProcess/soup/WebKitSoupRequestInputStream.cpp:83
> +    // There is data in the memory stream to read.
> +    if (stream->priv->bytesRead < stream->priv->bytesReceived) {
> +        webkitSoupRequestInputStreamReadAsyncResultComplete(stream, result.get(), buffer, count, cancellable);
> +        return;
> +    }
> +
> +    // We are expecting more data to be added.
> +    if (!webkitSoupRequestInputStreamFinished(stream)) {
> +        stream->priv->asyncReadData = adoptPtr(new AsyncReadData(result.get(), buffer, count, cancellable));
> +        return;
> +    }

I feel that it might be slightly simpler to unconditionally create stream->priv->asyncReadData here. You could do something like:

bool moreDataToSendForRead = stream->priv->bytesRead < stream->priv->bytesReceived;
if (!moreDataToSendForRead && webkitSoupRequestInputStreamFinished(stream)) {
    g_simple_async_result_set_op_res_gssize(result.get(), 0);
    g_simple_async_result_complete_in_idle(result.get());
    return;
}

stream->priv->asyncReadData = adoptPtr(new AsyncReadData(result.get(), buffer, count, cancellable));
if (moreDataToSendForRead)
    webkitSoupRequestInputStreamReadAsyncResultComplete(stream, cancellable);


This way webkitSoupRequestInputStreamReadAsyncResultComplete would be responsible for calling stream->priv->asyncReadData.clear(), which seems to make more sense since it consumes it.

Also, instead of adding a comment like:
// We are expecting more data to be added.
if (!webkitSoupRequestInputStreamFinished(stream)) {

I think it makes sense to simply rename webkitSoupRequestInputStreamFinished to webkitSoupRequestWaitingForMoreDataToBeAdded or something better. I recommend doing the same for stream->priv->bytesRead < stream->priv->bytesReceived by creating a helper like webkitSoupRequestSentAllDataForPendingRead or something similar. Helper methods with good names are almost always better than comments, I'd say.
Comment 14 Carlos Garcia Campos 2012-05-11 00:40:15 PDT
(In reply to comment #13)
> (From update of attachment 141142 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=141142&action=review
> 
> Okay! Thanks for addressing my comments above. Before landing please consider the following simplification and clean up. I think it will make the code a lot more readable. I had some small difficulty understanding this partly because (1) I'm an idiot and (2) It's tricky to keep track of "bytes read out" and "bytes received" especially when their names both conjure images of bytes coming in.
> 
> > Source/WebKit2/WebProcess/soup/WebKitSoupRequestInputStream.cpp:83
> > +    // There is data in the memory stream to read.
> > +    if (stream->priv->bytesRead < stream->priv->bytesReceived) {
> > +        webkitSoupRequestInputStreamReadAsyncResultComplete(stream, result.get(), buffer, count, cancellable);
> > +        return;
> > +    }
> > +
> > +    // We are expecting more data to be added.
> > +    if (!webkitSoupRequestInputStreamFinished(stream)) {
> > +        stream->priv->asyncReadData = adoptPtr(new AsyncReadData(result.get(), buffer, count, cancellable));
> > +        return;
> > +    }
> 
> I feel that it might be slightly simpler to unconditionally create stream->priv->asyncReadData here.

That would mean adding an unnecessary alloc/dealloc in the case where we already have the data ready in the stream to read from. AsyncReadData is to postpone the async result completion until we have more data in the stream to read from.

> You could do something like:
> 
> bool moreDataToSendForRead = stream->priv->bytesRead < stream->priv->bytesReceived;
> if (!moreDataToSendForRead && webkitSoupRequestInputStreamFinished(stream)) {
>     g_simple_async_result_set_op_res_gssize(result.get(), 0);
>     g_simple_async_result_complete_in_idle(result.get());
>     return;
> }
> 
> stream->priv->asyncReadData = adoptPtr(new AsyncReadData(result.get(), buffer, count, cancellable));
> if (moreDataToSendForRead)
>     webkitSoupRequestInputStreamReadAsyncResultComplete(stream, cancellable);
> 
> 
> This way webkitSoupRequestInputStreamReadAsyncResultComplete would be responsible for calling stream->priv->asyncReadData.clear(), which seems to make more sense since it consumes it.
> 
> Also, instead of adding a comment like:
> // We are expecting more data to be added.
> if (!webkitSoupRequestInputStreamFinished(stream)) {
> 
> I think it makes sense to simply rename webkitSoupRequestInputStreamFinished to webkitSoupRequestWaitingForMoreDataToBeAdded or something better. I recommend doing the same for stream->priv->bytesRead < stream->priv->bytesReceived by creating a helper like webkitSoupRequestSentAllDataForPendingRead or something similar. Helper methods with good names are almost always better than comments, I'd say.

I'll try to use better names
Comment 15 Martin Robinson 2012-05-11 01:06:36 PDT
(In reply to comment #14)

> That would mean adding an unnecessary alloc/dealloc in the case where we already have the data ready in the stream to read from. AsyncReadData is to postpone the async result completion until we have more data in the stream to read from.

The allocation in this case is very cheap, but one way to avoid it would be to add a method to AsyncReadData to mark it as invalid and never allocate and deallocate it.
Comment 16 Carlos Garcia Campos 2012-05-11 01:13:58 PDT
Committed r116738: <http://trac.webkit.org/changeset/116738>