Bug 18343 - [GTK] Soup backend must handle upload of multiple files
Summary: [GTK] Soup backend must handle upload of multiple files
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKitGTK (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC OS X 10.5
: P2 Normal
Assignee: Gustavo Noronha (kov)
URL: http://encodable.com/uploaddemo/
Keywords: Gtk, Soup
Depends on:
Blocks:
 
Reported: 2008-04-07 09:06 PDT by Luca Bruno
Modified: 2009-02-12 10:20 PST (History)
3 users (show)

See Also:


Attachments
proposed patch (5.34 KB, patch)
2009-01-28 17:41 PST, Gustavo Noronha (kov)
no flags Details | Formatted Diff | Diff
updated proposed patch (5.63 KB, patch)
2009-01-29 12:13 PST, Gustavo Noronha (kov)
mrowe: review-
Details | Formatted Diff | Diff
fixed proposed patch (6.85 KB, patch)
2009-01-30 09:07 PST, Gustavo Noronha (kov)
no flags Details | Formatted Diff | Diff
patch ready to be committed (6.86 KB, patch)
2009-02-01 09:54 PST, Gustavo Noronha (kov)
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Luca Bruno 2008-04-07 09:06:32 PDT
It's a missing feature. The curl backend currently handle this, but soup doesn't yet.
Comment 1 Dan Winship 2008-04-07 09:56:45 PDT
"multipart" branch of libsoup git (http://gnome.org/~danw/libsoup.git) has various bits of support for multipart/form-data. (SoupMultipart, plus some new methods in soup-forms.h)

that's probably ready to merge in to libsoup svn once I branch for GNOME 2.22 later today
Comment 2 Luca Bruno 2008-04-07 11:03:04 PDT
WebCore gives us already the elements with multipart headers, but not the data in case of files. In curl I just ended up by writing a function (readCallback if not mispelled) that wrote the whole contents of the form to a buffer given by curl. I think this can be done in soup too? Or do you think it's better to have the multipart API in first?
Comment 3 Dan Winship 2008-04-07 14:07:44 PDT
ah, ok, you're right, you don't need SoupMultipart at all.

basically what you want to do is iterate over httpBody()->elements(), and
soup_message_body_append() each chunk to the request_body. (As far as I understand the C++ bits, you should be able to use SOUP_MEMORY_TEMPORARY when appending, because the pointers you pass will remain valid until after the SoupMessage is destroyed. [This avoids needing to copy the data.]) For the files to be uploaded, you can either read them into memory like in curl, or you can copy the trick from libsoup/tests/simple-httpd.c of using mmap-based SoupBuffers. You should also call "soup_message_body_set_accumulate (msg->request_body, FALSE)", to tell libsoup that you don't need it to concatenate everything together into msg->request_body->data for you when it's done. (That's only available in libsoup 2.4.1, released today, so maybe forget about that for now...)
Comment 4 Gustavo Noronha (kov) 2009-01-28 17:41:03 PST
Created attachment 27134 [details]
proposed patch

I implemented what Dan suggested, using the libsoup test as an example. It worked in my tests using GtkLauncher and a simple test CherryPy application, uploading a single file.
Comment 5 Dan Winship 2009-01-28 18:48:41 PST
(In reply to comment #4)
> I implemented what Dan suggested

I'd said to skip the soup_message_body_set_accumulate call "for now", since it was only available in the very-latest libsoup when I wrote the comment. But that was a long time ago, and WebKit currently depends on a much newer libsoup than that anyway, so you should definitely add that. (Though this is just a performance thing, not a correctness thing.)
Comment 6 Gustavo Noronha (kov) 2009-01-29 02:25:14 PST
(In reply to comment #5)
> (In reply to comment #4)
> > I implemented what Dan suggested
> 
> I'd said to skip the soup_message_body_set_accumulate call "for now", since it
> was only available in the very-latest libsoup when I wrote the comment. But
> that was a long time ago, and WebKit currently depends on a much newer libsoup
> than that anyway, so you should definitely add that. (Though this is just a
> performance thing, not a correctness thing.)


Yeah, I considered that, actually. I tried adding the call to just before the for loop, and it caused the code to not work, I'm not really sure why. I believe, though, that having soup accumulate the data to request_body may be sound, since we are thinking of providing the SoupMessage to signals fired after the request has been prepared and sent. What's your opinion on that? This might actually exhaust virtual memory, I guess?
Comment 7 Dan Winship 2009-01-29 04:49:20 PST
(In reply to comment #6)
> Yeah, I considered that, actually. I tried adding the call to just before the
> for loop, and it caused the code to not work, I'm not really sure why.

Hm... maybe something else expects msg->request_body to be filled in later on? Actually, now that I think about it, I'm not sure anyone else is using accumulate=FALSE on the *request* body, so maybe there's a bug in that code in libsoup...

> I
> believe, though, that having soup accumulate the data to request_body may be
> sound, since we are thinking of providing the SoupMessage to signals fired
> after the request has been prepared and sent. What's your opinion on that? This
> might actually exhaust virtual memory, I guess?

Well, the curl backend always behaves that way, right?, so it wouldn't be any worse than that. But there's not much reason to do the mmap hack in that case.
Comment 8 Dan Winship 2009-01-29 06:22:36 PST
(In reply to comment #7)
> But there's not much reason to do the mmap hack in that case.

Oops, strike that part. Leaving accumulate on means it will keep the mmapped buffers around longer than it would otherwise, but it won't actually concatenate everything into a single buffer unless someone explicitly calls soup_message_body_flatten(msg->request_body), so mmapping is still probably better memory-usage-wise.
Comment 9 Gustavo Noronha (kov) 2009-01-29 08:52:28 PST
(In reply to comment #8)
> (In reply to comment #7)
> > But there's not much reason to do the mmap hack in that case.
> 
> Oops, strike that part. Leaving accumulate on means it will keep the mmapped
> buffers around longer than it would otherwise, but it won't actually
> concatenate everything into a single buffer unless someone explicitly calls
> soup_message_body_flatten(msg->request_body), so mmapping is still probably
> better memory-usage-wise.

Yeah, the tests I've done show that only virtual memory and IO caches are affected, while the "private" memory actually allocated stays low, so I don't think a flatten is happening anywhere; using mmap is useful.
Comment 10 Gustavo Noronha (kov) 2009-01-29 08:55:19 PST
(In reply to comment #7)
> Hm... maybe something else expects msg->request_body to be filled in later on?
> Actually, now that I think about it, I'm not sure anyone else is using
> accumulate=FALSE on the *request* body, so maybe there's a bug in that code in
> libsoup...

Maybe. I'll do some further testing.

> > believe, though, that having soup accumulate the data to request_body may be
> > sound, since we are thinking of providing the SoupMessage to signals fired
> > after the request has been prepared and sent. What's your opinion on that? This
> > might actually exhaust virtual memory, I guess?
> 
> Well, the curl backend always behaves that way, right?, so it wouldn't be any
> worse than that. But there's not much reason to do the mmap hack in that case.

Yeah, we'd match curl functionality, but I am afraid we may be needlessly limiting the size of uploads this way, even though the bar would be set pretty high. We can always investigate this further down the line, though, when the need actually arises. Having some form of file upload for Soup right now is a great win already IMO.
Comment 11 Gustavo Noronha (kov) 2009-01-29 12:13:04 PST
Created attachment 27154 [details]
updated proposed patch

This patch only adds a TODO comment regarding the usage of the accumulate flag. We would have to require a development version of libsoup, so I'd rather wait 'till we are far enough into the future, or have ways of #ifdef'ing the line before adding it. As Dan says, this would improve performance, but not setting it to FALSE doesn't make the code incorrect.
Comment 12 Gustavo Noronha (kov) 2009-01-29 12:22:40 PST
(In reply to comment #11)
> This patch only adds a TODO comment regarding the usage of the accumulate flag.

when compared to my previous one, that is =)
Comment 13 Mark Rowe (bdash) 2009-01-30 05:11:29 PST
Comment on attachment 27154 [details]
updated proposed patch

Some brief style comments since I'm not familiar with soup at all.
 
> +#include <fcntl.h>
> +#include <sys/types.h>
> +#include <sys/stat.h>
> +#include <unistd.h>
> +
>  #include "config.h"
>  #include "CString.h"
>  #include "ResourceHandle.h"

Per our style guidelines, config.h and ResourceHandle.h should be included first in this file and grouped together, followed by other WebCore headers, then other headers.  That is to say that the new includes should be moved down  by the exiting gio / libsoup includes, and CString.h moved down into the other WebCore includes. 

> @@ -58,6 +64,20 @@ enum
>      ERROR_BAD_NON_HTTP_METHOD
>  };
>  
> +struct mapping
> +{
> +    gpointer ptr;
> +    gsize length;
> +};
> +
> +static void
> +free_mapping(gpointer data)
> +{
> +    struct mapping* mapping = (struct mapping*)data;
> +    munmap (mapping->ptr, mapping->length);
> +    g_slice_free (struct mapping, mapping);
> +}
> +
>  static void cleanupGioOperation(ResourceHandleInternal* handle);

Types should be named with an initial capital letter.  The name "mapping" is rather generic and could perhaps better describe what the type represents.  The return type and linkage specifier of a function declaration should be on the same line as the rest of the declaration.  Function names in WebCore should use camelCase rather than lower_case_with_underscores.  The extra "struct" qualifier when referring to the type should be omitted as it is not needed in C++.  The type cast should use a C++-style cast rather than a C-style cast.

> +        gsize num_elements = httpBody->elements().size();
> +

This should be of type size_t rather than gsize since it is only used to index in to a WebCore type.  It should also be named in camelCase.

> +            for (gsize count = 0; count < num_elements; count++) {

Should be size_t for the same reason as above.  "i" would probably be a more idiomatic name than count. 

> +                const FormDataElement& element = httpBody->elements()[count];
> +
> +                if (element.m_type == FormDataElement::data) {
> +                    soup_message_body_append(msg->request_body, SOUP_MEMORY_TEMPORARY, element.m_data.data(), element.m_data.size());
> +                } else {

The one-line body of the if should not have {}s around it.

> +                    /*
> +                     * mapping for uploaded files code inspired by technique used in
> +                     * libsoup's simple-httpd test
> +                     */
> +                    SoupBuffer* soupBuffer;
> +                    struct stat statbuf;

These variables should be declared immediately before they're first needed.

> +                    struct mapping* mapping = g_slice_new (struct mapping);
> +                    int fd = open(element.m_filename.utf8().data(), O_CLOEXEC|O_RDONLY);
> +
> +                    fstat(fd, &statbuf);
> +
> +                    mapping->ptr = mmap(NULL, statbuf.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
> +                    mapping->length = statbuf.st_size;
> +
> +                    close(fd);

This code really needs some error-handling.  If the file has been deleted since the user selected it in the form control, opening the file will fail and we'll probably end up segfaulting after mmap returns an error and we treat it as a pointer.

I'm going to mark this as r- due to the lack of error handling and style issues.  If someone familiar with libsoup can chime in on whether the usage is correct then this can be r+'d when those issues are addressed.
Comment 14 Gustavo Noronha (kov) 2009-01-30 09:07:10 PST
Created attachment 27182 [details]
fixed proposed patch

Fxed style issues reported by bdash, and also added error checking for open() and mmap(). As I said on IRC, I believe we have soup covered since Dan worked with me to figure out why _set_accumulate(msg, FALSE) wasn't working, and verified the code in the process.
Comment 15 Mark Rowe (bdash) 2009-01-31 16:44:39 PST
Comment on attachment 27182 [details]
fixed proposed patch

I'm going to say r=me, though there are a few issues that should be addressed before landing.

>  #if PLATFORM(GTK)
>      #if GLIB_CHECK_VERSION(2,12,0)
> @@ -55,9 +61,24 @@ enum
>  {
>      ERROR_TRANSPORT,
>      ERROR_UNKNOWN_PROTOCOL,
> -    ERROR_BAD_NON_HTTP_METHOD
> +    ERROR_BAD_NON_HTTP_METHOD,
> +    ERROR_UNABLE_TO_MAP_FILE,
> +    ERROR_UNABLE_TO_OPEN_FILE,
> +};

Do we need the two different error codes here?  The "map file" one seems like an implementation detail that adds little benefit over the "open file" error code.

> +
> +struct FileMapping
> +{
> +    gpointer ptr;
> +    gsize length;
>  };
>  
> +static void freeFileMapping(gpointer data)
> +{
> +    FileMapping* fileMapping = static_cast<FileMapping*>(data);
> +    munmap (fileMapping->ptr, fileMapping->length);
> +    g_slice_free (FileMapping, fileMapping);
> +}

Shouldn't have a space between the function name and parenthesis in a function call.

> +
> +                    struct stat statBuf;
> +                    fstat(fd, &statBuf);

The struct keyword is unnecessary here.

> +
> +                    FileMapping* fileMapping = g_slice_new (FileMapping);

There is extra space after the function name.

> +                    fileMapping->ptr = mmap(NULL, statBuf.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
> +                    if (fileMapping->ptr == MAP_FAILED) {
> +                        ResourceError error("webkit-network-error", ERROR_UNABLE_TO_MAP_FILE, urlString, strerror(errno));
> +                        d->client()->didFail(this, error);
> +                        freeFileMapping(fileMapping);

Calling freeFileMapping here will attempt to munmap a pointer of MAP_FAILED.  Is that ok?

I think we also need to close the file descriptor in this error path to avoid leaking it.

> +                        g_object_unref(msg);
> +                        return false;
> +                    }
> +                    fileMapping->length = statBuf.st_size;
> +
> +                    close(fd);
> +
> +                    SoupBuffer* soupBuffer;

This declaration should be combined with the initialisation on the next line.
> +
> +                    soupBuffer = soup_buffer_new_with_owner(fileMapping->ptr, fileMapping->length, fileMapping, freeFileMapping);
Comment 16 Gustavo Noronha (kov) 2009-01-31 18:22:54 PST
(In reply to comment #15)
> (From update of attachment 27182 [details] [review])
> I'm going to say r=me, though there are a few issues that should be addressed
> before landing.

Thanks, I'll attach a patch ready for landing in a minute.

> Do we need the two different error codes here?  The "map file" one seems like
> an implementation detail that adds little benefit over the "open file" error
> code.

Agreed. I'm removing the MAP one.

> Calling freeFileMapping here will attempt to munmap a pointer of MAP_FAILED. 
> Is that ok?

My understanding from reading the manpage is that it should be OK, but I have added a guard to make sure we will only munmap if it is not MAP_FAILED.
Comment 17 Gustavo Noronha (kov) 2009-02-01 09:54:32 PST
Created attachment 27232 [details]
patch ready to be committed

I'm attaching a patch ready to be commited (except for ChangeLog changes). I have left struct stat as is because my C++ compiler was confusing it with the stat(2) call without the struct prefix, it seems.
Comment 18 Christian Dywan 2009-02-12 10:19:09 PST
Comment on attachment 27232 [details]
patch ready to be committed

        2009-02-12  Gustavo Noronha Silva  <gns@gnome.org>

        Reviewed by Mark Rowe.

        [GTK] Soup backend must handle upload of multiple files
        https://bugs.webkit.org/show_bug.cgi?id=18343

        * platform/network/soup/ResourceHandleSoup.cpp:
        (WebCore::):
        (WebCore::freeFileMapping):
        (WebCore::ResourceHandle::startHttp): Support multipart request bodies
        mmap'ing files to be uploaded.