Bug 117735 - [curl] Improve multipart response handling
Summary: [curl] Improve multipart response handling
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks: 117300
  Show dependency treegraph
 
Reported: 2013-06-18 05:32 PDT by Peter Gal
Modified: 2013-09-12 11:25 PDT (History)
3 users (show)

See Also:


Attachments
patch (25.15 KB, patch)
2013-07-16 04:55 PDT, Peter Gal
bfulgham: review-
Details | Formatted Diff | Diff
patch v2 (25.17 KB, patch)
2013-08-14 03:01 PDT, Peter Gal
no flags Details | Formatted Diff | Diff
patch v3 (25.53 KB, patch)
2013-09-04 10:24 PDT, Peter Gal
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Peter Gal 2013-06-18 05:32:38 PDT
The curl backend does not know anything about multipart responses so all of the returned data from the server will be displayed.
Comment 1 Peter Gal 2013-06-26 09:36:37 PDT
Patch should arrive in this week or so.
Comment 2 Peter Gal 2013-07-16 04:55:42 PDT
Created attachment 206765 [details]
patch
Comment 3 peavo 2013-07-16 07:51:55 PDT
Good stuff, this patch builds just fine on WinCairo.
Comment 4 Peter Gal 2013-07-16 07:53:32 PDT
(In reply to comment #3)
> Good stuff, this patch builds just fine on WinCairo.

Cool thanks for testing!
Comment 5 Peter Gal 2013-07-23 05:40:27 PDT
ping?
Comment 6 Brent Fulgham 2013-07-23 10:18:40 PDT
Comment on attachment 206765 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=206765&action=review

Overall this looks good.  There are a few minor style problems that need to be corrected.  I'm also not sure about the meaning of 'strict' header parsing.  It would be helpful to know if there are documented rules for when something should be parsed strictly or not.

> Source/WebCore/platform/network/HTTPParsers.cpp:608
> +size_t parseHTTPHeader(const char* start, size_t length, String& failureReason, AtomicString& nameStr, String& valueStr, bool strict)

Who governs whether we are parsing in 'strict' mode or not?  Is this documented by an RFC?

> Source/WebCore/platform/network/curl/MultipartHandle.cpp:37
> +bool MultipartHandle::extractBoundary(const String &contentType, String &boundary)

We write this as "const String& contentType" and "String& boundary".  The reference is part of the type, not the argument name.

> Source/WebCore/platform/network/curl/MultipartHandle.cpp:77
> +bool MultipartHandle::matchForBoundary(const char *data, size_t pos, size_t &matched)

We write this as "const char* data" and "size_t& matched".  The pointer (and reference) token is part of the type, not the argument name.

> Source/WebCore/platform/network/curl/MultipartHandle.cpp:81
> +    for (size_t i = 0; i < m_boundaryLength; ++i)

When the contents of a for statement spans multiple lines, I prefer to see them wrapped in brackets to aid readability:

for (size_t i = 0; i < m_boundaryLength; ++i) {

> Source/WebCore/platform/network/curl/MultipartHandle.cpp:85
> +        }

... and close here.

> Source/WebCore/platform/network/curl/MultipartHandle.cpp:91
> +bool MultipartHandle::checkForBoundary(size_t &boundaryStart, size_t &lastPartialMatch)

Argument names...

> Source/WebCore/platform/network/curl/MultipartHandle.cpp:94
> +    const char *content = m_buffer.data();

const char* content...

> Source/WebCore/platform/network/curl/MultipartHandle.cpp:116
> +    const char *content = m_buffer.data();

const char* content ...

> Source/WebCore/platform/network/curl/MultipartHandle.cpp:119
> +        return false; // Not enough data, skipping for now.

This comment isn't helpful; go ahead and remove it.

> Source/WebCore/platform/network/curl/MultipartHandle.cpp:136
> +    for (; p < end; p++) {

It's good to be in the habit of using prefix operator ++/--, since there is a performance benefit to using them for iterators.

for (; p < end; ++p) {

> Source/WebCore/platform/network/curl/MultipartHandle.cpp:156
> +void MultipartHandle::contentReceived(const char *data, size_t length)

const char* data

> Source/WebCore/platform/network/curl/MultipartHandle.cpp:182
> +*/

Nice comment!

> Source/WebCore/platform/network/curl/MultipartHandle.cpp:293
> +    return true; // There is still things to process, so go for it.

This *is* still *a* thing, or There *are* still things.

> Source/WebCore/platform/network/curl/MultipartHandle.cpp:322
> +        const char *data = m_buffer.data();

const char* data

> Source/WebCore/platform/network/curl/MultipartHandle.h:40
> +    static bool extractBoundary(const String &contentType, String &boundary);

&'s should be connected to the type declarations, not the argument names.

> Source/WebCore/platform/network/curl/MultipartHandle.h:42
> +    MultipartHandle(ResourceHandle* handle, const String &boundary)

Ditto.

> Source/WebCore/platform/network/curl/MultipartHandle.h:68
> +    bool matchForBoundary(const char *data, size_t pos, size_t &matched);

&'s (and *'s) should be connected to the type declarations, not the argument names.

> Source/WebCore/platform/network/curl/MultipartHandle.h:69
> +    inline bool checkForBoundary(size_t &boundaryStart, size_t &lastPartialMatch);

&'s should be connected to the type declarations, not the argument names.

> Source/WebCore/platform/network/curl/MultipartHandle.h:73
> +    ResourceHandle *m_resourceHandle;

* should be part of the type.

Since ResourceHandle is not 'owned' by this object, and the MultipartHandle cannot exist without a resource handle, I wonder if this should be a reference?
Comment 7 Peter Gal 2013-07-25 08:04:19 PDT
(In reply to comment #6)
> (From update of attachment 206765 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=206765&action=review
> 
> Overall this looks good.  There are a few minor style problems that need to be corrected.  I'm also not sure about the meaning of 'strict' header parsing.  It would be helpful to know if there are documented rules for when something should be parsed strictly or not.
> 
> > Source/WebCore/platform/network/HTTPParsers.cpp:608
> > +size_t parseHTTPHeader(const char* start, size_t length, String& failureReason, AtomicString& nameStr, String& valueStr, bool strict)
> 
> Who governs whether we are parsing in 'strict' mode or not?  Is this documented by an RFC?
> 

In this case if the strict is true it'll mean that we strictly enforce the HTTP headers to end with \r\n. If it is false we'll allow headers which are only separated by a \n. The RFC states that the headers should be separated by a \r\n but there are servers and testcases that doesn't follow this rule (sadly).

Actually I'm not sure if we should allow such headers, but in there are other places in the curl backend where it is allowed, and also I think the other network backends (libsoup for example) allows such headers.
Comment 8 Peter Gal 2013-08-13 04:37:57 PDT
(In reply to comment #6)
> (From update of attachment 206765 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=206765&action=review
> 
> > Source/WebCore/platform/network/curl/MultipartHandle.h:73
> > +    ResourceHandle *m_resourceHandle;
> 
> * should be part of the type.
> 
> Since ResourceHandle is not 'owned' by this object, and the MultipartHandle cannot exist without a resource handle, I wonder if this should be a reference?

In case of the FormDataStream the ResourceHandle is also a simple pointer (http://trac.webkit.org/browser/trunk/Source/WebCore/platform/network/curl/FormDataStreamCurl.h#L54) that's why I did the same here. Also note that in the headerCallback method the ResourceHandle ('job' variable) is pointer and for the 'client()->didReceive*' calls a pointer is required. Still I could change this to be a reference if you think that'll be better.
Comment 9 Peter Gal 2013-08-14 03:01:18 PDT
Created attachment 208710 [details]
patch v2

updated patch.
Comment 10 Chris Dumez 2013-08-14 04:28:11 PDT
Comment on attachment 208710 [details]
patch v2

View in context: https://bugs.webkit.org/attachment.cgi?id=208710&action=review

> Source/WebCore/platform/network/curl/MultipartHandle.cpp:63
> +        boundaryEnd = contentType.find(";", boundaryStart);

';'

> Source/WebCore/platform/network/curl/MultipartHandle.cpp:77
> +bool MultipartHandle::matchForBoundary(const char* data, size_t pos, size_t& matchedLength)

We don't use abbreviations in WebKit: pos -> position

> Source/WebCore/platform/network/curl/MultipartHandle.cpp:82
> +        if (data[pos + i] != m_boundary[i]) {

how do we know data[pos + i] does not go out of bounds?

> Source/WebCore/platform/network/curl/MultipartHandle.cpp:92
> +bool MultipartHandle::checkForBoundary(size_t& boundaryStartPos, size_t& lastPartialMatchPos)

"Pos" -> "Position'

> Source/WebCore/platform/network/curl/MultipartHandle.cpp:99
> +    lastPartialMatchPos = contentLength;

Ditto.

> Source/WebCore/platform/network/curl/MultipartHandle.cpp:117
> +    const char* content = m_buffer.data();

This can be moved after the if check for contentLength.

> Source/WebCore/platform/network/curl/MultipartHandle.cpp:135
> +    char* end = const_cast<char*>(content) + contentLength;

Cannot this one be const?

> Source/WebCore/platform/network/curl/MultipartHandle.cpp:249
> +        if (!m_buffer.size())

m_buffer.isEmpty()

> Source/WebCore/platform/network/curl/MultipartHandle.h:77
> +    Vector<char> m_buffer;

Considering the operations you are doing (append and remove from beginning), a Vector may not be the best data structure. You might want to take a look at StreamBuffer (or Deque)

> Source/WebCore/platform/network/curl/ResourceHandleManager.cpp:406
> +                d->m_multipartHandle = adoptPtr(new MultipartHandle(job, boundary));

We usually use static factory functions such as PassOwnPtr<MultipartHandle> MultipartHandle::create() for such cases so that we never deal with raw pointers externally.
Comment 11 Peter Gal 2013-09-04 01:31:16 PDT
(In reply to comment #10)
> (From update of attachment 208710 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=208710&action=review
> 
> > Source/WebCore/platform/network/curl/MultipartHandle.cpp:82
> > +        if (data[pos + i] != m_boundary[i]) {
> 
> how do we know data[pos + i] does not go out of bounds?

Ahh yeah, didn't checked for that. I'll fix that.

> > Source/WebCore/platform/network/curl/MultipartHandle.cpp:135
> > +    char* end = const_cast<char*>(content) + contentLength;
> 
> Cannot this one be const?
> 

It can be, and will be :)

> > Source/WebCore/platform/network/curl/MultipartHandle.h:77
> > +    Vector<char> m_buffer;
> 
> Considering the operations you are doing (append and remove from beginning), a Vector may not be the best data structure. You might want to take a look at StreamBuffer (or Deque)
> 

The StreamBuffer operates with the concept of blocks (which are essentially Vectors) and these blocks are in a Deque. Also it is only possible to access the first block only, so if the boundary string is split between 2 block, to correctly handle that case that'll require more code and/or complexity in the MultipartHandler.

The Deque uses internally the same VectorBuffer as the Vector. Also note that if I would use Deque<char> then when appending characters I would need to iterate over all input chars to call append for each one, in Vector there is already such method so I don't need to write it (I know not a big excuse but still less code to type this way :))

> > Source/WebCore/platform/network/curl/ResourceHandleManager.cpp:406
> > +                d->m_multipartHandle = adoptPtr(new MultipartHandle(job, boundary));
> 
> We usually use static factory functions such as PassOwnPtr<MultipartHandle> MultipartHandle::create() for such cases so that we never deal with raw pointers externally.

Okay, that makes sense.
Comment 12 Peter Gal 2013-09-04 10:24:50 PDT
Created attachment 210473 [details]
patch v3

Updated patch (Sorry for the big delay)
Comment 13 WebKit Commit Bot 2013-09-04 10:27:50 PDT
Attachment 210473 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCore/WebCore.vcxproj/WebCore.vcxproj', u'Source/WebCore/WebCore.vcxproj/WebCore.vcxproj.filters', u'Source/WebCore/platform/network/HTTPParsers.cpp', u'Source/WebCore/platform/network/HTTPParsers.h', u'Source/WebCore/platform/network/ResourceHandleInternal.h', u'Source/WebCore/platform/network/curl/MultipartHandle.cpp', u'Source/WebCore/platform/network/curl/MultipartHandle.h', u'Source/WebCore/platform/network/curl/ResourceHandleManager.cpp']" exit_code: 1
Source/WebCore/platform/network/curl/MultipartHandle.h:41:  The parameter name "handle" adds no information, so it should be removed.  [readability/parameter_name] [5]
Total errors found: 1 in 9 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 14 Peter Gal 2013-09-11 07:25:52 PDT
(In reply to comment #12)
> Created an attachment (id=210473) [details]
> patch v3
> 

ping?
Comment 15 Brent Fulgham 2013-09-12 11:02:24 PDT
Comment on attachment 210473 [details]
patch v3

r=me
Comment 16 WebKit Commit Bot 2013-09-12 11:25:20 PDT
Comment on attachment 210473 [details]
patch v3

Clearing flags on attachment: 210473

Committed r155633: <http://trac.webkit.org/changeset/155633>
Comment 17 WebKit Commit Bot 2013-09-12 11:25:23 PDT
All reviewed patches have been landed.  Closing bug.