Bug 63310 - MHTML archive loader should support binary parts
Summary: MHTML archive loader should support binary parts
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Jay Civelli
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-06-23 21:40 PDT by Jay Civelli
Modified: 2011-06-27 15:18 PDT (History)
4 users (show)

See Also:


Attachments
Patch (17.69 KB, patch)
2011-06-26 10:00 PDT, Jay Civelli
no flags Details | Formatted Diff | Diff
Patch (17.69 KB, patch)
2011-06-26 21:19 PDT, Jay Civelli
no flags Details | Formatted Diff | Diff
Patch (17.96 KB, patch)
2011-06-27 12:31 PDT, Jay Civelli
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Jay Civelli 2011-06-23 21:40:11 PDT
The MHTML archive loader only supports the following encodings:
QuotedPrintable, Base64 and and 7bit.
It should also support binary for writing/reading archives, as it provides more compact files that are faster to load.
Comment 1 Jay Civelli 2011-06-26 10:00:18 PDT
Created attachment 98629 [details]
Patch
Comment 2 Adam Barth 2011-06-26 15:48:15 PDT
Comment on attachment 98629 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=98629&action=review

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:168
> +         endOfArchiveReached = (nextChars[0] == '-' && nextChars[1] == '-');

Do you want to ASSERT(nextChars.size() == 2) before this line (to help the casual reader)?

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:180
> +            if (checkBoundary && (line == endOfPartBoundary || line == endOfDocumentBoundary)) {
> +                endOfArchiveReached = (line == endOfDocumentBoundary);

nit: Can we avoid doing this strcmp twice?

> Source/WebCore/platform/SharedBuffer.cpp:171
> +    if (!data.isEmpty())
> +        append(data.data(), data.size());

It looks like append(const char*, unsigned) might already understand length == 0, so this branch might not be needed.

> Source/WebKit/chromium/public/WebPageSerializer.h:64
>      // Serializes the WebView contents to a MHTML representation.
> -    WEBKIT_API static WebCString serializeToMHTML(WebView*);
> +    // Passing true for |useBinaryEncoding| will result in a smaller HTML file,
> +    // but it might not work on other browsers.
> +    WEBKIT_API static WebCString serializeToMHTML(WebView*, bool useBinaryEncoding = false);

Should this be an enum for the various supported encodings (even if we just support two today)?
Comment 3 Adam Barth 2011-06-26 15:48:34 PDT
+fishd for WebKit API review.
Comment 4 Jay Civelli 2011-06-26 21:18:47 PDT
Comment on attachment 98629 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=98629&action=review

>> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:168
>> +         endOfArchiveReached = (nextChars[0] == '-' && nextChars[1] == '-');
> 
> Do you want to ASSERT(nextChars.size() == 2) before this line (to help the casual reader)?

Done.

>> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:180
>> +                endOfArchiveReached = (line == endOfDocumentBoundary);
> 
> nit: Can we avoid doing this strcmp twice?

Good idea, done.

>> Source/WebCore/platform/SharedBuffer.cpp:171
>> +        append(data.data(), data.size());
> 
> It looks like append(const char*, unsigned) might already understand length == 0, so this branch might not be needed.

Removed if.

>> Source/WebKit/chromium/public/WebPageSerializer.h:64
>> +    WEBKIT_API static WebCString serializeToMHTML(WebView*, bool useBinaryEncoding = false);
> 
> Should this be an enum for the various supported encodings (even if we just support two today)?

I don't expect any other encodings. We could change to an enum in the future in the unlikely event that we need an extra encoding.
Comment 5 Jay Civelli 2011-06-26 21:19:43 PDT
Created attachment 98660 [details]
Patch
Comment 6 Adam Barth 2011-06-26 21:23:04 PDT
Comment on attachment 98660 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=98660&action=review

> Source/WebKit/chromium/public/WebPageSerializer.h:64
>      // Serializes the WebView contents to a MHTML representation.
> -    WEBKIT_API static WebCString serializeToMHTML(WebView*);
> +    // Passing true for |useBinaryEncoding| will result in a smaller HTML file,
> +    // but it might not work on other browsers.
> +    WEBKIT_API static WebCString serializeToMHTML(WebView*, bool useBinaryEncoding = false);

Ok.  I'm going to let fishd do the final review.  (Changing the WebKit API is a PITA, which is why I suggested an enum.)
Comment 7 Darin Fisher (:fishd, Google) 2011-06-27 10:18:00 PDT
Comment on attachment 98660 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=98660&action=review

>> Source/WebKit/chromium/public/WebPageSerializer.h:64
>> +    WEBKIT_API static WebCString serializeToMHTML(WebView*, bool useBinaryEncoding = false);
> 
> Ok.  I'm going to let fishd do the final review.  (Changing the WebKit API is a PITA, which is why I suggested an enum.)

Ordinarily, raw bool parameters like this don't help much with readability at the
callsite.  It would make the callsites more self-documenting if you passed an
enum value, or if you created a separate function:  serializeToMHTMLUsingBinaryEncoding.
Comment 8 Jay Civelli 2011-06-27 12:31:14 PDT
Created attachment 98766 [details]
Patch
Comment 9 Jay Civelli 2011-06-27 12:32:43 PDT
Comment on attachment 98660 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=98660&action=review

>>> Source/WebKit/chromium/public/WebPageSerializer.h:64
>>> +    WEBKIT_API static WebCString serializeToMHTML(WebView*, bool useBinaryEncoding = false);
>> 
>> Ok.  I'm going to let fishd do the final review.  (Changing the WebKit API is a PITA, which is why I suggested an enum.)
> 
> Ordinarily, raw bool parameters like this don't help much with readability at the
> callsite.  It would make the callsites more self-documenting if you passed an
> enum value, or if you created a separate function:  serializeToMHTMLUsingBinaryEncoding.

OK, I created a separate function as I don't expect any other encoding in the future.
Comment 10 Darin Fisher (:fishd, Google) 2011-06-27 12:41:21 PDT
Comment on attachment 98766 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=98766&action=review

> Source/WebKit/chromium/src/WebPageSerializer.cpp:211
> +WebCString WebPageSerializer::serializeToMHTMLUsingBinaryEncoding(WebView* view)

why don't you make this method return WebData?  (that addresses the FIXME)
Comment 11 Jay Civelli 2011-06-27 12:55:24 PDT
Comment on attachment 98766 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=98766&action=review

>> Source/WebKit/chromium/src/WebPageSerializer.cpp:211
>> +WebCString WebPageSerializer::serializeToMHTMLUsingBinaryEncoding(WebView* view)
> 
> why don't you make this method return WebData?  (that addresses the FIXME)

I have that in another patch, that got r+ed, landed and broke a test and got rolled back.
In the meantime I landed a patch in Chromium that uses the API.
So my plan is to go the "#ifdef   bug_1234_fixed" road to address that in an upcoming patch.
Comment 12 Darin Fisher (:fishd, Google) 2011-06-27 14:22:53 PDT
Comment on attachment 98766 [details]
Patch

OK, LGTM for WebKit API change.
Comment 13 Darin Fisher (:fishd, Google) 2011-06-27 14:25:05 PDT
Comment on attachment 98766 [details]
Patch

Based on Adam's previous comments that this was just waiting for me to give final review, I'm marking this R=me.  I did not review the WebCore portions of this patch carefully, but I defer to Adam for that part.
Comment 14 Adam Barth 2011-06-27 14:26:59 PDT
I'm happy to take responsibility for the WebCore parts.
Comment 15 WebKit Review Bot 2011-06-27 15:18:31 PDT
Comment on attachment 98766 [details]
Patch

Clearing flags on attachment: 98766

Committed r89869: <http://trac.webkit.org/changeset/89869>
Comment 16 WebKit Review Bot 2011-06-27 15:18:37 PDT
All reviewed patches have been landed.  Closing bug.