RESOLVED FIXED 63310
MHTML archive loader should support binary parts
https://bugs.webkit.org/show_bug.cgi?id=63310
Summary MHTML archive loader should support binary parts
Jay Civelli
Reported 2011-06-23 21:40:11 PDT
The MHTML archive loader only supports the following encodings: QuotedPrintable, Base64 and and 7bit. It should also support binary for writing/reading archives, as it provides more compact files that are faster to load.
Attachments
Patch (17.69 KB, patch)
2011-06-26 10:00 PDT, Jay Civelli
no flags
Patch (17.69 KB, patch)
2011-06-26 21:19 PDT, Jay Civelli
no flags
Patch (17.96 KB, patch)
2011-06-27 12:31 PDT, Jay Civelli
no flags
Jay Civelli
Comment 1 2011-06-26 10:00:18 PDT
Adam Barth
Comment 2 2011-06-26 15:48:15 PDT
Comment on attachment 98629 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=98629&action=review > Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:168 > + endOfArchiveReached = (nextChars[0] == '-' && nextChars[1] == '-'); Do you want to ASSERT(nextChars.size() == 2) before this line (to help the casual reader)? > Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:180 > + if (checkBoundary && (line == endOfPartBoundary || line == endOfDocumentBoundary)) { > + endOfArchiveReached = (line == endOfDocumentBoundary); nit: Can we avoid doing this strcmp twice? > Source/WebCore/platform/SharedBuffer.cpp:171 > + if (!data.isEmpty()) > + append(data.data(), data.size()); It looks like append(const char*, unsigned) might already understand length == 0, so this branch might not be needed. > Source/WebKit/chromium/public/WebPageSerializer.h:64 > // Serializes the WebView contents to a MHTML representation. > - WEBKIT_API static WebCString serializeToMHTML(WebView*); > + // Passing true for |useBinaryEncoding| will result in a smaller HTML file, > + // but it might not work on other browsers. > + WEBKIT_API static WebCString serializeToMHTML(WebView*, bool useBinaryEncoding = false); Should this be an enum for the various supported encodings (even if we just support two today)?
Adam Barth
Comment 3 2011-06-26 15:48:34 PDT
+fishd for WebKit API review.
Jay Civelli
Comment 4 2011-06-26 21:18:47 PDT
Comment on attachment 98629 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=98629&action=review >> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:168 >> + endOfArchiveReached = (nextChars[0] == '-' && nextChars[1] == '-'); > > Do you want to ASSERT(nextChars.size() == 2) before this line (to help the casual reader)? Done. >> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:180 >> + endOfArchiveReached = (line == endOfDocumentBoundary); > > nit: Can we avoid doing this strcmp twice? Good idea, done. >> Source/WebCore/platform/SharedBuffer.cpp:171 >> + append(data.data(), data.size()); > > It looks like append(const char*, unsigned) might already understand length == 0, so this branch might not be needed. Removed if. >> Source/WebKit/chromium/public/WebPageSerializer.h:64 >> + WEBKIT_API static WebCString serializeToMHTML(WebView*, bool useBinaryEncoding = false); > > Should this be an enum for the various supported encodings (even if we just support two today)? I don't expect any other encodings. We could change to an enum in the future in the unlikely event that we need an extra encoding.
Jay Civelli
Comment 5 2011-06-26 21:19:43 PDT
Adam Barth
Comment 6 2011-06-26 21:23:04 PDT
Comment on attachment 98660 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=98660&action=review > Source/WebKit/chromium/public/WebPageSerializer.h:64 > // Serializes the WebView contents to a MHTML representation. > - WEBKIT_API static WebCString serializeToMHTML(WebView*); > + // Passing true for |useBinaryEncoding| will result in a smaller HTML file, > + // but it might not work on other browsers. > + WEBKIT_API static WebCString serializeToMHTML(WebView*, bool useBinaryEncoding = false); Ok. I'm going to let fishd do the final review. (Changing the WebKit API is a PITA, which is why I suggested an enum.)
Darin Fisher (:fishd, Google)
Comment 7 2011-06-27 10:18:00 PDT
Comment on attachment 98660 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=98660&action=review >> Source/WebKit/chromium/public/WebPageSerializer.h:64 >> + WEBKIT_API static WebCString serializeToMHTML(WebView*, bool useBinaryEncoding = false); > > Ok. I'm going to let fishd do the final review. (Changing the WebKit API is a PITA, which is why I suggested an enum.) Ordinarily, raw bool parameters like this don't help much with readability at the callsite. It would make the callsites more self-documenting if you passed an enum value, or if you created a separate function: serializeToMHTMLUsingBinaryEncoding.
Jay Civelli
Comment 8 2011-06-27 12:31:14 PDT
Jay Civelli
Comment 9 2011-06-27 12:32:43 PDT
Comment on attachment 98660 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=98660&action=review >>> Source/WebKit/chromium/public/WebPageSerializer.h:64 >>> + WEBKIT_API static WebCString serializeToMHTML(WebView*, bool useBinaryEncoding = false); >> >> Ok. I'm going to let fishd do the final review. (Changing the WebKit API is a PITA, which is why I suggested an enum.) > > Ordinarily, raw bool parameters like this don't help much with readability at the > callsite. It would make the callsites more self-documenting if you passed an > enum value, or if you created a separate function: serializeToMHTMLUsingBinaryEncoding. OK, I created a separate function as I don't expect any other encoding in the future.
Darin Fisher (:fishd, Google)
Comment 10 2011-06-27 12:41:21 PDT
Comment on attachment 98766 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=98766&action=review > Source/WebKit/chromium/src/WebPageSerializer.cpp:211 > +WebCString WebPageSerializer::serializeToMHTMLUsingBinaryEncoding(WebView* view) why don't you make this method return WebData? (that addresses the FIXME)
Jay Civelli
Comment 11 2011-06-27 12:55:24 PDT
Comment on attachment 98766 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=98766&action=review >> Source/WebKit/chromium/src/WebPageSerializer.cpp:211 >> +WebCString WebPageSerializer::serializeToMHTMLUsingBinaryEncoding(WebView* view) > > why don't you make this method return WebData? (that addresses the FIXME) I have that in another patch, that got r+ed, landed and broke a test and got rolled back. In the meantime I landed a patch in Chromium that uses the API. So my plan is to go the "#ifdef bug_1234_fixed" road to address that in an upcoming patch.
Darin Fisher (:fishd, Google)
Comment 12 2011-06-27 14:22:53 PDT
Comment on attachment 98766 [details] Patch OK, LGTM for WebKit API change.
Darin Fisher (:fishd, Google)
Comment 13 2011-06-27 14:25:05 PDT
Comment on attachment 98766 [details] Patch Based on Adam's previous comments that this was just waiting for me to give final review, I'm marking this R=me. I did not review the WebCore portions of this patch carefully, but I defer to Adam for that part.
Adam Barth
Comment 14 2011-06-27 14:26:59 PDT
I'm happy to take responsibility for the WebCore parts.
WebKit Review Bot
Comment 15 2011-06-27 15:18:31 PDT
Comment on attachment 98766 [details] Patch Clearing flags on attachment: 98766 Committed r89869: <http://trac.webkit.org/changeset/89869>
WebKit Review Bot
Comment 16 2011-06-27 15:18:37 PDT
All reviewed patches have been landed. Closing bug.
Note You need to log in before you can comment on or make changes to this bug.