Bug 7168 - Support reading of MHTML (multipart/related) web archives
Summary: Support reading of MHTML (multipart/related) web archives
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKit Misc. (show other bugs)
Version: 420+
Hardware: Mac OS X 10.4
: P2 Enhancement
Assignee: Jay Civelli
URL: http://www.faqs.org/rfcs/rfc2557.html
Keywords: InRadar
Depends on: 59834 59946 60637
Blocks: 60013
  Show dependency treegraph
 
Reported: 2006-02-09 19:39 PST by David Kilzer (:ddkilzer)
Modified: 2011-05-24 15:09 PDT (History)
20 users (show)

See Also:


Attachments
MHTML support for WebKit GTK (37.22 KB, patch)
2008-09-02 10:29 PDT, Marco Barisione
no flags Details | Formatted Diff | Diff
Patch (181.97 KB, patch)
2011-04-29 10:20 PDT, Jay Civelli
no flags Details | Formatted Diff | Diff
Patch (195.44 KB, patch)
2011-04-29 14:25 PDT, Jay Civelli
no flags Details | Formatted Diff | Diff
Patch (175.16 KB, patch)
2011-05-06 18:51 PDT, Jay Civelli
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ec2-cr-linux-02 (1.18 MB, application/zip)
2011-05-06 19:49 PDT, WebKit Review Bot
no flags Details
Patch (12.06 KB, patch)
2011-05-11 10:29 PDT, Jay Civelli
no flags Details | Formatted Diff | Diff
Patch (97.89 KB, patch)
2011-05-13 10:15 PDT, Jay Civelli
no flags Details | Formatted Diff | Diff
Patch (107.42 KB, patch)
2011-05-13 15:23 PDT, Jay Civelli
no flags Details | Formatted Diff | Diff
Patch (239.73 KB, patch)
2011-05-16 16:10 PDT, Jay Civelli
no flags Details | Formatted Diff | Diff
Patch (239.77 KB, patch)
2011-05-16 16:40 PDT, Jay Civelli
no flags Details | Formatted Diff | Diff
Patch (241.62 KB, patch)
2011-05-17 15:17 PDT, Jay Civelli
no flags Details | Formatted Diff | Diff
Patch (241.35 KB, patch)
2011-05-18 10:45 PDT, Jay Civelli
no flags Details | Formatted Diff | Diff
Patch (242.11 KB, patch)
2011-05-18 12:48 PDT, Jay Civelli
no flags Details | Formatted Diff | Diff
Patch (119.95 KB, patch)
2011-05-19 10:11 PDT, Jay Civelli
no flags Details | Formatted Diff | Diff
Patch (117.21 KB, patch)
2011-05-19 13:38 PDT, Jay Civelli
no flags Details | Formatted Diff | Diff
Patch (117.87 KB, patch)
2011-05-19 15:10 PDT, Jay Civelli
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ec2-cr-linux-02 (1.47 MB, application/zip)
2011-05-19 16:43 PDT, WebKit Review Bot
no flags Details
Patch (123.23 KB, patch)
2011-05-20 10:54 PDT, Jay Civelli
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ec2-cr-linux-02 (1.51 MB, application/zip)
2011-05-20 11:55 PDT, WebKit Review Bot
no flags Details
Patch (119.35 KB, patch)
2011-05-23 11:15 PDT, Jay Civelli
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ec2-cr-linux-01 (1.35 MB, application/zip)
2011-05-23 12:30 PDT, WebKit Review Bot
no flags Details
Archive of layout-test-results from cr-jail-7 (201.58 KB, application/zip)
2011-05-23 15:11 PDT, WebKit Commit Bot
no flags Details
Patch (118.69 KB, patch)
2011-05-23 16:08 PDT, Jay Civelli
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ec2-cr-linux-02 (1.26 MB, application/zip)
2011-05-23 17:58 PDT, WebKit Review Bot
no flags Details
Patch (118.73 KB, patch)
2011-05-24 08:58 PDT, Jay Civelli
no flags Details | Formatted Diff | Diff
Patch for landing (131.15 KB, patch)
2011-05-24 09:33 PDT, Jay Civelli
no flags Details | Formatted Diff | Diff
Updated to the various FeatureDefines.xcconfig I forgot previously. (3.25 KB, patch)
2011-05-24 15:03 PDT, Jay Civelli
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description David Kilzer (:ddkilzer) 2006-02-09 19:39:40 PST
MSIE uses the MHTML format (based on RFC 2557) to save "web archives" of web pages.  WebKit should be able to read and display this format.

Note that because the MHTML format is essentially a mail message with a content-type of "multipart/related", a MIME decoder which supports quoted-printable and base64 decoding will be needed.  Mail.app probably contains usable decoding methods, but until they are moved to system libraries, a third-party library would have to be used or that wheel would have to be re-invented.  Pantomime is an example of such a third-party library.

http://www.collaboration-world.com/cgi-bin/project/index.cgi?pid=3
Comment 1 David Kilzer (:ddkilzer) 2006-02-09 19:57:34 PST
Created a Radar bug for a system-level APIs for MIME encoding and decoding in Mac OS X.
<rdar://problem/4440559>
Comment 2 Maciej Stachowiak 2006-02-11 21:35:44 PST
We should seriously consider making MHTML the default web archive format.
Comment 3 Dave Hyatt 2006-02-11 21:48:04 PST
Doesn't IE support both Web archives and saving the subresources into a folder (the way Firefox does)?  The latter would be more useful for Web developers and for people needing to reduce bugs.  It would be cool to support both. :)
Comment 4 David Kilzer (:ddkilzer) 2006-02-12 00:09:18 PST
(In reply to comment #3)
> Doesn't IE support both Web archives and saving the subresources into a folder
> (the way Firefox does)?  The latter would be more useful for Web developers and
> for people needing to reduce bugs.  It would be cool to support both. :)

See bug 7211.  :)
Comment 5 Joost de Valk (AlthA) 2006-02-12 02:54:09 PST
I vote for fixing bug 7211 first :)
Comment 6 Marco Barisione 2008-09-02 10:29:27 PDT
Created attachment 23119 [details]
MHTML support for WebKit GTK

This is an initial patch to add support for both reading and writing MHTML files in WebKit GTK. I tried to make most of the code and the logic shareable in different ports but of course some code is needed to use the native libraries to handle MIME.

Any comments? Should the platform-specific code moved somewhere else?

For GTK reviewers: note that the public API addition is just for testing, probably if the WebCore part is committed we need to add something like the WebArchive class.
Comment 7 David Kilzer (:ddkilzer) 2008-09-04 09:26:25 PDT
Comment on attachment 23119 [details]
MHTML support for WebKit GTK

Great work!!  I was secretly hoping you included a MIME parser and code to write out MIME-encoded files so all platforms could benefit from this feature, but it's nice that you used GMime for the GTK port and provided a way for other platforms to provide their own MIME parsing/creating solutions!  :)

This is a great first patch, but there are some things that need to be fixed before it lands. I'm going to mark this r-; see details below.

- Missing a ChangeLog entry.  See <http://webkit.org/coding/contributing.html>.

>@@ -543,8 +543,15 @@ PassRefPtr<ArchiveResource> DocumentLoader::subresource(const KURL& url) const
>         return archiveResourceForURL(url);
>         
>     CachedResource* resource = doc->docLoader()->cachedResource(url);
>+#if PLATFORM(GTK)
>+    // FIXME not checking resource->preloadResult() is just a hack to
>+    // make it work. I have no clue on what is going on here.
>+    if (!resource)
>+        return 0;
>+#else
>     if (!resource || resource->preloadResult() == CachedResource::PreloadReferenced)
>         return archiveResourceForURL(url);
>+#endif
>         
>     return ArchiveResource::create(resource->data(), url, resource->response());
> }

This definitely needs to be fixed before the patch lands.  Please talk to 'antti' in the #webkit channel on irc.freenode.net about preloading and how it affects this patch, or (if he's not available), beidson/bradee-oh.

>+        else
>+            // MHTML archives do not store the frame name, so we use the
>+            // URL as the name
>+            m_subframes.set((*iFrame)->mainResource()->url().string(), iFrame->get());

I think you can leave off ".string()" after "url()" here as there is a method to turn a KURL object into a String automatically.

>+PassRefPtr<Archive> ArchiveResourceCollection::popSubframeArchive(const String& frameName, const KURL& url)
> {
>-    return m_subframes.take(frameName);
>+    PassRefPtr<Archive> archive = m_subframes.take(frameName);

You want a RefPtr<Archive> here, not a PassRefPtr.

>+    if (archive)
>+        return archive;
>+
>+    return m_subframes.take(url.string());

Again, I think ".string()" may be left off here.

>+bool MHTMLWebArchive::init(SharedBuffer* data)
>+{
>+    ASSERT(data);
>+    if (!data)
>+        return false;
>+
>+    if (!parseRawDataRepresentation(data))
>+        return false;
>+
>+    ASSERT(!m_mainResourceType.isEmpty());
>+
>+    /* MHTML files do not have the concept of subframes, so resources are global
>+     * and are assigned to the main frame. To make the resources accessible
>+     * also to subframes we add all the subresources to all the subframes. */

This is probably fine to land this way, but please consider talking to beidson/bradee-oh in #webkit on irc.freenode.net about how to improve this data structure.

For example, would it make more sense to put all resources as main resources (which more closely matches the structure of MHTML files), but have all subframes look for their resources in the main resources if the archive is an MHTML archive?

>+    const Vector<RefPtr<Archive> >& subframes(subframeArchives());
>+    unsigned i;
>+    for (i = 0; i < subframes.size(); ++i) {

Please declare the loop variable inside the for loop, e.g., "for (unsigned i = 0; ...".

>+        const Vector<RefPtr<ArchiveResource> >& resources(subresources());
>+        unsigned j;
>+        for (j = 0; j < resources.size(); ++j)

Please declare the loop variable inside the loop.

>+        /* For some reasons IE sometimes saves sub frames using the "application/octet-stream" mime type instead of
>+         * text/html. In all the cases we tested these HTML files always start with a UTF-8 BOM followed by a
>+         * "<!DOCTYPE ...", so we use it as an heuristic to fix the mime type. */
>+        const char* htmlHead = "\xef\xbb\xbf<!DOCTYPE ";
>+        CString octetHead(buffer->data(), strlen(htmlHead));
>+        if (!strcmp(octetHead.data(), htmlHead))
>+            response.setMimeType("text/html");
>+    }

Is a better way to check for the UTF-8 BOM plus "<!DOCTYPE " here rather than hard-coding the UTF-8 BOM sequence?  Probably okay to land as is, but it would be nice if we could get the UTF-8 BOM from a predefined macro or const (perhaps in ICU?).

>+    // MHTML files do not store subframe names, so we use the URL as the name.
>+    PassRefPtr<ArchiveResource> resource = ArchiveResource::create(buffer, response.url(), response.mimeType(), response.textEncodingName(), response.url().string(), response);

You want a RefPtr here, not a PassRefPtr.  PassRefPtr types are only used in method argument declarations.  See this unlinked web page for more details:  <http://webkit.org/coding/RefPtr.html>.

Also, I think "response.url().string()" can just be "response.url()" here.

>+        PassRefPtr<MHTMLWebArchive> archive = create();

Again, RefPtr instead of PassRefPtr.

>+void MHTMLWebArchive::serializeResourceIfNew(ArchiveResource* resource, HashSet<String>& uniqueSubresources)
>+{
>+    /* MHTML files do not have the concept of subframes, so resources are global.
>+     * To make them accessible also from subframes they have been also assigned to all
>+     * the subframes (see MHTMLWebArchive::init). When saving a MHTML we want to avoid
>+     * to save the duplicated resources, so we use uniqueSubresources to keep track of
>+     * what was already included. */

If you switched to saving the resources only on the main resources, then I think you would get the uniqueneess for free, right?  Or does it use an array instead of a hash to store the resources?  (I thought it used a hash.)

>+    const Vector<RefPtr<ArchiveResource> >& resources(subresources());
>+    unsigned i;
>+    for (i = 0; i < resources.size(); ++i)

Please declare the loop variable inside the for loop.

>+        rootArchive->serializeResourceIfNew(resources[i].get(), uniqueSubresources);
>+
>+    const Vector<RefPtr<Archive> >& subframes(subframeArchives());
>+    for (i = 0; i < subframes.size(); ++i)

Please redeclare 'i' inside the loop variable here.

>+    PassRefPtr<SharedBuffer> sharedBuffer = serializeEnd();

Use RefPtr.

>+    return sharedBuffer;
>+}

This becomes "return sharedBuffer.release();" with the change to RefPtr above.

>+    RefPtr<MHTMLWebArchive> archive = create();
>+    archive->setMainResource(mainResource);
>+    
>+    for (unsigned i = 0; i < subresources.size(); ++i)
>+        archive->addSubresource(subresources[i]);
>+    
>+    for (unsigned i = 0; i < subframeArchives.size(); ++i)
>+        archive->addSubframeArchive(subframeArchives[i]);  

This would change if all resources were stored as main resources.

>+    // Without this hack a MHTML stream is saved inside another MHTML stream.
>+    // FIXME Figure out how can LegacyWebArchive work without a similar hack.
>+    if (documentLoader->mainResource()->mimeType() == "message/rfc822")
>+        return MHTMLWebArchive::create(documentLoader->mainResource()->data());

Because MHTML files don't have the concept of "subFrames" or "subFrameArchives", I don't think you need to include the if statement check.  Just return here unconditionally and remove the rest of the code in the method.  The if statement is always going to be true anyway, correct?

>+PassRefPtr<MHTMLWebArchive> MHTMLWebArchive::create(Range* range)

This method looks very similar to the LegacyWebArchive.cpp method with the same signature.  Is there any benefit to creating a super class for both LegacyWebArchive and MHTMLWebArchive to share code?

>+PassRefPtr<MHTMLWebArchive> MHTMLWebArchive::create(const String& markupString, Frame* frame, Vector<Node*>& nodes)
> [...]
>+PassRefPtr<MHTMLWebArchive> 
MHTMLWebArchive::createFromSelection(Frame* frame)

These two methods look identical, too.  I'd suggest making a super class to hold common code rather than duplicating it here.

These methods may also not be needed since I think they're used for copying documents (or document fragments) to the pasteboard on Mac OS X, which just happens to use the same format as *.webarchive files.

>+#if PLATFORM(GTK)
>+typedef struct _GMimeObject GMimeObject;
>+typedef struct _GMimePart GMimePart;
>+typedef struct _GMimeMultipart GMimeMultipart;
>+
>+namespace WebCore {
>+void handleMimeObject(GMimeObject*, void*);
>+}
>+#endif
>+
>+namespace WebCore {
>+
>+class Frame;
>+class Node;
>+class Range;
>+
>+class MHTMLWebArchive : public Archive {

I would rather see another #if PLATFORM(GTK)/#endif rather than two "namespace WebCore { }" statements here.

>+    PassRefPtr<SharedBuffer> sharedBuffer = SharedBuffer::create(buffer, bufferSize);

Use RefPtr.

>+    addMimeResource(sharedBuffer, response);

"sharedBuffer" becomes "sharedBuffer.release()".

>+    PassRefPtr<SharedBuffer> sharedBuffer = SharedBuffer::create(byteArray->data, byteArray->len);

Use RefPtr.

>+    return sharedBuffer;

Becomes "sharedBuffer.release()".

>+    PassRefPtr<MHTMLWebArchive> archive = MHTMLWebArchive::create(core(frame));
>+    if (!archive)
>+        return false;
>+
>+    PassRefPtr<SharedBuffer> buffer = archive->rawDataRepresentation();
>+    return g_file_set_contents(filename, buffer->data(), buffer->size(), 0);

Use RefPtr.  No other changes should be necessary.

As I said above, r- to fix the issues above, but a great first patch!!
Comment 8 zaheer 2008-11-18 22:11:30 PST
we have used this patch and found couple of places where it might lead to a crash

1- MHTMLWebArchive::serializeResourceIfNew - in this if resource url is empty, uniqueSubresources.contains(resource->url().string()) results in a crash

2- MHTMLWebArchive::serializeResource - here again resource data can be null and g_mime_part_set_content crashes.

Can you let us know if these are valid scenarios or if something weird is happening in our setup
Comment 9 Paanky 2009-01-22 13:35:40 PST
Used the patch. Found two issues with this. 
1. All the subresources are not saved. Subresources by mean I say about the embedded resources in the main page. For example some of the images are saved where as some are not.
I do understand that the contents added dynamically from script may not be saved, but static contents should be there.

2. If some subresources has not been saved in the file, then while loading it should not go to the network for accessing http resources. Since we are loading a local file then accessing network is not wiser, as much I understand.
Comment 10 Jay Civelli 2011-04-29 10:20:02 PDT
Created attachment 91695 [details]
Patch
Comment 11 Jay Civelli 2011-04-29 10:22:22 PDT
Here is a different patch that does the MHTML reading without using a specific library (and is platform independent).
Comment 12 Gustavo Noronha (kov) 2011-04-29 10:36:23 PDT
Attachment 91695 [details] did not build on gtk:
Build output: http://queues.webkit.org/results/8524219
Comment 13 Early Warning System Bot 2011-04-29 10:38:59 PDT
Attachment 91695 [details] did not build on qt:
Build output: http://queues.webkit.org/results/8516944
Comment 14 WebKit Review Bot 2011-04-29 10:47:02 PDT
Attachment 91695 [details] did not build on chromium:
Build output: http://queues.webkit.org/results/8516947
Comment 15 Build Bot 2011-04-29 11:07:20 PDT
Attachment 91695 [details] did not build on win:
Build output: http://queues.webkit.org/results/8524228
Comment 16 WebKit Review Bot 2011-04-29 12:42:18 PDT
Attachment 91695 [details] did not build on mac:
Build output: http://queues.webkit.org/results/8521405
Comment 17 Adam Barth 2011-04-29 14:11:02 PDT
Comment on attachment 91695 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=91695&action=review

Some drive-by comments.  This patch is too large to review effectively.  Please break this patch into smaller pieces and post them for review in individual bugs that block this bug.

> LayoutTests/mhtml/simple_page.html_original:7
> +<head>
> +<title>A simple page</title>
> +<script>
> +</script>
> +</head>

This should be a text test.

> LayoutTests/mhtml/top_frame.html_original:5
> +<head>
> +<title>A page that contains multiple nested frames</title>
> +</head>

This should all be text tests.  Render tree/image tests are expensive for the project to maintain.

> Source/WebCore/loader/archive/mhtml/MHTMLArchive.cpp:32
> +#include "config.h"
> +#include "MHTMLArchive.h"

This whole file should be conditional on ENABLE(MHTML), right?

> Source/WebCore/loader/archive/mhtml/MHTMLArchive.cpp:45
> +class MHTMLArchive::MHTMLParser {

Should this be in its own file?

> Source/WebCore/loader/archive/mhtml/MHTMLArchive.cpp:129
> +    if (resource->mimeType() == "text/html" || resource->mimeType() == "application/xhtml+xml") {

What about SVG?  This check seems too specific.

> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:41
> +namespace {

WebKit uses the static keyword instead of anonymous namespaces.

> Source/WebCore/loader/archive/mhtml/SharedBufferLineReader.h:45
> +// Utility class that allows to read CR-LF terminated lines from a SharedBuffer.
> +class SharedBufferLineReader {

This doesn't appear MHTML specific.  Should it go in a more general location?

> Source/WebCore/platform/text/QuotedPrintable.cpp:47
> +static int hexDigitToInt(char c)
> +{
> +    if (c >= '0' && c <= '9')
> +        return c - '0';
> +    if (c >= 'A' && c <= 'F')
> +        return c - 'A' + 10;
> +    if (c >= 'a' && c <= 'f')
> +        return c - 'a' + 10;
> +    return -1;
> +}

We must have this function elsewhere.
Comment 18 Jay Civelli 2011-04-29 14:25:23 PDT
Created attachment 91740 [details]
Patch
Comment 19 Adam Barth 2011-04-29 14:30:22 PDT
Comment on attachment 91740 [details]
Patch

Please divide this patch into smaller pieces, as requested above.
Comment 20 Adam Barth 2011-04-29 14:32:47 PDT
By the way, we're trying a new system for understanding new features that are going into WebKit.  I'm not sure the new policy is up on the web site yet, but at the contributor's meeting we decided to experiment with having folks who add new features email webkit-dev first.  This probably falls into that category.  This is a very new thing, so I'm not sure sure what will happen.  Here's an example of such an email <https://lists.webkit.org/pipermail/webkit-dev/2011-April/016630.html>.  (Note: This shouldn't block your work on this feature, it's just to check in with the community.)
Comment 21 Alexey Proskuryakov 2011-05-02 21:21:10 PDT
See also: <http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2011-0096>.
Comment 22 Jay Civelli 2011-05-06 18:51:52 PDT
Created attachment 92676 [details]
Patch
Comment 23 Jay Civelli 2011-05-06 18:55:54 PDT
Comment on attachment 91695 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=91695&action=review

>> LayoutTests/mhtml/top_frame.html_original:5
>> +</head>
> 
> This should all be text tests.  Render tree/image tests are expensive for the project to maintain.

OK, I am looking at it, will post a new patch making them all text tests soon.

>> Source/WebCore/loader/archive/mhtml/MHTMLArchive.cpp:32
>> +#include "MHTMLArchive.h"
> 
> This whole file should be conditional on ENABLE(MHTML), right?

OK, done for all MHTML files.

>> Source/WebCore/loader/archive/mhtml/MHTMLArchive.cpp:45
>> +class MHTMLArchive::MHTMLParser {
> 
> Should this be in its own file?

Good idea, it now is.

>> Source/WebCore/loader/archive/mhtml/MHTMLArchive.cpp:129
>> +    if (resource->mimeType() == "text/html" || resource->mimeType() == "application/xhtml+xml") {
> 
> What about SVG?  This check seems too specific.

I added the SVG case. Do you think there are other cases? Is this approach too restrictive?

>> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:41
>> +namespace {
> 
> WebKit uses the static keyword instead of anonymous namespaces.

Done.

>> Source/WebCore/loader/archive/mhtml/SharedBufferLineReader.h:45
>> +class SharedBufferLineReader {
> 
> This doesn't appear MHTML specific.  Should it go in a more general location?

Now in from my previous CL.

>> Source/WebCore/platform/text/QuotedPrintable.cpp:47
>> +}
> 
> We must have this function elsewhere.

This was done in a previous CL.
Comment 24 Early Warning System Bot 2011-05-06 19:03:16 PDT
Comment on attachment 92676 [details]
Patch

Attachment 92676 [details] did not pass qt-ews (qt):
Output: http://queues.webkit.org/results/8620120
Comment 25 Gyuyoung Kim 2011-05-06 19:23:17 PDT
Comment on attachment 92676 [details]
Patch

Attachment 92676 [details] did not pass efl-ews (efl):
Output: http://queues.webkit.org/results/8608196
Comment 26 WebKit Review Bot 2011-05-06 19:36:31 PDT
Comment on attachment 92676 [details]
Patch

Attachment 92676 [details] did not pass mac-ews (mac):
Output: http://queues.webkit.org/results/8615196
Comment 27 WebKit Review Bot 2011-05-06 19:49:40 PDT
Comment on attachment 92676 [details]
Patch

Attachment 92676 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/8620132

New failing tests:
mhtml/top_frame_ie.mht
mhtml/top_frame_unmht.mht
mhtml/page_with_image_unmht.mht
mhtml/simple_page_ie.mht
mhtml/simple_page_unmht.mht
mhtml/page_with_image_ie.mht
Comment 28 WebKit Review Bot 2011-05-06 19:49:46 PDT
Created attachment 92679 [details]
Archive of layout-test-results from ec2-cr-linux-02

The attached test failures were seen while running run-webkit-tests on the chromium-ews.
Bot: ec2-cr-linux-02  Port: Chromium  Platform: Linux-2.6.35-28-virtual-x86_64-with-Ubuntu-10.10-maverick
Comment 29 Alexey Proskuryakov 2011-05-06 19:51:17 PDT
>> +#include "MHTMLArchive.h"
> 
> This whole file should be conditional on ENABLE(MHTML), right?

FWIW, I much prefer when headers have include guards, so one can always include them. Guards in cpp files are more disruptive.

Many optional systems are implemented like this, but not all.
Comment 30 Collabora GTK+ EWS bot 2011-05-06 21:32:39 PDT
Comment on attachment 92676 [details]
Patch

Attachment 92676 [details] did not pass gtk-ews (gtk):
Output: http://queues.webkit.org/results/8610208
Comment 31 Adam Barth 2011-05-07 00:18:04 PDT
Comment on attachment 92676 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=92676&action=review

Generally, this looks like a nice approach.  I'm glad that you were able to leverage so much of the existing WebArchive code.  The comments below are all just details.  My main area of concern is the Content-Type parser because that's complex and tricky code.

> LayoutTests/mhtml/page_with_image_ie-expected.txt:6
> +    RenderBody {BODY} at (8,8) size 784x584
> +      RenderText {#text} at (0,0) size 126x18

Do these have to be render-tree tests?  It would be much better if they were text tests...

> Source/WebCore/loader/DocumentLoader.cpp:623
> +        ArchiveResource* resource = archiveResourceForURL(originalURL);
> +        if (resource) {

We commonly combine these two lines.

> Source/WebCore/loader/DocumentLoader.cpp:639
> +          return false;

bad indent

> Source/WebCore/loader/DocumentLoader.cpp:643
> +    case Archive::MHTML:
> +        return true; // Always fail the load for resources not included in the MHTML.

Does this need ENABLE(MHTML) ?

> Source/WebCore/loader/FrameLoader.cpp:1018
>  void FrameLoader::loadArchive(PassRefPtr<Archive> prpArchive)
>  {
> -    RefPtr<Archive> archive = prpArchive;
> +    m_archive = prpArchive;

In this case, you can rename prpArchive to archive.

> Source/WebCore/loader/FrameLoader.cpp:2323
> +    const UChar* mimeType = loader->responseMIMEType().characters();
> +    mimeType++;

I don't understand this code.

> Source/WebCore/loader/FrameLoader.h:504
> +    // 0 if the frame content is not from an archive.

I'd remove this comment.

> Source/WebCore/loader/archive/ArchiveFactory.cpp:68
> +    mimeTypes.set("multipart/related", archiveFactoryCreate<MHTMLArchive>);
> +    mimeTypes.set("message/rfc822", archiveFactoryCreate<MHTMLArchive>);

Does this need ENABLE(MHTML) ?

> Source/WebCore/loader/archive/ArchiveResourceCollection.cpp:92
> +        return archive;

archive.release()

> Source/WebCore/loader/archive/mhtml/MHTMLArchive.cpp:49
> +    // For security reasons we only load MHTML pages from the local file system.
> +    if (!url.isLocalFile())
> +        return false;

Interesting.  Is that true for the other archive formats?

> Source/WebCore/loader/archive/mhtml/MHTMLArchive.cpp:56
> +    // Since MHTML is a flat format, we need to make all franes aware of all resources.

I don't fully understand what this is doing, but it seems fine.

> Source/WebCore/loader/archive/mhtml/MHTMLArchive.cpp:57
> +    for (size_t i = 0; i < parser.frameCount(); i++) {

++i

> Source/WebCore/loader/archive/mhtml/MHTMLArchive.cpp:66
> +    return mainArchive;

mainArchive.release()

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:57
> +        return 0;

return nullptr;

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:60
> +    RefPtr<MHTMLArchive> archive = adoptRef(new MHTMLArchive());

MHTMLArchive should have a static create method.  (All RefCounted types in WebKey should be constructed via a static create method.)

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:74
> +            // Ignore IE nesting which makes little sense.

Can you elaborate a bit in this comment?  Does that mean we can't handle all MHTML files or that the nesting isn't required for parsing them?

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:101
> +    return archive;

archive.release()

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:112
> +            RefPtr<MHTMLArchive> subframe = adoptRef(new MHTMLArchive());

MHTMLArchive::create

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:117
> +    } else
> +        m_resources.append(resource);

I'd move this to the top with an early return.

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:120
> +PassRefPtr<ArchiveResource> MHTMLParser::parseNextPart(MIMEHeader* mimeHeader, const String& endOfPartBoundary, const String& endOfDocumentBoundary, bool* endOfArchiveReached)

endOfArchiveReached => WebKit uses non-const references for out parameters.  Nutty, I know.

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:122
> +    ASSERT(endOfPartBoundary.isEmpty() == endOfDocumentBoundary.isEmpty());

Should we ASSERT something about endOfArchiveReached here?

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:126
> +    bool partEndReached = false;

endOfPartReached for consistency with endOfArchiveReached ?

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:166
> +    return ArchiveResource::create(contentBuffer, KURL(ParsedURLString, mimeHeader->contentLocation()),

mimeHeader->contentLocation() is always an absolute URL?

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:178
> +PassRefPtr<MHTMLArchive> MHTMLParser::frameAt(size_t index) const
> +{
> +    return m_frames[index];
> +}

This function does not appear to transfer ownership.  It should just return an MHTMLArchive*

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:188
> +PassRefPtr<ArchiveResource> MHTMLParser::subResourceAt(size_t index) const
> +{
> +    return m_resources[index];
> +}

This function does not appear to transfer ownership.  It should just return an ArchiveResource*

> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:45
> +static void unquoteString(String* string)

This would probably be better if it just returned a String.  Strings in WebKit are immutable.

> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:70
> +                keyValuePairs->add(key, value.toString().stripWhiteSpace());

What if there are duplicates?  Do we keep the first or the last?  Maybe we error out?

> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:91
> +    //    if (buffer->isEOF())
> +    //        return 0;

Please remove commented out code.

> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:93
> +    RefPtr<MIMEHeader> mimeHeader = adoptRef(new MIMEHeader());

MIMEHeader::create

> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:94
> +    WTF::HashMap<String, String> keyValuePairs;

Consider creating a typedef for WTF::HashMap<String, String>

> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:95
> +    retrieveKeyValuePairs(buffer, &keyValuePairs);

We should just have retrieveKeyValuePairs return a HashMap rather than using out-params.  Also, you can probably skip the WTF on the HashMap.

> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:117
> +    return mimeHeader.release();

yay!

> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:119
> +bool MIMEHeader::isMultipart() const

Missing blank line before this line.

> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:152
> +String MIMEHeader::contentType() const
> +{
> +    return m_contentType;
> +}
> +
> +String MIMEHeader::charset() const
> +{
> +    return m_charset;
> +}
> +
> +MIMEHeader::Encoding MIMEHeader::contentTransferEncoding() const
> +{
> +    return m_contentTransferEncoding;
> +}
> +
> +String MIMEHeader::contentLocation() const
> +{
> +    return m_contentLocation;
> +}
> +
> +String MIMEHeader::multiPartType() const
> +{
> +    return m_multipartType;
> +}
> +
> +String MIMEHeader::endOfPartBoundary() const
> +{
> +    return m_endOfPartBoundary;
> +}

All these one-liners can go in the header.

> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:163
> +    String encoding(text);
> +    encoding.stripWhiteSpace();
> +    encoding.makeLower();

Do these modify the string?  I would have expected:

encoding = text.lower().stripWhiteSpace();

> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:174
> +void MIMEHeader::parseContentType(const String& contentType)

Don't we already have code for parsing ContentTypes?  Wow, this function is complicated.  I haven't review it in detail.  I'll take a look during the next pass.

> Source/WebCore/loader/archive/mhtml/MIMEHeader.h:38
> +// This class is a limited MIME parser used to parse the MIME headers of MHTML files.

I'd make this a FIXME comment.  Ideally we'd generalize this code to work for more than just MHTML.

> Source/WebCore/loader/archive/mhtml/MIMEHeader.h:46
> +    enum Encoding { QuotedPrintable, Base64, SevenBit, Unknown };

These should each get their own line.

> Source/WebCore/loader/archive/mhtml/MIMEHeader.h:49
> +    // Parses the MIME header from the passed buffer. The buffer is advanced to
> +    // the end of the header. Returns 0 if there was an error.

This comment can be removed.

> Tools/Scripts/webkitpy/layout_tests/port/test_files.py:47
>  _supported_file_extensions = set(['.html', '.shtml', '.xml', '.xhtml', '.xhtmlmp', '.pl',
> -                                  '.php', '.svg'])
> +                                  '.php', '.svg', '.mht'])

Does this need to do a similar check for the supported feature set?
Comment 32 Jay Civelli 2011-05-11 10:29:46 PDT
Created attachment 93144 [details]
Patch
Comment 33 Jay Civelli 2011-05-11 10:30:28 PDT
Comment on attachment 93144 [details]
Patch

Oops wrong bug.
Comment 34 Jay Civelli 2011-05-13 10:15:08 PDT
Created attachment 93475 [details]
Patch
Comment 35 Jay Civelli 2011-05-13 10:15:21 PDT
Comment on attachment 92676 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=92676&action=review

>> Source/WebCore/loader/DocumentLoader.cpp:623
>> +        if (resource) {
> 
> We commonly combine these two lines.

Done.

>> Source/WebCore/loader/DocumentLoader.cpp:639
>> +          return false;
> 
> bad indent

Corrected.

>> Source/WebCore/loader/DocumentLoader.cpp:643
>> +        return true; // Always fail the load for resources not included in the MHTML.
> 
> Does this need ENABLE(MHTML) ?

Added.

>> Source/WebCore/loader/FrameLoader.cpp:1018
>> +    m_archive = prpArchive;
> 
> In this case, you can rename prpArchive to archive.

Renamed.

>> Source/WebCore/loader/FrameLoader.cpp:2323
>> +    mimeType++;
> 
> I don't understand this code.

The worst part is... me neither...
Probably some left over from something I was doing earlier and that only got only partially removed.
Cleaned-up.

>> Source/WebCore/loader/FrameLoader.h:504
>> +    // 0 if the frame content is not from an archive.
> 
> I'd remove this comment.

Removed.

>> Source/WebCore/loader/archive/ArchiveFactory.cpp:68
>> +    mimeTypes.set("message/rfc822", archiveFactoryCreate<MHTMLArchive>);
> 
> Does this need ENABLE(MHTML) ?

Yes, added.

>> Source/WebCore/loader/archive/ArchiveResourceCollection.cpp:92
>> +        return archive;
> 
> archive.release()

Added.

>> Source/WebCore/loader/archive/mhtml/MHTMLArchive.cpp:49
>> +        return false;
> 
> Interesting.  Is that true for the other archive formats?

Probably, but I am afraid changing that policy for WebArchive could be disruptive.
(I actually sent an email to Brady Eidson regarding that, he told me that whether this was a problem was not clear).

>> Source/WebCore/loader/archive/mhtml/MHTMLArchive.cpp:57
>> +    for (size_t i = 0; i < parser.frameCount(); i++) {
> 
> ++i

Done.

>> Source/WebCore/loader/archive/mhtml/MHTMLArchive.cpp:66
>> +    return mainArchive;
> 
> mainArchive.release()

Done.

>> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:57
>> +        return 0;
> 
> return nullptr;

Doesn't nullptr only work with PassOwnPtr?

>> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:60
>> +    RefPtr<MHTMLArchive> archive = adoptRef(new MHTMLArchive());
> 
> MHTMLArchive should have a static create method.  (All RefCounted types in WebKey should be constructed via a static create method.)

It already has one with parameters, adding one with no parameters.

>> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:74
>> +            // Ignore IE nesting which makes little sense.
> 
> Can you elaborate a bit in this comment?  Does that mean we can't handle all MHTML files or that the nesting isn't required for parsing them?

Unmht does not do any nesting. IE does, but it does not seem to make sense. See LayoutTests/mhtml/top_frame_ie.mht as an example in this patch, they nest the top frame and its first child frame, but nothing else.

>> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:101
>> +    return archive;
> 
> archive.release()

Done.

>> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:112
>> +            RefPtr<MHTMLArchive> subframe = adoptRef(new MHTMLArchive());
> 
> MHTMLArchive::create

Done.

>> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:117
>> +        m_resources.append(resource);
> 
> I'd move this to the top with an early return.

Good idea.

>> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:120
>> +PassRefPtr<ArchiveResource> MHTMLParser::parseNextPart(MIMEHeader* mimeHeader, const String& endOfPartBoundary, const String& endOfDocumentBoundary, bool* endOfArchiveReached)
> 
> endOfArchiveReached => WebKit uses non-const references for out parameters.  Nutty, I know.

OK, changed endOfArchiveReached. I guess when passing out parameters as references, it is not possible to make them optional.

>> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:122
>> +    ASSERT(endOfPartBoundary.isEmpty() == endOfDocumentBoundary.isEmpty());
> 
> Should we ASSERT something about endOfArchiveReached here?

Not sure what to ASSERT, it is an optional out parameter.

>> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:126
>> +    bool partEndReached = false;
> 
> endOfPartReached for consistency with endOfArchiveReached ?

Good idea. Done.

>> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:166
>> +    return ArchiveResource::create(contentBuffer, KURL(ParsedURLString, mimeHeader->contentLocation()),
> 
> mimeHeader->contentLocation() is always an absolute URL?

Ah, good point. The spec says this could be a relative URL, even though IE and Unmht always generate absolute URLs.
Do you think it is acceptable to add a FIXME and fix as a follow up patch?

>> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:178
>> +}
> 
> This function does not appear to transfer ownership.  It should just return an MHTMLArchive*

OK.

>> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:188
>> +}
> 
> This function does not appear to transfer ownership.  It should just return an ArchiveResource*

OK.

>> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:45
>> +static void unquoteString(String* string)
> 
> This would probably be better if it just returned a String.  Strings in WebKit are immutable.

Done.

>> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:70
>> +                keyValuePairs->add(key, value.toString().stripWhiteSpace());
> 
> What if there are duplicates?  Do we keep the first or the last?  Maybe we error out?

Added a LOG_ERROR, keeping the last value.

>> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:91
>> +    //        return 0;
> 
> Please remove commented out code.

Oops, removed.

>> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:93
>> +    RefPtr<MIMEHeader> mimeHeader = adoptRef(new MIMEHeader());
> 
> MIMEHeader::create

Added.

>> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:94
>> +    WTF::HashMap<String, String> keyValuePairs;
> 
> Consider creating a typedef for WTF::HashMap<String, String>

Done.

>> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:95
>> +    retrieveKeyValuePairs(buffer, &keyValuePairs);
> 
> We should just have retrieveKeyValuePairs return a HashMap rather than using out-params.  Also, you can probably skip the WTF on the HashMap.

Done.

>> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:117
>> +    return mimeHeader.release();
> 
> yay!

I shall someday fully master the RefPtr class! :-)

>> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:119
>> +bool MIMEHeader::isMultipart() const
> 
> Missing blank line before this line.

Added.

>> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:152
>> +}
> 
> All these one-liners can go in the header.

Moved to header.

>> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:163
>> +    encoding.makeLower();
> 
> Do these modify the string?  I would have expected:
> 
> encoding = text.lower().stripWhiteSpace();

Ah, yes you are right!
Corrected.

>> Source/WebCore/loader/archive/mhtml/MIMEHeader.h:38
>> +// This class is a limited MIME parser used to parse the MIME headers of MHTML files.
> 
> I'd make this a FIXME comment.  Ideally we'd generalize this code to work for more than just MHTML.

Done.

>> Source/WebCore/loader/archive/mhtml/MIMEHeader.h:46
>> +    enum Encoding { QuotedPrintable, Base64, SevenBit, Unknown };
> 
> These should each get their own line.

Done.

>> Source/WebCore/loader/archive/mhtml/MIMEHeader.h:49
>> +    // the end of the header. Returns 0 if there was an error.
> 
> This comment can be removed.

Removed.

>> Tools/Scripts/webkitpy/layout_tests/port/test_files.py:47
>> +                                  '.php', '.svg', '.mht'])
> 
> Does this need to do a similar check for the supported feature set?

Done in webkit.py.
Comment 36 Early Warning System Bot 2011-05-13 10:50:25 PDT
Comment on attachment 93475 [details]
Patch

Attachment 93475 [details] did not pass qt-ews (qt):
Output: http://queues.webkit.org/results/8694223
Comment 37 Adam Barth 2011-05-13 10:54:17 PDT
Comment on attachment 93475 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=93475&action=review

This is getting close.  A few minor points (and some questions) below.  Thanks!

> Source/WebCore/CMakeLists.txt:1950
> +        loader/archive/mhtml/MIMEHeader.cpp

I thought this got moved to platform/network...

> Source/WebCore/GNUmakefile.list.am:2008
> +	Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp \
> +	Source/WebCore/loader/archive/mhtml/MIMEHeader.h \

Same here

> Source/WebCore/WebCore.gypi:3352
> +            'loader/archive/mhtml/MHTMLArchive.h',

The headers only need to appear once (in the first group of files).

> Source/WebCore/loader/MainResourceLoader.cpp:261
> +        bool isRemoteWebArchive = (equalIgnoringCase("application/x-webarchive", mimeType) || equalIgnoringCase("message/rfc822", mimeType))
> +            && !m_substituteData.isValid() && !url.isLocalFile();

does message/rfc822 need to be behind ENABLE(MHTML) ?

> Source/WebCore/loader/archive/Archive.h:47
> +    virtual ~Archive() { }

Virtual functions should be out-of-line.

> Source/WebCore/loader/archive/ArchiveResourceCollection.cpp:62
> +            m_subframes.set((*iFrame)->mainResource()->url().string(), iFrame->get());

Is the variable really called iFrame ?  *cries*

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:121
> +    if (mimeType != "text/html" && mimeType != "application/xhtml+xml" && mimeType != "image/svg+xml")  {

This line doesn't feel right.  Is there some function in WebCore we should be calling that knows about the set of MIME types you're trying to select here?

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:140
> +    ASSERT(endOfPartBoundary.isEmpty() == endOfDocumentBoundary.isEmpty());

I meant that we should add ASSERT(!endOfArchiveReached) here, right?

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:183
> +    return ArchiveResource::create(contentBuffer, KURL(ParsedURLString, mimeHeader.contentLocation()),

KURL(ParsedURLString, mimeHeader.contentLocation()) doesn't seem quite right either.  Is contentLocation always an absolute URL?

> Source/WebCore/loader/archive/mhtml/MIMEHeader.h:46
> +class MIMEHeader : public RefCounted<MIMEHeader> {

Am I confused?  I thought we did this class already...
Comment 38 Collabora GTK+ EWS bot 2011-05-13 11:36:44 PDT
Comment on attachment 93475 [details]
Patch

Attachment 93475 [details] did not pass gtk-ews (gtk):
Output: http://queues.webkit.org/results/8694229
Comment 39 WebKit Review Bot 2011-05-13 12:47:02 PDT
Comment on attachment 93475 [details]
Patch

Attachment 93475 [details] did not pass cr-mac-ews (chromium):
Output: http://queues.webkit.org/results/8700083
Comment 40 Jay Civelli 2011-05-13 15:23:20 PDT
Created attachment 93523 [details]
Patch
Comment 41 Jay Civelli 2011-05-13 15:23:28 PDT
Comment on attachment 93475 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=93475&action=review

>> Source/WebCore/CMakeLists.txt:1950
>> +        loader/archive/mhtml/MIMEHeader.cpp
> 
> I thought this got moved to platform/network...

Moved them in the new patch.

>> Source/WebCore/WebCore.gypi:3352
>> +            'loader/archive/mhtml/MHTMLArchive.h',
> 
> The headers only need to appear once (in the first group of files).

Ah, ok.

>> Source/WebCore/loader/MainResourceLoader.cpp:261
>> +            && !m_substituteData.isValid() && !url.isLocalFile();
> 
> does message/rfc822 need to be behind ENABLE(MHTML) ?

I think this is fine, since we don't want to let you load such a message from the network regardless of whether MHTML support is enabled or not.
(as this code does for webarchive).

>> Source/WebCore/loader/archive/Archive.h:47
>> +    virtual ~Archive() { }
> 
> Virtual functions should be out-of-line.

Done (sadly I had to create a CPP file to contain that dummy destructor).

>> Source/WebCore/loader/archive/ArchiveResourceCollection.cpp:62
>> +            m_subframes.set((*iFrame)->mainResource()->url().string(), iFrame->get());
> 
> Is the variable really called iFrame ?  *cries*

Don't cry! :-) I changed that code!

>> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:121
>> +    if (mimeType != "text/html" && mimeType != "application/xhtml+xml" && mimeType != "image/svg+xml")  {
> 
> This line doesn't feel right.  Is there some function in WebCore we should be calling that knows about the set of MIME types you're trying to select here?

Now using MIMETypeRegistry::isSupportedNonImageMIMEType() which should do it I believe.

>> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:140
>> +    ASSERT(endOfPartBoundary.isEmpty() == endOfDocumentBoundary.isEmpty());
> 
> I meant that we should add ASSERT(!endOfArchiveReached) here, right?

Since it is a reference, what would that really check?

>> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:183
>> +    return ArchiveResource::create(contentBuffer, KURL(ParsedURLString, mimeHeader.contentLocation()),
> 
> KURL(ParsedURLString, mimeHeader.contentLocation()) doesn't seem quite right either.  Is contentLocation always an absolute URL?

It's complicated... The spec says that it could be a relative URL and there are several ways it could be resolved.
One of them is to use the base specification if any in the part's content. This is tricky...
Otherwise there are other ways to resolve it, including using the surrounding parts location.
Practically, IE and Firefox (UnMHT) always use absolute URLs (from what I can tell), so I wonder if this could be a FIXME for now.
What do you think?
Comment 42 Adam Barth 2011-05-13 15:25:14 PDT
(In reply to comment #41)
> (From update of attachment 93475 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=93475&action=review
> 
> >> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:183
> >> +    return ArchiveResource::create(contentBuffer, KURL(ParsedURLString, mimeHeader.contentLocation()),
> > 
> > KURL(ParsedURLString, mimeHeader.contentLocation()) doesn't seem quite right either.  Is contentLocation always an absolute URL?
> 
> It's complicated... The spec says that it could be a relative URL and there are several ways it could be resolved.
> One of them is to use the base specification if any in the part's content. This is tricky...
> Otherwise there are other ways to resolve it, including using the surrounding parts location.
> Practically, IE and Firefox (UnMHT) always use absolute URLs (from what I can tell), so I wonder if this could be a FIXME for now.
> What do you think?

Recording that information in a FIXME seems valuable.
Comment 43 Jay Civelli 2011-05-13 15:58:43 PDT
Comment on attachment 93475 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=93475&action=review

>>>> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:183
>>>> +    return ArchiveResource::create(contentBuffer, KURL(ParsedURLString, mimeHeader.contentLocation()),
>>> 
>>> KURL(ParsedURLString, mimeHeader.contentLocation()) doesn't seem quite right either.  Is contentLocation always an absolute URL?
>> 
>> It's complicated... The spec says that it could be a relative URL and there are several ways it could be resolved.
>> One of them is to use the base specification if any in the part's content. This is tricky...
>> Otherwise there are other ways to resolve it, including using the surrounding parts location.
>> Practically, IE and Firefox (UnMHT) always use absolute URLs (from what I can tell), so I wonder if this could be a FIXME for now.
>> What do you think?
> 
> Recording that information in a FIXME seems valuable.

Yep, that's what I did in my patch.
Comment 44 Alexey Proskuryakov 2011-05-13 16:06:16 PDT
Comment on attachment 93523 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=93523&action=review

There is a bunch of non-trivial changes to Web Archive code. Did you verify that those still work in Safari nightly builds?

It's great to see many test cases! Is there one verifying that WebKit is not vulnerable to <http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2011-0096>?

I made some nitpick comments below, but I didn't review deeply.

> Source/WebCore/loader/archive/ArchiveFactory.cpp:71
> +    mimeTypes.set("message/rfc822", archiveFactoryCreate<MHTMLArchive>);

Where does message/rfc822 type come from in practice? Does CFNetwork know to give local .mht files this type?

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:155
> +            // Quoted-Printable require \r\n to be parsed correctly.

What's "correctly"? All code in WebCore to be correct, so this comment doesn't say much.

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:179
> +        LOG_ERROR("Invalid encoding for MHTML part.");

Why not log the encoding here to make debugging easier?

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:186
> +    KURL location = KURL(ParsedURLString, mimeHeader.contentLocation());

This doesn't seem right. The contentLocation string comes from external content, but this KURL constructor asserts if the string is not a canonical absolute URL in KURL sense.

If we only support absolute URLs, use a constructor with empty base URL: KURL(KURL(), mimeHeader.contentLocation());

> Source/WebCore/loader/archive/mhtml/MHTMLParser.h:50
> +    PassRefPtr<MHTMLArchive> parseArchive(PassRefPtr<MIMEHeader> = 0);

This should not be taking a PassRefPtr - this function is never called with a PassRefPtr argument.

> Source/WebCore/platform/network/MIMEHeader.h:69
> +    static PassRefPtr<MIMEHeader> create();
> +    MIMEHeader();

I'm not sure if we need a private create function. It would certainly be fairly unusual.

> Source/WebCore/platform/network/MIMEHeader.h:71
> +    static Encoding parseEncoding(const String&);

Should this be ContentEncoding? An unqualified "encoding" is usually about charsets.
Comment 45 Adam Barth 2011-05-13 16:06:17 PDT
Comment on attachment 93523 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=93523&action=review

> LayoutTests/mhtml/multi_frames_ie.mht:14
> +------=_NextPart_000_0005_01CC0FF0.EF60FBE0
> +Content-Type: image/png
> +Content-Transfer-Encoding: base64

Where are the results for these tests?
Comment 46 Jay Civelli 2011-05-16 16:07:29 PDT
Comment on attachment 93523 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=93523&action=review

>> LayoutTests/mhtml/multi_frames_ie.mht:14
>> +Content-Transfer-Encoding: base64
> 
> Where are the results for these tests?

Added.

>> Source/WebCore/loader/archive/ArchiveFactory.cpp:71
>> +    mimeTypes.set("message/rfc822", archiveFactoryCreate<MHTMLArchive>);
> 
> Where does message/rfc822 type come from in practice? Does CFNetwork know to give local .mht files this type?

CFNetwork does not return message/rfc822 for local .mht files.
I do my development on the Chromium port  which uses a different MIME type registry.
I investigated but I am not familiar with the CFNetwork, do you know how I can register a MIME type for a file extension?

>> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:155
>> +            // Quoted-Printable require \r\n to be parsed correctly.
> 
> What's "correctly"? All code in WebCore to be correct, so this comment doesn't say much.

I meant that the line reader removes the \r\n, but we need them for the content in this case as they are expected by the QuotedPrintable decoder.
I clarified the comment.

>> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:179
>> +        LOG_ERROR("Invalid encoding for MHTML part.");
> 
> Why not log the encoding here to make debugging easier?

Here we don't have access to the encoding string anymore (this is an enum).
I added a log error with the actual unknown encoding in the MIMEHeader where we parse the actual string.

>> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:186
>> +    KURL location = KURL(ParsedURLString, mimeHeader.contentLocation());
> 
> This doesn't seem right. The contentLocation string comes from external content, but this KURL constructor asserts if the string is not a canonical absolute URL in KURL sense.
> 
> If we only support absolute URLs, use a constructor with empty base URL: KURL(KURL(), mimeHeader.contentLocation());

Switched to KURL(KURL(), mimeHeader.contentLocation());

>> Source/WebCore/loader/archive/mhtml/MHTMLParser.h:50
>> +    PassRefPtr<MHTMLArchive> parseArchive(PassRefPtr<MIMEHeader> = 0);
> 
> This should not be taking a PassRefPtr - this function is never called with a PassRefPtr argument.

It is in MHTMLParser.cpp line 94.

>> Source/WebCore/platform/network/MIMEHeader.h:69
>> +    MIMEHeader();
> 
> I'm not sure if we need a private create function. It would certainly be fairly unusual.

OK, removed the create method.

>> Source/WebCore/platform/network/MIMEHeader.h:71
>> +    static Encoding parseEncoding(const String&);
> 
> Should this be ContentEncoding? An unqualified "encoding" is usually about charsets.

Done.
Comment 47 Jay Civelli 2011-05-16 16:09:47 PDT
(In reply to comment #44)
> (From update of attachment 93523 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=93523&action=review
> 
> There is a bunch of non-trivial changes to Web Archive code. Did you verify that those still work in Safari nightly builds?
Yes, I was able to save www.apple.com and reload it as a WebArchive.
Also all the webarchive layout tests still pass.  

> It's great to see many test cases! Is there one verifying that WebKit is not vulnerable to <http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2011-0096>?
From what I understand about the attack, I think we are safe as we only load MHTML from the local file system.
Comment 48 Jay Civelli 2011-05-16 16:10:34 PDT
Created attachment 93709 [details]
Patch
Comment 49 Jay Civelli 2011-05-16 16:40:03 PDT
Created attachment 93712 [details]
Patch
Comment 50 Early Warning System Bot 2011-05-16 16:55:33 PDT
Comment on attachment 93712 [details]
Patch

Attachment 93712 [details] did not pass qt-ews (qt):
Output: http://queues.webkit.org/results/8707098
Comment 51 Alexey Proskuryakov 2011-05-16 17:26:01 PDT
> I investigated but I am not familiar with the CFNetwork, do you know how I can register a MIME type for a file extension?

We have a default mapping in mediaMIMETypeMap() (MIMETypeRegistry.cpp), but it's only used for media. I'm also not sure how to best fix this.

Wikipedia says that the type is multipart/related, not message/rfc822. Which one is right? Should we perhaps just look at the extension, not at whatever type the loader synthesizes for local files?

Also, both .mht and .mhtml need to be supported, according to Wikipedia.

>>> +    PassRefPtr<MHTMLArchive> parseArchive(PassRefPtr<MIMEHeader> = 0);
>> This should not be taking a PassRefPtr - this function is never called with a PassRefPtr argument.
>
>It is in MHTMLParser.cpp line 94.

This doesn't seem accurate to me. The argument is RefPtr there, not a PassRefPtr.

In general, using PassRefPtr for arguments should be a rare exception. It's dangerous (untested code paths almost always end up with null dereferences), and has a very small impact on performance.
Comment 52 Collabora GTK+ EWS bot 2011-05-16 19:31:32 PDT
Comment on attachment 93712 [details]
Patch

Attachment 93712 [details] did not pass gtk-ews (gtk):
Output: http://queues.webkit.org/results/8704176
Comment 53 Jay Civelli 2011-05-17 15:17:23 PDT
Created attachment 93828 [details]
Patch
Comment 54 Jay Civelli 2011-05-17 15:31:28 PDT
(In reply to comment #51)
> > I investigated but I am not familiar with the CFNetwork, do you know how I can register a MIME type for a file extension?
> 
> We have a default mapping in mediaMIMETypeMap() (MIMETypeRegistry.cpp), but it's only used for media. I'm also not sure how to best fix this.
> Wikipedia says that the type is multipart/related, not message/rfc822. Which one is right? Should we perhaps just look at the extension, not at whatever type the loader synthesizes for local files?

OK, my thought is that we should probably stick to the current model where the port decides the file extension/MIME type mapping. It should be fairly easy for other ports to add that mapping to enable the feature. 
 
Regarding the MIME type, you are right, we should use multipart/related not rfc822. I removed reference to message/rfc822.

>
> Also, both .mht and .mhtml need to be supported, according to Wikipedia.
> 
> >>> +    PassRefPtr<MHTMLArchive> parseArchive(PassRefPtr<MIMEHeader> = 0);
> >> This should not be taking a PassRefPtr - this function is never called with a PassRefPtr argument.
> >
> >It is in MHTMLParser.cpp line 94.
> 
> This doesn't seem accurate to me. The argument is RefPtr there, not a PassRefPtr.
> 
> In general, using PassRefPtr for arguments should be a rare exception. It's dangerous (untested code paths almost always end up with null dereferences), and has a very small impact on performance.
Ah, you are right.
Changed it to a RefPtr.cd
Comment 55 Early Warning System Bot 2011-05-17 15:35:42 PDT
Comment on attachment 93828 [details]
Patch

Attachment 93828 [details] did not pass qt-ews (qt):
Output: http://queues.webkit.org/results/8711182
Comment 56 Alexey Proskuryakov 2011-05-17 15:45:19 PDT
> Changed it to a RefPtr.

RefPtr should never be used in an argument, see <http://www.webkit.org/coding/RefPtr.html>. This argument should be a plain pointer.
Comment 57 Collabora GTK+ EWS bot 2011-05-17 17:31:07 PDT
Comment on attachment 93828 [details]
Patch

Attachment 93828 [details] did not pass gtk-ews (gtk):
Output: http://queues.webkit.org/results/8702582
Comment 58 Jay Civelli 2011-05-18 10:24:37 PDT
(In reply to comment #56)
> > Changed it to a RefPtr.
> 
> RefPtr should never be used in an argument, see <http://www.webkit.org/coding/RefPtr.html>. This argument should be a plain pointer.

OK, I am a bit confused as I initially used a PassRefPtr just like http://www.webkit.org/coding/RefPtr.html suggests making sure I immediately assigned it to a RefPtr.
I am OK to change it to a raw pointer, but that means I'll have to do a little dance as the MIMEHeader if null is retrieved from the MIMEHeader::parseHeader class which returns a PassRefPtr that I would have to store in a RefPtr and then get the raw pointer out of it.
Comment 59 Alexey Proskuryakov 2011-05-18 10:38:54 PDT
> OK, I am a bit confused as I initially used a PassRefPtr just like 
> http://www.webkit.org/coding/RefPtr.html suggests

The document doesn't describe PassRefPtr usage in sufficient detail at the moment. 

> I am OK to change it to a raw pointer, but that means I'll have to do a little dance

I don't understand this comment. Maybe post a patch, and we'll see?
Comment 60 Jay Civelli 2011-05-18 10:45:15 PDT
Created attachment 93935 [details]
Patch
Comment 61 Jay Civelli 2011-05-18 11:52:01 PDT
(In reply to comment #59)
> > OK, I am a bit confused as I initially used a PassRefPtr just like 
> > http://www.webkit.org/coding/RefPtr.html suggests
> 
> The document doesn't describe PassRefPtr usage in sufficient detail at the moment. 
> 
> > I am OK to change it to a raw pointer, but that means I'll have to do a little dance
> 
> I don't understand this comment. Maybe post a patch, and we'll see?
I posted a patch. I forgot to rebase so it failed to build, but you can see what I mean.
Comment 62 Alexey Proskuryakov 2011-05-18 12:09:44 PDT
Thanks, I see what you mean. I'm not fully following why the header sometimes comes from outside, and other times doesn't. Depending on what the situation is, there can be different ways to make the code better:
- Split parseArchive into two functions - one that takes a pre-parsed MIMEHeader object, and another that doesn't. Only the latter would parse it internally.
- Always parse the header before calling parseArchive, so that it would never see a null argument.
- Or give refPtrHeader a better name, explaining how it's different from the header argument.

Currently, it's difficult to follow when the header argument may or may not be null.

Adam, would you be willing to take back the reviewing of this work? I only commented on a few things that stood out to me, and didn't review deeply.
Comment 63 Jay Civelli 2011-05-18 12:48:34 PDT
Created attachment 93970 [details]
Patch
Comment 64 Jay Civelli 2011-05-18 12:50:19 PDT
(In reply to comment #62)
> Thanks, I see what you mean. I'm not fully following why the header sometimes comes from outside, and other times doesn't. Depending on what the situation is, there can be different ways to make the code better:
> - Split parseArchive into two functions - one that takes a pre-parsed MIMEHeader object, and another that doesn't. Only the latter would parse it internally.
> - Always parse the header before calling parseArchive, so that it would never see a null argument.
> - Or give refPtrHeader a better name, explaining how it's different from the header argument.
Good suggestions. I want with #1 and made it 2 functions: parseArchive and parseArchiveWithHeader.
Comment 65 Early Warning System Bot 2011-05-18 13:14:38 PDT
Comment on attachment 93970 [details]
Patch

Attachment 93970 [details] did not pass qt-ews (qt):
Output: http://queues.webkit.org/results/8704791
Comment 66 Adam Barth 2011-05-18 13:31:04 PDT
> Adam, would you be willing to take back the reviewing of this work? I only commented on a few things that stood out to me, and didn't review deeply.

Sure.  I'd be happy to.
Comment 67 Adam Barth 2011-05-18 13:45:11 PDT
Comment on attachment 93970 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=93970&action=review

This looks basically done.  The only major issue is the use of render-tree tests.  I've also added some minor comments/questions below.  Thanks!

> LayoutTests/platform/chromium-mac/mhtml/multi_frames_ie-expected.txt:23
> +      RenderPartObject {IFRAME} at (0,172) size 304x154 [border: (2px inset #000000)]
> +        layer at (0,0) size 300x150
> +          RenderView at (0,0) size 300x150
> +        layer at (0,0) size 300x150
> +          RenderBlock {HTML} at (0,0) size 300x150
> +            RenderBody {BODY} at (8,8) size 284x134

We don't need render tree tests for this feature.  We can either use text tests or dump-as-markup tests.  Using render tree tests is too much of a maintenance burden.  I feel like I've made this comment every time I've reviewed this patch.

> Source/WebCore/WebCore.pro:3487
> +        loader/archive/Archive.cpp \

This seems slightly misplaced, but I guess Qt doesn't build with WebArchive support.

> Source/WebCore/WebCore.xcodeproj/project.pbxproj:-17019
> -				892CF207134C8BB300AAEDA1 /* JSStorageInfoQuotaCallback.cpp */,
> -				892CF208134C8BB300AAEDA1 /* JSStorageInfoQuotaCallback.h */,

These changes seem unrelated.  I was seeing them today too.  I'm not sure what their deal is.

> Source/WebCore/loader/DocumentLoader.cpp:658
> +        ASSERT(false);

ASSERT_NOT_REACHED().  You should be able to just remove the default case and rely upon the compiler to warn you if you forget a type.

> Source/WebCore/loader/DocumentLoader.cpp:661
> +    return false;

It's strange that both branches in the Archive::WebArchive lead to returning false.  Is that your intention?

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:125
> +    if (!MIMETypeRegistry::isSupportedNonImageMIMEType(mimeType) || MIMETypeRegistry::isSupportedJavaScriptMIMEType(mimeType) || mimeType == "text/css") { // Chromium considers CSS as a document suitable resource.

I'd just remove this comment.  Adding text/css here is a little goofy, but probably harmless.

> Source/WebCore/platform/network/MIMEHeader.cpp:68
> +        size_t semiColonIndex = line.find(L':');

No need for L here.

> Source/WebCore/platform/network/MIMEHeader.cpp:69
> +        if (semiColonIndex == notFound || (semiColonIndex == line.length() - 1)) {

Why isn't

Foo-bar:

A key-value pair?  It just seems like the value is the empty string...

> Source/WebCore/platform/network/MIMEHeader.cpp:78
> +        keyValuePairs.set(key, value.toString().stripWhiteSpace());

Yeah, especially that's then whitespace sensitive.  If you've got a trailing space, then you end up with a key-value pair with an empty value.

> Source/WebCore/platform/network/MIMEHeader.cpp:88
> +        ContentTypeParser contentTypeParser(makeString("Content-Type:", mimeParametersIterator->second));

It's strange that you need to synthesize a fake key name here.

> Source/WebCore/platform/network/MIMEHeader.cpp:107
> +        mimeHeader->m_contentTransferEncoding = parseContentTransferEncoding(mimeParametersIterator->second);

This one works more nicely.

> Source/WebCore/platform/network/MIMEHeader.h:38
> +// FIXME: This class is a limited MIME parser used to parse the MIME headers of MHTML files.

Super minor nit: I'd more this comment to right above the "class MIMEHeader" line.
Comment 68 Collabora GTK+ EWS bot 2011-05-18 17:19:51 PDT
Comment on attachment 93970 [details]
Patch

Attachment 93970 [details] did not pass gtk-ews (gtk):
Output: http://queues.webkit.org/results/8709673
Comment 69 Jay Civelli 2011-05-19 10:11:04 PDT
Created attachment 94082 [details]
Patch
Comment 70 Early Warning System Bot 2011-05-19 10:29:17 PDT
Comment on attachment 94082 [details]
Patch

Attachment 94082 [details] did not pass qt-ews (qt):
Output: http://queues.webkit.org/results/8716367
Comment 71 Jay Civelli 2011-05-19 10:46:48 PDT
Comment on attachment 93970 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=93970&action=review

>> LayoutTests/platform/chromium-mac/mhtml/multi_frames_ie-expected.txt:23
>> +            RenderBody {BODY} at (8,8) size 284x134
> 
> We don't need render tree tests for this feature.  We can either use text tests or dump-as-markup tests.  Using render tree tests is too much of a maintenance burden.  I feel like I've made this comment every time I've reviewed this patch.

Made them text test. Sorry it took so long.

>> Source/WebCore/WebCore.pro:3487
>> +        loader/archive/Archive.cpp \
> 
> This seems slightly misplaced, but I guess Qt doesn't build with WebArchive support.

Yeah, since MHTML is the only archive type, this won't compile without it.

>> Source/WebCore/WebCore.xcodeproj/project.pbxproj:-17019
>> -				892CF208134C8BB300AAEDA1 /* JSStorageInfoQuotaCallback.h */,
> 
> These changes seem unrelated.  I was seeing them today too.  I'm not sure what their deal is.

That's weird. I reverted my change, open the file in XCode and added my file and I still see those. Hopefully the bots will tell us if that causes anything bad.

>> Source/WebCore/loader/DocumentLoader.cpp:658
>> +        ASSERT(false);
> 
> ASSERT_NOT_REACHED().  You should be able to just remove the default case and rely upon the compiler to warn you if you forget a type.

Done.

>> Source/WebCore/loader/DocumentLoader.cpp:661
>> +    return false;
> 
> It's strange that both branches in the Archive::WebArchive lead to returning false.  Is that your intention?

Good point. I simplified that code (coming in my next patch where I'll hopefully have the qt build working).

>> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:125
>> +    if (!MIMETypeRegistry::isSupportedNonImageMIMEType(mimeType) || MIMETypeRegistry::isSupportedJavaScriptMIMEType(mimeType) || mimeType == "text/css") { // Chromium considers CSS as a document suitable resource.
> 
> I'd just remove this comment.  Adding text/css here is a little goofy, but probably harmless.

Removed comment.

>> Source/WebCore/platform/network/MIMEHeader.cpp:68
>> +        size_t semiColonIndex = line.find(L':');
> 
> No need for L here.

Removed.

>> Source/WebCore/platform/network/MIMEHeader.cpp:69
>> +        if (semiColonIndex == notFound || (semiColonIndex == line.length() - 1)) {
> 
> Why isn't
> 
> Foo-bar:
> 
> A key-value pair?  It just seems like the value is the empty string...

Good point, I am now just requiring a : but relaxed the "no empty value" constraint.

>> Source/WebCore/platform/network/MIMEHeader.cpp:88
>> +        ContentTypeParser contentTypeParser(makeString("Content-Type:", mimeParametersIterator->second));
> 
> It's strange that you need to synthesize a fake key name here.

Yeah, the ContentTypeParser expects the actual key.
Added a FIXME that I'll try to address in a follow-up patch to make the ContentTypeParser more flexible so it can also take just the ContenType value.

>> Source/WebCore/platform/network/MIMEHeader.h:38
>> +// FIXME: This class is a limited MIME parser used to parse the MIME headers of MHTML files.
> 
> Super minor nit: I'd more this comment to right above the "class MIMEHeader" line.

Done.
Comment 72 WebKit Review Bot 2011-05-19 10:51:24 PDT
Comment on attachment 94082 [details]
Patch

Attachment 94082 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/8721028
Comment 73 WebKit Review Bot 2011-05-19 10:52:31 PDT
Comment on attachment 94082 [details]
Patch

Attachment 94082 [details] did not pass mac-ews (mac):
Output: http://queues.webkit.org/results/8707937
Comment 74 Adam Barth 2011-05-19 11:11:13 PDT
Comment on attachment 94082 [details]
Patch

It's hard for me to read this patch again with fresh eyes, but I think we're in good shape now.  Thanks for changing the tests to be text tests.  It looks like you've still got some broken builds, but feel free to land once you've got that all sorted out.  Thanks Jay.
Comment 75 Collabora GTK+ EWS bot 2011-05-19 11:39:53 PDT
Comment on attachment 94082 [details]
Patch

Attachment 94082 [details] did not pass gtk-ews (gtk):
Output: http://queues.webkit.org/results/8711934
Comment 76 WebKit Review Bot 2011-05-19 12:23:47 PDT
Comment on attachment 94082 [details]
Patch

Attachment 94082 [details] did not pass cr-mac-ews (chromium):
Output: http://queues.webkit.org/results/8719148
Comment 77 Jay Civelli 2011-05-19 13:38:22 PDT
Created attachment 94107 [details]
Patch
Comment 78 Early Warning System Bot 2011-05-19 13:53:41 PDT
Comment on attachment 94107 [details]
Patch

Attachment 94107 [details] did not pass qt-ews (qt):
Output: http://queues.webkit.org/results/8707970
Comment 79 Jay Civelli 2011-05-19 15:10:21 PDT
Created attachment 94126 [details]
Patch
Comment 80 WebKit Review Bot 2011-05-19 16:43:41 PDT
Comment on attachment 94126 [details]
Patch

Attachment 94126 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/8719210
Comment 81 WebKit Review Bot 2011-05-19 16:43:48 PDT
Created attachment 94146 [details]
Archive of layout-test-results from ec2-cr-linux-02

The attached test failures were seen while running run-webkit-tests on the chromium-ews.
Bot: ec2-cr-linux-02  Port: Chromium  Platform: Linux-2.6.35-28-virtual-x86_64-with-Ubuntu-10.10-maverick
Comment 82 Collabora GTK+ EWS bot 2011-05-19 20:52:09 PDT
Comment on attachment 94126 [details]
Patch

Attachment 94126 [details] did not pass gtk-ews (gtk):
Output: http://queues.webkit.org/results/8717384
Comment 83 Jay Civelli 2011-05-20 10:54:29 PDT
Created attachment 94239 [details]
Patch
Comment 84 WebKit Review Bot 2011-05-20 11:55:38 PDT
Comment on attachment 94239 [details]
Patch

Attachment 94239 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/8722243

New failing tests:
mhtml/simple_page_unmht.mht
mhtml/page_with_image_ie.mht
mhtml/multi_frames_ie.mht
mhtml/multi_frames_unmht.mht
mhtml/page_with_image_unmht.mht
mhtml/simple_page_ie.mht
mhtml/page_with_css_and_js_ie.mht
mhtml/page_with_css_and_js_unmht.mht
Comment 85 WebKit Review Bot 2011-05-20 11:55:46 PDT
Created attachment 94253 [details]
Archive of layout-test-results from ec2-cr-linux-02

The attached test failures were seen while running run-webkit-tests on the chromium-ews.
Bot: ec2-cr-linux-02  Port: Chromium  Platform: Linux-2.6.35-28-virtual-x86_64-with-Ubuntu-10.10-maverick
Comment 86 Collabora GTK+ EWS bot 2011-05-20 14:48:56 PDT
Comment on attachment 94239 [details]
Patch

Attachment 94239 [details] did not pass gtk-ews (gtk):
Output: http://queues.webkit.org/results/8651038
Comment 87 Jay Civelli 2011-05-23 11:15:25 PDT
Created attachment 94447 [details]
Patch
Comment 88 WebKit Review Bot 2011-05-23 12:29:59 PDT
Comment on attachment 94447 [details]
Patch

Attachment 94447 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/8723866

New failing tests:
mhtml/simple_page_unmht.mht
mhtml/page_with_image_ie.mht
mhtml/multi_frames_ie.mht
mhtml/multi_frames_unmht.mht
mhtml/page_with_image_unmht.mht
mhtml/simple_page_ie.mht
mhtml/page_with_css_and_js_ie.mht
mhtml/page_with_css_and_js_unmht.mht
Comment 89 WebKit Review Bot 2011-05-23 12:30:07 PDT
Created attachment 94460 [details]
Archive of layout-test-results from ec2-cr-linux-01

The attached test failures were seen while running run-webkit-tests on the chromium-ews.
Bot: ec2-cr-linux-01  Port: Chromium  Platform: Linux-2.6.35-28-virtual-x86_64-with-Ubuntu-10.10-maverick
Comment 90 Jay Civelli 2011-05-23 13:57:56 PDT
dglazkov says cr-linux needs a clobber to turn green and that it should be safe to land like this.
Comment 91 Adam Barth 2011-05-23 14:03:59 PDT
Comment on attachment 94447 [details]
Patch

Ok.  I'm skeptical, but dglazkov is always right.
Comment 92 Dimitri Glazkov (Google) 2011-05-23 14:54:45 PDT
(In reply to comment #91)
> (From update of attachment 94447 [details])
> Ok.  I'm skeptical, but dglazkov is always right.

Oh noes.
Comment 93 WebKit Commit Bot 2011-05-23 15:11:17 PDT
Comment on attachment 94447 [details]
Patch

Rejecting attachment 94447 [details] from commit-queue.

Failed to run "['./Tools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '--bot-id=cr-jail-7', 'build-..." exit_code: 2

Last 500 characters of output:
kitpy/layout_tests/port/webkit_unittest.py", line 85, in test_skipped_directories_for_symbols
    self.assertEqual(result_directories, expected_directories)
AssertionError: set(['fast/wcss', 'mathml', 'compositing/webgl', 'fast/canvas/webgl', 'mhtml', 'http/tests/canvas/webgl']) != set(['fast/wcss', 'compositing/webgl', 'mathml', 'http/tests/canvas/webgl', 'fast/canvas/webgl'])

----------------------------------------------------------------------
Ran 1028 tests in 29.203s

FAILED (failures=1)

Full output: http://queues.webkit.org/results/8722974
Comment 94 WebKit Commit Bot 2011-05-23 15:11:26 PDT
Created attachment 94498 [details]
Archive of layout-test-results from cr-jail-7

The attached test failures were seen while running run-webkit-tests on the commit-queue.
Bot: cr-jail-7  Port: Mac  Platform: Mac OS X 10.6.7
Comment 95 Jay Civelli 2011-05-23 16:08:29 PDT
Created attachment 94512 [details]
Patch
Comment 96 WebKit Review Bot 2011-05-23 17:58:07 PDT
Comment on attachment 94512 [details]
Patch

Attachment 94512 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/8723945

New failing tests:
mhtml/simple_page_unmht.mht
mhtml/page_with_image_ie.mht
mhtml/multi_frames_ie.mht
mhtml/multi_frames_unmht.mht
mhtml/page_with_image_unmht.mht
mhtml/simple_page_ie.mht
mhtml/page_with_css_and_js_ie.mht
mhtml/page_with_css_and_js_unmht.mht
Comment 97 WebKit Review Bot 2011-05-23 17:58:15 PDT
Created attachment 94527 [details]
Archive of layout-test-results from ec2-cr-linux-02

The attached test failures were seen while running run-webkit-tests on the chromium-ews.
Bot: ec2-cr-linux-02  Port: Chromium  Platform: Linux-2.6.35-28-virtual-x86_64-with-Ubuntu-10.10-maverick
Comment 98 WebKit Commit Bot 2011-05-23 21:27:06 PDT
Comment on attachment 94512 [details]
Patch

Rejecting attachment 94512 [details] from commit-queue.

Failed to run "['./Tools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '--bot-id=cr-jail-7', 'land-a..." exit_code: 1

Last 500 characters of output:
owElement.h
	M	Source/WebCore/html/shadow/MeterShadowElement.cpp
	M	Source/WebCore/html/shadow/DetailsMarkerControl.h
	M	Source/WebCore/html/HTMLElement.h
	M	Source/WebCore/html/HTMLEmbedElement.h
	M	Source/WebCore/html/HTMLEmbedElement.cpp
	M	Source/WebCore/CMakeLists.txt
	M	Source/WebCore/WebCore.xcodeproj/project.pbxproj
r87125 = c3af04450a613e1379f0e1ef9e3bd9054191b2d8 (refs/remotes/trunk)
First, rewinding head to replay your work on top of it...
Fast-forwarded master to refs/remotes/trunk.

Full output: http://queues.webkit.org/results/8727482
Comment 99 Jay Civelli 2011-05-24 08:58:15 PDT
Created attachment 94618 [details]
Patch
Comment 100 Jay Civelli 2011-05-24 09:33:24 PDT
Created attachment 94627 [details]
Patch for landing
Comment 101 WebKit Commit Bot 2011-05-24 12:32:21 PDT
Comment on attachment 94627 [details]
Patch for landing

Clearing flags on attachment: 94627

Committed r87189: <http://trac.webkit.org/changeset/87189>
Comment 102 WebKit Commit Bot 2011-05-24 12:32:52 PDT
All reviewed patches have been landed.  Closing bug.
Comment 103 Adam Roben (:aroben) 2011-05-24 12:38:29 PDT
<rdar://problem/3714721>
Comment 104 Adam Roben (:aroben) 2011-05-24 12:40:30 PDT
I guess we should file a separate bug for the ability to save MHTML files?
Comment 105 Jay Civelli 2011-05-24 12:47:00 PDT
(In reply to comment #104)
> I guess we should file a separate bug for the ability to save MHTML files?

There is already one:
https://bugs.webkit.org/show_bug.cgi?id=7169
Comment 107 David Kilzer (:ddkilzer) 2011-05-24 14:21:58 PDT
(In reply to comment #101)
> (From update of attachment 94627 [details])
> Clearing flags on attachment: 94627
> 
> Committed r87189: <http://trac.webkit.org/changeset/87189>

When you edit one FeatureDefines.xcconfig, you need to edit them all (in JavaScriptCore, WebKit and WebKit2).  See the comment at the top of the file:

// The contents of this file must be kept in sync with FeatureDefines.xcconfig in JavaScriptCore,
// WebCore, WebKit and WebKit2.  Also the default values of the ENABLE_FEATURE_NAME macros in
// build-webkit should match the values below, but they do not need to be in the same order.
Comment 108 Jay Civelli 2011-05-24 14:41:40 PDT
(In reply to comment #107)
> (In reply to comment #101)
> > (From update of attachment 94627 [details] [details])
> > Clearing flags on attachment: 94627
> > 
> > Committed r87189: <http://trac.webkit.org/changeset/87189>
> 
> When you edit one FeatureDefines.xcconfig, you need to edit them all (in JavaScriptCore, WebKit and WebKit2).  See the comment at the top of the file:
> 
> // The contents of this file must be kept in sync with FeatureDefines.xcconfig in JavaScriptCore,
> // WebCore, WebKit and WebKit2.  Also the default values of the ENABLE_FEATURE_NAME macros in
> // build-webkit should match the values below, but they do not need to be in the same order.

Sorry I missed that.
I have a patch ready. Should I file a new bug for that?
Comment 109 David Kilzer (:ddkilzer) 2011-05-24 14:44:02 PDT
(In reply to comment #108)
> (In reply to comment #107)
> > (In reply to comment #101)
> > > (From update of attachment 94627 [details] [details] [details])
> > > Clearing flags on attachment: 94627
> > > 
> > > Committed r87189: <http://trac.webkit.org/changeset/87189>
> > 
> > When you edit one FeatureDefines.xcconfig, you need to edit them all (in JavaScriptCore, WebKit and WebKit2).  See the comment at the top of the file:
> > 
> > // The contents of this file must be kept in sync with FeatureDefines.xcconfig in JavaScriptCore,
> > // WebCore, WebKit and WebKit2.  Also the default values of the ENABLE_FEATURE_NAME macros in
> > // build-webkit should match the values below, but they do not need to be in the same order.
> 
> Sorry I missed that.
> I have a patch ready. Should I file a new bug for that?

Technically yes, but let's just say:  rs=me to fix it.  Thanks!
Comment 110 Jay Civelli 2011-05-24 15:03:40 PDT
Created attachment 94696 [details]
Updated to the various FeatureDefines.xcconfig I forgot previously.
Comment 111 Adam Barth 2011-05-24 15:09:40 PDT
Comment on attachment 94696 [details]
Updated to the various FeatureDefines.xcconfig I forgot previously.

Clearing flags on attachment: 94696

Committed r87213: <http://trac.webkit.org/changeset/87213>
Comment 112 Adam Barth 2011-05-24 15:09:57 PDT
All reviewed patches have been landed.  Closing bug.