Bug 7168 - Support reading of MHTML (multipart/related) web archives
: Support reading of MHTML (multipart/related) web archives
Status: RESOLVED FIXED
: WebKit
WebKit Misc.
: 420+
: Macintosh Mac OS X 10.4
: P2 Enhancement
Assigned To:
: http://www.faqs.org/rfcs/rfc2557.html
: InRadar, ReviewedForRadar
: 59834 59946 60637
: 60013
  Show dependency treegraph
 
Reported: 2006-02-09 19:39 PST by
Modified: 2011-05-24 15:09 PST (History)


Attachments
MHTML support for WebKit GTK (37.22 KB, patch)
2008-09-02 10:29 PST, Marco Barisione
no flags Review Patch | Details | Formatted Diff | Diff
Patch (181.97 KB, patch)
2011-04-29 10:20 PST, Jay Civelli
no flags Review Patch | Details | Formatted Diff | Diff
Patch (195.44 KB, patch)
2011-04-29 14:25 PST, Jay Civelli
no flags Review Patch | Details | Formatted Diff | Diff
Patch (175.16 KB, patch)
2011-05-06 18:51 PST, Jay Civelli
no flags Review Patch | Details | Formatted Diff | Diff
Archive of layout-test-results from ec2-cr-linux-02 (1.18 MB, application/zip)
2011-05-06 19:49 PST, WebKit Review Bot
no flags Details
Patch (12.06 KB, patch)
2011-05-11 10:29 PST, Jay Civelli
no flags Review Patch | Details | Formatted Diff | Diff
Patch (97.89 KB, patch)
2011-05-13 10:15 PST, Jay Civelli
no flags Review Patch | Details | Formatted Diff | Diff
Patch (107.42 KB, patch)
2011-05-13 15:23 PST, Jay Civelli
no flags Review Patch | Details | Formatted Diff | Diff
Patch (239.73 KB, patch)
2011-05-16 16:10 PST, Jay Civelli
no flags Review Patch | Details | Formatted Diff | Diff
Patch (239.77 KB, patch)
2011-05-16 16:40 PST, Jay Civelli
no flags Review Patch | Details | Formatted Diff | Diff
Patch (241.62 KB, patch)
2011-05-17 15:17 PST, Jay Civelli
no flags Review Patch | Details | Formatted Diff | Diff
Patch (241.35 KB, patch)
2011-05-18 10:45 PST, Jay Civelli
no flags Review Patch | Details | Formatted Diff | Diff
Patch (242.11 KB, patch)
2011-05-18 12:48 PST, Jay Civelli
no flags Review Patch | Details | Formatted Diff | Diff
Patch (119.95 KB, patch)
2011-05-19 10:11 PST, Jay Civelli
no flags Review Patch | Details | Formatted Diff | Diff
Patch (117.21 KB, patch)
2011-05-19 13:38 PST, Jay Civelli
no flags Review Patch | Details | Formatted Diff | Diff
Patch (117.87 KB, patch)
2011-05-19 15:10 PST, Jay Civelli
no flags Review Patch | Details | Formatted Diff | Diff
Archive of layout-test-results from ec2-cr-linux-02 (1.47 MB, application/zip)
2011-05-19 16:43 PST, WebKit Review Bot
no flags Details
Patch (123.23 KB, patch)
2011-05-20 10:54 PST, Jay Civelli
no flags Review Patch | Details | Formatted Diff | Diff
Archive of layout-test-results from ec2-cr-linux-02 (1.51 MB, application/zip)
2011-05-20 11:55 PST, WebKit Review Bot
no flags Details
Patch (119.35 KB, patch)
2011-05-23 11:15 PST, Jay Civelli
no flags Review Patch | Details | Formatted Diff | Diff
Archive of layout-test-results from ec2-cr-linux-01 (1.35 MB, application/zip)
2011-05-23 12:30 PST, WebKit Review Bot
no flags Details
Archive of layout-test-results from cr-jail-7 (201.58 KB, application/zip)
2011-05-23 15:11 PST, WebKit Commit Bot
no flags Details
Patch (118.69 KB, patch)
2011-05-23 16:08 PST, Jay Civelli
no flags Review Patch | Details | Formatted Diff | Diff
Archive of layout-test-results from ec2-cr-linux-02 (1.26 MB, application/zip)
2011-05-23 17:58 PST, WebKit Review Bot
no flags Details
Patch (118.73 KB, patch)
2011-05-24 08:58 PST, Jay Civelli
no flags Review Patch | Details | Formatted Diff | Diff
Patch for landing (131.15 KB, patch)
2011-05-24 09:33 PST, Jay Civelli
no flags Review Patch | Details | Formatted Diff | Diff
Updated to the various FeatureDefines.xcconfig I forgot previously. (3.25 KB, patch)
2011-05-24 15:03 PST, Jay Civelli
no flags Review Patch | Details | Formatted Diff | Diff


Note

You need to log in before you can comment on or make changes to this bug.


Description From 2006-02-09 19:39:40 PST
MSIE uses the MHTML format (based on RFC 2557) to save "web archives" of web pages.  WebKit should be able to read and display this format.

Note that because the MHTML format is essentially a mail message with a content-type of "multipart/related", a MIME decoder which supports quoted-printable and base64 decoding will be needed.  Mail.app probably contains usable decoding methods, but until they are moved to system libraries, a third-party library would have to be used or that wheel would have to be re-invented.  Pantomime is an example of such a third-party library.

http://www.collaboration-world.com/cgi-bin/project/index.cgi?pid=3
------- Comment #1 From 2006-02-09 19:57:34 PST -------
Created a Radar bug for a system-level APIs for MIME encoding and decoding in Mac OS X.
<rdar://problem/4440559>
------- Comment #2 From 2006-02-11 21:35:44 PST -------
We should seriously consider making MHTML the default web archive format.
------- Comment #3 From 2006-02-11 21:48:04 PST -------
Doesn't IE support both Web archives and saving the subresources into a folder (the way Firefox does)?  The latter would be more useful for Web developers and for people needing to reduce bugs.  It would be cool to support both. :)
------- Comment #4 From 2006-02-12 00:09:18 PST -------
(In reply to comment #3)
> Doesn't IE support both Web archives and saving the subresources into a folder
> (the way Firefox does)?  The latter would be more useful for Web developers and
> for people needing to reduce bugs.  It would be cool to support both. :)

See bug 7211.  :)
------- Comment #5 From 2006-02-12 02:54:09 PST -------
I vote for fixing bug 7211 first :)
------- Comment #6 From 2008-09-02 10:29:27 PST -------
Created an attachment (id=23119) [details]
MHTML support for WebKit GTK

This is an initial patch to add support for both reading and writing MHTML files in WebKit GTK. I tried to make most of the code and the logic shareable in different ports but of course some code is needed to use the native libraries to handle MIME.

Any comments? Should the platform-specific code moved somewhere else?

For GTK reviewers: note that the public API addition is just for testing, probably if the WebCore part is committed we need to add something like the WebArchive class.
------- Comment #7 From 2008-09-04 09:26:25 PST -------
(From update of attachment 23119 [details])
Great work!!  I was secretly hoping you included a MIME parser and code to write out MIME-encoded files so all platforms could benefit from this feature, but it's nice that you used GMime for the GTK port and provided a way for other platforms to provide their own MIME parsing/creating solutions!  :)

This is a great first patch, but there are some things that need to be fixed before it lands. I'm going to mark this r-; see details below.

- Missing a ChangeLog entry.  See <http://webkit.org/coding/contributing.html>.

>@@ -543,8 +543,15 @@ PassRefPtr<ArchiveResource> DocumentLoader::subresource(const KURL& url) const
>         return archiveResourceForURL(url);
>         
>     CachedResource* resource = doc->docLoader()->cachedResource(url);
>+#if PLATFORM(GTK)
>+    // FIXME not checking resource->preloadResult() is just a hack to
>+    // make it work. I have no clue on what is going on here.
>+    if (!resource)
>+        return 0;
>+#else
>     if (!resource || resource->preloadResult() == CachedResource::PreloadReferenced)
>         return archiveResourceForURL(url);
>+#endif
>         
>     return ArchiveResource::create(resource->data(), url, resource->response());
> }

This definitely needs to be fixed before the patch lands.  Please talk to 'antti' in the #webkit channel on irc.freenode.net about preloading and how it affects this patch, or (if he's not available), beidson/bradee-oh.

>+        else
>+            // MHTML archives do not store the frame name, so we use the
>+            // URL as the name
>+            m_subframes.set((*iFrame)->mainResource()->url().string(), iFrame->get());

I think you can leave off ".string()" after "url()" here as there is a method to turn a KURL object into a String automatically.

>+PassRefPtr<Archive> ArchiveResourceCollection::popSubframeArchive(const String& frameName, const KURL& url)
> {
>-    return m_subframes.take(frameName);
>+    PassRefPtr<Archive> archive = m_subframes.take(frameName);

You want a RefPtr<Archive> here, not a PassRefPtr.

>+    if (archive)
>+        return archive;
>+
>+    return m_subframes.take(url.string());

Again, I think ".string()" may be left off here.

>+bool MHTMLWebArchive::init(SharedBuffer* data)
>+{
>+    ASSERT(data);
>+    if (!data)
>+        return false;
>+
>+    if (!parseRawDataRepresentation(data))
>+        return false;
>+
>+    ASSERT(!m_mainResourceType.isEmpty());
>+
>+    /* MHTML files do not have the concept of subframes, so resources are global
>+     * and are assigned to the main frame. To make the resources accessible
>+     * also to subframes we add all the subresources to all the subframes. */

This is probably fine to land this way, but please consider talking to beidson/bradee-oh in #webkit on irc.freenode.net about how to improve this data structure.

For example, would it make more sense to put all resources as main resources (which more closely matches the structure of MHTML files), but have all subframes look for their resources in the main resources if the archive is an MHTML archive?

>+    const Vector<RefPtr<Archive> >& subframes(subframeArchives());
>+    unsigned i;
>+    for (i = 0; i < subframes.size(); ++i) {

Please declare the loop variable inside the for loop, e.g., "for (unsigned i = 0; ...".

>+        const Vector<RefPtr<ArchiveResource> >& resources(subresources());
>+        unsigned j;
>+        for (j = 0; j < resources.size(); ++j)

Please declare the loop variable inside the loop.

>+        /* For some reasons IE sometimes saves sub frames using the "application/octet-stream" mime type instead of
>+         * text/html. In all the cases we tested these HTML files always start with a UTF-8 BOM followed by a
>+         * "<!DOCTYPE ...", so we use it as an heuristic to fix the mime type. */
>+        const char* htmlHead = "\xef\xbb\xbf<!DOCTYPE ";
>+        CString octetHead(buffer->data(), strlen(htmlHead));
>+        if (!strcmp(octetHead.data(), htmlHead))
>+            response.setMimeType("text/html");
>+    }

Is a better way to check for the UTF-8 BOM plus "<!DOCTYPE " here rather than hard-coding the UTF-8 BOM sequence?  Probably okay to land as is, but it would be nice if we could get the UTF-8 BOM from a predefined macro or const (perhaps in ICU?).

>+    // MHTML files do not store subframe names, so we use the URL as the name.
>+    PassRefPtr<ArchiveResource> resource = ArchiveResource::create(buffer, response.url(), response.mimeType(), response.textEncodingName(), response.url().string(), response);

You want a RefPtr here, not a PassRefPtr.  PassRefPtr types are only used in method argument declarations.  See this unlinked web page for more details:  <http://webkit.org/coding/RefPtr.html>.

Also, I think "response.url().string()" can just be "response.url()" here.

>+        PassRefPtr<MHTMLWebArchive> archive = create();

Again, RefPtr instead of PassRefPtr.

>+void MHTMLWebArchive::serializeResourceIfNew(ArchiveResource* resource, HashSet<String>& uniqueSubresources)
>+{
>+    /* MHTML files do not have the concept of subframes, so resources are global.
>+     * To make them accessible also from subframes they have been also assigned to all
>+     * the subframes (see MHTMLWebArchive::init). When saving a MHTML we want to avoid
>+     * to save the duplicated resources, so we use uniqueSubresources to keep track of
>+     * what was already included. */

If you switched to saving the resources only on the main resources, then I think you would get the uniqueneess for free, right?  Or does it use an array instead of a hash to store the resources?  (I thought it used a hash.)

>+    const Vector<RefPtr<ArchiveResource> >& resources(subresources());
>+    unsigned i;
>+    for (i = 0; i < resources.size(); ++i)

Please declare the loop variable inside the for loop.

>+        rootArchive->serializeResourceIfNew(resources[i].get(), uniqueSubresources);
>+
>+    const Vector<RefPtr<Archive> >& subframes(subframeArchives());
>+    for (i = 0; i < subframes.size(); ++i)

Please redeclare 'i' inside the loop variable here.

>+    PassRefPtr<SharedBuffer> sharedBuffer = serializeEnd();

Use RefPtr.

>+    return sharedBuffer;
>+}

This becomes "return sharedBuffer.release();" with the change to RefPtr above.

>+    RefPtr<MHTMLWebArchive> archive = create();
>+    archive->setMainResource(mainResource);
>+    
>+    for (unsigned i = 0; i < subresources.size(); ++i)
>+        archive->addSubresource(subresources[i]);
>+    
>+    for (unsigned i = 0; i < subframeArchives.size(); ++i)
>+        archive->addSubframeArchive(subframeArchives[i]);  

This would change if all resources were stored as main resources.

>+    // Without this hack a MHTML stream is saved inside another MHTML stream.
>+    // FIXME Figure out how can LegacyWebArchive work without a similar hack.
>+    if (documentLoader->mainResource()->mimeType() == "message/rfc822")
>+        return MHTMLWebArchive::create(documentLoader->mainResource()->data());

Because MHTML files don't have the concept of "subFrames" or "subFrameArchives", I don't think you need to include the if statement check.  Just return here unconditionally and remove the rest of the code in the method.  The if statement is always going to be true anyway, correct?

>+PassRefPtr<MHTMLWebArchive> MHTMLWebArchive::create(Range* range)

This method looks very similar to the LegacyWebArchive.cpp method with the same signature.  Is there any benefit to creating a super class for both LegacyWebArchive and MHTMLWebArchive to share code?

>+PassRefPtr<MHTMLWebArchive> MHTMLWebArchive::create(const String& markupString, Frame* frame, Vector<Node*>& nodes)
> [...]
>+PassRefPtr<MHTMLWebArchive> 
MHTMLWebArchive::createFromSelection(Frame* frame)

These two methods look identical, too.  I'd suggest making a super class to hold common code rather than duplicating it here.

These methods may also not be needed since I think they're used for copying documents (or document fragments) to the pasteboard on Mac OS X, which just happens to use the same format as *.webarchive files.

>+#if PLATFORM(GTK)
>+typedef struct _GMimeObject GMimeObject;
>+typedef struct _GMimePart GMimePart;
>+typedef struct _GMimeMultipart GMimeMultipart;
>+
>+namespace WebCore {
>+void handleMimeObject(GMimeObject*, void*);
>+}
>+#endif
>+
>+namespace WebCore {
>+
>+class Frame;
>+class Node;
>+class Range;
>+
>+class MHTMLWebArchive : public Archive {

I would rather see another #if PLATFORM(GTK)/#endif rather than two "namespace WebCore { }" statements here.

>+    PassRefPtr<SharedBuffer> sharedBuffer = SharedBuffer::create(buffer, bufferSize);

Use RefPtr.

>+    addMimeResource(sharedBuffer, response);

"sharedBuffer" becomes "sharedBuffer.release()".

>+    PassRefPtr<SharedBuffer> sharedBuffer = SharedBuffer::create(byteArray->data, byteArray->len);

Use RefPtr.

>+    return sharedBuffer;

Becomes "sharedBuffer.release()".

>+    PassRefPtr<MHTMLWebArchive> archive = MHTMLWebArchive::create(core(frame));
>+    if (!archive)
>+        return false;
>+
>+    PassRefPtr<SharedBuffer> buffer = archive->rawDataRepresentation();
>+    return g_file_set_contents(filename, buffer->data(), buffer->size(), 0);

Use RefPtr.  No other changes should be necessary.

As I said above, r- to fix the issues above, but a great first patch!!
------- Comment #8 From 2008-11-18 22:11:30 PST -------
we have used this patch and found couple of places where it might lead to a crash

1- MHTMLWebArchive::serializeResourceIfNew - in this if resource url is empty, uniqueSubresources.contains(resource->url().string()) results in a crash

2- MHTMLWebArchive::serializeResource - here again resource data can be null and g_mime_part_set_content crashes.

Can you let us know if these are valid scenarios or if something weird is happening in our setup
------- Comment #9 From 2009-01-22 13:35:40 PST -------
Used the patch. Found two issues with this. 
1. All the subresources are not saved. Subresources by mean I say about the embedded resources in the main page. For example some of the images are saved where as some are not.
I do understand that the contents added dynamically from script may not be saved, but static contents should be there.

2. If some subresources has not been saved in the file, then while loading it should not go to the network for accessing http resources. Since we are loading a local file then accessing network is not wiser, as much I understand.
------- Comment #10 From 2011-04-29 10:20:02 PST -------
Created an attachment (id=91695) [details]
Patch
------- Comment #11 From 2011-04-29 10:22:22 PST -------
Here is a different patch that does the MHTML reading without using a specific library (and is platform independent).
------- Comment #12 From 2011-04-29 10:36:23 PST -------
Attachment 91695 [details] did not build on gtk:
Build output: http://queues.webkit.org/results/8524219
------- Comment #13 From 2011-04-29 10:38:59 PST -------
Attachment 91695 [details] did not build on qt:
Build output: http://queues.webkit.org/results/8516944
------- Comment #14 From 2011-04-29 10:47:02 PST -------
Attachment 91695 [details] did not build on chromium:
Build output: http://queues.webkit.org/results/8516947
------- Comment #15 From 2011-04-29 11:07:20 PST -------
Attachment 91695 [details] did not build on win:
Build output: http://queues.webkit.org/results/8524228
------- Comment #16 From 2011-04-29 12:42:18 PST -------
Attachment 91695 [details] did not build on mac:
Build output: http://queues.webkit.org/results/8521405
------- Comment #17 From 2011-04-29 14:11:02 PST -------
(From update of attachment 91695 [details])
View in context: https://bugs.webkit.org/attachment.cgi?id=91695&action=review

Some drive-by comments.  This patch is too large to review effectively.  Please break this patch into smaller pieces and post them for review in individual bugs that block this bug.

> LayoutTests/mhtml/simple_page.html_original:7
> +<head>
> +<title>A simple page</title>
> +<script>
> +</script>
> +</head>

This should be a text test.

> LayoutTests/mhtml/top_frame.html_original:5
> +<head>
> +<title>A page that contains multiple nested frames</title>
> +</head>

This should all be text tests.  Render tree/image tests are expensive for the project to maintain.

> Source/WebCore/loader/archive/mhtml/MHTMLArchive.cpp:32
> +#include "config.h"
> +#include "MHTMLArchive.h"

This whole file should be conditional on ENABLE(MHTML), right?

> Source/WebCore/loader/archive/mhtml/MHTMLArchive.cpp:45
> +class MHTMLArchive::MHTMLParser {

Should this be in its own file?

> Source/WebCore/loader/archive/mhtml/MHTMLArchive.cpp:129
> +    if (resource->mimeType() == "text/html" || resource->mimeType() == "application/xhtml+xml") {

What about SVG?  This check seems too specific.

> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:41
> +namespace {

WebKit uses the static keyword instead of anonymous namespaces.

> Source/WebCore/loader/archive/mhtml/SharedBufferLineReader.h:45
> +// Utility class that allows to read CR-LF terminated lines from a SharedBuffer.
> +class SharedBufferLineReader {

This doesn't appear MHTML specific.  Should it go in a more general location?

> Source/WebCore/platform/text/QuotedPrintable.cpp:47
> +static int hexDigitToInt(char c)
> +{
> +    if (c >= '0' && c <= '9')
> +        return c - '0';
> +    if (c >= 'A' && c <= 'F')
> +        return c - 'A' + 10;
> +    if (c >= 'a' && c <= 'f')
> +        return c - 'a' + 10;
> +    return -1;
> +}

We must have this function elsewhere.
------- Comment #18 From 2011-04-29 14:25:23 PST -------
Created an attachment (id=91740) [details]
Patch
------- Comment #19 From 2011-04-29 14:30:22 PST -------
(From update of attachment 91740 [details])
Please divide this patch into smaller pieces, as requested above.
------- Comment #20 From 2011-04-29 14:32:47 PST -------
By the way, we're trying a new system for understanding new features that are going into WebKit.  I'm not sure the new policy is up on the web site yet, but at the contributor's meeting we decided to experiment with having folks who add new features email webkit-dev first.  This probably falls into that category.  This is a very new thing, so I'm not sure sure what will happen.  Here's an example of such an email <https://lists.webkit.org/pipermail/webkit-dev/2011-April/016630.html>.  (Note: This shouldn't block your work on this feature, it's just to check in with the community.)
------- Comment #21 From 2011-05-02 21:21:10 PST -------
See also: <http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2011-0096>.
------- Comment #22 From 2011-05-06 18:51:52 PST -------
Created an attachment (id=92676) [details]
Patch
------- Comment #23 From 2011-05-06 18:55:54 PST -------
(From update of attachment 91695 [details])
View in context: https://bugs.webkit.org/attachment.cgi?id=91695&action=review

>> LayoutTests/mhtml/top_frame.html_original:5
>> +</head>
> 
> This should all be text tests.  Render tree/image tests are expensive for the project to maintain.

OK, I am looking at it, will post a new patch making them all text tests soon.

>> Source/WebCore/loader/archive/mhtml/MHTMLArchive.cpp:32
>> +#include "MHTMLArchive.h"
> 
> This whole file should be conditional on ENABLE(MHTML), right?

OK, done for all MHTML files.

>> Source/WebCore/loader/archive/mhtml/MHTMLArchive.cpp:45
>> +class MHTMLArchive::MHTMLParser {
> 
> Should this be in its own file?

Good idea, it now is.

>> Source/WebCore/loader/archive/mhtml/MHTMLArchive.cpp:129
>> +    if (resource->mimeType() == "text/html" || resource->mimeType() == "application/xhtml+xml") {
> 
> What about SVG?  This check seems too specific.

I added the SVG case. Do you think there are other cases? Is this approach too restrictive?

>> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:41
>> +namespace {
> 
> WebKit uses the static keyword instead of anonymous namespaces.

Done.

>> Source/WebCore/loader/archive/mhtml/SharedBufferLineReader.h:45
>> +class SharedBufferLineReader {
> 
> This doesn't appear MHTML specific.  Should it go in a more general location?

Now in from my previous CL.

>> Source/WebCore/platform/text/QuotedPrintable.cpp:47
>> +}
> 
> We must have this function elsewhere.

This was done in a previous CL.
------- Comment #24 From 2011-05-06 19:03:16 PST -------
(From update of attachment 92676 [details])
Attachment 92676 [details] did not pass qt-ews (qt):
Output: http://queues.webkit.org/results/8620120
------- Comment #25 From 2011-05-06 19:23:17 PST -------
(From update of attachment 92676 [details])
Attachment 92676 [details] did not pass efl-ews (efl):
Output: http://queues.webkit.org/results/8608196
------- Comment #26 From 2011-05-06 19:36:31 PST -------
(From update of attachment 92676 [details])
Attachment 92676 [details] did not pass mac-ews (mac):
Output: http://queues.webkit.org/results/8615196
------- Comment #27 From 2011-05-06 19:49:40 PST -------
(From update of attachment 92676 [details])
Attachment 92676 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/8620132

New failing tests:
mhtml/top_frame_ie.mht
mhtml/top_frame_unmht.mht
mhtml/page_with_image_unmht.mht
mhtml/simple_page_ie.mht
mhtml/simple_page_unmht.mht
mhtml/page_with_image_ie.mht
------- Comment #28 From 2011-05-06 19:49:46 PST -------
Created an attachment (id=92679) [details]
Archive of layout-test-results from ec2-cr-linux-02

The attached test failures were seen while running run-webkit-tests on the chromium-ews.
Bot: ec2-cr-linux-02  Port: Chromium  Platform: Linux-2.6.35-28-virtual-x86_64-with-Ubuntu-10.10-maverick
------- Comment #29 From 2011-05-06 19:51:17 PST -------
>> +#include "MHTMLArchive.h"
> 
> This whole file should be conditional on ENABLE(MHTML), right?

FWIW, I much prefer when headers have include guards, so one can always include them. Guards in cpp files are more disruptive.

Many optional systems are implemented like this, but not all.
------- Comment #30 From 2011-05-06 21:32:39 PST -------
(From update of attachment 92676 [details])
Attachment 92676 [details] did not pass gtk-ews (gtk):
Output: http://queues.webkit.org/results/8610208
------- Comment #31 From 2011-05-07 00:18:04 PST -------
(From update of attachment 92676 [details])
View in context: https://bugs.webkit.org/attachment.cgi?id=92676&action=review

Generally, this looks like a nice approach.  I'm glad that you were able to leverage so much of the existing WebArchive code.  The comments below are all just details.  My main area of concern is the Content-Type parser because that's complex and tricky code.

> LayoutTests/mhtml/page_with_image_ie-expected.txt:6
> +    RenderBody {BODY} at (8,8) size 784x584
> +      RenderText {#text} at (0,0) size 126x18

Do these have to be render-tree tests?  It would be much better if they were text tests...

> Source/WebCore/loader/DocumentLoader.cpp:623
> +        ArchiveResource* resource = archiveResourceForURL(originalURL);
> +        if (resource) {

We commonly combine these two lines.

> Source/WebCore/loader/DocumentLoader.cpp:639
> +          return false;

bad indent

> Source/WebCore/loader/DocumentLoader.cpp:643
> +    case Archive::MHTML:
> +        return true; // Always fail the load for resources not included in the MHTML.

Does this need ENABLE(MHTML) ?

> Source/WebCore/loader/FrameLoader.cpp:1018
>  void FrameLoader::loadArchive(PassRefPtr<Archive> prpArchive)
>  {
> -    RefPtr<Archive> archive = prpArchive;
> +    m_archive = prpArchive;

In this case, you can rename prpArchive to archive.

> Source/WebCore/loader/FrameLoader.cpp:2323
> +    const UChar* mimeType = loader->responseMIMEType().characters();
> +    mimeType++;

I don't understand this code.

> Source/WebCore/loader/FrameLoader.h:504
> +    // 0 if the frame content is not from an archive.

I'd remove this comment.

> Source/WebCore/loader/archive/ArchiveFactory.cpp:68
> +    mimeTypes.set("multipart/related", archiveFactoryCreate<MHTMLArchive>);
> +    mimeTypes.set("message/rfc822", archiveFactoryCreate<MHTMLArchive>);

Does this need ENABLE(MHTML) ?

> Source/WebCore/loader/archive/ArchiveResourceCollection.cpp:92
> +        return archive;

archive.release()

> Source/WebCore/loader/archive/mhtml/MHTMLArchive.cpp:49
> +    // For security reasons we only load MHTML pages from the local file system.
> +    if (!url.isLocalFile())
> +        return false;

Interesting.  Is that true for the other archive formats?

> Source/WebCore/loader/archive/mhtml/MHTMLArchive.cpp:56
> +    // Since MHTML is a flat format, we need to make all franes aware of all resources.

I don't fully understand what this is doing, but it seems fine.

> Source/WebCore/loader/archive/mhtml/MHTMLArchive.cpp:57
> +    for (size_t i = 0; i < parser.frameCount(); i++) {

++i

> Source/WebCore/loader/archive/mhtml/MHTMLArchive.cpp:66
> +    return mainArchive;

mainArchive.release()

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:57
> +        return 0;

return nullptr;

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:60
> +    RefPtr<MHTMLArchive> archive = adoptRef(new MHTMLArchive());

MHTMLArchive should have a static create method.  (All RefCounted types in WebKey should be constructed via a static create method.)

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:74
> +            // Ignore IE nesting which makes little sense.

Can you elaborate a bit in this comment?  Does that mean we can't handle all MHTML files or that the nesting isn't required for parsing them?

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:101
> +    return archive;

archive.release()

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:112
> +            RefPtr<MHTMLArchive> subframe = adoptRef(new MHTMLArchive());

MHTMLArchive::create

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:117
> +    } else
> +        m_resources.append(resource);

I'd move this to the top with an early return.

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:120
> +PassRefPtr<ArchiveResource> MHTMLParser::parseNextPart(MIMEHeader* mimeHeader, const String& endOfPartBoundary, const String& endOfDocumentBoundary, bool* endOfArchiveReached)

endOfArchiveReached => WebKit uses non-const references for out parameters.  Nutty, I know.

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:122
> +    ASSERT(endOfPartBoundary.isEmpty() == endOfDocumentBoundary.isEmpty());

Should we ASSERT something about endOfArchiveReached here?

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:126
> +    bool partEndReached = false;

endOfPartReached for consistency with endOfArchiveReached ?

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:166
> +    return ArchiveResource::create(contentBuffer, KURL(ParsedURLString, mimeHeader->contentLocation()),

mimeHeader->contentLocation() is always an absolute URL?

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:178
> +PassRefPtr<MHTMLArchive> MHTMLParser::frameAt(size_t index) const
> +{
> +    return m_frames[index];
> +}

This function does not appear to transfer ownership.  It should just return an MHTMLArchive*

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:188
> +PassRefPtr<ArchiveResource> MHTMLParser::subResourceAt(size_t index) const
> +{
> +    return m_resources[index];
> +}

This function does not appear to transfer ownership.  It should just return an ArchiveResource*

> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:45
> +static void unquoteString(String* string)

This would probably be better if it just returned a String.  Strings in WebKit are immutable.

> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:70
> +                keyValuePairs->add(key, value.toString().stripWhiteSpace());

What if there are duplicates?  Do we keep the first or the last?  Maybe we error out?

> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:91
> +    //    if (buffer->isEOF())
> +    //        return 0;

Please remove commented out code.

> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:93
> +    RefPtr<MIMEHeader> mimeHeader = adoptRef(new MIMEHeader());

MIMEHeader::create

> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:94
> +    WTF::HashMap<String, String> keyValuePairs;

Consider creating a typedef for WTF::HashMap<String, String>

> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:95
> +    retrieveKeyValuePairs(buffer, &keyValuePairs);

We should just have retrieveKeyValuePairs return a HashMap rather than using out-params.  Also, you can probably skip the WTF on the HashMap.

> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:117
> +    return mimeHeader.release();

yay!

> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:119
> +bool MIMEHeader::isMultipart() const

Missing blank line before this line.

> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:152
> +String MIMEHeader::contentType() const
> +{
> +    return m_contentType;
> +}
> +
> +String MIMEHeader::charset() const
> +{
> +    return m_charset;
> +}
> +
> +MIMEHeader::Encoding MIMEHeader::contentTransferEncoding() const
> +{
> +    return m_contentTransferEncoding;
> +}
> +
> +String MIMEHeader::contentLocation() const
> +{
> +    return m_contentLocation;
> +}
> +
> +String MIMEHeader::multiPartType() const
> +{
> +    return m_multipartType;
> +}
> +
> +String MIMEHeader::endOfPartBoundary() const
> +{
> +    return m_endOfPartBoundary;
> +}

All these one-liners can go in the header.

> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:163
> +    String encoding(text);
> +    encoding.stripWhiteSpace();
> +    encoding.makeLower();

Do these modify the string?  I would have expected:

encoding = text.lower().stripWhiteSpace();

> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:174
> +void MIMEHeader::parseContentType(const String& contentType)

Don't we already have code for parsing ContentTypes?  Wow, this function is complicated.  I haven't review it in detail.  I'll take a look during the next pass.

> Source/WebCore/loader/archive/mhtml/MIMEHeader.h:38
> +// This class is a limited MIME parser used to parse the MIME headers of MHTML files.

I'd make this a FIXME comment.  Ideally we'd generalize this code to work for more than just MHTML.

> Source/WebCore/loader/archive/mhtml/MIMEHeader.h:46
> +    enum Encoding { QuotedPrintable, Base64, SevenBit, Unknown };

These should each get their own line.

> Source/WebCore/loader/archive/mhtml/MIMEHeader.h:49
> +    // Parses the MIME header from the passed buffer. The buffer is advanced to
> +    // the end of the header. Returns 0 if there was an error.

This comment can be removed.

> Tools/Scripts/webkitpy/layout_tests/port/test_files.py:47
>  _supported_file_extensions = set(['.html', '.shtml', '.xml', '.xhtml', '.xhtmlmp', '.pl',
> -                                  '.php', '.svg'])
> +                                  '.php', '.svg', '.mht'])

Does this need to do a similar check for the supported feature set?
------- Comment #32 From 2011-05-11 10:29:46 PST -------
Created an attachment (id=93144) [details]
Patch
------- Comment #33 From 2011-05-11 10:30:28 PST -------
(From update of attachment 93144 [details])
Oops wrong bug.
------- Comment #34 From 2011-05-13 10:15:08 PST -------
Created an attachment (id=93475) [details]
Patch
------- Comment #35 From 2011-05-13 10:15:21 PST -------
(From update of attachment 92676 [details])
View in context: https://bugs.webkit.org/attachment.cgi?id=92676&action=review

>> Source/WebCore/loader/DocumentLoader.cpp:623
>> +        if (resource) {
> 
> We commonly combine these two lines.

Done.

>> Source/WebCore/loader/DocumentLoader.cpp:639
>> +          return false;
> 
> bad indent

Corrected.

>> Source/WebCore/loader/DocumentLoader.cpp:643
>> +        return true; // Always fail the load for resources not included in the MHTML.
> 
> Does this need ENABLE(MHTML) ?

Added.

>> Source/WebCore/loader/FrameLoader.cpp:1018
>> +    m_archive = prpArchive;
> 
> In this case, you can rename prpArchive to archive.

Renamed.

>> Source/WebCore/loader/FrameLoader.cpp:2323
>> +    mimeType++;
> 
> I don't understand this code.

The worst part is... me neither...
Probably some left over from something I was doing earlier and that only got only partially removed.
Cleaned-up.

>> Source/WebCore/loader/FrameLoader.h:504
>> +    // 0 if the frame content is not from an archive.
> 
> I'd remove this comment.

Removed.

>> Source/WebCore/loader/archive/ArchiveFactory.cpp:68
>> +    mimeTypes.set("message/rfc822", archiveFactoryCreate<MHTMLArchive>);
> 
> Does this need ENABLE(MHTML) ?

Yes, added.

>> Source/WebCore/loader/archive/ArchiveResourceCollection.cpp:92
>> +        return archive;
> 
> archive.release()

Added.

>> Source/WebCore/loader/archive/mhtml/MHTMLArchive.cpp:49
>> +        return false;
> 
> Interesting.  Is that true for the other archive formats?

Probably, but I am afraid changing that policy for WebArchive could be disruptive.
(I actually sent an email to Brady Eidson regarding that, he told me that whether this was a problem was not clear).

>> Source/WebCore/loader/archive/mhtml/MHTMLArchive.cpp:57
>> +    for (size_t i = 0; i < parser.frameCount(); i++) {
> 
> ++i

Done.

>> Source/WebCore/loader/archive/mhtml/MHTMLArchive.cpp:66
>> +    return mainArchive;
> 
> mainArchive.release()

Done.

>> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:57
>> +        return 0;
> 
> return nullptr;

Doesn't nullptr only work with PassOwnPtr?

>> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:60
>> +    RefPtr<MHTMLArchive> archive = adoptRef(new MHTMLArchive());
> 
> MHTMLArchive should have a static create method.  (All RefCounted types in WebKey should be constructed via a static create method.)

It already has one with parameters, adding one with no parameters.

>> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:74
>> +            // Ignore IE nesting which makes little sense.
> 
> Can you elaborate a bit in this comment?  Does that mean we can't handle all MHTML files or that the nesting isn't required for parsing them?

Unmht does not do any nesting. IE does, but it does not seem to make sense. See LayoutTests/mhtml/top_frame_ie.mht as an example in this patch, they nest the top frame and its first child frame, but nothing else.

>> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:101
>> +    return archive;
> 
> archive.release()

Done.

>> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:112
>> +            RefPtr<MHTMLArchive> subframe = adoptRef(new MHTMLArchive());
> 
> MHTMLArchive::create

Done.

>> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:117
>> +        m_resources.append(resource);
> 
> I'd move this to the top with an early return.

Good idea.

>> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:120
>> +PassRefPtr<ArchiveResource> MHTMLParser::parseNextPart(MIMEHeader* mimeHeader, const String& endOfPartBoundary, const String& endOfDocumentBoundary, bool* endOfArchiveReached)
> 
> endOfArchiveReached => WebKit uses non-const references for out parameters.  Nutty, I know.

OK, changed endOfArchiveReached. I guess when passing out parameters as references, it is not possible to make them optional.

>> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:122
>> +    ASSERT(endOfPartBoundary.isEmpty() == endOfDocumentBoundary.isEmpty());
> 
> Should we ASSERT something about endOfArchiveReached here?

Not sure what to ASSERT, it is an optional out parameter.

>> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:126
>> +    bool partEndReached = false;
> 
> endOfPartReached for consistency with endOfArchiveReached ?

Good idea. Done.

>> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:166
>> +    return ArchiveResource::create(contentBuffer, KURL(ParsedURLString, mimeHeader->contentLocation()),
> 
> mimeHeader->contentLocation() is always an absolute URL?

Ah, good point. The spec says this could be a relative URL, even though IE and Unmht always generate absolute URLs.
Do you think it is acceptable to add a FIXME and fix as a follow up patch?

>> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:178
>> +}
> 
> This function does not appear to transfer ownership.  It should just return an MHTMLArchive*

OK.

>> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:188
>> +}
> 
> This function does not appear to transfer ownership.  It should just return an ArchiveResource*

OK.

>> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:45
>> +static void unquoteString(String* string)
> 
> This would probably be better if it just returned a String.  Strings in WebKit are immutable.

Done.

>> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:70
>> +                keyValuePairs->add(key, value.toString().stripWhiteSpace());
> 
> What if there are duplicates?  Do we keep the first or the last?  Maybe we error out?

Added a LOG_ERROR, keeping the last value.

>> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:91
>> +    //        return 0;
> 
> Please remove commented out code.

Oops, removed.

>> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:93
>> +    RefPtr<MIMEHeader> mimeHeader = adoptRef(new MIMEHeader());
> 
> MIMEHeader::create

Added.

>> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:94
>> +    WTF::HashMap<String, String> keyValuePairs;
> 
> Consider creating a typedef for WTF::HashMap<String, String>

Done.

>> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:95
>> +    retrieveKeyValuePairs(buffer, &keyValuePairs);
> 
> We should just have retrieveKeyValuePairs return a HashMap rather than using out-params.  Also, you can probably skip the WTF on the HashMap.

Done.

>> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:117
>> +    return mimeHeader.release();
> 
> yay!

I shall someday fully master the RefPtr class! :-)

>> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:119
>> +bool MIMEHeader::isMultipart() const
> 
> Missing blank line before this line.

Added.

>> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:152
>> +}
> 
> All these one-liners can go in the header.

Moved to header.

>> Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp:163
>> +    encoding.makeLower();
> 
> Do these modify the string?  I would have expected:
> 
> encoding = text.lower().stripWhiteSpace();

Ah, yes you are right!
Corrected.

>> Source/WebCore/loader/archive/mhtml/MIMEHeader.h:38
>> +// This class is a limited MIME parser used to parse the MIME headers of MHTML files.
> 
> I'd make this a FIXME comment.  Ideally we'd generalize this code to work for more than just MHTML.

Done.

>> Source/WebCore/loader/archive/mhtml/MIMEHeader.h:46
>> +    enum Encoding { QuotedPrintable, Base64, SevenBit, Unknown };
> 
> These should each get their own line.

Done.

>> Source/WebCore/loader/archive/mhtml/MIMEHeader.h:49
>> +    // the end of the header. Returns 0 if there was an error.
> 
> This comment can be removed.

Removed.

>> Tools/Scripts/webkitpy/layout_tests/port/test_files.py:47
>> +                                  '.php', '.svg', '.mht'])
> 
> Does this need to do a similar check for the supported feature set?

Done in webkit.py.
------- Comment #36 From 2011-05-13 10:50:25 PST -------
(From update of attachment 93475 [details])
Attachment 93475 [details] did not pass qt-ews (qt):
Output: http://queues.webkit.org/results/8694223
------- Comment #37 From 2011-05-13 10:54:17 PST -------
(From update of attachment 93475 [details])
View in context: https://bugs.webkit.org/attachment.cgi?id=93475&action=review

This is getting close.  A few minor points (and some questions) below.  Thanks!

> Source/WebCore/CMakeLists.txt:1950
> +        loader/archive/mhtml/MIMEHeader.cpp

I thought this got moved to platform/network...

> Source/WebCore/GNUmakefile.list.am:2008
> +	Source/WebCore/loader/archive/mhtml/MIMEHeader.cpp \
> +	Source/WebCore/loader/archive/mhtml/MIMEHeader.h \

Same here

> Source/WebCore/WebCore.gypi:3352
> +            'loader/archive/mhtml/MHTMLArchive.h',

The headers only need to appear once (in the first group of files).

> Source/WebCore/loader/MainResourceLoader.cpp:261
> +        bool isRemoteWebArchive = (equalIgnoringCase("application/x-webarchive", mimeType) || equalIgnoringCase("message/rfc822", mimeType))
> +            && !m_substituteData.isValid() && !url.isLocalFile();

does message/rfc822 need to be behind ENABLE(MHTML) ?

> Source/WebCore/loader/archive/Archive.h:47
> +    virtual ~Archive() { }

Virtual functions should be out-of-line.

> Source/WebCore/loader/archive/ArchiveResourceCollection.cpp:62
> +            m_subframes.set((*iFrame)->mainResource()->url().string(), iFrame->get());

Is the variable really called iFrame ?  *cries*

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:121
> +    if (mimeType != "text/html" && mimeType != "application/xhtml+xml" && mimeType != "image/svg+xml")  {

This line doesn't feel right.  Is there some function in WebCore we should be calling that knows about the set of MIME types you're trying to select here?

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:140
> +    ASSERT(endOfPartBoundary.isEmpty() == endOfDocumentBoundary.isEmpty());

I meant that we should add ASSERT(!endOfArchiveReached) here, right?

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:183
> +    return ArchiveResource::create(contentBuffer, KURL(ParsedURLString, mimeHeader.contentLocation()),

KURL(ParsedURLString, mimeHeader.contentLocation()) doesn't seem quite right either.  Is contentLocation always an absolute URL?

> Source/WebCore/loader/archive/mhtml/MIMEHeader.h:46
> +class MIMEHeader : public RefCounted<MIMEHeader> {

Am I confused?  I thought we did this class already...
------- Comment #38 From 2011-05-13 11:36:44 PST -------
(From update of attachment 93475 [details])
Attachment 93475 [details] did not pass gtk-ews (gtk):
Output: http://queues.webkit.org/results/8694229
------- Comment #39 From 2011-05-13 12:47:02 PST -------
(From update of attachment 93475 [details])
Attachment 93475 [details] did not pass cr-mac-ews (chromium):
Output: http://queues.webkit.org/results/8700083
------- Comment #40 From 2011-05-13 15:23:20 PST -------
Created an attachment (id=93523) [details]
Patch
------- Comment #41 From 2011-05-13 15:23:28 PST -------
(From update of attachment 93475 [details])
View in context: https://bugs.webkit.org/attachment.cgi?id=93475&action=review

>> Source/WebCore/CMakeLists.txt:1950
>> +        loader/archive/mhtml/MIMEHeader.cpp
> 
> I thought this got moved to platform/network...

Moved them in the new patch.

>> Source/WebCore/WebCore.gypi:3352
>> +            'loader/archive/mhtml/MHTMLArchive.h',
> 
> The headers only need to appear once (in the first group of files).

Ah, ok.

>> Source/WebCore/loader/MainResourceLoader.cpp:261
>> +            && !m_substituteData.isValid() && !url.isLocalFile();
> 
> does message/rfc822 need to be behind ENABLE(MHTML) ?

I think this is fine, since we don't want to let you load such a message from the network regardless of whether MHTML support is enabled or not.
(as this code does for webarchive).

>> Source/WebCore/loader/archive/Archive.h:47
>> +    virtual ~Archive() { }
> 
> Virtual functions should be out-of-line.

Done (sadly I had to create a CPP file to contain that dummy destructor).

>> Source/WebCore/loader/archive/ArchiveResourceCollection.cpp:62
>> +            m_subframes.set((*iFrame)->mainResource()->url().string(), iFrame->get());
> 
> Is the variable really called iFrame ?  *cries*

Don't cry! :-) I changed that code!

>> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:121
>> +    if (mimeType != "text/html" && mimeType != "application/xhtml+xml" && mimeType != "image/svg+xml")  {
> 
> This line doesn't feel right.  Is there some function in WebCore we should be calling that knows about the set of MIME types you're trying to select here?

Now using MIMETypeRegistry::isSupportedNonImageMIMEType() which should do it I believe.

>> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:140
>> +    ASSERT(endOfPartBoundary.isEmpty() == endOfDocumentBoundary.isEmpty());
> 
> I meant that we should add ASSERT(!endOfArchiveReached) here, right?

Since it is a reference, what would that really check?

>> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:183
>> +    return ArchiveResource::create(contentBuffer, KURL(ParsedURLString, mimeHeader.contentLocation()),
> 
> KURL(ParsedURLString, mimeHeader.contentLocation()) doesn't seem quite right either.  Is contentLocation always an absolute URL?

It's complicated... The spec says that it could be a relative URL and there are several ways it could be resolved.
One of them is to use the base specification if any in the part's content. This is tricky...
Otherwise there are other ways to resolve it, including using the surrounding parts location.
Practically, IE and Firefox (UnMHT) always use absolute URLs (from what I can tell), so I wonder if this could be a FIXME for now.
What do you think?
------- Comment #42 From 2011-05-13 15:25:14 PST -------
(In reply to comment #41)
> (From update of attachment 93475 [details] [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=93475&action=review
> 
> >> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:183
> >> +    return ArchiveResource::create(contentBuffer, KURL(ParsedURLString, mimeHeader.contentLocation()),
> > 
> > KURL(ParsedURLString, mimeHeader.contentLocation()) doesn't seem quite right either.  Is contentLocation always an absolute URL?
> 
> It's complicated... The spec says that it could be a relative URL and there are several ways it could be resolved.
> One of them is to use the base specification if any in the part's content. This is tricky...
> Otherwise there are other ways to resolve it, including using the surrounding parts location.
> Practically, IE and Firefox (UnMHT) always use absolute URLs (from what I can tell), so I wonder if this could be a FIXME for now.
> What do you think?

Recording that information in a FIXME seems valuable.
------- Comment #43 From 2011-05-13 15:58:43 PST -------
(From update of attachment 93475 [details])
View in context: https://bugs.webkit.org/attachment.cgi?id=93475&action=review

>>>> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:183
>>>> +    return ArchiveResource::create(contentBuffer, KURL(ParsedURLString, mimeHeader.contentLocation()),
>>> 
>>> KURL(ParsedURLString, mimeHeader.contentLocation()) doesn't seem quite right either.  Is contentLocation always an absolute URL?
>> 
>> It's complicated... The spec says that it could be a relative URL and there are several ways it could be resolved.
>> One of them is to use the base specification if any in the part's content. This is tricky...
>> Otherwise there are other ways to resolve it, including using the surrounding parts location.
>> Practically, IE and Firefox (UnMHT) always use absolute URLs (from what I can tell), so I wonder if this could be a FIXME for now.
>> What do you think?
> 
> Recording that information in a FIXME seems valuable.

Yep, that's what I did in my patch.
------- Comment #44 From 2011-05-13 16:06:16 PST -------
(From update of attachment 93523 [details])
View in context: https://bugs.webkit.org/attachment.cgi?id=93523&action=review

There is a bunch of non-trivial changes to Web Archive code. Did you verify that those still work in Safari nightly builds?

It's great to see many test cases! Is there one verifying that WebKit is not vulnerable to <http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2011-0096>?

I made some nitpick comments below, but I didn't review deeply.

> Source/WebCore/loader/archive/ArchiveFactory.cpp:71
> +    mimeTypes.set("message/rfc822", archiveFactoryCreate<MHTMLArchive>);

Where does message/rfc822 type come from in practice? Does CFNetwork know to give local .mht files this type?

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:155
> +            // Quoted-Printable require \r\n to be parsed correctly.

What's "correctly"? All code in WebCore to be correct, so this comment doesn't say much.

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:179
> +        LOG_ERROR("Invalid encoding for MHTML part.");

Why not log the encoding here to make debugging easier?

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:186
> +    KURL location = KURL(ParsedURLString, mimeHeader.contentLocation());

This doesn't seem right. The contentLocation string comes from external content, but this KURL constructor asserts if the string is not a canonical absolute URL in KURL sense.

If we only support absolute URLs, use a constructor with empty base URL: KURL(KURL(), mimeHeader.contentLocation());

> Source/WebCore/loader/archive/mhtml/MHTMLParser.h:50
> +    PassRefPtr<MHTMLArchive> parseArchive(PassRefPtr<MIMEHeader> = 0);

This should not be taking a PassRefPtr - this function is never called with a PassRefPtr argument.

> Source/WebCore/platform/network/MIMEHeader.h:69
> +    static PassRefPtr<MIMEHeader> create();
> +    MIMEHeader();

I'm not sure if we need a private create function. It would certainly be fairly unusual.

> Source/WebCore/platform/network/MIMEHeader.h:71
> +    static Encoding parseEncoding(const String&);

Should this be ContentEncoding? An unqualified "encoding" is usually about charsets.
------- Comment #45 From 2011-05-13 16:06:17 PST -------
(From update of attachment 93523 [details])
View in context: https://bugs.webkit.org/attachment.cgi?id=93523&action=review

> LayoutTests/mhtml/multi_frames_ie.mht:14
> +------=_NextPart_000_0005_01CC0FF0.EF60FBE0
> +Content-Type: image/png
> +Content-Transfer-Encoding: base64

Where are the results for these tests?
------- Comment #46 From 2011-05-16 16:07:29 PST -------
(From update of attachment 93523 [details])
View in context: https://bugs.webkit.org/attachment.cgi?id=93523&action=review

>> LayoutTests/mhtml/multi_frames_ie.mht:14
>> +Content-Transfer-Encoding: base64
> 
> Where are the results for these tests?

Added.

>> Source/WebCore/loader/archive/ArchiveFactory.cpp:71
>> +    mimeTypes.set("message/rfc822", archiveFactoryCreate<MHTMLArchive>);
> 
> Where does message/rfc822 type come from in practice? Does CFNetwork know to give local .mht files this type?

CFNetwork does not return message/rfc822 for local .mht files.
I do my development on the Chromium port  which uses a different MIME type registry.
I investigated but I am not familiar with the CFNetwork, do you know how I can register a MIME type for a file extension?

>> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:155
>> +            // Quoted-Printable require \r\n to be parsed correctly.
> 
> What's "correctly"? All code in WebCore to be correct, so this comment doesn't say much.

I meant that the line reader removes the \r\n, but we need them for the content in this case as they are expected by the QuotedPrintable decoder.
I clarified the comment.

>> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:179
>> +        LOG_ERROR("Invalid encoding for MHTML part.");
> 
> Why not log the encoding here to make debugging easier?

Here we don't have access to the encoding string anymore (this is an enum).
I added a log error with the actual unknown encoding in the MIMEHeader where we parse the actual string.

>> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:186
>> +    KURL location = KURL(ParsedURLString, mimeHeader.contentLocation());
> 
> This doesn't seem right. The contentLocation string comes from external content, but this KURL constructor asserts if the string is not a canonical absolute URL in KURL sense.
> 
> If we only support absolute URLs, use a constructor with empty base URL: KURL(KURL(), mimeHeader.contentLocation());

Switched to KURL(KURL(), mimeHeader.contentLocation());

>> Source/WebCore/loader/archive/mhtml/MHTMLParser.h:50
>> +    PassRefPtr<MHTMLArchive> parseArchive(PassRefPtr<MIMEHeader> = 0);
> 
> This should not be taking a PassRefPtr - this function is never called with a PassRefPtr argument.

It is in MHTMLParser.cpp line 94.

>> Source/WebCore/platform/network/MIMEHeader.h:69
>> +    MIMEHeader();
> 
> I'm not sure if we need a private create function. It would certainly be fairly unusual.

OK, removed the create method.

>> Source/WebCore/platform/network/MIMEHeader.h:71
>> +    static Encoding parseEncoding(const String&);
> 
> Should this be ContentEncoding? An unqualified "encoding" is usually about charsets.

Done.
------- Comment #47 From 2011-05-16 16:09:47 PST -------
(In reply to comment #44)
> (From update of attachment 93523 [details] [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=93523&action=review
> 
> There is a bunch of non-trivial changes to Web Archive code. Did you verify that those still work in Safari nightly builds?
Yes, I was able to save www.apple.com and reload it as a WebArchive.
Also all the webarchive layout tests still pass.  

> It's great to see many test cases! Is there one verifying that WebKit is not vulnerable to <http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2011-0096>?
From what I understand about the attack, I think we are safe as we only load MHTML from the local file system.
------- Comment #48 From 2011-05-16 16:10:34 PST -------
Created an attachment (id=93709) [details]
Patch
------- Comment #49 From 2011-05-16 16:40:03 PST -------
Created an attachment (id=93712) [details]
Patch
------- Comment #50 From 2011-05-16 16:55:33 PST -------
(From update of attachment 93712 [details])
Attachment 93712 [details] did not pass qt-ews (qt):
Output: http://queues.webkit.org/results/8707098
------- Comment #51 From 2011-05-16 17:26:01 PST -------
> I investigated but I am not familiar with the CFNetwork, do you know how I can register a MIME type for a file extension?

We have a default mapping in mediaMIMETypeMap() (MIMETypeRegistry.cpp), but it's only used for media. I'm also not sure how to best fix this.

Wikipedia says that the type is multipart/related, not message/rfc822. Which one is right? Should we perhaps just look at the extension, not at whatever type the loader synthesizes for local files?

Also, both .mht and .mhtml need to be supported, according to Wikipedia.

>>> +    PassRefPtr<MHTMLArchive> parseArchive(PassRefPtr<MIMEHeader> = 0);
>> This should not be taking a PassRefPtr - this function is never called with a PassRefPtr argument.
>
>It is in MHTMLParser.cpp line 94.

This doesn't seem accurate to me. The argument is RefPtr there, not a PassRefPtr.

In general, using PassRefPtr for arguments should be a rare exception. It's dangerous (untested code paths almost always end up with null dereferences), and has a very small impact on performance.
------- Comment #52 From 2011-05-16 19:31:32 PST -------
(From update of attachment 93712 [details])
Attachment 93712 [details] did not pass gtk-ews (gtk):
Output: http://queues.webkit.org/results/8704176
------- Comment #53 From 2011-05-17 15:17:23 PST -------
Created an attachment (id=93828) [details]
Patch
------- Comment #54 From 2011-05-17 15:31:28 PST -------
(In reply to comment #51)
> > I investigated but I am not familiar with the CFNetwork, do you know how I can register a MIME type for a file extension?
> 
> We have a default mapping in mediaMIMETypeMap() (MIMETypeRegistry.cpp), but it's only used for media. I'm also not sure how to best fix this.
> Wikipedia says that the type is multipart/related, not message/rfc822. Which one is right? Should we perhaps just look at the extension, not at whatever type the loader synthesizes for local files?

OK, my thought is that we should probably stick to the current model where the port decides the file extension/MIME type mapping. It should be fairly easy for other ports to add that mapping to enable the feature. 

Regarding the MIME type, you are right, we should use multipart/related not rfc822. I removed reference to message/rfc822.

>
> Also, both .mht and .mhtml need to be supported, according to Wikipedia.
> 
> >>> +    PassRefPtr<MHTMLArchive> parseArchive(PassRefPtr<MIMEHeader> = 0);
> >> This should not be taking a PassRefPtr - this function is never called with a PassRefPtr argument.
> >
> >It is in MHTMLParser.cpp line 94.
> 
> This doesn't seem accurate to me. The argument is RefPtr there, not a PassRefPtr.
> 
> In general, using PassRefPtr for arguments should be a rare exception. It's dangerous (untested code paths almost always end up with null dereferences), and has a very small impact on performance.
Ah, you are right.
Changed it to a RefPtr.cd
------- Comment #55 From 2011-05-17 15:35:42 PST -------
(From update of attachment 93828 [details])
Attachment 93828 [details] did not pass qt-ews (qt):
Output: http://queues.webkit.org/results/8711182
------- Comment #56 From 2011-05-17 15:45:19 PST -------
> Changed it to a RefPtr.

RefPtr should never be used in an argument, see <http://www.webkit.org/coding/RefPtr.html>. This argument should be a plain pointer.
------- Comment #57 From 2011-05-17 17:31:07 PST -------
(From update of attachment 93828 [details])
Attachment 93828 [details] did not pass gtk-ews (gtk):
Output: http://queues.webkit.org/results/8702582
------- Comment #58 From 2011-05-18 10:24:37 PST -------
(In reply to comment #56)
> > Changed it to a RefPtr.
> 
> RefPtr should never be used in an argument, see <http://www.webkit.org/coding/RefPtr.html>. This argument should be a plain pointer.

OK, I am a bit confused as I initially used a PassRefPtr just like http://www.webkit.org/coding/RefPtr.html suggests making sure I immediately assigned it to a RefPtr.
I am OK to change it to a raw pointer, but that means I'll have to do a little dance as the MIMEHeader if null is retrieved from the MIMEHeader::parseHeader class which returns a PassRefPtr that I would have to store in a RefPtr and then get the raw pointer out of it.
------- Comment #59 From 2011-05-18 10:38:54 PST -------
> OK, I am a bit confused as I initially used a PassRefPtr just like 
> http://www.webkit.org/coding/RefPtr.html suggests

The document doesn't describe PassRefPtr usage in sufficient detail at the moment. 

> I am OK to change it to a raw pointer, but that means I'll have to do a little dance

I don't understand this comment. Maybe post a patch, and we'll see?
------- Comment #60 From 2011-05-18 10:45:15 PST -------
Created an attachment (id=93935) [details]
Patch
------- Comment #61 From 2011-05-18 11:52:01 PST -------
(In reply to comment #59)
> > OK, I am a bit confused as I initially used a PassRefPtr just like 
> > http://www.webkit.org/coding/RefPtr.html suggests
> 
> The document doesn't describe PassRefPtr usage in sufficient detail at the moment. 
> 
> > I am OK to change it to a raw pointer, but that means I'll have to do a little dance
> 
> I don't understand this comment. Maybe post a patch, and we'll see?
I posted a patch. I forgot to rebase so it failed to build, but you can see what I mean.
------- Comment #62 From 2011-05-18 12:09:44 PST -------
Thanks, I see what you mean. I'm not fully following why the header sometimes comes from outside, and other times doesn't. Depending on what the situation is, there can be different ways to make the code better:
- Split parseArchive into two functions - one that takes a pre-parsed MIMEHeader object, and another that doesn't. Only the latter would parse it internally.
- Always parse the header before calling parseArchive, so that it would never see a null argument.
- Or give refPtrHeader a better name, explaining how it's different from the header argument.

Currently, it's difficult to follow when the header argument may or may not be null.

Adam, would you be willing to take back the reviewing of this work? I only commented on a few things that stood out to me, and didn't review deeply.
------- Comment #63 From 2011-05-18 12:48:34 PST -------
Created an attachment (id=93970) [details]
Patch
------- Comment #64 From 2011-05-18 12:50:19 PST -------
(In reply to comment #62)
> Thanks, I see what you mean. I'm not fully following why the header sometimes comes from outside, and other times doesn't. Depending on what the situation is, there can be different ways to make the code better:
> - Split parseArchive into two functions - one that takes a pre-parsed MIMEHeader object, and another that doesn't. Only the latter would parse it internally.
> - Always parse the header before calling parseArchive, so that it would never see a null argument.
> - Or give refPtrHeader a better name, explaining how it's different from the header argument.
Good suggestions. I want with #1 and made it 2 functions: parseArchive and parseArchiveWithHeader.
------- Comment #65 From 2011-05-18 13:14:38 PST -------
(From update of attachment 93970 [details])
Attachment 93970 [details] did not pass qt-ews (qt):
Output: http://queues.webkit.org/results/8704791
------- Comment #66 From 2011-05-18 13:31:04 PST -------
> Adam, would you be willing to take back the reviewing of this work? I only commented on a few things that stood out to me, and didn't review deeply.

Sure.  I'd be happy to.
------- Comment #67 From 2011-05-18 13:45:11 PST -------
(From update of attachment 93970 [details])
View in context: https://bugs.webkit.org/attachment.cgi?id=93970&action=review

This looks basically done.  The only major issue is the use of render-tree tests.  I've also added some minor comments/questions below.  Thanks!

> LayoutTests/platform/chromium-mac/mhtml/multi_frames_ie-expected.txt:23
> +      RenderPartObject {IFRAME} at (0,172) size 304x154 [border: (2px inset #000000)]
> +        layer at (0,0) size 300x150
> +          RenderView at (0,0) size 300x150
> +        layer at (0,0) size 300x150
> +          RenderBlock {HTML} at (0,0) size 300x150
> +            RenderBody {BODY} at (8,8) size 284x134

We don't need render tree tests for this feature.  We can either use text tests or dump-as-markup tests.  Using render tree tests is too much of a maintenance burden.  I feel like I've made this comment every time I've reviewed this patch.

> Source/WebCore/WebCore.pro:3487
> +        loader/archive/Archive.cpp \

This seems slightly misplaced, but I guess Qt doesn't build with WebArchive support.

> Source/WebCore/WebCore.xcodeproj/project.pbxproj:-17019
> -				892CF207134C8BB300AAEDA1 /* JSStorageInfoQuotaCallback.cpp */,
> -				892CF208134C8BB300AAEDA1 /* JSStorageInfoQuotaCallback.h */,

These changes seem unrelated.  I was seeing them today too.  I'm not sure what their deal is.

> Source/WebCore/loader/DocumentLoader.cpp:658
> +        ASSERT(false);

ASSERT_NOT_REACHED().  You should be able to just remove the default case and rely upon the compiler to warn you if you forget a type.

> Source/WebCore/loader/DocumentLoader.cpp:661
> +    return false;

It's strange that both branches in the Archive::WebArchive lead to returning false.  Is that your intention?

> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:125
> +    if (!MIMETypeRegistry::isSupportedNonImageMIMEType(mimeType) || MIMETypeRegistry::isSupportedJavaScriptMIMEType(mimeType) || mimeType == "text/css") { // Chromium considers CSS as a document suitable resource.

I'd just remove this comment.  Adding text/css here is a little goofy, but probably harmless.

> Source/WebCore/platform/network/MIMEHeader.cpp:68
> +        size_t semiColonIndex = line.find(L':');

No need for L here.

> Source/WebCore/platform/network/MIMEHeader.cpp:69
> +        if (semiColonIndex == notFound || (semiColonIndex == line.length() - 1)) {

Why isn't

Foo-bar:

A key-value pair?  It just seems like the value is the empty string...

> Source/WebCore/platform/network/MIMEHeader.cpp:78
> +        keyValuePairs.set(key, value.toString().stripWhiteSpace());

Yeah, especially that's then whitespace sensitive.  If you've got a trailing space, then you end up with a key-value pair with an empty value.

> Source/WebCore/platform/network/MIMEHeader.cpp:88
> +        ContentTypeParser contentTypeParser(makeString("Content-Type:", mimeParametersIterator->second));

It's strange that you need to synthesize a fake key name here.

> Source/WebCore/platform/network/MIMEHeader.cpp:107
> +        mimeHeader->m_contentTransferEncoding = parseContentTransferEncoding(mimeParametersIterator->second);

This one works more nicely.

> Source/WebCore/platform/network/MIMEHeader.h:38
> +// FIXME: This class is a limited MIME parser used to parse the MIME headers of MHTML files.

Super minor nit: I'd more this comment to right above the "class MIMEHeader" line.
------- Comment #68 From 2011-05-18 17:19:51 PST -------
(From update of attachment 93970 [details])
Attachment 93970 [details] did not pass gtk-ews (gtk):
Output: http://queues.webkit.org/results/8709673
------- Comment #69 From 2011-05-19 10:11:04 PST -------
Created an attachment (id=94082) [details]
Patch
------- Comment #70 From 2011-05-19 10:29:17 PST -------
(From update of attachment 94082 [details])
Attachment 94082 [details] did not pass qt-ews (qt):
Output: http://queues.webkit.org/results/8716367
------- Comment #71 From 2011-05-19 10:46:48 PST -------
(From update of attachment 93970 [details])
View in context: https://bugs.webkit.org/attachment.cgi?id=93970&action=review

>> LayoutTests/platform/chromium-mac/mhtml/multi_frames_ie-expected.txt:23
>> +            RenderBody {BODY} at (8,8) size 284x134
> 
> We don't need render tree tests for this feature.  We can either use text tests or dump-as-markup tests.  Using render tree tests is too much of a maintenance burden.  I feel like I've made this comment every time I've reviewed this patch.

Made them text test. Sorry it took so long.

>> Source/WebCore/WebCore.pro:3487
>> +        loader/archive/Archive.cpp \
> 
> This seems slightly misplaced, but I guess Qt doesn't build with WebArchive support.

Yeah, since MHTML is the only archive type, this won't compile without it.

>> Source/WebCore/WebCore.xcodeproj/project.pbxproj:-17019
>> -				892CF208134C8BB300AAEDA1 /* JSStorageInfoQuotaCallback.h */,
> 
> These changes seem unrelated.  I was seeing them today too.  I'm not sure what their deal is.

That's weird. I reverted my change, open the file in XCode and added my file and I still see those. Hopefully the bots will tell us if that causes anything bad.

>> Source/WebCore/loader/DocumentLoader.cpp:658
>> +        ASSERT(false);
> 
> ASSERT_NOT_REACHED().  You should be able to just remove the default case and rely upon the compiler to warn you if you forget a type.

Done.

>> Source/WebCore/loader/DocumentLoader.cpp:661
>> +    return false;
> 
> It's strange that both branches in the Archive::WebArchive lead to returning false.  Is that your intention?

Good point. I simplified that code (coming in my next patch where I'll hopefully have the qt build working).

>> Source/WebCore/loader/archive/mhtml/MHTMLParser.cpp:125
>> +    if (!MIMETypeRegistry::isSupportedNonImageMIMEType(mimeType) || MIMETypeRegistry::isSupportedJavaScriptMIMEType(mimeType) || mimeType == "text/css") { // Chromium considers CSS as a document suitable resource.
> 
> I'd just remove this comment.  Adding text/css here is a little goofy, but probably harmless.

Removed comment.

>> Source/WebCore/platform/network/MIMEHeader.cpp:68
>> +        size_t semiColonIndex = line.find(L':');
> 
> No need for L here.

Removed.

>> Source/WebCore/platform/network/MIMEHeader.cpp:69
>> +        if (semiColonIndex == notFound || (semiColonIndex == line.length() - 1)) {
> 
> Why isn't
> 
> Foo-bar:
> 
> A key-value pair?  It just seems like the value is the empty string...

Good point, I am now just requiring a : but relaxed the "no empty value" constraint.

>> Source/WebCore/platform/network/MIMEHeader.cpp:88
>> +        ContentTypeParser contentTypeParser(makeString("Content-Type:", mimeParametersIterator->second));
> 
> It's strange that you need to synthesize a fake key name here.

Yeah, the ContentTypeParser expects the actual key.
Added a FIXME that I'll try to address in a follow-up patch to make the ContentTypeParser more flexible so it can also take just the ContenType value.

>> Source/WebCore/platform/network/MIMEHeader.h:38
>> +// FIXME: This class is a limited MIME parser used to parse the MIME headers of MHTML files.
> 
> Super minor nit: I'd more this comment to right above the "class MIMEHeader" line.

Done.
------- Comment #72 From 2011-05-19 10:51:24 PST -------
(From update of attachment 94082 [details])
Attachment 94082 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/8721028
------- Comment #73 From 2011-05-19 10:52:31 PST -------
(From update of attachment 94082 [details])
Attachment 94082 [details] did not pass mac-ews (mac):
Output: http://queues.webkit.org/results/8707937
------- Comment #74 From 2011-05-19 11:11:13 PST -------
(From update of attachment 94082 [details])
It's hard for me to read this patch again with fresh eyes, but I think we're in good shape now.  Thanks for changing the tests to be text tests.  It looks like you've still got some broken builds, but feel free to land once you've got that all sorted out.  Thanks Jay.
------- Comment #75 From 2011-05-19 11:39:53 PST -------
(From update of attachment 94082 [details])
Attachment 94082 [details] did not pass gtk-ews (gtk):
Output: http://queues.webkit.org/results/8711934
------- Comment #76 From 2011-05-19 12:23:47 PST -------
(From update of attachment 94082 [details])
Attachment 94082 [details] did not pass cr-mac-ews (chromium):
Output: http://queues.webkit.org/results/8719148
------- Comment #77 From 2011-05-19 13:38:22 PST -------
Created an attachment (id=94107) [details]
Patch
------- Comment #78 From 2011-05-19 13:53:41 PST -------
(From update of attachment 94107 [details])
Attachment 94107 [details] did not pass qt-ews (qt):
Output: http://queues.webkit.org/results/8707970
------- Comment #79 From 2011-05-19 15:10:21 PST -------
Created an attachment (id=94126) [details]
Patch
------- Comment #80 From 2011-05-19 16:43:41 PST -------
(From update of attachment 94126 [details])
Attachment 94126 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/8719210
------- Comment #81 From 2011-05-19 16:43:48 PST -------
Created an attachment (id=94146) [details]
Archive of layout-test-results from ec2-cr-linux-02

The attached test failures were seen while running run-webkit-tests on the chromium-ews.
Bot: ec2-cr-linux-02  Port: Chromium  Platform: Linux-2.6.35-28-virtual-x86_64-with-Ubuntu-10.10-maverick
------- Comment #82 From 2011-05-19 20:52:09 PST -------
(From update of attachment 94126 [details])
Attachment 94126 [details] did not pass gtk-ews (gtk):
Output: http://queues.webkit.org/results/8717384
------- Comment #83 From 2011-05-20 10:54:29 PST -------
Created an attachment (id=94239) [details]
Patch
------- Comment #84 From 2011-05-20 11:55:38 PST -------
(From update of attachment 94239 [details])
Attachment 94239 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/8722243

New failing tests:
mhtml/simple_page_unmht.mht
mhtml/page_with_image_ie.mht
mhtml/multi_frames_ie.mht
mhtml/multi_frames_unmht.mht
mhtml/page_with_image_unmht.mht
mhtml/simple_page_ie.mht
mhtml/page_with_css_and_js_ie.mht
mhtml/page_with_css_and_js_unmht.mht
------- Comment #85 From 2011-05-20 11:55:46 PST -------
Created an attachment (id=94253) [details]
Archive of layout-test-results from ec2-cr-linux-02

The attached test failures were seen while running run-webkit-tests on the chromium-ews.
Bot: ec2-cr-linux-02  Port: Chromium  Platform: Linux-2.6.35-28-virtual-x86_64-with-Ubuntu-10.10-maverick
------- Comment #86 From 2011-05-20 14:48:56 PST -------
(From update of attachment 94239 [details])
Attachment 94239 [details] did not pass gtk-ews (gtk):
Output: http://queues.webkit.org/results/8651038
------- Comment #87 From 2011-05-23 11:15:25 PST -------
Created an attachment (id=94447) [details]
Patch
------- Comment #88 From 2011-05-23 12:29:59 PST -------
(From update of attachment 94447 [details])
Attachment 94447 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/8723866

New failing tests:
mhtml/simple_page_unmht.mht
mhtml/page_with_image_ie.mht
mhtml/multi_frames_ie.mht
mhtml/multi_frames_unmht.mht
mhtml/page_with_image_unmht.mht
mhtml/simple_page_ie.mht
mhtml/page_with_css_and_js_ie.mht
mhtml/page_with_css_and_js_unmht.mht
------- Comment #89 From 2011-05-23 12:30:07 PST -------
Created an attachment (id=94460) [details]
Archive of layout-test-results from ec2-cr-linux-01

The attached test failures were seen while running run-webkit-tests on the chromium-ews.
Bot: ec2-cr-linux-01  Port: Chromium  Platform: Linux-2.6.35-28-virtual-x86_64-with-Ubuntu-10.10-maverick
------- Comment #90 From 2011-05-23 13:57:56 PST -------
dglazkov says cr-linux needs a clobber to turn green and that it should be safe to land like this.
------- Comment #91 From 2011-05-23 14:03:59 PST -------
(From update of attachment 94447 [details])
Ok.  I'm skeptical, but dglazkov is always right.
------- Comment #92 From 2011-05-23 14:54:45 PST -------
(In reply to comment #91)
> (From update of attachment 94447 [details] [details])
> Ok.  I'm skeptical, but dglazkov is always right.

Oh noes.
------- Comment #93 From 2011-05-23 15:11:17 PST -------
(From update of attachment 94447 [details])
Rejecting attachment 94447 [details] from commit-queue.

Failed to run "['./Tools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '--bot-id=cr-jail-7', 'build-..." exit_code: 2

Last 500 characters of output:
kitpy/layout_tests/port/webkit_unittest.py", line 85, in test_skipped_directories_for_symbols
    self.assertEqual(result_directories, expected_directories)
AssertionError: set(['fast/wcss', 'mathml', 'compositing/webgl', 'fast/canvas/webgl', 'mhtml', 'http/tests/canvas/webgl']) != set(['fast/wcss', 'compositing/webgl', 'mathml', 'http/tests/canvas/webgl', 'fast/canvas/webgl'])

----------------------------------------------------------------------
Ran 1028 tests in 29.203s

FAILED (failures=1)

Full output: http://queues.webkit.org/results/8722974
------- Comment #94 From 2011-05-23 15:11:26 PST -------
Created an attachment (id=94498) [details]
Archive of layout-test-results from cr-jail-7

The attached test failures were seen while running run-webkit-tests on the commit-queue.
Bot: cr-jail-7  Port: Mac  Platform: Mac OS X 10.6.7
------- Comment #95 From 2011-05-23 16:08:29 PST -------
Created an attachment (id=94512) [details]
Patch
------- Comment #96 From 2011-05-23 17:58:07 PST -------
(From update of attachment 94512 [details])
Attachment 94512 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/8723945

New failing tests:
mhtml/simple_page_unmht.mht
mhtml/page_with_image_ie.mht
mhtml/multi_frames_ie.mht
mhtml/multi_frames_unmht.mht
mhtml/page_with_image_unmht.mht
mhtml/simple_page_ie.mht
mhtml/page_with_css_and_js_ie.mht
mhtml/page_with_css_and_js_unmht.mht
------- Comment #97 From 2011-05-23 17:58:15 PST -------
Created an attachment (id=94527) [details]
Archive of layout-test-results from ec2-cr-linux-02

The attached test failures were seen while running run-webkit-tests on the chromium-ews.
Bot: ec2-cr-linux-02  Port: Chromium  Platform: Linux-2.6.35-28-virtual-x86_64-with-Ubuntu-10.10-maverick
------- Comment #98 From 2011-05-23 21:27:06 PST -------
(From update of attachment 94512 [details])
Rejecting attachment 94512 [details] from commit-queue.

Failed to run "['./Tools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '--bot-id=cr-jail-7', 'land-a..." exit_code: 1

Last 500 characters of output:
owElement.h
    M    Source/WebCore/html/shadow/MeterShadowElement.cpp
    M    Source/WebCore/html/shadow/DetailsMarkerControl.h
    M    Source/WebCore/html/HTMLElement.h
    M    Source/WebCore/html/HTMLEmbedElement.h
    M    Source/WebCore/html/HTMLEmbedElement.cpp
    M    Source/WebCore/CMakeLists.txt
    M    Source/WebCore/WebCore.xcodeproj/project.pbxproj
r87125 = c3af04450a613e1379f0e1ef9e3bd9054191b2d8 (refs/remotes/trunk)
First, rewinding head to replay your work on top of it...
Fast-forwarded master to refs/remotes/trunk.

Full output: http://queues.webkit.org/results/8727482
------- Comment #99 From 2011-05-24 08:58:15 PST -------
Created an attachment (id=94618) [details]
Patch
------- Comment #100 From 2011-05-24 09:33:24 PST -------
Created an attachment (id=94627) [details]
Patch for landing
------- Comment #101 From 2011-05-24 12:32:21 PST -------
(From update of attachment 94627 [details])
Clearing flags on attachment: 94627

Committed r87189: <http://trac.webkit.org/changeset/87189>
------- Comment #102 From 2011-05-24 12:32:52 PST -------
All reviewed patches have been landed.  Closing bug.
------- Comment #103 From 2011-05-24 12:38:29 PST -------
<rdar://problem/3714721>
------- Comment #104 From 2011-05-24 12:40:30 PST -------
I guess we should file a separate bug for the ability to save MHTML files?
------- Comment #105 From 2011-05-24 12:47:00 PST -------
(In reply to comment #104)
> I guess we should file a separate bug for the ability to save MHTML files?

There is already one:
https://bugs.webkit.org/show_bug.cgi?id=7169
------- Comment #106 From 2011-05-24 13:48:35 PST -------
The new tests are all failing with missing expectations on the Chromium canaries:

http://test-results.appspot.com/dashboards/flakiness_dashboard.html#tests=mhtml%2Fmulti_frames_ie.mht%2Cmhtml%2Fmulti_frames_unmht.mht%2Cmhtml%2Fpage_with_css_and_js_ie.mht%2Cmhtml%2Fpage_with_css_and_js_unmht.mht%2Cmhtml%2Fpage_with_image_ie.mht%2Cmhtml%2Fpage_with_image_unmht.mht%2Cmhtml%2Fsimple_page_ie.mht%2Cmhtml%2Fsimple_page_unmht.mht&group=%40ToT%20-%20chromium.org

Is this expected?  It looks like you put the proper expecations in platform/chromium/mhtml, not sure yet why they're not being picked up.
------- Comment #107 From 2011-05-24 14:21:58 PST -------
(In reply to comment #101)
> (From update of attachment 94627 [details] [details])
> Clearing flags on attachment: 94627
> 
> Committed r87189: <http://trac.webkit.org/changeset/87189>

When you edit one FeatureDefines.xcconfig, you need to edit them all (in JavaScriptCore, WebKit and WebKit2).  See the comment at the top of the file:

// The contents of this file must be kept in sync with FeatureDefines.xcconfig in JavaScriptCore,
// WebCore, WebKit and WebKit2.  Also the default values of the ENABLE_FEATURE_NAME macros in
// build-webkit should match the values below, but they do not need to be in the same order.
------- Comment #108 From 2011-05-24 14:41:40 PST -------
(In reply to comment #107)
> (In reply to comment #101)
> > (From update of attachment 94627 [details] [details] [details])
> > Clearing flags on attachment: 94627
> > 
> > Committed r87189: <http://trac.webkit.org/changeset/87189>
> 
> When you edit one FeatureDefines.xcconfig, you need to edit them all (in JavaScriptCore, WebKit and WebKit2).  See the comment at the top of the file:
> 
> // The contents of this file must be kept in sync with FeatureDefines.xcconfig in JavaScriptCore,
> // WebCore, WebKit and WebKit2.  Also the default values of the ENABLE_FEATURE_NAME macros in
> // build-webkit should match the values below, but they do not need to be in the same order.

Sorry I missed that.
I have a patch ready. Should I file a new bug for that?
------- Comment #109 From 2011-05-24 14:44:02 PST -------
(In reply to comment #108)
> (In reply to comment #107)
> > (In reply to comment #101)
> > > (From update of attachment 94627 [details] [details] [details] [details])
> > > Clearing flags on attachment: 94627
> > > 
> > > Committed r87189: <http://trac.webkit.org/changeset/87189>
> > 
> > When you edit one FeatureDefines.xcconfig, you need to edit them all (in JavaScriptCore, WebKit and WebKit2).  See the comment at the top of the file:
> > 
> > // The contents of this file must be kept in sync with FeatureDefines.xcconfig in JavaScriptCore,
> > // WebCore, WebKit and WebKit2.  Also the default values of the ENABLE_FEATURE_NAME macros in
> > // build-webkit should match the values below, but they do not need to be in the same order.
> 
> Sorry I missed that.
> I have a patch ready. Should I file a new bug for that?

Technically yes, but let's just say:  rs=me to fix it.  Thanks!
------- Comment #110 From 2011-05-24 15:03:40 PST -------
Created an attachment (id=94696) [details]
Updated to the various FeatureDefines.xcconfig I forgot previously.
------- Comment #111 From 2011-05-24 15:09:40 PST -------
(From update of attachment 94696 [details])
Clearing flags on attachment: 94696

Committed r87213: <http://trac.webkit.org/changeset/87213>
------- Comment #112 From 2011-05-24 15:09:57 PST -------
All reviewed patches have been landed.  Closing bug.