Bug 89978

Summary: [GTK] MHTML files not being loaded due to reported mime type not supported
Product: WebKit Reporter: Mario Sanchez Prada <mario>
Component: WebKitGTKAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: cgarcia, japhet, jcivelli, jochen, mrobinson, svillar, webkit.review.bot
Priority: P2 Keywords: Gtk
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on:    
Bug Blocks: 89987    
Attachments:
Description Flags
Patch proposal cgarcia: review+

Description Mario Sanchez Prada 2012-06-26 06:56:36 PDT
With bug 7168 fixed, we have now MHTML read support in WebKit, which will enable any WebKit-based browser to read MHTML files as if they were "normal" HTML files with the external resources embedded in it (normally using base64 encoding).

However, at the moment any WebKitGtk based browser willing to load a MHTML file (typically with a .mht extension) won't be able to do it because the patch for bug 7168 enables WebCore to load files of mime type multipart/related only, when MHTML is enabled at build time:

    Source/WebCore/loader/archive/ArchiveFactory.cpp
    ------------------------------------------------
    [...]
    #if ENABLE(MHTML)
        mimeTypes.set("multipart/related", archiveFactoryCreate<MHTMLArchive>);
    #endif
    [...]

This seems to be working wonderfully in other WebKit-based browsers (e.g. Chromium) but not in WebKitGtk, since the mimetype obtained for those MHTML files is not multipart/related, but message/rfc822[1] instead.

I've been digging this issue for some time today and found out that the sequence of libraries used to retrieve the mime type for a given file, when using WebKitGTK, is as follows:

  WKGTK -> libsoup -> GIO -> GVFS -> GNOME-VFS -> shared-mime-info (from freedesktop.org)

So, I've hacked for a while on shared-mime-info's XML file where the heuristics and matching rules are defined and managed to get it telling me at some point that .mht files generated with WebKit were actually of mime type multipart/related, instead of message/rfc822. Unfortunately, the very same patch in shared-mime-info caused some regressions there, making some files containing true email messages report multipart/related too, instead of message/rfc822.

For the sake of completeness, this is the an excerpt of the content of the MHTML file I was using for testing:

  From: <Saved by WebKit>
  Subject: 
  Date: Wed, 26 Jun 2012 12:38:36 +0100
  MIME-Version: 1.0
  Content-Type: multipart/related;
        type="text/html";
        boundary="----=_NextPart_000_47A9_6EDBB149.472FD3B3"

  ------=_NextPart_000_47A9_6EDBB149.472FD3B3
  Content-Type: text/html
  Content-Transfer-Encoding: quoted-printable
  Content-Location: file:///home/mario/work/gnome3/WebKitTests/mhtml.html

  <html><head><meta charset=3D"ISO-8859-1"></head><body>
      A red box: <img src=3D"data:image/png;base64,iVBORw0...ggg=3D=3D"><br>
      A blue box: <img src=3D"data:image/png;base64,iVBORw...ggg=3D=3D">
   =20
  </body></html>
  ------=_NextPart_000_47A9_6EDBB149.472FD3B3
  Content-Type: image/png
  Content-Transfer-Encoding: base64
  Content-Location:
  data:image/png;base64,iVBORw0...ggg==
  [...]


So, after reaching that point I realized that perhaps the problem was not in shared-mime-info since, after all, despite of it being a .mht file, it is also true that it is written in a perfectly valid e-mail format, so it's not strange that it's returning message/rfc822. So I now think it would be better to make some changes here in WebKit (at least in WebKitGTK) just to support reading MHTML files, hence this bug.

Furthermore, we should also consider that there are cases where a MHTML file is not encoded using multipart/related (when no there are no external resources linked, according to [2]), which seems to suggest that the 'multipart/related' rule added with the patch for bug 7168 could be insufficient, and also that making shared-mime-info return that mime tipe instead of message/rfc822 could even be wrong (since they could very well be encoded in text/plain or text/html).

So, I'll be working in a patch for fixing this issue, at least in WebKitGTK. Otherwise, the efforts done in bug 89872 and bug 89873 will be pointless, at least for GTK.

[1] http://tools.ietf.org/html/rfc822
[2] http://tools.ietf.org/html/rfc2557#section-9.1
Comment 1 Mario Sanchez Prada 2012-06-26 09:19:03 PDT
Created attachment 149543 [details]
Patch proposal

Attaching patch that adds 'message/rfc822' to the list of supported MIME types when MHTML is enabled.

The tests for this changeset is the fact that we add new expectations to platform/gtk/mhtml, that will be taken into consideration as soon as we enable MHTML by default at build time (bug+patch  coming soon)
Comment 2 Mario Sanchez Prada 2012-06-28 02:11:13 PDT
As the author of the the commit adding MHTML read support in WebCore [1], I'm adding Jay Civelli to CC.

He could probably be interested in providing feedback about to the following doubts I still have:

 - Do you agree with this bug and with the proposed solution?
   (at the moment the current patch is meant to affect GTK only)

 - As test expectations are identical for GTK than for 
   chromium, would you agree with moving them out from
   platform/chromium/mhtml to mhtml/? Or would it be better
   to introduce new expectations files for GTK (even if they
   are identical) under platform/gtk/mhtml?

About the second one, I think it would be better to share expectations (thus moving to mhtml/). After all, platforms building without the MHTML feature enabled will skip that directory when running the tests.

[1] http://trac.webkit.org/changeset/87189
Comment 3 Mario Sanchez Prada 2012-07-18 03:33:11 PDT
Ping reviewers?
Comment 4 Carlos Garcia Campos 2012-07-18 03:40:28 PDT
Comment on attachment 149543 [details]
Patch proposal

I guess we can land this patch for now, and simply move the test expectation later if chromium guys agree on it.
Comment 5 Jay Civelli 2012-07-18 10:02:47 PDT
Sorry for the late reply.
I am OK with adding message/rfc822 as a MHTML mime-type.

For the test expectations, yes, we should share them.
Feel free to make that in a subsequent patch.

Thanks.
Comment 6 Mario Sanchez Prada 2012-07-18 10:17:26 PDT
(In reply to comment #5)
> Sorry for the late reply.
> I am OK with adding message/rfc822 as a MHTML mime-type.
> 
> For the test expectations, yes, we should share them.
> Feel free to make that in a subsequent patch.

Great. Now I need to leave but I will take care of updating the patch tomorrow already.

Thanks for the review.
Comment 7 Mario Sanchez Prada 2012-07-19 04:00:04 PDT
Committed r123088: <http://trac.webkit.org/changeset/123088>