Bug 25939 - [GTK] Text attachments in bugzilla are downloaded instead of being opened
Summary: [GTK] Text attachments in bugzilla are downloaded instead of being opened
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKitGTK (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC Linux
: P2 Normal
Assignee: Gustavo Noronha (kov)
URL:
Keywords:
: 25400 (view as bug list)
Depends on: 25064
Blocks:
  Show dependency treegraph
 
Reported: 2009-05-21 10:11 PDT by Xan Lopez
Modified: 2009-07-07 03:49 PDT (History)
5 users (show)

See Also:


Attachments
Don't bailout if content-type is text/plain so we can setup the headers correctly (3.49 KB, patch)
2009-06-13 06:40 PDT, Jan Alonzo
no flags Details | Formatted Diff | Diff
updated patch to sniff text/plain data too - but only once we have the data (2.92 KB, patch)
2009-06-13 17:32 PDT, Jan Alonzo
no flags Details | Formatted Diff | Diff
update patch with updated comments and changelog (3.33 KB, patch)
2009-06-14 01:22 PDT, Jan Alonzo
abarth: review-
Details | Formatted Diff | Diff
remove workaround and rely on libsoup to do the sniffing (2.95 KB, patch)
2009-07-03 05:30 PDT, Jan Alonzo
gustavo: review-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Xan Lopez 2009-05-21 10:11:29 PDT
Title says it all pretty much. One example is attachment in comment #4 here: http://bugzilla.gnome.org/show_bug.cgi?id=583180
Comment 1 Jan Alonzo 2009-06-06 02:52:25 PDT
*** Bug 25400 has been marked as a duplicate of this bug. ***
Comment 2 Jan Alonzo 2009-06-13 06:40:39 PDT
Created attachment 31233 [details]
Don't bailout if content-type is text/plain so we can setup the headers correctly

We should only bail out of the got-headers callback if the content is not modified and we don't have a content-type. At the moment we are also bailing out if content-type is text/plain, hence we don't get to setup the headers correctly for text/plain data. 

I've stumbled on this while trying to enable more tests in http/tests/navigation. I've also removed a couple more tests and can probably serve as tests for this patch.

I've tested with the launcher and works fine and can view text/plain now :)
Comment 3 Holger Freyther 2009-06-13 07:56:15 PDT
Thanks, this will allow me to use epiphany for reviewing. I will let kov review this one.
Comment 4 Xan Lopez 2009-06-13 08:17:10 PDT
Comment on attachment 31233 [details]
Don't bailout if content-type is text/plain so we can setup the headers correctly

> diff --git a/WebCore/platform/network/soup/ResourceHandleSoup.cpp b/WebCore/platform/network/soup/ResourceHandleSoup.cpp
> index 6be13e2..04725b4 100644
> --- a/WebCore/platform/network/soup/ResourceHandleSoup.cpp
> +++ b/WebCore/platform/network/soup/ResourceHandleSoup.cpp
> @@ -151,7 +151,7 @@ static void fillResponseFromMessage(SoupMessage* msg, ResourceResponse* response
>          response->setHTTPHeaderField(name, value);
>  
>      GHashTable* contentTypeParameters = 0;
> -    String contentType = soup_message_headers_get_content_type(msg->response_headers, &contentTypeParameters);
> +    String contentType = String::fromUTF8(soup_message_headers_get_content_type(msg->response_headers, &contentTypeParameters));

This seems to be unrelated, and probably not needed?

>  
>      // When the server sends multiple Content-Type headers, soup will
>      // give us their values concatenated with commas as a separator;
> @@ -255,8 +255,8 @@ static void gotHeadersCallback(SoupMessage* msg, gpointer data)
>      // headers; we will not do content sniffing for 304 responses,
>      // though, since they do not have a body.
>      const char* contentType = soup_message_headers_get_content_type(msg->response_headers, NULL);
> -    if ((msg->status_code != SOUP_STATUS_NOT_MODIFIED)
> -        && (!contentType || !g_ascii_strcasecmp(contentType, "text/plain")))
> +
> +    if (msg->status_code != SOUP_STATUS_NOT_MODIFIED && !contentType)
>          return;
>  
>      RefPtr<ResourceHandle> handle = static_cast<ResourceHandle*>(data);

If you do a git blame you'll see you are basically un-doing what kov did to fix https://bugs.webkit.org/show_bug.cgi?id=24930, so I'd say this does not seem right.
Comment 5 Xan Lopez 2009-06-13 08:21:42 PDT
And for what is worth, IIRC the problem with this bug was text subtypes, not text/plain itself (that's why you can see some patches in bugzilla, but not those marked as patches I think). The actual fix was something like Christian tried on bug https://bugs.webkit.org/show_bug.cgi?id=24903.
Comment 6 Jan Alonzo 2009-06-13 16:32:24 PDT
(In reply to comment #4)
> > @@ -151,7 +151,7 @@ static void fillResponseFromMessage(SoupMessage* msg, ResourceResponse* response
> >          response->setHTTPHeaderField(name, value);
> >  
> >      GHashTable* contentTypeParameters = 0;
> > -    String contentType = soup_message_headers_get_content_type(msg->response_headers, &contentTypeParameters);
> > +    String contentType = String::fromUTF8(soup_message_headers_get_content_type(msg->response_headers, &contentTypeParameters));
> 
> This seems to be unrelated, and probably not needed?

I think I meant to put this in a separate patch. We have a bug https://bugs.webkit.org/show_bug.cgi?id=17432 that should avoid using String(char*) in Gtk and instead we should be using String::fromUTF8.

> 
> >  
> >      // When the server sends multiple Content-Type headers, soup will
> >      // give us their values concatenated with commas as a separator;
> > @@ -255,8 +255,8 @@ static void gotHeadersCallback(SoupMessage* msg, gpointer data)
> >      // headers; we will not do content sniffing for 304 responses,
> >      // though, since they do not have a body.
> >      const char* contentType = soup_message_headers_get_content_type(msg->response_headers, NULL);
> > -    if ((msg->status_code != SOUP_STATUS_NOT_MODIFIED)
> > -        && (!contentType || !g_ascii_strcasecmp(contentType, "text/plain")))
> > +
> > +    if (msg->status_code != SOUP_STATUS_NOT_MODIFIED && !contentType)
> >          return;
> >  
> >      RefPtr<ResourceHandle> handle = static_cast<ResourceHandle*>(data);
> 
> If you do a git blame you'll see you are basically un-doing what kov did to fix
> https://bugs.webkit.org/show_bug.cgi?id=24930, so I'd say this does not seem
> right.

The patch in bug #24930 comment #7 probably seemed right at that time but looking at it now, it became a blanket solution for text/plain. What we really want is to just avoid displaying content that we can't display eventhough server sends a Content-type: text-plain. And I think we can do that in navigation-requested signal right (please correct me if i'm wrong)?
Comment 7 Jan Alonzo 2009-06-13 16:36:01 PDT
(In reply to comment #5)
> And for what is worth, IIRC the problem with this bug was text subtypes, not
> text/plain itself (that's why you can see some patches in bugzilla, but not
> those marked as patches I think). The actual fix was something like Christian
> tried on bug https://bugs.webkit.org/show_bug.cgi?id=24903.

Thanks for pointing this out. I'd like to fix our issue with text/plain here and leave the media subtypes issue in bug #24903, as it's affecting some of the tests I want to run (if that's ok with you =)

Comment 8 Jan Alonzo 2009-06-13 16:43:50 PDT
(In reply to comment #6)
> 
> The patch in bug #24930 comment #7 probably seemed right at that time but
> looking at it now, it became a blanket solution for text/plain. What we really
> want is to just avoid displaying content that we can't display eventhough
> server sends a Content-type: text-plain. And I think we can do that in
> navigation-requested signal right (please correct me if i'm wrong)?

Sorry, I meant "decide policy for mime type" handlers.


Comment 9 Jan Alonzo 2009-06-13 17:32:15 PDT
Created attachment 31246 [details]
updated patch to sniff text/plain data too - but only once we have the data

Updated patch to bring back the fix in bug #24930 but only sniff once we have the data (i.e. in the got-chunk callback) and if the content-type in the header is text/plain. This also removes the blanket solution we're currently applying for text/plain. 

One caveat of this update is the tests I've enabled now fail. But those will be fixed once we have the text/ media subtypes issue fixed, which is discussed in a separate bug report.
Comment 10 Jan Alonzo 2009-06-14 01:22:29 PDT
Created attachment 31256 [details]
update patch with updated comments and changelog

This is almost the same patch except for updated inline comments and the use of g_content_type_equals instead of g_strcasecmp.
Comment 11 Adam Barth 2009-06-14 14:51:36 PDT
Comment on attachment 31256 [details]
update patch with updated comments and changelog

You really shouldn't sniff from text/plain.  That causes big security problems.
Comment 12 Adam Barth 2009-06-14 14:53:11 PDT
Instead, we should use the HTML 5 content sniffing algorithm.  See Bug 25064.
Comment 13 Jan Alonzo 2009-06-15 03:43:54 PDT
(In reply to comment #12)
> Instead, we should use the HTML 5 content sniffing algorithm.  See Bug 25064.

Thanks for the review. Making it dependent on 25064.
Comment 14 Gustavo Noronha (kov) 2009-06-15 14:42:22 PDT
(In reply to comment #13)
> (In reply to comment #12)
> > Instead, we should use the HTML 5 content sniffing algorithm.  See Bug 25064.
> 
> Thanks for the review. Making it dependent on 25064.
> 

Though by doing nothing we _are_ still sniffing text/plain. So my suggestion is this: sniff text/plain, and if the type resulting from the sniffing starts with "text/", do not change what we received (keep text/plain). If the type is something else, make it application/octet-stream. This is better than what we currently have security-wise, and gives us a simple solution while we do not get the mime types list updated, and have no HTML5 sniffing.
Comment 15 Adam Barth 2009-06-15 16:11:07 PDT
> Though by doing nothing we _are_ still sniffing text/plain. So my suggestion is
> this: sniff text/plain, and if the type resulting from the sniffing starts with
> "text/", do not change what we received (keep text/plain). If the type is
> something else, make it application/octet-stream.

That sounds like a big hack.

> This is better than what we
> currently have security-wise, and gives us a simple solution while we do not
> get the mime types list updated, and have no HTML5 sniffing.

If you're mainly concerned about text/plain <=> application/octet-stream, you can implement the text-or-binary algorithm, which is quite simple.

Is there a big rush for resolving this issue?  I think the best solution is to wait for Bug 25064 to be fixed.
Comment 16 Gustavo Noronha (kov) 2009-06-16 06:41:21 PDT
(In reply to comment #15)
> > Though by doing nothing we _are_ still sniffing text/plain. So my suggestion is
> > this: sniff text/plain, and if the type resulting from the sniffing starts with
> > "text/", do not change what we received (keep text/plain). If the type is
> > something else, make it application/octet-stream.
> 
> That sounds like a big hack.

Well, our whole sniffing code is a big hack.

> > This is better than what we
> > currently have security-wise, and gives us a simple solution while we do not
> > get the mime types list updated, and have no HTML5 sniffing.
> 
> If you're mainly concerned about text/plain <=> application/octet-stream, you
> can implement the text-or-binary algorithm, which is quite simple.

Yes, that's the main concern. Do you have a pointer on how to get this algorithm implemented?

> Is there a big rush for resolving this issue?  I think the best solution is to
> wait for Bug 25064 to be fixed.
 
Somewhat. We do snapshot releases quite often, and people are already using the code in preview releases of their software. If nothing else, we need to at least stop sniffing text/plain unconditionally, I think. The whole sniffing hack got written because we had no proper content sniffing in our http library, nor in webkit, and people were getting numerous download requests when, say, flash content was sent as application/octet-stream.

What we need is really a quick fix that, even if not ideal, works well enough while we wait for proper sniffing to happen in either libsoup (I'm working on this), or webkit.
Comment 17 Adam Barth 2009-06-16 11:34:13 PDT
> Yes, that's the main concern. Do you have a pointer on how to get this
> algorithm implemented?

The algorithm is super simple.  You can see it specified here:

http://tools.ietf.org/html/draft-abarth-mime-sniff-01#section-4

or implemented here in the "LooksBinary" function here:

http://src.chromium.org/viewvc/chrome/trunk/src/net/base/mime_sniffer.cc

> Somewhat. We do snapshot releases quite often, and people are already using the
> code in preview releases of their software. If nothing else, we need to at
> least stop sniffing text/plain unconditionally, I think.

I'd be happy to review a patch that does that.

> What we need is really a quick fix that, even if not ideal, works well enough
> while we wait for proper sniffing to happen in either libsoup (I'm working on
> this), or webkit.

Do you have a pointer to the libsoup discussion?  I'd like to make sure it implements the HTML 5 algorithm.
Comment 18 Gustavo Noronha (kov) 2009-06-16 11:41:21 PDT
(In reply to comment #17)
> > Yes, that's the main concern. Do you have a pointer on how to get this
> > algorithm implemented?
> 
> The algorithm is super simple.  You can see it specified here:

Thanks for the pointers!

> > What we need is really a quick fix that, even if not ideal, works well enough
> > while we wait for proper sniffing to happen in either libsoup (I'm working on
> > this), or webkit.
> 
> Do you have a pointer to the libsoup discussion?  I'd like to make sure it
> implements the HTML 5 algorithm.

Implementing the HTML5 algorithm is the idea =): http://bugzilla.gnome.org/show_bug.cgi?id=572589

Comment 19 Jan Alonzo 2009-07-03 05:30:49 PDT
Created attachment 32236 [details]
remove workaround and rely on libsoup to do the sniffing
Comment 20 Gustavo Noronha (kov) 2009-07-04 16:03:47 PDT
Comment on attachment 32236 [details]
remove workaround and rely on libsoup to do the sniffing

> +        Remove workarounds for content type sniffing and rely on libsoup's
> +        content sniffer (now that it has one).

This is not enough. We need to actually add the sniffer feature to the session, and use the new signal. I have a now-outdated webkitgtk+ patch at http://bugzilla.gnome.org/show_bug.cgi?id=572589 which I was planning to update and propose for reviewing. I'll do it tomorrow, but if you'd like to beat me to it, go ahead =D.
Comment 21 Jan Alonzo 2009-07-07 03:49:48 PDT
Fix landed in 45558. Related bug: https://bugs.webkit.org/show_bug.cgi?id=26982