RESOLVED FIXED 25939
[GTK] Text attachments in bugzilla are downloaded instead of being opened
https://bugs.webkit.org/show_bug.cgi?id=25939
Summary [GTK] Text attachments in bugzilla are downloaded instead of being opened
Xan Lopez
Reported 2009-05-21 10:11:29 PDT
Title says it all pretty much. One example is attachment in comment #4 here: http://bugzilla.gnome.org/show_bug.cgi?id=583180
Attachments
Don't bailout if content-type is text/plain so we can setup the headers correctly (3.49 KB, patch)
2009-06-13 06:40 PDT, Jan Alonzo
no flags
updated patch to sniff text/plain data too - but only once we have the data (2.92 KB, patch)
2009-06-13 17:32 PDT, Jan Alonzo
no flags
update patch with updated comments and changelog (3.33 KB, patch)
2009-06-14 01:22 PDT, Jan Alonzo
abarth: review-
remove workaround and rely on libsoup to do the sniffing (2.95 KB, patch)
2009-07-03 05:30 PDT, Jan Alonzo
gustavo: review-
Jan Alonzo
Comment 1 2009-06-06 02:52:25 PDT
*** Bug 25400 has been marked as a duplicate of this bug. ***
Jan Alonzo
Comment 2 2009-06-13 06:40:39 PDT
Created attachment 31233 [details] Don't bailout if content-type is text/plain so we can setup the headers correctly We should only bail out of the got-headers callback if the content is not modified and we don't have a content-type. At the moment we are also bailing out if content-type is text/plain, hence we don't get to setup the headers correctly for text/plain data. I've stumbled on this while trying to enable more tests in http/tests/navigation. I've also removed a couple more tests and can probably serve as tests for this patch. I've tested with the launcher and works fine and can view text/plain now :)
Holger Freyther
Comment 3 2009-06-13 07:56:15 PDT
Thanks, this will allow me to use epiphany for reviewing. I will let kov review this one.
Xan Lopez
Comment 4 2009-06-13 08:17:10 PDT
Comment on attachment 31233 [details] Don't bailout if content-type is text/plain so we can setup the headers correctly > diff --git a/WebCore/platform/network/soup/ResourceHandleSoup.cpp b/WebCore/platform/network/soup/ResourceHandleSoup.cpp > index 6be13e2..04725b4 100644 > --- a/WebCore/platform/network/soup/ResourceHandleSoup.cpp > +++ b/WebCore/platform/network/soup/ResourceHandleSoup.cpp > @@ -151,7 +151,7 @@ static void fillResponseFromMessage(SoupMessage* msg, ResourceResponse* response > response->setHTTPHeaderField(name, value); > > GHashTable* contentTypeParameters = 0; > - String contentType = soup_message_headers_get_content_type(msg->response_headers, &contentTypeParameters); > + String contentType = String::fromUTF8(soup_message_headers_get_content_type(msg->response_headers, &contentTypeParameters)); This seems to be unrelated, and probably not needed? > > // When the server sends multiple Content-Type headers, soup will > // give us their values concatenated with commas as a separator; > @@ -255,8 +255,8 @@ static void gotHeadersCallback(SoupMessage* msg, gpointer data) > // headers; we will not do content sniffing for 304 responses, > // though, since they do not have a body. > const char* contentType = soup_message_headers_get_content_type(msg->response_headers, NULL); > - if ((msg->status_code != SOUP_STATUS_NOT_MODIFIED) > - && (!contentType || !g_ascii_strcasecmp(contentType, "text/plain"))) > + > + if (msg->status_code != SOUP_STATUS_NOT_MODIFIED && !contentType) > return; > > RefPtr<ResourceHandle> handle = static_cast<ResourceHandle*>(data); If you do a git blame you'll see you are basically un-doing what kov did to fix https://bugs.webkit.org/show_bug.cgi?id=24930, so I'd say this does not seem right.
Xan Lopez
Comment 5 2009-06-13 08:21:42 PDT
And for what is worth, IIRC the problem with this bug was text subtypes, not text/plain itself (that's why you can see some patches in bugzilla, but not those marked as patches I think). The actual fix was something like Christian tried on bug https://bugs.webkit.org/show_bug.cgi?id=24903.
Jan Alonzo
Comment 6 2009-06-13 16:32:24 PDT
(In reply to comment #4) > > @@ -151,7 +151,7 @@ static void fillResponseFromMessage(SoupMessage* msg, ResourceResponse* response > > response->setHTTPHeaderField(name, value); > > > > GHashTable* contentTypeParameters = 0; > > - String contentType = soup_message_headers_get_content_type(msg->response_headers, &contentTypeParameters); > > + String contentType = String::fromUTF8(soup_message_headers_get_content_type(msg->response_headers, &contentTypeParameters)); > > This seems to be unrelated, and probably not needed? I think I meant to put this in a separate patch. We have a bug https://bugs.webkit.org/show_bug.cgi?id=17432 that should avoid using String(char*) in Gtk and instead we should be using String::fromUTF8. > > > > > // When the server sends multiple Content-Type headers, soup will > > // give us their values concatenated with commas as a separator; > > @@ -255,8 +255,8 @@ static void gotHeadersCallback(SoupMessage* msg, gpointer data) > > // headers; we will not do content sniffing for 304 responses, > > // though, since they do not have a body. > > const char* contentType = soup_message_headers_get_content_type(msg->response_headers, NULL); > > - if ((msg->status_code != SOUP_STATUS_NOT_MODIFIED) > > - && (!contentType || !g_ascii_strcasecmp(contentType, "text/plain"))) > > + > > + if (msg->status_code != SOUP_STATUS_NOT_MODIFIED && !contentType) > > return; > > > > RefPtr<ResourceHandle> handle = static_cast<ResourceHandle*>(data); > > If you do a git blame you'll see you are basically un-doing what kov did to fix > https://bugs.webkit.org/show_bug.cgi?id=24930, so I'd say this does not seem > right. The patch in bug #24930 comment #7 probably seemed right at that time but looking at it now, it became a blanket solution for text/plain. What we really want is to just avoid displaying content that we can't display eventhough server sends a Content-type: text-plain. And I think we can do that in navigation-requested signal right (please correct me if i'm wrong)?
Jan Alonzo
Comment 7 2009-06-13 16:36:01 PDT
(In reply to comment #5) > And for what is worth, IIRC the problem with this bug was text subtypes, not > text/plain itself (that's why you can see some patches in bugzilla, but not > those marked as patches I think). The actual fix was something like Christian > tried on bug https://bugs.webkit.org/show_bug.cgi?id=24903. Thanks for pointing this out. I'd like to fix our issue with text/plain here and leave the media subtypes issue in bug #24903, as it's affecting some of the tests I want to run (if that's ok with you =)
Jan Alonzo
Comment 8 2009-06-13 16:43:50 PDT
(In reply to comment #6) > > The patch in bug #24930 comment #7 probably seemed right at that time but > looking at it now, it became a blanket solution for text/plain. What we really > want is to just avoid displaying content that we can't display eventhough > server sends a Content-type: text-plain. And I think we can do that in > navigation-requested signal right (please correct me if i'm wrong)? Sorry, I meant "decide policy for mime type" handlers.
Jan Alonzo
Comment 9 2009-06-13 17:32:15 PDT
Created attachment 31246 [details] updated patch to sniff text/plain data too - but only once we have the data Updated patch to bring back the fix in bug #24930 but only sniff once we have the data (i.e. in the got-chunk callback) and if the content-type in the header is text/plain. This also removes the blanket solution we're currently applying for text/plain. One caveat of this update is the tests I've enabled now fail. But those will be fixed once we have the text/ media subtypes issue fixed, which is discussed in a separate bug report.
Jan Alonzo
Comment 10 2009-06-14 01:22:29 PDT
Created attachment 31256 [details] update patch with updated comments and changelog This is almost the same patch except for updated inline comments and the use of g_content_type_equals instead of g_strcasecmp.
Adam Barth
Comment 11 2009-06-14 14:51:36 PDT
Comment on attachment 31256 [details] update patch with updated comments and changelog You really shouldn't sniff from text/plain. That causes big security problems.
Adam Barth
Comment 12 2009-06-14 14:53:11 PDT
Instead, we should use the HTML 5 content sniffing algorithm. See Bug 25064.
Jan Alonzo
Comment 13 2009-06-15 03:43:54 PDT
(In reply to comment #12) > Instead, we should use the HTML 5 content sniffing algorithm. See Bug 25064. Thanks for the review. Making it dependent on 25064.
Gustavo Noronha (kov)
Comment 14 2009-06-15 14:42:22 PDT
(In reply to comment #13) > (In reply to comment #12) > > Instead, we should use the HTML 5 content sniffing algorithm. See Bug 25064. > > Thanks for the review. Making it dependent on 25064. > Though by doing nothing we _are_ still sniffing text/plain. So my suggestion is this: sniff text/plain, and if the type resulting from the sniffing starts with "text/", do not change what we received (keep text/plain). If the type is something else, make it application/octet-stream. This is better than what we currently have security-wise, and gives us a simple solution while we do not get the mime types list updated, and have no HTML5 sniffing.
Adam Barth
Comment 15 2009-06-15 16:11:07 PDT
> Though by doing nothing we _are_ still sniffing text/plain. So my suggestion is > this: sniff text/plain, and if the type resulting from the sniffing starts with > "text/", do not change what we received (keep text/plain). If the type is > something else, make it application/octet-stream. That sounds like a big hack. > This is better than what we > currently have security-wise, and gives us a simple solution while we do not > get the mime types list updated, and have no HTML5 sniffing. If you're mainly concerned about text/plain <=> application/octet-stream, you can implement the text-or-binary algorithm, which is quite simple. Is there a big rush for resolving this issue? I think the best solution is to wait for Bug 25064 to be fixed.
Gustavo Noronha (kov)
Comment 16 2009-06-16 06:41:21 PDT
(In reply to comment #15) > > Though by doing nothing we _are_ still sniffing text/plain. So my suggestion is > > this: sniff text/plain, and if the type resulting from the sniffing starts with > > "text/", do not change what we received (keep text/plain). If the type is > > something else, make it application/octet-stream. > > That sounds like a big hack. Well, our whole sniffing code is a big hack. > > This is better than what we > > currently have security-wise, and gives us a simple solution while we do not > > get the mime types list updated, and have no HTML5 sniffing. > > If you're mainly concerned about text/plain <=> application/octet-stream, you > can implement the text-or-binary algorithm, which is quite simple. Yes, that's the main concern. Do you have a pointer on how to get this algorithm implemented? > Is there a big rush for resolving this issue? I think the best solution is to > wait for Bug 25064 to be fixed. Somewhat. We do snapshot releases quite often, and people are already using the code in preview releases of their software. If nothing else, we need to at least stop sniffing text/plain unconditionally, I think. The whole sniffing hack got written because we had no proper content sniffing in our http library, nor in webkit, and people were getting numerous download requests when, say, flash content was sent as application/octet-stream. What we need is really a quick fix that, even if not ideal, works well enough while we wait for proper sniffing to happen in either libsoup (I'm working on this), or webkit.
Adam Barth
Comment 17 2009-06-16 11:34:13 PDT
> Yes, that's the main concern. Do you have a pointer on how to get this > algorithm implemented? The algorithm is super simple. You can see it specified here: http://tools.ietf.org/html/draft-abarth-mime-sniff-01#section-4 or implemented here in the "LooksBinary" function here: http://src.chromium.org/viewvc/chrome/trunk/src/net/base/mime_sniffer.cc > Somewhat. We do snapshot releases quite often, and people are already using the > code in preview releases of their software. If nothing else, we need to at > least stop sniffing text/plain unconditionally, I think. I'd be happy to review a patch that does that. > What we need is really a quick fix that, even if not ideal, works well enough > while we wait for proper sniffing to happen in either libsoup (I'm working on > this), or webkit. Do you have a pointer to the libsoup discussion? I'd like to make sure it implements the HTML 5 algorithm.
Gustavo Noronha (kov)
Comment 18 2009-06-16 11:41:21 PDT
(In reply to comment #17) > > Yes, that's the main concern. Do you have a pointer on how to get this > > algorithm implemented? > > The algorithm is super simple. You can see it specified here: Thanks for the pointers! > > What we need is really a quick fix that, even if not ideal, works well enough > > while we wait for proper sniffing to happen in either libsoup (I'm working on > > this), or webkit. > > Do you have a pointer to the libsoup discussion? I'd like to make sure it > implements the HTML 5 algorithm. Implementing the HTML5 algorithm is the idea =): http://bugzilla.gnome.org/show_bug.cgi?id=572589
Jan Alonzo
Comment 19 2009-07-03 05:30:49 PDT
Created attachment 32236 [details] remove workaround and rely on libsoup to do the sniffing
Gustavo Noronha (kov)
Comment 20 2009-07-04 16:03:47 PDT
Comment on attachment 32236 [details] remove workaround and rely on libsoup to do the sniffing > + Remove workarounds for content type sniffing and rely on libsoup's > + content sniffer (now that it has one). This is not enough. We need to actually add the sniffer feature to the session, and use the new signal. I have a now-outdated webkitgtk+ patch at http://bugzilla.gnome.org/show_bug.cgi?id=572589 which I was planning to update and propose for reviewing. I'll do it tomorrow, but if you'd like to beat me to it, go ahead =D.
Jan Alonzo
Comment 21 2009-07-07 03:49:48 PDT
Fix landed in 45558. Related bug: https://bugs.webkit.org/show_bug.cgi?id=26982
Note You need to log in before you can comment on or make changes to this bug.