RESOLVED FIXED 128739
[Soup][Curl] HTTP header values should be treated as latin1, not UTF-8
https://bugs.webkit.org/show_bug.cgi?id=128739
Summary [Soup][Curl] HTTP header values should be treated as latin1, not UTF-8
youenn fablet
Reported 2014-02-13 05:25:55 PST
WebKit soup layer (and probably CURL layer as well) treats HTTP headers and HTTP reason phrase as UTF-8. They should be treated as latin as per RFC 2616.
Attachments
Patch (11.79 KB, patch)
2014-02-13 05:57 PST, youenn fablet
no flags
Changed method conversion to ASCII (12.27 KB, patch)
2014-02-13 13:27 PST, youenn fablet
no flags
Fixing contentDisposition (13.99 KB, patch)
2014-12-07 04:43 PST, youenn fablet
no flags
youenn fablet
Comment 1 2014-02-13 05:57:49 PST
Peter Gal
Comment 2 2014-02-13 06:44:01 PST
This patch also contains curl modifications so I don't see any point of the [soup] prefix. Also if I'm not mistaken there are already some tests which expects that the headers are in utf8, so this change could brake those tests.
youenn fablet
Comment 3 2014-02-13 07:48:43 PST
(In reply to comment #2) > This patch also contains curl modifications so I don't see any point of the [soup] prefix. Also if I'm not mistaken there are already some tests which expects that the headers are in utf8, so this change could brake those tests. I can split the changes in CURL and SOUP specific patches, if that helps? As of UTF-8 related tests, I found one (LayoutTests/http/tests/misc/non-utf8-header-name). It is passing with and without the patch. I did not find any regression related to this patch in the LayoutTests/http but I may have missed something. This patch tries to align WebKit/Soup behavior with WebKit/mac and Firefox. To encode header values containing other character set than ASCII, one is expected to encode them according RFC 2047. The test is inspired from https://github.com/w3c/web-platform-tests/tree/master/XMLHttpRequest/getresponseheader-special-characters.htm
Peter Gal
Comment 4 2014-02-13 08:26:43 PST
(In reply to comment #3) > (In reply to comment #2) > > This patch also contains curl modifications so I don't see any point of the [soup] prefix. Also if I'm not mistaken there are already some tests which expects that the headers are in utf8, so this change could brake those tests. > > I can split the changes in CURL and SOUP specific patches, if that helps? IMHO: just remove the [soup] prefix.
Sergio Villar Senin
Comment 5 2014-02-13 09:04:22 PST
Comment on attachment 224062 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=224062&action=review The patch looks good and correct from the specs POV. But I have some concerns specially related to headers like "Content-Disposition". Many servers don't respect the RFCs and send for example "filename=" using utf-8 or win-1250 strings (depending on the OS of the user-agent) instead of the only allowed encoding which is latin1. So I wonder if this could break downloads for example (we had some bugs related to the names of the downloaded files in the past). Let's see what danw or kov think about this. > Source/WebCore/ChangeLog:3 > + [Soup] HTTP header values should be treated as latin, not UTF-8 You should probably update the title of the bug (and the the ChangeLog) to include also [curl] and use specifically latin1 instead of latin. > Source/WebCore/ChangeLog:19 > + (WebCore::ResourceRequest::updateSoupMessageHeaders): Removed header UTF-8 translation You can just keep this line here and use "Ditto." for the other files (remember to finish with a dot ".") instead of repeating the same description. > LayoutTests/ChangeLog:11 > + * http/tests/xmlhttprequest/response-special-characters.html: Added (test that non ascii header & reason phrase values are correctly retrieved by the web application). Better leave the word "Added." after each file, and include the description of what the test does after the "Reviewed by" line.
Dan Winship
Comment 6 2014-02-13 09:12:47 PST
There are probably tests for Content-Disposition charset-handling already anyway though? The fact that iso-8859-1 is correct according to the spec doesn't necessarily mean that it's the right behavior. But if this makes WebkitGTK more in sync with mac, then maybe it's the right thing. Note that httpbis says that unless you have specific knowledge about the contents, you should treat non-ASCII header values as opaque. And you should ignore the reason phrase in the response header entirely, except perhaps for debugging purposes. >- g_object_set(soupMessage, SOUP_MESSAGE_METHOD, httpMethod().utf8().data(), NULL); >+ g_object_set(soupMessage, SOUP_MESSAGE_METHOD, httpMethod().latin1().data(), NULL); the method has to be ASCII, so this isn't really useful
youenn fablet
Comment 7 2014-02-13 12:09:45 PST
(In reply to comment #6) > There are probably tests for Content-Disposition charset-handling already anyway though? Test http/tests/download/literal-utf-8.html actually checks that. > >- g_object_set(soupMessage, SOUP_MESSAGE_METHOD, httpMethod().utf8().data(), NULL); > >+ g_object_set(soupMessage, SOUP_MESSAGE_METHOD, httpMethod().latin1().data(), NULL); > > the method has to be ASCII, so this isn't really useful Agreed, but it aligns with curl ResourceHandleManager.cpp
Martin Robinson
Comment 8 2014-02-13 12:28:21 PST
(In reply to comment #7) > > >- g_object_set(soupMessage, SOUP_MESSAGE_METHOD, httpMethod().utf8().data(), NULL); > > >+ g_object_set(soupMessage, SOUP_MESSAGE_METHOD, httpMethod().latin1().data(), NULL); > > > > the method has to be ASCII, so this isn't really useful > > Agreed, but it aligns with curl ResourceHandleManager.cpp There is a WTFString::ascii() method. --Martin
youenn fablet
Comment 9 2014-02-13 13:27:10 PST
Created attachment 224098 [details] Changed method conversion to ASCII
youenn fablet
Comment 10 2014-04-15 00:16:55 PDT
Patch is still valid and passes all xhr tests. Anybody to review it?
youenn fablet
Comment 11 2014-11-03 07:43:07 PST
Patch is still valid although it needs a rebase first. Also to be noted that patch aligns with RFC7230 which recently obsoloted RFC2616.
Sergio Villar Senin
Comment 12 2014-11-10 04:14:10 PST
Comment on attachment 224098 [details] Changed method conversion to ASCII OK, let's give it a try
youenn fablet
Comment 13 2014-12-07 04:43:20 PST
Created attachment 242740 [details] Fixing contentDisposition
youenn fablet
Comment 14 2014-12-07 04:44:48 PST
(In reply to comment #13) > Created attachment 242740 [details] > Fixing contentDisposition Rerunning the tests, I was apparently wrong with contentDisposition header. I added explicit UTF-8 header conversion to compute suggested filename. I also removed a gtk/efl specific expectation which is no longer needed after this patch.
Dan Winship
Comment 15 2014-12-07 05:29:31 PST
> while (soup_message_headers_iter_next(&headersIter, &headerName, &headerValue)) >- m_httpHeaderFields.set(String::fromUTF8(headerName), String::fromUTF8(headerValue)); >+ m_httpHeaderFields.set(String(headerName), String(headerValue)); That definitely seems correct; the headerName and headerValue returned from soup_message_headers_iter_next() are just the raw bytes from the wire, so it doesn't make sense for WebKit to be calling fromUTF8() on them if it doesn't have some specific reason to think they're utf-8.
Martin Robinson
Comment 16 2014-12-07 05:32:15 PST
Youenn, did you mean to mark this patch for review?
WebKit Commit Bot
Comment 17 2014-12-07 12:04:44 PST
Comment on attachment 242740 [details] Fixing contentDisposition Clearing flags on attachment: 242740 Committed r176930: <http://trac.webkit.org/changeset/176930>
WebKit Commit Bot
Comment 18 2014-12-07 12:04:50 PST
All reviewed patches have been landed. Closing bug.
Konstantin Tokarev
Comment 19 2016-01-19 10:53:30 PST
It seems to me like that curl implementation ResourceResponse::platformSuggestedFilename() also needs this fix, or am I missing anything?
youenn fablet
Comment 20 2016-01-19 12:54:35 PST
(In reply to comment #19) > It seems to me like that curl implementation > ResourceResponse::platformSuggestedFilename() also needs this fix, or am I > missing anything? It seems so, would you be able to file a bug?
Konstantin Tokarev
Comment 21 2016-01-19 13:45:29 PST
(In reply to comment #20) > It seems so, would you be able to file a bug? I can submit patch but I won't be able to test it (I'm just reading the code)
Konstantin Tokarev
Comment 22 2016-01-19 13:47:09 PST
I mean I won't be able to test curl part
Michael Catanzaro
Comment 23 2016-01-19 15:31:35 PST
CCing some folks who might be interested in the curl backend....
peavo
Comment 24 2016-01-20 00:10:23 PST
(In reply to comment #22) > I mean I won't be able to test curl part Thanks, I can run the http tests to test for regressions :)
Note You need to log in before you can comment on or make changes to this bug.