WebKit soup layer (and probably CURL layer as well) treats HTTP headers and HTTP reason phrase as UTF-8. They should be treated as latin as per RFC 2616.
Created attachment 224062 [details] Patch
This patch also contains curl modifications so I don't see any point of the [soup] prefix. Also if I'm not mistaken there are already some tests which expects that the headers are in utf8, so this change could brake those tests.
(In reply to comment #2) > This patch also contains curl modifications so I don't see any point of the [soup] prefix. Also if I'm not mistaken there are already some tests which expects that the headers are in utf8, so this change could brake those tests. I can split the changes in CURL and SOUP specific patches, if that helps? As of UTF-8 related tests, I found one (LayoutTests/http/tests/misc/non-utf8-header-name). It is passing with and without the patch. I did not find any regression related to this patch in the LayoutTests/http but I may have missed something. This patch tries to align WebKit/Soup behavior with WebKit/mac and Firefox. To encode header values containing other character set than ASCII, one is expected to encode them according RFC 2047. The test is inspired from https://github.com/w3c/web-platform-tests/tree/master/XMLHttpRequest/getresponseheader-special-characters.htm
(In reply to comment #3) > (In reply to comment #2) > > This patch also contains curl modifications so I don't see any point of the [soup] prefix. Also if I'm not mistaken there are already some tests which expects that the headers are in utf8, so this change could brake those tests. > > I can split the changes in CURL and SOUP specific patches, if that helps? IMHO: just remove the [soup] prefix.
Comment on attachment 224062 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=224062&action=review The patch looks good and correct from the specs POV. But I have some concerns specially related to headers like "Content-Disposition". Many servers don't respect the RFCs and send for example "filename=" using utf-8 or win-1250 strings (depending on the OS of the user-agent) instead of the only allowed encoding which is latin1. So I wonder if this could break downloads for example (we had some bugs related to the names of the downloaded files in the past). Let's see what danw or kov think about this. > Source/WebCore/ChangeLog:3 > + [Soup] HTTP header values should be treated as latin, not UTF-8 You should probably update the title of the bug (and the the ChangeLog) to include also [curl] and use specifically latin1 instead of latin. > Source/WebCore/ChangeLog:19 > + (WebCore::ResourceRequest::updateSoupMessageHeaders): Removed header UTF-8 translation You can just keep this line here and use "Ditto." for the other files (remember to finish with a dot ".") instead of repeating the same description. > LayoutTests/ChangeLog:11 > + * http/tests/xmlhttprequest/response-special-characters.html: Added (test that non ascii header & reason phrase values are correctly retrieved by the web application). Better leave the word "Added." after each file, and include the description of what the test does after the "Reviewed by" line.
There are probably tests for Content-Disposition charset-handling already anyway though? The fact that iso-8859-1 is correct according to the spec doesn't necessarily mean that it's the right behavior. But if this makes WebkitGTK more in sync with mac, then maybe it's the right thing. Note that httpbis says that unless you have specific knowledge about the contents, you should treat non-ASCII header values as opaque. And you should ignore the reason phrase in the response header entirely, except perhaps for debugging purposes. >- g_object_set(soupMessage, SOUP_MESSAGE_METHOD, httpMethod().utf8().data(), NULL); >+ g_object_set(soupMessage, SOUP_MESSAGE_METHOD, httpMethod().latin1().data(), NULL); the method has to be ASCII, so this isn't really useful
(In reply to comment #6) > There are probably tests for Content-Disposition charset-handling already anyway though? Test http/tests/download/literal-utf-8.html actually checks that. > >- g_object_set(soupMessage, SOUP_MESSAGE_METHOD, httpMethod().utf8().data(), NULL); > >+ g_object_set(soupMessage, SOUP_MESSAGE_METHOD, httpMethod().latin1().data(), NULL); > > the method has to be ASCII, so this isn't really useful Agreed, but it aligns with curl ResourceHandleManager.cpp
(In reply to comment #7) > > >- g_object_set(soupMessage, SOUP_MESSAGE_METHOD, httpMethod().utf8().data(), NULL); > > >+ g_object_set(soupMessage, SOUP_MESSAGE_METHOD, httpMethod().latin1().data(), NULL); > > > > the method has to be ASCII, so this isn't really useful > > Agreed, but it aligns with curl ResourceHandleManager.cpp There is a WTFString::ascii() method. --Martin
Created attachment 224098 [details] Changed method conversion to ASCII
Patch is still valid and passes all xhr tests. Anybody to review it?
Patch is still valid although it needs a rebase first. Also to be noted that patch aligns with RFC7230 which recently obsoloted RFC2616.
Comment on attachment 224098 [details] Changed method conversion to ASCII OK, let's give it a try
Created attachment 242740 [details] Fixing contentDisposition
(In reply to comment #13) > Created attachment 242740 [details] > Fixing contentDisposition Rerunning the tests, I was apparently wrong with contentDisposition header. I added explicit UTF-8 header conversion to compute suggested filename. I also removed a gtk/efl specific expectation which is no longer needed after this patch.
> while (soup_message_headers_iter_next(&headersIter, &headerName, &headerValue)) >- m_httpHeaderFields.set(String::fromUTF8(headerName), String::fromUTF8(headerValue)); >+ m_httpHeaderFields.set(String(headerName), String(headerValue)); That definitely seems correct; the headerName and headerValue returned from soup_message_headers_iter_next() are just the raw bytes from the wire, so it doesn't make sense for WebKit to be calling fromUTF8() on them if it doesn't have some specific reason to think they're utf-8.
Youenn, did you mean to mark this patch for review?
Comment on attachment 242740 [details] Fixing contentDisposition Clearing flags on attachment: 242740 Committed r176930: <http://trac.webkit.org/changeset/176930>
All reviewed patches have been landed. Closing bug.
It seems to me like that curl implementation ResourceResponse::platformSuggestedFilename() also needs this fix, or am I missing anything?
(In reply to comment #19) > It seems to me like that curl implementation > ResourceResponse::platformSuggestedFilename() also needs this fix, or am I > missing anything? It seems so, would you be able to file a bug?
(In reply to comment #20) > It seems so, would you be able to file a bug? I can submit patch but I won't be able to test it (I'm just reading the code)
I mean I won't be able to test curl part
CCing some folks who might be interested in the curl backend....
(In reply to comment #22) > I mean I won't be able to test curl part Thanks, I can run the http tests to test for regressions :)