Bug 128739 - [Soup][Curl] HTTP header values should be treated as latin1, not UTF-8
Summary: [Soup][Curl] HTTP header values should be treated as latin1, not UTF-8
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKitGTK (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: youenn fablet
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-02-13 05:25 PST by youenn fablet
Modified: 2016-01-20 00:10 PST (History)
12 users (show)

See Also:


Attachments
Patch (11.79 KB, patch)
2014-02-13 05:57 PST, youenn fablet
no flags Details | Formatted Diff | Diff
Changed method conversion to ASCII (12.27 KB, patch)
2014-02-13 13:27 PST, youenn fablet
no flags Details | Formatted Diff | Diff
Fixing contentDisposition (13.99 KB, patch)
2014-12-07 04:43 PST, youenn fablet
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description youenn fablet 2014-02-13 05:25:55 PST
WebKit soup layer (and probably CURL layer as well) treats HTTP headers and HTTP reason phrase as UTF-8.
They should be treated as latin as per RFC 2616.
Comment 1 youenn fablet 2014-02-13 05:57:49 PST
Created attachment 224062 [details]
Patch
Comment 2 Peter Gal 2014-02-13 06:44:01 PST
This patch also contains curl modifications so I don't see any point of the [soup] prefix. Also if I'm not mistaken there are already some tests which expects that the headers are in utf8, so this change could brake those tests.
Comment 3 youenn fablet 2014-02-13 07:48:43 PST
(In reply to comment #2)
> This patch also contains curl modifications so I don't see any point of the [soup] prefix. Also if I'm not mistaken there are already some tests which expects that the headers are in utf8, so this change could brake those tests.

I can split the changes in CURL and SOUP specific patches, if that helps?

As of UTF-8 related tests, I found one (LayoutTests/http/tests/misc/non-utf8-header-name). It is passing with and without the patch.
I did not find any regression related to this patch in the LayoutTests/http but I may have missed something.

This patch tries to align WebKit/Soup behavior with WebKit/mac and Firefox.
To encode header values containing other character set than ASCII, one is expected to encode them according RFC 2047.

The test is inspired from https://github.com/w3c/web-platform-tests/tree/master/XMLHttpRequest/getresponseheader-special-characters.htm
Comment 4 Peter Gal 2014-02-13 08:26:43 PST
(In reply to comment #3)
> (In reply to comment #2)
> > This patch also contains curl modifications so I don't see any point of the [soup] prefix. Also if I'm not mistaken there are already some tests which expects that the headers are in utf8, so this change could brake those tests.
> 
> I can split the changes in CURL and SOUP specific patches, if that helps?

IMHO: just remove the [soup] prefix.
Comment 5 Sergio Villar Senin 2014-02-13 09:04:22 PST
Comment on attachment 224062 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=224062&action=review

The patch looks good and correct from the specs POV. But I have some concerns specially related to headers like "Content-Disposition". Many servers don't respect the RFCs and send for example "filename=" using utf-8 or win-1250 strings (depending on the OS of the user-agent) instead of the only allowed encoding which is latin1. So I wonder if this could break downloads for example (we had some bugs related to the names of the downloaded files in the past).

Let's see what danw or kov think about this.

> Source/WebCore/ChangeLog:3
> +        [Soup] HTTP header values should be treated as latin, not UTF-8

You should probably update the title of the bug (and the the ChangeLog) to include also [curl] and use specifically latin1 instead of latin.

> Source/WebCore/ChangeLog:19
> +        (WebCore::ResourceRequest::updateSoupMessageHeaders): Removed header UTF-8 translation

You can just keep this line here and use "Ditto." for the other files (remember to finish with a dot ".") instead of repeating the same description.

> LayoutTests/ChangeLog:11
> +        * http/tests/xmlhttprequest/response-special-characters.html: Added (test that non ascii header & reason phrase values are correctly retrieved by the web application).

Better leave the word "Added." after each file, and include the description of what the test does after the "Reviewed by" line.
Comment 6 Dan Winship 2014-02-13 09:12:47 PST
There are probably tests for Content-Disposition charset-handling already anyway though?

The fact that iso-8859-1 is correct according to the spec doesn't necessarily mean that it's the right behavior. But if this makes WebkitGTK more in sync with mac, then maybe it's the right thing. Note that httpbis says that unless you have specific knowledge about the contents, you should treat non-ASCII header values as opaque. And you should ignore the reason phrase in the response header entirely, except perhaps for debugging purposes.

>-    g_object_set(soupMessage, SOUP_MESSAGE_METHOD, httpMethod().utf8().data(), NULL);
>+    g_object_set(soupMessage, SOUP_MESSAGE_METHOD, httpMethod().latin1().data(), NULL);

the method has to be ASCII, so this isn't really useful
Comment 7 youenn fablet 2014-02-13 12:09:45 PST
(In reply to comment #6)
> There are probably tests for Content-Disposition charset-handling already anyway though?

Test http/tests/download/literal-utf-8.html actually checks that.

> >-    g_object_set(soupMessage, SOUP_MESSAGE_METHOD, httpMethod().utf8().data(), NULL);
> >+    g_object_set(soupMessage, SOUP_MESSAGE_METHOD, httpMethod().latin1().data(), NULL);
> 
> the method has to be ASCII, so this isn't really useful

Agreed, but it aligns with curl ResourceHandleManager.cpp
Comment 8 Martin Robinson 2014-02-13 12:28:21 PST
(In reply to comment #7)

> > >-    g_object_set(soupMessage, SOUP_MESSAGE_METHOD, httpMethod().utf8().data(), NULL);
> > >+    g_object_set(soupMessage, SOUP_MESSAGE_METHOD, httpMethod().latin1().data(), NULL);
> > 
> > the method has to be ASCII, so this isn't really useful
> 
> Agreed, but it aligns with curl ResourceHandleManager.cpp


There is a WTFString::ascii() method.

--Martin
Comment 9 youenn fablet 2014-02-13 13:27:10 PST
Created attachment 224098 [details]
Changed method conversion to ASCII
Comment 10 youenn fablet 2014-04-15 00:16:55 PDT
Patch is still valid and passes all xhr tests.
Anybody to review it?
Comment 11 youenn fablet 2014-11-03 07:43:07 PST
Patch is still valid although it needs a rebase first.
Also to be noted that patch aligns with RFC7230 which recently obsoloted RFC2616.
Comment 12 Sergio Villar Senin 2014-11-10 04:14:10 PST
Comment on attachment 224098 [details]
Changed method conversion to ASCII

OK, let's give it a try
Comment 13 youenn fablet 2014-12-07 04:43:20 PST
Created attachment 242740 [details]
Fixing contentDisposition
Comment 14 youenn fablet 2014-12-07 04:44:48 PST
(In reply to comment #13)
> Created attachment 242740 [details]
> Fixing contentDisposition

Rerunning the tests, I was apparently wrong with contentDisposition header.
I added explicit UTF-8 header conversion to compute suggested filename.
I also removed a gtk/efl specific expectation which is no longer needed after this patch.
Comment 15 Dan Winship 2014-12-07 05:29:31 PST
>     while (soup_message_headers_iter_next(&headersIter, &headerName, &headerValue))
>-        m_httpHeaderFields.set(String::fromUTF8(headerName), String::fromUTF8(headerValue));
>+        m_httpHeaderFields.set(String(headerName), String(headerValue));

That definitely seems correct; the headerName and headerValue returned from soup_message_headers_iter_next() are just the raw bytes from the wire, so it doesn't make sense for WebKit to be calling fromUTF8() on them if it doesn't have some specific reason to think they're utf-8.
Comment 16 Martin Robinson 2014-12-07 05:32:15 PST
Youenn, did you mean to mark this patch for review?
Comment 17 WebKit Commit Bot 2014-12-07 12:04:44 PST
Comment on attachment 242740 [details]
Fixing contentDisposition

Clearing flags on attachment: 242740

Committed r176930: <http://trac.webkit.org/changeset/176930>
Comment 18 WebKit Commit Bot 2014-12-07 12:04:50 PST
All reviewed patches have been landed.  Closing bug.
Comment 19 Konstantin Tokarev 2016-01-19 10:53:30 PST
It seems to me like that curl implementation ResourceResponse::platformSuggestedFilename() also needs this fix, or am I missing anything?
Comment 20 youenn fablet 2016-01-19 12:54:35 PST
(In reply to comment #19)
> It seems to me like that curl implementation
> ResourceResponse::platformSuggestedFilename() also needs this fix, or am I
> missing anything?

It seems so, would you be able to file a bug?
Comment 21 Konstantin Tokarev 2016-01-19 13:45:29 PST
(In reply to comment #20)
> It seems so, would you be able to file a bug?

I can submit patch but I won't be able to test it (I'm just reading the code)
Comment 22 Konstantin Tokarev 2016-01-19 13:47:09 PST
I mean I won't be able to test curl part
Comment 23 Michael Catanzaro 2016-01-19 15:31:35 PST
CCing some folks who might be interested in the curl backend....
Comment 24 peavo 2016-01-20 00:10:23 PST
(In reply to comment #22)
> I mean I won't be able to test curl part

Thanks, I can run the http tests to test for regressions :)