WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
128739
[Soup][Curl] HTTP header values should be treated as latin1, not UTF-8
https://bugs.webkit.org/show_bug.cgi?id=128739
Summary
[Soup][Curl] HTTP header values should be treated as latin1, not UTF-8
youenn fablet
Reported
2014-02-13 05:25:55 PST
WebKit soup layer (and probably CURL layer as well) treats HTTP headers and HTTP reason phrase as UTF-8. They should be treated as latin as per RFC 2616.
Attachments
Patch
(11.79 KB, patch)
2014-02-13 05:57 PST
,
youenn fablet
no flags
Details
Formatted Diff
Diff
Changed method conversion to ASCII
(12.27 KB, patch)
2014-02-13 13:27 PST
,
youenn fablet
no flags
Details
Formatted Diff
Diff
Fixing contentDisposition
(13.99 KB, patch)
2014-12-07 04:43 PST
,
youenn fablet
no flags
Details
Formatted Diff
Diff
Show Obsolete
(2)
View All
Add attachment
proposed patch, testcase, etc.
youenn fablet
Comment 1
2014-02-13 05:57:49 PST
Created
attachment 224062
[details]
Patch
Peter Gal
Comment 2
2014-02-13 06:44:01 PST
This patch also contains curl modifications so I don't see any point of the [soup] prefix. Also if I'm not mistaken there are already some tests which expects that the headers are in utf8, so this change could brake those tests.
youenn fablet
Comment 3
2014-02-13 07:48:43 PST
(In reply to
comment #2
)
> This patch also contains curl modifications so I don't see any point of the [soup] prefix. Also if I'm not mistaken there are already some tests which expects that the headers are in utf8, so this change could brake those tests.
I can split the changes in CURL and SOUP specific patches, if that helps? As of UTF-8 related tests, I found one (LayoutTests/http/tests/misc/non-utf8-header-name). It is passing with and without the patch. I did not find any regression related to this patch in the LayoutTests/http but I may have missed something. This patch tries to align WebKit/Soup behavior with WebKit/mac and Firefox. To encode header values containing other character set than ASCII, one is expected to encode them according RFC 2047. The test is inspired from
https://github.com/w3c/web-platform-tests/tree/master/XMLHttpRequest/getresponseheader-special-characters.htm
Peter Gal
Comment 4
2014-02-13 08:26:43 PST
(In reply to
comment #3
)
> (In reply to
comment #2
) > > This patch also contains curl modifications so I don't see any point of the [soup] prefix. Also if I'm not mistaken there are already some tests which expects that the headers are in utf8, so this change could brake those tests. > > I can split the changes in CURL and SOUP specific patches, if that helps?
IMHO: just remove the [soup] prefix.
Sergio Villar Senin
Comment 5
2014-02-13 09:04:22 PST
Comment on
attachment 224062
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=224062&action=review
The patch looks good and correct from the specs POV. But I have some concerns specially related to headers like "Content-Disposition". Many servers don't respect the RFCs and send for example "filename=" using utf-8 or win-1250 strings (depending on the OS of the user-agent) instead of the only allowed encoding which is latin1. So I wonder if this could break downloads for example (we had some bugs related to the names of the downloaded files in the past). Let's see what danw or kov think about this.
> Source/WebCore/ChangeLog:3 > + [Soup] HTTP header values should be treated as latin, not UTF-8
You should probably update the title of the bug (and the the ChangeLog) to include also [curl] and use specifically latin1 instead of latin.
> Source/WebCore/ChangeLog:19 > + (WebCore::ResourceRequest::updateSoupMessageHeaders): Removed header UTF-8 translation
You can just keep this line here and use "Ditto." for the other files (remember to finish with a dot ".") instead of repeating the same description.
> LayoutTests/ChangeLog:11 > + * http/tests/xmlhttprequest/response-special-characters.html: Added (test that non ascii header & reason phrase values are correctly retrieved by the web application).
Better leave the word "Added." after each file, and include the description of what the test does after the "Reviewed by" line.
Dan Winship
Comment 6
2014-02-13 09:12:47 PST
There are probably tests for Content-Disposition charset-handling already anyway though? The fact that iso-8859-1 is correct according to the spec doesn't necessarily mean that it's the right behavior. But if this makes WebkitGTK more in sync with mac, then maybe it's the right thing. Note that httpbis says that unless you have specific knowledge about the contents, you should treat non-ASCII header values as opaque. And you should ignore the reason phrase in the response header entirely, except perhaps for debugging purposes.
>- g_object_set(soupMessage, SOUP_MESSAGE_METHOD, httpMethod().utf8().data(), NULL); >+ g_object_set(soupMessage, SOUP_MESSAGE_METHOD, httpMethod().latin1().data(), NULL);
the method has to be ASCII, so this isn't really useful
youenn fablet
Comment 7
2014-02-13 12:09:45 PST
(In reply to
comment #6
)
> There are probably tests for Content-Disposition charset-handling already anyway though?
Test http/tests/download/literal-utf-8.html actually checks that.
> >- g_object_set(soupMessage, SOUP_MESSAGE_METHOD, httpMethod().utf8().data(), NULL); > >+ g_object_set(soupMessage, SOUP_MESSAGE_METHOD, httpMethod().latin1().data(), NULL); > > the method has to be ASCII, so this isn't really useful
Agreed, but it aligns with curl ResourceHandleManager.cpp
Martin Robinson
Comment 8
2014-02-13 12:28:21 PST
(In reply to
comment #7
)
> > >- g_object_set(soupMessage, SOUP_MESSAGE_METHOD, httpMethod().utf8().data(), NULL); > > >+ g_object_set(soupMessage, SOUP_MESSAGE_METHOD, httpMethod().latin1().data(), NULL); > > > > the method has to be ASCII, so this isn't really useful > > Agreed, but it aligns with curl ResourceHandleManager.cpp
There is a WTFString::ascii() method. --Martin
youenn fablet
Comment 9
2014-02-13 13:27:10 PST
Created
attachment 224098
[details]
Changed method conversion to ASCII
youenn fablet
Comment 10
2014-04-15 00:16:55 PDT
Patch is still valid and passes all xhr tests. Anybody to review it?
youenn fablet
Comment 11
2014-11-03 07:43:07 PST
Patch is still valid although it needs a rebase first. Also to be noted that patch aligns with RFC7230 which recently obsoloted RFC2616.
Sergio Villar Senin
Comment 12
2014-11-10 04:14:10 PST
Comment on
attachment 224098
[details]
Changed method conversion to ASCII OK, let's give it a try
youenn fablet
Comment 13
2014-12-07 04:43:20 PST
Created
attachment 242740
[details]
Fixing contentDisposition
youenn fablet
Comment 14
2014-12-07 04:44:48 PST
(In reply to
comment #13
)
> Created
attachment 242740
[details]
> Fixing contentDisposition
Rerunning the tests, I was apparently wrong with contentDisposition header. I added explicit UTF-8 header conversion to compute suggested filename. I also removed a gtk/efl specific expectation which is no longer needed after this patch.
Dan Winship
Comment 15
2014-12-07 05:29:31 PST
> while (soup_message_headers_iter_next(&headersIter, &headerName, &headerValue)) >- m_httpHeaderFields.set(String::fromUTF8(headerName), String::fromUTF8(headerValue)); >+ m_httpHeaderFields.set(String(headerName), String(headerValue));
That definitely seems correct; the headerName and headerValue returned from soup_message_headers_iter_next() are just the raw bytes from the wire, so it doesn't make sense for WebKit to be calling fromUTF8() on them if it doesn't have some specific reason to think they're utf-8.
Martin Robinson
Comment 16
2014-12-07 05:32:15 PST
Youenn, did you mean to mark this patch for review?
WebKit Commit Bot
Comment 17
2014-12-07 12:04:44 PST
Comment on
attachment 242740
[details]
Fixing contentDisposition Clearing flags on attachment: 242740 Committed
r176930
: <
http://trac.webkit.org/changeset/176930
>
WebKit Commit Bot
Comment 18
2014-12-07 12:04:50 PST
All reviewed patches have been landed. Closing bug.
Konstantin Tokarev
Comment 19
2016-01-19 10:53:30 PST
It seems to me like that curl implementation ResourceResponse::platformSuggestedFilename() also needs this fix, or am I missing anything?
youenn fablet
Comment 20
2016-01-19 12:54:35 PST
(In reply to
comment #19
)
> It seems to me like that curl implementation > ResourceResponse::platformSuggestedFilename() also needs this fix, or am I > missing anything?
It seems so, would you be able to file a bug?
Konstantin Tokarev
Comment 21
2016-01-19 13:45:29 PST
(In reply to
comment #20
)
> It seems so, would you be able to file a bug?
I can submit patch but I won't be able to test it (I'm just reading the code)
Konstantin Tokarev
Comment 22
2016-01-19 13:47:09 PST
I mean I won't be able to test curl part
Michael Catanzaro
Comment 23
2016-01-19 15:31:35 PST
CCing some folks who might be interested in the curl backend....
peavo
Comment 24
2016-01-20 00:10:23 PST
(In reply to
comment #22
)
> I mean I won't be able to test curl part
Thanks, I can run the http tests to test for regressions :)
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug