WebKit Bugzilla
Bug 87351, status: RESOLVED INVALID
[BlackBerry] Cookie and Location header should be converted to latin-1/utf-8 in the same place.
https://bugs.webkit.org/show_bug.cgi?id=87351
Reported by Jason Liu, 2012-05-24 00:50:32 PDT
Other headers may also be sent as UTF-8, so I think we should check all headers.
Attachments
Patch (4.86 KB, patch), 2012-05-24 02:05 PDT, Jason Liu; abarth: review-
Comment 1 by Jason Liu, 2012-05-24 02:05:55 PDT
Created attachment 143764: Patch
Comment 2 by Jason Liu, 2012-05-24 02:11:25 PDT
Hi Joe, we need to talk about this issue.
Comment 3 by Joe Mason, 2012-05-24 08:31:33 PDT
I definitely think this change is a good idea, since the main objection to doing it for all headers was cookies, and it turns out we do want to do it for cookies. Most well-defined headers (such as authentication) are defined to take only ASCII anyway. The only thing I'm not sure about is that this patch still does the Latin-1 or UTF-8 check in two different places, and it does it in a different way each time. Can we just use fromUTF8WithLatin1Fallback in initializePlatformRequest as well?
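For reference, the behavior Joe describes (try UTF-8, and fall back to Latin-1 if the octets are not valid UTF-8) can be sketched in standard C++ as follows. This is a standalone approximation, not WebKit's implementation of WTF's fromUTF8WithLatin1Fallback; decodeUtf8Strict and decodeUtf8WithLatin1Fallback are hypothetical names, and std::string is used only as a container for raw octets.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Strict UTF-8 decoder: returns nullopt for any malformed input, including
// overlong forms, UTF-16 surrogates and code points above U+10FFFF.
std::optional<std::u32string> decodeUtf8Strict(const std::string& bytes) {
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t cp;
        std::size_t len;
        if (lead < 0x80) { cp = lead; len = 1; }
        else if (lead >= 0xC2 && lead <= 0xDF) { cp = lead & 0x1F; len = 2; }
        else if (lead >= 0xE0 && lead <= 0xEF) { cp = lead & 0x0F; len = 3; }
        else if (lead >= 0xF0 && lead <= 0xF4) { cp = lead & 0x07; len = 4; }
        else return std::nullopt;  // 0x80-0xC1 and 0xF5-0xFF never start a sequence
        if (len > bytes.size() - i) return std::nullopt;  // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) return std::nullopt;  // expected a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if ((len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return std::nullopt;
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Try UTF-8 first; if the octets are not valid UTF-8, treat each octet as a
// Latin-1 (ISO-8859-1) character, whose code point equals the byte value.
std::u32string decodeUtf8WithLatin1Fallback(const std::string& bytes) {
    if (auto decoded = decodeUtf8Strict(bytes)) return *decoded;
    std::u32string latin1;
    for (unsigned char b : bytes) latin1.push_back(b);
    return latin1;
}
```

With one helper like this, both conversion sites would map the same octets to the same characters, which is the consistency Joe is asking for.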
Comment 4 by Adam Barth, 2012-05-24 10:36:18 PDT
Comment on attachment 143764: Patch
This patch is wrong. HTTP headers are not UTF-8.
Comment 5 by Joe Mason, 2012-05-24 11:28:00 PDT
(In reply to comment #4)
> (From update of attachment 143764)
> This patch is wrong. HTTP headers are not UTF-8.
Then we should at least make a whitelist (currently containing Location and Cookie, since those are the ones where we've found real-world sites sending UTF-8) and put them in the same place.
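A whitelist like the one Joe proposes could live in a single function that every conversion site consults. In this sketch, HeaderDecoding, decodingForHeader and equalsIgnoringAsciiCase are hypothetical names, and the list holds only Location and Cookie because those are the two headers named here.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <string_view>

enum class HeaderDecoding { Latin1, Utf8WithLatin1Fallback };

// Header names are case-insensitive ASCII tokens, so only A-Z are folded.
bool equalsIgnoringAsciiCase(std::string_view a, std::string_view b) {
    auto lower = [](char c) { return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c; };
    return a.size() == b.size() &&
           std::equal(a.begin(), a.end(), b.begin(),
                      [&](char x, char y) { return lower(x) == lower(y); });
}

// Single source of truth for which headers get the UTF-8-with-Latin-1-fallback
// treatment. The list is hypothetical: just the two headers discussed in this bug.
HeaderDecoding decodingForHeader(std::string_view name) {
    static constexpr std::array<std::string_view, 2> kUtf8Tolerant{"Location", "Cookie"};
    for (std::string_view header : kUtf8Tolerant) {
        if (equalsIgnoringAsciiCase(name, header)) return HeaderDecoding::Utf8WithLatin1Fallback;
    }
    return HeaderDecoding::Latin1;
}
```

Set-Cookie is deliberately left off the list, since comment 8 notes that only the Cookie header (not Set-Cookie) is converted today.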
Comment 6 by Adam Barth, 2012-05-24 11:36:34 PDT
http://tools.ietf.org/html/rfc6265 specifies how to process the Cookie and Set-Cookie headers. It's not correct to decode the whole header value as UTF-8. The correct way is to process the header as ASCII and then to use UTF-8 to decode portions of the header.
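One way to read "process the header as ASCII, then decode portions" for the Cookie header: split on the ASCII delimiters while keeping every other octet intact, and leave any UTF-8 decoding to a later, per-component step. The splitting rules below are a simplification of the cookie-string grammar in RFC 6265 section 4.2.1, and CookiePair and splitCookieHeaderOctets are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// A cookie pair kept as raw octets; std::string is only a byte container here.
struct CookiePair {
    std::string name;
    std::string value;
};

// Splits a Cookie header value such as "a=1; b=2" into pairs without decoding.
// Only the ASCII delimiters ';', '=' and leading spaces are interpreted; every
// other octet is copied through untouched, so a later step can decode an
// individual name or value as UTF-8 if it chooses to.
std::vector<CookiePair> splitCookieHeaderOctets(const std::string& header) {
    std::vector<CookiePair> pairs;
    std::size_t start = 0;
    while (start <= header.size()) {
        std::size_t end = header.find(';', start);
        if (end == std::string::npos) end = header.size();
        std::string segment = header.substr(start, end - start);
        std::size_t first = segment.find_first_not_of(' ');
        if (first != std::string::npos) {
            segment.erase(0, first);
            std::size_t eq = segment.find('=');
            if (eq == std::string::npos) {
                pairs.push_back({segment, ""});  // this sketch's choice: bare token as a name
            } else {
                pairs.push_back({segment.substr(0, eq), segment.substr(eq + 1)});
            }
        }
        start = end + 1;
    }
    return pairs;
}
```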
Comment 7 by Jason Liu, 2012-05-24 19:31:11 PDT
Closing this, since we won't do this check.
Comment 8 by Joe Mason, 2012-05-25 07:35:51 PDT
Disagree. We already have a check that converts an entire Cookie header (but not Set-Cookie) to UTF-8 or Latin-1 depending on contents, in ResourceRequestBlackBerry, and a different check that does the same conversion for Location headers, in NetworkJob. If we're going to keep the Cookie check, we should at least merge this code to do the checks in the same place using the same method. Or if what Adam says is true, we need to fix the Cookie check to only work on part of the header at a time. But I argue that it's not important:
(In reply to comment #6)
> http://tools.ietf.org/html/rfc6265 specifies how to process the Cookie and Set-Cookie headers. It's not correct to decode the whole header value as UTF-8. The correct way to process the header as ASCII and then to use UTF-8 to decode portions of the header.
ASCII is a subset of UTF-8, so I don't see the difference between processing it as ASCII and then using UTF-8 to decode bytes which are not valid ASCII, and just decoding as UTF-8. All it says in RFC6265 is:
> NOTE: Despite its name, the cookie-string is actually a sequence of octets, not a sequence of characters. To convert the cookie-string (or components thereof) into a sequence of characters (e.g., for presentation to the user), the user agent might wish to try using the UTF-8 character encoding [RFC3629] to decode the octet sequence. This decoding might fail, however, because not every sequence of octets is valid UTF-8.
Which implies that we can decode the whole header as UTF-8.
This contradicts the BNF, in fact, which defines cookie-octet to only allow ASCII, but some sites do send UTF-8 and others send Latin-1, so we have to deal. It's possible that some of the components of a Set-Cookie header, like the domain, should cause the cookie to be rejected if they're not plain ASCII, but we're not doing this check for Set-Cookie (yet?)
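To make the BNF point concrete: RFC 6265 section 4.1.1 defines cookie-octet as %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E, which is narrower than ASCII and excludes every octet at or above 0x80. A sketch of that check follows; isCookieOctet and isGrammaticalCookieValue are hypothetical names.

```cpp
#include <cassert>
#include <string>

// cookie-octet from RFC 6265 section 4.1.1:
//   %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
// i.e. US-ASCII excluding CTLs, whitespace, DQUOTE, comma, semicolon and backslash.
bool isCookieOctet(unsigned char c) {
    return c == 0x21 || (c >= 0x23 && c <= 0x2B) || (c >= 0x2D && c <= 0x3A) ||
           (c >= 0x3C && c <= 0x5B) || (c >= 0x5D && c <= 0x7E);
}

// cookie-value = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
bool isGrammaticalCookieValue(const std::string& value) {
    std::string inner = value;
    if (inner.size() >= 2 && inner.front() == '"' && inner.back() == '"')
        inner = inner.substr(1, inner.size() - 2);
    for (unsigned char c : inner) {
        if (!isCookieOctet(c)) return false;
    }
    return true;
}
```

A value carrying the UTF-8 bytes for "é" fails this grammar, which is why what a user agent actually does with such headers is governed by the parsing rules in section 5 of the RFC rather than by the BNF.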
Comment 9 by Joe Mason, 2012-05-25 07:37:56 PDT
(In reply to comment #8)
> This contradicts the BNF, in fact, which defines cookie-octet to only allow ASCII, but some sites do send UTF-8 and others send Latin-1, so we have to deal.
Sorry, I shouldn't say "send" - we're talking about Cookie, not Set-Cookie. I mean "expect".
Comment 10 by Adam Barth, 2012-05-25 15:55:28 PDT
> ASCII is a subset of UTF-8, so I don't see the difference between processing it as ASCII and then using UTF-8 to decode bytes which are not valid ASCII, and just decoding as UTF-8.
Those two operations are different. For example, consider a sequence of octets (like a BOM) in UTF-8 that, when decoded, doesn't produce any Unicode characters. If you first decode the header using UTF-8 and then attempt to parse it, you can get the wrong answer because that sequence of octets will have disappeared. For this reason, it's not possible to correctly process HTTP headers, be they Cookie, Set-Cookie, or otherwise, in Unicode. You need to process them as sequences of octets in order to get the correct behavior.
The design of handleNotifyHeaderReceived is broken and cannot be fixed without changing its type:
    void NetworkJob::handleNotifyHeaderReceived(const String& key, const String& value)
Specifically, the key and the value need to be changed from Unicode strings to sequences of octets. I'm repeating myself, but it is not possible to correctly process HTTP headers in Unicode.
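A small illustration of that information loss, assuming a decoder that strips a leading UTF-8 BOM (EF BB BF), as the WHATWG Encoding Standard's "UTF-8 decode" does. To keep it short, the hypothetical decodeStrippingBom models only that BOM-stripping step and assumes the remaining octets are ASCII; cookieNameOctets is also a hypothetical name.

```cpp
#include <cassert>
#include <string>

// Returns the cookie name (the octets before the first '=') of one "name=value" pair.
std::string cookieNameOctets(const std::string& pair) {
    return pair.substr(0, pair.find('='));
}

// Hypothetical "decode first" step: drops a leading UTF-8 BOM (EF BB BF), as a
// BOM-stripping UTF-8 decoder would. The remaining bytes are assumed to be ASCII
// here, so returning them unchanged stands in for the decoded text.
std::string decodeStrippingBom(const std::string& bytes) {
    static const std::string kBom = "\xEF\xBB\xBF";
    if (bytes.compare(0, kBom.size(), kBom) == 0) return bytes.substr(kBom.size());
    return bytes;
}
```

Parsed as octets, the pair "\xEF\xBB\xBFsid=1" has a name distinct from "sid"; decoded first, it collides with a genuine "sid" cookie.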
> All it says in RFC6265 is:
> > NOTE: Despite its name, the cookie-string is actually a sequence of octets, not a sequence of characters. To convert the cookie-string (or components thereof) into a sequence of characters (e.g., for presentation to the user), the user agent might wish to try using the UTF-8 character encoding [RFC3629] to decode the octet sequence. This decoding might fail, however, because not every sequence of octets is valid UTF-8.
Yes, I know what it says because I wrote it.
> Which implies that we can decode the whole header as UTF-8.
No, that's not what it says. It says explicitly that the cookie-string is actually a sequence of octets, not a sequence of characters. If a user agent wishes to display the cookie-string to the user (e.g., using a font whose glyphs represent Unicode code points), then the user agent can try using UTF-8. However, nothing in that note says that it's possible to meet the rest of the requirements in the spec by processing the cookie-string in Unicode. It doesn't say that because it's not possible.
> This contradicts the BNF, in fact, which defines cookie-octet to only allow ASCII, but some sites do send UTF-8 and others send Latin-1, so we have to deal.
Correct. Not all servers send Set-Cookie headers that comply with the BNF. That's why the RFC defines the precise handling of all sequences of octets that might be sent by servers.
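The lenient handling referred to here is the algorithm in RFC 6265 section 5.2. Its first steps (split off the attributes at the first ';', ignore the header if there is no '=', trim SP/HTAB from name and value, ignore the header if the name is empty) can be applied to octets directly, as in this sketch. ParsedSetCookie, trimWsp and parseSetCookieNameValue are hypothetical names, and attribute parsing is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

struct ParsedSetCookie {
    std::string name;
    std::string value;
    std::string unparsedAttributes;  // everything from the first ';' onward
};

// Strips leading and trailing WSP (SP / HTAB) octets.
std::string trimWsp(const std::string& s) {
    std::size_t b = s.find_first_not_of(" \t");
    if (b == std::string::npos) return "";
    std::size_t e = s.find_last_not_of(" \t");
    return s.substr(b, e - b + 1);
}

// Steps 1-5 of the set-cookie-string algorithm in RFC 6265 section 5.2, applied
// to raw octets; returns nullopt where the RFC says to ignore the header entirely.
std::optional<ParsedSetCookie> parseSetCookieNameValue(const std::string& setCookie) {
    std::size_t semi = setCookie.find(';');
    std::string nameValuePair = setCookie.substr(0, semi);
    std::string attributes = semi == std::string::npos ? "" : setCookie.substr(semi);
    std::size_t eq = nameValuePair.find('=');
    if (eq == std::string::npos) return std::nullopt;  // no '=': ignore
    std::string name = trimWsp(nameValuePair.substr(0, eq));
    std::string value = trimWsp(nameValuePair.substr(eq + 1));
    if (name.empty()) return std::nullopt;  // empty name: ignore
    return ParsedSetCookie{name, value, attributes};
}
```

The value octets are kept exactly as they arrived, whether they happen to be UTF-8, Latin-1, or neither.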
> It's possible that some of the components of a Set-Cookie header, like the domain, should cause the cookie to be rejected if they're not plain ASCII, but we're not doing this check for Set-Cookie (yet?)
The design of this code is broken. The only way to correctly process HTTP headers is as sequences of octets. Any attempt to process them in Unicode will not be correct. Period.
Comment 11 by Adam Barth, 2012-05-25 16:04:05 PDT
Eric points out on chat that you can get pretty close to correct behavior using Unicode. Most of the time, you'll get the right answer, but the problem is that you'll never be able to get everything right. Rather than mess around with a broken architecture, you should just stop trying to use Unicode to process HTTP and work in octets.
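The type change suggested for handleNotifyHeaderReceived could look roughly like this: header names and values are stored as octets, names are matched with ASCII-only case folding, and nothing is decoded on the way in. Octets and OctetHeaderMap are hypothetical names, and this is a standalone sketch rather than a patch against NetworkJob.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>
#include <utility>
#include <vector>

using Octets = std::vector<std::uint8_t>;

// Hypothetical octet-based header store: the callback takes raw bytes, and
// nothing is decoded to Unicode on the way in.
class OctetHeaderMap {
public:
    void handleNotifyHeaderReceived(const Octets& key, const Octets& value) {
        m_headers.emplace_back(key, value);
    }

    // Returns the value of the first header whose name matches, comparing
    // with ASCII-only case folding (octets >= 0x80 must match exactly).
    std::optional<Octets> get(std::string_view asciiName) const {
        for (const auto& [key, value] : m_headers) {
            if (nameMatches(key, asciiName)) return value;
        }
        return std::nullopt;
    }

private:
    static std::uint8_t asciiLower(std::uint8_t c) {
        return (c >= 'A' && c <= 'Z') ? static_cast<std::uint8_t>(c - 'A' + 'a') : c;
    }

    static bool nameMatches(const Octets& key, std::string_view name) {
        if (key.size() != name.size()) return false;
        for (std::size_t i = 0; i < key.size(); ++i) {
            if (asciiLower(key[i]) != asciiLower(static_cast<std::uint8_t>(name[i]))) return false;
        }
        return true;
    }

    std::vector<std::pair<Octets, Octets>> m_headers;
};
```

Any decoding (for display, or for a whitelisted header) would then happen later, on a specific component, instead of on every header as it arrives.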