WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
214860
[EventSource] WebKit fails to UTF-8 encode the Last-Event-ID HTTP header value
https://bugs.webkit.org/show_bug.cgi?id=214860
Summary
[EventSource] WebKit fails to UTF-8 encode the Last-Event-ID HTTP header value
Chris Dumez
Reported
2020-07-27 16:48:23 PDT
WebKit fails to UTF-8 encode the Last-Event-ID HTTP header value for EventSource. The specification is here:
https://html.spec.whatwg.org/multipage/server-sent-events.html#reestablish-the-connection
(step 5.3.) This is causing us to fail a couple of EventSource web-platform-tests which are passing is both Chrome & Firefox.
Attachments
Patch
(4.11 KB, patch)
2020-07-27 16:49 PDT
,
Chris Dumez
no flags
Details
Formatted Diff
Diff
Patch
(4.11 KB, patch)
2020-07-27 16:50 PDT
,
Chris Dumez
no flags
Details
Formatted Diff
Diff
Patch
(6.77 KB, patch)
2020-07-28 12:54 PDT
,
Chris Dumez
no flags
Details
Formatted Diff
Diff
Patch
(6.83 KB, patch)
2020-07-28 14:37 PDT
,
Chris Dumez
no flags
Details
Formatted Diff
Diff
Show Obsolete
(3)
View All
Add attachment
proposed patch, testcase, etc.
Chris Dumez
Comment 1
2020-07-27 16:49:57 PDT
Created
attachment 405329
[details]
Patch
Chris Dumez
Comment 2
2020-07-27 16:50:34 PDT
Created
attachment 405330
[details]
Patch
Chris Dumez
Comment 3
2020-07-27 16:51:41 PDT
C.f.
https://wpt.fyi/results/eventsource/format-field-id.htm?label=master&product=safari&product=chrome&product=firefox&q=%28safari%3A%21pass%26safari%3A%21ok%29%20%28chrome%3Apass%7Cchrome%3Aok%29%20%28firefox%3Apass%7Cfirefox%3Aok%29%20%28webkitgtk%3Apass%7Cwebkitgtk%3Aok%29
Darin Adler
Comment 4
2020-07-27 16:53:11 PDT
Comment on
attachment 405330
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=405330&action=review
> Source/WebCore/page/EventSource.cpp:106 > + // Specification says this needs to be UTF8-encoded. > + auto lastEventID = m_lastEventId.utf8(); > + request.setHTTPHeaderField(HTTPHeaderName::LastEventID, String(reinterpret_cast<const LChar*>(lastEventID.data()), lastEventID.length()));
This is not the correct way to do this. We should never have a WTF::String with characters that are UTF-8 encoded. It’s *defined* as Latin-1.
Darin Adler
Comment 5
2020-07-27 16:53:52 PDT
Comment on
attachment 405330
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=405330&action=review
>> Source/WebCore/page/EventSource.cpp:106 >> + request.setHTTPHeaderField(HTTPHeaderName::LastEventID, String(reinterpret_cast<const LChar*>(lastEventID.data()), lastEventID.length())); > > This is not the correct way to do this. We should never have a WTF::String with characters that are UTF-8 encoded. It’s *defined* as Latin-1.
Where is the code that converts the String into either a CFString or into bytes that go on the wire?
Chris Dumez
Comment 6
2020-07-27 16:55:41 PDT
(In reply to Darin Adler from
comment #5
)
> Comment on
attachment 405330
[details]
> Patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=405330&action=review
> > >> Source/WebCore/page/EventSource.cpp:106 > >> + request.setHTTPHeaderField(HTTPHeaderName::LastEventID, String(reinterpret_cast<const LChar*>(lastEventID.data()), lastEventID.length())); > > > > This is not the correct way to do this. We should never have a WTF::String with characters that are UTF-8 encoded. It’s *defined* as Latin-1. > > Where is the code that converts the String into either a CFString or into > bytes that go on the wire?
I believe this would be ResourceRequest::doUpdatePlatformRequest() in ResourceRequestCocoa.mm: for (const auto& header : httpHeaderFields()) [nsRequest setValue:header.value forHTTPHeaderField:header.key];
Chris Dumez
Comment 7
2020-07-27 16:57:47 PDT
(In reply to Chris Dumez from
comment #6
)
> (In reply to Darin Adler from
comment #5
) > > Comment on
attachment 405330
[details]
> > Patch > > > > View in context: > >
https://bugs.webkit.org/attachment.cgi?id=405330&action=review
> > > > >> Source/WebCore/page/EventSource.cpp:106 > > >> + request.setHTTPHeaderField(HTTPHeaderName::LastEventID, String(reinterpret_cast<const LChar*>(lastEventID.data()), lastEventID.length())); > > > > > > This is not the correct way to do this. We should never have a WTF::String with characters that are UTF-8 encoded. It’s *defined* as Latin-1. > > > > Where is the code that converts the String into either a CFString or into > > bytes that go on the wire? > > I believe this would be ResourceRequest::doUpdatePlatformRequest() in > ResourceRequestCocoa.mm: > > for (const auto& header : httpHeaderFields()) > [nsRequest setValue:header.value forHTTPHeaderField:header.key];
Do you think we should special-case Last-Event-ID in ResourceRequest::doUpdatePlatformRequest() instead?
Darin Adler
Comment 8
2020-07-27 17:02:55 PDT
I started talking about this in Slack: The bug seems to be in the CFNetwork layer. When converting the WTF::String to the CFString the characters are not in any particular encoding; conceptually they are represented as a sequence of UTF-16 code units (although CFString can store them in whatever way it likes internally). Converting the CFString to bytes to go on the wire is done inside CFNetwork; apparently incorrectly if it’s not using UTF-8! Making a CFString that has UTF-16 code units that are actually UTF-8 code units with the high bytes set to zero is a possible way to work around that CFNetwork issue, I guess, but a dangerous workaround. It would be broken if CFNetwork was ever fixed to correctly encode strings as UTF-8 (the string would be double-encoded).
Chris Dumez
Comment 9
2020-07-27 17:12:58 PDT
(In reply to Darin Adler from
comment #8
)
> I started talking about this in Slack: > > The bug seems to be in the CFNetwork layer. > > When converting the WTF::String to the CFString the characters are not in > any particular encoding; conceptually they are represented as a sequence of > UTF-16 code units (although CFString can store them in whatever way it likes > internally). > > Converting the CFString to bytes to go on the wire is done inside CFNetwork; > apparently incorrectly if it’s not using UTF-8! > > Making a CFString that has UTF-16 code units that are actually UTF-8 code > units with the high bytes set to zero is a possible way to work around that > CFNetwork issue, I guess, but a dangerous workaround. It would be broken if > CFNetwork was ever fixed to correctly encode strings as UTF-8 (the string > would be double-encoded).
I am not convinced this is a CFNetwork bug. Headers are ISO-8859-1 encoded as per RFC: -
https://tools.ietf.org/html/rfc7230#section-3.2.4
""" Historically, HTTP has allowed field content with text in the ISO-8859-1 charset [ISO-8859-1], supporting other charsets only through use of [RFC2047] encoding. In practice, most HTTP header field values use only a subset of the US-ASCII charset [USASCII]. Newly defined header fields SHOULD limit their field values to US-ASCII octets. A recipient SHOULD treat other octets in field content (obs-text) as opaque data. """ Here, it is the HTML specification that is enforcing that we use UTF-8 encoding to this particular header value.
Alexey Proskuryakov
Comment 10
2020-07-27 18:06:35 PDT
rdar://problem/13799963
Alexey Proskuryakov
Comment 11
2020-07-27 18:08:29 PDT
I don't think that it's a CFNetwork *bug*, but I think that there should be a way to get this via API cleanly, see
rdar://problem/13800011
.
Chris Dumez
Comment 12
2020-07-28 12:54:15 PDT
Created
attachment 405397
[details]
Patch
Chris Dumez
Comment 13
2020-07-28 12:59:46 PDT
As discussed on Slack, it is not safe to UTF-8 encode all header and is causing some test failures (Those tests seem to pass various octet sequences for a custom header and do not expect the value to be UTF-8 encoded). The HTTP RFC suggests that any octet is valid in header field value (in theory), that historically it was Latin1, and that new header fields SHOULD only use US-ASCII
Alex Christensen
Comment 14
2020-07-28 14:20:53 PDT
Comment on
attachment 405397
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=405397&action=review
> Source/WebCore/platform/network/cocoa/ResourceRequestCocoa.mm:198 > + [nsRequest setValue:encodedValue.bridgingAutorelease() forHTTPHeaderField:header.key];
This will cause a spike in the autoreleasepool size. Why not just encodedValue.get()? Then it's immediately released after use.
Darin Adler
Comment 15
2020-07-28 14:24:13 PDT
Comment on
attachment 405397
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=405397&action=review
> Source/WebCore/platform/network/cf/ResourceRequestCFNet.h:91 > + // HTTP headers are normally latin1 but the HTML specification says to use UTF-8 encoding for the Last-Event-ID header value.
Latin-1, not "latin1". Missing second half of the comment. Something like this: // Constructing a string with the UTF-8 bytes but claiming that it’s Latin-1 is the way to get CFNetwork to put those UTF-8 bytes on the wire.
> Source/WebCore/platform/network/cocoa/ResourceRequestCocoa.mm:198 > + [nsRequest setValue:encodedValue.bridgingAutorelease() forHTTPHeaderField:header.key];
Slightly wasteful to use autorelease here. We do need bridging, but not autorelease. Could write (__bridge CFStringRef)encodedValue.get() but it would be even better if it could be done more elegantly.
Chris Dumez
Comment 16
2020-07-28 14:37:48 PDT
Created
attachment 405415
[details]
Patch
EWS
Comment 17
2020-07-28 16:15:17 PDT
Committed
r265014
: <
https://trac.webkit.org/changeset/265014
> All reviewed patches have been landed. Closing bug and clearing flags on
attachment 405415
[details]
.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug