RESOLVED FIXED 45372
EventSource fails to connect if Content-Type header has a charset attribute
https://bugs.webkit.org/show_bug.cgi?id=45372
Summary EventSource fails to connect if Content-Type header has a charset attribute
Richard Zschech
Reported 2010-09-08 00:27:57 PDT
For example: HTTP/1.1 200 OK Content-Type: text/event-stream; charset=UTF-8 Fails with an error event being received to the onerror handler.
Attachments
Proposed fix: Use the MIME type and not directly the Content-Type header (8.23 KB, patch)
2010-09-14 21:42 PDT, Julien Chaffraix
no flags
Adam Bergkvist
Comment 1 2010-09-14 07:09:02 PDT
Since UTF-8 is the only encoding allowed for text/event-stream, the charset parameter doesn't add anything. However, what we perhaps should do is to make sure that anyhow specifying charset=UTF-8 doesn't fail.
Anne van Kesteren
Comment 2 2010-09-14 10:53:47 PDT
Tests: http://tc.labs.opera.com/apis/EventSource/format-mime-trailing-semicolon.htm http://tc.labs.opera.com/apis/EventSource/format-utf-8.htm Can be used under any license you want. (Opera should start matching these tests soon too. Hopefully we can then move the specification forward.)
Julien Chaffraix
Comment 3 2010-09-14 21:42:57 PDT
Created attachment 67641 [details] Proposed fix: Use the MIME type and not directly the Content-Type header I just saw that Opera has some nice test-cases. The 2 test cases included in the fix are complementary to one mentioned by Anne. We may want to include the whole test suite to make sure we match the specification and Opera (my testing showed that WebKit was the only one to support EventSource for now).
Alexey Proskuryakov
Comment 4 2010-09-14 22:11:05 PDT
Comment on attachment 67641 [details] Proposed fix: Use the MIME type and not directly the Content-Type header > + "text/event-stream" does not work. text/event-stream-foobar > - if (statusCode == 200 && response.httpHeaderField("Content-Type") == "text/event-stream") { > + if (statusCode == 200 && response.mimeType() == "text/event-stream") { Do we disable content sniffing for EventSource? MIME type is normally subject to content sniffing, while ResourceResponse.httpHeaderField("Content-Type") is not. So, there is a chance that mimeType() will return text/plain. The rest of the patch looks fine to me.
Richard Zschech
Comment 5 2010-09-15 00:09:01 PDT
Yes we want to disable content sniffing for EventSource. Content sniffing generally requires buffering a certain amount of data before giving it to the application which would be bad for EventSource.
Adam Bergkvist
Comment 6 2010-09-15 07:26:57 PDT
(In reply to comment #3) > - if (statusCode == 200 && response.httpHeaderField("Content-Type") == "text/event-stream") { > + if (statusCode == 200 && response.mimeType() == "text/event-stream") { Besides checking the mime type, shouldn't we verify that the charset parameter, if present, is UTF-8? I think the test eventsource-content-type-text-event-stream-foobar is redundant, see the existing eventsource-bad-mime-type test. (In reply to comment #4) Yes, content sniffing is indeed disabled for the reason mentioned in comment #5.
Julien Chaffraix
Comment 7 2010-09-15 14:18:56 PDT
(In reply to comment #6) > (In reply to comment #3) > > > - if (statusCode == 200 && response.httpHeaderField("Content-Type") == "text/event-stream") { > > + if (statusCode == 200 && response.mimeType() == "text/event-stream") { > > Besides checking the mime type, shouldn't we verify that the charset parameter, > if present, is UTF-8? The question is what should we do if the charset is not UTF-8. The specification does not allow anything else so we need to define this behaviour. My solution was just to ignore anything not part of the MIME type. We may want to send a console warning if we have Content-Type != MIME type. > I think the test eventsource-content-type-text-event-stream-foobar is redundant, see the existing eventsource-bad-mime-type test. Yes and no. Your test checks a completely unrelated mime type ("text/bogus"). I just wanted to make sure we don't match MIME types that starts with "text/event-stream". I think it is still relevant and complement your test.
Alexey Proskuryakov
Comment 8 2010-09-15 16:08:01 PDT
>The question is what should we do if the charset is not UTF-8. I slightly prefer failing the connection. It seems unsafe to ignore what the server says, and also, it is not unthinkable that in the future, arbitrary encodings will be supported (not that HTML5 is going to change soon, but what if, say, a dominant browser implements it that way?)
Adam Bergkvist
Comment 9 2010-09-16 04:41:22 PDT
(In reply to comment #7) >Yes and no. Your test checks a completely unrelated mime type ("text/bogus"). I just wanted to make sure we don't match MIME types that starts with "text/event-stream". I think it is still relevant and complement your test. This could be a test for a test suite to verify that implementations don't rely on only matching the beginning of the mime type. However, it is not a regression test for this particular bug, since mime type "text/event-stream-foobar" already fails with the current implementation. (In reply to comment #8) > I slightly prefer failing the connection. It seems unsafe to ignore what the server says, and also, it is not unthinkable that in the future, arbitrary encodings will be supported (not that HTML5 is going to change soon, but what if, say, a dominant browser implements it that way?) I agree.
Anne van Kesteren
Comment 10 2010-09-17 00:38:45 PDT
Per my reading of the specification the charset parameter should just be ignored. No different from text/cache-manifest. That is what our test suite is testing. That makes the most sense as well, given that servers might sometimes append weird things there.
Alexey Proskuryakov
Comment 11 2010-09-17 00:46:01 PDT
I feel that the spec is written in a manner that's less than pragmatic in this regard. Trying to pretend that text encodings don't exist on the web doesn't seem helpful neither short term nor long term. It certainly complicates implementation to have "new" features take a different code path for text decoding.
Alexey Proskuryakov
Comment 12 2010-09-17 11:09:01 PDT
It also seems that EventSource or HTML5 don't really have a say in how charset should be handled - it's defined by HTTP spec. The higher level specs can mandate that some encodings must be supported, or prohibit establishing a connection when other ones are used, but they can't say that the charset parameter doesn't exist for their purposes. Anyway, I don't see huge harm from going either way.
Julien Chaffraix
Comment 13 2010-09-20 20:05:32 PDT
> Anyway, I don't see huge harm from going either way. Currently we are just ignoring anything that is not in the MIME type in our appcache implementation. The appcache standard is also silent on the charset handling so it would make sense to follow the same pattern for coherency in the EventSource. Alexey, you mentioned that there may be security issues involved in ignoring what the server send. If this is the case, then the 2 approaches are not symmetrical and we should just fail early and update our appcache implementation to do the same.
Alexey Proskuryakov
Comment 14 2010-09-21 10:12:12 PDT
> it would make sense to follow the same pattern for coherency in the EventSource Or vice versa. > Alexey, you mentioned that there may be security issues involved in ignoring what the server send. I didn't say the "security" word. Some antivirus software may not be aware of this HTTP violation, so it may misparse event streams and manifests, but I don't see why antivirus software would want to look inside those anyway.
Julien Chaffraix
Comment 15 2010-09-22 19:58:20 PDT
> > it would make sense to follow the same pattern for coherency in the EventSource > > Or vice versa. Right :) > > Alexey, you mentioned that there may be security issues involved in ignoring what the server send. > > I didn't say the "security" word. Sorry for jumping a bit on this one. > Some antivirus software may not be aware of this HTTP violation, so it may misparse event streams and manifests, but I don't see why antivirus software would want to look inside those anyway. Looks like this is something we would like to avoid in the absolute. I am not set up on the solution but taking the charset into account has this upside (plus the support for other charset in the future). Unless there is some strong resistance from Anne, this is way I will go for the next iteration.
Anne van Kesteren
Comment 16 2010-09-23 02:58:43 PDT
I do not think this is a HTTP violation. Each media type gets to decide whether the charset parameter works for it. (It's a feature of the media type, not the Content-Type header.) If you want to change the specifications for EventSource and appcache I suggest you get the relevant specifications changed.
Alexey Proskuryakov
Comment 17 2010-09-23 08:54:11 PDT
I'd say that practically, text/ subtypes don't get a chance to pretend that they are binary. I don't think that this change should be blocked on changing (or in my reading, clarifying) HTML5 and EventSource specs. The fact that WebKit will be slightly more strict in this edge case shouldn't do harm to anyone.
Richard Zschech
Comment 18 2010-09-23 18:46:42 PDT
I (the original submitter) am happy with the distinction between mime type and content type as described in comment #16. I agree that the stricter the better. I'm happy for this to be closed as invalid. The main problem is that it was difficult to determine what the issue was because the error handler gets no information about the error. I had to read the source code to determine the possible causes of an error.
Alexey Proskuryakov
Comment 19 2010-09-23 20:43:07 PDT
That's a great point, we should definitely add logging here.
Eric Seidel (no email)
Comment 20 2010-12-14 01:52:38 PST
Comment on attachment 67641 [details] Proposed fix: Use the MIME type and not directly the Content-Type header Seems this is approved and only needs to be landed? Logging can be done in a follow-up patch, no?
WebKit Commit Bot
Comment 21 2010-12-14 03:30:01 PST
Comment on attachment 67641 [details] Proposed fix: Use the MIME type and not directly the Content-Type header Clearing flags on attachment: 67641 Committed r74011: <http://trac.webkit.org/changeset/74011>
WebKit Commit Bot
Comment 22 2010-12-14 03:30:07 PST
All reviewed patches have been landed. Closing bug.
Alexey Proskuryakov
Comment 23 2010-12-14 11:35:41 PST
> > + "text/event-stream" does not work. > text/event-stream-foobar This comment hasn't been addressed, so the patch shouldn't have been marked cq+. Another reason why it shouldn't have been marked cq+ is that it has been posted by a committer, and never marked cq?. Patches shouldn't be landed like this.
Eric Seidel (no email)
Comment 24 2010-12-14 11:51:37 PST
You should feel welcome to roll it out. It was sitting marked r+ for 2 months, I figured it was time to land or r-. Seemed landable. :)
Eric Seidel (no email)
Comment 25 2010-12-14 12:06:04 PST
I'm also happy to roll it out if that's the desire. Just say: sheriffbot: rollout 74011 "Was landed without addressing final comment" in #webkit or: webkit-patch rollout 74011 "Was landed without addressing final comment" on your command line and the magic will happen for you.
Alexey Proskuryakov
Comment 26 2010-12-14 12:39:10 PST
I don't feel like rolling this out just because of a mistake in ChangeLog. But I'd welcome comments from Julien.
Adam Bergkvist
Comment 27 2010-12-15 01:58:08 PST
Still, the new test eventsource-content-type-text-event-stream-foobar.html is not a regression test for this bug. It also duplicates eventsource-bad-mime-type.html.
Alexey Proskuryakov
Comment 28 2010-12-15 02:10:09 PST
Adam, would you be willing to clean this up? I'm not sure if Julien is available at this time.
Julien Chaffraix
Comment 29 2010-12-15 02:35:37 PST
> I don't feel like rolling this out just because of a mistake in ChangeLog. But I'd welcome comments from Julien. Sorry for the mess. This patch should not have gone in because of the issues you and Adam mentioned. Also it does not reflect the discussions we had on this bug. My take would be just to revert it. I will revisit a better alternative for the patch but it will not be before January.
Alexey Proskuryakov
Comment 30 2010-12-28 10:03:41 PST
Per <http://code.google.com/p/chromium/issues/detail?id=66666>, "Tomcat always adds "charset" to the Content-Type in the response. The only workaround is to modify Tomcat internals."
Julien Chaffraix
Comment 31 2011-03-23 11:34:01 PDT
(In reply to comment #30) > Per <http://code.google.com/p/chromium/issues/detail?id=66666>, "Tomcat always adds "charset" to the Content-Type in the response. The only workaround is to modify Tomcat internals." Follow-up on this bug: see bug56942
goberman
Comment 32 2011-06-01 07:01:46 PDT
Hello folks, I am trying to get EventSource working for streaming financial market data and it has been a pain so far. Can someone do me a favor and clarify the following questions: 1) The "charset attribute" bug has been fixed in the latest Chrome, but still is a problem with Safari. Is there some wiki page that describes what WebKit version is used for the Chrome/ Safari. 2) There is still a problem with memory leak that I filed about half a year ago: http://code.google.com/p/chromium/issues/detail?id=68160 I see no progress on it (Chrome bugs have terrible turnaround in general). What is the right place to file this issue for it to get resolved? 3) I set "Access-Control-Allow-Origin" header to "*" in server response, but cross domain calls fail. It is very reasonable to allow these. Can it be enhanced? I can file a feature request if necessary. Thanks in advance.
Julien Chaffraix
Comment 33 2011-06-01 08:06:40 PDT
Bugs are no place to ask questions. Please direct them to the webkit-help mailing list (http://www.webkit.org/contact.html). If you can confirm that your bug is reproducible on Safari, please file a bug on this bugzilla as it would be WebKit specific. Make sure you include the way to reproduce it. Thanks!
goberman
Comment 34 2011-06-01 08:19:57 PDT
goberman
Comment 35 2011-06-23 08:54:52 PDT
This bug was fixed for a while and even worked with Chrome for months. However, I have noticed it is broken again in the latest Chrome. I retested it with the WebKit nightly build and it does not work ether. The bug needs to be reopened. Can someone change the status to open, or do I need to file a new bug?
Julien Chaffraix
Comment 36 2011-06-23 09:16:29 PDT
(In reply to comment #35) > This bug was fixed for a while and even worked with Chrome for months. > However, I have noticed it is broken again in the latest Chrome. I retested it with the WebKit nightly build and it does not work ether. > > The bug needs to be reopened. Can someone change the status to open, or do I need to file a new bug? As this was fixed at one point but was regressed, please file another bug and CC me on it. We need to investigate how / why it regressed.
goberman
Comment 37 2011-06-23 10:01:52 PDT
Note You need to log in before you can comment on or make changes to this bug.