Summary: | EventSource fails to connect if Content-Type header has a charset attribute | ||||||
---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Richard Zschech <richard> | ||||
Component: | WebCore Misc. | Assignee: | Julien Chaffraix <jchaffraix> | ||||
Status: | RESOLVED FIXED | ||||||
Severity: | Normal | CC: | adam.bergkvist, annevk, ap, commit-queue, eric, goberman | ||||
Priority: | P2 | ||||||
Version: | 528+ (Nightly build) | ||||||
Hardware: | PC | ||||||
OS: | Windows XP | ||||||
Attachments: |
|
Description
Richard Zschech
2010-09-08 00:27:57 PDT
Since UTF-8 is the only encoding allowed for text/event-stream, the charset parameter doesn't add anything. However, what we perhaps should do is to make sure that anyhow specifying charset=UTF-8 doesn't fail. Tests: http://tc.labs.opera.com/apis/EventSource/format-mime-trailing-semicolon.htm http://tc.labs.opera.com/apis/EventSource/format-utf-8.htm Can be used under any license you want. (Opera should start matching these tests soon too. Hopefully we can then move the specification forward.) Created attachment 67641 [details]
Proposed fix: Use the MIME type and not directly the Content-Type header
I just saw that Opera has some nice test-cases. The 2 test cases included in the fix are complementary to one mentioned by Anne.
We may want to include the whole test suite to make sure we match the specification and Opera (my testing showed that WebKit was the only one to support EventSource for now).
Comment on attachment 67641 [details] Proposed fix: Use the MIME type and not directly the Content-Type header > + "text/event-stream" does not work. text/event-stream-foobar > - if (statusCode == 200 && response.httpHeaderField("Content-Type") == "text/event-stream") { > + if (statusCode == 200 && response.mimeType() == "text/event-stream") { Do we disable content sniffing for EventSource? MIME type is normally subject to content sniffing, while ResourceResponse.httpHeaderField("Content-Type") is not. So, there is a chance that mimeType() will return text/plain. The rest of the patch looks fine to me. Yes we want to disable content sniffing for EventSource. Content sniffing generally requires buffering a certain amount of data before giving it to the application which would be bad for EventSource. (In reply to comment #3) > - if (statusCode == 200 && response.httpHeaderField("Content-Type") == "text/event-stream") { > + if (statusCode == 200 && response.mimeType() == "text/event-stream") { Besides checking the mime type, shouldn't we verify that the charset parameter, if present, is UTF-8? I think the test eventsource-content-type-text-event-stream-foobar is redundant, see the existing eventsource-bad-mime-type test. (In reply to comment #4) Yes, content sniffing is indeed disabled for the reason mentioned in comment #5. (In reply to comment #6) > (In reply to comment #3) > > > - if (statusCode == 200 && response.httpHeaderField("Content-Type") == "text/event-stream") { > > + if (statusCode == 200 && response.mimeType() == "text/event-stream") { > > Besides checking the mime type, shouldn't we verify that the charset parameter, > if present, is UTF-8? The question is what should we do if the charset is not UTF-8. The specification does not allow anything else so we need to define this behaviour. My solution was just to ignore anything not part of the MIME type. We may want to send a console warning if we have Content-Type != MIME type. > I think the test eventsource-content-type-text-event-stream-foobar is redundant, see the existing eventsource-bad-mime-type test. Yes and no. Your test checks a completely unrelated mime type ("text/bogus"). I just wanted to make sure we don't match MIME types that starts with "text/event-stream". I think it is still relevant and complement your test. >The question is what should we do if the charset is not UTF-8.
I slightly prefer failing the connection. It seems unsafe to ignore what the server says, and also, it is not unthinkable that in the future, arbitrary encodings will be supported (not that HTML5 is going to change soon, but what if, say, a dominant browser implements it that way?)
(In reply to comment #7) >Yes and no. Your test checks a completely unrelated mime type ("text/bogus"). I just wanted to make sure we don't match MIME types that starts with "text/event-stream". I think it is still relevant and complement your test. This could be a test for a test suite to verify that implementations don't rely on only matching the beginning of the mime type. However, it is not a regression test for this particular bug, since mime type "text/event-stream-foobar" already fails with the current implementation. (In reply to comment #8) > I slightly prefer failing the connection. It seems unsafe to ignore what the server says, and also, it is not unthinkable that in the future, arbitrary encodings will be supported (not that HTML5 is going to change soon, but what if, say, a dominant browser implements it that way?) I agree. Per my reading of the specification the charset parameter should just be ignored. No different from text/cache-manifest. That is what our test suite is testing. That makes the most sense as well, given that servers might sometimes append weird things there. I feel that the spec is written in a manner that's less than pragmatic in this regard. Trying to pretend that text encodings don't exist on the web doesn't seem helpful neither short term nor long term. It certainly complicates implementation to have "new" features take a different code path for text decoding. It also seems that EventSource or HTML5 don't really have a say in how charset should be handled - it's defined by HTTP spec. The higher level specs can mandate that some encodings must be supported, or prohibit establishing a connection when other ones are used, but they can't say that the charset parameter doesn't exist for their purposes. Anyway, I don't see huge harm from going either way. > Anyway, I don't see huge harm from going either way.
Currently we are just ignoring anything that is not in the MIME type in our appcache implementation. The appcache standard is also silent on the charset handling so it would make sense to follow the same pattern for coherency in the EventSource.
Alexey, you mentioned that there may be security issues involved in ignoring what the server send. If this is the case, then the 2 approaches are not symmetrical and we should just fail early and update our appcache implementation to do the same.
> it would make sense to follow the same pattern for coherency in the EventSource Or vice versa. > Alexey, you mentioned that there may be security issues involved in ignoring what the server send. I didn't say the "security" word. Some antivirus software may not be aware of this HTTP violation, so it may misparse event streams and manifests, but I don't see why antivirus software would want to look inside those anyway. > > it would make sense to follow the same pattern for coherency in the EventSource > > Or vice versa. Right :) > > Alexey, you mentioned that there may be security issues involved in ignoring what the server send. > > I didn't say the "security" word. Sorry for jumping a bit on this one. > Some antivirus software may not be aware of this HTTP violation, so it may misparse event streams and manifests, but I don't see why antivirus software would want to look inside those anyway. Looks like this is something we would like to avoid in the absolute. I am not set up on the solution but taking the charset into account has this upside (plus the support for other charset in the future). Unless there is some strong resistance from Anne, this is way I will go for the next iteration. I do not think this is a HTTP violation. Each media type gets to decide whether the charset parameter works for it. (It's a feature of the media type, not the Content-Type header.) If you want to change the specifications for EventSource and appcache I suggest you get the relevant specifications changed. I'd say that practically, text/ subtypes don't get a chance to pretend that they are binary. I don't think that this change should be blocked on changing (or in my reading, clarifying) HTML5 and EventSource specs. The fact that WebKit will be slightly more strict in this edge case shouldn't do harm to anyone. I (the original submitter) am happy with the distinction between mime type and content type as described in comment #16. I agree that the stricter the better. I'm happy for this to be closed as invalid. The main problem is that it was difficult to determine what the issue was because the error handler gets no information about the error. I had to read the source code to determine the possible causes of an error. That's a great point, we should definitely add logging here. Comment on attachment 67641 [details]
Proposed fix: Use the MIME type and not directly the Content-Type header
Seems this is approved and only needs to be landed?
Logging can be done in a follow-up patch, no?
Comment on attachment 67641 [details] Proposed fix: Use the MIME type and not directly the Content-Type header Clearing flags on attachment: 67641 Committed r74011: <http://trac.webkit.org/changeset/74011> All reviewed patches have been landed. Closing bug. > > + "text/event-stream" does not work.
> text/event-stream-foobar
This comment hasn't been addressed, so the patch shouldn't have been marked cq+.
Another reason why it shouldn't have been marked cq+ is that it has been posted by a committer, and never marked cq?. Patches shouldn't be landed like this.
You should feel welcome to roll it out. It was sitting marked r+ for 2 months, I figured it was time to land or r-. Seemed landable. :) I'm also happy to roll it out if that's the desire. Just say: sheriffbot: rollout 74011 "Was landed without addressing final comment" in #webkit or: webkit-patch rollout 74011 "Was landed without addressing final comment" on your command line and the magic will happen for you. I don't feel like rolling this out just because of a mistake in ChangeLog. But I'd welcome comments from Julien. Still, the new test eventsource-content-type-text-event-stream-foobar.html is not a regression test for this bug. It also duplicates eventsource-bad-mime-type.html. Adam, would you be willing to clean this up? I'm not sure if Julien is available at this time. > I don't feel like rolling this out just because of a mistake in ChangeLog. But I'd welcome comments from Julien.
Sorry for the mess. This patch should not have gone in because of the issues you and Adam mentioned. Also it does not reflect the discussions we had on this bug.
My take would be just to revert it. I will revisit a better alternative for the patch but it will not be before January.
Per <http://code.google.com/p/chromium/issues/detail?id=66666>, "Tomcat always adds "charset" to the Content-Type in the response. The only workaround is to modify Tomcat internals." (In reply to comment #30) > Per <http://code.google.com/p/chromium/issues/detail?id=66666>, "Tomcat always adds "charset" to the Content-Type in the response. The only workaround is to modify Tomcat internals." Follow-up on this bug: see bug56942 Hello folks, I am trying to get EventSource working for streaming financial market data and it has been a pain so far. Can someone do me a favor and clarify the following questions: 1) The "charset attribute" bug has been fixed in the latest Chrome, but still is a problem with Safari. Is there some wiki page that describes what WebKit version is used for the Chrome/ Safari. 2) There is still a problem with memory leak that I filed about half a year ago: http://code.google.com/p/chromium/issues/detail?id=68160 I see no progress on it (Chrome bugs have terrible turnaround in general). What is the right place to file this issue for it to get resolved? 3) I set "Access-Control-Allow-Origin" header to "*" in server response, but cross domain calls fail. It is very reasonable to allow these. Can it be enhanced? I can file a feature request if necessary. Thanks in advance. Bugs are no place to ask questions. Please direct them to the webkit-help mailing list (http://www.webkit.org/contact.html). If you can confirm that your bug is reproducible on Safari, please file a bug on this bugzilla as it would be WebKit specific. Make sure you include the way to reproduce it. Thanks! OK, Thanks. I filed 2 new bugs for EventSource: https://bugs.webkit.org/show_bug.cgi?id=61863 https://bugs.webkit.org/show_bug.cgi?id=61862 This bug was fixed for a while and even worked with Chrome for months. However, I have noticed it is broken again in the latest Chrome. I retested it with the WebKit nightly build and it does not work ether. The bug needs to be reopened. Can someone change the status to open, or do I need to file a new bug? (In reply to comment #35) > This bug was fixed for a while and even worked with Chrome for months. > However, I have noticed it is broken again in the latest Chrome. I retested it with the WebKit nightly build and it does not work ether. > > The bug needs to be reopened. Can someone change the status to open, or do I need to file a new bug? As this was fixed at one point but was regressed, please file another bug and CC me on it. We need to investigate how / why it regressed. |