Bug 45372

Summary: EventSource fails to connect if Content-Type header has a charset attribute
Product: WebKit Reporter: Richard Zschech <richard>
Component: WebCore Misc.Assignee: Julien Chaffraix <jchaffraix>
Status: RESOLVED FIXED    
Severity: Normal CC: adam.bergkvist, annevk, ap, commit-queue, eric, goberman
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: PC   
OS: Windows XP   
Attachments:
Description Flags
Proposed fix: Use the MIME type and not directly the Content-Type header none

Description Richard Zschech 2010-09-08 00:27:57 PDT
For example:

HTTP/1.1 200 OK
Content-Type: text/event-stream; charset=UTF-8

Fails with an error event being received to the onerror handler.
Comment 1 Adam Bergkvist 2010-09-14 07:09:02 PDT
Since UTF-8 is the only encoding allowed for text/event-stream, the charset parameter doesn't add anything. However, what we perhaps should do is to make sure that anyhow specifying charset=UTF-8 doesn't fail.
Comment 2 Anne van Kesteren 2010-09-14 10:53:47 PDT
Tests:

http://tc.labs.opera.com/apis/EventSource/format-mime-trailing-semicolon.htm
http://tc.labs.opera.com/apis/EventSource/format-utf-8.htm

Can be used under any license you want.

(Opera should start matching these tests soon too. Hopefully we can then move the specification forward.)
Comment 3 Julien Chaffraix 2010-09-14 21:42:57 PDT
Created attachment 67641 [details]
Proposed fix: Use the MIME type and not directly the Content-Type header

I just saw that Opera has some nice test-cases. The 2 test cases included in the fix are complementary to one mentioned by Anne.

We may want to include the whole test suite to make sure we match the specification and Opera (my testing showed that WebKit was the only one to support EventSource for now).
Comment 4 Alexey Proskuryakov 2010-09-14 22:11:05 PDT
Comment on attachment 67641 [details]
Proposed fix: Use the MIME type and not directly the Content-Type header

> +        "text/event-stream" does not work.

text/event-stream-foobar

> -    if (statusCode == 200 && response.httpHeaderField("Content-Type") == "text/event-stream") {
> +    if (statusCode == 200 && response.mimeType() == "text/event-stream") {

Do we disable content sniffing for EventSource? MIME type is normally subject to content sniffing, while ResourceResponse.httpHeaderField("Content-Type") is not. So, there is a chance that mimeType() will return text/plain.

The rest of the patch looks fine to me.
Comment 5 Richard Zschech 2010-09-15 00:09:01 PDT
Yes we want to disable content sniffing for EventSource. Content sniffing generally requires buffering a certain amount of data before giving it to the application which would be bad for EventSource.
Comment 6 Adam Bergkvist 2010-09-15 07:26:57 PDT
(In reply to comment #3)

> -    if (statusCode == 200 && response.httpHeaderField("Content-Type") == "text/event-stream") {
> +    if (statusCode == 200 && response.mimeType() == "text/event-stream") {

Besides checking the mime type, shouldn't we verify that the charset parameter, if present, is UTF-8?

I think the test eventsource-content-type-text-event-stream-foobar is redundant, see the existing eventsource-bad-mime-type test.


(In reply to comment #4)

Yes, content sniffing is indeed disabled for the reason mentioned in comment #5.
Comment 7 Julien Chaffraix 2010-09-15 14:18:56 PDT
(In reply to comment #6)
> (In reply to comment #3)
> 
> > -    if (statusCode == 200 && response.httpHeaderField("Content-Type") == "text/event-stream") {
> > +    if (statusCode == 200 && response.mimeType() == "text/event-stream") {
> 
> Besides checking the mime type, shouldn't we verify that the charset parameter,
> if present, is UTF-8?

The question is what should we do if the charset is not UTF-8. The specification does not allow anything else so we need to define this behaviour.

My solution was just to ignore anything not part of the MIME type. We may want to send a console warning if we have Content-Type != MIME type.

> I think the test eventsource-content-type-text-event-stream-foobar is 
redundant, see the existing eventsource-bad-mime-type test.

Yes and no. Your test checks a completely unrelated mime type ("text/bogus"). I just wanted to make sure we don't match MIME types that starts with "text/event-stream". I think it is still relevant and complement your test.
Comment 8 Alexey Proskuryakov 2010-09-15 16:08:01 PDT
>The question is what should we do if the charset is not UTF-8.

I slightly prefer failing the connection. It seems unsafe to ignore what the server says, and also, it is not unthinkable that in the future, arbitrary encodings will be supported (not that HTML5 is going to change soon, but what if, say, a dominant browser implements it that way?)
Comment 9 Adam Bergkvist 2010-09-16 04:41:22 PDT
(In reply to comment #7)

>Yes and no. Your test checks a completely unrelated mime type ("text/bogus"). I just wanted to make sure we don't match MIME types that starts with "text/event-stream". I think it is still relevant and complement your test.

This could be a test for a test suite to verify that implementations don't rely on only matching the beginning of the mime type. However, it is not a regression test for this particular bug, since mime type "text/event-stream-foobar" already fails with the current implementation.

(In reply to comment #8)

> I slightly prefer failing the connection. It seems unsafe to ignore what the server says, and also, it is not unthinkable that in the future, arbitrary encodings will be supported (not that HTML5 is going to change soon, but what if, say, a dominant browser implements it that way?)

I agree.
Comment 10 Anne van Kesteren 2010-09-17 00:38:45 PDT
Per my reading of the specification the charset parameter should just be ignored. No different from text/cache-manifest. That is what our test suite is testing. That makes the most sense as well, given that servers might sometimes append weird things there.
Comment 11 Alexey Proskuryakov 2010-09-17 00:46:01 PDT
I feel that the spec is written in a manner that's less than pragmatic in this regard. Trying to pretend that text encodings don't exist on the web doesn't seem helpful neither short term nor long term. It certainly complicates implementation to have "new" features take a different code path for text decoding.
Comment 12 Alexey Proskuryakov 2010-09-17 11:09:01 PDT
It also seems that EventSource or HTML5 don't really have a say in how charset should be handled - it's defined by HTTP spec. The higher level specs can mandate that some encodings must be supported, or prohibit establishing a connection when other ones are used, but they can't say that the charset parameter doesn't exist for their purposes.

Anyway, I don't see huge harm from going either way.
Comment 13 Julien Chaffraix 2010-09-20 20:05:32 PDT
> Anyway, I don't see huge harm from going either way.

Currently we are just ignoring anything that is not in the MIME type in our appcache implementation. The appcache standard is also silent on the charset handling so it would make sense to follow the same pattern for coherency in the EventSource.

Alexey, you mentioned that there may be security issues involved in ignoring what the server send. If this is the case, then the 2 approaches are not symmetrical and we should just fail early and update our appcache implementation to do the same.
Comment 14 Alexey Proskuryakov 2010-09-21 10:12:12 PDT
> it would make sense to follow the same pattern for coherency in the EventSource

Or vice versa.

> Alexey, you mentioned that there may be security issues involved in ignoring what the server send.

I didn't say the "security" word. Some antivirus software may not be aware of this HTTP violation, so it may misparse event streams and manifests, but I don't see why antivirus software would want to look inside those anyway.
Comment 15 Julien Chaffraix 2010-09-22 19:58:20 PDT
> > it would make sense to follow the same pattern for coherency in the EventSource
> 
> Or vice versa.

Right :)

> > Alexey, you mentioned that there may be security issues involved in ignoring what the server send.
> 
> I didn't say the "security" word. 

Sorry for jumping a bit on this one.

> Some antivirus software may not be aware of this HTTP violation, so it may misparse event streams and manifests, but I don't see why antivirus software would want to look inside those anyway.

Looks like this is something we would like to avoid in the absolute. I am not set up on the solution but taking the charset into account has this upside (plus the support for other charset in the future). Unless there is some strong resistance from Anne, this is way I will go for the next iteration.
Comment 16 Anne van Kesteren 2010-09-23 02:58:43 PDT
I do not think this is a HTTP violation. Each media type gets to decide whether the charset parameter works for it. (It's a feature of the media type, not the Content-Type header.)

If you want to change the specifications for EventSource and appcache I suggest you get the relevant specifications changed.
Comment 17 Alexey Proskuryakov 2010-09-23 08:54:11 PDT
I'd say that practically, text/ subtypes don't get a chance to pretend that they are binary.

I don't think that this change should be blocked on changing (or in my reading, clarifying) HTML5 and EventSource specs. The fact that WebKit will be slightly more strict in this edge case shouldn't do harm to anyone.
Comment 18 Richard Zschech 2010-09-23 18:46:42 PDT
I (the original submitter) am happy with the distinction between mime type and content type as described in comment #16. I agree that the stricter the better.

I'm happy for this to be closed as invalid.

The main problem is that it was difficult to determine what the issue was because the error handler gets no information about the error. I had to read the source code to determine the possible causes of an error.
Comment 19 Alexey Proskuryakov 2010-09-23 20:43:07 PDT
That's a great point, we should definitely add logging here.
Comment 20 Eric Seidel (no email) 2010-12-14 01:52:38 PST
Comment on attachment 67641 [details]
Proposed fix: Use the MIME type and not directly the Content-Type header

Seems this is approved and only needs to be landed?

Logging can be done in a follow-up patch, no?
Comment 21 WebKit Commit Bot 2010-12-14 03:30:01 PST
Comment on attachment 67641 [details]
Proposed fix: Use the MIME type and not directly the Content-Type header

Clearing flags on attachment: 67641

Committed r74011: <http://trac.webkit.org/changeset/74011>
Comment 22 WebKit Commit Bot 2010-12-14 03:30:07 PST
All reviewed patches have been landed.  Closing bug.
Comment 23 Alexey Proskuryakov 2010-12-14 11:35:41 PST
> > +        "text/event-stream" does not work.
> text/event-stream-foobar

This comment hasn't been addressed, so the patch shouldn't have been marked cq+.

Another reason why it shouldn't have been marked cq+ is that it has been posted by a committer, and never marked cq?. Patches shouldn't be landed like this.
Comment 24 Eric Seidel (no email) 2010-12-14 11:51:37 PST
You should feel welcome to roll it out.  It was sitting marked r+ for 2 months, I figured it was time to land or r-.  Seemed landable. :)
Comment 25 Eric Seidel (no email) 2010-12-14 12:06:04 PST
I'm also happy to roll it out if that's the desire.

Just say:
sheriffbot: rollout 74011 "Was landed without addressing final comment"

in #webkit

or:
webkit-patch rollout 74011 "Was landed without addressing final comment"

on your command line and the magic will happen for you.
Comment 26 Alexey Proskuryakov 2010-12-14 12:39:10 PST
I don't feel like rolling this out just because of a mistake in ChangeLog. But I'd welcome comments from Julien.
Comment 27 Adam Bergkvist 2010-12-15 01:58:08 PST
Still, the new test eventsource-content-type-text-event-stream-foobar.html is not a regression test for this bug. It also duplicates eventsource-bad-mime-type.html.
Comment 28 Alexey Proskuryakov 2010-12-15 02:10:09 PST
Adam, would you be willing to clean this up? I'm not sure if Julien is available at this time.
Comment 29 Julien Chaffraix 2010-12-15 02:35:37 PST
> I don't feel like rolling this out just because of a mistake in ChangeLog. But I'd welcome comments from Julien.

Sorry for the mess. This patch should not have gone in because of the issues you and Adam mentioned. Also it does not reflect the discussions we had on this bug.

My take would be just to revert it. I will revisit a better alternative for the patch but it will not be before January.
Comment 30 Alexey Proskuryakov 2010-12-28 10:03:41 PST
Per <http://code.google.com/p/chromium/issues/detail?id=66666>, "Tomcat always adds "charset" to the Content-Type in the response. The only workaround is to modify Tomcat internals."
Comment 31 Julien Chaffraix 2011-03-23 11:34:01 PDT
(In reply to comment #30)
> Per <http://code.google.com/p/chromium/issues/detail?id=66666>, "Tomcat always adds "charset" to the Content-Type in the response. The only workaround is to modify Tomcat internals."

Follow-up on this bug: see bug56942
Comment 32 goberman 2011-06-01 07:01:46 PDT
Hello folks,
I am trying to get EventSource working for streaming financial market data and it has been a pain so far. Can someone do me a favor and clarify the following questions:
1) The "charset attribute" bug has been fixed in the latest Chrome, but still is a problem with Safari. Is there some wiki page that describes what WebKit version is used for the Chrome/ Safari.

2) There is still a problem with memory leak that I filed about half a year ago:
http://code.google.com/p/chromium/issues/detail?id=68160 I see no progress on it (Chrome bugs have terrible turnaround in general). What is the right place to file this issue for it to get resolved?

3) I set "Access-Control-Allow-Origin" header to "*" in server response, but cross domain calls fail. It is very reasonable to allow these. Can it be enhanced? I can file a feature request if necessary.

Thanks in advance.
Comment 33 Julien Chaffraix 2011-06-01 08:06:40 PDT
Bugs are no place to ask questions. Please direct them to the webkit-help mailing list (http://www.webkit.org/contact.html).

If you can confirm that your bug is reproducible on Safari, please file a bug on this bugzilla as it would be WebKit specific. Make sure you include the way to reproduce it.

Thanks!
Comment 34 goberman 2011-06-01 08:19:57 PDT
OK, Thanks. I filed 2 new bugs for EventSource:
https://bugs.webkit.org/show_bug.cgi?id=61863
https://bugs.webkit.org/show_bug.cgi?id=61862
Comment 35 goberman 2011-06-23 08:54:52 PDT
This bug was fixed for a while and even worked with Chrome for months. 
However, I have noticed it is broken again in the latest Chrome. I retested it with the WebKit nightly build and it does not work ether.

The bug needs to be reopened. Can someone change the status to open, or do I need to file a new bug?
Comment 36 Julien Chaffraix 2011-06-23 09:16:29 PDT
(In reply to comment #35)
> This bug was fixed for a while and even worked with Chrome for months. 
> However, I have noticed it is broken again in the latest Chrome. I retested it with the WebKit nightly build and it does not work ether.
> 
> The bug needs to be reopened. Can someone change the status to open, or do I need to file a new bug?

As this was fixed at one point but was regressed, please file another bug and CC me on it. We need to investigate how / why it regressed.
Comment 37 goberman 2011-06-23 10:01:52 PDT
Filed: https://bugs.webkit.org/show_bug.cgi?id=63262