Bug 56942 - EventSource should only accept UTF-8 charset
Summary: EventSource should only accept UTF-8 charset
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-03-23 11:33 PDT by Julien Chaffraix
Modified: 2011-04-07 23:45 PDT (History)
6 users (show)

See Also:


Attachments
Proposed fix: Accept charset UTF-8, block and log any other charset (11.52 KB, patch)
2011-03-23 11:46 PDT, Julien Chaffraix
ap: review-
ap: commit-queue-
Details | Formatted Diff | Diff
New patch following Alexey's comments: Only accept UTF-8 and log most rejection cases (13.82 KB, patch)
2011-03-28 19:50 PDT, Julien Chaffraix
no flags Details | Formatted Diff | Diff
Same as previously - with a work-around for some PHP escaping issues (14.06 KB, patch)
2011-04-03 19:53 PDT, Julien Chaffraix
ap: review+
ap: commit-queue-
Details | Formatted Diff | Diff
Same as previous but fixed Alexey's comments (14.20 KB, patch)
2011-04-07 18:10 PDT, Julien Chaffraix
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Julien Chaffraix 2011-03-23 11:33:06 PDT
This is a follow-up of bug45372. The discussion hinted at accepting UTF-8 charset  for EventSource (the charset mentioned in the spec) but failing other charset and logging it to the console. What got landed is not accepting any charset altogether.

This bug is about implementing the previous behavior for compatibility with some servers sending charset automatically (tomcat was mentioned).
Comment 1 Ian 'Hixie' Hickson 2011-03-23 11:39:42 PDT
Per spec only streams with the exact MIME type "text/event-stream" should be accepted. If you think other streams should be accepted too, please make sure to contact the mailing list as well so we can update the spec accordingly.
Comment 2 Julien Chaffraix 2011-03-23 11:46:20 PDT
Created attachment 86657 [details]
Proposed fix: Accept charset UTF-8, block and log any other charset
Comment 3 Alexey Proskuryakov 2011-03-23 12:31:10 PDT
Comment on attachment 86657 [details]
Proposed fix: Accept charset UTF-8, block and log any other charset

View in context: https://bugs.webkit.org/attachment.cgi?id=86657&action=review

Does charset="UTF-8" (with quotes) work? Could you add a test?

Pelase do send a message to HTML mailing list, as suggested by Ian.

> LayoutTests/http/tests/eventsource/eventsource-content-type-charset-expected.txt:1
> +CONSOLE MESSAGE: line 1: EventSource's response had a charset ("Windows-1152") that is not UTF-8. Aborting the connection.

There are two failure subtests, so why only one console message?

> Source/WebCore/page/EventSource.cpp:186
> -    if (statusCode == 200 && response.mimeType() == "text/event-stream") {
> +    bool isResponseValid = statusCode == 200 && response.mimeType() == "text/event-stream";

I'm confused. The bug description sounds like we should be allowing more than we do today, but the actual fix makes code more restrictive. Which is correct?

Also, s/isResponseValid/responseIsValid/ (or isValidResponse).

> Source/WebCore/page/EventSource.cpp:194
> +            String message = "EventSource's response had a charset (\"";
> +            message += charset;
> +            message += "\") that is not UTF-8. Aborting the connection.";

I think it's wrong grammar to have different tenses in one sentence (had... is).
Comment 4 Julien Chaffraix 2011-03-23 14:05:53 PDT
> Does charset="UTF-8" (with quotes) work? Could you add a test?

It does, I will add some testing along those lines.

> Pelase do send a message to HTML mailing list, as suggested by Ian.

Sure. Ian, we are actually already allowing a charset in the Content-Type header.

> > LayoutTests/http/tests/eventsource/eventsource-content-type-charset-expected.txt:1
> > +CONSOLE MESSAGE: line 1: EventSource's response had a charset ("Windows-1152") that is not UTF-8. Aborting the connection.
> 
> There are two failure subtests, so why only one console message?

There is 2 failures but only one failure involves an invalid charset. We only log the error when we have a valid mime-type but an invalid charset thus only one console message. Would like me to log also if the mime-type is wrong?

> > Source/WebCore/page/EventSource.cpp:186
> > -    if (statusCode == 200 && response.mimeType() == "text/event-stream") {
> > +    bool isResponseValid = statusCode == 200 && response.mimeType() == "text/event-stream";
> 
> I'm confused. The bug description sounds like we should be allowing more than we do today, but the actual fix makes code more restrictive. Which is correct?

This is correct. This change is restricting the behavior to only accept UTF-8 as was suggested in the previous bug.

> Also, s/isResponseValid/responseIsValid/ (or isValidResponse).

OK.

> > Source/WebCore/page/EventSource.cpp:194
> > +            String message = "EventSource's response had a charset (\"";
> > +            message += charset;
> > +            message += "\") that is not UTF-8. Aborting the connection.";
> 
> I think it's wrong grammar to have different tenses in one sentence (had... is).

I will fix it to use present in both cases.
Comment 5 Alexey Proskuryakov 2011-03-23 14:41:52 PDT
Comment on attachment 86657 [details]
Proposed fix: Accept charset UTF-8, block and log any other charset

> This is correct. This change is restricting the behavior to only accept UTF-8 as was suggested in the previous bug.

OK. Please rename the bug in ChangeLog.

> Would like me to log also if the mime-type is wrong?

Sounds like that would be useful for developers.
Comment 6 Julien Chaffraix 2011-03-24 07:23:20 PDT
Brought this change to the WHATWG group: http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2011-March/031003.html
Comment 7 Julien Chaffraix 2011-03-28 19:50:00 PDT
Created attachment 87251 [details]
New patch following Alexey's comments: Only accept UTF-8 and log most rejection cases

As the proposal did raise any objection, here is the updated patch.
Comment 8 WebKit Commit Bot 2011-03-29 05:54:47 PDT
Comment on attachment 87251 [details]
New patch following Alexey's comments: Only accept UTF-8 and log most rejection cases

Rejecting attachment 87251 [details] from commit-queue.

Failed to run "['./Tools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '--bot-id=cr-jail-8', 'build-..." exit_code: 2

Last 500 characters of output:
/tests/appcache ...........................................................
http/tests/cache ........
http/tests/canvas/philip/tests ..........
http/tests/canvas/webgl .
http/tests/cookies .....
http/tests/css ....
http/tests/eventsource ..
http/tests/eventsource/eventsource-content-type-charset.html -> failed

Exiting early after 1 failures. 22089 tests run.
569.43s total testing time

22088 test cases (99%) succeeded
1 test case (<1%) had incorrect layout
14 test cases (<1%) had stderr output

Full output: http://queues.webkit.org/results/8284290
Comment 9 Julien Chaffraix 2011-03-31 07:50:41 PDT
> http/tests/eventsource/eventsource-content-type-charset.html -> failed

The cause of failure is that the Mac implementation of ResourceResponse::textEncodingName does not strip the leading and trailing quotes in the charset. Instead it escapes them and we fail the comparison to "utf-8". Qt, for example, does remove them.

I could add a work-around for that or just go ahead and fix the output of Mac, documenting what we expect textEncodingName to return.
Comment 10 Alexey Proskuryakov 2011-03-31 09:04:36 PDT
> The cause of failure is that the Mac implementation of ResourceResponse::textEncodingName does not strip the leading and trailing quotes in the charset.

I thought that I added a workaround for this in bug 46573. What did go wrong?
Comment 11 Julien Chaffraix 2011-03-31 13:47:49 PDT
> I thought that I added a workaround for this in bug 46573. What did go wrong?

I haven't had time to debug the problem so I don't know what's going on yet. I will look into why this is happening here.
Comment 12 Julien Chaffraix 2011-04-03 19:50:13 PDT
(In reply to comment #11)
> > I thought that I added a workaround for this in bug 46573. What did go wrong?
> 
> I haven't had time to debug the problem so I don't know what's going on yet. I will look into why this is happening here.

Alexey, the problem is not in WebKit. It looks like PHP on Mac escapes quotes in the charset with slashes which is causing the problem (I had to look at some network capture to find this out). I will be posting an updated patch to work around this misbehavior.
Comment 13 Julien Chaffraix 2011-04-03 19:53:02 PDT
Created attachment 88028 [details]
Same as previously - with a work-around for some PHP escaping issues
Comment 14 WebKit Review Bot 2011-04-03 19:55:32 PDT
Attachment 88028 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style']" exit_code: 1

LayoutTests/ChangeLog-2011-02-16:27403:  ChangeLog entry has no bug number  [changelog/bugnumber] [5]
Total errors found: 1 in 8 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 15 Alexey Proskuryakov 2011-04-03 21:02:38 PDT
Comment on attachment 88028 [details]
Same as previously - with a work-around for some PHP escaping issues

View in context: https://bugs.webkit.org/attachment.cgi?id=88028&action=review

> LayoutTests/ChangeLog:21
> +        in the URL. Also added a work-around a bug in PHP.

I don't think that it's a bug. A quick web search suggested that it's a configuration option, and we should do something like:

if (get_magic_quotes_gpc()){
    $value = stripslashes($value);
}

See also: <http://php.net/manual/en/function.stripslashes.php>, <http://php.net/manual/en/function.get-magic-quotes-gpc.php>.

Please use the proposed pattern, and fix the ChangeLog accordingly.

> LayoutTests/http/tests/eventsource/eventsource-content-type-charset.html:35
> +                     'text/event-stream; charset=windows-1152',

Did you mean to use a non-existent charset here? If so, 1152 is an overly subtle way to express that.

I think that we should test windows-1251 or us-ascii.
Comment 16 Eric Seidel (no email) 2011-04-06 10:46:09 PDT
Comment on attachment 87251 [details]
New patch following Alexey's comments: Only accept UTF-8 and log most rejection cases

Cleared Alexey Proskuryakov's review+ from obsolete attachment 87251 [details] so that this bug does not appear in http://webkit.org/pending-commit.
Comment 17 Julien Chaffraix 2011-04-07 17:35:12 PDT
> > LayoutTests/ChangeLog:21
> > +        in the URL. Also added a work-around a bug in PHP.
> 
> I don't think that it's a bug. A quick web search suggested that it's a configuration option, and we should do something like:
> 
> if (get_magic_quotes_gpc()){
>     $value = stripslashes($value);
> }
> 
> See also: <http://php.net/manual/en/function.stripslashes.php>, <http://php.net/manual/en/function.get-magic-quotes-gpc.php>.

Indeed. I should have expected some different options on Mac. Do we know why the configuration is different between Linux and Mac? It looks like something that will bite us at some point and would like to fix.

> > LayoutTests/http/tests/eventsource/eventsource-content-type-charset.html:35
> > +                     'text/event-stream; charset=windows-1152',
> 
> Did you mean to use a non-existent charset here? If so, 1152 is an overly subtle way to express that.

No, this is a typo. Changed it to windows-1251 as the logic does not care if the charset exists or not.
Comment 18 Alexey Proskuryakov 2011-04-07 17:49:36 PDT
> Do we know why the configuration is different between Linux and Mac?

I don't know.
Comment 19 Julien Chaffraix 2011-04-07 18:10:09 PDT
Created attachment 88746 [details]
Same as previous but fixed Alexey's comments
Comment 20 WebKit Review Bot 2011-04-07 18:12:12 PDT
Attachment 88746 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'LayoutTests/ChangeLog', u'LayoutTests/Chan..." exit_code: 1

LayoutTests/ChangeLog-2011-02-16:27403:  ChangeLog entry has no bug number  [changelog/bugnumber] [5]
Total errors found: 1 in 8 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 21 WebKit Commit Bot 2011-04-07 23:42:30 PDT
The commit-queue encountered the following flaky tests while processing attachment 88746 [details]:

fast/loader/recursive-before-unload-crash.html bug 50880 (authors: beidson@apple.com and eric@webkit.org)
The commit-queue is continuing to process your patch.
Comment 22 WebKit Commit Bot 2011-04-07 23:45:50 PDT
Comment on attachment 88746 [details]
Same as previous but fixed Alexey's comments

Clearing flags on attachment: 88746

Committed r83260: <http://trac.webkit.org/changeset/83260>
Comment 23 WebKit Commit Bot 2011-04-07 23:45:55 PDT
All reviewed patches have been landed.  Closing bug.