Bug 57138 - WebSocket urls should always be encoded as UTF-8
Summary: WebSocket urls should always be encoded as UTF-8
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore JavaScript (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-03-25 20:13 PDT by Joe Mason
Modified: 2011-05-06 01:59 PDT (History)
7 users (show)

See Also:


Attachments
patch (5.39 KB, patch)
2011-03-25 20:22 PDT, Joe Mason
abarth: review-
Details | Formatted Diff | Diff
updated patch (15.67 KB, patch)
2011-03-28 12:35 PDT, Joe Mason
abarth: review+
commit-queue: commit-queue-
Details | Formatted Diff | Diff
fix compile error (12.81 KB, patch)
2011-04-28 15:58 PDT, Joe Mason
ap: review-
ap: commit-queue-
Details | Formatted Diff | Diff
simplified fix (10.57 KB, patch)
2011-05-05 22:39 PDT, Joe Mason
abarth: review+
ap: commit-queue-
Details | Formatted Diff | Diff
Archive of layout-test-results from ec2-cr-linux-02 (4.77 MB, application/zip)
2011-05-05 23:23 PDT, WebKit Review Bot
no flags Details
removed useless comment (7.76 KB, patch)
2011-05-05 23:58 PDT, Joe Mason
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Joe Mason 2011-03-25 20:13:47 PDT
If a document includes a URL whose query includes non-ASCII chars, we normally encode it using the document's encoding.  But the websocket spec specifies that websocket URLs should always be encoded with UTF-8 (see http://tools.ietf.org/html/draft-ietf-hybi-thewebsocketprotocol-06#section-3 - hasn't changed since -00).
Comment 1 Joe Mason 2011-03-25 20:22:16 PDT
Created attachment 87005 [details]
patch
Comment 2 Eric Seidel (no email) 2011-03-27 01:08:19 PDT
Comment on attachment 87005 [details]
patch

What about GKURL?
Comment 3 Adam Barth 2011-03-27 13:11:48 PDT
Comment on attachment 87005 [details]
patch

I don't think this change is correct.  The requirement you cite is w.r.t. the WebSockets protocol, but you've changed how ws and wss behave for all protocols.  Instead, you should make the change in the implementation of the WebSocket protocol.
Comment 4 Joe Mason 2011-03-28 12:35:55 PDT
Created attachment 87180 [details]
updated patch

How's this?

Quite a few ways to implement Document::completeURLWithUTF8 - since from the comments completeURL seems to be performance critical, I chose to make both completeURL and completeURLWithUTF8 into inlines wrapper to a new func which takes a bool param to force UTF-8 or not.  The alternative is to not touch completeURL, and copy the code to a new function which differs only in how it uses m_decoder.
Comment 5 Adam Barth 2011-03-28 12:42:30 PDT
Comment on attachment 87180 [details]
updated patch

Thanks!  Looks good, but I'd like to look at it more carefully before marking r+.  Other reviewers should feel free to finish the review if they like,
Comment 6 Adam Barth 2011-04-26 18:36:47 PDT
Comment on attachment 87180 [details]
updated patch

Looks great!
Comment 7 WebKit Commit Bot 2011-04-26 21:07:48 PDT
Comment on attachment 87180 [details]
updated patch

Rejecting attachment 87180 [details] from commit-queue.

Failed to run "['./Tools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '--bot-id=cr-jail-8', 'build'..." exit_code: 2

Last 500 characters of output:
 XCODE_APP_SUPPORT_DIR /Developer/Library/Xcode
    setenv XCODE_VERSION_ACTUAL 0322
    setenv XCODE_VERSION_MAJOR 0300
    setenv XCODE_VERSION_MINOR 0320
    setenv YACC /Developer/usr/bin/yacc
    /bin/sh -c /mnt/git/webkit-commit-queue/WebKitBuild/WebKit.build/Debug/WebKit.build/Script-5DE6D18C0FCF231B002DE28C.sh

** BUILD FAILED **


The following build commands failed:
WebKit:
	Ld /mnt/git/webkit-commit-queue/WebKitBuild/Debug/WebKit.framework/Versions/A/WebKit normal x86_64
(1 failure)


Full output: http://queues.webkit.org/results/8507818
Comment 8 Joe Mason 2011-04-28 15:58:29 PDT
Created attachment 91574 [details]
fix compile error

Looks like on some platforms completeURL is called across translation units, so it can't be inlined.  This is the same patch, but without inlining completeURL.
Comment 9 Alexey Proskuryakov 2011-04-28 16:14:23 PDT
Comment on attachment 91574 [details]
fix compile error

Why do we even call completeURL on these if they must be absolute in the first place? I think that you should simply change parseURL() call to KURL(KURL(), url, utf8Encoding());
Comment 10 Joe Mason 2011-04-29 07:09:37 PDT
Comment on attachment 91574 [details]
fix compile error

Because completeURL does more than check the encoding: it also checks the document's m_baseURL, and only falls back to using KURL() if it is not set.
Comment 11 Alexey Proskuryakov 2011-04-29 08:54:50 PDT
Comment on attachment 91574 [details]
fix compile error

As I already said, URLs passed to WebSocket constructor need to be absolute, so looking at document base URL is not necessary.
Comment 12 Joe Mason 2011-05-05 22:39:21 PDT
Created attachment 92543 [details]
simplified fix
Comment 13 Alexey Proskuryakov 2011-05-05 22:48:12 PDT
Comment on attachment 92543 [details]
simplified fix

+        // Use a String instead of a KURL for the url to ensure that WebSocket
+        // parses the url itself using WebSocket rules (must be absolute and
+        // UTF-8 encoded).

I don't see how the comment can be useful to someone reading this code. There is no option to use KURL, so explaining why String is better than KURL is pointless.

Otherwise, looks fine to me. Adam?
Comment 14 Adam Barth 2011-05-05 22:55:08 PDT
Comment on attachment 92543 [details]
simplified fix

View in context: https://bugs.webkit.org/attachment.cgi?id=92543&action=review

> Source/WebCore/websockets/WebSocket.h:66
> +        // Use a String instead of a KURL for the url to ensure that WebSocket
> +        // parses the url itself using WebSocket rules (must be absolute and
> +        // UTF-8 encoded).

Yeah, I agree.  Having a test for this behavior is more valuable than having a comment.
Comment 15 Joe Mason 2011-05-05 23:17:51 PDT
(In reply to comment #13)
> (From update of attachment 92543 [details])
> +        // Use a String instead of a KURL for the url to ensure that WebSocket
> +        // parses the url itself using WebSocket rules (must be absolute and
> +        // UTF-8 encoded).
> 
> I don't see how the comment can be useful to someone reading this code. There is no option to use KURL, so explaining why String is better than KURL is pointless.
> 
> Otherwise, looks fine to me. Adam?

My thought is that people looking at it will wonder, "If it's a url, why isn't a KURL?" so an explanation is valuable.
Comment 16 WebKit Review Bot 2011-05-05 23:23:49 PDT
Created attachment 92548 [details]
Archive of layout-test-results from ec2-cr-linux-02

The attached test failures were seen while running run-webkit-tests on the chromium-ews.
Bot: ec2-cr-linux-02  Port: Chromium  Platform: Linux-2.6.35-28-virtual-x86_64-with-Ubuntu-10.10-maverick
Comment 17 Adam Barth 2011-05-05 23:31:52 PDT
> My thought is that people looking at it will wonder, "If it's a url, why isn't a KURL?" so an explanation is valuable.

We do that a lot.  If they try to change it, the test will fail.
Comment 18 Joe Mason 2011-05-05 23:58:31 PDT
Created attachment 92550 [details]
removed useless comment
Comment 19 WebKit Commit Bot 2011-05-06 01:59:03 PDT
Comment on attachment 92550 [details]
removed useless comment

Clearing flags on attachment: 92550

Committed r85939: <http://trac.webkit.org/changeset/85939>
Comment 20 WebKit Commit Bot 2011-05-06 01:59:11 PDT
All reviewed patches have been landed.  Closing bug.