Bug 51253 - WebSockets: unbounded buffer growth when server sends bad data
Summary: WebSockets: unbounded buffer growth when server sends bad data
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore JavaScript (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Nobody
URL: http://bohuco.net/dev/websocket/
Keywords:
Depends on:
Blocks:
 
Reported: 2010-12-17 07:55 PST by Joe Mason
Modified: 2011-01-10 18:46 PST (History)
10 users (show)

See Also:


Attachments
patch to check beginning bytes of buffer (2.45 KB, patch)
2010-12-17 08:02 PST, Joe Mason
ap: review-
Details | Formatted Diff | Diff
patch to cap handshake length at 1024 bytes (14.24 KB, patch)
2011-01-09 23:34 PST, Joe Mason
no flags Details | Formatted Diff | Diff
same patch, with ChangeLogs. Oops. (16.73 KB, patch)
2011-01-09 23:36 PST, Joe Mason
no flags Details | Formatted Diff | Diff
fix chromium build failure (16.72 KB, patch)
2011-01-09 23:54 PST, Joe Mason
ap: review+
ap: commit-queue-
Details | Formatted Diff | Diff
addressed review comments (14.87 KB, patch)
2011-01-10 10:30 PST, Joe Mason
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Joe Mason 2010-12-17 07:55:10 PST
If a server sends data with an embedded NULL before the handshake, the socket will stay in the "connecting" state and buffer all data sent to it.  The reason is this clause in WebSocketHandshake::readServerHandshake:

    if (!strnstr(header, "\r\n\r\n", len)) {
        // Just hasn't been received fully yet.
        m_mode = Incomplete;
        return -1;
    }

strnstr stops searching at the first NULL, so if one is found before \r\n\r\n then the client will continue to read more data and then search the same segment of the buffer for \r\n\r\n.

This can be seen on http://bohuco.net/dev/websocket/, whose server has a race condition: every time data is received from one connected socket it is echoed to all connected sockets without checking their state, even those that have not completed a handshake.  So it's possible for the first bytes received by a client to be a WebSocket frame, which begins with NULL.  (To test this, open a window to http://bohuco.net/dev/websocket/ and move the mouse continuously in it while connecting to the same site in another window.  It will trigger the race condition at least half the time.)

The correct thing to do is disconnect immediately when the invalid frame is received from the server instead of the handshake.
Comment 1 Joe Mason 2010-12-17 08:02:43 PST
Created attachment 76879 [details]
patch to check beginning bytes of buffer

My solution is to check the exact beginning bytes of the buffer for "HTTP/" before calling strnstr.  This makes sure that if a websocket frame without a handshake is sent it will disconnect immediately, and is safer before the existing handshake parser could accept other line-oriented protocols with different text before the first space.

It just occurred to me that this fix still us vulnerable to a server that sends "HTTP/\0".

Unfortunately, I haven't had time to create a test case, but I wanted to file this before going on vacation tomorrow.
Comment 2 WebKit Review Bot 2010-12-17 08:09:38 PST
Attachment 76879 [details] did not build on chromium:
Build output: http://queues.webkit.org/results/7205018
Comment 3 Alexey Proskuryakov 2010-12-17 10:44:01 PST
Comment on attachment 76879 [details]
patch to check beginning bytes of buffer

View in context: https://bugs.webkit.org/attachment.cgi?id=76879&action=review

This is a violation of WebSockets-76 spec - the spec doesn't require the response to start with "HTTP/", see <http://tools.ietf.org/html/draft-hixie-thewebsocketprotocol-76#section-4.1> step 28.

Also, this doesn't look like a complete fix.  What if the response line starts with "HTTP/", but doesn't have any newlines? Current code would just read INT_MAX bytes, which is effectively unbounded.

It seems that we should just put a more practical limit on status line length, and tell the hybi working group that a limit should be added to the spec. In fact, it will be possible to add a regression test then.

> WebCore/ChangeLog:14
> +        No new tests. (OOPS!)

A commit hook will prevent landing with OOPS - please replace it with an explanation of why there is no test.
Comment 4 Joe Mason 2011-01-05 11:08:33 PST
(In reply to comment #3)
> (From update of attachment 76879 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=76879&action=review
> 
> This is a violation of WebSockets-76 spec - the spec doesn't require the response to start with "HTTP/", see <http://tools.ietf.org/html/draft-hixie-thewebsocketprotocol-76#section-4.1> step 28.

I guess that's what making reasonable assumptions gets you...

What's a good max length?  The server response doesn't contain a URL so we don't have to worry about long addresses.  My first thought is to just use 1024 chars.
Comment 5 Alexey Proskuryakov 2011-01-05 11:23:04 PST
Please consider sending an e-mail to hybi@ietf.org working group about maximum length first. If there is no response, I'd go with 1024 bytes indeed.
Comment 6 Ian Fette 2011-01-05 18:19:35 PST
It seems like we have rough consensus on the working group for a new handshake that is GET+Upgrade with the payload masking from client to server. As such, I'm working on a -04 draft this week based on that method. The -04 text I'm working on does not have this issue. 

In my mind -76 is long dead, I would much rather see us put the nail in it by shipping -04 than continuing to fiddle with -76. In the meantime however, if this is important to avoid any sort of exploitable bug I would say feel free to do what you want, -76 should hopefully live no longer than another month and so I'm not really worried about being strictly compliant to a version of the spec we're trying to kill.

My $0.02.
Comment 7 Ian Fette 2011-01-05 18:22:16 PST
Also, as a bit more context, I'm trying to write -04 in the context of RFC2616. That is, rather than saying "Read X bytes, interpret it in such a fashion" it's more "There shall be a status-line as per RFC2616, there shall be a header called this, etc." Given that -04 is meant to be HTTP complaint, I'm hoping we can re-use not just definitions from HTTP but also hopefully some of the parsing code for the handshake.
Comment 8 Joe Mason 2011-01-09 23:34:13 PST
Created attachment 78375 [details]
patch to cap handshake length at 1024 bytes
Comment 9 Joe Mason 2011-01-09 23:36:39 PST
Created attachment 78376 [details]
same patch, with ChangeLogs. Oops.
Comment 10 WebKit Review Bot 2011-01-09 23:40:11 PST
Attachment 78375 [details] did not build on chromium:
Build output: http://queues.webkit.org/results/7477004
Comment 11 WebKit Review Bot 2011-01-09 23:42:15 PST
Attachment 78376 [details] did not build on chromium:
Build output: http://queues.webkit.org/results/7371101
Comment 12 Joe Mason 2011-01-09 23:54:26 PST
Created attachment 78377 [details]
fix chromium build failure
Comment 13 Alexey Proskuryakov 2011-01-10 01:20:29 PST
Comment on attachment 78377 [details]
fix chromium build failure

View in context: https://bugs.webkit.org/attachment.cgi?id=78377&action=review

I'm going to say r+, but please address the comments before landing.

If you'd like to ask for another quick review pass after addressing the comments, please feel free to do so.

> Source/WebCore/websockets/WebSocketHandshake.cpp:3
> + * Copyright (C) Research In Motion Limited 2011. All rights reserved.

Should it be "Copyright (C) 2011"?

> Source/WebCore/websockets/WebSocketHandshake.cpp:433
> +    static const int maxLength = 1024;

We don't abbreviate when we can avoid it. I suggest something like "maximumStatusLineLength".

> Source/WebCore/websockets/WebSocketHandshake.cpp:454
> +            // The caller will search for the ending newline with strnstr,
> +            // which does not scan past nulls.  So a null in the stream will
> +            // cause it to loop forever, reading more data and then repeating
> +            // the search only on the data before the null, unless we abort
> +            // immediately.

This explains why the caller will do a wrong thing, but doesn't explain why we shouldn't just fix the caller. Do we really want to detect an error here?

Assuming that I understand the situation correctly, this may be more clear: "The caller isn't prepared to deal with null bytes in status line. WebSockets specification doesn't prohibit this, but HTTP does, so we'll just treat this as an error".

> Source/WebCore/websockets/WebSocketHandshake.cpp:456
> +            return p + 1 - header;

Shouldn't we set m_mode = Failed?

> Source/WebCore/websockets/WebSocketHandshake.cpp:466
> -        return INT_MAX;
> +        return maxLength;

Ditto, shouldn't we set m_mode = Failed?

> LayoutTests/http/tests/websocket/tests/handshake-fail-by-maxlength-expected.txt:1
> +CONSOLE MESSAGE: line 0: Status line is too long: ................................................................................................................................â¦

Hmm, but this isn't 1024 bytes? Why?

Please consider not dumping the bad string - if it's over 1024 bytes, it's likely garbage, and can obstruct reading other text in console. The developer can using tcpdump or Wireshark to debug the problem.

> LayoutTests/http/tests/websocket/tests/handshake-fail-by-maxlength.html:13
> +<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
> +<html>
> +<head>
> +<link rel="stylesheet" href="../../../js-test-resources/js-test-style.css">
> +<script src="../../../js-test-resources/js-test-pre.js"></script>
> +<script src="../../../js-test-resources/js-test-post-function.js"></script>
> +</head>
> +<body>
> +<div id="description"></div>
> +<div id="console"></div>
> +<script src="script-tests/handshake-fail-by-maxlength.js"></script>
> +</body>
> +</html>

Please don't split tests into .html and .js parts. This is a anti-pattern that can be seen in many existing tests, but it should be avoided for new tests.

> LayoutTests/http/tests/websocket/tests/handshake-fail-by-maxlength_wsh.py:1
> +# Copyright (C) Research In Motion Limited 2011. All rights reserved.

Copyright (C) 2011?

> LayoutTests/http/tests/websocket/tests/handshake-fail-by-maxlength_wsh.py:15
> +#     * Neither the name of Google Inc. nor the names of its
> +# contributors may be used to endorse or promote products derived from
> +# this software without specific prior written permission.

<http://webkit.org/coding/bsd-license.html> is two clause. Or is this copied from a Google originated file, so you decided to keep their license?

I don't really understand why this needs a license block, and other new files don't.
Comment 14 Joe Mason 2011-01-10 08:27:56 PST
(In reply to comment #13)

> > Source/WebCore/websockets/WebSocketHandshake.cpp:3
> > + * Copyright (C) Research In Motion Limited 2011. All rights reserved.
> 
> Should it be "Copyright (C) 2011"?

Not according to our legal department.  I don't know why they want us to be special snowflakes.

> > Source/WebCore/websockets/WebSocketHandshake.cpp:433
> > +    static const int maxLength = 1024;
> 
> We don't abbreviate when we can avoid it. I suggest something like "maximumStatusLineLength".

Whoops, will fix.

> > Source/WebCore/websockets/WebSocketHandshake.cpp:454
> > +            // The caller will search for the ending newline with strnstr,
> > +            // which does not scan past nulls.  So a null in the stream will
> > +            // cause it to loop forever, reading more data and then repeating
> > +            // the search only on the data before the null, unless we abort
> > +            // immediately.
> 
> This explains why the caller will do a wrong thing, but doesn't explain why we shouldn't just fix the caller. Do we really want to detect an error here?

Fixing the caller would involve hand rolling string searching code or drastically changing the algorithm, which I think is overkill for this bug.  This code will be obsolete once we implement the new websocket spec anyway.

> Assuming that I understand the situation correctly, this may be more clear: "The caller isn't prepared to deal with null bytes in status line. WebSockets specification doesn't prohibit this, but HTTP does, so we'll just treat this as an error".
> 
> > Source/WebCore/websockets/WebSocketHandshake.cpp:456
> > +            return p + 1 - header;
> 
> Shouldn't we set m_mode = Failed?

The caller will do this, since the statusCode is -1.

> > LayoutTests/http/tests/websocket/tests/handshake-fail-by-maxlength-expected.txt:1
> > +CONSOLE MESSAGE: line 0: Status line is too long: ................................................................................................................................â¦
> 
> Hmm, but this isn't 1024 bytes? Why?

Console messages are capped at roughly 128 bytes (see trimConsoleMessage).

> Please consider not dumping the bad string - if it's over 1024 bytes, it's likely garbage, and can obstruct reading other text in console. The developer can using tcpdump or Wireshark to debug the problem.

Ok, was just following the existing usage.  I'll change it.

> > LayoutTests/http/tests/websocket/tests/handshake-fail-by-maxlength.html:13
> > +<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
> > +<html>
> > +<head>
> > +<link rel="stylesheet" href="../../../js-test-resources/js-test-style.css">
> > +<script src="../../../js-test-resources/js-test-pre.js"></script>
> > +<script src="../../../js-test-resources/js-test-post-function.js"></script>
> > +</head>
> > +<body>
> > +<div id="description"></div>
> > +<div id="console"></div>
> > +<script src="script-tests/handshake-fail-by-maxlength.js"></script>
> > +</body>
> > +</html>
> 
> Please don't split tests into .html and .js parts. This is a anti-pattern that can be seen in many existing tests, but it should be avoided for new tests.

Ditto, I was copying an existing handshake-fail test.  I'll fix it.

> > LayoutTests/http/tests/websocket/tests/handshake-fail-by-maxlength_wsh.py:15
> > +#     * Neither the name of Google Inc. nor the names of its
> > +# contributors may be used to endorse or promote products derived from
> > +# this software without specific prior written permission.
> 
> <http://webkit.org/coding/bsd-license.html> is two clause. Or is this copied from a Google originated file, so you decided to keep their license?

Yeah, cut and paste error.  I'll fix it.

> I don't really understand why this needs a license block, and other new files don't.

Again, was in the original file I used as a model.
Comment 15 Alexey Proskuryakov 2011-01-10 08:52:37 PST
> > Shouldn't we set m_mode = Failed?
> The caller will do this, since the statusCode is -1.

Indeed. It seems a bit opaque though - I think that code that detects and logs the error should also set the error flag.
Comment 16 Joe Mason 2011-01-10 10:30:16 PST
Created attachment 78408 [details]
addressed review comments
Comment 17 Joe Mason 2011-01-10 10:33:28 PST
> > I don't really understand why this needs a license block, and other new files don't.
>
> Again, was in the original file I used as a model

Oh, yeah, the other reason is that the .html and .js files retain a lot of existing code copied from existing tests, so I can't just slap a new license on them claiming copyright.  The .py files are all-original, so I added a license.
Comment 18 Alexey Proskuryakov 2011-01-10 10:50:44 PST
Comment on attachment 78408 [details]
addressed review comments

I still don't like the subtlety or error detection code, but OK.
Comment 19 WebKit Commit Bot 2011-01-10 12:04:51 PST
Comment on attachment 78408 [details]
addressed review comments

Rejecting attachment 78408 [details] from commit-queue.

Failed to run "['./Tools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '--bot-id=eseidel-sf-cq', 'land-attachment', '--force-clean', '--ignore-builders', '--non-interactive', '--parent-command=commit-queue', 78408]" exit_code: 2
Last 500 characters of output:
2f8a4035bfde05e82f551a1eeb73a3c944d50fda
r75397 = 9b49509651112f4326642de3ad5ef7a0772f56fb
r75398 = ae7e202e6792383d22079e2f8013d5d8a0ebc1ac
r75399 = 5a6966f33c249967eaee5fcf1a15ec1a44a998ac
r75400 = cd356303c6d0a4b682875e48d855601f9a991203
r75401 = 73b728835f92bf8d19cc0e1009d66de02d22b689
Done rebuilding .git/svn/refs/remotes/origin/master/.rev_map.268f45cc-cd09-0410-ab3c-d52691b4dbfc
First, rewinding head to replay your work on top of it...
Fast-forwarded master to refs/remotes/origin/master.

Full output: http://queues.webkit.org/results/7392097
Comment 20 Adam Barth 2011-01-10 15:53:28 PST
Comment on attachment 78408 [details]
addressed review comments

That's a very strange error.  Lets try again.
Comment 21 WebKit Commit Bot 2011-01-10 18:24:21 PST
The commit-queue encountered the following flaky tests while processing attachment 78408 [details]:

fast/workers/storage/use-same-database-in-page-and-workers.html bug 50995 (author: dumi@chromium.org)
The commit-queue is continuing to process your patch.
Comment 22 WebKit Commit Bot 2011-01-10 18:26:23 PST
Comment on attachment 78408 [details]
addressed review comments

Clearing flags on attachment: 78408

Committed r75461: <http://trac.webkit.org/changeset/75461>
Comment 23 WebKit Commit Bot 2011-01-10 18:26:32 PST
All reviewed patches have been landed.  Closing bug.
Comment 24 WebKit Review Bot 2011-01-10 18:46:13 PST
http://trac.webkit.org/changeset/75461 might have broken SnowLeopard Intel Release (Build)