Bug 213143 - Add artificial delay to WebSocket connections to mitigate port scanning attacks
Summary: Add artificial delay to WebSocket connections to mitigate port scanning attacks
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKit Misc. (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Kate Cheney
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2020-06-12 13:18 PDT by Kate Cheney
Modified: 2020-07-13 22:39 PDT (History)
14 users (show)

See Also:


Attachments
Patch (8.08 KB, patch)
2020-06-12 13:38 PDT, Kate Cheney
no flags Details | Formatted Diff | Diff
Patch (7.86 KB, patch)
2020-06-12 15:47 PDT, Kate Cheney
no flags Details | Formatted Diff | Diff
Patch (7.59 KB, patch)
2020-06-12 16:37 PDT, Kate Cheney
no flags Details | Formatted Diff | Diff
Patch (8.22 KB, patch)
2020-06-15 13:05 PDT, Kate Cheney
no flags Details | Formatted Diff | Diff
Patch (19.78 KB, patch)
2020-07-07 14:13 PDT, Kate Cheney
no flags Details | Formatted Diff | Diff
Patch (19.78 KB, patch)
2020-07-07 14:16 PDT, Kate Cheney
no flags Details | Formatted Diff | Diff
Patch (10.08 KB, patch)
2020-07-08 17:42 PDT, Kate Cheney
no flags Details | Formatted Diff | Diff
Patch (11.90 KB, patch)
2020-07-09 11:00 PDT, Kate Cheney
no flags Details | Formatted Diff | Diff
Patch (1.90 KB, patch)
2020-07-13 12:50 PDT, Kate Cheney
no flags Details | Formatted Diff | Diff
Patch (1.96 KB, patch)
2020-07-13 12:58 PDT, Kate Cheney
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Kate Cheney 2020-06-12 13:18:23 PDT
WebSocket connections can potentially be used to determine if specific ports are open using timing attacks. We should make this more difficult by adding some sort of delay or noise to the reporting of WebSocket state.
Comment 1 Kate Cheney 2020-06-12 13:18:39 PDT
rdar://problem/64308927
Comment 2 Kate Cheney 2020-06-12 13:38:46 PDT
Created attachment 401773 [details]
Patch
Comment 3 Chris Dumez 2020-06-12 13:44:26 PDT
Comment on attachment 401773 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=401773&action=review

> Source/WebCore/Modules/websockets/WebSocket.cpp:613
>      dispatchOrQueueErrorEvent();

You're still firing the JS event right away here so I don't think you are achieving anything. Also, you are violating the HTML specification now which says to *first* set readyState to CLOSED and *then* fire the error event:
https://html.spec.whatwg.org/multipage/web-sockets.html#feedback-from-the-protocol:concept-websocket-closed
Comment 4 Kate Cheney 2020-06-12 13:52:09 PDT
Comment on attachment 401773 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=401773&action=review

>> Source/WebCore/Modules/websockets/WebSocket.cpp:613
>>      dispatchOrQueueErrorEvent();
> 
> You're still firing the JS event right away here so I don't think you are achieving anything. Also, you are violating the HTML specification now which says to *first* set readyState to CLOSED and *then* fire the error event:
> https://html.spec.whatwg.org/multipage/web-sockets.html#feedback-from-the-protocol:concept-websocket-closed

The JS attack checks ws.readyState to determine the interval, which I think calls into WebSocket::readyState(). So by delaying setting m_state, it should be applicable to this specific method. But maybe I'm missing something?

Good point about the spec. I can potentially add the error event to the timer execution to happen after changing the state.
Comment 5 Chris Dumez 2020-06-12 14:03:25 PDT
(In reply to katherine_cheney from comment #4)
> Comment on attachment 401773 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=401773&action=review
> 
> >> Source/WebCore/Modules/websockets/WebSocket.cpp:613
> >>      dispatchOrQueueErrorEvent();
> > 
> > You're still firing the JS event right away here so I don't think you are achieving anything. Also, you are violating the HTML specification now which says to *first* set readyState to CLOSED and *then* fire the error event:
> > https://html.spec.whatwg.org/multipage/web-sockets.html#feedback-from-the-protocol:concept-websocket-closed
> 
> The JS attack checks ws.readyState to determine the interval, which I think
> calls into WebSocket::readyState(). So by delaying setting m_state, it
> should be applicable to this specific method. But maybe I'm missing
> something?

My point is that you can assume that if you got the error event, the readyState is closed, since that's the spec. Also, you can time how long it takes to get the error event...

> 
> Good point about the spec. I can potentially add the error event to the
> timer execution to happen after changing the state.

A SuspendableTimer then otherwise you are going to fire events while in the back/forward cache potentially.
Comment 6 Kate Cheney 2020-06-12 14:30:10 PDT
(In reply to Chris Dumez from comment #5)
> (In reply to katherine_cheney from comment #4)
> > Comment on attachment 401773 [details]
> > Patch
> > 
> > View in context:
> > https://bugs.webkit.org/attachment.cgi?id=401773&action=review
> > 
> > >> Source/WebCore/Modules/websockets/WebSocket.cpp:613
> > >>      dispatchOrQueueErrorEvent();
> > > 
> > > You're still firing the JS event right away here so I don't think you are achieving anything. Also, you are violating the HTML specification now which says to *first* set readyState to CLOSED and *then* fire the error event:
> > > https://html.spec.whatwg.org/multipage/web-sockets.html#feedback-from-the-protocol:concept-websocket-closed
> > 
> > The JS attack checks ws.readyState to determine the interval, which I think
> > calls into WebSocket::readyState(). So by delaying setting m_state, it
> > should be applicable to this specific method. But maybe I'm missing
> > something?
> 
> My point is that you can assume that if you got the error event, the
> readyState is closed, since that's the spec. Also, you can time how long it
> takes to get the error event...
> 

I see, adding the error event to the timer should fix this.

> > 
> > Good point about the spec. I can potentially add the error event to the
> > timer execution to happen after changing the state.
> 
> A SuspendableTimer then otherwise you are going to fire events while in the
> back/forward cache potentially.

Noted, good call!
Comment 7 Kate Cheney 2020-06-12 15:47:10 PDT
Created attachment 401796 [details]
Patch
Comment 8 Chris Dumez 2020-06-12 16:00:36 PDT
Comment on attachment 401796 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=401796&action=review

> Source/WebCore/Modules/websockets/WebSocket.cpp:612
> +    auto closeDelay = Seconds((10.0 + static_cast<double>(cryptographicallyRandomNumber() % 50)) / 1000);

I think this deserves a comment to explain why we're doing this.

> Source/WebCore/Modules/websockets/WebSocket.h:150
> +    SuspendableTimer m_closingTimer;

Looks like I was wrong about needing a SuspendableTimer, sorry. A simple timer should suffice because WebSocket::dispatchOrQueueEvent() takes care of queueing events while suspended.

> LayoutTests/http/tests/websocket/tests/hybi/port-scan.html:1
> +<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">

<!DOCTYPE html>

> LayoutTests/http/tests/websocket/tests/hybi/port-scan.html:4
> +<script src="../../../../js-test-resources/js-test-pre.js"></script>

<script src="/js-test-resources/js-test.js"></script>

> LayoutTests/http/tests/websocket/tests/hybi/port-scan.html:7
> +<div id="description"></div>

Not needed.

> LayoutTests/http/tests/websocket/tests/hybi/port-scan.html:8
> +<div id="console"></div>

Not needed.

> LayoutTests/http/tests/websocket/tests/hybi/port-scan.html:9
> +<script type="text/javascript">

type="text/javascript" is not needed.

> LayoutTests/http/tests/websocket/tests/hybi/port-scan.html:10
> +

Would be nice if this test had a call to description("blablabla") to set a description.

> LayoutTests/http/tests/websocket/tests/hybi/port-scan.html:11
> +window.jsTestIsAsync = true;

window. is not needed.

> LayoutTests/http/tests/websocket/tests/hybi/port-scan.html:13
> +function endTest()

What's the function of this function? Why not call finishJSTest directly?

> LayoutTests/http/tests/websocket/tests/hybi/port-scan.html:89
> +<script src="../../../../js-test-resources/js-test-post.js"></script>

We don't use pre / post version in new tests anymore.
Comment 9 Kate Cheney 2020-06-12 16:37:48 PDT
Created attachment 401802 [details]
Patch
Comment 10 Chris Dumez 2020-06-12 16:59:17 PDT
Comment on attachment 401802 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=401802&action=review

Patch seems correct. It would be good if a security person could take a look too before landing to make sure it properly mitigates the attacks.

> Source/WebCore/Modules/websockets/WebSocket.cpp:612
> +    auto closeDelay = Seconds((10.0 + static_cast<double>(cryptographicallyRandomNumber() % 50)) / 1000);

Maybe use Seconds::fromMilliseconds() to avoid / 1000.

> LayoutTests/http/tests/websocket/tests/hybi/port-scan.html:1
> +<!DOCTYPE HTML>

We usually don't uppercase it but it works.

> LayoutTests/http/tests/websocket/tests/hybi/port-scan.html:4
> +<script src="/js-test-resources/js-test.js"></script></head>

I think the </head> should be on the next line for consistency.

> LayoutTests/http/tests/websocket/tests/hybi/port-scan.html:8
> +jsTestIsAsync = true;

This test is still missing a description().

> LayoutTests/http/tests/websocket/tests/hybi/port-scan.html:34
> +        if (ws.readyState === 0) // CONNECTING

WebSocket.CONNECTING then you don't need a comment.

> LayoutTests/http/tests/websocket/tests/hybi/port-scan.html:38
> +        if (ws.readyState === 1) // OPEN

ditto.

> LayoutTests/http/tests/websocket/tests/hybi/port-scan.html:42
> +        if (ws.readyState === 2) // CLOSING

ditto.

> LayoutTests/http/tests/websocket/tests/hybi/port-scan.html:46
> +        if (ws.readyState === 3) // CLOSE

ditto.
Comment 11 Kate Cheney 2020-06-12 17:28:37 PDT
(In reply to Chris Dumez from comment #10)
> Comment on attachment 401802 [details]
> Patch

Thanks for the comments!

> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=401802&action=review
> 
> Patch seems correct. It would be good if a security person could take a look
> too before landing to make sure it properly mitigates the attacks.
> 

Sounds good, I'll ask someone to look. Looks like EWS might be unhappy too. 

> > Source/WebCore/Modules/websockets/WebSocket.cpp:612
> > +    auto closeDelay = Seconds((10.0 + static_cast<double>(cryptographicallyRandomNumber() % 50)) / 1000);
> 
> Maybe use Seconds::fromMilliseconds() to avoid / 1000.
> 
> > LayoutTests/http/tests/websocket/tests/hybi/port-scan.html:1
> > +<!DOCTYPE HTML>
> 
> We usually don't uppercase it but it works.
> 
> > LayoutTests/http/tests/websocket/tests/hybi/port-scan.html:4
> > +<script src="/js-test-resources/js-test.js"></script></head>
> 
> I think the </head> should be on the next line for consistency.
> 
> > LayoutTests/http/tests/websocket/tests/hybi/port-scan.html:8
> > +jsTestIsAsync = true;
> 
> This test is still missing a description().
> 
> > LayoutTests/http/tests/websocket/tests/hybi/port-scan.html:34
> > +        if (ws.readyState === 0) // CONNECTING
> 
> WebSocket.CONNECTING then you don't need a comment.
> 
> > LayoutTests/http/tests/websocket/tests/hybi/port-scan.html:38
> > +        if (ws.readyState === 1) // OPEN
> 
> ditto.
> 
> > LayoutTests/http/tests/websocket/tests/hybi/port-scan.html:42
> > +        if (ws.readyState === 2) // CLOSING
> 
> ditto.
> 
> > LayoutTests/http/tests/websocket/tests/hybi/port-scan.html:46
> > +        if (ws.readyState === 3) // CLOSE
> 
> ditto.

I will fix these.
Comment 12 Chris Dumez 2020-06-12 17:41:09 PDT
Comment on attachment 401802 [details]
Patch

Reverting r+ due to web-platform-tests failures
Comment 13 Alex Christensen 2020-06-15 08:59:22 PDT
Comment on attachment 401802 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=401802&action=review

> Source/WebCore/ChangeLog:11
> +        10 and 60 ms before reporting the closed state of a WebSocket

Where did these two numbers come from?
Comment 14 Kate Cheney 2020-06-15 09:11:33 PDT
(In reply to Alex Christensen from comment #13)
> Comment on attachment 401802 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=401802&action=review
> 
> > Source/WebCore/ChangeLog:11
> > +        10 and 60 ms before reporting the closed state of a WebSocket
> 
> Where did these two numbers come from?

Guessing and logging in the test case, mostly. They can be changed if someone has an argument for a better number.
Comment 15 youenn fablet 2020-06-15 11:22:44 PDT
Comment on attachment 401802 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=401802&action=review

>>> Source/WebCore/Modules/websockets/WebSocket.cpp:612
>>> +    auto closeDelay = Seconds((10.0 + static_cast<double>(cryptographicallyRandomNumber() % 50)) / 1000);
>> 
>> Maybe use Seconds::fromMilliseconds() to avoid / 1000.
> 
> I will fix these.

I am not sure it is a good idea to delay m_state = CLOSED.
It might be better to just delay the event dispatching.
Comment 16 John Wilander 2020-06-15 11:25:56 PDT
(In reply to youenn fablet from comment #15)
> Comment on attachment 401802 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=401802&action=review
> 
> >>> Source/WebCore/Modules/websockets/WebSocket.cpp:612
> >>> +    auto closeDelay = Seconds((10.0 + static_cast<double>(cryptographicallyRandomNumber() % 50)) / 1000);
> >> 
> >> Maybe use Seconds::fromMilliseconds() to avoid / 1000.
> > 
> > I will fix these.
> 
> I am not sure it is a good idea to delay m_state = CLOSED.

Could you share what makes you think it's not a good idea? Are you thinking hangs?

> It might be better to just delay the event dispatching.

I worry that some internal logic that hangs off of m_state == CLOSED will expose the state prematurely in a side channel.
Comment 17 youenn fablet 2020-06-15 11:28:28 PDT
Ah, not possible m_state is actually exposed as readyState.
Comment 18 Kate Cheney 2020-06-15 11:35:03 PDT
I am able to fix all but 2 failing WPT tests. 

imported/w3c/web-platform-tests/websockets/interfaces/WebSocket/readyState/003.html calls ws.close() then deletes ws.readyState and checks the value. This is not compatible with delaying updating the readyState because there is no onclose() function.

and

imported/w3c/web-platform-tests/websockets/interfaces/WebSocket/close/close-nested.html also expects the readyState to be closed immediately after calling ws.close().

These may need other workarounds if we want to go forward with this method.
Comment 19 youenn fablet 2020-06-15 11:44:30 PDT
We might have a SocketStreamHandle which is closed but not its channel, which is probably a weird state.

We could emulate delaying the IPC message going from network process to web process at WebProcess reception time.
Comment 20 Kate Cheney 2020-06-15 13:05:47 PDT
Created attachment 401927 [details]
Patch
Comment 21 Kate Cheney 2020-06-15 13:10:17 PDT
(In reply to youenn fablet from comment #19)
> We might have a SocketStreamHandle which is closed but not its channel,
> which is probably a weird state.
> 

I moved the delay to be set in WebSocketChannel::fail(), which might solve this problem. It calls disconnect() on the SocketStreamHandle with the same delay that it changes the state with. Let me know what you think.


I am still not sure what to do to with the two wpt tests which rely on checking the closed state immediately after calling ws.close(). It is my understanding that we don't change these tests because they are imported.
Comment 22 Chris Dumez 2020-06-15 13:13:21 PDT
(In reply to katherine_cheney from comment #21)
> (In reply to youenn fablet from comment #19)
> > We might have a SocketStreamHandle which is closed but not its channel,
> > which is probably a weird state.
> > 
> 
> I moved the delay to be set in WebSocketChannel::fail(), which might solve
> this problem. It calls disconnect() on the SocketStreamHandle with the same
> delay that it changes the state with. Let me know what you think.
> 
> 
> I am still not sure what to do to with the two wpt tests which rely on
> checking the closed state immediately after calling ws.close(). It is my
> understanding that we don't change these tests because they are imported.

Those tests are correct and match the spec:
https://html.spec.whatwg.org/multipage/web-sockets.html#dom-websocket-close

There is no reason/need for your "fix" to impact the case where the JS is calling "close()" on the WebSocket as far as I can tell. So let's not.
Comment 23 Kate Cheney 2020-06-15 13:16:35 PDT
(In reply to Chris Dumez from comment #22)
> (In reply to katherine_cheney from comment #21)
> > (In reply to youenn fablet from comment #19)
> > > We might have a SocketStreamHandle which is closed but not its channel,
> > > which is probably a weird state.
> > > 
> > 
> > I moved the delay to be set in WebSocketChannel::fail(), which might solve
> > this problem. It calls disconnect() on the SocketStreamHandle with the same
> > delay that it changes the state with. Let me know what you think.
> > 
> > 
> > I am still not sure what to do to with the two wpt tests which rely on
> > checking the closed state immediately after calling ws.close(). It is my
> > understanding that we don't change these tests because they are imported.
> 
> Those tests are correct and match the spec:
> https://html.spec.whatwg.org/multipage/web-sockets.html#dom-websocket-close
> 
> There is no reason/need for your "fix" to impact the case where the JS is
> calling "close()" on the WebSocket as far as I can tell. So let's not.

Currently our expectations for these tests both have FAIL messages in them, so they already don't seem to work as expected with the spec.
Comment 24 Kate Cheney 2020-06-15 13:21:51 PDT
(In reply to katherine_cheney from comment #23)
> (In reply to Chris Dumez from comment #22)
> > (In reply to katherine_cheney from comment #21)
> > > (In reply to youenn fablet from comment #19)
> > > > We might have a SocketStreamHandle which is closed but not its channel,
> > > > which is probably a weird state.
> > > > 
> > > 
> > > I moved the delay to be set in WebSocketChannel::fail(), which might solve
> > > this problem. It calls disconnect() on the SocketStreamHandle with the same
> > > delay that it changes the state with. Let me know what you think.
> > > 
> > > 
> > > I am still not sure what to do to with the two wpt tests which rely on
> > > checking the closed state immediately after calling ws.close(). It is my
> > > understanding that we don't change these tests because they are imported.
> > 
> > Those tests are correct and match the spec:
> > https://html.spec.whatwg.org/multipage/web-sockets.html#dom-websocket-close
> > 
> > There is no reason/need for your "fix" to impact the case where the JS is
> > calling "close()" on the WebSocket as far as I can tell. So let's not.
> 
> Currently our expectations for these tests both have FAIL messages in them,
> so they already don't seem to work as expected with the spec.

With the delay, the expectations return PASS messages. That being said, the intention of this patch was obviously not to fix these tests, and adding a random delay to make a test work seems very wrong. I am trying to understand why we have the expectations for the tests set this way in the first place.
Comment 25 Chris Dumez 2020-06-15 14:31:48 PDT
(In reply to katherine_cheney from comment #24)
> (In reply to katherine_cheney from comment #23)
> > (In reply to Chris Dumez from comment #22)
> > > (In reply to katherine_cheney from comment #21)
> > > > (In reply to youenn fablet from comment #19)
> > > > > We might have a SocketStreamHandle which is closed but not its channel,
> > > > > which is probably a weird state.
> > > > > 
> > > > 
> > > > I moved the delay to be set in WebSocketChannel::fail(), which might solve
> > > > this problem. It calls disconnect() on the SocketStreamHandle with the same
> > > > delay that it changes the state with. Let me know what you think.
> > > > 
> > > > 
> > > > I am still not sure what to do to with the two wpt tests which rely on
> > > > checking the closed state immediately after calling ws.close(). It is my
> > > > understanding that we don't change these tests because they are imported.
> > > 
> > > Those tests are correct and match the spec:
> > > https://html.spec.whatwg.org/multipage/web-sockets.html#dom-websocket-close
> > > 
> > > There is no reason/need for your "fix" to impact the case where the JS is
> > > calling "close()" on the WebSocket as far as I can tell. So let's not.
> > 
> > Currently our expectations for these tests both have FAIL messages in them,
> > so they already don't seem to work as expected with the spec.
> 
> With the delay, the expectations return PASS messages. That being said, the
> intention of this patch was obviously not to fix these tests, and adding a
> random delay to make a test work seems very wrong. I am trying to understand
> why we have the expectations for the tests set this way in the first place.

When we import tests from W3C, we commit them with whatever result they produce. This means that a lot of their expected results contain fail lines. This helps us notice when somebody fixes a bug because these FAIL lines turn into PASS lines.
Comment 26 youenn fablet 2020-06-15 14:58:49 PDT
Comment on attachment 401927 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=401927&action=review

> Source/WebCore/Modules/websockets/WebSocketChannel.cpp:242
> +    m_failCloseTimer.startOneShot(closeDelay);

Isn't it WebSocketChannel::didFailSocketStream that gets notification that a port is blocked?

I am also wondering whether random timer noise cannot be filtered out by doing repetitive WebSocket loads and averaging the results.
Maybe a fixed value of no less than XX (100 ms maybe) between the time WebSocketChannel::connect is executed and the time the error is surfacing to JS would be good?

> LayoutTests/http/tests/websocket/tests/hybi/port-scan.html:20
> +    setTimeout(scan_port(1234), 10);

We potentially run tests for 8080 and 1234 in parallel.
It is usually best to run them sequentially for stability and debuggability.
If scan_port returns a promise, then we could write: await scan_port(8080); await scan_port(1234);
Should we use testharness.js we could use promise_test.
Comment 27 Kate Cheney 2020-06-15 15:45:32 PDT
(In reply to youenn fablet from comment #26)
> Comment on attachment 401927 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=401927&action=review
> 
Thanks for the comments Youenn! It seems like this might need some refactoring to add a floor value to avoid the averaging tactic. 

> > Source/WebCore/Modules/websockets/WebSocketChannel.cpp:242
> > +    m_failCloseTimer.startOneShot(closeDelay);
> 
> Isn't it WebSocketChannel::didFailSocketStream that gets notification that a
> port is blocked?
> 
Yes good point, WebSocketChannel::didFailSocketStream() gets called when a port is closed. WebSocketChannel::fail() gets called when a port is open but the connection fails. 

I think we need a delay or floor value in both places to make sure they aren't distinguishable.

> I am also wondering whether random timer noise cannot be filtered out by
> doing repetitive WebSocket loads and averaging the results.
> Maybe a fixed value of no less than XX (100 ms maybe) between the time
> WebSocketChannel::connect is executed and the time the error is surfacing to
> JS would be good?
> 
Yes, this might be good. Tricky because it seems like we need to also delay disconnecting the handle, otherwise that could be an indicator of port status. This happens in WebSocketChannel.cpp I think.

> > LayoutTests/http/tests/websocket/tests/hybi/port-scan.html:20
> > +    setTimeout(scan_port(1234), 10);
> 
> We potentially run tests for 8080 and 1234 in parallel.
> It is usually best to run them sequentially for stability and debuggability.
> If scan_port returns a promise, then we could write: await scan_port(8080);
> await scan_port(1234);
> Should we use testharness.js we could use promise_test.

Thanks for pointing this out, I'll make this change.
Comment 28 Kate Cheney 2020-07-07 14:13:04 PDT
Created attachment 403724 [details]
Patch
Comment 29 Kate Cheney 2020-07-07 14:16:21 PDT
Created attachment 403725 [details]
Patch
Comment 30 Kate Cheney 2020-07-07 15:11:47 PDT
Waiting for EWS before setting the review flag.
Comment 31 Kate Cheney 2020-07-07 19:22:46 PDT
It would be great if someone familiar with the WebSocketChannel code could take a look at this patch, specifically at WebSocketChannel::didFailSocketStream() -- in what situation will this be called with a handle different from m_handle, if ever? Other places in the code seem to assume handle == m_handle, but I wanted to check, because this case will not have a delay according to this patch.


I changed expectations for web-platform-tests now that there is a delay setting the state to closed. Since the tests are checking ws.readyState immediately after calling ws.close (the same method used in a port-scanning attack), I can’t think of a way to avoid updating expectations to account for the delay.
Comment 32 Geoffrey Garen 2020-07-08 11:32:08 PDT
I think Youenn is our best WebSocket expert; he's on vacation for two weeks.
Comment 33 Chris Dumez 2020-07-08 12:25:44 PDT
Comment on attachment 403725 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=403725&action=review

This patch seems to add a lot of complexity and I have trouble believing it is all needed.

> Source/WebCore/Modules/websockets/WebSocketChannel.cpp:207
> +Seconds WebSocketChannel::delayTime()

Why isn't this const?

> Source/WebCore/Modules/websockets/WebSocketChannel.cpp:210
> +    if (timeSinceConnection.milliseconds() < 100)

if (timeSinceConnection < 100_ms)

> Source/WebCore/Modules/websockets/WebSocketChannel.cpp:211
> +        return Seconds::fromMilliseconds(100) - timeSinceConnection;

100_ms - timeSinceConnection

> Source/WebCore/Modules/websockets/WebSocketChannel.cpp:213
> +    return Seconds(0);

return { };

> Source/WebCore/Modules/websockets/WebSocketChannel.cpp:226
>  void WebSocketChannel::fail(const String& reason)

Can you clarify why we cannot delay the whole call to WebSocketChannel::fail() when there is a networking error? Seems very weird that we would need 2 timers for this.

> Source/WebCore/Modules/websockets/WebSocketChannel.cpp:337
> +        m_closeDelayTimer.startOneShot(Seconds::fromMilliseconds(100));

100_ms
Comment 34 Kate Cheney 2020-07-08 13:07:06 PDT
(In reply to Chris Dumez from comment #33)
> Comment on attachment 403725 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=403725&action=review
> 
> This patch seems to add a lot of complexity and I have trouble believing it
> is all needed.
> 

Thanks for the comments, I agree it ended up being more complex than I originally thought it would be. I had to make changes to fix failing tests, see inline comment for explanation.

> > Source/WebCore/Modules/websockets/WebSocketChannel.cpp:207
> > +Seconds WebSocketChannel::delayTime()
> 
> Why isn't this const?
> 

Mistake, will change.

> > Source/WebCore/Modules/websockets/WebSocketChannel.cpp:210
> > +    if (timeSinceConnection.milliseconds() < 100)
> 
> if (timeSinceConnection < 100_ms)

Will fix.

> 
> > Source/WebCore/Modules/websockets/WebSocketChannel.cpp:211
> > +        return Seconds::fromMilliseconds(100) - timeSinceConnection;
> 
> 100_ms - timeSinceConnection
> 

Ditto.

> > Source/WebCore/Modules/websockets/WebSocketChannel.cpp:213
> > +    return Seconds(0);
> 
> return { };

Ditto. 

> 
> > Source/WebCore/Modules/websockets/WebSocketChannel.cpp:226
> >  void WebSocketChannel::fail(const String& reason)
> 
> Can you clarify why we cannot delay the whole call to
> WebSocketChannel::fail() when there is a networking error? Seems very weird
> that we would need 2 timers for this.
> 

This seems to be bringing up 2 different points. 

1) Do you mean delaying WebSocketChannel::fail() at its call-site? Or moving the entire function into the Timer? I didn't put the entire function in a Timer because some of it relies on the 'reason' parameter, which isn't known when the Timer is initialized.

2) I added the second timer because I was hitting failing tests -- the web socket error was being delayed in WebSocketChannel::fail(), but WebSocketChannel::didCloseSocketStream() was being called immediately when the stream reached the end of its data, without delay (usually before the error was reported). From the specs, we know an error should always be reported first, before close. So I added a timer to delay the call to close the socket stream if an error is pending. Maybe there's a simpler way?

> > Source/WebCore/Modules/websockets/WebSocketChannel.cpp:337
> > +        m_closeDelayTimer.startOneShot(Seconds::fromMilliseconds(100));
> 
> 100_ms

Will change.
Comment 35 Chris Dumez 2020-07-08 13:09:14 PDT
Comment on attachment 403725 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=403725&action=review

>>> Source/WebCore/Modules/websockets/WebSocketChannel.cpp:226
>>>  void WebSocketChannel::fail(const String& reason)
>> 
>> Can you clarify why we cannot delay the whole call to WebSocketChannel::fail() when there is a networking error? Seems very weird that we would need 2 timers for this.
> 
> This seems to be bringing up 2 different points. 
> 
> 1) Do you mean delaying WebSocketChannel::fail() at its call-site? Or moving the entire function into the Timer? I didn't put the entire function in a Timer because some of it relies on the 'reason' parameter, which isn't known when the Timer is initialized.
> 
> 2) I added the second timer because I was hitting failing tests -- the web socket error was being delayed in WebSocketChannel::fail(), but WebSocketChannel::didCloseSocketStream() was being called immediately when the stream reached the end of its data, without delay (usually before the error was reported). From the specs, we know an error should always be reported first, before close. So I added a timer to delay the call to close the socket stream if an error is pending. Maybe there's a simpler way?

I meant at the call site. Note that there may be call sites that we may not want to delay (e.g. maybe when WebSocket::close() is called by JS, which is not truly a network error).
Comment 36 Chris Dumez 2020-07-08 14:35:37 PDT
Could we only add logic to WebSocketChannel::didFailSocketStream() which adds a random delay before running its whole implementation?
Comment 37 Kate Cheney 2020-07-08 14:50:58 PDT
(In reply to Chris Dumez from comment #36)
> Could we only add logic to WebSocketChannel::didFailSocketStream() which
> adds a random delay before running its whole implementation?

I agree this is probably the simplest right now. Though I am not sure of the right range for a random delay to make sure we prevent distinguishing open and closed ports. If the delay for closed ports is consistently too high or too low, I think someone could average results of multiple scans, even with randomness. e.g. if we choose a range of 0-10ms for a delay, and the real difference ends up being ~100ms. Even with randomness, it is easy to tell the difference.
Comment 38 Kate Cheney 2020-07-08 17:42:38 PDT
Created attachment 403828 [details]
Patch
Comment 39 Kate Cheney 2020-07-08 17:43:56 PDT
(In reply to katherine_cheney from comment #38)
> Created attachment 403828 [details]
> Patch

This is a simpler patch with only a delay for closed ports, as Chris suggested. The random delay range can be adjusted. I am still investigating the windows failure so I am waiting to set the r flag.
Comment 40 Kate Cheney 2020-07-09 11:00:46 PDT
Created attachment 403891 [details]
Patch
Comment 41 EWS 2020-07-13 10:58:03 PDT
Committed r264306: <https://trac.webkit.org/changeset/264306>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 403891 [details].
Comment 42 Simon Fraser (smfr) 2020-07-13 11:46:54 PDT
Comment on attachment 403891 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=403891&action=review

> Source/WebKit/NetworkProcess/NetworkSocketStream.cpp:106
> +static const auto delayMaxMilliseconds = 100;
> +static const double delayMinMilliseconds = 10;

It would be nice to use Seconds for these; you can write 100_ms.
Comment 43 Chris Dumez 2020-07-13 11:47:45 PDT
(In reply to Simon Fraser (smfr) from comment #42)
> Comment on attachment 403891 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=403891&action=review
> 
> > Source/WebKit/NetworkProcess/NetworkSocketStream.cpp:106
> > +static const auto delayMaxMilliseconds = 100;
> > +static const double delayMinMilliseconds = 10;
> 
> It would be nice to use Seconds for these; you can write 100_ms.

Right, I made similar suggestions in my earlier review of this patch.
Comment 44 Kate Cheney 2020-07-13 12:50:25 PDT
Reopening to attach new patch.
Comment 45 Kate Cheney 2020-07-13 12:50:26 PDT
Created attachment 404165 [details]
Patch
Comment 46 Darin Adler 2020-07-13 12:56:05 PDT
Comment on attachment 404165 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=404165&action=review

> Source/WebKit/NetworkProcess/NetworkSocketStream.cpp:107
> -static const auto delayMaxMilliseconds = 100;
> -static const double delayMinMilliseconds = 10;
> +static const auto delayMax = 100_ms;
> +static const auto delayMin = 10_ms;
>  static const auto closedPortErrorCode = 61;

Should be using constexpr instead of const for new things like this these days.
Comment 47 Kate Cheney 2020-07-13 12:58:28 PDT
Created attachment 404166 [details]
Patch
Comment 48 EWS 2020-07-13 14:08:01 PDT
Committed r264316: <https://trac.webkit.org/changeset/264316>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 404166 [details].