Bug 235707 - WebSocket.send() should synchronously update bufferedAmount
Summary: WebSocket.send() should synchronously update bufferedAmount
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: Safari 15
Hardware: All All
: P2 Normal
Assignee: Alex Christensen
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2022-01-27 05:46 PST by Roberto D'Auria
Modified: 2022-05-11 10:14 PDT (History)
8 users (show)

See Also:


Attachments
JS console on Epiphany (Linux) (252.71 KB, image/png)
2022-01-28 09:16 PST, Roberto D'Auria
no flags Details
Network tab on Epiphany (Linux) (12.62 KB, image/png)
2022-01-28 09:17 PST, Roberto D'Auria
no flags Details
Patch (8.72 KB, patch)
2022-03-07 11:59 PST, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (20.95 KB, patch)
2022-03-07 13:24 PST, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (55.45 KB, patch)
2022-03-07 13:33 PST, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (55.98 KB, patch)
2022-03-07 13:40 PST, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (56.07 KB, patch)
2022-03-07 13:41 PST, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (61.23 KB, patch)
2022-03-07 16:35 PST, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (62.53 KB, patch)
2022-03-07 22:34 PST, Alex Christensen
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Roberto D'Auria 2022-01-27 05:46:48 PST
Overview:

Checking the bufferedAmount property is not sufficient to prevent send() from crashing with “Failed to send WebSocket frame” while attempting to maintain a steady send backlog. The code described below works as intended on many platforms but not the iPhone 11s and other fast MacOS devices running Safari.

Context:

The M-Lab team maintains a WebSocket-based network measurement client, https://speed.measurementlab.net/ (and elsewhere) which measures the maximum TCP throughput of the user’s Internet connection. The upload throughput measurement sends repeated messages over a WebSocket connection for 10 seconds. It uses bufferedAmount to monitor the backlog of untransmitted data. The goal is to always have some data ready and waiting to be transmitted but not so much data that the buffer overflows or takes too long to drain at the end of the test.

This technique works well on a large number of platforms (Chrome/Chromium, Firefox, Opera and Edge on both desktop and mobile where applicable). except it fails on the iPhone 11s and other fast Apple devices. We had previously implemented a workaround by limiting the desired backlog size to a value much smaller than the maximum buffer size. The problem seems to be that the bufferedAmount property is not updated in a timely manner following a send(), resulting in overrunning the buffer on faster devices.

Additional details:

The message size starts very small (8KB) and is doubled over time up to a maximum of 8MB within the initial 2-3 seconds of the measurement. We aim to keep 8 messages in the buffer at all times. To determine whether we can push more messages into the WebSocket buffer without exceeding its maximum size, we compare the bufferedAmount property of the WebSocket object with a desiredBuffer (8 * data.length, so up to 64 MB).

We received several reports of the upload function crashing on WebKit-based browsers (specifically, Safari) due to the code attempting to call WebSocket.send() when the buffer is full (or would exceed its maximum size after the send()).

While debugging this issue, we noticed that the bufferedAmount property does not seem to be updated immediately after a send() but it’s updated asynchronously and lags behind in a non-predictable way, not reflecting the actual state of the buffer at the time the property is checked. This causes in some cases (depending on the user’s hardware, connection speed, and latency) the algorithm to attempt to send more data than what the WS buffer can hold at the moment and the connection to be abruptly terminated during a measurement with a "Failed to send WebSocket frame" error on the Javascript console.

You can see some of the investigation that was made and further details at:

- https://github.com/m-lab/ndt7-js/issues/44#issuecomment-839912733
- https://www.measurementlab.net/blog/ndt7-updates/

As you can see from the comments to the GitHub issue above and as we confirmed in our testing, it is possible for bufferedAmount to be reported as zero at the time of the check but the subsequent call to sock.send(data) fails -- we suspect because sending data would overflow the buffer.

We attempted to work around this issue by basically slowing down the send loop and queuing fewer messages, as you can see in this PR:
https://github.com/m-lab/ndt7-js/pull/45/files

While this seemed to work reasonably well initially, we are still getting reports from our users about the upload function crashing intermittently.

This is the relevant part of the uploader function’s code (full code at https://github.com/m-lab/ndt7-js/blob/main/src/ndt7-upload-worker.js), which works as expected on all the other browsers listed above.


function uploader(data, start, end, previous, total) {
    // [... part of the function omitted ...]

    const maxMessageSize = 8388608; /* = (1<<23) = 8MB */

    // Message size is doubled after the first 16 messages, and subsequently
    // every 8, up to maxMessageSize.
    const nextSizeIncrement =
        (data.length >= maxMessageSize) ? Infinity : 16 * data.length;

    // Determine whether it's time to double the message size.
    if ((total - sock.bufferedAmount) >= nextSizeIncrement) {
      data = new Uint8Array(data.length * 2);
    }

    // We keep 7 messages in the send buffer, so there is always some more
    // data to send. The maximum buffer size is 7 * 8MB - 1 byte ~= 56M.
    // Note: this was previously 8 messages - changed to 7 in an attempt to work around this issue.
    const desiredBuffer = 7 * data.length;
 
    // XXX: on WebKit this check can succeed but the subsequent sock.send(data) will fail.
    if (sock.bufferedAmount < desiredBuffer) {
      sock.send(data);
      total += data.length;
    }

    // Loop the uploader function in a way that respects the JS event handler.
    setTimeout(() => uploader(data, start, end, previous, total), 0);
  }

We have been looking for similar issues with bufferedAmount reported in the past. The only issue we found was a regression in 2017 (https://bugs.webkit.org/show_bug.cgi?id=170463) where the bufferedAmount property didn't seem to be updated at all, which is marked as RESOLVED FIXED.

Actual results:
The "bufferedAmount" property reports a value of zero (or small enough for another send() to be called). After send(), the WebSocket connection crashes with a "Failed to send WebSocket frame" error.

Expected results:
As long as the backlog is smaller than (maximum WebSocket buffer size - data.length), the next send(data) should always succeed.
Comment 1 Roberto D'Auria 2022-01-28 09:14:19 PST
Adding some more information I've gathered about this:

I've also tested the most recent WebKit release on a Linux system (using Epiphany Technology Preview as recommended by the repository's README) - version 42.alpha-23-ga73a12344+.

On Linux, the JavaScript code above fails in a different way:

- bufferedAmount never exceeds the size of a single message (8388608 bytes, or 8KB). It seems strange since this should depend on how fast data is written to the network. With the same hardware/Internet connection on a different browser, I get different numbers every time bufferedAmount is printed on the console during the measurement.

- send() does never fail, but the amount of data "sent" is exceedingly high - way more than my link rate, which is 100Mb/s.

- the "Network" tab reports 1.21 GBytes sent over 9.5 seconds for this WebSocket connection. This matches what's detected by the javascript code (i.e. the "total" variable at the end of the measurement).

- the Mb/s rate with the above figures would be ~975 Mb/s.

- closing the Developer Tools during the measurement brings the speed up to > 4Gb/s.

I suspect this is either measuring how fast data can be written to memory, or just how fast the messages are processed and discarded after a send(), rather than the speed at which data is actually sent over the network.

The behavior of WebKit's WebSocket on Linux is quite different than on MacOS/iOS, based on what I'm observing. While on MacOS/iOS the issue requires a specific combination of network speed and device speed (indicatively, an upload over 100Mb/s on fast hardware) to be triggered and causes send() to fail with an error, on Linux systems the above happens every time so it can perhaps be helpful for debugging. I've also tested other WebKit-based browsers in the past (Midori and a previous release of Epiphany), getting the same results.

I've added a debug console.log() just before the bufferedAmount < desiredBuffer check and the result is visible at https://mlab-sandbox.web.app/ - just start the test and look at the JS console when it gets to the upload phase.

P.S. Please let me know if these should be treated as different bugs and I should open a separate report. It seems to me that there might be a common cause explaining both behaviors on MacOS and Linux, but I'm not familiar enough with WebKit's source code to tell.
Comment 2 Roberto D'Auria 2022-01-28 09:16:26 PST
Created attachment 450239 [details]
JS console on Epiphany (Linux)
Comment 3 Roberto D'Auria 2022-01-28 09:17:15 PST
Created attachment 450240 [details]
Network tab on Epiphany (Linux)
Comment 4 Radar WebKit Bug Importer 2022-02-03 05:47:18 PST
<rdar://problem/88431345>
Comment 5 Alex Christensen 2022-03-07 11:59:29 PST
Created attachment 454008 [details]
Patch
Comment 6 Chris Dumez 2022-03-07 12:23:43 PST
Comment on attachment 454008 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=454008&action=review

> Source/WebCore/Modules/websockets/WebSocket.cpp:355
> +    m_bufferedAmount += message.utf8().length();

I looked at our implementation and it seems we end up calling `utf8(StrictConversionReplacingUnpairedSurrogatesWithFFFD)`, not utf8(). I suspect this can have an impact on the actual number of bytes sent, no?
If so, should we call utf8(StrictConversionReplacingUnpairedSurrogatesWithFFFD) here and for payloadSize a few lines above?

Also, to avoid multiple conversion to UTF8() and and avoid little consistency issues like that, it may be nice to convert to utf8 in this function and then pass a CString to ThreadableWebSocketChannel::send().

May be nice to do the conversion earlier and have only one payloadSize computation in this function too.
Comment 7 youenn fablet 2022-03-07 12:26:10 PST
Comment on attachment 454008 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=454008&action=review

> Source/WebCore/Modules/websockets/WebSocket.cpp:355
> +    m_bufferedAmount += message.utf8().length();

RTCDataChannel does this conversion once, which seems nice.
We should also increment for blob like we do in RTCDataChannel.
And ArrayBufferView as well.
Comment 8 Alex Christensen 2022-03-07 13:24:24 PST
Created attachment 454021 [details]
Patch
Comment 9 Alex Christensen 2022-03-07 13:33:20 PST
Created attachment 454023 [details]
Patch
Comment 10 Chris Dumez 2022-03-07 13:35:37 PST
Comment on attachment 454021 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=454021&action=review

Nice

> Source/WebCore/Modules/websockets/WebSocket.cpp:352
> +    auto utf8 = message.utf8(StrictConversionReplacingUnpairedSurrogatesWithFFFD);

I mentioned it in my earlier review but which not do this before the if check above so that we can do:
`size_t payloadSize = utf8.length();`
inside the if check instead of:
`size_t payloadSize = message.utf8().length();`
Comment 11 Chris Dumez 2022-03-07 13:36:41 PST
Comment on attachment 454023 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=454023&action=review

> Source/WebCore/Modules/websockets/WebSocket.cpp:352
> +    auto utf8 = message.utf8(StrictConversionReplacingUnpairedSurrogatesWithFFFD);

Please see my comment on the other patch about doing this before the if check above so we can use it inside the if check.
Comment 12 Alex Christensen 2022-03-07 13:40:10 PST
Created attachment 454025 [details]
Patch
Comment 13 Alex Christensen 2022-03-07 13:41:23 PST
Created attachment 454026 [details]
Patch
Comment 14 Alex Christensen 2022-03-07 16:35:09 PST
Created attachment 454050 [details]
Patch
Comment 15 Alex Christensen 2022-03-07 22:34:49 PST
Created attachment 454074 [details]
Patch
Comment 16 Alex Christensen 2022-03-08 09:03:21 PST
r290995
Comment 17 Roberto D'Auria 2022-05-11 07:52:40 PDT
Hi,

We are testing the fix above on the latest (version 144) Safari Technology Preview, and it looks like there is a significant regression. On some machines (more details below) bufferedAmount isn’t decremented anymore after just a few writes, so the client stops sending data. Is it possible that bufferedAmount is not being decremented correctly in all the possible places in the code?

The testing was done with the same uploader function in the message above, with added debugging logs around the sock.send():

console.log("desired buffer: " + desiredBuffer + " - bufferedAmount: " + sock.bufferedAmount);
if (sock.bufferedAmount < desiredBuffer) {
         console.log("Sending " + data.length + " bytes");
         sock.send(data);
         console.log("Sent");
         total += data.length;
}

The very first calls to send() work normally. After flushing the buffer at least once (I can see bufferedAmount going down to zero), bufferedAmount reaches the desiredBuffer and then it’s never decremented again for the whole duration of the measurement -- 10 seconds. From the logs attached you can also see that NumBytes (total number of bytes sent) stays at 64KB and does not increase anymore.

This code works as intended on Safari stable (except for the sporadic crashes that caused us to report the original bug) and every other major browser.

This new issue also affects the speedtest at https://speed.measurementlab.net and Google's own speedtest client in search pages (OneBox), both based on websockets and the ndt7 protocol. Both of them report extremely low upload rates due to no new data being sent after the initial few milliseconds, matching what we observed in our testing. We believe that including this patch as-is in the next Safari release will cause problems to all the integrations of the ndt7 speedtest protocol and everyone else relying on bufferedAmount.

We have noticed differences in behavior depending on the hardware this was tested on:

2019 iMac 8-Core Intel i9, 64GB ram, MacOS Monterey: Works as intended
2019 MacBook Pro 8-core Intel i9 16GB ram, MacOS Monterey: Works as intended
2018 MacBook Pro 4-core Intel i7 16GB ram, MacOS Big Sur: Never works, stops sending after a few milliseconds as described above
2020 M1 MacBook Air, 16GB ram, MacOS Monterey: Works as intended when the tab running the measurement is in the foreground, stops sending when the tab is in the background. This is the most puzzling finding, but it could be replicated very consistently

Each test was run on a different network connection. Link speed does not seem to make a difference.

Console logs, with comments added, collected on the 2018 MacBook Pro running Safari Technology Preview 144:

[Log] Upload started 
[Log] desired buffer: 57344 - bufferedAmount: 0 

First calls to send():

[Log] Sending 8192 bytes 
[Log] Sent 
[Log] desired buffer: 57344 - bufferedAmount: 8192 
[Log] Sending 8192 bytes 
[Log] Sent 
[Log] desired buffer: 57344 - bufferedAmount: 16384 
[Log] Sending 8192 bytes 
[Log] Sent 
[Log] desired buffer: 57344 - bufferedAmount: 24576 
[Log] Sending 8192 bytes 
[Log] Sent 
[Log] desired buffer: 57344 - bufferedAmount: 32768 
[Log] Sending 8192 bytes 
[Log] Sent 
[Log] desired buffer: 57344 - bufferedAmount: 40960 
[Log] Sending 8192 bytes 
[Log] Sent 
[Log] desired buffer: 57344 - bufferedAmount: 49152 
[Log] Sending 8192 bytes 
[Log] Sent 
[Log] desired buffer: 57344 - bufferedAmount: 57344 

bufferedAmount is decremented for the first time here:

[Log] desired buffer: 57344 - bufferedAmount: 24600 
[Log] Sending 8192 bytes 
[Log] Sent 
[Log] desired buffer: 57344 - bufferedAmount: 0 

After this point, bufferedAmount is not decremented anymore, as if no data is being sent over the network:

[Log] Sending 8192 bytes 
[Log] Sent 
[Log] desired buffer: 57344 - bufferedAmount: 8192 
[Log] Sending 8192 bytes 
[Log] Sent 
[Log] desired buffer: 57344 - bufferedAmount: 16384 
[Log] Sending 8192 bytes 
[Log] Sent 
[Log] desired buffer: 57344 - bufferedAmount: 24576 
[Log] Sending 8192 bytes 
[Log] Sent 
[Log] desired buffer: 57344 - bufferedAmount: 32768 
[Log] Sending 8192 bytes 
[Log] Sent 
[Log] desired buffer: 57344 - bufferedAmount: 40960 
[Log] Sending 8192 bytes 
[Log] Sent 
[Log] desired buffer: 57344 - bufferedAmount: 49152 
[Log] Sending 8192 bytes 
[Log] Sent 
[Log] desired buffer: 57344 - bufferedAmount: 57344 

From here on the client stops sending, since bufferedAmount == desiredBuffer. As a result, no new send() happens until the end of the 10-second measurement interval. What follows are just periodic measurement results which continue to decrease since no new data is sent:

[Log] Server-side measurement:  \'96 \{ConnectionInfo: \{Client: "<redacted>:52361", Server: "149.20.5.88:443", UUID: "<redacted>"\}, BBRInfo: Object, TCPInfo: Object\} 
\{ConnectionInfo: \{Client: "<redacted>:52361", Server: "149.20.5.88:443", UUID: "<redacted>"\}, BBRInfo: Object, TCPInfo: Object\}Object\
[Log] desired buffer: 57344 - bufferedAmount: 57344 
[Log] Client-side measurement:  \'96 \{ElapsedTime: 0.253, NumBytes: 65536, MeanClientMbps: 2.072284584980237\} 
[Log] desired buffer: 57344 - bufferedAmount: 57344 
[Log] Client-side measurement:  \'96 \{ElapsedTime: 0.507, NumBytes: 65536, MeanClientMbps: 1.0340986193293884\} 
[Log] desired buffer: 57344 - bufferedAmount: 57344 
[Log] Server-side measurement:  \'96 \{ConnectionInfo: \{Client: "<redacted>:52361", Server: "149.20.5.88:443", UUID: "<redacted>"\}, BBRInfo: Object, TCPInfo: Object\} 
\{ConnectionInfo: \{Client: "<redacted>:52361", Server: "149.20.5.88:443", UUID: "<redacted>"\}, BBRInfo: Object, TCPInfo: Object\}Object\
[Log] desired buffer: 57344 - bufferedAmount: 57344 
[Log] Client-side measurement:  \'96 \{ElapsedTime: 0.759, NumBytes: 65536, MeanClientMbps: 0.6907615283267456\} 
[Log] desired buffer: 57344 - bufferedAmount: 57344 
[Log] Server-side measurement:  \'96 \{ConnectionInfo: \{Client: "<redacted>:52361", Server: "149.20.5.88:443", UUID: "<redacted>"\}, BBRInfo: Object, TCPInfo: Object\} 

(... snip ...)
Comment 18 Alexey Proskuryakov 2022-05-11 08:56:44 PDT
Could you please file a new bug? Sorry for asking you to do more work, but we WILL get confused by tracking both old and new issues in one bug.
Comment 19 Roberto D'Auria 2022-05-11 10:14:37 PDT
Of course, no problem. Please see https://bugs.webkit.org/show_bug.cgi?id=240311 -- and thank you for looking into this.