Bug 170463 - REGRESSION(r204512): WebSocket errors with "Failed to send WebSocket frame." if too much data is sent
Summary: REGRESSION(r204512): WebSocket errors with "Failed to send WebSocket frame." ...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore JavaScript (show other bugs)
Version: Safari 10
Hardware: Macintosh macOS 10.12
: P2 Normal
Assignee: Alex Christensen
URL: https://watson-speech.mybluemix.net/f...
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2017-04-04 12:14 PDT by Nathan Friedly
Modified: 2017-10-17 06:21 PDT (History)
18 users (show)

See Also:


Attachments
Patch (2.33 KB, patch)
2017-04-06 09:55 PDT, Alex Christensen
mcatanzaro: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Nathan Friedly 2017-04-04 12:14:45 PDT
I have some code that streams audio data over a WebSocket connection. In Safari 10.03 (and it other current and recent browsers), it works as expected. In Safari 10.1, it errors with "WebSocket connection to wss://[...] failed: Failed to send WebSocket frame." and closes the connection with status 1006 (CLOSE_ABNORMAL) after "too much" data is sent. (I haven't worked out the exact amount, but artificially slowing down my source seems to make it work sometimes. My code checks the bufferedAmount property and slows it self down if it gets too high, but this appears to always be 0 in Safari until after the error.)

This is some example code that reproduces the issue:


// this fails in Safari 10.1 but works in 10.03 (and other browsers)
var ws = new WebSocket('wss://echo.websocket.org');

ws.onerror = function(evt) {
  // Not sure why, but error events don't seem to include the actual error
  // The console, however, will have the following error:
  // WebSocket connection to 'wss://echo.websocket.org/' failed: Failed to send WebSocket frame.
  console.log("WebSocket error - see console for message");
}

ws.onclose = function(evt) {
  console.log(`WebSocket closed with code: ${evt.code}, reason: ${evt.reason}`);
}

ws.onopen = function() {
  console.log('sending first binary message');
  ws.send(new Uint8Array(23085)); // works
  console.log('bufferedAmount is ' + ws.bufferedAmount); // 0

  // this gets the error
  console.log('sending second binary message');
  ws.send(new Uint8Array(23085)); // triggers error and close
  console.log('bufferedAmount is ' + ws.bufferedAmount); // 0

  console.log('sending third binary message');
  ws.send(new Uint8Array(23085));
  console.log('bufferedAmount is ' + ws.bufferedAmount);

  ws.close();
}


https://jsfiddle.net/yta2mjuf/

My actual code is at https://github.com/watson-developer-cloud/speech-javascript-sdk/blob/v0.32.3/speech-to-text/recognize-stream.js, and you can see a live example of it at https://watson-speech.mybluemix.net/file-url.html
Comment 1 Nathan Friedly 2017-04-04 12:22:33 PDT
Op, the JSFiddle link should be https://jsfiddle.net/yta2mjuf/4/
Comment 2 Nathan Friedly 2017-04-04 12:43:27 PDT
Safari Technology Preview behaves the same as 10.1.

WebKit Nightly (r214532) behaves a little differently -  it logs the "Failed to send WebSocket frame" error three times and doesn't close the connection until the end.
Comment 3 Radar WebKit Bug Importer 2017-04-04 14:04:33 PDT
<rdar://problem/31436977>
Comment 4 Adam Duren 2017-04-05 15:04:25 PDT
We have encountered this as well. Without a workaround our product which requires moving large amounts of data over WebSockets is totally broken in Safari.
Comment 5 Alex Christensen 2017-04-05 15:26:51 PDT
CFWriteStreamCanAcceptBytes seems to be returning false.  We should just put the bytes in the buffer in this case instead of reporting a failure.
Comment 6 Alex Christensen 2017-04-05 15:49:25 PDT
Carlos, does this reproduce on linux, too?
Comment 7 Carlos Garcia Campos 2017-04-05 22:43:40 PDT
(In reply to Alex Christensen from comment #6)
> Carlos, does this reproduce on linux, too?

I'm not sure how to test it, but loading https://jsfiddle.net/yta2mjuf/4/ and sending console messages to stdout, this is what I get:

https://fiddle.jshell.net/_display/:60:14: CONSOLE LOG sending first binary message
https://fiddle.jshell.net/_display/:62:14: CONSOLE LOG bufferedAmount is 0
https://fiddle.jshell.net/_display/:65:14: CONSOLE LOG sending second binary message
https://fiddle.jshell.net/_display/:67:14: CONSOLE LOG bufferedAmount is 0
https://fiddle.jshell.net/_display/:69:14: CONSOLE LOG sending third binary message
https://fiddle.jshell.net/_display/:71:14: CONSOLE LOG bufferedAmount is 0
https://fiddle.jshell.net/_display/:56:14: CONSOLE LOG WebSocket closed with code: 1005, reason:
Comment 8 Carlos Garcia Campos 2017-04-05 22:45:40 PDT
And this is what I get with WebKitGTK+ 2.16 branch (before web sockets were moved to the network process)

https://fiddle.jshell.net/_display/:60:14: CONSOLE LOG sending first binary message
https://fiddle.jshell.net/_display/:62:14: CONSOLE LOG bufferedAmount is 6709
https://fiddle.jshell.net/_display/:65:14: CONSOLE LOG sending second binary message
https://fiddle.jshell.net/_display/:67:14: CONSOLE LOG bufferedAmount is 29802
https://fiddle.jshell.net/_display/:69:14: CONSOLE LOG sending third binary message
https://fiddle.jshell.net/_display/:71:14: CONSOLE LOG bufferedAmount is 52895
https://fiddle.jshell.net/_display/:56:14: CONSOLE LOG WebSocket closed with code: 1005, reason:
Comment 9 Jonas 2017-04-06 04:14:10 PDT
We too suffer from this blocking issue. We transfer large amounts of data in our product using a WebSocket.

We have emailed all of our customers and partners to postpone the update of Safari and in the meanwhile we are working on a hack/patch to limit the amount of data we transfer. This fix however is seriously affecting the user experience.

Hope to receive additional feedback about this issue.
Comment 10 Alex Christensen 2017-04-06 09:11:47 PDT
This regressed with http://trac.webkit.org/r204512
Comment 11 Adam Duren 2017-04-06 09:20:31 PDT
@jona(In reply to Jonas from comment #9)
Our workaround in the meantime was to delay between each call to send. Our service was already chunking data to 8000 bytes and a delay of 125ms seemed to mostly smooth over this issue in the meantime but with a hit to overall performance. It will most likely depend on individual message sizes, the bandwidth of your users, and other network factors.
Comment 12 Alex Christensen 2017-04-06 09:55:51 PDT
Created attachment 306397 [details]
Patch
Comment 13 Alex Christensen 2017-04-06 09:58:45 PDT
The attached patch fixes your issue and restores our compatibility with other browsers.
Carlos, could you see if it fixes the libsoup implementation?
I'm having trouble writing a regression test that fails before this change and succeeds after using a local python server.  What exactly is echo.websocket.org doing?  Is the source available somewhere?  Is it taking a long time to do something?  Maybe it's just that it's a remote server, so sending and receiving takes longer than it ever would when testing on a local machine.
Comment 14 Michael Catanzaro 2017-04-06 10:01:22 PDT
Comment on attachment 306397 [details]
Patch

Um, shouldn't we fix the callers to expect std::nullopt instead? If I saw this when working on nearby code, I'd be tempted to change it back to return nullopt and call it good if it doesn't break any tests.
Comment 15 Alex Christensen 2017-04-06 10:03:57 PDT
No, if CFWriteStreamWrite returns -1, then we do need an error because something has gone wrong with the socket.  If the stream cannot accept bytes yet, then we report that we have written 0 bytes and there was no error.  The bytes go into the buffer and they will be written later.
Comment 16 Nathan Friedly 2017-04-06 10:35:24 PDT
(In reply to Alex Christensen from comment #13)
> The attached patch fixes your issue and restores our compatibility with
> other browsers.
> Carlos, could you see if it fixes the libsoup implementation?
> I'm having trouble writing a regression test that fails before this change
> and succeeds after using a local python server.  What exactly is
> echo.websocket.org doing?  Is the source available somewhere?  Is it taking
> a long time to do something?  Maybe it's just that it's a remote server, so
> sending and receiving takes longer than it ever would when testing on a
> local machine.

I think you're correct that the difference is just that it's a remote vs local server.

You could try increasing the amount of data the test sends, but I'm not sure how much it would take to get a reliable test. Alternatively, is it possible to throttle the connection to the local server, or otherwise make it slow?

echo.websocket.org is just supposed to respond with whatever data you sent it. My actual server is stream.watsonplatform.net - but that requires credentials.
Comment 17 Alex Christensen 2017-04-06 11:18:31 PDT
I don't think this should be committed without a regression test to make sure we never regress this behavior again.  Using https://trac.webkit.org/browser/webkit/trunk/LayoutTests/http/tests/websocket/tests/hybi/send_wsh.py doesn't reproduce the issue.
Comment 18 Alex Christensen 2017-04-07 11:28:38 PDT
I committed this to https://trac.webkit.org/changeset/215102/webkit after manually verifying manually that Nathan's test case was fixed, meaning the error was no longer there and data was actually sent and received.

Nathan, Adam, and Jonas, could you please do some testing with the upcoming Safari Technology Preview?  We have released them every 2 weeks or so at https://developer.apple.com/safari/technology-preview/
Comment 19 Adam Duren 2017-04-07 13:34:35 PDT
Will do. When is this expected to drop into the main Safari builds?
Comment 20 Nathan Friedly 2017-04-07 13:37:15 PDT
Yes, I'll check on the next Safari Technology Preview. Could you post an update here (or send me an email if you prefer) once it's available?
Comment 21 Carlos Garcia Campos 2017-04-10 02:21:01 PDT
(In reply to Carlos Garcia Campos from comment #7)
> (In reply to Alex Christensen from comment #6)
> > Carlos, does this reproduce on linux, too?
> 
> I'm not sure how to test it, but loading https://jsfiddle.net/yta2mjuf/4/
> and sending console messages to stdout, this is what I get:
> 
> https://fiddle.jshell.net/_display/:60:14: CONSOLE LOG sending first binary
> message
> https://fiddle.jshell.net/_display/:62:14: CONSOLE LOG bufferedAmount is 0
> https://fiddle.jshell.net/_display/:65:14: CONSOLE LOG sending second binary
> message
> https://fiddle.jshell.net/_display/:67:14: CONSOLE LOG bufferedAmount is 0
> https://fiddle.jshell.net/_display/:69:14: CONSOLE LOG sending third binary
> message
> https://fiddle.jshell.net/_display/:71:14: CONSOLE LOG bufferedAmount is 0
> https://fiddle.jshell.net/_display/:56:14: CONSOLE LOG WebSocket closed with
> code: 1005, reason:

How can I check this is fixed in soup? I'm still getting this output even after the patch.
Comment 22 Alex Christensen 2017-04-10 10:11:23 PDT
I used Nathan's code to verify that onerror was no longer being called.  I modified it by adding an onmessage that counts the number of bytes received to verify that the bytes were actually being sent and received.

We really need to figure out how to reproduce this issue locally with python.
Comment 23 gneius 2017-04-13 08:49:34 PDT
Our customers are currently experiencing the same issue as described by this defect and we have to put in a fixed delay between sending data frames in order to get around this problem and it doesn't always work. Since this has been marked as fixed. Is there an ETA on when the new version of Safari and iOS will be released with this fix so we can communicate with our customers? thx
Comment 24 Divya Manian 2017-04-17 10:59:40 PDT
Folks mentioning impact, can you give us names of your products and the number of folks who are impacted?
Comment 25 gneius 2017-04-17 12:12:28 PDT
(In reply to Divya Manian from comment #24)
> Folks mentioning impact, can you give us names of your products and the
> number of folks who are impacted?

Our product is called Klipfolio (www.klipfolio.com). We have 22 of our customers reporting this issue, before we put in a 50ms delay between sending messages. After the 50ms delay there are still 3 reported cases that are still experiencing this issue.
Comment 26 Nathan Friedly 2017-04-17 13:43:39 PDT
(In reply to Divya Manian from comment #24)
> Folks mentioning impact, can you give us names of your products and the
> number of folks who are impacted?

In my case it's the IBM Watson Speech to Text API. I can't give exact numbers, but we handled around 100~150k WebSocket connections from Safari users last month. (Presumably it would be quite a bit more if Safari supported microphone input via getUserMedia rather than just files, so maybe we got lucky here ;)
Comment 27 Adam Duren 2017-04-18 14:33:57 PDT
Tested our application with r215468 and it appears to be fixed in this build.
Comment 28 Alex Christensen 2017-04-18 15:14:31 PDT
Hooray!  Thanks!
I still think we need to add an automated regression test using python somehow to prevent us from accidentally breaking this again.  We will do refactoring or reimplementing in the future
Comment 29 Jonas 2017-04-19 07:22:23 PDT
Our product is called Awingu and we emailed all of our customers warning them not to update to the latest Safari. Because of this, we only had to patch one customer environment with a fix that limits the package size to around 5000 bytes.

How and where can we test the "r215468" build? Is it better to wait for the next technology preview? I'm guessing it's going to be released one of these days?

Thanks for the update!
Comment 30 Alexey Proskuryakov 2017-04-19 09:17:42 PDT
> How and where can we test the "r215468" build? Is it better to wait for the next technology preview?

Either the next Safari Technology preview or the current WebKit nightly build (https://webkit.org/downloads/) would work for this purpose.
Comment 31 Jonas 2017-04-20 05:38:31 PDT
I've verified the fix on the Safari Technology Preview r28 of yesterday and I can confirm the issue is not present anymore.

Now it's only a matter of knowing when this version is releasable to the public.
Comment 32 gneius 2017-04-20 07:05:22 PDT
Just an additional note, in addition to Safari, a portion of our mobile user who's updated to latest iOS is also affected by this defect. Our native mobile app also uses webkit behind it. 

It will be great to know the release schedule for iOS with this fix as well.
Comment 33 Igor Khomenko 2017-04-27 08:34:36 PDT
I have the same issue with 10.1
but it works fine with 10.0.3
 
Here is my log from browser console:
http://image.quickblox.com/459f0469f8c5af77d71775066988.injoit.png

Our product https://qm.quickblox.com is completely broken in Safari 10.1.
So waiting for you guys to release a new update..
Comment 34 Alex Christensen 2017-04-27 10:47:38 PDT
Igor, could you also verify that the latest Safari Technology Preview has fixed your problem?
Comment 35 Igor Khomenko 2017-05-05 05:56:40 PDT
It works fine with dev preview version.
When are you going to push 10.1.1?
Comment 36 Deepak Chaula 2017-05-14 22:41:34 PDT
Hi guys,
When are you going to push fixed 10.1 safari latest version?
Comment 37 Songli Li 2017-05-16 02:46:56 PDT
Hi folks, I've just upgraded Safari to 10.1.1 and this issue can still happen...
Comment 38 Jonas 2017-05-16 02:55:05 PDT
It looks like 10.1.1 only contains Security updates.

https://support.apple.com/en-us/HT207804
Comment 39 Deepak Chaula 2017-05-16 06:25:25 PDT
Yes, still I am facing this issue...
Comment 40 Divya Manian 2017-05-17 16:57:53 PDT
There is a Beta program https://beta.apple.com that would be good to sign up for to get early access on what goes into the next release of Safari ahead of time.
Comment 41 gneius 2017-05-23 05:59:32 PDT
(In reply to Divya Manian from comment #40)
> There is a Beta program https://beta.apple.com that would be good to sign up
> for to get early access on what goes into the next release of Safari ahead
> of time.

Hi, 

We have more and more customer experiencing this issue. So far the work around is to add some delay when sending data through websocket. The work around is not really acceptable by our customers as it adds a lot of overhead and reduce performance by a large margin. 

Is it possible to get an ETA on when the fix will make it into the release version of Safari and iOS so we can communicate that to our customers? 

TIA!
qing
Comment 42 Michael Catanzaro 2017-05-23 06:29:27 PDT
Hi, you're unlikely to get a response from Apple developers here, as they're not generally allowed to comment about future product releases. This issue has been fixed in WebKit, and that's as much as you're going to be able to get here.
Comment 43 Jonas 2017-09-14 05:51:01 PDT
Is this confirmed to be part of Safari 11?
Comment 44 Alexey Proskuryakov 2017-09-14 09:26:31 PDT
This is fixed in the latest updates to shipping Safari, in macOS 10.12.6 and iOS 10.3.3.