Bug 228296 - REGRESSION (iOS 15): Websocket connection instance in javascript client getting closed
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: Page Loading
Version: Safari Technology Preview
Hardware: iPhone / iPad Other
Importance: P2 Blocker
Assignee: Nobody
URL:
Keywords: InRadar
Duplicates: 230076
Depends on:
Blocks:
 
Reported: 2021-07-26 13:13 PDT by ABDUL RAHIMAN MULLA
Modified: 2021-12-03 10:52 PST
CC List: 15 users

See Also:


Attachments
Screencast of the bug (43.80 MB, video/quicktime)
2021-11-22 16:15 PST, mattwindwer
Screenshot of the error (798.00 KB, image/png)
2021-11-26 00:17 PST, marc_aurel
Screenshot showing the web socket error when using chats in Basecamp (5.97 MB, image/png)
2021-12-01 04:28 PST, Jorge Manrubia

Description ABDUL RAHIMAN MULLA 2021-07-26 13:13:30 PDT
When a WebSocket server implemented with websocket-sharp (C#) sends a response that is split into multiple fragments written to the socket stream, the connection on the client instance gets closed.

WebSocket connection to 'ws://******:****/' failed: The operation couldn’t be completed. (kNWErrorDomainPOSIX error 100 - Protocol error)
Comment 1 ABDUL RAHIMAN MULLA 2021-07-26 13:15:10 PDT
This issue is observed only in iPadOS 15 beta; it worked as expected in iPadOS 14.6 and earlier versions.
Comment 2 Alexey Proskuryakov 2021-07-26 15:31:39 PDT
Could you please provide a test case (preferably live, as I'm not sure if anyone here would know how to run C# code).
Comment 3 Alex Christensen 2021-07-26 16:16:30 PDT
I'm quite interested in this.  What do you mean by "multiple fragments"?  Are you doing anything else interesting with your server?
Comment 4 ABDUL RAHIMAN MULLA 2021-07-27 00:19:25 PDT
(In reply to Alex Christensen from comment #3)
> I'm quite interested in this.  What do you mean by "multiple fragments"? 
> Are you doing anything else interesting with your server?

When sending a large message (> 70 KB), the websocket-sharp server splits it into multiple frames in the WebSocket protocol, and on the JS client the WebSocket API takes care of those frames and reads them as a single message. The issue is that when my server writes multiple frames to the socket stream, the host machine (JS client) abnormally closes the connection and reports this protocol error.
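
For reference, the reassembly the client is expected to perform can be sketched as follows. This is a minimal illustration of RFC 6455 fragmentation rules, not WebKit's or NSURLSession's actual code; the frame shape `{ fin, opcode, payload }` and the function name `reassembleTextMessage` are assumptions made for the example.

```javascript
// Opcodes defined by RFC 6455.
const OPCODE_CONTINUATION = 0x0;
const OPCODE_TEXT = 0x1;

// Reassemble one fragmented text message from already-parsed frames.
// Each frame is a hypothetical { fin: boolean, opcode: number, payload: Uint8Array }.
function reassembleTextMessage(frames) {
  if (frames.length === 0) {
    throw new Error("no frames");
  }
  const chunks = [];
  frames.forEach((frame, index) => {
    const isFirst = index === 0;
    const isLast = index === frames.length - 1;
    // The first fragment carries the text opcode; all later ones are continuations.
    const expectedOpcode = isFirst ? OPCODE_TEXT : OPCODE_CONTINUATION;
    if (frame.opcode !== expectedOpcode) {
      throw new Error(`unexpected opcode at frame ${index}`);
    }
    // Only the final fragment may have FIN set.
    if (frame.fin !== isLast) {
      throw new Error(`unexpected FIN bit at frame ${index}`);
    }
    chunks.push(frame.payload);
  });

  // Join the raw bytes first; a multi-byte UTF-8 character may span two fragments.
  const total = chunks.reduce((sum, chunk) => sum + chunk.length, 0);
  const joined = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    joined.set(chunk, offset);
    offset += chunk.length;
  }
  return new TextDecoder("utf-8", { fatal: true }).decode(joined);
}
```

A frame with FIN=0 is therefore a normal part of a fragmented message and should not by itself cause a protocol error.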
Comment 5 Alex Christensen 2021-07-27 09:53:40 PDT
Is there any way you could either provide the code of a server that hits this issue or provide an IP address of such a server running on the internet?  Feel free to send the IP address to my email privately if you don't want to post it publicly here.
Comment 6 Alex Christensen 2021-07-27 18:21:09 PDT
My initial investigation has found that large numbers of continuation frames work as expected.  I do this:
1. Send a text frame with fin=0 and length=1
2. Send many continuation frames with fin=0 and length=1
3. Send a continuation frame with fin=1 and length=1
Could you describe exactly what I'm doing differently than your server?  Sending a packet capture could also be enlightening.
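
The frame sequence in steps 1 to 3 can be sketched at the byte level as below. This is only an illustration of the RFC 6455 header layout for short, unmasked server-to-client frames; `encodeServerFrame` and `buildOneByteFragments` are hypothetical helper names, not the code used in the investigation.

```javascript
// Encode a single unmasked (server-to-client) frame with a payload of at most 125 bytes.
function encodeServerFrame(fin, opcode, payload) {
  if (payload.length > 125) {
    throw new Error("sketch only handles payloads up to 125 bytes");
  }
  const frame = new Uint8Array(2 + payload.length);
  // Byte 0: FIN bit (0x80) plus the 4-bit opcode; RSV bits stay 0.
  frame[0] = (fin ? 0x80 : 0x00) | (opcode & 0x0f);
  // Byte 1: MASK bit clear, followed by the 7-bit payload length.
  frame[1] = payload.length;
  frame.set(payload, 2);
  return frame;
}

// Split a text message into frames of length 1:
// a text frame with fin=0, continuation frames with fin=0,
// and a last continuation frame with fin=1.
function buildOneByteFragments(text) {
  const bytes = new TextEncoder().encode(text);
  if (bytes.length < 2) {
    throw new Error("need at least two bytes to fragment");
  }
  return Array.from(bytes, (byte, index) => {
    const opcode = index === 0 ? 0x1 : 0x0; // text, then continuation
    const fin = index === bytes.length - 1;
    return encodeServerFrame(fin, opcode, Uint8Array.of(byte));
  });
}
```

Writing the returned frames to the socket one after another, following a completed WebSocket handshake, would produce the sequence described above.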
Comment 7 Radar WebKit Bug Importer 2021-08-02 13:14:16 PDT
<rdar://problem/81425845>
Comment 8 isaac+webkit 2021-09-08 18:49:52 PDT
I've raised what could be a similar issue, affecting macOS Monterey here: https://bugs.webkit.org/show_bug.cgi?id=230076
Comment 9 Jay Charles 2021-09-10 13:59:51 PDT
I am encountering similar issues as well against golang gorilla websockets on iOS 15 beta 8. Websocket connections occasionally fail to complete, and frequently close unexpectedly.
Comment 10 Alex Christensen 2021-09-14 09:00:05 PDT
I would really like to look into this, but the descriptions do not contain enough information for me to reproduce.  If someone could send me a link to a server that is reproducing this issue, I'll look into what is going on.
Comment 11 youenn fablet 2021-09-14 09:24:33 PDT
rdar://81747517 is probably the bug tracking internal work.
Comment 12 Alex Christensen 2021-09-14 11:55:08 PDT
Or alternatively if someone provides me with actual C# or golang code that makes a server that reproduces the issue, I could also use that to look into what is going on.  I've made web socket servers that do what you describe and they work, but I definitely believe that something is going wrong.
Comment 13 Alex Christensen 2021-09-15 00:28:22 PDT
*** Bug 230076 has been marked as a duplicate of this bug. ***
Comment 14 Alex Christensen 2021-09-15 00:29:14 PDT
With some additional information provided by https://bugs.webkit.org/show_bug.cgi?id=230076 I found what is going on here, and prepared a fix for the underlying framework.
Comment 15 Alex Christensen 2021-09-15 00:29:58 PDT
I'm tracking my internal work with rdar://82917968
Comment 16 Viesturs 2021-09-28 06:28:50 PDT
This can be easily reproduced with a reference WebSocket server, https://libwebsockets.org/testserver/, as pointed out in https://developer.apple.com/forums/thread/685403?login=true
Comment 17 Alex Christensen 2021-09-29 09:38:02 PDT
That is a different bug that is not a regression in iOS15.  I filed https://bugs.webkit.org/show_bug.cgi?id=230962
Comment 18 Mikael Nousiainen 2021-10-05 02:52:01 PDT
I've encountered this issue too with a WebSocket server using HTTP/2 + TLS 1.3. The connection succeeds at first and I'm able to send and receive some (short) WebSocket messages, but then the connection gets disconnected with the aforementioned "kNWErrorDomainPOSIX error 100 - Protocol error" error message. A longer message sent by the server seems to cause the disconnection.

Based on the discussion in this thread:
https://developer.apple.com/forums/thread/685403

I'm also suspecting that the reason might be what is described there:

"This error is caused by NSURLSession’s inability to process split messages normally. As long as the received WebSocket Message Frame is Fin=0, an error will occur."

There seems to be a workaround for this:

1. Navigate to: Settings > Safari > Advanced > Experimental Features
2. Set "NSURLSession WebSocket" to OFF and restart Safari (or the phone/tablet). This seems to fix the issue for now at least.
Comment 19 Mads Erik Forberg 2021-10-06 06:13:55 PDT
I can confirm that the disabling of "NSURLSession WebSocket" workaround works.
Comment 20 mattwindwer 2021-11-18 11:17:40 PST
We have a similar issue starting iOS 15 and also Safari 15.x on Mac.  The error we are getting is "[Error] WebSocket connection to 'wss://xxx' failed: WebSocket is closed before the connection is established." (xxx is our site).

Simply disconnecting from WiFi then reconnecting seems to fix it, but reloading the website does not. Our customers also report that restarting their devices fixes it. I do not know how to force reproduction of this bug but it is definitely happening to many of our customers since iOS 15 and the latest version of Safari. Is this still being investigated by the webkit team?
Comment 21 Alex Christensen 2021-11-18 13:13:48 PST
The "kNWErrorDomainPOSIX error 100" bug should've been fixed in iOS 15.1.  If it wasn't, please let me know, ideally with a way to reproduce the bug.

This is the first I've heard of the "WebSocket is closed before the connection is established" bug.  I'm happy to look into it if you have a way to reproduce the bug.  It sounds like something may be going on with TLS but I'd need more information to be sure.
Comment 22 mattwindwer 2021-11-22 16:15:53 PST
Created attachment 444989 [details]
Screencast of the bug

Alex, we are able to reproduce an issue on Safari 15.1 for Mac as well as the latest Safari Technology Preview Release 135. With "NSURLSession WebSocket" enabled (which is the default), Safari loses the ability to connect to our server over WebSockets at all after the computer sleeps, in the following scenario:

1. Visit our website, which establishes an active WebSocket connection to our server.
2. Walk away from the laptop (letting it sleep) for some time.
3. Return to the laptop, and when it awakens, Safari loses the ability to connect to WebSockets completely on our website.  Even reloading the website no longer establishes a WebSocket connection.

In order to resolve this, we have found that the user needs to do one of two things:

1. Exit Safari and re-open Safari.
2. Disconnect from WiFi and then reconnect (no need to exit Safari).

We are pretty sure this bug is also in Safari for iOS 15.x (including 15.1) based on customer reports.

I have attached a screencast demonstrating how WebSockets become broken on our website, even after reload, with "NSURLSession WebSocket" enabled when the above scenario occurs on Safari Technology Preview 135 using my M1 MacBook Air. The video shows the following:

1. Computer has woken from sleep after some time, and WebSocket connection (via the /cable endpoint) can no longer be established (client-side code continues to retry every 6 seconds).
2. After reloading the page, the connection still cannot be established.
3. After disabling "NSURLSession WebSocket", the connection is established immediately.
4. Reloading the page works fine with "NSURLSession WebSocket" disabled.
5. Re-enabling "NSURLSession WebSocket" kills the connection again, in perpetuity, until it is disabled.
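
A retry loop of the kind mentioned in step 1 can be sketched as follows, with logging of the close event details that may be useful to attach to a report. This is not the site's actual client code: `connectWithRetry`, `createSocket`, `schedule`, and `log` are hypothetical names, and only the 6-second delay is taken from the description above.

```javascript
// Open a socket and retry with a fixed delay whenever it closes.
// createSocket: function returning a WebSocket-like object (assumed, injected).
// schedule: setTimeout-compatible function (injected so the loop can be tested).
function connectWithRetry(createSocket, options = {}) {
  const { delayMs = 6000, schedule = setTimeout, log = console.log } = options;
  let attempts = 0;
  let stopped = false;

  function attempt() {
    if (stopped) {
      return;
    }
    attempts += 1;
    const attemptNumber = attempts;
    const socket = createSocket();
    socket.onopen = () => log(`attempt ${attemptNumber}: open`);
    socket.onclose = (event) => {
      // CloseEvent exposes code, reason, and wasClean.
      log(
        `attempt ${attemptNumber}: closed code=${event.code} ` +
          `reason="${event.reason}" wasClean=${event.wasClean}`
      );
      if (!stopped) {
        schedule(attempt, delayMs);
      }
    };
  }

  attempt();
  return {
    stop() {
      stopped = true;
    },
    getAttempts() {
      return attempts;
    },
  };
}

// Browser usage (the URL is a placeholder):
// connectWithRetry(() => new WebSocket("wss://example.invalid/cable"));
```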

Please ignore the Basic Auth prompts in the video as that is just for the JavaScript source maps since the web inspector is open.

I hope that this helps demonstrate the issue at hand. Please let me know if I can provide any further details to help in the investigation, or if a new bug report needs to be filed, and I will be happy to help in any way I can.
Comment 23 marc_aurel 2021-11-26 00:16:27 PST
This also happens in macOS Safari 15.1 (15.0 works fine) when connecting to a C# WebSocket server from https://github.com/ngld/OverlayPlugin and large chunks of data are sent.

I’ve also attached an image with the specific error if that helps…
Comment 24 marc_aurel 2021-11-26 00:17:13 PST
Created attachment 445174 [details]
Screenshot of the error
Comment 25 Jorge Manrubia 2021-12-01 04:26:54 PST
I could reproduce the problem exactly as described by Matt Windwer in https://bugs.webkit.org/show_bug.cgi?id=228296#c22.

We have been experiencing this bug in our Basecamp iOS app since the upgrade to iOS 15: when opening a chat and trying to send a message, it would sometimes fail to deliver it. We got several reports from customers about this.

The cause is that the underlying web sockets fail to connect with the same error:

> [Error] WebSocket connection to 'wss://xxx' failed: WebSocket is closed before the connection is established.

1. It happens intermittently when restoring the app from the background.
2. The only way to fix it is to restart the app. The app embeds a WKWebView instance, and reloading the page there won't fix the problem.

   I tried to debug the problem with Safari dev tools, with my device plugged in, and closing the web sockets and creating new ones from the console wouldn't fix the problem either.

3. Disconnecting the wifi, trying to send a message, and connecting it again fixed the issue.

Today, I was able to also reproduce the problem with Safari Technology Preview release 135 (macOS Monterey):

1. Open a campfire in Basecamp (you can create a free account at https://basecamp.com)
2. Put the computer to sleep.
3. Wait about 15 minutes (the problem needs some time to occur; it won't fail if you wake up the computer right away).
4. Come back and try to send a message. You will see exactly the same problem described by Matt and shown in his screencast. I'm attaching a screenshot.
Comment 26 Jorge Manrubia 2021-12-01 04:28:09 PST
Created attachment 445552 [details]
Screenshot showing the web socket error when using chats in Basecamp
Comment 27 mattwindwer 2021-12-01 08:53:24 PST
Hey Jorge, as a work-around, I found that closing the WebSocket connection manually when the page is backgrounded prevents this issue from occurring (instead of waiting for the browser/OS to close the connection some time after sleep, which seems to trigger the bad state that results in the inability to reconnect to WebSockets even after page reload).

In our case, using ActionCable:

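// `cable` is assumed to be the ActionCable consumer instance.
document.addEventListener('visibilitychange', () => {
  if (document.visibilityState === 'hidden') {
    // Close the WebSocket ourselves before the browser/OS closes it after sleep.
    cable.disconnect()
  } else {
    // Reconnect as soon as the page is visible again.
    cable.connect()
  }
})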

We are no longer getting customer reports after the above code went live. Still, this is a regression with "NSURLSession WebSocket" as previously mentioned.
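
For sites that use the WebSocket API directly rather than ActionCable, the same idea might look like the sketch below. It is untested against the bug itself; `installVisibilityWorkaround` and `createSocket` are hypothetical names, and the document object is passed in as a parameter only so the logic can be exercised outside a browser.

```javascript
// Close the socket when the page is hidden and open a fresh one when it is visible again.
// doc: the page's `document` (or a compatible object).
// createSocket: function returning a new WebSocket-like object (assumed, injected).
function installVisibilityWorkaround(doc, createSocket) {
  let socket = doc.visibilityState === "hidden" ? null : createSocket();

  function onVisibilityChange() {
    if (doc.visibilityState === "hidden") {
      if (socket) {
        // Close explicitly instead of letting the browser/OS drop the connection later.
        socket.close();
        socket = null;
      }
    } else if (!socket) {
      socket = createSocket();
    }
  }

  doc.addEventListener("visibilitychange", onVisibilityChange);
  return {
    currentSocket: () => socket,
    uninstall: () => doc.removeEventListener("visibilitychange", onVisibilityChange),
  };
}

// Browser usage (the URL is a placeholder):
// installVisibilityWorkaround(document, () => new WebSocket("wss://example.invalid/cable"));
```

The sketch does not handle connections dropped while the page is visible; a real client would still need its usual reconnect logic for that case.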
Comment 28 youenn fablet 2021-12-01 09:01:57 PST
@mattwindwer, if you can reproduce the issue, could you send me (youenn@apple.com) a sysdiagnose taken when it reproduces, along with the time at which it reproduced?
Comment 29 Sathiamoorthy 2021-12-02 01:53:00 PST
I'm facing the same issue in iOS 15.1, but I'm not able to reproduce it in iOS 14.1 (and possibly in any version lower than 15).

Is there any solution for this?
Comment 30 Sathiamoorthy 2021-12-02 01:55:44 PST
(In reply to Sathiamoorthy from comment #29)
> Facing the same issue in iOS 15.1, but not able to reproduce in iOS 14.1
> (may be lesser than 15 version).
> 
> Any solution for this ?

The same issue also happened in iOS 15.1 with Chrome (96.0.4664.53).
Comment 31 Jorge Manrubia 2021-12-02 03:30:21 PST
Matt, thank you so much for the workaround! We are giving that patch a try. I'll follow up once we have validated whether it works.
Comment 32 Jorge Manrubia 2021-12-03 04:28:37 PST
Matt, we are testing the workaround internally but we are still hitting the error. Could you confirm if it has fixed the problems for you for good?
Comment 33 mattwindwer 2021-12-03 10:52:52 PST
@Jorge

After deploying the workaround, our support requests regarding the issue stopped within a day. Previously we were getting a dozen or so requests daily of the nature "you need to force close and re-open Safari and/or the app" (our app is an embedded WKWebView using capacitor/cordova).

Since the workaround, I also haven't been able to reproduce the issue after many attempts on multiple devices (Mac & iPad) over the course of several days, except I think it did happen on one occasion, so I wouldn't say that it is foolproof.

I sent youenn fablet a sysdiagnose by reproducing the issue (without the workaround) along with the time of reproduction. Perhaps you can do the same to help them diagnose the issue.