Bug 257965 - REGRESSION (iOS 16.4): Safari occasionally locks up and stops completing XHR requests
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKit2
Version: Safari 16
Hardware: iPhone / iPad Unspecified
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2023-06-12 09:44 PDT by Nick M
Modified: 2024-03-15 05:20 PDT
Description Nick M 2023-06-12 09:44:14 PDT
I am a developer at a company that hosts a web application that is heavily used by iOS users. Starting a couple months ago, we saw a decrease in conversion metrics.

After investigating, we discovered strange behavior affecting iOS users, starting with iOS 16.4 (also seen on iOS 16.5). At some point in the session, an API request would appear to hang and eventually time out (the application itself times out the request; the timeout does not come from iOS or Safari). From that point on, all XHR requests fail to resolve and eventually time out as well. We observed this behavior via a front-end event recording service (FullStory). After this happens, the tab seems to be "fouled" and all requests fail.

We are confident that network conditions are fine in the majority of these cases and that these are not "true" timeouts. We have been able to replicate the issue on personal iOS devices, but not reliably.

We believe this issue could be related to Bug 255524 due to new cookies often being set around the time the bug appears and the fact that we were not able to replicate the bug in a private window. However, we decided to make a new bug report as the impact we're seeing is different than what is reported in that thread.

Personally reproduced with:
 - iPhone 13 Pro running iOS 16.5

Additional info:
 - This bug could also be present in macOS Safari, as we've seen a couple of sessions that appear to show this behavior on those devices. However, we see much more traffic from mobile iOS, so we see it there most often.
 - The first call to hang/timeout is not always the same call, or even in the same user flow in the application. We also haven't seen this behavior on any other browser. Given this, we're confident the issue isn't with the application itself and is most likely a Safari bug.
 - I have seen this behavior from a user perspective while interacting with other sites day to day on my iPhone. A tab will lock up and be 'loading' the site indefinitely. The only thing that resolves the issue in this case is closing the tab and opening a new one.
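
For anyone trying to spot the same pattern in their own session recordings, here is a minimal sketch of how the "fouled" point could be located. It assumes each recorded request can be reduced to a start time and an outcome; the RequestEvent shape, the findFoulPoint name, and the minRun threshold are hypothetical and not part of FullStory or our application.

```typescript
type Outcome = "ok" | "timeout";

// Hypothetical, simplified view of one recorded request.
interface RequestEvent {
  startedMs: number; // when the request was started, in ms since session start
  outcome: Outcome;  // whether it completed or hit the application-level timeout
}

// Returns the first request of the trailing run of timeouts, i.e. the request
// after which nothing in the session succeeded again. Returns null if the
// session does not end in at least `minRun` consecutive timeouts.
function findFoulPoint(events: RequestEvent[], minRun = 3): RequestEvent | null {
  // Sort a copy by start time so the input order does not matter.
  const ordered = [...events].sort((a, b) => a.startedMs - b.startedMs);

  // Walk backwards over the trailing timeouts.
  let i = ordered.length;
  while (i > 0 && ordered[i - 1].outcome === "timeout") {
    i--;
  }

  const runLength = ordered.length - i;
  return runLength >= minRun ? ordered[i] : null;
}
```

An isolated timeout followed by later successes is ignored, which helps separate the locked-up tab from an ordinary slow request.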
Comment 1 Alexey Proskuryakov 2023-06-12 15:00:08 PDT
Thank you for the report! Would it be possible for you to provide steps to reproduce that we can follow, even if not 100%?

If that's not possible, could you please file a report with a sysdiagnose (taken in state) at https://feedbackassistant.apple.com?
Comment 2 Radar WebKit Bug Importer 2023-06-12 15:00:22 PDT
<rdar://problem/110668220>
Comment 3 Nick M 2023-06-13 13:37:08 PDT
Hey Alexey, thanks for the response.

Unfortunately I cannot share the site (which rules out reproduction steps).

I have spent time today trying to recreate the issue myself so I can file a report with a sysdiagnose. This has been unsuccessful.

We realized today that one version of the app seems to see the issue more than the other two (this is one app that runs as 3 different brands). We're investigating this in the hope of narrowing down the specific root cause that triggers the bug.

I realize this is not really enough information for y'all to investigate, I apologize. I plan to continue investigating this and trying to recreate personally. I will be sure to update this thread as we discover more information.
Comment 4 Nick M 2023-06-15 08:31:07 PDT
Hey there, I think we may have a lead on what could be causing the issue.

While researching the issue we found this article[1] talking about changes to how cookies are handled in iOS 16.4, specifically surrounding third party cookies and cookies using ‘CNAME cloaking.’ That article links to this WebKit PR[2] which discusses the change in detail.

Our application uses Okta as an Identity Provider, and utilizes the function ‘getWithoutPrompt’ in their SDK. According to their docs[3], this function is known to cause issues when it comes to third party cookie tracking prevention. Additionally, we use a CNAME record on our domain to direct traffic to Okta.

From what we’ve seen, the cookie is not being dropped from requests so we don’t believe the Okta method is broken or just being caught by tracking prevention. We are wondering if a bug was introduced with the iOS 16.4 change that is being triggered by our usage of Okta. 

We also found a post on the Apple community forums[4] that feels like the same behavior we’re seeing in our application.

[1] https://www.imore.com/security/apples-secret-safari-cookie-crackdown-could-have-unintended-consequences-for-your-logins

[2] https://github.com/WebKit/WebKit/pull/5347

[3] https://github.com/okta/okta-auth-js#third-party-cookies

[4] https://discussions.apple.com/thread/254879217
Comment 5 Nick M 2023-06-15 08:37:15 PDT
Ope, realized I missed an important detail. This behavior seems to regularly occur right after our call to Okta. This further leads us to believe that the call is in some way triggering the bug.
Comment 6 Joel 2024-03-15 05:20:20 PDT
We have users experiencing very similar issues:

> At some point in the session, a API request would appear to hang, eventually timing out (the application times out the request, this timeout is not from iOS or Safari itself). From that point on, all XHR requests fail to resolve and are eventually timed out as well. After this happens, the tab seems to be "fouled" and all requests fail.

Same behavior observed by us, with the difference that the floodgates are released at an arbitrary later point in time, sometimes even within the 60-second timeout, in which case no timeout occurs.

> We are confident network conditions are fine in a majority of these cases are these are not "true" timeouts.

Same here, fetch requests are both starting and completing just fine during this "freeze".

> The first call to hang/timeout is not always the same call, or even in the same user flow in the application. We also haven't seen this behavior on any other browser. Given this, we're confident the issue isn't with the application itself and is most likely a Safari bug.

Same observed by us.

Furthermore, we have client logs from the network tab, along with server logs, which together paint the story that Nick shared with you of XHR requests locking up. Summarized, they say:
1. Several XHR requests* are started, and stay in a pending state for a very long time.
2. Some fetch requests are performed in parallel, and they start and complete within reasonable time frames.
3. If 60 seconds pass, the XHR requests time out**. If not, a successful response is returned.
4. In case of a successful response, server logs indicate that no XHR request reached the server until very shortly before the response was sent.
5. Meanwhile, XHR requests from other clients (Windows Edge in this example) are handled just fine during the same time frame that the requests are held up, indicating no server issues.

*This affects XHR requests towards one domain only; requests targeting another domain work fine in the same time frame. Perhaps this lends some credit to Nick's cookie theories?

**In one case, the OPTIONS request was recorded on the server, but then the request timed out without performing the follow-up request, indicating the floodgates were released just before 60 seconds passed. Most of the time, there is no such trace, indicating that the timeout occurred before anything reached the server.
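
The comparison of client and server logs described above can be sketched roughly as follows. This is only an illustration under assumptions: it presumes each request carries a correlation id that appears in both logs and that client and server clocks are comparable. The ClientEntry and ServerEntry shapes, the classify function, and the 2000 ms default are hypothetical. The preflight-only case from the second footnote is not handled separately.

```typescript
// Hypothetical log shapes; real logs will look different.
interface ClientEntry {
  id: string;                // correlation id shared with the server log (assumed)
  startedMs: number;         // when the browser reports the request as started
  finishedMs: number | null; // null when the client-side timeout fired
}

interface ServerEntry {
  id: string;
  receivedMs: number;        // when the request first reached the server
}

type Verdict = "normal" | "held-then-released" | "never-reached-server";

// Classifies every client request by how long it took to reach the server.
function classify(
  clientLog: ClientEntry[],
  serverLog: ServerEntry[],
  maxSendDelayMs = 2000,
): Map<string, Verdict> {
  const received = new Map<string, number>();
  for (const entry of serverLog) {
    received.set(entry.id, entry.receivedMs);
  }

  const verdicts = new Map<string, Verdict>();
  for (const req of clientLog) {
    const receivedMs = received.get(req.id);
    if (receivedMs === undefined) {
      // No server trace at all: the request was never sent before the timeout.
      verdicts.set(req.id, "never-reached-server");
    } else if (receivedMs - req.startedMs > maxSendDelayMs) {
      // The server saw it long after the browser claims it started.
      verdicts.set(req.id, "held-then-released");
    } else {
      verdicts.set(req.id, "normal");
    }
  }
  return verdicts;
}
```

Requests classified as "held-then-released" correspond to point 4 above, and "never-reached-server" corresponds to the timeouts that leave no server trace.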

We have the option to migrate from XHR to fetch, so we are probably going to set up slow request tracking, then migrate to fetch, and then see if there is any measurable difference.
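
A minimal sketch of what such slow request tracking could look like, so that XHR and fetch can be compared before and after the migration. The SlowRequestTracker class, its method names, and the threshold are hypothetical; this is not our production code, and the clock is injectable only to make the sketch easy to test.

```typescript
type Transport = "xhr" | "fetch";

interface RequestTiming {
  label: string;
  transport: Transport;
  durationMs: number;
  ok: boolean;
}

class SlowRequestTracker {
  readonly slow: RequestTiming[] = [];
  private readonly thresholdMs: number;
  private readonly now: () => number;

  constructor(thresholdMs: number, now: () => number = () => Date.now()) {
    this.thresholdMs = thresholdMs;
    this.now = now;
  }

  // Records one finished request; only requests at or above the threshold are kept.
  record(label: string, transport: Transport, startedAt: number, ok: boolean): void {
    const durationMs = this.now() - startedAt;
    if (durationMs >= this.thresholdMs) {
      this.slow.push({ label, transport, durationMs, ok });
    }
  }

  // Wraps any promise-returning request and records how long it took,
  // whether it resolved or rejected (for example on a timeout).
  async track<T>(label: string, transport: Transport, run: () => Promise<T>): Promise<T> {
    const startedAt = this.now();
    try {
      const result = await run();
      this.record(label, transport, startedAt, true);
      return result;
    } catch (err) {
      this.record(label, transport, startedAt, false);
      throw err;
    }
  }

  // Number of slow requests per transport, for a before/after comparison.
  countByTransport(): Record<Transport, number> {
    const counts: Record<Transport, number> = { xhr: 0, fetch: 0 };
    for (const timing of this.slow) {
      counts[timing.transport] += 1;
    }
    return counts;
  }
}
```

A call site could then look like `tracker.track("/api/orders", "fetch", () => fetch("/api/orders"))`, where the URL is a placeholder, with the existing XHR helper wrapped the same way under the "xhr" transport.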