Bug 195451 - Safari sometimes (1%) fails relay loopback test
Summary: Safari sometimes (1%) fails relay loopback test
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebRTC
Version: Safari 12
Hardware: iPhone / iPad iOS 12
: P2 Normal
Assignee: Nobody
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2019-03-08 01:17 PST by daginge
Modified: 2019-06-10 09:18 PDT

See Also:


Attachments
Sample call with the failure (72.43 KB, text/plain)
2019-03-08 01:17 PST, daginge

Description daginge 2019-03-08 01:17:01 PST
Created attachment 363997
Sample call with the failure

We're seeing a 1% failure rate when performing a relay loopback test on iOS devices running Safari 12 in production.

The issue is that one peer connection is able to get a relay candidate, while the other times out. Both are gathering at the same time, on the same device, so there is no obvious reason why only one peer connection would get a relay candidate and not the other. Both peer connections are configured with an iceTransportPolicy that only gathers relay candidates.

I have attached a sample log of this happening. It can be viewed with https://fippo.github.io/webrtc-dump-importer/rtcstats

As you can see from PC_0 and PC_1, they attempt to connect to each other, and PC_1 gets an onicecandidate as expected, while PC_0 just times out and generates a null candidate. Neither peer connection ever reaches the failed ICE connection state, although we would expect them to.
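
To make the symptom easier to spot in a log, here is a minimal sketch that classifies the onicecandidate events recorded for each peer connection. It is not code from our test. The helper names (`candidateType`, `classifyGathering`, `pcsMissingRelay`) are hypothetical, and it assumes each logged event is either a candidate line containing `typ <type>` or null for end-of-candidates.

```typescript
// Hypothetical helpers for inspecting a loopback log; not part of any WebRTC API.

/** One logged onicecandidate event: a candidate line, or null for end-of-candidates. */
type CandidateEvent = string | null;

type GatherOutcome = "relay-gathered" | "ended-without-relay" | "still-gathering";

/** Extracts the candidate type ("host", "srflx", "prflx", "relay") from a candidate line. */
function candidateType(line: string): string | undefined {
  const match = /\btyp\s+(\S+)/.exec(line);
  return match ? match[1] : undefined;
}

/** Classifies what one peer connection achieved during gathering. */
function classifyGathering(events: CandidateEvent[]): GatherOutcome {
  let ended = false;
  for (const ev of events) {
    if (ev === null) {
      ended = true; // null candidate: gathering finished
    } else if (candidateType(ev) === "relay") {
      return "relay-gathered";
    }
  }
  return ended ? "ended-without-relay" : "still-gathering";
}

/**
 * Returns the ids of peer connections that got no relay candidate
 * while at least one other peer connection in the same test did.
 */
function pcsMissingRelay(log: { [pcId: string]: CandidateEvent[] }): string[] {
  const ids = Object.keys(log);
  const anyRelay = ids.some((id) => classifyGathering(log[id]) === "relay-gathered");
  if (!anyRelay) {
    return []; // nobody got a relay candidate: a different failure mode
  }
  return ids.filter((id) => classifyGathering(log[id]) !== "relay-gathered");
}
```

For the attached call, feeding in the events of PC_0 (only a null candidate) and PC_1 (a relay candidate) would report PC_0 as the peer connection that missed its relay candidate.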

We have observed this in 1% of all loopback tests done with iOS 12 and Safari 12.x (we have the most data from 12.1.2 and 12.1.4).

Let me know if you need additional debug details. Hopefully the attached log gives enough insight into the issue to discover what's going on. Smells like a race condition to me...
Comment 1 daginge 2019-03-08 01:28:27 PST
I should also note that we're not seeing a correlation between specific relay servers here, or configs.

Our TURN servers are set up with coturn on Ubuntu, and configured to support UDP and TCP. We do supply two different origins for our TURN servers, one for turn.confrere.com, and one for turn-static.confrere.com. Not sure if this might trigger the issue currently. We don't have the volume yet to do large-scale testing like this.
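
For reference, a relay-only configuration with two TURN origins, each offered over UDP and TCP, might look roughly like the sketch below. This is not our production config. The function name `relayOnlyConfig`, the credentials, and the default port 3478 are assumptions for illustration; only the two hostnames and the relay-only policy come from this report.

```typescript
// Minimal local types mirroring the shape of RTCIceServer / RTCConfiguration,
// so the sketch does not depend on browser type definitions.
interface IceServer {
  urls: string[];
  username: string;
  credential: string;
}

interface RelayOnlyConfig {
  iceServers: IceServer[];
  iceTransportPolicy: "relay";
}

/**
 * Builds a relay-only configuration with one entry per TURN host,
 * each listing a UDP and a TCP transport URL.
 * The port default (3478) and the credentials are assumptions.
 */
function relayOnlyConfig(
  hosts: string[],
  username: string,
  credential: string,
  port: number = 3478
): RelayOnlyConfig {
  const iceServers = hosts.map((host) => ({
    urls: [
      `turn:${host}:${port}?transport=udp`,
      `turn:${host}:${port}?transport=tcp`,
    ],
    username,
    credential,
  }));
  // "relay" restricts gathering to relay candidates only.
  return { iceServers, iceTransportPolicy: "relay" };
}

// Placeholder credentials, not real ones.
const exampleConfig = relayOnlyConfig(
  ["turn.confrere.com", "turn-static.confrere.com"],
  "example-user",
  "example-secret"
);
```

In a browser, an object of this shape could be passed to `new RTCPeerConnection(exampleConfig)` for both ends of the loopback test.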
Comment 2 Radar WebKit Bug Importer 2019-03-08 14:50:32 PST
<rdar://problem/48726683>
Comment 3 youenn fablet 2019-06-10 09:18:50 PDT
Hi Daginge,

Is it also reproducible on macOS?
It would be good if we could reproduce it with libwebrtc logging enabled.
In the latest STP (Safari Technology Preview), this is enabled through the Web Inspector WebRTC Logging menu, and the output appears in the system console.