Bug 182906 - ICE Gathering never completes due to srflx candidate
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebRTC
Version: Safari 11
Hardware: Macintosh macOS 10.13
Importance: P2 Normal
Assignee: Nobody
Reported: 2018-02-17 15:15 PST by Thomas Mullen
Modified: 2018-08-20 23:57 PDT

Attachments
rtcstats-dump of failing call where no srvflx candidate was emitted (129.65 KB, text/plain)
2018-08-20 23:53 PDT, daginge

Description Thomas Mullen 2018-02-17 15:15:32 PST
Safari 11 intermittently waits for a srflx candidate that never arrives. Because of this, the null candidate is never delivered and ICE gathering never completes. You can still connect if you trickle ICE and ignore the missing completion signal, but connecting with trickle disabled is broken.
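
To illustrate why the missing null candidate breaks non-trickle setups, here is a minimal sketch of how such a client typically waits for gathering to finish, with a timeout added as one possible fallback. The function name `waitForIceGathering` and the `timeoutMs` parameter are assumptions made for illustration; they are not part of the fiddle or of any library.

```javascript
// Sketch only: resolves when the null candidate arrives, or after timeoutMs.
// `waitForIceGathering` and `timeoutMs` are hypothetical names.
function waitForIceGathering(pc, timeoutMs) {
  return new Promise((resolve) => {
    if (pc.iceGatheringState === 'complete') {
      resolve({ timedOut: false })
      return
    }
    let timer = null
    const finish = (timedOut) => {
      clearTimeout(timer)
      pc.removeEventListener('icecandidate', onCandidate)
      resolve({ timedOut })
    }
    const onCandidate = (event) => {
      // A null candidate is the end-of-gathering signal that goes missing here.
      if (event.candidate === null) finish(false)
    }
    pc.addEventListener('icecandidate', onCandidate)
    // Fallback: stop waiting so the caller can proceed with what was gathered.
    timer = setTimeout(() => finish(true), timeoutMs)
  })
}
```

If the promise resolves with `timedOut: true`, the caller could still send `pc.localDescription`, which contains whatever candidates were gathered up to that point (host only, in the failing case).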

Normal operation (on my network), gives me two candidates:
```
candidate:2222700650 1 udp 2113937151 192.168.1.105 50465 typ host generation 0 ufrag Cn1a network-cost 50
candidate:842163049 1 udp 1677729535 147.194.220.130 50465 typ srflx raddr 192.168.1.105 rport 50465 generation 0 ufrag Cn1a network-cost 50
```

However, around 1 in 10 times (by my estimate), Safari never delivers the second (srflx) candidate and just sits in the gathering state. The null candidate never arrives either.
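
For logging which candidate types actually arrived in a given run, a small helper along these lines may be useful. It is only a sketch: the names `candidateType`, `countCandidateTypes` and `hasServerReflexive` are made up for illustration, and it only reads the `typ` field of candidate lines shaped like the two shown above.

```javascript
// Sketch only: helper names are hypothetical.
// Returns the value after "typ" (e.g. "host", "srflx"), or null if absent.
function candidateType(candidateLine) {
  const match = / typ (\S+)/.exec(candidateLine)
  return match ? match[1] : null
}

// Counts candidates per type, e.g. { host: 1, srflx: 1 }.
function countCandidateTypes(candidateLines) {
  const counts = {}
  for (const line of candidateLines) {
    const type = candidateType(line) || 'unknown'
    counts[type] = (counts[type] || 0) + 1
  }
  return counts
}

// True if at least one server-reflexive candidate was seen.
function hasServerReflexive(candidateLines) {
  return candidateLines.some((line) => candidateType(line) === 'srflx')
}
```

In the handler, `event.candidate.candidate` holds the line to pass in; a run that stalls would report no `srflx` entry.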

I've tried it with a variety of STUN servers, networks and machines. This issue only occurs when using STUN.


Code to reproduce:
```
var config = {
  iceTransports: 'all',
  iceServers: [
    { urls: 'stun:stun.l.google.com:19302' }
  ]
}
var pc = new RTCPeerConnection(config)

// Attach the handler before gathering starts so no candidate is missed.
// A null event.candidate means gathering has completed.
pc.onicecandidate = (event) => {
  console.log(event.candidate)
}

pc.createDataChannel('test')
pc.createOffer()
  .then(offer => pc.setLocalDescription(offer))
  .catch(err => console.error(err))
```

Here is a JS fiddle demonstrating how it breaks connection attempts without ice trickle (it will repeat the test multiple times and eventually stall after ~10 runs). Chrome and Firefox run this fiddle perfectly fine (they never stall on gathering).
http://jsfiddle.net/hvkn8b5e/6/

Disabling ice candidate restrictions, requesting media capture access, http/https, and restarting Safari have no effect.
Comment 1 Thomas Mullen 2018-02-17 15:58:11 PST
I can confirm through Wireshark that the STUN server is sending the srflx candidate. Safari just doesn't emit the event.
Comment 2 youenn fablet 2018-03-27 14:54:15 PDT
(In reply to Thomas Mullen from comment #1)
> I can confirm through Wireshark that the STUN server is sending the srflx
> candidate. Safari just doesn't emit the event.

The issue with http://jsfiddle.net/hvkn8b5e/6/ is that the peer connection should be closed so that clean-up happens and its sockets are closed. Otherwise, we run out of sockets and stop doing anything.
If you explicitly close the pc, it seems to run fine.
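
As a rough sketch of what closing explicitly could look like in a repeated test, each run can close its connection in a `finally` block. The names `runOnce`, `createPc` and `gather` are hypothetical and are not taken from the fiddle.

```javascript
// Sketch only: `runOnce`, `createPc` and `gather` are hypothetical names.
// createPc: () => RTCPeerConnection, gather: (pc) => Promise<result>
async function runOnce(createPc, gather) {
  const pc = createPc()
  try {
    return await gather(pc)
  } finally {
    // Always close, even if gathering fails, so sockets are released
    // before the next run starts.
    pc.close()
  }
}
```

A test loop would then await `runOnce(...)` for each iteration instead of leaving earlier connections open.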

Thomas, we just updated to a new libwebrtc endpoint on STP 52.
I am wondering whether you could do some additional testing here.
Comment 3 Thomas Mullen 2018-03-27 16:47:42 PDT
(In reply to youenn fablet from comment #2)

I usually hit the stall before running out of peer connections, but you're right.
http://jsfiddle.net/hvkn8b5e/8/

The stall on the srflx still occurs, even when closing after each test.

I'll try STP 52.
Comment 4 daginge 2018-08-20 23:53:30 PDT
Created attachment 347617 [details]
rtcstats-dump of failing call where no srvflx candidate was emitted

We are seeing some instances of this as well. I have attached an rtcstats dump from a failing call done on iOS towards Chrome 68. Please open in https://fippo.github.io/webrtc-dump-importer/rtcstats

The issue is that, as you can see, onicecandidate is called twice in the initial attempt from the iOS side, generating only host candidates. This causes the call to fail after 15 seconds. Upon an ICE restart (initiated by iOS), it works just fine and gathers candidates as normal.

Any idea what's going on here? Unfortunately I don't have a perfect repro, as I have yet to discover the cases where this happens consistently. I suspect it might be related to https://bugs.webkit.org/show_bug.cgi?id=181009 but, funnily enough, I don't have an IPv6 network to test on.
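
For reference, the ICE restart mentioned above can be sketched roughly as follows. This is not the code from the failing call; `restartIceIfFailed` and the `sendOffer` signaling callback are hypothetical names used only for illustration.

```javascript
// Sketch only: `restartIceIfFailed` and `sendOffer` are hypothetical.
async function restartIceIfFailed(pc, sendOffer) {
  if (pc.iceConnectionState !== 'failed') return false
  // iceRestart: true makes the new offer carry fresh ICE credentials,
  // which starts a new round of candidate gathering.
  const offer = await pc.createOffer({ iceRestart: true })
  await pc.setLocalDescription(offer)
  sendOffer(pc.localDescription)
  return true
}
```

It could be called from an `iceconnectionstatechange` handler, so a restart is attempted only once the connection has actually failed.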
Comment 5 daginge 2018-08-20 23:57:13 PDT
Note that my example dump does use trickle, which leads me to believe the problem is not limited to setups with trickle disabled.