Bug 183331 - [webrtc] Receiving an IDR frame under packet loss conditions can cause unrecoverable video distortions
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebRTC
Version: Safari 11
Hardware: All, OS: All
Importance: P2 Major
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-03-05 06:49 PST by andj2223
Modified: 2018-03-09 03:52 PST

See Also:


Description andj2223 2018-03-05 06:49:39 PST
Using Safari on the desktop, do the following:
1. Visit the example URL at <redacted> and log in with the <redacted> username/password. (Youenn, check your email for this info; any other developers who want it, just let me know.)
2. Once you're logged in, you should see a character spinning around. Packet loss is simulated at 10% on the server, so there is no need to simulate it locally.
3. Press Alt+i. This will send you an IDR. You may notice corruption/distortion right away; if you don't, keep pressing Alt+i. It should take only 6-7 keypresses to reproduce the distortion. You can confirm that each keypress is registered by looking for the "!!! REQUESTING IDR !!!" message in the Web Inspector console.
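An IDR frame is usually much larger than a predicted frame, so it is split across many RTP packets, and even modest random loss hits it often. A rough sketch, assuming each packet is dropped independently with probability 0.10 (the repro server's actual loss model is not described here) and purely illustrative packet counts: the chance that an N-packet frame loses at least one packet is 1 - (1 - p)^N.

```python
def p_frame_damaged(num_packets: int, loss_rate: float = 0.10) -> float:
    """Probability that at least one of `num_packets` RTP packets is lost.

    Assumes independent (Bernoulli) loss per packet. This is an
    illustrative model, not the repro server's actual loss simulator.
    """
    if num_packets < 0 or not 0.0 <= loss_rate <= 1.0:
        raise ValueError("invalid arguments")
    return 1.0 - (1.0 - loss_rate) ** num_packets


if __name__ == "__main__":
    # Hypothetical packet counts; real IDR sizes depend on resolution and bitrate.
    for n in (1, 5, 20, 50):
        print(f"{n:3d} packets -> {p_frame_damaged(n):.1%} chance of at least one loss")
```

Under this model a 20-packet IDR loses at least one packet roughly 88% of the time, which is why the handling of partially received IDR frames matters so much.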

Actual behavior: The distortion is not corrected within a reasonable time and persists for quite a while.
Expected behavior: Either no distortion at all, or very brief distortion that gets corrected.

This is quite a serious issue, because any packet loss that occurs during an IDR can cause permanent corruption and distortion in the video stream.

Notes:
* Note that this example delivers SPS/PPS out-of-band via the SDP sprop-parameter-sets parameter. The problem also occurs with in-band SPS/PPS. The purpose of delivering SPS/PPS out-of-band over a reliable transport in this repro is to completely eliminate corruption of SPS/PPS NAL unit packets as a potential source of the problem, so we can focus on where the real issue probably is: the code that handles lost packets of an IDR slice.
* The problem also occurs on iOS devices, but there is no convenient hotkey for requesting an IDR on iOS. To reproduce the problem on iOS, perform steps 1-2 on a desktop and leave the page open, then perform steps 1-2 on an iOS device. With both devices consuming the same stream, press Alt+i on the desktop and the IDR will also be sent to the iOS device, where you should see the same kind of corruption.
* The problem does not occur when there is no packet loss.
* The issue also occurs in Chrome, so the problem may be within libwebrtc.
* There is also a somewhat related issue filed on the webrtc project, https://bugs.chromium.org/p/webrtc/issues/detail?id=8423. However, I believe they are currently a bit confused about the real source of the problem: they seem to think the issue is caused by lost SPS/PPS, whereas I'm using out-of-band SPS/PPS here. The problem is therefore more likely caused by mishandling of lost IDR frame packets and has nothing to do with lost SPS/PPS. So issue 8423 is useful reading, but it can also be a bit confusing because of the (unintentional) "lost SPS/PPS" red herring. I've updated 8423 with notes trying to convince them to look more deeply into the lost IDR packet problem.
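To confirm what the out-of-band parameter sets actually contain, the sprop-parameter-sets value can be decoded directly. Per RFC 6184 it is a comma-separated list of base64-encoded NAL units, and the NAL unit type is the low 5 bits of each unit's first byte (7 = SPS, 8 = PPS). A minimal sketch; the function name and the sample fmtp string are illustrative, not taken from the repro page.

```python
import base64

# H.264 NAL unit types (H.264 Table 7-1) relevant to this bug.
NAL_TYPE_NAMES = {1: "non-IDR slice", 5: "IDR slice", 7: "SPS", 8: "PPS"}


def parse_sprop_parameter_sets(fmtp_params: str) -> list[tuple[int, bytes]]:
    """Return (nal_unit_type, nal_bytes) pairs from an H.264 a=fmtp parameter
    string such as 'packetization-mode=1;sprop-parameter-sets=...'.

    Hypothetical helper: per RFC 6184, sprop-parameter-sets is a
    comma-separated list of base64-encoded NAL units (normally SPS and PPS).
    """
    for param in fmtp_params.split(";"):
        # partition() splits on the first '=' only, so base64 padding survives.
        key, _, value = param.strip().partition("=")
        if key.strip().lower() == "sprop-parameter-sets":
            nalus = [base64.b64decode(item.strip()) for item in value.split(",") if item.strip()]
            # nal_unit_type is the low 5 bits of the 1-byte NAL header.
            return [(nalu[0] & 0x1F, nalu) for nalu in nalus]
    return []


if __name__ == "__main__":
    # Sample values for illustration only.
    fmtp = "packetization-mode=1;sprop-parameter-sets=Z0IAH5WoFAFuQA==,aM48gA=="
    for nal_type, nalu in parse_sprop_parameter_sets(fmtp):
        print(NAL_TYPE_NAMES.get(nal_type, "other"), nalu.hex())
```

For the sample string, the first unit decodes to an SPS and the second to a PPS.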
Comment 1 andj2223 2018-03-05 07:00:13 PST
Another quick note:
Sometimes you will see corruption immediately, without ever pressing Alt+i. This is the same issue; it just happens that the very first IDR frame lost one or more packets.
Comment 2 andj2223 2018-03-09 03:52:38 PST
I wanted to provide an update on this:

I was able to work around this issue by configuring the sender-side encoder to produce a short-duration intra refresh after every IDR. This is certainly not ideal, since more data is sent than should be necessary, but it works for now.
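Intra refresh spreads intra-coded data over several frames instead of sending it all in one IDR. Encoders expose this through their own, vendor-specific settings; the sketch below only models one common pattern, assuming a left-to-right sweep of 16x16 macroblock columns where each column is intra-coded exactly once per refresh period. The function name is hypothetical.

```python
def intra_refresh_schedule(width_px: int, refresh_frames: int) -> list[tuple[int, int]]:
    """For each frame of one refresh period, return the half-open range
    [start, end) of macroblock columns to force-encode as intra.

    Simplified model of a column-sweep intra refresh, not any encoder's API.
    Every column is covered exactly once across `refresh_frames` frames.
    """
    if width_px <= 0 or refresh_frames <= 0:
        raise ValueError("width and frame count must be positive")
    mb_cols = (width_px + 15) // 16  # round up to whole 16-pixel macroblocks
    return [
        (i * mb_cols // refresh_frames, (i + 1) * mb_cols // refresh_frames)
        for i in range(refresh_frames)
    ]


if __name__ == "__main__":
    for frame, (start, end) in enumerate(intra_refresh_schedule(1280, 10)):
        print(f"frame {frame}: intra columns {start}..{end - 1}")
```

For a 1280-pixel-wide stream refreshed over 10 frames, this yields ten ranges of 8 columns each, from (0, 8) to (72, 80). The extra intra-coded macroblocks in those frames are the overhead the workaround pays.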

Ultimately, this is an issue that libwebrtc can probably fix (perhaps by enabling the SpsPpsIdrIsH264Keyframe field trial by default), but they're not sure exactly which method to use because of how it might impact other streams. You can find much more information in the following bug:
https://bugs.chromium.org/p/webrtc/issues/detail?id=8536

Particularly comment 28 onward.

These also contain some relevant information:
https://bugs.chromium.org/p/webrtc/issues/detail?id=8423
https://bugs.chromium.org/p/webrtc/issues/detail?id=8923
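The SpsPpsIdrIsH264Keyframe field trial mentioned above changes which received H.264 frames libwebrtc treats as keyframes. A simplified model of that rule, assuming the receiver knows the set of NAL unit types in each reassembled frame; the names are hypothetical and this is not libwebrtc's code.

```python
# H.264 NAL unit types (H.264 Table 7-1).
NAL_IDR_SLICE = 5
NAL_SPS = 7
NAL_PPS = 8


def is_h264_keyframe(nal_types: set[int], sps_pps_idr_is_keyframe: bool) -> bool:
    """Simplified model of the keyframe rule toggled by the field trial.

    Default: an IDR slice alone makes the frame a keyframe.
    Trial enabled: the frame must also carry an SPS and a PPS.
    """
    has_idr = NAL_IDR_SLICE in nal_types
    if sps_pps_idr_is_keyframe:
        return has_idr and NAL_SPS in nal_types and NAL_PPS in nal_types
    return has_idr


if __name__ == "__main__":
    idr_only = {NAL_IDR_SLICE}
    print("default:", is_h264_keyframe(idr_only, False))
    print("trial enabled:", is_h264_keyframe(idr_only, True))
```

The stricter rule is what might affect other streams, for example senders that never repeat SPS/PPS alongside their IDR slices.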

Note:
This could perhaps also be fixed on the encoder side (I have not confirmed this) by ensuring that every IDR frame is accompanied by a new SPS/PPS with an updated sps_id and pps_id. Unfortunately, though, many hardware encoders keep sps_id and pps_id at 0 even after an IDR.
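Whether a given encoder ever changes these IDs can be checked by logging them from the SPS and PPS NAL units it emits (without start codes or RTP headers). A minimal sketch following the H.264 syntax: seq_parameter_set_id is the first ue(v) Exp-Golomb field after profile_idc, the constraint/reserved byte and level_idc, and a PPS begins with pic_parameter_set_id followed by seq_parameter_set_id. The helper names are hypothetical.

```python
class BitReader:
    """Reads bits MSB-first from an RBSP byte string (IndexError if truncated)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # position in bits

    def read_bits(self, n: int) -> int:
        value = 0
        for _ in range(n):
            byte = self.data[self.pos // 8]
            value = (value << 1) | ((byte >> (7 - self.pos % 8)) & 1)
            self.pos += 1
        return value

    def read_ue(self) -> int:
        """Unsigned Exp-Golomb code, written ue(v) in the H.264 syntax tables."""
        leading_zeros = 0
        while self.read_bits(1) == 0:
            leading_zeros += 1
        return (1 << leading_zeros) - 1 + self.read_bits(leading_zeros)


def to_rbsp(payload: bytes) -> bytes:
    """Drop emulation_prevention_three_byte (0x03 following two 0x00 bytes)."""
    out = bytearray()
    zeros = 0
    for b in payload:
        if zeros >= 2 and b == 0x03:
            zeros = 0
            continue
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


def sps_id(sps_nal: bytes) -> int:
    """Return seq_parameter_set_id from an SPS NAL unit."""
    r = BitReader(to_rbsp(sps_nal[1:]))  # skip the 1-byte NAL header
    r.read_bits(24)  # profile_idc, constraint/reserved flags, level_idc
    return r.read_ue()


def pps_ids(pps_nal: bytes) -> tuple[int, int]:
    """Return (pic_parameter_set_id, seq_parameter_set_id) from a PPS NAL unit."""
    r = BitReader(to_rbsp(pps_nal[1:]))
    pps = r.read_ue()
    return pps, r.read_ue()


if __name__ == "__main__":
    # Sample bytes for illustration, not captured from a real encoder.
    sps = bytes.fromhex("6742001f95a814016e40")
    pps = bytes.fromhex("68ce3c80")
    print("sps_id:", sps_id(sps), "pps ids:", pps_ids(pps))
```

Both IDs decode to 0 for these sample bytes; an encoder following the suggestion above would instead show new values accompanying each IDR.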