Bug 194747

Summary: [MSE] SourceBuffer sample time increment vs. last frame duration check is broken
Product: WebKit Reporter: up
Component: MediaAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Blocker CC: commit-queue, eric.carlson, jer.noble, lietz, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: Safari Technology Preview   
Hardware: Unspecified   
OS: All   
Attachments:
Description Flags
Test case file for the reported issue
none
Expected output file for the reported issue
none
Correct test case file for the reported issue
none
Patch Rev1
none
Patch Rev2
none
Patch Rev3 none

Description up 2019-02-16 09:06:35 PST
Technical Severity 

This issue is breaking MSE playback for totally MSE compliant streams, 
in our case fMP4/H.264/AAC, but not limited to this container or codecs. 
Effects are playback not starting at all or significantly delayed and  
massive amount of frame/sample drop mid stream forced by a 
wrong a MSE compliance check.  

Business Severity 

If the issue will make it into Safari 12.2, 
MSE playback will be broken in Safari for our cdn,  
business leading customers of ours and 
other MSE applications and users. 
We would like to offer assistance and developer time 
to prevent this. 

Code Location and Context 

Location of the issue is SourceBuffer::sourceBufferPrivateDidReceiveSample 
in WebCore/Modules/mediasource/SourceBuffer.cpp, 
in the second part of this conditional statement related to trackBuffer.greatestDecodeDuration. 
The problem is how trackBuffer.greatestDecodeDuration is set/calculated. 
It is mainly a logical issue. Please find more in the details section. 

// 1.6 ↳ If last decode timestamp for track buffer is set and decode timestamp is less than last
// decode timestamp:
// OR
// ↳ If last decode timestamp for track buffer is set and the difference between decode timestamp and
// last decode timestamp is greater than 2 times last frame duration:
if (trackBuffer.lastDecodeTimestamp.isValid() && (decodeTimestamp < trackBuffer.lastDecodeTimestamp
   || (trackBuffer.greatestDecodeDuration.isValid() && abs(decodeTimestamp - trackBuffer.lastDecodeTimestamp) > (trackBuffer.greatestDecodeDuration * 2)))) 

Bug history 

We have first discovered the issue in Safari Technology preview 75. 
We have been able to verify that it exists already in STP 72, 
and that it did NOT exist in STP 61. Further it does not exist in 
the current Safari 12.1 release. 

Checking out WebKit itself we have been able to recreate the issue 
in WebKit nightly 2019 Feb. 14th. 

We have checked in between [MEDIA] commits and found this one: 
https://trac.webkit.org/changeset/236314/webkit/

According to the description the trackBuffer.greatestDecodeDuration check 
has been hidden behind a smaller bug related to the assignment. 
By fixing this the major issue with trackBuffer.greatestDecodeDuration 
became active. 

Details 

Here is how we understand the timestamp increment vs last frame duration check 
in the MSE spec and how it is done in the current implementation. 

Example frame/sample sequence that should not fail (but currently does): 

Frame A) dts:0, duration 10 
Frame B) dts:10, duration 90 
Frame C) dts:100, duration 10 
Timestamp increase and previous frame duration are perfectly aligned. 

Example frame/sample sequence that should fail: 

Frame A) dts:0, duration 10 
Frame B) dts:10, duration 10 
Frame C) dts:100, duration 10 
Discontinuity due to mismatch from B to C. 

What should be compared is (C.dts - B.dts) vs. B.duration .

Current impl. is comparing (C.dts - B.dts) vs. (B.dts - A.dts) . 
(involving a maximum value storage but that does not matter here) 

Instead of comparing current timestamp increment vs. last frame duration 
it's comparing current timestamp increment vs. previous timestamp increment, 
which makes a huge difference, because the target here is not to check 
how constant the frame rate is, but to detect discontinuities. 
Not sure about other containers, but in fMP4 the timestamp and duration 
of a frame are stored separately and explicitely. 

Implications seem to be even stranger for audio in fMP4, 
where one sample can hold multiple audio frames. 
In this case the current check will apply constraints to 
how many audio frames one sample contains and how it 
compares to the number of audio frames in the previous sample, 
which is obviously not what this check should be about. 

Proposal for a 'simple' fix 

Logically the solution would be to compare against trackBuffer.lastFrameDuration 
if valid instead of trackBuffer.greatestDecodeDuration. 

Technically both approaches could be combined to keep changes at a minimum, 
but no sample should be rejected if trackBuffer.lastFrameDuration is not valid. 
It must be ensured that, if valid, trackBuffer.lastFrameDuration holds the duration 
of the last frame/sample, not of any previous ones. 

Thanks in advance!
Comment 1 Jer Noble 2019-02-17 12:05:27 PST
@up, are you describing “decode time stamp” with “dts” or “display time stamp”? Also, decode duration or display duration?
Comment 2 Jer Noble 2019-02-17 12:10:28 PST
Also, we have a pretty capable “mock” MSE implementation for running layout tests that isn’t dependent on any particular demuxer/decoder. I wonder if we could translate your examples into a LayoutTest and verify that it is indeed failing the way you describe. 

FWIW, for audio samples, we have a way of “splitting” a large sample group into smaller ones for the purposes of overlap avoidance. So the “large sample” issue should be less of a problem in practice. 

FWIW #2, other container formats (most notably Matroshka) are much less rigorous than ISO BMFF when it comes to timing info.
Comment 3 Jer Noble 2019-02-17 12:12:12 PST
@up, do you have a public ally available trstcase that demonstrates the issue?
Comment 4 Radar WebKit Bug Importer 2019-02-17 12:12:47 PST
<rdar://problem/48148469>
Comment 5 up 2019-02-18 03:40:57 PST
@jer Thank you for the fast response. I was referring to decode time and decode duration, as this is what is checked in this case. dts vs. pts does not really matter, or is a more special question, because for video without frame reordering, e.g. H.264 Baseline Profile, dts and pts are the same, even if there is a constant offset sometimes, and will break just as described. 

Sorry, I'm not familiar (yet) with the tests in WebKit. I have seen the mock source, but not 100% sure how it works. Can somebody assist to model the first frame sequence I have described in a mock source test? 

Regarding a test case, I will try to provide one, but as I said the issue can be understood logically and that's what I tried to explain in the report. 
I suggest to focus on creating the mock test first. 

Regarding the split up of large audio sample groups, what are the limits that will cause the split up? Problem is that especially in low latency live streaming use cases, the fMP4 fragments are kept pretty small, might be even video frame length. 

Really appreciating your help.
Comment 6 Jer Noble 2019-02-18 14:27:36 PST
(In reply to up from comment #5)
> @jer Thank you for the fast response. I was referring to decode time and
> decode duration, as this is what is checked in this case. dts vs. pts does
> not really matter, or is a more special question, because for video without
> frame reordering, e.g. H.264 Baseline Profile, dts and pts are the same,
> even if there is a constant offset sometimes, and will break just as
> described. 
> 
> Sorry, I'm not familiar (yet) with the tests in WebKit. I have seen the mock
> source, but not 100% sure how it works. Can somebody assist to model the
> first frame sequence I have described in a mock source test? 

Sure, here's a representative sort of test case for this kind of test:

https://trac.webkit.org/browser/webkit/trunk/LayoutTests/media/media-source/media-source-dropped-iframe.html

Which has the expected results of:

https://trac.webkit.org/browser/webkit/trunk/LayoutTests/media/media-source/media-source-dropped-iframe-expected.txt

You can see the interface to "makeAInitSegment()" and "makeASample()" here:

https://trac.webkit.org/browser/webkit/trunk/LayoutTests/media/media-source/mock-media-source.js

From your initial example, you would need something like:

	var samples = concatenateSamples(
		makeASample(0, 0, 10, 100, 1, SAMPLE_FLAG.SYNC, 1),
		makeASample(10, 10, 90, 100, 1, SAMPLE_FLAG.SYNC, 1),
		makeASample(100, 100, 10, 100, 1, SAMPLE_FLAG.SYNC, 1),
	);

..for a sample sequence of DTS,Duration [{0,10}, {10, 90}, {100, 10}], all with a timebase of 100 where each sample was an I-frame.

> Regarding a test case, I will try to provide one, but as I said the issue
> can be understood logically and that's what I tried to explain in the
> report. 
> I suggest to focus on creating the mock test first. 
> 
> Regarding the split up of large audio sample groups, what are the limits
> that will cause the split up? Problem is that especially in low latency live
> streaming use cases, the fMP4 fragments are kept pretty small, might be even
> video frame length. 

I believe in theory that on platforms using CoreMedia (i.e., macOS and iOS), a composite audio sample can be divided up to the sample rate.  So a composite sample with a DTS,Duration of {0,44100} and a timescale of 44100 (which is also the audio track's sample rate) could be cleanly divided into two samples of {0, 12345} and {12345, 31755}. (Note that this will probably never line up perfectly with the timescale of a video track, whose own timebases tend to be not be based on the audio track's sample rate.)
Comment 7 up 2019-02-19 10:55:42 PST
Created attachment 362395 [details]
Test case file for the reported issue
Comment 8 up 2019-02-19 10:56:20 PST
Created attachment 362396 [details]
Expected output file for the reported issue
Comment 9 up 2019-02-19 10:58:29 PST
Created attachment 362398 [details]
Correct test case file for the reported issue
Comment 10 up 2019-02-19 11:08:05 PST
@jer I have created a simple test case and expected output file. 
The expectation is that none of the added samples gets dropped. 
The case will fail with current webkit version due to the describe issue. 
Sample 3 and all following samples will get dropped until the next sync sample, which might take multiple seconds and might repeat for the next group of pictures/samples.
Comment 11 up 2019-02-19 16:30:43 PST
@jer What's the best way to proceed? I can propose a patch that will maintain the current logic as far as possible while fixing the issue.?
Comment 12 Jer Noble 2019-02-20 07:08:55 PST
(In reply to up from comment #11)
> @jer What's the best way to proceed? I can propose a patch that will
> maintain the current logic as far as possible while fixing the issue.?

@up, sure, that'd be fine! When you propose a patch, the EWS (early warning system) bots will build the patch and run all the existing tests, including all the MSE ones to ensure that there are no regressions to previous fixes.
Comment 13 up 2019-02-22 14:29:09 PST
Created attachment 362766 [details]
Patch Rev1

@jer I have added a patch. Code only, so without test case files and change log yet. I will add more comments on the idea of the patch. Is this ok for now, or should I submit the complete patch already before review?
Comment 14 Jer Noble 2019-02-22 14:37:10 PST
(In reply to up from comment #13)
> Created attachment 362766 [details]
> Patch Rev1
> 
> @jer I have added a patch. Code only, so without test case files and change
> log yet. I will add more comments on the idea of the patch. Is this ok for
> now, or should I submit the complete patch already before review?

@up I think what you've done so far is fine. You can signal that your patch is ready for review by setting the "r?" flag on the attachment, but attaching "work in progress" patches like yours is perfectly acceptable. Next step is to force the EWS bots to try to apply, build, and test your patch. The EWS bots will automatically do so for any "r?" patch, but for "work in progress" patches, you have to do that manually by clicking the "Submit for EWS analysis" button next to the attachment.
Comment 15 up 2019-02-22 16:20:22 PST
Created attachment 362781 [details]
Patch Rev2
Comment 16 up 2019-02-25 09:45:13 PST
@jer Patch Rev2 has passed the EWS. Intentions:  

- Keeping changes local, compact and clear 
- Avoiding to modify the state of the SourceBuffer object 
- Not mixing time differences and durations in greatestDecodeDuration in general
- Avoiding to make the final conditional statement more complex and nested 
- Respecting the current greatestDecodeDuration approach and complementing it only as far as required 
- Fixing the issue while not breaking out-of-order appends
Comment 17 Jer Noble 2019-02-25 10:20:48 PST
This looks like a good approach. For Rev 3, please add the tests you attached to this bug in LayoutTests/media/media-source, and generate a ChangeLog using 'prepare-ChangeLog -b 194747', and I'll review it.
Comment 18 up 2019-02-25 10:30:09 PST
@jer Thanks, I will do so.
Comment 19 up 2019-02-26 12:40:51 PST
Created attachment 363004 [details]
Patch Rev3
Comment 20 Jer Noble 2019-02-26 13:39:34 PST
Comment on attachment 363004 [details]
Patch Rev3

r=me. If you'd like me (or any other WebKit contributor) to land it on your behalf, please set the 'cq' flag to 'cq?'.
Comment 21 up 2019-02-27 05:41:17 PST
@jer Thank you! Patch Rev3 passed the EWS. I have set the commit-queue flag for it to ?.
Comment 22 WebKit Commit Bot 2019-02-27 09:31:51 PST
Comment on attachment 363004 [details]
Patch Rev3

Clearing flags on attachment: 363004

Committed r242129: <https://trac.webkit.org/changeset/242129>
Comment 23 WebKit Commit Bot 2019-02-27 09:31:53 PST
All reviewed patches have been landed.  Closing bug.
Comment 24 Eric Carlson 2019-02-27 09:45:51 PST
(In reply to up from comment #21)
> @jer Thank you! Patch Rev3 passed the EWS. I have set the commit-queue flag
> for it to ?.

@up thanks for finding, reporting and fixing this!
Comment 25 up 2019-02-28 13:24:41 PST
Thank you all, and especially @jer for the support and guidance!
Comment 26 up 2019-03-07 01:41:45 PST
@jer Hi, just checked the recent STP 77 branch. As it seems to be branched a bit back in time this patch is not included. Is there something else required from our side, except patience, to hopefully find it in the next STP(s) and  the next Safari OSX release based on recent STPs?
Comment 27 Jer Noble 2019-03-07 05:30:45 PST
@up, this change will automatically get picked up in the next STP, so there patience is all that’s required. It will also get picked up in a future Safari release, and I will ping this thread once that release is available.
Comment 28 up 2019-04-02 06:24:23 PDT
@jer Seems like the issue has been released in Safari 12.1 without this patch.
Comment 29 Oliver Lietz 2019-04-02 10:45:53 PDT
Hello everybody,

we are excited to learn how the patch we proposed might land in the Safari releases. 
We noticed that Safari 12.1 still has the bug we worked on. 
This has a severe impact on our live streaming business applications. 
Is there a way to find out the ETA for getting this into production?
It would be great to hear back about this.
Please pm or mail me.

Thank you!

Oliver Lietz
Founder/CEO nanocosmos