Bug 194747

Summary:

[MSE] SourceBuffer sample time increment vs. last frame duration check is broken

Product:

WebKit

Reporter:

Component:

Media

Assignee:

Nobody <webkit-unassigned>

Status:

RESOLVED FIXED

Severity:

Blocker

CC:

commit-queue, eric.carlson, jer.noble, lietz, webkit-bug-importer

Priority:

Keywords:

InRadar

Version:

Safari Technology Preview

Hardware:

Unspecified

OS:

All

Attachments:

Description	Flags
Test case file for the reported issue	none
Expected output file for the reported issue	none
Correct test case file for the reported issue	none
Patch Rev1	none
Patch Rev2	none
Patch Rev3	none

Reported 2019-02-16 09:06:35 PST

Technical Severity This issue is breaking MSE playback for totally MSE compliant streams, in our case fMP4/H.264/AAC, but not limited to this container or codecs. Effects are playback not starting at all or significantly delayed and massive amount of frame/sample drop mid stream forced by a wrong a MSE compliance check. Business Severity If the issue will make it into Safari 12.2, MSE playback will be broken in Safari for our cdn, business leading customers of ours and other MSE applications and users. We would like to offer assistance and developer time to prevent this. Code Location and Context Location of the issue is SourceBuffer::sourceBufferPrivateDidReceiveSample in WebCore/Modules/mediasource/SourceBuffer.cpp, in the second part of this conditional statement related to trackBuffer.greatestDecodeDuration. The problem is how trackBuffer.greatestDecodeDuration is set/calculated. It is mainly a logical issue. Please find more in the details section. // 1.6 ↳ If last decode timestamp for track buffer is set and decode timestamp is less than last // decode timestamp: // OR // ↳ If last decode timestamp for track buffer is set and the difference between decode timestamp and // last decode timestamp is greater than 2 times last frame duration: if (trackBuffer.lastDecodeTimestamp.isValid() && (decodeTimestamp < trackBuffer.lastDecodeTimestamp || (trackBuffer.greatestDecodeDuration.isValid() && abs(decodeTimestamp - trackBuffer.lastDecodeTimestamp) > (trackBuffer.greatestDecodeDuration * 2)))) Bug history We have first discovered the issue in Safari Technology preview 75. We have been able to verify that it exists already in STP 72, and that it did NOT exist in STP 61. Further it does not exist in the current Safari 12.1 release. Checking out WebKit itself we have been able to recreate the issue in WebKit nightly 2019 Feb. 14th. We have checked in between [MEDIA] commits and found this one: https://trac.webkit.org/changeset/236314/webkit/ According to the description the trackBuffer.greatestDecodeDuration check has been hidden behind a smaller bug related to the assignment. By fixing this the major issue with trackBuffer.greatestDecodeDuration became active. Details Here is how we understand the timestamp increment vs last frame duration check in the MSE spec and how it is done in the current implementation. Example frame/sample sequence that should not fail (but currently does): Frame A) dts:0, duration 10 Frame B) dts:10, duration 90 Frame C) dts:100, duration 10 Timestamp increase and previous frame duration are perfectly aligned. Example frame/sample sequence that should fail: Frame A) dts:0, duration 10 Frame B) dts:10, duration 10 Frame C) dts:100, duration 10 Discontinuity due to mismatch from B to C. What should be compared is (C.dts - B.dts) vs. B.duration . Current impl. is comparing (C.dts - B.dts) vs. (B.dts - A.dts) . (involving a maximum value storage but that does not matter here) Instead of comparing current timestamp increment vs. last frame duration it's comparing current timestamp increment vs. previous timestamp increment, which makes a huge difference, because the target here is not to check how constant the frame rate is, but to detect discontinuities. Not sure about other containers, but in fMP4 the timestamp and duration of a frame are stored separately and explicitely. Implications seem to be even stranger for audio in fMP4, where one sample can hold multiple audio frames. In this case the current check will apply constraints to how many audio frames one sample contains and how it compares to the number of audio frames in the previous sample, which is obviously not what this check should be about. Proposal for a 'simple' fix Logically the solution would be to compare against trackBuffer.lastFrameDuration if valid instead of trackBuffer.greatestDecodeDuration. Technically both approaches could be combined to keep changes at a minimum, but no sample should be rejected if trackBuffer.lastFrameDuration is not valid. It must be ensured that, if valid, trackBuffer.lastFrameDuration holds the duration of the last frame/sample, not of any previous ones. Thanks in advance!

Attachments
Test case file for the reported issue (733 bytes, text/html) 2019-02-19 10:55 PST, up	no flags	Details
Expected output file for the reported issue (733 bytes, text/plain) 2019-02-19 10:56 PST, up	no flags	Details
Correct test case file for the reported issue (1.59 KB, text/html) 2019-02-19 10:58 PST, up	no flags	Details
Patch Rev1 (1.26 KB, patch) 2019-02-22 14:29 PST, up	no flags	Details Formatted Diff Diff
Patch Rev2 (1.36 KB, patch) 2019-02-22 16:20 PST, up	no flags	Details Formatted Diff Diff
Patch Rev3 (6.28 KB, patch) 2019-02-26 12:40 PST, up	no flags	Details Formatted Diff Diff
Show Obsolete (2) View All Add attachment proposed patch, testcase, etc.

Jer Noble

Comment 1 2019-02-17 12:05:27 PST

@up, are you describing “decode time stamp” with “dts” or “display time stamp”? Also, decode duration or display duration?

Jer Noble

Comment 2 2019-02-17 12:10:28 PST

Also, we have a pretty capable “mock” MSE implementation for running layout tests that isn’t dependent on any particular demuxer/decoder. I wonder if we could translate your examples into a LayoutTest and verify that it is indeed failing the way you describe. FWIW, for audio samples, we have a way of “splitting” a large sample group into smaller ones for the purposes of overlap avoidance. So the “large sample” issue should be less of a problem in practice. FWIW #2, other container formats (most notably Matroshka) are much less rigorous than ISO BMFF when it comes to timing info.

Jer Noble

Comment 3 2019-02-17 12:12:12 PST

@up, do you have a public ally available trstcase that demonstrates the issue?

Radar WebKit Bug Importer

Comment 4 2019-02-17 12:12:47 PST

<rdar://problem/48148469>

Comment 5 2019-02-18 03:40:57 PST

@jer Thank you for the fast response. I was referring to decode time and decode duration, as this is what is checked in this case. dts vs. pts does not really matter, or is a more special question, because for video without frame reordering, e.g. H.264 Baseline Profile, dts and pts are the same, even if there is a constant offset sometimes, and will break just as described. Sorry, I'm not familiar (yet) with the tests in WebKit. I have seen the mock source, but not 100% sure how it works. Can somebody assist to model the first frame sequence I have described in a mock source test? Regarding a test case, I will try to provide one, but as I said the issue can be understood logically and that's what I tried to explain in the report. I suggest to focus on creating the mock test first. Regarding the split up of large audio sample groups, what are the limits that will cause the split up? Problem is that especially in low latency live streaming use cases, the fMP4 fragments are kept pretty small, might be even video frame length. Really appreciating your help.

Jer Noble

Comment 6 2019-02-18 14:27:36 PST

(In reply to up from comment #5) > @jer Thank you for the fast response. I was referring to decode time and > decode duration, as this is what is checked in this case. dts vs. pts does > not really matter, or is a more special question, because for video without > frame reordering, e.g. H.264 Baseline Profile, dts and pts are the same, > even if there is a constant offset sometimes, and will break just as > described. > > Sorry, I'm not familiar (yet) with the tests in WebKit. I have seen the mock > source, but not 100% sure how it works. Can somebody assist to model the > first frame sequence I have described in a mock source test? Sure, here's a representative sort of test case for this kind of test: https://trac.webkit.org/browser/webkit/trunk/LayoutTests/media/media-source/media-source-dropped-iframe.html Which has the expected results of: https://trac.webkit.org/browser/webkit/trunk/LayoutTests/media/media-source/media-source-dropped-iframe-expected.txt You can see the interface to "makeAInitSegment()" and "makeASample()" here: https://trac.webkit.org/browser/webkit/trunk/LayoutTests/media/media-source/mock-media-source.js From your initial example, you would need something like: var samples = concatenateSamples( makeASample(0, 0, 10, 100, 1, SAMPLE_FLAG.SYNC, 1), makeASample(10, 10, 90, 100, 1, SAMPLE_FLAG.SYNC, 1), makeASample(100, 100, 10, 100, 1, SAMPLE_FLAG.SYNC, 1), ); ..for a sample sequence of DTS,Duration [{0,10}, {10, 90}, {100, 10}], all with a timebase of 100 where each sample was an I-frame. > Regarding a test case, I will try to provide one, but as I said the issue > can be understood logically and that's what I tried to explain in the > report. > I suggest to focus on creating the mock test first. > > Regarding the split up of large audio sample groups, what are the limits > that will cause the split up? Problem is that especially in low latency live > streaming use cases, the fMP4 fragments are kept pretty small, might be even > video frame length. I believe in theory that on platforms using CoreMedia (i.e., macOS and iOS), a composite audio sample can be divided up to the sample rate. So a composite sample with a DTS,Duration of {0,44100} and a timescale of 44100 (which is also the audio track's sample rate) could be cleanly divided into two samples of {0, 12345} and {12345, 31755}. (Note that this will probably never line up perfectly with the timescale of a video track, whose own timebases tend to be not be based on the audio track's sample rate.)

Comment 7 2019-02-19 10:55:42 PST

Created attachment 362395 [details] Test case file for the reported issue

Comment 8 2019-02-19 10:56:20 PST

Created attachment 362396 [details] Expected output file for the reported issue

Comment 9 2019-02-19 10:58:29 PST

Created attachment 362398 [details] Correct test case file for the reported issue

Comment 10 2019-02-19 11:08:05 PST

@jer I have created a simple test case and expected output file. The expectation is that none of the added samples gets dropped. The case will fail with current webkit version due to the describe issue. Sample 3 and all following samples will get dropped until the next sync sample, which might take multiple seconds and might repeat for the next group of pictures/samples.

Comment 11 2019-02-19 16:30:43 PST

@jer What's the best way to proceed? I can propose a patch that will maintain the current logic as far as possible while fixing the issue.?

Jer Noble

Comment 12 2019-02-20 07:08:55 PST

(In reply to up from comment #11) > @jer What's the best way to proceed? I can propose a patch that will > maintain the current logic as far as possible while fixing the issue.? @up, sure, that'd be fine! When you propose a patch, the EWS (early warning system) bots will build the patch and run all the existing tests, including all the MSE ones to ensure that there are no regressions to previous fixes.

Comment 13 2019-02-22 14:29:09 PST

Created attachment 362766 [details] Patch Rev1 @jer I have added a patch. Code only, so without test case files and change log yet. I will add more comments on the idea of the patch. Is this ok for now, or should I submit the complete patch already before review?

Jer Noble

Comment 14 2019-02-22 14:37:10 PST

(In reply to up from comment #13) > Created attachment 362766 [details] > Patch Rev1 > > @jer I have added a patch. Code only, so without test case files and change > log yet. I will add more comments on the idea of the patch. Is this ok for > now, or should I submit the complete patch already before review? @up I think what you've done so far is fine. You can signal that your patch is ready for review by setting the "r?" flag on the attachment, but attaching "work in progress" patches like yours is perfectly acceptable. Next step is to force the EWS bots to try to apply, build, and test your patch. The EWS bots will automatically do so for any "r?" patch, but for "work in progress" patches, you have to do that manually by clicking the "Submit for EWS analysis" button next to the attachment.

Comment 15 2019-02-22 16:20:22 PST

Created attachment 362781 [details] Patch Rev2

Comment 16 2019-02-25 09:45:13 PST

@jer Patch Rev2 has passed the EWS. Intentions: - Keeping changes local, compact and clear - Avoiding to modify the state of the SourceBuffer object - Not mixing time differences and durations in greatestDecodeDuration in general - Avoiding to make the final conditional statement more complex and nested - Respecting the current greatestDecodeDuration approach and complementing it only as far as required - Fixing the issue while not breaking out-of-order appends

Jer Noble

Comment 17 2019-02-25 10:20:48 PST

This looks like a good approach. For Rev 3, please add the tests you attached to this bug in LayoutTests/media/media-source, and generate a ChangeLog using 'prepare-ChangeLog -b 194747', and I'll review it.

Comment 18 2019-02-25 10:30:09 PST

@jer Thanks, I will do so.

Comment 19 2019-02-26 12:40:51 PST

Created attachment 363004 [details] Patch Rev3

Jer Noble

Comment 20 2019-02-26 13:39:34 PST

Comment on attachment 363004 [details] Patch Rev3 r=me. If you'd like me (or any other WebKit contributor) to land it on your behalf, please set the 'cq' flag to 'cq?'.

Comment 21 2019-02-27 05:41:17 PST

@jer Thank you! Patch Rev3 passed the EWS. I have set the commit-queue flag for it to ?.

WebKit Commit Bot

Comment 22 2019-02-27 09:31:51 PST

Comment on attachment 363004 [details] Patch Rev3 Clearing flags on attachment: 363004 Committed r242129: <https://trac.webkit.org/changeset/242129>

WebKit Commit Bot

Comment 23 2019-02-27 09:31:53 PST

All reviewed patches have been landed. Closing bug.

Eric Carlson

Comment 24 2019-02-27 09:45:51 PST

(In reply to up from comment #21) > @jer Thank you! Patch Rev3 passed the EWS. I have set the commit-queue flag > for it to ?. @up thanks for finding, reporting and fixing this!

Comment 25 2019-02-28 13:24:41 PST

Thank you all, and especially @jer for the support and guidance!

Comment 26 2019-03-07 01:41:45 PST

@jer Hi, just checked the recent STP 77 branch. As it seems to be branched a bit back in time this patch is not included. Is there something else required from our side, except patience, to hopefully find it in the next STP(s) and the next Safari OSX release based on recent STPs?

Jer Noble

Comment 27 2019-03-07 05:30:45 PST

@up, this change will automatically get picked up in the next STP, so there patience is all that’s required. It will also get picked up in a future Safari release, and I will ping this thread once that release is available.

Comment 28 2019-04-02 06:24:23 PDT

@jer Seems like the issue has been released in Safari 12.1 without this patch.

Oliver Lietz

Comment 29 2019-04-02 10:45:53 PDT

Hello everybody, we are excited to learn how the patch we proposed might land in the Safari releases. We noticed that Safari 12.1 still has the bug we worked on. This has a severe impact on our live streaming business applications. Is there a way to find out the ETA for getting this into production? It would be great to hear back about this. Please pm or mail me. Thank you! Oliver Lietz Founder/CEO nanocosmos

Note You need to log in before you can comment on or make changes to this bug.