With the old logic, SourceBuffer unconditionally enqueued the first frame to be appended. This was a bug because the user could append [5, 10) first and then [0, 5). With the old behavior, [5, 10) would be enqueued first despite being clearly ahead of the initial playback time (zero). By the time [0, 5) arrived, it could no longer be enqueued because the decode queue was already ahead of it. This patch fixes the logic so that it works when the first segments are appended out of order. The test media-source-first-append-not-starting-at-zero.html validates it. The test media-source-append-presentation-durations.html checks that the new logic does not break when presentation duration != decode duration.

As part of the same logic block, lastEnqueuedPresentationTime was used to decide when it is necessary to perform a reenqueue after an .erase() (it is necessary if any enqueued frames are replaced). Using lastEnqueuedPresentationTime was not entirely accurate in the presence of B-frames, as you could erase a frame that has a presentation time higher than the last enqueued one. That logic is replaced with a monotonically increasing highestEnqueuedPresentationTime and is tested by media-source-remove-b-frame.html.

Tests: media/media-source/media-source-append-presentation-durations.html
       media/media-source/media-source-first-append-not-starting-at-zero.html
       media/media-source/media-source-remove-b-frame.html
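To make the idea concrete, here is a minimal standalone sketch of the two pieces of bookkeeping described above. All types and names here are hypothetical illustrations, not WebKit's actual code: a frame whose decode position is behind what has already been enqueued is rejected instead of enqueued, and highestEnqueuedPresentationTime only ever grows, so a B-frame with an out-of-order PTS cannot lower it.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

// Hypothetical sketch: order frames by a (DTS, PTS) pair, since two
// samples can share a decode timestamp but differ in presentation time.
using DecodeKey = std::pair<double, double>; // (DTS, PTS)

struct TrackBuffer {
    bool hasEnqueued = false;
    DecodeKey lastEnqueuedDecodeKey {};
    double highestEnqueuedPresentationTime = -1;
};

// A newly received frame may only be enqueued if it is not behind the
// frames already handed to the decoder (otherwise it is "late").
bool canEnqueue(const TrackBuffer& track, double dts, double pts)
{
    return !track.hasEnqueued || DecodeKey { dts, pts } >= track.lastEnqueuedDecodeKey;
}

void markEnqueued(TrackBuffer& track, double dts, double pts)
{
    track.hasEnqueued = true;
    track.lastEnqueuedDecodeKey = { dts, pts };
    // Monotonic: erasing decisions based on this value stay valid even
    // when B-frames arrive with presentation times below earlier ones.
    track.highestEnqueuedPresentationTime =
        std::max(track.highestEnqueuedPresentationTime, pts);
}
```

With this shape, appending [5, 10) first and then [0, 5) makes the [0, 5) frames fail canEnqueue(), which is the symptom the patch addresses by not enqueueing the first frame unconditionally.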
Created attachment 377665 [details] Patch
Comment on attachment 377665 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=377665&action=review

> Source/WebCore/Modules/mediasource/SourceBuffer.cpp:1803
> +        if (trackBuffer.lastEnqueuedDecodeTime.isInvalid() || sample.decodeTime() >= trackBuffer.lastEnqueuedDecodeTime) {

decodeTime is insufficient to enforce a strict ordering of samples. Two samples with the same decodeTime but different presentationTimes need to be ordered by both the DTS and PTS, which is why the original code uses the Decode map's KeyType.

I also don't understand why this code is necessary. Why isn't the decode queue flushed and re-enqueued when the [0, 5) samples are appended? Time can't be moving on from 0 since there's nothing to display yet.
Comment on attachment 377665 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=377665&action=review

>> Source/WebCore/Modules/mediasource/SourceBuffer.cpp:1803
>> +        if (trackBuffer.lastEnqueuedDecodeTime.isInvalid() || sample.decodeTime() >= trackBuffer.lastEnqueuedDecodeTime) {
>
> decodeTime is insufficient to enforce a strict ordering of samples. Two samples with the same decodeTime but different presentationTimes need to be ordered by both the DTS and PTS, which is why the original code uses the Decode map's KeyType.
>
> I also don't understand why this code is necessary. Why isn't the decode queue flushed and re-enqueued when the [0, 5) samples are appended? Time can't be moving on from 0 since there's nothing to display yet.

You have a point about the first paragraph; I will rework it to use DecodeKey instead of DecodeTime.

About why there is an if in the first place instead of just flushing... well, this function is called every time a frame is appended (therefore, in append order), which does not need to coincide with decode order. In the [5, 10) [0, 5) case this may not seem like a big deal, but imagine [0, 10) [15, 20) [10, 15): inserting frames blindly into the decode queue would require a flush while the player is potentially playing [0, 10), which would cause an unnecessary hiccup, as the decoders would have to start from scratch when [10, 15) was inserted. Never enqueueing frames too far ahead in the future avoids that.
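The strict-ordering point above can be shown with a tiny sketch (a hypothetical DecodeKey struct, assumed here purely for illustration): comparing by DTS alone cannot distinguish two B-frame samples that share a decode timestamp, while a composite (DTS, PTS) key, like the decode map's KeyType, yields a strict ordering.

```cpp
#include <cassert>
#include <tuple>

// Hypothetical composite key: order by DTS first, and fall back to PTS
// when two samples share the same decode timestamp.
struct DecodeKey {
    double dts;
    double pts;

    bool operator<(const DecodeKey& other) const
    {
        return std::tie(dts, pts) < std::tie(other.dts, other.pts);
    }
};
```

Two samples with dts == 1.0 but pts 1.0 and 2.0 are incomparable by DTS alone, yet the composite key orders them deterministically.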
(In reply to Alicia Boya García from comment #3)
> About why the if in the first place instead of just flushing... well, this
> function is called every time a frame is appended

Is that true? provideMediaData() should only be called in sourceBufferPrivateAppendComplete(), removeCodedFrames(), and sourceBufferPrivateDidBecomeReadyForMoreSamples(), none of which are per-appended-frame.

> (therefore, in
> append-order), which does not need to coincide with decode order. In the [5,
> 10) [0, 5) case this may not seem like a big deal, but imagine [0, 10) [15,
> 20) [10, 15)... inserting frames blindly in the decode queue would require a
> flush while the player is potentially playing [0, 10), which would cause an
> unnecessary hiccup as the decoders would have to start from scratch when
> [10, 15) was inserted. Never enqueueing frames too far ahead in the future
> avoids that.

That may be true, but I'm trying to work out why the decode queue cares about presentation times. There was already code to detect an unreasonably large gap between decode times. Why is that insufficient?
(In reply to Jer Noble from comment #4)
> (In reply to Alicia Boya García from comment #3)
> > About why the if in the first place instead of just flushing... well, this
> > function is called every time a frame is appended
>
> Is that true? provideMediaData() should only be called in
> sourceBufferPrivateAppendComplete(), removeCodedFrames(), and
> sourceBufferPrivateDidBecomeReadyForMoreSamples(). None of which are
> per-appended-frame.

What is done per frame is adding frames to the decodeQueue while an append is being parsed. provideMediaData() is indeed called only once, after that append has finished parsing, and it actually sends the frames of the queue into the decoders.

The piece of code the comment was on is in sourceBufferPrivateDidReceiveSample(), which is called per frame, and it decides whether a frame can be added to the queue or whether it's late (the decode queue is ahead of that frame in decode time).

> > (therefore, in
> > append-order), which does not need to coincide with decode order. In the [5,
> > 10) [0, 5) case this may not seem like a big deal, but imagine [0, 10) [15,
> > 20) [10, 15)... inserting frames blindly in the decode queue would require a
> > flush while the player is potentially playing [0, 10), which would cause an
> > unnecessary hiccup as the decoders would have to start from scratch when
> > [10, 15) was inserted. Never enqueueing frames too far ahead in the future
> > avoids that.
>
> That may be true, but I'm trying to work out why the decode queue cares
> about presentation times. There was already code to detect an unreasonably
> large gap between decode times. Why is that insufficient?

All the numbers in the explanation above are decode times, not presentation times.
(In reply to Alicia Boya García from comment #5)
> The piece of code the comment was in was
> sourceBufferPrivateDidReceiveSample(), which is called per-frame, and it
> decides whether a frame can be added to the queue or if it's late (the
> decode queue is ahead of that frame in decode time).

Correction: if it's late (i.e. a frame with a later DTS has already been enqueued). It's always confusing because enqueuing and adding to the decodeQueue mean different things: enqueuing is the last step, when SourceBuffer calls `enqueueSample()` on the SourceBufferPrivate to feed the frame to the decoder.
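The two-stage vocabulary above can be sketched with hypothetical types standing in for SourceBuffer and SourceBufferPrivate (not WebKit's actual classes): frames are added to decodeQueue one at a time while an append is parsed, and are only "enqueued" later, when provideMediaData() hands them to the private's enqueueSample().

```cpp
#include <cassert>
#include <deque>
#include <vector>

struct Sample { double dts; };

// Stand-in for SourceBufferPrivate: enqueueSample() feeds the decoder.
struct FakeSourceBufferPrivate {
    std::vector<double> decoderInput;
    void enqueueSample(const Sample& sample) { decoderInput.push_back(sample.dts); }
};

struct SourceBufferSketch {
    std::deque<Sample> decodeQueue;
    FakeSourceBufferPrivate privateImpl;

    // Called once per frame while an append is being parsed: this only
    // *adds to the decodeQueue*, it does not enqueue anything yet.
    void didReceiveSample(Sample sample) { decodeQueue.push_back(sample); }

    // Called once after the append finishes parsing: drains the queue,
    // *enqueuing* each frame into the decoder.
    void provideMediaData()
    {
        while (!decodeQueue.empty()) {
            privateImpl.enqueueSample(decodeQueue.front());
            decodeQueue.pop_front();
        }
    }
};
```

The distinction matters for the bug: a frame can sit safely in decodeQueue, but once a frame with a later decode position has been enqueued, an earlier frame is "late" and cannot be enqueued without flushing the decoder.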
Created attachment 377820 [details] Patch
Created attachment 387229 [details] Patch
Comment on attachment 387229 [details] Patch LGTM, though maybe we could give Eric/Jer some time to have a look as well.
Created attachment 387788 [details] Patch
Comment on attachment 387788 [details] Patch Clearing flags on attachment: 387788 Committed r254670: <https://trac.webkit.org/changeset/254670>
All reviewed patches have been landed. Closing bug.
<rdar://problem/58638968>