Bug 201323 - [MSE] Don't enqueue samples that start at a big discontinuity
Summary: [MSE] Don't enqueue samples that start at a big discontinuity
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Media (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Alicia Boya García
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2019-08-29 18:10 PDT by Alicia Boya García
Modified: 2020-01-16 02:44 PST (History)
9 users (show)

See Also:


Attachments
Patch (27.54 KB, patch)
2019-08-29 18:12 PDT, Alicia Boya García
no flags Details | Formatted Diff | Diff
Patch (27.03 KB, patch)
2019-09-01 05:21 PDT, Alicia Boya García
no flags Details | Formatted Diff | Diff
Patch (26.98 KB, patch)
2020-01-09 07:54 PST, Alicia Boya García
no flags Details | Formatted Diff | Diff
Patch (27.01 KB, patch)
2020-01-15 06:40 PST, Alicia Boya García
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Alicia Boya García 2019-08-29 18:10:57 PDT
With the old logic SourceBuffer was enqueueing the first frame to be
appended in any circumstances. This was a bug because the user could
append first [5, 10) and then [0, 5). With the old behavior [5, 10)
would be enqueued first despite being clearly ahead of the initial
playback time (zero). By the time [0, 5) is enqueued it can't be
enqueued anymore because the decodeQueue is already ahead.

This patch fixes that logic to work when the first segments are
appended unordered. The test media-source-first-append-not-starting-at-zero.html
validates it.

The test media-source-append-presentation-durations.html checks the
new logic does not break in presence of presentation duration !=
decode duration.

As part of the same logic block, the lastEnqueuedPresentationTime was
used to decide when it's necessary to perform reenqueue after an
.erase() (it is necessary if any enqueued frames are replaced). Using
lastEnqueuedPresentationTime was not entirely accurate in presence of
B-frames, as you could erase a frame that has a presentation time
higher than the last enqueued one. That logic is replaced with a
monotonicly increasing highestEnqueuedPresentationTime and is tested
by media-source-remove-b-frame.html.

Tests: media/media-source/media-source-append-presentation-durations.html
       media/media-source/media-source-first-append-not-starting-at-zero.html
       media/media-source/media-source-remove-b-frame.html
Comment 1 Alicia Boya García 2019-08-29 18:12:24 PDT
Created attachment 377665 [details]
Patch
Comment 2 Jer Noble 2019-08-29 20:47:13 PDT
Comment on attachment 377665 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=377665&action=review

> Source/WebCore/Modules/mediasource/SourceBuffer.cpp:1803
> +        if (trackBuffer.lastEnqueuedDecodeTime.isInvalid() || sample.decodeTime() >= trackBuffer.lastEnqueuedDecodeTime) {

decodeTime is insufficient to enforce a strict ordering of samples. Two samples with the same decodeTime but different presentationTimes need to be ordered by both the DTS and PTS, which is why the original code uses the Decode maps KeyType.

I also don’t understand why this code is necessary. Why isn’t the decode queue flushed and re-enqueued when [0,5) samples are appended? Time can’t be moving on from 0 since there’s nothing to display yet.
Comment 3 Alicia Boya García 2019-08-30 07:40:14 PDT
Comment on attachment 377665 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=377665&action=review

>> Source/WebCore/Modules/mediasource/SourceBuffer.cpp:1803
>> +        if (trackBuffer.lastEnqueuedDecodeTime.isInvalid() || sample.decodeTime() >= trackBuffer.lastEnqueuedDecodeTime) {
> 
> decodeTime is insufficient to enforce a strict ordering of samples. Two samples with the same decodeTime but different presentationTimes need to be ordered by both the DTS and PTS, which is why the original code uses the Decode maps KeyType.
> 
> I also don’t understand why this code is necessary. Why isn’t the decode queue flushed and re-enqueued when [0,5) samples are appended? Time can’t be moving on from 0 since there’s nothing to display yet.

You have a point about the first paragraph, I will rework it to use DecodeKey instead of DecodeTime.

About why the if in the first place instead of just flushing... well, this function is called every time a frame is appended (therefore, in append-order), which does not need to coincide with decode order. In the [5, 10) [0, 5) case this may not seem like a big deal, but imagine [0, 10) [15, 20) [10, 15)... inserting frames blindly in the decode queue would require a flush while the player is potentially playing [0, 10), which would cause an unnecessary hiccup as the decoders would have to start from scratch when [10, 15) was inserted. Never enqueueing frames too far ahead in the future avoids that.
Comment 4 Jer Noble 2019-08-30 13:33:39 PDT
(In reply to Alicia Boya García from comment #3)
> About why the if in the first place instead of just flushing... well, this
> function is called every time a frame is appended

Is that true? provideMediaData() should only be called in sourceBufferPrivateAppendComplete(), removeCodedFrames(), and sourceBufferPrivateDidBecomeReadyForMoreSamples(). None of which are per-appended-frame.

> (therefore, in
> append-order), which does not need to coincide with decode order.  In the [5,
> 10) [0, 5) case this may not seem like a big deal, but imagine [0, 10) [15,
> 20) [10, 15)... inserting frames blindly in the decode queue would require a
> flush while the player is potentially playing [0, 10), which would cause an
> unnecessary hiccup as the decoders would have to start from scratch when
> [10, 15) was inserted. Never enqueueing frames too far ahead in the future
> avoids that.

That may be true, but I'm trying to work out why the decode queue cares about presentation times.  There was already code to detect an unreasonably large gap between decode times.  Why is that insufficient?
Comment 5 Alicia Boya García 2019-08-30 17:37:14 PDT
(In reply to Jer Noble from comment #4)
> (In reply to Alicia Boya García from comment #3)
> > About why the if in the first place instead of just flushing... well, this
> > function is called every time a frame is appended
> 
> Is that true? provideMediaData() should only be called in
> sourceBufferPrivateAppendComplete(), removeCodedFrames(), and
> sourceBufferPrivateDidBecomeReadyForMoreSamples(). None of which are
> per-appended-frame.

What is done per-frame is adding frames to the decodeQueue while an append is being parsed. provideMediaData(), is indeed called only once when that append has being finished parsing and tries to actually send the frames of the queue into the decoders.

The piece of code the comment was in was sourceBufferPrivateDidReceiveSample(), which is called per-frame, and it decides whether a frame can be added to the queue or if it's late (the decode queue is ahead of that frame in decode time).

> 
> > (therefore, in
> > append-order), which does not need to coincide with decode order.  In the [5,
> > 10) [0, 5) case this may not seem like a big deal, but imagine [0, 10) [15,
> > 20) [10, 15)... inserting frames blindly in the decode queue would require a
> > flush while the player is potentially playing [0, 10), which would cause an
> > unnecessary hiccup as the decoders would have to start from scratch when
> > [10, 15) was inserted. Never enqueueing frames too far ahead in the future
> > avoids that.
> 
> That may be true, but I'm trying to work out why the decode queue cares
> about presentation times.  There was already code to detect an unreasonably
> large gap between decode times.  Why is that insufficient?

All the numbers in the explanation above are decode time, not presentation time.
Comment 6 Alicia Boya García 2019-08-31 09:40:53 PDT
(In reply to Alicia Boya García from comment #5)
> The piece of code the comment was in was
> sourceBufferPrivateDidReceiveSample(), which is called per-frame, and it
> decides whether a frame can be added to the queue or if it's late (the
> decode queue is ahead of that frame in decode time).

Correction: if it's late (a frame with an earlier DTS has already been enqueued).

It's always confusing because enqueuing and adding to the decodeQueue mean different things. Enqueuing is the last part when SourceBuffer calls `enqueueSample()` on the SourceBufferPrivate to feed the frame to the decoder.
Comment 7 Alicia Boya García 2019-09-01 05:21:56 PDT
Created attachment 377820 [details]
Patch
Comment 8 Alicia Boya García 2020-01-09 07:54:26 PST
Created attachment 387229 [details]
Patch
Comment 9 Xabier Rodríguez Calvar 2020-01-09 08:38:58 PST
Comment on attachment 387229 [details]
Patch

LGTM, though maybe we could give Eric/Jer some time to have a look as well.
Comment 10 Alicia Boya García 2020-01-15 06:40:11 PST
Created attachment 387788 [details]
Patch
Comment 11 WebKit Commit Bot 2020-01-16 02:43:25 PST
Comment on attachment 387788 [details]
Patch

Clearing flags on attachment: 387788

Committed r254670: <https://trac.webkit.org/changeset/254670>
Comment 12 WebKit Commit Bot 2020-01-16 02:43:26 PST
All reviewed patches have been landed.  Closing bug.
Comment 13 Radar WebKit Bug Importer 2020-01-16 02:44:13 PST
<rdar://problem/58638968>