Bug 226481 - [MSE] When mediaSamples with the same PTS and DTS are received consecutively, the mediaSample received later is discarded.
Status: RESOLVED INVALID
Alias: None
Product: WebKit
Classification: Unclassified
Component: Media
Version: WebKit Nightly Build
Hardware: Unspecified
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2021-05-31 22:15 PDT by Toshio Ogasawara
Modified: 2022-02-10 16:38 PST
CC: 12 users

See Also:


Attachments
patch (6.18 KB, text/plain)
2021-05-31 22:44 PDT, Toshio Ogasawara
no flags
patch (6.18 KB, text/plain)
2021-05-31 23:41 PDT, Toshio Ogasawara
no flags
patch (6.27 KB, patch)
2021-06-01 01:57 PDT, Toshio Ogasawara
no flags

Description Toshio Ogasawara 2021-05-31 22:15:57 PDT
When mediaSamples with the same PTS and DTS are received consecutively, SourceBufferPrivate::didReceiveSample() discards the mediaSample received later.
Remove the mediaSample received earlier so that the mediaSample received later takes priority.
Comment 1 Toshio Ogasawara 2021-05-31 22:44:17 PDT
Created attachment 430231 [details]
patch
Comment 2 Toshio Ogasawara 2021-05-31 23:41:19 PDT
Created attachment 430233 [details]
patch

Fixed patch
Comment 3 Toshio Ogasawara 2021-06-01 01:57:16 PDT
Created attachment 430239 [details]
patch

Fixed patch
Comment 4 Jer Noble 2021-06-01 13:22:09 PDT
I'm not sure I understand; how did multiple samples with the same PTS & DTS get put in the trackBuffer without evicting previously appended samples?
Comment 5 Toshio Ogasawara 2021-06-01 21:13:24 PDT
>+        // When there are MediaSamples in track buffer which have the same PTS and DTS, all of them except for the last MediaSample should be removed.
>+        if (trackBuffer.lastDecodeTimestamp.isValid() && decodeTimestamp == trackBuffer.lastDecodeTimestamp) {
>+            auto iterPair = trackBuffer.samples.presentationOrder().findSamplesBetweenPresentationTimes(presentationTimestamp, frameEndTimestamp);
>+            if (iterPair.first != trackBuffer.samples.presentationOrder().end())
>+                erasedSamples.addRange(iterPair.first, iterPair.second);
>+        }
>+
>         // 1.15 Remove decoding dependencies of the coded frames removed in the previous step:
>         DecodeOrderSampleMap::MapType dependentSamples;
>         if (!erasedSamples.empty()) {

Previously added samples are collected into a range with erasedSamples.addRange() and removed in the next step, "1.15 Remove decoding dependencies of the coded frames removed in the previous step:".
Samples with the same PTS and DTS get put into the trackBuffer by first removing the previously added samples.
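The eviction described above can be sketched with simplified stand-in types (these are not WebKit's actual classes; findSamplesBetween and eraseOverlappingSamples are hypothetical analogues of findSamplesBetweenPresentationTimes() and erasedSamples.addRange() followed by removal):

```cpp
// Simplified model of evicting previously appended samples whose
// presentation range [pts, frameEnd) overlaps the incoming sample.
#include <cassert>
#include <iterator>
#include <map>
#include <utility>

struct Sample { double duration; bool isSync; int resolution; };

// Stand-in for trackBuffer.samples.presentationOrder(): keyed by PTS.
using PresentationOrder = std::map<double, Sample>;

// Hypothetical analogue of findSamplesBetweenPresentationTimes():
// iterators spanning samples with PTS in [pts, frameEnd).
std::pair<PresentationOrder::iterator, PresentationOrder::iterator>
findSamplesBetween(PresentationOrder& order, double pts, double frameEnd)
{
    return { order.lower_bound(pts), order.lower_bound(frameEnd) };
}

// Mirrors erasedSamples.addRange() plus the subsequent removal step:
// erase every overlapping sample, returning how many were evicted.
size_t eraseOverlappingSamples(PresentationOrder& order, double pts, double frameEnd)
{
    auto [first, last] = findSamplesBetween(order, pts, frameEnd);
    size_t count = std::distance(first, last);
    order.erase(first, last);
    return count;
}
```

With one 720p sample buffered at PTS 0 and duration 3, appending a 480p sample with the same PTS and frame end 3 would first evict the old sample, letting the later one take its place.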
Comment 6 Toshio Ogasawara 2021-06-02 01:09:24 PDT
(In reply to Jer Noble from comment #4)
> I'm not sure I understand; how did multiple samples with the same PTS & DTS
> get put in the trackBuffer without evicting previously appended samples?

For example, if the resolution switches during video playback, segment data with a different resolution but the same PTS and DTS at the beginning may be appended.
When a MediaSample is created from multiple frames, the PTS and DTS of the first frame become the PTS and DTS of the MediaSample.
In this case, MediaSamples with the same PTS and DTS are appended in succession.
Comment 7 Radar WebKit Bug Importer 2021-06-07 22:16:17 PDT
<rdar://problem/78983232>
Comment 8 Toshio Ogasawara 2021-06-10 02:30:00 PDT
If a MediaSample is created from a single video frame, as in the GTK port, this issue does not occur.
I created MediaSamples from multiple video frames in a port I developed myself.

Example)
* "MediaSample = one frame"
ReceiveSample
  MediaSample-00:[frame00 PTS:0 DTS:0 Duration:1 Resolution:720 I-frame]
  MediaSample-01:[frame01 PTS:1 DTS:1 Duration:1 Resolution:720 P-frame]
  MediaSample-02:[frame02 PTS:2 DTS:2 Duration:1 Resolution:720 B-frame]

ReceiveSample
  MediaSample-03:[frame03 PTS:0 DTS:0 Duration:1 Resolution:480 I-frame]
  MediaSample-04:[frame04 PTS:1 DTS:1 Duration:1 Resolution:480 P-frame]
  MediaSample-05:[frame05 PTS:2 DTS:2 Duration:1 Resolution:480 B-frame]

* "MediaSample = multiple frames"
ReceiveSample
  MediaSample-00:[PTS:0 DTS:0 Duration:3 Resolution:720 I-frame
                  (frame00 PTS:0 DTS:0 Duration:1 Resolution:720 I-frame,
                   frame01 PTS:1 DTS:1 Duration:1 Resolution:720 P-frame,
                   frame02 PTS:2 DTS:2 Duration:1 Resolution:720 B-frame)]

ReceiveSample
  MediaSample-01:[PTS:0 DTS:0 Duration:3 Resolution:480 I-frame
                  (frame03 PTS:0 DTS:0 Duration:1 Resolution:480 I-frame,
                   frame04 PTS:1 DTS:1 Duration:1 Resolution:480 P-frame,
                   frame05 PTS:2 DTS:2 Duration:1 Resolution:480 B-frame)]

As shown above, didReceiveSample never receives MediaSamples with the same PTS and DTS consecutively when "MediaSample = one frame".
It may receive MediaSamples with the same PTS and DTS consecutively when "MediaSample = multiple frames".
Comment 9 Jean-Yves Avenard [:jya] 2021-06-15 03:52:48 PDT
(In reply to Toshio Ogasawara from comment #8)
> If you create a MediaSample with one frame of video, such as a GTK port, you
> will not encounter this issue.
> I created a mediasample from multiple frames of video on my own port I
> developed.

A MediaSample can only contain frames whose PTS is monotonically increasing.

MediaSample was only ever designed to contain multiple *audio* frames, not video.
Comment 10 Jean-Yves Avenard [:jya] 2021-06-15 04:01:33 PDT
Comment on attachment 430239 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=430239&action=review

> Source/WebCore/platform/graphics/SourceBufferPrivate.cpp:1165
> +        if (trackBuffer.lastDecodeTimestamp.isValid() && decodeTimestamp == trackBuffer.lastDecodeTimestamp) {

This code is fundamentally incorrect. A DTS value is meaningless on its own; its only purpose is to determine the order in which frames should be decoded.
It should never be used to determine which frame should be removed or where it should be inserted.
There's no guarantee that a stream will use the same timeframe from one sample to another. The only requirement is that DTS values are monotonically increasing.

Only the presentation timestamp is relevant and should be used to determine what should be removed and where.

When a sample at a given PTS is to be removed, the entire GOP (group of pictures) should be removed from that sample forward.
E.g. you remove all samples from that sample until the next keyframe is found.

The concept is touched on in MSE's Coded Frame Removal algorithm
https://w3c.github.io/media-source/#dfn-coded-frame-removal
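The GOP-removal idea from the Coded Frame Removal algorithm can be sketched as follows. The types are simplified stand-ins (a map keyed by PTS with a sync-sample flag), not WebKit's actual classes:

```cpp
// Remove a sample and the rest of its GOP: everything from the sample at
// `pts` forward, up to (but not including) the next keyframe.
#include <cassert>
#include <iterator>
#include <map>

struct Sample { bool isSync; };
using PresentationOrder = std::map<double, Sample>; // keyed by PTS

size_t eraseGopFrom(PresentationOrder& order, double pts)
{
    auto first = order.lower_bound(pts);
    if (first == order.end())
        return 0;
    // Extend the range past every dependent (non-keyframe) sample.
    auto last = std::next(first);
    while (last != order.end() && !last->second.isSync)
        ++last;
    size_t count = std::distance(first, last);
    order.erase(first, last);
    return count;
}
```

Removing the keyframe at PTS 0 then also drops the dependent frames at PTS 1 and 2, but stops at the next keyframe at PTS 3.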
Comment 11 Toshio Ogasawara 2021-06-15 23:55:37 PDT
(In reply to Jean-Yves Avenard [:jya] from comment #9)
> (In reply to Toshio Ogasawara from comment #8)
> > If you create a MediaSample with one frame of video, such as a GTK port, you
> > will not encounter this issue.
> > I created a mediasample from multiple frames of video on my own port I
> > developed.
> 
> A MediaSample can only contain frames that have their pts only ever
> monotonically increasing.
> 
> MediaSample was only ever designed to contained multiple *audio* frames ;
> not video.

Thank you for explaining the design of MediaSample.
We made a mistake in our design around MediaSample.
So this ticket is closed as INVALID.