Bug 179241 - [GStreamer][MSE][webm] Support repetitive appending of the same webm segment
Status: RESOLVED DUPLICATE of bug 185731
https://bugs.webkit.org/show_bug.cgi?id=179241
Enrique Ocaña
Reported 2017-11-03 07:59:03 PDT
Test "20. VideoBufferSize" in YouTube TV 2018 MSE Conformance Tests[1] tries to append the same webm segment several times and expects it to be properly processed. However, the current matroskademux implementation in GStreamer is unable to recognize a new header after having parsed the first one. [1] http://yt-dash-mse-test.commondatastorage.googleapis.com/unit-tests/2018.html
Attachments
Patch (10.81 KB, patch)
2017-11-03 11:09 PDT, Enrique Ocaña
no flags
Patch (10.99 KB, patch)
2017-11-22 10:16 PST, Enrique Ocaña
mcatanzaro: review-
Enrique Ocaña
Comment 1 2017-11-03 11:09:29 PDT
Alicia Boya García
Comment 2 2017-11-03 14:12:27 PDT
Comment on attachment 325918 [details]
Patch

m_sampleDuration, m_lastPts and m_lastDts are stream (track) specific fields, so they should be encapsulated in some track-specific structure rather than an AppendPipeline-wide one. This will become more important if we add support for multiple tracks per SourceBuffer. A way to fix this would be to move those fields to a separate struct that is instantiated and deleted along with its pad probe (a rough sketch follows below).

Using DISCONT is problematic because it breaks partial appends, i.e. when a single media segment is fed to MSE through many small append operations. This prevents YTTV 2018 test 55.DelayedAACAudio from passing, for instance.
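A minimal sketch of what such a track-specific structure could look like, assuming hypothetical names (TrackAppendState, m_trackStates) that are not existing WebKit code:

#include <wtf/HashMap.h>
#include <wtf/MediaTime.h>

// Hypothetical sketch only: per-track timing state, one instance per demuxer
// source pad, created when the pad probe is installed and destroyed with it.
struct TrackAppendState {
    MediaTime lastPts { MediaTime::invalidTime() };
    MediaTime lastDts { MediaTime::invalidTime() };
    MediaTime sampleDuration { MediaTime::invalidTime() };
};

// Keyed by the demuxer source pad, so each track keeps its own timing state
// instead of sharing AppendPipeline-wide fields.
HashMap<GstPad*, TrackAppendState> m_trackStates;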
Enrique Ocaña
Comment 3 2017-11-22 10:16:18 PST
Alicia Boya García
Comment 4 2017-11-23 02:17:50 PST
Here goes my unofficial review:

> GST_BUFFER_FLAG_SET(buffer, GST_BUFFER_FLAG_DISCONT);

Please remove that. It will break any append that contains part of an MSE segment, and with it a YTTV 2018 test. The problem it attempts to fix (being able to demux segments with a timestamp lower than the first frame PTS) is a bug in matroskademux and should be fixed there. I have a patch for exactly that. It's not a definitive version yet, but it should work perfectly for MSE (the remaining work before I upstream it is avoiding a regression in some non-MSE edge cases). I can nevertheless upload the current version to WebKit so that we can use it in the meantime.

> if (gst_structure_get_fraction(structure, "framerate", &framerateNumerator, &framerateDenominator)
> else if (gst_structure_get_int(structure, "rate", &framerateNumerator))

There is some confusion here, probably because the terminology is confusing, but it works like this:

- "framerate": fraction specifying the number of video frames per second.
- "rate": integer specifying the sampling rate of the audio, i.e. the number of PCM audio samples per second.

A GstSample corresponds to a video frame or an audio frame. An audio frame contains many PCM audio samples (a PCM audio sample is just a numeric value that indicates the amplitude of the wave signal at a given instant). Two audio GstSamples with the same "rate" may perfectly well have very different durations. As an example, a typical Opus audio frame is 20 ms long and is encoded with a sampling rate of 48000 Hz. Assuming duration = 1 / rate, as you are doing, would give a duration of 20.833 µs, very far from the real duration of 20 ms. If the "rate" branch makes something work, it would only be because any small value would do in that case. Do you know any example where this is the case? (A sketch of the distinction follows right below.)

> // Some containers like webm and audio/x-opus don't supply a duration. Let's use the one supplied by the caps.

I have several problems with that comment:

(1) Not all WebM files containing Opus do that, only those muxed by ffmpeg, which, among other issues, omits Track.DefaultDuration, to the grief of people working with containers who would rather have them abstract the inner elementary streams (which was partly the point of having containers in the first place!). A more correct statement would be:

// Some WebM files containing Opus audio don't supply frame duration.

Apart from that...

(2) "Let's use the one supplied by the caps": no, we cannot do that for audio; it's not supplied in the caps for the reasons explained above. "rate" is the PCM sampling rate, which has nothing to do with frame duration. Furthermore...

(3) Opus is no longer a good example, since we now use opusparse, which handles those durations correctly. That branch would only be useful for streams that have a useful "framerate" cap but don't set durations. That is impossible in MP4, because durations are mandatory (successive frames' DTSs are computed from durations). In Matroska there is a deprecated FrameRate field, but it's converted to DefaultDuration by GStreamer, so there's no work to do on our part.

https://www.matroska.org/technical/specs/index.html#FrameRate
https://www.matroska.org/technical/specs/index.html#DefaultDuration

What I noticed is that when a file defines neither FrameRate nor DefaultDuration, GStreamer sets "framerate" to zero in the caps:

video/x-vp9, width=(int)426, height=(int)240, framerate=(fraction)0/1

This is the case for feelings_vp9-20130806-242.webm from the YTTV tests.

That was long...
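To make the "framerate" vs. "rate" distinction concrete, here is an illustrative sketch (not part of the patch; the helper name is made up) of deriving a per-frame duration from the video caps while deliberately refusing to derive one from the audio sampling rate:

#include <gst/gst.h>

// Sketch only: derive a frame duration from caps where that is meaningful.
static GstClockTime frameDurationFromCaps(GstCaps* caps)
{
    GstStructure* structure = gst_caps_get_structure(caps, 0);
    int framerateNumerator = 0;
    int framerateDenominator = 1;

    if (gst_structure_get_fraction(structure, "framerate", &framerateNumerator, &framerateDenominator)
        && framerateNumerator > 0) {
        // Video: one frame lasts denominator / numerator seconds. The
        // numerator > 0 check also skips the framerate=0/1 caps that
        // matroskademux produces when the file sets neither FrameRate nor
        // DefaultDuration.
        return gst_util_uint64_scale(GST_SECOND, framerateDenominator, framerateNumerator);
    }

    // Audio: "rate" is the PCM sampling rate (e.g. 48000 Hz for Opus), not a
    // frame rate. 1 / rate would be ~20.8 µs for a ~20 ms Opus frame, so no
    // duration is derived from it here.
    return GST_CLOCK_TIME_NONE;
}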
Next issue:

> } else if (m_lastPts.isValid()) {
>     m_sampleDuration = MediaTime(GST_BUFFER_PTS(buffer), GST_SECOND) - m_lastPts;

Usually this works... but what happens in the rare, yet no less important cases? There are two pathological (but common) scenarios:

{PTS=30.0s} -> DUR=??    (probably a small default value, like 1 ms)
{PTS=30.1s} -> DUR=0.1s
{PTS=30.2s} -> DUR=0.1s
(new segment is inserted)
{PTS=60.0s} -> DUR=29.8s (wrong, and dangerous!)

or...

{PTS=30.0s} -> DUR=??    (probably a small default value, like 1 ms)
{PTS=30.1s} -> DUR=0.1s
{PTS=30.2s} -> DUR=0.1s
(new segment is inserted)
{PTS=0.0s}  -> DUR=-30.2s (wrong, even more dangerous!)

(A defensive version of this computation is sketched at the end of this comment.)

> // Some containers like webm don't supply a duration. Let's assume 60fps and let the gap sample hack fill the gaps if the duration was actually longer.

It would be desirable to use as small a value here as possible to avoid frame overlaps. 60 fps = 16 ms. An Opus audio frame can be as short as 2.5 ms. Would everything work with a value of 1 ms?

> // If we had a last DTS and the PTS hasn't varied too much (continuous samples), DTS++.

Some formats, like VP8 and VP9, don't need a different frame order for decoding and presentation. In those cases, by definition, DTS == PTS. What is the reason for DTS++? The only case where it would be useful is a container format where DTSs are somehow omitted but still matter for something, and there are caveats*. In MP4 this is impossible, as it forces you (thankfully!) to specify DTS in all cases due to the implicit nature of many of its boxes (PTS is defined in terms of DTS plus an offset, and DTS is defined in terms of durations). In Matroska... well, there is no such thing as a DTS in Matroska. Frame dependencies, if specified at all, are expressed by a completely different mechanism: ReferenceBlock timestamp pointers.

https://www.matroska.org/technical/specs/index.html#ReferenceBlock

In practice the lack of DTSs is not such a big deal for decoders, as frame dependencies are already stored in the video elementary streams. In fact, after looking at the GStreamer code for a while, I could not find a single decoder element that uses them for anything. This makes sense when you consider that h264 is commonly stored in Matroska too, and although matroskademux does not completely understand ReferenceBlock, it works just fine!

* Caveat on DTS++: the GStreamer docs define DTS as "the timestamp when the buffer should be decoded" (see https://gstreamer.freedesktop.org/data/doc/gstreamer/head/gstreamer/html/GstBuffer.html). This definition is consistent with the MPEG standards and the MSE spec. In other words, DTS is a deadline for when the frame should be fully decoded so that it can, at any moment from that point in time, be either presented or used as a dependency for a following frame. Note that, as a consequence, DTS <= PTS for any correct frame. This is not the case with your code. Take these few frames as an example and run your algorithm in your head:

DUR            =  1  1  1  1
PTS            =  0  1  3  2
Actual DTS     = -1  0  1  2
Calculated DTS =  1  2  3  4

The DTSs calculated by your algorithm don't resemble the actual decoding order (of course, since that was lost when the file was muxed!). Given that, wouldn't it make more sense to use PTS as the default DTS value?

Furthermore, you don't even need to do that, because there is already this code in GStreamerMediaSample:

https://github.com/WebKit/webkit/blob/master/Source/WebCore/platform/graphics/gstreamer/mse/GStreamerMediaSample.cpp#L56

Using GST_BUFFER_DTS_OR_PTS() is good because we don't need to modify the GstBuffer, so even if a decoder were to look at the DTS (chances are it won't), it wouldn't make a difference.
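For illustration, a rough sketch of the direction suggested above, assuming a hypothetical helper and arbitrary thresholds (the 1 second and 1 ms values are not from the patch): compute the sample duration defensively, trusting only small positive PTS deltas, and leave the DTS fallback to GST_BUFFER_DTS_OR_PTS() so the GstBuffer itself is never modified.

#include <gst/gst.h>
#include <wtf/MediaTime.h>

// Sketch only: pick a sample duration without ever producing the huge or
// negative values shown in the pathological cases above. Assumes the buffer
// carries a valid PTS.
static MediaTime sampleDurationFor(GstBuffer* buffer, const MediaTime& lastPts, const MediaTime& capsDuration)
{
    if (GST_BUFFER_DURATION_IS_VALID(buffer))
        return MediaTime(GST_BUFFER_DURATION(buffer), GST_SECOND);

    if (capsDuration.isValid())
        return capsDuration;

    MediaTime pts(GST_BUFFER_PTS(buffer), GST_SECOND);
    MediaTime delta = pts - lastPts;
    // Only trust the PTS delta for small, positive gaps; a jump to a new
    // segment (large or negative delta) must not become a frame duration.
    if (lastPts.isValid() && delta > MediaTime::zeroTime() && delta < MediaTime(1, 1))
        return delta;

    // Fall back to a tiny default (1 ms, purely illustrative) and let the
    // gap-sample mechanism fill the remainder if the real frame is longer.
    return MediaTime(1, 1000);
}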
Michael Catanzaro
Comment 5 2017-11-30 12:35:13 PST
Comment on attachment 327459 [details]
Patch

Alicia has reviewed this. Removing from request queue.
Alicia Boya García
Comment 6 2018-02-08 10:32:30 PST
> The DTSs calculated by your algorithm don't resemble the actual decoding order (of course, since that was lost when the file was muxed!). Given that, wouldn't it make more sense to use PTS as the default DTS value?

Correction: it does resemble the actual decoding order, because the values are increasing and frames in Matroska are stored in decoding order. In any case, DTS == PTS for all formats used with WebM. Some other formats supported by Matroska (e.g. h264) require more attention, but those aren't supported in WebM anyway.
Enrique Ocaña
Comment 7 2018-10-01 07:39:31 PDT
The changes proposed in this patch aren't needed anymore, as the original issues have already been resolved by the recent webm patches landed by Alicia.

*** This bug has been marked as a duplicate of bug 185731 ***