Bug 182200 - Samples in sourceBufferPrivateDidReceiveSample are added but not inserted in the decodeQueue
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: Media
Version: WebKit Nightly Build
Hardware: Unspecified
OS: Unspecified
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-01-26 17:25 PST by Eric Stobbart
Modified: 2020-12-06 17:56 PST
CC List: 4 users

See Also:


Attachments
Patch (3.04 KB, patch)
2018-01-26 17:28 PST, Eric Stobbart
no flags

Description Eric Stobbart 2018-01-26 17:25:31 PST
It's possible for a MediaTime comparison failure to cause a sample to be added to the trackBuffer's samples but not to the trackBuffer's decodeQueue.

Reproduction has been somewhat sporadic, but the most reliable way is to report samples from a platform whose decodeTime values appear equal to lastEnqueuedDecodeEndTime. Offsetting lastEnqueuedDecodeEndTime by a microsecond fixes the problem.
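
For illustration, a rough sketch of the microsecond-of-slop idea (this is not the attached patch; the constant and its placement in the enqueue check are assumptions):

    // Hypothetical sketch: tolerate a microsecond of rounding error before
    // concluding the new sample is out of decode order.
    static const MediaTime microsecondSlop(1, 1000000);
    if (trackBuffer.lastEnqueuedDecodeEndTime.isInvalid()
        || decodeTimestamp >= trackBuffer.lastEnqueuedDecodeEndTime - microsecondSlop) {
        // ... insert the sample into trackBuffer.decodeQueue as before ...
    }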
Comment 1 Eric Stobbart 2018-01-26 17:28:29 PST
Created attachment 332439 [details]
Patch
Comment 2 Jer Noble 2018-01-29 14:55:09 PST
What if we just changed the check from:

    if (trackBuffer.lastEnqueuedDecodeEndTime.isInvalid() || decodeTimestamp >= trackBuffer.lastEnqueuedDecodeEndTime) {

to:

    if (trackBuffer.lastEnqueuedDecodeEndTime.isInvalid() || decodeTimestamp >= trackBuffer.lastEnqueuedDecodeTime) {

Decode "durations" are largely a fiction, and in the case of ISOBMFF files, are really just there to generate the next sample's decode timestamp.  I don't see that there's be much danger in just enforcing strict decode timestamp ordering (without duration), and it wouldn't need a "microsecond of slop" hack.

(Note, this would require changing the definition of lastEnqueuedDecodeEndTime -> lastEnqueuedDecodeTime and modifying the one place where it's correctly set.)
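
Roughly, the companion change would look like this (variable names are approximate; the assumption is that the current code records the decode timestamp plus the frame duration):

    // Before (approximately): record the end of the enqueued sample's decode window.
    trackBuffer.lastEnqueuedDecodeEndTime = decodeTimestamp + frameDuration;

    // After: record only the decode timestamp itself.
    trackBuffer.lastEnqueuedDecodeTime = decodeTimestamp;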
Comment 3 Eric Stobbart 2018-01-29 15:45:54 PST
For additional context, I'm writing a platform implementation for embedded devices. I'm taking advantage of a platform sample's duration and reporting an ISOBMFF segment as a single platform sample containing an array of all its samples, calling back the SourceBufferClient once. I'm reading the durations off the 'trun', so the calculations work well as is. This speeds up the update cycle to the SourceBuffer and frees the event loop back up to JavaScript. I'm with you on it being a hack for sure, and I'm open to any ideas here, but I didn't want to rock the boat too much either.

A few ideas: could we round down the lastEnqueuedDecodeEndTime when decodeTime and duration get added?
Should a platform sample generate a decodeEndTime MediaTime?

At the very least, it seemed slightly dangerous not to have logging around a sample being added to the samples array but not to the decodeQueue.
Comment 4 Eric Stobbart 2018-01-29 15:50:15 PST
Also, I do think this gap detection has some value:

        if (trackBuffer.lastEnqueuedDecodeEndTime.isValid() && sample->decodeTime() - trackBuffer.lastEnqueuedDecodeEndTime > oneSecond)
            break;
Comment 5 Jer Noble 2018-01-29 22:24:26 PST
(In reply to Eric Stobbart from comment #3)
> For additional context, I'm writing a platform implementation for embedded
> devices. I'm taking advantage of a platform sample's duration and reporting
> an ISOBMFF as an array of all the samples in a single platform sample & call
> back the SourceBufferClient once. I'm reading duration off the trun so the
> calculations work well as is. This speeds up the update cycle to the
> SourceBuffer and frees up the event loop back to JavaScript. I'm with you on
> it being a hack for sure, open to any ideas here, but I didn't want to rock
> the boat too much either.

Interesting. Well, as long as you implement "divisible" MediaSamples in case the client asks to remove() a range that occurs in the middle of your giant 'trun' sample, that should work fine. That's basically how the audio MediaSamples work on Mac and iOS (since adding 44100 samples-per-second to a map structure is a bit overkill).

But now I see why you might be concerned about dropping lastEnqueuedDecodeEndTime for lastEnqueuedDecodeTime.

So is the issue due to poorly encoded content? I.e., are you seeing files where the 'trun' duration + the 'tfdt' baseMediaDecodeTime != the next 'tfdt' baseMediaDecodeTime?

> Few ideas, could we round down the lastEnqueuedDecodeEndTime when decodeTime
> and duration get added?

I don't think there's much to round here, since both values should be in the sample's track's media's timebase.
Comment 6 Eric Stobbart 2018-01-30 07:45:08 PST
It does support being divisible, but not that fine-grained. The segments are about 2 seconds with ~47 samples each (demuxed content), so the ~94 calls were still more than I'd like for appending 2 seconds worth of content to 2 source buffers. The content isn't encoded poorly. It appears to be an issue with taking times (decode and duration), dividing by the timebase, and creating two MediaTimes from doubles, then adding those two doubles together and comparing with another double. If the MediaSample provided a nextExpectedDecodeTime value, I could add the two ints and then divide by the timebase. I didn't want to trim off the MediaSample's duration, because it would return inaccurate buffered values. It really seemed like the microsecond of slop was the best scenario.
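
To make that failure mode concrete, here's a contrived standalone illustration (a timescale of 10 is used purely so the rounding is visible; with realistic timescales the same mismatch only shows up sporadically, which matches the behavior described above):

    #include <cstdio>

    int main()
    {
        const double timescale = 10;
        double decodeTime = 1 / timescale;                        // 0.1
        double duration = 2 / timescale;                          // 0.2
        double lastEnqueuedDecodeEndTime = decodeTime + duration; // 0.30000000000000004
        double nextDecodeTime = 3 / timescale;                    // 0.3

        // The integer ticks agree (1 + 2 == 3), but the doubles do not,
        // so the ">=" check that gates the decodeQueue insertion fails.
        std::printf("%d\n", nextDecodeTime >= lastEnqueuedDecodeEndTime); // prints 0
    }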
Comment 7 Eric Stobbart 2018-02-08 09:40:00 PST
Can we pick up this conversation again? Happy to converge on a solution and try to get this closed out.
Comment 8 Jer Noble 2018-02-08 15:22:01 PST
(In reply to Eric Stobbart from comment #7)
> Can we pick up this conversation again? Happy to converge on a solution and
> try to get this closed out.

Sure thing.

(In reply to Eric Stobbart from comment #6)
> It does support being divisible, but not that fine grained. The segments are
> about 2 seconds with ~47 samples each (demuxed content) so the ~94 calls was
> still more than I'd like to append 2 seconds worth of content to 2 source
> buffers. The content isn't encoded poorly. It appears to be an issue with
> taking times (decode, and duration), dividing by the timebase, and creating
> two MediaTimes from doubles.. then adding those two doubles together, and
> comparing with another double.

Huh. Well, round-tripping from MediaTimes to doubles and back is definitely a problem.  In fact, maybe that's the underlying cause of your issue.  If you have time values and time bases, you should be able to create a MediaTime from those two parts without going through a double; that's the primary use case of MediaTime.  So let's see if we can solve the rounding problem by using MediaTime values directly everywhere.
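
A minimal sketch of that, with illustrative field names for the platform sample (the exact accessors are whatever your MediaSample subclass exposes):

    // Build MediaTimes from the raw tick counts and the track timescale;
    // no doubles are involved, so the '+' and '>=' in the enqueue path stay exact.
    MediaTime decodeTime(platformSample.decodeTicks, platformSample.timescale);
    MediaTime duration(platformSample.durationTicks, platformSample.timescale);
    MediaTime decodeEndTime = decodeTime + duration;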

Are you working in the GTK+ port? Can you point me towards your MediaSample subclass?
Comment 9 Jer Noble 2018-02-08 15:25:43 PST
(In reply to Jer Noble from comment #8)
> Are you working in the GTK+ port? Can you point me towards your MediaSample
> subclass?

Oof, if we're talking MediaSampleGStreamer, it looks like GStreamer stores pts and dts as nanoseconds, and not in units of the underlying media's timescale. So your rounding problems are already present when you're handed a GstBuffer.
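
(For reference, a sketch of what building MediaTimes from a GstBuffer would look like; by this point the only timescale left is nanoseconds, so whatever rounding the demuxer did is already baked in.)

    // GStreamer timestamps are GstClockTime values in nanoseconds.
    MediaTime presentationTime(GST_BUFFER_PTS(buffer), GST_SECOND);
    MediaTime decodeTime(GST_BUFFER_DTS(buffer), GST_SECOND);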
Comment 10 Eric Stobbart 2018-02-08 16:03:28 PST
It's not GStreamer, but I get your point about the double conversion. Are you opposed to the microsecond when the MediaTime is a double? It seems like if it allows MediaTime, then it should allow MediaTime from a double. The MediaSample could have its own lastPresentationTime where it returns its own MediaTime instead of one that gets added. The other issue I had with MediaTime was that if I give it my pts and timebase, I can't get those original values back when I write the PES header during the enqueue process.
Comment 11 Alicia Boya García 2018-02-09 13:45:31 PST
(In reply to Jer Noble from comment #9)
> (In reply to Jer Noble from comment #8)
> > Are you working in the GTK+ port? Can you point me towards your MediaSample
> > subclass?
> 
> Oof, if we're talking MediaSampleGStreamer, it looks like GStreamer stores
> pts and dts as nanoseconds, and not in units of the underlying media's
> timescale. So your rounding problems are already present when you're handed
> a GstBuffer.

Note: Using nanoseconds as GStreamer does is lossy, but it poses no rounding problems by itself as long as the demuxer is careful and only converts to nanoseconds at the end. Then, working with nanoseconds is safe as long as only additions and subtractions are performed. That is not the case with floats, where even common operations like addition are dangerous.
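
To make the contrast concrete (the 90 kHz timescale is just an example):

    const int64_t timescale = 90000;
    const int64_t frameTicks = 3003;                        // one 29.97 fps frame, in track units

    // Lossy exactly once, at conversion time: the 0.66... ns fraction is dropped.
    int64_t frameNs = frameTicks * 1000000000 / timescale;  // 33366666 ns

    // Integer arithmetic after that point introduces no further error:
    int64_t gopNs = 15 * frameNs;                           // exactly 500499990 ns

    // Floating-point additions, by contrast, can round again on every step
    // (see the standalone example under comment 6).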