200949 – Media Source Extensions performance during seek

Dustin Kerstein

Reported 2019-08-20 16:08:49 PDT

The following JSFiddle uses both MSE and WebGL (through Three.js). It downloads an MP4Dashed mp4, attempts to seek through every single frame, and then loops back to the beginning. Neither Chrome nor Firefox seems to have any issue with this use-case, but Safari 12.1.2 (and others) only makes it through the first loop, after which performance slows to a crawl. https://jsfiddle.net/8w2Lb4dt/ I'm also hoping to get this working on iOS 13, but I figure first getting it working on Desktop Safari is probably the best path forward. Is there any reason this particular use-case for MSE + OSX/iOS isn't valid? Thanks, Dustin

Dustin Kerstein

Comment 1 2019-08-21 08:54:30 PDT

Here is a JSFiddle that doesn't rely on Three.js - https://jsfiddle.net/yfjowtqn/ Note that this still has issues after the first loop (rendering speed is <50% compared to first loop) though it's not as drastic as the above Three.js example.

Dustin Kerstein

Comment 2 2019-08-21 09:04:39 PDT

Upon further testing, it doesn't seem to require a full loop of the file. The issue appears when seeking to a frame that has already been seeked to. For example (assuming all frames are keyframes):

1. Seek to frame 1 and it will take X milliseconds
2. Seek to frame 2 and it will take around X milliseconds
3. Seek to frame 1 and it will take Y milliseconds (5-10 times longer than X)

I am also able to replicate similar behavior when seeking to contiguous frames, even when they are being seeked to for the very first time.

Radar WebKit Bug Importer

Comment 3 2019-08-22 15:21:59 PDT

<rdar://problem/54617026>

Dustin Kerstein

Comment 4 2020-05-01 10:28:28 PDT

It looks like some of the above issue was related to appending a frame that already exists in the buffer. I was able to work around it by tricking Safari into setting m_shouldGenerateTimestamps (https://github.com/WebKit/webkit/blob/29271ffbec500cd9c92050fcc0e613adffd0ce6a/Source/WebCore/Modules/mediasource/SourceBuffer.cpp) with the following code:

    sourceBuffer = ms.addSourceBuffer('audio/mpeg');
    sourceBuffer.mode = 'sequence';

And then seeking to an ever increasing value. See here for a demo - https://jsfiddle.net/dustinkerstein/wsm8gh7e/ - Note that when testing in Safari you'll need to click the video in the iframe to get the rAF render loop to run at 60fps (otherwise it's stuck at 30fps and less likely to exhibit the issue mentioned below).

However, the above workaround only works in certain conditions, depending on the computer's capabilities / video specs. It appears as though the seek operation (via fastSeek or setting currentTime) gets lost / cancelled out by a subsequent seek. I.e. when seeking quickly, I am missing 'seeked' events (try on an iPad or older laptop / desktop for easier replication with the demo above). No error is shown; the video just gets stuck, either temporarily or permanently.

I've tried other possible workarounds that don't rely on seeking, but to no avail. Does anyone see any alternative way to get this working? Would it be considered a bug for those 'seeked' events to get lost?

Dustin Kerstein

Comment 5 2020-05-02 10:34:16 PDT

Alright, so I think I'm seeing two distinct issues related to seeking with MSE.

1. Seeking to non-latest buffer - When using Mode=Segments, even when I protect against appending an already appended buffer, I still see huge lag when seeking - https://jsfiddle.net/nh97yba3 - I further tested appending the entire video - https://jsfiddle.net/rxjom47L - and I still see this same lag. So I think what it boils down to is seeking to a buffer that isn't the "end/latest".

2. Seeking too quickly - Here is an MSE test based on my workaround using Mode=Sequence, using setInterval to control the seek rate - https://jsfiddle.net/5jtfbh8m - Note you'll need to tweak the setInterval period to find where your computer is unable to keep up. On my iPad Pro it starts to freeze well below 60fps even at the HD resolution, whereas my MacBook Pro can handle UHD up to over 100fps. But either way, at some combination of computer capabilities + video specs it just stops seeking (i.e. displaying frames) and this state can't be easily detected in code during playback. Non-MSE playback behaves differently - https://jsfiddle.net/m4hystc5 - Even when pushed beyond decoding limits, it is still able to display new frames. One further interesting note is that in the MSE version when playback is frozen, you can switch to a different tab and then back, and you'll see a new frame.

Note that with all of these JSFiddles, you'll need to click into the result/video frame to get it to render at full speed. There is battery saving tech in Safari that restricts iframes from rendering at full speed until they are focused.

Let me know if you need any further info / debug. Do you feel that either of these issues could be addressed in a fix, or possibly a workaround?

Jer Noble

Comment 6 2020-05-02 12:13:33 PDT

Seeking isn't a cheap operation. It requires flushing and re-enqueueing enough frames to completely feed the decoder. There's a good chance a seek operation will take longer than 16ms, and you'll overshoot your rAF window. Why aren't you just playing at a normal rate?

Jer Noble

Comment 7 2020-05-02 12:17:24 PDT

To expand on this last comment, the decoder will pre-decode more than just the first frame in the queue. By seeking to every subsequent frame, you're effectively making the decoder work 2x to 10x more by making it flush all its previously decoded but not yet displayed frames. So yeah, if you seek at 60fps, you'll only keep up if you have a super powerful machine, because you're effectively playing the equivalent of ~6 videos simultaneously.

Dustin Kerstein

Comment 8 2020-05-02 12:32:16 PDT

Hi Jer, the videos I'm testing with (and my use-case for PanoMoments.com) are composed entirely of keyframes to minimize the amount of time (and frames) needed to seek. We've been using Media Source Extensions on Chrome/Firefox to effectively build an "ad-hoc non-sequential frame decoder" for the past few years. I understand that seeking isn't cheap when using a longer GOP, but when using 100% keyframes, seeking/decoding cost should be at its minimum. It just appears as though WebKit is doing something a bit different when seeking with MSE (both when seeking to "previous" buffered frames and when seeking rapidly). Have you been able to replicate either of those behaviors with the JSFiddles? I'd be happy to provide modified versions for testing the behavior in Chrome/Firefox for comparison.

Jer Noble

Comment 9 2020-05-02 13:00:36 PDT

(In reply to Dustin Kerstein from comment #8)
> Hi Jer, the videos I'm testing with (and my use-case for PanoMoments.com)
> are entirely comprised of keyframes to minimize the amount of time (and
> frames) needed to seek.

The only thing being all I-frames does is mean you can have a constant seek time for any point in the timeline, not necessarily a fast seek time, and definitely not necessarily a faster-than-realtime seek time.

> We've been using Media Source Extensions on
> Chrome/Firefox to effectively build an "ad-hoc non-sequential frame decoder"
> for the past few years. I understand that seeking isn't cheap when using a
> longer GOP, but when using 100% keyframes, seeking/decoding cost should be
> at its minimum.

Seeking into the middle of a long GOP can be expensive, true. But the general decode cost of an all I-frame movie is much, much higher. I-frame decoding is expensive, both in file size and decode time.

The MSE specification is built upon the premise of positive playback rates. By seeking backward one frame at a time to simulate a negative playback rate, you're swimming against the stream as far as optimizations we've built into the decoder to support normal playback.

Frankly, the MSE API is not built for what you're trying to achieve, and I think you're barking up the wrong tree here. If the MSE specification explicitly allowed negative playback rates, you could just specify some negative rate and let the media engine do all its optimizations in your favor, including pre-decoding as-of-yet-undisplayed frames, and dropping frames when the decoder can't keep up with the rate you're requesting. But it doesn't, and you're trying to force it to through seeking.

Jer Noble

Comment 10 2020-05-02 13:03:53 PDT

All that said, if I were you, I'd do some experimenting with re-writing your mp4 stream so that, even when playback was proceeding visually in the negative direction, you keep appending I-frames whose PTSs monotonically increased. Since all your samples are I-frames, this should be trivial re-muxing work without having to re-encode anything. Then rather than seeking at 60fps, you could just modulate the media's forward playback rate to achieve your desired "play backwards" behavior. It would mean having to re-mux whenever the user changed playback direction with a swipe, but that should be much less expensive than repeatedly seeking.
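As a rough illustration of the re-muxing idea in this comment, the sketch below maps frames, in whatever order they should be displayed, onto a forward-only timeline. The helper name `assignMonotonicPts` and the fixed `frameDuration` are assumptions made for this example; a real re-muxer would rewrite the timestamps inside the MP4 fragments rather than build a plain array, and the approach only works as-is because every sample is an I-frame.

```javascript
// Hypothetical helper: give each displayed frame a strictly increasing PTS.
// The frame data itself is untouched; only its position on the timeline changes.
function assignMonotonicPts(displayOrder, frameDuration, startPts = 0) {
  return displayOrder.map((frameIndex, position) => ({
    frameIndex,
    pts: startPts + position * frameDuration,
    duration: frameDuration,
  }));
}

// Example: frames 0..3 shown forwards, then back down to frame 1.
const order = [0, 1, 2, 3, 2, 1];
const samples = assignMonotonicPts(order, 0.25);
console.log(samples.map((s) => `${s.frameIndex}@${s.pts}`).join(' '));
// prints "0@0 1@0.25 2@0.5 3@0.75 2@1 1@1.25"
```

With a timeline built this way, visually "playing backwards" is just ordinary forward playback over the re-ordered samples.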

Jer Noble

Comment 11 2020-05-02 13:06:19 PDT

Heck, you could just append the same stream twice, once forward from time [0, d), and the other remuxed in reverse from [d, 2d) (where d is the original stream duration). Then when the user changed directions, you'd just seek to the appropriate place in the opposite half of the buffered range and play back forwards. That means that you could remux on the server, rather than in JS on the client.
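The "appropriate place in the opposite half" can be computed with a little arithmetic. The sketch below assumes N frames of equal duration f (so d = N * f), with frame i starting at i * f in the forward half and at d + (N - 1 - i) * f in the reversed half. The function names are hypothetical and not part of any API.

```javascript
// Start time of a frame in either half of the doubled [0, 2d) timeline.
function frameStartTime(frameIndex, direction, frameCount, frameDuration) {
  const duration = frameCount * frameDuration;
  if (direction === 'forward') return frameIndex * frameDuration;
  return duration + (frameCount - 1 - frameIndex) * frameDuration;
}

// Which frame, and which half, a given media time falls in.
function frameAtTime(time, frameCount, frameDuration) {
  const duration = frameCount * frameDuration;
  const inReverseHalf = time >= duration;
  const local = inReverseHalf ? time - duration : time;
  // Clamp so a time at the very end of a half still maps to a valid frame.
  const slot = Math.min(
    frameCount - 1,
    Math.max(0, Math.floor(local / frameDuration))
  );
  return {
    frameIndex: inReverseHalf ? frameCount - 1 - slot : slot,
    direction: inReverseHalf ? 'reverse' : 'forward',
  };
}

// Seek target that shows the same frame in the opposite half.
function mirrorTime(time, frameCount, frameDuration) {
  const { frameIndex, direction } = frameAtTime(time, frameCount, frameDuration);
  const opposite = direction === 'forward' ? 'reverse' : 'forward';
  return frameStartTime(frameIndex, opposite, frameCount, frameDuration);
}
```

On a direction change one would set `video.currentTime = mirrorTime(video.currentTime, N, f)` and keep playing forwards. With real frame rates the division can land on a frame boundary through rounding, so seeking to the middle of the frame rather than its start may be safer.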

Jer Noble

Comment 12 2020-05-02 13:08:48 PDT

...contd. And this could even remove your requirement that all frames be I-frames. The only time there might be an expensive long-GOP seek is when the user changes direction. Your file sizes would be dramatically reduced (up to 10x smaller) and you'd still get much better performance.

Dustin Kerstein

Comment 13 2020-05-02 14:38:32 PDT

> The only thing being all I-frames does is mean you can have a constant seek
> time for any point in the timeline, not necessarily a fast seek time, and
> definitely not necessarily a faster-than-realtime seek time.

Very true. Though faster than rAF decoding isn't required for PanoMoments. As long as the seek operation doesn't squash a previous seek (or we have a way to protect against overloading), any amount of time the decoder takes is fine. We use video.readyState in our Chrome/Firefox MSE implementations to prevent overloading the decoder, but it appears as though Safari doesn't change its readyState when seeking, so we can't throttle the seek requests. Do you know if this is intended?

> Seeking into the middle of a long GOP can be expensive, true. But the
> general decode cost of an all I-frame movie is much, much higher. I-frame
> decoding is expensive, both in file size and decode time.

Indeed. Though for ad-hoc (and non-contiguous) decoding we've not yet found a better way. MSE + all I-Frames does have a huge side benefit though - we can "stream" frames and allow a viewer to interact before the entire video is downloaded. We use an asterisk-like algorithm that allows playback starting with 20% of the total frames and then seamlessly layers in the frames as they download.

> The MSE specification is built upon the premise of positive playback rates.
> By seeking backward one frame at a time to simulate a negative playback
> rate, you're swimming against the stream as far as optimizations we've built
> into the decoder to support normal playback.

Yep, that's why our main MSE implementation on Chrome/Firefox uses Mode=Sequence and a monotonically increasing timestamp regardless of the viewer's chosen frame (forwards, backwards, skipping 10, etc.)

We originally tried this approach with WebKit, but it appears to still use the frame's PTS when Mode=Sequence and starts to slow down / break after a short while (to see this in action just switch the MIME type back to 'video/mp4' in https://jsfiddle.net/wsm8gh7e). When using the addSourceBuffer('audio/mpeg') workaround, this slowdown / break is avoided, though I do admit it's a rather flaky solution given that it relies on pretending we're submitting audio when we're most definitely not. There is a question here though, should WebKit be setting m_shouldGenerateTimestamps when the user sets Mode=Sequence? It appears as though Chrome/Firefox do generate their own timestamps when the user sets sequence mode.

> Frankly, the MSE API is not built for what you're trying to achieve, and I
> think you're barking up the wrong tree here. If the MSE specification
> explicitly allowed negative playback rates, you could just specify some
> negative rate and let the media engine do all its optimizations in your
> favor, including pre-decoding as-of-yet-undisplayed frames, and dropping
> frames when the decoder can't keep up with the rate you're requesting. But
> it doesn't, and you're trying to force it to through seeking.

We've prototyped using negative rates (and the dual stream forwards+backwards design you mentioned in your later comment) and while they were functional, there were many browser/device specific idiosyncrasies, and to get it working well enough (particularly the seek time when switching directions) we ended up with a total file size that wasn't justifiable, especially when considering that these designs lose the ability to "stream" as the MSE + keyframe design can.

Bit of a tangent, but you'd probably appreciate how we solved this in native code (see here for a demo https://apps.apple.com/us/app/panomoments/id1227039970 and here for the beta SDK https://github.com/MomentCaptureInc/PanoMoments). It utilizes a highly compressed long GOP h264 elementary stream, and transcodes it to a 100% keyframe stream during the download - essentially a streaming transcoder. This design was only feasible by having access to the GPU accelerated encoding/decoding the native SDKs provide.

I've also recently tried using a non-seeking approach with the Mode=Sequence + addSourceBuffer('audio/mpeg') workaround. It works for a little while, but the playback always ends up stalled, even when trying to detect this state and manually calling play() again.

The monotonically increasing PTS + seeking approach was definitely the best design for Chrome/Firefox and it's very close to working with WebKit (with the 'audio/mpeg' workaround). If the readyState reflected the overloaded condition (as it does on Chrome/Firefox) we'd be able to throttle the seek call. Do you think there's a chance that would be possible/appropriate to implement within WebKit? Let me know if you see any other possible workarounds. Thanks!

Dustin Kerstein

Comment 14 2020-05-02 14:49:20 PDT

Quick followup - Even when holding a seek request until the previous frame's 'seeked' event, we still see this squashed seek issue. It appears WebKit dispatches this event before the frame is actually rendered to screen (potentially only in MSE implementations).
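For reference, the "hold a seek request until the previous 'seeked' event" gating described here could look roughly like the sketch below. `createSeekGate` is a hypothetical helper, and it assumes `video` is an HTMLMediaElement (or anything with `addEventListener` and a settable `currentTime`) whose metadata has already loaded. As the comment reports, gating on 'seeked' alone did not fully avoid squashed seeks in WebKit, so this illustrates the technique rather than a confirmed fix.

```javascript
// Hypothetical helper: keep at most one seek in flight. Requests made while
// waiting are coalesced so that only the newest one is issued next.
function createSeekGate(video) {
  let inFlight = false;
  let pending = null;

  function issue(time) {
    inFlight = true;
    video.currentTime = time;
  }

  video.addEventListener('seeked', () => {
    inFlight = false;
    if (pending !== null) {
      const next = pending;
      pending = null;
      issue(next);
    }
  });

  return {
    request(time) {
      if (inFlight) pending = time; // latest request wins
      else issue(time);
    },
    get busy() {
      return inFlight;
    },
  };
}
```

A render loop would call `gate.request(t)` on every frame and let the gate decide when a new seek is actually started.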

Jer Noble

Comment 15 2020-05-02 15:50:33 PDT

(In reply to Dustin Kerstein from comment #13)
> We use video.readyState in our Chrome/Firefox MSE implementations to prevent
> overloading the decoder, but it appears as though Safari doesn't change its
> readyState when seeking, so we can't throttle the seek requests. Do you know
> if this is intended?

Yes. Chrome drops to HAVE_METADATA always during a seek, even if they're seeking within a buffered range. This is arguably against the spec, since the HTML spec defines HAVE_METADATA as "no media data is available for the immediate current playback position." If you're seeking within the buffered range, there is definitely media data available for the seek destination.

It seems that Chrome is using the decoder state as a proxy for media availability. There are some informational notes to that end in the spec, but nothing normative: "Really the only time the difference is relevant is when painting a video element onto a canvas, where it distinguishes the case where something will be drawn (HAVE_CURRENT_DATA or greater) from the case where nothing is drawn (HAVE_METADATA or less)." WebKit will always have something to draw into a canvas, regardless of whether you seek into an unbuffered range, so even this non-normative note doesn't apply here.

> There is a question here though,
> should WebKit be setting m_shouldGenerateTimestamps when the user sets
> Mode=Sequence?

No, the "generate timestamps flag" is set by the call to addSourceBuffer(), not by setting the mode. See step 6 of the "addSourceBuffer()" algorithm here: <https://w3c.github.io/media-source/#dom-mediasource-addsourcebuffer>.

> > Frankly, the MSE API is not built for what you're trying to achieve, and I
> > think you're barking up the wrong tree here. If the MSE specification
> > explicitly allowed negative playback rates, you could just specify some
> > negative rate and let the media engine do all its optimizations in your
> > favor, including pre-decoding as-of-yet-undisplayed frames, and dropping
> > frames when the decoder can't keep up with the rate you're requesting. But
> > it doesn't, and you're trying to force it to through seeking.
>
> I've also recently tried using a non-seeking approach with the Mode=Sequence
> + addSourceBuffer('audio/mpeg') workaround. It works for a little while, but
> the playback always ends up stalled, even when trying to detect this state
> and manually setting play() again.

This is something the spec authors are aware of; they would like to add the ability for clients to control the behavior when the currentTime runs up against the end of the buffered range; clients can ask to "play through" gaps and underruns in buffered ranges. See here: <https://github.com/w3c/media-source/issues/160>
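The distinction drawn in this comment, between media data being buffered at the seek destination and the element's readyState, can be checked separately in script. The sketch below is illustrative only: `isTimeBuffered` and `describeSeekTarget` are hypothetical helper names, and `buffered` is assumed to be a TimeRanges-shaped object such as `video.buffered`.

```javascript
// HTMLMediaElement.HAVE_CURRENT_DATA has the numeric value 2.
const HAVE_CURRENT_DATA = 2;

// True if `time` falls inside any range of a TimeRanges-like object
// (a `length` plus `start(i)` and `end(i)` accessors).
function isTimeBuffered(buffered, time) {
  for (let i = 0; i < buffered.length; i++) {
    if (time >= buffered.start(i) && time < buffered.end(i)) return true;
  }
  return false;
}

// Report both signals side by side for a prospective seek target.
function describeSeekTarget(video, time) {
  return {
    buffered: isTimeBuffered(video.buffered, time),
    haveCurrentData: video.readyState >= HAVE_CURRENT_DATA,
  };
}
```

Per the comment above, engines differ in what readyState reports during a seek, so the `buffered` check is the one that reflects whether data exists at the destination.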

Dustin Kerstein

Comment 16 2020-05-03 07:45:01 PDT

Ok, so it sounds like the second issue (seeks being squashed when seeking too quickly) is more or less as-designed and there isn't a way to protect against / detect this condition. Is that right?

I'm still curious about the first issue though, where seeking to earlier frames in the buffer (even when they are 100% keyframes) is much slower than seeking to later frames. Here's a really simple test that appends 243 keyframes to the buffer and logs the seek time to the console - https://jsfiddle.net/o27m5pwt - Hitting the keyboard number keys seeks to various individual frames spaced out through the buffered time range. Here's the behavior I see:

1. When seeking between frames 242 and 241, the seek time is around 1ms
2. When seeking between frames 0 and 1, the seek time is around 40ms
3. When seeking between frames 242 and 0, the seek time is around 20ms (and I am seeing a flicker of a third frame being decoded somehow)

If I perform this same test when using a Blob as a video source (comment out line #7 and switch lines #20 and #21) then I see near constant seek times of around 20ms (except for the occasional slowness which I believe is caused by https://bugs.webkit.org/show_bug.cgi?id=211295)

I guess even if we figure out the cause of this slowness, I'd probably still run into the squashed seek issue, but the former issue seems like it could be more of a general issue across more normal seeking use-cases. Do you see what could be going on there?
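A seek-time measurement of the kind described here can be sketched as follows. This is not the code from the linked JSFiddle: `timeSeek` is a hypothetical helper, and the injectable `now` clock is an assumption added so the function can be exercised outside a browser. It measures the time until 'seeked' is dispatched, which, per comment 14, may be earlier than the moment the frame is actually painted.

```javascript
// Hypothetical helper: time a single seek from assignment to 'seeked'.
function timeSeek(video, targetTime, onDone, now = () => performance.now()) {
  const startedAt = now();
  video.addEventListener(
    'seeked',
    () => onDone({ targetTime, elapsedMs: now() - startedAt }),
    { once: true } // one measurement per call
  );
  video.currentTime = targetTime;
}
```

In a page this might be used as `timeSeek(video, 1.5, (r) => console.log(r.targetTime, r.elapsedMs))`.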

Dustin Kerstein

Comment 17 2020-05-03 08:14:37 PDT

Also, here's a much more realistic use-case of the issue with squashed MSE seeks using the WebKit scrubber bar - https://jsfiddle.net/njvcs6bu (note this video is also 100% keyframes). Simply move the slider back and forth from beginning to end and you'll find that you often end up with a frame displayed that isn't the one currently selected on the scrubber bar. I.e. it's not always displaying the last seeked frame. Then try commenting out line #7 and switching lines #18 and #19 (using a Blob as src) and note that scrubbing works as expected. Are you able to replicate?

Dustin Kerstein

Comment 18 2020-05-04 06:41:04 PDT

Another behavior that can be seen in the 360 keyframe SMPTE video - https://jsfiddle.net/njvcs6bu - is that when clicking in the timeline bar (rather than dragging), you'll occasionally see a flicker of a third frame (also mentioned in a comment above). Further, for each click on the timeline bar, you'll receive two 'seeked' events (neither of which corresponds to the flickered third frame).

Also, in my previous comment I said "it's not always displaying the last seeked frame" - Upon further testing, when dragging the timeline bar quickly, seeks are squashed, but it appears that the last seek does eventually return the correct frame, except when dragging the timeline bar to the end. That seems to be a special case where the last seek never returns correctly.

A quick compilation of the behaviors I see (some of which may be related to each other):

1. Slow seeking to non-latest buffer
2. Squashed seeks with no way to prevent overload (either through readyState or using 'seeked' events)
3. Duplicate 'seeked' events when using timeline bar
4. Occasional flicker of third frame when seeking from keyframe A -> keyframe B

Do you feel any of those are worth further investigation? I'm happy to help with anything I can do on my end.

Dustin Kerstein

Comment 19 2020-05-07 16:30:17 PDT

Here's another MSE test containing a long GOP file with 1 I-Frame and 359 P-Frames that shows the 3rd frame flicker when seeking using the timeline bar - https://jsfiddle.net/95k4xfvn - As expected, seeking within the long GOP is very slow as it needs to decode from that sole keyframe, but hopefully being able to easily see that extra rendered frame helps diagnose what's going on. The 3rd frame flicker only seems to happen when seeking forward in the timeline (on both the long GOP and 100% I-Frame tests). Are you able to replicate behaviors #1-4 from above?