WebKit Bugzilla
NEW
200615
FMP4 segments streamed into a MediaSource result in a black video.
https://bugs.webkit.org/show_bug.cgi?id=200615
Summary
FMP4 segments streamed into a MediaSource result in a black video.
Andrew Sampson
Reported
2019-08-10 12:21:10 PDT
Hi, we operate a game streaming service that enables end-users to play PC games inside their web browser via a low-latency network stream. Safari worked in the past (albeit without Opus support); however, since we changed the format of our video to better support low-latency decoding, Safari fails to render anything when data is passed to a MediaSource via appendBuffer. You can download a dump of one of our streams from
https://images.rainway.com/videos/demos/StreamDump.mp4.zip
to analyze. Some information on our setup: We stream real-time video by appending MSE data chunks, and our goal is to have the lowest-latency decoding/presentation of this data. In most cases we are interested in having zero scheduling on the client side (by the browser), since if there is video frame data, it is already time to show it, without bothering with timestamps and presentation scheduling as normal video delivery would assume. Still, we have to format the data to look like traditional fragmented MP4 streams, with a frame rate indication in particular, or otherwise browsers tend to fail to play the content. Our streams do not contain I-frames after the first initialization segment.

We believe our content is okay for low latency overall. Specifically, in non-browser scenarios (e.g. the Xbox One MediaElement control, which is essentially close in terms of pipeline structure: it receives fragmented MP4, decodes with the help of DXVA2, and schedules for GPU-enabled presentation, like browsers do, with just a bit more control over the pipeline from our end), the same content shows good, stable, ultra-low latency. Our app also works both in Chrome (
https://images.rainway.com/videos/demos/Chrome_75_Mac.mp4
) and in Firefox (
https://images.rainway.com/videos/demos/Firefox_69_Windows.mp4
). We've disabled support for Safari in our web app for now, but would love to restore it as soon as possible. Please let me know if I can provide any other information.
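A minimal sketch of the kind of append pipeline described above (function and variable names are illustrative assumptions, not Rainway's actual client code). Incoming fragments are queued and fed to the SourceBuffer one at a time, because appendBuffer() throws an InvalidStateError while a previous append is still updating:

```javascript
// Queue-and-drain append pattern for an MSE SourceBuffer. The helper
// only relies on the standard `updating` flag, `updateend` event, and
// appendBuffer() from the MSE spec, so it can be exercised with a mock.
function makeAppender(sourceBuffer) {
  const queue = [];
  sourceBuffer.addEventListener('updateend', () => {
    // Previous append finished; feed the next queued fragment, if any.
    if (queue.length > 0) sourceBuffer.appendBuffer(queue.shift());
  });
  return function append(chunk) {
    if (sourceBuffer.updating || queue.length > 0) {
      queue.push(chunk); // an append is in flight; defer this fragment
    } else {
      sourceBuffer.appendBuffer(chunk);
    }
  };
}
```

In a page this would typically be wired to a WebSocket or fetch-stream handler that delivers the fMP4 fragments.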
Radar WebKit Bug Importer
Comment 1
2019-08-10 13:43:05 PDT
<rdar://problem/54168087>
Andrew Sampson
Comment 2
2019-08-11 16:15:50 PDT
I am following up to add one quick note: should you require access to one of our development servers to make testing easier, please don't hesitate to reach out to me via email and we can get one provisioned.
Jer Noble
Comment 3
2019-08-12 11:35:11 PDT
Pushing this dump through AVFoundation generates the following error: (kFigAtomStream_AtomOffsetOutOfRange) (Attempt to read 0 bytes at 4 out of range of 'sdtp' atom's data size 4) So I don't think the parser likes that your `stsz` atom has a zero-sample length. This may very well be a parser bug, but it suggests that there may be a workaround available.
Jer Noble
Comment 4
2019-08-12 11:48:45 PDT
Oh, your `sdtp` box is in your `moof`? I don't think that's allowed; `sdtp` is defined as being contained in a `stbl` box, and that's in your `moov`, not your `moof`. Probably still a parser bug, but what if you just removed the `sdtp` box entirely?
Roman R.
Comment 5
2019-08-14 03:39:36 PDT
I would like to post a follow-up to Andrew's original post. Indeed, we had 'sdtp' boxes in our test content (left over from earlier attempts to mimic another flavor of video stream). We removed them, and here is the updated description of the problem.

The fragmented MP4 video-only stream added via MSE is accepted by Safari. The browser displays a few frames and then freezes. I created a simpler reproducer which shows the problem outside of the primary application, with minimal client-side JS code adding MSE buffers into a video tag. The way I understand the symptom is that the video tag automatically transitions from the running to the paused state once it reaches the last available video frame. The next appended frame results in buffering rather than resumed playback. I recorded a video of what I am seeing here:
https://youtu.be/IKz0EDuwD0A
The behavior is consistent: a freeze shortly after playback starts, often on the second frame as in the video. The beginning of the original MP4 stream is here:
http://alax.info/temp/2019-08-14/video-AA.mp4
(no 'sdtp' boxes there, in particular). When I click "play" again on the video tag controls after the freeze, playback continues, and it can freeze once again later. I can again resume it interactively. With sufficient data, the stream plays presumably indefinitely (at least 10 minutes and 1+ GB of data). One additional weird symptom is that playback of this video-only stream is sometimes accompanied by audio noise at the beginning of playback. Safari plays about half a second of junk audio for no apparent reason.
Andrew Sampson
Comment 6
2019-08-26 18:43:32 PDT
Hey @Jer Noble, just wanted to check on the status of this bug. We're really anxious to get Safari support re-enabled. Can we provide any additional details?
Jer Noble
Comment 7
2019-08-27 07:59:45 PDT
(In reply to Andrew Sampson from comment #6)
> Hey @Jer Noble, just wanted to check on the status of this bug. We're really
> anxious to get Safari support re-enabled. Can we provide any additional
> details?
I'm not entirely sure what the bug _is_ at this point. There are three different behaviors listed in Roman's comment above:

1) "video freezes shortly after starting"
2) "video pauses when reaching the last available video frame"
3) "video does not resume when appending one additional video frame"

1) and 2) might be the same bug. If you're trying to live on the bleeding edge of realtime, you will inevitably hit network or main thread jank and deliver frames late, and in that case, yes, we will pause playback. It's hard to know without a live test case.

3) is definitely true; we won't resume until we hit HAVE_ENOUGH_DATA. It would be a terrible user experience for most videos if we started playing every time one additional frame was made available from the network. That said, MSE-based players know /exactly/ when additional frames are made available and can call play() at the end of an update().

All of this is moot, however. If you want frames delivered to the screen ASAP, then MSE is the wrong technology. You should be using WebRTC for this, as that's the entire reason for WebRTC to exist: low latency, immediate mode rendering.
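The "call play() at the end of an update()" suggestion can be sketched roughly as follows (a hedged illustration, not code from WebKit or from this bug; the readyState threshold is passed in as a plain number so the helper can be exercised outside a browser):

```javascript
// Resume a paused media element once an append completes, if at least
// the current frame is decodable. HAVE_CURRENT_DATA is readyState 2
// per the HTML spec.
function resumeOnUpdateEnd(video, sourceBuffer, HAVE_CURRENT_DATA = 2) {
  sourceBuffer.addEventListener('updateend', () => {
    if (video.paused && video.readyState >= HAVE_CURRENT_DATA) {
      video.play(); // in a real page, handle the returned promise's rejection
    }
  });
}
```

In a browser, play() returns a promise that can reject under autoplay policies, so a production version would attach a .catch() handler.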
Roman R.
Comment 8
2019-08-27 08:24:00 PDT
I think the problem is that the transition to the paused state is never followed by resumed playback, even though there is apparently a lot of new data added. It might be possible that a play() from our end could resolve this problem more or less (it is hard to tell upfront if such a play() could have any side effects, esp. performance related), and perhaps the complaint here is that other browsers handle this automatic resume pretty smoothly where the Safari media element just stays paused.
Jer Noble
Comment 9
2019-08-27 09:00:05 PDT
(In reply to Roman R. from comment #8)
> I think the problem is that the transition to the paused state is never
> followed by resumed playback, even though there is apparently a lot of new
> data added. It might be possible that a play() from our end could resolve
> this problem more or less (it is hard to tell upfront if such a play() could
> have any side effects, esp. performance related), and perhaps the complaint
> here is that other browsers handle this automatic resume pretty smoothly
> where the Safari media element just stays paused.
I really don’t understand this argument. If you want to stay as close to the live edge as possible, why the heck wouldn’t you call play() immediately after your appends complete? This is the Faustian Bargain that you make when you adopt MSE. You must take full responsibility for all networking, including all networking adjacent activities, such as predicting when media is sufficiently buffered and the network is sufficiently fast to allow media to play back uninterrupted.
Roman R.
Comment 10
2019-08-27 09:54:01 PDT
Apart from the pausing transition, which I think is the key problem here, there is an audio bug. It's apparently a browser problem, since the played asset contains a video track only at all times, but with some probability (maybe 5-10%? not sure) there is audio noise emitted by the browser: it plays some junk.

Back to pausing. I gave a quick try to calling play() after every buffer (every frame, in our case) is appended. This sort of works, because it keeps the browser in the playing state, but I don't think this is how things are designed to work, because it is associated with multiple problems: there is still a transition through the paused state, and the browser is simply not able to keep the playing state reliably. At times, play would restart from position zero, presumably because the browser considers playback to be at the end of the stream. Also, the playing/paused state transition is quite a heavy transition in the first place, and I don't think it's even supposed to happen on a per-frame basis. Long story short, play() per frame/update does not look like a workaround.

This would not happen if the browser would simply stay playing without automatically falling into the paused state. It is exactly the problem that it does not resume playback on HAVE_ENOUGH_DATA, staying paused until an interactive or programmatic play request. We do have full responsibility for networking; we are fine with this. We supply individual video frames in an appropriate format recognized by the browser and crafted for low-latency playback, and basically all we expect is for the media element to extract the ES, decode frames, and present them.

Setting autoplay to true, by the way, does not seem to have any effect on the described behavior. That is, where other browsers change to the paused state only as a result of an explicit pause() call, Safari seems to additionally switch to paused on running out of media data during active playback.

I think that at the end of the day this is what sets Safari's behavior apart and creates a problem for us trying to play live, ultra-low-latency video via MSE.
Jer Noble
Comment 11
2019-08-27 10:06:31 PDT
(In reply to Roman R. from comment #10)
> Apart from the pausing transition, which I think is the key problem here,
> there is an audio bug. It's apparently a browser problem, since the played
> asset contains a video track only at all times, but with some probability
> (maybe 5-10%? not sure) there is audio noise emitted by the browser: it
> plays some junk.
Without a test case, I can't speculate what's happening here. The original file you provided had strange muxing errors in it; it's possible that you've got poorly mastered media files which are ending up with audio tracks.
> Back to pausing.
>
> I gave a quick try to calling play() after every buffer (every frame, in our
> case) is appended. This sort of works, because it keeps the browser in the
> playing state, but I don't think this is how things are designed to work,
> because it is associated with multiple problems: there is still a transition
> through the paused state, and the browser is simply not able to keep the
> playing state reliably. At times, play would restart from position zero,
> presumably because the browser considers playback to be at the end of the
> stream.
Are you setting MediaSource.duration to some large value, or are you letting the duration naturally increase with each append()? Because yes, if we reach the end of the duration, we will pause, as that's what the HTML spec says we _must_ do.
> We do have full responsibility for networking; we are fine with this. We
> supply individual video frames in an appropriate format recognized by the
> browser and crafted for low-latency playback, and basically all we expect is
> for the media element to extract the ES, decode frames, and present them.
Yes and this is an incorrect assumption. Again, you're trying to use MSE in a way that it was never designed for; the correct API to use in order to achieve what you're trying to achieve is the WebRTC API.
> Setting autoplay to true, by the way, does not seem to have any effect on
> the described behavior.
Yes, autoplay only has an effect until the first play() command is called. Again, this is defined by the HTML specification.
Roman R.
Comment 12
2019-08-27 10:16:12 PDT
Audio junk is played from an asset that has a video track only. We generate the live data programmatically, and by no means could the data have an audio track. Additionally, the addSourceBuffer call argument has no reference to audio. Also, the audio problem has nothing to do with sdtp boxes; their removal left the audio bug in place.

We do not set the duration, and it is updated according to the MSE duration change algorithm (presumably). At the same time, the duration embedded in the FMP4 is zero, to indicate a live stream. So I thought that yes, there is a chance that Safari does not have a notion of a duration-less stream and pauses on considering that it has reached EOS, which browsers like Chrome and Firefox would not do, because they see the stream as live and keep playing, just waiting for content appends. I will look into that a bit more.
Roman R.
Comment 13
2019-08-27 10:32:40 PDT
A quick try with a non-zero duration suggests that the duration assumption above is true. There is no pause transition, and playback behavior looks "fixed" (at first glance). If this is correct, this indeed redefines the problem to Safari's lack of proper support for zero-duration assets. Which in turn might be more or less okay, if this is not a mandatory requirement.

Regarding the audio glitch, to the best of my knowledge the important thing may be that the audio junk comes on the first playback, such as after the browser is started. Subsequent plays, page reloads and the like are silent.
Jer Noble
Comment 14
2019-08-27 10:42:12 PDT
(In reply to Roman R. from comment #12)
> Audio junk is played from an asset that has a video track only. We generate
> the live data programmatically, and by no means could the data have an audio
> track. Additionally, the addSourceBuffer call argument has no reference to
> audio.
It doesn't need to. Throwing an error when encountering an unspecified track is non-normative. Can you point to a reduced test case that demonstrates this problem? And how have you determined that it's the MSE stream that's generating the audio and not the programmatic audio?
> Also, the audio problem has nothing to do with sdtp boxes; their removal
> left the audio bug in place.
>
> We do not set the duration, and it is updated according to the MSE duration
> change algorithm (presumably). At the same time, the duration embedded in
> the FMP4 is zero, to indicate a live stream.
That does not indicate a live stream. A duration of +Inf will indicate a live stream, as per the MSE specification. The MSE spec says that the initial duration of a SourceBuffer is only set to +Inf if there is no duration, and it's not possible to have a `moov` segment without a duration, so that instruction may only be relevant to WebM or TS containers.
> So I thought that yes, there is a chance that Safari does not have a notion
> of a duration-less stream
It does, but you need to set the duration to +Inf explicitly.
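A minimal sketch of that explicit step (function and variable names are assumptions): set MediaSource.duration to Infinity once the source is open, so the element treats the stream as unbounded rather than pausing at the appended end.

```javascript
// Mark a MediaSource as live/unbounded by setting duration = Infinity.
// The duration attribute can only be set while readyState is 'open',
// so the assignment is deferred to the 'sourceopen' event otherwise.
function markAsLive(mediaSource) {
  const setInfinite = () => { mediaSource.duration = Infinity; };
  if (mediaSource.readyState === 'open') {
    setInfinite();
  } else {
    mediaSource.addEventListener('sourceopen', setInfinite, { once: true });
  }
}
```

The helper relies only on the `readyState`, `duration`, and `sourceopen` names from the MSE spec, so it can be exercised with a duck-typed mock.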
Jer Noble
Comment 15
2019-08-27 11:09:33 PDT
I just checked ISO 14496-12, and in 8.2.2.3 it says:
> duration is an integer that declares length of the presentation (in the indicated timescale). This property is derived from the presentation’s tracks: the value of this field corresponds to the duration of the longest track in the presentation. If the duration cannot be determined then duration is set to all 1s.
Have you tried setting the duration of your init segment to 0xFFFFFFFF? That seems to be the official way of specifying "no duration".
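For anyone wanting to try that experiment, the field could be patched in the init segment directly. This is a sketch that assumes a version-0 `mvhd` box whose byte offset is already known; a real tool would walk the box tree to find it.

```javascript
// Set the 32-bit mvhd duration to all 1s (0xFFFFFFFF), the ISO
// 14496-12 convention for "duration cannot be determined".
// Version-0 mvhd layout from the box start:
//   size(4) type(4) version(1) flags(3)
//   creation_time(4) modification_time(4) timescale(4) duration(4)
// so the duration field begins at byte offset 24.
function setMvhdDurationUnknown(initSegment, mvhdOffset) {
  const view = new DataView(initSegment.buffer, initSegment.byteOffset,
                            initSegment.byteLength);
  if (view.getUint8(mvhdOffset + 8) !== 0) {
    throw new Error('only version-0 mvhd handled in this sketch');
  }
  view.setUint32(mvhdOffset + 24, 0xFFFFFFFF); // big-endian, per ISO BMFF
  return initSegment;
}
```

A version-1 mvhd uses 64-bit times, so its duration sits at a different offset and would need 0xFFFFFFFFFFFFFFFF instead.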
Roman R.
Comment 16
2019-08-27 11:13:42 PDT
I was submitting the duration through the FMP4 content, letting the browser recognize an infinite stream; that is, without touching MediaSource.duration. I don't see an MSE reference to a +Infinity duration as an indication of a live stream, but I confirm that Safari recognizes it as you described. I also re-checked the ISO BMFF spec to see if there is a special mention of an infinite stream, and it has no special meaning (for 32-bit values) except that 0xFFFFFFFF indicates that the duration cannot be determined. Safari would treat this as a number, and this results in a huge duration. In contrast, setting the MSE duration results in the media element showing "live feed". As for audio, the problem happens with any stream we generate, including for example the one I attached in
comment 5
, but under special conditions. I originally described them as "rare", and then I realized it was just on the first run. When this happens, the video itself continues to play. I will add more comments on reproduction after trying it out once again.
Jer Noble
Comment 17
2019-08-27 11:18:23 PDT
(In reply to Roman R. from comment #16)
> I also re-checked the ISO BMFF spec to see if there is a special mention of
> an infinite stream, and it has no special meaning (for 32-bit values) except
> that 0xFFFFFFFF indicates that the duration cannot be determined. Safari
> would treat this as a number, and this results in a huge duration.
Okay, then that may be a bug either in WebKit or our fMP4 parser.
Roman R.
Comment 18
2019-08-27 11:56:31 PDT
Regarding audio, I don't see a pattern [yet]. No, it's not just the first run; I still see the problem from time to time. It does not sound plausible that an audio track is somehow getting into the browser with junk audio while also bypassing all parser checks and reaching the speaker. I ran the data producer under a debugger, and it has no activity related to audio; even unrelated audio code path breakpoints are never hit. Yes, in Safari, if playback hits the bug condition, there is a speaker item in the address bar to the left of the refresh icon, and weird junk audio is played back. I will post new details when/if something clears up.
Andrew Sampson
Comment 19
2019-08-28 18:28:40 PDT
(In reply to Jer Noble from comment #17)
> (In reply to Roman R. from comment #16)
> > I also re-checked the ISO BMFF spec to see if there is a special mention
> > of an infinite stream, and it has no special meaning (for 32-bit values)
> > except that 0xFFFFFFFF indicates that the duration cannot be determined.
> > Safari would treat this as a number, and this results in a huge duration.
>
> Okay, then that may be a bug either in WebKit or our fMP4 parser.
Adding a comment to this, Chrome and Firefox

(In reply to Jer Noble from comment #9)
> (In reply to Roman R. from comment #8)
> > I think the problem is that the transition to the paused state is never
> > followed by resumed playback, even though there is apparently a lot of
> > new data added. It might be possible that a play() from our end could
> > resolve this problem more or less (it is hard to tell upfront if such a
> > play() could have any side effects, esp. performance related), and
> > perhaps the complaint here is that other browsers handle this automatic
> > resume pretty smoothly where the Safari media element just stays paused.
>
> I really don't understand this argument. If you want to stay as close to the
> live edge as possible, why the heck wouldn't you call play() immediately
> after your appends complete?
>
> This is the Faustian Bargain that you make when you adopt MSE. You must take
> full responsibility for all networking, including all networking adjacent
> activities, such as predicting when media is sufficiently buffered and the
> network is sufficiently fast to allow media to play back uninterrupted.
So while we _could_ be using WebRTC video, it comes with a lot of caveats that I don't think are worth getting into here. MSE has worked well for us in other browsers (Chrome/Firefox), and in our side-by-side testing, our MSE method results in lower latency than WebRTC video, which is a huge win.

Chromium uses hints from the MP4 stream to determine whether low-latency mode should be used, by checking whether an MP4 file is created in real time, such as in live streaming, where the fragment_duration is not known in advance and the (mehd) box may be omitted. Firefox also has an explicit low-latency mode, though this does not rely on hints and seems to be the default behavior. In both of these browsers our users can start up a game stream and play with ultra-low latency.

That being said, we did manage to get Safari working in our test bed using your suggestion just now; however, the video latency seems to be pretty high on average. We see it has a buffer around 70-100 ms (whereas Chrome and Firefox can be < 23 ms at times). Are there any modifications we can make to our byte stream to trigger low-delay rendering/decoding in Safari like we see in Chromium/Firefox? Or rather, any other suggestions you can offer? Supporting all web browsers is really important to us, not just for user convenience, but also to ensure we aren't forced to recommend Chrome as the "best way to play."

To make it easier for you to reproduce this specific use case, I can get you set up with a Rainway account and an instance located near you. Is the email listed here the best place to shoot you a note to coordinate? LMK if I can answer any other questions or how our team can be most helpful.
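For reference, the Chromium heuristic mentioned above keys off the `mehd` (movie extends header) box, which carries fragment_duration. A quick diagnostic for whether an init segment even contains one might look like this naive byte scan (an illustrative sketch, not a real box parser; it can false-positive if those four bytes occur by coincidence):

```javascript
// Report whether an init segment appears to contain a 'mehd' box by
// scanning for its fourcc. A proper check would parse moov > mvex
// rather than scanning raw bytes.
function containsMehd(bytes) {
  const fourcc = [0x6d, 0x65, 0x68, 0x64]; // 'm' 'e' 'h' 'd'
  for (let i = 0; i + 4 <= bytes.length; i++) {
    if (bytes[i] === fourcc[0] && bytes[i + 1] === fourcc[1] &&
        bytes[i + 2] === fourcc[2] && bytes[i + 3] === fourcc[3]) {
      return true;
    }
  }
  return false;
}
```

A packager targeting that heuristic would want this to return false for its live init segments.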