Bug 302735

Summary: WPT's webvtt tests are failing very often
Product: WebKit
Component: Media
Reporter: Jean-Yves Avenard [:jya] <jean-yves.avenard>
Assignee: Jean-Yves Avenard [:jya] <jean-yves.avenard>
Status: RESOLVED FIXED
Severity: Normal
Priority: P2
CC: darbinyan, webkit-bug-importer
Keywords: InRadar
Version: WebKit Nightly Build
Hardware: Unspecified
OS: Unspecified

Jean-Yves Avenard [:jya]
Reported 2025-11-18 14:24:46 PST
`run-webkit-tests --debug --repeat 100 --exit-after-n-failures=1 --show-window imported/w3c/web-platform-tests/webvtt/rendering/cues-with-video/processing-model/align_end.html` will always fail within 15 runs.

imported/w3c/web-platform-tests/webvtt/rendering/cues-with-video/processing-model/align_end.html [ ImageOnlyFailure Pass ]

The same applies to all the other webvtt tests.
Radar WebKit Bug Importer
Comment 1 2025-11-18 14:24:58 PST
Jean-Yves Avenard [:jya]
Comment 2 2025-11-19 03:45:53 PST
The test loads a blank video with a VTT text track, waits for the `playing` event to fire, and then performs a reftest, at which point it expects the text cue to be displayed on screen.

The test fails intermittently, and easily. In a debug build, `run-webkit-tests --debug --repeat 100 --exit-after-n-failures=1 --show-window imported/w3c/web-platform-tests/webvtt/rendering/cues-with-video/processing-model/align_end.html` typically fails after between 8 and 15 runs on my dev machine.

Looking at the log, there’s nothing obviously wrong.

The `playing` event is fired when the readyState moves to HAVE_FUTURE_DATA; with the WebM player this occurs once we have rendered the first frame, at which time we call HTMLMediaElement::mediaPlayerReadyStateChanged.

HTMLMediaElement::setReadyState will move to HAVE_FUTURE_DATA if the MediaPlayer's readyState is `HaveFutureData` and the text tracks have been loaded.

Per spec:
```
HAVE_FUTURE_DATA
Data for the immediate current playback position is available, as well as enough data for the user agent to advance the current playback position in the direction of playback at least a little without immediately reverting to the HAVE_METADATA state, and the text tracks are ready
```
So the text tracks need to be ready, but this doesn't imply that they have actually been drawn to the screen, and our code makes no attempt to guarantee that the text tracks' cues have actually been painted.

`HTMLMediaElement::setReadyState` will be called when `HTMLMediaElement::textTrackReadyStateChanged` is called.

With the `WebMUseRemoteAudioVideoRenderer` preference set, MediaPlayerPrivateWebM runs in the web content process. When the test fails, we can see from the resulting reftest image that the cues aren't visible on screen.
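To make the gap concrete, here is a minimal sketch of the gating described above. It is a toy model, not WebKit's code: `ElementModel`, `updateReadyState`, and the `cuesPainted` flag are hypothetical names, and capping the state at HAVE_CURRENT_DATA while the tracks are not ready is an assumption made for illustration.

```cpp
// Illustrative stand-ins for the ready states named above; these are not
// WebKit's actual declarations.
enum class ReadyState { HaveNothing, HaveMetadata, HaveCurrentData, HaveFutureData, HaveEnoughData };

struct ElementModel {
    ReadyState readyState = ReadyState::HaveNothing;
    bool textTracksReady = false; // tracks finished loading (or failed to load)
    bool cuesPainted = false;     // hypothetical: never consulted by the gating below
    bool playingFired = false;
};

// Hypothetical model of the gating: the element only advances to
// HAVE_FUTURE_DATA or beyond once the text tracks are ready. The cap at
// HAVE_CURRENT_DATA is an assumption of this sketch.
void updateReadyState(ElementModel& element, ReadyState playerState)
{
    ReadyState newState = playerState;
    if (!element.textTracksReady && newState > ReadyState::HaveCurrentData)
        newState = ReadyState::HaveCurrentData;

    const bool wasBelowFutureData = element.readyState < ReadyState::HaveFutureData;
    element.readyState = newState;

    // `playing` fires on the transition into HAVE_FUTURE_DATA or beyond
    // (assuming the element is not paused). Nothing here checks cuesPainted.
    if (wasBelowFutureData && newState >= ReadyState::HaveFutureData)
        element.playingFired = true;
}
```

In this model, `playingFired` can become true while `cuesPainted` is still false, which is the state the reftest ends up capturing when it fails.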
Looking at the logs, the time between `textTrackReadyStateChanged` being called and all the requirements being met for readyState to move to `HAVE_FUTURE_DATA` is typically under 0.01s:
`2025-11-19 22:14:41.798029+1100 HTMLMediaElement::textTrackReadyStateChanged(10335156763480636993) track loaded`
`2025-11-19 22:14:41.798145+1100 HTMLMediaElement::setReadyState(8F6DDA51813D3E41) new state = HaveEnoughData, current state = HAVE_CURRENT_DATA`
It took 0.000116s between the time the text track was ready and the move to HAVE_ENOUGH_DATA. HTMLVideoElement::mediaPlayerFirstVideoFrameAvailable was received at 2025-11-19 22:14:41.797698+1100, before `textTrackReadyStateChanged` was received.

Now, when MediaPlayerPrivateWebM runs in the GPU process instead:
`2025-11-19 22:22:35.868571+1100 HTMLMediaElement::textTrackReadyStateChanged(6698457972533799464) track loaded`
`2025-11-19 22:22:35.879026+1100 HTMLMediaElement::setReadyState(5CF5B4093BC2AA28) new state = HaveEnoughData, current state = HAVE_METADATA`
It took 0.01s between the time the text track was ready and the move to HAVE_ENOUGH_DATA. MediaPlayerPrivateRemote::firstVideoFrameAvailable(6698457972533799464) was received at 2025-11-19 22:22:35.878968+1100, *after* `textTrackReadyStateChanged`.

Whenever the failure occurs, with the player running in the web content process, firstVideoFrameAvailable was received before textTrackReadyStateChanged.

The test requires the text cues to be not only ready, but also displayed on screen. With the MediaPlayer running in the content process, the operations are orders of magnitude quicker, so the `playing` event is fired much earlier than before and the text track cues haven't had the chance to be rendered on screen yet.

Adding a simple 0.1s delay in the MediaPlayer before calling HTMLMediaElement::mediaPlayerReadyStateChanged allows the test to run without failures.
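The intervals quoted above can be rechecked from the log timestamps. The sketch below is only arithmetic on the time-of-day strings; `toMicroseconds` and `elapsedMicroseconds` are hypothetical helpers, and it assumes both timestamps fall on the same day, share a timezone, and carry exactly six fractional digits.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Converts "HH:MM:SS.ffffff" (six fractional digits assumed) to microseconds
// since midnight. Returns -1 if the string does not match that shape.
std::int64_t toMicroseconds(const std::string& timeOfDay)
{
    int hours = 0, minutes = 0, seconds = 0, micros = 0;
    if (std::sscanf(timeOfDay.c_str(), "%d:%d:%d.%6d", &hours, &minutes, &seconds, &micros) != 4)
        return -1;
    return ((hours * 60LL + minutes) * 60 + seconds) * 1000000LL + micros;
}

// Signed interval from `from` to `to`; negative means `to` happened first.
std::int64_t elapsedMicroseconds(const std::string& from, const std::string& to)
{
    return toMicroseconds(to) - toMicroseconds(from);
}
```

With the web content process timestamps, `elapsedMicroseconds("22:14:41.798029", "22:14:41.798145")` gives 116 µs; with the GPU process timestamps, `elapsedMicroseconds("22:22:35.868571", "22:22:35.879026")` gives 10455 µs, roughly 0.01s. Measuring from track-ready to first-frame gives -331 µs in the first case (frame first) and 10397 µs in the second (track first).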
At this stage, the web content process has no way to determine accurately that the text cues have actually been rendered on screen, so it cannot delay firing `playing` until after that time.

I believe the test is incorrect and the spec should be further clarified as to what a text track being ready means in practice. The spec clearly states that the track must be ready, not "showing".

The definition used in `HTMLMediaElement::textTracksAreReady()` appears out of date. The comment states:
```
// 4.8.10.12.1 Text track model
// ...
// The text tracks of a media element are ready if all the text tracks whose mode was not
// in the disabled state when the element's resource selection algorithm last started now
// have a text track readiness state of loaded or failed to load.
```
but the spec (https://html.spec.whatwg.org/dev/media.html#the-text-tracks-are-ready) now states:
```
The text tracks of a media element are ready when both the element's list of pending text tracks is empty and the element's blocked-on-parser flag is false.
```
There is still no mention of them "showing".
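For comparison, the two wordings quoted above can be written as predicates. This is a sketch under stated assumptions, not the WebKit implementation: `TrackModel`, `tracksReadyOlderWording`, and `tracksReadyCurrentWording` are hypothetical names that mirror the quoted text only.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

enum class Mode { Disabled, Hidden, Showing };
enum class Readiness { NotLoaded, Loading, Loaded, FailedToLoad };

// Hypothetical per-track record holding only what the older wording refers to.
struct TrackModel {
    Mode modeAtSelectionStart; // mode when resource selection last started
    Readiness readiness;
};

// Older wording (the code comment): every track that was not disabled when
// resource selection last started is now loaded or has failed to load.
bool tracksReadyOlderWording(const std::vector<TrackModel>& tracks)
{
    return std::all_of(tracks.begin(), tracks.end(), [](const TrackModel& track) {
        return track.modeAtSelectionStart == Mode::Disabled
            || track.readiness == Readiness::Loaded
            || track.readiness == Readiness::FailedToLoad;
    });
}

// Current wording (the linked spec text): the list of pending text tracks is
// empty and the blocked-on-parser flag is false.
bool tracksReadyCurrentWording(std::size_t pendingTextTrackCount, bool blockedOnParser)
{
    return pendingTextTrackCount == 0 && !blockedOnParser;
}
```

Neither predicate takes any input describing whether a cue has been painted, which is the point made above: "ready" is defined purely in terms of loading state.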
Jean-Yves Avenard [:jya]
Comment 3 2025-11-19 03:46:12 PST
Jean-Yves Avenard [:jya]
Comment 4 2025-11-21 04:54:25 PST
*** Bug 302889 has been marked as a duplicate of this bug. ***
Jean-Yves Avenard [:jya]
Comment 5 2025-11-21 05:00:27 PST
EWS
Comment 6 2025-11-21 14:46:17 PST
Test gardening commit 303422@main (fbd9dbb45242): <https://commits.webkit.org/303422@main> Reviewed commits have been landed. Closing PR #54314 and removing active labels.