Bug 230435

Summary: Regression (r282201 - r282220?) : [ MacOS ] http/tests/media/track-in-band-hls-metadata.html is a flaky failure
Product: WebKit Reporter: ayumi_kojima
Component: Media Assignee: Nobody <webkit-unassigned>
Status: NEW ---    
Severity: Normal CC: eric.carlson, jer.noble, webkit-bot-watchers-bugzilla, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   

Description ayumi_kojima 2021-09-17 16:35:43 PDT
http/tests/media/track-in-band-hls-metadata.html is a flaky failure on macOS.

History: https://results.webkit.org/?suite=layout-tests&test=http%2Ftests%2Fmedia%2Ftrack-in-band-hls-metadata.html

The test was marked as Pass/Timeout in Bug 140022 and, as far back as the history goes, has been flakily timing out.

The flaky failure started showing up around r282220.

Diff:

--- /Volumes/Data/worker/bigsur-debug-applesilicon-tests-wk2/build/layout-test-results/http/tests/media/track-in-band-hls-metadata-expected.txt
+++ /Volumes/Data/worker/bigsur-debug-applesilicon-tests-wk2/build/layout-test-results/http/tests/media/track-in-band-hls-metadata-actual.txt
@@ -20,26 +20,26 @@
 * 1
 EXPECTED (typeof(cue) != 'undefined') OK
 EXPECTED (cue.data == 'null') OK
-EXPECTED (cue.type == 'com.apple.quicktime.HLS') OK
+EXPECTED (cue.type == 'com.apple.quicktime.HLS'), OBSERVED 'org.id3' FAIL
 EXPECTED (cue.value != 'null') OK
-EXPECTED (cue.value.key == '"X-START-OFFSET"') OK
-EXPECTED (cue.value.data == '"0.000000"') OK
+EXPECTED (cue.value.key == '"X-START-OFFSET"'), OBSERVED '"TIT2"' FAIL
+EXPECTED (cue.value.data == '"0.000000"'), OBSERVED '"Stream Counting"' FAIL
 
 * 2
 EXPECTED (typeof(cue) != 'undefined') OK
 EXPECTED (cue.data == 'null') OK
 EXPECTED (cue.type == 'com.apple.quicktime.HLS') OK
 EXPECTED (cue.value != 'null') OK
-EXPECTED (cue.value.key == '"X-END-OFFSET"') OK
-EXPECTED (cue.value.data == '"5.000000"') OK
+EXPECTED (cue.value.key == '"X-END-OFFSET"'), OBSERVED '"X-START-OFFSET"' FAIL
+EXPECTED (cue.value.data == '"5.000000"'), OBSERVED '"0.000000"' FAIL
 
 * 3
 EXPECTED (typeof(cue) != 'undefined') OK
 EXPECTED (cue.data == 'null') OK
-EXPECTED (cue.type == 'org.id3') OK
+EXPECTED (cue.type == 'org.id3'), OBSERVED 'com.apple.quicktime.HLS' FAIL
 EXPECTED (cue.value != 'null') OK
-EXPECTED (cue.value.key == '"TIT2"') OK
-EXPECTED (cue.value.data == '"Stream Counting"') OK
+EXPECTED (cue.value.key == '"TIT2"'), OBSERVED '"X-END-OFFSET"' FAIL
+EXPECTED (cue.value.data == '"Stream Counting"'), OBSERVED '"5.000000"' FAIL
 
 * 4
 EXPECTED (typeof(cue) != 'undefined') OK
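
Reading the diff, cues 1 to 3 look like the same three cues delivered in a different order rather than cues with wrong values: the org.id3 TIT2 cue expected at position 3 arrives first, and the two com.apple.quicktime.HLS cues each shift down by one. As an editorial illustration (not part of the layout test), the Python sketch below transcribes those three cues and checks that reading; the helper names are hypothetical.

```python
# Editorial sketch: compare the cue order the test expects with the order
# observed in the failing run. Only the first three cues are transcribed,
# because the diff is cut off at cue 4. Each tuple is (type, key, data).
EXPECTED = [
    ("com.apple.quicktime.HLS", "X-START-OFFSET", "0.000000"),
    ("com.apple.quicktime.HLS", "X-END-OFFSET", "5.000000"),
    ("org.id3", "TIT2", "Stream Counting"),
]
OBSERVED = [
    ("org.id3", "TIT2", "Stream Counting"),
    ("com.apple.quicktime.HLS", "X-START-OFFSET", "0.000000"),
    ("com.apple.quicktime.HLS", "X-END-OFFSET", "5.000000"),
]


def same_cues_different_order(expected, observed):
    """Return True if both lists hold the same cues, but not in the same order."""
    return sorted(expected) == sorted(observed) and expected != observed


def rotation_offset(expected, observed):
    """Return k if observed equals expected rotated right by k positions, else None."""
    n = len(expected)
    if n != len(observed):
        return None
    for k in range(n):
        # For k == 0 this is expected[n:] + expected[:n], i.e. expected unchanged.
        if expected[n - k:] + expected[:n - k] == observed:
            return k
    return None


if __name__ == "__main__":
    print(same_cues_different_order(EXPECTED, OBSERVED))  # True
    print(rotation_offset(EXPECTED, OBSERVED))  # 1
```

This only restates the symptom visible in the diff; it says nothing about why the ID3 cue is delivered earlier.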
Comment 1 Radar WebKit Bug Importer 2021-09-17 16:37:27 PDT
<rdar://problem/83260221>
Comment 2 ayumi_kojima 2021-09-17 16:41:39 PDT
Marked test expectations: https://trac.webkit.org/changeset/282709/webkit
Comment 3 ayumi_kojima 2021-10-12 09:00:50 PDT
I was able to reproduce the failure at TOT locally on Big Sur using run-webkit-tests -1 --force --debug --iterations 50 http/tests/media/track-in-band-hls-metadata.html --exit-after-n-failures 1.
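
Because the failure is intermittent, a run that stays green for 50 iterations is only probabilistic evidence that a revision is good. Assuming each iteration fails independently with the same probability p (an assumption: the real rate is unknown, and runs of a flaky media test may not be independent), the chance of at least one failure in n iterations is 1 - (1 - p)^n. The Python sketch below, with hypothetical helper names, evaluates that and the number of iterations needed for a target confidence.

```python
import math


def detection_probability(p, n):
    """Chance of at least one failure in n runs, assuming independent runs
    that each fail with probability p."""
    return 1.0 - (1.0 - p) ** n


def iterations_needed(p, confidence):
    """Smallest n with detection_probability(p, n) >= confidence (0 < p < 1)."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


if __name__ == "__main__":
    p = 0.05  # hypothetical per-iteration failure rate; the real rate is unknown
    print(round(detection_probability(p, 50), 3))  # 0.923
    print(iterations_needed(p, 0.95))  # 59
```

For example, at an assumed p = 0.05, 50 iterations would catch the failure only about 92% of the time, and 59 iterations are needed to reach 95%.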

According to the history, the test has been flakily timing out for a long time, but at r282220 it also started failing flakily. Around that revision, the timeouts also became flakier.

The test passed locally on r282200. It timed out on r282205 and r282217, and hung when run with --no-timeout.

Based on these reproductions, I think the regression range is r282201 - r282220, but I was not able to find the exact revision because I couldn't reproduce the failure itself at the intermediate revisions I tested.
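
Narrowing r282201 - r282220 to a single revision amounts to a binary search, with the caveat that each probe of a flaky test needs many iterations before it counts as a pass. The Python sketch below assumes a hypothetical fails(rev) predicate supplied by whoever bisects (for example, a wrapper that builds that revision and repeats the run-webkit-tests command from Comment 3). It also assumes r282200 is good, r282220 is bad, and that the failure does not go away once introduced.

```python
from typing import Callable, Sequence


def bisect_flaky(revisions: Sequence[int], fails: Callable[[int], bool]) -> int:
    """Return the first revision in `revisions` for which fails(rev) is True.

    Assumes revisions are sorted, revisions[0] is known good, revisions[-1]
    is known bad, and fails() is monotonic over the range. For a flaky test,
    fails() should run many iterations, since one clean run proves little.
    """
    lo, hi = 0, len(revisions) - 1  # invariant: revisions[lo] good, revisions[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(revisions[mid]):
            hi = mid
        else:
            lo = mid
    return revisions[hi]


def simulated_fails(rev: int) -> bool:
    # Hypothetical stand-in for a real build-and-test run; it does not
    # reflect any actual result for these revisions.
    return rev >= 282210


if __name__ == "__main__":
    revs = list(range(282200, 282221))  # r282200 (passed locally) .. r282220
    print(bisect_flaky(revs, simulated_fails))  # 282210
```

Each probe costs a build plus many iterations, so for this 20-revision range roughly five probes would be needed.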