https://build.webkit.org/results/GTK%20Linux%2064-bit%20Release%20(Tests)/r237491%20(8613)/results.html shows two dozen flaky IntersectionObserver tests. It looks like these tests all usually pass, but sometimes all fail together? The couple I checked have been flaky since they were added in r234723 "Import WPTs for IntersectionObserver", and unfortunately they were not fixed by r237218 "Flaky IntersectionObserver web platform tests involving style updates". I'm marking the entire imported/w3c/web-platform-tests/intersection-observer/ directory as flaky.
Flakiness is observed also on Apple bots, so I'll do this in the global expectations file.
(In reply to Michael Catanzaro from comment #1)
> Flakiness is observed also on Apple bots, so I'll do this in the global
> expectations file.

(That said, the GTK bots are *way* more flaky than the Apple bots.)
Here's an example failure that occurred after r237218:
https://build.webkit.org/results/Apple%20High%20Sierra%20Debug%20WK2%20(Tests)/r237322%20(5225)/results.html

--- /Volumes/Data/slave/highsierra-debug-tests-wk2/build/layout-test-results/imported/w3c/web-platform-tests/intersection-observer/containing-block-expected.txt
+++ /Volumes/Data/slave/highsierra-debug-tests-wk2/build/layout-test-results/imported/w3c/web-platform-tests/intersection-observer/containing-block-actual.txt
@@ -1,7 +1,7 @@
 PASS IntersectionObserver should only report intersections if root is a containing block ancestor of target.
-PASS In containing block and intersecting.
-PASS In containing block and not intersecting.
-PASS Not in containing block and intersecting.
-PASS Not in containing block and not intersecting.
+FAIL In containing block and intersecting. assert_equals: entries.length expected 1 but got 0
+FAIL In containing block and not intersecting. assert_equals: entries.length expected 2 but got 1
+FAIL Not in containing block and intersecting. assert_equals: entries.length expected 2 but got 1
+FAIL Not in containing block and not intersecting. assert_equals: entries.length expected 2 but got 1
The root cause of this flakiness is that the test suite assumes HTMLEventLoop timing, and WebKit doesn't implement the HTMLEventLoop spec (and, when I brought this up at the Contributors' Meeting, there wasn't consensus that we should implement that spec).

More specifically, the tests assume that if changes to style/layout are made in a rAF callback, then intersection observations will be updated before the next rAF callback, and events will fire within one setTimeout of that second callback. In HTMLEventLoop-based engines, this is straightforward to implement: rAF is immediately followed by style/layout, which is immediately followed by an update of intersection observations.

After spending a while considering different approaches for WebKit, I found that the most predictable approach (and the one closest to the spec) was to update intersection observations at the same time as we flush layers. On iOS, flushing and rAF are both driven by didUpdate messages received from the UIProcess, and none of the tests are flaky there. Mac is slightly different, but rAF still gets throttled when we fall behind on flushes. There's still a bit of flakiness on Mac, but the vast majority of tests are not flaky, and those that are flaky are only flaky on Debug bots (where we're more likely to fall behind on flushes), not on Release bots.

So I suspect that on GTK there's weaker coupling between rAF and flushes, and on some runs we get into a state where flushing/drawing is slower for some reason, so the tests fail because the observations they're expecting haven't been computed yet.

In any case, given the above, I think that marking the entire suite as flaky in the global expectations file is too broad. So I'm going to move that over to the GTK expectations file.
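
To make the timing assumption concrete, here is a toy TypeScript model of it. This is only a sketch and not WebKit code: the function name, the `flushEveryNFrames` parameter, and the entry string are all hypothetical, and a "flush" here is simply the point at which observations get recomputed. It shows why a second rAF callback sees the expected entry when every rAF is followed by a flush, and sees nothing when flushes lag behind rAF.

```typescript
// Toy model only: none of these names are WebKit or browser APIs.
type Task = () => void;

/**
 * Simulates the pattern the tests rely on: change layout in one rAF callback,
 * then expect the observation to be available when the next rAF callback runs.
 * `flushEveryNFrames` models how often a flush (and therefore an observation
 * update) follows a rAF callback; 1 means rAF and flushes are tightly coupled.
 * Returns the number of entries visible to the second rAF callback.
 */
function entriesSeenBySecondFrame(flushEveryNFrames: number): number {
  let layoutDirty = false;       // set when a rAF callback changes style/layout
  const entries: string[] = [];  // observations computed so far
  let entriesAtSecondFrame = -1; // recorded by the second rAF callback

  const rafCallbacks: Task[] = [
    () => { layoutDirty = true; },                    // frame 1: mutate layout
    () => { entriesAtSecondFrame = entries.length; }, // frame 2: check entries
  ];

  rafCallbacks.forEach((callback, index) => {
    callback();
    const frameNumber = index + 1;
    // Observations are only recomputed when a flush happens after this frame.
    if (frameNumber % flushEveryNFrames === 0 && layoutDirty) {
      entries.push("target intersects root");
      layoutDirty = false;
    }
  });

  return entriesAtSecondFrame;
}

console.log(entriesSeenBySecondFrame(1)); // prints 1: a flush follows every rAF
console.log(entriesSeenBySecondFrame(2)); // prints 0: the flush lags one frame behind
```

The second case corresponds to the "entries.length expected 1 but got 0" style of failure: the test checks for the observation before the flush that would have computed it.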
Thanks for the detailed explanation! Sounds like some investigation of the GTK graphics stack would be needed here.

(In reply to Ali Juma from comment #4)
> In any case, given the above, I think that marking the entire suite as flaky
> in the global expectations file is too broad. So I'm going to move that over
> to the GTK expectations file.

FWIW I don't agree with this because the flakes are still occurring on several Apple bots so it's going to result in spurious complaints from the EWS.
(In reply to Michael Catanzaro from comment #5)
> Thanks for the detailed explanation! Sounds like some investigation of the
> GTK graphics stack would be needed here.
>
> (In reply to Ali Juma from comment #4)
> > In any case, given the above, I think that marking the entire suite as flaky
> > in the global expectations file is too broad. So I'm going to move that over
> > to the GTK expectations file.
>
> FWIW I don't agree with this because the flakes are still occurring on
> several Apple bots so it's going to result in spurious complaints from the
> EWS.

I'm also going to add expectations for the specific tests that are flaky on Mac Debug (rather than having a global expectation for the whole test suite).
OK, sounds good!
Created attachment 353294 [details] Update expectations
Comment on attachment 353294 [details]
Update expectations

Clearing flags on attachment: 353294

Committed r237556: <https://trac.webkit.org/changeset/237556>
There were definitely some failures on WK1 bots too, which should be accounted for.
(In reply to Michael Catanzaro from comment #10)
> There were definitely some failures on WK1 bots too, which should be
> accounted for.

These tests were skipped entirely on mac-wk1 in r237218. And on the Win bots, all web platform tests are skipped.

Looking at https://webkit-test-results.webkit.org/dashboards/flakiness_dashboard.html#tests=intersection-observer, the only flakiness I see is on Mac WK2 Debug and GTK. Are there others that I'm missing?
Hmm, I don't know. I'm sure I saw them, but maybe they were too far back in the history? Not sure. Clearly it looks good now!
After recent changes to rAF timing to align with the spec, these tests are no longer flaky on Mac Debug, so I've updated the expectations in https://trac.webkit.org/changeset/245304.