191002 – At least two dozen IntersectionObserver tests are flaky

NEW 191002

At least two dozen IntersectionObserver tests are flaky

https://bugs.webkit.org/show_bug.cgi?id=191002

Summary At least two dozen IntersectionObserver tests are flaky

Michael Catanzaro

Reported 2018-10-27 17:43:47 PDT

https://build.webkit.org/results/GTK%20Linux%2064-bit%20Release%20(Tests)/r237491%20(8613)/results.html shows two dozen flaky IntersectionObserver tests. It looks like these tests all usually pass, but sometimes fail all together? The couple I cheked have been flaky since they were added in r234723 "Import WPTs for IntersectionObserver" and unfortunately not fixed by r237218 "Flaky IntersectionObserver web platform tests involving style updates". I'm marking the entire imported/w3c/web-platform-tests/intersection-observer/ directory as flaky.

Attachments
Update expectations (4.39 KB, patch) 2018-10-29 10:07 PDT, Ali Juma	no flags	Details Formatted Diff Diff
View All Add attachment proposed patch, testcase, etc.

Michael Catanzaro

Comment 1 2018-10-27 17:45:43 PDT

Flakiness is observed also on Apple bots, so I'll do this in the global expectations file.

Michael Catanzaro

Comment 2 2018-10-27 17:47:22 PDT

(In reply to Michael Catanzaro from comment #1) > Flakiness is observed also on Apple bots, so I'll do this in the global > expectations file. (That said, the GTK bots are *way* more flaky than the Apple bots.)

Michael Catanzaro

Comment 3 2018-10-28 18:20:08 PDT

Here's an example failure that occurred after r237218: https://build.webkit.org/results/Apple%20High%20Sierra%20Debug%20WK2%20(Tests)/r237322%20(5225)/results.html --- /Volumes/Data/slave/highsierra-debug-tests-wk2/build/layout-test-results/imported/w3c/web-platform-tests/intersection-observer/containing-block-expected.txt +++ /Volumes/Data/slave/highsierra-debug-tests-wk2/build/layout-test-results/imported/w3c/web-platform-tests/intersection-observer/containing-block-actual.txt @@ -1,7 +1,7 @@ PASS IntersectionObserver should only report intersections if root is a containing block ancestor of target. -PASS In containing block and intersecting. -PASS In containing block and not intersecting. -PASS Not in containing block and intersecting. -PASS Not in containing block and not intersecting. +FAIL In containing block and intersecting. assert_equals: entries.length expected 1 but got 0 +FAIL In containing block and not intersecting. assert_equals: entries.length expected 2 but got 1 +FAIL Not in containing block and intersecting. assert_equals: entries.length expected 2 but got 1 +FAIL Not in containing block and not intersecting. assert_equals: entries.length expected 2 but got 1

Ali Juma

Comment 4 2018-10-29 06:47:01 PDT

The root cause of this flakiness is that the test suite assumes HTMLEventLoop timing, and WebKit doesn't implement the HTMLEventLoop spec (and, when I brought this up at the Contributors' Meeting, there wasn't consensus that we should implement that spec). More specifically, the tests assume that if changes to style/layout are made in a rAF callback, then intersection observations will be updated before the next rAF callback, and events will fire within one setTimeout of that second callback. In HTMLEventLoop-based engines, this is straightforward to implement -- rAF is immediately followed by style/layout, and immediately followed by an update of intersection observations. After spending a while considering different approaches for WebKit, the most predictable (and closest to the spec) approach was to update intersection observations at the same time as we flush layers. On iOS, flushing and rAF are both driven by didUpdate messages received from the UIProcess. None of the tests are flaky on iOS. Mac is slightly different, but rAF still gets throttled when we fall behind on flushes. There's still a bit of flakiness, but still, the vast majority of tests are not flaky. And those that are flaky, are only flaky on Debug bots (where we're more likely to fall behind on flushes), not on Release bots. So I suspect that on GTK, there's perhaps weaker coupling between rAF and flushes, and so on some runs we get into a state where flushing/drawing is slower for some reason, so the tests fail because the observations they're expecting haven't been computed yet. In any case, given the above, I think that marking the entire suite as flaky in the global expectations file is too broad. So I'm going to move that over to the GTK expectations file.

Michael Catanzaro

Comment 5 2018-10-29 07:42:42 PDT

Thanks for the detailed explanation! Sounds like some investigation of the GTK graphics stack would be needed here. (In reply to Ali Juma from comment #4) > In any case, given the above, I think that marking the entire suite as flaky > in the global expectations file is too broad. So I'm going to move that over > to the GTK expectations file. FWIW I don't agree with this because the flakes are still occurring on several Apple bots so it's going to result in spurious complaints from the EWS.

Ali Juma

Comment 6 2018-10-29 07:47:05 PDT

(In reply to Michael Catanzaro from comment #5) > Thanks for the detailed explanation! Sounds like some investigation of the > GTK graphics stack would be needed here. > > (In reply to Ali Juma from comment #4) > > In any case, given the above, I think that marking the entire suite as flaky > > in the global expectations file is too broad. So I'm going to move that over > > to the GTK expectations file. > > FWIW I don't agree with this because the flakes are still occurring on > several Apple bots so it's going to result in spurious complaints from the > EWS. I'm also going to add expectations for the specific tests that are flaky on Mac Debug (rather than having a global expectation for the whole test suite).

Michael Catanzaro

Comment 7 2018-10-29 07:57:22 PDT

OK, sounds good!

Ali Juma

Comment 8 2018-10-29 10:07:45 PDT

Created attachment 353294 [details] Update expectations

Ali Juma

Comment 9 2018-10-29 10:09:16 PDT

Comment on attachment 353294 [details] Update expectations Clearing flags on attachment: 353294 Committed r237556: <https://trac.webkit.org/changeset/237556>

Michael Catanzaro

Comment 10 2018-10-29 10:57:45 PDT

There were definitely some failures on WK1 bots too, which should be accounted for.

Ali Juma

Comment 11 2018-10-29 11:04:14 PDT

(In reply to Michael Catanzaro from comment #10) > There were definitely some failures on WK1 bots too, which should be > accounted for. These tests were skipped entirely on mac-wk1 in r237218. And on the Win bots, all web platform tests are skipped. Looking at https://webkit-test-results.webkit.org/dashboards/flakiness_dashboard.html#tests=intersection-observer, the only flakiness I see is on Mac WK2 Debug and GTK. Are there others that I'm missing?

Michael Catanzaro

Comment 12 2018-10-30 06:43:02 PDT

Hmm, I don't know. I'm sure I saw them, but maybe they were too far back in the history? Not sure. Clearly it looks good now!

Ali Juma

Comment 13 2019-05-14 13:40:22 PDT

After recent changes to rAF timing to align with the spec, these tests are no longer flaky on Mac Debug, so I've updated the expectations in https://trac.webkit.org/changeset/245304.

Note You need to log in before you can comment on or make changes to this bug.

Status NEW

Resolution

Priority P2

Severity Normal

Classification Unclassified

Version Other

Hardware PC

OS Linux

Product WebKit

Component WebKitGTK

Assignee

Ali Juma

Reported

2018-10-27 17:43 PDT

Modified

2019-05-14 13:40 PDT History

CC List

12 users Show

URL

Keywords

Depends on

Blocks