WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
NEW
191002
At least two dozen IntersectionObserver tests are flaky
https://bugs.webkit.org/show_bug.cgi?id=191002
Summary
At least two dozen IntersectionObserver tests are flaky
Michael Catanzaro
Reported
2018-10-27 17:43:47 PDT
https://build.webkit.org/results/GTK%20Linux%2064-bit%20Release%20(Tests)/r237491%20(8613)/results.html
shows two dozen flaky IntersectionObserver tests. It looks like these tests all usually pass, but sometimes fail all together? The couple I cheked have been flaky since they were added in
r234723
"Import WPTs for IntersectionObserver" and unfortunately not fixed by
r237218
"Flaky IntersectionObserver web platform tests involving style updates". I'm marking the entire imported/w3c/web-platform-tests/intersection-observer/ directory as flaky.
Attachments
Update expectations
(4.39 KB, patch)
2018-10-29 10:07 PDT
,
Ali Juma
no flags
Details
Formatted Diff
Diff
View All
Add attachment
proposed patch, testcase, etc.
Michael Catanzaro
Comment 1
2018-10-27 17:45:43 PDT
Flakiness is observed also on Apple bots, so I'll do this in the global expectations file.
Michael Catanzaro
Comment 2
2018-10-27 17:47:22 PDT
(In reply to Michael Catanzaro from
comment #1
)
> Flakiness is observed also on Apple bots, so I'll do this in the global > expectations file.
(That said, the GTK bots are *way* more flaky than the Apple bots.)
Michael Catanzaro
Comment 3
2018-10-28 18:20:08 PDT
Here's an example failure that occurred after
r237218
:
https://build.webkit.org/results/Apple%20High%20Sierra%20Debug%20WK2%20(Tests)/r237322%20(5225)/results.html
--- /Volumes/Data/slave/highsierra-debug-tests-wk2/build/layout-test-results/imported/w3c/web-platform-tests/intersection-observer/containing-block-expected.txt +++ /Volumes/Data/slave/highsierra-debug-tests-wk2/build/layout-test-results/imported/w3c/web-platform-tests/intersection-observer/containing-block-actual.txt @@ -1,7 +1,7 @@ PASS IntersectionObserver should only report intersections if root is a containing block ancestor of target. -PASS In containing block and intersecting. -PASS In containing block and not intersecting. -PASS Not in containing block and intersecting. -PASS Not in containing block and not intersecting. +FAIL In containing block and intersecting. assert_equals: entries.length expected 1 but got 0 +FAIL In containing block and not intersecting. assert_equals: entries.length expected 2 but got 1 +FAIL Not in containing block and intersecting. assert_equals: entries.length expected 2 but got 1 +FAIL Not in containing block and not intersecting. assert_equals: entries.length expected 2 but got 1
Ali Juma
Comment 4
2018-10-29 06:47:01 PDT
The root cause of this flakiness is that the test suite assumes HTMLEventLoop timing, and WebKit doesn't implement the HTMLEventLoop spec (and, when I brought this up at the Contributors' Meeting, there wasn't consensus that we should implement that spec). More specifically, the tests assume that if changes to style/layout are made in a rAF callback, then intersection observations will be updated before the next rAF callback, and events will fire within one setTimeout of that second callback. In HTMLEventLoop-based engines, this is straightforward to implement -- rAF is immediately followed by style/layout, and immediately followed by an update of intersection observations. After spending a while considering different approaches for WebKit, the most predictable (and closest to the spec) approach was to update intersection observations at the same time as we flush layers. On iOS, flushing and rAF are both driven by didUpdate messages received from the UIProcess. None of the tests are flaky on iOS. Mac is slightly different, but rAF still gets throttled when we fall behind on flushes. There's still a bit of flakiness, but still, the vast majority of tests are not flaky. And those that are flaky, are only flaky on Debug bots (where we're more likely to fall behind on flushes), not on Release bots. So I suspect that on GTK, there's perhaps weaker coupling between rAF and flushes, and so on some runs we get into a state where flushing/drawing is slower for some reason, so the tests fail because the observations they're expecting haven't been computed yet. In any case, given the above, I think that marking the entire suite as flaky in the global expectations file is too broad. So I'm going to move that over to the GTK expectations file.
Michael Catanzaro
Comment 5
2018-10-29 07:42:42 PDT
Thanks for the detailed explanation! Sounds like some investigation of the GTK graphics stack would be needed here. (In reply to Ali Juma from
comment #4
)
> In any case, given the above, I think that marking the entire suite as flaky > in the global expectations file is too broad. So I'm going to move that over > to the GTK expectations file.
FWIW I don't agree with this because the flakes are still occurring on several Apple bots so it's going to result in spurious complaints from the EWS.
Ali Juma
Comment 6
2018-10-29 07:47:05 PDT
(In reply to Michael Catanzaro from
comment #5
)
> Thanks for the detailed explanation! Sounds like some investigation of the > GTK graphics stack would be needed here. > > (In reply to Ali Juma from
comment #4
) > > In any case, given the above, I think that marking the entire suite as flaky > > in the global expectations file is too broad. So I'm going to move that over > > to the GTK expectations file. > > FWIW I don't agree with this because the flakes are still occurring on > several Apple bots so it's going to result in spurious complaints from the > EWS.
I'm also going to add expectations for the specific tests that are flaky on Mac Debug (rather than having a global expectation for the whole test suite).
Michael Catanzaro
Comment 7
2018-10-29 07:57:22 PDT
OK, sounds good!
Ali Juma
Comment 8
2018-10-29 10:07:45 PDT
Created
attachment 353294
[details]
Update expectations
Ali Juma
Comment 9
2018-10-29 10:09:16 PDT
Comment on
attachment 353294
[details]
Update expectations Clearing flags on attachment: 353294 Committed
r237556
: <
https://trac.webkit.org/changeset/237556
>
Michael Catanzaro
Comment 10
2018-10-29 10:57:45 PDT
There were definitely some failures on WK1 bots too, which should be accounted for.
Ali Juma
Comment 11
2018-10-29 11:04:14 PDT
(In reply to Michael Catanzaro from
comment #10
)
> There were definitely some failures on WK1 bots too, which should be > accounted for.
These tests were skipped entirely on mac-wk1 in
r237218
. And on the Win bots, all web platform tests are skipped. Looking at
https://webkit-test-results.webkit.org/dashboards/flakiness_dashboard.html#tests=intersection-observer
, the only flakiness I see is on Mac WK2 Debug and GTK. Are there others that I'm missing?
Michael Catanzaro
Comment 12
2018-10-30 06:43:02 PDT
Hmm, I don't know. I'm sure I saw them, but maybe they were too far back in the history? Not sure. Clearly it looks good now!
Ali Juma
Comment 13
2019-05-14 13:40:22 PDT
After recent changes to rAF timing to align with the spec, these tests are no longer flaky on Mac Debug, so I've updated the expectations in
https://trac.webkit.org/changeset/245304
.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug