Bug 181832 - Layout Test inspector/unit-tests/throttle.html is a flaky failure
Summary: Layout Test inspector/unit-tests/throttle.html is a flaky failure
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: Web Inspector (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Matt Baker
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-01-18 20:40 PST by Ryan Haddad
Modified: 2018-01-23 11:34 PST (History)
3 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Ryan Haddad 2018-01-18 20:40:37 PST
The following layout test is flaky on macOS and GTK

inspector/unit-tests/throttle.html

Probable cause:

This test has been extremely flaky since it was added in https://trac.webkit.org/changeset/226755/webkit

Flakiness Dashboard:

https://webkit-test-results.webkit.org/dashboards/flakiness_dashboard.html#showAllRuns=true&tests=inspector%2Funit-tests%2Fthrottle.html

--- /Volumes/Data/slave/highsierra-debug-tests-wk2/build/layout-test-results/inspector/unit-tests/throttle-expected.txt
+++ /Volumes/Data/slave/highsierra-debug-tests-wk2/build/layout-test-results/inspector/unit-tests/throttle-actual.txt
@@ -4,15 +4,7 @@
 == Running test suite: Throttle
 -- Running test case: Basic throttle
 Call throttled function every 1 ms for 110 ms.
-PASS: All calls delayed at least 50 ms.
+FAIL: Delay should be at least 50 ms. Was 49 ms.
+!! EXCEPTION: undefined
+Stack Trace: #0: (anonymous) ((unknown))
 
--- Running test case: Throttle proxy uniqueness
-PASS: Each call to throttle should return a new proxy.
-
--- Running test case: Throttled function arguments
-PASS: Trailing call invoked with most recent arguments.
-
--- Running test case: Cancel throttle
-Canceled throttled function.
-PASS: Throttled function canceled.
-
Comment 1 Matt Baker 2018-01-19 13:21:05 PST
A quick fix would be to use expectEqualWithAccuracy, with an accuracy of 1ms. I haven't seen a delay be off by more than 1ms. That could be a last resort if the underlying problem is difficult to pin down.
Comment 2 Joseph Pecoraro 2018-01-19 13:50:08 PST
I don't think we can rely on our bots to have 1ms accuracy. May be best to just skip this test / mark it as flakey since its all about timing.
Comment 3 Matt Baker 2018-01-19 14:33:18 PST
(In reply to Joseph Pecoraro from comment #2)
> I don't think we can rely on our bots to have 1ms accuracy. May be best to
> just skip this test / mark it as flakey since its all about timing.

With poor accuracy I'd still expect the throttled rate to be at least as long as the specified delay, never less. Before I just mark this flaky I'm going to investigate a little further.
Comment 4 Matt Baker 2018-01-19 18:17:57 PST
Making an enhancement to Object.throttle which should fix this.
Comment 5 Ryan Haddad 2018-01-23 09:51:09 PST
This continues to fail on the bots, so I'd like to either fix the issue or roll out the change as soon as possible.
Comment 6 Joseph Pecoraro 2018-01-23 11:01:06 PST
I'd suggest marking this test as flakey rather then rolling out.

The entire premise of Throttle is to test timing related things. Timing related tests are never reliable on the bots, particularly debug bots.
Comment 7 Ryan Haddad 2018-01-23 11:30:45 PST
Marked test as flaky in https://trac.webkit.org/r227426
Comment 8 Joseph Pecoraro 2018-01-23 11:34:11 PST
My justification for marking this as flakey instead of rolling out the original change is that flakiness on this test (which is timing based) does not indicate the change itself is incorrect. It may be doing what it is intended to do but missing the specific timing windows expected by the test.

We've seen that is frequently the case for tests that rely on timing being run on the bots (particularly debug).

The particular output is interesting:

  -PASS: All calls delayed at least 50 ms.
  +FAIL: Delay should be at least 50 ms. Was 49 ms.

Getting 49 when expecting 50 probably means some round/fudge factor would be appropriate in the test. More disparate values would have been cause for concern. I suspect adding some fudge that is what Matt was alluding to improve the test. However he hasn't seemed to have gotten around to it yet.