Bug 204769 - Do not retry the EWS build due to flaky failures in layout-test
Summary: Do not retry the EWS build due to flaky failures in layout-test
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests (show other bugs)
Version: Other
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Aakash Jain
Keywords: InRadar
Depends on:
Reported: 2019-12-02 14:39 PST by Aakash Jain
Modified: 2019-12-03 13:08 PST (History)
10 users (show)

See Also:

Patch (4.66 KB, patch)
2019-12-02 15:36 PST, Aakash Jain
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Aakash Jain 2019-12-02 14:39:36 PST
In EWS, when layout-tests run fails few tests, we re-run the tests. Then we run these tests on clean-tree, and compare the results of first run, second run, and clean tree run.

In case there is flakiness (different tests fail in first and second runs), and no consistent failure is introduced by the patch being tested, we retry the entire build.

We have this logic for years in old EWS and we copied it in new EWS. See: https://trac.webkit.org/browser/webkit/trunk/Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py#L352 (Defer means retrying the build).

350    # At this point we know that at least one test flaked, but no consistent failures
351    # were introduced. This is a bit of a grey-zone.
352    return False  # Defer patch

Retrying the entire build, just because of few flaky tests seems excessive and inefficient. This means that if someone commits a very flaky test (which is not rare), then EWS will keep retrying the patch (almost in a infinite loop), and would be almost stuck until the flakiness is fixed. 

Also, we know that we already have various tests in trunk which are flaky.

In other test-suites like API-tests and JSC-tests, we do not retry the build on flaky test failures, we ignore the flaky tests, and only focus on tests which failed consistently with the patch, and passed without the patch. We should do the similar thing for layout-tests as well.
Comment 1 Aakash Jain 2019-12-02 14:46:50 PST
For e.g.: In https://ews-build.webkit.org/#/builders/17/builds/7537

first_run_failures: "loader/stateobjects/pushstate-size.html"

second_run_failures: "loader/stateobjects/pushstate-size.html", "webaudio/audioparam-setTargetAtTime.html"

clean_tree_run_failures: "loader/stateobjects/pushstate-size.html"

There was a pre-existing test-failure on trunk, and just because of one flaky test (webaudio/audioparam-setTargetAtTime.html), entire build was retried. This build should have been marked as SUCCESS.
Comment 2 Aakash Jain 2019-12-02 15:36:30 PST
Created attachment 384663 [details]
Comment 3 Jonathan Bedard 2019-12-03 08:05:13 PST
Comment on attachment 384663 [details]

Seems like this is what we want more often than not, but I still dislike our flakiness story....
Comment 4 WebKit Commit Bot 2019-12-03 09:56:22 PST
Comment on attachment 384663 [details]

Clearing flags on attachment: 384663

Committed r253049: <https://trac.webkit.org/changeset/253049>
Comment 5 WebKit Commit Bot 2019-12-03 09:56:23 PST
All reviewed patches have been landed.  Closing bug.
Comment 6 Radar WebKit Bug Importer 2019-12-03 09:57:19 PST
Comment 7 Aakash Jain 2019-12-03 13:08:33 PST
Deployed on production server.