204704 – [EWS] Do not retry layout-tests build if the flaky test failures are also present in clean tree run

RESOLVED FIXED204704

[EWS] Do not retry layout-tests build if the flaky test failures are also present in clean tree run

https://bugs.webkit.org/show_bug.cgi?id=204704

Summary [EWS] Do not retry layout-tests build if the flaky test failures are also pre...

Aakash Jain

Reported 2019-11-29 08:31:34 PST

In EWS, do not retry layout-tests build if the (flaky) test failures are also present in clean tree run. If same failures (whether flaky or consistent) are also found on clean tree, it is very unlikely that the failures were introduced by the patch. e.g.: In https://ews-build.webkit.org/#/builders/24/builds/5456, the first and second run failures were also present on clean tree. However the build was retried, instead it should be marked as passing.

Attachments
Patch (3.50 KB, patch) 2019-11-29 08:37 PST, Aakash Jain	no flags	Details Formatted Diff Diff
View All Add attachment proposed patch, testcase, etc.

Aakash Jain

Comment 1 2019-11-29 08:37:24 PST

Created attachment 384510 [details] Patch

Alexey Proskuryakov

Comment 2 2019-11-29 22:23:57 PST

Comment on attachment 384510 [details] Patch This is clearly an improvement, it's obvious that a patch with results like these should get a green bubble. I think that we are still too strict with declaring SUCCESS, and delay the verdict unnecessarily. One thing that doesn't make logical sense to me is that we completely ignore flaky tests if they pass on retry within a single test run - but tests that also fail on retry within the run yet pass on a new run are treated as ultra important.

Aakash Jain

Comment 3 2019-11-29 23:44:38 PST

(In reply to Alexey Proskuryakov from comment #2) > Comment on attachment 384510 [details] > > One thing that doesn't make logical sense to me is that we completely ignore > flaky tests if they pass on retry within a single test run - but tests that > also fail on retry within the run yet pass on a new run are treated as ultra > important. Completely agree. I don’t know why we retry so aggressively when a flakiness is encountered between two test runs. However, as you said, we do ignore flaky tests within each single test-run. Maybe we should take the intersection of test failures from first and second test runs, and compare it with clean tree run (unless there are 30+ failures in any run in which case we use special logic). This is what we do on API tests and JSC tests EWSes.

WebKit Commit Bot

Comment 4 2019-11-30 00:15:43 PST

Comment on attachment 384510 [details] Patch Clearing flags on attachment: 384510 Committed r252953: <https://trac.webkit.org/changeset/252953>

WebKit Commit Bot

Comment 5 2019-11-30 00:15:45 PST

All reviewed patches have been landed. Closing bug.

Radar WebKit Bug Importer

Comment 6 2019-11-30 00:16:16 PST

<rdar://problem/57537845>

Aakash Jain

Comment 7 2019-11-30 09:29:43 PST

Deployed on production server.

Alexey Proskuryakov

Comment 8 2019-11-30 15:59:59 PST

I think that taking an intersection would be an improvement in most cases. We can make the logic more subtle and detect patches that introduce substantial flakiness. E.g. a human would probably be concerned about a patch that makes 5 tests flaky when nothing is flaky on a clean run.

Aakash Jain

Comment 9 2019-12-01 08:32:51 PST

(In reply to Alexey Proskuryakov from comment #8) > I think that taking an intersection would be an improvement in most cases. Yeah, For e.g.: in https://ews-build.webkit.org/#/builders/17/builds/7537 there was one pre-existing failure on trunk. In second-run a flaky failure "webaudio/audioparam-setTargetAtTime.html" caused the entire build to be retried. When there is a pre-existing failure, any flaky failure would almost result in build being RETRIED for a good patch, while the build should be marked as SUCCESS (this is a huge waste of resources).

Jonathan Bedard

Comment 10 2019-12-02 08:43:10 PST

(In reply to Alexey Proskuryakov from comment #2) > Comment on attachment 384510 [details] > Patch ... > One thing that doesn't make logical sense to me is that we completely ignore > flaky tests if they pass on retry within a single test run - but tests that > also fail on retry within the run yet pass on a new run are treated as ultra > important. I agree we need a better way to handle this. Lately, every example I’ve seen of a test which passes during the retry in run-webkit-tests has had some sort of stateful dependency in the test which caused it to fail when run with other tests. I actually think that passing tests on the retry in run-webkit-tests should be fatal. With the current state of the tree, that’s usually a pretty uncontroverdial bug.

Aakash Jain

Comment 11 2019-12-02 14:52:33 PST

Going to fix the entire thing in Bug 204769. This bug covered only a specific case of flakiness, while Bug 204769 would be more generic and cover most of the flakiness scenarios. I would need to revert this change as it added explicit if condition for this specific scenario. It might be little unusual, but I am going to revert this commit (r252953) separately and after that create a clean patch for Bug 204769, so that it is easier to review.

Aakash Jain

Comment 12 2019-12-02 15:23:19 PST

(In reply to WebKit Commit Bot from comment #4) > Committed r252953: <https://trac.webkit.org/changeset/252953> Reverted in https://trac.webkit.org/changeset/253014/webkit

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version Other

Hardware Unspecified

OS Unspecified

Product WebKit

Component Tools / Tests

Assignee

Aakash Jain

Reported

2019-11-29 08:31 PST

Modified

2019-12-02 15:24 PST History

CC List

5 users Show

URL

Keywords InRadar

Depends on

Blocks