Bug 204704

Summary: [EWS] Do not retry layout-tests build if the flaky test failures are also present in clean tree run
Product: WebKit Reporter: Aakash Jain <aakash_jain>
Component: Tools / TestsAssignee: Aakash Jain <aakash_jain>
Status: RESOLVED FIXED    
Severity: Normal CC: aakash_jain, ap, commit-queue, jbedard, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: Other   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugs.webkit.org/show_bug.cgi?id=204769
Attachments:
Description Flags
Patch none

Description Aakash Jain 2019-11-29 08:31:34 PST
In EWS, do not retry layout-tests build if the (flaky) test failures are also present in clean tree run. If same failures (whether flaky or consistent) are also found on clean tree, it is very unlikely that the failures were introduced by the patch.

e.g.: In https://ews-build.webkit.org/#/builders/24/builds/5456, the first and second run failures were also present on clean tree. However the build was retried, instead it should be marked as passing.
Comment 1 Aakash Jain 2019-11-29 08:37:24 PST
Created attachment 384510 [details]
Patch
Comment 2 Alexey Proskuryakov 2019-11-29 22:23:57 PST
Comment on attachment 384510 [details]
Patch

This is clearly an improvement, it's obvious that a patch with results like these should get a green bubble. I think that we are still too strict with declaring SUCCESS, and delay the verdict unnecessarily.

One thing that doesn't make logical sense to me is that we completely ignore flaky tests if they pass on retry within a single test run - but tests that also fail on retry within the run yet pass on a new run are treated as ultra important.
Comment 3 Aakash Jain 2019-11-29 23:44:38 PST
(In reply to Alexey Proskuryakov from comment #2)
> Comment on attachment 384510 [details]
> 
> One thing that doesn't make logical sense to me is that we completely ignore
> flaky tests if they pass on retry within a single test run - but tests that
> also fail on retry within the run yet pass on a new run are treated as ultra
> important.
Completely agree. I don’t know why we retry so aggressively when a flakiness is encountered between two test runs. However, as you said, we do ignore flaky tests within each single test-run.

Maybe we should take the intersection of test failures from first and second test runs, and compare it with clean tree run (unless there are 30+ failures in any run in which case we use special logic). This is what we do on API tests and JSC tests EWSes.
Comment 4 WebKit Commit Bot 2019-11-30 00:15:43 PST
Comment on attachment 384510 [details]
Patch

Clearing flags on attachment: 384510

Committed r252953: <https://trac.webkit.org/changeset/252953>
Comment 5 WebKit Commit Bot 2019-11-30 00:15:45 PST
All reviewed patches have been landed.  Closing bug.
Comment 6 Radar WebKit Bug Importer 2019-11-30 00:16:16 PST
<rdar://problem/57537845>
Comment 7 Aakash Jain 2019-11-30 09:29:43 PST
Deployed on production server.
Comment 8 Alexey Proskuryakov 2019-11-30 15:59:59 PST
I think that taking an intersection would be an improvement in most cases. We can make the logic more subtle and detect patches that introduce substantial flakiness. E.g. a human would probably be concerned about a patch that makes 5 tests flaky when nothing is flaky on a clean run.
Comment 9 Aakash Jain 2019-12-01 08:32:51 PST
(In reply to Alexey Proskuryakov from comment #8)
> I think that taking an intersection would be an improvement in most cases.
Yeah, For e.g.: in https://ews-build.webkit.org/#/builders/17/builds/7537 there was one pre-existing failure on trunk. In second-run a flaky failure "webaudio/audioparam-setTargetAtTime.html" caused the entire build to be retried.

When there is a pre-existing failure, any flaky failure would almost result in build being RETRIED for a good patch, while the build should be marked as SUCCESS (this is a huge waste of resources).
Comment 10 Jonathan Bedard 2019-12-02 08:43:10 PST
(In reply to Alexey Proskuryakov from comment #2)
> Comment on attachment 384510 [details]
> Patch
...
> One thing that doesn't make logical sense to me is that we completely ignore
> flaky tests if they pass on retry within a single test run - but tests that
> also fail on retry within the run yet pass on a new run are treated as ultra
> important.

I agree we need a better way to handle this. Lately, every example I’ve seen of a test which passes during the retry in run-webkit-tests has had some sort of stateful dependency in the test which caused it to fail when run with other tests. I actually think that passing tests on the retry in run-webkit-tests should be fatal. With the current state of the tree, that’s usually a pretty uncontroverdial bug.
Comment 11 Aakash Jain 2019-12-02 14:52:33 PST
Going to fix the entire thing in Bug 204769.

This bug covered only a specific case of flakiness, while Bug 204769 would be more generic and cover most of the flakiness scenarios.

I would need to revert this change as it added explicit if condition for this specific scenario.

It might be little unusual, but I am going to revert this commit (r252953) separately and after that create a clean patch for Bug 204769, so that it is easier to review.
Comment 12 Aakash Jain 2019-12-02 15:23:19 PST
(In reply to WebKit Commit Bot from comment #4)
> Committed r252953: <https://trac.webkit.org/changeset/252953>
Reverted in https://trac.webkit.org/changeset/253014/webkit