Bug 224289 - results.webkit.org should provide API for EWS to check flakiness of tests
Summary: results.webkit.org should provide API for EWS to check flakiness of tests
Status: ASSIGNED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
Importance: P2 Normal
Assignee: Jonathan Bedard
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2021-04-07 10:05 PDT by Aakash Jain
Modified: 2021-05-21 13:40 PDT
CC List: 8 users

See Also:


Attachments
Current mock-up of script (2.34 KB, text/x-python-script)
2021-05-21 13:40 PDT, Chris Gambrell
no flags

Description Aakash Jain 2021-04-07 10:05:21 PDT
results.webkit.org should provide a REST API for EWS to check whether a test is passing, consistently failing, or flaky.

The API should accept these parameters: test name(s), commit identifier, test suite (layout-tests, api-tests, etc.), platform (macOS, iOS, etc.), configuration (debug, release, etc.), and any other necessary parameters.
The API should return whether the test is consistently passing, consistently failing, flaky, etc.
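
To make the parameter list above concrete, here is a minimal sketch of how a client such as EWS might encode a request. Nothing here is an existing results.webkit.org interface: the function name and the query keys ("test", "commit", "suite", "platform", "configuration") are assumptions chosen only to mirror the parameters listed in this description.

```python
from urllib.parse import urlencode


def build_flakiness_query(tests, commit, suite, platform=None, configuration=None):
    """Encode the proposed parameters as a URL query string.

    All key names are hypothetical; the real API may name them differently.
    """
    # One "test" key per test name, so several tests fit in a single request.
    params = [("test", name) for name in tests]
    params.append(("commit", commit))
    params.append(("suite", suite))
    # Optional filters are only sent when the caller supplies them.
    if platform is not None:
        params.append(("platform", platform))
    if configuration is not None:
        params.append(("configuration", configuration))
    return urlencode(params)
```

The resulting string would be appended to whatever endpoint path the service ends up exposing; that path is deliberately left out because it has not been decided in this bug.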
Comment 1 Aakash Jain 2021-04-12 08:39:47 PDT
This might need discussion about the specifics of the API we need for EWS, particularly for flakiness information. I think we can tackle the problem in two parts: an API for dealing with flaky failures in EWS, and an API for dealing with consistent failures in EWS.

For consistent failures, I filed two specific API requests in Bug 224434 and Bug 224435.
Comment 2 Jonathan Bedard 2021-04-12 09:23:56 PDT
I think the way this API should work is that it should provide a "percent likelihood" for each outcome of a given test with a given configuration at a given commit. We will need to toy with the algorithm a bit to figure out the appropriate way to rank the commits surrounding the commit in question. I'm envisioning a result that looks something like this:

{
    "PASS": 80,
    "FAIL": 10,
    "TIMEOUT": 5,
    "CRASH": 5
}

This means that, given the configuration the user provided, we would expect the given test to pass 80% of the time, fail 10% of the time, time out 5% of the time, and crash 5% of the time. From that point, EWS can decide if the pass percentage is high enough to justify failing the build.
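
As a rough illustration of the idea above, the sketch below turns a list of observed outcomes into percentages and then classifies the test. It is only a sketch under stated assumptions: every observed run is weighted equally (the ranking of surrounding commits mentioned above is not modeled), and the function names and the 95% thresholds are hypothetical, not values anyone has agreed on.

```python
from collections import Counter


def outcome_likelihoods(observed):
    """Return the percentage of runs that ended in each outcome.

    `observed` is a list of outcome strings such as "PASS" or "FAIL".
    Assumption: all runs count equally, regardless of commit distance.
    """
    counts = Counter(observed)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {outcome: 100.0 * n / total for outcome, n in counts.items()}


def classify(likelihoods, pass_threshold=95.0, fail_threshold=95.0):
    """Label a test from its likelihoods; thresholds are illustrative only."""
    if not likelihoods:
        return "unknown"
    passing = likelihoods.get("PASS", 0.0)
    if passing >= pass_threshold:
        return "passing"
    # Everything that is not PASS (FAIL, TIMEOUT, CRASH) counts as failing here.
    if 100.0 - passing >= fail_threshold:
        return "failing"
    return "flaky"
```

With 16 passes, 2 failures, 1 timeout, and 1 crash out of 20 runs, `outcome_likelihoods` reproduces the 80/10/5/5 split shown in the example, and `classify` labels the test "flaky" under these thresholds.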
Comment 3 Radar WebKit Bug Importer 2021-04-14 10:06:17 PDT
<rdar://problem/76651206>
Comment 4 Chris Gambrell 2021-05-21 13:40:33 PDT
Created attachment 429332
Current mock-up of script