Bug 265066 - [run-webkit-tests] Tests which fail and then crash on retry aren't flakey
Summary: [run-webkit-tests] Tests which fail and then crash on retry aren't flakey
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests (show other bugs)
Version: Safari Technology Preview
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Jonathan Bedard
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2023-11-17 13:39 PST by Jonathan Bedard
Modified: 2024-05-23 11:48 PDT (History)
3 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Jonathan Bedard 2023-11-17 13:39:01 PST
Our run-webkit-tests logic retries failed tests runs in some configurations. It will classify a test as "flakey" if the retry results don't match the original results, and EWS uses this to decide when to mark a change as having failed tests.

This logic isn't quite right, though, as demonstrated by https://github.com/WebKit/WebKit/pull/20624. When we say "flakey", we mean that the second test run matches an expected test result, not that the second test run is a different (but also failing) result.
Comment 1 Radar WebKit Bug Importer 2023-11-17 13:43:18 PST
<rdar://problem/118578976>
Comment 2 Aakash Jain 2023-11-22 03:32:11 PST
I'm not sure if I understand this bug report. It would be helpful to have expected and actual behaviours, along-with link to buildbot build demonstrating the issue (the linked PR has many builds).
Comment 3 Jonathan Bedard 2024-05-21 08:13:55 PDT
Relevant bit of code is this in webkitpy, line 311 in Tools/Scripts/webkitpy/layout_tests/models/test_run_results.py:
```
elif test_name in initial_results.unexpected_results_by_name:
    if retry_results and test_name in retry_results.unexpected_results_by_name:
        retry_result_type = retry_results.unexpected_results_by_name[test_name].type
        if result_type != retry_result_type:
            if enabled_pixel_tests_in_retry and result_type == test_expectations.TEXT and (retry_result_type == test_expectations.IMAGE_PLUS_TEXT or retry_result_type == test_expectations.MISSING):
                if retry_result_type == test_expectations.MISSING:
                    num_missing += 1
                num_regressions += 1
                test_dict['report'] = 'REGRESSION'
            else:
                num_flaky += 1
                test_dict['report'] = 'FLAKY'
            actual.append(keywords[retry_result_type])
        else:
            num_regressions += 1
            test_dict['report'] = 'REGRESSION'
```
These 'if' statements are a mess, but basically, instead of checking if the retry result is unexpected, we just check if the retry result matches the original, and if it doesn't, consider the test flaky. That's not really what people mean when they say "flaky". "flaky" means that a test sometimes passes, but sometimes fails. A test which, for example, either times out or crashes, is failing, not flaky.
Comment 4 Jonathan Bedard 2024-05-21 10:01:55 PDT
Pull request: https://github.com/WebKit/WebKit/pull/28856
Comment 5 EWS 2024-05-23 11:48:42 PDT
Committed 279220@main (ca4e82e8dd0a): <https://commits.webkit.org/279220@main>

Reviewed commits have been landed. Closing PR #28856 and removing active labels.