<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://bugs.webkit.org/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4.1"
          urlbase="https://bugs.webkit.org/"
          
          maintainer="admin@webkit.org"
>

    <bug>
          <bug_id>265066</bug_id>
          
          <creation_ts>2023-11-17 13:39:01 -0800</creation_ts>
          <short_desc>[run-webkit-tests] Tests which fail and then crash on retry aren&apos;t flakey</short_desc>
          <delta_ts>2024-05-23 11:48:44 -0700</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WebKit</product>
          <component>Tools / Tests</component>
          <version>Safari Technology Preview</version>
          <rep_platform>Unspecified</rep_platform>
          <op_sys>Unspecified</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>FIXED</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords>InRadar</keywords>
          <priority>P2</priority>
          <bug_severity>Normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Jonathan Bedard">jbedard</reporter>
          <assigned_to name="Jonathan Bedard">jbedard</assigned_to>
          <cc>aakash_jain</cc>
    
    <cc>ap</cc>
    
    <cc>webkit-bug-importer</cc>
          

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>1993579</commentid>
    <comment_count>0</comment_count>
    <who name="Jonathan Bedard">jbedard</who>
    <bug_when>2023-11-17 13:39:01 -0800</bug_when>
    <thetext>Our run-webkit-tests logic retries failed test runs in some configurations. It classifies a test as &quot;flakey&quot; if the retry results don&apos;t match the original results, and EWS uses this classification to decide whether to mark a change as having failed tests.

This logic isn&apos;t quite right, though, as demonstrated by https://github.com/WebKit/WebKit/pull/20624. When we say &quot;flakey&quot;, we mean that the second test run produces an expected test result, not merely a different (but still failing) result.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1993580</commentid>
    <comment_count>1</comment_count>
    <who name="Radar WebKit Bug Importer">webkit-bug-importer</who>
    <bug_when>2023-11-17 13:43:18 -0800</bug_when>
    <thetext>&lt;rdar://problem/118578976&gt;</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>1994351</commentid>
    <comment_count>2</comment_count>
    <who name="Aakash Jain">aakash_jain</who>
    <bug_when>2023-11-22 03:32:11 -0800</bug_when>
    <thetext>I&apos;m not sure I understand this bug report. It would be helpful to have the expected and actual behaviours, along with a link to a buildbot build demonstrating the issue (the linked PR has many builds).</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>2036736</commentid>
    <comment_count>3</comment_count>
    <who name="Jonathan Bedard">jbedard</who>
    <bug_when>2024-05-21 08:13:55 -0700</bug_when>
    <thetext>The relevant bit of code is this, at line 311 of Tools/Scripts/webkitpy/layout_tests/models/test_run_results.py:
```
elif test_name in initial_results.unexpected_results_by_name:
    if retry_results and test_name in retry_results.unexpected_results_by_name:
        retry_result_type = retry_results.unexpected_results_by_name[test_name].type
        if result_type != retry_result_type:
            if enabled_pixel_tests_in_retry and result_type == test_expectations.TEXT and (retry_result_type == test_expectations.IMAGE_PLUS_TEXT or retry_result_type == test_expectations.MISSING):
                if retry_result_type == test_expectations.MISSING:
                    num_missing += 1
                num_regressions += 1
                test_dict[&apos;report&apos;] = &apos;REGRESSION&apos;
            else:
                num_flaky += 1
                test_dict[&apos;report&apos;] = &apos;FLAKY&apos;
            actual.append(keywords[retry_result_type])
        else:
            num_regressions += 1
            test_dict[&apos;report&apos;] = &apos;REGRESSION&apos;
```
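For illustration, here is a minimal Python sketch of the two classification rules. It is a simplification under stated assumptions, not the webkitpy code or the change that eventually landed. The Result and Report enums and the classify_before_fix() and classify() functions are hypothetical. Only a handful of result types are modeled, and the pixel-test special case is ignored. A retry that produces an expected result is assumed to be reported as flaky, because that branch is not part of the quoted snippet. The old rule treats any change in result type between runs as flaky. The proposed rule reports a test as flaky only when the retry produces an expected result.

```python
from enum import Enum, auto
from typing import AbstractSet, Optional


class Result(Enum):
    # Hypothetical, simplified stand-ins for the webkitpy test_expectations result types.
    PASS = auto()
    TEXT = auto()
    IMAGE = auto()
    TIMEOUT = auto()
    CRASH = auto()


class Report(Enum):
    EXPECTED = auto()
    FLAKY = auto()
    REGRESSION = auto()


def classify_before_fix(initial: Result, retry: Optional[Result],
                        expected: AbstractSet[Result]) -> Report:
    # Approximates the behavior described in this bug; not the real webkitpy code.
    if initial in expected:
        return Report.EXPECTED
    if retry is None:
        # Assumption: without a retry, an unexpected result is a regression.
        return Report.REGRESSION
    if retry in expected:
        return Report.FLAKY
    # The retry was also unexpected, but any change in result type is still reported as flaky.
    if retry != initial:
        return Report.FLAKY
    return Report.REGRESSION


def classify(initial: Result, retry: Optional[Result],
             expected: AbstractSet[Result]) -> Report:
    # Proposed rule: flaky only if the retry produced an expected result.
    if initial in expected:
        return Report.EXPECTED
    if retry is None or retry not in expected:
        # Both runs failed, whether or not they failed the same way.
        return Report.REGRESSION
    return Report.FLAKY


# A test expected to PASS that fails with a text diff, then crashes on retry:
assert classify_before_fix(Result.TEXT, Result.CRASH, {Result.PASS}) is Report.FLAKY
assert classify(Result.TEXT, Result.CRASH, {Result.PASS}) is Report.REGRESSION
```

With these definitions, the case in this bug&apos;s title (a text failure followed by a crash on retry) is reported as FLAKY by the old rule and as a REGRESSION by the proposed one. A test that fails and then passes on retry is FLAKY under both rules.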
The &apos;if&apos; statements in test_run_results.py are a mess, but in short: instead of checking whether the retry result is unexpected, the code only checks whether the retry result matches the original one and, if it doesn&apos;t, reports the test as flaky. That&apos;s not what people mean when they say &quot;flaky&quot;. A &quot;flaky&quot; test is one that sometimes passes and sometimes fails. A test that, for example, either times out or crashes is failing, not flaky.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>2036754</commentid>
    <comment_count>4</comment_count>
    <who name="Jonathan Bedard">jbedard</who>
    <bug_when>2024-05-21 10:01:55 -0700</bug_when>
    <thetext>Pull request: https://github.com/WebKit/WebKit/pull/28856</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>2037415</commentid>
    <comment_count>5</comment_count>
    <who name="EWS">ews-feeder</who>
    <bug_when>2024-05-23 11:48:42 -0700</bug_when>
    <thetext>Committed 279220@main (ca4e82e8dd0a): &lt;https://commits.webkit.org/279220@main&gt;

Reviewed commits have been landed. Closing PR #28856 and removing active labels.</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>