219500 – [EWS] run-layout-tests-without-patch step should only retry the tests that failed on the previous steps

RESOLVED FIXED219500

[EWS] run-layout-tests-without-patch step should only retry the tests that failed on the previous steps

https://bugs.webkit.org/show_bug.cgi?id=219500

Summary [EWS] run-layout-tests-without-patch step should only retry the tests that fa...

Carlos Alberto Lopez Perez

Reported 2020-12-03 11:53:17 PST

When a patch that introduces failures the EWS for layout tests currently does this: 1. step run-layout-tests: run all the tests with the pach 2. step re-run-layout-tests: re-run all the tests with the pach 3. step run-layout-tests-without-patch: re-run all the tests without the pach Then it picks the tests that failed both in 1. and 2. and compares that with the tests that failed on 3. tests_that_consistently_failed = first_results_failing_tests.intersection(second_results_failing_tests) flaky_failures = first_results_failing_tests.union(second_results_failing_tests) - first_results_failing_tests.intersection(second_results_failing_tests) failures_introduced_by_patch = tests_that_consistently_failed - clean_tree_results_failing_tests The problem with this is the time that it takes to run the whole layout tests (between 30 minutes and 1 hour). So when a patch introduces new failures the EWS takes a lot to process. This problem is specially relevant for WebKitGTK, where we are trying to add an EWS that runs layout tests but our tree is not always green, so even when the patch doesn't introduce new failures the EWS has to run the 3 steps to compare with the clean tree (which is not clean). I think it would be idea to change this logic to on steps 2 and 3 to only run the tests that failed on the first step. See example: 1. step run-layout-tests: run all the tests with the pach (30 minutes) - Failed tests: {'fast/css/failure1.html', 'fast/css/failure2.html', 'fast/css/flaky_test.html'} 2. step re-run-layout-tests: re-run only this 3 failures (1 minute) - Failed tests: {'fast/css/failure1.html', 'fast/css/failure2.html'} 3. step run-layout-tests-without-patch: re-run only the 3 failures (1 minute= - Failed tests: {'fast/css/flaky_test.html'} result: tests_that_consistently_failed = {'fast/css/failure1.html', 'fast/css/failure2.html'} flaky_failures = {'fast/css/flaky_test.html'} failures_introduced_by_patch = {'fast/css/failure1.html', 'fast/css/failure2.html'} So the end result is the same but much faster as we only run once the whole layout test step.

Attachments
Patch (2.30 KB, patch) 2020-12-03 12:01 PST, Carlos Alberto Lopez Perez	no flags	Details Formatted Diff Diff
Patch (3.47 KB, patch) 2020-12-03 12:09 PST, Carlos Alberto Lopez Perez	no flags	Details Formatted Diff Diff
Patch (6.02 KB, patch) 2021-03-03 17:55 PST, Carlos Alberto Lopez Perez	no flags	Details Formatted Diff Diff
Patch (10.45 KB, patch) 2021-03-10 11:30 PST, Carlos Alberto Lopez Perez	no flags	Details Formatted Diff Diff
Patch (10.60 KB, patch) 2021-03-15 19:29 PDT, Carlos Alberto Lopez Perez	no flags	Details Formatted Diff Diff
Show Obsolete (4) View All Add attachment proposed patch, testcase, etc.

Carlos Alberto Lopez Perez

Comment 1 2020-12-03 12:01:26 PST

Created attachment 415318 [details] Patch

Carlos Alberto Lopez Perez

Comment 2 2020-12-03 12:09:24 PST

Created attachment 415319 [details] Patch

Alexey Proskuryakov

Comment 3 2020-12-03 13:26:18 PST

Comment on attachment 415319 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=415319&action=review > Tools/ChangeLog:14 > + This helps to speed up the test times on the EWS when the patch > + doesn't pass on the first try. This is specially useful when the > + clean-tree-without-patch is not 100% green or has some flakies. Great idea! I can't think of any major problem with running a subset of tests in clean-tree-without-patch, given that retries are enabled on EWS (so tests that only fail because of dependencies aren't detected anyway). However, re-run-layout-tests needs to run the whole test suite, so that we could detect failures by count. E.g. if a patch introduces failures on a random subset totaling 1% of tests, we would detect 30 failures on the first run, and it's absolutely not sufficient to only re-run those 30 (chances are that we'll hit zero failures, and green-light the patch). There is a small problem with run-layout-tests-without-patch too. What about tests that were just added with the patch, just marked in TestExpectations, or just unmarked? Forcing these tests via command line arguments will produce incorrect results in some of these cases. > Tools/CISupport/ews-build/steps.py:1949 > + self.setCommand(self.command + list(first_results_failing_tests)) Are we guaranteed to not run out of command line length limit?

Alexey Proskuryakov

Comment 4 2020-12-03 13:27:26 PST

At a higher level, I think that the solution to this problem is using flakiness dashboard data to ignore tests known to be flaky.

Carlos Alberto Lopez Perez

Comment 5 2020-12-03 13:48:36 PST

(In reply to Alexey Proskuryakov from comment #3) > Comment on attachment 415319 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=415319&action=review > > > Tools/ChangeLog:14 > > + This helps to speed up the test times on the EWS when the patch > > + doesn't pass on the first try. This is specially useful when the > > + clean-tree-without-patch is not 100% green or has some flakies. > > Great idea! > > I can't think of any major problem with running a subset of tests in > clean-tree-without-patch, given that retries are enabled on EWS (so tests > that only fail because of dependencies aren't detected anyway). > > However, re-run-layout-tests needs to run the whole test suite, so that we > could detect failures by count. E.g. if a patch introduces failures on a > random subset totaling 1% of tests, we would detect 30 failures on the first > run, and it's absolutely not sufficient to only re-run those 30 (chances are > that we'll hit zero failures, and green-light the patch). > The current patch already takes that into account. If on the first run we hit 30 failures then we exit early and first_results_did_exceed_test_failure_limit becomes True. So in that case the second retry runs with the whole set of layout tests again. > There is a small problem with run-layout-tests-without-patch too. What about > tests that were just added with the patch, just marked in TestExpectations, > or just unmarked? Forcing these tests via command line arguments will > produce incorrect results in some of these cases. > I'm not sure if I'm understanding this problem. Is because passing a path to a test that doesn't exist when the patch is unaplied? I have tested now to pass to run-webkit-tests a test path that doesn't exist and it simply ignores it. > > Tools/CISupport/ews-build/steps.py:1949 > > + self.setCommand(self.command + list(first_results_failing_tests)) > > Are we guaranteed to not run out of command line length limit? We are guaranteed because the layout test step for the EWS runs with --exit-after-n-failures 30'. So only 29 tests will be passed in the worst case. If it reaches 30 then first_results_did_exceed_test_failure_limit becomes True and the second retry runs the whole step as I said above.

Ryosuke Niwa

Comment 6 2020-12-03 13:54:43 PST

(In reply to Carlos Alberto Lopez Perez from comment #5) > (In reply to Alexey Proskuryakov from comment #3) > > Comment on attachment 415319 [details] > > Patch > > > > View in context: > > https://bugs.webkit.org/attachment.cgi?id=415319&action=review > > > > > Tools/ChangeLog:14 > > > + This helps to speed up the test times on the EWS when the patch > > > + doesn't pass on the first try. This is specially useful when the > > > + clean-tree-without-patch is not 100% green or has some flakies. > > > > Great idea! > > > > I can't think of any major problem with running a subset of tests in > > clean-tree-without-patch, given that retries are enabled on EWS (so tests > > that only fail because of dependencies aren't detected anyway). > > > > However, re-run-layout-tests needs to run the whole test suite, so that we > > could detect failures by count. E.g. if a patch introduces failures on a > > random subset totaling 1% of tests, we would detect 30 failures on the first > > run, and it's absolutely not sufficient to only re-run those 30 (chances are > > that we'll hit zero failures, and green-light the patch). > > > > > The current patch already takes that into account. > > If on the first run we hit 30 failures then we exit early and > first_results_did_exceed_test_failure_limit becomes True. > > So in that case the second retry runs with the whole set of layout tests > again. There are cases in which the number of failures is less than that limit. There are also cases in which a test failure happens because some states or conditions set by the proceeding tests so it's not always sufficient to just re-run the test that failed.

Alexey Proskuryakov

Comment 7 2020-12-03 17:00:19 PST

> If on the first run we hit 30 failures then we exit early and first_results_did_exceed_test_failure_limit becomes True. OK - but as Ryosuke pointed out, it could also be 20, and then the proposed logic will make EWS work incorrectly. > Is because passing a path to a test that doesn't exist when the patch is unaplied? I have tested now to pass to run-webkit-tests a test path that doesn't exist and it simply ignores it. One example is when a patch purportedly fixes a test and removes it from TestExpectations, but it actually still fails. Currently, clean-tree-without-patch wouldn't run this test, and it would be reported as a new failure. With the proposed change, it will be on the command line, will fail, and thus will be misreported as preexisting failure. > So only 29 tests will be passed in the worst case. Right, I was asking if you calculated how long 29 paths plus other options can be, and compared that to command line length limit.

Radar WebKit Bug Importer

Comment 8 2020-12-10 11:54:15 PST

<rdar://problem/72190937>

Carlos Alberto Lopez Perez

Comment 9 2021-03-03 17:55:04 PST

Created attachment 422168 [details] Patch

Carlos Alberto Lopez Perez

Comment 10 2021-03-04 08:23:10 PST

(In reply to Alexey Proskuryakov from comment #7) > > If on the first run we hit 30 failures then we exit early and first_results_did_exceed_test_failure_limit becomes True. > > OK - but as Ryosuke pointed out, it could also be 20, and then the proposed > logic will make EWS work incorrectly. > > > Is because passing a path to a test that doesn't exist when the patch is unaplied? > I have tested now to pass to run-webkit-tests a test path that doesn't exist > and it simply ignores it. > > One example is when a patch purportedly fixes a test and removes it from > TestExpectations, but it actually still fails. Currently, > clean-tree-without-patch wouldn't run this test, and it would be reported as > a new failure. With the proposed change, it will be on the command line, > will fail, and thus will be misreported as preexisting failure. > > > So only 29 tests will be passed in the worst case. > > Right, I was asking if you calculated how long 29 paths plus other options > can be, and compared that to command line length limit. Ok. I implemented a new version with this suggestions. It only modified the step run-layout-tests-without-patch to run the subset of failed tests from the previous steps (only if those previous steps finished testing with all tests) BTW, The red EWS from iOS is unrelated.

Alexey Proskuryakov

Comment 11 2021-03-05 16:48:17 PST

I think that this still has a problem with "tests that were just added with the patch, just marked in TestExpectations, or just unmarked". Sorry for not clarifying when you asked before. Consider a patch that removes a [ Skip ] test expectation. If the test still fails in reality, then EWS will be green anyway! That's because all three runs will see it fail - passing the test on command line overrides [ Skip ]. Skipped directories would result in something similar, but I'm not even entirely sure what will happen. Same thing with tests previously marked as [ Failure ], because EWS skips those. This is a practical scenario, as people do rely on EWS to tell them what happens when unskipping tests. Other than that, I cannot think of problems with the latest iteration.

Carlos Alberto Lopez Perez

Comment 12 2021-03-10 11:17:33 PST

(In reply to Alexey Proskuryakov from comment #11) > I think that this still has a problem with "tests that were just added with > the patch, just marked in TestExpectations, or just unmarked". Sorry for not > clarifying when you asked before. > > Consider a patch that removes a [ Skip ] test expectation. If the test still > fails in reality, then EWS will be green anyway! That's because all three > runs will see it fail - passing the test on command line overrides [ Skip ]. > Skipped directories would result in something similar, but I'm not even > entirely sure what will happen. > > Same thing with tests previously marked as [ Failure ], because EWS skips > those. > > This is a practical scenario, as people do rely on EWS to tell them what > happens when unskipping tests. > > Other than that, I cannot think of problems with the latest iteration. I see. This is an interesting case. Thanks for pointing it out. I have a new iteration of this: - Now it takes a look at the patch that is being tested, and if the patch modifies a TestExpectation file inside the directory LayoutTests then it avoids doing this optimization (it runs the whole test suite on the step run-layout-tests-without-patch)

Carlos Alberto Lopez Perez

Comment 13 2021-03-10 11:30:48 PST

Created attachment 422850 [details] Patch

Alexey Proskuryakov

Comment 14 2021-03-10 16:17:58 PST

I think that this makes sense conceptually! Probably best for Aakash to review the code, as he has more recent hands on experience with EWS.

Aakash Jain

Comment 15 2021-03-15 09:26:11 PDT

Comment on attachment 422850 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=422850&action=review > Tools/CISupport/ews-build/steps.py:2069 > + if ('LayoutTests/' in line and 'TestExpectations' in line) and (line.startswith('---') or line.startswith('+++')): patch is stored by buildbot as bytes (see https://github.com/buildbot/buildbot/issues/5812#issuecomment-790175979). So this line should be changed to something like: if (b'LayoutTests/' in line and b'TestExpectations' in line) and (line.startswith(b'---') or line.startswith(b'+++')): Else it hits issue like: builtins.TypeError: a bytes-like object is required, not 'str' (https://ews-build.webkit-uat.org/#/builders/69/builds/4/steps/29/logs/err_text) > Tools/CISupport/ews-build/steps_unittest.py:1983 > + RunWebKitTests._get_patch = lambda x: '+++ LayoutTests/Changelog\n+++ LayoutTests/platform/gtk/TestExpectations\n' Please make the patch content a byte object, since in reality patch is stored as byte object.

Aakash Jain

Comment 16 2021-03-15 09:28:17 PDT

Looks fine, tested on uat instance. e.g.: Patch which modifies TestExpectations: https://ews-build.webkit-uat.org/#/builders/69/builds/44 Patch which doesn't modifies TestExpectations: https://ews-build.webkit-uat.org/#/builders/69/builds/45

Aakash Jain

Comment 17 2021-03-15 10:48:57 PDT

Carlos Alberto Lopez Perez

Comment 18 2021-03-15 19:25:56 PDT

(In reply to Aakash Jain from comment #15) > Comment on attachment 422850 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=422850&action=review > > > Tools/CISupport/ews-build/steps.py:2069 > > + if ('LayoutTests/' in line and 'TestExpectations' in line) and (line.startswith('---') or line.startswith('+++')): > > patch is stored by buildbot as bytes (see > https://github.com/buildbot/buildbot/issues/5812#issuecomment-790175979). So > this line should be changed to something like: > > if (b'LayoutTests/' in line and b'TestExpectations' in line) and > (line.startswith(b'---') or line.startswith(b'+++')): > > Else it hits issue like: builtins.TypeError: a bytes-like object is > required, not 'str' > (https://ews-build.webkit-uat.org/#/builders/69/builds/4/steps/29/logs/ > err_text) > > > Tools/CISupport/ews-build/steps_unittest.py:1983 > > + RunWebKitTests._get_patch = lambda x: '+++ LayoutTests/Changelog\n+++ LayoutTests/platform/gtk/TestExpectations\n' > > Please make the patch content a byte object, since in reality patch is > stored as byte object. Good catch! I will fix it. (In reply to Aakash Jain from comment #16) > Looks fine, tested on uat instance. > e.g.: > Patch which modifies TestExpectations: > https://ews-build.webkit-uat.org/#/builders/69/builds/44 > > Patch which doesn't modifies TestExpectations: > https://ews-build.webkit-uat.org/#/builders/69/builds/45 Great! :)

Carlos Alberto Lopez Perez

Comment 19 2021-03-15 19:29:28 PDT

Created attachment 423284 [details] Patch use bytes for the patch

Carlos Alberto Lopez Perez

Comment 20 2021-03-16 07:05:52 PDT

Committed r274475 (235331@main): <https://commits.webkit.org/235331@main>

Aakash Jain

Comment 21 2021-03-16 07:07:31 PDT

Restarted buildbot to pick up this change.

Carlos Alberto Lopez Perez

Comment 22 2021-10-05 17:04:59 PDT

(In reply to Alexey Proskuryakov from comment #11) > I think that this still has a problem with "tests that were just added with > the patch, just marked in TestExpectations, or just unmarked". Sorry for not > clarifying when you asked before. > > Consider a patch that removes a [ Skip ] test expectation. If the test still > fails in reality, then EWS will be green anyway! That's because all three > runs will see it fail - passing the test on command line overrides [ Skip ]. > Skipped directories would result in something similar, but I'm not even > entirely sure what will happen. > > Same thing with tests previously marked as [ Failure ], because EWS skips > those. > > This is a practical scenario, as people do rely on EWS to tell them what > happens when unskipping tests. > > Other than that, I cannot think of problems with the latest iteration. I just discovered the option '--skipped=always' for run-webkit-tests. As far as I can test, with this option we can apply this optimization in this corner case without worries. I uploaded a patch in bug 231265

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version WebKit Nightly Build

Hardware Unspecified

OS Unspecified

Product WebKit

Component Tools / Tests

Assignee

Carlos Alberto Lopez Perez

Reported

2020-12-03 11:53 PST

Modified

2022-09-17 19:58 PDT History

CC List

10 users Show

URL

Keywords InRadar

Depends on

Blocks