WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
219500
[EWS] run-layout-tests-without-patch step should only retry the tests that failed on the previous steps
https://bugs.webkit.org/show_bug.cgi?id=219500
Summary
[EWS] run-layout-tests-without-patch step should only retry the tests that fa...
Carlos Alberto Lopez Perez
Reported
2020-12-03 11:53:17 PST
When a patch that introduces failures the EWS for layout tests currently does this: 1. step run-layout-tests: run all the tests with the pach 2. step re-run-layout-tests: re-run all the tests with the pach 3. step run-layout-tests-without-patch: re-run all the tests without the pach Then it picks the tests that failed both in 1. and 2. and compares that with the tests that failed on 3. tests_that_consistently_failed = first_results_failing_tests.intersection(second_results_failing_tests) flaky_failures = first_results_failing_tests.union(second_results_failing_tests) - first_results_failing_tests.intersection(second_results_failing_tests) failures_introduced_by_patch = tests_that_consistently_failed - clean_tree_results_failing_tests The problem with this is the time that it takes to run the whole layout tests (between 30 minutes and 1 hour). So when a patch introduces new failures the EWS takes a lot to process. This problem is specially relevant for WebKitGTK, where we are trying to add an EWS that runs layout tests but our tree is not always green, so even when the patch doesn't introduce new failures the EWS has to run the 3 steps to compare with the clean tree (which is not clean). I think it would be idea to change this logic to on steps 2 and 3 to only run the tests that failed on the first step. See example: 1. step run-layout-tests: run all the tests with the pach (30 minutes) - Failed tests: {'fast/css/failure1.html', 'fast/css/failure2.html', 'fast/css/flaky_test.html'} 2. step re-run-layout-tests: re-run only this 3 failures (1 minute) - Failed tests: {'fast/css/failure1.html', 'fast/css/failure2.html'} 3. step run-layout-tests-without-patch: re-run only the 3 failures (1 minute= - Failed tests: {'fast/css/flaky_test.html'} result: tests_that_consistently_failed = {'fast/css/failure1.html', 'fast/css/failure2.html'} flaky_failures = {'fast/css/flaky_test.html'} failures_introduced_by_patch = {'fast/css/failure1.html', 'fast/css/failure2.html'} So the end result is the same but much faster as we only run once the whole layout test step.
Attachments
Patch
(2.30 KB, patch)
2020-12-03 12:01 PST
,
Carlos Alberto Lopez Perez
no flags
Details
Formatted Diff
Diff
Patch
(3.47 KB, patch)
2020-12-03 12:09 PST
,
Carlos Alberto Lopez Perez
no flags
Details
Formatted Diff
Diff
Patch
(6.02 KB, patch)
2021-03-03 17:55 PST
,
Carlos Alberto Lopez Perez
no flags
Details
Formatted Diff
Diff
Patch
(10.45 KB, patch)
2021-03-10 11:30 PST
,
Carlos Alberto Lopez Perez
no flags
Details
Formatted Diff
Diff
Patch
(10.60 KB, patch)
2021-03-15 19:29 PDT
,
Carlos Alberto Lopez Perez
no flags
Details
Formatted Diff
Diff
Show Obsolete
(4)
View All
Add attachment
proposed patch, testcase, etc.
Carlos Alberto Lopez Perez
Comment 1
2020-12-03 12:01:26 PST
Created
attachment 415318
[details]
Patch
Carlos Alberto Lopez Perez
Comment 2
2020-12-03 12:09:24 PST
Created
attachment 415319
[details]
Patch
Alexey Proskuryakov
Comment 3
2020-12-03 13:26:18 PST
Comment on
attachment 415319
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=415319&action=review
> Tools/ChangeLog:14 > + This helps to speed up the test times on the EWS when the patch > + doesn't pass on the first try. This is specially useful when the > + clean-tree-without-patch is not 100% green or has some flakies.
Great idea! I can't think of any major problem with running a subset of tests in clean-tree-without-patch, given that retries are enabled on EWS (so tests that only fail because of dependencies aren't detected anyway). However, re-run-layout-tests needs to run the whole test suite, so that we could detect failures by count. E.g. if a patch introduces failures on a random subset totaling 1% of tests, we would detect 30 failures on the first run, and it's absolutely not sufficient to only re-run those 30 (chances are that we'll hit zero failures, and green-light the patch). There is a small problem with run-layout-tests-without-patch too. What about tests that were just added with the patch, just marked in TestExpectations, or just unmarked? Forcing these tests via command line arguments will produce incorrect results in some of these cases.
> Tools/CISupport/ews-build/steps.py:1949 > + self.setCommand(self.command + list(first_results_failing_tests))
Are we guaranteed to not run out of command line length limit?
Alexey Proskuryakov
Comment 4
2020-12-03 13:27:26 PST
At a higher level, I think that the solution to this problem is using flakiness dashboard data to ignore tests known to be flaky.
Carlos Alberto Lopez Perez
Comment 5
2020-12-03 13:48:36 PST
(In reply to Alexey Proskuryakov from
comment #3
)
> Comment on
attachment 415319
[details]
> Patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=415319&action=review
> > > Tools/ChangeLog:14 > > + This helps to speed up the test times on the EWS when the patch > > + doesn't pass on the first try. This is specially useful when the > > + clean-tree-without-patch is not 100% green or has some flakies. > > Great idea! > > I can't think of any major problem with running a subset of tests in > clean-tree-without-patch, given that retries are enabled on EWS (so tests > that only fail because of dependencies aren't detected anyway). > > However, re-run-layout-tests needs to run the whole test suite, so that we > could detect failures by count. E.g. if a patch introduces failures on a > random subset totaling 1% of tests, we would detect 30 failures on the first > run, and it's absolutely not sufficient to only re-run those 30 (chances are > that we'll hit zero failures, and green-light the patch). >
The current patch already takes that into account. If on the first run we hit 30 failures then we exit early and first_results_did_exceed_test_failure_limit becomes True. So in that case the second retry runs with the whole set of layout tests again.
> There is a small problem with run-layout-tests-without-patch too. What about > tests that were just added with the patch, just marked in TestExpectations, > or just unmarked? Forcing these tests via command line arguments will > produce incorrect results in some of these cases. >
I'm not sure if I'm understanding this problem. Is because passing a path to a test that doesn't exist when the patch is unaplied? I have tested now to pass to run-webkit-tests a test path that doesn't exist and it simply ignores it.
> > Tools/CISupport/ews-build/steps.py:1949 > > + self.setCommand(self.command + list(first_results_failing_tests)) > > Are we guaranteed to not run out of command line length limit?
We are guaranteed because the layout test step for the EWS runs with --exit-after-n-failures 30'. So only 29 tests will be passed in the worst case. If it reaches 30 then first_results_did_exceed_test_failure_limit becomes True and the second retry runs the whole step as I said above.
Ryosuke Niwa
Comment 6
2020-12-03 13:54:43 PST
(In reply to Carlos Alberto Lopez Perez from
comment #5
)
> (In reply to Alexey Proskuryakov from
comment #3
) > > Comment on
attachment 415319
[details]
> > Patch > > > > View in context: > >
https://bugs.webkit.org/attachment.cgi?id=415319&action=review
> > > > > Tools/ChangeLog:14 > > > + This helps to speed up the test times on the EWS when the patch > > > + doesn't pass on the first try. This is specially useful when the > > > + clean-tree-without-patch is not 100% green or has some flakies. > > > > Great idea! > > > > I can't think of any major problem with running a subset of tests in > > clean-tree-without-patch, given that retries are enabled on EWS (so tests > > that only fail because of dependencies aren't detected anyway). > > > > However, re-run-layout-tests needs to run the whole test suite, so that we > > could detect failures by count. E.g. if a patch introduces failures on a > > random subset totaling 1% of tests, we would detect 30 failures on the first > > run, and it's absolutely not sufficient to only re-run those 30 (chances are > > that we'll hit zero failures, and green-light the patch). > > > > > The current patch already takes that into account. > > If on the first run we hit 30 failures then we exit early and > first_results_did_exceed_test_failure_limit becomes True. > > So in that case the second retry runs with the whole set of layout tests > again.
There are cases in which the number of failures is less than that limit. There are also cases in which a test failure happens because some states or conditions set by the proceeding tests so it's not always sufficient to just re-run the test that failed.
Alexey Proskuryakov
Comment 7
2020-12-03 17:00:19 PST
> If on the first run we hit 30 failures then we exit early and first_results_did_exceed_test_failure_limit becomes True.
OK - but as Ryosuke pointed out, it could also be 20, and then the proposed logic will make EWS work incorrectly.
> Is because passing a path to a test that doesn't exist when the patch is unaplied?
I have tested now to pass to run-webkit-tests a test path that doesn't exist and it simply ignores it. One example is when a patch purportedly fixes a test and removes it from TestExpectations, but it actually still fails. Currently, clean-tree-without-patch wouldn't run this test, and it would be reported as a new failure. With the proposed change, it will be on the command line, will fail, and thus will be misreported as preexisting failure.
> So only 29 tests will be passed in the worst case.
Right, I was asking if you calculated how long 29 paths plus other options can be, and compared that to command line length limit.
Radar WebKit Bug Importer
Comment 8
2020-12-10 11:54:15 PST
<
rdar://problem/72190937
>
Carlos Alberto Lopez Perez
Comment 9
2021-03-03 17:55:04 PST
Created
attachment 422168
[details]
Patch
Carlos Alberto Lopez Perez
Comment 10
2021-03-04 08:23:10 PST
(In reply to Alexey Proskuryakov from
comment #7
)
> > If on the first run we hit 30 failures then we exit early and first_results_did_exceed_test_failure_limit becomes True. > > OK - but as Ryosuke pointed out, it could also be 20, and then the proposed > logic will make EWS work incorrectly. > > > Is because passing a path to a test that doesn't exist when the patch is unaplied? > I have tested now to pass to run-webkit-tests a test path that doesn't exist > and it simply ignores it. > > One example is when a patch purportedly fixes a test and removes it from > TestExpectations, but it actually still fails. Currently, > clean-tree-without-patch wouldn't run this test, and it would be reported as > a new failure. With the proposed change, it will be on the command line, > will fail, and thus will be misreported as preexisting failure. > > > So only 29 tests will be passed in the worst case. > > Right, I was asking if you calculated how long 29 paths plus other options > can be, and compared that to command line length limit.
Ok. I implemented a new version with this suggestions. It only modified the step run-layout-tests-without-patch to run the subset of failed tests from the previous steps (only if those previous steps finished testing with all tests) BTW, The red EWS from iOS is unrelated.
Alexey Proskuryakov
Comment 11
2021-03-05 16:48:17 PST
I think that this still has a problem with "tests that were just added with the patch, just marked in TestExpectations, or just unmarked". Sorry for not clarifying when you asked before. Consider a patch that removes a [ Skip ] test expectation. If the test still fails in reality, then EWS will be green anyway! That's because all three runs will see it fail - passing the test on command line overrides [ Skip ]. Skipped directories would result in something similar, but I'm not even entirely sure what will happen. Same thing with tests previously marked as [ Failure ], because EWS skips those. This is a practical scenario, as people do rely on EWS to tell them what happens when unskipping tests. Other than that, I cannot think of problems with the latest iteration.
Carlos Alberto Lopez Perez
Comment 12
2021-03-10 11:17:33 PST
(In reply to Alexey Proskuryakov from
comment #11
)
> I think that this still has a problem with "tests that were just added with > the patch, just marked in TestExpectations, or just unmarked". Sorry for not > clarifying when you asked before. > > Consider a patch that removes a [ Skip ] test expectation. If the test still > fails in reality, then EWS will be green anyway! That's because all three > runs will see it fail - passing the test on command line overrides [ Skip ]. > Skipped directories would result in something similar, but I'm not even > entirely sure what will happen. > > Same thing with tests previously marked as [ Failure ], because EWS skips > those. > > This is a practical scenario, as people do rely on EWS to tell them what > happens when unskipping tests. > > Other than that, I cannot think of problems with the latest iteration.
I see. This is an interesting case. Thanks for pointing it out. I have a new iteration of this: - Now it takes a look at the patch that is being tested, and if the patch modifies a TestExpectation file inside the directory LayoutTests then it avoids doing this optimization (it runs the whole test suite on the step run-layout-tests-without-patch)
Carlos Alberto Lopez Perez
Comment 13
2021-03-10 11:30:48 PST
Created
attachment 422850
[details]
Patch
Alexey Proskuryakov
Comment 14
2021-03-10 16:17:58 PST
I think that this makes sense conceptually! Probably best for Aakash to review the code, as he has more recent hands on experience with EWS.
Aakash Jain
Comment 15
2021-03-15 09:26:11 PDT
Comment on
attachment 422850
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=422850&action=review
> Tools/CISupport/ews-build/steps.py:2069 > + if ('LayoutTests/' in line and 'TestExpectations' in line) and (line.startswith('---') or line.startswith('+++')):
patch is stored by buildbot as bytes (see
https://github.com/buildbot/buildbot/issues/5812#issuecomment-790175979
). So this line should be changed to something like: if (b'LayoutTests/' in line and b'TestExpectations' in line) and (line.startswith(b'---') or line.startswith(b'+++')): Else it hits issue like: builtins.TypeError: a bytes-like object is required, not 'str' (
https://ews-build.webkit-uat.org/#/builders/69/builds/4/steps/29/logs/err_text
)
> Tools/CISupport/ews-build/steps_unittest.py:1983 > + RunWebKitTests._get_patch = lambda x: '+++ LayoutTests/Changelog\n+++ LayoutTests/platform/gtk/TestExpectations\n'
Please make the patch content a byte object, since in reality patch is stored as byte object.
Aakash Jain
Comment 16
2021-03-15 09:28:17 PDT
Looks fine, tested on uat instance. e.g.: Patch which modifies TestExpectations:
https://ews-build.webkit-uat.org/#/builders/69/builds/44
Patch which doesn't modifies TestExpectations:
https://ews-build.webkit-uat.org/#/builders/69/builds/45
Aakash Jain
Comment 17
2021-03-15 10:48:57 PDT
Comment on
attachment 422850
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=422850&action=review
> Tools/CISupport/ews-build/steps.py:2054 > + if self.name == 'run-layout-tests-without-patch':
can change this to: RunWebKitTestsWithoutPatch.name
Carlos Alberto Lopez Perez
Comment 18
2021-03-15 19:25:56 PDT
(In reply to Aakash Jain from
comment #15
)
> Comment on
attachment 422850
[details]
> Patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=422850&action=review
> > > Tools/CISupport/ews-build/steps.py:2069 > > + if ('LayoutTests/' in line and 'TestExpectations' in line) and (line.startswith('---') or line.startswith('+++')): > > patch is stored by buildbot as bytes (see >
https://github.com/buildbot/buildbot/issues/5812#issuecomment-790175979
). So > this line should be changed to something like: > > if (b'LayoutTests/' in line and b'TestExpectations' in line) and > (line.startswith(b'---') or line.startswith(b'+++')): > > Else it hits issue like: builtins.TypeError: a bytes-like object is > required, not 'str' > (
https://ews-build.webkit-uat.org/#/builders/69/builds/4/steps/29/logs/
> err_text) > > > Tools/CISupport/ews-build/steps_unittest.py:1983 > > + RunWebKitTests._get_patch = lambda x: '+++ LayoutTests/Changelog\n+++ LayoutTests/platform/gtk/TestExpectations\n' > > Please make the patch content a byte object, since in reality patch is > stored as byte object.
Good catch! I will fix it. (In reply to Aakash Jain from
comment #16
)
> Looks fine, tested on uat instance. > e.g.: > Patch which modifies TestExpectations: >
https://ews-build.webkit-uat.org/#/builders/69/builds/44
> > Patch which doesn't modifies TestExpectations: >
https://ews-build.webkit-uat.org/#/builders/69/builds/45
Great! :)
Carlos Alberto Lopez Perez
Comment 19
2021-03-15 19:29:28 PDT
Created
attachment 423284
[details]
Patch use bytes for the patch
Carlos Alberto Lopez Perez
Comment 20
2021-03-16 07:05:52 PDT
Committed
r274475
(
235331@main
): <
https://commits.webkit.org/235331@main
>
Aakash Jain
Comment 21
2021-03-16 07:07:31 PDT
Restarted buildbot to pick up this change.
Carlos Alberto Lopez Perez
Comment 22
2021-10-05 17:04:59 PDT
(In reply to Alexey Proskuryakov from
comment #11
)
> I think that this still has a problem with "tests that were just added with > the patch, just marked in TestExpectations, or just unmarked". Sorry for not > clarifying when you asked before. > > Consider a patch that removes a [ Skip ] test expectation. If the test still > fails in reality, then EWS will be green anyway! That's because all three > runs will see it fail - passing the test on command line overrides [ Skip ]. > Skipped directories would result in something similar, but I'm not even > entirely sure what will happen. > > Same thing with tests previously marked as [ Failure ], because EWS skips > those. > > This is a practical scenario, as people do rely on EWS to tell them what > happens when unskipping tests. > > Other than that, I cannot think of problems with the latest iteration.
I just discovered the option '--skipped=always' for run-webkit-tests. As far as I can test, with this option we can apply this optimization in this corner case without worries. I uploaded a patch in
bug 231265
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug