WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
231999
[EWS][GTK][WPE] Add a new Class for running layout tests on the EWS for the GTK and WPE ports.
https://bugs.webkit.org/show_bug.cgi?id=231999
Summary
[EWS][GTK][WPE] Add a new Class for running layout tests on the EWS for the G...
Carlos Alberto Lopez Perez
Reported
2021-10-19 20:11:09 PDT
Comment hidden (obsolete)
SSIA
Attachments
Patch
(64.45 KB, patch)
2021-10-19 20:37 PDT
,
Carlos Alberto Lopez Perez
no flags
Details
Formatted Diff
Diff
Patch
(65.74 KB, patch)
2021-10-19 20:59 PDT
,
Carlos Alberto Lopez Perez
no flags
Details
Formatted Diff
Diff
Patch
(67.53 KB, patch)
2021-11-11 11:23 PST
,
Carlos Alberto Lopez Perez
no flags
Details
Formatted Diff
Diff
Patch
(70.01 KB, patch)
2021-12-01 14:38 PST
,
Carlos Alberto Lopez Perez
no flags
Details
Formatted Diff
Diff
Show Obsolete
(3)
View All
Add attachment
proposed patch, testcase, etc.
Carlos Alberto Lopez Perez
Comment 1
2021-10-19 20:33:22 PDT
We would like to add a new class to the EWS for running layout tests on the GTK and WPE ports. Initially this class will be used for the GTK port, and later for WPE. Mac/iOS ports will continue to use the current default EWS, so no behaviour change for them. This new class is specially designed to: A. Work with a tree that is not always green (or even that is often quite red). B. To not report any false positive to the patch author. C. To allow patch authors to use this EWS to get new expectations for patches that need lot of new re-baselines (like a WPT import) The very simplified logic of how this works is the following: 1. Run layout tests with patch (abort early at 500 unexpected failures) 2. Run layout tests with patch 10 times for each test that failed consistently (non-flaky) on step 1. 3. Run layout tests without patch 10 times for each test that failed consistently (non-flaky) on step 2. Then report to the patch author only the new consistent failures (tests that failed always with the patch and passed always without the patch, retrying 10 times) Any flaky test found is only reported to the bot watchers. For an explainer about why this is needed, and more details about the flow diagram and several design considerations, please check:
https://people.igalia.com/clopez/wkbug/231999/explainer.html
Carlos Alberto Lopez Perez
Comment 2
2021-10-19 20:37:55 PDT
Created
attachment 441841
[details]
Patch
Carlos Alberto Lopez Perez
Comment 3
2021-10-19 20:42:48 PDT
I have tested this locally quite a lot and I'm happy with the results. I think is robust and working as expected. I tested it with several patches, some of them adding failures and flakies like the one on
bug 231192
and other removing expectations that still fail like the one on
bug 231273
For it to work as expected a patch on run-webkit-tests fixing an existent issue is needed, see
bug 231790
I will be quite happy if we can run this on the UAT server for some time to test it before going into production on the real server.
Carlos Alberto Lopez Perez
Comment 4
2021-10-19 20:59:33 PDT
Created
attachment 441844
[details]
Patch
Aakash Jain
Comment 5
2021-10-20 08:57:01 PDT
(In reply to Carlos Alberto Lopez Perez from
comment #3
)
> I will be quite happy if we can run this on the UAT server for some time to test it before going into production on the real server.
Applied
attachment 441844
[details]
to the UAT server. It is running builds now with the change. e.g.:
https://ews-build.webkit-uat.org/#/builders/34/builds/50455
Lauro Moura
Comment 6
2021-10-20 20:49:38 PDT
Comment on
attachment 441844
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=441844&action=review
Nice work! Just some minor inline comments.
> Tools/CISupport/ews-build/steps.py:63 > + super(BufferLogHeaderObserver, self).__init__(**kwargs)
Given the buildbots are using Python3, maybe we could use just `super()`, without repeating the class name. (Ditto for the other classes added by the patch)
> Tools/CISupport/ews-build/steps.py:2766 > + timed_out_line_start = 'command timed out: {} seconds elapsed running'.format(self.MAX_SECONDS_STEP_RUN)
Should those timeouts "command timed out: TIMEOUT seconds without output..." be treated here too or would they follow another path?
> Tools/CISupport/ews-build/steps.py:2989 > + return self.report_infrastructure_issue_and_maybe_retry_build('The layout-test run with patch generated no list of results but exited with error, and the clean_tree without patch run did the samething.')
Missing space in "same thing"
Carlos Alberto Lopez Perez
Comment 7
2021-10-22 11:30:38 PDT
Comment on
attachment 441844
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=441844&action=review
>> Tools/CISupport/ews-build/steps.py:2766 >> + timed_out_line_start = 'command timed out: {} seconds elapsed running'.format(self.MAX_SECONDS_STEP_RUN) > > Should those timeouts "command timed out: TIMEOUT seconds without output..." be treated here too or would they follow another path?
There are two kinds of timeouts: 1. Command doesn't produce any output for more than XX seconds. 2. Command reaches the value of the timeout even when it was producing output. This code cares about the second case. The first one is a different problem (likely RWT hanged) and it would be treated like a general unknown failure on the case of "run-layout-tests returned error code and didn't returned a list of failures or flakies"
Carlos Alberto Lopez Perez
Comment 8
2021-10-22 12:02:16 PDT
(In reply to Aakash Jain from
comment #5
)
> (In reply to Carlos Alberto Lopez Perez from
comment #3
) > > I will be quite happy if we can run this on the UAT server for some time to test it before going into production on the real server. > Applied
attachment 441844
[details]
to the UAT server. It is running builds > now with the change. e.g.: >
https://ews-build.webkit-uat.org/#/builders/34/builds/50455
Thanks a lot for applying it! The start has not been great since it is aborting early due to more than 500 unexpected failures due to a combination of: 1. we currently have on the tree more than 230 unexpected failures due to
bug 230797
2. on the EWS there are some tests timing out that don't happen on the post-commit bots that add to those 230 failures. I have to investigate this. But even with this quite broken tree, it is still being able to do work and is is even finding real new failures on bugs, but the completeness of this new findings is (of course) limited due to the early aborts. Another important thing is that is not reporting false positives (it reported false positives at the start because I think there were some problem with the flatpak SDK version used on the build bot, but I triggered clean builds and got fixed) On top of that there is also now a build failure with the build that is adding to the set of problems (
bug 232160
) and is causing the queue to grow more than acceptable. I will continue to monitor how it behaves and try to fine-tune some things.
Carlos Alberto Lopez Perez
Comment 9
2021-10-22 13:27:07 PDT
Comment on
attachment 441844
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=441844&action=review
> Tools/CISupport/ews-build/steps.py:2868 > + self.setProperty('xdebug_timeout_previous', with_patch_repeat_failures_timedout)
Note to me: remove this debug thing
Carlos Alberto Lopez Perez
Comment 10
2021-11-11 11:17:28 PST
Ok! Sorry for the long time in replying.. I have been on a holidays and busy with other things... Let's recap this: 1. This has been running for several weeks on the UAT. - On the first weeks it worked badly because we had some problems with the builds (random crashes when running layout tests due to a bug in ccache and the linker that is already solved) - Then it didn't started to work really well until
r284784
landed which is needed for the reporting on the retry steps to be reliable So after this, it has been working as expected. I did an analysis of the last 300 builds. You can read the detailed analysis here:
https://people.igalia.com/clopez/wkbug/231999/ews-results-analysis.txt
The conclusions are: * This new EWS is working as intended * It finds real new failues on the patches * It doesn't report false positives * It works with a tree that is red (when this analysis was done there were around 100 unexpected failures on the clean tree of the GTK port) * Is slow compared to the Mac EWS tester, but this performance will improve when we deploy it. * Having it working will help to reduce the number of unexpected failures landing. And the number of unexpected failures on the clean tree has an exponential direct impact on the time it takes to finish the testing (since it has to retry those unexpected failures from the clean tree 20 times) on the retry steps (with patch and without it). * We plan to add more hardware resources. I found two cases where it reported false positives, that will be corrected on a next version the patch - What happened: The step 'layout-tests-repeat-failures-without-patch' aborted due to Xvfb not starting and the EWS thought it finished clean (without producing failures) - Corrective action: We need to detect a general failure of the step on this cases and then repeat t - What happened: The step 'layout-tests' aborted due to the WPT http server not starting, so it caused a general failure (exit error but no results). So the EWS tried to run the whole layout tests without patch ('run-layout-tests-without-patch') and this time it finished (no general error, simply a list of failures) - Corrective action: if we have a general failure on the step 'layout-tests' and then the step 'run-layout-tests-without-patch' produces unexpected results then we need to retry the whole testing (we can't conclude anything when that happens)
Carlos Alberto Lopez Perez
Comment 11
2021-11-11 11:23:25 PST
Created
attachment 443981
[details]
Patch v2 of the patch. diff to previous:
http://sprunge.us/b79tdR?diff
Carlos Alberto Lopez Perez
Comment 12
2021-11-12 06:32:00 PST
BTW, the failures on the EWS (mac-wk1 and jsc) for this last version of the patch are totally unrelated
Carlos Alberto Lopez Perez
Comment 13
2021-11-24 12:06:52 PST
ping reviewers?
Jonathan Bedard
Comment 14
2021-12-01 12:05:46 PST
Comment on
attachment 443981
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=443981&action=review
> Tools/CISupport/ews-build/steps.py:2984 > + if clean_tree_run_failures or clean_tree_run_flakies:
A bit confused why this case triggers a retry. If the run with the patch caused an exit code that didn't generate test results, but the clean-tree run did generate test results, doesn't that mean we can blame the patch for not generating test results? Or is that already covered because the return code of such a run is either SUCCESS or WARNINGS?
> Tools/CISupport/ews-build/steps.py:3026 > + self.send_email_for_flaky_failure(flaky_failure, step_names_str)
It seems wrong to warn bot watchers about flakey failures that we only detect with a patch.
Carlos Alberto Lopez Perez
Comment 15
2021-12-01 12:28:38 PST
Comment on
attachment 443981
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=443981&action=review
>> Tools/CISupport/ews-build/steps.py:2984 >> + if clean_tree_run_failures or clean_tree_run_flakies: > > A bit confused why this case triggers a retry. If the run with the patch caused an exit code that didn't generate test results, but the clean-tree run did generate test results, doesn't that mean we can blame the patch for not generating test results? Or is that already covered because the return code of such a run is either SUCCESS or WARNINGS?
This is a corner case that was detected here
https://bugs.webkit.org/show_bug.cgi?id=231999#c10
when reviewing the results on the UAT: - What happened: The step 'layout-tests' aborted due to the WPT http server not starting, so it caused a general failure (exit error but no results). So the EWS tried to run the whole layout tests without patch ('run-layout-tests-without-patch' aka 'clean-tree') and this time it finished (no general error, simply a list of failures) In other words: If the clean-tree generates a list of failures, then the run with patch should also generate a list of failures unless it exists with zero status. Otherwise it means something broke during the testing with patch that caused a general failure (non-zero exit code but no list of failures/flakies) and then we can't conclude anything.
>> Tools/CISupport/ews-build/steps.py:3026 >> + self.send_email_for_flaky_failure(flaky_failure, step_names_str) > > It seems wrong to warn bot watchers about flakey failures that we only detect with a patch.
The idea of this is to help bot watchers to catch flakies and start marking them on the TestExpectation files. And I agree that flakies of interest are those that happen without patch. But the way this works, it causes that most of the flakies are filtered before reaching the run without patch. It Does this: with-patch: run-layout-tests(all_tests) => failures1, flakies1 with-patch: run-layout-tests-10times(failures1) => failures2, flakies2 without-patch: run-layout-tests-10times(failures2) => wfailures3, wflakies3 So most of the flakies will not reach the testing without-patch, and wflakies3 will likely be empty or will miss flakies of interest that are caused without the patch as well. The assumption here is that flakies are a general problem that will happen with patch even when the patch has not caused them so it reports everything found as flaky with the hope that it will be useful info for the bot watcher. The function to report the flakies includes the step name where the flaky was found so the bot watcher also knows if the flaky was found with patch or without it, so he can take that into account when investigating the issue.
Carlos Alberto Lopez Perez
Comment 16
2021-12-01 13:48:59 PST
Comment on
attachment 443981
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=443981&action=review
>>> Tools/CISupport/ews-build/steps.py:2984 >>> + if clean_tree_run_failures or clean_tree_run_flakies: >> >> A bit confused why this case triggers a retry. If the run with the patch caused an exit code that didn't generate test results, but the clean-tree run did generate test results, doesn't that mean we can blame the patch for not generating test results? Or is that already covered because the return code of such a run is either SUCCESS or WARNINGS? > > This is a corner case that was detected here
https://bugs.webkit.org/show_bug.cgi?id=231999#c10
when reviewing the results on the UAT: > > - What happened: The step 'layout-tests' aborted due to the WPT http server not starting, so it caused a general failure (exit error but no results). So the EWS tried to run the whole layout tests without patch ('run-layout-tests-without-patch' aka 'clean-tree') and this time it finished (no general error, simply a list of failures) > > In other words: If the clean-tree generates a list of failures, then the run with patch should also generate a list of failures unless it exists with zero status. Otherwise it means something broke during the testing with patch that caused a general failure (non-zero exit code but no list of failures/flakies) and then we can't conclude anything.
But thinking again about this corner case maybe the patch itself broke the script run-webkit-tests. In reality we should not treat different this corner case depending on the results of run-layout-tests-without-patch' aka 'clean-tree' (whether it passed green or generated a list of failures). Detecting a patch breaking the script run-webkit-tests shouldn't depend on the results of the 'clean-tree' run However it is true that is much more likely that a random error not caused by the patch happened on the first run, than the patch itself broke the script. So I think we can maybe do here the following: - 1. if retry_count is less than self.MAX_RETRY and we reach the case of "non-zero exist status for first-run-with-patch but no results of failure or flakies" then retry the whole testing again (we assume some random infrastructure error happened) - 2. otherwise (retry_count is self.MAX_RETRY) we report the patch as bad with 'Found unexpected failure with patch' That way we give it a few tries with the hope it was a random error that will not happen on a retry before reporting the patch as bad.
Carlos Alberto Lopez Perez
Comment 17
2021-12-01 14:38:27 PST
Created
attachment 445617
[details]
Patch v3 of the patch: 1. Implement the change from the comment above and 2. Add 3 more workers to speed up the queue. Diff to previous version:
http://sprunge.us/TdLC6I?diff
Jonathan Bedard
Comment 18
2021-12-01 16:13:27 PST
(In reply to Carlos Alberto Lopez Perez from
comment #15
)
> .... > > The idea of this is to help bot watchers to catch flakies and start marking them on the TestExpectation files. > And I agree that flakies of interest are those that happen without patch. > But the way this works, it causes that most of the flakies are filtered before reaching the run without patch. > ....
This is the bit I wasn't thinking about. I think the flakiness noise may be too high to be useful, but that's something we can address after we get this queue running.
Carlos Alberto Lopez Perez
Comment 19
2021-12-01 16:42:51 PST
Committed
r286405
(
244752@main
): <
https://commits.webkit.org/244752@main
>
Radar WebKit Bug Importer
Comment 20
2021-12-01 16:43:24 PST
<
rdar://problem/85945620
>
Carlos Alberto Lopez Perez
Comment 21
2021-12-01 16:43:48 PST
Thanks! landed it with some edits on the comments to make them more clear:
http://sprunge.us/5DOchH?diff
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug