Bug 225803 - [JSC] Implement high-level retry loop for run-jsc-stress-tests
Summary: [JSC] Implement high-level retry loop for run-jsc-stress-tests
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
Importance: P2 Normal
Assignee: Angelos Oikonomopoulos
URL:
Keywords: InRadar
Duplicates: 220794
Depends on:
Blocks:
 
Reported: 2021-05-14 03:01 PDT by Angelos Oikonomopoulos
Modified: 2021-05-28 11:41 PDT
CC List: 5 users

See Also:


Attachments
Patch (22.41 KB, patch)
2021-05-14 03:04 PDT, Angelos Oikonomopoulos
Patch (23.03 KB, patch)
2021-05-14 07:21 PDT, Angelos Oikonomopoulos

Description Angelos Oikonomopoulos 2021-05-14 03:01:37 PDT
[JSC] Implement high-level retry loop for run-jsc-stress-tests
Comment 1 Angelos Oikonomopoulos 2021-05-14 03:04:09 PDT
Created attachment 428615 [details]
Patch
Comment 2 Angelos Oikonomopoulos 2021-05-14 07:21:59 PDT
Created attachment 428619 [details]
Patch
Comment 3 Angelos Oikonomopoulos 2021-05-14 08:28:52 PDT
The behavior of this patch (for --gnu-parallel-runner only) should be captured in the MIPS EWS run at https://ews-build.webkit.org/#/builders/45/builds/4271 (I manually rebooted one of the remote boards partway through):

[...]
Remote host lost state, triggering high-level retry: mips-ci20-board26.local.igalia.com
5d65329bd1a3mips-ci20-board26.local.igalia.coma9aea5c3b843
parallel: SIGTERM received. No new jobs will be started.
parallel: Waiting for these 9 jobs to finish. Send SIGTERM again to stop now.
[...]
After try 1/3: got results for 1566/40530 tests, 3/3 hosts live
[reinitialization of the remotes]
[lots of successful tests]
Results for JSC stress tests:
    0 failures found.
    0 tests failed to complete.
    OK.
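
A rough Ruby sketch of the high-level retry idea visible in the log above: rerun only the tests that still lack a result, up to a fixed number of tries. Every name here (run_with_retries, the block that runs a batch) is hypothetical and not taken from the patch; the real script also tracks live hosts and reinitializes the remotes, which this sketch leaves out.

```ruby
# Hypothetical sketch, not code from run-jsc-stress-tests.
# The block receives the tests that still have no result and returns a
# Hash of test name => result for whichever of them completed.
def run_with_retries(tests, max_tries: 3, log: $stdout)
  results = {}
  (1..max_tries).each do |try|
    pending = tests.reject { |t| results.key?(t) }
    break if pending.empty?          # everything has a result, stop early
    results.merge!(yield(pending))   # keep whatever this try managed to finish
    log.puts "After try #{try}/#{max_tries}: " \
             "got results for #{results.size}/#{tests.size} tests"
  end
  results
end
```

Tests that never produce a result within max_tries are simply absent from the returned Hash, so the caller can report them as "failed to complete".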
Comment 4 Radar WebKit Bug Importer 2021-05-21 03:02:19 PDT
<rdar://problem/78303014>
Comment 5 Angelos Oikonomopoulos 2021-05-24 01:56:43 PDT
*** Bug 220794 has been marked as a duplicate of this bug. ***
Comment 6 Angelos Oikonomopoulos 2021-05-24 02:03:48 PDT
Ping. This patch should only make a difference with --gnu-parallel-runner, which AFAIK is currently only used by the MIPS bots.

It only touches the generic path in three ways: it replaces remoteIndex plus array access with a remoteHost passed in by the caller (which I think is a simplification); it uses common code (processStatusLine) for both the local and remote paths in getStatusMap; and it removes some apparently unneeded (or no longer needed) escapes in exportBaseEnvironmentVariables that fail with the new runAndMonitorCommandOutput (which is only used with --gnu-parallel-runner).
Comment 7 Adrian Perez 2021-05-27 04:41:09 PDT
I'm not super fluent with Ruby, but thankfully Angelos' changes
are well commented so it was not much trouble to make sense of
it. Thanks for the patch!
Comment 8 EWS 2021-05-27 04:49:36 PDT
Committed r278159 (238203@main): <https://commits.webkit.org/238203@main>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 428619 [details].
Comment 9 Zhifei Fang 2021-05-27 21:10:00 PDT
Comment on attachment 428619 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=428619&action=review

> Tools/Scripts/run-jsc-stress-tests:2201
> +    dyldFrameworkPath = "\$(cd #{$testingFrameworkPath.dirname}; pwd)"

Oops, I think you forgot the leading '\', which makes the on-device tests fail: since we run the command over ssh, without \$ the $(...) will be executed locally.
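
For illustration, a small Ruby sketch of the escaping problem described in this comment. It is not code from the script: run_locally is a hypothetical helper, and a local `sh -c` stands in for the shell that builds the ssh command line (assumes a POSIX `sh` is available). In a double-quoted Ruby string, "\$" is just "$", so a literal backslash has to be written as "\\$".

```ruby
require 'open3'

# Hypothetical helper: embed the fragment in a double-quoted shell word,
# the way a command string passed on to ssh would be, and run it locally.
def run_locally(fragment)
  out, _status = Open3.capture2("sh", "-c", "echo \"#{fragment}\"")
  out.chomp
end

unescaped = "\$(echo expanded)"   # Ruby string is: $(echo expanded)
escaped   = "\\$(echo expanded)"  # Ruby string is: \$(echo expanded)

puts run_locally(unescaped)  # local shell runs the command substitution
puts run_locally(escaped)    # local shell passes $(...) through untouched
```

With the unescaped form the local shell performs the substitution itself; with the escaped form the `$(...)` text survives, so it would be the remote side that evaluates it.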