Summary: | run-jsc-stress-tests doesn't handle dead remotes in detectFailures | ||
---|---|---|---|
Product: | WebKit | Reporter: | Angelos Oikonomopoulos <angelos> |
Component: | JavaScriptCore | Assignee: | Nobody <webkit-unassigned> |
Status: | RESOLVED DUPLICATE | ||
Severity: | Normal | CC: | aakash_jain, angelos, clopez, webkit-bug-importer |
Priority: | P2 | Keywords: | InRadar |
Version: | WebKit Nightly Build | ||
Hardware: | Unspecified | ||
OS: | Unspecified |
Description
Angelos Oikonomopoulos
2021-01-21 06:36:47 PST
(In reply to Angelos Oikonomopoulos from comment #0)
> When a remote board goes away while run-jsc-stress-tests is running, the
> --gnu-parallel-runner reschedules the tests properly, but detectFailures can
> fail in a number of ways:
>
> - if the board is down when detectFailures runs, it'll fail the whole test
>   run after getting a connection error
> - if the board has come up again, there's no guarantee that the failure
>   files are still there. In fact, the mips boards will recreate the R/W
>   filesystem if fsck detects any errors on boot, which means that all the
>   machinery in the remoteDirectory isn't there anymore.
>
> One way to handle this case would be to also restart jobs for which we
> weren't able to get the PASS/FAIL status. Perhaps by including the fetch in
> the command invocation, so that GNU parallel will transparently handle this
> for us -- guess this means we need to move away from detectFailures on
> --gnu-parallel-runner.
>
> Note that detectFailures is fundamentally flawed in any case: it should be
> actively confirming that the job finished successfully, not relying on the
> absence of a 'failure' file.

This has been partly taken care of in https://bugs.webkit.org/show_bug.cgi?id=222601 (don't be so optimistic in detecting failures). Building on this, https://bugs.webkit.org/show_bug.cgi?id=225803 implements a retry loop within run-jsc-stress-tests.

Closing this bug in favor of 225803.

*** This bug has been marked as a duplicate of bug 225803 ***
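For illustration only, the idea of restarting jobs whose PASS/FAIL status could not be retrieved can be sketched as below. This is not the code from bug 225803 or from run-jsc-stress-tests (which is a Ruby script); it is a minimal Python sketch under stated assumptions. The names `run_with_retries`, `run_and_fetch` and `max_attempts` are hypothetical, and `run_and_fetch` stands in for "run the test on a remote and fetch its status in the same step", returning `None` when no status came back.

```python
from enum import Enum
from typing import Callable, Dict, Iterable, Optional


class Status(Enum):
    PASS = "PASS"
    FAIL = "FAIL"


# Hypothetical callback type: runs one test remotely and fetches its status.
# None means the status could not be retrieved (remote down, status files
# lost after the filesystem was recreated, and so on).
RunAndFetch = Callable[[str], Optional[Status]]


def run_with_retries(
    tests: Iterable[str],
    run_and_fetch: RunAndFetch,
    max_attempts: int = 3,
) -> Dict[str, Status]:
    """Re-run every test until it has an explicit PASS or FAIL status.

    A test only counts as passed when a PASS status was actually observed.
    The absence of a failure report is never treated as success.
    """
    results: Dict[str, Status] = {}
    pending = set(tests)

    for _ in range(max_attempts):
        if not pending:
            break
        still_unknown = set()
        for test in sorted(pending):
            status = run_and_fetch(test)
            if status is None:
                # No confirmed outcome: schedule the test again.
                still_unknown.add(test)
            else:
                results[test] = status
        pending = still_unknown

    # Assumption of this sketch: tests that never produced a status are
    # reported as failures rather than silently assumed to have passed.
    for test in pending:
        results[test] = Status.FAIL
    return results
```

The design choice here is that an unknown outcome is its own state that triggers a re-run, and that it degrades to FAIL (not PASS) once the retry budget is exhausted.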