NEW 228429
[run-webkit-tests] Run largest shards first
https://bugs.webkit.org/show_bug.cgi?id=228429
Summary [run-webkit-tests] Run largest shards first
Jonathan Bedard
Reported 2021-07-27 15:16:16 PDT
Alexey pointed this out a few weeks ago, and it's particularly obvious with webgl tests. Essentially, because we run tests in alphabetical order, any given invocation of run-webkit-tests will be inefficiently running towards the end, because there will be a larger shard running on a single process. The solution to this is to run the largest shards first. The potential problem with this is that we're sacrificing predictable test order, which may have an effect on flakiness. I think this trade-off will be worth it.
Attachments
Patch (1.39 KB, patch)
2021-07-27 15:20 PDT, Jonathan Bedard
no flags
Patch (3.75 KB, patch)
2021-07-27 16:21 PDT, Jonathan Bedard
no flags
Patch (3.74 KB, patch)
2021-07-28 19:08 PDT, Jonathan Bedard
no flags
Patch (2.00 KB, patch)
2022-05-25 15:58 PDT, Jonathan Bedard
ews-feeder: commit-queue-
Radar WebKit Bug Importer
Comment 1 2021-07-27 15:16:46 PDT
Jonathan Bedard
Comment 2 2021-07-27 15:20:56 PDT
Jonathan Bedard
Comment 3 2021-07-27 16:21:54 PDT
Jonathan Bedard
Comment 4 2021-07-27 16:56:05 PDT
Let's wait for EWS before officially landing this, since we know it could effect flakiness, this will at least help avoid anything obvious.
Alexey Proskuryakov
Comment 5 2021-07-27 17:44:49 PDT
It looks like build 12531 (for this patch) was slower than others on this bot? So not sure if this is a progression or a regression. Perhaps test run time has a huge variance. I didn't look at the logs to dig deeply. https://ews-build.webkit.org/#/workers/56
Jonathan Bedard
Comment 6 2021-07-28 08:05:33 PDT
The more concerning thing is that we surfaced a debug crash by changing test ordering. That being said, looking through our test runs, it looks to be a 10-15% improvement in most cases, layout tests just have so much variance it's hard to tell.
Jonathan Bedard
Comment 7 2021-07-28 17:43:26 PDT
There aren't any easy answers to this, the two crashes are pretty difficult to reproduce, but EWS is able to reliably do so, and it's a good bet infrastructure will as well. We need to resolve these crashes (and maybe a few others) before landing.
Jonathan Bedard
Comment 8 2021-07-28 19:08:06 PDT
Sam Sneddon [:gsnedders]
Comment 9 2021-07-30 06:03:09 PDT
(In reply to Jonathan Bedard from comment #7) > There aren't any easy answers to this, the two crashes are pretty difficult > to reproduce, but EWS is able to reliably do so, and it's a good bet > infrastructure will as well. We need to resolve these crashes (and maybe a > few others) before landing. I'm only seeing one crash in WK2 in all the patches? The WK1 crash seems to have affected main as well. But needless to say, the WK2 crash is bug 224085; note that other fullscreen tests have been skipped as a result of it. In general: I'm concerned that changing the order that the shards are run effects test results, as it implies little in the way of isolation between the different shards, and any bad state in one shard can end up affecting other shards. That said, some of the problem here is that we allocate all the shards to workers ahead-of-time, instead of (e.g.) reading from a Queue or having any sort of work-stealing, hence we can still end up with imbalances in worker execution time even after this patch.
Jonathan Bedard
Comment 10 2021-08-05 15:48:33 PDT
(In reply to Sam Sneddon [:gsnedders] from comment #9) > (In reply to Jonathan Bedard from comment #7) > > There aren't any easy answers to this, the two crashes are pretty difficult > > to reproduce, but EWS is able to reliably do so, and it's a good bet > > infrastructure will as well. We need to resolve these crashes (and maybe a > > few others) before landing. > > I'm only seeing one crash in WK2 in all the patches? The WK1 crash seems to > have affected main as well. But needless to say, the WK2 crash is bug > 224085; note that other fullscreen tests have been skipped as a result of it. > > In general: I'm concerned that changing the order that the shards are run > effects test results, as it implies little in the way of isolation between > the different shards, and any bad state in one shard can end up affecting > other shards. > > That said, some of the problem here is that we allocate all the shards to > workers ahead-of-time, instead of (e.g.) reading from a Queue or having any > sort of work-stealing, hence we can still end up with imbalances in worker > execution time even after this patch. EWS retries failed tests individually, the flaky failures in something like https://ews-build.webkit.org/#/builders/59/builds/12630 are still of concern. Also, the TaskPool code doesn't allocate the shards to the workers ahead of time, we are using a Queue that's shared between each process. You will always get some imbalances at the end of the test run, but this patch should minimize them.
Jonathan Bedard
Comment 11 2022-05-25 15:58:06 PDT
Sam Sneddon [:gsnedders]
Comment 12 2022-05-25 22:30:52 PDT
Comment on attachment 459772 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=459772&action=review I don't know what your plans are here (and this isn't r? anyway), but… > Tools/Scripts/webkitpy/layout_tests/controllers/layout_test_runner.py:568 > - shard = TestShard(directory, test_inputs) > - shards.append(shard) > + while len(test_inputs) > MAX_SHARD_SIZE: > + shards.append(TestShard(directory, test_inputs[:MAX_SHARD_SIZE])) > + test_inputs = test_inputs[MAX_SHARD_SIZE:] > + shards.append(TestShard(directory, test_inputs)) I'd rather we didn't change this in the same patch. That said, simple numeric splits have the risk of things moving between shards too often (i.e., adding a new test early on in a shard will change every split), but let's take this design discussion to a new/different bug?
Jonathan Bedard
Comment 13 2022-05-27 15:08:49 PDT
Comment on attachment 459772 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=459772&action=review >> Tools/Scripts/webkitpy/layout_tests/controllers/layout_test_runner.py:568 >> + shards.append(TestShard(directory, test_inputs)) > > I'd rather we didn't change this in the same patch. That said, simple numeric splits have the risk of things moving between shards too often (i.e., adding a new test early on in a shard will change every split), but let's take this design discussion to a new/different bug? Didn't mark this as review because this was just posted for testing, no intention of landing this change as is.
Jonathan Bedard
Comment 14 2022-12-21 14:23:19 PST
Note You need to log in before you can comment on or make changes to this bug.