Alexey pointed this out a few weeks ago, and it's particularly obvious with webgl tests. Essentially, because we run tests in alphabetical order, any given invocation of run-webkit-tests will be inefficiently running towards the end, because there will be a larger shard running on a single process. The solution to this is to run the largest shards first. The potential problem with this is that we're sacrificing predictable test order, which may have an effect on flakiness. I think this trade-off will be worth it.
<rdar://problem/81184667>
Created attachment 434344 [details] Patch
Created attachment 434382 [details] Patch
Let's wait for EWS before officially landing this, since we know it could effect flakiness, this will at least help avoid anything obvious.
It looks like build 12531 (for this patch) was slower than others on this bot? So not sure if this is a progression or a regression. Perhaps test run time has a huge variance. I didn't look at the logs to dig deeply. https://ews-build.webkit.org/#/workers/56
The more concerning thing is that we surfaced a debug crash by changing test ordering. That being said, looking through our test runs, it looks to be a 10-15% improvement in most cases, layout tests just have so much variance it's hard to tell.
There aren't any easy answers to this, the two crashes are pretty difficult to reproduce, but EWS is able to reliably do so, and it's a good bet infrastructure will as well. We need to resolve these crashes (and maybe a few others) before landing.
Created attachment 434490 [details] Patch
(In reply to Jonathan Bedard from comment #7) > There aren't any easy answers to this, the two crashes are pretty difficult > to reproduce, but EWS is able to reliably do so, and it's a good bet > infrastructure will as well. We need to resolve these crashes (and maybe a > few others) before landing. I'm only seeing one crash in WK2 in all the patches? The WK1 crash seems to have affected main as well. But needless to say, the WK2 crash is bug 224085; note that other fullscreen tests have been skipped as a result of it. In general: I'm concerned that changing the order that the shards are run effects test results, as it implies little in the way of isolation between the different shards, and any bad state in one shard can end up affecting other shards. That said, some of the problem here is that we allocate all the shards to workers ahead-of-time, instead of (e.g.) reading from a Queue or having any sort of work-stealing, hence we can still end up with imbalances in worker execution time even after this patch.
(In reply to Sam Sneddon [:gsnedders] from comment #9) > (In reply to Jonathan Bedard from comment #7) > > There aren't any easy answers to this, the two crashes are pretty difficult > > to reproduce, but EWS is able to reliably do so, and it's a good bet > > infrastructure will as well. We need to resolve these crashes (and maybe a > > few others) before landing. > > I'm only seeing one crash in WK2 in all the patches? The WK1 crash seems to > have affected main as well. But needless to say, the WK2 crash is bug > 224085; note that other fullscreen tests have been skipped as a result of it. > > In general: I'm concerned that changing the order that the shards are run > effects test results, as it implies little in the way of isolation between > the different shards, and any bad state in one shard can end up affecting > other shards. > > That said, some of the problem here is that we allocate all the shards to > workers ahead-of-time, instead of (e.g.) reading from a Queue or having any > sort of work-stealing, hence we can still end up with imbalances in worker > execution time even after this patch. EWS retries failed tests individually, the flaky failures in something like https://ews-build.webkit.org/#/builders/59/builds/12630 are still of concern. Also, the TaskPool code doesn't allocate the shards to the workers ahead of time, we are using a Queue that's shared between each process. You will always get some imbalances at the end of the test run, but this patch should minimize them.
Created attachment 459772 [details] Patch
Comment on attachment 459772 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=459772&action=review I don't know what your plans are here (and this isn't r? anyway), but… > Tools/Scripts/webkitpy/layout_tests/controllers/layout_test_runner.py:568 > - shard = TestShard(directory, test_inputs) > - shards.append(shard) > + while len(test_inputs) > MAX_SHARD_SIZE: > + shards.append(TestShard(directory, test_inputs[:MAX_SHARD_SIZE])) > + test_inputs = test_inputs[MAX_SHARD_SIZE:] > + shards.append(TestShard(directory, test_inputs)) I'd rather we didn't change this in the same patch. That said, simple numeric splits have the risk of things moving between shards too often (i.e., adding a new test early on in a shard will change every split), but let's take this design discussion to a new/different bug?
Comment on attachment 459772 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=459772&action=review >> Tools/Scripts/webkitpy/layout_tests/controllers/layout_test_runner.py:568 >> + shards.append(TestShard(directory, test_inputs)) > > I'd rather we didn't change this in the same patch. That said, simple numeric splits have the risk of things moving between shards too often (i.e., adding a new test early on in a shard will change every split), but let's take this design discussion to a new/different bug? Didn't mark this as review because this was just posted for testing, no intention of landing this change as is.
Pull request: https://github.com/WebKit/WebKit/pull/7979