Bug 228429 - [run-webkit-tests] Run largest shards first
Summary: [run-webkit-tests] Run largest shards first
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Jonathan Bedard
URL:
Keywords: InRadar
Depends on: 171615
Blocks:
  Show dependency treegraph
 
Reported: 2021-07-27 15:16 PDT by Jonathan Bedard
Modified: 2022-12-21 14:23 PST (History)
7 users (show)

See Also:


Attachments
Patch (1.39 KB, patch)
2021-07-27 15:20 PDT, Jonathan Bedard
no flags Details | Formatted Diff | Diff
Patch (3.75 KB, patch)
2021-07-27 16:21 PDT, Jonathan Bedard
no flags Details | Formatted Diff | Diff
Patch (3.74 KB, patch)
2021-07-28 19:08 PDT, Jonathan Bedard
no flags Details | Formatted Diff | Diff
Patch (2.00 KB, patch)
2022-05-25 15:58 PDT, Jonathan Bedard
ews-feeder: commit-queue-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Jonathan Bedard 2021-07-27 15:16:16 PDT
Alexey pointed this out a few weeks ago, and it's particularly obvious with webgl tests. Essentially, because we run tests in alphabetical order, any given invocation of run-webkit-tests will be inefficiently running towards the end, because there will be a larger shard running on a single process. The solution to this is to run the largest shards first. The potential problem with this is that we're sacrificing predictable test order, which may have an effect on flakiness. I think this trade-off will be worth it.
Comment 1 Radar WebKit Bug Importer 2021-07-27 15:16:46 PDT
<rdar://problem/81184667>
Comment 2 Jonathan Bedard 2021-07-27 15:20:56 PDT
Created attachment 434344 [details]
Patch
Comment 3 Jonathan Bedard 2021-07-27 16:21:54 PDT
Created attachment 434382 [details]
Patch
Comment 4 Jonathan Bedard 2021-07-27 16:56:05 PDT
Let's wait for EWS before officially landing this, since we know it could effect flakiness, this will at least help avoid anything obvious.
Comment 5 Alexey Proskuryakov 2021-07-27 17:44:49 PDT
It looks like build 12531 (for this patch) was slower than others on this bot? So not sure if this is a progression or a regression. Perhaps test run time has a huge variance. I didn't look at the logs to dig deeply.

https://ews-build.webkit.org/#/workers/56
Comment 6 Jonathan Bedard 2021-07-28 08:05:33 PDT
The more concerning thing is that we surfaced a debug crash by changing test ordering.

That being said, looking through our test runs, it looks to be a 10-15% improvement in most cases, layout tests just have so much variance it's hard to tell.
Comment 7 Jonathan Bedard 2021-07-28 17:43:26 PDT
There aren't any easy answers to this, the two crashes are pretty difficult to reproduce, but EWS is able to reliably do so, and it's a good bet infrastructure will as well. We need to resolve these crashes (and maybe a few others) before landing.
Comment 8 Jonathan Bedard 2021-07-28 19:08:06 PDT
Created attachment 434490 [details]
Patch
Comment 9 Sam Sneddon [:gsnedders] 2021-07-30 06:03:09 PDT
(In reply to Jonathan Bedard from comment #7)
> There aren't any easy answers to this, the two crashes are pretty difficult
> to reproduce, but EWS is able to reliably do so, and it's a good bet
> infrastructure will as well. We need to resolve these crashes (and maybe a
> few others) before landing.

I'm only seeing one crash in WK2 in all the patches? The WK1 crash seems to have affected main as well. But needless to say, the WK2 crash is bug 224085; note that other fullscreen tests have been skipped as a result of it.

In general: I'm concerned that changing the order that the shards are run effects test results, as it implies little in the way of isolation between the different shards, and any bad state in one shard can end up affecting other shards.

That said, some of the problem here is that we allocate all the shards to workers ahead-of-time, instead of (e.g.) reading from a Queue or having any sort of work-stealing, hence we can still end up with imbalances in worker execution time even after this patch.
Comment 10 Jonathan Bedard 2021-08-05 15:48:33 PDT
(In reply to Sam Sneddon [:gsnedders] from comment #9)
> (In reply to Jonathan Bedard from comment #7)
> > There aren't any easy answers to this, the two crashes are pretty difficult
> > to reproduce, but EWS is able to reliably do so, and it's a good bet
> > infrastructure will as well. We need to resolve these crashes (and maybe a
> > few others) before landing.
> 
> I'm only seeing one crash in WK2 in all the patches? The WK1 crash seems to
> have affected main as well. But needless to say, the WK2 crash is bug
> 224085; note that other fullscreen tests have been skipped as a result of it.
> 
> In general: I'm concerned that changing the order that the shards are run
> effects test results, as it implies little in the way of isolation between
> the different shards, and any bad state in one shard can end up affecting
> other shards.
> 
> That said, some of the problem here is that we allocate all the shards to
> workers ahead-of-time, instead of (e.g.) reading from a Queue or having any
> sort of work-stealing, hence we can still end up with imbalances in worker
> execution time even after this patch.

EWS retries failed tests individually, the flaky failures in something like https://ews-build.webkit.org/#/builders/59/builds/12630 are still of concern.

Also, the TaskPool code doesn't allocate the shards to the workers ahead of time, we are using a Queue that's shared between each process. You will always get some imbalances at the end of the test run, but this patch should minimize them.
Comment 11 Jonathan Bedard 2022-05-25 15:58:06 PDT
Created attachment 459772 [details]
Patch
Comment 12 Sam Sneddon [:gsnedders] 2022-05-25 22:30:52 PDT
Comment on attachment 459772 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=459772&action=review

I don't know what your plans are here (and this isn't r? anyway), but…

> Tools/Scripts/webkitpy/layout_tests/controllers/layout_test_runner.py:568
> -            shard = TestShard(directory, test_inputs)
> -            shards.append(shard)
> +            while len(test_inputs) > MAX_SHARD_SIZE:
> +                shards.append(TestShard(directory, test_inputs[:MAX_SHARD_SIZE]))
> +                test_inputs = test_inputs[MAX_SHARD_SIZE:]
> +            shards.append(TestShard(directory, test_inputs))

I'd rather we didn't change this in the same patch. That said, simple numeric splits have the risk of things moving between shards too often (i.e., adding a new test early on in a shard will change every split), but let's take this design discussion to a new/different bug?
Comment 13 Jonathan Bedard 2022-05-27 15:08:49 PDT
Comment on attachment 459772 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=459772&action=review

>> Tools/Scripts/webkitpy/layout_tests/controllers/layout_test_runner.py:568
>> +            shards.append(TestShard(directory, test_inputs))
> 
> I'd rather we didn't change this in the same patch. That said, simple numeric splits have the risk of things moving between shards too often (i.e., adding a new test early on in a shard will change every split), but let's take this design discussion to a new/different bug?

Didn't mark this as review because this was just posted for testing, no intention of landing this change as is.
Comment 14 Jonathan Bedard 2022-12-21 14:23:19 PST
Pull request: https://github.com/WebKit/WebKit/pull/7979