228429 – [run-webkit-tests] Run largest shards first

NEW 228429

[run-webkit-tests] Run largest shards first

https://bugs.webkit.org/show_bug.cgi?id=228429

Summary [run-webkit-tests] Run largest shards first

Jonathan Bedard

Reported 2021-07-27 15:16:16 PDT

Alexey pointed this out a few weeks ago, and it's particularly obvious with webgl tests. Essentially, because we run tests in alphabetical order, any given invocation of run-webkit-tests will be inefficiently running towards the end, because there will be a larger shard running on a single process. The solution to this is to run the largest shards first. The potential problem with this is that we're sacrificing predictable test order, which may have an effect on flakiness. I think this trade-off will be worth it.

Attachments
Patch (1.39 KB, patch) 2021-07-27 15:20 PDT, Jonathan Bedard	no flags	Details Formatted Diff Diff
Patch (3.75 KB, patch) 2021-07-27 16:21 PDT, Jonathan Bedard	no flags	Details Formatted Diff Diff
Patch (3.74 KB, patch) 2021-07-28 19:08 PDT, Jonathan Bedard	no flags	Details Formatted Diff Diff
Patch (2.00 KB, patch) 2022-05-25 15:58 PDT, Jonathan Bedard	ews-feeder: commit-queue-	Details Formatted Diff Diff
Show Obsolete (3) View All Add attachment proposed patch, testcase, etc.

Radar WebKit Bug Importer

Comment 1 2021-07-27 15:16:46 PDT

<rdar://problem/81184667>

Jonathan Bedard

Comment 2 2021-07-27 15:20:56 PDT

Created attachment 434344 [details] Patch

Jonathan Bedard

Comment 3 2021-07-27 16:21:54 PDT

Created attachment 434382 [details] Patch

Jonathan Bedard

Comment 4 2021-07-27 16:56:05 PDT

Let's wait for EWS before officially landing this, since we know it could effect flakiness, this will at least help avoid anything obvious.

Alexey Proskuryakov

Comment 5 2021-07-27 17:44:49 PDT

It looks like build 12531 (for this patch) was slower than others on this bot? So not sure if this is a progression or a regression. Perhaps test run time has a huge variance. I didn't look at the logs to dig deeply. https://ews-build.webkit.org/#/workers/56

Jonathan Bedard

Comment 6 2021-07-28 08:05:33 PDT

The more concerning thing is that we surfaced a debug crash by changing test ordering. That being said, looking through our test runs, it looks to be a 10-15% improvement in most cases, layout tests just have so much variance it's hard to tell.

Jonathan Bedard

Comment 7 2021-07-28 17:43:26 PDT

There aren't any easy answers to this, the two crashes are pretty difficult to reproduce, but EWS is able to reliably do so, and it's a good bet infrastructure will as well. We need to resolve these crashes (and maybe a few others) before landing.

Jonathan Bedard

Comment 8 2021-07-28 19:08:06 PDT

Created attachment 434490 [details] Patch

Sam Sneddon [:gsnedders]

Comment 9 2021-07-30 06:03:09 PDT

(In reply to Jonathan Bedard from comment #7) > There aren't any easy answers to this, the two crashes are pretty difficult > to reproduce, but EWS is able to reliably do so, and it's a good bet > infrastructure will as well. We need to resolve these crashes (and maybe a > few others) before landing. I'm only seeing one crash in WK2 in all the patches? The WK1 crash seems to have affected main as well. But needless to say, the WK2 crash is bug 224085; note that other fullscreen tests have been skipped as a result of it. In general: I'm concerned that changing the order that the shards are run effects test results, as it implies little in the way of isolation between the different shards, and any bad state in one shard can end up affecting other shards. That said, some of the problem here is that we allocate all the shards to workers ahead-of-time, instead of (e.g.) reading from a Queue or having any sort of work-stealing, hence we can still end up with imbalances in worker execution time even after this patch.

Jonathan Bedard

Comment 10 2021-08-05 15:48:33 PDT

(In reply to Sam Sneddon [:gsnedders] from comment #9) > (In reply to Jonathan Bedard from comment #7) > > There aren't any easy answers to this, the two crashes are pretty difficult > > to reproduce, but EWS is able to reliably do so, and it's a good bet > > infrastructure will as well. We need to resolve these crashes (and maybe a > > few others) before landing. > > I'm only seeing one crash in WK2 in all the patches? The WK1 crash seems to > have affected main as well. But needless to say, the WK2 crash is bug > 224085; note that other fullscreen tests have been skipped as a result of it. > > In general: I'm concerned that changing the order that the shards are run > effects test results, as it implies little in the way of isolation between > the different shards, and any bad state in one shard can end up affecting > other shards. > > That said, some of the problem here is that we allocate all the shards to > workers ahead-of-time, instead of (e.g.) reading from a Queue or having any > sort of work-stealing, hence we can still end up with imbalances in worker > execution time even after this patch. EWS retries failed tests individually, the flaky failures in something like https://ews-build.webkit.org/#/builders/59/builds/12630 are still of concern. Also, the TaskPool code doesn't allocate the shards to the workers ahead of time, we are using a Queue that's shared between each process. You will always get some imbalances at the end of the test run, but this patch should minimize them.

Jonathan Bedard

Comment 11 2022-05-25 15:58:06 PDT

Created attachment 459772 [details] Patch

Sam Sneddon [:gsnedders]

Comment 12 2022-05-25 22:30:52 PDT

Comment on attachment 459772 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=459772&action=review I don't know what your plans are here (and this isn't r? anyway), but… > Tools/Scripts/webkitpy/layout_tests/controllers/layout_test_runner.py:568 > - shard = TestShard(directory, test_inputs) > - shards.append(shard) > + while len(test_inputs) > MAX_SHARD_SIZE: > + shards.append(TestShard(directory, test_inputs[:MAX_SHARD_SIZE])) > + test_inputs = test_inputs[MAX_SHARD_SIZE:] > + shards.append(TestShard(directory, test_inputs)) I'd rather we didn't change this in the same patch. That said, simple numeric splits have the risk of things moving between shards too often (i.e., adding a new test early on in a shard will change every split), but let's take this design discussion to a new/different bug?

Jonathan Bedard

Comment 13 2022-05-27 15:08:49 PDT

Comment on attachment 459772 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=459772&action=review >> Tools/Scripts/webkitpy/layout_tests/controllers/layout_test_runner.py:568 >> + shards.append(TestShard(directory, test_inputs)) > > I'd rather we didn't change this in the same patch. That said, simple numeric splits have the risk of things moving between shards too often (i.e., adding a new test early on in a shard will change every split), but let's take this design discussion to a new/different bug? Didn't mark this as review because this was just posted for testing, no intention of landing this change as is.

Jonathan Bedard

Comment 14 2022-12-21 14:23:19 PST

Pull request: https://github.com/WebKit/WebKit/pull/7979

Note You need to log in before you can comment on or make changes to this bug.

Status NEW

Resolution

Priority P2

Severity Normal

Classification Unclassified

Version WebKit Nightly Build

Hardware Unspecified

OS Unspecified

Product WebKit

Component Tools / Tests

Assignee

Jonathan Bedard

Reported

2021-07-27 15:16 PDT

Modified

2022-12-21 14:23 PST History

CC List

7 users Show

URL

Keywords InRadar

Depends on

171615

Blocks

Dependencies

tree graph