RESOLVED FIXED 98562
nrwt: [chromium] run http tests in parallel on bigger machines
https://bugs.webkit.org/show_bug.cgi?id=98562
Summary nrwt: [chromium] run http tests in parallel on bigger machines
Dirk Pranke
Reported 2012-10-05 15:21:22 PDT
nrwt: [chromium] run http tests in parallel on bigger machines
Attachments
Patch (8.39 KB, patch)
2012-10-05 15:31 PDT, Dirk Pranke
eric: review+
Dirk Pranke
Comment 1 2012-10-05 15:31:29 PDT
Dirk Pranke
Comment 2 2012-10-05 15:37:24 PDT
specifically, this should only affect the Mac 10.7 bot (which has 8 workers, so we'll run 2 http tests in parallel) and the Linux dbg bot (which runs 24 works, so 6 http tests in parallel -> the debug bot is a beast).
Dirk Pranke
Comment 3 2012-10-05 15:38:01 PDT
if we see any increased flakiness on those bots, we can back the change out, otherwise we can look at adding more workers to the other debug bots.
Tony Chang
Comment 4 2012-10-05 15:47:01 PDT
A couple weeks ago, I was testing setting --max-locked-shards=2 on my Z620 and found that it made the perf tests flaky, since they run in the same shard. Do you see any flakiness in the perf tests on your machine?
Dirk Pranke
Comment 5 2012-10-05 15:58:21 PDT
I am running some tests to see. In reality I think we need to separate out the concepts of "test needs the lock" from "test should not be run while other tests are running", and we should probably strive to not have any tests fall in the latter category (although they might be necessary for some periods of time). Otherwise we'll always have bottlenecks in the cycle time.
Vincent Scheib
Comment 6 2012-10-05 16:21:26 PDT
Comment on attachment 167396 [details] Patch Dirk made me!
Vincent Scheib
Comment 7 2012-10-05 16:21:58 PDT
Comment on attachment 167396 [details] Patch Test complete.
Dirk Pranke
Comment 8 2012-10-05 16:23:25 PDT
(we were testing if a non-reviewer can r+ bugs ...)
Dirk Pranke
Comment 9 2012-10-05 16:37:09 PDT
So, ironically, on Linux on my z600 in a release build, I see flakiness in http and storage, but not perf (and I always get flakiness in http and storage). I don't see any flakiness on the Mac at all (again in Release).
Eric Seidel (no email)
Comment 10 2012-10-08 11:50:58 PDT
Comment on attachment 167396 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=167396&action=review LGTM. > Tools/Scripts/webkitpy/layout_tests/port/chromium.py:120 > + def default_max_locked_shards(self): > + """Return the number of "locked" shards to run in parallel (like the http tests).""" > + max_locked_shards = int(self.default_child_processes()) / 4 > + if not max_locked_shards: > + return 1 > + return max_locked_shards I assume the plan is to move this logic down once tested in the field?
Dirk Pranke
Comment 11 2012-10-08 11:53:25 PDT
(In reply to comment #10) > (From update of attachment 167396 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=167396&action=review > > LGTM. > > > Tools/Scripts/webkitpy/layout_tests/port/chromium.py:120 > > + def default_max_locked_shards(self): > > + """Return the number of "locked" shards to run in parallel (like the http tests).""" > > + max_locked_shards = int(self.default_child_processes()) / 4 > > + if not max_locked_shards: > > + return 1 > > + return max_locked_shards > > I assume the plan is to move this logic down once tested in the field? If by "down" you mean into base.py and "tested in the field" you mean "works stably on other ports", then yes :).
Dirk Pranke
Comment 12 2012-10-08 15:06:00 PDT
Ojan Vafai
Comment 13 2012-10-08 16:47:48 PDT
Looks like a couple perf tests started consistently failing after this. http://test-results.appspot.com/dashboards/flakiness_dashboard.html#tests=%5Eperf
Dirk Pranke
Comment 14 2012-10-08 17:46:50 PDT
(In reply to comment #13) > Looks like a couple perf tests started consistently failing after this. > > http://test-results.appspot.com/dashboards/flakiness_dashboard.html#tests=%5Eperf I'm not sure exactly what you're seeing here, so could you confirm you're seeing what I'm seeing? Namely, it looks like maybe perf/mouse-event started failing consistently on Linux (dbg), but it appears to be failing the same way that it failed on Linux and Linux 32, so maybe it should be marked as SLOW. In the other cases, it just looks to me like the tests have been flaky, period, and are mostly marked as PASS FAIL. So, I'm not sure why we're even running these?
Ojan Vafai
Comment 15 2012-10-08 18:18:41 PDT
(In reply to comment #14) > (In reply to comment #13) > > Looks like a couple perf tests started consistently failing after this. > > > > http://test-results.appspot.com/dashboards/flakiness_dashboard.html#tests=%5Eperf > > I'm not sure exactly what you're seeing here, so could you confirm you're seeing what I'm seeing? Namely, it looks like maybe perf/mouse-event started failing consistently on Linux (dbg), but it appears to be failing the same way that it failed on Linux and Linux 32, so maybe it should be marked as SLOW. They're not timing out, so marking them slow won't help. perf/mouse-event.html was consistently passing before this patch. The way these tests work is to run at different magnitudes and try to compare to see if they're constant, linear, polynomial, etc. So, if they don't have a whole core to themselves, they're likely to be more flaky. > In the other cases, it just looks to me like the tests have been flaky, period, and are mostly marked as PASS FAIL. > > So, I'm not sure why we're even running these? It's a little confusing because I recently committed a couple patches to reduce flakiness. So, you can see that some of them haven't been flaky for the last couple dozen runs (e.g. perf/typing-at-end-of-line.html). I've been waiting for it to stabilize before removing things from TestExpectations. In general, I'm not sure how I feel about the LayoutTests/perf tests. Some of them are never flaky. But, ~30% or so are always flaky. It does give a way to add regression tests when making order of magnitude performance improvements.
Dirk Pranke
Comment 16 2012-10-08 18:48:12 PDT
(In reply to comment #15) > (In reply to comment #14) > > (In reply to comment #13) > > > Looks like a couple perf tests started consistently failing after this. > > > > > > http://test-results.appspot.com/dashboards/flakiness_dashboard.html#tests=%5Eperf > > > > I'm not sure exactly what you're seeing here, so could you confirm you're seeing what I'm seeing? Namely, it looks like maybe perf/mouse-event started failing consistently on Linux (dbg), but it appears to be failing the same way that it failed on Linux and Linux 32, so maybe it should be marked as SLOW. > > They're not timing out, so marking them slow won't help. perf/mouse-event.html was consistently passing before this patch. Good point. Okay, so there weren't any other tests that started failing or became flakier? It only looked like the one to me. > The way these tests work is to run at different magnitudes and try to compare to see if they're constant, linear, polynomial, etc. So, if they don't have a whole core to themselves, they're likely to be more flaky. > Even without this change, assuming these tests had a dedicated core is a dangerous assumption. As I said in comment #5, I would rather approach this problem by splitting this out into "tests that are load-sensitive" vs. "tests that need the server lock" and try to solve the problem that way. It would not be hard to implement at least a coarse "run these tests by themselves" mechanism and I think it would be generally useful. Do you think we should revert this change in the mean time, or suppress the new failure in the meantime?
Dirk Pranke
Comment 17 2012-10-09 12:24:37 PDT
(In reply to comment #16) > (In reply to comment #15) > > (In reply to comment #14) > > > (In reply to comment #13) > > > > Looks like a couple perf tests started consistently failing after this. > > > > > > > > http://test-results.appspot.com/dashboards/flakiness_dashboard.html#tests=%5Eperf > > > > > > I'm not sure exactly what you're seeing here, so could you confirm you're seeing what I'm seeing? Namely, it looks like maybe perf/mouse-event started failing consistently on Linux (dbg), but it appears to be failing the same way that it failed on Linux and Linux 32, so maybe it should be marked as SLOW. > > > > They're not timing out, so marking them slow won't help. perf/mouse-event.html was consistently passing before this patch. As an aside, we normally skip the perf tests in debug, but this one was getting run due to an overriding expectation of flakiness on Release that wasn't properly scoped. I've fixed the expectation in http://trac.webkit.org/changeset/130793 , but that doesn't change the general nature of the problem.
Ojan Vafai
Comment 18 2012-10-09 12:26:36 PDT
I didn't notice that this was a debug bot. Nevermind. We can ignore it. I should probably update TestExpectations.
Dirk Pranke
Comment 19 2012-10-09 12:31:33 PDT
(In reply to comment #18) > I didn't notice that this was a debug bot. Nevermind. We can ignore it. I should probably update TestExpectations. I did already. See above :).
Note You need to log in before you can comment on or make changes to this bug.