Bug 98562 - nrwt: [chromium] run http tests in parallel on bigger machines
Summary: nrwt: [chromium] run http tests in parallel on bigger machines
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs
Version: 528+ (Nightly build)
Hardware: Unspecified
OS: Unspecified
Importance: P2 Normal
Assignee: Dirk Pranke
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-10-05 15:21 PDT by Dirk Pranke
Modified: 2012-10-09 12:31 PDT
7 users

See Also:


Attachments
Patch (8.39 KB, patch)
2012-10-05 15:31 PDT, Dirk Pranke
eric: review+

Description Dirk Pranke 2012-10-05 15:21:22 PDT
nrwt: [chromium] run http tests in parallel on bigger machines
Comment 1 Dirk Pranke 2012-10-05 15:31:29 PDT
Created attachment 167396 [details]
Patch
Comment 2 Dirk Pranke 2012-10-05 15:37:24 PDT
Specifically, this should only affect the Mac 10.7 bot (which has 8 workers, so we'll run 2 http tests in parallel) and the Linux dbg bot (which has 24 workers, so 6 http tests in parallel -> the debug bot is a beast).
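
(For reference, the sizing rule in the patch boils down to one locked shard per four workers, with a minimum of one. A minimal standalone sketch of that arithmetic — the real method lives on the port object and reads default_child_processes(), so the names here are illustrative:)

    def max_locked_shards(worker_count):
        # One "locked" shard (http tests, etc.) per four workers, minimum of one.
        return max(1, worker_count // 4)

    max_locked_shards(8)   # -> 2  (Mac 10.7 bot)
    max_locked_shards(24)  # -> 6  (Linux dbg bot)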
Comment 3 Dirk Pranke 2012-10-05 15:38:01 PDT
If we see any increased flakiness on those bots, we can back the change out; otherwise, we can look at adding more workers to the other debug bots.
Comment 4 Tony Chang 2012-10-05 15:47:01 PDT
A couple of weeks ago, I tried setting --max-locked-shards=2 on my Z620 and found that it made the perf tests flaky, since they run in the same shard.

Do you see any flakiness in the perf tests on your machine?
Comment 5 Dirk Pranke 2012-10-05 15:58:21 PDT
I am running some tests to see. 

In reality, I think we need to separate the concept of "test needs the lock" from "test should not be run while other tests are running", and we should strive not to have any tests fall into the latter category (although some might be necessary for periods of time). Otherwise we'll always have bottlenecks in the cycle time.
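
(One way that split could look — purely a sketch with hypothetical names, not anything that exists in webkitpy: make "needs the server lock" and "must run alone because it's load-sensitive" two separate properties that the scheduler can treat differently:)

    NEEDS_SERVER_LOCK = 'needs_server_lock'   # e.g. http/ and websocket/ tests
    RUN_IN_ISOLATION = 'run_in_isolation'     # e.g. load-sensitive perf/ tests

    def scheduling_constraint(test_name):
        # Hypothetical classifier; currently everything below is lumped under "locked".
        if test_name.startswith(('http/', 'websocket/')):
            return NEEDS_SERVER_LOCK
        if test_name.startswith('perf/'):
            return RUN_IN_ISOLATION
        return None

With that split, locked shards could still run a few at a time, while isolated shards would run one at a time with no other workers active.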
Comment 6 Vincent Scheib 2012-10-05 16:21:26 PDT
Comment on attachment 167396 [details]
Patch

Dirk made me!
Comment 7 Vincent Scheib 2012-10-05 16:21:58 PDT
Comment on attachment 167396 [details]
Patch

Test complete.
Comment 8 Dirk Pranke 2012-10-05 16:23:25 PDT
(we were testing if a non-reviewer can r+ bugs ...)
Comment 9 Dirk Pranke 2012-10-05 16:37:09 PDT
So, ironically, on Linux on my z600 in a release build, I see flakiness in http and storage, but not perf (and I always get flakiness in http and storage). I don't see any flakiness on the Mac at all (again in Release).
Comment 10 Eric Seidel (no email) 2012-10-08 11:50:58 PDT
Comment on attachment 167396 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=167396&action=review

LGTM.

> Tools/Scripts/webkitpy/layout_tests/port/chromium.py:120
> +    def default_max_locked_shards(self):
> +        """Return the number of "locked" shards to run in parallel (like the http tests)."""
> +        max_locked_shards = int(self.default_child_processes()) / 4
> +        if not max_locked_shards:
> +            return 1
> +        return max_locked_shards

I assume the plan is to move this logic down once tested in the field?
Comment 11 Dirk Pranke 2012-10-08 11:53:25 PDT
(In reply to comment #10)
> (From update of attachment 167396 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=167396&action=review
> 
> LGTM.
> 
> > Tools/Scripts/webkitpy/layout_tests/port/chromium.py:120
> > +    def default_max_locked_shards(self):
> > +        """Return the number of "locked" shards to run in parallel (like the http tests)."""
> > +        max_locked_shards = int(self.default_child_processes()) / 4
> > +        if not max_locked_shards:
> > +            return 1
> > +        return max_locked_shards
> 
> I assume the plan is to move this logic down once tested in the field?

If by "down" you mean into base.py and "tested in the field" you mean "works stably on other ports", then yes :).
Comment 12 Dirk Pranke 2012-10-08 15:06:00 PDT
Committed r130690: <http://trac.webkit.org/changeset/130690>
Comment 13 Ojan Vafai 2012-10-08 16:47:48 PDT
Looks like a couple perf tests started consistently failing after this.

http://test-results.appspot.com/dashboards/flakiness_dashboard.html#tests=%5Eperf
Comment 14 Dirk Pranke 2012-10-08 17:46:50 PDT
(In reply to comment #13)
> Looks like a couple perf tests started consistently failing after this.
> 
> http://test-results.appspot.com/dashboards/flakiness_dashboard.html#tests=%5Eperf

I'm not sure exactly what you're seeing here, so could you confirm you're seeing what I'm seeing? Namely, it looks like maybe perf/mouse-event started failing consistently on Linux (dbg), but it appears to be failing the same way that it failed on Linux and Linux 32, so maybe it should be marked as SLOW.

In the other cases, it just looks to me like the tests have been flaky, period, and are mostly marked as PASS FAIL. 

So, I'm not sure why we're even running these?
Comment 15 Ojan Vafai 2012-10-08 18:18:41 PDT
(In reply to comment #14)
> (In reply to comment #13)
> > Looks like a couple perf tests started consistently failing after this.
> > 
> > http://test-results.appspot.com/dashboards/flakiness_dashboard.html#tests=%5Eperf
> 
> I'm not sure exactly what you're seeing here, so could you confirm you're seeing what I'm seeing? Namely, it looks like maybe perf/mouse-event started failing consistently on Linux (dbg), but it appears to be failing the same way that it failed on Linux and Linux 32, so maybe it should be marked as SLOW.

They're not timing out, so marking them slow won't help. perf/mouse-event.html was consistently passing before this patch. The way these tests work is to run the operation at different magnitudes and compare the timings to see whether the growth is constant, linear, polynomial, etc. So, if they don't have a whole core to themselves, they're likely to be more flaky.
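
(To illustrate — a toy sketch of the idea, not the actual LayoutTests/perf harness: time the operation at increasing magnitudes and compare how the measurements grow, which is exactly the comparison that gets noisy when the test has to share a core:)

    import time

    def measure(run_operation, magnitude):
        # Time a single run of the operation at the given input size.
        start = time.time()
        run_operation(magnitude)
        return time.time() - start

    def looks_linear(run_operation, magnitudes=(1000, 2000, 4000)):
        # If doubling the magnitude roughly doubles the time, call it linear.
        times = [measure(run_operation, m) for m in magnitudes]
        ratios = [t2 / max(t1, 1e-9) for t1, t2 in zip(times, times[1:])]
        return all(1.5 < r < 2.5 for r in ratios)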

> In the other cases, it just looks to me like the tests have been flaky, period, and are mostly marked as PASS FAIL. 
> 
> So, I'm not sure why we're even running these?

It's a little confusing because I recently committed a couple patches to reduce flakiness. So, you can see that some of them haven't been flaky for the last couple dozen runs (e.g. perf/typing-at-end-of-line.html). I've been waiting for it to stabilize before removing things from TestExpectations.

In general, I'm not sure how I feel about the LayoutTests/perf tests. Some of them are never flaky, but ~30% or so are always flaky. They do give us a way to add regression tests when making order-of-magnitude performance improvements.
Comment 16 Dirk Pranke 2012-10-08 18:48:12 PDT
(In reply to comment #15)
> (In reply to comment #14)
> > (In reply to comment #13)
> > > Looks like a couple perf tests started consistently failing after this.
> > > 
> > > http://test-results.appspot.com/dashboards/flakiness_dashboard.html#tests=%5Eperf
> > 
> > I'm not sure exactly what you're seeing here, so could you confirm you're seeing what I'm seeing? Namely, it looks like maybe perf/mouse-event started failing consistently on Linux (dbg), but it appears to be failing the same way that it failed on Linux and Linux 32, so maybe it should be marked as SLOW.
> 
> They're not timing out, so marking them slow won't help. perf/mouse-event.html was consistently passing before this patch. 

Good point. Okay, so there weren't any other tests that started failing or became flakier? It only looked like the one to me.

> The way these tests work is to run at different magnitudes and try to compare to see if they're constant, linear, polynomial, etc. So, if they don't have a whole core to themselves, they're likely to be more flaky.
> 

Even without this change, assuming these tests have a dedicated core is dangerous. As I said in comment #5, I would rather approach this problem by splitting tests into "tests that are load-sensitive" vs. "tests that need the server lock" and try to solve the problem that way. It would not be hard to implement at least a coarse "run these tests by themselves" mechanism, and I think it would be generally useful.
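
(A coarse version might not need much machinery — e.g., something along these lines, purely a sketch with made-up names: partition out the load-sensitive tests and run them serially after the parallel pass, so they never share the machine with other workers:)

    def run_all(tests, run_in_parallel, run_serially, is_load_sensitive):
        # Hypothetical driver: full parallelism for normal tests, then the
        # load-sensitive ones one at a time with no other workers active.
        isolated = [t for t in tests if is_load_sensitive(t)]
        shared = [t for t in tests if not is_load_sensitive(t)]
        results = run_in_parallel(shared)
        results.update(run_serially(isolated))
        return results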

Do you think we should revert this change, or suppress the new failure in the meantime?
Comment 17 Dirk Pranke 2012-10-09 12:24:37 PDT
(In reply to comment #16)
> (In reply to comment #15)
> > (In reply to comment #14)
> > > (In reply to comment #13)
> > > > Looks like a couple perf tests started consistently failing after this.
> > > > 
> > > > http://test-results.appspot.com/dashboards/flakiness_dashboard.html#tests=%5Eperf
> > > 
> > > I'm not sure exactly what you're seeing here, so could you confirm you're seeing what I'm seeing? Namely, it looks like maybe perf/mouse-event started failing consistently on Linux (dbg), but it appears to be failing the same way that it failed on Linux and Linux 32, so maybe it should be marked as SLOW.
> > 
> > They're not timing out, so marking them slow won't help. perf/mouse-event.html was consistently passing before this patch. 

As an aside, we normally skip the perf tests in debug, but this one was getting run due to an overriding expectation of flakiness on Release that wasn't properly scoped. 

I've fixed the expectation in http://trac.webkit.org/changeset/130793 , but that doesn't change the general nature of the problem.
Comment 18 Ojan Vafai 2012-10-09 12:26:36 PDT
I didn't notice that this was a debug bot. Never mind; we can ignore it. I should probably update TestExpectations.
Comment 19 Dirk Pranke 2012-10-09 12:31:33 PDT
(In reply to comment #18)
> I didn't notice that this was a debug bot. Nevermind. We can ignore it. I should probably update TestExpectations.

I did already. See above :).