nrwt: allow for multiple http shards
Created attachment 98112 [details] checkpoint work; needs more testing
FYI, the final version of the new sharding logic will probably look something like this, although it needs more tests and I think I can clean it up more. The basic idea is to limit the number of concurrent HTTP tests to something sane like max(# cpus / 4, 1) so that we don't hammer the server too much and increase flakiness. I think even splitting the HTTP tests into 4 chunks should be enough to keep them off the critical path, but I haven't measured the perf improvements yet.
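For illustration, the heuristic described above can be sketched as a small helper. This is a hypothetical sketch, not code from the patch; the function name `max_locked_shards` and the `num_workers` parameter are my own.

```python
import multiprocessing

def max_locked_shards(num_workers=None):
    """Cap the number of concurrent HTTP ("locked") shards.

    Hypothetical sketch of the max(# cpus / 4, 1) heuristic described
    above, so that parallel DRT workers don't hammer the shared web
    server and increase flakiness.
    """
    cpus = num_workers if num_workers is not None else multiprocessing.cpu_count()
    return max(cpus // 4, 1)

print(max_locked_shards(8))  # -> 2
print(max_locked_shards(1))  # -> 1
```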
Are both lighttpd and apache2 configured to handle a reasonable number of concurrent connections (maybe 3 times the number of DRT processes)? Are there any tests that try to set/get cookies that might start failing?
(In reply to comment #3)
> Are both lighttpd and apache2 configured to handle a reasonable number of concurrent connections (maybe 3 times the number of DRT processes)? Are there any tests that try to set/get cookies that might start failing?

Good questions. I don't yet know the answers. Presumably the answer to the cookie question is yes :). However, given that we've had --experimental-fully-parallel forever and I've not noticed any real problems with http specifically, I think we're probably fine, but definitely more testing is needed before we turn this on by default.
There are php-based cookie tests which are known to be flaky in our current setup. I could foresee them failing in a sharded environment. Unclear. See https://bugs.webkit.org/show_bug.cgi?id=51613 for one example.
Created attachment 98302 [details] clean up logic, add tests
Comment on attachment 98302 [details]
clean up logic, add tests

View in context: https://bugs.webkit.org/attachment.cgi?id=98302&action=review

> Tools/ChangeLog:20
> +        nrwt: make sharding tests needing locks less hard-coded
> +        https://bugs.webkit.org/show_bug.cgi?id=63112

Is this ChangeLog entry supposed to be here?

> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:543
>         Return:
>             Two lists of lists of TestInput objects. The first list should
>             only be run under the server lock, the second can be run whenever.

This patch might be the right time to make this return type into a small class.

> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:629
> +    def _resize_shards(self, shards, max_shards):

What is |shards|?

> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:633
> +        i = 1
> +        j = 1

Please use more descriptive variable names. It looks like i is the current directory count and j is the number of shards.

> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:637
> +                tests.extend(shard[1])

It's not clear to me what shard[1] is. I assume it's part of a tuple (so it's always valid), but I can't tell.

> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:640
> +                new_shards.append(('locked_shard_%d' % j, tests))

Can we compute j based on the size of new_shards?
View in context: https://bugs.webkit.org/attachment.cgi?id=98302&action=review

> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:554
> +        if num_workers == 1:
> +            return self._shard_in_two(test_files)
> +        elif fully_parallel:
> +            return self._shard_every_file(test_files)
> +        else:
> +            return self._shard_by_directory(test_files, num_workers)

No need for the else's since you're early returning, as per http://www.webkit.org/coding/coding-style.html.

> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:630
> +        dirs_per_new_shard = math.ceil(len(shards) / max_shards * 1.0)

Why the "* 1.0"?

> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:636
> +        while shards:
> +            shard = shards.pop()

Can you just do "for shard in shards:"?

> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:640
> +                new_shards.append(('locked_shard_%d' % j, tests))

Instead of manually keeping track of "j", can you just use len(new_shards)?

> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:645
> +        if tests:
> +            new_shards.append(('locked_shard_%d' % j, tests))

I think you could simplify this to not need this last case if you did something like:

    for shard in shards:
        if not tests:
            new_shards.append(...)
        tests.extend(...)
        i += 1
        if i == dirs_per_new_shard:
            tests = []
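Putting the reviewer's suggestions together, a standalone sketch of what _resize_shards is trying to do might look like the following. The (name, tests) tuple shape and the names here are assumptions based on the quoted snippets, not the actual patch.

```python
import math

def resize_shards(old_shards, max_new_shards):
    """Merge per-directory (name, tests) shards into at most
    max_new_shards larger shards.

    Uses len(new_shards) instead of a manually tracked counter,
    as suggested in the review.
    """
    # float() is needed under Python 2 so the division doesn't truncate.
    dirs_per_new_shard = int(math.ceil(len(old_shards) / float(max_new_shards)))
    new_shards = []
    tests = []
    for dir_count, (_, shard_tests) in enumerate(old_shards, start=1):
        tests.extend(shard_tests)
        if dir_count % dirs_per_new_shard == 0:
            new_shards.append(('locked_shard_%d' % (len(new_shards) + 1), tests))
            tests = []
    if tests:  # pick up any leftover directories
        new_shards.append(('locked_shard_%d' % (len(new_shards) + 1), tests))
    return new_shards
```

For example, merging five single-directory shards with max_new_shards=2 groups three directories into the first shard and the remaining two into the second.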
Comment on attachment 98302 [details]
clean up logic, add tests

View in context: https://bugs.webkit.org/attachment.cgi?id=98302&action=review

>> Tools/ChangeLog:20
>> +        https://bugs.webkit.org/show_bug.cgi?id=63112
>
> Is this ChangeLog entry supposed to be here?

No. I think this may have crept in during some merge/rebasing effort. I'll remove it.

>> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:543
>>             only be run under the server lock, the second can be run whenever.
>
> This patch might be the right time to make this return type into a small class.

I started to do this as part of revising the patch per your feedback, but it turns out that it just makes things more complicated: you end up creating a class that holds two fields, each of which is a list, and all you really want to do is extract each list and add them together, so the calling code gets much worse. However, I think part of why this code looks complex is that each shard itself is an anonymous data structure. I've changed that so that there is now a TestShard class (a named list of TestInputs), and I think that ends up making things clearer. Let me know what you think.

>> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:629
>> +    def _resize_shards(self, shards, max_shards):
>
> What is |shards|?

I will rewrite everything to be clearer.

>> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:633
>> +        j = 1
>
> Please use more descriptive variable names. It looks like i is the current directory count and j is the number of shards.

Will do.

>> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:637
>> +                tests.extend(shard[1])
>
> It's not clear to me what shard[1] is. I assume it's part of a tuple (so it's always valid), but I can't tell.

This should become clearer.

>> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:640
>> +                new_shards.append(('locked_shard_%d' % j, tests))
>
> Can we compute j based on the size of new_shards?

Yeah.
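Based on the __init__ quoted in this review, the TestShard wrapper is presumably something like the minimal sketch below; the docstring wording and the __repr__ are my additions for illustration, not from the patch.

```python
class TestShard(object):
    """A named list of TestInputs.

    Replaces the anonymous (name, tests) tuple so that code like
    shard.test_inputs is self-describing instead of shard[1].
    """

    def __init__(self, name, test_inputs):
        self.name = name
        self.test_inputs = test_inputs  # a list of TestInput objects

    def __repr__(self):
        return 'TestShard(%r, %d tests)' % (self.name, len(self.test_inputs))
```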
Created attachment 98560 [details] update w/ review feedback from tony
(In reply to comment #8)
> View in context: https://bugs.webkit.org/attachment.cgi?id=98302&action=review
>
> > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:554
> > +        if num_workers == 1:
> > +            return self._shard_in_two(test_files)
> > +        elif fully_parallel:
> > +            return self._shard_every_file(test_files)
> > +        else:
> > +            return self._shard_by_directory(test_files, num_workers)
>
> No need for the else's since you're early returning as per http://www.webkit.org/coding/coding-style.html.

True.

> > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:630
> > +        dirs_per_new_shard = math.ceil(len(shards) / max_shards * 1.0)
>
> Why the "* 1.0"?

Otherwise you get an integer division and it truncates down, making the ceil() call useless.

> > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:636
> > +        while shards:
> > +            shard = shards.pop()
>
> Can you just do "for shard in shards:"?
>
> > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:640
> > +                new_shards.append(('locked_shard_%d' % j, tests))
>
> Instead of manually keeping track of "j", can you just use len(new_shards)?

Yeah, I made these two changes in the new patch.

> > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:645
> > +        if tests:
> > +            new_shards.append(('locked_shard_%d' % j, tests))
>
> I think you could simplify this to not need this last case if you did something like:
>     for shard in shards:
>         if not tests:
>             new_shards.append(...)
>         tests.extend(...)
>         i += 1
>         if i == dirs_per_new_shard:
>             tests = []

Hm. I will ponder this one :)
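To make the integer-division point concrete: under Python 2 (which NRWT ran on), / between two ints truncates, so the coercion to float has to happen before the division or the fraction is lost. A minimal demonstration, using // to emulate Python 2's int division:

```python
import math

# Truncating division rounds down before ceil() ever sees the fraction:
assert int(math.ceil(7 // 3)) == 2        # ceil() is useless here

# Coercing to float first (the "* 1.0", later float(numerator)) keeps it:
assert int(math.ceil(float(7) / 3)) == 3  # the intended "round up"
```

Note that placement matters: in an expression like len(shards) / max_shards * 1.0, the division evaluates first, so under Python 2 the truncation has already happened by the time the * 1.0 applies; writing numerator * 1.0 / divisor (as the revised patch does) coerces before dividing.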
Created attachment 98571 [details] try to make _resize_shards clearer
Comment on attachment 98571 [details]
try to make _resize_shards clearer

View in context: https://bugs.webkit.org/attachment.cgi?id=98571&action=review

> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:241
> +    # FIXME: make this class visible, used by workers as well.

Nit: make -> Make

> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:244
> +    def __init__(self, name, test_inputs):
> +        self.name = name
> +        self.test_inputs = test_inputs

Nit: Maybe document that test_inputs is a list of TestInputs?

> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:610
> +        locked_shards = []
> +        unlocked_shards = []

Nit: I would move these lines to just before the for loop so they're closer to where they're first used.

> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:629
> +        locked_shards.sort(key=lambda shard: shard.name)
> +        unlocked_shards.sort(key=lambda shard: shard.name)

Should we implement __cmp__ for TestShard?

> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:657
> +        return int(math.ceil(numerator * 1.0 / divisor))

Nit: I think float(numerator) is more explicit than * 1.0.

> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:759
> +        # FIXME: change 'test_list' to 'shard', make sharding public

Nit: change -> Change and public -> public.
(In reply to comment #13)
> (From update of attachment 98571 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=98571&action=review
>
> > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:241
> > +    # FIXME: make this class visible, used by workers as well.
>
> Nit: make -> Make

Done.

> > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:244
> > +    def __init__(self, name, test_inputs):
> > +        self.name = name
> > +        self.test_inputs = test_inputs
>
> Nit: Maybe document that test_inputs is a list of TestInputs?

There's a class-level docstring right above __init__ that says the object is a named list of TestInputs; between that and the variable/field names, I thought that covered things well enough.

> > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:610
> > +        locked_shards = []
> > +        unlocked_shards = []
>
> Nit: I would move these lines to just before the for loop so they're closer to where they're first used.
>
> > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:629
> > +        locked_shards.sort(key=lambda shard: shard.name)
> > +        unlocked_shards.sort(key=lambda shard: shard.name)
>
> Should we implement __cmp__ for TestShard?

I wouldn't just yet; I'm not sure __cmp__ makes a lot of sense as a generic operation, and there's no obvious need for it.

> > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:657
> > +        return int(math.ceil(numerator * 1.0 / divisor))
>
> Nit: I think float(numerator) is more explicit than * 1.0.

Good call. Done.

> > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:759
> > +        # FIXME: change 'test_list' to 'shard', make sharding public
>
> Nit: change -> Change and public -> public.

Done.
Created attachment 98804 [details] update w/ more feedback from tony
Created attachment 99420 [details] patch for landing after rebasing to HEAD
Fixed in r90419.
IMO the right way to fix all of this http sharding is to use port redirection in the DRT instance, and to start the http server with a custom port. That's more work, but is a better long term solution than any of these locks or hacks we've added in the past. :)
(In reply to comment #18) > IMO the right way to fix all of this http sharding is to use port redirection in the DRT instance, and to start the http server with a custom port. That's more work, but is a better long term solution than any of these locks or hacks we've added in the past. :) Eric, did you mean for this comment to be on bug 75958 instead? At any rate, I think you're conflating two things ... this feature is about being able to run multiple http tests in parallel in a single NRWT run; the reason we don't do that today is because we're concerned about overloading the http server and/or inter-test dependencies. I don't know if running on different http ports would make this better or worse, but it shouldn't be necessary, regardless. I think you're talking about running multiple http servers on different ports so that multiple separate nrwt runs can occur in parallel, right?