RESOLVED FIXED 63116
nrwt: allow for multiple http shards
https://bugs.webkit.org/show_bug.cgi?id=63116
Summary nrwt: allow for multiple http shards
Dirk Pranke
Reported 2011-06-21 20:52:44 PDT
nrwt: allow for multiple http shards
Attachments
checkpoint work; needs more testing (8.48 KB, patch)
2011-06-21 20:55 PDT, Dirk Pranke
no flags
clean up logic, add tests (16.74 KB, patch)
2011-06-22 20:04 PDT, Dirk Pranke
no flags
update w/ review feedback from tony (19.12 KB, patch)
2011-06-24 16:31 PDT, Dirk Pranke
no flags
try to make _resize_shards clearer (19.30 KB, patch)
2011-06-24 19:31 PDT, Dirk Pranke
no flags
update w/ more feedback from tony (19.28 KB, patch)
2011-06-27 15:35 PDT, Dirk Pranke
no flags
patch for landing after rebasing to HEAD (19.96 KB, patch)
2011-06-30 19:55 PDT, Dirk Pranke
no flags
Dirk Pranke
Comment 1 2011-06-21 20:55:36 PDT
Created attachment 98112 [details] checkpoint work; needs more testing
Dirk Pranke
Comment 2 2011-06-21 21:10:49 PDT
FYI, the final version of the new sharding logic will probably look something like this, although this needs more tests and I think I can clean it up more. The basic idea is to limit the # of concurrent HTTP tests to something sane like max(# cpus/4, 1) so that we don't hammer the server too much and increase flakiness. I think even splitting the HTTP tests up into 4 chunks should be enough to make them not on the critical path, but I haven't tested the perf improvements yet.
Tony Chang
Comment 3 2011-06-22 10:39:45 PDT
Are both lighttpd and apache2 configured to handle a reasonable number of concurrent connections (maybe 3 times the number of DRT processes)? Are there any tests that try to set/get cookies that might start failing?
Dirk Pranke
Comment 4 2011-06-22 10:56:49 PDT
(In reply to comment #3) > Are both lighttpd and apache2 configured to handle a reasonable number of concurrent connections (maybe 3 times the number of DRT processes)? Are there any tests that try to set/get cookies that might start failing? Good questions. I don't yet know the answers. Presumably the answer to the cookie question is yes :). However, given that we've had --experimental-fully-parallel forever and I've not noticed any real problems with http specifically, I think we're probably fine, but definitely more testing is needed before we turn this on by default.
Eric Seidel (no email)
Comment 5 2011-06-22 10:58:58 PDT
There are php-based cookie tests which are known to be flaky in our current setup. I could forsee them failing in a sharded environment. Unclear. See https://bugs.webkit.org/show_bug.cgi?id=51613 for one example.
Dirk Pranke
Comment 6 2011-06-22 20:04:50 PDT
Created attachment 98302 [details] clean up logic, add tests
Tony Chang
Comment 7 2011-06-24 15:16:26 PDT
Comment on attachment 98302 [details] clean up logic, add tests View in context: https://bugs.webkit.org/attachment.cgi?id=98302&action=review > Tools/ChangeLog:20 > + nrwt: make sharding tests needing locks less hard-coded > + https://bugs.webkit.org/show_bug.cgi?id=63112 Is this ChangeLog entry supposed to be here? > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:543 > Return: > Two lists of lists of TestInput objects. The first list should > only be run under the server lock, the second can be run whenever. This patch might be the right time to make this return type into a small class. > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:629 > + def _resize_shards(self, shards, max_shards): What is |shards|? > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:633 > + i = 1 > + j = 1 Please use more descriptive variable names. It looks like i is current directory count and j is number of shards. > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:637 > + tests.extend(shard[1]) It's not clear to me what shard[1]. I assume it's part of a tuple (so it's always valid), but I can't tell. > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:640 > + new_shards.append(('locked_shard_%d' % j, tests)) Can we compute j based on the size of new_shards?
Ojan Vafai
Comment 8 2011-06-24 15:28:13 PDT
View in context: https://bugs.webkit.org/attachment.cgi?id=98302&action=review > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:554 > + if num_workers == 1: > + return self._shard_in_two(test_files) > + elif fully_parallel: > + return self._shard_every_file(test_files) > + else: > + return self._shard_by_directory(test_files, num_workers) No need for the else's since you're early returning as per http://www.webkit.org/coding/coding-style.html. > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:630 > + dirs_per_new_shard = math.ceil(len(shards) / max_shards * 1.0) Why the "* 1.0"? > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:636 > + while shards: > + shard = shards.pop() Can you just do "for shard in shards:"? > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:640 > + new_shards.append(('locked_shard_%d' % j, tests)) Instead of manually keeping track of "j", can you just use len(new_shards)? > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:645 > + if tests: > + new_shards.append(('locked_shard_%d' % j, tests)) I think you could simplify this to not need this last case if you did something like: for shard in shards: if not tests: new_shards.append(...) tests.extend(...) i += 1 if i == dirs_per_new_shard: tests = []
Dirk Pranke
Comment 9 2011-06-24 16:26:08 PDT
Comment on attachment 98302 [details] clean up logic, add tests View in context: https://bugs.webkit.org/attachment.cgi?id=98302&action=review >> Tools/ChangeLog:20 >> + https://bugs.webkit.org/show_bug.cgi?id=63112 > > Is this ChangeLog entry supposed to be here? No. I think this maybe crept in in some merge/rebasing effort. I'll remove it. >> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:543 >> only be run under the server lock, the second can be run whenever. > > This patch might be the right time to make this return type into a small class. I started to do this as part of revising the patch per your feedback, but it turns out that this just makes things more complicated: you end up creating a class that holds two fields, each of which is a list, and really all you want to do is extract each list and then add them together, the calling code gets much worse. However, I think partially this code looks complex because each shard itself is an anonymous data structure. I've changed that so that there is now a TestShard class (a named list of TestInputs) and I think that ends up making things clearer. Let me know what you think. >> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:629 >> + def _resize_shards(self, shards, max_shards): > > What is |shards|? I will rewrite everything to be clearer. >> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:633 >> + j = 1 > > Please use more descriptive variable names. It looks like i is current directory count and j is number of shards. Will do. >> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:637 >> + tests.extend(shard[1]) > > It's not clear to me what shard[1]. I assume it's part of a tuple (so it's always valid), but I can't tell. This should become clearer. >> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:640 >> + new_shards.append(('locked_shard_%d' % j, tests)) > > Can we compute j based on the size of new_shards? Yeah.
Dirk Pranke
Comment 10 2011-06-24 16:31:12 PDT
Created attachment 98560 [details] update w/ review feedback from tony
Dirk Pranke
Comment 11 2011-06-24 16:36:58 PDT
(In reply to comment #8) > View in context: https://bugs.webkit.org/attachment.cgi?id=98302&action=review > > > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:554 > > + if num_workers == 1: > > + return self._shard_in_two(test_files) > > + elif fully_parallel: > > + return self._shard_every_file(test_files) > > + else: > > + return self._shard_by_directory(test_files, num_workers) > > No need for the else's since you're early returning as per http://www.webkit.org/coding/coding-style.html. > True. > > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:630 > > + dirs_per_new_shard = math.ceil(len(shards) / max_shards * 1.0) > > Why the "* 1.0"? > Otherwise you get an integer division and it truncates down, making the ceil() call useless. > > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:636 > > + while shards: > > + shard = shards.pop() > > Can you just do "for shard in shards:"? > > > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:640 > > + new_shards.append(('locked_shard_%d' % j, tests)) > > Instead of manually keeping track of "j", can you just use len(new_shards)? > Yeah, I made these two changes in the new patch. > > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:645 > > + if tests: > > + new_shards.append(('locked_shard_%d' % j, tests)) > > I think you could simplify this to not need this last case if you did something like: > for shard in shards: > if not tests: > new_shards.append(...) > tests.extend(...) > i += 1 > if i == dirs_per_new_shard: > tests = [] Hm. I will ponder this one :)
Dirk Pranke
Comment 12 2011-06-24 19:31:58 PDT
Created attachment 98571 [details] try to make _resize_shards clearer
Tony Chang
Comment 13 2011-06-27 09:41:08 PDT
Comment on attachment 98571 [details] try to make _resize_shards clearer View in context: https://bugs.webkit.org/attachment.cgi?id=98571&action=review > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:241 > + # FIXME: make this class visible, used by workers as well. Nit: make -> Make > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:244 > + def __init__(self, name, test_inputs): > + self.name = name > + self.test_inputs = test_inputs Nit: Maybe document that test_inputs is a list of TestInputs? > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:610 > + locked_shards = [] > + unlocked_shards = [] Nit: I would move these lines between the for loop to so it's closer to where it's first used. > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:629 > + locked_shards.sort(key=lambda shard: shard.name) > + unlocked_shards.sort(key=lambda shard: shard.name) Should we implement __cmp__ for TestShard? > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:657 > + return int(math.ceil(numerator * 1.0 / divisor)) Nit: I think float(numerator) is more explicit than * 1.0. > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:759 > + # FIXME: change 'test_list' to 'shard', make sharding public Nit: change -> Change and public -> public.
Dirk Pranke
Comment 14 2011-06-27 15:33:06 PDT
(In reply to comment #13) > (From update of attachment 98571 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=98571&action=review > > > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:241 > > + # FIXME: make this class visible, used by workers as well. > > Nit: make -> Make > Done. > > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:244 > > + def __init__(self, name, test_inputs): > > + self.name = name > > + self.test_inputs = test_inputs > > Nit: Maybe document that test_inputs is a list of TestInputs? > There's a class-level docstring right above the __init__ that says that the object is a named list of TestInputs; between that and the variable/field names that seemed to cover things well enough, I thought. > > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:610 > > + locked_shards = [] > > + unlocked_shards = [] > > Nit: I would move these lines between the for loop to so it's closer to where it's first used. > > > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:629 > > + locked_shards.sort(key=lambda shard: shard.name) > > + unlocked_shards.sort(key=lambda shard: shard.name) > > Should we implement __cmp__ for TestShard? > I wouldn't just yet; I'm not sure __cmp__ makes a lot of sense as a generic operation, and there's no obvious need for it. > > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:657 > > + return int(math.ceil(numerator * 1.0 / divisor)) > > Nit: I think float(numerator) is more explicit than * 1.0. > Good call. Done > > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:759 > > + # FIXME: change 'test_list' to 'shard', make sharding public > > Nit: change -> Change and public -> public. Done.
Dirk Pranke
Comment 15 2011-06-27 15:35:54 PDT
Created attachment 98804 [details] update w/ more feedback from tony
Dirk Pranke
Comment 16 2011-06-30 19:55:43 PDT
Created attachment 99420 [details] patch for landing after rebasing to HEAD
Dirk Pranke
Comment 17 2011-07-08 14:12:54 PDT
Fixed in r90419.
Eric Seidel (no email)
Comment 18 2012-01-10 10:31:12 PST
IMO the right way to fix all of this http sharding is to use port redirection in the DRT instance, and to start the http server with a custom port. That's more work, but is a better long term solution than any of these locks or hacks we've added in the past. :)
Dirk Pranke
Comment 19 2012-01-10 11:28:52 PST
(In reply to comment #18) > IMO the right way to fix all of this http sharding is to use port redirection in the DRT instance, and to start the http server with a custom port. That's more work, but is a better long term solution than any of these locks or hacks we've added in the past. :) Eric, did you mean for this comment to be on bug 75958 instead? At any rate, I think you're conflating two things ... this feature is about being able to run multiple http tests in parallel in a single NRWT run; the reason we don't do that today is because we're concerned about overloading the http server and/or inter-test dependencies. I don't know if running on different http ports would make this better or worse, but it shouldn't be necessary, regardless. I think you're talking about running multiple http servers on different ports so that multiple separate nrwt runs can occur in parallel, right?
Note You need to log in before you can comment on or make changes to this bug.