nrwt: allow for multiple http shards
Created attachment 98112 [details] checkpoint work; needs more testing
FYI, the final version of the new sharding logic will probably look something like this, although it needs more tests and I think I can clean it up more. The basic idea is to limit the number of concurrent HTTP tests to something sane like max(# cpus / 4, 1) so that we don't hammer the server too much and increase flakiness. I think even splitting the HTTP tests into 4 chunks should be enough to keep them off the critical path, but I haven't measured the perf improvements yet.
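For illustration, the heuristic described above can be sketched as a small helper. This is a hypothetical sketch, not code from the patch; the function name `max_locked_shards` and the `num_workers` parameter are my own.

```python
import multiprocessing

def max_locked_shards(num_workers=None):
    """Cap the number of concurrent HTTP ("locked") shards.

    Hypothetical sketch of the max(# cpus / 4, 1) heuristic described
    above, so that parallel DRT workers don't hammer the shared web
    server and increase flakiness.
    """
    cpus = num_workers if num_workers is not None else multiprocessing.cpu_count()
    return max(cpus // 4, 1)

print(max_locked_shards(8))  # -> 2
print(max_locked_shards(1))  # -> 1
```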
Are both lighttpd and apache2 configured to handle a reasonable number of concurrent connections (maybe 3 times the number of DRT processes)? Are there any tests that try to set/get cookies that might start failing?
(In reply to comment #3)
> Are both lighttpd and apache2 configured to handle a reasonable number of concurrent connections (maybe 3 times the number of DRT processes)? Are there any tests that try to set/get cookies that might start failing?

Good questions. I don't yet know the answers. Presumably the answer to the cookie question is yes :). However, given that we've had --experimental-fully-parallel forever and I've not noticed any real problems with http specifically, I think we're probably fine, but definitely more testing is needed before we turn this on by default.
There are php-based cookie tests which are known to be flaky in our current setup. I could foresee them failing in a sharded environment. Unclear. See https://bugs.webkit.org/show_bug.cgi?id=51613 for one example.
Created attachment 98302 [details] clean up logic, add tests
Comment on attachment 98302 [details]
clean up logic, add tests

View in context: https://bugs.webkit.org/attachment.cgi?id=98302&action=review

> Tools/ChangeLog:20
> +        nrwt: make sharding tests needing locks less hard-coded
> +        https://bugs.webkit.org/show_bug.cgi?id=63112

Is this ChangeLog entry supposed to be here?

> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:543
>         Return:
>             Two lists of lists of TestInput objects. The first list should
>             only be run under the server lock, the second can be run whenever.

This patch might be the right time to make this return type into a small class.

> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:629
> +    def _resize_shards(self, shards, max_shards):

What is |shards|?

> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:633
> +        i = 1
> +        j = 1

Please use more descriptive variable names. It looks like i is the current directory count and j is the number of shards.

> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:637
> +                tests.extend(shard[1])

It's not clear to me what shard[1] is. I assume it's part of a tuple (so it's always valid), but I can't tell.

> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:640
> +                new_shards.append(('locked_shard_%d' % j, tests))

Can we compute j based on the size of new_shards?
View in context: https://bugs.webkit.org/attachment.cgi?id=98302&action=review

> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:554
> +        if num_workers == 1:
> +            return self._shard_in_two(test_files)
> +        elif fully_parallel:
> +            return self._shard_every_file(test_files)
> +        else:
> +            return self._shard_by_directory(test_files, num_workers)

No need for the else's since you're early returning, as per http://www.webkit.org/coding/coding-style.html.

> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:630
> +        dirs_per_new_shard = math.ceil(len(shards) / max_shards * 1.0)

Why the "* 1.0"?

> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:636
> +        while shards:
> +            shard = shards.pop()

Can you just do "for shard in shards:"?

> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:640
> +                new_shards.append(('locked_shard_%d' % j, tests))

Instead of manually keeping track of "j", can you just use len(new_shards)?

> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:645
> +        if tests:
> +            new_shards.append(('locked_shard_%d' % j, tests))

I think you could simplify this to not need this last case if you did something like:

    for shard in shards:
        if not tests:
            new_shards.append(...)
        tests.extend(...)
        i += 1
        if i == dirs_per_new_shard:
            tests = []
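Putting the reviewer's suggestions together, a standalone sketch of what _resize_shards is trying to do might look like the following. The (name, tests) tuple shape and the names here are assumptions based on the quoted snippets, not the actual patch.

```python
import math

def resize_shards(old_shards, max_new_shards):
    """Merge per-directory (name, tests) shards into at most
    max_new_shards larger shards.

    Uses len(new_shards) instead of a manually tracked counter,
    as suggested in the review.
    """
    # float() is needed under Python 2 so the division doesn't truncate.
    dirs_per_new_shard = int(math.ceil(len(old_shards) / float(max_new_shards)))
    new_shards = []
    tests = []
    for dir_count, (_, shard_tests) in enumerate(old_shards, start=1):
        tests.extend(shard_tests)
        if dir_count % dirs_per_new_shard == 0:
            new_shards.append(('locked_shard_%d' % (len(new_shards) + 1), tests))
            tests = []
    if tests:  # pick up any leftover directories
        new_shards.append(('locked_shard_%d' % (len(new_shards) + 1), tests))
    return new_shards
```

For example, merging five single-directory shards with max_new_shards=2 groups three directories into the first shard and the remaining two into the second.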
Comment on attachment 98302 [details]
clean up logic, add tests

View in context: https://bugs.webkit.org/attachment.cgi?id=98302&action=review

>> Tools/ChangeLog:20
>> +        https://bugs.webkit.org/show_bug.cgi?id=63112
>
> Is this ChangeLog entry supposed to be here?

No. I think this may have crept in during some merge/rebasing effort. I'll remove it.

>> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:543
>>             only be run under the server lock, the second can be run whenever.
>
> This patch might be the right time to make this return type into a small class.

I started to do this as part of revising the patch per your feedback, but it turns out that it just makes things more complicated: you end up creating a class that holds two fields, each of which is a list, and all you really want to do is extract each list and add them together, so the calling code gets much worse. However, I think part of why this code looks complex is that each shard itself is an anonymous data structure. I've changed that so that there is now a TestShard class (a named list of TestInputs), and I think that ends up making things clearer. Let me know what you think.

>> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:629
>> +    def _resize_shards(self, shards, max_shards):
>
> What is |shards|?

I will rewrite everything to be clearer.

>> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:633
>> +        j = 1
>
> Please use more descriptive variable names. It looks like i is the current directory count and j is the number of shards.

Will do.

>> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:637
>> +                tests.extend(shard[1])
>
> It's not clear to me what shard[1] is. I assume it's part of a tuple (so it's always valid), but I can't tell.

This should become clearer.

>> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:640
>> +                new_shards.append(('locked_shard_%d' % j, tests))
>
> Can we compute j based on the size of new_shards?

Yeah.
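Based on the __init__ quoted in this review, the TestShard wrapper is presumably something like the minimal sketch below; the docstring wording and the __repr__ are my additions for illustration, not from the patch.

```python
class TestShard(object):
    """A named list of TestInputs.

    Replaces the anonymous (name, tests) tuple so that code like
    shard.test_inputs is self-describing instead of shard[1].
    """

    def __init__(self, name, test_inputs):
        self.name = name
        self.test_inputs = test_inputs  # a list of TestInput objects

    def __repr__(self):
        return 'TestShard(%r, %d tests)' % (self.name, len(self.test_inputs))
```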
Created attachment 98560 [details] update w/ review feedback from tony
(In reply to comment #8)
> View in context: https://bugs.webkit.org/attachment.cgi?id=98302&action=review
>
> > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:554
> > +        if num_workers == 1:
> > +            return self._shard_in_two(test_files)
> > +        elif fully_parallel:
> > +            return self._shard_every_file(test_files)
> > +        else:
> > +            return self._shard_by_directory(test_files, num_workers)
>
> No need for the else's since you're early returning as per http://www.webkit.org/coding/coding-style.html.

True.

> > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:630
> > +        dirs_per_new_shard = math.ceil(len(shards) / max_shards * 1.0)
>
> Why the "* 1.0"?

Otherwise you get an integer division and it truncates down, making the ceil() call useless.

> > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:636
> > +        while shards:
> > +            shard = shards.pop()
>
> Can you just do "for shard in shards:"?
>
> > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:640
> > +                new_shards.append(('locked_shard_%d' % j, tests))
>
> Instead of manually keeping track of "j", can you just use len(new_shards)?

Yeah, I made these two changes in the new patch.

> > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:645
> > +        if tests:
> > +            new_shards.append(('locked_shard_%d' % j, tests))
>
> I think you could simplify this to not need this last case if you did something like:
>     for shard in shards:
>         if not tests:
>             new_shards.append(...)
>         tests.extend(...)
>         i += 1
>         if i == dirs_per_new_shard:
>             tests = []

Hm. I will ponder this one :)
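To make the integer-division point concrete: under Python 2 (which NRWT ran on), / between two ints truncates, so the coercion to float has to happen before the division or the fraction is lost. A minimal demonstration, using // to emulate Python 2's int division:

```python
import math

# Truncating division rounds down before ceil() ever sees the fraction:
assert int(math.ceil(7 // 3)) == 2        # ceil() is useless here

# Coercing to float first (the "* 1.0", later float(numerator)) keeps it:
assert int(math.ceil(float(7) / 3)) == 3  # the intended "round up"
```

Note that placement matters: in an expression like len(shards) / max_shards * 1.0, the division evaluates first, so under Python 2 the truncation has already happened by the time the * 1.0 applies; writing numerator * 1.0 / divisor (as the revised patch does) coerces before dividing.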
Created attachment 98571 [details] try to make _resize_shards clearer
Comment on attachment 98571 [details]
try to make _resize_shards clearer

View in context: https://bugs.webkit.org/attachment.cgi?id=98571&action=review

> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:241
> +    # FIXME: make this class visible, used by workers as well.

Nit: make -> Make

> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:244
> +    def __init__(self, name, test_inputs):
> +        self.name = name
> +        self.test_inputs = test_inputs

Nit: Maybe document that test_inputs is a list of TestInputs?

> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:610
> +        locked_shards = []
> +        unlocked_shards = []

Nit: I would move these lines to just before the for loop so they're closer to where they're first used.

> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:629
> +        locked_shards.sort(key=lambda shard: shard.name)
> +        unlocked_shards.sort(key=lambda shard: shard.name)

Should we implement __cmp__ for TestShard?

> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:657
> +        return int(math.ceil(numerator * 1.0 / divisor))

Nit: I think float(numerator) is more explicit than * 1.0.

> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:759
> +        # FIXME: change 'test_list' to 'shard', make sharding public

Nit: change -> Change and public -> public.
(In reply to comment #13)
> (From update of attachment 98571 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=98571&action=review
>
> > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:241
> > +    # FIXME: make this class visible, used by workers as well.
>
> Nit: make -> Make

Done.

> > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:244
> > +    def __init__(self, name, test_inputs):
> > +        self.name = name
> > +        self.test_inputs = test_inputs
>
> Nit: Maybe document that test_inputs is a list of TestInputs?

There's a class-level docstring right above __init__ that says the object is a named list of TestInputs; between that and the variable/field names, I thought that covered things well enough.

> > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:610
> > +        locked_shards = []
> > +        unlocked_shards = []
>
> Nit: I would move these lines to just before the for loop so they're closer to where they're first used.
>
> > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:629
> > +        locked_shards.sort(key=lambda shard: shard.name)
> > +        unlocked_shards.sort(key=lambda shard: shard.name)
>
> Should we implement __cmp__ for TestShard?

I wouldn't just yet; I'm not sure __cmp__ makes a lot of sense as a generic operation, and there's no obvious need for it.

> > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:657
> > +        return int(math.ceil(numerator * 1.0 / divisor))
>
> Nit: I think float(numerator) is more explicit than * 1.0.

Good call. Done.

> > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:759
> > +        # FIXME: change 'test_list' to 'shard', make sharding public
>
> Nit: change -> Change and public -> public.

Done.
Created attachment 98804 [details] update w/ more feedback from tony
Created attachment 99420 [details] patch for landing after rebasing to HEAD
Fixed in r90419.
IMO the right way to fix all of this http sharding is to use port redirection in the DRT instance, and to start the http server with a custom port. That's more work, but is a better long term solution than any of these locks or hacks we've added in the past. :)
(In reply to comment #18) > IMO the right way to fix all of this http sharding is to use port redirection in the DRT instance, and to start the http server with a custom port. That's more work, but is a better long term solution than any of these locks or hacks we've added in the past. :) Eric, did you mean for this comment to be on bug 75958 instead? At any rate, I think you're conflating two things ... this feature is about being able to run multiple http tests in parallel in a single NRWT run; the reason we don't do that today is because we're concerned about overloading the http server and/or inter-test dependencies. I don't know if running on different http ports would make this better or worse, but it shouldn't be necessary, regardless. I think you're talking about running multiple http servers on different ports so that multiple separate nrwt runs can occur in parallel, right?