WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
54071
nrwt multiprocessing: spawn multiple workers
https://bugs.webkit.org/show_bug.cgi?id=54071
Summary
nrwt multiprocessing: spawn multiple workers
Dirk Pranke
Reported
2011-02-08 23:44:47 PST
nrwt multiprocessing: spawn multiple workers
Attachments
Patch
(5.31 KB, patch)
2011-02-08 23:46 PST
,
Dirk Pranke
no flags
Details
Formatted Diff
Diff
reset state properly in _run_tests()
(5.30 KB, patch)
2011-02-09 00:21 PST
,
Dirk Pranke
no flags
Details
Formatted Diff
Diff
update w/ feedback from tony
(6.77 KB, patch)
2011-02-11 14:44 PST
,
Dirk Pranke
no flags
Details
Formatted Diff
Diff
rename fields per feedback from tony
(6.91 KB, patch)
2011-02-11 17:42 PST
,
Dirk Pranke
tony
: review+
Details
Formatted Diff
Diff
Show Obsolete
(3)
View All
Add attachment
proposed patch, testcase, etc.
Dirk Pranke
Comment 1
2011-02-08 23:46:53 PST
Created
attachment 81760
[details]
Patch
Dirk Pranke
Comment 2
2011-02-09 00:21:00 PST
Created
attachment 81762
[details]
reset state properly in _run_tests()
Tony Chang
Comment 3
2011-02-11 13:24:14 PST
Comment on
attachment 81762
[details]
reset state properly in _run_tests() View in context:
https://bugs.webkit.org/attachment.cgi?id=81762&action=review
> Tools/Scripts/webkitpy/layout_tests/layout_package/test_runner2.py:109 > + w = manager_connection.start_worker(worker_number) > + ws = _WorkerState(worker_number, w) > + self._workers[w.name] = ws
w -> worker, ws -> worker_state
> Tools/Scripts/webkitpy/layout_tests/layout_package/test_runner2.py:113 > + # We sleep a bit in between workers to give the processes a chance > + # to start up without thrashing. > + time.sleep(0.1)
This seems prone to a race. What type of thrashing happens if you don't sleep?
> Tools/Scripts/webkitpy/layout_tests/layout_package/test_runner2.py:119 > - manager_connection.post_message('stop') > + for i in xrange(num_workers): > + manager_connection.post_message('stop')
Why do we have to post multiple stop messages?
> Tools/Scripts/webkitpy/layout_tests/layout_package/test_runner2.py:144 > + for w in self._workers.values(): > + if not w.done: > + done = False
Can you use any() + list comprehension here?
> Tools/Scripts/webkitpy/layout_tests/layout_package/test_runner2.py:147 > + # FIXME: is the following line safe? > + # self._done = done
Can you elaborate on this?
> Tools/Scripts/webkitpy/layout_tests/layout_package/test_runner2.py:150 > def handle_started_test(self, src, test_info, hang_timeout):
src -> source
> Tools/Scripts/webkitpy/layout_tests/layout_package/test_runner2.py:156 > + w = self._workers[src] > + w.done = True
w -> worker, src -> source
> Tools/Scripts/webkitpy/layout_tests/layout_package/test_runner2.py:158 > def handle_exception(self, src, exception_info):
src -> source
Dirk Pranke
Comment 4
2011-02-11 14:16:47 PST
(In reply to
comment #3
)
> (From update of
attachment 81762
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=81762&action=review
> > > Tools/Scripts/webkitpy/layout_tests/layout_package/test_runner2.py:109 > > + w = manager_connection.start_worker(worker_number) > > + ws = _WorkerState(worker_number, w) > > + self._workers[w.name] = ws > > w -> worker, ws -> worker_state >
Will do.
> > Tools/Scripts/webkitpy/layout_tests/layout_package/test_runner2.py:113 > > + # We sleep a bit in between workers to give the processes a chance > > + # to start up without thrashing. > > + time.sleep(0.1) > > This seems prone to a race. What type of thrashing happens if you don't sleep?
> I don't really understand what is going on, but it appears that the DumpRenderTrees get into weird states and end up timing out on their first couple tests before starting to run smoothly.
> > Tools/Scripts/webkitpy/layout_tests/layout_package/test_runner2.py:119 > > - manager_connection.post_message('stop') > > + for i in xrange(num_workers): > > + manager_connection.post_message('stop') > > Why do we have to post multiple stop messages? >
One for each worker. Because of the ordering of the messages, and the fact that a worker will stop reading messages after receiving a 'stop', this is sufficient to guarantee that every worker will receive a stop. I'll add a comment to this effect.
> > Tools/Scripts/webkitpy/layout_tests/layout_package/test_runner2.py:144 > > + for w in self._workers.values(): > > + if not w.done: > > + done = False > > Can you use any() + list comprehension here? >
Sure.
> > Tools/Scripts/webkitpy/layout_tests/layout_package/test_runner2.py:147 > > + # FIXME: is the following line safe? > > + # self._done = done > > Can you elaborate on this? >
I will add a better comment. Basically, at the moment this code always returns False for TestRunner.is_done(), which means that broker_connect.run_message_loop() will never exit early. This is fine, since we run with timeouts, so in the worst case we bail out within a second. However, I think that if we set _done here, we will safely bail out as soon as the last "done" message is processed. I just hadn't convinced myself 100% that this was true.
> > Tools/Scripts/webkitpy/layout_tests/layout_package/test_runner2.py:150 > > def handle_started_test(self, src, test_info, hang_timeout): > > src -> source
> Will do.
> > Tools/Scripts/webkitpy/layout_tests/layout_package/test_runner2.py:156 > > + w = self._workers[src] > > + w.done = True > > w -> worker, src -> source
> Will do.
> > Tools/Scripts/webkitpy/layout_tests/layout_package/test_runner2.py:158 > > def handle_exception(self, src, exception_info): > > src -> source
Will do.
Dirk Pranke
Comment 5
2011-02-11 14:35:02 PST
(In reply to
comment #4
)
> (In reply to
comment #3
) > > (From update of
attachment 81762
[details]
[details]) > > View in context:
https://bugs.webkit.org/attachment.cgi?id=81762&action=review
> > > > > Tools/Scripts/webkitpy/layout_tests/layout_package/test_runner2.py:109 > > > + w = manager_connection.start_worker(worker_number) > > > + ws = _WorkerState(worker_number, w) > > > + self._workers[w.name] = ws > > > > w -> worker, ws -> worker_state > > > > Will do. >
Oh yeah, I used 'w' because 'worker' is already used to name the module. I'll use 'worker_obj' instead.
> > > Tools/Scripts/webkitpy/layout_tests/layout_package/test_runner2.py:144 > > > + for w in self._workers.values(): > > > + if not w.done: > > > + done = False > > > > Can you use any() + list comprehension here? > > > > Sure.
Actually, the block inside the loop gets more complicated when we check to see if the worker is wedged, but I think I can pull that into a worker_is_done_or_wedged() function.
Dirk Pranke
Comment 6
2011-02-11 14:44:43 PST
Created
attachment 82183
[details]
update w/ feedback from tony
Tony Chang
Comment 7
2011-02-11 16:57:34 PST
Comment on
attachment 82183
[details]
update w/ feedback from tony View in context:
https://bugs.webkit.org/attachment.cgi?id=82183&action=review
> Tools/Scripts/webkitpy/layout_tests/layout_package/test_runner2.py:52 > + def __init__(self, number, worker): > + self.worker = worker
Should we name this param something other than |worker| to avoid shadowing the module?
> Tools/Scripts/webkitpy/layout_tests/layout_package/test_runner2.py:76 > + def _worker_is_done(self, worker): > + # FIXME: check if the worker is wedged. > + return worker.done
What about this param?
> Tools/Scripts/webkitpy/layout_tests/layout_package/test_runner2.py:120 > + # FIXME: If we start workers up too quickly, DumpRenderTree appears > + # to thrash on something and time out its first few tests. Until > + # we can figure out what's going on, sleep a bit in between > + # workers. > + time.sleep(0.1)
This is fine for now, but it sounds like this could lead to flakiness. Any ideas why the old code doesn't have this problem?
> Tools/Scripts/webkitpy/layout_tests/layout_package/test_runner2.py:130 > + # We post one 'stop' message for each worker. Because the stop message > + # are sent after all of the tests, and because each worker will stop > + # reading messsages after receiving a stop, we can be sure each > + # worker will get a stop message and hence they will all shut down. > + for i in xrange(num_workers): > + manager_connection.post_message('stop')
This is also fine, but maybe ManagerConnection should have a stop_all_workers method?
> Tools/Scripts/webkitpy/layout_tests/layout_package/test_runner2.py:155 > + def handle_done(self, source): > + worker = self._workers[source] > + worker.done = True
worker -> worker_state?
Dirk Pranke
Comment 8
2011-02-11 17:32:01 PST
(In reply to
comment #7
)
> (From update of
attachment 82183
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=82183&action=review
> > > Tools/Scripts/webkitpy/layout_tests/layout_package/test_runner2.py:52 > > + def __init__(self, number, worker): > > + self.worker = worker > > Should we name this param something other than |worker| to avoid shadowing the module? >
I can rename them to worker_connection to be a little clearer.
> > Tools/Scripts/webkitpy/layout_tests/layout_package/test_runner2.py:76 > > + def _worker_is_done(self, worker): > > + # FIXME: check if the worker is wedged. > > + return worker.done > > What about this param?
> Will rename to worker_state.
> > Tools/Scripts/webkitpy/layout_tests/layout_package/test_runner2.py:120 > > + # FIXME: If we start workers up too quickly, DumpRenderTree appears > > + # to thrash on something and time out its first few tests. Until > > + # we can figure out what's going on, sleep a bit in between > > + # workers. > > + time.sleep(0.1) > > This is fine for now, but it sounds like this could lead to flakiness. Any ideas why the old code doesn't have this problem? >
It might be being aggravated by using multiple python processes instead of threads so that there's much greater overall load on the system. I haven't really explored it much yet; it's still on my to-do list.
> > Tools/Scripts/webkitpy/layout_tests/layout_package/test_runner2.py:130 > > + # We post one 'stop' message for each worker. Because the stop message > > + # are sent after all of the tests, and because each worker will stop > > + # reading messsages after receiving a stop, we can be sure each > > + # worker will get a stop message and hence they will all shut down. > > + for i in xrange(num_workers): > > + manager_connection.post_message('stop') > > This is also fine, but maybe ManagerConnection should have a stop_all_workers method? >
That would make ManagerConnection aware of the semantics of the messages and how many workers have been started, duplicating state in multiple places, so I'd prefer not to do that (the code was originally written that way, for what its worth, and it ended up being much cleaner this way).
> > Tools/Scripts/webkitpy/layout_tests/layout_package/test_runner2.py:155 > > + def handle_done(self, source): > > + worker = self._workers[source] > > + worker.done = True > > worker -> worker_state?
Will do.
Dirk Pranke
Comment 9
2011-02-11 17:42:06 PST
Created
attachment 82215
[details]
rename fields per feedback from tony
Dirk Pranke
Comment 10
2011-02-14 13:48:35 PST
Committed
r78502
: <
http://trac.webkit.org/changeset/78502
>
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug