RESOLVED WONTFIX 50557
nrwt multiprocessing - revamp message_broker to actually do messaging
https://bugs.webkit.org/show_bug.cgi?id=50557
Summary nrwt multiprocessing - revamp message_broker to actually do messaging
Dirk Pranke
Reported 2010-12-06 03:46:11 PST
nrwt multiprocessing - revamp message_broker to actually do messaging
Attachments
Patch (39.03 KB, patch)
2010-12-06 03:49 PST, Dirk Pranke
no flags
Patch (38.99 KB, patch)
2010-12-06 04:14 PST, Dirk Pranke
no flags
Patch (38.99 KB, patch)
2010-12-06 05:31 PST, Dirk Pranke
no flags
update w/ review feedback (38.65 KB, patch)
2010-12-06 17:40 PST, Dirk Pranke
no flags
Dirk Pranke
Comment 1 2010-12-06 03:49:58 PST
Dirk Pranke
Comment 2 2010-12-06 04:14:24 PST
Dirk Pranke
Comment 3 2010-12-06 04:23:52 PST
Okay, I've split up the patch for bug 50375 into two different patches. This one is the first. In it, we revamp the message_broker module to actually do messaging, and rename it as message_broker2 so it can coexist with the existing module. Apart from unit tests, no code uses the new file yet. To review it, you can diff message_broker2 against message_broker, or just review it on its own merits. This patch is a change from the earlier implementations in that it uses aggregation (has-a) instead of multiple inheritance (is-a) to model objects that have their own concurrency and hold a Worker object. This makes the code much more modular and testable. There is much more extensive documentation, commenting, and unit testing. In addition, it introduces the concept of "connection" object to mediates or proxies the methods between the caller of the message broker and the broker itself. There are two different proxies, a ManagerConnection, and a WorkerConnectionInterface. The latter is an abstract class that is implemented by concrete classes that subclass threading.Thread (or just object, for the inline case). These objects hide the details of the message topic naming from the callers of the broker (TestRunner and dump_render_tree_thread.Worker), and provide a simpler interface that more closely resembles the actual caller's intent.
Dirk Pranke
Comment 4 2010-12-06 05:31:42 PST
Tony Chang
Comment 5 2010-12-06 16:27:26 PST
Comment on attachment 75680 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=75680&action=review Mostly some high level first pass comments. I'm having a hard time seeing the big picture, but I think it's: manager <- manager connection <- broker -> worker connection -> worker I'm not sure I'm convinced that all these layers are needed. > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/message_broker2.py:109 > +class ClientInterface(object): Is this the same as BrokerClientInterface? I don't see anything that inherits from this class. > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/message_broker2.py:184 > + try: > + broker_class = broker_types[worker_model] > + return broker_class(port, options, worker_class, worker_topic, > + worker_reply_topic) > + except KeyError: Nit: Can we move the return outside the try block? > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/message_broker2.py:219 > + def __init__(self, queue): > + self.queue = queue Does queue need to be a parameter? It looks like you always pass in Queue.Queue(). > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/message_broker2.py:354 > + def cancel_worker(self, worker_name): It's a bit odd that start_worker takes a worker_number, but all the other worker methods take a worker_name. > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/message_broker2.py:372 > + def add_timer(self, topic_name, callback, delay_secs): > + """Add a timer that will fire later during drain() or loop(). Is this for finding hung threads? I don't think finding hung threads needs to be so general purpose. Why not just name the methods add_hung_thread_timer or something? Also, there's a lot of code for timers. It might be easier to add them in a separate patch. > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/message_broker2.py:391 > + def run_message_loop(self, topic_name, reader): > + """Loop processing messages and timers until done. > + I think it would be more clear if the broker was less general purpose. We only have 2 topics and being able to handle an arbitrary number of topics makes the code harder to follow. For example, maybe it would be reasonable to have run_manager_loop and run_worker_loop? Actually, I can't tell right now if the broker class is responsible for running the manager loop. > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/message_broker2.py:392 > + "done" means that reader.is_done() returns True or a timer indicates It would also make "reader" explicit. Of course, run_manager_loop and run_worker_loop may both call into a common protected method that does this abstraction, but the code would be more readable. > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/message_broker2.py:413 > + def drain(self, topic_name, reader): Nit: Can we call this run_all_pending? That's matches MessageLoop in base. > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/message_broker2.py:475 > + fn = getattr(reader, 'handle_' + message.name) fn-> function or callback or message_handler or something > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/message_broker2.py:480 > + class _InlineWorkerConnection(object): I would move this out of the class since it's so big. It makes it hard to see which methods are part of _InlineBroker and which methods are part of _InlineWorkerConnection. > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/message_broker2.py:579 > + class _ThreadedWorkerConnection(threading.Thread): I would move this out too.
Dirk Pranke
Comment 6 2010-12-06 17:08:34 PST
(In reply to comment #5) > (From update of attachment 75680 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=75680&action=review > > Mostly some high level first pass comments. I'm having a hard time seeing the big picture, but I think it's: > manager <- manager connection <- broker -> worker connection -> worker > I'm not sure I'm convinced that all these layers are needed. > The non-MI version only had the three ... manager <-> broker <-> worker. As I mention in comment #3, using aggregation requires the worker connection, at which point I think having a symmetric manager connection keeps the code cleaner and more understandable. > > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/message_broker2.py:109 > > +class ClientInterface(object): > > Is this the same as BrokerClientInterface? I don't see anything that inherits from this class. > Whoops, yes. That was an earlier name and I guessed I missed some search&replaces. will fix. TestWorker in message_broker2_unittest.py should be inheriting from / implementing it, but it isn't. That's a typo that I will fix. More practically, the Worker class in dump_render_tree_thread.py in bug 50375 implements it. > > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/message_broker2.py:184 > > + try: > > + broker_class = broker_types[worker_model] > > + return broker_class(port, options, worker_class, worker_topic, > > + worker_reply_topic) > > + except KeyError: > > Nit: Can we move the return outside the try block? > Sure, but why? > > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/message_broker2.py:219 > > + def __init__(self, queue): > > + self.queue = queue > > Does queue need to be a parameter? It looks like you always pass in Queue.Queue(). > For now we always use Queue.Queue(), but in bug 50381, when we add in multiprocessing, we'll either use Queue.Queue()s or multiprocessing.Queue()s. > > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/message_broker2.py:354 > > + def cancel_worker(self, worker_name): > > It's a bit odd that start_worker takes a worker_number, but all the other worker methods take a worker_name. > The caller doesn't specify the name, so it can't be passed to start_worker(). Arguably the other methods could be changed to take worker_number as well, but I have concerns at some future date about worker_numbers getting re-used if workers are being restarted, and names being more durable/unique. I'm not sure there's an obvious win here one way or another. > > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/message_broker2.py:372 > > + def add_timer(self, topic_name, callback, delay_secs): > > + """Add a timer that will fire later during drain() or loop(). > > Is this for finding hung threads? I don't think finding hung threads needs to be so general purpose. Why not just name the methods add_hung_thread_timer or something? I'm not sure that renaming things to special-purpose methods and combining the layers actually makes things clearer. You can go back to to earlier versions of these patches and compare which is easier to read, but I think the new versions have clean, familiar mechanisms that go a long way to making this code much more understandable than it was previously. > > Also, there's a lot of code for timers. It might be easier to add them in a separate patch. > Perhaps. At this point I'd prefer not to split things up further unless there's some reason to do so either for correctness or to make landing things easier (and we can't land this partially using mb2 without using timers). > > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/message_broker2.py:391 > > + def run_message_loop(self, topic_name, reader): > > + """Loop processing messages and timers until done. > > + > > I think it would be more clear if the broker was less general purpose. We only have 2 topics and being able to handle an arbitrary number of topics makes the code harder to follow. For example, maybe it would be reasonable to have run_manager_loop and run_worker_loop? Actually, I can't tell right now if the broker class is responsible for running the manager loop. > Respectfully, I disagree. I actually had separate loops earlier (you can look at older patches) and things were more intertwined, and it made the logic much hard to follow, let alone test comprehensively. It doesn't help if you focus on the "inline" case as the normal usage. It isn't, you have to jump through hoops to get things to work. If you look at the threaded case everything becomes much more symmetric (and the multiprocess case is exactly the same as the threaded case). More broadly, there is something of a tension between putting logic in different places and trying to keep the abstractions clean and understandable. For example, one design tension is whether to make _WorkerMessageBroker and its descendants smart objects that manage control flow, or a relatively passive objects that just hold thread/process-safe data that other objects manipulate. I have chosen the former tactic here, because the latter tactic led to putting a lot of logic into dump_render_tree_thread.py, making that code *very* hard to understand and maintain. The new Worker code you see in bug 50375 is IMO far more understandable. So, if you accept that, then either you need to remove some of the control over the flow from TestRunner as well, to make things more symmetrical, or you live with asymmetry. Part of the design thinking I had was that, if I was implementing this from scratch as a system based on communicating asynchronous message loops that used common libraries and classes (rather than ones I wrote myself), how would I do it? I think I'd end up with something like this. Given that the net result is actually *less* code and *less convoluted* code than the earlier versions, I think this is probably a win. I explain all of this in the hopes that it will help you understand the motivations for this design better. Maybe I'm just off my rocker ... > > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/message_broker2.py:392 > > + "done" means that reader.is_done() returns True or a timer indicates > > It would also make "reader" explicit. Of course, run_manager_loop and run_worker_loop may both call into a common protected method that does this abstraction, but the code would be more readable. > Admittedly, "reader" isn't a great name here. "caller" or "client" might be better. > > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/message_broker2.py:413 > > + def drain(self, topic_name, reader): > > Nit: Can we call this run_all_pending? That's matches MessageLoop in base. > Sure. > > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/message_broker2.py:475 > > + fn = getattr(reader, 'handle_' + message.name) > > fn-> function or callback or message_handler or something > Okay. > > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/message_broker2.py:480 > > + class _InlineWorkerConnection(object): > > I would move this out of the class since it's so big. It makes it hard to see which methods are part of _InlineBroker and which methods are part of _InlineWorkerConnection. > Okay. > > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/message_broker2.py:579 > > + class _ThreadedWorkerConnection(threading.Thread): > > I would move this out too. Okay. I'll post a revision with the patches and corrections that I've agreed to so far, and we can continue to discuss. Thanks for your time!
Dirk Pranke
Comment 7 2010-12-06 17:40:42 PST
Created attachment 75763 [details] update w/ review feedback
Dirk Pranke
Comment 8 2010-12-06 17:45:51 PST
(In reply to comment #6) > > > > Is this the same as BrokerClientInterface? I don't see anything that inherits from this class. > > > > Whoops, yes. That was an earlier name and I guessed I missed some search&replaces. will fix. > Looks like I missed indicating the interface inheritance in a bunch of places; I'm still not sure how common an idiom that is in Python vs. just relying on duck typing. That said, I definitely intended to document it everywhere, so I'm adding it now. > > > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/message_broker2.py:184 > > > + try: > > > + broker_class = broker_types[worker_model] > > > + return broker_class(port, options, worker_class, worker_topic, > > > + worker_reply_topic) > > > + except KeyError: > > > > Nit: Can we move the return outside the try block? > > > > Sure, but why? > Never mind, I thought you wanted me to call broker_class() inside the try block but save the return value and then have an explicit return afterwards. I see now that you just wanted the try/except KeyError to just wrap the dict access. > The caller doesn't specify the name, so it can't be passed to start_worker(). Arguably the other methods could be changed to take worker_number as well, but I have concerns at some future date about worker_numbers getting re-used if workers are being restarted, and names being more durable/unique. I'm not sure there's an obvious win here one way or another. > Oh, I should have noted that I use worker_name where possible because that is the thing that identifies the sender of the messages and is used in debug output. One could consider abandoning worker_name altogether and just using worker_number pervasively everywhere, but I don't think that works as well given the idea of a "generic" message broker, since it's unclear how you refer to the manager? worker zero? Special case code? Leaving this alone for now. We can discuss further. > > It would also make "reader" explicit. Of course, run_manager_loop and run_worker_loop may both call into a common protected method that does this abstraction, but the code would be more readable. > > > > Admittedly, "reader" isn't a great name here. "caller" or "client" might be better. > Leaving "reader" alone for now until we agree on a better name if it is needed. > > > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/message_broker2.py:475 > > > + fn = getattr(reader, 'handle_' + message.name) > > > > fn-> function or callback or message_handler or something Renamed to "message_handler" > > > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/message_broker2.py:480 > > > + class _InlineWorkerConnection(object): > > > > I would move this out of the class since it's so big. It makes it hard to see which methods are part of _InlineBroker and which methods are part of _InlineWorkerConnection. > > > > Okay. > I've moved all of the nested classes out to the top level, seems like an all-or-nothing sort of thing.
WebKit Review Bot
Comment 9 2010-12-06 18:11:15 PST
Attachment 75763 [details] did not pass style-queue: Failed to run "[u'git', u'reset', u'--hard', u'HEAD']" exit_code: 128 error: Could not write new index file. fatal: Could not reset index file to revision 'HEAD'. If any of these errors are false positives, please file a bug against check-webkit-style.
Dirk Pranke
Comment 10 2010-12-06 18:18:19 PST
style bot keeled over but that may be because the diff is based against code that got reverted. I think this can be ignored for purposes of review.
Dirk Pranke
Comment 11 2010-12-07 13:59:25 PST
I should note that a lot of your thinking and questions mirror my own as I was initially writing this code (separate message loops, fewer abstractions, custom timer code instead of generic timer code). The code has become much more generic over time, and I think this makes it easier to understand and easier to test and maintain.
Dirk Pranke
Comment 12 2010-12-07 15:47:28 PST
Adam, wdyt about this code vs. the stuff in common/thread that I just remembered?
Adam Barth
Comment 13 2010-12-07 15:55:31 PST
(In reply to comment #12) > Adam, wdyt about this code vs. the stuff in common/thread that I just remembered? I suspect the stuff we currently have in common/thread isn't worth worrying about. It's just the bare minimum we needed to get sheriffbot working.
Dirk Pranke
Comment 14 2010-12-07 20:53:14 PST
Tony, One other thought, for something that might kill all of these birds with one stone ... there's a FIXME: at the top of the file to split the worker logic out from the message_broker code. If we combine that with some of the other concerns you had about being able to understand the flow of control, then we could end up with something like the following ... WorkerConnection manages the message loop and supports timers (so, implements add_timer(), run_message_loop(), and the supporting functions). ManagerConnection (or something like it) would subclass WorkerConnection and adds start_worker(). There would be two versions of this, a normal one and one specialized to deal with the inline worker case. Actually, there'd be three versions, since there are two variants of the "normal" one to deal with threads or processes. WorkerMessageBroker would manage no control flow whatsoever. it would be essentially just a data object. Thoughts?
Dirk Pranke
Comment 15 2010-12-14 20:46:56 PST
Marking as wontfix - I'll split these patches up another way.
Note You need to log in before you can comment on or make changes to this bug.