RESOLVED WONTFIX 47465
new-run-webkit-tests should bail out when all the threads are wedged
https://bugs.webkit.org/show_bug.cgi?id=47465
Summary new-run-webkit-tests should bail out when all the threads are wedged
Dirk Pranke
Reported 2010-10-09 23:13:27 PDT
new-run-webkit-tests should bail out when all the threads are wedged
Attachments
Patch (12.11 KB, patch)
2010-10-09 23:16 PDT, Dirk Pranke
no flags
Patch (32.90 KB, patch)
2010-10-20 19:15 PDT, Dirk Pranke
eric: review-
Dirk Pranke
Comment 1 2010-10-09 23:16:21 PDT
Eric Seidel (no email)
Comment 2 2010-10-11 21:24:24 PDT
Comment on attachment 70388 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=70388&action=review I really like the idea of this change. I'll need to stare at it in more detail to make sure I understand/agree-with the details of it though. :) > WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:659 > + # give the other threads a chance to start up. > + time.sleep(0.01) This seems wrong (or at least racy)? > WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:688 > + living_threads = living_threads + 1 Does python not have a ++ or +=? > WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:701 > + _log.critical("all remaining threads are wedged, bailing out\n") Seems this should have a capital and a period, no?
Dirk Pranke
Comment 3 2010-10-11 22:15:17 PDT
(In reply to comment #2) > (From update of attachment 70388 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=70388&action=review > > I really like the idea of this change. I'll need to stare at it in more detail to make sure I understand/agree-with the details of it though. :) > > > WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:659 > > + # give the other threads a chance to start up. > > + time.sleep(0.01) > > This seems wrong (or at least racy)? > Yes, it is racy. Without the sleep(), it's possible for this function to get called before any of the other threads' run() methods have been invoked. So, testing them for isAlive() will return False and we'll think we're done. Perhaps a better mechanism is to have a condition variable that is set by the threads to indicate that they've started ... I'll try adding that. > > WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:688 > > + living_threads = living_threads + 1 > > Does python not have a ++ or +=? > No ++; for some reason my brain failed me yesterday and I was thinking there wasn't a +=, either, but there is, so I'll add it. > > WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:701 > > + _log.critical("all remaining threads are wedged, bailing out\n") > > Seems this should have a capital and a period, no? Will fix. Thanks for the review!
Eric Seidel (no email)
Comment 4 2010-10-12 16:22:45 PDT
Comment on attachment 70388 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=70388&action=review I think this is generally a great patch, just I'd like to see one more round. :) > WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:653 > + return (thread_timings, test_timings, individual_test_timings) Eventually this probably needs to be a class. Tuples rarely stay tuples for long. :) (Obviously not related to this patch, just noting.) > WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:661 > + last_t = time.time() It's slightly unfrotunate this variable doesn't have units in its name. > WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:666 > + living_threads = 1 > + while living_threads: > + living_threads = 0 This feels like maybe a do while? Does python even have such? > WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:668 > + min_timeout = last_t + 600 Lack of units in the variable name makes "600" harder to read. > WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:681 > + if next_timeout and next_timeout < min_timeout: > + min_timeout = next_timeout Isn't this just a min function? Or does min() not handle None as you'd want it to? > WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:683 > + if not thread in wedged_threads: "foo not in bar" reads slightly nicer to my eyes than "not foo in bar" > WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:691 > + if t > last_t + 10: I'm confused by what this does. Possibly due to variable naming. Not sure. > WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:700 > + if len(wedged_threads): if wedged_threads: should do the same with 4 fewer chars. > WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:709 > + except Exception, e: > + self._exception = e Do we not want to cancel threads in the non-keyboard case? Maybe we should wrap this whole thing (possibly by using a helper function) in a separate try block to do the self._exception setting and the inner one would handle the keyboard stuff? Maybe that's a separate patch?
Dirk Pranke
Comment 5 2010-10-12 16:39:08 PDT
(In reply to comment #4) > (From update of attachment 70388 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=70388&action=review > > I think this is generally a great patch, just I'd like to see one more round. :) > > > WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:653 > > + return (thread_timings, test_timings, individual_test_timings) > > Eventually this probably needs to be a class. Tuples rarely stay tuples for long. :) (Obviously not related to this patch, just noting.) > Yes. > > WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:661 > > + last_t = time.time() > > It's slightly unfrotunate this variable doesn't have units in its name. > I'll add something. > > WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:666 > > + living_threads = 1 > > + while living_threads: > > + living_threads = 0 > > This feels like maybe a do while? Does python even have such? > It is a do while, but Python doesn't have a do-while :( > > WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:668 > > + min_timeout = last_t + 600 > > Lack of units in the variable name makes "600" harder to read. > See above :) > > WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:681 > > + if next_timeout and next_timeout < min_timeout: > > + min_timeout = next_timeout > > Isn't this just a min function? Or does min() not handle None as you'd want it to? > It's min() as long as min != 0 (or None), i.e., min_timeout must be greater than zero. I could have: if next_timeout: min_timeout = min(min_timeout, next_timeout) > > WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:683 > > + if not thread in wedged_threads: > > "foo not in bar" reads slightly nicer to my eyes than "not foo in bar" > Sure. > > WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:691 > > + if t > last_t + 10: > > I'm confused by what this does. Possibly due to variable naming. Not sure. > This just makes sure we don't log this message more than once every ten seconds. Perhaps I'm confusing you by using last_t for both this and the initial value of min_timeout. I will rewrite. > > WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:700 > > + if len(wedged_threads): > > if wedged_threads: should do the same with 4 fewer chars. > Fair enough. > > WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:709 > > + except Exception, e: > > + self._exception = e > > Do we not want to cancel threads in the non-keyboard case? Maybe we should wrap this whole thing (possibly by using a helper function) in a separate try block to do the self._exception setting and the inner one would handle the keyboard stuff? Maybe that's a separate patch? KeyboardInterrupted is a "normal" occurrence, and things are generally happy in python land. The other Exceptions indicate that something has gone wrong, most likely we have wedged threads. In which case I'm leery about trying to do anything other than getting out as quickly as possible. I think (but am not positive) that I tested trying to cancel threads in this case and got deadlocks because the thread.cancel() wedged. I can re-test to confirm. Note that I don't want to spend too much time on trying to make these code paths flawless. I'd rather be working on the real fix so that the threads don't get wedged ;)
Eric Seidel (no email)
Comment 6 2010-10-15 14:49:31 PDT
(In reply to comment #5) > It's min() as long as min != 0 (or None), i.e., min_timeout must be greater than zero. I could have: > > if next_timeout: > min_timeout = min(min_timeout, next_timeout) SGTM.
Dirk Pranke
Comment 7 2010-10-20 19:15:30 PDT
Eric Seidel (no email)
Comment 8 2010-10-22 16:04:33 PDT
Comment on attachment 71382 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=71382&action=review I need to finish resurrecting the EWS fleet. Will look more later. > WebKitTools/Scripts/new-run-webkit-tests:38 > + res = run_webkit_tests.main() :) I'm not sure "res" saves you enough typing to be worth it. exit_code would be best imo. > WebKitTools/Scripts/new-run-webkit-tests:41 > + res = signal.SIGINT + 128 Odd. isn't there a function to do this +128 for us?
Dirk Pranke
Comment 9 2010-10-22 16:09:23 PDT
(In reply to comment #8) > > WebKitTools/Scripts/new-run-webkit-tests:38 > > + res = run_webkit_tests.main() > > :) I'm not sure "res" saves you enough typing to be worth it. exit_code would be best imo. > Will do. > > WebKitTools/Scripts/new-run-webkit-tests:41 > > + res = signal.SIGINT + 128 > > Odd. isn't there a function to do this +128 for us? Not that I'm aware of.
Eric Seidel (no email)
Comment 10 2010-11-01 16:53:24 PDT
Comment on attachment 71382 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=71382&action=review I'm failing to understand what this is trying to do (perhaps it's trying to do too much?) Perhaps I should come down and read this in person sitting next to you. > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/dump_render_tree_thread.py:247 > + def __init__(self, worker_number, runner, port, options, > + filename_list_queue, result_queue, test_types, test_args): If we're going to have a back-pointer to runner, should more of this list just be pulled off of runner? Even if we just pull it off in the __init__ and cache it locally, that seems cleaner than passing each value to the constructor. donno. > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/dump_render_tree_thread.py:253 > + runner: handle to TestRunner (master thread) used to notify of > + thread startup/shutdown Does this use a specific delegate protocol? Often we use such for easier testing/api-containment. You can see examples in CommitQueueTask, QueueEngine, etc. > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/dump_render_tree_thread.py:274 > + self.setName("Worker-%d" % (worker_number + 1)) Seems the owner of this thread could just do this? Why does the thread set its own name? > WebKitTools/Scripts/webkitpy/layout_tests/layout_package/dump_render_tree_thread.py:325 > + self._runner.notify_thread_start(self) I wouldnt' have used notify in the name. But it's OK. I probably would have just called it thread_started/thread_stopped. > WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:647 > + self._cv = threading.Condition() What does _cv stand for? It's not clear from that name what the variable is doing. > WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:648 > + self._cv.acquire() with_statement? > WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:688 > + self._cv.acquire() If this is a lock, then with_statement would be safer here. I'm not sure what _cv is. > WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:1461 > +def _run_with_printer(port, options, printer, num_workers, args): The changes in this function are all just changes in indent, right? > WebKitTools/Scripts/webkitpy/layout_tests/run_webkit_tests_unittest.py:36 > +import pdb leftover.
Dirk Pranke
Comment 11 2010-11-01 22:33:32 PDT
Well, the "good" news is that this patch may be irrelevant; it's starting to look like the rewritten code I have to do multi-process NRWT actually works, and most of this code goes away. If that doesn't pan out, I'll come back to this and we can review the design in more detail.
Dirk Pranke
Comment 12 2010-11-04 13:51:27 PDT
Closing this as WONTFIX for now.
Note You need to log in before you can comment on or make changes to this bug.