RESOLVED FIXED 63495
NRWT should wait for ReportCrash
https://bugs.webkit.org/show_bug.cgi?id=63495
Summary NRWT should wait for ReportCrash
Adam Barth
Reported 2011-06-27 17:08:04 PDT
NRWT should wait for ReportCrash
Attachments
Patch (7.06 KB, patch)
2011-06-27 17:24 PDT, Adam Barth
no flags
Patch (7.07 KB, patch)
2011-06-27 17:27 PDT, Adam Barth
eric: review+
Adam Barth
Comment 1 2011-06-27 17:24:21 PDT
Adam Barth
Comment 2 2011-06-27 17:27:16 PDT
Eric Seidel (no email)
Comment 3 2011-06-27 17:27:35 PDT
Comment on attachment 98825 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=98825&action=review > Tools/Scripts/webkitpy/layout_tests/port/server_process.py:172 > + if self._executive.running_pids(self._port.is_crash_reporter): > + _log.warning('Waiting for crash reporter...') > + self._executive.wait_newest(self._port.is_crash_reporter) So is this the central controller for all servers? Or is this a single worker? What happens if we start nrwt while a CrashReporter process is running? What if CrashReporter never stops?
Eric Seidel (no email)
Comment 4 2011-06-27 17:28:33 PDT
Comment on attachment 98826 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=98826&action=review > Tools/Scripts/webkitpy/layout_tests/port/server_process.py:171 > + _log.warning('Waiting for crash reporter...') So if I have 24 workers on my machine, I'll get this printed 24 times?
Eric Seidel (no email)
Comment 5 2011-06-27 17:32:51 PDT
Comment on attachment 98826 [details] Patch I'm certain you will get complaints about this spamming the console.
Adam Barth
Comment 6 2011-06-27 17:33:31 PDT
Adam Barth
Comment 7 2011-06-27 17:42:51 PDT
These are the answers I gave Eric in person: > So is this the central controller for all servers? Or is this a single worker? It's for a single worker. > What happens if we start nrwt while a CrashReporter process is running? NRWT will wait for CrashReporter to stop before doing anything meaningful. > What if CrashReporter never stops? Then NRWT will never stop. > So if I have 24 workers on my machine, I'll get this printed 24 times? Yes. I've added the worker name to make it more clear what's going on. Eric suggested that the master process could print this message, which is probably a good idea, but not a blocking issue.
Dirk Pranke
Comment 8 2011-06-27 17:46:43 PDT
(In reply to comment #7) > These are the answers I gave Eric in person: > > > So is this the central controller for all servers? Or is this a single worker? > > It's for a single worker. > > > What happens if we start nrwt while a CrashReporter process is running? > > NRWT will wait for CrashReporter to stop before doing anything meaningful. > > > What if CrashReporter never stops? > > Then NRWT will never stop. > > > So if I have 24 workers on my machine, I'll get this printed 24 times? > > Yes. I've added the worker name to make it more clear what's going on. Eric suggested that the master process could print this message, which is probably a good idea, but not a blocking issue. The master doesn't know anything about what's going on here, so that might be kinda tricky. I suppose you could post a message from the worker back to the master and the master could throttle the rate it logged messages accordingly. But frankly, if you have 24 workers on your machine and they're all crashing at once, you've probably got worse things to worry about, and NRWT will exit shortly due to the max # of crashes being exceeded anyway.
Dirk Pranke
Comment 9 2011-06-27 17:49:15 PDT
(In reply to comment #7) > > What if CrashReporter never stops? > > Then NRWT will never stop. > Actually, since this is being done in the worker threads, if this takes too long, the manager will treat this as a wedged thread. Wedging by itself is not fatal, and workers can actually un-wedge, but if every thread wedges and you make no progress at all, NRWT will eventually bail out.
Adam Barth
Comment 10 2011-06-27 17:50:20 PDT
> NRWT will eventually bail out. Where is the code that does that?
Dirk Pranke
Comment 11 2011-06-27 17:54:00 PDT
(In reply to comment #10) > > NRWT will eventually bail out. > > Where is the code that does that? Look at the _run_tests() and is_done() methods in the Manager class (manager.py:674 and manager.py:1316),
Adam Barth
Comment 12 2011-06-27 17:57:18 PDT
> Look at the _run_tests() and is_done() methods in the Manager class (manager.py:674 and manager.py:1316), Thanks. I suspect we might trigger those conditions if lots of tests are crashing.
Dirk Pranke
Comment 13 2011-06-27 18:07:37 PDT
(In reply to comment #12) > > Look at the _run_tests() and is_done() methods in the Manager class (manager.py:674 and manager.py:1316), > > Thanks. I suspect we might trigger those conditions if lots of tests are crashing. Yeah, it's possible, but even if we do it's really just a slightly-accelerated version of the early exit we'd do when we hit the --exit-after-n-crashes-or-timeouts threshold, so I don't know that I'd worry about it too much (although the fewer workers we have running, the more this wouldn't be true ...). Dunno if you've poked around in this part of the code yet, but in worker.py you have _run_test() on line 127, which calculates the "hang timeout", posts a message to the manager saying that it's starting a test, runs the test and compares the results, and then posts the test completed message. If this really becomes an issue, we have three potential options, in order of increasingly complexity and (IMO) decreasing desirability: 1) Adjust the "hang timeout" to be substantially longer. Given that tests don't really wedge these days, this should be fairly harmless. 2) Adjust the code in manager.py to realize you're getting a lot of crashes and be smart able adjusting the timeouts on that side to compensate 3) Rearrange the code in worker.py so that you post that we've completed the test *before* trying to read the crash log, since we only check for wedged workers if we think they are actively running tests. This would effectively disable the wedge checking for this case, and would mess up the layering of the various methods, but would work in a worst-case scenario.
Adam Barth
Comment 14 2011-06-27 23:56:22 PDT
This patch was rolled out because it made Qt sloooow.
Csaba Osztrogonác
Comment 15 2011-06-28 00:00:08 PDT
Gábor, Kristóf, could you check why this change made our NRWT bot so slow?
Adam Barth
Comment 16 2011-06-28 00:30:25 PDT
Note You need to log in before you can comment on or make changes to this bug.