WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
63495
NRWT should wait for ReportCrash
https://bugs.webkit.org/show_bug.cgi?id=63495
Summary
NRWT should wait for ReportCrash
Adam Barth
Reported
2011-06-27 17:08:04 PDT
NRWT should wait for ReportCrash
Attachments
Patch
(7.06 KB, patch)
2011-06-27 17:24 PDT
,
Adam Barth
no flags
Details
Formatted Diff
Diff
Patch
(7.07 KB, patch)
2011-06-27 17:27 PDT
,
Adam Barth
eric
: review+
Details
Formatted Diff
Diff
View All
Add attachment
proposed patch, testcase, etc.
Adam Barth
Comment 1
2011-06-27 17:24:21 PDT
Created
attachment 98825
[details]
Patch
Adam Barth
Comment 2
2011-06-27 17:27:16 PDT
Created
attachment 98826
[details]
Patch
Eric Seidel (no email)
Comment 3
2011-06-27 17:27:35 PDT
Comment on
attachment 98825
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=98825&action=review
> Tools/Scripts/webkitpy/layout_tests/port/server_process.py:172 > + if self._executive.running_pids(self._port.is_crash_reporter): > + _log.warning('Waiting for crash reporter...') > + self._executive.wait_newest(self._port.is_crash_reporter)
So is this the central controller for all servers? Or is this a single worker? What happens if we start nrwt while a CrashReporter process is running? What if CrashReporter never stops?
Eric Seidel (no email)
Comment 4
2011-06-27 17:28:33 PDT
Comment on
attachment 98826
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=98826&action=review
> Tools/Scripts/webkitpy/layout_tests/port/server_process.py:171 > + _log.warning('Waiting for crash reporter...')
So if I have 24 workers on my machine, I'll get this printed 24 times?
Eric Seidel (no email)
Comment 5
2011-06-27 17:32:51 PDT
Comment on
attachment 98826
[details]
Patch I'm certain you will get complaints about this spamming the console.
Adam Barth
Comment 6
2011-06-27 17:33:31 PDT
Committed
r89881
: <
http://trac.webkit.org/changeset/89881
>
Adam Barth
Comment 7
2011-06-27 17:42:51 PDT
These are the answers I gave Eric in person:
> So is this the central controller for all servers? Or is this a single worker?
It's for a single worker.
> What happens if we start nrwt while a CrashReporter process is running?
NRWT will wait for CrashReporter to stop before doing anything meaningful.
> What if CrashReporter never stops?
Then NRWT will never stop.
> So if I have 24 workers on my machine, I'll get this printed 24 times?
Yes. I've added the worker name to make it more clear what's going on. Eric suggested that the master process could print this message, which is probably a good idea, but not a blocking issue.
Dirk Pranke
Comment 8
2011-06-27 17:46:43 PDT
(In reply to
comment #7
)
> These are the answers I gave Eric in person: > > > So is this the central controller for all servers? Or is this a single worker? > > It's for a single worker. > > > What happens if we start nrwt while a CrashReporter process is running? > > NRWT will wait for CrashReporter to stop before doing anything meaningful. > > > What if CrashReporter never stops? > > Then NRWT will never stop. > > > So if I have 24 workers on my machine, I'll get this printed 24 times? > > Yes. I've added the worker name to make it more clear what's going on. Eric suggested that the master process could print this message, which is probably a good idea, but not a blocking issue.
The master doesn't know anything about what's going on here, so that might be kinda tricky. I suppose you could post a message from the worker back to the master and the master could throttle the rate it logged messages accordingly. But frankly, if you have 24 workers on your machine and they're all crashing at once, you've probably got worse things to worry about, and NRWT will exit shortly due to the max # of crashes being exceeded anyway.
Dirk Pranke
Comment 9
2011-06-27 17:49:15 PDT
(In reply to
comment #7
)
> > What if CrashReporter never stops? > > Then NRWT will never stop. >
Actually, since this is being done in the worker threads, if this takes too long, the manager will treat this as a wedged thread. Wedging by itself is not fatal, and workers can actually un-wedge, but if every thread wedges and you make no progress at all, NRWT will eventually bail out.
Adam Barth
Comment 10
2011-06-27 17:50:20 PDT
> NRWT will eventually bail out.
Where is the code that does that?
Dirk Pranke
Comment 11
2011-06-27 17:54:00 PDT
(In reply to
comment #10
)
> > NRWT will eventually bail out. > > Where is the code that does that?
Look at the _run_tests() and is_done() methods in the Manager class (manager.py:674 and manager.py:1316),
Adam Barth
Comment 12
2011-06-27 17:57:18 PDT
> Look at the _run_tests() and is_done() methods in the Manager class (manager.py:674 and manager.py:1316),
Thanks. I suspect we might trigger those conditions if lots of tests are crashing.
Dirk Pranke
Comment 13
2011-06-27 18:07:37 PDT
(In reply to
comment #12
)
> > Look at the _run_tests() and is_done() methods in the Manager class (manager.py:674 and manager.py:1316), > > Thanks. I suspect we might trigger those conditions if lots of tests are crashing.
Yeah, it's possible, but even if we do it's really just a slightly-accelerated version of the early exit we'd do when we hit the --exit-after-n-crashes-or-timeouts threshold, so I don't know that I'd worry about it too much (although the fewer workers we have running, the more this wouldn't be true ...). Dunno if you've poked around in this part of the code yet, but in worker.py you have _run_test() on line 127, which calculates the "hang timeout", posts a message to the manager saying that it's starting a test, runs the test and compares the results, and then posts the test completed message. If this really becomes an issue, we have three potential options, in order of increasingly complexity and (IMO) decreasing desirability: 1) Adjust the "hang timeout" to be substantially longer. Given that tests don't really wedge these days, this should be fairly harmless. 2) Adjust the code in manager.py to realize you're getting a lot of crashes and be smart able adjusting the timeouts on that side to compensate 3) Rearrange the code in worker.py so that you post that we've completed the test *before* trying to read the crash log, since we only check for wedged workers if we think they are actively running tests. This would effectively disable the wedge checking for this case, and would mess up the layering of the various methods, but would work in a worst-case scenario.
Adam Barth
Comment 14
2011-06-27 23:56:22 PDT
This patch was rolled out because it made Qt sloooow.
Csaba Osztrogonác
Comment 15
2011-06-28 00:00:08 PDT
Gábor, Kristóf, could you check why this change made our NRWT bot so slow?
Adam Barth
Comment 16
2011-06-28 00:30:25 PDT
Committed
r89899
: <
http://trac.webkit.org/changeset/89899
>
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug