Consider pausing all tests when ReportCrash is running
https://bugs.webkit.org/show_bug.cgi?id=96690
Alexey Proskuryakov
Reported 2012-09-13 14:00:08 PDT
CrashReporter is a very resource-intensive process, so running a dozen DumpRenderTrees along with it is a recipe for trouble. Can we avoid starting any new tests while the reporter process is running? There are crashes that happen without NRWT ever noticing, so this is probably better than pausing tests only while explicitly waiting for the crash log. One downside would be that the process doesn't seem to exit immediately after having written out the log, so we may be wasting some time. But maybe that's the right thing to do regardless.
Stephanie Lewis
Comment 1 2012-09-13 14:08:20 PDT
I've thought about this, particularly as it relates to the bots timing out due to resource contention (although unloading CrashTracer didn't seem to affect the number of timeouts). The problem is that I've seen CrashTracer finish writing a crash log on these bots 15 minutes after the crash. Pausing 15 minutes for a 20 minute test run will dramatically increase the time it takes to run.
Dirk Pranke
Comment 2 2012-09-13 14:13:17 PDT
CrashReporter actually sticks around after it's done for some longish period of time (~45 seconds or so last I looked) so that it doesn't have to start up again on a subsequent crash. I am not aware of any way to disable that, and there's not any way to tell when CrashReporter is doing something vs. being idle as far as I can tell. So, I'm not sure there's a way to implement your request without adding 45 second pauses into the run on every crash, and I'm not sure that'll be a win on average over just thrashing :(. I'm not sure there's a good solution here. I haven't actually seen the 15 min pause that Stephanie has, though. I spent a lot of time a few months ago trying to figure out what CrashReporter actually did in order to optimize things, and what we do now is the best I could come up with :(.
Alexey Proskuryakov
Comment 3 2012-09-13 14:20:37 PDT
We might also be able to check that the process is idle. It's pretty obvious to a human looking at top output, so there must be some way to detect it in a script.
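One way to approximate "idle" from a script is to sample the CPU column of ps for the reporter process. This is only a sketch under assumptions: the helper names, the 1% threshold, and the use of `ps -axo pcpu,comm` are illustrative choices, not anything NRWT is known to do.

```python
import subprocess


def parse_ps_cpu(ps_output, process_name="ReportCrash"):
    """Return the %CPU values of processes whose executable name matches."""
    usages = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        cpu_text, command = parts
        # The command column may be a full path; compare only the last component.
        if command.strip().split("/")[-1] != process_name:
            continue
        try:
            usages.append(float(cpu_text))
        except ValueError:
            continue
    return usages


def report_crash_is_idle(ps_output, threshold=1.0):
    """True if no matching process is at or above the (assumed) CPU threshold."""
    return all(cpu < threshold for cpu in parse_ps_cpu(ps_output))


def current_ps_output():
    """Take one sample of CPU usage and command name for all processes."""
    return subprocess.check_output(["ps", "-axo", "pcpu,comm"], text=True)
```

A caller would use `report_crash_is_idle(current_ps_output())`. A single low sample does not prove the process has finished its work, so several consecutive samples would probably be needed.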
Dirk Pranke
Comment 4 2012-09-13 14:32:34 PDT
It's possible but likely fragile. I've definitely seen cases where CrashReporter seemed to be doing nothing for a few seconds in top / Activity Monitor and then seemed to wake up and write a log file. I don't know what it was doing but maybe it was blocked on I/O or something I wasn't watching. Not the sort of game I'm super eager to play :(.
Alexey Proskuryakov
Comment 5 2012-09-13 14:48:23 PDT
According to the launchd.plist man page, there are configuration options to pass a timeout to agents. Perhaps we could edit /System/Library/LaunchAgents/com.apple.ReportCrash.plist (or ReportCrash.Self?) to make this work better. I don't know if ReportCrash supports a configurable timeout though.
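As a first step, one could check whether the agent's plist already sets any timeout-related keys. A minimal sketch, assuming the key names `TimeOut` and `ExitTimeOut` as I recall them from the launchd.plist man page; the helper name `timeout_keys` is hypothetical, and whether ReportCrash honors either key is not established here.

```python
import plistlib

# Key names assumed from the launchd.plist man page; verify before relying on them.
TIMEOUT_KEYS = ("TimeOut", "ExitTimeOut")


def timeout_keys(plist_bytes):
    """Return the timeout-related keys present in a launchd job definition."""
    job = plistlib.loads(plist_bytes)
    return {key: job[key] for key in TIMEOUT_KEYS if key in job}
```

Usage would be along the lines of reading the plist file in binary mode and passing its contents to `timeout_keys`.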
Dirk Pranke
Comment 6 2012-09-24 14:29:41 PDT
Hi Alexey, Unless someone has a brighter idea than the ones listed so far, I'm inclined to close this as WONTFIX; it's not that I don't think this should be fixed, but I don't have any good ideas on how to do so, so I consider this "considered" :). Would you prefer we leave this open? If so, I'd just as soon edit the subject to something more problem-focused, like "handle ReportCrash load better" or something.
Alexey Proskuryakov
Comment 7 2012-09-24 14:35:02 PDT
I find this highly desirable regardless of the concerns raised here. It's crazy that we keep running tests (and sometimes parallel ReportCrash instances) when ReportCrash is already running. As Stephanie noted, this can make ReportCrash take up to 15 minutes instead of the few seconds that it normally takes - imagine how much this strains the system, and how much this contributes to other tests' slowness and flakiness. If there is no better way known, let's just look at whether a ReportCrash process exists - we'll waste 45 seconds per crash, but that's small compared to the current situation.
Dirk Pranke
Comment 8 2012-09-24 14:50:12 PDT
(In reply to comment #7)
> I find this highly desirable regardless of concerns raised here.
>
> It's crazy that we keep running tests (and sometimes parallel ReportCrash instances) when ReportCrash is already running. As Stephanie noted, this can make ReportCrash take up to 15 minutes instead of a few seconds that it normally takes - imagine how much this strains the system, and how much this contributes to other tests' slowness and flakiness.
>
> If there is no better way known, let's just look at whether ReportCrash process exists - we'll waste 45 seconds per crash, but it's small compared to current situation

Okay, well, it's not hard to post a patch that will sleep while ReportCrash is running, for you to at least try things and see if you like it.
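The "sleep while ReportCrash is running" idea could look roughly like the following. This is a sketch and not the actual patch: the function names, the one-second poll interval, the 60 second cap, and the use of `pgrep -x` to test for the process are all assumptions.

```python
import subprocess
import time


def report_crash_running():
    """True if a process named exactly ReportCrash exists (pgrep exits 0 on a match)."""
    return subprocess.call(
        ["pgrep", "-x", "ReportCrash"], stdout=subprocess.DEVNULL
    ) == 0


def wait_for_report_crash(is_running=report_crash_running,
                          poll_seconds=1.0,
                          max_wait_seconds=60.0,
                          sleep=time.sleep,
                          clock=time.monotonic):
    """Block until ReportCrash is gone; return False if the cap is reached first."""
    start = clock()
    while is_running():
        if clock() - start >= max_wait_seconds:
            return False  # give up rather than stall the whole run
        sleep(poll_seconds)
    return True
```

The test runner would call `wait_for_report_crash()` before starting each new test. The cap matters because the process reportedly lingers for about 45 seconds after finishing, so an uncapped wait could stall on every crash.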