Bug 71380

Summary: NRWT incorrectly associates crash logs with tests
Product: WebKit Reporter: Simon Fraser (smfr) <simon.fraser>
Component: Tools / TestsAssignee: Dirk Pranke <dpranke>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, ap, aroben, dpranke, eric, jberlin, mitz, ojan, thorton, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 73942, 81575, 81580, 81586, 81595, 81598, 81600, 81603    
Bug Blocks: 64491    
Attachments:
Description Flags
Wait for Crash Reporter to finish even when it lets the crashed process die quickly
none
Wait for Crash Reporter to finish even when it lets the crashed process die quickly
none
Patch
none
more testing
none
Roll-up patch none

Attachments
Wait for Crash Reporter to finish even when it lets the crashed process die quickly (5.41 KB, patch)
2011-12-06 09:08 PST, Adam Roben (:aroben)
no flags
Wait for Crash Reporter to finish even when it lets the crashed process die quickly (6.18 KB, patch)
2011-12-06 11:20 PST, Adam Roben (:aroben)
no flags
Patch (31.71 KB, patch)
2012-03-15 20:28 PDT, Dirk Pranke
no flags
more testing (42.49 KB, patch)
2012-03-16 15:21 PDT, Dirk Pranke
no flags
Roll-up patch (49.07 KB, patch)
2012-03-16 20:02 PDT, Dirk Pranke
no flags
Eric Seidel (no email)
Comment 1 2011-11-02 13:13:45 PDT
Interesting. Sounds bad. Will look shortly.
Adam Roben (:aroben)
Comment 2 2011-12-06 07:12:42 PST
It looks like we only wait for ReportCrash to finish when a test times out. We don't wait if the test crashed quickly. The code is all in this area: http://trac.webkit.org/browser/trunk/Tools/Scripts/webkitpy/layout_tests/port/server_process.py?rev=98040#L181 Note that we only call self._executive.wait_newest(self._port.is_crash_reporter) from within _handle_timeout.
Adam Roben (:aroben)
Comment 3 2011-12-06 07:32:39 PST
It seems that self.crashed and self._check_for_crash are about the Python subprocess crashing, not about DRT crashing.
Adam Roben (:aroben)
Comment 4 2011-12-06 07:37:35 PST
(In reply to comment #3) > It seems that self.crashed and self._check_for_crash are about the Python subprocess crashing, not about DRT crashing. My mistake. self.crashed/self._check_for_crash are about ServerProcess's self._proc. self._proc corresponds to DRT/WTR.
Adam Roben (:aroben)
Comment 5 2011-12-06 08:02:34 PST
I think we probably need something like ORWT's processIsCrashing functionality.
Adam Roben (:aroben)
Comment 6 2011-12-06 09:08:14 PST
Created attachment 118060 [details] Wait for Crash Reporter to finish even when it lets the crashed process die quickly
Adam Roben (:aroben)
Comment 7 2011-12-06 09:09:22 PST
Comment on attachment 118060 [details] Wait for Crash Reporter to finish even when it lets the crashed process die quickly This is only part of the story. Waiting for the newest Crash Reporter instance isn't enough, since we can have multiple Crash Reporter instances going at once. We should probably teach NRWT how to double-check that it's chosen the right crash log, perhaps by comparing PIDs.
Adam Roben (:aroben)
Comment 8 2011-12-06 09:10:38 PST
BTW, this bug is very easy for me to reproduce: 1. Insert an abort() in DumpRenderTree.mm:runTest 2. run-webkit-tests --no-http --exit-after-n-crashes-or-timeouts=2
Adam Roben (:aroben)
Comment 9 2011-12-06 09:23:34 PST
(In reply to comment #7) > (From update of attachment 118060 [details]) > Waiting for the newest Crash Reporter instance isn't enough, since we can have multiple Crash Reporter instances going at once. Similarly, always choosing the newest crash log for a given process name isn't enough, since we can have multiple crash logs written for DRT before we go to find the crash log itself.
Adam Roben (:aroben)
Comment 10 2011-12-06 11:20:10 PST
Created attachment 118075 [details] Wait for Crash Reporter to finish even when it lets the crashed process die quickly
Adam Roben (:aroben)
Comment 11 2011-12-06 12:05:06 PST
Comment on attachment 118075 [details] Wait for Crash Reporter to finish even when it lets the crashed process die quickly Landed as r102162.
Adam Roben (:aroben)
Comment 12 2011-12-06 12:06:41 PST
My plan is to have ServerProcess collect the crash log itself (after waiting as needed), then pass it along to SingleTestWriter as part of the DriverOutput, rather than having SingleTestWriter find the crash log after the fact as it does today.
Adam Roben (:aroben)
Comment 13 2011-12-06 12:17:24 PST
Unfortunately, ServerProcess doesn't (and shouldn't) know about WebProcess crashes. So maybe this needs to be handled at the WebKitDriver level?
Adam Roben (:aroben)
Comment 14 2011-12-06 12:17:48 PST
We might need to change "#CRASHED - WebProcess\n" to "#CRASHED - WebProcess <pid>\n".
Eric Seidel (no email)
Comment 15 2011-12-06 12:27:39 PST
(In reply to comment #14) > We might need to change "#CRASHED - WebProcess\n" to "#CRASHED - WebProcess <pid>\n". That sounds like a good idea. :)
Eric Seidel (no email)
Comment 16 2011-12-19 10:49:39 PST
Comment on attachment 118075 [details] Wait for Crash Reporter to finish even when it lets the crashed process die quickly Cleared Dirk Pranke's review+ from obsolete attachment 118075 [details] so that this bug does not appear in http://webkit.org/pending-commit.
Dirk Pranke
Comment 17 2012-03-15 15:36:58 PDT
i will take a look ...
Dirk Pranke
Comment 18 2012-03-15 20:28:09 PDT
Dirk Pranke
Comment 19 2012-03-15 20:33:21 PDT
Okay, I'm working on this, but there's definitely some ways to go as I continue to characterize how things crash and how we look for them. I *think* (am not positive) that we were being way too conservative in waiting for ReportCrash to complete. If anyone at Apple knew exactly how ReportCrash runs (and when it exits) and felt like telling me, that would be nice :) there's not a lot of info on the interwebs. From what I can tell, at least when we're blocked on a read from DRT, we don't even find out that a process has crashed until after the report has been written to disk. Interestingly, though, the ReportCrash process sticks around for another 20-30 seconds, basically doing nothing (I haven't attached to it to see if it is doing anything). There are certainly other paths that I'm going to need to investigate, in particular figuring out when to wait in a crashed subprocess (haven't tested WTR at all yet). Any feedback on this patch would be welcome; does this fix crash log issues with DRT on SnowLeopard and/or Lion for y'all if running with one DRT at a time, at least? How about multiple DRTs? (Haven't tested that yet, either).
Dirk Pranke
Comment 20 2012-03-16 15:21:46 PDT
Created attachment 132394 [details] more testing
Dirk Pranke
Comment 21 2012-03-16 20:02:56 PDT
Created attachment 132438 [details] Roll-up patch
Dirk Pranke
Comment 22 2012-03-16 20:11:07 PDT
Comment on attachment 132438 [details] Roll-up patch View in context: https://bugs.webkit.org/attachment.cgi?id=132438&action=review Okay, getting close. Still needs more unit tests, but this should fix the bug and be worth checking out. I haven't tested it on Lion yet, either ... > Tools/Scripts/webkitpy/common/system/crashlogs.py:69 > + errors += "ERROR: Failed to read '%s': %s\n" % (path, str(e)) I'm not sure that we really want to include this in the crash log, but it's useful for debugging race conditions in finding the right crash log and waiting long enough for now ... I'll probably also move the crashlog.py changes into a separate patch. > Tools/Scripts/webkitpy/layout_tests/controllers/test_result_writer.py:43 > + writer = TestResultWriter(port._filesystem, port, root_output_dir, test_name) I'll move the filesystem member variable cleanup into a different patch ... > Tools/Scripts/webkitpy/layout_tests/models/test_expectations.py:662 > + CRASH: ('crash', 'crashes'), this change just removes the DRT string since it isn't right for Webkit2. I'm not sure if this whole block even needs to exist, or if we can just pull it from the TestFailures. > Tools/Scripts/webkitpy/layout_tests/models/test_failures.py:-88 > - @staticmethod I'll probably break this into a separate patch as well ... > Tools/Scripts/webkitpy/layout_tests/port/base.py:1052 > + # FIXME: Does it make sense to push this logic into webkitpy.common.system.crashlogs ? Looking for feedback on this in particular ... > Tools/Scripts/webkitpy/layout_tests/port/chromium.py:610 > + crash=crash, crashed_process_name=crashed_process_name, crashed_pid=crashed_pid, crash_log=crash_log, I'm a bit waffly on whether we really need the process name and the pid *in addition to* the crash log, but it's definitely helpful for now to ensure we're correlating pid to crash log properly. > Tools/Scripts/webkitpy/layout_tests/port/mac.py:164 > + # FIXME: Should most of this be moved into CrashLogs()? thoughts? > Tools/WebKitTestRunner/TestController.cpp:812 > if (!m_didPrintWebProcessCrashedMessage) { Any Apple/Mac people wanna comment on this part? It looks like WKPageGetProcessIdentifier is mac-only; should we get other ports to implement something like this?
Dirk Pranke
Comment 23 2012-03-19 16:36:36 PDT
Comment on attachment 132438 [details] Roll-up patch clearing r? since I'm breaking this into multiple patches ...
Dirk Pranke
Comment 24 2012-03-19 18:08:27 PDT
oookay, I think the 7 listed bugs are about the size of it :) (In fairness, a couple of them are barely-related nits that I stumbled over while working on this).
Dirk Pranke
Comment 25 2012-03-21 16:39:05 PDT
closing as all sub-tasks have been committed as of r111619. let me know if you see any more issues.
Note You need to log in before you can comment on or make changes to this bug.