Summary: | NRWT incorrectly associates crash logs with tests | ||
---|---|---|---|
Product: | WebKit | Reporter: | Simon Fraser (smfr) <simon.fraser> |
Component: | Tools / Tests | Assignee: | Dirk Pranke <dpranke> |
Status: | RESOLVED FIXED | ||
Severity: | Normal | CC: | abarth, ap, aroben, dpranke, eric, jberlin, mitz, ojan, thorton, webkit.review.bot |
Priority: | P2 | ||
Version: | 528+ (Nightly build) | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Bug Depends on: | 73942, 81575, 81580, 81586, 81595, 81598, 81600, 81603 | ||
Bug Blocks: | 64491 | ||
Attachments: |
Description
Simon Fraser (smfr)
2011-11-02 11:34:28 PDT
Interesting. Sounds bad. Will look shortly. It looks like we only wait for ReportCrash to finish when a test times out. We don't wait if the test crashed quickly. The code is all in this area: http://trac.webkit.org/browser/trunk/Tools/Scripts/webkitpy/layout_tests/port/server_process.py?rev=98040#L181 Note that we only call self._executive.wait_newest(self._port.is_crash_reporter) from within _handle_timeout. It seems that self.crashed and self._check_for_crash are about the Python subprocess crashing, not about DRT crashing. (In reply to comment #3) > It seems that self.crashed and self._check_for_crash are about the Python subprocess crashing, not about DRT crashing. My mistake. self.crashed/self._check_for_crash are about ServerProcess's self._proc. self._proc corresponds to DRT/WTR. I think we probably need something like ORWT's processIsCrashing functionality. Created attachment 118060 [details]
Wait for Crash Reporter to finish even when it lets the crashed process die quickly
Comment on attachment 118060 [details]
Wait for Crash Reporter to finish even when it lets the crashed process die quickly
This is only part of the story. Waiting for the newest Crash Reporter instance isn't enough, since we can have multiple Crash Reporter instances going at once. We should probably teach NRWT how to double-check that it's chosen the right crash log, perhaps by comparing PIDs.
BTW, this bug is very easy for me to reproduce: 1. Insert an abort() in DumpRenderTree.mm:runTest 2. run-webkit-tests --no-http --exit-after-n-crashes-or-timeouts=2 (In reply to comment #7) > (From update of attachment 118060 [details]) > Waiting for the newest Crash Reporter instance isn't enough, since we can have multiple Crash Reporter instances going at once. Similarly, always choosing the newest crash log for a given process name isn't enough, since we can have multiple crash logs written for DRT before we go to find the crash log itself. Created attachment 118075 [details]
Wait for Crash Reporter to finish even when it lets the crashed process die quickly
Comment on attachment 118075 [details] Wait for Crash Reporter to finish even when it lets the crashed process die quickly Landed as r102162. My plan is to have ServerProcess collect the crash log itself (after waiting as needed), then pass it along to SingleTestWriter as part of the DriverOutput, rather than having SingleTestWriter find the crash log after the fact as it does today. Unfortunately, ServerProcess doesn't (and shouldn't) know about WebProcess crashes. So maybe this needs to be handled at the WebKitDriver level? We might need to change "#CRASHED - WebProcess\n" to "#CRASHED - WebProcess <pid>\n". (In reply to comment #14) > We might need to change "#CRASHED - WebProcess\n" to "#CRASHED - WebProcess <pid>\n". That sounds like a good idea. :) Comment on attachment 118075 [details] Wait for Crash Reporter to finish even when it lets the crashed process die quickly Cleared Dirk Pranke's review+ from obsolete attachment 118075 [details] so that this bug does not appear in http://webkit.org/pending-commit. i will take a look ... Created attachment 132184 [details]
Patch
Okay, I'm working on this, but there's definitely some ways to go as I continue to characterize how things crash and how we look for them. I *think* (am not positive) that we were being way too conservative in waiting for ReportCrash to complete. If anyone at Apple knew exactly how ReportCrash runs (and when it exits) and felt like telling me, that would be nice :) there's not a lot of info on the interwebs. From what I can tell, at least when we're blocked on a read from DRT, we don't even find out that a process has crashed until after the report has been written to disk. Interestingly, though, the ReportCrash process sticks around for another 20-30 seconds, basically doing nothing (I haven't attached to it to see if it is doing anything). There are certainly other paths that I'm going to need to investigate, in particular figuring out when to wait in a crashed subprocess (haven't tested WTR at all yet). Any feedback on this patch would be welcome; does this fix crash log issues with DRT on SnowLeopard and/or Lion for y'all if running with one DRT at a time, at least? How about multiple DRTs? (Haven't tested that yet, either). Created attachment 132394 [details]
more testing
Created attachment 132438 [details]
Roll-up patch
Comment on attachment 132438 [details] Roll-up patch View in context: https://bugs.webkit.org/attachment.cgi?id=132438&action=review Okay, getting close. Still needs more unit tests, but this should fix the bug and be worth checking out. I haven't tested it on Lion yet, either ... > Tools/Scripts/webkitpy/common/system/crashlogs.py:69 > + errors += "ERROR: Failed to read '%s': %s\n" % (path, str(e)) I'm not sure that we really want to include this in the crash log, but it's useful for debugging race conditions in finding the right crash log and waiting long enough for now ... I'll probably also move the crashlog.py changes into a separate patch. > Tools/Scripts/webkitpy/layout_tests/controllers/test_result_writer.py:43 > + writer = TestResultWriter(port._filesystem, port, root_output_dir, test_name) I'll move the filesystem member variable cleanup into a different patch ... > Tools/Scripts/webkitpy/layout_tests/models/test_expectations.py:662 > + CRASH: ('crash', 'crashes'), this change just removes the DRT string since it isn't right for Webkit2. I'm not sure if this whole block even needs to exist, or if we can just pull it from the TestFailures. > Tools/Scripts/webkitpy/layout_tests/models/test_failures.py:-88 > - @staticmethod I'll probably break this into a separate patch as well ... > Tools/Scripts/webkitpy/layout_tests/port/base.py:1052 > + # FIXME: Does it make sense to push this logic into webkitpy.common.system.crashlogs ? Looking for feedback on this in particular ... > Tools/Scripts/webkitpy/layout_tests/port/chromium.py:610 > + crash=crash, crashed_process_name=crashed_process_name, crashed_pid=crashed_pid, crash_log=crash_log, I'm a bit waffly on whether we really need the process name and the pid *in addition to* the crash log, but it's definitely helpful for now to ensure we're correlating pid to crash log properly. > Tools/Scripts/webkitpy/layout_tests/port/mac.py:164 > + # FIXME: Should most of this be moved into CrashLogs()? thoughts? > Tools/WebKitTestRunner/TestController.cpp:812 > if (!m_didPrintWebProcessCrashedMessage) { Any Apple/Mac people wanna comment on this part? It looks like WKPageGetProcessIdentifier is mac-only; should we get other ports to implement something like this? Comment on attachment 132438 [details]
Roll-up patch
clearing r? since I'm breaking this into multiple patches ...
oookay, I think the 7 listed bugs are about the size of it :) (In fairness, a couple of them are barely-related nits that I stumbled over while working on this). closing as all sub-tasks have been committed as of r111619. let me know if you see any more issues. |