Bug 71380

Summary:

NRWT incorrectly associates crash logs with tests

Product:

WebKit

Reporter:

Simon Fraser (smfr) <simon.fraser>

Component:

Tools / Tests

Assignee:

Dirk Pranke <dpranke>

Status:

RESOLVED FIXED

Severity:

Normal

CC:

abarth, ap, aroben, dpranke, eric, jberlin, mitz, ojan, thorton, webkit.review.bot

Priority:

Version:

528+ (Nightly build)

Hardware:

Unspecified

OS:

Unspecified

Bug Depends on:

73942, 81575, 81580, 81586, 81595, 81598, 81600, 81603

Bug Blocks:

64491

Attachments:

Description	Flags
Wait for Crash Reporter to finish even when it lets the crashed process die quickly	none
Wait for Crash Reporter to finish even when it lets the crashed process die quickly	none
Patch	none
more testing	none
Roll-up patch	none

Simon Fraser (smfr)

Reported 2011-11-02 11:34:28 PDT

In these results: http://build.webkit.org/results/SnowLeopard%20Intel%20Debug%20(Tests)/r99063%20(2806)/results.html fast/inline/inline-box-adjust-position-crash2.html is crashing, apparently with the crash log here: http://build.webkit.org/results/SnowLeopard%20Intel%20Debug%20(Tests)/r99063%20(2806)/fast/inline/inline-box-adjust-position-crash2-crash-log.txt However, the stderr output: http://build.webkit.org/results/SnowLeopard%20Intel%20Debug%20(Tests)/r99063%20(2806)/fast/inline/inline-box-adjust-position-crash2-stderr.txt shows that the actual crash log is http://build.webkit.org/results/SnowLeopard%20Intel%20Debug%20(Tests)/r99063%20(2806)/platform/mac/accessibility/select-element-selection-with-optgroups-crash-log.txt

Attachments
Wait for Crash Reporter to finish even when it lets the crashed process die quickly (5.41 KB, patch) 2011-12-06 09:08 PST, Adam Roben (:aroben)	no flags	Details Formatted Diff Diff
Wait for Crash Reporter to finish even when it lets the crashed process die quickly (6.18 KB, patch) 2011-12-06 11:20 PST, Adam Roben (:aroben)	no flags	Details Formatted Diff Diff
Patch (31.71 KB, patch) 2012-03-15 20:28 PDT, Dirk Pranke	no flags	Details Formatted Diff Diff
more testing (42.49 KB, patch) 2012-03-16 15:21 PDT, Dirk Pranke	no flags	Details Formatted Diff Diff
Roll-up patch (49.07 KB, patch) 2012-03-16 20:02 PDT, Dirk Pranke	no flags	Details Formatted Diff Diff
Show Obsolete (4) View All Add attachment proposed patch, testcase, etc.

Eric Seidel (no email)

Comment 1 2011-11-02 13:13:45 PDT

Interesting. Sounds bad. Will look shortly.

Adam Roben (:aroben)

Comment 2 2011-12-06 07:12:42 PST

It looks like we only wait for ReportCrash to finish when a test times out. We don't wait if the test crashed quickly. The code is all in this area: http://trac.webkit.org/browser/trunk/Tools/Scripts/webkitpy/layout_tests/port/server_process.py?rev=98040#L181 Note that we only call self._executive.wait_newest(self._port.is_crash_reporter) from within _handle_timeout.

Adam Roben (:aroben)

Comment 3 2011-12-06 07:32:39 PST

It seems that self.crashed and self._check_for_crash are about the Python subprocess crashing, not about DRT crashing.

Adam Roben (:aroben)

Comment 4 2011-12-06 07:37:35 PST

(In reply to comment #3) > It seems that self.crashed and self._check_for_crash are about the Python subprocess crashing, not about DRT crashing. My mistake. self.crashed/self._check_for_crash are about ServerProcess's self._proc. self._proc corresponds to DRT/WTR.

Adam Roben (:aroben)

Comment 5 2011-12-06 08:02:34 PST

I think we probably need something like ORWT's processIsCrashing functionality.

Adam Roben (:aroben)

Comment 6 2011-12-06 09:08:14 PST

Created attachment 118060 [details] Wait for Crash Reporter to finish even when it lets the crashed process die quickly

Adam Roben (:aroben)

Comment 7 2011-12-06 09:09:22 PST

Comment on attachment 118060 [details] Wait for Crash Reporter to finish even when it lets the crashed process die quickly This is only part of the story. Waiting for the newest Crash Reporter instance isn't enough, since we can have multiple Crash Reporter instances going at once. We should probably teach NRWT how to double-check that it's chosen the right crash log, perhaps by comparing PIDs.

Adam Roben (:aroben)

Comment 8 2011-12-06 09:10:38 PST

BTW, this bug is very easy for me to reproduce: 1. Insert an abort() in DumpRenderTree.mm:runTest 2. run-webkit-tests --no-http --exit-after-n-crashes-or-timeouts=2

Adam Roben (:aroben)

Comment 9 2011-12-06 09:23:34 PST

(In reply to comment #7) > (From update of attachment 118060 [details]) > Waiting for the newest Crash Reporter instance isn't enough, since we can have multiple Crash Reporter instances going at once. Similarly, always choosing the newest crash log for a given process name isn't enough, since we can have multiple crash logs written for DRT before we go to find the crash log itself.

Adam Roben (:aroben)

Comment 10 2011-12-06 11:20:10 PST

Created attachment 118075 [details] Wait for Crash Reporter to finish even when it lets the crashed process die quickly

Adam Roben (:aroben)

Comment 11 2011-12-06 12:05:06 PST

Comment on attachment 118075 [details] Wait for Crash Reporter to finish even when it lets the crashed process die quickly Landed as r102162.

Adam Roben (:aroben)

Comment 12 2011-12-06 12:06:41 PST

My plan is to have ServerProcess collect the crash log itself (after waiting as needed), then pass it along to SingleTestWriter as part of the DriverOutput, rather than having SingleTestWriter find the crash log after the fact as it does today.

Adam Roben (:aroben)

Comment 13 2011-12-06 12:17:24 PST

Unfortunately, ServerProcess doesn't (and shouldn't) know about WebProcess crashes. So maybe this needs to be handled at the WebKitDriver level?

Adam Roben (:aroben)

Comment 14 2011-12-06 12:17:48 PST

We might need to change "#CRASHED - WebProcess\n" to "#CRASHED - WebProcess <pid>\n".

Eric Seidel (no email)

Comment 15 2011-12-06 12:27:39 PST

(In reply to comment #14) > We might need to change "#CRASHED - WebProcess\n" to "#CRASHED - WebProcess <pid>\n". That sounds like a good idea. :)

Eric Seidel (no email)

Comment 16 2011-12-19 10:49:39 PST

Comment on attachment 118075 [details] Wait for Crash Reporter to finish even when it lets the crashed process die quickly Cleared Dirk Pranke's review+ from obsolete attachment 118075 [details] so that this bug does not appear in http://webkit.org/pending-commit.

Dirk Pranke

Comment 17 2012-03-15 15:36:58 PDT

i will take a look ...

Dirk Pranke

Comment 18 2012-03-15 20:28:09 PDT

Created attachment 132184 [details] Patch

Dirk Pranke

Comment 19 2012-03-15 20:33:21 PDT

Okay, I'm working on this, but there's definitely some ways to go as I continue to characterize how things crash and how we look for them. I *think* (am not positive) that we were being way too conservative in waiting for ReportCrash to complete. If anyone at Apple knew exactly how ReportCrash runs (and when it exits) and felt like telling me, that would be nice :) there's not a lot of info on the interwebs. From what I can tell, at least when we're blocked on a read from DRT, we don't even find out that a process has crashed until after the report has been written to disk. Interestingly, though, the ReportCrash process sticks around for another 20-30 seconds, basically doing nothing (I haven't attached to it to see if it is doing anything). There are certainly other paths that I'm going to need to investigate, in particular figuring out when to wait in a crashed subprocess (haven't tested WTR at all yet). Any feedback on this patch would be welcome; does this fix crash log issues with DRT on SnowLeopard and/or Lion for y'all if running with one DRT at a time, at least? How about multiple DRTs? (Haven't tested that yet, either).

Dirk Pranke

Comment 20 2012-03-16 15:21:46 PDT

Created attachment 132394 [details] more testing

Dirk Pranke

Comment 21 2012-03-16 20:02:56 PDT

Created attachment 132438 [details] Roll-up patch

Dirk Pranke

Comment 22 2012-03-16 20:11:07 PDT

Comment on attachment 132438 [details] Roll-up patch View in context: https://bugs.webkit.org/attachment.cgi?id=132438&action=review Okay, getting close. Still needs more unit tests, but this should fix the bug and be worth checking out. I haven't tested it on Lion yet, either ... > Tools/Scripts/webkitpy/common/system/crashlogs.py:69 > + errors += "ERROR: Failed to read '%s': %s\n" % (path, str(e)) I'm not sure that we really want to include this in the crash log, but it's useful for debugging race conditions in finding the right crash log and waiting long enough for now ... I'll probably also move the crashlog.py changes into a separate patch. > Tools/Scripts/webkitpy/layout_tests/controllers/test_result_writer.py:43 > + writer = TestResultWriter(port._filesystem, port, root_output_dir, test_name) I'll move the filesystem member variable cleanup into a different patch ... > Tools/Scripts/webkitpy/layout_tests/models/test_expectations.py:662 > + CRASH: ('crash', 'crashes'), this change just removes the DRT string since it isn't right for Webkit2. I'm not sure if this whole block even needs to exist, or if we can just pull it from the TestFailures. > Tools/Scripts/webkitpy/layout_tests/models/test_failures.py:-88 > - @staticmethod I'll probably break this into a separate patch as well ... > Tools/Scripts/webkitpy/layout_tests/port/base.py:1052 > + # FIXME: Does it make sense to push this logic into webkitpy.common.system.crashlogs ? Looking for feedback on this in particular ... > Tools/Scripts/webkitpy/layout_tests/port/chromium.py:610 > + crash=crash, crashed_process_name=crashed_process_name, crashed_pid=crashed_pid, crash_log=crash_log, I'm a bit waffly on whether we really need the process name and the pid *in addition to* the crash log, but it's definitely helpful for now to ensure we're correlating pid to crash log properly. > Tools/Scripts/webkitpy/layout_tests/port/mac.py:164 > + # FIXME: Should most of this be moved into CrashLogs()? thoughts? > Tools/WebKitTestRunner/TestController.cpp:812 > if (!m_didPrintWebProcessCrashedMessage) { Any Apple/Mac people wanna comment on this part? It looks like WKPageGetProcessIdentifier is mac-only; should we get other ports to implement something like this?

Dirk Pranke

Comment 23 2012-03-19 16:36:36 PDT

Comment on attachment 132438 [details] Roll-up patch clearing r? since I'm breaking this into multiple patches ...

Dirk Pranke

Comment 24 2012-03-19 18:08:27 PDT

oookay, I think the 7 listed bugs are about the size of it :) (In fairness, a couple of them are barely-related nits that I stumbled over while working on this).

Dirk Pranke

Comment 25 2012-03-21 16:39:05 PDT

closing as all sub-tasks have been committed as of r111619. let me know if you see any more issues.

Note You need to log in before you can comment on or make changes to this bug.