Bug 81659

Summary: [GTK] crash log reports support
Product: WebKit Reporter: Philippe Normand <pnormand>
Component: Tools / TestsAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, aroben, dpranke, eric, ojan, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
dpranke: review-
Path
none
Patch
dpranke: review+
Patch mrobinson: review+

Description Philippe Normand 2012-03-20 06:53:25 PDT
When a test crash the "crash log" link in the results page should provide the stacktrace and state of all the running threads.
Comment 1 Philippe Normand 2012-03-20 09:02:55 PDT
Created attachment 132836 [details]
Patch
Comment 2 Philippe Normand 2012-03-20 10:25:57 PDT
I'm not sure about the move of the CrashLog logic to the Port infrastructure, I can undo this change if needed but I'd need the new crashmon/daemontools code to be called only in GTK for now and I don't see how to reliably test this (sys.platform == "linux2" won't work, other ports run on Linux).
Comment 3 Dirk Pranke 2012-03-20 13:08:59 PDT
Comment on attachment 132836 [details]
Patch

I am a bit of two minds as to whether these routines belong on the Port object or under common/system somewhere. I think I agree that figuring out what the right crash log is is a property of the Port and not the operating system, since, for example, chromium uses different mechanisms than apple on the mac or Gtk on Linux.

Adam, Eric, any thoughts?

Apart from that, this approach seems reasonable but we need to merge it with the clean up I'm doing in bugs 81600 and 81603. Also, you might care about the change in bug 81575. 


View in context: https://bugs.webkit.org/attachment.cgi?id=132836&action=review

> Tools/Scripts/webkitpy/common/system/executive.py:264
> +                return e.errno == errno.EPERM

Why is this change needed? I don't understand what was failing before or is fixed now ...

> Tools/Scripts/webkitpy/layout_tests/controllers/test_result_writer.py:65
> +            writer.write_crash_report(crashed_driver_output.crashed_process_name, crashed_driver_output.error, driver_output.pid)

We should coordinate this change with my patches in bug 81603 and 81600 - the driver should return the crash log and the test result writer should just use that.

> Tools/Scripts/webkitpy/layout_tests/controllers/test_result_writer.py:166
> +        fs.write_text_file(filename, log if log else error.decode("utf-8"))

see above.

> Tools/Scripts/webkitpy/layout_tests/port/driver.py:67
> +            test_time=0, timeout=False, error='', crashed_process_name=None, pid=None):

see above; I think this should be crashed_pid, not pid.

> Tools/Scripts/webkitpy/layout_tests/port/gtk.py:146
> +        return process_name == 'gdb-dump'

This function goes away w/ the patch in bug 81600 ...

> Tools/Scripts/webkitpy/tool/commands/queries.py:375
> +        for port_name in tool.port_factory.all_port_names():

You probably don't want to iterate across all_port_names(), since that could presumably get you the same crash log multiple times.
Comment 4 Philippe Normand 2012-03-21 02:41:12 PDT
(In reply to comment #3)
> (From update of attachment 132836 [details])
> I am a bit of two minds as to whether these routines belong on the Port object or under common/system somewhere. I think I agree that figuring out what the right crash log is is a property of the Port and not the operating system, since, for example, chromium uses different mechanisms than apple on the mac or Gtk on Linux.
> 
> Adam, Eric, any thoughts?
> 
> Apart from that, this approach seems reasonable but we need to merge it with the clean up I'm doing in bugs 81600 and 81603. Also, you might care about the change in bug 81575. 
> 

Ok I'll hold my horse on these bugs then :)

> 
> View in context: https://bugs.webkit.org/attachment.cgi?id=132836&action=review
> 
> > Tools/Scripts/webkitpy/common/system/executive.py:264
> > +                return e.errno == errno.EPERM
> 
> Why is this change needed? I don't understand what was failing before or is fixed now ...
> 

When I was testing our crashmon script and running it as root I was hitting this issue. Running the tests as a normal user I wasn't able to send the 0 signal to the crashmon process executed by root.

> > Tools/Scripts/webkitpy/layout_tests/controllers/test_result_writer.py:65
> > +            writer.write_crash_report(crashed_driver_output.crashed_process_name, crashed_driver_output.error, driver_output.pid)
> 
> We should coordinate this change with my patches in bug 81603 and 81600 - the driver should return the crash log and the test result writer should just use that.
> 
> > Tools/Scripts/webkitpy/layout_tests/controllers/test_result_writer.py:166
> > +        fs.write_text_file(filename, log if log else error.decode("utf-8"))
> 
> see above.
> 
> > Tools/Scripts/webkitpy/layout_tests/port/driver.py:67
> > +            test_time=0, timeout=False, error='', crashed_process_name=None, pid=None):
> 
> see above; I think this should be crashed_pid, not pid.
> 
> > Tools/Scripts/webkitpy/layout_tests/port/gtk.py:146
> > +        return process_name == 'gdb-dump'
> 
> This function goes away w/ the patch in bug 81600 ...
> 
> > Tools/Scripts/webkitpy/tool/commands/queries.py:375
> > +        for port_name in tool.port_factory.all_port_names():
> 
> You probably don't want to iterate across all_port_names(), since that could presumably get you the same crash log multiple times.

So when I find the first, print it and exit early?
Comment 5 Dirk Pranke 2012-03-21 13:37:17 PDT
(In reply to comment #4)
> > > Tools/Scripts/webkitpy/common/system/executive.py:264
> > > +                return e.errno == errno.EPERM
> > 
> > Why is this change needed? I don't understand what was failing before or is fixed now ...
> > 
> 
> When I was testing our crashmon script and running it as root I was hitting this issue. Running the tests as a normal user I wasn't able to send the 0 signal to the crashmon process executed by root.
>

OK.
 
> > > Tools/Scripts/webkitpy/tool/commands/queries.py:375
> > > +        for port_name in tool.port_factory.all_port_names():
> > 
> > You probably don't want to iterate across all_port_names(), since that could presumably get you the same crash log multiple times.
> 
> So when I find the first, print it and exit early?

That's probably as good as anything else I can think of.
Comment 6 Philippe Normand 2012-03-27 10:20:35 PDT
Created attachment 134094 [details]
Path
Comment 7 Martin Robinson 2012-03-27 14:52:14 PDT
Comment on attachment 134094 [details]
Path

View in context: https://bugs.webkit.org/attachment.cgi?id=134094&action=review

Looks good, but I'd rather someone more familiar with the crash dumping code do the final review. I have left some comments below.

> Tools/BuildSlaveSupport/gtk/README:-23
> -==============================================
> - Running a GTK+ build slave under daemontools
> -==============================================
> -
> -This directory contains several scripts which can be used to run a WebKitGTK+
> -build slave under daemontools [1]. This is convenient because daemontools
> -will automatically restart services when they die, and that means less human
> -intervention is needed.
> -
> -
> -Dependencies
> -============
> -
> -In order to use the provided service control files, you will need the
> -following:
> -
> -* The GNU Bash shell (the scripts contain some bash-isms)
> -
> -* The daemontools package (or one of its drop-in replacements, like runit
> -  or freedt; but only daemontools has been tested so far).
> -
> -* The crash dump monitor also uses "inotifywait" (part of inotify-tools [2])
> -

Did you mean to remove all of this README? Not all of it deals with crashmon.

> Tools/Scripts/new-run-webkit-tests:61
> +        jhbuild = os.path.join(script_dir, '..', 'gtk', 'run-with-jhbuild')
> +        cmd.insert(1, jhbuild)

This can be one line.

> Tools/Scripts/webkitpy/layout_tests/port/gtk.py:156
> +        name_str = name or '<unknown process name>'
> +        pid_str = str(pid or '<unknown>')

Okay, just a nit but you should use full words, unless the abbreviation would be more canonical. So IMO pid is good and str is bad.

In what cases is pid == None? If pid can be None and name can be None shouldn't you handle that case explicitly and do an early return?

> Tools/Scripts/webkitpy/layout_tests/port/gtk.py:164
> +        expected_crash_dump_filename = "core-pid_%s-_-process_%s" % (pid_str, name_str)
> +        if pid:
> +            match_filename = lambda name: name == expected_crash_dump_filename
> +        else:
> +            match_filename = lambda name: name.find(name_str) > -1

I think would be more clear as:

def match_filename(name):
    if pid:
        return name == expected_crash_dump_filename
    return name.find(name_str) > -1

If it's frequent that pid is None, wouldn't it be better to simply find the dump with the most recent timestamp?

> Tools/Scripts/webkitpy/layout_tests/port/gtk.py:166
> +        for path in reversed(sorted(dumps)):

Perhaps the sorting here ensures that the most recent dump is selected? I'm just guessing, so maybe it deserves a comment.

> Tools/Scripts/webkitpy/layout_tests/port/gtk.py:170
> +        stderr_lines = errors + (stderr or '<empty>').decode('utf8').splitlines()

What guarantee do we have that stderr is in UTF-8 format?
Comment 8 Dirk Pranke 2012-03-27 19:12:36 PDT
Comment on attachment 134094 [details]
Path

The patch generally looks pretty good to me, just a few nits ...

>> Tools/Scripts/new-run-webkit-tests:61
>> +        cmd.insert(1, jhbuild)
> 
> This can be one line.

Ew, this feels like a hack; then again, so is much of this file :(

>> Tools/Scripts/webkitpy/layout_tests/port/gtk.py:156
>> +        pid_str = str(pid or '<unknown>')
> 
> Okay, just a nit but you should use full words, unless the abbreviation would be more canonical. So IMO pid is good and str is bad.
> 
> In what cases is pid == None? If pid can be None and name can be None shouldn't you handle that case explicitly and do an early return?

pid is None if WebProcess crashes but we're not told the pid of the WebProcess (which I think is currently always on the GTk, and will be until you add support for that). name is probably never None now.

> Tools/Scripts/webkitpy/layout_tests/port/gtk.py:165
> +        dumps = self._filesystem.files_under(log_directory)

note that you can pass match_filename as a filter to files_under() directly.

>> Tools/Scripts/webkitpy/layout_tests/port/gtk.py:170
>> +        stderr_lines = errors + (stderr or '<empty>').decode('utf8').splitlines()
> 
> What guarantee do we have that stderr is in UTF-8 format?

I'm not actually sure that it's always (or even ever) in UTF-8; I saw a crash earlier today on the chromium mac 10.6 bot when we failed to decode something as utf-8, so we might want to trap a codec error here and try it the other way as well.
Comment 9 Philippe Normand 2012-03-28 01:13:10 PDT
Created attachment 134234 [details]
Patch
Comment 10 Dirk Pranke 2012-03-28 12:15:53 PDT
Comment on attachment 134234 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=134234&action=review

> Tools/Scripts/webkitpy/layout_tests/port/gtk.py:173
> +        stderr_lines = errors + (stderr or '<empty>').decode('utf8', errors='ignore').splitlines()

errors='ignore', huh? didn't know about that ...
Comment 11 Philippe Normand 2012-03-30 01:12:03 PDT
Thanks for the review! However I think I'll upload a new iteration of the patch, removing the --coredump-dir command line and using an environment variable instead. It should ease deployment on the bots.
Comment 12 Philippe Normand 2012-03-30 02:58:02 PDT
Created attachment 134760 [details]
Patch
Comment 13 Philippe Normand 2012-04-03 08:23:30 PDT
Committed r113037: <http://trac.webkit.org/changeset/113037>