Bug 159827 - run-webkit-tests should trigger a spindump when WebContent process is unresponsive
Summary: run-webkit-tests should trigger a spindump when WebContent process is unrespo...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests (show other bugs)
Version: WebKit Nightly Build
Hardware: All All
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-07-15 13:25 PDT by Jonathan Bedard
Modified: 2016-08-08 09:20 PDT (History)
7 users (show)

See Also:


Attachments
Patch (7.68 KB, patch)
2016-07-15 13:47 PDT, Jonathan Bedard
no flags Details | Formatted Diff | Diff
Patch (7.43 KB, patch)
2016-07-15 15:54 PDT, Jonathan Bedard
no flags Details | Formatted Diff | Diff
Patch (7.53 KB, patch)
2016-07-19 10:09 PDT, Jonathan Bedard
no flags Details | Formatted Diff | Diff
Patch (9.98 KB, patch)
2016-07-19 15:32 PDT, Jonathan Bedard
no flags Details | Formatted Diff | Diff
Patch (9.79 KB, patch)
2016-07-20 11:32 PDT, Jonathan Bedard
no flags Details | Formatted Diff | Diff
Patch (7.12 KB, patch)
2016-07-21 15:38 PDT, Jonathan Bedard
no flags Details | Formatted Diff | Diff
Patch (6.65 KB, patch)
2016-07-22 11:48 PDT, Jonathan Bedard
no flags Details | Formatted Diff | Diff
Patch (6.59 KB, patch)
2016-07-26 15:56 PDT, Jonathan Bedard
no flags Details | Formatted Diff | Diff
Patch (6.73 KB, patch)
2016-07-26 16:32 PDT, Jonathan Bedard
no flags Details | Formatted Diff | Diff
Patch (6.83 KB, patch)
2016-07-27 10:17 PDT, Jonathan Bedard
no flags Details | Formatted Diff | Diff
Patch (6.77 KB, patch)
2016-07-27 11:05 PDT, Jonathan Bedard
no flags Details | Formatted Diff | Diff
Patch (7.17 KB, patch)
2016-07-27 14:47 PDT, Jonathan Bedard
no flags Details | Formatted Diff | Diff
Patch (6.86 KB, patch)
2016-07-29 13:15 PDT, Jonathan Bedard
no flags Details | Formatted Diff | Diff
Patch (6.86 KB, patch)
2016-07-29 13:17 PDT, Jonathan Bedard
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ews126 for ios-simulator-wk2 (915.94 KB, application/zip)
2016-07-29 14:21 PDT, Build Bot
no flags Details
Patch (6.70 KB, patch)
2016-07-29 14:27 PDT, Jonathan Bedard
no flags Details | Formatted Diff | Diff
Patch (6.70 KB, patch)
2016-08-02 10:52 PDT, Jonathan Bedard
no flags Details | Formatted Diff | Diff
Patch (9.51 KB, patch)
2016-08-04 15:14 PDT, Jonathan Bedard
no flags Details | Formatted Diff | Diff
Patch (9.92 KB, patch)
2016-08-04 17:34 PDT, Jonathan Bedard
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Jonathan Bedard 2016-07-15 13:25:31 PDT
Processes which hang and are killed by the WTR::TestInvocator leave no stack trace or logging details.  The Python test scripts should run spindump and save the output file so that such errors can be debugged, particularly should they happen on bots.
Comment 1 Jonathan Bedard 2016-07-15 13:47:03 PDT
Created attachment 283786 [details]
Patch
Comment 2 Jonathan Bedard 2016-07-15 15:54:33 PDT
Created attachment 283805 [details]
Patch
Comment 3 Alexey Proskuryakov 2016-07-15 15:59:40 PDT
Have you considered adding an option for this, and/or generalizing --(no-)sample?
Comment 4 Jonathan Bedard 2016-07-18 10:27:30 PDT
Yes, I did consider adding an option.  I hadn't implemented it yet because I think it should be opt-out, and given how rarely it should run (which is ideally never) I wasn't and am still not sure if it's even worth an option.
Comment 5 David Kilzer (:ddkilzer) 2016-07-18 15:46:05 PDT
Comment on attachment 283805 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=283805&action=review

Overall this looks good, but please post a revised patch addressing the comments below.  r- to address feedback.

> Tools/ChangeLog:19
> +        (WTR::TestInvocation:: triggerSpindump): Exchange spindump data with python through stdin/stderr

Spindump logs can be quite large.  Where will this text output show up in the test results, as stderr output for the test?

It's okay to send the contents via stderr as an initial experiment (so I'm not asking you to rewrite this patch), but if we find it useful (or it happens more often than we think), we should consider writing the spindump log to a file, and adding the file to the results page so it's a separate entity.  (This also makes it easy to create a Safari extension to process the spindump log when visiting a test results page, rather than parsing it out of a chunk of stderr text.)

> Tools/Scripts/webkitpy/port/driver.py:38
>  import sys
>  import time
>  import os
> +import subprocess

Nit: We try to keep these sorted in alphabetical order.  Please put 'subprocess' between 'shlex' and 'sys', or just re-alphabetize both.

> Tools/Scripts/webkitpy/port/driver.py:436
> +        match = re.search('pid (\d+)', error_line)

Is it worth matching the parenthesis here?  (pid NNNN)

No change necessary to land the patch; just asking the question since it may reduce the off chance of a false-positive match.

> Tools/Scripts/webkitpy/port/driver.py:445
> +            parsed_filename = os.path.splitext(self._test_name)[0]
> +            path_location = os.path.split(parsed_filename)[0]
> +
> +            if not os.path.exists(self._port.results_directory()):
> +                os.makedirs(self._port.results_directory())
> +            if not os.path.exists(self._port.results_directory() + '/' + path_location):
> +                os.makedirs(self._port.results_directory() + '/' + path_location)

Was this code copied from somewhere?  If so, seems like it should be pulled into a new method.  (Not required to fix to land this patch.)

> Tools/WebKitTestRunner/TestInvocation.cpp:211
> +    sprintf(spindumpMessage, "#TRIGGER SPINDUMP - %s (pid %ld)\n", TestController::webProcessName(), static_cast<long>(pid));
> +#else
> +    sprintf(spindumpMessage, "#TRIGGER SPINDUMP - %s\n", TestController::webProcessName());

I know dumpWebProcessUnresponsiveness() doesn't use it, but please use snprintf() here and pull the length into a 'const size_t length' variable.

> Tools/WebKitTestRunner/TestInvocation.cpp:219
> +    if (!fgets(spindumpMessage, 1024, stdin)
> +        || strcmp(spindumpMessage, "#SPINDUMP FINISHED\n"))

Formatting: IIRC, this should be on one line, or braces should be used for the body of the if statement.
Comment 6 Daniel Bates 2016-07-18 17:57:36 PDT
Comment on attachment 283805 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=283805&action=review

> Tools/Scripts/webkitpy/port/driver.py:452
> +
> +            spindump_process = subprocess.Popen(['sudo', '-A', 'spindump', str(child_process_pid), '10', '-file', self._port.results_directory() + '/' + parsed_filename + '.spindump.txt'], stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
> +            read_line = spindump_process.stdout.readline()
> +            while read_line:
> +                if read_line:
> +                    _log.debug(read_line)
> +                read_line = spindump_process.stdout.readline()

This assumes password-less sudo access or that sudo was invoked within the password prompt timeout period. Otherwise, we will hang waiting for a password. This is not the correct place for such port-specific code. It should be in the port-specific Port object. Use Port.sample_process() as an example.
Comment 7 Alexey Proskuryakov 2016-07-18 18:02:15 PDT
> This assumes password-less sudo access or that sudo was invoked within the password prompt timeout period. Otherwise, we will hang waiting for a password.

With the -A flag, sudo fails immediately, unless SUDO_ASKPASS is in the environment. Jonathan and I discussed this, but I forgot whether the script needed to explicitly unset SUDO_ASKPASS.
Comment 8 Jonathan Bedard 2016-07-19 10:01:49 PDT
Comment on attachment 283805 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=283805&action=review

>> Tools/ChangeLog:19
>> +        (WTR::TestInvocation:: triggerSpindump): Exchange spindump data with python through stdin/stderr
> 
> Spindump logs can be quite large.  Where will this text output show up in the test results, as stderr output for the test?
> 
> It's okay to send the contents via stderr as an initial experiment (so I'm not asking you to rewrite this patch), but if we find it useful (or it happens more often than we think), we should consider writing the spindump log to a file, and adding the file to the results page so it's a separate entity.  (This also makes it easy to create a Safari extension to process the spindump log when visiting a test results page, rather than parsing it out of a chunk of stderr text.)

My comments here are misleading, partially because the original implementation did dump the spindump log to stdout.  Currently, the spindump log is written to a file.

>> Tools/Scripts/webkitpy/port/driver.py:436
>> +        match = re.search('pid (\d+)', error_line)
> 
> Is it worth matching the parenthesis here?  (pid NNNN)
> 
> No change necessary to land the patch; just asking the question since it may reduce the off chance of a false-positive match.

It shouldn't be needed, as the #TRIGGER SPINDUMP - (\S+) should have already checked for the matching parenthesis

>> Tools/Scripts/webkitpy/port/driver.py:445
>> +                os.makedirs(self._port.results_directory() + '/' + path_location)
> 
> Was this code copied from somewhere?  If so, seems like it should be pulled into a new method.  (Not required to fix to land this patch.)

It was not copied, I'm just not sure how much of it is needed (or exactly how the 'makedris' function works in Python).  The purpose of this code is to guarantee that the location spindump is sending it's output to actually exists.  It seems to me to be connected closely enough to the detection of the trigger output from the C++ program to justify being one function, although it may be sensible to separate the trigger detection from the actual spindump call, particularly if it turns out we want to call spindump from other places as well.

>> Tools/Scripts/webkitpy/port/driver.py:452
>> +                read_line = spindump_process.stdout.readline()
> 
> This assumes password-less sudo access or that sudo was invoked within the password prompt timeout period. Otherwise, we will hang waiting for a password. This is not the correct place for such port-specific code. It should be in the port-specific Port object. Use Port.sample_process() as an example.

That's what the '-A' does.  If password-less sudo is not enabled (and it is enabled on the automation machines) this will simply continue execution without calling spindump (and will output 'sudo: no askpass program specified, try setting SUDO_ASKPASS'), which should be fine since this is designed to catch process hangs on automation machines.
Comment 9 Jonathan Bedard 2016-07-19 10:09:12 PDT
Created attachment 284016 [details]
Patch
Comment 10 Jonathan Bedard 2016-07-19 13:30:42 PDT
SUDO_ASK_PASS essentially points to a script to be run.  The idea being that a user can define, for example, a pop-up which is triggered for cases when a program without a command-line interface needs to access sudo permissions.  The question here is how do we want this to work?  Given that SUDO_ASK_PASS is essentially designed for this use-case, I think we're good.  That being said, I have questions as to weather running this script without some sort of timeout is prudent.
Comment 11 Jonathan Bedard 2016-07-19 15:32:34 PDT
Created attachment 284062 [details]
Patch
Comment 12 Alexey Proskuryakov 2016-07-19 15:45:47 PDT
Comment on attachment 284062 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=284062&action=review

> Tools/WebKitTestRunner/TestInvocation.cpp:216
> +    fputs("If running manually, hit enter...\n", stderr);

Would isatty(stdout) avoid the need for interaction in this case?
Comment 13 Jonathan Bedard 2016-07-19 15:55:06 PDT
Comment on attachment 284062 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=284062&action=review

>> Tools/WebKitTestRunner/TestInvocation.cpp:216
>> +    fputs("If running manually, hit enter...\n", stderr);
> 
> Would isatty(stdout) avoid the need for interaction in this case?

Would a script being driven by Python consider itself to by run in a terminal context?  I think isatty(stdout) might work, assuming pipes (which is what Python is using, correct?) don't count as terminal contexts.  That's a handy little function, haven't seen it before.
Comment 14 Daniel Bates 2016-07-19 19:49:11 PDT
Comment on attachment 284062 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=284062&action=review

> Tools/Scripts/webkitpy/port/ios.py:434
> +        spindump_process = subprocess.Popen(['sudo', '-A', 'spindump', str(pid), '10', '-file', self.results_directory() + '/' + parsed_filename + '.spindump.txt'], stdout=subprocess.PIPE, stderr=subprocess.STDOUT)

Can we make use of option -n instead of -A to have sudo exit with an error if the command requires a password?

From some experiments, "sudo -n" seems to work and avoids prompting for a password if either the current user has password-less sudo access (marked NOPASSWD: ALL in the sudoers file) or if "sudo -n" was invoked within the password prompt timeout period. If a password is required then sudo will exit with an exit code of 1.

> Tools/Scripts/webkitpy/port/ios.py:439
> +        read_line = spindump_process.stdout.readline()
> +        while read_line:
> +            if read_line:
> +                _log.debug(read_line)
> +            read_line = spindump_process.stdout.readline()

We should be using Executive.run_command() or Executive.run_and_throw_if_fail() to run this command (with the executive object on the port object) instead of using subprocess directly. Unless you feel that the messages emitted by spindump(8) to explain each step that it is performing is beneficial, I would use Executive.run_command() and pass return_exit_code=True to avoid capturing any standard output.
Comment 15 Jonathan Bedard 2016-07-20 09:53:11 PDT
Comment on attachment 284062 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=284062&action=review

>> Tools/Scripts/webkitpy/port/ios.py:434
>> +        spindump_process = subprocess.Popen(['sudo', '-A', 'spindump', str(pid), '10', '-file', self.results_directory() + '/' + parsed_filename + '.spindump.txt'], stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
> 
> Can we make use of option -n instead of -A to have sudo exit with an error if the command requires a password?
> 
> From some experiments, "sudo -n" seems to work and avoids prompting for a password if either the current user has password-less sudo access (marked NOPASSWD: ALL in the sudoers file) or if "sudo -n" was invoked within the password prompt timeout period. If a password is required then sudo will exit with an exit code of 1.

sudo -n may be a better fit, particularly if people don't define SUDO_ASKPASS.  I suppose that what this ultimately comes down to is which functionality is preferred.  Do we want the user to be prompted if SUDO_ASKPASS is defined through whatever mechanism they personally provided in that script, and fail if such a method is not defined, or simply fail if the command cannot get achieve sudo permissions?  For the time being, I think -n is probably the safer choice, although it may be less useful.
Comment 16 Alexey Proskuryakov 2016-07-20 09:58:31 PDT
I agree, let's go with -n. We can always switch to -A in the future if there is a need for SUDO_ASKPASS.
Comment 17 Jonathan Bedard 2016-07-20 11:32:31 PDT
Created attachment 284131 [details]
Patch
Comment 18 Daniel Bates 2016-07-20 22:55:04 PDT
Comment on attachment 284131 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=284131&action=review

> Tools/ChangeLog:7
> +

Please explain the motivation for this change. From what I can tell the spindump(8) report is the concatenation of sample(1)ing all running processing plus some additional data. I take it that our interest in a spindump report is to capture the state of other WebKit processes (e.g. the Network process) when a WebContent process hangs to help diagnose a hard-to-reproduce test failure.

> Tools/Scripts/webkitpy/port/driver.py:192
> +        if driver_input.test_name.startswith('http://') or driver_input.test_name.startswith('https://')  or driver_input.test_name.startswith('file:///'):
> +            self._test_name = self.uri_to_test(driver_input.test_name)
> +        else:
> +            self._test_name = driver_input.test_name

See my remark in _check_for_spindump_trigger() and in run_spindump() (defined in ios.py) that suggest making use of the existing machinery for sample reports. If we can make use of this machinery then we can remove this code.

> Tools/Scripts/webkitpy/port/driver.py:440
> +        if not error_line.startswith("#TRIGGER SPINDUMP - "):
> +            return False
> +
> +        match = re.match('#TRIGGER SPINDUMP - (\S+)', error_line)
> +        child_process_name = match.group(1) if match else 'WebProcess'
> +        match = re.search('pid (\d+)', error_line)
> +        child_process_pid = int(match.group(1)) if match else None
> +        if child_process_pid:
> +            self._port.run_spindump(self._test_name, child_process_pid)
> +        self._server_process.write('#SPINDUMP FINISHED\n')
> +        return True

Is it necessary to have separate logic for spin dumps? I mean, this patch unconditionally replaces the machinery for sampling a process using sample(1) with machinery to call spindump(8). I do not see the need to only do one or the other, especially given that spindump(8) can only be run by root. Could we make use of the existing "PROCESS UNRESPONSIVE" message and incorporate the spin dump logic into Port.sample_process()? Could we incorporate the spin dump logic into Port.sample_process() such that we first call spindump(8) and if that command returns a non-zero exit status (say, the current user does not have password-less sudo access) then we call sample(1) on the WebContent process as we do today?

> Tools/Scripts/webkitpy/port/ios.py:432
> +        parsed_filename = os.path.splitext(test_name)[0]
> +        path_location = os.path.split(parsed_filename)[0]
> +
> +        if not os.path.exists(self.results_directory()):
> +            os.makedirs(self.results_directory())
> +        if not os.path.exists(self.results_directory() + '/' + path_location):
> +            os.makedirs(self.results_directory() + '/' + path_location)

Can we make use of the existing machinery (*) that copies sample(1) reports to the appropriate place in the results directory? If it makes sense to incorporate the spin dump logic into Port.sample_process() then we likely can pass Port.sample_file_path(..., ...) as the path for the -file argument to spindump(8) and make use of the existing machinery the collect samples and moves then to be near the test results for a failed test.

(*) After running a test Manager._look_for_new_crash_logs() is called to collect crash reports and sample reports. Manager._look_for_new_crash_logs() calls Port.look_for_new_samples() for the list of sample reports that were written and uses TestResultWriter.copy_sample_file() to copy each sample report to the appropriate place in the results directory.

> Tools/WebKitTestRunner/TestInvocation.cpp:212
> +    char spindumpMessage[1024];
> +#if PLATFORM(COCOA)
> +    pid_t pid = WKPageGetProcessIdentifier(TestController::singleton().mainWebView()->page());
> +    snprintf(spindumpMessage, 1024, "#TRIGGER SPINDUMP - %s (pid %ld)\n", TestController::webProcessName(), static_cast<long>(pid));
> +#else
> +    snprintf(spindumpMessage, 1024, "#TRIGGER SPINDUMP - %s\n", TestController::webProcessName());
> +#endif

It it necessary to emit a new message here "TRIGGER SPINDUMP"? Could we make use of the existing "#PROCESS UNRESPONSIVE" message and teach run-webkit-test to call spindump(8) when it sees this message instead of sample(1) on applicable platforms?

> Tools/WebKitTestRunner/TestInvocation.cpp:215
> +    if (!TestController::singleton().usingServerMode())
> +        return;

Can you elaborate on this?

> Tools/WebKitTestRunner/TestInvocation.cpp:216
> +    fputs("If running manually, hit enter...\n", stderr);

Is this a workflow that most people will make use of? Running WebKitTestRunner in server mode and wanting to invoke spindump(8) by hand?

> Tools/WebKitTestRunner/TestInvocation.cpp:218
> +    if (!fgets(spindumpMessage, 1024, stdin) || strcmp(spindumpMessage, "#SPINDUMP FINISHED\n"))

Is it necessary to look for the message "#SPINDUMP FINISHED\n"? Would it be sufficient to use fgetc(3) to read a single newline character or EOF character and teach run-webkit-tests to emit a single newline character?

> Tools/WebKitTestRunner/TestInvocation.cpp:219
> +        fputs("Spindump return failure\n", stderr);

I take it that this output is helpful.
Comment 19 Daniel Bates 2016-07-20 23:16:38 PDT
Comment on attachment 284131 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=284131&action=review

> Tools/Scripts/webkitpy/port/ios.py:434
> +            self._executive.run_command(['sudo', '-n', 'spindump', str(pid), '10', '-file', self.results_directory() + '/' + parsed_filename + '.spindump.txt'])

Among other reasons it is good form to make use of full filesystem paths when invoking a command (even though the underlying implementation of Executive.run_command() ultimately invokes execvp(3), which uses the PATH environment variable to resolve program paths). In particular, we should use the full filesystem path for both sudo and spindump, /usr/bin/sudo and /usr/sbin/spindump, respectively.
Comment 20 Jonathan Bedard 2016-07-21 12:28:29 PDT
Comment on attachment 284131 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=284131&action=review

>> Tools/ChangeLog:7
>> +
> 
> Please explain the motivation for this change. From what I can tell the spindump(8) report is the concatenation of sample(1)ing all running processing plus some additional data. I take it that our interest in a spindump report is to capture the state of other WebKit processes (e.g. the Network process) when a WebContent process hangs to help diagnose a hard-to-reproduce test failure.

The motivation for this change was a flakey timeout which occurred before a test had actually started.  In that use case, no crash report was generated

>> Tools/Scripts/webkitpy/port/driver.py:440
>> +        return True
> 
> Is it necessary to have separate logic for spin dumps? I mean, this patch unconditionally replaces the machinery for sampling a process using sample(1) with machinery to call spindump(8). I do not see the need to only do one or the other, especially given that spindump(8) can only be run by root. Could we make use of the existing "PROCESS UNRESPONSIVE" message and incorporate the spin dump logic into Port.sample_process()? Could we incorporate the spin dump logic into Port.sample_process() such that we first call spindump(8) and if that command returns a non-zero exit status (say, the current user does not have password-less sudo access) then we call sample(1) on the WebContent process as we do today?

Maybe not.  There is another bug (r27472618) which may be related to this, and may have contributed to the difficulty tracking down the bug which created a need for this change.

>> Tools/Scripts/webkitpy/port/ios.py:432
>> +            os.makedirs(self.results_directory() + '/' + path_location)
> 
> Can we make use of the existing machinery (*) that copies sample(1) reports to the appropriate place in the results directory? If it makes sense to incorporate the spin dump logic into Port.sample_process() then we likely can pass Port.sample_file_path(..., ...) as the path for the -file argument to spindump(8) and make use of the existing machinery the collect samples and moves then to be near the test results for a failed test.
> 
> (*) After running a test Manager._look_for_new_crash_logs() is called to collect crash reports and sample reports. Manager._look_for_new_crash_logs() calls Port.look_for_new_samples() for the list of sample reports that were written and uses TestResultWriter.copy_sample_file() to copy each sample report to the appropriate place in the results directory.

My concern with integrating the spin dump logic into Port.sample_process() is that we would lose the hook back to the TestInvocation through stdin.  I kept spin dump separate because unlike sample_process(), spin dump is actually preventing the TestInvocation process from continuing.

It's also possible that I have misunderstood the point of sample_process() and spin dump does belong there.  I think it is worth noting, however, that the cause of this change was a flakey test which specifically wasn't able to be debugged through sample_process().

>> Tools/WebKitTestRunner/TestInvocation.cpp:215
>> +        return;
> 
> Can you elaborate on this?

The point of this snippet of code (per Alexey) is to continue on without waiting for a stdin trigger from the driving process.  The m_usingServerMode flag is used in the testController to know when to receive testing arguments from standard in (at least, this is my understanding of it).  This should mean that for most cases where the tests are being manually driven, the spindump trigger will do nothing.

>> Tools/WebKitTestRunner/TestInvocation.cpp:216
>> +    fputs("If running manually, hit enter...\n", stderr);
> 
> Is this a workflow that most people will make use of? Running WebKitTestRunner in server mode and wanting to invoke spindump(8) by hand?

No, see above.  WebKitTestRunner should be driven by a script if being run in server mode, but, if for some reason WebKitTestRunnner is being run in server mode manually, we would want to alert the user that the application is waiting for input.

>> Tools/WebKitTestRunner/TestInvocation.cpp:219
>> +        fputs("Spindump return failure\n", stderr);
> 
> I take it that this output is helpful.

The intention of this 'if' statement and it's pickiness is to provide some logging information indicating something has gone wrong with spindump.  Since this code path will be run rarely, I thought that this statement (and the pickiness of the stdin read) would be helpful for any future debugging
Comment 21 Jonathan Bedard 2016-07-21 12:29:21 PDT
Putting this bug on hold for the time being.  It may be related to a more general problem with process crashes uncovered today.
Comment 22 Jonathan Bedard 2016-07-21 15:38:56 PDT
Created attachment 284272 [details]
Patch
Comment 23 Jonathan Bedard 2016-07-22 11:48:57 PDT
Created attachment 284354 [details]
Patch
Comment 24 Jonathan Bedard 2016-07-22 11:50:17 PDT
This bug is unrelated to the other crashlog saving bug
Comment 25 Daniel Bates 2016-07-26 14:14:25 PDT
Comment on attachment 284354 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=284354&action=review

> Tools/ChangeLog:7
> +

As implied by my remark in comment #18, please explain the motivation of this change in this ChangeLog.

> Tools/ChangeLog:19
> +        * Scripts/webkitpy/port/driver.py:
> +        (Driver._check_for_driver_crash_or_unresponsiveness): Notify test when finished through stdin
> +        * Scripts/webkitpy/port/ios.py:
> +        (IOSSimulatorPort):
> +        (IOSSimulatorPort.sample_process): Default to spindump, attempt sample if fail
> +        * Scripts/webkitpy/port/mac.py:
> +        (MacPort):
> +        (MacPort.sample_process): Default to spindump, attempt sample if fail
> +        * WebKitTestRunner/TestController.h:
> +        (WTR::TestController::usingServerMode): Added accessor
> +        * WebKitTestRunner/TestInvocation.cpp:
> +        (WTR::TestInvocation::dumpWebProcessUnresponsiveness): Wait for stdin before continuing

Please use complete sentences that end in a period.

> Tools/Scripts/webkitpy/port/driver.py:447
> +            self._server_process.write('#SAMPLE FINISHED\n')

OK. Alternatively, we could have taught WebKitTestRunner to suspend itself (raise a SIGSTOP) when it detects an unresponsive process and then run-webkit-tests can send a SIGCONT when its done.

> Tools/Scripts/webkitpy/port/ios.py:435
>          try:
>              hang_report = self.sample_file_path(name, pid)
> -            self._executive.run_command([
> -                "/usr/bin/sample",
> +            exit_status = self._executive.run_command([
> +                "/usr/bin/sudo",
> +                "-n",
> +                "/usr/sbin/spindump",
>                  pid,
>                  10,
>                  10,
>                  "-file",
>                  hang_report,
> -            ])
> +            ], None, None, None, None, True, False)
> +
> +            if not exit_status == 0:
> +                self._executive.run_command([
> +                    "/usr/bin/sample",
> +                    pid,
> +                    10,
> +                    10,
> +                    "-file",
> +                    hang_report,
> +                ])
>          except ScriptError as e:
> -            _log.warning('Unable to sample process:' + str(e))
> +            _log.warning('Unable to run spindump and sample process:' + str(e))

I would write this as:

hang_report = self.sample_file_path(name, pid)
exit_code = self._executive.run_command(['/usr/bin/sudo', '-n', '/usr/sbin/spindump', pid, 10, 10, '-file', hang_report], return_exit_code=True)
if not exit_code:
    return
# User does not have password-less sudo access. Fall back to using sample(1).
try:
    self._executive.run_command([
        "/usr/bin/sample",
        pid,
        10,
        10,
        "-file",
        hang_report,
    ])
except ScriptError as e:
    _log.warning('Unable to sample process:' + str(e))

> Tools/Scripts/webkitpy/port/mac.py:316
> -            self._executive.run_command([
> -                "/usr/bin/sample",
> +            exit_status = self._executive.run_command([
> +                "/usr/bin/sudo",
> +                "-n",
> +                "/usr/sbin/spindump",
>                  pid,
>                  10,
>                  10,
>                  "-file",
>                  hang_report,
> -            ])
> +            ], None, None, None, None, True, False)
> +
> +            if not exit_status == 0:
> +                self._executive.run_command([
> +                    "/usr/bin/sample",
> +                    pid,
> +                    10,
> +                    10,
> +                    "-file",
> +                    hang_report,
> +                ])
>          except ScriptError as e:
> -            _log.warning('Unable to sample process:' + str(e))
> +            _log.warning('Unable to run spindump and sample process:' + str(e))

Ditto.

> Tools/WebKitTestRunner/TestInvocation.cpp:203
> +    if (!TestController::singleton().usingServerMode())
> +        return;

OK. Is it necessary to know whether we are running in server mode to know if we should prompt for input for sample completion before terminating the WebContent process? I suspect that waiting for a sample to be taken of an unresponsive WebContent process is mostly of interest to scripts such as run-webkit-tests (for the purpose of implementing it sample on timeout feature). Would it be sufficient to only prompt to sample the WebContent process if standard error is not attached to a tty (e.g.  isatty(fileno(stderr)) returns 0)?

> Tools/WebKitTestRunner/TestInvocation.cpp:206
> +    if (isatty(fileno(stdin)))

This does not seem correct. We should be checking whether standard error is a tty device as opposed to standard input because scripts that tend to capture standard output and standard error of a subprocess (like run-webkit-tests) tend to forward standard input to the subprocess.

> Tools/WebKitTestRunner/TestInvocation.cpp:209
> +    fputs("Grab an image of the stack, then hit enter...\n", stderr);

We should be checking if standard error is attached to a tty device before we emit this message just as we do above.

> Tools/WebKitTestRunner/TestInvocation.cpp:213
> +#endif
> +    
> +    if (!fgets(buffer, sizeof(buffer), stdin) || strcmp(buffer, "#SAMPLE FINISHED\n"))
> +        fputs("Failed to sample process\n", stderr);

Maybe we could simplify the above logic to be:

...
dump(errorMessage, buffer, true);

if (!isatty(fileno(stderr))) {
    // Run-webkit-tests may have initiated a process sample of the unresponsive process. We wait for run-webkit-tests to
    // signal completion of its sample activity (it will signal regardless of whether it actually takes a process sample)
    // before we continue on and terminate the unresponsive process.
    if (!fgets(buffer, sizeof(buffer), stdin) || strcmp(buffer, "#SAMPLE FINISHED\n"))
        fprintf(stderr, "Failed to receive expected sample finished response. Got \"%s\". Continuing...", buffer);
}
Comment 26 Jonathan Bedard 2016-07-26 14:56:41 PDT
Comment on attachment 284354 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=284354&action=review

>> Tools/WebKitTestRunner/TestInvocation.cpp:203
>> +        return;
> 
> OK. Is it necessary to know whether we are running in server mode to know if we should prompt for input for sample completion before terminating the WebContent process? I suspect that waiting for a sample to be taken of an unresponsive WebContent process is mostly of interest to scripts such as run-webkit-tests (for the purpose of implementing it sample on timeout feature). Would it be sufficient to only prompt to sample the WebContent process if standard error is not attached to a tty (e.g.  isatty(fileno(stderr)) returns 0)?

Since it is legal to have a script driving the testing process and not be in server mode (even if this may not be a current use case) I figure that it is safer to check for server mode.  We will never waiting for standard in if we aren't in server mode, so while it may not be needed for current use cases, it seems prudent so that this code doesn't cause hangs later on.

>> Tools/WebKitTestRunner/TestInvocation.cpp:206
>> +    if (isatty(fileno(stdin)))
> 
> This does not seem correct. We should be checking whether standard error is a tty device as opposed to standard input because scripts that tend to capture standard output and standard error of a subprocess (like run-webkit-tests) tend to forward standard input to the subprocess.

Well, it is possible for a user to pipe standard error to a file, correct?  It seems to me that in this circumstance, where the user pipes standard error into a file by standard in is a terminal, we should still prompt the user, as the application will hang, but at least standard error will have a line describing why.

>> Tools/WebKitTestRunner/TestInvocation.cpp:213
>> +        fputs("Failed to sample process\n", stderr);
> 
> Maybe we could simplify the above logic to be:
> 
> ...
> dump(errorMessage, buffer, true);
> 
> if (!isatty(fileno(stderr))) {
>     // Run-webkit-tests may have initiated a process sample of the unresponsive process. We wait for run-webkit-tests to
>     // signal completion of its sample activity (it will signal regardless of whether it actually takes a process sample)
>     // before we continue on and terminate the unresponsive process.
>     if (!fgets(buffer, sizeof(buffer), stdin) || strcmp(buffer, "#SAMPLE FINISHED\n"))
>         fprintf(stderr, "Failed to receive expected sample finished response. Got \"%s\". Continuing...", buffer);
> }

The way I see this, we have a number of use cases:

Not in server mode: Probably not being run by a script, no stdin expected

Server mode, stdin a terminal, stderr and stdout to pipes: A user has instantiated server mode and is piping the results, stdin expected
Server mode, stdin a pipe, stderr and stdout to pipes: Probably a script.  Stdin again expected
Server mode, stdin a pipe, stderr and stdout to terminals: A user is piping in commands (so stdin expected) but reading the output

I can't think of anyone in their right mind who would use the last use case, I supposed we might should output a line to stderr if either stdin or stderr are terminals
Comment 27 Daniel Bates 2016-07-26 15:49:58 PDT
(In reply to comment #26)
> Comment on attachment 284354 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=284354&action=review
> 
> >> Tools/WebKitTestRunner/TestInvocation.cpp:203
> >> +        return;
> > 
> > OK. Is it necessary to know whether we are running in server mode to know if we should prompt for input for sample completion before terminating the WebContent process? I suspect that waiting for a sample to be taken of an unresponsive WebContent process is mostly of interest to scripts such as run-webkit-tests (for the purpose of implementing it sample on timeout feature). Would it be sufficient to only prompt to sample the WebContent process if standard error is not attached to a tty (e.g.  isatty(fileno(stderr)) returns 0)?
> 
> Since it is legal to have a script driving the testing process and not be in
> server mode (even if this may not be a current use case) I figure that it is
> safer to check for server mode.  We will never waiting for standard in if we
> aren't in server mode, so while it may not be needed for current use cases,
> it seems prudent so that this code doesn't cause hangs later on.
> 

OK.

> >> Tools/WebKitTestRunner/TestInvocation.cpp:206
> >> +    if (isatty(fileno(stdin)))
> > 
> > This does not seem correct. We should be checking whether standard error is a tty device as opposed to standard input because scripts that tend to capture standard output and standard error of a subprocess (like run-webkit-tests) tend to forward standard input to the subprocess.
> 
> Well, it is possible for a user to pipe standard error to a file, correct?

You're right!

> It seems to me that in this circumstance, where the user pipes standard
> error into a file by standard in is a terminal, we should still prompt the
> user, as the application will hang, but at least standard error will have a
> line describing why.
> 

OK.
Comment 28 Jonathan Bedard 2016-07-26 15:56:45 PDT
Created attachment 284650 [details]
Patch
Comment 29 Darin Adler 2016-07-26 16:16:48 PDT
Comment on attachment 284650 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=284650&action=review

> Tools/WebKitTestRunner/TestInvocation.cpp:195
> +    snprintf(buffer, sizeof(buffer), "#PROCESS UNRESPONSIVE - %s (pid %ld)\n", TestController::webProcessName(), static_cast<long>(pid));

Can we use snprintf_s instead of snprintf? Otherwise we aren’t guaranteed it gets null terminated. Or does that not exist on all the platforms we support?
Comment 30 Jonathan Bedard 2016-07-26 16:26:21 PDT
Comment on attachment 284650 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=284650&action=review

>> Tools/WebKitTestRunner/TestInvocation.cpp:195
>> +    snprintf(buffer, sizeof(buffer), "#PROCESS UNRESPONSIVE - %s (pid %ld)\n", TestController::webProcessName(), static_cast<long>(pid));
> 
> Can we use snprintf_s instead of snprintf? Otherwise we aren’t guaranteed it gets null terminated. Or does that not exist on all the platforms we support?

I'm actually not sure if it's supported.  What I will do instead is ensure it's null terminated manually.
Comment 31 Jonathan Bedard 2016-07-26 16:32:18 PDT
Created attachment 284653 [details]
Patch
Comment 32 Daniel Bates 2016-07-26 17:32:16 PDT
Comment on attachment 284653 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=284653&action=review

> Tools/WebKitTestRunner/TestInvocation.cpp:211
> +#else
> +    fputs("Grab an image of the stack, then hit enter...\n", stderr);
> +#endif

Is there a reason we do not check for a tty device for this case?
Comment 33 Jonathan Bedard 2016-07-27 08:54:40 PDT
Comment on attachment 284653 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=284653&action=review

>> Tools/WebKitTestRunner/TestInvocation.cpp:211
>> +#endif
> 
> Is there a reason we do not check for a tty device for this case?

Actually, I guess that should be a Windows #ifdef
Comment 34 Jonathan Bedard 2016-07-27 10:17:29 PDT
Created attachment 284706 [details]
Patch
Comment 35 Jonathan Bedard 2016-07-27 10:46:43 PDT
As it turns out, EFL and GTK don't define isatty().  I'm just going to get rid of that code on all platforms except Cocoa and Windows.  It's not really needed anyway, it just streamlines work-flow
Comment 36 Jonathan Bedard 2016-07-27 11:05:05 PDT
Created attachment 284710 [details]
Patch
Comment 37 Jonathan Bedard 2016-07-27 14:47:55 PDT
Created attachment 284731 [details]
Patch
Comment 38 Daniel Bates 2016-07-29 12:41:50 PDT
Comment on attachment 284731 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=284731&action=review

> Tools/WebKitTestRunner/TestInvocation.cpp:47
> +#if PLATFORM(GTK) || PLATFORM(EFL)
> +#include <unistd.h>
> +#endif

Please unconditionally include this header as this header contains the isatty() declaration on macOS. Obviously some other header h is including unistd.h when building for Mac and iOS Simulator as indicated by EWS success bubbles. Regardless, we should explicitly include this header in this file to avoid build breakage should unistd.h be removed from h.

> Tools/WebKitTestRunner/TestInvocation.cpp:197
> -    char errorMessageToStderr[1024];
> +    char buffer[1025];

I do not understand the need to increase the size of the buffer by one character from 1024 to 1025. It seems sufficient to use 1024.

> Tools/WebKitTestRunner/TestInvocation.cpp:198
> +    memset(buffer, 0, sizeof(buffer));

Minor: This is OK as-is. We can use aggregate initialization syntax to initialize the array to 0:

char buffer[1024] = { 0 };

> Tools/WebKitTestRunner/TestInvocation.cpp:203
> +    snprintf(buffer, sizeof(buffer) - 1, "#PROCESS UNRESPONSIVE - %s\n", TestController::webProcessName());

It is sufficient to pass sizeof(buffer) for the second argument to snprintf(). snprintf() guarantees that the buffer will be null terminated on success with respect to our usage by [1]. So, it is sufficient to write:

snprintf(buffer, sizeof(buffer), ...);

[1] <http://pubs.opengroup.org/onlinepubs/9699919799/functions/fprintf.html>

> Tools/WebKitTestRunner/TestInvocation.cpp:216
> +#elif PLATFORM(WIN)
> +    if (_isatty(fileno(stdin)) || _isatty(fileno(stderr)))
> +        fputs("Grab an image of the stack, then hit enter...\n", stderr);

This is not needed as we do not build WebKitTestRunner on Windows.

> Tools/WebKitTestRunner/TestInvocation.cpp:221
> +#else
> +    fputs("Grab an image of the stack, then hit enter...\n", stderr);

If we unconditionally include header unistd.h then I do not see the need for this special case as all platforms we build on will have isatty(). Moreover, I do not feel it provides much value to have platform-specific manual instructions. It seems sufficient to make the instructions generic and then we can remove all of the platform-guards, which clutter this code and make it less readable.
Comment 39 Alexey Proskuryakov 2016-07-29 12:47:21 PDT
> char buffer[1024] = { 0 };

In C++, { } is supported and better (notably, MSVC generates abysmal code for { 0 }).
Comment 40 Jonathan Bedard 2016-07-29 13:15:18 PDT
Created attachment 284894 [details]
Patch
Comment 41 Jonathan Bedard 2016-07-29 13:17:51 PDT
Created attachment 284895 [details]
Patch
Comment 42 Jonathan Bedard 2016-07-29 13:20:49 PDT
This patch should address everything.  Windows code has been removed, isatty() is being used on all systems and only platforms which defined spindump and sample specifically prompt the use for it.
Comment 43 Build Bot 2016-07-29 14:20:58 PDT
Comment on attachment 284895 [details]
Patch

Attachment 284895 [details] did not pass ios-sim-ews (ios-simulator-wk2):
Output: http://webkit-queues.webkit.org/results/1775355

New failing tests:
media/track/track-remove-quickly.html
Comment 44 Build Bot 2016-07-29 14:21:02 PDT
Created attachment 284900 [details]
Archive of layout-test-results from ews126 for ios-simulator-wk2

The attached test failures were seen while running run-webkit-tests on the ios-sim-ews.
Bot: ews126  Port: ios-simulator-wk2  Platform: Mac OS X 10.11.5
Comment 45 Jonathan Bedard 2016-07-29 14:27:58 PDT
Created attachment 284902 [details]
Patch
Comment 46 Jonathan Bedard 2016-07-29 14:33:23 PDT
To address Daniel's comment (which I read-over before the last patch): I think the platform separation makes the logging message more clear.  However, given that (this is a case which will almost never happen (as has been reviewed a number of times in this patch), I think he is right to eliminate the platform differentiation.

Lastly, one of the patch changes has changed something I said about snprintf in response to Darin earlier.  As per http://www.cplusplus.com/reference/cstdio/snprintf/,
    Parameter n: "Maximum number of bytes to be used in the buffer.
    The generated string has a length of at most n-1, leaving space for the additional
    terminating null character."
So given that we instantiate the buffer with all NULL's, we shouldn't need to worry about overrunning out buffer.
Comment 47 Daniel Bates 2016-08-02 08:52:53 PDT
Comment on attachment 284902 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=284902&action=review

> Tools/Scripts/webkitpy/port/ios.py:425
> +            if not exit_status == 0:

This should be written:

if exit_status:

> Tools/Scripts/webkitpy/port/mac.py:306
> +            if not exit_status == 0:

Ditto.
Comment 48 Jonathan Bedard 2016-08-02 10:52:24 PDT
Created attachment 285120 [details]
Patch
Comment 49 Daniel Bates 2016-08-02 11:19:44 PDT
Comment on attachment 285120 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=285120&action=review

> Tools/Scripts/webkitpy/port/ios.py:423
> +            exit_status = self._executive.run_command([
> +                "/usr/bin/sudo",
> +                "-n",
> +                "/usr/sbin/spindump",
>                  pid,
>                  10,
>                  10,
>                  "-file",
>                  hang_report,
> -            ])
> +            ], return_exit_code=True)

Notice that Executive.run_command() only throws a ScriptError exception if return_exit_code := False. It is good programming practice to limit the scope of a try-except to the minimum code that can raise an exception to help make it straightforward to debug the cause of an exception. In comment #25 I gave an example of one way we can write this code such that we limit the scope of the try-except block.
Comment 50 Jonathan Bedard 2016-08-02 11:37:40 PDT
Comment on attachment 285120 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=285120&action=review

>> Tools/Scripts/webkitpy/port/ios.py:423
>> +            ], return_exit_code=True)
> 
> Notice that Executive.run_command() only throws a ScriptError exception if return_exit_code := False. It is good programming practice to limit the scope of a try-except to the minimum code that can raise an exception to help make it straightforward to debug the cause of an exception. In comment #25 I gave an example of one way we can write this code such that we limit the scope of the try-except block.

This perhaps brings up another question, then, about sample_file_path(...), which calls filesystem.join(...).  In the original sample_process code, finding the report was in the try catch block, which is why I kept it there after adding spindump.  If filesystem.join(...) throws an exception (or results_directory(), actually), what is our desired functionality?  I think for the time being, keeping the generation of the hang_report file path inside the try-except block is prudent, as the tests can continue if an error happens while attempting to generate either version of the crash report, and it would seem unhelpful to be to have two try-except blocks in this function. 

I will look into this while refactoring the Mac and IOS ports, which will address some of the code duplication in this patch, if you think this is worth changing.
Comment 51 Daniel Bates 2016-08-02 13:09:23 PDT
(In reply to comment #50)
> Comment on attachment 285120 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=285120&action=review
> 
> >> Tools/Scripts/webkitpy/port/ios.py:423
> >> +            ], return_exit_code=True)
> > 
> > Notice that Executive.run_command() only throws a ScriptError exception if return_exit_code := False. It is good programming practice to limit the scope of a try-except to the minimum code that can raise an exception to help make it straightforward to debug the cause of an exception. In comment #25 I gave an example of one way we can write this code such that we limit the scope of the try-except block.
> 
> This perhaps brings up another question, then, about sample_file_path(...),
> which calls filesystem.join(...).  In the original sample_process code,
> finding the report was in the try catch block, which is why I kept it there
> after adding spindump.  If filesystem.join(...) throws an exception (or
> results_directory(), actually), what is our desired functionality? 

Filesystem.join() does not raise an exception because it turns around and calls os.path.join(). And I do not expect os.path.join() to raise an exception because its documentation, <https://docs.python.org/2/library/os.path.html>, as of 08/02 makes no mention that it raises an exception.

> I think for the time being, keeping the generation of the hang_report file path
> inside the try-except block is prudent

Notice that the try-except block only catches a webkitpy.common.system.executive.ScriptError exception. That is, it does not catch a general-purpose Python Exception.

> , as the tests can continue if an
> error happens while attempting to generate either version of the crash
> report, and 

> it would seem unhelpful to be to have two try-except blocks in
> this function. 
> 

How did you come to the conclusion that we need to try-except blocks in this function? Notice that I have exactly one try-except block in the example I gave in comment #25.

As aforementioned above, the try-except block in sample_process() only catches a webkitpy.common.system.executive.ScriptError exception. The current code allows any other exception to bubble up. Maybe we should revise this code to be more resilient to other kinds of exceptions. If so, such consideration and work is more appropriate in a separate bug as the focus of this bug is to add an enhancement to our sampling machinery: use spindump if we can.
Comment 52 Daniel Bates 2016-08-02 13:10:38 PDT
(In reply to comment #51)
[...]
> > it would seem unhelpful to be to have two try-except blocks in
> > this function. 
> > 
> 
> How did you come to the conclusion that we need to try-except blocks in this function?

This should read:

How did you come to the conclusion that we need two try-except blocks in this function?
Comment 53 Jonathan Bedard 2016-08-02 13:28:54 PDT
(In reply to comment #51)
> (In reply to comment #50)
> > Comment on attachment 285120 [details]
> > Patch
> > 
> > View in context:
> > https://bugs.webkit.org/attachment.cgi?id=285120&action=review
> > 
> > >> Tools/Scripts/webkitpy/port/ios.py:423
> > >> +            ], return_exit_code=True)
> > > 
> > > Notice that Executive.run_command() only throws a ScriptError exception if return_exit_code := False. It is good programming practice to limit the scope of a try-except to the minimum code that can raise an exception to help make it straightforward to debug the cause of an exception. In comment #25 I gave an example of one way we can write this code such that we limit the scope of the try-except block.
> > 
> > This perhaps brings up another question, then, about sample_file_path(...),
> > which calls filesystem.join(...).  In the original sample_process code,
> > finding the report was in the try catch block, which is why I kept it there
> > after adding spindump.  If filesystem.join(...) throws an exception (or
> > results_directory(), actually), what is our desired functionality? 
> 
> Filesystem.join() does not raise an exception because it turns around and
> calls os.path.join(). And I do not expect os.path.join() to raise an
> exception because its documentation,
> <https://docs.python.org/2/library/os.path.html>, as of 08/02 makes no
> mention that it raises an exception.
> 
> > I think for the time being, keeping the generation of the hang_report file path
> > inside the try-except block is prudent
> 
> Notice that the try-except block only catches a
> webkitpy.common.system.executive.ScriptError exception. That is, it does not
> catch a general-purpose Python Exception.
> 
> > , as the tests can continue if an
> > error happens while attempting to generate either version of the crash
> > report, and 
> 
> > it would seem unhelpful to be to have two try-except blocks in
> > this function. 
> > 
> 
> How did you come to the conclusion that we need to try-except blocks in this
> function? Notice that I have exactly one try-except block in the example I
> gave in comment #25.
> 
> As aforementioned above, the try-except block in sample_process() only
> catches a webkitpy.common.system.executive.ScriptError exception. The
> current code allows any other exception to bubble up. Maybe we should revise
> this code to be more resilient to other kinds of exceptions. If so, such
> consideration and work is more appropriate in a separate bug as the focus of
> this bug is to add an enhancement to our sampling machinery: use spindump if
> we can.

(In reply to comment #52)
> (In reply to comment #51)
> [...]
> > > it would seem unhelpful to be to have two try-except blocks in
> > > this function. 
> > > 
> > 
> > How did you come to the conclusion that we need to try-except blocks in this function?
> 
> This should read:
> 
> How did you come to the conclusion that we need two try-except blocks in
> this function?

Sorry, I think my comment was not very clear.

What I was saying is that I kept the hang_report assignment in the try-except block because it had been in it's previous iteration, and I wasn't quite sure what kind of exceptions (if any) are thrown by sample_file_path(...) (as you point out, probably none, even when including results_directory()).  Essentially, I was trying to keep the same structure as had existed previously, and taking sample_file_path out of the try-except would have broken that pattern.

Since it does seem like sample_file_path(...) won't be throwing any exceptions, I will change this in the re-factor.
Comment 54 Jonathan Bedard 2016-08-04 15:14:14 PDT
Created attachment 285370 [details]
Patch
Comment 55 Jonathan Bedard 2016-08-04 15:16:11 PDT
This patch updates the try-except block and fixes the Python unit tests which test this function.
Comment 56 Daniel Bates 2016-08-04 15:50:20 PDT
Comment on attachment 285370 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=285370&action=review

We should add tests to ensure that we only call sample() if spindump returns a non-zero exit status. We should use the test test_sample_process_throws_exception as an example for how to substitute a custom function for Executive.run_command().

> Tools/Scripts/webkitpy/port/mac_unittest.py:-224
> -    def test_sample_process_throws_exception(self):
> -
> -        def throwing_run_command(args):
> -            raise ScriptError("MOCK script error")
> -
> -        port = self.make_port()
> -        port._executive = MockExecutive2(run_command_fn=throwing_run_command)
> -        OutputCapture().assert_outputs(self, port.sample_process, args=['test', 42])
> -

We shouldn't remove this test. We should make this test work with the proposed code changes.
Comment 57 Jonathan Bedard 2016-08-04 17:34:27 PDT
Created attachment 285383 [details]
Patch
Comment 58 Jonathan Bedard 2016-08-04 17:37:03 PDT
This patch adds a specific spindump unit test and updates the sample_process unit tests so they match the new format of sample_process.
Comment 59 Alexey Proskuryakov 2016-08-08 08:58:32 PDT
Comment on attachment 285383 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=285383&action=review

> Tools/WebKitTestRunner/TestInvocation.cpp:193
> -    char errorMessageToStderr[1024];
> +    char buffer[1024] = { };

I do not think that initializing the buffer is an improvement over existing code. This doesn't make it any safer.

> Tools/WebKitTestRunner/TestInvocation.cpp:210
> +        fprintf(stderr, "Failed receive expected sample response, got:\n\t\"%s\"\nContinuing...\n", buffer);

Not sure if this is proper English. "Failed to receive" maybe?

Also, won't the error get printed in interactive mode every time? That would be misleading.
Comment 60 WebKit Commit Bot 2016-08-08 09:19:59 PDT
Comment on attachment 285383 [details]
Patch

Clearing flags on attachment: 285383

Committed r204253: <http://trac.webkit.org/changeset/204253>
Comment 61 WebKit Commit Bot 2016-08-08 09:20:06 PDT
All reviewed patches have been landed.  Closing bug.