Summary: | run-webkit-tests should trigger a spindump when WebContent process is unresponsive | ||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Jonathan Bedard <jbedard> | ||||||||||||||||||||||||||||||||||||||||
Component: | Tools / Tests | Assignee: | Nobody <webkit-unassigned> | ||||||||||||||||||||||||||||||||||||||||
Status: | RESOLVED FIXED | ||||||||||||||||||||||||||||||||||||||||||
Severity: | Normal | CC: | ap, commit-queue, dbates, ddkilzer, glenn, jbedard, lforschler | ||||||||||||||||||||||||||||||||||||||||
Priority: | P2 | ||||||||||||||||||||||||||||||||||||||||||
Version: | WebKit Nightly Build | ||||||||||||||||||||||||||||||||||||||||||
Hardware: | All | ||||||||||||||||||||||||||||||||||||||||||
OS: | All | ||||||||||||||||||||||||||||||||||||||||||
See Also: |
https://bugs.webkit.org/show_bug.cgi?id=159312 https://bugs.webkit.org/show_bug.cgi?id=158646 |
||||||||||||||||||||||||||||||||||||||||||
Attachments: |
|
Description
Jonathan Bedard
2016-07-15 13:25:31 PDT
Created attachment 283786 [details]
Patch
Created attachment 283805 [details]
Patch
Have you considered adding an option for this, and/or generalizing --(no-)sample? Yes, I did consider adding an option. I hadn't implemented it yet because I think it should be opt-out, and given how rarely it should run (which is ideally never) I wasn't and am still not sure if it's even worth an option. Comment on attachment 283805 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=283805&action=review Overall this looks good, but please post a revised patch addressing the comments below. r- to address feedback. > Tools/ChangeLog:19 > + (WTR::TestInvocation:: triggerSpindump): Exchange spindump data with python through stdin/stderr Spindump logs can be quite large. Where will this text output show up in the test results, as stderr output for the test? It's okay to send the contents via stderr as an initial experiment (so I'm not asking you to rewrite this patch), but if we find it useful (or it happens more often than we think), we should consider writing the spindump log to a file, and adding the file to the results page so it's a separate entity. (This also makes it easy to create a Safari extension to process the spindump log when visiting a test results page, rather than parsing it out of a chunk of stderr text.) > Tools/Scripts/webkitpy/port/driver.py:38 > import sys > import time > import os > +import subprocess Nit: We try to keep these sorted in alphabetical order. Please put 'subprocess' between 'shlex' and 'sys', or just re-alphabetize both. > Tools/Scripts/webkitpy/port/driver.py:436 > + match = re.search('pid (\d+)', error_line) Is it worth matching the parenthesis here? (pid NNNN) No change necessary to land the patch; just asking the question since it may reduce the off chance of a false-positive match. > Tools/Scripts/webkitpy/port/driver.py:445 > + parsed_filename = os.path.splitext(self._test_name)[0] > + path_location = os.path.split(parsed_filename)[0] > + > + if not os.path.exists(self._port.results_directory()): > + os.makedirs(self._port.results_directory()) > + if not os.path.exists(self._port.results_directory() + '/' + path_location): > + os.makedirs(self._port.results_directory() + '/' + path_location) Was this code copied from somewhere? If so, seems like it should be pulled into a new method. (Not required to fix to land this patch.) > Tools/WebKitTestRunner/TestInvocation.cpp:211 > + sprintf(spindumpMessage, "#TRIGGER SPINDUMP - %s (pid %ld)\n", TestController::webProcessName(), static_cast<long>(pid)); > +#else > + sprintf(spindumpMessage, "#TRIGGER SPINDUMP - %s\n", TestController::webProcessName()); I know dumpWebProcessUnresponsiveness() doesn't use it, but please use snprintf() here and pull the length into a 'const size_t length' variable. > Tools/WebKitTestRunner/TestInvocation.cpp:219 > + if (!fgets(spindumpMessage, 1024, stdin) > + || strcmp(spindumpMessage, "#SPINDUMP FINISHED\n")) Formatting: IIRC, this should be on one line, or braces should be used for the body of the if statement. Comment on attachment 283805 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=283805&action=review > Tools/Scripts/webkitpy/port/driver.py:452 > + > + spindump_process = subprocess.Popen(['sudo', '-A', 'spindump', str(child_process_pid), '10', '-file', self._port.results_directory() + '/' + parsed_filename + '.spindump.txt'], stdout=subprocess.PIPE, stderr=subprocess.STDOUT) > + read_line = spindump_process.stdout.readline() > + while read_line: > + if read_line: > + _log.debug(read_line) > + read_line = spindump_process.stdout.readline() This assumes password-less sudo access or that sudo was invoked within the password prompt timeout period. Otherwise, we will hang waiting for a password. This is not the correct place for such port-specific code. It should be in the port-specific Port object. Use Port.sample_process() as an example. > This assumes password-less sudo access or that sudo was invoked within the password prompt timeout period. Otherwise, we will hang waiting for a password.
With the -A flag, sudo fails immediately, unless SUDO_ASKPASS is in the environment. Jonathan and I discussed this, but I forgot whether the script needed to explicitly unset SUDO_ASKPASS.
Comment on attachment 283805 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=283805&action=review >> Tools/ChangeLog:19 >> + (WTR::TestInvocation:: triggerSpindump): Exchange spindump data with python through stdin/stderr > > Spindump logs can be quite large. Where will this text output show up in the test results, as stderr output for the test? > > It's okay to send the contents via stderr as an initial experiment (so I'm not asking you to rewrite this patch), but if we find it useful (or it happens more often than we think), we should consider writing the spindump log to a file, and adding the file to the results page so it's a separate entity. (This also makes it easy to create a Safari extension to process the spindump log when visiting a test results page, rather than parsing it out of a chunk of stderr text.) My comments here are misleading, partially because the original implementation did dump the spindump log to stdout. Currently, the spindump log is written to a file. >> Tools/Scripts/webkitpy/port/driver.py:436 >> + match = re.search('pid (\d+)', error_line) > > Is it worth matching the parenthesis here? (pid NNNN) > > No change necessary to land the patch; just asking the question since it may reduce the off chance of a false-positive match. It shouldn't be needed, as the #TRIGGER SPINDUMP - (\S+) should have already checked for the matching parenthesis >> Tools/Scripts/webkitpy/port/driver.py:445 >> + os.makedirs(self._port.results_directory() + '/' + path_location) > > Was this code copied from somewhere? If so, seems like it should be pulled into a new method. (Not required to fix to land this patch.) It was not copied, I'm just not sure how much of it is needed (or exactly how the 'makedris' function works in Python). The purpose of this code is to guarantee that the location spindump is sending it's output to actually exists. It seems to me to be connected closely enough to the detection of the trigger output from the C++ program to justify being one function, although it may be sensible to separate the trigger detection from the actual spindump call, particularly if it turns out we want to call spindump from other places as well. >> Tools/Scripts/webkitpy/port/driver.py:452 >> + read_line = spindump_process.stdout.readline() > > This assumes password-less sudo access or that sudo was invoked within the password prompt timeout period. Otherwise, we will hang waiting for a password. This is not the correct place for such port-specific code. It should be in the port-specific Port object. Use Port.sample_process() as an example. That's what the '-A' does. If password-less sudo is not enabled (and it is enabled on the automation machines) this will simply continue execution without calling spindump (and will output 'sudo: no askpass program specified, try setting SUDO_ASKPASS'), which should be fine since this is designed to catch process hangs on automation machines. Created attachment 284016 [details]
Patch
SUDO_ASK_PASS essentially points to a script to be run. The idea being that a user can define, for example, a pop-up which is triggered for cases when a program without a command-line interface needs to access sudo permissions. The question here is how do we want this to work? Given that SUDO_ASK_PASS is essentially designed for this use-case, I think we're good. That being said, I have questions as to weather running this script without some sort of timeout is prudent. Created attachment 284062 [details]
Patch
Comment on attachment 284062 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=284062&action=review > Tools/WebKitTestRunner/TestInvocation.cpp:216 > + fputs("If running manually, hit enter...\n", stderr); Would isatty(stdout) avoid the need for interaction in this case? Comment on attachment 284062 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=284062&action=review >> Tools/WebKitTestRunner/TestInvocation.cpp:216 >> + fputs("If running manually, hit enter...\n", stderr); > > Would isatty(stdout) avoid the need for interaction in this case? Would a script being driven by Python consider itself to by run in a terminal context? I think isatty(stdout) might work, assuming pipes (which is what Python is using, correct?) don't count as terminal contexts. That's a handy little function, haven't seen it before. Comment on attachment 284062 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=284062&action=review > Tools/Scripts/webkitpy/port/ios.py:434 > + spindump_process = subprocess.Popen(['sudo', '-A', 'spindump', str(pid), '10', '-file', self.results_directory() + '/' + parsed_filename + '.spindump.txt'], stdout=subprocess.PIPE, stderr=subprocess.STDOUT) Can we make use of option -n instead of -A to have sudo exit with an error if the command requires a password? From some experiments, "sudo -n" seems to work and avoids prompting for a password if either the current user has password-less sudo access (marked NOPASSWD: ALL in the sudoers file) or if "sudo -n" was invoked within the password prompt timeout period. If a password is required then sudo will exit with an exit code of 1. > Tools/Scripts/webkitpy/port/ios.py:439 > + read_line = spindump_process.stdout.readline() > + while read_line: > + if read_line: > + _log.debug(read_line) > + read_line = spindump_process.stdout.readline() We should be using Executive.run_command() or Executive.run_and_throw_if_fail() to run this command (with the executive object on the port object) instead of using subprocess directly. Unless you feel that the messages emitted by spindump(8) to explain each step that it is performing is beneficial, I would use Executive.run_command() and pass return_exit_code=True to avoid capturing any standard output. Comment on attachment 284062 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=284062&action=review >> Tools/Scripts/webkitpy/port/ios.py:434 >> + spindump_process = subprocess.Popen(['sudo', '-A', 'spindump', str(pid), '10', '-file', self.results_directory() + '/' + parsed_filename + '.spindump.txt'], stdout=subprocess.PIPE, stderr=subprocess.STDOUT) > > Can we make use of option -n instead of -A to have sudo exit with an error if the command requires a password? > > From some experiments, "sudo -n" seems to work and avoids prompting for a password if either the current user has password-less sudo access (marked NOPASSWD: ALL in the sudoers file) or if "sudo -n" was invoked within the password prompt timeout period. If a password is required then sudo will exit with an exit code of 1. sudo -n may be a better fit, particularly if people don't define SUDO_ASKPASS. I suppose that what this ultimately comes down to is which functionality is preferred. Do we want the user to be prompted if SUDO_ASKPASS is defined through whatever mechanism they personally provided in that script, and fail if such a method is not defined, or simply fail if the command cannot get achieve sudo permissions? For the time being, I think -n is probably the safer choice, although it may be less useful. I agree, let's go with -n. We can always switch to -A in the future if there is a need for SUDO_ASKPASS. Created attachment 284131 [details]
Patch
Comment on attachment 284131 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=284131&action=review > Tools/ChangeLog:7 > + Please explain the motivation for this change. From what I can tell the spindump(8) report is the concatenation of sample(1)ing all running processing plus some additional data. I take it that our interest in a spindump report is to capture the state of other WebKit processes (e.g. the Network process) when a WebContent process hangs to help diagnose a hard-to-reproduce test failure. > Tools/Scripts/webkitpy/port/driver.py:192 > + if driver_input.test_name.startswith('http://') or driver_input.test_name.startswith('https://') or driver_input.test_name.startswith('file:///'): > + self._test_name = self.uri_to_test(driver_input.test_name) > + else: > + self._test_name = driver_input.test_name See my remark in _check_for_spindump_trigger() and in run_spindump() (defined in ios.py) that suggest making use of the existing machinery for sample reports. If we can make use of this machinery then we can remove this code. > Tools/Scripts/webkitpy/port/driver.py:440 > + if not error_line.startswith("#TRIGGER SPINDUMP - "): > + return False > + > + match = re.match('#TRIGGER SPINDUMP - (\S+)', error_line) > + child_process_name = match.group(1) if match else 'WebProcess' > + match = re.search('pid (\d+)', error_line) > + child_process_pid = int(match.group(1)) if match else None > + if child_process_pid: > + self._port.run_spindump(self._test_name, child_process_pid) > + self._server_process.write('#SPINDUMP FINISHED\n') > + return True Is it necessary to have separate logic for spin dumps? I mean, this patch unconditionally replaces the machinery for sampling a process using sample(1) with machinery to call spindump(8). I do not see the need to only do one or the other, especially given that spindump(8) can only be run by root. Could we make use of the existing "PROCESS UNRESPONSIVE" message and incorporate the spin dump logic into Port.sample_process()? Could we incorporate the spin dump logic into Port.sample_process() such that we first call spindump(8) and if that command returns a non-zero exit status (say, the current user does not have password-less sudo access) then we call sample(1) on the WebContent process as we do today? > Tools/Scripts/webkitpy/port/ios.py:432 > + parsed_filename = os.path.splitext(test_name)[0] > + path_location = os.path.split(parsed_filename)[0] > + > + if not os.path.exists(self.results_directory()): > + os.makedirs(self.results_directory()) > + if not os.path.exists(self.results_directory() + '/' + path_location): > + os.makedirs(self.results_directory() + '/' + path_location) Can we make use of the existing machinery (*) that copies sample(1) reports to the appropriate place in the results directory? If it makes sense to incorporate the spin dump logic into Port.sample_process() then we likely can pass Port.sample_file_path(..., ...) as the path for the -file argument to spindump(8) and make use of the existing machinery the collect samples and moves then to be near the test results for a failed test. (*) After running a test Manager._look_for_new_crash_logs() is called to collect crash reports and sample reports. Manager._look_for_new_crash_logs() calls Port.look_for_new_samples() for the list of sample reports that were written and uses TestResultWriter.copy_sample_file() to copy each sample report to the appropriate place in the results directory. > Tools/WebKitTestRunner/TestInvocation.cpp:212 > + char spindumpMessage[1024]; > +#if PLATFORM(COCOA) > + pid_t pid = WKPageGetProcessIdentifier(TestController::singleton().mainWebView()->page()); > + snprintf(spindumpMessage, 1024, "#TRIGGER SPINDUMP - %s (pid %ld)\n", TestController::webProcessName(), static_cast<long>(pid)); > +#else > + snprintf(spindumpMessage, 1024, "#TRIGGER SPINDUMP - %s\n", TestController::webProcessName()); > +#endif It it necessary to emit a new message here "TRIGGER SPINDUMP"? Could we make use of the existing "#PROCESS UNRESPONSIVE" message and teach run-webkit-test to call spindump(8) when it sees this message instead of sample(1) on applicable platforms? > Tools/WebKitTestRunner/TestInvocation.cpp:215 > + if (!TestController::singleton().usingServerMode()) > + return; Can you elaborate on this? > Tools/WebKitTestRunner/TestInvocation.cpp:216 > + fputs("If running manually, hit enter...\n", stderr); Is this a workflow that most people will make use of? Running WebKitTestRunner in server mode and wanting to invoke spindump(8) by hand? > Tools/WebKitTestRunner/TestInvocation.cpp:218 > + if (!fgets(spindumpMessage, 1024, stdin) || strcmp(spindumpMessage, "#SPINDUMP FINISHED\n")) Is it necessary to look for the message "#SPINDUMP FINISHED\n"? Would it be sufficient to use fgetc(3) to read a single newline character or EOF character and teach run-webkit-tests to emit a single newline character? > Tools/WebKitTestRunner/TestInvocation.cpp:219 > + fputs("Spindump return failure\n", stderr); I take it that this output is helpful. Comment on attachment 284131 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=284131&action=review > Tools/Scripts/webkitpy/port/ios.py:434 > + self._executive.run_command(['sudo', '-n', 'spindump', str(pid), '10', '-file', self.results_directory() + '/' + parsed_filename + '.spindump.txt']) Among other reasons it is good form to make use of full filesystem paths when invoking a command (even though the underlying implementation of Executive.run_command() ultimately invokes execvp(3), which uses the PATH environment variable to resolve program paths). In particular, we should use the full filesystem path for both sudo and spindump, /usr/bin/sudo and /usr/sbin/spindump, respectively. Comment on attachment 284131 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=284131&action=review >> Tools/ChangeLog:7 >> + > > Please explain the motivation for this change. From what I can tell the spindump(8) report is the concatenation of sample(1)ing all running processing plus some additional data. I take it that our interest in a spindump report is to capture the state of other WebKit processes (e.g. the Network process) when a WebContent process hangs to help diagnose a hard-to-reproduce test failure. The motivation for this change was a flakey timeout which occurred before a test had actually started. In that use case, no crash report was generated >> Tools/Scripts/webkitpy/port/driver.py:440 >> + return True > > Is it necessary to have separate logic for spin dumps? I mean, this patch unconditionally replaces the machinery for sampling a process using sample(1) with machinery to call spindump(8). I do not see the need to only do one or the other, especially given that spindump(8) can only be run by root. Could we make use of the existing "PROCESS UNRESPONSIVE" message and incorporate the spin dump logic into Port.sample_process()? Could we incorporate the spin dump logic into Port.sample_process() such that we first call spindump(8) and if that command returns a non-zero exit status (say, the current user does not have password-less sudo access) then we call sample(1) on the WebContent process as we do today? Maybe not. There is another bug (r27472618) which may be related to this, and may have contributed to the difficulty tracking down the bug which created a need for this change. >> Tools/Scripts/webkitpy/port/ios.py:432 >> + os.makedirs(self.results_directory() + '/' + path_location) > > Can we make use of the existing machinery (*) that copies sample(1) reports to the appropriate place in the results directory? If it makes sense to incorporate the spin dump logic into Port.sample_process() then we likely can pass Port.sample_file_path(..., ...) as the path for the -file argument to spindump(8) and make use of the existing machinery the collect samples and moves then to be near the test results for a failed test. > > (*) After running a test Manager._look_for_new_crash_logs() is called to collect crash reports and sample reports. Manager._look_for_new_crash_logs() calls Port.look_for_new_samples() for the list of sample reports that were written and uses TestResultWriter.copy_sample_file() to copy each sample report to the appropriate place in the results directory. My concern with integrating the spin dump logic into Port.sample_process() is that we would lose the hook back to the TestInvocation through stdin. I kept spin dump separate because unlike sample_process(), spin dump is actually preventing the TestInvocation process from continuing. It's also possible that I have misunderstood the point of sample_process() and spin dump does belong there. I think it is worth noting, however, that the cause of this change was a flakey test which specifically wasn't able to be debugged through sample_process(). >> Tools/WebKitTestRunner/TestInvocation.cpp:215 >> + return; > > Can you elaborate on this? The point of this snippet of code (per Alexey) is to continue on without waiting for a stdin trigger from the driving process. The m_usingServerMode flag is used in the testController to know when to receive testing arguments from standard in (at least, this is my understanding of it). This should mean that for most cases where the tests are being manually driven, the spindump trigger will do nothing. >> Tools/WebKitTestRunner/TestInvocation.cpp:216 >> + fputs("If running manually, hit enter...\n", stderr); > > Is this a workflow that most people will make use of? Running WebKitTestRunner in server mode and wanting to invoke spindump(8) by hand? No, see above. WebKitTestRunner should be driven by a script if being run in server mode, but, if for some reason WebKitTestRunnner is being run in server mode manually, we would want to alert the user that the application is waiting for input. >> Tools/WebKitTestRunner/TestInvocation.cpp:219 >> + fputs("Spindump return failure\n", stderr); > > I take it that this output is helpful. The intention of this 'if' statement and it's pickiness is to provide some logging information indicating something has gone wrong with spindump. Since this code path will be run rarely, I thought that this statement (and the pickiness of the stdin read) would be helpful for any future debugging Putting this bug on hold for the time being. It may be related to a more general problem with process crashes uncovered today. Created attachment 284272 [details]
Patch
Created attachment 284354 [details]
Patch
This bug is unrelated to the other crashlog saving bug Comment on attachment 284354 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=284354&action=review > Tools/ChangeLog:7 > + As implied by my remark in comment #18, please explain the motivation of this change in this ChangeLog. > Tools/ChangeLog:19 > + * Scripts/webkitpy/port/driver.py: > + (Driver._check_for_driver_crash_or_unresponsiveness): Notify test when finished through stdin > + * Scripts/webkitpy/port/ios.py: > + (IOSSimulatorPort): > + (IOSSimulatorPort.sample_process): Default to spindump, attempt sample if fail > + * Scripts/webkitpy/port/mac.py: > + (MacPort): > + (MacPort.sample_process): Default to spindump, attempt sample if fail > + * WebKitTestRunner/TestController.h: > + (WTR::TestController::usingServerMode): Added accessor > + * WebKitTestRunner/TestInvocation.cpp: > + (WTR::TestInvocation::dumpWebProcessUnresponsiveness): Wait for stdin before continuing Please use complete sentences that end in a period. > Tools/Scripts/webkitpy/port/driver.py:447 > + self._server_process.write('#SAMPLE FINISHED\n') OK. Alternatively, we could have taught WebKitTestRunner to suspend itself (raise a SIGSTOP) when it detects an unresponsive process and then run-webkit-tests can send a SIGCONT when its done. > Tools/Scripts/webkitpy/port/ios.py:435 > try: > hang_report = self.sample_file_path(name, pid) > - self._executive.run_command([ > - "/usr/bin/sample", > + exit_status = self._executive.run_command([ > + "/usr/bin/sudo", > + "-n", > + "/usr/sbin/spindump", > pid, > 10, > 10, > "-file", > hang_report, > - ]) > + ], None, None, None, None, True, False) > + > + if not exit_status == 0: > + self._executive.run_command([ > + "/usr/bin/sample", > + pid, > + 10, > + 10, > + "-file", > + hang_report, > + ]) > except ScriptError as e: > - _log.warning('Unable to sample process:' + str(e)) > + _log.warning('Unable to run spindump and sample process:' + str(e)) I would write this as: hang_report = self.sample_file_path(name, pid) exit_code = self._executive.run_command(['/usr/bin/sudo', '-n', '/usr/sbin/spindump', pid, 10, 10, '-file', hang_report], return_exit_code=True) if not exit_code: return # User does not have password-less sudo access. Fall back to using sample(1). try: self._executive.run_command([ "/usr/bin/sample", pid, 10, 10, "-file", hang_report, ]) except ScriptError as e: _log.warning('Unable to sample process:' + str(e)) > Tools/Scripts/webkitpy/port/mac.py:316 > - self._executive.run_command([ > - "/usr/bin/sample", > + exit_status = self._executive.run_command([ > + "/usr/bin/sudo", > + "-n", > + "/usr/sbin/spindump", > pid, > 10, > 10, > "-file", > hang_report, > - ]) > + ], None, None, None, None, True, False) > + > + if not exit_status == 0: > + self._executive.run_command([ > + "/usr/bin/sample", > + pid, > + 10, > + 10, > + "-file", > + hang_report, > + ]) > except ScriptError as e: > - _log.warning('Unable to sample process:' + str(e)) > + _log.warning('Unable to run spindump and sample process:' + str(e)) Ditto. > Tools/WebKitTestRunner/TestInvocation.cpp:203 > + if (!TestController::singleton().usingServerMode()) > + return; OK. Is it necessary to know whether we are running in server mode to know if we should prompt for input for sample completion before terminating the WebContent process? I suspect that waiting for a sample to be taken of an unresponsive WebContent process is mostly of interest to scripts such as run-webkit-tests (for the purpose of implementing it sample on timeout feature). Would it be sufficient to only prompt to sample the WebContent process if standard error is not attached to a tty (e.g. isatty(fileno(stderr)) returns 0)? > Tools/WebKitTestRunner/TestInvocation.cpp:206 > + if (isatty(fileno(stdin))) This does not seem correct. We should be checking whether standard error is a tty device as opposed to standard input because scripts that tend to capture standard output and standard error of a subprocess (like run-webkit-tests) tend to forward standard input to the subprocess. > Tools/WebKitTestRunner/TestInvocation.cpp:209 > + fputs("Grab an image of the stack, then hit enter...\n", stderr); We should be checking if standard error is attached to a tty device before we emit this message just as we do above. > Tools/WebKitTestRunner/TestInvocation.cpp:213 > +#endif > + > + if (!fgets(buffer, sizeof(buffer), stdin) || strcmp(buffer, "#SAMPLE FINISHED\n")) > + fputs("Failed to sample process\n", stderr); Maybe we could simplify the above logic to be: ... dump(errorMessage, buffer, true); if (!isatty(fileno(stderr))) { // Run-webkit-tests may have initiated a process sample of the unresponsive process. We wait for run-webkit-tests to // signal completion of its sample activity (it will signal regardless of whether it actually takes a process sample) // before we continue on and terminate the unresponsive process. if (!fgets(buffer, sizeof(buffer), stdin) || strcmp(buffer, "#SAMPLE FINISHED\n")) fprintf(stderr, "Failed to receive expected sample finished response. Got \"%s\". Continuing...", buffer); } Comment on attachment 284354 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=284354&action=review >> Tools/WebKitTestRunner/TestInvocation.cpp:203 >> + return; > > OK. Is it necessary to know whether we are running in server mode to know if we should prompt for input for sample completion before terminating the WebContent process? I suspect that waiting for a sample to be taken of an unresponsive WebContent process is mostly of interest to scripts such as run-webkit-tests (for the purpose of implementing it sample on timeout feature). Would it be sufficient to only prompt to sample the WebContent process if standard error is not attached to a tty (e.g. isatty(fileno(stderr)) returns 0)? Since it is legal to have a script driving the testing process and not be in server mode (even if this may not be a current use case) I figure that it is safer to check for server mode. We will never waiting for standard in if we aren't in server mode, so while it may not be needed for current use cases, it seems prudent so that this code doesn't cause hangs later on. >> Tools/WebKitTestRunner/TestInvocation.cpp:206 >> + if (isatty(fileno(stdin))) > > This does not seem correct. We should be checking whether standard error is a tty device as opposed to standard input because scripts that tend to capture standard output and standard error of a subprocess (like run-webkit-tests) tend to forward standard input to the subprocess. Well, it is possible for a user to pipe standard error to a file, correct? It seems to me that in this circumstance, where the user pipes standard error into a file by standard in is a terminal, we should still prompt the user, as the application will hang, but at least standard error will have a line describing why. >> Tools/WebKitTestRunner/TestInvocation.cpp:213 >> + fputs("Failed to sample process\n", stderr); > > Maybe we could simplify the above logic to be: > > ... > dump(errorMessage, buffer, true); > > if (!isatty(fileno(stderr))) { > // Run-webkit-tests may have initiated a process sample of the unresponsive process. We wait for run-webkit-tests to > // signal completion of its sample activity (it will signal regardless of whether it actually takes a process sample) > // before we continue on and terminate the unresponsive process. > if (!fgets(buffer, sizeof(buffer), stdin) || strcmp(buffer, "#SAMPLE FINISHED\n")) > fprintf(stderr, "Failed to receive expected sample finished response. Got \"%s\". Continuing...", buffer); > } The way I see this, we have a number of use cases: Not in server mode: Probably not being run by a script, no stdin expected Server mode, stdin a terminal, stderr and stdout to pipes: A user has instantiated server mode and is piping the results, stdin expected Server mode, stdin a pipe, stderr and stdout to pipes: Probably a script. Stdin again expected Server mode, stdin a pipe, stderr and stdout to terminals: A user is piping in commands (so stdin expected) but reading the output I can't think of anyone in their right mind who would use the last use case, I supposed we might should output a line to stderr if either stdin or stderr are terminals (In reply to comment #26) > Comment on attachment 284354 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=284354&action=review > > >> Tools/WebKitTestRunner/TestInvocation.cpp:203 > >> + return; > > > > OK. Is it necessary to know whether we are running in server mode to know if we should prompt for input for sample completion before terminating the WebContent process? I suspect that waiting for a sample to be taken of an unresponsive WebContent process is mostly of interest to scripts such as run-webkit-tests (for the purpose of implementing it sample on timeout feature). Would it be sufficient to only prompt to sample the WebContent process if standard error is not attached to a tty (e.g. isatty(fileno(stderr)) returns 0)? > > Since it is legal to have a script driving the testing process and not be in > server mode (even if this may not be a current use case) I figure that it is > safer to check for server mode. We will never waiting for standard in if we > aren't in server mode, so while it may not be needed for current use cases, > it seems prudent so that this code doesn't cause hangs later on. > OK. > >> Tools/WebKitTestRunner/TestInvocation.cpp:206 > >> + if (isatty(fileno(stdin))) > > > > This does not seem correct. We should be checking whether standard error is a tty device as opposed to standard input because scripts that tend to capture standard output and standard error of a subprocess (like run-webkit-tests) tend to forward standard input to the subprocess. > > Well, it is possible for a user to pipe standard error to a file, correct? You're right! > It seems to me that in this circumstance, where the user pipes standard > error into a file by standard in is a terminal, we should still prompt the > user, as the application will hang, but at least standard error will have a > line describing why. > OK. Created attachment 284650 [details]
Patch
Comment on attachment 284650 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=284650&action=review > Tools/WebKitTestRunner/TestInvocation.cpp:195 > + snprintf(buffer, sizeof(buffer), "#PROCESS UNRESPONSIVE - %s (pid %ld)\n", TestController::webProcessName(), static_cast<long>(pid)); Can we use snprintf_s instead of snprintf? Otherwise we aren’t guaranteed it gets null terminated. Or does that not exist on all the platforms we support? Comment on attachment 284650 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=284650&action=review >> Tools/WebKitTestRunner/TestInvocation.cpp:195 >> + snprintf(buffer, sizeof(buffer), "#PROCESS UNRESPONSIVE - %s (pid %ld)\n", TestController::webProcessName(), static_cast<long>(pid)); > > Can we use snprintf_s instead of snprintf? Otherwise we aren’t guaranteed it gets null terminated. Or does that not exist on all the platforms we support? I'm actually not sure if it's supported. What I will do instead is ensure it's null terminated manually. Created attachment 284653 [details]
Patch
Comment on attachment 284653 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=284653&action=review > Tools/WebKitTestRunner/TestInvocation.cpp:211 > +#else > + fputs("Grab an image of the stack, then hit enter...\n", stderr); > +#endif Is there a reason we do not check for a tty device for this case? Comment on attachment 284653 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=284653&action=review >> Tools/WebKitTestRunner/TestInvocation.cpp:211 >> +#endif > > Is there a reason we do not check for a tty device for this case? Actually, I guess that should be a Windows #ifdef Created attachment 284706 [details]
Patch
As it turns out, EFL and GTK don't define isatty(). I'm just going to get rid of that code on all platforms except Cocoa and Windows. It's not really needed anyway, it just streamlines work-flow Created attachment 284710 [details]
Patch
Created attachment 284731 [details]
Patch
Comment on attachment 284731 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=284731&action=review > Tools/WebKitTestRunner/TestInvocation.cpp:47 > +#if PLATFORM(GTK) || PLATFORM(EFL) > +#include <unistd.h> > +#endif Please unconditionally include this header as this header contains the isatty() declaration on macOS. Obviously some other header h is including unistd.h when building for Mac and iOS Simulator as indicated by EWS success bubbles. Regardless, we should explicitly include this header in this file to avoid build breakage should unistd.h be removed from h. > Tools/WebKitTestRunner/TestInvocation.cpp:197 > - char errorMessageToStderr[1024]; > + char buffer[1025]; I do not understand the need to increase the size of the buffer by one character from 1024 to 1025. It seems sufficient to use 1024. > Tools/WebKitTestRunner/TestInvocation.cpp:198 > + memset(buffer, 0, sizeof(buffer)); Minor: This is OK as-is. We can use aggregate initialization syntax to initialize the array to 0: char buffer[1024] = { 0 }; > Tools/WebKitTestRunner/TestInvocation.cpp:203 > + snprintf(buffer, sizeof(buffer) - 1, "#PROCESS UNRESPONSIVE - %s\n", TestController::webProcessName()); It is sufficient to pass sizeof(buffer) for the second argument to snprintf(). snprintf() guarantees that the buffer will be null terminated on success with respect to our usage by [1]. So, it is sufficient to write: snprintf(buffer, sizeof(buffer), ...); [1] <http://pubs.opengroup.org/onlinepubs/9699919799/functions/fprintf.html> > Tools/WebKitTestRunner/TestInvocation.cpp:216 > +#elif PLATFORM(WIN) > + if (_isatty(fileno(stdin)) || _isatty(fileno(stderr))) > + fputs("Grab an image of the stack, then hit enter...\n", stderr); This is not needed as we do not build WebKitTestRunner on Windows. > Tools/WebKitTestRunner/TestInvocation.cpp:221 > +#else > + fputs("Grab an image of the stack, then hit enter...\n", stderr); If we unconditionally include header unistd.h then I do not see the need for this special case as all platforms we build on will have isatty(). Moreover, I do not feel it provides much value to have platform-specific manual instructions. It seems sufficient to make the instructions generic and then we can remove all of the platform-guards, which clutter this code and make it less readable. > char buffer[1024] = { 0 };
In C++, { } is supported and better (notably, MSVC generates abysmal code for { 0 }).
Created attachment 284894 [details]
Patch
Created attachment 284895 [details]
Patch
This patch should address everything. Windows code has been removed, isatty() is being used on all systems and only platforms which defined spindump and sample specifically prompt the use for it. Comment on attachment 284895 [details] Patch Attachment 284895 [details] did not pass ios-sim-ews (ios-simulator-wk2): Output: http://webkit-queues.webkit.org/results/1775355 New failing tests: media/track/track-remove-quickly.html Created attachment 284900 [details]
Archive of layout-test-results from ews126 for ios-simulator-wk2
The attached test failures were seen while running run-webkit-tests on the ios-sim-ews.
Bot: ews126 Port: ios-simulator-wk2 Platform: Mac OS X 10.11.5
Created attachment 284902 [details]
Patch
To address Daniel's comment (which I read-over before the last patch): I think the platform separation makes the logging message more clear. However, given that (this is a case which will almost never happen (as has been reviewed a number of times in this patch), I think he is right to eliminate the platform differentiation. Lastly, one of the patch changes has changed something I said about snprintf in response to Darin earlier. As per http://www.cplusplus.com/reference/cstdio/snprintf/, Parameter n: "Maximum number of bytes to be used in the buffer. The generated string has a length of at most n-1, leaving space for the additional terminating null character." So given that we instantiate the buffer with all NULL's, we shouldn't need to worry about overrunning out buffer. Comment on attachment 284902 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=284902&action=review > Tools/Scripts/webkitpy/port/ios.py:425 > + if not exit_status == 0: This should be written: if exit_status: > Tools/Scripts/webkitpy/port/mac.py:306 > + if not exit_status == 0: Ditto. Created attachment 285120 [details]
Patch
Comment on attachment 285120 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=285120&action=review > Tools/Scripts/webkitpy/port/ios.py:423 > + exit_status = self._executive.run_command([ > + "/usr/bin/sudo", > + "-n", > + "/usr/sbin/spindump", > pid, > 10, > 10, > "-file", > hang_report, > - ]) > + ], return_exit_code=True) Notice that Executive.run_command() only throws a ScriptError exception if return_exit_code := False. It is good programming practice to limit the scope of a try-except to the minimum code that can raise an exception to help make it straightforward to debug the cause of an exception. In comment #25 I gave an example of one way we can write this code such that we limit the scope of the try-except block. Comment on attachment 285120 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=285120&action=review >> Tools/Scripts/webkitpy/port/ios.py:423 >> + ], return_exit_code=True) > > Notice that Executive.run_command() only throws a ScriptError exception if return_exit_code := False. It is good programming practice to limit the scope of a try-except to the minimum code that can raise an exception to help make it straightforward to debug the cause of an exception. In comment #25 I gave an example of one way we can write this code such that we limit the scope of the try-except block. This perhaps brings up another question, then, about sample_file_path(...), which calls filesystem.join(...). In the original sample_process code, finding the report was in the try catch block, which is why I kept it there after adding spindump. If filesystem.join(...) throws an exception (or results_directory(), actually), what is our desired functionality? I think for the time being, keeping the generation of the hang_report file path inside the try-except block is prudent, as the tests can continue if an error happens while attempting to generate either version of the crash report, and it would seem unhelpful to be to have two try-except blocks in this function. I will look into this while refactoring the Mac and IOS ports, which will address some of the code duplication in this patch, if you think this is worth changing. (In reply to comment #50) > Comment on attachment 285120 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=285120&action=review > > >> Tools/Scripts/webkitpy/port/ios.py:423 > >> + ], return_exit_code=True) > > > > Notice that Executive.run_command() only throws a ScriptError exception if return_exit_code := False. It is good programming practice to limit the scope of a try-except to the minimum code that can raise an exception to help make it straightforward to debug the cause of an exception. In comment #25 I gave an example of one way we can write this code such that we limit the scope of the try-except block. > > This perhaps brings up another question, then, about sample_file_path(...), > which calls filesystem.join(...). In the original sample_process code, > finding the report was in the try catch block, which is why I kept it there > after adding spindump. If filesystem.join(...) throws an exception (or > results_directory(), actually), what is our desired functionality? Filesystem.join() does not raise an exception because it turns around and calls os.path.join(). And I do not expect os.path.join() to raise an exception because its documentation, <https://docs.python.org/2/library/os.path.html>, as of 08/02 makes no mention that it raises an exception. > I think for the time being, keeping the generation of the hang_report file path > inside the try-except block is prudent Notice that the try-except block only catches a webkitpy.common.system.executive.ScriptError exception. That is, it does not catch a general-purpose Python Exception. > , as the tests can continue if an > error happens while attempting to generate either version of the crash > report, and > it would seem unhelpful to be to have two try-except blocks in > this function. > How did you come to the conclusion that we need to try-except blocks in this function? Notice that I have exactly one try-except block in the example I gave in comment #25. As aforementioned above, the try-except block in sample_process() only catches a webkitpy.common.system.executive.ScriptError exception. The current code allows any other exception to bubble up. Maybe we should revise this code to be more resilient to other kinds of exceptions. If so, such consideration and work is more appropriate in a separate bug as the focus of this bug is to add an enhancement to our sampling machinery: use spindump if we can. (In reply to comment #51) [...] > > it would seem unhelpful to be to have two try-except blocks in > > this function. > > > > How did you come to the conclusion that we need to try-except blocks in this function? This should read: How did you come to the conclusion that we need two try-except blocks in this function? (In reply to comment #51) > (In reply to comment #50) > > Comment on attachment 285120 [details] > > Patch > > > > View in context: > > https://bugs.webkit.org/attachment.cgi?id=285120&action=review > > > > >> Tools/Scripts/webkitpy/port/ios.py:423 > > >> + ], return_exit_code=True) > > > > > > Notice that Executive.run_command() only throws a ScriptError exception if return_exit_code := False. It is good programming practice to limit the scope of a try-except to the minimum code that can raise an exception to help make it straightforward to debug the cause of an exception. In comment #25 I gave an example of one way we can write this code such that we limit the scope of the try-except block. > > > > This perhaps brings up another question, then, about sample_file_path(...), > > which calls filesystem.join(...). In the original sample_process code, > > finding the report was in the try catch block, which is why I kept it there > > after adding spindump. If filesystem.join(...) throws an exception (or > > results_directory(), actually), what is our desired functionality? > > Filesystem.join() does not raise an exception because it turns around and > calls os.path.join(). And I do not expect os.path.join() to raise an > exception because its documentation, > <https://docs.python.org/2/library/os.path.html>, as of 08/02 makes no > mention that it raises an exception. > > > I think for the time being, keeping the generation of the hang_report file path > > inside the try-except block is prudent > > Notice that the try-except block only catches a > webkitpy.common.system.executive.ScriptError exception. That is, it does not > catch a general-purpose Python Exception. > > > , as the tests can continue if an > > error happens while attempting to generate either version of the crash > > report, and > > > it would seem unhelpful to be to have two try-except blocks in > > this function. > > > > How did you come to the conclusion that we need to try-except blocks in this > function? Notice that I have exactly one try-except block in the example I > gave in comment #25. > > As aforementioned above, the try-except block in sample_process() only > catches a webkitpy.common.system.executive.ScriptError exception. The > current code allows any other exception to bubble up. Maybe we should revise > this code to be more resilient to other kinds of exceptions. If so, such > consideration and work is more appropriate in a separate bug as the focus of > this bug is to add an enhancement to our sampling machinery: use spindump if > we can. (In reply to comment #52) > (In reply to comment #51) > [...] > > > it would seem unhelpful to be to have two try-except blocks in > > > this function. > > > > > > > How did you come to the conclusion that we need to try-except blocks in this function? > > This should read: > > How did you come to the conclusion that we need two try-except blocks in > this function? Sorry, I think my comment was not very clear. What I was saying is that I kept the hang_report assignment in the try-except block because it had been in it's previous iteration, and I wasn't quite sure what kind of exceptions (if any) are thrown by sample_file_path(...) (as you point out, probably none, even when including results_directory()). Essentially, I was trying to keep the same structure as had existed previously, and taking sample_file_path out of the try-except would have broken that pattern. Since it does seem like sample_file_path(...) won't be throwing any exceptions, I will change this in the re-factor. Created attachment 285370 [details]
Patch
This patch updates the try-except block and fixes the Python unit tests which test this function. Comment on attachment 285370 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=285370&action=review We should add tests to ensure that we only call sample() if spindump returns a non-zero exit status. We should use the test test_sample_process_throws_exception as an example for how to substitute a custom function for Executive.run_command(). > Tools/Scripts/webkitpy/port/mac_unittest.py:-224 > - def test_sample_process_throws_exception(self): > - > - def throwing_run_command(args): > - raise ScriptError("MOCK script error") > - > - port = self.make_port() > - port._executive = MockExecutive2(run_command_fn=throwing_run_command) > - OutputCapture().assert_outputs(self, port.sample_process, args=['test', 42]) > - We shouldn't remove this test. We should make this test work with the proposed code changes. Created attachment 285383 [details]
Patch
This patch adds a specific spindump unit test and updates the sample_process unit tests so they match the new format of sample_process. Comment on attachment 285383 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=285383&action=review > Tools/WebKitTestRunner/TestInvocation.cpp:193 > - char errorMessageToStderr[1024]; > + char buffer[1024] = { }; I do not think that initializing the buffer is an improvement over existing code. This doesn't make it any safer. > Tools/WebKitTestRunner/TestInvocation.cpp:210 > + fprintf(stderr, "Failed receive expected sample response, got:\n\t\"%s\"\nContinuing...\n", buffer); Not sure if this is proper English. "Failed to receive" maybe? Also, won't the error get printed in interactive mode every time? That would be misleading. Comment on attachment 285383 [details] Patch Clearing flags on attachment: 285383 Committed r204253: <http://trac.webkit.org/changeset/204253> All reviewed patches have been landed. Closing bug. |