webkitpy: get ServerProcess out of the reportcrash business
Created attachment 132732 [details] Patch
Comment on attachment 132732 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=132732&action=review > Tools/Scripts/webkitpy/layout_tests/port/server_process.py:106 > + if self._proc.returncode in (-1073741510, -signal.SIGINT): -1073741510 <- Can we name this constant?
Comment on attachment 132732 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=132732&action=review >> Tools/Scripts/webkitpy/layout_tests/port/server_process.py:106 >> + if self._proc.returncode in (-1073741510, -signal.SIGINT): > > -1073741510 <- Can we name this constant? I don't actually know what a good name for this constant would be; given the FIXME above, I suspect we can actually delete the constant altogether, and even if we can't, I need to figure out if this value correlates to something. I'd like to defer doing that and removing the fixme to a subsequent patch if that's okay, since getting to the root of this is kind of a different effort.
Comment on attachment 132732 [details] Patch I'm not sure I understand. Where did the code that waits for ReportCrash to finish dumping the stack go?
Do we not need to wait for ReportCrash anymore?
(In reply to comment #4) > (From update of attachment 132732 [details]) > I'm not sure I understand. Where did the code that waits for ReportCrash to finish dumping the stack go? That logic moves into the Port classes (see bug 81603), but I should probably move some of it back here. The reason is slightly complicated. It turns out that waiting for ReportCrash to exit is the wrong thing to do. Under normal circumstances (things are crashing rarely), ReportCrash actually writes the crash log before python is even notified that the subprocess has crashed. When lots of things are crashing, however, the write may be somewhat delayed (I don't really know what's going on, but it's like there is a delay before the file gets flushed to disk). So, in some cases, we actually do want to wait a while longer to look for the log - hence, in the patch in bug 81603, there's a slow spin for a period of time derived from how many workers are running in parallel. However, in any case, the operating system appears to keep ReportCrash around for a long (~30-40second) time just so it doesn't have to re-launch it if there is another crash. Also, there is only ever one ReportCrash running, regardless of how many processes are crashing at once (it appears that the single ReportCrash handles multiple crashes simultaneously). Most of the time, not waiting at all seems to be fine. Sometimes - if the world is broken and so every test crashes - you want to wait for a while. Ideally (for me) we'd land this patch as-is and then get "more correct" when 81603 lands. However, if you like I can rework the division of labor between the two patches a bit so that we even slow-spin as part of this patch as well.
(In reply to comment #6) > That logic moves into the Port classes (see bug 81603), but I should probably move some of it back here. s/should probably/could if necessary/. :)
I see. I think I misunderstood how ReportCrash worked previously. I thought there was one for each crash dump.
Comment on attachment 132732 [details] Patch I'm not sure I understand 100% of the details in this patch, but I like the overall direction.
Committed r111293: <http://trac.webkit.org/changeset/111293>
This is still breaking the layout tests on the bots, even after http://trac.webkit.org/changeset/111307 http://build.webkit.org/builders/Lion%20Intel%20Debug%20%28WebKit2%20Tests%29/builds/5070/steps/layout-test/logs/stdio -03-19 19:51:42,660 43121 single_test_runner.py:209 DEBUG /Volumes/Data/slave/lion-intel-debug/build/Source/WebCore/loader/appcache/ApplicationCacheStorage.cpp(557) : bool WebCore::ApplicationCacheStorage::executeSQLCommand(const WTF::String &) 2012-03-19 19:51:42,660 43123 stack_utils.py:67 DEBUG File "/Volumes/Data/slave/lion-intel-debug-tests-wk2/build/Tools/Scripts/webkitpy/layout_tests/controllers/manager_worker_broker.py", line 280, in run 2012-03-19 19:51:42,661 43123 stack_utils.py:67 DEBUG self._worker_connection.run_message_loop() 2012-03-19 19:51:42,661 43123 stack_utils.py:67 DEBUG File "/Volumes/Data/slave/lion-intel-debug-tests-wk2/build/Tools/Scripts/webkitpy/layout_tests/controllers/manager_worker_broker.py", line 232, in run_message_loop 2012-03-19 19:51:42,661 43123 stack_utils.py:67 DEBUG self._broker.run_message_loop(self._run_topic, self._client, delay_secs) 2012-03-19 19:51:42,661 43123 stack_utils.py:67 DEBUG File "/Volumes/Data/slave/lion-intel-debug-tests-wk2/build/Tools/Scripts/webkitpy/layout_tests/controllers/manager_worker_broker.py", line 168, in run_message_loop 2012-03-19 19:51:42,661 43123 stack_utils.py:67 DEBUG self._run_loop(topic_name, client, block=True, delay_secs=delay_secs) 2012-03-19 19:51:42,661 43123 stack_utils.py:67 DEBUG File "/Volumes/Data/slave/lion-intel-debug-tests-wk2/build/Tools/Scripts/webkitpy/layout_tests/controllers/manager_worker_broker.py", line 182, in _run_loop 2012-03-19 19:51:42,662 43121 worker.py:206 DEBUG worker/5 fast/media/media-query-invalid-value.html passed 2012-03-19 19:51:42,662 43123 stack_utils.py:67 DEBUG self._dispatch_message(msg, client) 2012-03-19 19:51:42,662 43123 stack_utils.py:67 DEBUG File "/Volumes/Data/slave/lion-intel-debug-tests-wk2/build/Tools/Scripts/webkitpy/layout_tests/controllers/manager_worker_broker.py", line 191, in _dispatch_message 2012-03-19 19:51:42,662 43123 stack_utils.py:67 DEBUG message_handler(message.src, *optargs) 2012-03-19 19:51:42,662 43123 stack_utils.py:67 DEBUG File "/Volumes/Data/slave/lion-intel-debug-tests-wk2/build/Tools/Scripts/webkitpy/layout_tests/controllers/worker.py", line 128, in handle_test_list 2012-03-19 19:51:42,662 43123 stack_utils.py:67 DEBUG self._run_test(test_input) 2012-03-19 19:51:42,663 43123 stack_utils.py:67 DEBUG File "/Volumes/Data/slave/lion-intel-debug-tests-wk2/build/Tools/Scripts/webkitpy/layout_tests/controllers/worker.py", line 143, in _run_test 2012-03-19 19:51:42,663 43123 stack_utils.py:67 DEBUG result = self.run_test_with_timeout(test_input, test_timeout_sec) 2012-03-19 19:51:42,663 43123 stack_utils.py:67 DEBUG File "/Volumes/Data/slave/lion-intel-debug-tests-wk2/build/Tools/Scripts/webkitpy/layout_tests/controllers/worker.py", line 185, in run_test_with_timeout 2012-03-19 19:51:42,663 43123 stack_utils.py:67 DEBUG return self._run_test_in_this_thread(test_input) 2012-03-19 19:51:42,663 43123 stack_utils.py:67 DEBUG File "/Volumes/Data/slave/lion-intel-debug-tests-wk2/build/Tools/Scripts/webkitpy/layout_tests/controllers/worker.py", line 270, in _run_test_in_this_thread 2012-03-19 19:51:42,663 43123 stack_utils.py:67 DEBUG return self.run_single_test(self._driver, test_input) 2012-03-19 19:51:42,663 43123 stack_utils.py:67 DEBUG File "/Volumes/Data/slave/lion-intel-debug-tests-wk2/build/Tools/Scripts/webkitpy/layout_tests/controllers/worker.py", line 274, in run_single_test 2012-03-19 19:51:42,664 43123 stack_utils.py:67 DEBUG test_input, driver, self._name) 2012-03-19 19:51:42,664 43123 stack_utils.py:67 DEBUG File "/Volumes/Data/slave/lion-intel-debug-tests-wk2/build/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py", line 46, in run_single_test 2012-03-19 19:51:42,664 43123 stack_utils.py:67 DEBUG return runner.run() 2012-03-19 19:51:42,664 43123 stack_utils.py:67 DEBUG File "/Volumes/Data/slave/lion-intel-debug-tests-wk2/build/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py", line 106, in run 2012-03-19 19:51:42,664 43123 stack_utils.py:67 DEBUG return self._run_compare_test() 2012-03-19 19:51:42,664 43123 stack_utils.py:67 DEBUG File "/Volumes/Data/slave/lion-intel-debug-tests-wk2/build/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py", line 119, in _run_compare_test 2012-03-19 19:51:42,664 43123 stack_utils.py:67 DEBUG test_result_writer.write_test_result(self._filesystem, self._port, self._test_name, driver_output, expected_driver_output, test_result.failures) 2012-03-19 19:51:42,665 43123 stack_utils.py:67 DEBUG File "/Volumes/Data/slave/lion-intel-debug-tests-wk2/build/Tools/Scripts/webkitpy/layout_tests/controllers/test_result_writer.py", line 66, in write_test_result 2012-03-19 19:51:42,665 43123 stack_utils.py:67 DEBUG writer.write_crash_report(crashed_driver_output.crashed_process_name, crashed_driver_output.error) 2012-03-19 19:51:42,665 43123 stack_utils.py:67 DEBUG File "/Volumes/Data/slave/lion-intel-debug-tests-wk2/build/Tools/Scripts/webkitpy/layout_tests/controllers/test_result_writer.py", line 165, in write_crash_report 2012-03-19 19:51:42,665 43123 stack_utils.py:67 DEBUG log = crash_logs.find_newest_log(crashed_process_name) 2012-03-19 19:51:42,665 43123 stack_utils.py:67 DEBUG File "/Volumes/Data/slave/lion-intel-debug-tests-wk2/build/Tools/Scripts/webkitpy/common/system/crashlogs.py", line 40, in find_newest_log 2012-03-19 19:51:42,665 43123 stack_utils.py:67 DEBUG return self._find_newest_log_darwin(process_name, pid, include_errors) 2012-03-19 19:51:42,665 43123 stack_utils.py:67 DEBUG File "/Volumes/Data/slave/lion-intel-debug-tests-wk2/build/Tools/Scripts/webkitpy/common/system/crashlogs.py", line 57, in _find_newest_log_darwin 2012-03-19 19:51:42,666 43123 stack_utils.py:67 DEBUG logs = self._filesystem.files_under(log_directory, file_filter=is_crash_log) 2012-03-19 19:51:42,666 43123 stack_utils.py:67 DEBUG File "/Volumes/Data/slave/lion-intel-debug-tests-wk2/build/Tools/Scripts/webkitpy/common/system/filesystem.py", line 111, in files_under 2012-03-19 19:51:42,666 43123 stack_utils.py:67 DEBUG if file_filter(self, dirpath, filename): 2012-03-19 19:51:42,666 43123 stack_utils.py:67 DEBUG File "/Volumes/Data/slave/lion-intel-debug-tests-wk2/build/Tools/Scripts/webkitpy/common/system/crashlogs.py", line 54, in is_crash_log 2012-03-19 19:51:42,666 43123 stack_utils.py:67 DEBUG return basename.startswith(process_name + "_") and basename.endswith(".crash") 2012-03-19 19:51:42,668 43123 manager_worker_broker.py:291 DEBUG worker/7 done with message loop, exception raised 2012-03-19 19:51:42,669 42883 manager.py:1424 ERROR worker/7 raised TypeError('unsupported operand type(s) for +: 'NoneType' and 'str''): 2012-03-19 19:51:42,669 42883 manager.py:1458 ERROR layout_tests/controllers/manager_worker_broker.py:280 (in run) 2012-03-19 19:51:42,669 42883 manager.py:1459 ERROR self._worker_connection.run_message_loop() 2012-03-19 19:51:42,669 42883 manager.py:1458 ERROR layout_tests/controllers/manager_worker_broker.py:232 (in run_message_loop) 2012-03-19 19:51:42,669 42883 manager.py:1459 ERROR self._broker.run_message_loop(self._run_topic, self._client, delay_secs) 2012-03-19 19:51:42,669 42883 manager.py:1458 ERROR layout_tests/controllers/manager_worker_broker.py:168 (in run_message_loop) 2012-03-19 19:51:42,669 42883 manager.py:1459 ERROR self._run_loop(topic_name, client, block=True, delay_secs=delay_secs) 2012-03-19 19:51:42,670 42883 manager.py:1458 ERROR layout_tests/controllers/manager_worker_broker.py:182 (in _run_loop) 2012-03-19 19:51:42,670 42883 manager.py:1459 ERROR self._dispatch_message(msg, client) 2012-03-19 19:51:42,670 42883 manager.py:1458 ERROR layout_tests/controllers/manager_worker_broker.py:191 (in _dispatch_message) 2012-03-19 19:51:42,670 42883 manager.py:1459 ERROR message_handler(message.src, *optargs) 2012-03-19 19:51:42,670 42883 manager.py:1458 ERROR layout_tests/controllers/worker.py:128 (in handle_test_list) 2012-03-19 19:51:42,670 42883 manager.py:1459 ERROR self._run_test(test_input) 2012-03-19 19:51:42,670 42883 manager.py:1458 ERROR layout_tests/controllers/worker.py:143 (in _run_test) 2012-03-19 19:51:42,670 42883 manager.py:1459 ERROR result = self.run_test_with_timeout(test_input, test_timeout_sec) 2012-03-19 19:51:42,670 42883 manager.py:1458 ERROR layout_tests/controllers/worker.py:185 (in run_test_with_timeout) 2012-03-19 19:51:42,670 42883 manager.py:1459 ERROR return self._run_test_in_this_thread(test_input) 2012-03-19 19:51:42,670 42883 manager.py:1458 ERROR layout_tests/controllers/worker.py:270 (in _run_test_in_this_thread) 2012-03-19 19:51:42,670 42883 manager.py:1459 ERROR return self.run_single_test(self._driver, test_input) 2012-03-19 19:51:42,670 42883 manager.py:1458 ERROR layout_tests/controllers/worker.py:274 (in run_single_test) 2012-03-19 19:51:42,671 42883 manager.py:1459 ERROR test_input, driver, self._name) 2012-03-19 19:51:42,671 42883 manager.py:1458 ERROR layout_tests/controllers/single_test_runner.py:46 (in run_single_test) 2012-03-19 19:51:42,671 42883 manager.py:1459 ERROR return runner.run() 2012-03-19 19:51:42,671 42883 manager.py:1458 ERROR layout_tests/controllers/single_test_runner.py:106 (in run) 2012-03-19 19:51:42,671 42883 manager.py:1459 ERROR return self._run_compare_test() 2012-03-19 19:51:42,671 42883 manager.py:1458 ERROR layout_tests/controllers/single_test_runner.py:119 (in _run_compare_test) 2012-03-19 19:51:42,671 42883 manager.py:1459 ERROR test_result_writer.write_test_result(self._filesystem, self._port, self._test_name, driver_output, expected_driver_output, test_result.failures) 2012-03-19 19:51:42,671 43123 worker.py:151 DEBUG worker/7 cleaning up 2012-03-19 19:51:42,671 42883 manager.py:1458 ERROR layout_tests/controllers/test_result_writer.py:66 (in write_test_result) 2012-03-19 19:51:42,671 42883 manager.py:1459 ERROR writer.write_crash_report(crashed_driver_output.crashed_process_name, crashed_driver_output.error) 2012-03-19 19:51:42,671 42883 manager.py:1458 ERROR layout_tests/controllers/test_result_writer.py:165 (in write_crash_report) 2012-03-19 19:51:42,671 42883 manager.py:1459 ERROR log = crash_logs.find_newest_log(crashed_process_name) 2012-03-19 19:51:42,671 43123 worker.py:178 DEBUG worker/7 killing driver 2012-03-19 19:51:42,671 42883 manager.py:1458 ERROR common/system/crashlogs.py:40 (in find_newest_log) 2012-03-19 19:51:42,672 42883 manager.py:1459 ERROR return self._find_newest_log_darwin(process_name, pid, include_errors) 2012-03-19 19:51:42,672 42883 manager.py:1458 ERROR common/system/crashlogs.py:57 (in _find_newest_log_darwin) 2012-03-19 19:51:42,672 42883 manager.py:1459 ERROR logs = self._filesystem.files_under(log_directory, file_filter=is_crash_log) 2012-03-19 19:51:42,672 42883 manager.py:1458 ERROR common/system/filesystem.py:111 (in files_under) 2012-03-19 19:51:42,672 42883 manager.py:1459 ERROR if file_filter(self, dirpath, filename): 2012-03-19 19:51:42,672 42883 manager.py:1458 ERROR common/system/crashlogs.py:54 (in is_crash_log) 2012-03-19 19:51:42,672 42883 manager.py:1459 ERROR return basename.startswith(process_name + "_") and basename.endswith(".crash")
I think I may have to roll out those patches soon if this isn't addressed.
I had to roll out the changes in http://trac.webkit.org/changeset/111343
Comment on attachment 132732 [details] Patch Unflagging patch, since it got rolled out.
Thanks for rolling this out, and I'm sorry I wasn't around to debug/fix/roll it out myself!
Created attachment 132911 [details] Patch
Comment on attachment 132911 [details] Patch The actual fix: layout_tests $ git diff port/webkit.py 494c494,497 < return self._server_process.has_crashed() --- > if self._server_process.has_crashed(): > self._crashed_subprocess_name = self._port.driver_name() > return True > return False 501c504,505 < elif error_line == "#CRASHED - WebProcess\n": --- > return True > if error_line == "#CRASHED - WebProcess\n": 503a508 > return True
Committed r111453: <http://trac.webkit.org/changeset/111453>
(In reply to comment #18) > Committed r111453: <http://trac.webkit.org/changeset/111453> This change broke Lion Intel Release (WebKit2 Tests), as seen here: <http://build.webkit.org/builders/Lion%20Intel%20Release%20%28WebKit2%20Tests%29/builds/5535/steps/layout-test/logs/stdio/text>.
Reverted r111453 for reason: still broken Committed r111470: <http://trac.webkit.org/changeset/111470>
(In reply to comment #20) > Reverted r111453 for reason: > > still broken > > Committed r111470: <http://trac.webkit.org/changeset/111470> I still see this in GTK Debug when a crash happens: 2012-03-21 02:10:45,581 1482 manager.py:1424 ERROR worker/1 raised UnicodeDecodeError(''ascii' codec can't decode byte 0xce in position 235: ordinal not in range(128)'): 2012-03-21 02:10:45,582 1482 manager.py:1458 ERROR layout_tests/controllers/manager_worker_broker.py:280 (in run) 2012-03-21 02:10:45,582 1482 manager.py:1459 ERROR self._worker_connection.run_message_loop() ... 2012-03-21 02:10:45,584 1482 manager.py:1458 ERROR layout_tests/controllers/single_test_runner.py:119 (in _run_compare_test) 2012-03-21 02:10:45,584 1482 manager.py:1459 ERROR test_result_writer.write_test_result(self._filesystem, self._port, self._test_name, driver_output, expected_driver_output, test_result.failures) 2012-03-21 02:10:45,584 1482 manager.py:1458 ERROR layout_tests/controllers/test_result_writer.py:66 (in write_test_result) 2012-03-21 02:10:45,584 1482 manager.py:1459 ERROR writer.write_crash_report(crashed_driver_output.crashed_process_name, crashed_driver_output.error) 2012-03-21 02:10:45,584 1482 manager.py:1458 ERROR layout_tests/controllers/test_result_writer.py:168 (in write_crash_report) 2012-03-21 02:10:45,584 1482 manager.py:1459 ERROR fs.write_text_file(filename, log if log else error) 2012-03-21 02:10:45,584 1482 manager.py:1458 ERROR common/system/filesystem.py:223 (in write_text_file) 2012-03-21 02:10:45,584 1482 manager.py:1459 ERROR f.write(contents) 2012-03-21 02:10:45,584 1482 manager.py:1458 ERROR /usr/lib/python2.6/codecs.py:691 (in write) 2012-03-21 02:10:45,584 1482 manager.py:1459 ERROR return self.writer.write(data) 2012-03-21 02:10:45,585 1482 manager.py:1458 ERROR /usr/lib/python2.6/codecs.py:351 (in write) 2012-03-21 02:10:45,585 1482 manager.py:1459 ERROR data, consumed = self.encode(object, self.errors) The error variable needs to be: errors.decode('utf8')
(In reply to comment #21) > The error variable needs to be: errors.decode('utf8') Okay, will fix with the next attempt :).
Created attachment 133102 [details] fix another crash
Actual fix in patch #3: $ diff --git a/Tools/Scripts/webkitpy/layout_tests/port/webkit.py b/Tools/Scripts/webkitpy/layout_tests/port/webkit.py index 32a7a25..b998c47 100644 --- a/Tools/Scripts/webkitpy/layout_tests/port/webkit.py +++ b/Tools/Scripts/webkitpy/layout_tests/port/webkit.py @@ -588,7 +588,7 @@ class WebKitDriver(Driver): block = ContentBlock() out_seen_eof = False - while True: + while not self.has_crashed(): if out_seen_eof and (self.err_seen_eof or not wait_for_stderr_eof): break $ I need to revamp ServerProcess to be a little more bulletproof, but I'll do that in a separate patch.
Committed r111599: <http://trac.webkit.org/changeset/111599>