Summary: | Sort out timeout implementations in DRT and WKTR | ||||||
---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Alexey Proskuryakov <ap> | ||||
Component: | Tools / Tests | Assignee: | Alexey Proskuryakov <ap> | ||||
Status: | RESOLVED FIXED | ||||||
Severity: | Normal | CC: | bfulgham, cgarcia, clopez, commit-queue, dbates, glenn, gyuyoung.kim, ossy, simon.fraser | ||||
Priority: | P2 | ||||||
Version: | 528+ (Nightly build) | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
See Also: | https://bugs.webkit.org/show_bug.cgi?id=139734 | ||||||
Bug Depends on: | 139774, 139734, 139772 | ||||||
Bug Blocks: | |||||||
Attachments: |
|
Description
Alexey Proskuryakov
2014-12-15 21:44:37 PST
Also not fixing: Windows DumpRenderTree. This should be straightforward, but needs building and testing on a Windows machine. Created attachment 243350 [details]
proposed fix
Attachment 243350 [details] did not pass style-queue:
ERROR: Tools/Scripts/webkitpy/port/driver.py:367: multiple statements on one line (semicolon) [pep8/E702] [5]
Total errors found: 1 in 17 files
If any of these errors are false positives, please file a bug against check-webkit-style.
Comment on attachment 243350 [details] proposed fix View in context: https://bugs.webkit.org/attachment.cgi?id=243350&action=review > Tools/DumpRenderTree/TestRunner.h:437 > + int m_timeout; Would be nice to use a comment saying what units this is in, or use a double, or std::chrono something. > Tools/WebKitTestRunner/InjectedBundle/mac/TestRunnerMac.mm:55 > + CFTimeInterval interval = m_timeout ? m_timeout / 1000.0 : defaultWaitToDumpWatchdogTimerInterval; Can't we just initialize m_timeout with defaultWaitToDumpWatchdogTimerInterval ? Comment on attachment 243350 [details] proposed fix View in context: https://bugs.webkit.org/attachment.cgi?id=243350&action=review > Tools/WebKitTestRunner/InjectedBundle/gtk/TestRunnerGtk.cpp:53 > + // FIXME: Use m_timeout, and only fall back to default when it's zero. > m_waitToDumpWatchdogTimer.scheduleAfterDelay("[WTR] waitToDumpWatchdogTimerCallback", [this] { waitToDumpWatchdogTimerFired(); }, > - std::chrono::duration_cast<std::chrono::microseconds>(std::chrono::duration<double>(waitToDumpWatchdogTimerInterval))); > + std::chrono::duration_cast<std::chrono::microseconds>(std::chrono::duration<double>(defaultWaitToDumpWatchdogTimerInterval))); So, what units is m_timeout in? The conversion to microseconds is correct here for the default timeout that is in seconds as a double value. I guess m_timeout is in milliseconds, since you are doing m_timeout/1000 in other cases. In this case that we know the value will not be truncated when converting to milliseconds (test timeouts are not that short), we could use milliseconds. std::chrono::milliseconds interval = m_timeout ? std::chrono::milliseconds(m_timeout) : std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::duration<double>(defaultWaitToDumpWatchdogTimerInterval)); m_waitToDumpWatchdogTimer.scheduleAfterDelay("[WTR] waitToDumpWatchdogTimerCallback", [this] { waitToDumpWatchdogTimerFired(); }, interval); Everything would be easier if m_timeout was a std::chrono::milliseconds and initialized to the default as Simon proposes. > Would be nice to use a comment saying what units this is in, or use a double, or std::chrono something. True. There are several more m_timeout variables in DumpRenderTree and WebKitTestRunner though, so it's better to update the types in a separate patch. > Can't we just initialize m_timeout with defaultWaitToDumpWatchdogTimerInterval ? Good idea. Committed <https://trac.webkit.org/r177363>. (In reply to comment #7) > Committed <https://trac.webkit.org/r177363>. It broke the test infrastructure on EFL and GTK: - https://build.webkit.org/builders/GTK%20Linux%2064-bit%20Release%20%28Tests%29/builds/4907 - https://build.webkit.org/builders/EFL%20Linux%2064-bit%20Release%20WK2/builds/18385 Sorry about that. It's not clear to me from the logs what broke. I'd love to help, but someone else will need to analyze what the problems are. Essentially, I made all the fixes blindly, and they worked on Mac from the first time, there wasn't much Mac specific here. One thing that I noticed earlier about Gtk tests was that they tend to time out unexpectedly quickly. See e.g. <http://webkit-test-results.appspot.com/dashboards/flakiness_dashboard.html#tests=fast%2Fworkers%2Fworker-close-more.html> - why do tests time out in 6 seconds? Perhaps that's somehow related. (In reply to comment #9) > Sorry about that. > > It's not clear to me from the logs what broke. I'd love to help, but someone > else will need to analyze what the problems are. Essentially, I made all the > fixes blindly, and they worked on Mac from the first time, there wasn't much > Mac specific here. > > One thing that I noticed earlier about Gtk tests was that they tend to time > out unexpectedly quickly. See e.g. > <http://webkit-test-results.appspot.com/dashboards/flakiness_dashboard. > html#tests=fast%2Fworkers%2Fworker-close-more.html> - why do tests time out > in 6 seconds? Perhaps that's somehow related. I have detected two main issues: 1) On the GTK port, we define the timeout to 6000ms (release), 12000ms (debug) or 10000ms (if checking memory leaks with valgrind) http://trac.webkit.org/browser/trunk/Tools/Scripts/webkitpy/port/gtk.py?rev=177363#L83 Your patch is substracting 5000ms from this timeout <http://trac.webkit.org/changeset/177363/trunk/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py> Leaving it to 1000ms (1 sec). That don't seems enough for many tests to complete. Why are you removing those 5 seconds from the default timeout? 2) r177363 breakes all ports that don't enable supports_per_test_timeout() on Tools/Scripts/webkitpy/port/$port.py This is because m_timeout is initialized to 0 <http://trac.webkit.org/browser/trunk/Tools/WebKitTestRunner/TestInvocation.cpp?rev=177363#L96>, and never updated with the port default value because the python scripts don't pass the --timeout argument when the previous is not enabled. Maybe we should just enable supports_per_test_timeout() for all ports at Tools/Scripts/webkitpy/port/base.py ? So, I have tested, and the patch below will fix the breakage introduced by r177363 on the GTK port: diff --git a/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py b/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py index 8f72506..1c3fa21 100644 --- a/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py +++ b/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py @@ -62,11 +62,6 @@ class SingleTestRunner(object): self._stop_when_done = stop_when_done self._timeout = test_input.timeout - if self._timeout > 5000: - # Timeouts are detected by both script and tool; tool detected timeouts are - # better, because they contain partial output. Give the tool some time to - # report the timeout instead of being killed. - self._timeout = int(self._timeout) - 5000 if self._reference_files: # Detect and report a test which has a wrong combination of expectation files. diff --git a/Tools/Scripts/webkitpy/port/gtk.py b/Tools/Scripts/webkitpy/port/gtk.py index 338e2d1..ad451be 100644 --- a/Tools/Scripts/webkitpy/port/gtk.py +++ b/Tools/Scripts/webkitpy/port/gtk.py @@ -80,6 +80,9 @@ class GtkPort(Port): return XorgDriver return XvfbDriver + def supports_per_test_timeout(self): + return True + def default_timeout_ms(self): # Starting an application under Valgrind takes a lot longer than normal # so increase the timeout (empirically 10x is enough to avoid timeouts). Any comments about it? (In reply to comment #10) > 1) > On the GTK port, we define the timeout to 6000ms (release), 12000ms (debug) > or 10000ms (if checking memory leaks with valgrind) > http://trac.webkit.org/browser/trunk/Tools/Scripts/webkitpy/port/gtk. > py?rev=177363#L83 I read it wrongly. When using valgrind we are doing a 10x (so 60000ms for release and 120000ms for debug) Thank you for looking into this! > Why are you removing those 5 seconds from the default timeout? This is needed to let tool-detected timeout be reported before run-webkit-tests kills the tool. When both have the same timeout, there is a race to detect it. It is OK to change these numbers to some reasonable values that suit other ports. > Maybe we should just enable supports_per_test_timeout() for all ports at Tools/Scripts/webkitpy/port/base.py ? I'm not sure if DumpRenderTree on Windows supports it yet, but otherwise, this seems like the right thing to do. We could also remove a bunch of code then. But this is not the best solution to this problem, because we should also keep WebKitTestRunner working when launched manually, without the option. > This is because m_timeout is initialized to 0 Oops, I tried to have it use the default 30 second timeout in this case, but didn't do that right. Should be an easy fix - I can do that tomorrow if no one beats me to it. (In reply to comment #12) > Thank you for looking into this! > > > Why are you removing those 5 seconds from the default timeout? > > This is needed to let tool-detected timeout be reported before > run-webkit-tests kills the tool. When both have the same timeout, there is a > race to detect it. > > It is OK to change these numbers to some reasonable values that suit other > ports. > And instead of doing that (subtracting 5 seconds from the parameter --timeout passed to the tool (WKTR))... what about just making run-webkit-test wait 1 second more? diff --git a/Tools/Scripts/webkitpy/port/driver.py b/Tools/Scripts/webkitpy/port/driver.py index 715bd7e..9f5e3cd 100644 --- a/Tools/Scripts/webkitpy/port/driver.py +++ b/Tools/Scripts/webkitpy/port/driver.py @@ -180,6 +180,9 @@ class Driver(object): command = self._command_from_driver_input(driver_input) deadline = test_begin_time + int(driver_input.timeout) / 1000.0 + # Add an extra second to let tool-detected timeout be reported before run-webkit-tests kills the tool. + # When both have the same timeout, there is a race to detect it. + deadline = deadline + 1.0 self._server_process.write(command) text, audio = self._read_first_block(deadline) # First block is either text or audio At least to me, this seems less error prone and guaranteed to work on all ports (no matter which assumptions they make on the number of seconds for a test to timeout) > And instead of doing that (subtracting 5 seconds from the parameter --timeout passed to the tool (WKTR))... what about just making run-webkit-test wait 1 second more?
In this case, run-webkit-tests would use a timeout that is different from what is specified in port configuration, or in --time-out-ms argument. That seems like it would introduce too much confusion. Also, it's not driver's job to make policy decisions, it should use whatever timeout is passed to it without further tweaking.
Fixing the known issues in bug 139734. (In reply to comment #14) > > And instead of doing that (subtracting 5 seconds from the parameter --timeout passed to the tool (WKTR))... what about just making run-webkit-test wait 1 second more? > > In this case, run-webkit-tests would use a timeout that is different from > what is specified in port configuration, or in --time-out-ms argument. That > seems like it would introduce too much confusion. You can see it also the other way around: With the changes introduced by r177363, the test timeouts faster (-5secs) than what is specified in port configuration, or in --time-out-ms argument. I think that this actually introduces much more confusion than what I'm proposing. > Also, it's not driver's > job to make policy decisions, it should use whatever timeout is passed to it > without further tweaking. I don't see any policy decision here. Only a work-around to avoid a race condition. The subtracting of 5 seconds from r177363 isn't also policy, just another workaround. I think that adding 1 second in run-webkit-tests is better. At least to me it looks cleaner. If I say --timeou-out-ms 8, and I have to wait 9 seconds for the test to finish (or timeout), I probably won't notice. But, on the other hand, if I get a timeout after 3 seconds (8-5) probably I will notice something weird. Don't you agree on that? > With the changes introduced by r177363, the test timeouts faster (-5secs) than what is specified in port configuration This was already the case for many years. WebKitTestRunner has multiple shorter timeouts for specific operations, so a test could time out faster than specified in port configuration. Mac in particular always had port timeout specified at 35 seconds, but both DumpRenderTree and WebKitTestRunner ignored that, and timed out after 30 seconds. Maybe there was some confusion due to that, but not anything show-stopping. Additionally, a lot of tests have their own shorter timeouts implemented in JavaScript, which also don't obey port configuration. I am not very strongly opposed to adding instead of subtracting, but I don't see it as obviously superior. > I don't see any policy decision here. Only a work-around to avoid a race condition. A workaround is something that can be made to a bug. But there is no bug here - when both the script and the tool have the same timeout, race is the expected behavior. We need to have script timeout and tool timeout different, and this cannot be hidden under the carpet in driver.py, whose sole purpose is to translate Python variables into command line arguments, and then stdout/stderr into Python objects. I posted a patch to bug 139734, let's move this discussion there. |