WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
139671
Sort out timeout implementations in DRT and WKTR
https://bugs.webkit.org/show_bug.cgi?id=139671
Summary
Sort out timeout implementations in DRT and WKTR
Alexey Proskuryakov
Reported
2014-12-15 21:44:37 PST
There are multiple timeout watchdogs for regression tests: 1. One in run-webkit-test. When this fires, the tool gets sampled, and restarted. No partial output is saved. This supports [ Slow ] modifiers. 2. Async tests have their timeouts detected by DumpRenderTree and WebKitTestRunner, as long as main thread returns to event loop. When this timeout occurs, the test is incorrectly reported as failing, not timing out. There is partial output and an error line for the timeout. [ Slow ] modifiers are not respected - this timeout is always 30 seconds. 3. In WebKitTestRunner, UI process detects a timeout and fails. There is no partial output saved, no sample, and the test is reported as failing. 4. There are timeouts for individual messages that UI process sends to WebProcess in WebKitTestRunner. #3 is no better than #1, and can be eliminated. #2 should support [ Slow ] modifiers, and should be reported as an actual timeout that it is. WebKitTestRunner has a lot longer async test timeout than DumpRenderTree. I don't think that this is necessary or desirable. iOS has a lot longer DumpRenderTree timeout. I don't think that this is necessary or desirable. Gtk has a very strange timeout computation in WebKitTestRunner, see
bug 138831 comment 10
. I'm going to fix all these issues here, except for the Gtk one, which I cannot test.
Attachments
proposed fix
(23.38 KB, patch)
2014-12-15 22:33 PST
,
Alexey Proskuryakov
simon.fraser
: review+
Details
Formatted Diff
Diff
View All
Add attachment
proposed patch, testcase, etc.
Alexey Proskuryakov
Comment 1
2014-12-15 21:51:31 PST
Also not fixing: Windows DumpRenderTree. This should be straightforward, but needs building and testing on a Windows machine.
Alexey Proskuryakov
Comment 2
2014-12-15 22:33:29 PST
Created
attachment 243350
[details]
proposed fix
WebKit Commit Bot
Comment 3
2014-12-15 22:35:47 PST
Attachment 243350
[details]
did not pass style-queue: ERROR: Tools/Scripts/webkitpy/port/driver.py:367: multiple statements on one line (semicolon) [pep8/E702] [5] Total errors found: 1 in 17 files If any of these errors are false positives, please file a bug against check-webkit-style.
Simon Fraser (smfr)
Comment 4
2014-12-15 23:12:15 PST
Comment on
attachment 243350
[details]
proposed fix View in context:
https://bugs.webkit.org/attachment.cgi?id=243350&action=review
> Tools/DumpRenderTree/TestRunner.h:437 > + int m_timeout;
Would be nice to use a comment saying what units this is in, or use a double, or std::chrono something.
> Tools/WebKitTestRunner/InjectedBundle/mac/TestRunnerMac.mm:55 > + CFTimeInterval interval = m_timeout ? m_timeout / 1000.0 : defaultWaitToDumpWatchdogTimerInterval;
Can't we just initialize m_timeout with defaultWaitToDumpWatchdogTimerInterval ?
Carlos Garcia Campos
Comment 5
2014-12-16 02:05:39 PST
Comment on
attachment 243350
[details]
proposed fix View in context:
https://bugs.webkit.org/attachment.cgi?id=243350&action=review
> Tools/WebKitTestRunner/InjectedBundle/gtk/TestRunnerGtk.cpp:53 > + // FIXME: Use m_timeout, and only fall back to default when it's zero. > m_waitToDumpWatchdogTimer.scheduleAfterDelay("[WTR] waitToDumpWatchdogTimerCallback", [this] { waitToDumpWatchdogTimerFired(); }, > - std::chrono::duration_cast<std::chrono::microseconds>(std::chrono::duration<double>(waitToDumpWatchdogTimerInterval))); > + std::chrono::duration_cast<std::chrono::microseconds>(std::chrono::duration<double>(defaultWaitToDumpWatchdogTimerInterval)));
So, what units is m_timeout in? The conversion to microseconds is correct here for the default timeout that is in seconds as a double value. I guess m_timeout is in milliseconds, since you are doing m_timeout/1000 in other cases. In this case that we know the value will not be truncated when converting to milliseconds (test timeouts are not that short), we could use milliseconds. std::chrono::milliseconds interval = m_timeout ? std::chrono::milliseconds(m_timeout) : std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::duration<double>(defaultWaitToDumpWatchdogTimerInterval)); m_waitToDumpWatchdogTimer.scheduleAfterDelay("[WTR] waitToDumpWatchdogTimerCallback", [this] { waitToDumpWatchdogTimerFired(); }, interval); Everything would be easier if m_timeout was a std::chrono::milliseconds and initialized to the default as Simon proposes.
Alexey Proskuryakov
Comment 6
2014-12-16 10:12:44 PST
> Would be nice to use a comment saying what units this is in, or use a double, or std::chrono something.
True. There are several more m_timeout variables in DumpRenderTree and WebKitTestRunner though, so it's better to update the types in a separate patch.
> Can't we just initialize m_timeout with defaultWaitToDumpWatchdogTimerInterval ?
Good idea.
Alexey Proskuryakov
Comment 7
2014-12-16 10:31:54 PST
Committed <
https://trac.webkit.org/r177363
>.
Csaba Osztrogonác
Comment 8
2014-12-16 12:42:13 PST
(In reply to
comment #7
)
> Committed <
https://trac.webkit.org/r177363
>.
It broke the test infrastructure on EFL and GTK: -
https://build.webkit.org/builders/GTK%20Linux%2064-bit%20Release%20%28Tests%29/builds/4907
-
https://build.webkit.org/builders/EFL%20Linux%2064-bit%20Release%20WK2/builds/18385
Alexey Proskuryakov
Comment 9
2014-12-16 13:58:48 PST
Sorry about that. It's not clear to me from the logs what broke. I'd love to help, but someone else will need to analyze what the problems are. Essentially, I made all the fixes blindly, and they worked on Mac from the first time, there wasn't much Mac specific here. One thing that I noticed earlier about Gtk tests was that they tend to time out unexpectedly quickly. See e.g. <
http://webkit-test-results.appspot.com/dashboards/flakiness_dashboard.html#tests=fast%2Fworkers%2Fworker-close-more.html
> - why do tests time out in 6 seconds? Perhaps that's somehow related.
Carlos Alberto Lopez Perez
Comment 10
2014-12-16 18:57:52 PST
(In reply to
comment #9
)
> Sorry about that. > > It's not clear to me from the logs what broke. I'd love to help, but someone > else will need to analyze what the problems are. Essentially, I made all the > fixes blindly, and they worked on Mac from the first time, there wasn't much > Mac specific here. > > One thing that I noticed earlier about Gtk tests was that they tend to time > out unexpectedly quickly. See e.g. > <
http://webkit-test-results.appspot.com/dashboards/flakiness_dashboard
. > html#tests=fast%2Fworkers%2Fworker-close-more.html> - why do tests time out > in 6 seconds? Perhaps that's somehow related.
I have detected two main issues: 1) On the GTK port, we define the timeout to 6000ms (release), 12000ms (debug) or 10000ms (if checking memory leaks with valgrind)
http://trac.webkit.org/browser/trunk/Tools/Scripts/webkitpy/port/gtk.py?rev=177363#L83
Your patch is substracting 5000ms from this timeout <
http://trac.webkit.org/changeset/177363/trunk/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py
> Leaving it to 1000ms (1 sec). That don't seems enough for many tests to complete. Why are you removing those 5 seconds from the default timeout? 2)
r177363
breakes all ports that don't enable supports_per_test_timeout() on Tools/Scripts/webkitpy/port/$port.py This is because m_timeout is initialized to 0 <
http://trac.webkit.org/browser/trunk/Tools/WebKitTestRunner/TestInvocation.cpp?rev=177363#L96
>, and never updated with the port default value because the python scripts don't pass the --timeout argument when the previous is not enabled. Maybe we should just enable supports_per_test_timeout() for all ports at Tools/Scripts/webkitpy/port/base.py ? So, I have tested, and the patch below will fix the breakage introduced by
r177363
on the GTK port: diff --git a/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py b/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py index 8f72506..1c3fa21 100644 --- a/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py +++ b/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py @@ -62,11 +62,6 @@ class SingleTestRunner(object): self._stop_when_done = stop_when_done self._timeout = test_input.timeout - if self._timeout > 5000: - # Timeouts are detected by both script and tool; tool detected timeouts are - # better, because they contain partial output. Give the tool some time to - # report the timeout instead of being killed. - self._timeout = int(self._timeout) - 5000 if self._reference_files: # Detect and report a test which has a wrong combination of expectation files. diff --git a/Tools/Scripts/webkitpy/port/gtk.py b/Tools/Scripts/webkitpy/port/gtk.py index 338e2d1..ad451be 100644 --- a/Tools/Scripts/webkitpy/port/gtk.py +++ b/Tools/Scripts/webkitpy/port/gtk.py @@ -80,6 +80,9 @@ class GtkPort(Port): return XorgDriver return XvfbDriver + def supports_per_test_timeout(self): + return True + def default_timeout_ms(self): # Starting an application under Valgrind takes a lot longer than normal # so increase the timeout (empirically 10x is enough to avoid timeouts). Any comments about it?
Carlos Alberto Lopez Perez
Comment 11
2014-12-16 19:01:35 PST
(In reply to
comment #10
)
> 1) > On the GTK port, we define the timeout to 6000ms (release), 12000ms (debug) > or 10000ms (if checking memory leaks with valgrind) >
http://trac.webkit.org/browser/trunk/Tools/Scripts/webkitpy/port/gtk
. > py?rev=177363#L83
I read it wrongly. When using valgrind we are doing a 10x (so 60000ms for release and 120000ms for debug)
Alexey Proskuryakov
Comment 12
2014-12-16 19:48:59 PST
Thank you for looking into this!
> Why are you removing those 5 seconds from the default timeout?
This is needed to let tool-detected timeout be reported before run-webkit-tests kills the tool. When both have the same timeout, there is a race to detect it. It is OK to change these numbers to some reasonable values that suit other ports.
> Maybe we should just enable supports_per_test_timeout() for all ports at Tools/Scripts/webkitpy/port/base.py ?
I'm not sure if DumpRenderTree on Windows supports it yet, but otherwise, this seems like the right thing to do. We could also remove a bunch of code then. But this is not the best solution to this problem, because we should also keep WebKitTestRunner working when launched manually, without the option.
> This is because m_timeout is initialized to 0
Oops, I tried to have it use the default 30 second timeout in this case, but didn't do that right. Should be an easy fix - I can do that tomorrow if no one beats me to it.
Carlos Alberto Lopez Perez
Comment 13
2014-12-17 05:00:17 PST
(In reply to
comment #12
)
> Thank you for looking into this! > > > Why are you removing those 5 seconds from the default timeout? > > This is needed to let tool-detected timeout be reported before > run-webkit-tests kills the tool. When both have the same timeout, there is a > race to detect it. > > It is OK to change these numbers to some reasonable values that suit other > ports. >
And instead of doing that (subtracting 5 seconds from the parameter --timeout passed to the tool (WKTR))... what about just making run-webkit-test wait 1 second more? diff --git a/Tools/Scripts/webkitpy/port/driver.py b/Tools/Scripts/webkitpy/port/driver.py index 715bd7e..9f5e3cd 100644 --- a/Tools/Scripts/webkitpy/port/driver.py +++ b/Tools/Scripts/webkitpy/port/driver.py @@ -180,6 +180,9 @@ class Driver(object): command = self._command_from_driver_input(driver_input) deadline = test_begin_time + int(driver_input.timeout) / 1000.0 + # Add an extra second to let tool-detected timeout be reported before run-webkit-tests kills the tool. + # When both have the same timeout, there is a race to detect it. + deadline = deadline + 1.0 self._server_process.write(command) text, audio = self._read_first_block(deadline) # First block is either text or audio At least to me, this seems less error prone and guaranteed to work on all ports (no matter which assumptions they make on the number of seconds for a test to timeout)
Alexey Proskuryakov
Comment 14
2014-12-17 09:44:21 PST
> And instead of doing that (subtracting 5 seconds from the parameter --timeout passed to the tool (WKTR))... what about just making run-webkit-test wait 1 second more?
In this case, run-webkit-tests would use a timeout that is different from what is specified in port configuration, or in --time-out-ms argument. That seems like it would introduce too much confusion. Also, it's not driver's job to make policy decisions, it should use whatever timeout is passed to it without further tweaking.
Alexey Proskuryakov
Comment 15
2014-12-17 09:54:54 PST
Fixing the known issues in
bug 139734
.
Carlos Alberto Lopez Perez
Comment 16
2014-12-17 10:06:29 PST
(In reply to
comment #14
)
> > And instead of doing that (subtracting 5 seconds from the parameter --timeout passed to the tool (WKTR))... what about just making run-webkit-test wait 1 second more? > > In this case, run-webkit-tests would use a timeout that is different from > what is specified in port configuration, or in --time-out-ms argument. That > seems like it would introduce too much confusion.
You can see it also the other way around: With the changes introduced by
r177363
, the test timeouts faster (-5secs) than what is specified in port configuration, or in --time-out-ms argument. I think that this actually introduces much more confusion than what I'm proposing.
> Also, it's not driver's > job to make policy decisions, it should use whatever timeout is passed to it > without further tweaking.
I don't see any policy decision here. Only a work-around to avoid a race condition. The subtracting of 5 seconds from
r177363
isn't also policy, just another workaround. I think that adding 1 second in run-webkit-tests is better. At least to me it looks cleaner. If I say --timeou-out-ms 8, and I have to wait 9 seconds for the test to finish (or timeout), I probably won't notice. But, on the other hand, if I get a timeout after 3 seconds (8-5) probably I will notice something weird. Don't you agree on that?
Alexey Proskuryakov
Comment 17
2014-12-17 10:30:14 PST
> With the changes introduced by
r177363
, the test timeouts faster (-5secs) than what is specified in port configuration
This was already the case for many years. WebKitTestRunner has multiple shorter timeouts for specific operations, so a test could time out faster than specified in port configuration. Mac in particular always had port timeout specified at 35 seconds, but both DumpRenderTree and WebKitTestRunner ignored that, and timed out after 30 seconds. Maybe there was some confusion due to that, but not anything show-stopping. Additionally, a lot of tests have their own shorter timeouts implemented in JavaScript, which also don't obey port configuration. I am not very strongly opposed to adding instead of subtracting, but I don't see it as obviously superior.
> I don't see any policy decision here. Only a work-around to avoid a race condition.
A workaround is something that can be made to a bug. But there is no bug here - when both the script and the tool have the same timeout, race is the expected behavior. We need to have script timeout and tool timeout different, and this cannot be hidden under the carpet in driver.py, whose sole purpose is to translate Python variables into command line arguments, and then stdout/stderr into Python objects. I posted a patch to
bug 139734
, let's move this discussion there.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug