Bug 51894

Summary: [Qt] run-qtwebkit-tests needs timeout
Product: WebKit Reporter: Csaba Osztrogonác <ossy>
Component: Tools / TestsAssignee: Jędrzej Nowacki <jedrzej.nowacki>
Status: RESOLVED FIXED    
Severity: Blocker CC: commit-queue, galpeter, jedrzej.nowacki, rgabor, wsiegrist
Priority: P1 Keywords: Qt, QtTriaged
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Attachments:
Description Flags
patch
ossy: review-
Patch none

Description Csaba Osztrogonác 2011-01-04 14:03:54 PST
Nowadays run-qtwebkit-tests hangs regularly on buildbots.
It should have timeout and kill the bad testcases.

INFO:Exec:Running... WebKitBuild/Release/WebKit/qt/tests/benchmarks/painting/tst_painting
command timed out: 1200 seconds without output, killing pid ....

I prefer run-qtwebkit-tests kills the tst_culprit_test in 1-2 
minutes to buildbot kills run-qtwebkit-tests after 20 minutes.
Comment 1 Jędrzej Nowacki 2011-01-05 00:06:45 PST
Sure, the script should be able to kill a test, but in the first place the test should not hang, it have to fail ASAP ;-).

I'm going to add the new functionality.
Comment 2 Jędrzej Nowacki 2011-01-05 01:01:49 PST
Just to be precise, you mean kill a bad testsuite, not a testcase, right?

I would describe them as:
Testsuite is something like tst_painting
Testcase is something like tst_painting::paint()

QTestLib doesn't provide timeout functionality. I may change run-qtwebkit-tests script so it can kill a subprocess (which is a testsuite) after specified timeout. But that solution is essentially similar to kill the script by buildbot.

If you want to kill a testcase it would be much more complex, as the script have to run each testcase in a separate process. That may introduce an issue; results of running tests manually and by the script could be different, because of possible dependency between testcases.
Comment 3 Csaba Osztrogonác 2011-01-05 04:06:06 PST
(In reply to comment #2)
> Just to be precise, you mean kill a bad testsuite, not a testcase, right?
> 
> I would describe them as:
> Testsuite is something like tst_painting
> Testcase is something like tst_painting::paint()

I was a little ambiguous, I mean killing the bad testsuite, of course.

> QTestLib doesn't provide timeout functionality. I may change 
> run-qtwebkit-tests script so it can kill a subprocess (which 
> is a testsuite) after specified timeout. But that solution is 
> essentially similar to kill the script by buildbot.

It would be absolutely correct solution. ;) I only would like to ensure 
that run-qtwebkit-tests won't block a buildslave for 20 minutes if
something goes wrong in a testsuite. Now the runtime in parallel mode
is ~14-15 seconds and ~38-40 seconds with single thread. So I think a
30 seconds timeout would be more than enough for each testsuite.

Otherwise I supsect a network error/timeout might cause this 
problem, because the occurrences aren't random, but constant 
in a random (max. 1-2 hours long) time interval.
Comment 4 Jędrzej Nowacki 2011-01-06 07:40:09 PST
Created attachment 78120 [details]
patch

I do not set cq to '?' as some configuration changes may by applied on the buildbot side.
Comment 5 Csaba Osztrogonác 2011-01-06 08:06:14 PST
Comment on attachment 78120 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=78120&action=review

I got many errors like this:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python2.5/threading.py", line 486, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.5/threading.py", line 663, in run
    self.function(*self.args, **self.kwargs)
  File "Tools/Scripts/run-qtwebkit-tests", line 133, in process_killer
    tst.terminate()
AttributeError: 'Popen' object has no attribute 'terminate'

> Tools/Scripts/run-qtwebkit-tests:72
> +        opt.add_option("-t", "--timeout", action="store", type="int",
> +              dest="timeout", default=0,
> +              help="Timeout in seconds for each testsuite. Zero value means that there is not timeout. Default: %default.")

I prefer 30 seconds as default value.
Comment 6 Peter Gal 2011-01-06 08:08:25 PST
Just some minor things:

> Tools/Scripts/run-qtwebkit-tests:125
> +            tst = Popen([test_suite.test_file_name(), ] + options.split(), stdout=PIPE, stderr=None, shell=False)

The comma inside the brackets isn't required, it will be a list without it too. Also the shell=False is unnecessary, that is the default behavior for Popen.

> Tools/Scripts/run-qtwebkit-tests:127
> +            tst = Popen([test_suite.test_file_name(), ] + options.split(), stdout=None, stderr=STDOUT, shell=False)

ditto.
Comment 7 Peter Gal 2011-01-06 08:10:13 PST
(In reply to comment #5)
> (From update of attachment 78120 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=78120&action=review
> 
> I got many errors like this:
> 
> Exception in thread Thread-1:
> Traceback (most recent call last):
>   File "/usr/lib/python2.5/threading.py", line 486, in __bootstrap_inner
>     self.run()
>   File "/usr/lib/python2.5/threading.py", line 663, in run
>     self.function(*self.args, **self.kwargs)
>   File "Tools/Scripts/run-qtwebkit-tests", line 133, in process_killer
>     tst.terminate()
> AttributeError: 'Popen' object has no attribute 'terminate'

Popen's terminate() method is only available from python 2.6
Comment 8 Peter Gal 2011-01-06 08:15:31 PST
> Tools/Scripts/run-qtwebkit-tests:133
> +            log.debug("Setting timeout timer %i sec on %s (process %s)", timeout, test_suite.test_file_name(), tst.pid)
> +                    tst.terminate()

maybe 'os.kill(tst.pid, signal.SIGTERM)' could resolve the problem for python 2.5
Comment 9 Csaba Osztrogonác 2011-01-06 08:30:29 PST
Comment on attachment 78120 [details]
patch

Please fix issues mentioned in Comment #6 and Comment #8 and add 30 seconds default timeout and I'll give an r+.
Comment 10 Jędrzej Nowacki 2011-01-07 00:59:04 PST
(In reply to comment #5)
> I prefer 30 seconds as default value.
I don't think that it is a good idea. 30 sec is good for buildbot, but is not good enough for others. For example I need 2 min to run tests, more in debug mode and It takes only a few "weeks" on my old laptop :-). Personally I prefer to be not bothered by timeouts.

(In reply to comment #6)
> The comma inside the brackets isn't required, it will be a list without it too. Also the shell=False is unnecessary, that is the default behavior for Popen.
True.

(In reply to comment #8)
> maybe 'os.kill(tst.pid, signal.SIGTERM)' could resolve the problem for python 2.5
It won't work on windows until python 2.7 :/. Anyway I'm not aware of any Windows user who is using the script. Probably one of http://code.activestate.com/recipes/347462-terminating-a-subprocess-on-windows/ solutions should be used instead, but I don't have environment to check.
Comment 11 Jędrzej Nowacki 2011-01-07 01:08:00 PST
(In reply to comment #10)
> (In reply to comment #8)
> > maybe 'os.kill(tst.pid, signal.SIGTERM)' could resolve the problem for python 2.5
> It won't work on windows until python 2.7 :/. Anyway I'm not aware of any Windows user who is using the script. Probably one of http://code.activestate.com/recipes/347462-terminating-a-subprocess-on-windows/ solutions should be used instead, but I don't have environment to check.

I would write something like that:

try:
  tst.terminate()
except AttributeError:
  # python version < 2.6
  os.kill(tst.pid, signal.SIGTERM)
  # FIXME: Windows support

what do you think? Is it enough?
Comment 12 Peter Gal 2011-01-07 01:57:35 PST
(In reply to comment #11)
> (In reply to comment #10)
> > (In reply to comment #8)
> > > maybe 'os.kill(tst.pid, signal.SIGTERM)' could resolve the problem for python 2.5
> > It won't work on windows until python 2.7 :/. Anyway I'm not aware of any Windows user who is using the script. Probably one of http://code.activestate.com/recipes/347462-terminating-a-subprocess-on-windows/ solutions should be used instead, but I don't have environment to check.
> 
> I would write something like that:
> 
> try:
>   tst.terminate()
> except AttributeError:
>   # python version < 2.6
>   os.kill(tst.pid, signal.SIGTERM)
>   # FIXME: Windows support
> 
> what do you think? Is it enough?

What if we check the python version? ( http://docs.python.org/library/platform.html#platform.python_version_tuple )
I personally don't like exceptions if I can found an other way for it.

so maybe like this:

ver = platform.python_version_tuple()
if ver[0] == 2 and ver[1] < 6:
    if platform.system == "win32":
        # win
        ..
    else:
        os.kill(tst.pid, signal.SIGTERM)
else:
   tst.terminate()
Comment 13 Csaba Osztrogonác 2011-01-07 03:39:52 PST
(In reply to comment #10)
> I don't think that it is a good idea. 30 sec is good for buildbot, but is not good enough for others. For example I need 2 min to run tests, more in debug mode and It takes only a few "weeks" on my old laptop :-). Personally I prefer to be not bothered by timeouts.

OK, you're right. Please add "--timeout 30" to Tools/BuildSlaveSupport/build.webkit.org-config/master.cfg instead of the default timeout to make buildbots happy. (It must be enough, because the maximum runtime of a test suite is 10-12 seconds on the bot.)

(In reply to comment #12)
> What if we check the python version? ( http://docs.python.org/library/platform.html#platform.python_version_tuple )
> I personally don't like exceptions if I can found an other way for it.
> 
> so maybe like this:
> 
> ver = platform.python_version_tuple()
> if ver[0] == 2 and ver[1] < 6:
>     if platform.system == "win32":
>         # win
>         ..
>     else:
>         os.kill(tst.pid, signal.SIGTERM)
> else:
>    tst.terminate()

It looks good to me with one of the windows implementation mentioned in comment #10 .
Comment 14 Jędrzej Nowacki 2011-01-07 04:07:13 PST
(In reply to comment #12)
> What if we check the python version? ( http://docs.python.org/library/platform.html#platform.python_version_tuple )
> I personally don't like exceptions if I can found an other way for it.

You shouldn't avoid exceptions, it is nice feature that simplify your code ;-). Python encourages usage of exceptions. Please check http://docs.python.org/glossary.html#term-eafp
Comment 15 Jędrzej Nowacki 2011-01-07 04:09:36 PST
Created attachment 78220 [details]
Patch

I haven't tested that on Win, but I checked docs and it should work(TM) :-)
Comment 16 WebKit Commit Bot 2011-01-07 05:07:34 PST
Comment on attachment 78220 [details]
Patch

Clearing flags on attachment: 78220

Committed r75240: <http://trac.webkit.org/changeset/75240>
Comment 17 WebKit Commit Bot 2011-01-07 05:07:41 PST
All reviewed patches have been landed.  Closing bug.
Comment 18 Csaba Osztrogonác 2011-01-07 05:39:20 PST
Bill, could you update the master to make our bot happier, please?
Comment 19 William Siegrist 2011-01-07 07:28:53 PST
Buildbot updated.