Bug 37262 - new-run-webkit-tests should estimate test completion time
Summary: new-run-webkit-tests should estimate test completion time
Status: RESOLVED WONTFIX
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: 528+ (Nightly build)
Hardware: Other OS X 10.5
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-04-08 02:58 PDT by Adam Barth
Modified: 2010-04-08 15:29 PDT (History)
4 users (show)

See Also:


Attachments
Patch (6.75 KB, patch)
2010-04-08 02:59 PDT, Adam Barth
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Adam Barth 2010-04-08 02:58:11 PDT
new-run-webkit-tests should estimate test completion time
Comment 1 Adam Barth 2010-04-08 02:59:39 PDT
Created attachment 52851 [details]
Patch
Comment 2 Dirk Pranke 2010-04-08 12:36:14 PDT
Note that in the perl run-webkit-tests, the HTTP tests were put at the end intentionally because they were slowest. The original chrome run-webkit-tests put HTTP tests at the beginning, since that shard was usually the slowest. But, I would have thought that with Ojan's more recent sharding efforts, it would be more uniform. 

If you run with --experimental-fully-parallel, then the distribution should become more uniform, and if you run with --randomize-order, that would also make it more (arguably completely) uniform. One or both of these should probably be the default, but they risk making the testing more nondeterministic. (Although I suspect that the timing issues when running in parallel under load do enough to make things nondeterministic anyway that it's moot).

Perhaps we should have the script output a seed number when running with --randomize-order (visible with --log config), so that the shuffling could at least be reproduced if necessary.
Comment 3 Adam Barth 2010-04-08 12:38:37 PDT
TOTT tells me that we should randomize once a day and record the seed.  That way you get lots of order coverage but order-dependent failures show up as red tests for 24hrs so you have a chance of fixing them.
Comment 4 Eric Seidel (no email) 2010-04-08 12:43:28 PDT
http tests were put at the end for run-webkit-tests so that it was possible to lock around use of the httpd server was my understanding.

Eventually we'll have to fix new-run-webkit-tests to support running more than one copy on the machine at once.
Comment 5 Adam Barth 2010-04-08 12:43:42 PDT
Comment on attachment 52851 [details]
Patch

I had fun writing this patch, but we probably don't need it at the moment.  The percent complete thing is helpful to me so I don't have to do math in my head.
Comment 6 Eric Seidel (no email) 2010-04-08 12:45:00 PDT
If every failing test would just record the test which were run previous to it on that Driver instance, that would make it super easy to reproduce any failure due to sharding/randomization.
Comment 7 Dirk Pranke 2010-04-08 12:55:54 PDT
Comment on attachment 52851 [details]
Patch

> diff --git a/WebKitTools/Scripts/webkitpy/layout_tests/layout_package/eta.py b/WebKitTools/Scripts/webkitpy/layout_tests/layout_package/eta.py
> new file mode 100644
> index 0000000..ecdf7b5
> --- /dev/null
> +++ b/WebKitTools/Scripts/webkitpy/layout_tests/layout_package/eta.py
> +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> +
> +import time

I think PEP-8 asks that you put a module-level docstring here.

> +
> +
> +class _Checkpoint(object):
> +    """A reference point for estimating the remaining time."""
> +    def __init__(self, index, total, t):
> +        self._index = index
> +        self._total = total
> +        self._time = t
> +
> +    def estimate(self, index, now):
> +        elapsed_index = index - self._index
> +        elapsed_time = now - self._time
> +        return elapsed_time * (self._total - index) / elapsed_index
> +

Please add docstrings indicating what index, now, total, and t are. 

> +
> +class ETA(object):
> +    """A class for estimating the amount of time remaining.
> +    This class uses an exponentially weighed series of checkpoints to estimate
> +    how much time remains."""

Can you describe why the algorithm is the way it is (or include a reference to a description of this somewhere, if it's a standard algorithm? I've never seen it before). It's not obvious why you are skipping some checkpoints, or how this thing works overall. In particular, it's not clear why this algorithm will produce better results than a simple (time so far) / ( tests so far / total tests) estimate, which is a lot simpler and more obvious. I'm sure that this algorithm is probably better, or else you wouldn't have bothered to code it, but for anyone not familiar with this type of thing, they'll have no idea how to debug it or change it.

As to the correctness of the computation, I will not review that at the moment since I'm hoping you'll explain it to me first :)


In run_webkit_tests.py:

>  
> @@ -714,10 +717,16 @@ class TestRunner:
>  
>      def _display_one_line_progress(self, result_summary):
>          """Displays the progress through the test run."""
> -        percent_complete = 100 * (result_summary.expected + result_summary.unexpected) / result_summary.total
> -        self._meter.update("Testing (%d%%): %d ran as expected, %d didn't, %d left" %
> -            (percent_complete, result_summary.expected,
> -             result_summary.unexpected, result_summary.remaining))
> +        if not self._eta:
> +            self._eta = eta.ETA(result_summary.total)

Should self._eta be created at the start of _run_tests(), rather than here? Otherwise you are discarding the time spent between starting the test run and the first invocation of this call.

> +        number_ran = result_summary.expected + result_summary.unexpected
> +        percent_complete = 100 * number_ran / result_summary.total
> +        eta_estimate = self._eta.estimate(number_ran)


> +        eta_string = ", %ds remaining" % math.floor(eta_estimate) if eta_estimate else ""

I'm not at all a fan of this Python idiom (too Perl-y). Can you rewrite this as a standard if : else: branch? The latter has the advantage of working correctly with the python coverage tool.
Comment 8 Adam Barth 2010-04-08 12:59:16 PDT
Thanks for the review.  I'm happy to address your comments if we want this patch, but I'm also not that married to getting this feature.  :)
Comment 9 Dirk Pranke 2010-04-08 13:38:48 PDT
(In reply to comment #4)
> http tests were put at the end for run-webkit-tests so that it was possible to
> lock around use of the httpd server was my understanding.
> 
> Eventually we'll have to fix new-run-webkit-tests to support running more than
> one copy on the machine at once.

Why? As opposed to just getting it to run as fast as possible over all the cores in the machine and queuing each run? 

Handling real concurrency would involve supporting multiple HTTP servers, etc, which might be a real hassle.
Comment 10 Dirk Pranke 2010-04-08 13:39:23 PDT
(In reply to comment #8)
> Thanks for the review.  I'm happy to address your comments if we want this
> patch, but I'm also not that married to getting this feature.  :)

If you had a reference for the algorithm, that would be interesting. Otherwise, addressing it is up to you.
Comment 11 Adam Barth 2010-04-08 14:50:32 PDT
> Why? As opposed to just getting it to run as fast as possible over all the
> cores in the machine and queuing each run? 

I think queuing would be an acceptable solution for our use case, which is running multiple bots (commit-queue, mac-ews) on the same machine.
Comment 12 Adam Barth 2010-04-08 14:51:46 PDT
> If you had a reference for the algorithm, that would be interesting. Otherwise,
> addressing it is up to you.

The algorithm is just simple-exponential-smoothing as explained here:
http://www.duke.edu/~rnau/411avg.htm
Comment 13 Csaba Osztrogonác 2010-04-08 14:56:52 PDT
(In reply to comment #4)
> http tests were put at the end for run-webkit-tests so that it was possible to
> lock around use of the httpd server was my understanding.
> 
> Eventually we'll have to fix new-run-webkit-tests to support running more than
> one copy on the machine at once.

Yes, you're right, we put http tests at the end because now run-webkit-tests script grab the lock of httpd before the first http test and release the lock when exit.

Our first idea was to fix tests and make run-webkit-tests script to
be able to run multiple httpd. Fixing run-webkit-tests isn't a big
task, but test cases and expected files with hard coded localhost,
127.0.0.1, 8000, 8080, etc. port numbers. You can find how many
test cases must be refactored for this in description of
related bug: https://bugs.webkit.org/show_bug.cgi?id=33153

Hard coded URLs and port numbers isn't a big problem if you would
like to run only one multi-threaded (new-)run-webkit-tests script.
But it is a blocker task if you would like to run more script.
Comment 14 Csaba Osztrogonác 2010-04-08 15:29:41 PDT
(In reply to comment #9)
> Why? As opposed to just getting it to run as fast as possible over all the
> cores in the machine and queuing each run? 
> 
> Handling real concurrency would involve supporting multiple HTTP servers, etc,
> which might be a real hassle.

Queueing can be a good solution, but we should guarantee running  
in the arrival order to avoid starvation (see dining philosophers).

András made a good solution for this in run-webkit-tests implementation 
(--wait-for-http locking mechanism). The idea is very simple, the script 
touch a new lock.XXXX file, where XXX is an automatically incremental value,
and always runs the owner of least one.