Bug 37262

Summary: new-run-webkit-tests should estimate test completion time
Product: WebKit Reporter: Adam Barth <abarth>
Component: New BugsAssignee: Nobody <webkit-unassigned>
Status: RESOLVED WONTFIX    
Severity: Normal CC: dpranke, eric, ojan, ossy
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Other   
OS: OS X 10.5   
Attachments:
Description Flags
Patch none

Adam Barth
Reported 2010-04-08 02:58:11 PDT
new-run-webkit-tests should estimate test completion time
Attachments
Patch (6.75 KB, patch)
2010-04-08 02:59 PDT, Adam Barth
no flags
Adam Barth
Comment 1 2010-04-08 02:59:39 PDT
Dirk Pranke
Comment 2 2010-04-08 12:36:14 PDT
Note that in the perl run-webkit-tests, the HTTP tests were put at the end intentionally because they were slowest. The original chrome run-webkit-tests put HTTP tests at the beginning, since that shard was usually the slowest. But, I would have thought that with Ojan's more recent sharding efforts, it would be more uniform. If you run with --experimental-fully-parallel, then the distribution should become more uniform, and if you run with --randomize-order, that would also make it more (arguably completely) uniform. One or both of these should probably be the default, but they risk making the testing more nondeterministic. (Although I suspect that the timing issues when running in parallel under load do enough to make things nondeterministic anyway that it's moot). Perhaps we should have the script output a seed number when running with --randomize-order (visible with --log config), so that the shuffling could at least be reproduced if necessary.
Adam Barth
Comment 3 2010-04-08 12:38:37 PDT
TOTT tells me that we should randomize once a day and record the seed. That way you get lots of order coverage but order-dependent failures show up as red tests for 24hrs so you have a chance of fixing them.
Eric Seidel (no email)
Comment 4 2010-04-08 12:43:28 PDT
http tests were put at the end for run-webkit-tests so that it was possible to lock around use of the httpd server was my understanding. Eventually we'll have to fix new-run-webkit-tests to support running more than one copy on the machine at once.
Adam Barth
Comment 5 2010-04-08 12:43:42 PDT
Comment on attachment 52851 [details] Patch I had fun writing this patch, but we probably don't need it at the moment. The percent complete thing is helpful to me so I don't have to do math in my head.
Eric Seidel (no email)
Comment 6 2010-04-08 12:45:00 PDT
If every failing test would just record the test which were run previous to it on that Driver instance, that would make it super easy to reproduce any failure due to sharding/randomization.
Dirk Pranke
Comment 7 2010-04-08 12:55:54 PDT
Comment on attachment 52851 [details] Patch > diff --git a/WebKitTools/Scripts/webkitpy/layout_tests/layout_package/eta.py b/WebKitTools/Scripts/webkitpy/layout_tests/layout_package/eta.py > new file mode 100644 > index 0000000..ecdf7b5 > --- /dev/null > +++ b/WebKitTools/Scripts/webkitpy/layout_tests/layout_package/eta.py > +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE > +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > + > +import time I think PEP-8 asks that you put a module-level docstring here. > + > + > +class _Checkpoint(object): > + """A reference point for estimating the remaining time.""" > + def __init__(self, index, total, t): > + self._index = index > + self._total = total > + self._time = t > + > + def estimate(self, index, now): > + elapsed_index = index - self._index > + elapsed_time = now - self._time > + return elapsed_time * (self._total - index) / elapsed_index > + Please add docstrings indicating what index, now, total, and t are. > + > +class ETA(object): > + """A class for estimating the amount of time remaining. > + This class uses an exponentially weighed series of checkpoints to estimate > + how much time remains.""" Can you describe why the algorithm is the way it is (or include a reference to a description of this somewhere, if it's a standard algorithm? I've never seen it before). It's not obvious why you are skipping some checkpoints, or how this thing works overall. In particular, it's not clear why this algorithm will produce better results than a simple (time so far) / ( tests so far / total tests) estimate, which is a lot simpler and more obvious. I'm sure that this algorithm is probably better, or else you wouldn't have bothered to code it, but for anyone not familiar with this type of thing, they'll have no idea how to debug it or change it. As to the correctness of the computation, I will not review that at the moment since I'm hoping you'll explain it to me first :) In run_webkit_tests.py: > > @@ -714,10 +717,16 @@ class TestRunner: > > def _display_one_line_progress(self, result_summary): > """Displays the progress through the test run.""" > - percent_complete = 100 * (result_summary.expected + result_summary.unexpected) / result_summary.total > - self._meter.update("Testing (%d%%): %d ran as expected, %d didn't, %d left" % > - (percent_complete, result_summary.expected, > - result_summary.unexpected, result_summary.remaining)) > + if not self._eta: > + self._eta = eta.ETA(result_summary.total) Should self._eta be created at the start of _run_tests(), rather than here? Otherwise you are discarding the time spent between starting the test run and the first invocation of this call. > + number_ran = result_summary.expected + result_summary.unexpected > + percent_complete = 100 * number_ran / result_summary.total > + eta_estimate = self._eta.estimate(number_ran) > + eta_string = ", %ds remaining" % math.floor(eta_estimate) if eta_estimate else "" I'm not at all a fan of this Python idiom (too Perl-y). Can you rewrite this as a standard if : else: branch? The latter has the advantage of working correctly with the python coverage tool.
Adam Barth
Comment 8 2010-04-08 12:59:16 PDT
Thanks for the review. I'm happy to address your comments if we want this patch, but I'm also not that married to getting this feature. :)
Dirk Pranke
Comment 9 2010-04-08 13:38:48 PDT
(In reply to comment #4) > http tests were put at the end for run-webkit-tests so that it was possible to > lock around use of the httpd server was my understanding. > > Eventually we'll have to fix new-run-webkit-tests to support running more than > one copy on the machine at once. Why? As opposed to just getting it to run as fast as possible over all the cores in the machine and queuing each run? Handling real concurrency would involve supporting multiple HTTP servers, etc, which might be a real hassle.
Dirk Pranke
Comment 10 2010-04-08 13:39:23 PDT
(In reply to comment #8) > Thanks for the review. I'm happy to address your comments if we want this > patch, but I'm also not that married to getting this feature. :) If you had a reference for the algorithm, that would be interesting. Otherwise, addressing it is up to you.
Adam Barth
Comment 11 2010-04-08 14:50:32 PDT
> Why? As opposed to just getting it to run as fast as possible over all the > cores in the machine and queuing each run? I think queuing would be an acceptable solution for our use case, which is running multiple bots (commit-queue, mac-ews) on the same machine.
Adam Barth
Comment 12 2010-04-08 14:51:46 PDT
> If you had a reference for the algorithm, that would be interesting. Otherwise, > addressing it is up to you. The algorithm is just simple-exponential-smoothing as explained here: http://www.duke.edu/~rnau/411avg.htm
Csaba Osztrogonác
Comment 13 2010-04-08 14:56:52 PDT
(In reply to comment #4) > http tests were put at the end for run-webkit-tests so that it was possible to > lock around use of the httpd server was my understanding. > > Eventually we'll have to fix new-run-webkit-tests to support running more than > one copy on the machine at once. Yes, you're right, we put http tests at the end because now run-webkit-tests script grab the lock of httpd before the first http test and release the lock when exit. Our first idea was to fix tests and make run-webkit-tests script to be able to run multiple httpd. Fixing run-webkit-tests isn't a big task, but test cases and expected files with hard coded localhost, 127.0.0.1, 8000, 8080, etc. port numbers. You can find how many test cases must be refactored for this in description of related bug: https://bugs.webkit.org/show_bug.cgi?id=33153 Hard coded URLs and port numbers isn't a big problem if you would like to run only one multi-threaded (new-)run-webkit-tests script. But it is a blocker task if you would like to run more script.
Csaba Osztrogonác
Comment 14 2010-04-08 15:29:41 PDT
(In reply to comment #9) > Why? As opposed to just getting it to run as fast as possible over all the > cores in the machine and queuing each run? > > Handling real concurrency would involve supporting multiple HTTP servers, etc, > which might be a real hassle. Queueing can be a good solution, but we should guarantee running in the arrival order to avoid starvation (see dining philosophers). András made a good solution for this in run-webkit-tests implementation (--wait-for-http locking mechanism). The idea is very simple, the script touch a new lock.XXXX file, where XXX is an automatically incremental value, and always runs the owner of least one.
Note You need to log in before you can comment on or make changes to this bug.