Bug 37396 - new-run-webkit-tests should log the order tests are run in
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests
Version: 528+ (Nightly build)
Hardware: All All
Importance: P2 Normal
Assignee: Nobody
Keywords: NRWT
Reported: 2010-04-10 17:12 PDT by Ojan Vafai
Modified: 2012-12-01 17:45 PST

Description Ojan Vafai 2010-04-10 17:12:02 PDT
It should record the order the tests are run in on each thread to a local file. This will help when encountering non-deterministic test failures.

I'm picturing the final result being something like:

THREAD-1
foo/bar/baz1.html
foo/bar/baz2.html
foo/bar/baz4.html

THREAD-2
foo/bar/baz3.html
foo/bar/baz5.html
foo/bar/baz6.html

If we wanted to be really thorough, we could throw in the failure type there as well, e.g.:


THREAD-1
foo/bar/baz1.html = TEXT
foo/bar/baz2.html = IMAGE
foo/bar/baz4.html = CRASH

THREAD-2
foo/bar/baz3.html = PASS
foo/bar/baz5.html = IMAGE+TEXT
foo/bar/baz6.html = TIMEOUT
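
A rough sketch of how the log above could be produced. This is illustrative only and is not NRWT code: the function name `format_run_order` and the `(thread_name, test_path, result)` record shape are assumptions made for the example.

```python
from collections import defaultdict


def format_run_order(records):
    """Group (thread_name, test_path, result) records by thread.

    Hypothetical helper: records are assumed to arrive in the order the
    tests finished, and that order is preserved within each thread.
    """
    by_thread = defaultdict(list)
    for thread_name, test_path, result in records:
        by_thread[thread_name].append((test_path, result))

    sections = []
    for thread_name in sorted(by_thread):
        lines = [thread_name]
        for test_path, result in by_thread[thread_name]:
            lines.append("%s = %s" % (test_path, result))
        sections.append("\n".join(lines))
    # One blank line between thread sections, as in the proposed format.
    return "\n\n".join(sections) + "\n"
```

Writing the returned string to a file in the results directory would give one log per run; splitting `sections` across files would give one log per thread instead.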
Comment 1 Eric Seidel (no email) 2010-04-10 17:16:32 PDT
Yes.  I totally agree.  But I only care about this when there is a failure.

It should just poop out an extra file "testname-previous-tests.txt" and link to it next to the failure in the results.html page.

The driver could keep track of what tests were run since the last driver restart and every time there is a failure poop out that file.
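
Something along these lines, as a minimal sketch. The class `DriverHistory` and its methods are made up for illustration and do not exist in run-webkit-tests; the only thing taken from the proposal is the "testname-previous-tests.txt" naming.

```python
import os


class DriverHistory(object):
    """Hypothetical per-driver record of tests run since the last restart."""

    def __init__(self, results_dir):
        self._results_dir = results_dir
        self._tests_since_restart = []

    def driver_restarted(self):
        # State is assumed to be cleared on restart, so the history is too.
        self._tests_since_restart = []

    def record(self, test_name, passed):
        """Record a finished test; on failure, write the list of tests that
        ran before it and return the path of that file (otherwise None)."""
        path = None
        if not passed:
            path = self._write_previous_tests(test_name)
        self._tests_since_restart.append(test_name)
        return path

    def _write_previous_tests(self, test_name):
        base = os.path.splitext(test_name)[0]
        path = os.path.join(self._results_dir, base + "-previous-tests.txt")
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            for previous in self._tests_since_restart:
                f.write(previous + "\n")
        return path
```

The returned path is what results.html would link to next to the failure.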
Comment 2 Ojan Vafai 2010-04-10 17:26:13 PDT
(In reply to comment #1)
> Yes.  I totally agree.  But I only care about this when there is a failure.
> 
> It should just poop out an extra file "testname-previous-tests.txt" and link to
> it next to the failure in the results.html page.
> 
> The driver could keep track of what tests were run since the last driver
> restart and every time there is a failure poop out that file.

It's not sufficient to just know the previous test that was run. You really need the whole history. For example, there are some tests that depend on an image being in the cache. For those tests, they could pass or fail based off a test run many tests ago.

Maybe we want one file per thread though. Then the link next to the failure in the results.html page can point to that file. It could even scroll to that test in the file (obviously the file would then need to be HTML).
Comment 3 Eric Seidel (no email) 2010-04-10 17:40:19 PDT
Historically, the major source of flakiness has simply been test order.

Why I think that the list of previous tests is sufficient:

1.  You only need to know the tests since the last restart.  DRT currently restarts every 1000 tests in run-webkit-tests. You need to know when the last restart was because some state only gets cleared on restarts.
2.  Each DRT is separate, including caches.  So unless the problem is contention of httpd or disk access, a per-thread list should be sufficient.

Currently you can sort of get an order from "run-webkit-tests --verbose". The problem is that it doesn't tell you when DRT restarts, so it's hard to reconstruct the previous test list without knowing where it should start.
Comment 4 Alexey Proskuryakov 2010-04-10 23:11:42 PDT
As proposed by Zoltan Herczeg on webkit-dev, one could just store the random number generator seed. I think it's an elegant solution.

Knowing the seed, you could re-run all tests in the same order.
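
To illustrate the idea, a small sketch using Python's standard `random` module. The function name `shuffled_order` is hypothetical, and this only applies when the test order comes from a seeded shuffle in the first place.

```python
import random


def shuffled_order(tests, seed):
    """Return the tests in a pseudo-random order determined by the seed.

    Hypothetical helper: sorting first makes the result depend only on the
    set of tests and the seed, not on the order the tests were collected in.
    """
    rng = random.Random(seed)  # private generator, global state untouched
    order = sorted(tests)
    rng.shuffle(order)
    return order
```

Logging the seed at the start of a run and passing the same value back in would then reproduce the order, as long as the test list and the Python version are unchanged.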
Comment 5 Dirk Pranke 2012-12-01 17:45:16 PST
Note that we currently do do this in the tests_run*.txt files written into layout-test-results (one file per worker). 

However, the file does not contain DRT/WTR pids so you can't tell when the workers crash or are otherwise restarted, and even the --debug-rwt-logging doesn't give you enough information to fix that. It also doesn't contain any timestamp information to help you determine which tests were running concurrently.

More importantly, it's hard (though not impossible) to do something useful with the data in the tests_run*.txt files, since you can't easily feed it back into NRWT or control how things are sharded. (You can use --test-list to feed in a single list of tests, and at least now with --order=none it'll honor that, but that won't help across multiple workers.)

I think ideally we'd merge all of the tests_run* files into a single file and add a --replay <path to file> option or something that would make this easier. I think there used to be a flag that would do a simpler version of this (--retry-last-failures or something?) but I'm not seeing it there now.
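
As a rough sketch of the merge step only: this assumes each tests_run*.txt file holds one test name per line, and the tab-separated "worker, test" output format is invented for the example. No --replay option exists yet, so nothing here reflects an actual NRWT file format.

```python
import glob
import os


def merge_tests_run_files(results_dir):
    """Collect (worker, test) pairs from every tests_run*.txt file.

    Assumption: one test name per line, one file per worker. The worker
    label is simply the file name without its extension.
    """
    merged = []
    pattern = os.path.join(results_dir, "tests_run*.txt")
    for path in sorted(glob.glob(pattern)):
        worker = os.path.splitext(os.path.basename(path))[0]
        with open(path) as f:
            for line in f:
                test = line.strip()
                if test:
                    merged.append((worker, test))
    return merged


def write_replay_file(merged, out_path):
    """Write the merged list as tab-separated lines (hypothetical format)."""
    with open(out_path, "w") as f:
        for worker, test in merged:
            f.write("%s\t%s\n" % (worker, test))
```

Keeping the worker label on every line preserves the sharding, which is the information a single --test-list file cannot carry.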

I'm closing this bug for now (since we do at least record the order) and am going to file a new one for the --replay enhancement.

Regarding comment #4, I'm not sure that a random number seed would be needed or useful here. The nondeterminism comes from test timing and contention, not from using a random order. (You could of course specify the seed used when intentionally randomizing the tests, but that's a whole different thing.)