merge test expectations for chromium, chromium-gpu
Created attachment 80861 [details]
I am preparing a change to merge the GPU test expectations back into the main test_expectations file.
I will document the new syntax of the file more extensively on dev.chromium.org or webkit.org (or both), but for the moment the important point is that we add two new keywords, "GPU" and "CPU", to indicate that an expectation line applies to the accelerated or the non-accelerated code path, respectively. If neither is specified, the line applies to both. (This works just like the "RELEASE" and "DEBUG" modifiers have always worked.)
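For illustration, hypothetical lines using the new modifiers might look like this (the bug tokens and test paths here are made up, not from the actual file):

```
// applies only to the accelerated (GPU) code path
BUG_EXAMPLE GPU : compositing/foo.html = IMAGE
// applies only to the non-accelerated (CPU) code path
BUG_EXAMPLE CPU : fast/bar.html = TEXT
// no GPU/CPU modifier: applies to both code paths
BUG_EXAMPLE : fast/baz.html = PASS FAIL
```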
In order to determine which expectations apply to your current test run (what is now called the "test configuration"):
1) we first figure out if any of the modifiers out-and-out conflict with the configuration. If they do (e.g., you're running GPU on Win 7 but the line says "MAC"), we ignore the line.
Then, for each test we plan to run:
2) If we have not seen a line that applies to this test before, we store the expectation.
3) If we have seen a line before, and the new line matches *less* of the path than the old line, we ignore the new line.
4) If we have seen a line before and the new line matches "more" of the path than the old line, we store the new expectation and throw away the earlier one.
5) If the new line matches the *same* amount of the path as the earlier line, we raise a ParseError and consider it a duplicate expectation.
[ The logic for this is in Tools/Scripts/webkitpy/layout_tests/layout_package/test_expectations.py:804-869, the method _already_seen_better_match(). ]
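Steps 2-5 can be illustrated with a minimal sketch. This is not the actual webkitpy code (which lives in _already_seen_better_match()); it is a simplified model of the "longest path match wins, equal-length match is a duplicate" rule:

```python
class ParseError(Exception):
    pass

def pick_expectation(test_name, lines):
    """Pick the winning expectation for one test.

    'lines' is a list of (path, expectation) tuples that already survived
    the modifier-conflict check in step 1. The line whose path matches the
    longest prefix of the test name wins; two applicable lines with the
    same match length are treated as a duplicate (step 5).
    """
    best = None  # (match_length, expectation)
    for path, expectation in lines:
        if not test_name.startswith(path):
            continue  # line does not apply to this test at all
        match_len = len(path)
        if best is None or match_len > best[0]:
            best = (match_len, expectation)  # steps 2 and 4: keep the better match
        elif match_len == best[0]:
            raise ParseError('duplicate expectation for %s' % test_name)  # step 5
        # else: new line matches less of the path; ignore it (step 3)
    return best[1] if best else None
```

With this model, the more specific line wins: `pick_expectation('fast/html/foo.html', [('fast', 'PASS FAIL'), ('fast/html/foo.html', 'IMAGE')])` returns `'IMAGE'`.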
Unfortunately, there's a complication ...
Today, the GPU tests run only a small subset of the total layout test suite (~1100 tests out of 22000+ on win/linux, ~200 or so on Mac). The remaining tests are marked as SKIP in the separate chromium-gpu/test_expectations.txt file.
When we merge the two files, you end up with something like:
% cat test_expectations.txt
// previously in the file
BUG_DIRK : fast/html/foo.html = IMAGE
// newly-merged in
WONTFIX GPU SKIP : fast = PASS FAIL
Given the logic described in comment #2, this means the GPU bots will now start to run fast/html/foo.html , and every other test that has a more explicit path than one of the SKIPped directories.
This means that we will run roughly 800-900 more tests. Because we run the GPU tests on the same bots as the regular tests, this will increase the cycle time for each bot, but because we are still running a small fraction of the total, Ojan and I are guessing that this will only increase the total cycle time of the bots by 10-15%. Note also that the GPU tests today are run single-threaded (only one DRT is used).
[ Note that we have always had this sort of logic; substitute "RELEASE" for "GPU" in my example and you end up running fast/html/foo.html on the RELEASE bots even now. ]
There aren't immediately obvious solutions to this. Here are some possible alternatives:
1) Go ahead and merge the files (as in the patch uploaded above), but don't do anything else. Going forward, continue gardening as you always have done. Downside: cycle times get somewhat slower. Upside: zero additional effort.
2) Merge the files, and annotate every test that we don't want to run in GPU mode with "CPU". Cycle time stays roughly the same as it is now. Downside: expectations file gets slightly uglier.
3) Throw hardware at the problem - move the GPU tests onto a separate set of bots. Upside: Cycle times get shorter by ~10% or so (since the GPU steps are much faster than the main steps). Downside: increased maintenance overhead of 2x the number of bots, both by the people maintaining machines and people watching the (already overly-wide) waterfalls.
4) Speed up the GPU tests - figure out how to run them in multiple processes. I don't know whether this is feasible, since I don't understand why the emulated GL libs (or perhaps some other part of the code) have problems with that today. Downside: development work, and cycle times are still slower than they would otherwise be.
5) Change the parsing logic - What if the configuration modifiers are treated as higher precedence than the length of the path match? I.e., so "GPU SKIP : fast" would trump " : fast/html/foo ". Downside: every time you add a new expectation, it may be shadowed by earlier modifiers, and there would have to be some way to figure out if that's okay or not. Upside: this may actually simplify the existing file in some ways, because we could delete some of the annotations we have today, like "DEBUG SKIP : fast\nRELEASE : fast/html = TEXT" . It may also be possible to change the code to warn about conflicts, but I'm not sure how to do this in a useful manner, so I'll think about it. There may be other unknown downsides to this, since I haven't fully explored it yet.
6) Change the parsing logic - special case "GPU" somehow, and/or "SKIP" somehow.
7) Separate out the "what tests do I run" logic from the "expectations" logic - we could pass the list of tests to new-run-webkit-tests explicitly (in the buildbot code/configuration, a separate config file, or a separate wrapper script), and not have to worry about the "SKIP" lines at all. Upside: very clean solution. Downside: you have to modify something somewhere every time you want to add a new directory of tests to the GPU bots. We could create a separate blacklist file, but that really becomes a version of (6) at that point.
8) Leave the expectations in two files. Downside: increased maintenance overhead for the gardeners. (Really also a variant of (6).)
I'm not sure what the best approach is. I will spend some time evaluating the impact and feasibility of (5), but I suspect we won't want to go that route. Failing that, I recommend we do (1) for now, and then do (2) and/or (4) if we are not happy with the cycle time.
Another option is to make the prioritization dependent on the order in the file, so it's okay if a more-specific line comes after a less-specific line, but an error or a lint warning otherwise. So far that's the only way to do (5) or (6) in a way that gets useful diagnostics.
(In reply to comment #4)
> Another option is to make the prioritization dependent on the order in the file, so it's okay if a more-specific line comes after a less-specific line, but an error or a lint warning otherwise. So far that's the only way to do (5) or (6) in a way that gets useful diagnostics.
Er, that I've thought of.
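A hypothetical sketch of that order-dependent lint check (the function name and the simple prefix matching here are assumptions for illustration, not existing webkitpy code):

```python
def lint_order(paths):
    """Warn when a less-specific line appears after a more-specific one.

    'paths' is the list of expectation paths in file order. Under
    order-dependent prioritization, a later, broader line would silently
    shadow an earlier, more-specific one, so that ordering gets a warning.
    (Plain startswith() is a simplification of real path matching.)
    """
    warnings = []
    for i, path in enumerate(paths):
        for earlier in paths[:i]:
            if earlier != path and earlier.startswith(path):
                warnings.append('%r shadows the earlier, more-specific %r'
                                % (path, earlier))
    return warnings
```

For example, `['fast/html/foo.html', 'fast']` would produce one warning, while the reverse order would lint cleanly.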
I am fine with any of 3, 4, 6 or 7. In fact, I would be fine with hard-coding, in the Python code, the directories we skip on the GPU ports.
7 certainly seems like the simplest short-term option. In practice, it's rare that we'll want to add new directories for the GPU bots to run.
I actually prefer 3 to all the other options. Run the GPU tests on their own machines and just run all the tests on those bots as well. It's simplest in terms of the python code and simplest in terms of future maintenance (e.g. as the set of tests that are affected by the GPU grows).
In the long term, we want to run every layout test through the GPU path (because eventually we may be compositing at least the root layer on every page).
I agree with Ojan and James: 3 sounds like the best option, but 7 would be an acceptable short-term solution. I think this is what I argued for a long time ago when the GPU tests were being set up.
I'm fine with 3. We could implement 7 as a stop-gap (till we get new bots) by using --test-list=<file with all the GPU test directories> and --force on the GPU bots.
Okay, it sounds like we have rough consensus. The plan going forward, then, will be:
1) merge the expectations files and remove all of the GPU "skip" lines. This seems most consistent to me with the dream goal of having zero non-WONTFIX lines in the file.
2) as an interim fix, hard-code the list of directories (and individual tests, if necessary) somewhere. I'm with Ojan that the easiest and "most correct" place for this hard-coded list is the chromium_gpu.py file (as opposed to Mihai's suggestion of generating a test-list text file and modifying the bot code).
3) over the long run, we want to run more tests through the GPU code paths, so we'll throw hardware at the problem by moving the GPU tests onto separate bots.
I will submit a separate patch for (2), then update the patch here for (1), and get the ball rolling on (3).
Oh, for reference: I did do some experiments with changing the precedence logic (option (5) above), and there were definitely going to be gotchas if we went down that route - it wasn't clearly better than the current logic, and we'd have to re-educate everyone. So, I'm taking that option off the table.
Created attachment 81003 [details]
remove SKIPS, merge to r77429
Comment on attachment 81003 [details]
remove SKIPS, merge to r77429
View in context: https://bugs.webkit.org/attachment.cgi?id=81003&action=review
> +BUG_DRT CPU WIN LINUX DEBUG SLOW : inspector = PASS
Why was it necessary to add CPU to the inspector lines?
> +2011-02-01 Dirk Pranke <email@example.com>
Looks like an extra ChangeLog entry.
(In reply to comment #13)
> (From update of attachment 81003 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=81003&action=review
> > LayoutTests/platform/chromium/test_expectations.txt:623
> > +BUG_DRT CPU WIN LINUX DEBUG SLOW : inspector = PASS
> Why was it necessary to add CPU to the inspector lines?
Good catch. That was left over from when we had an explicit GPU SKIP inspector line. I'll delete it now.
> > Tools/ChangeLog:20
> > +2011-02-01 Dirk Pranke <firstname.lastname@example.org>
> Looks like an extra ChangeLog entry.
Yeah. Will fix.
Created attachment 81010 [details]
update w/ fixes from Mihai's review
Dimitri - let's plan to land this tomorrow morning.
Created attachment 81081 [details]
rebase to r77511
Committed r77666: <http://trac.webkit.org/changeset/77666>
http://trac.webkit.org/changeset/77666 might have broken GTK Linux 32-bit Release and GTK Linux 32-bit Debug
The following tests are not passing: