95268 – garden-o-matic doesn't handle multiple TestExpectations files properly

NEW 95268

garden-o-matic doesn't handle multiple TestExpectations files properly

https://bugs.webkit.org/show_bug.cgi?id=95268

Summary garden-o-matic doesn't handle multiple TestExpectations files properly

Dirk Pranke

Reported 2012-08-28 17:41:13 PDT

This was added as a temporary hack to make some of the webkit-patch commands at least sort-of work while we were implementing support for cascading expectations. However, now that the cascade is fully implemented, not including the full cascade is never needed and will almost certainly do the wrong thing if the overriding files are non-empty.

Attachments
Add attachment proposed patch, testcase, etc.

Dirk Pranke

Comment 1 2012-09-21 16:06:02 PDT

Okay, turns out life isn't as simple as I would like :( ... webkit-patch rebaseline-test-internal (which is used by rebaseline-json and hence garden-o-matic) takes a 'builder' and a 'test name' as arguments, and uses include_overrides=False so that it only has to update a single TestExpectations file. In other words, garden-o-matic doesn't (and can't correctly) deal with ports that use a cascade of expectations files, which is basically every port but the chromium desktop ports, and probably shortly those as well (since I want to add support for a top-level TestExpectations file shared by all ports). Given that the cascade means that some files may have entries that override other files, making this work isn't totally easy. I think we need to modify the rebaseline-test-internal logic to figure out which file has the given test name for that builder, and then only update that particular file. I'm not yet sure that we even expose all the methods we need to do this properly from the test_expectations.py* objects. At any rate, I'm going to change the subject line for this to reflect the bigger problem.

Dirk Pranke

Comment 2 2012-09-21 16:24:37 PDT

Ugh, dealing with the cascade makes things ugly. If I have foo.html = ImageOnlyFailure in platform/mac, and foo.html = Failure in platform/mac-lion, and we rebaseline the test (which will rebaseline both text and image if needed), then it's not even obvious what the "correct" thing to do is. Do we put a new baseline only in mac-lion and delete the entry from platform/mac-lion? Do we put a new baseline in platform/mac and delete both entries, and then put Failure entries into platform/mac-snowleopard etc. ? Do we delete both entries and wait to see what fails on the next run?

Dirk Pranke

Comment 3 2012-09-21 16:36:03 PDT

given that this appears to be a larger problem than I had at first thought, I'm disclaiming ownership for now :). Copying a few people who I think have tried to use garden-o-matic on the other ports to see what they think.

Ojan Vafai

Comment 4 2012-09-21 16:53:26 PDT

It's not clear to me what the right behavior is. Here's one brainstorm idea. Consider the case of rebaselining a test that has an entry in platform/mac. 1. Copy that entry into all the TestExpectations files that depend on platform/mac. 2. Remove the entry from each TestExpectations file for each bot that you rebaseline. You only remove platform/mac one if you are rebaselining the latest mac port (i.e. Mountain Lion), remove the platform/mac-lion entry iff you're rebaselining the Lion bot, etc. There should be at least one bot per TestExpectations file, right? In the common case where you are rebaselining all the mac ports, the entry will effectively get removed from all TestExpectations files. The potential downside to this is that if you're not rebaselining all the ports, you'll end up with stray leftover lines, but this is pretty much how garden-o-matic works in general anyways with modifying baselines, so I'm inclined to say that's ok.

Dirk Pranke

Comment 5 2012-09-21 17:13:10 PDT

(In reply to comment #4) Yeah, that might work. I'd have to think about it further. We might need more hooks from the Port object to do this properly (e.g., "most_specific_expectations_file" or something). Also, it's not clear what we'd do in the case of files like skia_test_expectations.txt and the downstream test_expectations.txt, since they don't apply to just one configuration. Given that those files are supposed to be empty nearly all of the time, maybe we can error out if they have relevant entries?

Dimitri Glazkov (Google)

Comment 6 2012-09-23 16:03:44 PDT

Just a random thought: if we _disable_ multi-platform expectations syntax in lieu of cascading expectations, we might be able to make the algorithm of collapsing a set of TestConfigurations into specifiers/modifiers easier, because now we have a simple tree structure to rely upon. Of course, this means more pain for gardeners (who now need to modify multiple files), but maybe that's not a big issue, if we have the right tooling?

Ojan Vafai

Comment 7 2012-09-24 10:48:21 PDT

(In reply to comment #6) > Just a random thought: if we _disable_ multi-platform expectations syntax in lieu of cascading expectations, we might be able to make the algorithm of collapsing a set of TestConfigurations into specifiers/modifiers easier, because now we have a simple tree structure to rely upon. Of course, this means more pain for gardeners (who now need to modify multiple files), but maybe that's not a big issue, if we have the right tooling? I've thought a lot about this and see pros/cons in both directions. I think as long as we have a platform/chromium/TestExpectations file, it shouldn't be too much more work for gardeners. One advantage here is that there will be fewer duplicate line warnings because duplicates across files are allowed. The corollary, is that people are more likely to accidentally leave junk behind in some files. That all said, I think we should do it for the following reasons: -Makes Chromium less of a special-case port. -We don't need to figure out which modifiers apply to which ports and in what way. The only platform modifiers would be Release and Debug. Nothing else. -Simplifies the tooling code considerably. The big downside here is that it's a good deal of tooling work. Not sure there's anyone with time to do it at the moment.

Dirk Pranke

Comment 8 2012-09-24 12:48:47 PDT

(In reply to comment #7) > (In reply to comment #6) > > Just a random thought: if we _disable_ multi-platform expectations syntax in lieu of cascading expectations, we might be able to make the algorithm of collapsing a set of TestConfigurations into specifiers/modifiers easier, because now we have a simple tree structure to rely upon. Of course, this means more pain for gardeners (who now need to modify multiple files), but maybe that's not a big issue, if we have the right tooling? > I would expect the algorithms to be somewhat easier, but it's hard to say. For one thing, you could no longer treat things consistently: how you treated a debug-only failure would be different from a lion-only failure, and how you treated a lion-only failure would be different from a mac-wide failure. I fear we'd have more special-case logic in the code, but I haven't thought about it a lot yet. Also, we have to remember that multiple files is needed for *both* platform-specific expectations *and* overrides. Right now the concern I have is figuring out how to deal with the override aspect of the problem. I do think it would be good to make all of the ports work consistently, and I wonder how much the need for platform-specific expectations will be reduced once we start checking in failures. That said, I no longer think anything will be "considerably" simpler in this part of the code :).

Note You need to log in before you can comment on or make changes to this bug.

Status NEW

Resolution

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware Unspecified

OS Unspecified

Product WebKit

Component Tools / Tests

Assignee

Nobody

Reported

2012-08-28 17:41 PDT

Modified

2017-07-18 08:29 PDT History

CC List

7 users Show

URL

Keywords

Depends on

Blocks