Bug 65834 - NRWT should support cascading expectations
: NRWT should support cascading expectations
Status: RESOLVED FIXED
: WebKit
Tools / Tests
: 528+ (Nightly build)
: Unspecified All
: P2 Normal
Assigned To:
:
:
: 86926 87802 88942 88944 88945 88946 88947 88948
: 88680
  Show dependency treegraph
 
Reported: 2011-08-07 15:57 PST by
Modified: 2012-06-13 16:00 PST (History)


Attachments
make able the wk2 ports to share their common expectations in the wk2 platform (6.43 KB, patch)
2011-08-15 03:43 PST, Kristóf Kosztyó
no flags Review Patch | Details | Formatted Diff | Diff
make able the wk2 ports to share their common expectations in the wk2 platform (6.88 KB, patch)
2011-08-18 00:59 PST, Kristóf Kosztyó
no flags Review Patch | Details | Formatted Diff | Diff


Note

You need to log in before you can comment on or make changes to this bug.


Description From 2011-08-07 15:57:42 PST
Discussion on the list started there: https://lists.webkit.org/pipermail/webkit-dev/2011-July/017520.html.
I believe we have a consensus about cascading is needed. The question is how to do it. Actually I like Maciej's idea about introducing an include directive.
------- Comment #1 From 2011-08-08 09:13:10 PST -------
Could you summarize the consensus?  I don't really like an include directive because it can create non-tree fallback paths.
------- Comment #2 From 2011-08-08 09:57:42 PST -------
(In reply to comment #1)
> Could you summarize the consensus?  I don't really like an include directive because it can create non-tree fallback paths.

I meant we have a consensus about is it needed not about how should we do it.
Actually we can form a tree path with include and we can also have a non-tree path without it. I like the idea of the include directive because this way the fallback path would be explicitly defined in the list itself.
------- Comment #3 From 2011-08-08 13:53:07 PST -------
We don't have tree fallback paths now, I doubt we'll easily be able to get rid of them, and changing the teset expectations shouldn't change the fallback paths, regardless (they're different concepts).

[ Unless of course we fundamentally want to change some aspect of how test expectations work ].

That said, the only way to currently share expectations is to share the same expectations.txt file.

Given that Chromium ports currently use the expectations file to track that the expectation is "expected to fail" and hence are using the expectations as "correctness" tests, and the other ports check in "failing" expectations and hence use the expectations as "regression" tests, we almost certainly don't want to share Chromium- and non-Chromium expectations.txt files. Besides, the Chromium file is is already way too big and churns too often.

It's not clear to me that we need more than a "generic" set of expectations that applies to all ports in addition to the the "port-specific" expectations. I would be tempted to prototype that first and see if that got us what we wanted. 

But, it's not clear to me what we "want", either. 

My proposal would be that we (a) change Chromium to match the other port's behavior and view tests as regression tests, not correctness tests (i.e., check in revised platform-specific expectations) and then (b) see if we can get away with either a single file for all ports or a generic + port-specific file.
------- Comment #4 From 2011-08-09 01:41:31 PST -------
(In reply to comment #3)
> We don't have tree fallback paths now, I doubt we'll easily be able to get rid of them, and changing the teset expectations shouldn't change the fallback paths, regardless (they're different concepts).

As I know with ORWT the path of cascading the results was the same as the fallback paths. I think this is the only logical way of this.

> [ Unless of course we fundamentally want to change some aspect of how test expectations work ].
> 
> That said, the only way to currently share expectations is to share the same expectations.txt file.
> 
> Given that Chromium ports currently use the expectations file to track that the expectation is "expected to fail" and hence are using the expectations as "correctness" tests, and the other ports check in "failing" expectations and hence use the expectations as "regression" tests, we almost certainly don't want to share Chromium- and non-Chromium expectations.txt files. Besides, the Chromium file is is already way too big and churns too often.
> 

I think the cascading has nothing to do with the way how we interpret the results. Cascading is useful for both regression and correctness tests.
The point is that a port has subports (like qt and qt-linux) and the subport
has some metric differences but a great amount of expectation can be shared
across subports.

> It's not clear to me that we need more than a "generic" set of expectations that applies to all ports in addition to the the "port-specific" expectations. I would be tempted to prototype that first and see if that got us what we wanted. 

I don't think we need a generic set of expectations. On the contrary, we need less generic expectations for subports.
------- Comment #5 From 2011-08-09 09:08:34 PST -------
(In reply to comment #4)
> The point is that a port has subports (like qt and qt-linux) and the subport
> has some metric differences but a great amount of expectation can be shared
> across subports.

Agreed; however, the normal usage of the expectations file handles this fine (this is how Chromium uses it): use one expectations file for all of the QT ports. Results that are the same across all subports need no modifiers; results that vary can have them, e.g.:

BUGWK12345 : foo/bar.html = TEXT
BUGWK12346 LINUX : foo/bar.html = IMAGE

indicates that all of the Qt ports fail the first but only Qt-Linux fails the second.

The complication arises if you want to also share expectations with a different port (Gtk, Apple, Chromium, whatever), in which case you'd have to add some way to distinguish qt-linux from chromium-linux, or if you want to have expectations that are common across *all* ports, e.g., because every port will fail the test.
------- Comment #6 From 2011-08-15 02:04:47 PST -------
The webkit2 ports share their expectations in the general wk2 platform.
(e.g. the qt-wk2 should use these: qt, qt-wk2, wk2)

I think the following things can solve this problem:
 - concatenate the wk2/test_expectations.txt and the <port>-wk2/test_expectations.txt without any include directive in a nice named method like cascade_wk2_test_expectations()
 - the test_expectations() return with the <port> expectations
 - the test_expectations_override() return with the concatenated list if the webkit test runner is used
------- Comment #7 From 2011-08-15 03:43:48 PST -------
Created an attachment (id=103899) [details]
make able the wk2 ports to share their common expectations in the wk2 platform
------- Comment #8 From 2011-08-15 09:09:05 PST -------
(From update of attachment 103899 [details])
View in context: https://bugs.webkit.org/attachment.cgi?id=103899&action=review

> Tools/Scripts/webkitpy/layout_tests/port/webkit.py:358
> +        if self._filesystem.exists(path_to_wk2_test_expectations_file):
> +            expectations += self._filesystem.read_text_file(path_to_wk2_test_expectations_file)

This might be more of a maintenance burden than you'd expect because test_expectations don't allow duplicate / shadowing expectations.  That means you'll need to make sure that you don't have conflicts between the two files.  (That's not to say we shouldn't go this route, just that it's not as pretty as it might seem.)
------- Comment #9 From 2011-08-15 10:22:00 PST -------
(From update of attachment 103899 [details])
View in context: https://bugs.webkit.org/attachment.cgi?id=103899&action=review

>> Tools/Scripts/webkitpy/layout_tests/port/webkit.py:358
>> +            expectations += self._filesystem.read_text_file(path_to_wk2_test_expectations_file)
> 
> This might be more of a maintenance burden than you'd expect because test_expectations don't allow duplicate / shadowing expectations.  That means you'll need to make sure that you don't have conflicts between the two files.  (That's not to say we shouldn't go this route, just that it's not as pretty as it might seem.)

I thought overrides are allowed to duplicate/shadow (hence, it's an override).  That's what http://trac.webkit.org/browser/trunk/Tools/Scripts/webkitpy/layout_tests/models/test_expectations.py#L825 seems to suggest and I think that's how the chromium repository test_expectations.txt works.
------- Comment #10 From 2011-08-15 10:37:27 PST -------
(From update of attachment 103899 [details])
View in context: https://bugs.webkit.org/attachment.cgi?id=103899&action=review

>>> Tools/Scripts/webkitpy/layout_tests/port/webkit.py:358
>>> +        if self._filesystem.exists(path_to_wk2_test_expectations_file):
>>> +            expectations += self._filesystem.read_text_file(path_to_wk2_test_expectations_file)
>> 
>> This might be more of a maintenance burden than you'd expect because test_expectations don't allow duplicate / shadowing expectations.  That means you'll need to make sure that you don't have conflicts between the two files.  (That's not to say we shouldn't go this route, just that it's not as pretty as it might seem.)
> 
> I thought overrides are allowed to duplicate/shadow (hence, it's an override).  That's what http://trac.webkit.org/browser/trunk/Tools/Scripts/webkitpy/layout_tests/models/test_expectations.py#L825 seems to suggest and I think that's how the chromium repository test_expectations.txt works.

Ah, looks like you're right!
------- Comment #11 From 2011-08-15 10:37:45 PST -------
We should probably add a test to that effect.
------- Comment #12 From 2011-08-15 10:40:17 PST -------
(In reply to comment #11)
> We should probably add a test to that effect.

tomorrow I will test it, and if it will be necessary I will fix it
------- Comment #13 From 2011-08-15 11:07:31 PST -------
The way I'm reading this patch, you will use the regular "qt" expectations by default, and if you're running the wk2 variant, you'll use the concatenation of the platform/wk2 and platform/qt-wk2 expectations as overrides. Is this the intended behavior?

I'm not sure that this is a great idea. The overrides files are really intended to be mostly empty most of the time, and for short-term fixes (they were initially implemented in order to handle downstream failures in the chromium repo). The act of overriding can make it hard to understand which rules are actually taking effect, and can make it harder to figure out which expectations need to be modified where.

In particular, in this case, you won't have any easy way to specify a test that fails one way most webkit 2 ports and a different way on the Qt port (e.g., a test that produces a TEXT diff on most wk2 ports but Qt times out). 

Of course, if the non-Chromium ports follow the approach that these files are empty most of the time anyway, maybe this'll be left of an issue.
------- Comment #14 From 2011-08-18 00:59:52 PST -------
Created an attachment (id=104311) [details]
make able the wk2 ports to share their common expectations in the wk2 platform

Sorry I didn't notice that about the overrides therefore in this patch I don't use it.
It has one more advantage at now: when one test skipped for the orwt it now can mark as expectation for the nrwt too, because it clear the recurring expectation
------- Comment #15 From 2011-08-19 15:58:37 PST -------
(From update of attachment 104311 [details])
View in context: https://bugs.webkit.org/attachment.cgi?id=104311&action=review

> Tools/Scripts/webkitpy/layout_tests/port/webkit.py:370
> +        return result

I'm pretty sure that by coding things this way you're ignoring any modifiers that might be set on the lines, and so if you have a line in the wk2 file like:

BUGX MAC : foo/bar.html = PASS

and a line in the regular file like:

BUGY WIN : foo/bar.html = TEXT

you will throw away the second line, which would be wrong.

I'm not sure that trying to manually implement a cascade through concatenation and a limited form of deduplication is a fruitful way of doing things. I think we probably need to implement "proper" support for cascading that is aware of the modifiers.
------- Comment #16 From 2011-08-23 02:17:59 PST -------
> BUGX MAC : foo/bar.html = PASS
> 
> and a line in the regular file like:
> 
> BUGY WIN : foo/bar.html = TEXT

In this example the next line is the correct?
BUGY WIN : foo/bar.html = TEXT

But I think it is a very bad idea to write an expected line like this in the file what is for the cross-platform failures.
On the other hand the correct merge of the expected lines is a very nice idea.
------- Comment #17 From 2011-08-23 10:37:42 PST -------
(In reply to comment #16)
> > BUGX MAC : foo/bar.html = PASS
> > 
> > and a line in the regular file like:
> > 
> > BUGY WIN : foo/bar.html = TEXT
> 
> In this example the next line is the correct?
> BUGY WIN : foo/bar.html = TEXT
>

They are both correct. Imagine we're talking about the Qt port here. One expectation might be intended for the webkit2 version of qt-mac, and the other for qt-win (could be either for webkit1 or for both webkit1 and webkit2).

> But I think it is a very bad idea to write an expected line like this in the file what is for the cross-platform failures.

The basic idea of the test expectations file is to be able to manage all of the expectations for a single port in a single file; this is different than what we do using Skipped files, where you have files for each platform (and version, in some cases).

If you wanted to have a file represent only cross-platform failures, and another file represent failures specific to a single platform (or version), we could do that, but that would be a change from what the code does now (I wouldn't really be in favor of this change, in case that wasn't obvious).
------- Comment #18 From 2011-08-30 00:10:03 PST -------
There is any idea how we can merge correctly the different expectation lines?
Anyway is that possible?

e.g.:
BUGWKX :foo/bar.html=PASS //some comment
BUGWKY LINUX:foo/bar.html=FAIL //other comment
=
BUGWKX BUGWKY: foo/bar.html = PASS FAIL //some comment; other comment

In that example the expectation line will be correct, but we don't see if there any regression.
------- Comment #19 From 2011-09-02 18:23:14 PST -------
(In reply to comment #18)
> There is any idea how we can merge correctly the different expectation lines?
> Anyway is that possible?
> 
> e.g.:
> BUGWKX :foo/bar.html=PASS //some comment
> BUGWKY LINUX:foo/bar.html=FAIL //other comment
> =
> BUGWKX BUGWKY: foo/bar.html = PASS FAIL //some comment; other comment
> 
> In that example the expectation line will be correct, but we don't see if there any regression.

Hi Kristóf, 

I'm not sure if I understood your question.

Using your example above, if you have:

BUGWKX :foo/bar.html=PASS //some comment
BUGWKY LINUX:foo/bar.html=FAIL //other comment

Today, you will actually get a parsing conflict, since the expectation is unclear on Linux.

[ At one point the code did actually support this, and the second line would've overridden the first, but we got rid of this because it made the code more complex and arguably made the expectations harder to follow (because you couldn't just look at a single line to figure out what a test was expected to do). ]

Of course, we can change the code. I think we need to start by defining what we want the syntax in the files to be, and the corresponding semantics.

For example, you could just #include something and merge the results to produce a flaky result as you describe, but as you say, this would not be a very good solution.

Another option would be to #include the files and just treat them as conflicts directly. As long as the files contained disjoint lists of tests, this woud be fine. This might be the way to start, as this would be pretty easy to support. Or, we could follow the model of the overrides files and just have later files trump earlier ones.

But, if we wanted to support something fancier, it's not immediately obvious to me how to do it without resorting to something like what I described earlier (where a line with more modifiers takes precedence over a line w/ fewer).
------- Comment #20 From 2012-01-24 06:28:22 PST -------
Are we still want to support cascaded test_expectations.txt?
It seems only chromium port uses test_expectations.txt actively.
------- Comment #21 From 2012-01-24 06:38:28 PST -------
(In reply to comment #20)
> Are we still want to support cascaded test_expectations.txt?
> It seems only chromium port uses test_expectations.txt actively.

Because cascading is not supported. If it would be, we could use test_expectations.txt, and we could automatically notice no longer failing tests.
------- Comment #22 From 2012-05-18 15:46:42 PST -------
I'm working on this now ... the basic plan will be as follows:

cascading expectations will be an extension of the existing "overrides" mechanism. In other words, each port will have an ordered list of expectations files. If two files have an entry for the same test, the later entry will override the earlier one. If a later file has an entry for a directory that contains tests with entries in earlier files, those earlier entries will also be overridden (this will be different from how overrides currently work). This will allow later files to have full control over any and all suppressions.

Initially, all files in the list will be required to have support the same list of platform/configuration modifiers. If need arises, we can work out a scheme by which different files can support different modifiers and/or have certain configurations "defaulted in".

Skipped lists will be converted on-the-fly to expectations files and then appended. This means that tests listed in *any* Skipped file will override all entries in expectations files.

Any tests disabled by compile-time or runtime flags will then be appended as expectations; this will also override any expectations entries.

Finally, the command line flags --additional-expectations and --ignore-tests will be treated as if they were appended to the ordered list above, again giving the user full control over all of the expectations, allowing for local expectations on a machine, etc.

--lint-test-files diagnostics will be improved to report which files each expectation shows up in, and when expectations are redundant.

The basic implementation approach will be

1) rework the existing TestExpectations class interfaces a bit to make it easier to support the next sets of changes

2) add better diagnostics to --lint-test-files

3) update the existing code to remove the "overrides" concept and just pass the ordered lists around. "Skipped" lists

4) profit ... I mean, let each port decide if and how they want to use cascaded files and when they want to drop the Skipped files.
------- Comment #23 From 2012-05-18 15:51:40 PST -------
(In reply to comment #22)
> I'm working on this now ... the basic plan will be as follows:

This plan sounds great!

> Initially, all files in the list will be required to have support the same list of platform/configuration modifiers. If need arises, we can work out a scheme by which different files can support different modifiers and/or have certain configurations "defaulted in".

This is certainly good as a first milestone. I think we'll eventually want the platform modifiers to be inferred from the platform directory names (e.g. linux would be disallowed on Apple's test_expectations.txt file since there's no platform/mac-linux directory).
------- Comment #24 From 2012-05-21 11:48:13 PST -------
(From update of attachment 104311 [details])
clearing the flags from this old patch ... I assume it's no longer interesting/needed?
------- Comment #25 From 2012-06-12 20:02:41 PST -------
all the patches necessary to make NRWT support cascading expectations (with all expectations files using the same sets of modifiers) have been posted. There is more work to be done to make 'webkit-patch rebaseline-expectations' support them, and there's probably more work in the flakiness dashboard and possibly garden-o-matic, but those are separate issues.
------- Comment #26 From 2012-06-13 16:00:39 PST -------
Basic support for cascading is now supported. Each port may specify an ordered list of TestExpectations files; entries in later files will override entries in earlier files. 

The port is required to support any keyword listed in any of the files (which basically means that all the files should use the same list of keywords).

There is no #include mechanism; I don't think it's really needed, and it probably just complicates things.

Right now the non-Chromium ports only use a single TestExpectations file; it should be trivial to change it to do something like the the logic we use for Skipped files, and use platform/X/TestExpectations then platform/X-version/TestExpectations and platform/wk2/TestExpectations as necessary. We could also add a top-level TestExpectations file for tests that will be broken across all ports, if we wanted.

We can of course add more features, but I'd like to keep things simple at the start and see how this goes.

Feedback welcome!