Bug 73613 - [NRWT] reftest should support having multiple references per test
Summary: [NRWT] reftest should support having multiple references per test
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Ryosuke Niwa
URL:
Keywords:
Depends on:
Blocks: 66295
  Show dependency treegraph
 
Reported: 2011-12-01 19:51 PST by Ryosuke Niwa
Modified: 2011-12-02 14:58 PST (History)
6 users (show)

See Also:


Attachments
Patch (26.21 KB, patch)
2011-12-01 23:59 PST, Ryosuke Niwa
dpranke: review+
dpranke: commit-queue-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Ryosuke Niwa 2011-12-01 19:51:12 PST
RIght now new-run-webkit-test supports having exactly one reference file per test. However, this isn't the case for w3c tests, and we need to support having multiple reference files in order to import w3c tests.
Comment 1 Ryosuke Niwa 2011-12-01 20:10:03 PST
Example from http://test.csswg.org/suites/css2.1/20110323/html4/reftest.list:

== block-in-inline-insert-003.htm block-in-inline-insert-003-nosplit-ref.htm
== block-in-inline-insert-003.htm block-in-inline-insert-003-ref.htm
Comment 2 Ryosuke Niwa 2011-12-01 20:16:27 PST
Quotation from http://wiki.csswg.org/test/reftest:
"If a test has multiple == references then any of the references may match the test. If a test has multiple != references, then none of those references may match the test. The reference file may also have entries in the manifest, in this case, the renderings of the references must match each other as well in order to consider the test as passed."
Comment 3 Ryosuke Niwa 2011-12-01 23:59:57 PST
Created attachment 117581 [details]
Patch
Comment 4 Eric Seidel (no email) 2011-12-02 00:03:15 PST
Really?  Why?
Comment 5 Ryosuke Niwa 2011-12-02 00:06:41 PST
https://bugs.webkit.org/show_bug.cgi?id=66295

Comment #14 From Ryosuke Niwa 2011-11-03 10:01:31 PST (-) [reply] 
(In reply to comment #13)
> Alternatively, we can give a feedback to W3C that we should have exactly one reference file per test. What's the purpose of having multiple references anyway?

Answer: "In some cases when creating the reference file, it is necessary to use features that, although different from the tested features, may themselves fail in such a manner as to cause the reference to render identically to a failed test. When this is the case, in order to reduce the possibility of false positive testing outcomes, multiple reference files should be used, each using a different technique to render the reference. One possibility is to create one or more references that must not match the test file, i.e.: a file that renders in the same manner as a failed test."
Comment 6 Dirk Pranke 2011-12-02 11:56:54 PST
Comment on attachment 117581 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=117581&action=review

The patch looks basically fine. I found a few places where I'd rework things a bit or add comments to make them clearer (at least to me).

> Tools/Scripts/webkitpy/layout_tests/run_webkit_tests_integrationtest.py:725
> +    def test_reftest_skipped_if_unlisted(self):

It wasn't immediately clear to me what "unlisted" meant, so I'd probably add a comment here that you're testing what happens if there's an old-style reftest found in a directory containing a manifest file.

> Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:70
> +                    _log.error('The reftest (%s) can not have an expectation file. Please remove (%s).',

I would probably log this as:

"%s is both a reftest and has an expected output file: %s"

it may be that the test shouldn't actually be a reftest, rather than that it shouldn't have the expected output.

> Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:273
> +            # Reftest fails if any mismatch matches; it succeeds if any match matches

I think it would be clearer if you pull this comment up before the for statement, and combine it with the comment on line 267, something like:

# A reftest can have multiple match references and multiple mismatch references; 
# the test fails if any mismatch matches and all of the matches don't match. 
# To minimize the number of references we have to check, we run all of the mismatches first, then the matches, and short-circuit out as soon as we can.
# Note that sorting by the expectation sorts "!=" before "==" so this is easy to do.

It's a bit wordy, but the logic is non-obvious to the  uninitiated, so I think it's worth it to be wordy here.

And then I would get rid of the "putAllMismatchBeforeMatch" variable (which threw me until I realized you were just aliasing 'sorted'.

> Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:275
> +                test_result_writer.write_test_result(self._port, self._test_name, test_output, reference_output, test_result.failures)

can this just be a "break"?

> Tools/Scripts/webkitpy/layout_tests/port/base.py:446
> +        return reftest_list[filename]

Nit: these last three lines could be just "return reftest_list.get(filename, [])".

> Tools/Scripts/webkitpy/layout_tests/port/base.py:1064
> +            parsed_list[key] = []

Nit: these three lines can be written as "parsed_list.setdefault(key, filesystem.join(...)". That returns parsed_list[key] as well, so you can actually chain in the next line, but I don't know if that would be more or less readable in this case.
Comment 7 Ojan Vafai 2011-12-02 13:20:11 PST
I agree with Eric, this use-case does not justify the extra complexity. For example, now the dashboard, garden-o-matic, results.html, etc all need to be updated to handle multiple expected results files. I think, in the cases where the w3c test lists use multiple reference files, we should just grab the first one.
Comment 8 Ryosuke Niwa 2011-12-02 13:21:50 PST
(In reply to comment #7)
> I agree with Eric, this use-case does not justify the extra complexity. For example, now the dashboard, garden-o-matic, results.html, etc all need to be updated to handle multiple expected results files. I think, in the cases where the w3c test lists use multiple reference files, we should just grab the first one.

I don't think we can do that unless we don't mind losing test coverage.
Comment 9 Ryosuke Niwa 2011-12-02 13:37:37 PST
(In reply to comment #7)
> I agree with Eric, this use-case does not justify the extra complexity. For example, now the dashboard, garden-o-matic, results.html, etc all need to be updated to handle multiple expected results files. I think, in the cases where the w3c test lists use multiple reference files, we should just grab the first one.

Also note that they don't really "need" to be updated. Exiting tools should all work because there will still be exactly one actual result per test.  It's just "nice" if they did show links for all reference files.
Comment 10 Ryosuke Niwa 2011-12-02 13:41:31 PST
Comment on attachment 117581 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=117581&action=review

>> Tools/Scripts/webkitpy/layout_tests/run_webkit_tests_integrationtest.py:725
>> +    def test_reftest_skipped_if_unlisted(self):
> 
> It wasn't immediately clear to me what "unlisted" meant, so I'd probably add a comment here that you're testing what happens if there's an old-style reftest found in a directory containing a manifest file.

Renamed the test to test_reftest_should_not_use_naming_convention_if_not_listed_in_reftestlist.

>> Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:70
>> +                    _log.error('The reftest (%s) can not have an expectation file. Please remove (%s).',
> 
> I would probably log this as:
> 
> "%s is both a reftest and has an expected output file: %s"
> 
> it may be that the test shouldn't actually be a reftest, rather than that it shouldn't have the expected output.

Done.

>> Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:275
>> +                test_result_writer.write_test_result(self._port, self._test_name, test_output, reference_output, test_result.failures)
> 
> can this just be a "break"?

Done.

>> Tools/Scripts/webkitpy/layout_tests/port/base.py:446
>> +        return reftest_list[filename]
> 
> Nit: these last three lines could be just "return reftest_list.get(filename, [])".

Done.

>> Tools/Scripts/webkitpy/layout_tests/port/base.py:1064
>> +            parsed_list[key] = []
> 
> Nit: these three lines can be written as "parsed_list.setdefault(key, filesystem.join(...)". That returns parsed_list[key] as well, so you can actually chain in the next line, but I don't know if that would be more or less readable in this case.

Done.
Comment 11 Ryosuke Niwa 2011-12-02 13:42:37 PST
Committed r101845: <http://trac.webkit.org/changeset/101845>
Comment 12 Ojan Vafai 2011-12-02 13:57:29 PST
(In reply to comment #8)
> (In reply to comment #7)
> > I agree with Eric, this use-case does not justify the extra complexity. For example, now the dashboard, garden-o-matic, results.html, etc all need to be updated to handle multiple expected results files. I think, in the cases where the w3c test lists use multiple reference files, we should just grab the first one.
> 
> I don't think we can do that unless we don't mind losing test coverage.

It's only a moderate loss in test coverage. It's worth the simplification to the test harness.

(In reply to comment #9)
> (In reply to comment #7)
> > I agree with Eric, this use-case does not justify the extra complexity. For example, now the dashboard, garden-o-matic, results.html, etc all need to be updated to handle multiple expected results files. I think, in the cases where the w3c test lists use multiple reference files, we should just grab the first one.
> 
> Also note that they don't really "need" to be updated. Exiting tools should all work because there will still be exactly one actual result per test.  It's just "nice" if they did show links for all reference files.

But all these tolls also show you the expected result and now they'll need to be updated to show you multiple expected results and multiple diffs. It's all doable, but it's non-trivial and considerably complicates the codebase. It makes understanding test failures more complicated as well. Also, I'm not sure who will actually do the work to make it happen.
Comment 13 Ryosuke Niwa 2011-12-02 14:00:55 PST
(In reply to comment #12)
> (In reply to comment #8)
> > I don't think we can do that unless we don't mind losing test coverage.
> 
> It's only a moderate loss in test coverage. It's worth the simplification to the test harness.

There are many tests in CSS2.1 with multiple reference files.

> (In reply to comment #9)
> > (In reply to comment #7)
> > > I agree with Eric, this use-case does not justify the extra complexity. For example, now the dashboard, garden-o-matic, results.html, etc all need to be updated to handle multiple expected results files. I think, in the cases where the w3c test lists use multiple reference files, we should just grab the first one.
> > 
> > Also note that they don't really "need" to be updated. Exiting tools should all work because there will still be exactly one actual result per test.  It's just "nice" if they did show links for all reference files.
> 
> But all these tolls also show you the expected result and now they'll need to be updated to show you multiple expected results and multiple diffs. It's all doable, but it's non-trivial and considerably complicates the codebase. It makes understanding test failures more complicated as well. Also, I'm not sure who will actually do the work to make it happen.

There is exactly one diff per test because we bail out as soon as we find one failure. (the order is stable). And the reference file that caused the failure (i.e. the first one that failed) will be reported as the reference_file in the json file.
Comment 14 Dirk Pranke 2011-12-02 14:42:04 PST
I think there's a significant value to supporting the w3c test suites fully and as close to as-is as possible. In particular, I think there's quite a bit of value in capturing the fact that tests can pass multiple different ways (which we can't today capture with IMAGE tests, only with text-only/script-based tests.

The additional complexity of handling multiple results, at least in NRWT, is smaller than handling the reftest manifest, and in terms of things I'd remove from NRWT if I wanted to reduce the complexity, this isn't close to the top of the list.

It is true, though, that supporting multiple references may complicate things ... do we care if a test used to match A and passes and now matches B and passes? Or used to match A when it shouldn't but now matches B when it shouldn't? Maybe we should have some way of indicating which reference of the allowed set we do expect to match?

Also, what does it mean for garden-o-matic to handle reftests at all? It's not like you can meaningfully rebaseline them, unless you were to switch which match was expected.
Comment 15 Ryosuke Niwa 2011-12-02 14:58:59 PST
(In reply to comment #14)
> I think there's a significant value to supporting the w3c test suites fully and as close to as-is as possible. In particular, I think there's quite a bit of value in capturing the fact that tests can pass multiple different ways (which we can't today capture with IMAGE tests, only with text-only/script-based tests.

Completely agree. That's the whole rationale behind my patch. If reducing the complexity was so important, then I wouldn't even support reftest.list and just convert all tests to use naming convention.

> The additional complexity of handling multiple results, at least in NRWT, is smaller than handling the reftest manifest, and in terms of things I'd remove from NRWT if I wanted to reduce the complexity, this isn't close to the top of the list.

Which is even less than supporting the embedded link elements. Note embedded link elements can specify multiple reference files as well.

> It is true, though, that supporting multiple references may complicate things ... do we care if a test used to match A and passes and now matches B and passes? Or used to match A when it shouldn't but now matches B when it shouldn't? Maybe we should have some way of indicating which reference of the allowed set we do expect to match?

That's a good point. We probably need to revisit that once we have a large number of reftests imported and it becomes an issue for us. On the other hand, flexible test expectation is one of many benefits of reftests.

> Also, what does it mean for garden-o-matic to handle reftests at all? It's not like you can meaningfully rebaseline them, unless you were to switch which match was expected.

Probably updating test_expectations.txt?