73613 – [NRWT] reftest should support having multiple references per test

RESOLVED FIXED 73613

[NRWT] reftest should support having multiple references per test

https://bugs.webkit.org/show_bug.cgi?id=73613

Summary [NRWT] reftest should support having multiple references per test

Ryosuke Niwa

Reported 2011-12-01 19:51:12 PST

RIght now new-run-webkit-test supports having exactly one reference file per test. However, this isn't the case for w3c tests, and we need to support having multiple reference files in order to import w3c tests.

Attachments
Patch (26.21 KB, patch) 2011-12-01 23:59 PST, Ryosuke Niwa	dpranke: review+ dpranke: commit-queue-	Details Formatted Diff Diff
View All Add attachment proposed patch, testcase, etc.

Ryosuke Niwa

Comment 1 2011-12-01 20:10:03 PST

Example from http://test.csswg.org/suites/css2.1/20110323/html4/reftest.list: == block-in-inline-insert-003.htm block-in-inline-insert-003-nosplit-ref.htm == block-in-inline-insert-003.htm block-in-inline-insert-003-ref.htm

Ryosuke Niwa

Comment 2 2011-12-01 20:16:27 PST

Quotation from http://wiki.csswg.org/test/reftest: "If a test has multiple == references then any of the references may match the test. If a test has multiple != references, then none of those references may match the test. The reference file may also have entries in the manifest, in this case, the renderings of the references must match each other as well in order to consider the test as passed."

Ryosuke Niwa

Comment 3 2011-12-01 23:59:57 PST

Created attachment 117581 [details] Patch

Eric Seidel (no email)

Comment 4 2011-12-02 00:03:15 PST

Really? Why?

Ryosuke Niwa

Comment 5 2011-12-02 00:06:41 PST

https://bugs.webkit.org/show_bug.cgi?id=66295 Comment #14 From Ryosuke Niwa 2011-11-03 10:01:31 PST (-) [reply] (In reply to comment #13) > Alternatively, we can give a feedback to W3C that we should have exactly one reference file per test. What's the purpose of having multiple references anyway? Answer: "In some cases when creating the reference file, it is necessary to use features that, although different from the tested features, may themselves fail in such a manner as to cause the reference to render identically to a failed test. When this is the case, in order to reduce the possibility of false positive testing outcomes, multiple reference files should be used, each using a different technique to render the reference. One possibility is to create one or more references that must not match the test file, i.e.: a file that renders in the same manner as a failed test."

Dirk Pranke

Comment 6 2011-12-02 11:56:54 PST

Comment on attachment 117581 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=117581&action=review The patch looks basically fine. I found a few places where I'd rework things a bit or add comments to make them clearer (at least to me). > Tools/Scripts/webkitpy/layout_tests/run_webkit_tests_integrationtest.py:725 > + def test_reftest_skipped_if_unlisted(self): It wasn't immediately clear to me what "unlisted" meant, so I'd probably add a comment here that you're testing what happens if there's an old-style reftest found in a directory containing a manifest file. > Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:70 > + _log.error('The reftest (%s) can not have an expectation file. Please remove (%s).', I would probably log this as: "%s is both a reftest and has an expected output file: %s" it may be that the test shouldn't actually be a reftest, rather than that it shouldn't have the expected output. > Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:273 > + # Reftest fails if any mismatch matches; it succeeds if any match matches I think it would be clearer if you pull this comment up before the for statement, and combine it with the comment on line 267, something like: # A reftest can have multiple match references and multiple mismatch references; # the test fails if any mismatch matches and all of the matches don't match. # To minimize the number of references we have to check, we run all of the mismatches first, then the matches, and short-circuit out as soon as we can. # Note that sorting by the expectation sorts "!=" before "==" so this is easy to do. It's a bit wordy, but the logic is non-obvious to the uninitiated, so I think it's worth it to be wordy here. And then I would get rid of the "putAllMismatchBeforeMatch" variable (which threw me until I realized you were just aliasing 'sorted'. > Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:275 > + test_result_writer.write_test_result(self._port, self._test_name, test_output, reference_output, test_result.failures) can this just be a "break"? > Tools/Scripts/webkitpy/layout_tests/port/base.py:446 > + return reftest_list[filename] Nit: these last three lines could be just "return reftest_list.get(filename, [])". > Tools/Scripts/webkitpy/layout_tests/port/base.py:1064 > + parsed_list[key] = [] Nit: these three lines can be written as "parsed_list.setdefault(key, filesystem.join(...)". That returns parsed_list[key] as well, so you can actually chain in the next line, but I don't know if that would be more or less readable in this case.

Ojan Vafai

Comment 7 2011-12-02 13:20:11 PST

I agree with Eric, this use-case does not justify the extra complexity. For example, now the dashboard, garden-o-matic, results.html, etc all need to be updated to handle multiple expected results files. I think, in the cases where the w3c test lists use multiple reference files, we should just grab the first one.

Ryosuke Niwa

Comment 8 2011-12-02 13:21:50 PST

(In reply to comment #7) > I agree with Eric, this use-case does not justify the extra complexity. For example, now the dashboard, garden-o-matic, results.html, etc all need to be updated to handle multiple expected results files. I think, in the cases where the w3c test lists use multiple reference files, we should just grab the first one. I don't think we can do that unless we don't mind losing test coverage.

Ryosuke Niwa

Comment 9 2011-12-02 13:37:37 PST

(In reply to comment #7) > I agree with Eric, this use-case does not justify the extra complexity. For example, now the dashboard, garden-o-matic, results.html, etc all need to be updated to handle multiple expected results files. I think, in the cases where the w3c test lists use multiple reference files, we should just grab the first one. Also note that they don't really "need" to be updated. Exiting tools should all work because there will still be exactly one actual result per test. It's just "nice" if they did show links for all reference files.

Ryosuke Niwa

Comment 10 2011-12-02 13:41:31 PST

Comment on attachment 117581 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=117581&action=review >> Tools/Scripts/webkitpy/layout_tests/run_webkit_tests_integrationtest.py:725 >> + def test_reftest_skipped_if_unlisted(self): > > It wasn't immediately clear to me what "unlisted" meant, so I'd probably add a comment here that you're testing what happens if there's an old-style reftest found in a directory containing a manifest file. Renamed the test to test_reftest_should_not_use_naming_convention_if_not_listed_in_reftestlist. >> Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:70 >> + _log.error('The reftest (%s) can not have an expectation file. Please remove (%s).', > > I would probably log this as: > > "%s is both a reftest and has an expected output file: %s" > > it may be that the test shouldn't actually be a reftest, rather than that it shouldn't have the expected output. Done. >> Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:275 >> + test_result_writer.write_test_result(self._port, self._test_name, test_output, reference_output, test_result.failures) > > can this just be a "break"? Done. >> Tools/Scripts/webkitpy/layout_tests/port/base.py:446 >> + return reftest_list[filename] > > Nit: these last three lines could be just "return reftest_list.get(filename, [])". Done. >> Tools/Scripts/webkitpy/layout_tests/port/base.py:1064 >> + parsed_list[key] = [] > > Nit: these three lines can be written as "parsed_list.setdefault(key, filesystem.join(...)". That returns parsed_list[key] as well, so you can actually chain in the next line, but I don't know if that would be more or less readable in this case. Done.

Ryosuke Niwa

Comment 11 2011-12-02 13:42:37 PST

Committed r101845: <http://trac.webkit.org/changeset/101845>

Ojan Vafai

Comment 12 2011-12-02 13:57:29 PST

(In reply to comment #8) > (In reply to comment #7) > > I agree with Eric, this use-case does not justify the extra complexity. For example, now the dashboard, garden-o-matic, results.html, etc all need to be updated to handle multiple expected results files. I think, in the cases where the w3c test lists use multiple reference files, we should just grab the first one. > > I don't think we can do that unless we don't mind losing test coverage. It's only a moderate loss in test coverage. It's worth the simplification to the test harness. (In reply to comment #9) > (In reply to comment #7) > > I agree with Eric, this use-case does not justify the extra complexity. For example, now the dashboard, garden-o-matic, results.html, etc all need to be updated to handle multiple expected results files. I think, in the cases where the w3c test lists use multiple reference files, we should just grab the first one. > > Also note that they don't really "need" to be updated. Exiting tools should all work because there will still be exactly one actual result per test. It's just "nice" if they did show links for all reference files. But all these tolls also show you the expected result and now they'll need to be updated to show you multiple expected results and multiple diffs. It's all doable, but it's non-trivial and considerably complicates the codebase. It makes understanding test failures more complicated as well. Also, I'm not sure who will actually do the work to make it happen.

Ryosuke Niwa

Comment 13 2011-12-02 14:00:55 PST

(In reply to comment #12) > (In reply to comment #8) > > I don't think we can do that unless we don't mind losing test coverage. > > It's only a moderate loss in test coverage. It's worth the simplification to the test harness. There are many tests in CSS2.1 with multiple reference files. > (In reply to comment #9) > > (In reply to comment #7) > > > I agree with Eric, this use-case does not justify the extra complexity. For example, now the dashboard, garden-o-matic, results.html, etc all need to be updated to handle multiple expected results files. I think, in the cases where the w3c test lists use multiple reference files, we should just grab the first one. > > > > Also note that they don't really "need" to be updated. Exiting tools should all work because there will still be exactly one actual result per test. It's just "nice" if they did show links for all reference files. > > But all these tolls also show you the expected result and now they'll need to be updated to show you multiple expected results and multiple diffs. It's all doable, but it's non-trivial and considerably complicates the codebase. It makes understanding test failures more complicated as well. Also, I'm not sure who will actually do the work to make it happen. There is exactly one diff per test because we bail out as soon as we find one failure. (the order is stable). And the reference file that caused the failure (i.e. the first one that failed) will be reported as the reference_file in the json file.

Dirk Pranke

Comment 14 2011-12-02 14:42:04 PST

I think there's a significant value to supporting the w3c test suites fully and as close to as-is as possible. In particular, I think there's quite a bit of value in capturing the fact that tests can pass multiple different ways (which we can't today capture with IMAGE tests, only with text-only/script-based tests. The additional complexity of handling multiple results, at least in NRWT, is smaller than handling the reftest manifest, and in terms of things I'd remove from NRWT if I wanted to reduce the complexity, this isn't close to the top of the list. It is true, though, that supporting multiple references may complicate things ... do we care if a test used to match A and passes and now matches B and passes? Or used to match A when it shouldn't but now matches B when it shouldn't? Maybe we should have some way of indicating which reference of the allowed set we do expect to match? Also, what does it mean for garden-o-matic to handle reftests at all? It's not like you can meaningfully rebaseline them, unless you were to switch which match was expected.

Ryosuke Niwa

Comment 15 2011-12-02 14:58:59 PST

(In reply to comment #14) > I think there's a significant value to supporting the w3c test suites fully and as close to as-is as possible. In particular, I think there's quite a bit of value in capturing the fact that tests can pass multiple different ways (which we can't today capture with IMAGE tests, only with text-only/script-based tests. Completely agree. That's the whole rationale behind my patch. If reducing the complexity was so important, then I wouldn't even support reftest.list and just convert all tests to use naming convention. > The additional complexity of handling multiple results, at least in NRWT, is smaller than handling the reftest manifest, and in terms of things I'd remove from NRWT if I wanted to reduce the complexity, this isn't close to the top of the list. Which is even less than supporting the embedded link elements. Note embedded link elements can specify multiple reference files as well. > It is true, though, that supporting multiple references may complicate things ... do we care if a test used to match A and passes and now matches B and passes? Or used to match A when it shouldn't but now matches B when it shouldn't? Maybe we should have some way of indicating which reference of the allowed set we do expect to match? That's a good point. We probably need to revisit that once we have a large number of reftests imported and it becomes an issue for us. On the other hand, flexible test expectation is one of many benefits of reftests. > Also, what does it mean for garden-o-matic to handle reftests at all? It's not like you can meaningfully rebaseline them, unless you were to switch which match was expected. Probably updating test_expectations.txt?

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware Unspecified

OS Unspecified

Product WebKit

Component Tools / Tests

Assignee

Ryosuke Niwa

Reported

2011-12-01 19:51 PST

Modified

2011-12-02 14:58 PST History

CC List

6 users Show

URL

Keywords

Depends on

Blocks

66295

Dependencies

tree graph