https://bugs.webkit.org/show_bug.cgi?id=80709 (Convert regions parsing test to use testharness.js) added testharness.js files from the W3C and a sample test that uses this harness. It also added an -expected.txt file for the test, as this was the easiest way to integrate W3C testharness.js testing into WebKit. This means that when we import W3C test suites, we will have to create and maintain -expected.txt files for each testharness.js test. It would be better if our test harness were able to determine whether a testharness.js test passed or failed without an extra .txt file. The data is available at the end of the test - we're just creating a .txt file to compare against the report output. Perhaps there's something in the W3C test suite build system that we can use to identify testharness.js tests, as we do for reftest reference matching via the manifest?
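For illustration, something along these lines might be enough; this is just a sketch (the function name is made up, and it assumes testharnessreport.js dumps one status-prefixed line per subtest, e.g. "PASS foo" / "FAIL bar: message"):

    FAILURE_STATUSES = ('FAIL', 'TIMEOUT', 'NOTRUN')

    def testharness_output_passes(output_text):
        """Return True if the dumped testharness.js output contains only
        passing subtests, so no -expected.txt is needed to judge the test."""
        saw_subtest = False
        for line in output_text.splitlines():
            status = line.strip().split(' ', 1)[0]
            if status == 'PASS':
                saw_subtest = True
            elif status in FAILURE_STATUSES:
                return False
        # No subtest results at all (e.g. a harness error) should not count
        # as a pass.
        return saw_subtest

With a check like that, the runner would only need an -expected.txt when some subtest actually fails.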
What if our behavior is different from what the W3C expects? We use the -expected.txt files to record the behavior we expect from the engine so that we can detect when it changes.
We've imported lots of test suites from other places and have found the -expected.txt files quite useful to track our progress towards conformance and to record when we intentionally differ from the "correct" behavior according to the author of the test suite.
That's a fair point. I am mostly concerned with maintenance - if there is no discrepancy in behavior, it's extra work to generate and maintain that separate file. I'm assuming most of the cases where WebKit intentionally differs from the 'correct' behavior are actually problems with the W3C test suite. When that's the case we should be getting the W3C test suite fixed rather than maintaining the diff in our output. I find it a bit odd that it's useful to track progress towards conformance by having a set of tests "pass" against expected failures. Wouldn't it be better to have the actual expected result with the test skipped until it conforms?
> I find it a bit odd that it's useful to track progress towards conformance by having a set of tests "pass" against expected failures. Wouldn't it be better to have the actual expected result with the test skipped until it conforms?

Often a single test file contains multiple sub-tests, some that pass and some that fail. The -expected.txt mechanism lets us make sure that we continue to pass the subtests we currently pass rather than forcing us to skip entire groups of tests. Also, when someone makes a change that improves conformance, it's easy to review diffs to the -expected.txt files as part of the patch review process. If we skip the tests, then we later have to try unskipping the tests to see what all passes.

I would encourage you to try our current -expected.txt workflow for a while to see what you do and don't like about it. It seems to have been working fairly well for us for a while.
I agree with Adam here. Even though we could probably abolish -expected.txt files for tests that pass, generate expected files only for failing tests, and make pretty-patch and the review tool support that, I don't think it's worth the complexity. Having said that, maybe what you want is an import helper script that tells you which tests are passing based on the -expected.txt files.
If you think it's worth the work to maintain the -expected.txt files, then close this bug out. As far as scripts go, we should probably have an import helper script that runs the tests, generates the -expected.txt files from the results, and flags any results that contain failures. Then we should have a script that runs a particular W3C suite and generates an implementation report (http://wiki.csswg.org/test/implementation-report) based on the actual results (noting where the -expected.txt files are masking failures). Let's talk about this at the contributors' meeting.
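As a strawman for the first script, here is a rough sketch; everything in it is hypothetical (the function names, the directory layout, and the assumption that a test run leaves a <test>-actual.txt file per test in the results directory):

    import os
    import shutil

    def result_has_failures(text):
        # Coarse per-line check for the statuses testharness.js reports.
        return any(line.strip().split(' ', 1)[0] in ('FAIL', 'TIMEOUT', 'NOTRUN')
                   for line in text.splitlines())

    def promote_actual_results(results_dir, suite_dir):
        """Copy <test>-actual.txt files from a test run into the imported
        suite as <test>-expected.txt, and return the tests whose recorded
        results contain failures so they can be reviewed or reported
        upstream."""
        flagged = []
        for root, _dirs, files in os.walk(results_dir):
            for name in files:
                if not name.endswith('-actual.txt'):
                    continue
                actual_path = os.path.join(root, name)
                relative = os.path.relpath(actual_path, results_dir)
                expected_path = os.path.join(
                    suite_dir, relative.replace('-actual.txt', '-expected.txt'))
                if not os.path.isdir(os.path.dirname(expected_path)):
                    os.makedirs(os.path.dirname(expected_path))
                shutil.copyfile(actual_path, expected_path)
                with open(actual_path) as handle:
                    if result_has_failures(handle.read()):
                        flagged.append(relative)
        return flagged

The flagged list is also roughly what the second script would need in order to build the implementation report.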
I wonder what the plan is for this bug, but for the record, in Chromium you don't need an -expected.txt file when all the test cases pass.
We're not gonna do this. -expected.txt documents the current state of the world, and has been very useful with WPT tests.