I have been checking, and many (~328) of the expected results in imported/w3c contain FAIL strings. I understand this means that these tests are actually failing.
$ find LayoutTests/imported/w3c -type f -name '*-expected.txt'|xargs grep -l FAIL|wc -l
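For illustration, the same pipeline can be reproduced on a toy directory tree (the paths and subtest text below are made up, not real WebKit results):

```shell
# Build a toy imported/w3c tree with one failing and one passing baseline.
mkdir -p demo/imported/w3c/a demo/imported/w3c/b
printf 'PASS subtest 1\nFAIL subtest 2 assert_equals: expected "NO" but got "0"\n' \
    > demo/imported/w3c/a/x-expected.txt
printf 'PASS subtest 1\nPASS subtest 2\n' \
    > demo/imported/w3c/b/y-expected.txt
# Count expected files that record at least one failing subtest (here: 1).
find demo/imported/w3c -type f -name '*-expected.txt' | xargs grep -l FAIL | wc -l
rm -rf demo
```

Note that `grep -l` counts files, not FAIL lines, so a baseline with many failing subtests still counts once.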
Wouldn't it have been better to mark these tests as failing in the TestExpectations file instead of committing a -expected.txt file with FAIL results?
On the GTK+ port some of these tests PASS, but the tooling thinks they are failing because the actual result differs from the committed expected file.
Check this example:
Running imported/w3c/web-platform-tests/XMLHttpRequest/send-entity-body-empty.htm on the GTK port I get:
Regressions: Unexpected text-only failures (1)
imported/w3c/web-platform-tests/XMLHttpRequest/send-entity-body-empty.htm [ Failure ]
But then, checking the diff result I see:
@@ -1,5 +1,5 @@
PASS XMLHttpRequest: send("") - empty entity body (POST)
PASS XMLHttpRequest: send("") - empty entity body (PUT)
-FAIL XMLHttpRequest: send("") - empty entity body (HEAD) assert_equals: expected "NO" but got "0"
+PASS XMLHttpRequest: send("") - empty entity body (HEAD)
Which means that the test is passing on the GTK+ port, but not on the port where the expected result was generated (Mac?).
I think the right thing to do is to mark these tests as failing in the corresponding TestExpectations files, rather than committing the output of a failing test as the expected result.
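The proposal above would look roughly like this in TestExpectations syntax (the bug number is a hypothetical placeholder, not a real entry):

```
# In LayoutTests/TestExpectations (generic):
webkit.org/b/NNNNN imported/w3c/web-platform-tests/XMLHttpRequest/send-entity-body-empty.htm [ Failure ]

# In LayoutTests/platform/gtk/TestExpectations, overriding for a port where it passes:
imported/w3c/web-platform-tests/XMLHttpRequest/send-entity-body-empty.htm [ Pass ]
```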
Committed r204644: <http://trac.webkit.org/changeset/204644>
(In reply to comment #1)
> Committed r204644: <http://trac.webkit.org/changeset/204644>
I didn't want to close this, the tool automatically did it. Sorry.
In the commit above I updated just 3 of these tests, and marked them as failing in the general expectations file.
These 3 tests are passing on the GTK+ port, so I marked them as passing in the GTK+ TestExpectations file.
With the landed change, all platforms other than GTK will no longer know if these tests regress further and fail more subtests. I think this is detrimental overall.
Surely we don't want to ever have FAIL in our expected results; that makes it extremely difficult to track which tests are broken....
The W3C tests have two purposes:
- conformance: we want to progressively reduce the number of FAILs
- regression: whether they FAIL or PASS, these tests reduce the risk of changing some behavior without knowing it. Marking a test expectation as failing is counterproductive here.
Note also that these tests generally contain more than one subtest, some of them passing and others failing.
It might be better to have port-specific -expected files when ports produce different results.
When refreshing W3C tests, I use the results of the iOS bots to update their specific -expected.txt files. It would be great if the GTK/EFL bots could also give me those results.
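For reference, run-webkit-tests consults port-specific baselines before the generic one, so (assuming the standard search order) a GTK-specific result for the example above would live at:

```
LayoutTests/platform/gtk/imported/w3c/web-platform-tests/XMLHttpRequest/send-entity-body-empty-expected.txt
```

With that file in place, the generic -expected.txt can keep the Mac result while GTK checks against its own passing output.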
> Surely we don't want to ever have FAIL in our expected results;
We actually do, for the reasons outlined in the comments above. Marking a test as Failure is quite detrimental.
Sometimes when dealing with old official test suites (like DOM2), a failure result on a subtest is the correct one, due to a change in a newer spec version, so we don't even want to "fix" it. Yet we do want to run the test, to make sure that we don't start crashing, that we don't accidentally revert to old behavior, and of course to verify the other subtests in the test.
> that makes it extremely difficult to track which tests are broken....
The correct way to track bugs is in Bugzilla. I do not think that marking every partially failing test as Failure would work for tracking efforts - there is no priority, no discussion, no CC list etc.
> Sometimes when dealing with old official test suites (like DOM2), a failure
> result on a subtest is the correct one, due to a change in newer spec
> version, so we don't even want to "fix" it. Yet we do want to run the test,
> to make sure that we don't start crashing, or that we don't accidentally
> revert to old behavior, and of course to verify other subtests in the test.
I guess that in such a case, it is OK to change the test, since we are no longer tracking it from a remote source. In an ideal world, changing the test so that FAIL disappears from the expectation files would be nice.
But this is probably a low priority.
I created bug 161036 to create Mac specific expectations for those tests.
It seems ok to me to have the "best" expectation files as the generic baseline.