Bug 105860 - Tests with WontFix expectation are (indirectly) skipped
Summary: Tests with WontFix expectation are (indirectly) skipped
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Zan Dobersek
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-12-29 10:51 PST by Zan Dobersek
Modified: 2013-01-15 14:04 PST (History)
6 users (show)

See Also:


Attachments
Patch (4.57 KB, patch)
2013-01-15 13:55 PST, Zan Dobersek
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Zan Dobersek 2012-12-29 10:51:32 PST
Any test with a WontFix expectation is not run (i.e. skipped).

I find this rather inconvenient as I'd like to test as many tests as possible, even if a specific test is expected to fail. My main reason behind this is to have as much test coverage as possible just so any crash occurrences do not go unnoticed.

I'd like to propose at least a compromise of noting the expected failure along with the WontFix modifier, for example:

fast/harness/sample-fail-mismatch-reftest.html [ WontFix ImageOnlyFailure ]
Comment 1 Eric Seidel (no email) 2012-12-29 10:56:45 PST
I believe the extra crash-coverage we get from WONTFIX tests is so miniscule compared to the (also small) cost of maintaining any expectations for them, that it's not worth running them.

NRWT has various options to allow running skipped tests.  If you're interested in crash-coverage from skipped tests, I would suggest setting up a separate bot with one of those options.
Comment 2 Eric Seidel (no email) 2012-12-29 11:02:12 PST
The Chromium port gets a large amount of crash-coverage from the huge fuzzing system which Inferno runs.  Each of those crashers is turned into a layout test and committed to svn.webkit.org for other ports to run.  I suspect that this "ClusterFuzz" provides infinitely more value than any ancient WONTFIX/Skipped, etc. layout test could.  IMO we should probably start culling old/broken layout tests which no one runs.  (We could also continue to ignore them in the noise, since I'm sure they're a tiny fraction of our 30k+ layout tests.)
Comment 3 Dirk Pranke 2012-12-29 14:44:26 PST
Note that NRWT actually used to do what you wanted (i.e., WontFix did not imply Skip). There was a fair amount of debate over the years as to whether or not running the WontFix-but-not-Skip tests was at all useful for catching the sort of crashes you mention. The closest anyone came was the belief that it did catch a few things when the port was very new (i.e., we were still bringing it up and crashes were not uncommon).

Eventually we mostly agreed w/ Eric; the cost of running tests we didn't care about seemed excessive, especially given that we had so much other coverage, and so we changed what WontFix meant (IIRC, Apple also wanted WontFix to imply Skip, although they probably would've been happy w/ just Skip).

But, not every port is Chromium - I could see an argument that other, especially newer, ports might want or need better coverage from the suite. I don't think we're likely to change back to the old behavior generally, but I could possibly be convinced to make this a port-specific configurable parameter whether things are skipped or not. Even then, this might be a bad idea since different ports would work differently.
Comment 4 Zan Dobersek 2012-12-31 13:30:26 PST
(In reply to comment #1)
> I believe the extra crash-coverage we get from WONTFIX tests is so miniscule compared to the (also small) cost of maintaining any expectations for them, that it's not worth running them.

While I pointed out only the crash coverage in comment #0, there are other benefits. The fast/harness/sample-fail-mismatch-reftest.html[1][2] layout test tests that an image-only failure occurs when the reftest and its expected mismatch baseline provide equal output. This test is marked as WontFix on both Chromium and GTK (and probably other ports as well) but isn't run even though it's integral to checking the correct harness behavior on which plenty of tests rely. I remember one occurrence of problems in harness that were indicated by this test unexpectedly passing (but it wasn't reported due to an otherwise unrelated bug), so not running this test can backfire.

Also, the Chromium port currently marks over 6700 tests as WontFix. That's ~20% of all the tests the port collects. IMO that's way too much to ignore and not run.

> NRWT has various options to allow running skipped tests.  If you're interested in crash-coverage from skipped tests, I would suggest setting up a separate bot with one of those options.

No such resources are available, unfortunately. I also think additional systems could be used for better purpose, such as perfbots.


(In reply to comment #2)
> The Chromium port gets a large amount of crash-coverage from the huge fuzzing system which Inferno runs.  Each of those crashers is turned into a layout test and committed to svn.webkit.org for other ports to run.  I suspect that this "ClusterFuzz" provides infinitely more value than any ancient WONTFIX/Skipped, etc. layout test could.  IMO we should probably start culling old/broken layout tests which no one runs.  (We could also continue to ignore them in the noise, since I'm sure they're a tiny fraction of our 30k+ layout tests.)

Reading about ClusterFuzz on the Chromium blog[3], I understand (could be reading it incorrectly, though) that only exploitable crashes are analyzed and upstreamed. I appreciate the effort put into this system and process, but this in no way tests neither the WebKit1 layer of the GTK (or any other non-Chromium) port nor the WebKit2 layer.

[1] http://trac.webkit.org/browser/trunk/LayoutTests/fast/harness/sample-fail-mismatch-reftest.html
[2] http://trac.webkit.org/browser/trunk/LayoutTests/fast/harness/sample-fail-mismatch-reftest-expected-mismatch.html
[3] http://blog.chromium.org/2012/04/fuzzing-for-security.html
Comment 5 Zan Dobersek 2012-12-31 13:45:20 PST
(In reply to comment #3)
> Note that NRWT actually used to do what you wanted (i.e., WontFix did not imply Skip). There was a fair amount of debate over the years as to whether or not running the WontFix-but-not-Skip tests was at all useful for catching the sort of crashes you mention. The closest anyone came was the belief that it did catch a few things when the port was very new (i.e., we were still bringing it up and crashes were not uncommon).
> 
> Eventually we mostly agreed w/ Eric; the cost of running tests we didn't care about seemed excessive, especially given that we had so much other coverage, and so we changed what WontFix meant (IIRC, Apple also wanted WontFix to imply Skip, although they probably would've been happy w/ just Skip).

As said in comment #4, fast/harness/sample-fail-mismatch-reftest.html is an example of a layout test that should definitely be run but is not really fixable either. OTOH, it is probably the only test of such nature, but I urge every port to run it.

And again, the Chromium port marks ~20% of collected layout tests as WontFix. That's a good chunk of coverage out of the window right at the base of the complete Chromium project structure.

> But, not every port is Chromium - I could see an argument that other, especially newer, ports might want or need better coverage from the suite. I don't think we're likely to change back to the old behavior generally, but I could possibly be convinced to make this a port-specific configurable parameter whether things are skipped or not. Even then, this might be a bad idea since different ports would work differently.

I'd very like to see at least a compromise of this in the form of an opt-in configuration to run tests marked as WontFix.

I'd still prefer a possibility of an expanded modifier set, where the WontFix modifier would only indicate the non-fixable nature of the test while the extra modifier would indicate the actual expected failure. Using only the WontFix modifier could still just skip the test, as is done now.
Comment 6 Dirk Pranke 2013-01-01 09:19:32 PST
(In reply to comment #4)
> (In reply to comment #1)
> > I believe the extra crash-coverage we get from WONTFIX tests is so miniscule compared to the (also small) cost of maintaining any expectations for them, that it's not worth running them.
> 
> While I pointed out only the crash coverage in comment #0, there are other benefits. The fast/harness/sample-fail-mismatch-reftest.html[1][2] layout test tests that an image-only failure occurs when the reftest and its expected mismatch baseline provide equal output. This test is marked as WontFix on both Chromium and GTK (and probably other ports as well) but isn't run even though it's integral to checking the correct harness behavior on which plenty of tests rely. I remember one occurrence of problems in harness that were indicated by this test unexpectedly passing (but it wasn't reported due to an otherwise unrelated bug), so not running this test can backfire.
> 

That's a good counterexample.

> Also, the Chromium port currently marks over 6700 tests as WontFix. That's ~20% of all the tests the port collects. IMO that's way too much to ignore and not run.
> 

IIRC, a large chunk of these are the sputnik tests, which we don't run because v8 runs them as part of their own test suite. Otherwise most of the tests we mark WontFix are for features that are either port-specific or that we have no intention of implementing.

(In reply to comment #5)
> 
> I'd very like to see at least a compromise of this in the form of an opt-in configuration to run tests marked as WontFix.
> 
> I'd still prefer a possibility of an expanded modifier set, where the WontFix modifier would only indicate the non-fixable nature of the test while the extra modifier would indicate the actual expected failure. Using only the WontFix modifier could still just skip the test, as is done now.

That's an interesting suggestion; it's actually basically compatible with the existing syntax (since no modifiers implies Skip). I will take a look at that and if there's no real complications that that introduces, that seems like a reasonable thing to do.
Comment 7 Zan Dobersek 2013-01-01 09:41:21 PST
(In reply to comment #6)
> (In reply to comment #4)
> > Also, the Chromium port currently marks over 6700 tests as WontFix. That's ~20% of all the tests the port collects. IMO that's way too much to ignore and not run.
> > 
> 
> IIRC, a large chunk of these are the sputnik tests, which we don't run because v8 runs them as part of their own test suite. Otherwise most of the tests we mark WontFix are for features that are either port-specific or that we have no intention of implementing.

Yeah, the sputnik tests make up for a large part of that enormous list of WontFix tests on Chromium port. I wasn't aware of that. Marking them as WontFix makes sense for the Chromium port, as does the comment in TestExpectations of moving them to the JSC test suite.
Comment 8 Zan Dobersek 2013-01-15 13:55:01 PST
Created attachment 182843 [details]
Patch
Comment 9 Dirk Pranke 2013-01-15 13:59:41 PST
Comment on attachment 182843 [details]
Patch

Looks great, thanks for working on this!
Comment 10 Zan Dobersek 2013-01-15 14:04:29 PST
Comment on attachment 182843 [details]
Patch

Clearing flags on attachment: 182843

Committed r139786: <http://trac.webkit.org/changeset/139786>
Comment 11 Zan Dobersek 2013-01-15 14:04:34 PST
All reviewed patches have been landed.  Closing bug.