Bug 134767 - Add support for automated W3C test baseline computation
Summary: Add support for automated W3C test baseline computation
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: youenn fablet
URL:
Keywords:
Depends on: 134766
Blocks:
  Show dependency treegraph
 
Reported: 2014-07-09 07:08 PDT by youenn fablet
Modified: 2020-07-24 09:58 PDT (History)
9 users (show)

See Also:


Attachments
Patch (13.19 KB, patch)
2014-07-09 07:13 PDT, youenn fablet
no flags Details | Formatted Diff | Diff
skipping all tests by default (15.12 KB, patch)
2014-07-16 11:43 PDT, youenn fablet
no flags Details | Formatted Diff | Diff
add support for merge of TestExpectations (19.20 KB, patch)
2014-09-16 03:16 PDT, youenn fablet
no flags Details | Formatted Diff | Diff
Improving test expectation merge (22.65 KB, patch)
2014-12-23 07:27 PST, youenn fablet
no flags Details | Formatted Diff | Diff
Patch (22.60 KB, patch)
2014-12-23 15:39 PST, youenn fablet
no flags Details | Formatted Diff | Diff
Adding CSS testsuite support (12.67 KB, patch)
2015-01-13 07:55 PST, youenn fablet
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description youenn fablet 2014-07-09 07:08:59 PDT
To ease the process of importing W3C tests as WebKit layout tests, it may be convenient to provide a tool that imports W3C tests, generates expected.txt files and related TestExpectation.
Comment 1 youenn fablet 2014-07-09 07:13:27 PDT
Created attachment 234636 [details]
Patch
Comment 2 youenn fablet 2014-07-16 11:43:52 PDT
Created attachment 235013 [details]
skipping all tests by default
Comment 3 youenn fablet 2014-09-16 03:16:00 PDT
Created attachment 238170 [details]
add support for merge of TestExpectations
Comment 4 youenn fablet 2014-12-23 07:27:56 PST
Created attachment 243676 [details]
Improving test expectation merge
Comment 5 youenn fablet 2014-12-23 15:39:13 PST
Created attachment 243704 [details]
Patch
Comment 6 Bem Jones-Bey 2015-01-05 14:08:56 PST
Comment on attachment 243704 [details]
Patch

I'm not convinced that we want to be automatically adding entries to TestExpectations for imported tests, especially without bugs. Also, often multiple imported tests will fail for the same reason, so it can be good to have a human look at them and intelligently add a single bug for these TestExpectations lines. That being said, I'm all for a tool that makes it easier to quickly see which imported tests have failed an in what way to make it easier to create the test expectations entries and relevant bugs.
Comment 7 Ryosuke Niwa 2015-01-05 14:55:11 PST
(In reply to comment #6)
> Comment on attachment 243704 [details]
> Patch
> 
> I'm not convinced that we want to be automatically adding entries to
> TestExpectations for imported tests, especially without bugs. Also, often
> multiple imported tests will fail for the same reason, so it can be good to
> have a human look at them and intelligently add a single bug for these
> TestExpectations lines. That being said, I'm all for a tool that makes it
> easier to quickly see which imported tests have failed an in what way to
> make it easier to create the test expectations entries and relevant bugs.

I agree. I think having a tool to help humans triage failures is good. Mindlessly adding test expectations for all failing tests is not.
Comment 8 youenn fablet 2015-01-06 01:40:14 PST
(In reply to comment #7)
> (In reply to comment #6)
> > Comment on attachment 243704 [details]
> > Patch
> > 
> > I'm not convinced that we want to be automatically adding entries to
> > TestExpectations for imported tests, especially without bugs. Also, often
> > multiple imported tests will fail for the same reason, so it can be good to
> > have a human look at them and intelligently add a single bug for these
> > TestExpectations lines. That being said, I'm all for a tool that makes it
> > easier to quickly see which imported tests have failed an in what way to
> > make it easier to create the test expectations entries and relevant bugs.
> 
> I agree. I think having a tool to help humans triage failures is good.
> Mindlessly adding test expectations for all failing tests is not.

Right, manual inspection of the results and augmentation of the generated test expectations is needed.
The tool to ease test results study may well be w3c_conformance_results.html from bug 134766.

Automatically generating WPT test expectations has some additional benefits, like ordering of the tests, clustering of the test suites...

One important use case is resyncing with the latest version of the WPT repo.
We should strive to make that effort minimal. This patch is an initial step towards that goal.
Comment 9 Bem Jones-Bey 2015-01-06 08:26:21 PST
(In reply to comment #8) 
> Right, manual inspection of the results and augmentation of the generated
> test expectations is needed.
> The tool to ease test results study may well be w3c_conformance_results.html
> from bug 134766.

From a quick look, that does seem like a useful tool.

> Automatically generating WPT test expectations has some additional benefits,
> like ordering of the tests, clustering of the test suites...

I think those are outweighed by the detriment that if no one is triaging or filing bugs, issues just won't get fixed. I'm also worried that a test that didn't fail before would start failing on import and the person running the import wouldn't notice.

> One important use case is resyncing with the latest version of the WPT repo.
> We should strive to make that effort minimal. This patch is an initial step
> towards that goal.

I don't think that updating the test expectations file is the hardest part of the import. Blink has a script that imports both the WPT tests and the CSSWG tests (based on the same import-w3c-tests infrastructure). I recently used it to import the CSS Shapes tests, and since it's all or nothing, it imported some new WPT tests as well. There were some new failures, and I don't think automatically adding expectations would have made the task any easier, since I would have had to still triage them and file bugs.

One thing to note: on the Blink side, they use a whitelist to determine what parts of the CSSWG and W3C test suites to import. Perhaps if we went that route, it wouldn't seem like we need so many expectations? We could simply not import parts of the suite for things that we don't implement yet, and also could import in parts so that each piece could be triaged separately. Dirk Pranke built a lot of that stuff over there, and may have other suggestions.
Comment 10 youenn fablet 2015-01-06 13:12:43 PST
(In reply to comment #9)
> (In reply to comment #8) 
> > Right, manual inspection of the results and augmentation of the generated
> > test expectations is needed.
> > The tool to ease test results study may well be w3c_conformance_results.html
> > from bug 134766.
> 
> From a quick look, that does seem like a useful tool.
> 
> > Automatically generating WPT test expectations has some additional benefits,
> > like ordering of the tests, clustering of the test suites...
> 
> I think those are outweighed by the detriment that if no one is triaging or
> filing bugs, issues just won't get fixed. I'm also worried that a test that
> didn't fail before would start failing on import and the person running the
> import wouldn't notice.

Ah, maybe the term 'automated' in the title is a bit misleading...

The purpose of this script is to help the authoring of test-import patches by developers. It is the responsibility of the developer to do the triage, augment test expectations accordingly and submit the patch. I guess the patch would be accepted if sufficient triage is done.

This patch handles the case of partially passing tests: test expectations are added so that there is an incentive to fix them but they are marked as PASS since they should comply with the provided -expected.txt file.

It also allows incremental import: (sub)folder after (sub)folder.

> > One important use case is resyncing with the latest version of the WPT repo.
> > We should strive to make that effort minimal. This patch is an initial step
> > towards that goal.
> 
> I don't think that updating the test expectations file is the hardest part
> of the import. Blink has a script that imports both the WPT tests and the
> CSSWG tests (based on the same import-w3c-tests infrastructure). I recently
> used it to import the CSS Shapes tests, and since it's all or nothing, it
> imported some new WPT tests as well. There were some new failures, and I
> don't think automatically adding expectations would have made the task any
> easier, since I would have had to still triage them and file bugs.

From what I saw a while ago, Blink is not importing a lot of WPT tests right now. Also Blink is not importing any expected file, which simplifies things a bit.

> 
> One thing to note: on the Blink side, they use a whitelist to determine what
> parts of the CSSWG and W3C test suites to import. Perhaps if we went that
> route, it wouldn't seem like we need so many expectations? We could simply
> not import parts of the suite for things that we don't implement yet, and
> also could import in parts so that each piece could be triaged separately.
> Dirk Pranke built a lot of that stuff over there, and may have other
> suggestions.

To limit adding a lot of test expectation lines, we may only import tests that have/had at least one passing assertion. Bug 134766 tool can be used to manage the other tests.
Comment 11 Dirk Pranke 2015-01-06 18:37:34 PST
> From what I saw a while ago, Blink is not importing a lot of WPT tests right now. 

This is correct.

> Also Blink is not importing any expected file, which simplifies things a bit.

I'm not sure what is meant by "any expected file" in this context?

Blink currently imports only reftests and script-only (js) tests, not tests that require pixel tests (manual review). script-only test may still produce partial failures, which can result in -expected.txt results.

Are you referring to pixel tests not being imported?
Comment 12 youenn fablet 2015-01-07 04:35:54 PST
(In reply to comment #11)
> > From what I saw a while ago, Blink is not importing a lot of WPT tests right now. 
> 
> This is correct.
> 
> > Also Blink is not importing any expected file, which simplifies things a bit.
> 
> I'm not sure what is meant by "any expected file" in this context?

I was meaning the -expected.txt results.

> 
> Blink currently imports only reftests and script-only (js) tests, not tests
> that require pixel tests (manual review). script-only test may still produce
> partial failures, which can result in -expected.txt results.

Oh, I was not aware of that.
That looks better this way.

When running the whole test suite, it takes a lot of time due to fully failing (probably timeouting) tests. The current patch is importing them but marking them as Skip. Another approach may be to not import them until they get fixed, either in WPT repo side or in WebKit side.
Comment 13 Bem Jones-Bey 2015-01-07 15:59:15 PST
(In reply to comment #10)
> (In reply to comment #9)
> > (In reply to comment #8) 
> > > Right, manual inspection of the results and augmentation of the generated
> > > test expectations is needed.
> > > The tool to ease test results study may well be w3c_conformance_results.html
> > > from bug 134766.
> > 
> > From a quick look, that does seem like a useful tool.
> > 
> > > Automatically generating WPT test expectations has some additional benefits,
> > > like ordering of the tests, clustering of the test suites...
> > 
> > I think those are outweighed by the detriment that if no one is triaging or
> > filing bugs, issues just won't get fixed. I'm also worried that a test that
> > didn't fail before would start failing on import and the person running the
> > import wouldn't notice.
> 
> Ah, maybe the term 'automated' in the title is a bit misleading...
> 
> The purpose of this script is to help the authoring of test-import patches
> by developers. It is the responsibility of the developer to do the triage,
> augment test expectations accordingly and submit the patch. I guess the
> patch would be accepted if sufficient triage is done.
> 
> This patch handles the case of partially passing tests: test expectations
> are added so that there is an incentive to fix them but they are marked as
> PASS since they should comply with the provided -expected.txt file.
> 
> It also allows incremental import: (sub)folder after (sub)folder.
> 

I like these features. I just don't think that it should edit the test expectations files, I think it should just give the user a report (which could be as simple as listing the lines that would need to be added to the TestExpectations) so that the user can then triage and add the appropriate lines to TestExpectations. It looks like there is a lot of code in this change just to manage intelligently editing the TestExpectations files, and I don't think that's worth the maintenance burden when the user is just going to have to edit them by hand anyways.

> When running the whole test suite, it takes a lot of time due to fully
> failing (probably timeouting) tests. The current patch is importing them but
> marking them as Skip. Another approach may be to not import them until they
> get fixed, either in WPT repo side or in WebKit side.

I think we should do like Blink is doing and import one part at at time, so that those tests can be fixed and made to work before going on to the next part of the suite. I don't think we get much value from having a large number of failing or skipped tests with no obvious owner to make them work.
Comment 14 youenn fablet 2015-01-08 04:46:00 PST
> I like these features. I just don't think that it should edit the test
> expectations files, I think it should just give the user a report (which
> could be as simple as listing the lines that would need to be added to the
> TestExpectations) so that the user can then triage and add the appropriate

This is the default behavior of this patch (generation of a TestExpectationsWPT file in WebKitBuild folder).

> lines to TestExpectations. It looks like there is a lot of code in this
> change just to manage intelligently editing the TestExpectations files, and
> I don't think that's worth the maintenance burden when the user is just
> going to have to edit them by hand anyways.

I think that kind of functionality will be useful in the future.
But we need hands-on experience before going further.

> > When running the whole test suite, it takes a lot of time due to fully
> > failing (probably timeouting) tests. The current patch is importing them but
> > marking them as Skip. Another approach may be to not import them until they
> > get fixed, either in WPT repo side or in WebKit side.
> 
> I think we should do like Blink is doing and import one part at at time, so
> that those tests can be fixed and made to work before going on to the next
> part of the suite. I don't think we get much value from having a large
> number of failing or skipped tests with no obvious owner to make them work.

Agreed, a separate whitelist file should be a better way to manage failing tests.
Comment 15 youenn fablet 2015-01-13 07:55:01 PST
Created attachment 244514 [details]
Adding CSS testsuite support
Comment 16 youenn fablet 2015-01-13 11:41:00 PST
(In reply to comment #15)
> Created attachment 244514 [details]
> Adding CSS testsuite support

As part of bug 134764, a whitelist is added to identify which test suites to add, similarly to blink.
Another list identifies the infrastructure files and folders that need to be copied and skipped by run-webkit-tests.
Comment 17 WebKit Commit Bot 2016-05-09 19:30:41 PDT
Attachment 244514 [details] did not pass style-queue:


ERROR: Tools/Scripts/webkitpy/w3c/test_updater.py:79:  [TestUpdater.test_paths] Class 'TestDownloader' has no 'CSS_NAME' member  [pylint/E1101] [5]
ERROR: Tools/Scripts/webkitpy/w3c/test_updater.py:79:  [TestUpdater.test_paths] Class 'TestDownloader' has no 'WPT_NAME' member  [pylint/E1101] [5]
Total errors found: 2 in 6 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 18 Ryosuke Niwa 2016-05-09 19:37:08 PDT
Comment on attachment 244514 [details]
Adding CSS testsuite support

View in context: https://bugs.webkit.org/attachment.cgi?id=244514&action=review

> Tools/ChangeLog:11
> +        update-w3c-tests is the script to update W3C imported test suites and their baselines.
> +        TestUpdater is the main class that does the work.
> +        It is similar in spirit to https://codereview.chromium.org/148173016 update-w3c-deps script in Blink.
> +        It uses TestImporter to import the tests and run-webkit-tests to generate the missing baselines.

Instead of adding a new script, we should just add an option to import-w3c-tests.
Comment 19 youenn fablet 2016-06-02 00:00:13 PDT
(In reply to comment #18)
> Comment on attachment 244514 [details]
> Adding CSS testsuite support
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=244514&action=review
> 
> > Tools/ChangeLog:11
> > +        update-w3c-tests is the script to update W3C imported test suites and their baselines.
> > +        TestUpdater is the main class that does the work.
> > +        It is similar in spirit to https://codereview.chromium.org/148173016 update-w3c-deps script in Blink.
> > +        It uses TestImporter to import the tests and run-webkit-tests to generate the missing baselines.
> 
> Instead of adding a new script, we should just add an option to
> import-w3c-tests.

I upgraded import-w3c-tests to support most of the functionality present in this patch.
What is missing from this patch is running tests through rwt to rebase the expectations and rerunning the tests to detect flaky tests.
I am not sure this would add much though.
Comment 20 Frédéric Wang (:fredw) 2018-06-25 08:48:14 PDT
(In reply to youenn fablet from comment #12)
> (In reply to comment #11)
> > > From what I saw a while ago, Blink is not importing a lot of WPT tests right now. 
> > 
> > This is correct.
> > 
> > > Also Blink is not importing any expected file, which simplifies things a bit.
> > 
> > I'm not sure what is meant by "any expected file" in this context?
> 
> I was meaning the -expected.txt results.
> 

I guess this is bug 82245