Bug 186359 - Add script to update web-platform-tests TestExpectations after import
Summary: Add script to update web-platform-tests TestExpectations after import
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Nobody
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2018-06-06 11:45 PDT by Brendan McLoughlin
Modified: 2023-12-18 06:34 PST (History)
5 users (show)

See Also:


Attachments
Patch (13.77 KB, patch)
2018-06-06 11:45 PDT, Brendan McLoughlin
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ews206 for win-future (12.79 MB, application/zip)
2018-06-06 13:53 PDT, EWS Watchlist
no flags Details
Patch (20.34 KB, patch)
2018-06-07 07:50 PDT, Brendan McLoughlin
no flags Details | Formatted Diff | Diff
Patch (16.89 KB, patch)
2018-06-07 07:50 PDT, Brendan McLoughlin
no flags Details | Formatted Diff | Diff
Patch (20.83 KB, patch)
2018-06-11 08:02 PDT, Brendan McLoughlin
no flags Details | Formatted Diff | Diff
Patch (21.04 KB, patch)
2018-06-11 14:17 PDT, Brendan McLoughlin
ews-watchlist: commit-queue-
Details | Formatted Diff | Diff
Archive of layout-test-results from ews201 for win-future (12.78 MB, application/zip)
2018-06-11 20:27 PDT, EWS Watchlist
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Brendan McLoughlin 2018-06-06 11:45:15 PDT
WIP Add script to update web-platform-tests expectations after import
Comment 1 Brendan McLoughlin 2018-06-06 11:45:38 PDT
Created attachment 342067 [details]
Patch
Comment 2 EWS Watchlist 2018-06-06 13:53:25 PDT
Comment on attachment 342067 [details]
Patch

Attachment 342067 [details] did not pass win-ews (win):
Output: http://webkit-queues.webkit.org/results/8036616

New failing tests:
http/tests/security/contentSecurityPolicy/video-with-https-url-allowed-by-csp-media-src-star.html
Comment 3 EWS Watchlist 2018-06-06 13:53:37 PDT
Created attachment 342081 [details]
Archive of layout-test-results from ews206 for win-future

The attached test failures were seen while running run-webkit-tests on the win-ews.
Bot: ews206  Port: win-future  Platform: CYGWIN_NT-6.1-2.9.0-0.318-5-3-x86_64-64bit
Comment 4 Brendan McLoughlin 2018-06-07 07:50:07 PDT
Created attachment 342161 [details]
Patch
Comment 5 Brendan McLoughlin 2018-06-07 07:50:48 PDT
Created attachment 342162 [details]
Patch
Comment 6 youenn fablet 2018-06-07 10:07:05 PDT
Comment on attachment 342162 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=342162&action=review

> Tools/ChangeLog:8
> +        * Scripts/update-w3c-tests: Added.

Could be named update-w3c-test-expectations

> Tools/Scripts/webkitpy/layout_tests/controllers/manager.py:429
> +        json_results_generator.write_json(self._filesystem, summarized_results, full_results_path_1)

This would overwrite "full-results.json" which might not be what we want.
Maybe for now, we should just load the current full-results.json file and remove ADD_Results prefix and suffix in the test_updater script with a FIXME comment.

> Tools/Scripts/webkitpy/w3c/test_updater.py:34
> + intended to be uses after runnning Tools/Scripts/import-w3c-tests to assist in

s/users/used

> Tools/Scripts/webkitpy/w3c/test_updater.py:105
> +class TestExpecationUpdater(object):

s/TestExpecationUpdater/TestExpectationUpdater/

> Tools/Scripts/webkitpy/w3c/test_updater.py:123
> +        tests = self._pre_process_tests(data['tests'])

Maybe _pre_process_tests should put failing tests in different buckets (timeout, crash, text).
Then, we would be able to update all 'Text' tests at once.
For logging, we could log all failing tests at once also, all crashing tests...

> Tools/Scripts/webkitpy/w3c/test_updater.py:218
> +        return p.wait()

I wonder whether we need to use Popen or whether could not directly run it directly since this is python.

> Tools/Scripts/webkitpy/w3c/test_updater.py:235
> +            'IMAGE': 'ImageOnlyFailure',

I wonder whether we have this kind of mapping already in our code base.
Comment 7 Brendan McLoughlin 2018-06-11 08:02:18 PDT
Created attachment 342430 [details]
Patch
Comment 8 youenn fablet 2018-06-11 13:15:05 PDT
Comment on attachment 342430 [details]
Patch

I think this is going in the right direction.
I think the most important part might be to update the script so that we can run unit test.
This might require adding some abstraction to how we run webkit test or get results from it as we would probably do a shim in unit test.

View in context: https://bugs.webkit.org/attachment.cgi?id=342430&action=review

> Tools/Scripts/webkitpy/layout_tests/controllers/manager.py:-460
> -            self._filesystem.remove(results_json_path)

Is this change actually needed?

> Tools/Scripts/webkitpy/w3c/test_updater.py:32
> + This run-webkit-tests and analyzes the results then it attempts to update the

s/This /This runs /

> Tools/Scripts/webkitpy/w3c/test_updater.py:33
> + -expected.txt or the root TestExpecations file for failing test. This script is

s/TestExpecations/TestExpectations/ here and elsewhere

> Tools/Scripts/webkitpy/w3c/test_updater.py:100
> +# Add documentation of argorithm for updating test expectations

algorithm.

> Tools/Scripts/webkitpy/w3c/test_updater.py:102
> +# add unittests

We need some unit tests before landing this script.
Maybe there could be a way to pass a run_webkit_test caller as part of the creation of TestExpectationUpdater.
Then, we could try to get the results of the updated files, such as TestExpectations.

> Tools/Scripts/webkitpy/w3c/test_updater.py:255
> +    def _failing_testharness_test(self, test):

Could be renamed to _update_failing_testharness_test_expectation to improve readability or more generic _handle_failing_testharness_test.
Ditto for above methods, having a verb helps since we are changing some files there.

> Tools/Scripts/webkitpy/w3c/test_updater.py:263
> +        args = ['Tools/Scripts/run-webkit-tests'] + self._option_args

I wonder whether we should try to call python directly if possible since run-webkit-tests is already python.
If this is not straightforward, please add a FIXME.

> Tools/Scripts/webkitpy/w3c/test_updater.py:319
> +            myfile.write('\n%s [ %s ]\n' % (test, expectation))

We might want to use filesystem.py here so that this works in unit tests.
Maybe open_text_file_for_writing would work with should_append equal to true.

I also wonder whether it would not be good to buffer all changes to the set expectation file.
That way, we can log it in the console (debug level maybe) and open/write the file once.

> Tools/Scripts/webkitpy/w3c/test_updater.py:323
> +            if os.path.isfile(path):

We usually use filesystem for that.

> Tools/Scripts/webkitpy/w3c/test_updater.py:324
> +                self._remove_test_expectation_from_path(path, test_name)

We could also check whether the folder is empty and if so remove it.

> Tools/Scripts/webkitpy/w3c/test_updater.py:336
> +        data = self._load_results()

Shouldn't we load the results just once and not everytime _extract_failing_test_result is called?

> Tools/Scripts/webkitpy/w3c/test_updater.py:337
> +        tests = self._pre_process_tests(data['tests'])

Ditto there, it might be better to keep the list of tests once.
Next line seems to suggest we could want to have a matching_tests member that would organize test information keyed by the name.

> Tools/Scripts/webkitpy/w3c/test_updater.py:355
> +            results['name'] = file_name

Maybe _flatten_path could do the update here so that _pre_process_tests will be simpler.
I am not sure _flatten_path is the right method name.
Maybe something like _compute_test_list would be better?

> Tools/Scripts/webkitpy/w3c/test_updater.py:359
> +

Can probably directly return [] here or we could try to merge it with the for-loop above if we want to keep the rest of _pre_process_tests?

> Tools/Scripts/webkitpy/w3c/test_updater.py:368
> +        if result['actual'] in result['expected']:

can just do return:
result['actual'] in result['expected']

> Tools/Scripts/webkitpy/w3c/test_updater.py:378
> +                pass

Why do we need pass here?
Comment 9 Brendan McLoughlin 2018-06-11 13:56:12 PDT
Comment on attachment 342430 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=342430&action=review

>> Tools/Scripts/webkitpy/layout_tests/controllers/manager.py:-460
>> -            self._filesystem.remove(results_json_path)
> 
> Is this change actually needed?

The update-w3c-test-expectations script reads the failing tests from the `full_results.json` file and uses them to figure out which test expectations to update. If this `full_results.json` get removed there is no way to get the this level of detail on why the tests failed.

>> Tools/Scripts/webkitpy/w3c/test_updater.py:102
>> +# add unittests
> 
> We need some unit tests before landing this script.
> Maybe there could be a way to pass a run_webkit_test caller as part of the creation of TestExpectationUpdater.
> Then, we could try to get the results of the updated files, such as TestExpectations.

Will do.

>> Tools/Scripts/webkitpy/w3c/test_updater.py:263
>> +        args = ['Tools/Scripts/run-webkit-tests'] + self._option_args
> 
> I wonder whether we should try to call python directly if possible since run-webkit-tests is already python.
> If this is not straightforward, please add a FIXME.

Oops. I already refactored this to call run-webkit-tests via python. I'll remove this reference to the executable.

>> Tools/Scripts/webkitpy/w3c/test_updater.py:319
>> +            myfile.write('\n%s [ %s ]\n' % (test, expectation))
> 
> We might want to use filesystem.py here so that this works in unit tests.
> Maybe open_text_file_for_writing would work with should_append equal to true.
> 
> I also wonder whether it would not be good to buffer all changes to the set expectation file.
> That way, we can log it in the console (debug level maybe) and open/write the file once.

I originally went down that path of buffering all the changes until then end. However it made it difficult to isolate flaky tests when re-running run-webkit-tests after updating the expectations to test the new expectations.

I tried that before grouping the expectations by failure type. I'll do another experiment to test it since the additional context of the original failure type buckets may make it easier to test identify the flaky tests.

>> Tools/Scripts/webkitpy/w3c/test_updater.py:337
>> +        tests = self._pre_process_tests(data['tests'])
> 
> Ditto there, it might be better to keep the list of tests once.
> Next line seems to suggest we could want to have a matching_tests member that would organize test information keyed by the name.

After updating an expectation the test most of the methods call `self._run_webkit_tests` and re-run the test to check if the expectation worked. This helps it catch flaky tests or testharness tests that always generate unique output. `_extract_failing_test_result` makes a fresh read from the updated `full_results.json` to read the new test status.

>> Tools/Scripts/webkitpy/w3c/test_updater.py:378
>> +                pass
> 
> Why do we need pass here?

No. I will remote that.
Comment 10 Brendan McLoughlin 2018-06-11 14:09:11 PDT
Comment on attachment 342430 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=342430&action=review

>> Tools/Scripts/webkitpy/w3c/test_updater.py:355
>> +            results['name'] = file_name
> 
> Maybe _flatten_path could do the update here so that _pre_process_tests will be simpler.
> I am not sure _flatten_path is the right method name.
> Maybe something like _compute_test_list would be better?

I looked into this change and I think I want to leave it as is. ` _flatten_path` converts a deeply nested dict that looks like this: 
{"imported":{"w3c":{"web-platform-tests":{"xhr":{"responsexml-document-properties.htm":{"expected":"PASS","actual":"TEXT"}}}}}}

into a shallower dict where the key is the full test path like this:
{"imported/w3c/web-platform-tests/responsexml-document-properties.htm":{"expected":"PASS","actual":"TEXT"}}

Do to how it recursively calls itself to build the outer key, trying to add the name to the inner dict makes it more complicated.
Comment 11 Brendan McLoughlin 2018-06-11 14:17:44 PDT
Created attachment 342464 [details]
Patch
Comment 12 Brendan McLoughlin 2018-06-11 14:18:50 PDT
I uploaded a patch to address some of the smaller changes. Working on adding unit tests now.
Comment 13 EWS Watchlist 2018-06-11 20:26:50 PDT
Comment on attachment 342464 [details]
Patch

Attachment 342464 [details] did not pass win-ews (win):
Output: http://webkit-queues.webkit.org/results/8141200

New failing tests:
storage/indexeddb/modern/leak-1.html
Comment 14 EWS Watchlist 2018-06-11 20:27:01 PDT
Created attachment 342509 [details]
Archive of layout-test-results from ews201 for win-future

The attached test failures were seen while running run-webkit-tests on the win-ews.
Bot: ews201  Port: win-future  Platform: CYGWIN_NT-6.1-2.9.0-0.318-5-3-x86_64-64bit
Comment 15 Alex Christensen 2021-11-01 12:20:15 PDT
Comment on attachment 342464 [details]
Patch

This has been requesting review for more than one year.  If this is still needed, please rebase and re-request review.
Comment 16 Sam Sneddon [:gsnedders] 2023-07-14 15:42:20 PDT
While I think at a high-level the goal here is reasonable, I don't think it makes sense to view as an item distinct to the existing update-test-expectations-from-bugzilla script (from the earlier bug 169538). We should be updating all the related baseline and expectations files as a single task, and not having separate scripts to update the baselines versus the TestExpectations files, as they're inherently two sides of the same coin.

It also seems like the current implementation probably doesn't follow the direction we'd really like to take; appending all new lines to the end of the TestExpectations files seems highly likely to lead to merge conflicts in the face of multiple WPT updates. It's probably also possible to reuse more of the existing code in webkitpy?
Comment 17 Radar WebKit Bug Importer 2023-08-04 06:53:38 PDT
<rdar://problem/113394740>
Comment 18 Sam Sneddon [:gsnedders] 2023-10-23 10:02:40 PDT
(In reply to Sam Sneddon [:gsnedders] from comment #16)
> While I think at a high-level the goal here is reasonable, I don't think it
> makes sense to view as an item distinct to the existing
> update-test-expectations-from-bugzilla script (from the earlier bug 169538).
> We should be updating all the related baseline and expectations files as a
> single task, and not having separate scripts to update the baselines versus
> the TestExpectations files, as they're inherently two sides of the same coin.

That said, some of the TestExpectations updates should likely be handled by import-w3c-tests (see bug 172471), as it already knows if files are deleted, and it can make a guess at renames.