186359 – Add script to update web-platform-tests TestExpectations after import

NEW 186359

Add script to update web-platform-tests TestExpectations after import

https://bugs.webkit.org/show_bug.cgi?id=186359

Summary Add script to update web-platform-tests TestExpectations after import

Brendan McLoughlin

Reported 2018-06-06 11:45:15 PDT

WIP Add script to update web-platform-tests expectations after import

Attachments
Patch (13.77 KB, patch) 2018-06-06 11:45 PDT, Brendan McLoughlin	no flags	Details Formatted Diff Diff
Archive of layout-test-results from ews206 for win-future (12.79 MB, application/zip) 2018-06-06 13:53 PDT, EWS Watchlist	no flags	Details
Patch (20.34 KB, patch) 2018-06-07 07:50 PDT, Brendan McLoughlin	no flags	Details Formatted Diff Diff
Patch (16.89 KB, patch) 2018-06-07 07:50 PDT, Brendan McLoughlin	no flags	Details Formatted Diff Diff
Patch (20.83 KB, patch) 2018-06-11 08:02 PDT, Brendan McLoughlin	no flags	Details Formatted Diff Diff
Patch (21.04 KB, patch) 2018-06-11 14:17 PDT, Brendan McLoughlin	ews-watchlist: commit-queue-	Details Formatted Diff Diff
Archive of layout-test-results from ews201 for win-future (12.78 MB, application/zip) 2018-06-11 20:27 PDT, EWS Watchlist	no flags	Details
Show Obsolete (4) View All Add attachment proposed patch, testcase, etc.

Brendan McLoughlin

Comment 1 2018-06-06 11:45:38 PDT

Created attachment 342067 [details] Patch

EWS Watchlist

Comment 2 2018-06-06 13:53:25 PDT

Comment on attachment 342067 [details] Patch Attachment 342067 [details] did not pass win-ews (win): Output: http://webkit-queues.webkit.org/results/8036616 New failing tests: http/tests/security/contentSecurityPolicy/video-with-https-url-allowed-by-csp-media-src-star.html

EWS Watchlist

Comment 3 2018-06-06 13:53:37 PDT

Created attachment 342081 [details] Archive of layout-test-results from ews206 for win-future The attached test failures were seen while running run-webkit-tests on the win-ews. Bot: ews206 Port: win-future Platform: CYGWIN_NT-6.1-2.9.0-0.318-5-3-x86_64-64bit

Brendan McLoughlin

Comment 4 2018-06-07 07:50:07 PDT

Created attachment 342161 [details] Patch

Brendan McLoughlin

Comment 5 2018-06-07 07:50:48 PDT

Created attachment 342162 [details] Patch

youenn fablet

Comment 6 2018-06-07 10:07:05 PDT

Comment on attachment 342162 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=342162&action=review > Tools/ChangeLog:8 > + * Scripts/update-w3c-tests: Added. Could be named update-w3c-test-expectations > Tools/Scripts/webkitpy/layout_tests/controllers/manager.py:429 > + json_results_generator.write_json(self._filesystem, summarized_results, full_results_path_1) This would overwrite "full-results.json" which might not be what we want. Maybe for now, we should just load the current full-results.json file and remove ADD_Results prefix and suffix in the test_updater script with a FIXME comment. > Tools/Scripts/webkitpy/w3c/test_updater.py:34 > + intended to be uses after runnning Tools/Scripts/import-w3c-tests to assist in s/users/used > Tools/Scripts/webkitpy/w3c/test_updater.py:105 > +class TestExpecationUpdater(object): s/TestExpecationUpdater/TestExpectationUpdater/ > Tools/Scripts/webkitpy/w3c/test_updater.py:123 > + tests = self._pre_process_tests(data['tests']) Maybe _pre_process_tests should put failing tests in different buckets (timeout, crash, text). Then, we would be able to update all 'Text' tests at once. For logging, we could log all failing tests at once also, all crashing tests... > Tools/Scripts/webkitpy/w3c/test_updater.py:218 > + return p.wait() I wonder whether we need to use Popen or whether could not directly run it directly since this is python. > Tools/Scripts/webkitpy/w3c/test_updater.py:235 > + 'IMAGE': 'ImageOnlyFailure', I wonder whether we have this kind of mapping already in our code base.

Brendan McLoughlin

Comment 7 2018-06-11 08:02:18 PDT

Created attachment 342430 [details] Patch

youenn fablet

Comment 8 2018-06-11 13:15:05 PDT

Comment on attachment 342430 [details] Patch I think this is going in the right direction. I think the most important part might be to update the script so that we can run unit test. This might require adding some abstraction to how we run webkit test or get results from it as we would probably do a shim in unit test. View in context: https://bugs.webkit.org/attachment.cgi?id=342430&action=review > Tools/Scripts/webkitpy/layout_tests/controllers/manager.py:-460 > - self._filesystem.remove(results_json_path) Is this change actually needed? > Tools/Scripts/webkitpy/w3c/test_updater.py:32 > + This run-webkit-tests and analyzes the results then it attempts to update the s/This /This runs / > Tools/Scripts/webkitpy/w3c/test_updater.py:33 > + -expected.txt or the root TestExpecations file for failing test. This script is s/TestExpecations/TestExpectations/ here and elsewhere > Tools/Scripts/webkitpy/w3c/test_updater.py:100 > +# Add documentation of argorithm for updating test expectations algorithm. > Tools/Scripts/webkitpy/w3c/test_updater.py:102 > +# add unittests We need some unit tests before landing this script. Maybe there could be a way to pass a run_webkit_test caller as part of the creation of TestExpectationUpdater. Then, we could try to get the results of the updated files, such as TestExpectations. > Tools/Scripts/webkitpy/w3c/test_updater.py:255 > + def _failing_testharness_test(self, test): Could be renamed to _update_failing_testharness_test_expectation to improve readability or more generic _handle_failing_testharness_test. Ditto for above methods, having a verb helps since we are changing some files there. > Tools/Scripts/webkitpy/w3c/test_updater.py:263 > + args = ['Tools/Scripts/run-webkit-tests'] + self._option_args I wonder whether we should try to call python directly if possible since run-webkit-tests is already python. If this is not straightforward, please add a FIXME. > Tools/Scripts/webkitpy/w3c/test_updater.py:319 > + myfile.write('\n%s [ %s ]\n' % (test, expectation)) We might want to use filesystem.py here so that this works in unit tests. Maybe open_text_file_for_writing would work with should_append equal to true. I also wonder whether it would not be good to buffer all changes to the set expectation file. That way, we can log it in the console (debug level maybe) and open/write the file once. > Tools/Scripts/webkitpy/w3c/test_updater.py:323 > + if os.path.isfile(path): We usually use filesystem for that. > Tools/Scripts/webkitpy/w3c/test_updater.py:324 > + self._remove_test_expectation_from_path(path, test_name) We could also check whether the folder is empty and if so remove it. > Tools/Scripts/webkitpy/w3c/test_updater.py:336 > + data = self._load_results() Shouldn't we load the results just once and not everytime _extract_failing_test_result is called? > Tools/Scripts/webkitpy/w3c/test_updater.py:337 > + tests = self._pre_process_tests(data['tests']) Ditto there, it might be better to keep the list of tests once. Next line seems to suggest we could want to have a matching_tests member that would organize test information keyed by the name. > Tools/Scripts/webkitpy/w3c/test_updater.py:355 > + results['name'] = file_name Maybe _flatten_path could do the update here so that _pre_process_tests will be simpler. I am not sure _flatten_path is the right method name. Maybe something like _compute_test_list would be better? > Tools/Scripts/webkitpy/w3c/test_updater.py:359 > + Can probably directly return [] here or we could try to merge it with the for-loop above if we want to keep the rest of _pre_process_tests? > Tools/Scripts/webkitpy/w3c/test_updater.py:368 > + if result['actual'] in result['expected']: can just do return: result['actual'] in result['expected'] > Tools/Scripts/webkitpy/w3c/test_updater.py:378 > + pass Why do we need pass here?

Brendan McLoughlin

Comment 9 2018-06-11 13:56:12 PDT

Comment on attachment 342430 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=342430&action=review >> Tools/Scripts/webkitpy/layout_tests/controllers/manager.py:-460 >> - self._filesystem.remove(results_json_path) > > Is this change actually needed? The update-w3c-test-expectations script reads the failing tests from the `full_results.json` file and uses them to figure out which test expectations to update. If this `full_results.json` get removed there is no way to get the this level of detail on why the tests failed. >> Tools/Scripts/webkitpy/w3c/test_updater.py:102 >> +# add unittests > > We need some unit tests before landing this script. > Maybe there could be a way to pass a run_webkit_test caller as part of the creation of TestExpectationUpdater. > Then, we could try to get the results of the updated files, such as TestExpectations. Will do. >> Tools/Scripts/webkitpy/w3c/test_updater.py:263 >> + args = ['Tools/Scripts/run-webkit-tests'] + self._option_args > > I wonder whether we should try to call python directly if possible since run-webkit-tests is already python. > If this is not straightforward, please add a FIXME. Oops. I already refactored this to call run-webkit-tests via python. I'll remove this reference to the executable. >> Tools/Scripts/webkitpy/w3c/test_updater.py:319 >> + myfile.write('\n%s [ %s ]\n' % (test, expectation)) > > We might want to use filesystem.py here so that this works in unit tests. > Maybe open_text_file_for_writing would work with should_append equal to true. > > I also wonder whether it would not be good to buffer all changes to the set expectation file. > That way, we can log it in the console (debug level maybe) and open/write the file once. I originally went down that path of buffering all the changes until then end. However it made it difficult to isolate flaky tests when re-running run-webkit-tests after updating the expectations to test the new expectations. I tried that before grouping the expectations by failure type. I'll do another experiment to test it since the additional context of the original failure type buckets may make it easier to test identify the flaky tests. >> Tools/Scripts/webkitpy/w3c/test_updater.py:337 >> + tests = self._pre_process_tests(data['tests']) > > Ditto there, it might be better to keep the list of tests once. > Next line seems to suggest we could want to have a matching_tests member that would organize test information keyed by the name. After updating an expectation the test most of the methods call `self._run_webkit_tests` and re-run the test to check if the expectation worked. This helps it catch flaky tests or testharness tests that always generate unique output. `_extract_failing_test_result` makes a fresh read from the updated `full_results.json` to read the new test status. >> Tools/Scripts/webkitpy/w3c/test_updater.py:378 >> + pass > > Why do we need pass here? No. I will remote that.

Brendan McLoughlin

Comment 10 2018-06-11 14:09:11 PDT

Comment on attachment 342430 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=342430&action=review >> Tools/Scripts/webkitpy/w3c/test_updater.py:355 >> + results['name'] = file_name > > Maybe _flatten_path could do the update here so that _pre_process_tests will be simpler. > I am not sure _flatten_path is the right method name. > Maybe something like _compute_test_list would be better? I looked into this change and I think I want to leave it as is. ` _flatten_path` converts a deeply nested dict that looks like this: {"imported":{"w3c":{"web-platform-tests":{"xhr":{"responsexml-document-properties.htm":{"expected":"PASS","actual":"TEXT"}}}}}} into a shallower dict where the key is the full test path like this: {"imported/w3c/web-platform-tests/responsexml-document-properties.htm":{"expected":"PASS","actual":"TEXT"}} Do to how it recursively calls itself to build the outer key, trying to add the name to the inner dict makes it more complicated.

Brendan McLoughlin

Comment 11 2018-06-11 14:17:44 PDT

Created attachment 342464 [details] Patch

Brendan McLoughlin

Comment 12 2018-06-11 14:18:50 PDT

I uploaded a patch to address some of the smaller changes. Working on adding unit tests now.

EWS Watchlist

Comment 13 2018-06-11 20:26:50 PDT

Comment on attachment 342464 [details] Patch Attachment 342464 [details] did not pass win-ews (win): Output: http://webkit-queues.webkit.org/results/8141200 New failing tests: storage/indexeddb/modern/leak-1.html

EWS Watchlist

Comment 14 2018-06-11 20:27:01 PDT

Created attachment 342509 [details] Archive of layout-test-results from ews201 for win-future The attached test failures were seen while running run-webkit-tests on the win-ews. Bot: ews201 Port: win-future Platform: CYGWIN_NT-6.1-2.9.0-0.318-5-3-x86_64-64bit

Alex Christensen

Comment 15 2021-11-01 12:20:15 PDT

Comment on attachment 342464 [details] Patch This has been requesting review for more than one year. If this is still needed, please rebase and re-request review.

Sam Sneddon [:gsnedders]

Comment 16 2023-07-14 15:42:20 PDT

While I think at a high-level the goal here is reasonable, I don't think it makes sense to view as an item distinct to the existing update-test-expectations-from-bugzilla script (from the earlier bug 169538). We should be updating all the related baseline and expectations files as a single task, and not having separate scripts to update the baselines versus the TestExpectations files, as they're inherently two sides of the same coin. It also seems like the current implementation probably doesn't follow the direction we'd really like to take; appending all new lines to the end of the TestExpectations files seems highly likely to lead to merge conflicts in the face of multiple WPT updates. It's probably also possible to reuse more of the existing code in webkitpy?

Radar WebKit Bug Importer

Comment 17 2023-08-04 06:53:38 PDT

<rdar://problem/113394740>

Sam Sneddon [:gsnedders]

Comment 18 2023-10-23 10:02:40 PDT

(In reply to Sam Sneddon [:gsnedders] from comment #16) > While I think at a high-level the goal here is reasonable, I don't think it > makes sense to view as an item distinct to the existing > update-test-expectations-from-bugzilla script (from the earlier bug 169538). > We should be updating all the related baseline and expectations files as a > single task, and not having separate scripts to update the baselines versus > the TestExpectations files, as they're inherently two sides of the same coin. That said, some of the TestExpectations updates should likely be handled by import-w3c-tests (see bug 172471), as it already knows if files are deleted, and it can make a guess at renames.

Note You need to log in before you can comment on or make changes to this bug.

Status NEW

Resolution

Priority P2

Severity Normal

Classification Unclassified

Version WebKit Nightly Build

Hardware Unspecified

OS Unspecified

Product WebKit

Component New Bugs

Assignee

Nobody

Reported

2018-06-06 11:45 PDT

Modified

2023-12-18 06:34 PST History

CC List

5 users Show

URL

Keywords InRadar

Depends on

Blocks