Please see a master bug for the details: https://bugs.webkit.org/show_bug.cgi?id=36065 This is an initial effort to add support for reftests to new-run-webkit-tests.
Created attachment 84209 [details] in-progress. not-ready-for-review
Created attachment 85028 [details] add-reftests
I guess there are a lot of things remained until we can announce that 'Reftests is now available'. Updating documents, wiki, tweaking (old)-webkit-run-tests and so on. But as for new-run-webkit-tests, I think this patch is all we have to do. Could you review the patch? I'll split this patch into smaller patches if you prefer. But most changes are tightly related, so I put all into this patch at first. This patch should not affect existing layout tests at all unless you add 'reftest' in LayoutTests directory. So this patch should be safe to land.
Hi Dirk, Ojan, Could you review this? I am aiming to add reftest supports in this quarter. So I appreciate the reviews.
FYI I started to write a Wiki page about how to write reftests: https://trac.webkit.org/wiki/Writing%20Reftests
Comment on attachment 85028 [details] add-reftests View in context: https://bugs.webkit.org/attachment.cgi?id=85028&action=review This patch looks fine. Just a few nits. We need to figure out whether we need to add reftests support to run-webkit-tests. That seems like a lot of work, so we might be able to get away with adding a mode to new-run-webkit-tests to run just the reftests and then shelling out from run-webkit-tests to new-run-webkit-tests or just requiring people to run both until new-run-webkit-tests replaces old-run-webkit-tests. In either case, that a discussion to be had on webkit-dev. It doesn't need to block checking this in, but it probably needs to happen before we actually checkin any reftests. > Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:85 > + ' expected-html file and expected-mismatch.html file' nit: s/expected-html/expected.html > Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:117 > + # Returns a dummy TestResult. We don't have to rebase for reftests. > + return TestResult(self._filename) Should this return None and then the calling code handles None appropriately? That seems cleaner to me at first glance, but I'm not 100% sure how this code is used. > Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:188 > + This arg is optional and shoulbe be used only in reftests until we have a better way to know s/shoulbe/should > Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:200 > + # FIXME: Move this actual file writing code somewhere so that this fucntion doesn't have any side effects. s/fucntion/function > Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:269 > + max_test_time = max(driver_output1.test_time, driver_output2.test_time) This should probably just be the sum of the test times since we use this information to see how long a test took to run. > Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:283 > + else: > + if driver_output1.image_hash != driver_output2.image_hash: This can be an "elif" > Tools/Scripts/webkitpy/layout_tests/layout_package/test_failures.py:310 > + uris = [self.relative_output_filename(filename, fn) for > + fn in out_names] fn is not a clear variable name. out_filename perhaps? > Tools/Scripts/webkitpy/layout_tests/layout_package/test_failures.py:328 > + uris = [self.relative_output_filename(filename, fn) for > + fn in out_names] ditto > Tools/Scripts/webkitpy/layout_tests/port/base.py:350 > + def expected_html_filename(self, filename): how about calling this reftest_expected_filename? The fact that it ends in .html is an implementation detail. > Tools/Scripts/webkitpy/layout_tests/port/base.py:354 > + def expected_mismatch_html_filename(self, filename): How about calling this reftest_expected_mismatch_filename? > Tools/Scripts/webkitpy/layout_tests/port/test.py:53 > + self.without_expectation_files = False should this just be self.is_reftest?
Comment on attachment 85028 [details] add-reftests View in context: https://bugs.webkit.org/attachment.cgi?id=85028&action=review >> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:85 >> + ' expected-html file and expected-mismatch.html file' > > nit: s/expected-html/expected.html Done >> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:117 >> + return TestResult(self._filename) > > Should this return None and then the calling code handles None appropriately? That seems cleaner to me at first glance, but I'm not 100% sure how this code is used. I don't think returning None as a TestResult is good idea. That causes the callers burdens and things more complex. I agree this 'Dummy' TestResult is not the best way. That's ugly. Ideally we should skip calling single_test_runner itself if the test is a reftest in rebasing. But that requires yet another non-trivial changes. So I'd like to leave this code as is until we have a better way in rebasing. >> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:188 >> + This arg is optional and shoulbe be used only in reftests until we have a better way to know > > s/shoulbe/should Done >> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:200 >> + # FIXME: Move this actual file writing code somewhere so that this fucntion doesn't have any side effects. > > s/fucntion/function Done >> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:269 >> + max_test_time = max(driver_output1.test_time, driver_output2.test_time) > > This should probably just be the sum of the test times since we use this information to see how long a test took to run. Done. I used 'sum' instead of 'max'. Let me explain my thought. Actually I had thought which was better between 'max' and 'sum'. I have an concern using 'sum'. It might confuse users or flakiness dashboard in the following case: - timeout limit: 10s - html1 took: 8s (it's okay because 8s < 10s) - html2 took: 7s (it's okay because 8s < 10s) - sum: 15s Users might be upset because displayed test time is bigger than '10s', but TIMEOUT didn't happen. Ideally, we should preserve both test_time values from both html files and display both values. But that is not realistic due to the current layout tests convention. We should compromise on this. If you have any ideas, please let me know. >> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:283 >> + if driver_output1.image_hash != driver_output2.image_hash: > > This can be an "elif" Done >> Tools/Scripts/webkitpy/layout_tests/layout_package/test_failures.py:310 >> + fn in out_names] > > fn is not a clear variable name. out_filename perhaps? Done. Thank you. I don't remember why I chose such a name. >> Tools/Scripts/webkitpy/layout_tests/layout_package/test_failures.py:328 >> + fn in out_names] > > ditto Done >> Tools/Scripts/webkitpy/layout_tests/port/base.py:350 >> + def expected_html_filename(self, filename): > > how about calling this reftest_expected_filename? The fact that it ends in .html is an implementation detail. Done. Although I don't think using 'html' is so bad because '.html' is only supported in current implementation and it's intuitive. But I don't have a strong opinion. Either is okay for me. Thank you. >> Tools/Scripts/webkitpy/layout_tests/port/base.py:354 >> + def expected_mismatch_html_filename(self, filename): > > How about calling this reftest_expected_mismatch_filename? Done. Thank you again. >> Tools/Scripts/webkitpy/layout_tests/port/test.py:53 >> + self.without_expectation_files = False > > should this just be self.is_reftest? That makes sense. Done.
Created attachment 85147 [details] update
(In reply to comment #6) > (From update of attachment 85028 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=85028&action=review > > This patch looks fine. Just a few nits. We need to figure out whether we need to add reftests support to run-webkit-tests. That seems like a lot of work, so we might be able to get away with adding a mode to new-run-webkit-tests to run just the reftests and then shelling out from run-webkit-tests to new-run-webkit-tests or just requiring people to run both until new-run-webkit-tests replaces old-run-webkit-tests. In either case, that a discussion to be had on webkit-dev. It doesn't need to block checking this in, but it probably needs to happen before we actually checkin any reftests. We've discussed that already in the master bug entry. We don't have a plan to add reftest support to run-webkit-tests. And unfortunately, new-run-webkit-tests has not replaced old-run-webkit-tests yet. So adding a mode to new-run-webkit-test to run just the reftests might be one of the approaches we should take. We should find a way to satisfying both new-run-webkit-tests and old-run-webkit-tests users.
Comment on attachment 85147 [details] update View in context: https://bugs.webkit.org/attachment.cgi?id=85147&action=review Patch generally looks fine. The only substantive concern I have is with the duplication between _compare_output() and _compare_output_with_reference(). Let me know what you think about that. > Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:92 > + It would be nice to have a presubmit trigger that checked for this as well. Maybe file a separate bug? > Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:116 > + # Returns a dummy TestResult. We don't have to rebase for reftests. This reminds me that we'll want to add something to the rebaseline scripts like rebaseline-chromium-webkit-tests. At the least they should probably complain if asked to rebaseline a reftest. > Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:263 > + expected_driver_output = ExpectedDriverOutput(driver_output2.text, driver_output2.image, driver_output2.image_hash) Do you need to do this? Can't you just pass driver_output2 to write_test_result? > Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:268 > + def _compare_output_with_reference(self, driver_output1, driver_output2): I'm not real comfortable about the duplication between this and compare_output(). Does it make sense to combine the two? > Tools/Scripts/webkitpy/layout_tests/run_webkit_tests_unittest.py:359 > + self.assertEqual(res, unexpected_tests_count) We should probably add a routine in port/test.py that returns this. Maybe add a #FIXME?
Comment on attachment 85147 [details] update Hi Dirk, Thank you for the review. I am updating the patch. The difference between the previous patch are: - Get rid of file writing logic from SingleTestRunner._handle_error(). Now rebaseline also uses test_result_writer.py module and code is shared. - Skip image comparison when timeout occurs in reftests. - Rename FailureCrash.test_name to TestFailureCrash.reference_filename. Does similar things against FailureTime.test_name too. - Fixed a bug: When a crash occurs in reference html in reftests, a wrong stack was written. View in context: https://bugs.webkit.org/attachment.cgi?id=85147&action=review >> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:92 >> + > > It would be nice to have a presubmit trigger that checked for this as well. Maybe file a separate bug? Nice idea. I am not familiar with how a presubmit checker works. So I'll file a separate bug. >> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:116 >> + # Returns a dummy TestResult. We don't have to rebase for reftests. > > This reminds me that we'll want to add something to the rebaseline scripts like rebaseline-chromium-webkit-tests. At the least they should probably complain if asked to rebaseline a reftest. Yeah. I'll file a separate bug. >> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:263 >> + expected_driver_output = ExpectedDriverOutput(driver_output2.text, driver_output2.image, driver_output2.image_hash) > > Do you need to do this? Can't you just pass driver_output2 to write_test_result? Done. I think it's time to get rid of ExpectedDriverOutput class. I introduced an ExpectedDriverOutput once, but now I think it caused unnecessary complexity. So I've removed ExpectedRiverOutput class and reused base.DriverOutput class. > Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:269 > + total_test_time = driver_output1.test_time + driver_output2.test_time Actually I tried it at first. But it turned out that it caused an ugly function which contains many 'if self._is_reftest: .. else..' blocks. So I'd like to choose two simple functions rather than one complex function until we have a strong need to combine them into one function. >> Tools/Scripts/webkitpy/layout_tests/run_webkit_tests_unittest.py:359 >> + self.assertEqual(res, unexpected_tests_count) > > We should probably add a routine in port/test.py that returns this. Maybe add a #FIXME? Nice idea. I added a #FIXME. It might be better to do it in a separate patch.
Created attachment 85285 [details] update2
(In reply to comment #11) > (From update of attachment 85147 [details]) > > Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:269 > > + total_test_time = driver_output1.test_time + driver_output2.test_time > > Actually I tried it at first. But it turned out that it caused an ugly function which contains many 'if self._is_reftest: .. else..' blocks. > So I'd like to choose two simple functions rather than one complex function until we have a strong need to combine them into one function. > Okay. The latest patch LGTM :)
Hi Dirk, glad to hear that. Hi Ojan, could you review the latest patch? # The earthquake happened today, so I am still working :) (In reply to comment #13) > (In reply to comment #11) > > (From update of attachment 85147 [details] [details]) > > > Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:269 > > > + total_test_time = driver_output1.test_time + driver_output2.test_time > > > > Actually I tried it at first. But it turned out that it caused an ugly function which contains many 'if self._is_reftest: .. else..' blocks. > > So I'd like to choose two simple functions rather than one complex function until we have a strong need to combine them into one function. > > > > Okay. > > The latest patch LGTM :)
Created attachment 85761 [details] update3
I've updated the patch. (In reply to comment #10) > (From update of attachment 85147 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=85147&action=review > > Patch generally looks fine. The only substantive concern I have is with the duplication between _compare_output() and _compare_output_with_reference(). Let me know what you think about that. > > > Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:92 > > + > > It would be nice to have a presubmit trigger that checked for this as well. Maybe file a separate bug? I've included this check routine in single_test_runner instead of implementing that as a presubmit trigger.
I appreciate some reviewers could review this patch and marke this as 'r+'. Dirk and Ojan have already reviewed the most part of this patch.
Hamaji-san, Thank you for the 'r+'. I'll submit the patch after fixing a minor issue about DryRunTest.
Committed r81127: <http://trac.webkit.org/changeset/81127>