55457 – [NRWT] Add support for reftests to new-run-webkit-tests

RESOLVED FIXED 55457

[NRWT] Add support for reftests to new-run-webkit-tests

https://bugs.webkit.org/show_bug.cgi?id=55457

Summary [NRWT] Add support for reftests to new-run-webkit-tests

Hayato Ito

Reported 2011-03-01 04:25:32 PST

Please see a master bug for the details: https://bugs.webkit.org/show_bug.cgi?id=36065 This is an initial effort to add support for reftests to new-run-webkit-tests.

Attachments
in-progress. not-ready-for-review (17.96 KB, patch) 2011-03-01 04:33 PST, Hayato Ito	no flags	Details Formatted Diff Diff
add-reftests (23.97 KB, patch) 2011-03-08 00:02 PST, Hayato Ito	no flags	Details Formatted Diff Diff
update (23.93 KB, patch) 2011-03-09 01:59 PST, Hayato Ito	no flags	Details Formatted Diff Diff
update2 (26.59 KB, patch) 2011-03-09 22:53 PST, Hayato Ito	no flags	Details Formatted Diff Diff
update3 (26.99 KB, patch) 2011-03-14 19:28 PDT, Hayato Ito	hamaji: review+	Details Formatted Diff Diff
Show Obsolete (4) View All Add attachment proposed patch, testcase, etc.

Hayato Ito

Comment 1 2011-03-01 04:33:31 PST

Created attachment 84209 [details] in-progress. not-ready-for-review

Hayato Ito

Comment 2 2011-03-08 00:02:32 PST

Created attachment 85028 [details] add-reftests

Hayato Ito

Comment 3 2011-03-08 00:19:42 PST

I guess there are a lot of things remained until we can announce that 'Reftests is now available'. Updating documents, wiki, tweaking (old)-webkit-run-tests and so on. But as for new-run-webkit-tests, I think this patch is all we have to do. Could you review the patch? I'll split this patch into smaller patches if you prefer. But most changes are tightly related, so I put all into this patch at first. This patch should not affect existing layout tests at all unless you add 'reftest' in LayoutTests directory. So this patch should be safe to land.

Hayato Ito

Comment 4 2011-03-08 21:43:44 PST

Hi Dirk, Ojan, Could you review this? I am aiming to add reftest supports in this quarter. So I appreciate the reviews.

Hayato Ito

Comment 5 2011-03-08 23:38:12 PST

FYI I started to write a Wiki page about how to write reftests: https://trac.webkit.org/wiki/Writing%20Reftests

Ojan Vafai

Comment 6 2011-03-08 23:38:32 PST

Comment on attachment 85028 [details] add-reftests View in context: https://bugs.webkit.org/attachment.cgi?id=85028&action=review This patch looks fine. Just a few nits. We need to figure out whether we need to add reftests support to run-webkit-tests. That seems like a lot of work, so we might be able to get away with adding a mode to new-run-webkit-tests to run just the reftests and then shelling out from run-webkit-tests to new-run-webkit-tests or just requiring people to run both until new-run-webkit-tests replaces old-run-webkit-tests. In either case, that a discussion to be had on webkit-dev. It doesn't need to block checking this in, but it probably needs to happen before we actually checkin any reftests. > Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:85 > + ' expected-html file and expected-mismatch.html file' nit: s/expected-html/expected.html > Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:117 > + # Returns a dummy TestResult. We don't have to rebase for reftests. > + return TestResult(self._filename) Should this return None and then the calling code handles None appropriately? That seems cleaner to me at first glance, but I'm not 100% sure how this code is used. > Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:188 > + This arg is optional and shoulbe be used only in reftests until we have a better way to know s/shoulbe/should > Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:200 > + # FIXME: Move this actual file writing code somewhere so that this fucntion doesn't have any side effects. s/fucntion/function > Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:269 > + max_test_time = max(driver_output1.test_time, driver_output2.test_time) This should probably just be the sum of the test times since we use this information to see how long a test took to run. > Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:283 > + else: > + if driver_output1.image_hash != driver_output2.image_hash: This can be an "elif" > Tools/Scripts/webkitpy/layout_tests/layout_package/test_failures.py:310 > + uris = [self.relative_output_filename(filename, fn) for > + fn in out_names] fn is not a clear variable name. out_filename perhaps? > Tools/Scripts/webkitpy/layout_tests/layout_package/test_failures.py:328 > + uris = [self.relative_output_filename(filename, fn) for > + fn in out_names] ditto > Tools/Scripts/webkitpy/layout_tests/port/base.py:350 > + def expected_html_filename(self, filename): how about calling this reftest_expected_filename? The fact that it ends in .html is an implementation detail. > Tools/Scripts/webkitpy/layout_tests/port/base.py:354 > + def expected_mismatch_html_filename(self, filename): How about calling this reftest_expected_mismatch_filename? > Tools/Scripts/webkitpy/layout_tests/port/test.py:53 > + self.without_expectation_files = False should this just be self.is_reftest?

Hayato Ito

Comment 7 2011-03-09 01:37:40 PST

Comment on attachment 85028 [details] add-reftests View in context: https://bugs.webkit.org/attachment.cgi?id=85028&action=review >> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:85 >> + ' expected-html file and expected-mismatch.html file' > > nit: s/expected-html/expected.html Done >> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:117 >> + return TestResult(self._filename) > > Should this return None and then the calling code handles None appropriately? That seems cleaner to me at first glance, but I'm not 100% sure how this code is used. I don't think returning None as a TestResult is good idea. That causes the callers burdens and things more complex. I agree this 'Dummy' TestResult is not the best way. That's ugly. Ideally we should skip calling single_test_runner itself if the test is a reftest in rebasing. But that requires yet another non-trivial changes. So I'd like to leave this code as is until we have a better way in rebasing. >> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:188 >> + This arg is optional and shoulbe be used only in reftests until we have a better way to know > > s/shoulbe/should Done >> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:200 >> + # FIXME: Move this actual file writing code somewhere so that this fucntion doesn't have any side effects. > > s/fucntion/function Done >> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:269 >> + max_test_time = max(driver_output1.test_time, driver_output2.test_time) > > This should probably just be the sum of the test times since we use this information to see how long a test took to run. Done. I used 'sum' instead of 'max'. Let me explain my thought. Actually I had thought which was better between 'max' and 'sum'. I have an concern using 'sum'. It might confuse users or flakiness dashboard in the following case: - timeout limit: 10s - html1 took: 8s (it's okay because 8s < 10s) - html2 took: 7s (it's okay because 8s < 10s) - sum: 15s Users might be upset because displayed test time is bigger than '10s', but TIMEOUT didn't happen. Ideally, we should preserve both test_time values from both html files and display both values. But that is not realistic due to the current layout tests convention. We should compromise on this. If you have any ideas, please let me know. >> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:283 >> + if driver_output1.image_hash != driver_output2.image_hash: > > This can be an "elif" Done >> Tools/Scripts/webkitpy/layout_tests/layout_package/test_failures.py:310 >> + fn in out_names] > > fn is not a clear variable name. out_filename perhaps? Done. Thank you. I don't remember why I chose such a name. >> Tools/Scripts/webkitpy/layout_tests/layout_package/test_failures.py:328 >> + fn in out_names] > > ditto Done >> Tools/Scripts/webkitpy/layout_tests/port/base.py:350 >> + def expected_html_filename(self, filename): > > how about calling this reftest_expected_filename? The fact that it ends in .html is an implementation detail. Done. Although I don't think using 'html' is so bad because '.html' is only supported in current implementation and it's intuitive. But I don't have a strong opinion. Either is okay for me. Thank you. >> Tools/Scripts/webkitpy/layout_tests/port/base.py:354 >> + def expected_mismatch_html_filename(self, filename): > > How about calling this reftest_expected_mismatch_filename? Done. Thank you again. >> Tools/Scripts/webkitpy/layout_tests/port/test.py:53 >> + self.without_expectation_files = False > > should this just be self.is_reftest? That makes sense. Done.

Hayato Ito

Comment 8 2011-03-09 01:59:15 PST

Created attachment 85147 [details] update

Hayato Ito

Comment 9 2011-03-09 02:12:49 PST

(In reply to comment #6) > (From update of attachment 85028 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=85028&action=review > > This patch looks fine. Just a few nits. We need to figure out whether we need to add reftests support to run-webkit-tests. That seems like a lot of work, so we might be able to get away with adding a mode to new-run-webkit-tests to run just the reftests and then shelling out from run-webkit-tests to new-run-webkit-tests or just requiring people to run both until new-run-webkit-tests replaces old-run-webkit-tests. In either case, that a discussion to be had on webkit-dev. It doesn't need to block checking this in, but it probably needs to happen before we actually checkin any reftests. We've discussed that already in the master bug entry. We don't have a plan to add reftest support to run-webkit-tests. And unfortunately, new-run-webkit-tests has not replaced old-run-webkit-tests yet. So adding a mode to new-run-webkit-test to run just the reftests might be one of the approaches we should take. We should find a way to satisfying both new-run-webkit-tests and old-run-webkit-tests users.

Dirk Pranke

Comment 10 2011-03-09 12:30:24 PST

Comment on attachment 85147 [details] update View in context: https://bugs.webkit.org/attachment.cgi?id=85147&action=review Patch generally looks fine. The only substantive concern I have is with the duplication between _compare_output() and _compare_output_with_reference(). Let me know what you think about that. > Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:92 > + It would be nice to have a presubmit trigger that checked for this as well. Maybe file a separate bug? > Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:116 > + # Returns a dummy TestResult. We don't have to rebase for reftests. This reminds me that we'll want to add something to the rebaseline scripts like rebaseline-chromium-webkit-tests. At the least they should probably complain if asked to rebaseline a reftest. > Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:263 > + expected_driver_output = ExpectedDriverOutput(driver_output2.text, driver_output2.image, driver_output2.image_hash) Do you need to do this? Can't you just pass driver_output2 to write_test_result? > Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:268 > + def _compare_output_with_reference(self, driver_output1, driver_output2): I'm not real comfortable about the duplication between this and compare_output(). Does it make sense to combine the two? > Tools/Scripts/webkitpy/layout_tests/run_webkit_tests_unittest.py:359 > + self.assertEqual(res, unexpected_tests_count) We should probably add a routine in port/test.py that returns this. Maybe add a #FIXME?

Hayato Ito

Comment 11 2011-03-09 22:50:52 PST

Comment on attachment 85147 [details] update Hi Dirk, Thank you for the review. I am updating the patch. The difference between the previous patch are: - Get rid of file writing logic from SingleTestRunner._handle_error(). Now rebaseline also uses test_result_writer.py module and code is shared. - Skip image comparison when timeout occurs in reftests. - Rename FailureCrash.test_name to TestFailureCrash.reference_filename. Does similar things against FailureTime.test_name too. - Fixed a bug: When a crash occurs in reference html in reftests, a wrong stack was written. View in context: https://bugs.webkit.org/attachment.cgi?id=85147&action=review >> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:92 >> + > > It would be nice to have a presubmit trigger that checked for this as well. Maybe file a separate bug? Nice idea. I am not familiar with how a presubmit checker works. So I'll file a separate bug. >> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:116 >> + # Returns a dummy TestResult. We don't have to rebase for reftests. > > This reminds me that we'll want to add something to the rebaseline scripts like rebaseline-chromium-webkit-tests. At the least they should probably complain if asked to rebaseline a reftest. Yeah. I'll file a separate bug. >> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:263 >> + expected_driver_output = ExpectedDriverOutput(driver_output2.text, driver_output2.image, driver_output2.image_hash) > > Do you need to do this? Can't you just pass driver_output2 to write_test_result? Done. I think it's time to get rid of ExpectedDriverOutput class. I introduced an ExpectedDriverOutput once, but now I think it caused unnecessary complexity. So I've removed ExpectedRiverOutput class and reused base.DriverOutput class. > Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:269 > + total_test_time = driver_output1.test_time + driver_output2.test_time Actually I tried it at first. But it turned out that it caused an ugly function which contains many 'if self._is_reftest: .. else..' blocks. So I'd like to choose two simple functions rather than one complex function until we have a strong need to combine them into one function. >> Tools/Scripts/webkitpy/layout_tests/run_webkit_tests_unittest.py:359 >> + self.assertEqual(res, unexpected_tests_count) > > We should probably add a routine in port/test.py that returns this. Maybe add a #FIXME? Nice idea. I added a #FIXME. It might be better to do it in a separate patch.

Hayato Ito

Comment 12 2011-03-09 22:53:07 PST

Created attachment 85285 [details] update2

Dirk Pranke

Comment 13 2011-03-09 22:56:51 PST

(In reply to comment #11) > (From update of attachment 85147 [details]) > > Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:269 > > + total_test_time = driver_output1.test_time + driver_output2.test_time > > Actually I tried it at first. But it turned out that it caused an ugly function which contains many 'if self._is_reftest: .. else..' blocks. > So I'd like to choose two simple functions rather than one complex function until we have a strong need to combine them into one function. > Okay. The latest patch LGTM :)

Hayato Ito

Comment 14 2011-03-11 03:46:57 PST

Hi Dirk, glad to hear that. Hi Ojan, could you review the latest patch? # The earthquake happened today, so I am still working :) (In reply to comment #13) > (In reply to comment #11) > > (From update of attachment 85147 [details] [details]) > > > Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:269 > > > + total_test_time = driver_output1.test_time + driver_output2.test_time > > > > Actually I tried it at first. But it turned out that it caused an ugly function which contains many 'if self._is_reftest: .. else..' blocks. > > So I'd like to choose two simple functions rather than one complex function until we have a strong need to combine them into one function. > > > > Okay. > > The latest patch LGTM :)

Hayato Ito

Comment 15 2011-03-14 19:28:16 PDT

Created attachment 85761 [details] update3

Hayato Ito

Comment 16 2011-03-14 20:03:56 PDT

I've updated the patch. (In reply to comment #10) > (From update of attachment 85147 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=85147&action=review > > Patch generally looks fine. The only substantive concern I have is with the duplication between _compare_output() and _compare_output_with_reference(). Let me know what you think about that. > > > Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:92 > > + > > It would be nice to have a presubmit trigger that checked for this as well. Maybe file a separate bug? I've included this check routine in single_test_runner instead of implementing that as a presubmit trigger.

Hayato Ito

Comment 17 2011-03-14 23:47:20 PDT

I appreciate some reviewers could review this patch and marke this as 'r+'. Dirk and Ojan have already reviewed the most part of this patch.

Hayato Ito

Comment 18 2011-03-15 03:40:26 PDT

Hamaji-san, Thank you for the 'r+'. I'll submit the patch after fixing a minor issue about DryRunTest.

Hayato Ito

Comment 19 2011-03-15 03:46:54 PDT

Committed r81127: <http://trac.webkit.org/changeset/81127>

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware PC

OS OS X 10.5

Product WebKit

Component Tools / Tests

Assignee

Hayato Ito

Reported

2011-03-01 04:25 PST

Modified

2011-03-15 03:46 PDT History

CC List

4 users Show

URL

Keywords

Depends on

Blocks

36065 56076

Dependencies

tree graph