Bug 55457

Summary: [NRWT] Add support for reftests to new-run-webkit-tests
Product: WebKit Reporter: Hayato Ito <hayato>
Component: Tools / TestsAssignee: Hayato Ito <hayato>
Status: RESOLVED FIXED    
Severity: Normal CC: dpranke, hamaji, mihaip, ojan
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: PC   
OS: OS X 10.5   
Bug Depends on:    
Bug Blocks: 36065, 56076    
Attachments:
Description Flags
in-progress. not-ready-for-review
none
add-reftests
none
update
none
update2
none
update3 hamaji: review+

Description Hayato Ito 2011-03-01 04:25:32 PST
Please see a master bug for the details:
  https://bugs.webkit.org/show_bug.cgi?id=36065

This is an initial effort to add support for reftests to new-run-webkit-tests.
Comment 1 Hayato Ito 2011-03-01 04:33:31 PST
Created attachment 84209 [details]
in-progress. not-ready-for-review
Comment 2 Hayato Ito 2011-03-08 00:02:32 PST
Created attachment 85028 [details]
add-reftests
Comment 3 Hayato Ito 2011-03-08 00:19:42 PST
I guess there are a lot of things remained until we can announce that 'Reftests is now available'.
Updating documents, wiki, tweaking (old)-webkit-run-tests and so on.
But as for new-run-webkit-tests, I think this patch is all we have to do.

Could you review the patch?
I'll split this patch into smaller patches if you prefer. But most changes are tightly related, so I put all into this patch at first.

This patch should not affect existing layout tests at all unless you add 'reftest' in LayoutTests directory.
So this patch should be safe to land.
Comment 4 Hayato Ito 2011-03-08 21:43:44 PST
Hi Dirk, Ojan,

Could you review this?
I am aiming to add reftest supports in this quarter. So I appreciate the reviews.
Comment 5 Hayato Ito 2011-03-08 23:38:12 PST
FYI

I started to write a Wiki page about how to write reftests:
https://trac.webkit.org/wiki/Writing%20Reftests
Comment 6 Ojan Vafai 2011-03-08 23:38:32 PST
Comment on attachment 85028 [details]
add-reftests

View in context: https://bugs.webkit.org/attachment.cgi?id=85028&action=review

This patch looks fine. Just a few nits. We need to figure out whether we need to add reftests support to run-webkit-tests. That seems like a lot of work, so we might be able to get away with adding a mode to new-run-webkit-tests to run just the reftests and then shelling out from run-webkit-tests to new-run-webkit-tests or just requiring people to run both until new-run-webkit-tests replaces old-run-webkit-tests. In either case, that a discussion to be had on webkit-dev. It doesn't need to block checking this in, but it probably needs to happen before we actually checkin any reftests.

> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:85
> +                           ' expected-html file and expected-mismatch.html file'

nit: s/expected-html/expected.html

> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:117
> +                # Returns a dummy TestResult. We don't have to rebase for reftests.
> +                return TestResult(self._filename)

Should this return None and then the calling code handles None appropriately? That seems cleaner to me at first glance, but I'm not 100% sure how this code is used.

> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:188
> +              This arg is optional and shoulbe be used only in reftests until we have a better way to know

s/shoulbe/should

> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:200
> +            # FIXME: Move this actual file writing code somewhere so that this fucntion doesn't have any side effects.

s/fucntion/function

> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:269
> +        max_test_time = max(driver_output1.test_time, driver_output2.test_time)

This should  probably just be the sum of the test times since we use this information to see how long a test took to run.

> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:283
> +        else:
> +            if driver_output1.image_hash != driver_output2.image_hash:

This can be an "elif"

> Tools/Scripts/webkitpy/layout_tests/layout_package/test_failures.py:310
> +        uris = [self.relative_output_filename(filename, fn) for
> +                fn in out_names]

fn is not a clear variable name. out_filename perhaps?

> Tools/Scripts/webkitpy/layout_tests/layout_package/test_failures.py:328
> +        uris = [self.relative_output_filename(filename, fn) for
> +                fn in out_names]

ditto

> Tools/Scripts/webkitpy/layout_tests/port/base.py:350
> +    def expected_html_filename(self, filename):

how about calling this reftest_expected_filename? The fact that it ends in .html is an implementation detail.

> Tools/Scripts/webkitpy/layout_tests/port/base.py:354
> +    def expected_mismatch_html_filename(self, filename):

How about calling this reftest_expected_mismatch_filename?

> Tools/Scripts/webkitpy/layout_tests/port/test.py:53
> +        self.without_expectation_files = False

should this just be self.is_reftest?
Comment 7 Hayato Ito 2011-03-09 01:37:40 PST
Comment on attachment 85028 [details]
add-reftests

View in context: https://bugs.webkit.org/attachment.cgi?id=85028&action=review

>> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:85
>> +                           ' expected-html file and expected-mismatch.html file'
> 
> nit: s/expected-html/expected.html

Done

>> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:117
>> +                return TestResult(self._filename)
> 
> Should this return None and then the calling code handles None appropriately? That seems cleaner to me at first glance, but I'm not 100% sure how this code is used.

I don't think returning None as a TestResult is good idea. That causes the callers burdens and things more complex.
I agree this 'Dummy' TestResult is not the best way. That's ugly.
Ideally we should skip calling single_test_runner itself if the test is a reftest in rebasing. But that requires yet another non-trivial changes.
So I'd like to leave this code as is until we have a better way in rebasing.

>> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:188
>> +              This arg is optional and shoulbe be used only in reftests until we have a better way to know
> 
> s/shoulbe/should

Done

>> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:200
>> +            # FIXME: Move this actual file writing code somewhere so that this fucntion doesn't have any side effects.
> 
> s/fucntion/function

Done

>> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:269
>> +        max_test_time = max(driver_output1.test_time, driver_output2.test_time)
> 
> This should  probably just be the sum of the test times since we use this information to see how long a test took to run.

Done.  I used 'sum' instead of 'max'.

Let me explain my thought.
Actually I had thought which was better between 'max' and 'sum'.

I have an concern using 'sum'. It might confuse users or flakiness dashboard in the following case:

  - timeout limit: 10s
  - html1 took: 8s (it's okay because 8s < 10s)
  - html2 took: 7s (it's okay because 8s < 10s)
  - sum: 15s

Users might be upset because displayed test time is bigger than '10s', but TIMEOUT didn't happen.
Ideally, we should preserve both test_time values from both html files and display both values. But that is not realistic due to the current layout tests convention.
We should compromise on this.

If you have any ideas, please let me know.

>> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:283
>> +            if driver_output1.image_hash != driver_output2.image_hash:
> 
> This can be an "elif"

Done

>> Tools/Scripts/webkitpy/layout_tests/layout_package/test_failures.py:310
>> +                fn in out_names]
> 
> fn is not a clear variable name. out_filename perhaps?

Done. Thank you. I don't remember why I chose such a name.

>> Tools/Scripts/webkitpy/layout_tests/layout_package/test_failures.py:328
>> +                fn in out_names]
> 
> ditto

Done

>> Tools/Scripts/webkitpy/layout_tests/port/base.py:350
>> +    def expected_html_filename(self, filename):
> 
> how about calling this reftest_expected_filename? The fact that it ends in .html is an implementation detail.

Done. Although I don't think using 'html' is so bad because '.html' is only supported in current implementation and it's intuitive.
But I don't have a strong opinion. Either is okay for me. Thank you.

>> Tools/Scripts/webkitpy/layout_tests/port/base.py:354
>> +    def expected_mismatch_html_filename(self, filename):
> 
> How about calling this reftest_expected_mismatch_filename?

Done. Thank you again.

>> Tools/Scripts/webkitpy/layout_tests/port/test.py:53
>> +        self.without_expectation_files = False
> 
> should this just be self.is_reftest?

That makes sense. Done.
Comment 8 Hayato Ito 2011-03-09 01:59:15 PST
Created attachment 85147 [details]
update
Comment 9 Hayato Ito 2011-03-09 02:12:49 PST
(In reply to comment #6)
> (From update of attachment 85028 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=85028&action=review
> 
> This patch looks fine. Just a few nits. We need to figure out whether we need to add reftests support to run-webkit-tests. That seems like a lot of work, so we might be able to get away with adding a mode to new-run-webkit-tests to run just the reftests and then shelling out from run-webkit-tests to new-run-webkit-tests or just requiring people to run both until new-run-webkit-tests replaces old-run-webkit-tests. In either case, that a discussion to be had on webkit-dev. It doesn't need to block checking this in, but it probably needs to happen before we actually checkin any reftests.

We've discussed that already in the master bug entry. We don't have a plan to add reftest support to run-webkit-tests. And unfortunately, new-run-webkit-tests has not replaced old-run-webkit-tests yet.

So adding a mode to new-run-webkit-test to run just the reftests might be one of the approaches we should take.

We should find a way to satisfying both new-run-webkit-tests and old-run-webkit-tests users.
Comment 10 Dirk Pranke 2011-03-09 12:30:24 PST
Comment on attachment 85147 [details]
update

View in context: https://bugs.webkit.org/attachment.cgi?id=85147&action=review

Patch generally looks fine. The only substantive concern I have is with the duplication between _compare_output() and _compare_output_with_reference(). Let me know what you think about that.

> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:92
> +

It would be nice to have a presubmit trigger that checked for this as well. Maybe file a separate bug?

> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:116
> +                # Returns a dummy TestResult. We don't have to rebase for reftests.

This reminds me that we'll want to add something to the rebaseline scripts like rebaseline-chromium-webkit-tests. At the least they should probably complain if asked to rebaseline a reftest.

> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:263
> +        expected_driver_output = ExpectedDriverOutput(driver_output2.text, driver_output2.image, driver_output2.image_hash)

Do you need to do this? Can't you just pass driver_output2 to write_test_result?

> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:268
> +    def _compare_output_with_reference(self, driver_output1, driver_output2):

I'm not real comfortable about the duplication between this and compare_output(). Does it make sense to combine the two?

> Tools/Scripts/webkitpy/layout_tests/run_webkit_tests_unittest.py:359
> +        self.assertEqual(res, unexpected_tests_count)

We should probably add a routine in  port/test.py that returns this. Maybe add a #FIXME?
Comment 11 Hayato Ito 2011-03-09 22:50:52 PST
Comment on attachment 85147 [details]
update

Hi Dirk,
Thank you for the review.

I am updating the patch. The difference between the previous patch are:
  - Get rid of file writing logic from SingleTestRunner._handle_error(). Now rebaseline also uses test_result_writer.py module and code is shared.
  - Skip image comparison when timeout occurs in reftests.
  - Rename FailureCrash.test_name to TestFailureCrash.reference_filename. Does similar things against FailureTime.test_name too.
  - Fixed a bug: When a crash occurs in reference html in reftests, a wrong stack was written.

  


View in context: https://bugs.webkit.org/attachment.cgi?id=85147&action=review

>> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:92
>> +
> 
> It would be nice to have a presubmit trigger that checked for this as well. Maybe file a separate bug?

Nice idea. I am not familiar with how a presubmit checker works. So I'll file a separate bug.

>> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:116
>> +                # Returns a dummy TestResult. We don't have to rebase for reftests.
> 
> This reminds me that we'll want to add something to the rebaseline scripts like rebaseline-chromium-webkit-tests. At the least they should probably complain if asked to rebaseline a reftest.

Yeah. I'll file a separate bug.

>> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:263
>> +        expected_driver_output = ExpectedDriverOutput(driver_output2.text, driver_output2.image, driver_output2.image_hash)
> 
> Do you need to do this? Can't you just pass driver_output2 to write_test_result?

Done.

I think it's time to get rid of ExpectedDriverOutput class.
I introduced an ExpectedDriverOutput once, but now I think it caused unnecessary complexity.

So I've removed ExpectedRiverOutput class and reused base.DriverOutput class.

> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:269
> +        total_test_time = driver_output1.test_time + driver_output2.test_time

Actually I tried it at first. But it turned out that it caused an ugly function which contains many 'if  self._is_reftest: .. else..' blocks.
So I'd like to choose two simple functions rather than one complex function until we have a strong need to combine them into one function.

>> Tools/Scripts/webkitpy/layout_tests/run_webkit_tests_unittest.py:359
>> +        self.assertEqual(res, unexpected_tests_count)
> 
> We should probably add a routine in  port/test.py that returns this. Maybe add a #FIXME?

Nice idea. I added a #FIXME. It might be better to do it in a separate patch.
Comment 12 Hayato Ito 2011-03-09 22:53:07 PST
Created attachment 85285 [details]
update2
Comment 13 Dirk Pranke 2011-03-09 22:56:51 PST
(In reply to comment #11)
> (From update of attachment 85147 [details])
> > Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:269
> > +        total_test_time = driver_output1.test_time + driver_output2.test_time
> 
> Actually I tried it at first. But it turned out that it caused an ugly function which contains many 'if  self._is_reftest: .. else..' blocks.
> So I'd like to choose two simple functions rather than one complex function until we have a strong need to combine them into one function.
>

Okay. 

The latest patch LGTM :)
Comment 14 Hayato Ito 2011-03-11 03:46:57 PST
Hi Dirk, glad to hear that. 

Hi Ojan, could you review the latest patch?

# The earthquake happened today, so I am still working :)

(In reply to comment #13)
> (In reply to comment #11)
> > (From update of attachment 85147 [details] [details])
> > > Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:269
> > > +        total_test_time = driver_output1.test_time + driver_output2.test_time
> > 
> > Actually I tried it at first. But it turned out that it caused an ugly function which contains many 'if  self._is_reftest: .. else..' blocks.
> > So I'd like to choose two simple functions rather than one complex function until we have a strong need to combine them into one function.
> >
> 
> Okay. 
> 
> The latest patch LGTM :)
Comment 15 Hayato Ito 2011-03-14 19:28:16 PDT
Created attachment 85761 [details]
update3
Comment 16 Hayato Ito 2011-03-14 20:03:56 PDT
I've updated the patch.

(In reply to comment #10)
> (From update of attachment 85147 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=85147&action=review
> 
> Patch generally looks fine. The only substantive concern I have is with the duplication between _compare_output() and _compare_output_with_reference(). Let me know what you think about that.
> 
> > Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:92
> > +
> 
> It would be nice to have a presubmit trigger that checked for this as well. Maybe file a separate bug?

I've included this check routine in single_test_runner instead of implementing that as a presubmit trigger.
Comment 17 Hayato Ito 2011-03-14 23:47:20 PDT
I appreciate some reviewers could review this patch and marke this as 'r+'.
Dirk and Ojan have already reviewed the most part of this patch.
Comment 18 Hayato Ito 2011-03-15 03:40:26 PDT
Hamaji-san,

Thank you for the 'r+'.
I'll submit the patch after fixing a minor issue about DryRunTest.
Comment 19 Hayato Ito 2011-03-15 03:46:54 PDT
Committed r81127: <http://trac.webkit.org/changeset/81127>