Bug 38063

Summary: new-run-webkit-tests complains about missing pixel results instead of plopping down new expectations
Product: WebKit Reporter: James Robinson <jamesr>
Component: Tools / TestsAssignee: Adam Barth <abarth>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, eric, ojan, rniwa, tony, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: PC   
OS: OS X 10.5   
Bug Depends on:    
Bug Blocks: 34984    
Attachments:
Description Flags
Work in progress
none
Patch
none
Patch for landing none

Description James Robinson 2010-04-23 15:59:30 PDT
Run a test that does not have any -expected.png or -expected.checksum files checked in with the "-p" option.  old-run-webkit-tests generates a new .png and .checksum file and sticks them into the appropriate place (LayoutTests/platform/mac when running on my mac).  new-run-webkit-tests fails the test and complains about missing expectation
Comment 1 Ojan Vafai 2010-04-25 07:57:47 PDT
Moving to the old-run-webkit-tests behavior is fine with me as long as we print something out that we're doing so. I don't like that old-run-webkit-tests does so silently.
Comment 2 Eric Seidel (no email) 2011-06-26 23:16:11 PDT
*** Bug 62497 has been marked as a duplicate of this bug. ***
Comment 3 Adam Barth 2011-06-27 18:03:57 PDT
NRWT does the same thing for text results.  Do you think we should switch both to match the ORWT behavior, or do you think pixel tests should work different than text tests?
Comment 4 Eric Seidel (no email) 2011-06-28 01:09:33 PDT
I recommend switching both to the ORWT behavior and adding a --no-new-test-results flag.  Chromium has a wrapper script around NRWT for their bots iirc.  We could add that flag to that wrapper for easy transition.  We could also make the chromium port imply --no-new-test-results for the time-being (or alternatively, make the run-webkit-tests wrapper imply --new-test-results when running new-run-webkit-tests.  That's part of what it's there for, to make this migration easier:
http://trac.webkit.org/browser/trunk/Tools/Scripts/run-webkit-tests
Comment 5 Ryosuke Niwa 2011-06-28 06:13:01 PDT
I'd prefer having --no-new-test-results since that behavior seems to be only useful in bots.
Comment 6 Adam Barth 2011-06-28 13:57:35 PDT
Ok.  Will do.  Thanks.
Comment 7 Adam Barth 2011-06-29 09:58:06 PDT
Created attachment 99108 [details]
Work in progress
Comment 8 Adam Barth 2011-06-29 12:00:43 PDT
Created attachment 99121 [details]
Patch
Comment 9 Adam Barth 2011-06-29 12:01:46 PDT
There are some niceties missing (as noted in the ChangeLog).  Also the print statements that say which files are being created don't seem to be appearing.  If I bump the log level, then they show up turning unit testing, which is annoying.  Anyway, more polish required.
Comment 10 Dirk Pranke 2011-06-29 12:57:57 PDT
Comment on attachment 99121 [details]
Patch

Generally looks good ... R+'ing it although there are some nits and at least one change that looks correct but unnecessary that I have a question about.

View in context: https://bugs.webkit.org/attachment.cgi?id=99121&action=review

> Tools/Scripts/webkitpy/common/net/layouttestresults.py:164
> +        return [result for result in self._test_results if result.has_failure_matching_types(*failure_types)]

Why make this change (and the one in test_results.py)? It seems unnecessary.

> Tools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:309
> +            dest="save_missing_results", default=True,

Nit: might this be better as dest="new_test_results"? That's more in keeping with how we normally name options.

> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:798
> +        failures = self._get_failures(result_summary, include_crashes=False, include_missing=False)

Hm. I'm not sure about this change, given that we can have flaky MISSING results if a text-only test crashes or fails prior to calling dumpAsText(). I agree that there's no point in retrying a truly missing test, and maybe the flaky case is rare enough that this is the right thing to do.

> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:936
> +    def _get_failures(self, result_summary, include_crashes=True, include_missing=True):

Nit: I'd probably leave off the default values, since it's an internal function and it looks like they're not being used anyway.

> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:131
> +        if self._options.save_missing_results:

Change to options.new_test_results as per above?

> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:145
> +    def _add_missing_baselines(self, test_result, driver_output):

Nit: maybe add comments or docstrings indicating which options trigger _add_missing_baselines and _overwrite_baselines? I know you can figure this out from looking at the callers, but it might be helpful to list them here as well.

> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:150
> +            # than for text tests?

If I remember correctly, with old-run-webkit-tests, all new test results go in the generic directories alongside the test, and you need --add-platform-exceptions to tell it to put the new results in the platform dir instead.

Also, I suppose theoretically at least some render tree dumps could be generic (enough for more platforms, anyway).

Regardless, there's no way to distinguish a render tree dump from a non-rendertree dump without parsing the text output, and I'd prefer to not have to start doing that. 

Maybe we can modify DRT to output custom headers when dumping the layer tree?

> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:160
>          # DumpRenderTree may not output utf-8 text (e.g. webarchives).

As an aside, it turns out webarchives are just XML files and I think they're always UTF-8 as a result. Should probably update this comment at some point ...

> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:190
> +            _log.debug('Writing new baseline result "%s"' % output_path)

I'm tempted to say that these should be _log.info() calls ... that would address Ojan's comments about the "silent" updates, and probably is generally more useful.

> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:193
> +            _log.debug('Resetting baseline result "%s"' % output_path)

Same as above.
Comment 11 Adam Barth 2011-06-29 13:29:42 PDT
Comment on attachment 99121 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=99121&action=review

>> Tools/Scripts/webkitpy/common/net/layouttestresults.py:164
>> -        return [result for result in self._test_results if result.has_failure_matching_types(failure_types)]
>> +        return [result for result in self._test_results if result.has_failure_matching_types(*failure_types)]
> 
> Why make this change (and the one in test_results.py)? It seems unnecessary.

I just did that to make the code less ugly.  The varags approach looked prettier than calling the function like foo([ddd]).

>> Tools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:309
>> +        optparse.make_option("--no-new-test-results", action="store_false",
>> +            dest="save_missing_results", default=True,
> 
> Nit: might this be better as dest="new_test_results"? That's more in keeping with how we normally name options.

Fixed.

>> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:798
>> -        failures = self._get_failures(result_summary, include_crashes=False)
>> +        failures = self._get_failures(result_summary, include_crashes=False, include_missing=False)
> 
> Hm. I'm not sure about this change, given that we can have flaky MISSING results if a text-only test crashes or fails prior to calling dumpAsText(). I agree that there's no point in retrying a truly missing test, and maybe the flaky case is rare enough that this is the right thing to do.

Without this check, the behavior is really braindead because the first pass generates the results and then the second pass thinks the test passes.  If this is a big source of flakiness, we can re-evaluate.

>> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:936
>> -    def _get_failures(self, result_summary, include_crashes):
>> +    def _get_failures(self, result_summary, include_crashes=True, include_missing=True):
> 
> Nit: I'd probably leave off the default values, since it's an internal function and it looks like they're not being used anyway.

Ok.

>> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:131
>> +        if self._options.save_missing_results:
> 
> Change to options.new_test_results as per above?

Done.

>> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:145
>> -    def _save_baselines(self, driver_output):
>> +    def _add_missing_baselines(self, test_result, driver_output):
> 
> Nit: maybe add comments or docstrings indicating which options trigger _add_missing_baselines and _overwrite_baselines? I know you can figure this out from looking at the callers, but it might be helpful to list them here as well.

We've been trying to follow the WebKit style in document webkitpy, which is to avoid including comments that are redundant with the code and/or can become out of date when the code changes.  Listing who the callers of a function are and why they call this function seems to fall into that category.

>> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:150
>> +            # than for text tests?
> 
> If I remember correctly, with old-run-webkit-tests, all new test results go in the generic directories alongside the test, and you need --add-platform-exceptions to tell it to put the new results in the platform dir instead.
> 
> Also, I suppose theoretically at least some render tree dumps could be generic (enough for more platforms, anyway).
> 
> Regardless, there's no way to distinguish a render tree dump from a non-rendertree dump without parsing the text output, and I'd prefer to not have to start doing that. 
> 
> Maybe we can modify DRT to output custom headers when dumping the layer tree?

I don't think your recollection of how old-run-webkit-tests works is correct.  Tests that produce render tree dumps and/or images go into the platform-specific directory by default and normal text tests go into the generic directory.  (Where "generic" means outside the "platform" directory.)  I'm not sure who it figures that out.

>> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:160
>>          # DumpRenderTree may not output utf-8 text (e.g. webarchives).
> 
> As an aside, it turns out webarchives are just XML files and I think they're always UTF-8 as a result. Should probably update this comment at some point ...

Ok.  I don't really understand that, so I'll let you update the comment next time you're editing this file so folks who "svn blame" will ask you about it instead of me.  :)

>> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:190
>> +            _log.debug('Writing new baseline result "%s"' % output_path)
> 
> I'm tempted to say that these should be _log.info() calls ... that would address Ojan's comments about the "silent" updates, and probably is generally more useful.

Yes.  The trouble was that then they showed up in the test-webkitpy output!  I'll fix that in a subsequent patch.
Comment 12 Adam Barth 2011-06-29 13:37:35 PDT
Created attachment 99139 [details]
Patch for landing
Comment 13 WebKit Review Bot 2011-06-29 15:26:11 PDT
Comment on attachment 99139 [details]
Patch for landing

Clearing flags on attachment: 99139

Committed r90055: <http://trac.webkit.org/changeset/90055>
Comment 14 WebKit Review Bot 2011-06-29 15:26:16 PDT
All reviewed patches have been landed.  Closing bug.