Bug 63494 - REGRESSION(r84294): new-run-webkit-tests results.html generates links to diffs.html or diff.png that don't exist
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
Importance: P2 Normal
Assignee: Adam Barth
Blocks: 34984
Reported: 2011-06-27 16:51 PDT by Ryosuke Niwa
Modified: 2011-06-30 15:13 PDT (History)
CC List: 6 users



Attachments
Work in progress (2.64 KB, patch)
2011-06-29 19:50 PDT, Adam Barth
no flags
Work in progress (675 bytes, patch)
2011-06-29 19:58 PDT, Adam Barth
no flags
Work in progress (20.06 KB, patch)
2011-06-29 21:27 PDT, Adam Barth
no flags
Work in progress (might actually work) (23.16 KB, patch)
2011-06-29 22:15 PDT, Adam Barth
no flags
Patch (25.48 KB, patch)
2011-06-29 22:34 PDT, Adam Barth
no flags
Patch (25.65 KB, patch)
2011-06-30 11:31 PDT, Adam Barth
eric: review+

Description Ryosuke Niwa 2011-06-27 16:51:06 PDT
Tools/Scripts/new-run-webkit-tests --debug --pixel does not generate -diffs.html or -diff.png for image failures on the Mac port

I can see links in results.html, and the corresponding directories under layout-test-results do contain -expected.png and -actual.png, but there are no -diffs.html or -diff.png files.
Comment 1 Ryosuke Niwa 2011-06-27 17:05:18 PDT
It appears that diffs.html and diff.png aren't generated when tests fail due to checksum differences.
Comment 2 Ryosuke Niwa 2011-06-27 17:16:45 PDT
At this point I'm pretty certain this is a bug in new results.html page.
Comment 3 Dirk Pranke 2011-06-27 17:21:03 PDT
On further investigation (e.g., fast/box-shadow/basic-shadows.html), we're getting cases where the checksums don't match but the images do match, as reported by ImageDiff.

So, no -diff files will be created. However, the test is still reported as failing. I think the json results generator doesn't know how to distinguish the different kinds of failures, and so it reports the generic Image failure.

We might need to add another flag to the test_dict in manager.py: summarize_results() to handle this case.

Once again, the attempt to map the broader set of TestFailures onto the smaller set of expectation types might be hurting us.
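The failure-classification gap Dirk describes can be sketched roughly as follows (the function name and return values are illustrative, not webkitpy's actual API):

```python
def classify_image_result(actual_hash, expected_hash, images_match):
    """Classify an image comparison outcome.

    `images_match` stands in for ImageDiff's verdict; the hashes are the
    embedded checksums. Illustrative sketch, not webkitpy code.
    """
    if actual_hash == expected_hash:
        return "pass"
    if images_match:
        # Checksums differ but ImageDiff reports a match: no -diff files
        # are written, yet the json results generator still reports a
        # generic image failure -- the case this bug is about.
        return "hash-mismatch-but-images-match"
    return "image-failure"
```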
Comment 4 Adam Barth 2011-06-29 14:08:12 PDT
Is someone stepping up to own this issue?  Ojan?
Comment 5 Ojan Vafai 2011-06-29 16:54:57 PDT
(In reply to comment #3)
> On further investigation (e.g., fast/box-shadow/basic-shadows.html), we're getting cases where the checksums don't match but the images do match, as reported by ImageDiff.

Hm. That should not happen now that checksums are embedded in pngs!

> So, no -diff files will be created. However, the test is still reported as failing. I think the json results generator doesn't know how to distinguish the different kinds of failures, and so it reports the generic Image failure.
> 
> We might need to add another flag to the test_dict in manager.py: summarize_results() to handle this case.
> 
> Once again, the attempt to map the broader set of TestFailures onto the smaller set of expectation types might be hurting us.

The right solution IMO, is to make the results types serialized to the JSON file have a one-to-one mapping with the TestFailure types. That would also help cleanup code.

(In reply to comment #4)
> Is someone stepping up to own this issue.  Ojan?

I won't realistically get to it anytime soon.
Comment 6 Dirk Pranke 2011-06-29 17:12:36 PDT
(In reply to comment #5)
> (In reply to comment #3)
> > On further investigation (e.g., fast/box-shadow/basic-shadows.html), we're getting cases where the checksums don't match but the images do match, as reported by ImageDiff.
> 
> Hm. That should not happen now that checksums are embedded in pngs!
> 

Tony pointed out a difference a week or two ago where the Apple Mac ImageDiff implementation will report some PNGs as identical even when they are not bit-for-bit identical (i.e., even when --tolerance=0.0 is specified).

I think that that's a bug in ImageDiff that should be fixed, but perhaps that is debatable. Regardless, that is probably what is causing this.

> The right solution IMO, is to make the results types serialized to the JSON file have a one-to-one mapping with the TestFailure types. That would also help cleanup code.

That would probably work, and certainly simplify some things.
Comment 7 Adam Barth 2011-06-29 17:13:46 PDT
Ojan, if you're not going to fix your regression, should we roll out your patch?
Comment 8 Adam Barth 2011-06-29 18:02:20 PDT
Hum...  That patch is from 04/19/2011 14:18:52 (2 months ago).  :(
Comment 9 Adam Barth 2011-06-29 18:06:23 PDT
Reading this more carefully, this doesn't need to block Bug 34984.  This appears to happen only in the case where the images differ bit-by-bit but ImageDiff says they match, which should be rare.
Comment 10 Ryosuke Niwa 2011-06-29 18:11:45 PDT
(In reply to comment #9)
> Reading this more carefully, this doesn't need to block Bug 34984.  This appears to happen only in the case where the images differ bit-by-bit but ImageDiff says they match, which should be rare.

It happens on Mac port all the time because it has the default non-zero tolerance.
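In other words, with a nonzero default tolerance the diff-file decision looks roughly like this (names and the default value are illustrative, not the Mac port's actual code):

```python
def needs_diff_files(actual_hash, expected_hash, diff_percent, tolerance=0.1):
    """Decide whether -diff.png/-diffs.html should be written.

    With a nonzero tolerance (the Mac port's default), checksums can
    differ while ImageDiff still reports a match (diff_percent within
    tolerance), so no diff files are written even though the test is
    flagged as an image failure. All names and the 0.1 default are
    illustrative, not webkitpy's actual values.
    """
    if actual_hash == expected_hash:
        return False              # images identical: nothing to diff
    return diff_percent > tolerance  # only a real pixel diff produces files
```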
Comment 11 Ryosuke Niwa 2011-06-29 18:14:41 PDT
So it should block the bug 34984.
Comment 12 Adam Barth 2011-06-29 18:19:31 PDT
I see.  Ok.  Thanks.
Comment 13 Adam Barth 2011-06-29 19:50:43 PDT
Created attachment 99218 [details]
Work in progress
Comment 14 Adam Barth 2011-06-29 19:58:33 PDT
Created attachment 99222 [details]
Work in progress
Comment 15 Adam Barth 2011-06-29 21:27:13 PDT
Created attachment 99231 [details]
Work in progress
Comment 16 Adam Barth 2011-06-29 22:15:04 PDT
Created attachment 99236 [details]
Work in progress (might actually work)
Comment 17 Eric Seidel (no email) 2011-06-29 22:20:03 PDT
Comment on attachment 99236 [details]
Work in progress (might actually work)

View in context: https://bugs.webkit.org/attachment.cgi?id=99236&action=review

I like the idea.  I'd have to review it in the morning.

> Tools/Scripts/webkitpy/layout_tests/port/webkit.py:143
> +            # FIXME: It's not clear what we should return in this case.
> +            # Maybe we should throw an exception?
>              return True

Seems you could easily raise ValueError here.
Comment 18 Adam Barth 2011-06-29 22:34:29 PDT
Created attachment 99239 [details]
Patch
Comment 19 Dirk Pranke 2011-06-29 22:39:27 PDT
Looks plausible to me also. I have only skimmed the changes and need to review them more carefully as well.

Adding Tom Hudson to this since this is related to some work he was doing that would allow us to return more metrics than just true/false for the diff, so that we could sort results by the degree and type of differences. 

Tom, this patch has much if not all of the restructuring we'd probably want for the things you were working on ...
Comment 20 Eric Seidel (no email) 2011-06-29 22:56:17 PDT
Comment on attachment 99239 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=99239&action=review

> Tools/Scripts/webkitpy/layout_tests/run_webkit_tests_integrationtest.py:573
> +            def diff_image(self, expected_contents, actual_contents):
>                  self.tolerance_used_for_diff_image = self._options.tolerance
>                  return True

Isn't this supposed to return an image diff?
Comment 21 Adam Barth 2011-06-29 22:58:55 PDT
Comment on attachment 99239 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=99239&action=review

>> Tools/Scripts/webkitpy/layout_tests/run_webkit_tests_integrationtest.py:573
>>                  return True
> 
> Isn't this supposed to return an image diff?

This is some fake thing for testing something ridiculous.  I think it just has to return something that's not falsy.
Comment 22 Eric Seidel (no email) 2011-06-30 11:18:34 PDT
Comment on attachment 99239 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=99239&action=review

r- for nits.

> Tools/Scripts/webkitpy/layout_tests/run_webkit_tests_integrationtest.py:268
> +        res, buildbot_output, regular_output, user = logging_run(['--print-last-failures'], filesystem=fs)
> +        self.assertEqual(regular_output.get(), [u'failures/expected/checksum.html\n\n'])

I'm confused about what would prompt this test case to complain about checksum.html.  This is slightly concerning.  Would be nice to hear Dirk's thoughts on this.

> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:275
> -        if not expected_driver_outputs.image:
> +        if not expected_driver_output.image:
>              failures.append(test_failures.FailureMissingImage())

Why if the expected output is missing an image do we add FailureMissingImage here?

> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:281
> +        elif driver_output.image_hash != expected_driver_output.image_hash:
> +            driver_output.image_diff = self._port.diff_image(driver_output.image, expected_driver_output.image)
> +            if driver_output.image_diff:
> +                failures.append(test_failures.FailureImageHashMismatch())

It's a little subtle that this image_diff creation happens here.  We may need to rename this method or add further comments.

> Tools/Scripts/webkitpy/layout_tests/layout_package/test_result_writer.py:72
> +            # FIXME: This work should be done earlier in the pipeline.

Can you be more specific as to when?  You're saying that the image diffing should be done before writing out files?

> Tools/Scripts/webkitpy/layout_tests/layout_package/test_result_writer.py:265
> +        # FIXME: This seems like a text file, not a binary file.

All .diff/.patch files are binary, but if this is the diffs.html file (which cycles through the images) then yes, it seems like text rather than binary.

> Tools/Scripts/webkitpy/layout_tests/port/base.py:937
> +        self.image_diff = None  # image_diff gets filled in after construction.

It's odd that we even need to store this on driver_output.  It doesn't come from the driver.

> Tools/Scripts/webkitpy/layout_tests/port/chromium.py:179
> -                raise e
> +                raise

Yay!  I wonder how many other places we get this wrong.
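For reference, the distinction Eric is cheering here is between `raise e` and a bare `raise` inside an except block; a minimal standalone sketch (not the chromium.py code itself):

```python
def inner():
    # Stand-in for whatever raised the original exception.
    raise ValueError("boom")

def call_and_reraise():
    """A bare `raise` re-raises the active exception with its original
    traceback intact; in Python 2, `raise e` rebuilt the exception and
    discarded the frames below the except block, hiding where the error
    really happened. Illustrative sketch of the `raise e` -> `raise` fix.
    """
    try:
        inner()
    except ValueError:
        # Bare raise: the traceback still points into inner().
        raise
```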

> Tools/Scripts/webkitpy/layout_tests/port/chromium.py:182
> +            if exit_code == 1:
> +                result = self._filesystem.read_binary_file(diff_filename)

Although this seems awkward for Chromium, this makes much more sense for WebKit, which sends the binary over the wire instead of through the filesystem.

> Tools/Scripts/webkitpy/layout_tests/port/webkit.py:143
> +            # FIXME: It's not clear what we should return in this case.
> +            # Maybe we should throw an exception?
>              return True

I would raise ValueError (or any other error which seems reasonable for bad arguments).

> Tools/Scripts/webkitpy/layout_tests/port/webkit.py:145
> +        process = self._create_image_diff_process(expected_contents, actual_contents)

Maybe this should say "start"?  "create" blends with image_diff as "create_image_diff", as though you're asking for the process that creates the image diffs.

> Tools/Scripts/webkitpy/layout_tests/port/webkit.py:160
> +                             (len(actual_contents), actual_contents, len(expected_contents), expected_contents))

Did you mean to indent this again?

> Tools/Scripts/webkitpy/layout_tests/port/webkit.py:189
> +        if output.startswith('diff'):
> +            m = re.match('diff: (.+)% (passed|failed)', output)
> +            if m.group(2) == 'passed':
> +                return None
> +        return output

I wonder if ORWT saves the diffs even when they "pass"?
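The branch under review parses ImageDiff's one-line verdict; a standalone sketch of that logic, modeled on the quoted hunk rather than the final committed code:

```python
import re

def parse_image_diff_output(output):
    """Parse ImageDiff's verdict line, e.g. 'diff: 0.00% passed' or
    'diff: 1.23% failed'. Returns None when the images passed (so no
    diff is saved), otherwise returns the output unchanged. Modeled on
    the patch hunk above; a sketch, not the exact webkitpy code.
    """
    if output.startswith('diff'):
        m = re.match(r'diff: (.+)% (passed|failed)', output)
        if m and m.group(2) == 'passed':
            return None
    return output
```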
Comment 23 Adam Barth 2011-06-30 11:30:08 PDT
Comment on attachment 99239 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=99239&action=review

>> Tools/Scripts/webkitpy/layout_tests/run_webkit_tests_integrationtest.py:268
>> +        self.assertEqual(regular_output.get(), [u'failures/expected/checksum.html\n\n'])
> 
> I'm confused about what would prompt this test case to complain about checksum.html.  This is slightly concerning.  Would be nice to hear Dirk's thoughts on this.

I suspect this is because we change how this kind of failure is treated.  /me will investigate.

>> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:275
>>              failures.append(test_failures.FailureMissingImage())
> 
> Why if the expected output is missing an image do we add FailureMissingImage here?

Because that's what FailureMissingImage means!

>> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:281
>> +                failures.append(test_failures.FailureImageHashMismatch())
> 
> It's a little subtle that this image_diff creation is here.  We may need to rename this method or add further comments.

Ok.

>> Tools/Scripts/webkitpy/layout_tests/layout_package/test_result_writer.py:72
>> +            # FIXME: This work should be done earlier in the pipeline.
> 
> Can you be more specific as to when?  You're saying that the image diffing should be done before writing out files?

I've updated the comment.  We should compare image for ref and non-ref tests at the same layer.  That's all.

>> Tools/Scripts/webkitpy/layout_tests/layout_package/test_result_writer.py:265
>> +        # FIXME: This seems like a text file, not a binary file.
> 
> All .diff/.patch files are binary, but if this is the diffs.html file (which cycles through the images) then yes, it seems like text rather than binary.

I'm going to change that in a future patch.

>> Tools/Scripts/webkitpy/layout_tests/port/base.py:937
>> +        self.image_diff = None  # image_diff gets filled in after construction.
> 
> It's odd that we even need to store this on driver_output.  It doesn't come from the driver.

That seems like a problem for a future patch.

>> Tools/Scripts/webkitpy/layout_tests/port/webkit.py:143
>>              return True
> 
> I would raise ValueError (or any other error which seems reasonable for bad arguments).

I'm not sure what the consequences of that would be.  I'm inclined to leave the comment and worry about it later.

>> Tools/Scripts/webkitpy/layout_tests/port/webkit.py:145
>> +        process = self._create_image_diff_process(expected_contents, actual_contents)
> 
> Maybe this should say "start"?  "create" blends with image_diff as "create_image_diff", as though you're asking for the process that creates the image diffs.

Fixed.

>> Tools/Scripts/webkitpy/layout_tests/port/webkit.py:160
>> +                             (len(actual_contents), actual_contents, len(expected_contents), expected_contents))
> 
> Did you mean to indent this again?

Fixed.

>> Tools/Scripts/webkitpy/layout_tests/port/webkit.py:189
>> +        return output
> 
> I wonder if ORWT saves the diffs even when they "pass"?

The diffs don't get created in that case, which is the whole problem.
Comment 24 Adam Barth 2011-06-30 11:31:26 PDT
Created attachment 99340 [details]
Patch
Comment 25 Eric Seidel (no email) 2011-06-30 11:33:08 PDT
Comment on attachment 99340 [details]
Patch

OK.
Comment 26 Adam Barth 2011-06-30 11:37:22 PDT
Committed r90138: <http://trac.webkit.org/changeset/90138>
Comment 27 Dirk Pranke 2011-06-30 12:37:05 PDT
Comment on attachment 99340 [details]
Patch

Patch looks generally good. Lots of minor stuff that could be changed, but most if not all of it could be done in a follow-on patch. The only major thing I don't like is hanging the image_diff off of the driver_output. I would probably have R-'ed the patch if I thought that fixing that wouldn't complicate the patch even more, so hopefully we can clean that up soon rather than forget about it.

I don't think I saw anything that looked actually incorrect in the sense that it would break things.


> Tools/ChangeLog:18
> +

It would've been nice to do these two things in two different patches, since they're both nontrivial, but I'm not going to lose a lot of sleep over it.

> Tools/Scripts/webkitpy/layout_tests/run_webkit_tests_integrationtest.py:268
> +        self.assertEqual(regular_output.get(), [u'failures/expected/checksum.html\n\n'])

Both you and Eric seem to be unwrapping lines as you go along in these patches, as well as making actual changes.

Since I've had this feedback on my patches, I'll pass it along: it would be nice if you separated out the formatting changes from the semantic changes into different patches. In addition, since there's nothing actually wrong with the wrapped code, it seems like these changes are kinda unnecessary (one could argue that I would be just as justified in submitting patches that added the line breaks back in, because I find the old way easier to read ...).

> Tools/Scripts/webkitpy/layout_tests/run_webkit_tests_integrationtest.py:426
> +                'failures/unexpected/checksum-with-matching-image.html',

If this test no longer fails, we should rename it to passes/checksum-with-matching-image.html

> Tools/Scripts/webkitpy/layout_tests/run_webkit_tests_integrationtest.py:431
> +        self.assertTrue(fs.read_text_file('/tmp/layout-test-results/unexpected_results.json').find('"num_regressions":0') != -1)

res contains the # of unexpected failures. Just do self.assertEquals(res, 0).

> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:276
> +        if not expected_driver_output.image:

This is concerning ... if these typos weren't causing failures in the test suite before, then these lines weren't being executed. We should figure out why that is/was, and if it's no longer true.

Do you know whether they were executing before, and whether they are now?

> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:281
> +            driver_output.image_diff = self._port.diff_image(driver_output.image, expected_driver_output.image)

The driver isn't producing the image_diff, so I'm not pleased with hanging the image_diff off of it.

Can we return the diff separately alongside the failures? Maybe we should pass in the TestResult object in to the compare_* routines and modify that instead?

> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:283
> +                failures.append(test_failures.FailureImageHashMismatch())

Probably a separate patch, but I wonder if we should just get rid of the FailureImageHash* failure types and just have FailureImageMismatch ?

> Tools/Scripts/webkitpy/layout_tests/layout_package/test_result_writer.py:61
> +            writer.write_image_diff_files(driver_output.image_diff)

See above re: image_diff hanging off of driver_output. If we modify write_test_result() to take the TestResult object as a parameter instead of just the list of failures, then the changes I made above would allow you to just pull the existing image_diff off of that object. This can probably all be done in a follow-up patch, though.

> Tools/Scripts/webkitpy/layout_tests/layout_package/test_result_writer.py:75
> +                writer.write_image_diff_files(image_diff)

similar to above.

> Tools/Scripts/webkitpy/layout_tests/port/base.py:937
> +        self.image_diff = None  # image_diff gets filled in after construction.

see comments above.

> Tools/Scripts/webkitpy/layout_tests/port/chromium.py:142
>          # FIXME: need unit tests for this.

I think we can actually remove this comment, given that there are unit tests in port_testcase.py for this.

> Tools/Scripts/webkitpy/layout_tests/port/chromium_unittest.py:229
> +        self.assertEquals(None, port.diff_image("EXPECTED", "ACTUAL"))

I wonder why these tests are here rather than in port_testcase ... I don't see anything obviously chromium-specific about them. Maybe I wrote them before I wrote the port_testcase ones, and never moved them over.

> Tools/Scripts/webkitpy/layout_tests/port/port_testcase.py:235
> +        self.assertFalse(port.diff_image('', ''))

These should be testing against None explicitly. A port implementation that didn't get updated and returned False to indicate that there was an image diff would incorrectly pass here.

> Tools/Scripts/webkitpy/layout_tests/port/test.py:148
> +              actual_checksum='text-image-checksum_fail-checksum')

See comment way earlier ... this should be renamed if it no longer fails.

> Tools/Scripts/webkitpy/layout_tests/port/webkit.py:142
> +            # Maybe we should throw an exception?

Obviously, the tests expect you to return that there was no diff. I don't remember why I thought this was acceptable or correct, unfortunately :( Maybe there's a comment in the patch that added them?

> Tools/Scripts/webkitpy/layout_tests/port/webkit.py:190
> +        return output

The re-ordering of these branches looks correct, but it would be nice if we had tests that tested these cases one way or another to know that we were fixing and not introducing bugs. Oh, and I'd like a pony.
Comment 28 Adam Barth 2011-06-30 13:58:46 PDT
(In reply to comment #27)
> (From update of attachment 99340 [details])
> Patch looks generally good.

Thanks for looking at the patch.

> > Tools/Scripts/webkitpy/layout_tests/run_webkit_tests_integrationtest.py:268
> > +        self.assertEqual(regular_output.get(), [u'failures/expected/checksum.html\n\n'])
> 
> Both you and Eric seem to be unwrapping lines as you go along in these patches, as well as making actual changes.
> 
> Since I've had this feedback on my patches, I'll pass it along: it would be nice if you separated out the formatting changes from the semantic changes into different patches. In addition, since there's nothing actually wrong with the wrapped code, it seems like these changes are kinda unnecessary (one could argue that I would be just as justified in submitting patches that added the line breaks back in, because I find the old way easier to read ...).

We're following PEP8, with the exception of the line limit, where we're following WebKit style.

In any case, if our biggest problem is line length, then we're in good shape.

> > Tools/Scripts/webkitpy/layout_tests/run_webkit_tests_integrationtest.py:426
> > +                'failures/unexpected/checksum-with-matching-image.html',
> 
> If this test no longer fails, we should rename it to passes/checksum-with-matching-image.html

Ok.

> > Tools/Scripts/webkitpy/layout_tests/run_webkit_tests_integrationtest.py:431
> > +        self.assertTrue(fs.read_text_file('/tmp/layout-test-results/unexpected_results.json').find('"num_regressions":0') != -1)
> 
> res contains the # of unexpected failures. Just do self.assertEquals(res, 0).

The two seem interchangeable.  What's the advantage of one over the other?

> > Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:276
> > +        if not expected_driver_output.image:
> 
> This is concerning ... if these typos weren't causing failures in the test suite before, then these lines weren't being executed. We should figure out why that is/was, and if it's no longer true.

I just renamed a bound variable.

> Do you know why they were executing before and that they now are?
> 
> > Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:281
> > +            driver_output.image_diff = self._port.diff_image(driver_output.image, expected_driver_output.image)
> 
> The driver isn't producing the image_diff, so I'm not pleased with hanging the image_diff off of it.
> 
> Can we return the diff separately alongside the failures? Maybe we should pass in the TestResult object in to the compare_* routines and modify that instead?

Yeah, this isn't ideal, but that's not a blocking issue.  Hence, it's a problem for another day.

> > Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:283
> > +                failures.append(test_failures.FailureImageHashMismatch())
> 
> Probably a separate patch, but I wonder if we should just get rid of the FailureImageHash* failure types and just have FailureImageMismatch ?

We're going to have to rip out the multiple representations of the result of running a test and replace it with something consistent.  We should do that then.

> > Tools/Scripts/webkitpy/layout_tests/layout_package/test_result_writer.py:61
> > +            writer.write_image_diff_files(driver_output.image_diff)
> 
> See above re: image_diff hanging off of driver_output. If we modify write_test_result() to take the TestResult object as a parameter instead of just the list of failures, then the changes I made above would allow you to just pull the existing image_diff off of that object. This can probably all be done in a follow-up patch, though.

This all needs to be re-architected globally.

> > Tools/Scripts/webkitpy/layout_tests/layout_package/test_result_writer.py:75
> > +                writer.write_image_diff_files(image_diff)
> 
> similar to above.
> 
> > Tools/Scripts/webkitpy/layout_tests/port/base.py:937
> > +        self.image_diff = None  # image_diff gets filled in after construction.
> 
> see comments above.
> 
> > Tools/Scripts/webkitpy/layout_tests/port/chromium.py:142
> >          # FIXME: need unit tests for this.
> 
> I think we can actually remove this comment, given that there are unit tests in port_testcase.py for this.

Please feel free to write such a patch.

> > Tools/Scripts/webkitpy/layout_tests/port/chromium_unittest.py:229
> > +        self.assertEquals(None, port.diff_image("EXPECTED", "ACTUAL"))
> 
> I wonder why these tests are here rather than in port_testcase ... I don't see anything obviously chromium-specific about them. Maybe I wrote them before I wrote the port_testcase ones, and never moved them over.

Dunno.

> > Tools/Scripts/webkitpy/layout_tests/port/port_testcase.py:235
> > +        self.assertFalse(port.diff_image('', ''))
> 
> These should be testing against None explicitly. A port implementation that didn't get updated and returned False to indicate that there was a image diff would incorrectly pass here.

Why is that an incorrect pass?  Returning anything falsy is probably ok.

> > Tools/Scripts/webkitpy/layout_tests/port/test.py:148
> > +              actual_checksum='text-image-checksum_fail-checksum')
> 
> See comment way earlier ... this should be renamed if it no longer fails.
> 
> > Tools/Scripts/webkitpy/layout_tests/port/webkit.py:142
> > +            # Maybe we should throw an exception?
> 
> Obviously, the tests expect you to return that there was no diff. I don't remember why I thought this was acceptable or correct, unfortunately :( Maybe there's a comment in the patch that added them?

The problem is that they're returning that there is a diff (which makes very little sense)!

> > Tools/Scripts/webkitpy/layout_tests/port/webkit.py:190
> > +        return output
> 
> The re-ordering of these branches looks correct, but it would be nice if we had tests that tested these cases one way or another to know that we were fixing and not introducing bugs. Oh, and I'd like a pony.

I don't think I changed any behavior here.
Comment 29 Dirk Pranke 2011-06-30 15:13:58 PDT
(In reply to comment #28)
> (In reply to comment #27)
> > (From update of attachment 99340 [details] [details])
> > Patch looks generally good.
> 
> Thanks for looking at the patch.
> 
> > > Tools/Scripts/webkitpy/layout_tests/run_webkit_tests_integrationtest.py:268
> > > +        self.assertEqual(regular_output.get(), [u'failures/expected/checksum.html\n\n'])
> > 
> > Both you and Eric seem to be unwrapping lines as you go along in these patches, as well as making actual changes.
> > 
> > Since I've had this feedback on my patches, I'll pass it along: it would be nice if you separated out the formatting changes from the semantic changes into different patches. In addition, since there's nothing actually wrong with the wrapped code, it seems like these changes are kinda unnecessary (one could argue that I would be just as justified in submitting patches that added the line breaks back in, because I find the old way easier to read ...).
> 
> We're following PEP8, with the exception of the line limit, where we're following WebKit style.
> 
> In any case, if our biggest problem is line length, then we're in good shape.
> 

Yes. I'm mostly just teasing. 

> > > Tools/Scripts/webkitpy/layout_tests/run_webkit_tests_integrationtest.py:426
> > > +                'failures/unexpected/checksum-with-matching-image.html',
> > 
> > If this test no longer fails, we should rename it to passes/checksum-with-matching-image.html
> 
> Ok.
> 
> > > Tools/Scripts/webkitpy/layout_tests/run_webkit_tests_integrationtest.py:431
> > > +        self.assertTrue(fs.read_text_file('/tmp/layout-test-results/unexpected_results.json').find('"num_regressions":0') != -1)
> > 
> > res contains the # of unexpected failures. Just do self.assertEquals(res, 0).
> 
> The two seem interchangeable.  What's the advantage of one over the other?
> 

It's shorter, faster, clearer (IMO), and does not have a dependency on the format of the unexpected_results.json file.


> > > Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:276
> > > +        if not expected_driver_output.image:
> > 
> > This is concerning ... if these typos weren't causing failures in the test suite before, then these lines weren't being executed. We should figure out why that is/was, and if it's no longer true.
> 
> I just renamed a bound variable.
> 

Yes, my point was that you found what are at least gaps in test coverage and perhaps actual bugs. We should figure out what those are.

> > Do you know whether they were executing before, and whether they are now?
> > 
> > > Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:281
> > > +            driver_output.image_diff = self._port.diff_image(driver_output.image, expected_driver_output.image)
> > 
> > The driver isn't producing the image_diff, so I'm not pleased with hanging the image_diff off of it.
> > 
> > Can we return the diff separately alongside the failures? Maybe we should pass in the TestResult object in to the compare_* routines and modify that instead?
> 
> Yeah, this isn't ideal, but that's not a blocking issue.  Hence, it's a problem for another day.
> 
> > > Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:283
> > > +                failures.append(test_failures.FailureImageHashMismatch())
> > 
> > Probably a separate patch, but I wonder if we should just get rid of the FailureImageHash* failure types and just have FailureImageMismatch ?
> 
> We're going to have to rip out the multiple representations of the result of running a test and replace it with something consistent.  We should do that then.
> 
> > > Tools/Scripts/webkitpy/layout_tests/layout_package/test_result_writer.py:61
> > > +            writer.write_image_diff_files(driver_output.image_diff)
> > 
> > See above re: image_diff hanging off of driver_output. If we modify write_test_result() to take the TestResult object as a parameter instead of just the list of failures, then the changes I made above would allow you to just pull the existing image_diff off of that object. This can probably all be done in a follow-up patch, though.
> 
> This all needs to be re-architected globally.

I don't think you should dismiss my comments quite so lightly. You are introducing a design flaw in the code (by confusing what DriverOutput is doing). My suggestions would be straightforward to fix that and would hardly be a "global re-architecting".

> 
> > > Tools/Scripts/webkitpy/layout_tests/layout_package/test_result_writer.py:75
> > > +                writer.write_image_diff_files(image_diff)
> > 
> > similar to above.
> > 
> > > Tools/Scripts/webkitpy/layout_tests/port/base.py:937
> > > +        self.image_diff = None  # image_diff gets filled in after construction.
> > 
> > see comments above.
> > 
> > > Tools/Scripts/webkitpy/layout_tests/port/chromium.py:142
> > >          # FIXME: need unit tests for this.
> > 
> > I think we can actually remove this comment, given that there are unit tests in port_testcase.py for this.
> 
> Please feel to write such a patch.
> 
> > > Tools/Scripts/webkitpy/layout_tests/port/chromium_unittest.py:229
> > > +        self.assertEquals(None, port.diff_image("EXPECTED", "ACTUAL"))
> > 
> > I wonder why these tests are here rather than in port_testcase ... I don't see anything obviously chromium-specific about them. Maybe I wrote them before I wrote the port_testcase ones, and never moved them over.
> 
> Dunno.
> 
> > > Tools/Scripts/webkitpy/layout_tests/port/port_testcase.py:235
> > > +        self.assertFalse(port.diff_image('', ''))
> > 
> > These should be testing against None explicitly. A port implementation that didn't get updated and returned False to indicate that there was an image diff would incorrectly pass here.
> 
> Why is that an incorrect pass?  Returning anything falsy is probably ok.
> 

The point is that by testing against False, you are missing cases where the routine returns False instead of a string or None (certainly a bug), and cases where it returns a diff of '' instead of None (probably a bug).
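Dirk's distinction can be demonstrated with a small unittest sketch (the test case and values below are hypothetical, not part of webkitpy):

```python
import unittest

class DiffImageContractTest(unittest.TestCase):
    """assertFalse accepts any falsy value, so a port that returned False
    or '' instead of None (both likely bugs) would still pass such a
    check. Testing against None explicitly catches them. The literal
    values below stand in for hypothetical port return values."""

    def test_assertFalse_is_too_permissive(self):
        for falsy_return in (None, False, ''):
            self.assertFalse(falsy_return)  # all three pass, buggy or not

    def test_explicit_none_check_is_stricter(self):
        self.assertIsNone(None)              # correct "no diff" value
        with self.assertRaises(AssertionError):
            self.assertIsNone(False)         # buggy return is caught
```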

> > > Tools/Scripts/webkitpy/layout_tests/port/test.py:148
> > > +              actual_checksum='text-image-checksum_fail-checksum')
> > 
> > See comment way earlier ... this should be renamed if it no longer fails.
> > 
> > > Tools/Scripts/webkitpy/layout_tests/port/webkit.py:142
> > > +            # Maybe we should throw an exception?
> > 
> > Obviously, the tests expect you to return that there was no diff. I don't remember why I thought this was acceptable or correct, unfortunately :( Maybe there's a comment in the patch that added them?
> 
> The problem is that they're returning that there is a diff (which makes very little sense)!
> 

If one file exists and is a valid image, and the other exists but is empty (what this branch is testing), I would say that that is a (sensible) diff.

> > > Tools/Scripts/webkitpy/layout_tests/port/webkit.py:190
> > > +        return output
> > 
> > The re-ordering of these branches looks correct, but it would be nice if we had tests that tested these cases one way or another to know that we were fixing and not introducing bugs. Oh, and I'd like a pony.
> 
> I don't think I changed any behavior here.

Well, you clearly changed the logic of the code, and that logic should be visibly (testably) different, even if current tests don't happen to illustrate that. It might be nice to have such tests, but I don't think it's super important (hence the pony comment).