Bug 63494 - REGRESSION(r84294): new-run-webkit-tests results.html generates links to diffs.html or diff.png that don't exist
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
Importance: P2 Normal
Assignee: Adam Barth
Blocks: 34984
Reported: 2011-06-27 16:51 PDT by Ryosuke Niwa
Modified: 2011-06-30 15:13 PDT (History)
CC List: 6 users



Attachments
Work in progress (2.64 KB, patch)
2011-06-29 19:50 PDT, Adam Barth
no flags
Work in progress (675 bytes, patch)
2011-06-29 19:58 PDT, Adam Barth
no flags
Work in progress (20.06 KB, patch)
2011-06-29 21:27 PDT, Adam Barth
no flags
Work in progress (might actually work) (23.16 KB, patch)
2011-06-29 22:15 PDT, Adam Barth
no flags
Patch (25.48 KB, patch)
2011-06-29 22:34 PDT, Adam Barth
no flags
Patch (25.65 KB, patch)
2011-06-30 11:31 PDT, Adam Barth
eric: review+

Description Ryosuke Niwa 2011-06-27 16:51:06 PDT
Tools/Scripts/new-run-webkit-tests --debug --pixel does not generate -diffs.html or -diff.png for image failures on the Mac port

I can see links in results.html, and the corresponding directories under layout-test-results do contain -expected.png and -actual.png, but there are no -diffs.html or -diff.png files.
Comment 1 Ryosuke Niwa 2011-06-27 17:05:18 PDT
It appears that diffs.html and diff.png aren't generated when tests fail due to checksum differences.
Comment 2 Ryosuke Niwa 2011-06-27 17:16:45 PDT
At this point I'm pretty certain this is a bug in new results.html page.
Comment 3 Dirk Pranke 2011-06-27 17:21:03 PDT
On further investigation (e.g., fast/box-shadow/basic-shadows.html), we're getting cases where the checksums don't match but the images do match, as reported by ImageDiff.

So, no -diff files will be created. However, the test is still reported as failing. I think the json results generator doesn't know how to distinguish the different kinds of failures, and so it reports the generic Image failure.

We might need to add another flag to the test_dict in manager.py: summarize_results() to handle this case.

Once again, the attempt to map the broader set of TestFailures onto the smaller set of expectation types might be hurting us.
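The failure-classification gap Dirk describes can be sketched roughly as follows (the function name and return values are illustrative, not webkitpy's actual API):

```python
def classify_image_result(actual_hash, expected_hash, images_match):
    """Classify an image comparison outcome.

    `images_match` stands in for ImageDiff's verdict; the hashes are the
    embedded checksums. Illustrative sketch, not webkitpy code.
    """
    if actual_hash == expected_hash:
        return "pass"
    if images_match:
        # Checksums differ but ImageDiff reports a match: no -diff files
        # are written, yet the json results generator still reports a
        # generic image failure -- the case this bug is about.
        return "hash-mismatch-but-images-match"
    return "image-failure"
```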
Comment 4 Adam Barth 2011-06-29 14:08:12 PDT
Is someone stepping up to own this issue?  Ojan?
Comment 5 Ojan Vafai 2011-06-29 16:54:57 PDT
(In reply to comment #3)
> On further investigation (e.g., fast/box-shadow/basic-shadows.html), we're getting cases where the checksums don't match but the images do match, as reported by ImageDiff.

Hm. That should not happen now that checksums are embedded in pngs!

> So, no -diff files will be created. However, the test is still reported as failing. I think the json results generator doesn't know how to distinguish the different kinds of failures, and so it reports the generic Image failure.
> 
> We might need to add another flag to the test_dict in manager.py: summarize_results() to handle this case.
> 
> Once again, the attempt to map the broader set of TestFailures onto the smaller set of expectation types might be hurting us.

The right solution IMO, is to make the results types serialized to the JSON file have a one-to-one mapping with the TestFailure types. That would also help cleanup code.

(In reply to comment #4)
> Is someone stepping up to own this issue.  Ojan?

I won't realistically get to it anytime soon.
Comment 6 Dirk Pranke 2011-06-29 17:12:36 PDT
(In reply to comment #5)
> (In reply to comment #3)
> > On further investigation (e.g., fast/box-shadow/basic-shadows.html), we're getting cases where the checksums don't match but the images do match, as reported by ImageDiff.
> 
> Hm. That should not happen now that checksums are embedded in pngs!
> 

Tony pointed out a difference a week or two ago where the Apple Mac ImageDiff implementation will report some PNGs as identical even when they are not bit-for-bit identical (i.e., even when --tolerance=0.0 is specified).

I think that that's a bug in ImageDiff that should be fixed, but perhaps that is debatable. Regardless, that is probably what is causing this.

> The right solution IMO, is to make the results types serialized to the JSON file have a one-to-one mapping with the TestFailure types. That would also help cleanup code.

That would probably work, and certainly simplify some things.
Comment 7 Adam Barth 2011-06-29 17:13:46 PDT
Ojan, if you're not going to fix your regression, should we roll out your patch?
Comment 8 Adam Barth 2011-06-29 18:02:20 PDT
Hum...  That patch is from 04/19/2011 14:18:52 (2 months ago).  :(
Comment 9 Adam Barth 2011-06-29 18:06:23 PDT
Reading this more carefully, this doesn't need to block Bug 34984.  This appears to happen only in the case where the images differ bit-by-bit but ImageDiff says they match, which should be rare.
Comment 10 Ryosuke Niwa 2011-06-29 18:11:45 PDT
(In reply to comment #9)
> Reading this more carefully, this doesn't need to block Bug 34984.  This appears to happen only in the case where the images differ bit-by-bit but ImageDiff says they match, which should be rare.

It happens on Mac port all the time because it has the default non-zero tolerance.
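In other words, with a nonzero default tolerance the diff-file decision looks roughly like this (names and the default value are illustrative, not the Mac port's actual code):

```python
def needs_diff_files(actual_hash, expected_hash, diff_percent, tolerance=0.1):
    """Decide whether -diff.png/-diffs.html should be written.

    With a nonzero tolerance (the Mac port's default), checksums can
    differ while ImageDiff still reports a match (diff_percent within
    tolerance), so no diff files are written even though the test is
    flagged as an image failure. All names and the 0.1 default are
    illustrative, not webkitpy's actual values.
    """
    if actual_hash == expected_hash:
        return False              # images identical: nothing to diff
    return diff_percent > tolerance  # only a real pixel diff produces files
```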
Comment 11 Ryosuke Niwa 2011-06-29 18:14:41 PDT
So it should block the bug 34984.
Comment 12 Adam Barth 2011-06-29 18:19:31 PDT
I see.  Ok.  Thanks.
Comment 13 Adam Barth 2011-06-29 19:50:43 PDT
Created attachment 99218 [details]
Work in progress
Comment 14 Adam Barth 2011-06-29 19:58:33 PDT
Created attachment 99222 [details]
Work in progress
Comment 15 Adam Barth 2011-06-29 21:27:13 PDT
Created attachment 99231 [details]
Work in progress
Comment 16 Adam Barth 2011-06-29 22:15:04 PDT
Created attachment 99236 [details]
Work in progress (might actually work)
Comment 17 Eric Seidel (no email) 2011-06-29 22:20:03 PDT
Comment on attachment 99236 [details]
Work in progress (might actually work)

View in context: https://bugs.webkit.org/attachment.cgi?id=99236&action=review

I like the idea.  I'd have to review it in the morning.

> Tools/Scripts/webkitpy/layout_tests/port/webkit.py:143
> +            # FIXME: It's not clear what we should return in this case.
> +            # Maybe we should throw an exception?
>              return True

Seems you could easily raise ValueError here.
Comment 18 Adam Barth 2011-06-29 22:34:29 PDT
Created attachment 99239 [details]
Patch
Comment 19 Dirk Pranke 2011-06-29 22:39:27 PDT
Looks plausible to me also. I have only skimmed the changes and need to review them more carefully as well.

Adding Tom Hudson to this since this is related to some work he was doing that would allow us to return more metrics than just true/false for the diff, so that we could sort results by the degree and type of differences. 

Tom, this patch has much if not all of the restructuring we'd probably want for the things you were working on ...
Comment 20 Eric Seidel (no email) 2011-06-29 22:56:17 PDT
Comment on attachment 99239 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=99239&action=review

> Tools/Scripts/webkitpy/layout_tests/run_webkit_tests_integrationtest.py:573
> +            def diff_image(self, expected_contents, actual_contents):
>                  self.tolerance_used_for_diff_image = self._options.tolerance
>                  return True

Isn't this supposed to return an image diff?
Comment 21 Adam Barth 2011-06-29 22:58:55 PDT
Comment on attachment 99239 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=99239&action=review

>> Tools/Scripts/webkitpy/layout_tests/run_webkit_tests_integrationtest.py:573
>>                  return True
> 
> Isn't this supposed to return an image diff?

This is some fake thing for testing something ridiculous.  I think it just has to return something that's not falsy.
Comment 22 Eric Seidel (no email) 2011-06-30 11:18:34 PDT
Comment on attachment 99239 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=99239&action=review

r- for nits.

> Tools/Scripts/webkitpy/layout_tests/run_webkit_tests_integrationtest.py:268
> +        res, buildbot_output, regular_output, user = logging_run(['--print-last-failures'], filesystem=fs)
> +        self.assertEqual(regular_output.get(), [u'failures/expected/checksum.html\n\n'])

I'm confused about what would prompt this test case to complain about checksum.html.  This is slightly concerning.  Would be nice to hear Dirk's thoughts on this.

> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:275
> -        if not expected_driver_outputs.image:
> +        if not expected_driver_output.image:
>              failures.append(test_failures.FailureMissingImage())

Why if the expected output is missing an image do we add FailureMissingImage here?

> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:281
> +        elif driver_output.image_hash != expected_driver_output.image_hash:
> +            driver_output.image_diff = self._port.diff_image(driver_output.image, expected_driver_output.image)
> +            if driver_output.image_diff:
> +                failures.append(test_failures.FailureImageHashMismatch())

It's a little subtle that this image_diff creation happens here.  We may need to rename this method or add further comments.

> Tools/Scripts/webkitpy/layout_tests/layout_package/test_result_writer.py:72
> +            # FIXME: This work should be done earlier in the pipeline.

Can you be more specific as to when?  You're saying that the image diffing should be done before writing out files?

> Tools/Scripts/webkitpy/layout_tests/layout_package/test_result_writer.py:265
> +        # FIXME: This seems like a text file, not a binary file.

All .diff/.patch files are binary, but if this is the diffs.html file (which cycles through the images) then yes, it seems like text rather than binary.

> Tools/Scripts/webkitpy/layout_tests/port/base.py:937
> +        self.image_diff = None  # image_diff gets filled in after construction.

It's odd that we even need to store this on driver_output.  It doesn't come from the driver.

> Tools/Scripts/webkitpy/layout_tests/port/chromium.py:179
> -                raise e
> +                raise

Yay!  I wonder how many other places we get this wrong.
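For reference, the distinction Eric is cheering here is between `raise e` and a bare `raise` inside an except block; a minimal standalone sketch (not the chromium.py code itself):

```python
def inner():
    # Stand-in for whatever raised the original exception.
    raise ValueError("boom")

def call_and_reraise():
    """A bare `raise` re-raises the active exception with its original
    traceback intact; in Python 2, `raise e` rebuilt the exception and
    discarded the frames below the except block, hiding where the error
    really happened. Illustrative sketch of the `raise e` -> `raise` fix.
    """
    try:
        inner()
    except ValueError:
        # Bare raise: the traceback still points into inner().
        raise
```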

> Tools/Scripts/webkitpy/layout_tests/port/chromium.py:182
> +            if exit_code == 1:
> +                result = self._filesystem.read_binary_file(diff_filename)

Although this seems awkward for Chromium, this makes much more sense for WebKit, which sends the binary over the wire instead of through the filesystem.

> Tools/Scripts/webkitpy/layout_tests/port/webkit.py:143
> +            # FIXME: It's not clear what we should return in this case.
> +            # Maybe we should throw an exception?
>              return True

I would raise ValueError (or any other error which seems reasonable for bad arguments).

> Tools/Scripts/webkitpy/layout_tests/port/webkit.py:145
> +        process = self._create_image_diff_process(expected_contents, actual_contents)

Maybe this should say "start"?  "create" blends with image_diff as "create_image_diff", as though you're asking for the process that creates the image diffs.

> Tools/Scripts/webkitpy/layout_tests/port/webkit.py:160
> +                             (len(actual_contents), actual_contents, len(expected_contents), expected_contents))

Did you mean to indent this again?

> Tools/Scripts/webkitpy/layout_tests/port/webkit.py:189
> +        if output.startswith('diff'):
> +            m = re.match('diff: (.+)% (passed|failed)', output)
> +            if m.group(2) == 'passed':
> +                return None
> +        return output

I wonder if ORWT saves the diffs even when they "pass"?
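The branch under review parses ImageDiff's one-line verdict; a standalone sketch of that logic, modeled on the quoted hunk rather than the final committed code:

```python
import re

def parse_image_diff_output(output):
    """Parse ImageDiff's verdict line, e.g. 'diff: 0.00% passed' or
    'diff: 1.23% failed'. Returns None when the images passed (so no
    diff is saved), otherwise returns the output unchanged. Modeled on
    the patch hunk above; a sketch, not the exact webkitpy code.
    """
    if output.startswith('diff'):
        m = re.match(r'diff: (.+)% (passed|failed)', output)
        if m and m.group(2) == 'passed':
            return None
    return output
```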
Comment 23 Adam Barth 2011-06-30 11:30:08 PDT
Comment on attachment 99239 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=99239&action=review

>> Tools/Scripts/webkitpy/layout_tests/run_webkit_tests_integrationtest.py:268
>> +        self.assertEqual(regular_output.get(), [u'failures/expected/checksum.html\n\n'])
> 
> I'm confused about what would prompt this test case to complain about checksum.html.  This is slightly concerning.  Would be nice to hear Dirk's thoughts on this.

I suspect this is because we change how this kind of failure is treated.  /me will investigate.

>> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:275
>>              failures.append(test_failures.FailureMissingImage())
> 
> Why if the expected output is missing an image do we add FailureMissingImage here?

Because that's what FailureMissingImage means!

>> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:281
>> +                failures.append(test_failures.FailureImageHashMismatch())
> 
> It's a little subtle that this image_diff creation is here.  We may need to rename this method or add further comments.

Ok.

>> Tools/Scripts/webkitpy/layout_tests/layout_package/test_result_writer.py:72
>> +            # FIXME: This work should be done earlier in the pipeline.
> 
> Can you be more specific as to when?  You're saying that the image diffing should be done before writing out files?

I've updated the comment.  We should compare image for ref and non-ref tests at the same layer.  That's all.

>> Tools/Scripts/webkitpy/layout_tests/layout_package/test_result_writer.py:265
>> +        # FIXME: This seems like a text file, not a binary file.
> 
> All .diff/.patch files are binary, but if this is the diffs.html file (which cycles through the images) then yes, it seems like text rather than binary.

I'm going to change that in a future patch.

>> Tools/Scripts/webkitpy/layout_tests/port/base.py:937
>> +        self.image_diff = None  # image_diff gets filled in after construction.
> 
> It's odd that we even need to store this on driver_output.  It doesn't come from the driver.

That seems like a problem for a future patch.

>> Tools/Scripts/webkitpy/layout_tests/port/webkit.py:143
>>              return True
> 
> I would raise ValueError (or any other error which seems reasonable for bad arguments).

I'm not sure what the consequences of that would be.  I'm inclined to leave the comment and worry about it later.

>> Tools/Scripts/webkitpy/layout_tests/port/webkit.py:145
>> +        process = self._create_image_diff_process(expected_contents, actual_contents)
> 
> Maybe this should say "start"?  "create" blends with image_diff as "create_image_diff", as though you're asking for the process that creates the image diffs.

Fixed.

>> Tools/Scripts/webkitpy/layout_tests/port/webkit.py:160
>> +                             (len(actual_contents), actual_contents, len(expected_contents), expected_contents))
> 
> Did you mean to indent this again?

Fixed.

>> Tools/Scripts/webkitpy/layout_tests/port/webkit.py:189
>> +        return output
> 
> I wonder if ORWT saves the diffs even when they "pass"?

The diffs don't get created in that case, which is the whole problem.
Comment 24 Adam Barth 2011-06-30 11:31:26 PDT
Created attachment 99340 [details]
Patch
Comment 25 Eric Seidel (no email) 2011-06-30 11:33:08 PDT
Comment on attachment 99340 [details]
Patch

OK.
Comment 26 Adam Barth 2011-06-30 11:37:22 PDT
Committed r90138: <http://trac.webkit.org/changeset/90138>
Comment 27 Dirk Pranke 2011-06-30 12:37:05 PDT
Comment on attachment 99340 [details]
Patch

Patch looks generally good. Lots of minor stuff that could be changed, but most if not all of it could be done in a follow-on patch. The only major thing I don't like is hanging the image_diff off of the driver_output. I would probably have R-'ed the patch if I thought that fixing that wouldn't complicate the patch even more, so hopefully we can clean that up soon rather than forget about it.

I don't think I saw anything that looked actually incorrect in the sense that it would break things.


> Tools/ChangeLog:18
> +

It would've been nice to do these two things in two different patches, since they're both nontrivial, but I'm not going to lose a lot of sleep over it.

> Tools/Scripts/webkitpy/layout_tests/run_webkit_tests_integrationtest.py:268
> +        self.assertEqual(regular_output.get(), [u'failures/expected/checksum.html\n\n'])

Both you and Eric seem to be unwrapping lines as you go along in these patches, as well as making actual changes.

Since I've had this feedback on my patches, I'll pass it along: it would be nice if you separated out the formatting changes from the semantic changes into different patches. In addition, since there's nothing actually wrong with the wrapped code, it seems like these changes are kinda unnecessary (one could argue that I would be just as justified in submitting patches that added the line breaks back in, because I find the old way easier to read ...).

> Tools/Scripts/webkitpy/layout_tests/run_webkit_tests_integrationtest.py:426
> +                'failures/unexpected/checksum-with-matching-image.html',

If this test no longer fails, we should rename it to passes/checksum-with-matching-image.html

> Tools/Scripts/webkitpy/layout_tests/run_webkit_tests_integrationtest.py:431
> +        self.assertTrue(fs.read_text_file('/tmp/layout-test-results/unexpected_results.json').find('"num_regressions":0') != -1)

res contains the # of unexpected failures. Just do self.assertEquals(res, 0).

> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:276
> +        if not expected_driver_output.image:

This is concerning ... if these typos weren't causing failures in the test suite before, then these lines weren't being executed. We should figure out why that is/was, and if it's no longer true.

Do you know whether they were executing before, and whether they are now?

> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:281
> +            driver_output.image_diff = self._port.diff_image(driver_output.image, expected_driver_output.image)

The driver isn't producing the image_diff, so I'm not pleased with hanging the image_diff off of it.

Can we return the diff separately alongside the failures? Maybe we should pass in the TestResult object in to the compare_* routines and modify that instead?

> Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:283
> +                failures.append(test_failures.FailureImageHashMismatch())

Probably a separate patch, but I wonder if we should just get rid of the FailureImageHash* failure types and just have FailureImageMismatch ?

> Tools/Scripts/webkitpy/layout_tests/layout_package/test_result_writer.py:61
> +            writer.write_image_diff_files(driver_output.image_diff)

See above re: image_diff hanging off of driver_output. If we modify write_test_result() to take the TestResult object as a parameter instead of just the list of failures, then the changes I made above would allow you to just pull the existing image_diff off of that object. This can probably all be done in a follow-up patch, though.

> Tools/Scripts/webkitpy/layout_tests/layout_package/test_result_writer.py:75
> +                writer.write_image_diff_files(image_diff)

similar to above.

> Tools/Scripts/webkitpy/layout_tests/port/base.py:937
> +        self.image_diff = None  # image_diff gets filled in after construction.

see comments above.

> Tools/Scripts/webkitpy/layout_tests/port/chromium.py:142
>          # FIXME: need unit tests for this.

I think we can actually remove this comment, given that there are unit tests in port_testcase.py for this.

> Tools/Scripts/webkitpy/layout_tests/port/chromium_unittest.py:229
> +        self.assertEquals(None, port.diff_image("EXPECTED", "ACTUAL"))

I wonder why these tests are here rather than in port_testcase ... I don't see anything obviously chromium-specific about them. Maybe I wrote them before I wrote the port_testcase ones, and never moved them over.

> Tools/Scripts/webkitpy/layout_tests/port/port_testcase.py:235
> +        self.assertFalse(port.diff_image('', ''))

These should be testing against None explicitly. A port implementation that didn't get updated and returned False to indicate that there was an image diff would incorrectly pass here.

> Tools/Scripts/webkitpy/layout_tests/port/test.py:148
> +              actual_checksum='text-image-checksum_fail-checksum')

See comment way earlier ... this should be renamed if it no longer fails.

> Tools/Scripts/webkitpy/layout_tests/port/webkit.py:142
> +            # Maybe we should throw an exception?

Obviously, the tests expect you to return that there was no diff. I don't remember why I thought this was acceptable or correct, unfortunately :( Maybe there's a comment in the patch that added them?

> Tools/Scripts/webkitpy/layout_tests/port/webkit.py:190
> +        return output

The re-ordering of these branches looks correct, but it would be nice if we had tests that tested these cases one way or another to know that we were fixing and not introducing bugs. Oh, and I'd like a pony.
Comment 28 Adam Barth 2011-06-30 13:58:46 PDT
(In reply to comment #27)
> (From update of attachment 99340 [details])
> Patch looks generally good.

Thanks for looking at the patch.

> > Tools/Scripts/webkitpy/layout_tests/run_webkit_tests_integrationtest.py:268
> > +        self.assertEqual(regular_output.get(), [u'failures/expected/checksum.html\n\n'])
> 
> Both you and Eric seem to be unwrapping lines as you go along in these patches, as well as making actual changes.
> 
> Since I've had this feedback on my patches, I'll pass it along: it would be nice if you separated out the formatting changes from the semantic changes into different patches. In addition, since there's nothing actually wrong with the wrapped code, it seems like these changes are kinda unnecessary (one could argue that I would be just as justified in submitting patches that added the line breaks back in, because I find the old way easier to read ...).

We're following PEP8, with the exception of the line limit, where we're following WebKit style.

In any case, if our biggest problem is line length, then we're in good shape.

> > Tools/Scripts/webkitpy/layout_tests/run_webkit_tests_integrationtest.py:426
> > +                'failures/unexpected/checksum-with-matching-image.html',
> 
> If this test no longer fails, we should rename it to passes/checksum-with-matching-image.html

Ok.

> > Tools/Scripts/webkitpy/layout_tests/run_webkit_tests_integrationtest.py:431
> > +        self.assertTrue(fs.read_text_file('/tmp/layout-test-results/unexpected_results.json').find('"num_regressions":0') != -1)
> 
> res contains the # of unexpected failures. Just do self.assertEquals(res, 0).

The two seem interchangeable.  What's the advantage of one over the other?

> > Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:276
> > +        if not expected_driver_output.image:
> 
> This is concerning ... if these typos weren't causing failures in the test suite before, then these lines weren't being executed. We should figure out why that is/was, and if it's no longer true.

I just renamed a bound variable.

> Do you know why they were executing before and that they now are?
> 
> > Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:281
> > +            driver_output.image_diff = self._port.diff_image(driver_output.image, expected_driver_output.image)
> 
> The driver isn't producing the image_diff, so I'm not pleased with hanging the image_diff off of it.
> 
> Can we return the diff separately alongside the failures? Maybe we should pass in the TestResult object in to the compare_* routines and modify that instead?

Yeah, this isn't ideal, but that's not a blocking issue.  Hence, it's a problem for another day.

> > Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:283
> > +                failures.append(test_failures.FailureImageHashMismatch())
> 
> Probably a separate patch, but I wonder if we should just get rid of the FailureImageHash* failure types and just have FailureImageMismatch ?

We're going to have to rip out the multiple representations of the result of running a test and replace it with something consistent.  We should do that then.

> > Tools/Scripts/webkitpy/layout_tests/layout_package/test_result_writer.py:61
> > +            writer.write_image_diff_files(driver_output.image_diff)
> 
> See above re: image_diff hanging off of driver_output. If we modify write_test_result() to take the TestResult object as a parameter instead of just the list of failures, then the changes I made above would allow you to just pull the existing image_diff off of that object. This can probably all be done in a follow-up patch, though.

This all needs to be re-architected globally.

> > Tools/Scripts/webkitpy/layout_tests/layout_package/test_result_writer.py:75
> > +                writer.write_image_diff_files(image_diff)
> 
> similar to above.
> 
> > Tools/Scripts/webkitpy/layout_tests/port/base.py:937
> > +        self.image_diff = None  # image_diff gets filled in after construction.
> 
> see comments above.
> 
> > Tools/Scripts/webkitpy/layout_tests/port/chromium.py:142
> >          # FIXME: need unit tests for this.
> 
> I think we can actually remove this comment, given that there are unit tests in port_testcase.py for this.

Please feel free to write such a patch.

> > Tools/Scripts/webkitpy/layout_tests/port/chromium_unittest.py:229
> > +        self.assertEquals(None, port.diff_image("EXPECTED", "ACTUAL"))
> 
> I wonder why these tests are here rather than in port_testcase ... I don't see anything obviously chromium-specific about them. Maybe I wrote them before I wrote the port_testcase ones, and never moved them over.

Dunno.

> > Tools/Scripts/webkitpy/layout_tests/port/port_testcase.py:235
> > +        self.assertFalse(port.diff_image('', ''))
> 
> These should be testing against None explicitly. A port implementation that didn't get updated and returned False to indicate that there was a image diff would incorrectly pass here.

Why is that an incorrect pass?  Returning anything falsy is probably ok.

> > Tools/Scripts/webkitpy/layout_tests/port/test.py:148
> > +              actual_checksum='text-image-checksum_fail-checksum')
> 
> See comment way earlier ... this should be renamed if it no longer fails.
> 
> > Tools/Scripts/webkitpy/layout_tests/port/webkit.py:142
> > +            # Maybe we should throw an exception?
> 
> Obviously, the tests expect you to return that there was no diff. I don't remember why I thought this was acceptable or correct, unfortunately :( Maybe there's a comment in the patch that added them?

The problem is that they're returning that there is a diff (which makes very little sense)!

> > Tools/Scripts/webkitpy/layout_tests/port/webkit.py:190
> > +        return output
> 
> The re-ordering of these branches looks correct, but it would be nice if we had tests that tested these cases one way or another to know that we were fixing and not introducing bugs. Oh, and I'd like a pony.

I don't think I changed any behavior here.
Comment 29 Dirk Pranke 2011-06-30 15:13:58 PDT
(In reply to comment #28)
> (In reply to comment #27)
> > (From update of attachment 99340 [details] [details])
> > Patch looks generally good.
> 
> Thanks for looking at the patch.
> 
> > > Tools/Scripts/webkitpy/layout_tests/run_webkit_tests_integrationtest.py:268
> > > +        self.assertEqual(regular_output.get(), [u'failures/expected/checksum.html\n\n'])
> > 
> > Both you and Eric seem to be unwrapping lines as you go along in these patches, as well as making actual changes.
> > 
> > Since I've had this feedback on my patches, I'll pass it along: it would be nice if you separated out the formatting changes from the semantic changes into different patches. In addition, since there's nothing actually wrong with the wrapped code, it seems like these changes are kinda unnecessary (one could argue that I would be just as justified in submitting patches that added the line breaks back in, because I find the old way easier to read ...).
> 
> We're following PEP8, with the exception of the line limit, where we're following WebKit style.
> 
> In any case, if our biggest problem is line length, then we're in good shape.
> 

Yes. I'm mostly just teasing. 

> > > Tools/Scripts/webkitpy/layout_tests/run_webkit_tests_integrationtest.py:426
> > > +                'failures/unexpected/checksum-with-matching-image.html',
> > 
> > If this test no longer fails, we should rename it to passes/checksum-with-matching-image.html
> 
> Ok.
> 
> > > Tools/Scripts/webkitpy/layout_tests/run_webkit_tests_integrationtest.py:431
> > > +        self.assertTrue(fs.read_text_file('/tmp/layout-test-results/unexpected_results.json').find('"num_regressions":0') != -1)
> > 
> > res contains the # of unexpected failures. Just do self.assertEquals(res, 0).
> 
> The two seem interchangeable.  What's the advantage of one over the other?
> 

It's shorter, faster, clearer (IMO), and does not have a dependency on the format of the unexpected_results.json file.


> > > Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:276
> > > +        if not expected_driver_output.image:
> > 
> > This is concerning ... if these typos weren't causing failures in the test suite before, then these lines weren't being executed. We should figure out why that is/was, and if it's no longer true.
> 
> I just renamed a bound variable.
> 

Yes, my point was that you found what are at least gaps in test coverage and perhaps actual bugs. We should figure out what those are.

> > Do you know whether they were executing before, and whether they are now?
> > 
> > > Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:281
> > > +            driver_output.image_diff = self._port.diff_image(driver_output.image, expected_driver_output.image)
> > 
> > The driver isn't producing the image_diff, so I'm not pleased with hanging the image_diff off of it.
> > 
> > Can we return the diff separately alongside the failures? Maybe we should pass in the TestResult object in to the compare_* routines and modify that instead?
> 
> Yeah, this isn't ideal, but that's not a blocking issue.  Hence, it's a problem for another day.
> 
> > > Tools/Scripts/webkitpy/layout_tests/layout_package/single_test_runner.py:283
> > > +                failures.append(test_failures.FailureImageHashMismatch())
> > 
> > Probably a separate patch, but I wonder if we should just get rid of the FailureImageHash* failure types and just have FailureImageMismatch ?
> 
> We're going to have to rip out the multiple representations of the result of running a test and replace it with something consistent.  We should do that then.
> 
> > > Tools/Scripts/webkitpy/layout_tests/layout_package/test_result_writer.py:61
> > > +            writer.write_image_diff_files(driver_output.image_diff)
> > 
> > See above re: image_diff hanging off of driver_output. If we modify write_test_result() to take the TestResult object as a parameter instead of just the list of failures, then the changes I made above would allow you to just pull the existing image_diff off of that object. This can probably all be done in a follow-up patch, though.
> 
> This all needs to be re-architected globally.

I don't think you should dismiss my comments quite so lightly. You are introducing a design flaw in the code (by confusing what DriverOutput is doing). My suggestions would be straightforward to fix that and would hardly be a "global re-architecting".

> 
> > > Tools/Scripts/webkitpy/layout_tests/layout_package/test_result_writer.py:75
> > > +                writer.write_image_diff_files(image_diff)
> > 
> > similar to above.
> > 
> > > Tools/Scripts/webkitpy/layout_tests/port/base.py:937
> > > +        self.image_diff = None  # image_diff gets filled in after construction.
> > 
> > see comments above.
> > 
> > > Tools/Scripts/webkitpy/layout_tests/port/chromium.py:142
> > >          # FIXME: need unit tests for this.
> > 
> > I think we can actually remove this comment, given that there are unit tests in port_testcase.py for this.
> 
> Please feel to write such a patch.
> 
> > > Tools/Scripts/webkitpy/layout_tests/port/chromium_unittest.py:229
> > > +        self.assertEquals(None, port.diff_image("EXPECTED", "ACTUAL"))
> > 
> > I wonder why these tests are here rather than in port_testcase ... I don't see anything obviously chromium-specific about them. Maybe I wrote them before I wrote the port_testcase ones, and never moved them over.
> 
> Dunno.
> 
> > > Tools/Scripts/webkitpy/layout_tests/port/port_testcase.py:235
> > > +        self.assertFalse(port.diff_image('', ''))
> > 
> > These should be testing against None explicitly. A port implementation that didn't get updated and returned False to indicate that there was an image diff would incorrectly pass here.
> 
> Why is that an incorrect pass?  Returning anything falsy is probably ok.
> 

The point is that by testing against False, you are missing cases where the routine returns False instead of a string or None (certainly a bug), and cases where it returns a diff of '' instead of None (probably a bug).
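Dirk's distinction can be demonstrated with a small unittest sketch (the test case and values below are hypothetical, not part of webkitpy):

```python
import unittest

class DiffImageContractTest(unittest.TestCase):
    """assertFalse accepts any falsy value, so a port that returned False
    or '' instead of None (both likely bugs) would still pass such a
    check. Testing against None explicitly catches them. The literal
    values below stand in for hypothetical port return values."""

    def test_assertFalse_is_too_permissive(self):
        for falsy_return in (None, False, ''):
            self.assertFalse(falsy_return)  # all three pass, buggy or not

    def test_explicit_none_check_is_stricter(self):
        self.assertIsNone(None)              # correct "no diff" value
        with self.assertRaises(AssertionError):
            self.assertIsNone(False)         # buggy return is caught
```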

> > > Tools/Scripts/webkitpy/layout_tests/port/test.py:148
> > > +              actual_checksum='text-image-checksum_fail-checksum')
> > 
> > See comment way earlier ... this should be renamed if it no longer fails.
> > 
> > > Tools/Scripts/webkitpy/layout_tests/port/webkit.py:142
> > > +            # Maybe we should throw an exception?
> > 
> > Obviously, the tests expect you to return that there was no diff. I don't remember why I thought this was acceptable or correct, unfortunately :( Maybe there's a comment in the patch that added them?
> 
> The problem is that they're returning that there is a diff (which makes very little sense)!
> 

If one file exists and is a valid image, and the other exists but is empty (what this branch is testing), I would say that that is a (sensible) diff.

> > > Tools/Scripts/webkitpy/layout_tests/port/webkit.py:190
> > > +        return output
> > 
> > The re-ordering of these branches looks correct, but it would be nice if we had tests that tested these cases one way or another to know that we were fixing and not introducing bugs. Oh, and I'd like a pony.
> 
> I don't think I changed any behavior here.

Well, you clearly changed the logic of the code, and that logic should be visibly (testably) different, even if current tests don't happen to illustrate that. It might be nice to have such tests, but I don't think it's super important (hence the pony comment).