ImageDiff is spewing: "error, test and reference image have different properties." It's very unclear what this means.
Can you point to an example (i.e. on the bots)? I don't see this when I run locally.
I got it when running pixel tests on Mountain Lion. Perhaps it's specific to that OS.
I've just filed bug 92934 to propagate errors back from ImageDiff. This won't address whatever your root problem is here, but it'll make it a whole lot less ignorable :).
I can still reproduce. My command is: ./Tools/Scripts/run-webkit-tests --debug --pixel-tests LayoutTests/compositing/ Sample output: compositing/animation/busy-indicator.html -> pixel hash failed (but pixel test still passes) [8/278] compositing/color-matching/image-color-matching.html failed unexpectedly (image diff) compositing/culling/clear-fixed-iframe.html : ImageDiff produced stderr output: error, test and reference image have different properties.
Created attachment 159567 [details] Improve error reporting
Comment on attachment 159567 [details] Improve error reporting http://trac.webkit.org/changeset/126101
So now I get: compositing/culling/clear-fixed-iframe.html : ImageDiff produced stderr output: Error: test and reference images differ in alpha. Test image does not have alpha, reference image has alpha. 12$ $ sips -g all ./LayoutTests/compositing/culling/clear-fixed-iframe-expected.png /Volumes/DataSSD/Development/apple/webkit/WebKit.git/LayoutTests/compositing/culling/clear-fixed-iframe-expected.png pixelWidth: 800 pixelHeight: 600 typeIdentifier: public.png format: png formatOptions: default dpiWidth: 72.000 dpiHeight: 72.000 samplesPerPixel: 3 bitsPerSample: 8 hasAlpha: no space: RGB So hasAlpha is NO in the PNG, but ImageDiff thinks it has alpha.
But CGImageGetAlphaInfo() is saying that the image alpha flags are kCGImageAlphaLast, implying non-premultplied RGBA.
createBitmapContext() creates a bitmap context with (kCGImageAlphaPremultipliedFirst | kCGBitmapByteOrder32Host), so I think we expect all the images to have alpha.
There's something I don't understand. DRT is dumping to stdout a PNG created from an image with alpha flags 3 (kCGImageAlphaLast). However, when ImageDiff reads this in from stdin, it gets an image with alpha flags 5 (kCGImageAlphaNoneSkipLast). This is assuming that ImageDiff is not swapping the images by mistake. The code seems to read the actual before the expected: if (imageSize > 0 && !actualImage) actualImage = createImageFromStdin(imageSize); else if (imageSize > 0 && !baselineImage) baselineImage = createImageFromStdin(imageSize); I'm not sure who's printing the expected image to stdin. Is it ImageDiffer in the python code?
(In reply to comment #10) > There's something I don't understand. DRT is dumping to stdout a PNG created from an image with alpha flags 3 (kCGImageAlphaLast). However, when ImageDiff reads this in from stdin, it gets an image with alpha flags 5 (kCGImageAlphaNoneSkipLast). > > This is assuming that ImageDiff is not swapping the images by mistake. The code seems to read the actual before the expected: > > if (imageSize > 0 && !actualImage) > actualImage = createImageFromStdin(imageSize); > else if (imageSize > 0 && !baselineImage) > baselineImage = createImageFromStdin(imageSize); > > I'm not sure who's printing the expected image to stdin. Is it ImageDiffer in the python code? Yes. See base.py:353 and port/image_diff.py:60.
I'm checksumming the data spat out by DRT (after the checksum injection), and the data ingested by ImageDiff, and they don't seem to match: Dumped PNG has hash efb0c7c9967868bc58f1c9c989f6878f Original image alpha flags: 2, new image alpha flags: 3 ImageDiff produced stderr output: image in imageDiff has hash af52930d6c48db65a6e957ad80869650 image in imageDiff has hash 6f08bd8f8505e7b86567159012c459cf
I'm pretty sure that _compare_image() is swapping the images by mistake: diff_result = self._port.diff_image(driver_output.image, expected_driver_output.image) def diff_image(self, expected_contents, actual_contents, tolerance): Note flipped order of params.
Re: (In reply to comment #12) > I'm checksumming the data spat out by DRT (after the checksum injection), and the data ingested by ImageDiff, and they don't seem to match: > > Dumped PNG has hash efb0c7c9967868bc58f1c9c989f6878f > Original image alpha flags: 2, new image alpha flags: 3 > ImageDiff produced stderr output: > image in imageDiff has hash af52930d6c48db65a6e957ad80869650 > image in imageDiff has hash 6f08bd8f8505e7b86567159012c459cf Ah, they do match, I was just getting my output out of order.. So the reason for the pixel comparison failures is that the expected image result files which have been committed by chromium people lack alpha. For example, LayoutTests/compositing/culling/clear-fixed-iframe-expected.png, committed in http://trac.webkit.org/changeset/109514, has no alpha. We need to make sure that expected image files generated on different platforms match. I have not yet looked at ref test failures.
Fix for flipped actual/expected; diff --git a/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py b/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py index 7379d97..45b47f1 100644 --- a/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py +++ b/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py @@ -264,7 +264,7 @@ class SingleTestRunner(object): elif not expected_driver_output.image_hash: failures.append(test_failures.FailureMissingImageHash()) elif driver_output.image_hash != expected_driver_output.image_hash: - diff_result = self._port.diff_image(driver_output.image, expected_driver_output.image) + diff_result = self._port.diff_image(expected_driver_output.image, driver_output.image) err_str = diff_result[2] # FIXME: see https://bugs.webkit.org/show_bug.cgi?id=94277 and # https://bugs.webkit.org/show_bug.cgi?id=81962; ImageDiff doesn't
Filed bug 94567 to land that patch.
Giving this bug to Dirk to ensure that Chromium tools dump PNG images with alpha. I also note that the color profiles in the Chromium-supplied images don't match the Mac ones. Maybe we should give up on the notion of platform-agnostic expected.png files?
(In reply to comment #15) > Fix for flipped actual/expected; > > diff --git a/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py b/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py > index 7379d97..45b47f1 100644 > --- a/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py > +++ b/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py > @@ -264,7 +264,7 @@ class SingleTestRunner(object): > elif not expected_driver_output.image_hash: > failures.append(test_failures.FailureMissingImageHash()) > elif driver_output.image_hash != expected_driver_output.image_hash: > - diff_result = self._port.diff_image(driver_output.image, expected_driver_output.image) > + diff_result = self._port.diff_image(expected_driver_output.image, driver_output.image) > err_str = diff_result[2] > # FIXME: see https://bugs.webkit.org/show_bug.cgi?id=94277 and > # https://bugs.webkit.org/show_bug.cgi?id=81962; ImageDiff doesn't So, single_test_runner currently passes expected, actual to base.Port.diff_image(), which is expecting actual, expected :(. However, base.Port.diff_image then flips around and calls calls image_differ(expected, actual), which writes actual, then expected to ImageDiff, which then reads actual, then expected. For reference, the signature in ChromiumPort.diff_image is (expected, actual). So, single_test_runner is consistent with ChromiumPort, which isn't surprising, since chromium actually uses pixel diffs regularly :) In short we should leave single_test_runner alone and fix base.py because it's the only one that isn't (expected, actual) in the python code (and I would argue that expected, actual is the order that make sense insofar as it is consistent with "diff old new"; I realize the non-chromium ImageDiff appears to want "diff new old", but that seems wrong.
(In reply to comment #14) > Re: (In reply to comment #12) > > I'm checksumming the data spat out by DRT (after the checksum injection), and the data ingested by ImageDiff, and they don't seem to match: > > > > Dumped PNG has hash efb0c7c9967868bc58f1c9c989f6878f > > Original image alpha flags: 2, new image alpha flags: 3 > > ImageDiff produced stderr output: > > image in imageDiff has hash af52930d6c48db65a6e957ad80869650 > > image in imageDiff has hash 6f08bd8f8505e7b86567159012c459cf > > Ah, they do match, I was just getting my output out of order.. > > So the reason for the pixel comparison failures is that the expected image result files which have been committed by chromium people lack alpha. For example, LayoutTests/compositing/culling/clear-fixed-iframe-expected.png, committed in http://trac.webkit.org/changeset/109514, has no alpha. > > We need to make sure that expected image files generated on different platforms match. > Adding Tony to this, since I've discussed this issue with him before. It is true that different ports have different interpretations of what the PNG format is supposed to be, and we are inconsistent. Given the aggressiveness of the baseline optimizer, this is probably bad, but I don't think this is insoluble. I think we just need to standardize on how we want to store (and compare) PNGs. If we did that, we could probably get rid of the port-specific ImageDiff code altogether. I had looked into using pure-python png decoding but gave up after realizing we had different conventions. IIRC, there's a reason the chromium port doesn't store alphas; it has something to do with the differences between our win and linux ports. > I have not yet looked at ref test failures.
I didn't find any issues with ref tests on Mountain Lion. When running nrwt with --pixel-tests (to force ref testing) on ML, things worked fine.
Also, I thought premultiplied alpha meant that the alpha was multiplied first? so, if you have #fff with an alpha of 0.5, do you store 0x80808080 or 0x00808080 ? I've never really understood this premultiplied algorithm ...
(In reply to comment #20) > I didn't find any issues with ref tests on Mountain Lion. When running nrwt with --pixel-tests (to force ref testing) on ML, things worked fine. How can that be? Aren't the same images being compared on Lion and ML?
(In reply to comment #18) > (In reply to comment #15) > > Fix for flipped actual/expected; > > > > diff --git a/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py b/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py > > index 7379d97..45b47f1 100644 > > --- a/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py > > +++ b/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py > > @@ -264,7 +264,7 @@ class SingleTestRunner(object): > > elif not expected_driver_output.image_hash: > > failures.append(test_failures.FailureMissingImageHash()) > > elif driver_output.image_hash != expected_driver_output.image_hash: > > - diff_result = self._port.diff_image(driver_output.image, expected_driver_output.image) > > + diff_result = self._port.diff_image(expected_driver_output.image, driver_output.image) > > err_str = diff_result[2] > > # FIXME: see https://bugs.webkit.org/show_bug.cgi?id=94277 and > > # https://bugs.webkit.org/show_bug.cgi?id=81962; ImageDiff doesn't > > So, single_test_runner currently passes expected, actual to base.Port.diff_image(), which is expecting actual, expected :(. However, base.Port.diff_image then flips around and calls calls image_differ(expected, actual), which writes actual, then expected to ImageDiff, which then reads actual, then expected. For reference, the signature in ChromiumPort.diff_image is (expected, actual). So, single_test_runner is consistent with ChromiumPort, which isn't surprising, since chromium actually uses pixel diffs regularly :) > > In short we should leave single_test_runner alone and fix base.py because it's the only one that isn't (expected, actual) in the python code (and I would argue that expected, actual is the order that make sense insofar as it is consistent with "diff old new"; I realize the non-chromium ImageDiff appears to want "diff new old", but that seems wrong. Argh. the code as currently checked in calls diff_image(actual, expected), of course. Basically we need to trace through the whole path to make sure they're consistent :(.
(In reply to comment #23) > Argh. the code as currently checked in calls diff_image(actual, expected), of course. Basically we need to trace through the whole path to make sure they're consistent :(. Named arguments FTW. (and this might not be a bad time to adopt them here!)
(In reply to comment #19) > If we did that, we could probably get rid of the port-specific ImageDiff code altogether. I had looked into using pure-python png decoding but gave up after realizing we had different conventions. I thought this was also slow. Seems like it would be. > IIRC, there's a reason the chromium port doesn't store alphas; it has something to do with the differences between our win and linux ports. http://trac.webkit.org/browser/trunk/Tools/DumpRenderTree/chromium/TestShell.cpp#L660 I didn't write the comment. I'm not sure what about text/form control drawing causes problems (brettw may know or remember).
(In reply to comment #25) > (In reply to comment #19) > > If we did that, we could probably get rid of the port-specific ImageDiff code altogether. I had looked into using pure-python png decoding but gave up after realizing we had different conventions. > > I thought this was also slow. Seems like it would be. > Yeah, probably. We'd need to benchmark it to see if there is a real impact, since we only do the diffs when the hashes don't match.
(In reply to comment #26) > (In reply to comment #25) > > (In reply to comment #19) > > > If we did that, we could probably get rid of the port-specific ImageDiff code altogether. I had looked into using pure-python png decoding but gave up after realizing we had different conventions. > > > > I thought this was also slow. Seems like it would be. > > > > Yeah, probably. We'd need to benchmark it to see if there is a real impact, since we only do the diffs when the hashes don't match. The perf of ImageDiff matters a lot less in a world in which we check in expected failures. Then the only times we'd be running this are when there are regressions on ToT or for flaky pixel failures. Seems unlikely to me that this would have a huge perf impact on the time of running the full test suite. But, there's never a harm in measuring.
I'm not actually planning to fix this in the near future, so someone else should feel free to take a look.
Simon is this still an issue, or have you fixed all of the non-alpha-having PNGs now that Chromium is gone?
(In reply to comment #29) > Simon is this still an issue, or have you fixed all of the non-alpha-having PNGs now that Chromium is gone? I have not, but I think we will eventually.
(In reply to comment #30) > (In reply to comment #29) > > Simon is this still an issue, or have you fixed all of the non-alpha-having PNGs now that Chromium is gone? > > I have not, but I think we will eventually. Okay, I'm gonna reassign this to you if that's okay, unless there's something I can do. It wouldn't be hard to go through and find all the bad PNGs and replace them, for example.