92578 – Lots of "error, test and reference image have different properties" in pixel test output because of alpha differences

Simon Fraser (smfr)

Reported 2012-07-28 10:47:01 PDT

ImageDiff is spewing: "error, test and reference image have different properties." It's very unclear what this means.

Ojan Vafai

Comment 1 2012-07-31 12:24:16 PDT

Can you point to an example (i.e. on the bots)? I don't see this when I run locally.

Simon Fraser (smfr)

Comment 2 2012-07-31 12:55:56 PDT

I got it when running pixel tests on Mountain Lion. Perhaps it's specific to that OS.

Dirk Pranke

Comment 3 2012-08-01 19:26:05 PDT

I've just filed bug 92934 to propagate errors back from ImageDiff. This won't address whatever your root problem is here, but it'll make it a whole lot less ignorable :).

Simon Fraser (smfr)

Comment 4 2012-08-20 17:01:18 PDT

I can still reproduce. My command is: ./Tools/Scripts/run-webkit-tests --debug --pixel-tests LayoutTests/compositing/ Sample output: compositing/animation/busy-indicator.html -> pixel hash failed (but pixel test still passes) [8/278] compositing/color-matching/image-color-matching.html failed unexpectedly (image diff) compositing/culling/clear-fixed-iframe.html : ImageDiff produced stderr output: error, test and reference image have different properties.

Simon Fraser (smfr)

Comment 5 2012-08-20 17:13:16 PDT

Created attachment 159567 [details] Improve error reporting

Simon Fraser (smfr)

Comment 6 2012-08-20 17:17:21 PDT

Comment on attachment 159567 [details] Improve error reporting http://trac.webkit.org/changeset/126101

Simon Fraser (smfr)

Comment 7 2012-08-20 17:22:47 PDT

So now I get: compositing/culling/clear-fixed-iframe.html : ImageDiff produced stderr output: Error: test and reference images differ in alpha. Test image does not have alpha, reference image has alpha. 12$ $ sips -g all ./LayoutTests/compositing/culling/clear-fixed-iframe-expected.png /Volumes/DataSSD/Development/apple/webkit/WebKit.git/LayoutTests/compositing/culling/clear-fixed-iframe-expected.png pixelWidth: 800 pixelHeight: 600 typeIdentifier: public.png format: png formatOptions: default dpiWidth: 72.000 dpiHeight: 72.000 samplesPerPixel: 3 bitsPerSample: 8 hasAlpha: no space: RGB So hasAlpha is NO in the PNG, but ImageDiff thinks it has alpha.

Simon Fraser (smfr)

Comment 8 2012-08-20 17:28:59 PDT

But CGImageGetAlphaInfo() is saying that the image alpha flags are kCGImageAlphaLast, implying non-premultplied RGBA.

Simon Fraser (smfr)

Comment 9 2012-08-20 18:30:05 PDT

createBitmapContext() creates a bitmap context with (kCGImageAlphaPremultipliedFirst | kCGBitmapByteOrder32Host), so I think we expect all the images to have alpha.

Simon Fraser (smfr)

Comment 10 2012-08-20 18:53:55 PDT

There's something I don't understand. DRT is dumping to stdout a PNG created from an image with alpha flags 3 (kCGImageAlphaLast). However, when ImageDiff reads this in from stdin, it gets an image with alpha flags 5 (kCGImageAlphaNoneSkipLast). This is assuming that ImageDiff is not swapping the images by mistake. The code seems to read the actual before the expected: if (imageSize > 0 && !actualImage) actualImage = createImageFromStdin(imageSize); else if (imageSize > 0 && !baselineImage) baselineImage = createImageFromStdin(imageSize); I'm not sure who's printing the expected image to stdin. Is it ImageDiffer in the python code?

Dirk Pranke

Comment 11 2012-08-20 19:15:29 PDT

(In reply to comment #10) > There's something I don't understand. DRT is dumping to stdout a PNG created from an image with alpha flags 3 (kCGImageAlphaLast). However, when ImageDiff reads this in from stdin, it gets an image with alpha flags 5 (kCGImageAlphaNoneSkipLast). > > This is assuming that ImageDiff is not swapping the images by mistake. The code seems to read the actual before the expected: > > if (imageSize > 0 && !actualImage) > actualImage = createImageFromStdin(imageSize); > else if (imageSize > 0 && !baselineImage) > baselineImage = createImageFromStdin(imageSize); > > I'm not sure who's printing the expected image to stdin. Is it ImageDiffer in the python code? Yes. See base.py:353 and port/image_diff.py:60.

Simon Fraser (smfr)

Comment 12 2012-08-20 19:24:25 PDT

I'm checksumming the data spat out by DRT (after the checksum injection), and the data ingested by ImageDiff, and they don't seem to match: Dumped PNG has hash efb0c7c9967868bc58f1c9c989f6878f Original image alpha flags: 2, new image alpha flags: 3 ImageDiff produced stderr output: image in imageDiff has hash af52930d6c48db65a6e957ad80869650 image in imageDiff has hash 6f08bd8f8505e7b86567159012c459cf

Simon Fraser (smfr)

Comment 13 2012-08-20 20:19:18 PDT

I'm pretty sure that _compare_image() is swapping the images by mistake: diff_result = self._port.diff_image(driver_output.image, expected_driver_output.image) def diff_image(self, expected_contents, actual_contents, tolerance): Note flipped order of params.

Simon Fraser (smfr)

Comment 14 2012-08-20 20:24:02 PDT

Re: (In reply to comment #12) > I'm checksumming the data spat out by DRT (after the checksum injection), and the data ingested by ImageDiff, and they don't seem to match: > > Dumped PNG has hash efb0c7c9967868bc58f1c9c989f6878f > Original image alpha flags: 2, new image alpha flags: 3 > ImageDiff produced stderr output: > image in imageDiff has hash af52930d6c48db65a6e957ad80869650 > image in imageDiff has hash 6f08bd8f8505e7b86567159012c459cf Ah, they do match, I was just getting my output out of order.. So the reason for the pixel comparison failures is that the expected image result files which have been committed by chromium people lack alpha. For example, LayoutTests/compositing/culling/clear-fixed-iframe-expected.png, committed in http://trac.webkit.org/changeset/109514, has no alpha. We need to make sure that expected image files generated on different platforms match. I have not yet looked at ref test failures.

Simon Fraser (smfr)

Comment 15 2012-08-20 20:24:35 PDT

Fix for flipped actual/expected; diff --git a/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py b/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py index 7379d97..45b47f1 100644 --- a/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py +++ b/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py @@ -264,7 +264,7 @@ class SingleTestRunner(object): elif not expected_driver_output.image_hash: failures.append(test_failures.FailureMissingImageHash()) elif driver_output.image_hash != expected_driver_output.image_hash: - diff_result = self._port.diff_image(driver_output.image, expected_driver_output.image) + diff_result = self._port.diff_image(expected_driver_output.image, driver_output.image) err_str = diff_result[2] # FIXME: see https://bugs.webkit.org/show_bug.cgi?id=94277 and # https://bugs.webkit.org/show_bug.cgi?id=81962; ImageDiff doesn't

Simon Fraser (smfr)

Comment 16 2012-08-20 20:26:09 PDT

Filed bug 94567 to land that patch.

Simon Fraser (smfr)

Comment 17 2012-08-20 20:28:39 PDT

Giving this bug to Dirk to ensure that Chromium tools dump PNG images with alpha. I also note that the color profiles in the Chromium-supplied images don't match the Mac ones. Maybe we should give up on the notion of platform-agnostic expected.png files?

Dirk Pranke

Comment 18 2012-08-20 21:48:42 PDT

(In reply to comment #15) > Fix for flipped actual/expected; > > diff --git a/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py b/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py > index 7379d97..45b47f1 100644 > --- a/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py > +++ b/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py > @@ -264,7 +264,7 @@ class SingleTestRunner(object): > elif not expected_driver_output.image_hash: > failures.append(test_failures.FailureMissingImageHash()) > elif driver_output.image_hash != expected_driver_output.image_hash: > - diff_result = self._port.diff_image(driver_output.image, expected_driver_output.image) > + diff_result = self._port.diff_image(expected_driver_output.image, driver_output.image) > err_str = diff_result[2] > # FIXME: see https://bugs.webkit.org/show_bug.cgi?id=94277 and > # https://bugs.webkit.org/show_bug.cgi?id=81962; ImageDiff doesn't So, single_test_runner currently passes expected, actual to base.Port.diff_image(), which is expecting actual, expected :(. However, base.Port.diff_image then flips around and calls calls image_differ(expected, actual), which writes actual, then expected to ImageDiff, which then reads actual, then expected. For reference, the signature in ChromiumPort.diff_image is (expected, actual). So, single_test_runner is consistent with ChromiumPort, which isn't surprising, since chromium actually uses pixel diffs regularly :) In short we should leave single_test_runner alone and fix base.py because it's the only one that isn't (expected, actual) in the python code (and I would argue that expected, actual is the order that make sense insofar as it is consistent with "diff old new"; I realize the non-chromium ImageDiff appears to want "diff new old", but that seems wrong.

Dirk Pranke

Comment 19 2012-08-20 21:53:27 PDT

(In reply to comment #14) > Re: (In reply to comment #12) > > I'm checksumming the data spat out by DRT (after the checksum injection), and the data ingested by ImageDiff, and they don't seem to match: > > > > Dumped PNG has hash efb0c7c9967868bc58f1c9c989f6878f > > Original image alpha flags: 2, new image alpha flags: 3 > > ImageDiff produced stderr output: > > image in imageDiff has hash af52930d6c48db65a6e957ad80869650 > > image in imageDiff has hash 6f08bd8f8505e7b86567159012c459cf > > Ah, they do match, I was just getting my output out of order.. > > So the reason for the pixel comparison failures is that the expected image result files which have been committed by chromium people lack alpha. For example, LayoutTests/compositing/culling/clear-fixed-iframe-expected.png, committed in http://trac.webkit.org/changeset/109514, has no alpha. > > We need to make sure that expected image files generated on different platforms match. > Adding Tony to this, since I've discussed this issue with him before. It is true that different ports have different interpretations of what the PNG format is supposed to be, and we are inconsistent. Given the aggressiveness of the baseline optimizer, this is probably bad, but I don't think this is insoluble. I think we just need to standardize on how we want to store (and compare) PNGs. If we did that, we could probably get rid of the port-specific ImageDiff code altogether. I had looked into using pure-python png decoding but gave up after realizing we had different conventions. IIRC, there's a reason the chromium port doesn't store alphas; it has something to do with the differences between our win and linux ports. > I have not yet looked at ref test failures.

Simon Fraser (smfr)

Comment 20 2012-08-20 21:54:48 PDT

I didn't find any issues with ref tests on Mountain Lion. When running nrwt with --pixel-tests (to force ref testing) on ML, things worked fine.

Dirk Pranke

Comment 21 2012-08-20 21:57:37 PDT

Also, I thought premultiplied alpha meant that the alpha was multiplied first? so, if you have #fff with an alpha of 0.5, do you store 0x80808080 or 0x00808080 ? I've never really understood this premultiplied algorithm ...

Dirk Pranke

Comment 22 2012-08-20 21:58:15 PDT

(In reply to comment #20) > I didn't find any issues with ref tests on Mountain Lion. When running nrwt with --pixel-tests (to force ref testing) on ML, things worked fine. How can that be? Aren't the same images being compared on Lion and ML?

Dirk Pranke

Comment 23 2012-08-20 22:01:34 PDT

(In reply to comment #18) > (In reply to comment #15) > > Fix for flipped actual/expected; > > > > diff --git a/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py b/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py > > index 7379d97..45b47f1 100644 > > --- a/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py > > +++ b/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py > > @@ -264,7 +264,7 @@ class SingleTestRunner(object): > > elif not expected_driver_output.image_hash: > > failures.append(test_failures.FailureMissingImageHash()) > > elif driver_output.image_hash != expected_driver_output.image_hash: > > - diff_result = self._port.diff_image(driver_output.image, expected_driver_output.image) > > + diff_result = self._port.diff_image(expected_driver_output.image, driver_output.image) > > err_str = diff_result[2] > > # FIXME: see https://bugs.webkit.org/show_bug.cgi?id=94277 and > > # https://bugs.webkit.org/show_bug.cgi?id=81962; ImageDiff doesn't > > So, single_test_runner currently passes expected, actual to base.Port.diff_image(), which is expecting actual, expected :(. However, base.Port.diff_image then flips around and calls calls image_differ(expected, actual), which writes actual, then expected to ImageDiff, which then reads actual, then expected. For reference, the signature in ChromiumPort.diff_image is (expected, actual). So, single_test_runner is consistent with ChromiumPort, which isn't surprising, since chromium actually uses pixel diffs regularly :) > > In short we should leave single_test_runner alone and fix base.py because it's the only one that isn't (expected, actual) in the python code (and I would argue that expected, actual is the order that make sense insofar as it is consistent with "diff old new"; I realize the non-chromium ImageDiff appears to want "diff new old", but that seems wrong. Argh. the code as currently checked in calls diff_image(actual, expected), of course. Basically we need to trace through the whole path to make sure they're consistent :(.

Tim Horton

Comment 24 2012-08-20 22:11:08 PDT

(In reply to comment #23) > Argh. the code as currently checked in calls diff_image(actual, expected), of course. Basically we need to trace through the whole path to make sure they're consistent :(. Named arguments FTW. (and this might not be a bad time to adopt them here!)

Tony Chang

Comment 25 2012-08-21 12:12:05 PDT

(In reply to comment #19) > If we did that, we could probably get rid of the port-specific ImageDiff code altogether. I had looked into using pure-python png decoding but gave up after realizing we had different conventions. I thought this was also slow. Seems like it would be. > IIRC, there's a reason the chromium port doesn't store alphas; it has something to do with the differences between our win and linux ports. http://trac.webkit.org/browser/trunk/Tools/DumpRenderTree/chromium/TestShell.cpp#L660 I didn't write the comment. I'm not sure what about text/form control drawing causes problems (brettw may know or remember).

Dirk Pranke

Comment 26 2012-08-21 13:18:45 PDT

(In reply to comment #25) > (In reply to comment #19) > > If we did that, we could probably get rid of the port-specific ImageDiff code altogether. I had looked into using pure-python png decoding but gave up after realizing we had different conventions. > > I thought this was also slow. Seems like it would be. > Yeah, probably. We'd need to benchmark it to see if there is a real impact, since we only do the diffs when the hashes don't match.

Ojan Vafai

Comment 27 2012-08-21 13:47:26 PDT

(In reply to comment #26) > (In reply to comment #25) > > (In reply to comment #19) > > > If we did that, we could probably get rid of the port-specific ImageDiff code altogether. I had looked into using pure-python png decoding but gave up after realizing we had different conventions. > > > > I thought this was also slow. Seems like it would be. > > > > Yeah, probably. We'd need to benchmark it to see if there is a real impact, since we only do the diffs when the hashes don't match. The perf of ImageDiff matters a lot less in a world in which we check in expected failures. Then the only times we'd be running this are when there are regressions on ToT or for flaky pixel failures. Seems unlikely to me that this would have a huge perf impact on the time of running the full test suite. But, there's never a harm in measuring.

Dirk Pranke

Comment 28 2012-09-21 16:53:00 PDT

I'm not actually planning to fix this in the near future, so someone else should feel free to take a look.

Dirk Pranke

Comment 29 2013-04-15 18:44:32 PDT

Simon is this still an issue, or have you fixed all of the non-alpha-having PNGs now that Chromium is gone?

Simon Fraser (smfr)

Comment 30 2013-04-15 19:54:22 PDT

(In reply to comment #29) > Simon is this still an issue, or have you fixed all of the non-alpha-having PNGs now that Chromium is gone? I have not, but I think we will eventually.

Dirk Pranke

Comment 31 2013-04-15 21:17:35 PDT

(In reply to comment #30) > (In reply to comment #29) > > Simon is this still an issue, or have you fixed all of the non-alpha-having PNGs now that Chromium is gone? > > I have not, but I think we will eventually. Okay, I'm gonna reassign this to you if that's okay, unless there's something I can do. It wouldn't be hard to go through and find all the bad PNGs and replace them, for example.

Attachments
Improve error reporting (2.88 KB, patch) 2012-08-20 17:13 PDT, Simon Fraser (smfr)	no flags	Details Formatted Diff Diff
Show Obsolete (1) View All Add attachment proposed patch, testcase, etc.