Bug 92578

Summary: Lots of "error, test and reference image have different properties" in pixel test output because of alpha differences
Product: WebKit Reporter: Simon Fraser (smfr) <simon.fraser>
Component: Tools / TestsAssignee: Simon Fraser (smfr) <simon.fraser>
Status: ASSIGNED ---    
Severity: Normal CC: abarth, beidson, dpranke, kbalazs, simon.fraser, thorton, tony
Priority: P2 Keywords: NRWT
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Improve error reporting none

Description Simon Fraser (smfr) 2012-07-28 10:47:01 PDT
ImageDiff is spewing:
"error, test and reference image have different properties."

It's very unclear what this means.
Comment 1 Ojan Vafai 2012-07-31 12:24:16 PDT
Can you point to an example (i.e. on the bots)? I don't see this when I run locally.
Comment 2 Simon Fraser (smfr) 2012-07-31 12:55:56 PDT
I got it when running pixel tests on Mountain Lion. Perhaps it's specific to that OS.
Comment 3 Dirk Pranke 2012-08-01 19:26:05 PDT
I've just filed bug 92934 to propagate errors back from ImageDiff. This won't address whatever your root problem is here, but it'll make it a whole lot less ignorable :).
Comment 4 Simon Fraser (smfr) 2012-08-20 17:01:18 PDT
I can still reproduce. My command is:
./Tools/Scripts/run-webkit-tests --debug --pixel-tests LayoutTests/compositing/

Sample output:

  compositing/animation/busy-indicator.html -> pixel hash failed (but pixel test still passes)
[8/278] compositing/color-matching/image-color-matching.html failed unexpectedly (image diff)
  compositing/culling/clear-fixed-iframe.html : ImageDiff produced stderr output:
error, test and reference image have different properties.
Comment 5 Simon Fraser (smfr) 2012-08-20 17:13:16 PDT
Created attachment 159567 [details]
Improve error reporting
Comment 6 Simon Fraser (smfr) 2012-08-20 17:17:21 PDT
Comment on attachment 159567 [details]
Improve error reporting

http://trac.webkit.org/changeset/126101
Comment 7 Simon Fraser (smfr) 2012-08-20 17:22:47 PDT
So now I get:
  compositing/culling/clear-fixed-iframe.html : ImageDiff produced stderr output:
Error: test and reference images differ in alpha. Test image does not have alpha, reference image has alpha.

 12$ $ sips -g all ./LayoutTests/compositing/culling/clear-fixed-iframe-expected.png 
/Volumes/DataSSD/Development/apple/webkit/WebKit.git/LayoutTests/compositing/culling/clear-fixed-iframe-expected.png
  pixelWidth: 800
  pixelHeight: 600
  typeIdentifier: public.png
  format: png
  formatOptions: default
  dpiWidth: 72.000
  dpiHeight: 72.000
  samplesPerPixel: 3
  bitsPerSample: 8
  hasAlpha: no
  space: RGB

So hasAlpha is NO in the PNG, but ImageDiff thinks it has alpha.
Comment 8 Simon Fraser (smfr) 2012-08-20 17:28:59 PDT
But CGImageGetAlphaInfo() is saying that the image alpha flags are kCGImageAlphaLast, implying non-premultplied RGBA.
Comment 9 Simon Fraser (smfr) 2012-08-20 18:30:05 PDT
createBitmapContext() creates a bitmap context with (kCGImageAlphaPremultipliedFirst | kCGBitmapByteOrder32Host), so I think we expect all the images to have alpha.
Comment 10 Simon Fraser (smfr) 2012-08-20 18:53:55 PDT
There's something I don't understand. DRT is dumping to stdout a PNG created from an image with alpha flags 3 (kCGImageAlphaLast). However, when ImageDiff reads this in from stdin, it gets an image with alpha flags 5 (kCGImageAlphaNoneSkipLast).

This is assuming that ImageDiff is not swapping the images by mistake. The code seems to read the actual before the expected:

            if (imageSize > 0 && !actualImage)
                actualImage = createImageFromStdin(imageSize);
            else if (imageSize > 0 && !baselineImage)
                baselineImage = createImageFromStdin(imageSize);

I'm not sure who's printing the expected image to stdin. Is it ImageDiffer in the python code?
Comment 11 Dirk Pranke 2012-08-20 19:15:29 PDT
(In reply to comment #10)
> There's something I don't understand. DRT is dumping to stdout a PNG created from an image with alpha flags 3 (kCGImageAlphaLast). However, when ImageDiff reads this in from stdin, it gets an image with alpha flags 5 (kCGImageAlphaNoneSkipLast).
> 
> This is assuming that ImageDiff is not swapping the images by mistake. The code seems to read the actual before the expected:
> 
>             if (imageSize > 0 && !actualImage)
>                 actualImage = createImageFromStdin(imageSize);
>             else if (imageSize > 0 && !baselineImage)
>                 baselineImage = createImageFromStdin(imageSize);
> 
> I'm not sure who's printing the expected image to stdin. Is it ImageDiffer in the python code?

Yes. See base.py:353 and port/image_diff.py:60.
Comment 12 Simon Fraser (smfr) 2012-08-20 19:24:25 PDT
I'm checksumming the data spat out by DRT (after the checksum injection), and the data ingested by ImageDiff, and they don't seem to match:

Dumped PNG has hash efb0c7c9967868bc58f1c9c989f6878f
Original image alpha flags: 2, new image alpha flags: 3
ImageDiff produced stderr output:
image in imageDiff has hash af52930d6c48db65a6e957ad80869650
image in imageDiff has hash 6f08bd8f8505e7b86567159012c459cf
Comment 13 Simon Fraser (smfr) 2012-08-20 20:19:18 PDT
I'm pretty sure that _compare_image() is swapping the images by mistake:

            diff_result = self._port.diff_image(driver_output.image, expected_driver_output.image)

    def diff_image(self, expected_contents, actual_contents, tolerance):

Note flipped order of params.
Comment 14 Simon Fraser (smfr) 2012-08-20 20:24:02 PDT
Re: (In reply to comment #12)
> I'm checksumming the data spat out by DRT (after the checksum injection), and the data ingested by ImageDiff, and they don't seem to match:
> 
> Dumped PNG has hash efb0c7c9967868bc58f1c9c989f6878f
> Original image alpha flags: 2, new image alpha flags: 3
> ImageDiff produced stderr output:
> image in imageDiff has hash af52930d6c48db65a6e957ad80869650
> image in imageDiff has hash 6f08bd8f8505e7b86567159012c459cf

Ah, they do match, I was just getting my output out of order..

So the reason for the pixel comparison failures is that the expected image result files which have been committed by chromium people lack alpha. For example, LayoutTests/compositing/culling/clear-fixed-iframe-expected.png, committed in http://trac.webkit.org/changeset/109514, has no alpha.

We need to make sure that expected image files generated on different platforms match.

I have not yet looked at ref test failures.
Comment 15 Simon Fraser (smfr) 2012-08-20 20:24:35 PDT
Fix for flipped actual/expected;

diff --git a/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py b/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py
index 7379d97..45b47f1 100644
--- a/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py
+++ b/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py
@@ -264,7 +264,7 @@ class SingleTestRunner(object):
         elif not expected_driver_output.image_hash:
             failures.append(test_failures.FailureMissingImageHash())
         elif driver_output.image_hash != expected_driver_output.image_hash:
-            diff_result = self._port.diff_image(driver_output.image, expected_driver_output.image)
+            diff_result = self._port.diff_image(expected_driver_output.image, driver_output.image)
             err_str = diff_result[2]
             # FIXME: see https://bugs.webkit.org/show_bug.cgi?id=94277 and
             # https://bugs.webkit.org/show_bug.cgi?id=81962; ImageDiff doesn't
Comment 16 Simon Fraser (smfr) 2012-08-20 20:26:09 PDT
Filed bug 94567 to land that patch.
Comment 17 Simon Fraser (smfr) 2012-08-20 20:28:39 PDT
Giving this bug to Dirk to ensure that Chromium tools dump PNG images with alpha. I also note that the color profiles in the Chromium-supplied images don't match the Mac ones.

Maybe we should give up on the notion of platform-agnostic expected.png files?
Comment 18 Dirk Pranke 2012-08-20 21:48:42 PDT
(In reply to comment #15)
> Fix for flipped actual/expected;
> 
> diff --git a/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py b/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py
> index 7379d97..45b47f1 100644
> --- a/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py
> +++ b/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py
> @@ -264,7 +264,7 @@ class SingleTestRunner(object):
>          elif not expected_driver_output.image_hash:
>              failures.append(test_failures.FailureMissingImageHash())
>          elif driver_output.image_hash != expected_driver_output.image_hash:
> -            diff_result = self._port.diff_image(driver_output.image, expected_driver_output.image)
> +            diff_result = self._port.diff_image(expected_driver_output.image, driver_output.image)
>              err_str = diff_result[2]
>              # FIXME: see https://bugs.webkit.org/show_bug.cgi?id=94277 and
>              # https://bugs.webkit.org/show_bug.cgi?id=81962; ImageDiff doesn't

So, single_test_runner currently passes expected, actual to base.Port.diff_image(), which is expecting actual, expected :(. However, base.Port.diff_image then flips around and calls  calls image_differ(expected, actual), which writes actual, then expected to ImageDiff, which then reads actual, then expected. For reference, the signature in ChromiumPort.diff_image is (expected, actual). So, single_test_runner is consistent with ChromiumPort, which isn't surprising, since chromium actually uses pixel diffs regularly :)

In short we should leave single_test_runner alone and fix base.py because it's the only one that isn't (expected, actual) in the python code (and I would argue that expected, actual is the order that make sense insofar as it is consistent with "diff old new"; I realize the non-chromium ImageDiff appears to want "diff new old", but that seems wrong.
Comment 19 Dirk Pranke 2012-08-20 21:53:27 PDT
(In reply to comment #14)
> Re: (In reply to comment #12)
> > I'm checksumming the data spat out by DRT (after the checksum injection), and the data ingested by ImageDiff, and they don't seem to match:
> > 
> > Dumped PNG has hash efb0c7c9967868bc58f1c9c989f6878f
> > Original image alpha flags: 2, new image alpha flags: 3
> > ImageDiff produced stderr output:
> > image in imageDiff has hash af52930d6c48db65a6e957ad80869650
> > image in imageDiff has hash 6f08bd8f8505e7b86567159012c459cf
> 
> Ah, they do match, I was just getting my output out of order..
> 
> So the reason for the pixel comparison failures is that the expected image result files which have been committed by chromium people lack alpha. For example, LayoutTests/compositing/culling/clear-fixed-iframe-expected.png, committed in http://trac.webkit.org/changeset/109514, has no alpha.
> 
> We need to make sure that expected image files generated on different platforms match.
> 

Adding Tony to this, since I've discussed this issue with him before. 

It is true that different ports have different interpretations of what the PNG format is supposed to be, and we are inconsistent. Given the aggressiveness of the baseline optimizer, this is probably bad, but I don't think this is insoluble. I think we just need to standardize on how we want to store (and compare) PNGs.

If we did that, we could probably get rid of the port-specific ImageDiff code altogether. I had looked into using pure-python png decoding but gave up after realizing we had different conventions.

IIRC, there's a reason the chromium port doesn't store alphas; it has something to do with the differences between our win and linux ports.
> I have not yet looked at ref test failures.
Comment 20 Simon Fraser (smfr) 2012-08-20 21:54:48 PDT
I didn't find any issues with ref tests on Mountain Lion. When running nrwt with --pixel-tests (to force ref testing) on ML, things worked fine.
Comment 21 Dirk Pranke 2012-08-20 21:57:37 PDT
Also, I thought premultiplied alpha meant that the alpha was multiplied first? so, if you have #fff with an alpha of 0.5, do you store 0x80808080 or 0x00808080 ? I've never really understood this premultiplied algorithm ...
Comment 22 Dirk Pranke 2012-08-20 21:58:15 PDT
(In reply to comment #20)
> I didn't find any issues with ref tests on Mountain Lion. When running nrwt with --pixel-tests (to force ref testing) on ML, things worked fine.

How can that be? Aren't the same images being compared on Lion and ML?
Comment 23 Dirk Pranke 2012-08-20 22:01:34 PDT
(In reply to comment #18)
> (In reply to comment #15)
> > Fix for flipped actual/expected;
> > 
> > diff --git a/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py b/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py
> > index 7379d97..45b47f1 100644
> > --- a/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py
> > +++ b/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py
> > @@ -264,7 +264,7 @@ class SingleTestRunner(object):
> >          elif not expected_driver_output.image_hash:
> >              failures.append(test_failures.FailureMissingImageHash())
> >          elif driver_output.image_hash != expected_driver_output.image_hash:
> > -            diff_result = self._port.diff_image(driver_output.image, expected_driver_output.image)
> > +            diff_result = self._port.diff_image(expected_driver_output.image, driver_output.image)
> >              err_str = diff_result[2]
> >              # FIXME: see https://bugs.webkit.org/show_bug.cgi?id=94277 and
> >              # https://bugs.webkit.org/show_bug.cgi?id=81962; ImageDiff doesn't
> 
> So, single_test_runner currently passes expected, actual to base.Port.diff_image(), which is expecting actual, expected :(. However, base.Port.diff_image then flips around and calls  calls image_differ(expected, actual), which writes actual, then expected to ImageDiff, which then reads actual, then expected. For reference, the signature in ChromiumPort.diff_image is (expected, actual). So, single_test_runner is consistent with ChromiumPort, which isn't surprising, since chromium actually uses pixel diffs regularly :)
> 
> In short we should leave single_test_runner alone and fix base.py because it's the only one that isn't (expected, actual) in the python code (and I would argue that expected, actual is the order that make sense insofar as it is consistent with "diff old new"; I realize the non-chromium ImageDiff appears to want "diff new old", but that seems wrong.

Argh. the code as currently checked in calls diff_image(actual, expected), of course. Basically we need to trace through the whole path to make sure they're consistent :(.
Comment 24 Tim Horton 2012-08-20 22:11:08 PDT
(In reply to comment #23)
> Argh. the code as currently checked in calls diff_image(actual, expected), of course. Basically we need to trace through the whole path to make sure they're consistent :(.

Named arguments FTW. (and this might not be a bad time to adopt them here!)
Comment 25 Tony Chang 2012-08-21 12:12:05 PDT
(In reply to comment #19)
> If we did that, we could probably get rid of the port-specific ImageDiff code altogether. I had looked into using pure-python png decoding but gave up after realizing we had different conventions.

I thought this was also slow. Seems like it would be.

> IIRC, there's a reason the chromium port doesn't store alphas; it has something to do with the differences between our win and linux ports.

http://trac.webkit.org/browser/trunk/Tools/DumpRenderTree/chromium/TestShell.cpp#L660

I didn't write the comment.  I'm not sure what about text/form control drawing causes problems (brettw may know or remember).
Comment 26 Dirk Pranke 2012-08-21 13:18:45 PDT
(In reply to comment #25)
> (In reply to comment #19)
> > If we did that, we could probably get rid of the port-specific ImageDiff code altogether. I had looked into using pure-python png decoding but gave up after realizing we had different conventions.
> 
> I thought this was also slow. Seems like it would be.
> 

Yeah, probably. We'd need to benchmark it to see if there is a real impact, since we only do the diffs when the hashes don't match.
Comment 27 Ojan Vafai 2012-08-21 13:47:26 PDT
(In reply to comment #26)
> (In reply to comment #25)
> > (In reply to comment #19)
> > > If we did that, we could probably get rid of the port-specific ImageDiff code altogether. I had looked into using pure-python png decoding but gave up after realizing we had different conventions.
> > 
> > I thought this was also slow. Seems like it would be.
> > 
> 
> Yeah, probably. We'd need to benchmark it to see if there is a real impact, since we only do the diffs when the hashes don't match.

The perf of ImageDiff matters a lot less in a world in which we check in expected failures. Then the only times we'd be running this are when there are regressions on ToT or for flaky pixel failures. Seems unlikely to me that this would have a huge perf impact on the time of running the full test suite. But, there's never a harm in measuring.
Comment 28 Dirk Pranke 2012-09-21 16:53:00 PDT
I'm not actually planning to fix this in the near future, so someone else should feel free to take a look.
Comment 29 Dirk Pranke 2013-04-15 18:44:32 PDT
Simon is this still an issue, or have you fixed all of the non-alpha-having PNGs now that Chromium is gone?
Comment 30 Simon Fraser (smfr) 2013-04-15 19:54:22 PDT
(In reply to comment #29)
> Simon is this still an issue, or have you fixed all of the non-alpha-having PNGs now that Chromium is gone?

I have not, but I think we will eventually.
Comment 31 Dirk Pranke 2013-04-15 21:17:35 PDT
(In reply to comment #30)
> (In reply to comment #29)
> > Simon is this still an issue, or have you fixed all of the non-alpha-having PNGs now that Chromium is gone?
> 
> I have not, but I think we will eventually.

Okay, I'm gonna reassign this to you if that's okay, unless there's something I can do. It wouldn't be hard to go through and find all the bad PNGs and replace them, for example.