Bug 168221 - Nwtr unexpectedly passes mismatch ref test if the hashes doesn't match but no diff
Summary: Nwtr unexpectedly passes mismatch ref test if the hashes doesn't match but no...
Status: CLOSED INVALID
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Fujii Hironori
URL:
Keywords:
Depends on: 168298
Blocks: 168033
  Show dependency treegraph
 
Reported: 2017-02-12 22:08 PST by Fujii Hironori
Modified: 2017-02-14 16:16 PST (History)
6 users (show)

See Also:


Attachments
Patch (2.61 KB, patch)
2017-02-12 22:21 PST, Fujii Hironori
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Fujii Hironori 2017-02-12 22:08:46 PST
Nwtr unexpectedly passes mismatch ref test if the hashes doesn't match but no diff

There is a logic error in comparing mismatch ref test images.
In mismatch ref test images, it should be failed if two images has no diff.
But, if the hashes are different, nwtr unexpectedly pass the mismatch ref test.

It is difficult to test this bug.
First, you need to find a *match* ref test which does not match the hashes.
For example, in GTK port, I found a following test is the case:

> fast/css/sticky/sticky-left-percentage.html -> ref test hashes didn't match but diff passed

Next, convert this match ref test to a mismatch ref test:

> mv LayoutTests/fast/css/sticky/sticky-left-percentage-expected{,-mismatch}.html

And, run the test. In this time, it should fail:

> $ ./Tools/Scripts/run-webkit-tests --gtk --release --no-new-test-results fast/css/sticky/sticky-left-percentage.html -v

But, passed:

> [1/1] fast/css/sticky/sticky-left-percentage.html passed
Comment 1 Fujii Hironori 2017-02-12 22:21:28 PST
Created attachment 301332 [details]
Patch
Comment 2 Alexey Proskuryakov 2017-02-13 10:06:48 PST
Comment on attachment 301332 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=301332&action=review

> Tools/ChangeLog:3
> +        Nwtr unexpectedly passes mismatch ref test if the hashes doesn't match but no diff

Why does the logic for expected mismatch run ImageDiff in this case?

I'd expect the test to pass as soon as there is a hash mismatch.
Comment 3 Ryosuke Niwa 2017-02-13 11:35:03 PST
Comment on attachment 301332 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=301332&action=review

>> Tools/ChangeLog:3
>> +        Nwtr unexpectedly passes mismatch ref test if the hashes doesn't match but no diff
> 
> Why does the logic for expected mismatch run ImageDiff in this case?
> 
> I'd expect the test to pass as soon as there is a hash mismatch.

The problem is that sometimes two images look identical even though hashes are different.
e.g. due to anti-aliasing differences on Mac when a layer is composed.
Comment 4 WebKit Commit Bot 2017-02-13 12:00:15 PST
Comment on attachment 301332 [details]
Patch

Clearing flags on attachment: 301332

Committed r212237: <http://trac.webkit.org/changeset/212237>
Comment 5 WebKit Commit Bot 2017-02-13 12:00:22 PST
All reviewed patches have been landed.  Closing bug.
Comment 6 Alexey Proskuryakov 2017-02-13 12:02:34 PST
> The problem is that sometimes two images look identical even though hashes are different.
> e.g. due to anti-aliasing differences on Mac when a layer is composed.

I understand this, but that doesn't answer my question. I still think that this patch is not the right approach to the issue.
Comment 7 Ryosuke Niwa 2017-02-13 12:08:38 PST
(In reply to comment #6)
> > The problem is that sometimes two images look identical even though hashes are different.
> > e.g. due to anti-aliasing differences on Mac when a layer is composed.
> 
> I understand this, but that doesn't answer my question. I still think that
> this patch is not the right approach to the issue.

What do you think will be the right fix?
Comment 8 Alexey Proskuryakov 2017-02-13 12:26:13 PST
I think that we should report pass as soon as there is a hash difference. We already know that there is some mismatch at this point, and it's not the tool's responsibility to make guesses about how much of a difference is really a difference.
Comment 9 Ryosuke Niwa 2017-02-13 13:38:29 PST
(In reply to comment #8)
> I think that we should report pass as soon as there is a hash difference. We
> already know that there is some mismatch at this point, and it's not the
> tool's responsibility to make guesses about how much of a difference is
> really a difference.

The problem is that tests with mismatch expectation MUST fail when two images look the same, and hashes don't seem to be a good indicator of whether they're the same or not since many of the existing ref tests with match expectation only pass under image-diff, meaning that many tests have different hashes from their respective expected results. Conversely, it is then possible for many tests with mismatch expectation to have different hashes from expected results yet look identical (failing condition).
Comment 10 Alexey Proskuryakov 2017-02-13 14:22:57 PST
I don't think that mismatch is about looking the same. For example, if you have a mismatch test that confirms that two different forms of antialiasing are not identical, the two results will look almost the same.

Do you have a specific counter example where running an image diff would be helpful? We don't have that many "failed hash but passed diff" tests to serve as a generic counter example.
Comment 11 Fujii Hironori 2017-02-13 16:58:08 PST
I don't understand your question correctly. I have a question.
Do you mean you are against using ImageDiff only for mismatch ref tests, even though you think it's ok to use ImageDiff for match ref test?
Comment 12 Fujii Hironori 2017-02-13 21:05:16 PST
There were some ideas suggested here:

https://lists.webkit.org/pipermail/webkit-dev/2015-January/027179.html

* Add ImageDiff or ImageHashMismatch expectation
* Expose a new internals or testRunner methods to mark a test such
Comment 13 Fujii Hironori 2017-02-13 22:42:05 PST
  Bug 94704 – REGRESSION(r126189): Reftest mismatches are (again) run through ImageDiff with 0.1 tolerance

tolerance=0.1 is given to pixel tests, and tolerance=0 for ref tests.
Pixels should be equal in ref tests.
Comment 14 Ryosuke Niwa 2017-02-13 22:52:19 PST
(In reply to comment #13)
>   Bug 94704 – REGRESSION(r126189): Reftest mismatches are (again) run
> through ImageDiff with 0.1 tolerance
> 
> tolerance=0.1 is given to pixel tests, and tolerance=0 for ref tests.
> Pixels should be equal in ref tests.

Wait, if that's the case how could this code path ever be useful?
Comment 15 Fujii Hironori 2017-02-13 23:09:23 PST
I had a big mistake.
I was using fast/css/sticky/sticky-left-percentage.html for testing (comment 0).
But this test has other problem (bug 168033 comment 7).
Comment 16 Fujii Hironori 2017-02-13 23:11:56 PST
(In reply to comment #14)
> Wait, if that's the case how could this code path ever be useful?

That's right.
Calling diff_image with tolerance=0 is useless.
Checking checksum seems enough.
Comment 17 Alexey Proskuryakov 2017-02-13 23:17:28 PST
Note that there is some tolerance forced by ImageDiffCG.cpp even when tolerance=0 is requested. That may be not easy to clean up.
Comment 18 Fujii Hironori 2017-02-13 23:20:21 PST
(In reply to comment #17)
> Note that there is some tolerance forced by ImageDiffCG.cpp even when
> tolerance=0 is requested. That may be not easy to clean up.

I can not find such code in ImageDiffCG.cpp. Could you tell me which code?
Comment 19 Alexey Proskuryakov 2017-02-13 23:56:01 PST
Distances lower than 1/255 are ignored:

            if (distance >= 1.0f / 255.0f) {
                count += 1.0f;
                sum += distance;
                if (distance > maxDistance)
                    maxDistance = distance;
            }
Comment 20 Fujii Hironori 2017-02-14 01:11:25 PST
(In reply to comment #19)
> Distances lower than 1/255 are ignored:
> 
>             if (distance >= 1.0f / 255.0f) {
>                 count += 1.0f;
>                 sum += distance;
>                 if (distance > maxDistance)
>                     maxDistance = distance;
>             }

Thank you so much.

This color distance threshold is needed for match ref tests, but
for mismatch ref tests at the moment.

Should I stop calling image_diff and just do checksum checking
for mismatch ref tests?
Comment 21 Fujii Hironori 2017-02-14 02:10:29 PST
(In reply to comment #20)
> Should I stop calling image_diff and just do checksum checking
> for mismatch ref tests?

I checked some mismatch ref tests to see how do they expect. Most
of them expect substantial mismatch. But, a following test
expects tiny mismatch. I understand Alexey's comment 10.

  css3/filters/drop-shadow-blur-radius.html

Original code before my change <http://trac.webkit.org/changeset/212237>
was actually just doing checksum checking for mismatch ref tests.
So, I prefer rolling back r212237 and closing this bug as invalid.
I'll remove calling image_diff for mismatch ref tests in other bug.

I don't have a permission to roll back. Could you roll back r212237?
Comment 22 WebKit Commit Bot 2017-02-14 02:48:45 PST
Re-opened since this is blocked by bug 168298
Comment 23 Fujii Hironori 2017-02-14 05:24:17 PST
Thank you, Ryosuke.
Closed this bug as INVALID.
Comment 24 Alexey Proskuryakov 2017-02-14 09:21:21 PST
I do think that there is a bug here though, as you've demonstrated how the logic is clearly wrong. It just seems better to not run ImageDiff in this case.
Comment 25 Simon Fraser (smfr) 2017-02-14 11:01:57 PST
There's still a valid need to have ref tests that are a close, but not exact match (for example, software-rendered clipping vs. GPU-rendered clipping).
Comment 26 Fujii Hironori 2017-02-14 16:14:39 PST
(In reply to comment #24)
> I do think that there is a bug here though, as you've demonstrated how the
> logic is clearly wrong. It just seems better to not run ImageDiff in this
> case.

I agree and I'll do it (comment 21).
Comment 27 Fujii Hironori 2017-02-14 16:16:49 PST
(In reply to comment #25)
> There's still a valid need to have ref tests that are a close, but not exact
> match (for example, software-rendered clipping vs. GPU-rendered clipping).

Right.
If someone will add a new mismatch ref test which needs the color distance threshold, it will be the counter example which Alexey requested (comment 10).