Allow a test to specify a custom pixel tolerance in a ref test
Created attachment 262492 [details] Patch
Comment on attachment 262492 [details]
Patch

Attachment 262492 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.webkit.org/results/249175

New failing tests:
compositing/overlap-blending/children-opacity-no-overlap.html
compositing/overlap-blending/children-opacity-huge.html
compositing/overlap-blending/nested-non-overlap-clipping.html
Created attachment 262496 [details]
Archive of layout-test-results from ews103 for mac-mavericks

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews103  Port: mac-mavericks  Platform: Mac OS X 10.9.5
Comment on attachment 262492 [details]
Patch

Attachment 262492 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.webkit.org/results/249179

New failing tests:
fast/images/gif-loop-count.html
compositing/overlap-blending/children-opacity-no-overlap.html
compositing/overlap-blending/nested-non-overlap-clipping.html
compositing/overlap-blending/children-opacity-huge.html
fast/images/animated-gif-no-layout.html
Created attachment 262497 [details]
Archive of layout-test-results from ews106 for mac-mavericks-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews106  Port: mac-mavericks-wk2  Platform: Mac OS X 10.9.5
I like this a lot!
Comment on attachment 262492 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=262492&action=review

> Tools/ChangeLog:4
> + Allow a test to specify a custom pixel tolerance in a ref test
> + https://bugs.webkit.org/show_bug.cgi?id=149828

I still do not think that this is desirable.

It is very difficult or impossible to reason about what tolerance is acceptable for a test, or what it even means to have a particular tolerance. I have no idea what 0.004 means, and I expect that this default makes some existing tests ineffective.
(In reply to comment #7)
> It is very difficult or impossible to reason about what tolerance is
> acceptable for a test, or what it even means to have a particular
> tolerance. I have no idea what 0.004 means, and I expect that this default
> makes some existing tests ineffective.

So you're OK with the arbitrary small tolerance that ImageDiff currently allows?

The tolerance value here is the same value that ImageDiff returns, which we've been reporting in test results for ages. ImageDiff explains it as "The difference as a percentage combining both the number of different pixels and their difference amount i.e. the average distance over the entire image".

Without this, it's impossible to do some kinds of testing.
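[For readers unfamiliar with the metric: a minimal Python sketch of a "difference percentage" along the lines ImageDiff describes above. This illustrates the idea only; the real ImageDiff is C++ and its exact formula may differ.]

```python
import math

def difference_percentage(base, test, width, height):
    """base/test: flat RGBA byte sequences of equal size (4 bytes per pixel)."""
    total_distance = 0.0
    for i in range(0, width * height * 4, 4):
        # Euclidean distance between pixels, normalized to 0..1 per channel.
        d = math.sqrt(sum(((base[i + c] - test[i + c]) / 255.0) ** 2
                          for c in range(4)))
        total_distance += d
    # Average distance over the entire image, expressed as a percentage;
    # many identical pixels dilute a few very different ones.
    return 100.0 * total_distance / (width * height)
```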
> So you're OK with the arbitrary small tolerance that ImageDiff currently
> allows?

Not really. It's also confusing, and the formula used for calculating the difference doesn't seem to make sense.
Adding a way to manually control the tolerance is a step back, though, as it adds a new dimension to possible mistakes.

Let's discuss what kinds of testing are impossible; maybe there is another way to implement those.
(In reply to comment #10)
> Adding a way to manually control the tolerance is a step back, though, as
> it adds a new dimension to possible mistakes.
>
> Let's discuss what kinds of testing are impossible; maybe there is another
> way to implement those.

An idea that comes to mind is that we could add support for specifying a per-test tolerance in the TestExpectations files. That way, different ports can specify different tolerances, or no tolerance at all, for a test.
(In reply to comment #11)
> (In reply to comment #10)
> > Adding a way to manually control the tolerance is a step back, though,
> > as it adds a new dimension to possible mistakes.
> >
> > Let's discuss what kinds of testing are impossible; maybe there is
> > another way to implement those.
>
> An idea that comes to mind is that we could add support for specifying a
> per-test tolerance in the TestExpectations files. That way, different
> ports can specify different tolerances, or no tolerance at all, for a test.

That seems even more error-prone than the test author changing the tolerance, since a tree maintainer may have no idea why tolerance is relevant in the specific scenario. And increasing the tolerance enough will make the test pass regardless of whether there's a real bug, which is bad.

Also, TestExpectations suck as a way of annotating tests with runtime parameters. It's bad enough trying to differentiate image vs. text failures and other outcomes.
We need something like this for testing gradients. Floating-point rounding gives us tiny differences in ref tests that are perfectly acceptable. Our alternative is to effectively implement the tolerance in the test itself, the way we now do for gradients in the canvas/philip suite. That is much more error-prone and requires duplication.
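[The canvas/philip suite does its tolerance check in JavaScript inside the test page; this Python sketch just shows the idea of an in-test per-channel epsilon, with the epsilon value chosen arbitrarily for illustration.]

```python
def pixel_within_tolerance(actual, expected, epsilon=2):
    """actual/expected: (r, g, b, a) tuples; epsilon in 0-255 channel units.

    The test reads pixels back itself and allows each channel to deviate
    by a small epsilon, instead of requiring a pixel-perfect image match.
    """
    return all(abs(a - e) <= epsilon for a, e in zip(actual, expected))
```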
(In reply to Alexey Proskuryakov from comment #7)
> Comment on attachment 262492 [details]
> Patch
>
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=262492&action=review
>
> > Tools/ChangeLog:4
> > + Allow a test to specify a custom pixel tolerance in a ref test
> > + https://bugs.webkit.org/show_bug.cgi?id=149828
>
> I still do not think that this is desirable.
>
> It is very difficult or impossible to reason about what tolerance is
> acceptable for a test, or what it even means to have a particular
> tolerance. I have no idea what 0.004 means, and I expect that this default
> makes some existing tests ineffective.

Would you be OK with a solution that allows specifying an allowed pixel tolerance in the TestExpectations file of each port, for each test? Something like this:

imported/blink/fast/canvas/canvas-clip-stack-persistence-diffs.html [ PixelTolerance=0.05 ]
fast/gradients [ PixelTolerance=0.001 ]

That would mean that imported/blink/fast/canvas/canvas-clip-stack-persistence-diffs.html passes if the diff on the image is <= 0.05, and that all tests under fast/gradients pass if the diff is <= 0.001 (assuming no later line overrides that default for a specific test under that directory).

That way we can set the value specifically for each test and port, so whoever sets the value can reason about whether it makes sense for that very specific test.
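[A minimal sketch of how webkitpy might resolve such a modifier, assuming the hypothetical PixelTolerance syntax proposed above; TestExpectations has no such keyword, and pixel_tolerance_for is an invented helper for illustration.]

```python
import re

_TOLERANCE_RE = re.compile(r'PixelTolerance=([0-9.]+)')

def pixel_tolerance_for(test_path, expectation_lines):
    """Return the most specific tolerance for test_path, or None.

    expectation_lines is an iterable of (path_prefix, modifiers) tuples,
    e.g. ('fast/gradients', 'PixelTolerance=0.001').
    """
    best_prefix, best_tolerance = '', None
    for prefix, modifiers in expectation_lines:
        match = _TOLERANCE_RE.search(modifiers)
        # The longest matching prefix wins, so a per-test line overrides
        # a directory-wide default such as fast/gradients.
        if match and test_path.startswith(prefix) and len(prefix) >= len(best_prefix):
            best_prefix, best_tolerance = prefix, float(match.group(1))
    return best_tolerance
```

With the two example lines above, pixel_tolerance_for('fast/gradients/simple-gradient.html', ...) would return 0.001.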
We would need to clearly define what a tolerance value means.

There are many bugs where the problem manifests as a single line or even a single pixel of entirely wrong color. Then there are many bugs where the problem is a slight color variance.

The current algorithm for computing a distance doesn't account for that.
(In reply to Alexey Proskuryakov from comment #15)
> We would need to clearly define what a tolerance value means.
>
> There are many bugs where the problem manifests as a single line or even a
> single pixel of entirely wrong color. Then there are many bugs where the
> problem is a slight color variance.
>
> The current algorithm for computing a distance doesn't account for that.

Another idea is that a reftest result could be overridden by a PNG file. That way, the PNG result produced by the test suite can be dropped into the LayoutTests directory as needed:

LayoutTests/imported/blink/fast/canvas/canvas-clip-stack-persistence.html
LayoutTests/imported/blink/fast/canvas/canvas-clip-stack-persistence-expected.html
LayoutTests/platform/gtk/imported/blink/fast/canvas/canvas-clip-stack-persistence-expected.png

That would mean that on the GTK+ port the test imported/blink/fast/canvas/canvas-clip-stack-persistence.html would pass if it produces an image equal to the one generated by LayoutTests/imported/blink/fast/canvas/canvas-clip-stack-persistence-expected.html *or* an image equal to the one stored at LayoutTests/platform/gtk/imported/blink/fast/canvas/canvas-clip-stack-persistence-expected.png.
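[A hypothetical sketch of the lookup order that idea implies, assuming the layout described above; expected_results_for is an invented helper, not existing webkitpy code.]

```python
import os

def expected_results_for(layout_tests_dir, port_name, test_path):
    """Return the expectations that exist for test_path, most specific first.

    A platform-specific expected PNG, if present, would be accepted as an
    alternative baseline alongside the reference HTML.
    """
    base, _ = os.path.splitext(test_path)
    candidates = [
        os.path.join(layout_tests_dir, 'platform', port_name, base + '-expected.png'),
        os.path.join(layout_tests_dir, base + '-expected.html'),
    ]
    # The test passes if its output matches *any* existing expectation.
    return [path for path in candidates if os.path.exists(path)]
```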
(In reply to Alexey Proskuryakov from comment #15)
> We would need to clearly define what a tolerance value means.
>
> There are many bugs where the problem manifests as a single line or even a
> single pixel of entirely wrong color. Then there are many bugs where the
> problem is a slight color variance.
>
> The current algorithm for computing a distance doesn't account for that.

What I am asking for is NOT the ability to give an arbitrary test a tolerance. I'm talking about tests that by their nature cannot be replicated 100% in a ref test, e.g. due to floating-point rounding. Gradients are a good example of this.

I agree that it is not desirable for people to "fix" flaky or bad tests with a tolerance value.
<rdar://problem/62682186>
We should implement the WPT approach: http://web-platform-tests.org/writing-tests/reftests.html#fuzzy-matching
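[For context: WPT fuzzy matching lets a test declare bounds such as `<meta name="fuzzy" content="maxDifference=0-2;totalPixels=0-40">`, where maxDifference bounds the largest per-channel delta and totalPixels bounds how many pixels may differ. Below is a minimal Python sketch of that comparison, assuming RGBA pixels and bounds already parsed into inclusive (lo, hi) ranges; wptrunner's real implementation differs in detail.]

```python
def fuzzy_match(reference_pixels, test_pixels, max_difference, total_pixels):
    """reference_pixels/test_pixels: equal-length lists of (r, g, b, a) tuples;
    max_difference/total_pixels: (lo, hi) tuples of inclusive bounds."""
    largest_delta = 0
    differing = 0
    for ref, actual in zip(reference_pixels, test_pixels):
        delta = max(abs(r - a) for r, a in zip(ref, actual))  # largest per-channel delta
        if delta:
            differing += 1
            largest_delta = max(largest_delta, delta)
    return (max_difference[0] <= largest_delta <= max_difference[1]
            and total_pixels[0] <= differing <= total_pixels[1])
```

One nice property of ranges (as opposed to simple maxima): if I read the WPT docs correctly, a nonzero lower bound makes the test fail once it becomes pixel-perfect, which flags a stale fuzzy annotation for removal.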
Created attachment 442670 [details] Patch
Comment on attachment 442670 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=442670&action=review

Does this cause any new test passes? It seems like it should, at least in the WPT transforms directory.

> Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:387
> +

Nit: I think this
Created attachment 443320 [details] Patch
Committed r287323 (245473@main): <https://commits.webkit.org/245473@main>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 443320 [details].