WebKit Bugzilla
Bug 149828: Allow a way to specify a custom pixel tolerance in a ref test
Status: RESOLVED FIXED
https://bugs.webkit.org/show_bug.cgi?id=149828
Reported by Simon Fraser (smfr), 2015-10-05 20:45:40 PDT

Allow a test to specify a custom pixel tolerance in a ref test
Attachments

Patch (39.62 KB, patch), 2015-10-05 21:07 PDT, Simon Fraser (smfr)
    Flags: ap: review-, buildbot: commit-queue-
Archive of layout-test-results from ews103 for mac-mavericks (635.35 KB, application/zip), 2015-10-05 21:39 PDT, Build Bot
    Flags: none
Archive of layout-test-results from ews106 for mac-mavericks-wk2 (698.24 KB, application/zip), 2015-10-05 21:44 PDT, Build Bot
    Flags: none
Patch (12.15 KB, patch), 2021-10-27 21:04 PDT, Simon Fraser (smfr)
    Flags: mrobinson: review+, ews-feeder: commit-queue-
Patch (14.05 KB, patch), 2021-11-04 10:54 PDT, Simon Fraser (smfr)
    Flags: none
Simon Fraser (smfr)
Comment 1
2015-10-05 21:07:05 PDT
Created attachment 262492 [details]
Patch
Build Bot
Comment 2
2015-10-05 21:39:32 PDT
Comment on attachment 262492 [details]
Patch

Attachment 262492 [details] did not pass mac-ews (mac): Output: http://webkit-queues.webkit.org/results/249175

New failing tests:
compositing/overlap-blending/children-opacity-no-overlap.html
compositing/overlap-blending/children-opacity-huge.html
compositing/overlap-blending/nested-non-overlap-clipping.html
Build Bot
Comment 3
2015-10-05 21:39:36 PDT
Created attachment 262496 [details]
Archive of layout-test-results from ews103 for mac-mavericks

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews103  Port: mac-mavericks  Platform: Mac OS X 10.9.5
Build Bot
Comment 4
2015-10-05 21:44:17 PDT
Comment on attachment 262492 [details]
Patch

Attachment 262492 [details] did not pass mac-wk2-ews (mac-wk2): Output: http://webkit-queues.webkit.org/results/249179

New failing tests:
fast/images/gif-loop-count.html
compositing/overlap-blending/children-opacity-no-overlap.html
compositing/overlap-blending/nested-non-overlap-clipping.html
compositing/overlap-blending/children-opacity-huge.html
fast/images/animated-gif-no-layout.html
Build Bot
Comment 5
2015-10-05 21:44:20 PDT
Created attachment 262497 [details]
Archive of layout-test-results from ews106 for mac-mavericks-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews106  Port: mac-mavericks-wk2  Platform: Mac OS X 10.9.5
zalan
Comment 6
2015-10-05 21:45:09 PDT
I like this a lot!
Alexey Proskuryakov
Comment 7
2015-10-05 21:53:26 PDT
Comment on attachment 262492 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=262492&action=review

> Tools/ChangeLog:4
> +        Allow a test to specify a custom pixel tolerance in a ref test
> +        https://bugs.webkit.org/show_bug.cgi?id=149828

I still do not think that this is desirable. It is very difficult or impossible to reason about what tolerance is acceptable for a test, or what it even means to have a particular tolerance. I have no idea what 0.004 means, and I expect that this default makes some existing tests ineffective.
Simon Fraser (smfr)
Comment 8
2015-10-05 23:15:41 PDT
(In reply to comment #7)
> It is very difficult or impossible to reason about what tolerance is
> acceptable for a test, or what it even means to have a particular tolerance.
> I have no idea what 0.004 means, and I expect that this default makes some
> existing tests ineffective.

So you're OK with the arbitrary small tolerance that ImageDiff currently allows? The tolerance value here is the same value that ImageDiff returns, which we've been reporting in test results for ages. ImageDiff explains it as "The difference as a percentage combining both the number of different pixels and their difference amount i.e. the average distance over the entire image". Without this, it's impossible to do some kinds of testing.
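[Editor's note: for readers unfamiliar with the metric described above, here is a minimal sketch of a percentage difference computed as the average per-pixel distance over the entire image. This is an illustration of the quoted description only, not WebKit's actual ImageDiff implementation.]

```python
def percent_difference(img_a, img_b):
    """Average per-pixel color distance over the entire image, as a percentage.

    A rough sketch of the kind of metric described above ("the average
    distance over the entire image"); NOT WebKit's actual ImageDiff code.
    Each image is a flat list of (r, g, b) tuples with 0-255 channels.
    """
    assert len(img_a) == len(img_b), "images must be the same size"
    max_distance = (3 ** 0.5) * 255  # farthest apart two RGB pixels can be
    total = 0.0
    for (r1, g1, b1), (r2, g2, b2) in zip(img_a, img_b):
        # Euclidean distance in RGB space, normalized to 0..1 per pixel.
        dist = ((r1 - r2) ** 2 + (g1 - g2) ** 2 + (b1 - b2) ** 2) ** 0.5
        total += dist / max_distance
    return 100.0 * total / len(img_a)
```

Under this reading, a single fully wrong pixel in a small image yields a large percentage, while a slight uniform tint yields a small one, which is exactly the ambiguity debated in the following comments.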
Alexey Proskuryakov
Comment 9
2015-10-05 23:21:16 PDT
> So you're OK with the arbitrary small tolerance that ImageDiff currently allows?
Not really. It's also confusing, and the formula used for calculating the difference doesn't seem to make sense.
Alexey Proskuryakov
Comment 10
2015-10-05 23:22:41 PDT
Adding a way to manually control the tolerance is a step back though, as it adds a new dimension to possible mistakes. Let's discuss what kinds of testing are impossible, maybe there is another way to implement those.
Carlos Alberto Lopez Perez
Comment 11
2015-11-13 10:09:03 PST
(In reply to comment #10)
> Adding a way to manually control the tolerance is a step back though, as it
> adds a new dimension to possible mistakes.
>
> Let's discuss what kinds of testing are impossible, maybe there is another
> way to implement those.

One idea that comes to mind: we could add support for specifying a per-test tolerance in the TestExpectations files. That way different ports can specify different tolerances, or no tolerance at all, for a test.
Blaze Burg
Comment 12
2015-11-18 23:26:25 PST
(In reply to comment #11)
> (In reply to comment #10)
> > Adding a way to manually control the tolerance is a step back though, as it
> > adds a new dimension to possible mistakes.
> >
> > Let's discuss what kinds of testing are impossible, maybe there is another
> > way to implement those.
>
> An idea that comes to mind is that we can add support to specify on the
> TestExpectations files a tolerance for a given test. That way different
> ports can specify different tolerances or not tolerance at all for a test.

That seems even more error-prone than the test author changing the tolerance, since a tree maintainer may have no idea why tolerance is relevant in the specific scenario. And increasing the tolerance enough will make the test pass regardless of whether there's a real bug, which is bad. Also, TestExpectations suck as a way of annotating tests with runtime parameters. It's bad enough trying to differentiate image vs. text and other outcomes.
Dean Jackson
Comment 13
2017-04-25 10:28:07 PDT
We need something like this for testing gradients. Floating point rounding gives us tiny differences in ref tests that are perfectly acceptable. Our alternative is to effectively implement the tolerance in the test, the way we now do for gradients in the canvas/philip suite. This is much more error prone, and requires duplication.
Carlos Alberto Lopez Perez
Comment 14
2017-04-25 10:42:09 PDT
(In reply to Alexey Proskuryakov from comment #7)
> Comment on attachment 262492 [details]
> Patch
>
> View in context: https://bugs.webkit.org/attachment.cgi?id=262492&action=review
>
> > Tools/ChangeLog:4
> > +        Allow a test to specify a custom pixel tolerance in a ref test
> > +        https://bugs.webkit.org/show_bug.cgi?id=149828
>
> I still do not think that this is desirable.
>
> It is very difficult or impossible to reason about what tolerance is
> acceptable for a test, or what it even means to have a particular tolerance.
> I have no idea what 0.004 means, and I expect that this default makes some
> existing tests ineffective.

Would you be OK with a solution that allows specifying the value of an allowed pixel tolerance in the TestExpectations file of each port, for each test? Something like this:

imported/blink/fast/canvas/canvas-clip-stack-persistence-diffs.html [ PixelTolerance=0.05 ]
fast/gradients [ PixelTolerance=0.001 ]

That would mean that imported/blink/fast/canvas/canvas-clip-stack-persistence-diffs.html will pass if the diff on the image is <= 0.05, and that all tests on fast/gradients will pass if the diff is <= 0.001 (supposing there isn't any line below overriding that default on fast/gradients for any specific test under that set). That way we can set the value specifically for each test and port, so the one setting the value can reason about whether it makes sense for that very specific test.
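[Editor's note: as an illustration of how such a modifier could be consumed, here is a sketch of a parser for the hypothetical PixelTolerance syntax proposed above. Neither this syntax nor this function exists in webkitpy; the directory-override rule follows the description in the comment.]

```python
import re

# Matches the hypothetical "path [ PixelTolerance=<float> ]" line format
# sketched in the comment above; not an existing TestExpectations feature.
TOLERANCE_RE = re.compile(
    r'^(?P<path>\S+)\s+\[\s*PixelTolerance=(?P<tol>[\d.]+)\s*\]$')

def tolerance_for(test_path, expectation_lines):
    """Return the most specific tolerance for test_path, or None.

    A line naming a directory applies to every test under it; a longer
    (more specific) matching path wins, mirroring the override rule
    described in the comment above.
    """
    best, best_len = None, -1
    for line in expectation_lines:
        m = TOLERANCE_RE.match(line.strip())
        if not m:
            continue
        path = m.group('path')
        if test_path == path or test_path.startswith(path.rstrip('/') + '/'):
            if len(path) > best_len:
                best, best_len = float(m.group('tol')), len(path)
    return best
```

With the two example lines from the comment, a test under fast/gradients would get 0.001 and the named canvas test would get 0.05, while any other test gets no tolerance.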
Alexey Proskuryakov
Comment 15
2017-04-25 10:53:44 PDT
We would need to clearly define what a tolerance value means. There are many bugs where the problem manifests as a single line or even a single pixel of entirely wrong color. Then there are many bugs where the problem is a slight color variance. The current algorithm for computing a distance doesn't account for that.
Carlos Alberto Lopez Perez
Comment 16
2017-04-25 11:07:41 PDT
(In reply to Alexey Proskuryakov from comment #15)
> We would need to clearly define what a tolerance value means.
>
> There are many bugs where the problem manifests as a single line or even a
> single pixel of entirely wrong color. Then there are many bugs where the
> problem is a slight color variance.
>
> The current algorithm for computing a distance doesn't account for that.

Another idea is that a reftest result can be overridden by a png file. That way the png result produced by the test suite can be dropped into the LayoutTests directory as needed:

LayoutTests/imported/blink/fast/canvas/canvas-clip-stack-persistence.html
LayoutTests/imported/blink/fast/canvas/canvas-clip-stack-persistence-expected.html
LayoutTests/platform/gtk/imported/blink/fast/canvas/canvas-clip-stack-persistence-expected.png

That would mean that on the GTK+ port the test imported/blink/fast/canvas/canvas-clip-stack-persistence.html would pass if it produces an image equal to the one generated by LayoutTests/imported/blink/fast/canvas/canvas-clip-stack-persistence-expected.html *or* if it produces an image equal to the one stored at LayoutTests/platform/gtk/imported/blink/fast/canvas/canvas-clip-stack-persistence-expected.png.
Dean Jackson
Comment 17
2017-04-26 10:23:24 PDT
(In reply to Alexey Proskuryakov from comment #15)
> We would need to clearly define what a tolerance value means.
>
> There are many bugs where the problem manifests as a single line or even a
> single pixel of entirely wrong color. Then there are many bugs where the
> problem is a slight color variance.
>
> The current algorithm for computing a distance doesn't account for that.

What I am asking for is NOT the ability to give an arbitrary test a tolerance. I'm talking about tests that by their nature are unable to be 100% replicated in a ref test, e.g. due to floating point rounding. Gradients are a good example of this. I agree that it is not desirable if people fix flakey/bad tests with a tolerance value.
Radar WebKit Bug Importer
Comment 18
2020-04-30 16:13:33 PDT
<rdar://problem/62682186>
Simon Fraser (smfr)
Comment 19
2020-07-27 10:30:17 PDT
We should implement the WPT approach:
http://web-platform-tests.org/writing-tests/reftests.html#fuzzy-matching
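[Editor's note: the WPT mechanism referenced above lets a test declare its own fuzziness budget in a meta tag. Per the linked documentation, the syntax looks roughly like the sketch below; ranges give min-max allowances, and a reference-filename prefix scopes a budget to one comparison. The filename option-ref.html is illustrative.]

```html
<!-- Allow each mismatched channel to differ by up to 2 (0-255 scale),
     on at most 100 pixels, before the reftest comparison fails. -->
<meta name="fuzzy" content="maxDifference=0-2;totalPixels=0-100">

<!-- Scope a budget to the comparison against one specific reference. -->
<meta name="fuzzy" content="option-ref.html:maxDifference=15;totalPixels=300">
```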
Simon Fraser (smfr)
Comment 20
2021-10-27 21:04:05 PDT
Created attachment 442670 [details]
Patch
Martin Robinson
Comment 21
2021-10-28 10:20:25 PDT
Comment on attachment 442670 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=442670&action=review

Does this cause any new test passes? It seems like it should, at least in the WPT transforms directory.

> Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:387
> +

Nit: I think this
Simon Fraser (smfr)
Comment 22
2021-11-04 10:54:57 PDT
Created attachment 443320 [details]
Patch
EWS
Comment 23
2021-12-21 10:35:58 PST
Committed r287323 (245473@main): <https://commits.webkit.org/245473@main>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 443320 [details].