RESOLVED FIXED Bug 97112
REGRESSION (r126189): No more mismatch-did-not-occur failures in reftests
https://bugs.webkit.org/show_bug.cgi?id=97112
Summary REGRESSION (r126189): No more mismatch-did-not-occur failures in reftests
Zan Dobersek
Reported 2012-09-19 07:57:50 PDT
On GTK (and possibly other platforms as well, though most skip the test) when running fast/harness/sample-fail-mismatch-reftest.html the following output is shown, originating from SingleTestRunner[1]: fast/harness/sample-fail-mismatch-reftest.html -> ref test hashes matched but diff failed The test is expected to (and should) fail but actually passes, that is the reftest matches with its expected mismatch and that's considered an error. The output indicates ImageDiff does not produce the diff image event though there are no differences. I believe this is the current behavior of pixel dumping code throughout platforms when the actual hash matches the expected one[3][4][5]. [1] http://trac.webkit.org/browser/trunk/Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py#L328 [2] http://trac.webkit.org/changeset/126189 [3] http://trac.webkit.org/browser/trunk/Tools/DumpRenderTree/PixelDumpSupport.cpp#L71 [4] http://trac.webkit.org/browser/trunk/Tools/DumpRenderTree/qt/DumpRenderTreeQt.cpp#L962 [5] http://trac.webkit.org/browser/trunk/Tools/DumpRenderTree/chromium/TestShell.cpp#L708
Attachments
Patch (2.32 KB, patch)
2012-09-20 10:33 PDT, Zan Dobersek
no flags
Patch (3.82 KB, patch)
2012-09-25 14:12 PDT, Dirk Pranke
dpranke: review+
Dirk Pranke
Comment 1 2012-09-19 13:00:51 PDT
I'm a bit confused ... that log message says that although the hashes were the same, the images were different, and so there was a mismatch (hence the test ran "as expected"). That seems to be the opposite of what you're saying? FWIW, I just ran this on chromium mac (on lion) and the hashes were actually different. I am rebuilding the apple-mac port to check there.
Zan Dobersek
Comment 2 2012-09-19 13:19:28 PDT
(In reply to comment #1) > I'm a bit confused ... that log message says that although the hashes were the same, the images were different, and so there was a mismatch (hence the test ran "as expected"). > > That seems to be the opposite of what you're saying? Indeed the hashes are the same (as they should be) but the image of the reference driver output is None because DumpRenderTrees don't seem to output diff images when the hashes match. The Port.diff_image method returns a (True, 0, None) tuple when either image content is missing[1], and this is considered as a difference between reference and actual output in the current SingleTestRunner._compare_output_with_reference code, meaning the test in question passes (mismatch occurred) while it should fail. > > FWIW, I just ran this on chromium mac (on lion) and the hashes were actually different. I am rebuilding the apple-mac port to check there. The hashes shouldn't differ, though. Don't know what's happening there. [1] http://trac.webkit.org/browser/trunk/Tools/Scripts/webkitpy/layout_tests/port/base.py#L346
Dirk Pranke
Comment 3 2012-09-19 13:53:01 PDT
(In reply to comment #2) > (In reply to comment #1) > > I'm a bit confused ... that log message says that although the hashes were the same, the images were different, and so there was a mismatch (hence the test ran "as expected"). > > > > That seems to be the opposite of what you're saying? > > Indeed the hashes are the same (as they should be) but the image of the reference driver output is None because DumpRenderTrees don't seem to output diff images when the hashes match. The Port.diff_image method returns a (True, 0, None) tuple when either image content is missing[1], and this is considered as a difference between reference and actual output in the current SingleTestRunner._compare_output_with_reference code, meaning the test in question passes (mismatch occurred) while it should fail. > Hm. Okay, well, we seem to have two choices. We can either (1) not pass the hash to the reference test as input (so that we always generate both PNGs) or (2) assume that if the hashes match we don't need to actually diff the two images. I'm inclined to do the former, since we've seen problems with hashes before, and it's not clear if there's any real perf impact to getting the second PNG. FWIW, though, both chromium linux and apple mac also produce different hashes for the two inputs. > > > > FWIW, I just ran this on chromium mac (on lion) and the hashes were actually different. I am rebuilding the apple-mac port to check there. > > The hashes shouldn't differ, though. Don't know what's happening there. > > > [1] http://trac.webkit.org/browser/trunk/Tools/Scripts/webkitpy/layout_tests/port/base.py#L346
Zan Dobersek
Comment 4 2012-09-20 10:24:55 PDT
(In reply to comment #3) > > Hm. Okay, well, we seem to have two choices. We can either (1) not pass the hash to the reference test as input (so that we always generate both PNGs) or (2) assume that if the hashes match we don't need to actually diff the two images. > > I'm inclined to do the former, since we've seen problems with hashes before, and it's not clear if there's any real perf impact to getting the second PNG. > I'll upload a patch that doesn't pass the hash to driver input. On a related note, this would be easier to spot if the test was reported as unexpectedly passing. But the test is reported as such only if pixel testing is enabled, despite the test itself being a reftest. Also discovered is the following problem - the expectation for the failing test is not taken into account (the failure is not expected) if in the following format: fast/harness/sample-fail-mismatch-reftest.html [ ImageOnlyFailure ] The following two expectations seem to work, though: (1) fast/harness/sample-fail-mismatch-reftest.html [ ImageOnlyFailure WontFix ] (2) Bug(GTK) fast/harness/sample-fail-mismatch-reftest.html [ ImageOnlyFailure ] I'll open two separate bugs for these issues.
Zan Dobersek
Comment 5 2012-09-20 10:33:40 PDT
Peter Beverloo (cr-android ews)
Comment 6 2012-09-20 10:48:51 PDT
Comment on attachment 164941 [details] Patch Attachment 164941 [details] did not pass cr-android-ews (chromium-android): Output: http://queues.webkit.org/results/13954252
Zan Dobersek
Comment 7 2012-09-20 12:16:31 PDT
(In reply to comment #4) > On a related note, this would be easier to spot if the test was reported as unexpectedly passing. But the test is reported as such only if pixel testing is enabled, despite the test itself being a reftest. Also discovered is the following problem - the expectation for the failing test is not taken into account (the failure is not expected) if in the following format: > fast/harness/sample-fail-mismatch-reftest.html [ ImageOnlyFailure ] > > The following two expectations seem to work, though: > (1) fast/harness/sample-fail-mismatch-reftest.html [ ImageOnlyFailure WontFix ] > (2) Bug(GTK) fast/harness/sample-fail-mismatch-reftest.html [ ImageOnlyFailure ] The first issue is being handled in bug #97424, the other issue is not really an issue, the expectation should either have the WontFix modifier or the Bug(*) modifier. So no problem there after all.
Dirk Pranke
Comment 8 2012-09-20 14:09:44 PDT
Comment on attachment 164941 [details] Patch can we add a test for this?
Zan Dobersek
Comment 9 2012-09-21 02:41:09 PDT
(In reply to comment #8) > (From update of attachment 164941 [details]) > can we add a test for this? (In reply to comment #8) > (From update of attachment 164941 [details]) > can we add a test for this? Ideally yes, though MockDRT still doesn't seem to be up to date after not-that-recent per-test pixel dumping changes.
Zan Dobersek
Comment 10 2012-09-25 12:09:17 PDT
> (In reply to comment #8) > > (From update of attachment 164941 [details] [details]) > > can we add a test for this? > > Ideally yes, though MockDRT still doesn't seem to be up to date after not-that-recent per-test pixel dumping changes. After closer reinspection I'm not finding any practical way of how to test this. Testing MockDRT behavior is not helping to test the code changes in SingleTestRunner and the latter class doesn't have its own unit tests. I'd be glad to receive some pointers on how to approach this or at very least just commit the patch (to be honest most of the latest changes to this code haven't included any tests).
Dirk Pranke
Comment 11 2012-09-25 14:12:10 PDT
Dirk Pranke
Comment 12 2012-09-25 14:15:51 PDT
(In reply to comment #10) > After closer reinspection I'm not finding any practical way of how to test this. Testing MockDRT behavior is not helping to test the code changes in SingleTestRunner and the latter class doesn't have its own unit tests. I'd be glad to receive some pointers on how to approach this I was looking for integration-level tests that make sure that we're testing how reftest mismatches are handled; it turns out that the way the Test port was implemented wasn't matching the normal ports' behavior, so I've updated it to return no image if the hashes match (and updated diff_image() to match the base implementation if it gets None). MockDRT could probably use a similar patch but we don't need it for this. > to be honest most of the latest changes to this code haven't included any tests). Pointers? If that was true, that would either be because the change wasn't for something I considered worth testing (e.g. the text in a log message), or I thought it was covered by existing tests. If there were substantive changes that went in w/o tests I'd be sad (probably means I missed something in the review).
Zan Dobersek
Comment 13 2012-09-26 00:16:24 PDT
(In reply to comment #12) > (In reply to comment #10) > > After closer reinspection I'm not finding any practical way of how to test this. Testing MockDRT behavior is not helping to test the code changes in SingleTestRunner and the latter class doesn't have its own unit tests. I'd be glad to receive some pointers on how to approach this > > I was looking for integration-level tests that make sure that we're testing how reftest mismatches are handled; it turns out that the way the Test port was implemented wasn't matching the normal ports' behavior, so I've updated it to return no image if the hashes match (and updated diff_image() to match the base implementation if it gets None). MockDRT could probably use a similar patch but we don't need it for this. > Aha, thanks, makes sense. > > to be honest most of the latest changes to this code haven't included any tests). > > Pointers? If that was true, that would either be because the change wasn't for something I considered worth testing (e.g. the text in a log message), or I thought it was covered by existing tests. If there were substantive changes that went in w/o tests I'd be sad (probably means I missed something in the review). To be fair nothing substantial went in without being tested, though admittedly I was considering this patch unsubstantial as well. I apologize if I came around as offensive or doubtful, I didn't mean to. Thanks again, I'll land the patch shortly.
Zan Dobersek
Comment 14 2012-09-26 00:17:52 PDT
Dirk Pranke
Comment 15 2012-09-26 10:46:52 PDT
(In reply to comment #13) > To be fair nothing substantial went in without being tested, though admittedly I was considering this patch unsubstantial as well. I apologize if I came around as offensive or doubtful, I didn't mean to. No offense taken, I was just wondering if I missed something.
Note You need to log in before you can comment on or make changes to this bug.