168033 – Nwtr ignores ImageDiff's errors for ref test

RESOLVED FIXED168033

Nwtr ignores ImageDiff's errors for ref test

https://bugs.webkit.org/show_bug.cgi?id=168033

Summary Nwtr ignores ImageDiff's errors for ref test

Fujii Hironori

Reported 2017-02-08 19:58:55 PST

nrwt: Reftests pass unexpectedly in case of ImageDiff's crash > fast/css/cascade/box-shadow-and-webkit-box-shadow-cascade-order.html -> ref test hashes didn't match but diff passed On the other hand, pixel test fails and reports "ImageDiff crashed" as expected. > fast/css/background-clip-radius-values.html : ImageDiff crashed >[1/1] fast/css/background-clip-radius-values.html failed unexpectedly (image diff)

Attachments
Patch (3.75 KB, patch) 2017-02-08 20:23 PST, Fujii Hironori	no flags	Details Formatted Diff Diff
Archive of layout-test-results from ews117 for mac-elcapitan (1.62 MB, application/zip) 2017-02-08 21:33 PST, Build Bot	no flags	Details
Archive of layout-test-results from ews105 for mac-elcapitan-wk2 (976.81 KB, application/zip) 2017-02-08 21:38 PST, Build Bot	no flags	Details
Patch (7.10 KB, patch) 2017-02-09 00:57 PST, Fujii Hironori	no flags	Details Formatted Diff Diff
Patch (3.46 KB, patch) 2017-02-15 00:46 PST, Fujii Hironori	no flags	Details Formatted Diff Diff
Patch (3.46 KB, patch) 2017-02-15 01:04 PST, Fujii Hironori	no flags	Details Formatted Diff Diff
Patch (5.61 KB, patch) 2017-02-15 01:07 PST, Fujii Hironori	no flags	Details Formatted Diff Diff
Patch (5.67 KB, patch) 2017-02-15 20:16 PST, Fujii Hironori	no flags	Details Formatted Diff Diff
Patch (4.55 KB, patch) 2017-03-07 00:54 PST, Fujii Hironori	no flags	Details Formatted Diff Diff
Patch (4.63 KB, patch) 2017-03-12 22:32 PDT, Fujii Hironori	no flags	Details Formatted Diff Diff
Show Obsolete (9) View All Add attachment proposed patch, testcase, etc.

Fujii Hironori

Comment 1 2017-02-08 20:23:33 PST

Created attachment 301004 [details] Patch

Build Bot

Comment 2 2017-02-08 21:33:44 PST

Comment on attachment 301004 [details] Patch Attachment 301004 [details] did not pass mac-debug-ews (mac): Output: http://webkit-queues.webkit.org/results/3030141 New failing tests: compositing/transitions/transform-on-large-layer.html fast/css/sticky/sticky-left-percentage.html

Build Bot

Comment 3 2017-02-08 21:33:47 PST

Created attachment 301013 [details] Archive of layout-test-results from ews117 for mac-elcapitan The attached test failures were seen while running run-webkit-tests on the mac-debug-ews. Bot: ews117 Port: mac-elcapitan Platform: Mac OS X 10.11.6

Build Bot

Comment 4 2017-02-08 21:38:03 PST

Comment on attachment 301004 [details] Patch Attachment 301004 [details] did not pass mac-wk2-ews (mac-wk2): Output: http://webkit-queues.webkit.org/results/3030161 New failing tests: compositing/transitions/transform-on-large-layer.html fast/css/sticky/sticky-left-percentage.html

Build Bot

Comment 5 2017-02-08 21:38:06 PST

Created attachment 301015 [details] Archive of layout-test-results from ews105 for mac-elcapitan-wk2 The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews. Bot: ews105 Port: mac-elcapitan-wk2 Platform: Mac OS X 10.11.6

Alexey Proskuryakov

Comment 6 2017-02-08 22:03:46 PST

Comment on attachment 301004 [details] Patch Looks like this breaks tests. But also, why does ImageDiff crash? In a crazy situation like this, it's probably appropriate to terminate run-webkit-tests with an exception.

Fujii Hironori

Comment 7 2017-02-08 23:33:49 PST

(In reply to comment #4) > New failing tests: > compositing/transitions/transform-on-large-layer.html > fast/css/sticky/sticky-left-percentage.html This change discovers problems which have been hidden by false negative. > compositing/transitions/transform-on-large-layer.html : ImageDiff produced stderr output: > Error: test and reference images have different sizes. Test image is 800x600, reference image is 1600x977 > [1/10] compositing/transitions/transform-on-large-layer.html failed unexpectedly (reference didn't generate pixel results.) > fast/css/sticky/sticky-left-percentage.html : ImageDiff produced stderr output: > Error: test and reference images have different sizes. Test image is 800x700, reference image is 800x600 > [2/10] fast/css/sticky/sticky-left-percentage.html failed unexpectedly (reference didn't generate pixel results.)

Fujii Hironori

Comment 8 2017-02-08 23:54:16 PST

(In reply to comment #6) > But also, why does ImageDiff crash? In a crazy situation like this, it's > probably appropriate to terminate run-webkit-tests with an exception. My ImageDiff crashed just because mis-configuration of my system (Bug 168036 comment 1). But, this change isn't only for ImageDiff's crash. Nwtr hides tree types of other problems at the moment: * ImageDiff produced stderr output * ImageDiff timed out * ImageDiff crashed https://trac.webkit.org/browser/trunk/Tools/Scripts/webkitpy/port/image_diff.py?rev=211931#L97

Fujii Hironori

Comment 9 2017-02-09 00:06:08 PST

Nwtr ignores ImageDiff's errors only for ref test. Nwtr checks the err_str of pixel test.

Fujii Hironori

Comment 10 2017-02-09 00:57:13 PST

Created attachment 301022 [details] Patch

Alexey Proskuryakov

Comment 11 2017-02-10 16:52:08 PST

Comment on attachment 301022 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=301022&action=review > Tools/ChangeLog:3 > + Nwtr ignores ImageDiff's errors for ref test Looking at the patch, but it's taking me some effort to understand all that it changes. In particular, I'm still not sure what the problems with these two tests were. What exactly makes them fail now? They clearly don't crash or hang. When I run them locally, I get a warning saying "ref test hashes didn't match but diff passed", but many more tests produce this warning, so what is the difference with those? compositing/masks/compositing-clip-path.html -> ref test hashes didn't match but diff passed compositing/overlap-blending/children-opacity-huge.html -> ref test hashes didn't match but diff passed compositing/overlap-blending/children-opacity-no-overlap.html -> ref test hashes didn't match but diff passed compositing/overlap-blending/nested-non-overlap-clipping.html -> ref test hashes didn't match but diff passed compositing/transitions/transform-on-large-layer.html -> ref test hashes didn't match but diff passed css3/filters/backdrop/backdrop-filter-with-border-radius-and-reflection-add.html -> ref test hashes didn't match but diff passed css3/filters/backdrop/backdrop-filter-with-border-radius-and-reflection.html -> ref test hashes didn't match but diff passed css3/filters/filters-on-svg-root.html -> ref test hashes didn't match but diff passed > LayoutTests/ChangeLog:13 > + Set a tiny margin-top to #sandbox not to overwrap with a preceding > + text. Why do we need it to overlap?

Alexey Proskuryakov

Comment 12 2017-02-10 16:54:12 PST

Comment on attachment 301022 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=301022&action=review >> LayoutTests/ChangeLog:13 >> + text. > > Why do we need it to overlap? I meant, why do we need it to not overlap?

Ryosuke Niwa

Comment 13 2017-02-10 17:07:20 PST

Comment on attachment 301022 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=301022&action=review > Tools/ChangeLog:9 > + expected. Bacause _compare_image checks returned err_str of We use a single space after period. > Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:339 > diff_result = self._port.diff_image(reference_driver_output.image, actual_driver_output.image, tolerance=0) > - if not diff_result[0]: > - failures.append(test_failures.FailureReftestMismatchDidNotOccur(reference_filename)) > + err_str = diff_result[2] > + if err_str: > + _log.warning(' %s : %s' % (self._test_name, err_str)) > + failures.append(test_failures.FailureReftestNoImagesGenerated(reference_filename)) Can we extract this as a helper method with _ prefix so that we don't have to repeat the same code twice? > Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:343 > else: > - _log.warning(" %s -> ref test hashes matched but diff failed" % self._test_name) > + if not diff_result[0]: > + failures.append(test_failures.FailureReftestMismatchDidNotOccur(reference_filename)) Why don't we use elif here? > Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:355 > else: > - _log.warning(" %s -> ref test hashes didn't match but diff passed" % self._test_name) > + if diff_result[0]: Ditto. > LayoutTests/ChangeLog:9 > + ImageDiff reports errors for different image sizes of expected and > + actual image of ref tests. They should have same window size. Can we make this change in a separate patch?

Fujii Hironori

Comment 14 2017-02-12 20:38:15 PST

Thank you for reviewing. (In reply to comment #11) > Comment on attachment 301022 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=301022&action=review > > > Tools/ChangeLog:3 > > + Nwtr ignores ImageDiff's errors for ref test > > Looking at the patch, but it's taking me some effort to understand all that > it changes. > > In particular, I'm still not sure what the problems with these two tests > were. What exactly makes them fail now? They clearly don't crash or hang. > When I run them locally, I get a warning saying "ref test hashes didn't > match but diff passed", but many more tests produce this warning, so what is > the difference with those? > > compositing/masks/compositing-clip-path.html -> ref test hashes didn't > match but diff passed > compositing/overlap-blending/children-opacity-huge.html -> ref test hashes > didn't match but diff passed > compositing/overlap-blending/children-opacity-no-overlap.html -> ref test > hashes didn't match but diff passed > compositing/overlap-blending/nested-non-overlap-clipping.html -> ref test > hashes didn't match but diff passed > compositing/transitions/transform-on-large-layer.html -> ref test hashes > didn't match but diff passed > > css3/filters/backdrop/backdrop-filter-with-border-radius-and-reflection-add. > html -> ref test hashes didn't match but diff passed > > css3/filters/backdrop/backdrop-filter-with-border-radius-and-reflection.html > -> ref test hashes didn't match but diff passed > css3/filters/filters-on-svg-root.html -> ref test hashes didn't match but > diff passed You didn't apply my patch. It's difficult to know what's an actual failure without my patch because nwtr doesn't report ImageDiff's error. You can see the ImageDiff's error message by applying out patch (Comment 7): > Error: test and reference images have different sizes. Test image is 800x600, reference image is 1600x977 And, ImageDiff doen't check any pixels if the comparing two image have different sizes. So, it can not fail the test by modifying the text in -expected.html.

Fujii Hironori

Comment 15 2017-02-12 22:37:49 PST

(In reply to comment #12) > Comment on attachment 301022 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=301022&action=review > > >> LayoutTests/ChangeLog:13 > >> + text. > > > > Why do we need it to overlap? > > I meant, why do we need it to not overlap? It caused pixel diffs while I was runnning nwtr of GTK port. I thought it was caused by z-index specifed in the test. But, I check the test again, there is no such z-index specifed, but just translate3d. This is my fault. I need to investigate the failure of GTK port. Thank you for pointing out.

Fujii Hironori

Comment 16 2017-02-12 22:51:01 PST

Thank you for reviewing. (In reply to comment #13) > Comment on attachment 301022 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=301022&action=review > > > Tools/ChangeLog:9 > > + expected. Bacause _compare_image checks returned err_str of > > We use a single space after period. Oh, this is Emacs default behavior of fill-paragraph. I changed my config (sentence-end-double-space) now. > > Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:339 > > diff_result = self._port.diff_image(reference_driver_output.image, actual_driver_output.image, tolerance=0) > > - if not diff_result[0]: > > - failures.append(test_failures.FailureReftestMismatchDidNotOccur(reference_filename)) > > + err_str = diff_result[2] > > + if err_str: > > + _log.warning(' %s : %s' % (self._test_name, err_str)) > > + failures.append(test_failures.FailureReftestNoImagesGenerated(reference_filename)) > > Can we extract this as a helper method with _ prefix so that we don't have > to repeat the same code twice? I found another problem (Bug 168221). The login of match and mismatch ref tests are similar but same. I think it's difficult. I'll revisit this bug after finishing Bug 168221. > > Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:343 > > else: > > - _log.warning(" %s -> ref test hashes matched but diff failed" % self._test_name) > > + if not diff_result[0]: > > + failures.append(test_failures.FailureReftestMismatchDidNotOccur(reference_filename)) > > Why don't we use elif here? > > > Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:355 > > else: > > - _log.warning(" %s -> ref test hashes didn't match but diff passed" % self._test_name) > > + if diff_result[0]: > > Ditto. I just copied the code of pixel tests. You're right. I'll revise. > > LayoutTests/ChangeLog:9 > > + ImageDiff reports errors for different image sizes of expected and > > + actual image of ref tests. They should have same window size. > > Can we make this change in a separate patch? That's good idea. Filed (Bug 168217 and Bug 168218).

Alexey Proskuryakov

Comment 17 2017-02-12 23:14:35 PST

> You didn't apply my patch. It's difficult to know what's an > actual failure without my patch because nwtr doesn't report > ImageDiff's error. I don't think that this answers my question. Can you explain what is different about the two tests that you modified in this patch, in contrast with the others that currently have the same output? > And, ImageDiff doen't check any pixels if the comparing two image have different sizes. Are you sure? My recollection is that is does, at least on the Mac.

Fujii Hironori

Comment 18 2017-02-13 00:09:59 PST

(In reply to comment #17) > > You didn't apply my patch. It's difficult to know what's an > > actual failure without my patch because nwtr doesn't report > > ImageDiff's error. > > I don't think that this answers my question. Can you explain what is > different about the two tests that you modified in this patch, in contrast > with the others that currently have the same output? The two tests call window.resize, but others don't. Is this an answer to your question? I think I don't understand your question. > > And, ImageDiff doen't check any pixels if the comparing two image have different sizes. > > Are you sure? My recollection is that is does, at least on the Mac. I tested in my mac mini now. My result is different with yours. > fujii@fujihiro-mac-mini $ ./Tools/Scripts/run-webkit-tests compositing/transitions/transform-on-large-layer.html > Using port 'mac-elcapitan-wk2' > Test configuration: <elcapitan, x86_64, release> > Placing test results in /Users/fujii/work/webkit/g1/WebKitBuild/Release/layout-test-results > Baseline search path: platform/mac-wk2 -> platform/wk2 -> platform/mac-elcapitan -> platform/mac -> generic > Using Release build > Pixel tests disabled > Regular timeout: 30000, slow test timeout: 150000 > Command line: /Users/fujii/work/webkit/g1/WebKitBuild/Release/WebKitTestRunner - > > --lint-test-files warnings: > LayoutTests/TestExpectations:384 Path does not exist. imported/w3c/web-platform-tests/tools > > Found 1 test; running 1, skipping 0. > > Running 1 test > > Starting helper ...2017-02-13 00:00:03.439 LayoutTestHelper[28724:1359551] Could not determine current color profile, so it will not be reset after running the tests. Running 1 WebKitTestRunner. > > compositing/transitions/transform-on-large-layer.html -> ref test hashes didn't match but diff passed > The test ran as expected. > > fujii@fujihiro-mac-mini $ sed -ie 's/turns green/XXX/' LayoutTests/compositing/transitions/transform-on-large-layer-expected.html > fujii@fujihiro-mac-mini $ ./Tools/Scripts/run-webkit-tests compositing/transitions/transform-on-large-layer.html > Using port 'mac-elcapitan-wk2' > Test configuration: <elcapitan, x86_64, release> > Placing test results in /Users/fujii/work/webkit/g1/WebKitBuild/Release/layout-test-results > Baseline search path: platform/mac-wk2 -> platform/wk2 -> platform/mac-elcapitan -> platform/mac -> generic > Using Release build > Pixel tests disabled > Regular timeout: 30000, slow test timeout: 150000 > Command line: /Users/fujii/work/webkit/g1/WebKitBuild/Release/WebKitTestRunner - > > --lint-test-files warnings: > LayoutTests/TestExpectations:384 Path does not exist. imported/w3c/web-platform-tests/tools > > Found 1 test; running 1, skipping 0. > > Running 1 test > > Starting helper ...2017-02-13 00:02:06.395 LayoutTestHelper[28882:1360579] Could not determine current color profile, so it will not be reset after running the tests. Running 1 WebKitTestRunner. > > compositing/transitions/transform-on-large-layer.html -> ref test hashes didn't match but diff passed > The test ran as expected. > > fujii@fujihiro-mac-mini $ Here is a source code of ImageDiff of Mac port. https://trac.webkit.org/browser/trunk/Tools/DumpRenderTree/cg/ImageDiffCG.cpp?rev=198559#L216 It doesn't check any pixels if the sizes are different.

Fujii Hironori

Comment 19 2017-02-13 18:41:54 PST

I discovered old discussions. This problem was discussed two years ago. [webkit-dev] run-webkit-tests question; hashes when comparing ref test output https://lists.webkit.org/pipermail/webkit-dev/2015-January/027166.html

Fujii Hironori

Comment 20 2017-02-15 00:46:44 PST

Created attachment 301589 [details] Patch

Fujii Hironori

Comment 21 2017-02-15 01:04:31 PST

Created attachment 301593 [details] Patch

Fujii Hironori

Comment 22 2017-02-15 01:07:24 PST

Created attachment 301594 [details] Patch

Fujii Hironori

Comment 23 2017-02-15 20:16:40 PST

Created attachment 301694 [details] Patch

Fujii Hironori

Comment 24 2017-02-27 18:52:45 PST

Comment on attachment 301593 [details] Patch I've finished fixing depending bugs, please review again.

Ryosuke Niwa

Comment 25 2017-03-06 19:14:11 PST

Comment on attachment 301593 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=301593&action=review > Tools/ChangeLog:8 > + On the other hand, pixel test fails and reports the error as It's strange to start your log by saying "on the other hand" given you haven't stated what on one hand happens. > Tools/ChangeLog:9 > + expected. Bacause _compare_image checks returned err_str of Typo: B"a"cause. > Tools/ChangeLog:16 > + Please also explain the change you're making to expected match case. > Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:340 > + err_str = diff_result[2] Please spell out "error" instead of abbreviating it as "err".

Fujii Hironori

Comment 26 2017-03-07 00:54:44 PST

Created attachment 303635 [details] Patch

Alexey Proskuryakov

Comment 27 2017-03-09 16:13:51 PST

Comment on attachment 303635 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=303635&action=review Looks good to me. Can automated tests be added for this fix? > Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:276 > + error = diff_result[2] I think that error_string would be a better name, there are too many things that an "error" can contain. > Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:344 > + actual_driver_output.error = (actual_driver_output.error or '') + error I don't see any other place where we set actual_driver_output.error. Why is this needed?

Fujii Hironori

Comment 28 2017-03-12 22:15:00 PDT

Comment on attachment 303635 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=303635&action=review >> Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:276 >> + error = diff_result[2] > > I think that error_string would be a better name, there are too many things that an "error" can contain. I agree. I'll revise. >> Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:344 >> + actual_driver_output.error = (actual_driver_output.error or '') + error > > I don't see any other place where we set actual_driver_output.error. Why is this needed? actual_driver_output is the return value of self._driver.run_test. Then, actual_driver_output.error may contain a error of executing WTR for actual case.

Fujii Hironori

Comment 29 2017-03-12 22:23:23 PDT

(In reply to comment #27) > Looks good to me. Can automated tests be added for this fix? I don't think so. I have no idea.

Fujii Hironori

Comment 30 2017-03-12 22:32:32 PDT

Created attachment 304227 [details] Patch

Alexey Proskuryakov

Comment 31 2017-03-12 23:35:18 PDT

CC'ing some folks who may have an idea for how to make a regression test for this fix.

WebKit Commit Bot

Comment 32 2017-03-14 11:22:09 PDT

Comment on attachment 304227 [details] Patch Clearing flags on attachment: 304227 Committed r213918: <http://trac.webkit.org/changeset/213918>

WebKit Commit Bot

Comment 33 2017-03-14 11:22:19 PDT

All reviewed patches have been landed. Closing bug.

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version WebKit Nightly Build

Hardware Unspecified

OS Unspecified

Product WebKit

Component Tools / Tests

Assignee

Fujii Hironori

Reported

2017-02-08 19:58 PST

Modified

2017-03-14 11:22 PDT History

CC List

12 users Show

URL

Keywords

Depends on

168217 168221

Blocks

Dependencies

tree graph