Bug 168033

Summary: Nwtr ignores ImageDiff's errors for ref test
Product: WebKit Reporter: Fujii Hironori <Hironori.Fujii>
Component: Tools / TestsAssignee: Fujii Hironori <Hironori.Fujii>
Status: RESOLVED FIXED    
Severity: Normal CC: aakash_jain, ap, buildbot, commit-queue, dbates, dino, glenn, lforschler, mcatanzaro, rniwa, simon.fraser, webkit
Priority: P2    
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 168217, 168221    
Bug Blocks:    
Attachments:
Description Flags
Patch
none
Archive of layout-test-results from ews117 for mac-elcapitan
none
Archive of layout-test-results from ews105 for mac-elcapitan-wk2
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch none

Fujii Hironori
Reported 2017-02-08 19:58:55 PST
nrwt: Reftests pass unexpectedly in case of ImageDiff's crash > fast/css/cascade/box-shadow-and-webkit-box-shadow-cascade-order.html -> ref test hashes didn't match but diff passed On the other hand, pixel test fails and reports "ImageDiff crashed" as expected. > fast/css/background-clip-radius-values.html : ImageDiff crashed >[1/1] fast/css/background-clip-radius-values.html failed unexpectedly (image diff)
Attachments
Patch (3.75 KB, patch)
2017-02-08 20:23 PST, Fujii Hironori
no flags
Archive of layout-test-results from ews117 for mac-elcapitan (1.62 MB, application/zip)
2017-02-08 21:33 PST, Build Bot
no flags
Archive of layout-test-results from ews105 for mac-elcapitan-wk2 (976.81 KB, application/zip)
2017-02-08 21:38 PST, Build Bot
no flags
Patch (7.10 KB, patch)
2017-02-09 00:57 PST, Fujii Hironori
no flags
Patch (3.46 KB, patch)
2017-02-15 00:46 PST, Fujii Hironori
no flags
Patch (3.46 KB, patch)
2017-02-15 01:04 PST, Fujii Hironori
no flags
Patch (5.61 KB, patch)
2017-02-15 01:07 PST, Fujii Hironori
no flags
Patch (5.67 KB, patch)
2017-02-15 20:16 PST, Fujii Hironori
no flags
Patch (4.55 KB, patch)
2017-03-07 00:54 PST, Fujii Hironori
no flags
Patch (4.63 KB, patch)
2017-03-12 22:32 PDT, Fujii Hironori
no flags
Fujii Hironori
Comment 1 2017-02-08 20:23:33 PST
Build Bot
Comment 2 2017-02-08 21:33:44 PST
Comment on attachment 301004 [details] Patch Attachment 301004 [details] did not pass mac-debug-ews (mac): Output: http://webkit-queues.webkit.org/results/3030141 New failing tests: compositing/transitions/transform-on-large-layer.html fast/css/sticky/sticky-left-percentage.html
Build Bot
Comment 3 2017-02-08 21:33:47 PST
Created attachment 301013 [details] Archive of layout-test-results from ews117 for mac-elcapitan The attached test failures were seen while running run-webkit-tests on the mac-debug-ews. Bot: ews117 Port: mac-elcapitan Platform: Mac OS X 10.11.6
Build Bot
Comment 4 2017-02-08 21:38:03 PST
Comment on attachment 301004 [details] Patch Attachment 301004 [details] did not pass mac-wk2-ews (mac-wk2): Output: http://webkit-queues.webkit.org/results/3030161 New failing tests: compositing/transitions/transform-on-large-layer.html fast/css/sticky/sticky-left-percentage.html
Build Bot
Comment 5 2017-02-08 21:38:06 PST
Created attachment 301015 [details] Archive of layout-test-results from ews105 for mac-elcapitan-wk2 The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews. Bot: ews105 Port: mac-elcapitan-wk2 Platform: Mac OS X 10.11.6
Alexey Proskuryakov
Comment 6 2017-02-08 22:03:46 PST
Comment on attachment 301004 [details] Patch Looks like this breaks tests. But also, why does ImageDiff crash? In a crazy situation like this, it's probably appropriate to terminate run-webkit-tests with an exception.
Fujii Hironori
Comment 7 2017-02-08 23:33:49 PST
(In reply to comment #4) > New failing tests: > compositing/transitions/transform-on-large-layer.html > fast/css/sticky/sticky-left-percentage.html This change discovers problems which have been hidden by false negative. > compositing/transitions/transform-on-large-layer.html : ImageDiff produced stderr output: > Error: test and reference images have different sizes. Test image is 800x600, reference image is 1600x977 > [1/10] compositing/transitions/transform-on-large-layer.html failed unexpectedly (reference didn't generate pixel results.) > fast/css/sticky/sticky-left-percentage.html : ImageDiff produced stderr output: > Error: test and reference images have different sizes. Test image is 800x700, reference image is 800x600 > [2/10] fast/css/sticky/sticky-left-percentage.html failed unexpectedly (reference didn't generate pixel results.)
Fujii Hironori
Comment 8 2017-02-08 23:54:16 PST
(In reply to comment #6) > But also, why does ImageDiff crash? In a crazy situation like this, it's > probably appropriate to terminate run-webkit-tests with an exception. My ImageDiff crashed just because mis-configuration of my system (Bug 168036 comment 1). But, this change isn't only for ImageDiff's crash. Nwtr hides tree types of other problems at the moment: * ImageDiff produced stderr output * ImageDiff timed out * ImageDiff crashed https://trac.webkit.org/browser/trunk/Tools/Scripts/webkitpy/port/image_diff.py?rev=211931#L97
Fujii Hironori
Comment 9 2017-02-09 00:06:08 PST
Nwtr ignores ImageDiff's errors only for ref test. Nwtr checks the err_str of pixel test.
Fujii Hironori
Comment 10 2017-02-09 00:57:13 PST
Alexey Proskuryakov
Comment 11 2017-02-10 16:52:08 PST
Comment on attachment 301022 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=301022&action=review > Tools/ChangeLog:3 > + Nwtr ignores ImageDiff's errors for ref test Looking at the patch, but it's taking me some effort to understand all that it changes. In particular, I'm still not sure what the problems with these two tests were. What exactly makes them fail now? They clearly don't crash or hang. When I run them locally, I get a warning saying "ref test hashes didn't match but diff passed", but many more tests produce this warning, so what is the difference with those? compositing/masks/compositing-clip-path.html -> ref test hashes didn't match but diff passed compositing/overlap-blending/children-opacity-huge.html -> ref test hashes didn't match but diff passed compositing/overlap-blending/children-opacity-no-overlap.html -> ref test hashes didn't match but diff passed compositing/overlap-blending/nested-non-overlap-clipping.html -> ref test hashes didn't match but diff passed compositing/transitions/transform-on-large-layer.html -> ref test hashes didn't match but diff passed css3/filters/backdrop/backdrop-filter-with-border-radius-and-reflection-add.html -> ref test hashes didn't match but diff passed css3/filters/backdrop/backdrop-filter-with-border-radius-and-reflection.html -> ref test hashes didn't match but diff passed css3/filters/filters-on-svg-root.html -> ref test hashes didn't match but diff passed > LayoutTests/ChangeLog:13 > + Set a tiny margin-top to #sandbox not to overwrap with a preceding > + text. Why do we need it to overlap?
Alexey Proskuryakov
Comment 12 2017-02-10 16:54:12 PST
Comment on attachment 301022 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=301022&action=review >> LayoutTests/ChangeLog:13 >> + text. > > Why do we need it to overlap? I meant, why do we need it to not overlap?
Ryosuke Niwa
Comment 13 2017-02-10 17:07:20 PST
Comment on attachment 301022 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=301022&action=review > Tools/ChangeLog:9 > + expected. Bacause _compare_image checks returned err_str of We use a single space after period. > Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:339 > diff_result = self._port.diff_image(reference_driver_output.image, actual_driver_output.image, tolerance=0) > - if not diff_result[0]: > - failures.append(test_failures.FailureReftestMismatchDidNotOccur(reference_filename)) > + err_str = diff_result[2] > + if err_str: > + _log.warning(' %s : %s' % (self._test_name, err_str)) > + failures.append(test_failures.FailureReftestNoImagesGenerated(reference_filename)) Can we extract this as a helper method with _ prefix so that we don't have to repeat the same code twice? > Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:343 > else: > - _log.warning(" %s -> ref test hashes matched but diff failed" % self._test_name) > + if not diff_result[0]: > + failures.append(test_failures.FailureReftestMismatchDidNotOccur(reference_filename)) Why don't we use elif here? > Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:355 > else: > - _log.warning(" %s -> ref test hashes didn't match but diff passed" % self._test_name) > + if diff_result[0]: Ditto. > LayoutTests/ChangeLog:9 > + ImageDiff reports errors for different image sizes of expected and > + actual image of ref tests. They should have same window size. Can we make this change in a separate patch?
Fujii Hironori
Comment 14 2017-02-12 20:38:15 PST
Thank you for reviewing. (In reply to comment #11) > Comment on attachment 301022 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=301022&action=review > > > Tools/ChangeLog:3 > > + Nwtr ignores ImageDiff's errors for ref test > > Looking at the patch, but it's taking me some effort to understand all that > it changes. > > In particular, I'm still not sure what the problems with these two tests > were. What exactly makes them fail now? They clearly don't crash or hang. > When I run them locally, I get a warning saying "ref test hashes didn't > match but diff passed", but many more tests produce this warning, so what is > the difference with those? > > compositing/masks/compositing-clip-path.html -> ref test hashes didn't > match but diff passed > compositing/overlap-blending/children-opacity-huge.html -> ref test hashes > didn't match but diff passed > compositing/overlap-blending/children-opacity-no-overlap.html -> ref test > hashes didn't match but diff passed > compositing/overlap-blending/nested-non-overlap-clipping.html -> ref test > hashes didn't match but diff passed > compositing/transitions/transform-on-large-layer.html -> ref test hashes > didn't match but diff passed > > css3/filters/backdrop/backdrop-filter-with-border-radius-and-reflection-add. > html -> ref test hashes didn't match but diff passed > > css3/filters/backdrop/backdrop-filter-with-border-radius-and-reflection.html > -> ref test hashes didn't match but diff passed > css3/filters/filters-on-svg-root.html -> ref test hashes didn't match but > diff passed You didn't apply my patch. It's difficult to know what's an actual failure without my patch because nwtr doesn't report ImageDiff's error. You can see the ImageDiff's error message by applying out patch (Comment 7): > Error: test and reference images have different sizes. Test image is 800x600, reference image is 1600x977 And, ImageDiff doen't check any pixels if the comparing two image have different sizes. So, it can not fail the test by modifying the text in -expected.html.
Fujii Hironori
Comment 15 2017-02-12 22:37:49 PST
(In reply to comment #12) > Comment on attachment 301022 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=301022&action=review > > >> LayoutTests/ChangeLog:13 > >> + text. > > > > Why do we need it to overlap? > > I meant, why do we need it to not overlap? It caused pixel diffs while I was runnning nwtr of GTK port. I thought it was caused by z-index specifed in the test. But, I check the test again, there is no such z-index specifed, but just translate3d. This is my fault. I need to investigate the failure of GTK port. Thank you for pointing out.
Fujii Hironori
Comment 16 2017-02-12 22:51:01 PST
Thank you for reviewing. (In reply to comment #13) > Comment on attachment 301022 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=301022&action=review > > > Tools/ChangeLog:9 > > + expected. Bacause _compare_image checks returned err_str of > > We use a single space after period. Oh, this is Emacs default behavior of fill-paragraph. I changed my config (sentence-end-double-space) now. > > Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:339 > > diff_result = self._port.diff_image(reference_driver_output.image, actual_driver_output.image, tolerance=0) > > - if not diff_result[0]: > > - failures.append(test_failures.FailureReftestMismatchDidNotOccur(reference_filename)) > > + err_str = diff_result[2] > > + if err_str: > > + _log.warning(' %s : %s' % (self._test_name, err_str)) > > + failures.append(test_failures.FailureReftestNoImagesGenerated(reference_filename)) > > Can we extract this as a helper method with _ prefix so that we don't have > to repeat the same code twice? I found another problem (Bug 168221). The login of match and mismatch ref tests are similar but same. I think it's difficult. I'll revisit this bug after finishing Bug 168221. > > Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:343 > > else: > > - _log.warning(" %s -> ref test hashes matched but diff failed" % self._test_name) > > + if not diff_result[0]: > > + failures.append(test_failures.FailureReftestMismatchDidNotOccur(reference_filename)) > > Why don't we use elif here? > > > Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:355 > > else: > > - _log.warning(" %s -> ref test hashes didn't match but diff passed" % self._test_name) > > + if diff_result[0]: > > Ditto. I just copied the code of pixel tests. You're right. I'll revise. > > LayoutTests/ChangeLog:9 > > + ImageDiff reports errors for different image sizes of expected and > > + actual image of ref tests. They should have same window size. > > Can we make this change in a separate patch? That's good idea. Filed (Bug 168217 and Bug 168218).
Alexey Proskuryakov
Comment 17 2017-02-12 23:14:35 PST
> You didn't apply my patch. It's difficult to know what's an > actual failure without my patch because nwtr doesn't report > ImageDiff's error. I don't think that this answers my question. Can you explain what is different about the two tests that you modified in this patch, in contrast with the others that currently have the same output? > And, ImageDiff doen't check any pixels if the comparing two image have different sizes. Are you sure? My recollection is that is does, at least on the Mac.
Fujii Hironori
Comment 18 2017-02-13 00:09:59 PST
(In reply to comment #17) > > You didn't apply my patch. It's difficult to know what's an > > actual failure without my patch because nwtr doesn't report > > ImageDiff's error. > > I don't think that this answers my question. Can you explain what is > different about the two tests that you modified in this patch, in contrast > with the others that currently have the same output? The two tests call window.resize, but others don't. Is this an answer to your question? I think I don't understand your question. > > And, ImageDiff doen't check any pixels if the comparing two image have different sizes. > > Are you sure? My recollection is that is does, at least on the Mac. I tested in my mac mini now. My result is different with yours. > fujii@fujihiro-mac-mini $ ./Tools/Scripts/run-webkit-tests compositing/transitions/transform-on-large-layer.html > Using port 'mac-elcapitan-wk2' > Test configuration: <elcapitan, x86_64, release> > Placing test results in /Users/fujii/work/webkit/g1/WebKitBuild/Release/layout-test-results > Baseline search path: platform/mac-wk2 -> platform/wk2 -> platform/mac-elcapitan -> platform/mac -> generic > Using Release build > Pixel tests disabled > Regular timeout: 30000, slow test timeout: 150000 > Command line: /Users/fujii/work/webkit/g1/WebKitBuild/Release/WebKitTestRunner - > > --lint-test-files warnings: > LayoutTests/TestExpectations:384 Path does not exist. imported/w3c/web-platform-tests/tools > > Found 1 test; running 1, skipping 0. > > Running 1 test > > Starting helper ...2017-02-13 00:00:03.439 LayoutTestHelper[28724:1359551] Could not determine current color profile, so it will not be reset after running the tests. Running 1 WebKitTestRunner. > > compositing/transitions/transform-on-large-layer.html -> ref test hashes didn't match but diff passed > The test ran as expected. > > fujii@fujihiro-mac-mini $ sed -ie 's/turns green/XXX/' LayoutTests/compositing/transitions/transform-on-large-layer-expected.html > fujii@fujihiro-mac-mini $ ./Tools/Scripts/run-webkit-tests compositing/transitions/transform-on-large-layer.html > Using port 'mac-elcapitan-wk2' > Test configuration: <elcapitan, x86_64, release> > Placing test results in /Users/fujii/work/webkit/g1/WebKitBuild/Release/layout-test-results > Baseline search path: platform/mac-wk2 -> platform/wk2 -> platform/mac-elcapitan -> platform/mac -> generic > Using Release build > Pixel tests disabled > Regular timeout: 30000, slow test timeout: 150000 > Command line: /Users/fujii/work/webkit/g1/WebKitBuild/Release/WebKitTestRunner - > > --lint-test-files warnings: > LayoutTests/TestExpectations:384 Path does not exist. imported/w3c/web-platform-tests/tools > > Found 1 test; running 1, skipping 0. > > Running 1 test > > Starting helper ...2017-02-13 00:02:06.395 LayoutTestHelper[28882:1360579] Could not determine current color profile, so it will not be reset after running the tests. Running 1 WebKitTestRunner. > > compositing/transitions/transform-on-large-layer.html -> ref test hashes didn't match but diff passed > The test ran as expected. > > fujii@fujihiro-mac-mini $ Here is a source code of ImageDiff of Mac port. https://trac.webkit.org/browser/trunk/Tools/DumpRenderTree/cg/ImageDiffCG.cpp?rev=198559#L216 It doesn't check any pixels if the sizes are different.
Fujii Hironori
Comment 19 2017-02-13 18:41:54 PST
I discovered old discussions. This problem was discussed two years ago. [webkit-dev] run-webkit-tests question; hashes when comparing ref test output https://lists.webkit.org/pipermail/webkit-dev/2015-January/027166.html
Fujii Hironori
Comment 20 2017-02-15 00:46:44 PST
Fujii Hironori
Comment 21 2017-02-15 01:04:31 PST
Fujii Hironori
Comment 22 2017-02-15 01:07:24 PST
Fujii Hironori
Comment 23 2017-02-15 20:16:40 PST
Fujii Hironori
Comment 24 2017-02-27 18:52:45 PST
Comment on attachment 301593 [details] Patch I've finished fixing depending bugs, please review again.
Ryosuke Niwa
Comment 25 2017-03-06 19:14:11 PST
Comment on attachment 301593 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=301593&action=review > Tools/ChangeLog:8 > + On the other hand, pixel test fails and reports the error as It's strange to start your log by saying "on the other hand" given you haven't stated what on one hand happens. > Tools/ChangeLog:9 > + expected. Bacause _compare_image checks returned err_str of Typo: B"a"cause. > Tools/ChangeLog:16 > + Please also explain the change you're making to expected match case. > Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:340 > + err_str = diff_result[2] Please spell out "error" instead of abbreviating it as "err".
Fujii Hironori
Comment 26 2017-03-07 00:54:44 PST
Alexey Proskuryakov
Comment 27 2017-03-09 16:13:51 PST
Comment on attachment 303635 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=303635&action=review Looks good to me. Can automated tests be added for this fix? > Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:276 > + error = diff_result[2] I think that error_string would be a better name, there are too many things that an "error" can contain. > Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:344 > + actual_driver_output.error = (actual_driver_output.error or '') + error I don't see any other place where we set actual_driver_output.error. Why is this needed?
Fujii Hironori
Comment 28 2017-03-12 22:15:00 PDT
Comment on attachment 303635 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=303635&action=review >> Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:276 >> + error = diff_result[2] > > I think that error_string would be a better name, there are too many things that an "error" can contain. I agree. I'll revise. >> Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:344 >> + actual_driver_output.error = (actual_driver_output.error or '') + error > > I don't see any other place where we set actual_driver_output.error. Why is this needed? actual_driver_output is the return value of self._driver.run_test. Then, actual_driver_output.error may contain a error of executing WTR for actual case.
Fujii Hironori
Comment 29 2017-03-12 22:23:23 PDT
(In reply to comment #27) > Looks good to me. Can automated tests be added for this fix? I don't think so. I have no idea.
Fujii Hironori
Comment 30 2017-03-12 22:32:32 PDT
Alexey Proskuryakov
Comment 31 2017-03-12 23:35:18 PDT
CC'ing some folks who may have an idea for how to make a regression test for this fix.
WebKit Commit Bot
Comment 32 2017-03-14 11:22:09 PDT
Comment on attachment 304227 [details] Patch Clearing flags on attachment: 304227 Committed r213918: <http://trac.webkit.org/changeset/213918>
WebKit Commit Bot
Comment 33 2017-03-14 11:22:19 PDT
All reviewed patches have been landed. Closing bug.
Note You need to log in before you can comment on or make changes to this bug.