Summary: | Nwtr ignores ImageDiff's errors for ref test | ||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Fujii Hironori <Hironori.Fujii> | ||||||||||||||||||||||
Component: | Tools / Tests | Assignee: | Fujii Hironori <Hironori.Fujii> | ||||||||||||||||||||||
Status: | RESOLVED FIXED | ||||||||||||||||||||||||
Severity: | Normal | CC: | aakash_jain, ap, buildbot, commit-queue, dbates, dino, glenn, lforschler, mcatanzaro, rniwa, simon.fraser, webkit | ||||||||||||||||||||||
Priority: | P2 | ||||||||||||||||||||||||
Version: | WebKit Nightly Build | ||||||||||||||||||||||||
Hardware: | Unspecified | ||||||||||||||||||||||||
OS: | Unspecified | ||||||||||||||||||||||||
Bug Depends on: | 168217, 168221 | ||||||||||||||||||||||||
Bug Blocks: | |||||||||||||||||||||||||
Attachments: |
|
Description
Fujii Hironori
2017-02-08 19:58:55 PST
Created attachment 301004 [details]
Patch
Comment on attachment 301004 [details] Patch Attachment 301004 [details] did not pass mac-debug-ews (mac): Output: http://webkit-queues.webkit.org/results/3030141 New failing tests: compositing/transitions/transform-on-large-layer.html fast/css/sticky/sticky-left-percentage.html Created attachment 301013 [details]
Archive of layout-test-results from ews117 for mac-elcapitan
The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews117 Port: mac-elcapitan Platform: Mac OS X 10.11.6
Comment on attachment 301004 [details] Patch Attachment 301004 [details] did not pass mac-wk2-ews (mac-wk2): Output: http://webkit-queues.webkit.org/results/3030161 New failing tests: compositing/transitions/transform-on-large-layer.html fast/css/sticky/sticky-left-percentage.html Created attachment 301015 [details]
Archive of layout-test-results from ews105 for mac-elcapitan-wk2
The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews105 Port: mac-elcapitan-wk2 Platform: Mac OS X 10.11.6
Comment on attachment 301004 [details]
Patch
Looks like this breaks tests.
But also, why does ImageDiff crash? In a crazy situation like this, it's probably appropriate to terminate run-webkit-tests with an exception.
(In reply to comment #4) > New failing tests: > compositing/transitions/transform-on-large-layer.html > fast/css/sticky/sticky-left-percentage.html This change discovers problems which have been hidden by false negative. > compositing/transitions/transform-on-large-layer.html : ImageDiff produced stderr output: > Error: test and reference images have different sizes. Test image is 800x600, reference image is 1600x977 > [1/10] compositing/transitions/transform-on-large-layer.html failed unexpectedly (reference didn't generate pixel results.) > fast/css/sticky/sticky-left-percentage.html : ImageDiff produced stderr output: > Error: test and reference images have different sizes. Test image is 800x700, reference image is 800x600 > [2/10] fast/css/sticky/sticky-left-percentage.html failed unexpectedly (reference didn't generate pixel results.) (In reply to comment #6) > But also, why does ImageDiff crash? In a crazy situation like this, it's > probably appropriate to terminate run-webkit-tests with an exception. My ImageDiff crashed just because mis-configuration of my system (Bug 168036 comment 1). But, this change isn't only for ImageDiff's crash. Nwtr hides tree types of other problems at the moment: * ImageDiff produced stderr output * ImageDiff timed out * ImageDiff crashed https://trac.webkit.org/browser/trunk/Tools/Scripts/webkitpy/port/image_diff.py?rev=211931#L97 Nwtr ignores ImageDiff's errors only for ref test. Nwtr checks the err_str of pixel test. Created attachment 301022 [details]
Patch
Comment on attachment 301022 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=301022&action=review > Tools/ChangeLog:3 > + Nwtr ignores ImageDiff's errors for ref test Looking at the patch, but it's taking me some effort to understand all that it changes. In particular, I'm still not sure what the problems with these two tests were. What exactly makes them fail now? They clearly don't crash or hang. When I run them locally, I get a warning saying "ref test hashes didn't match but diff passed", but many more tests produce this warning, so what is the difference with those? compositing/masks/compositing-clip-path.html -> ref test hashes didn't match but diff passed compositing/overlap-blending/children-opacity-huge.html -> ref test hashes didn't match but diff passed compositing/overlap-blending/children-opacity-no-overlap.html -> ref test hashes didn't match but diff passed compositing/overlap-blending/nested-non-overlap-clipping.html -> ref test hashes didn't match but diff passed compositing/transitions/transform-on-large-layer.html -> ref test hashes didn't match but diff passed css3/filters/backdrop/backdrop-filter-with-border-radius-and-reflection-add.html -> ref test hashes didn't match but diff passed css3/filters/backdrop/backdrop-filter-with-border-radius-and-reflection.html -> ref test hashes didn't match but diff passed css3/filters/filters-on-svg-root.html -> ref test hashes didn't match but diff passed > LayoutTests/ChangeLog:13 > + Set a tiny margin-top to #sandbox not to overwrap with a preceding > + text. Why do we need it to overlap? Comment on attachment 301022 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=301022&action=review >> LayoutTests/ChangeLog:13 >> + text. > > Why do we need it to overlap? I meant, why do we need it to not overlap? Comment on attachment 301022 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=301022&action=review > Tools/ChangeLog:9 > + expected. Bacause _compare_image checks returned err_str of We use a single space after period. > Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:339 > diff_result = self._port.diff_image(reference_driver_output.image, actual_driver_output.image, tolerance=0) > - if not diff_result[0]: > - failures.append(test_failures.FailureReftestMismatchDidNotOccur(reference_filename)) > + err_str = diff_result[2] > + if err_str: > + _log.warning(' %s : %s' % (self._test_name, err_str)) > + failures.append(test_failures.FailureReftestNoImagesGenerated(reference_filename)) Can we extract this as a helper method with _ prefix so that we don't have to repeat the same code twice? > Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:343 > else: > - _log.warning(" %s -> ref test hashes matched but diff failed" % self._test_name) > + if not diff_result[0]: > + failures.append(test_failures.FailureReftestMismatchDidNotOccur(reference_filename)) Why don't we use elif here? > Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:355 > else: > - _log.warning(" %s -> ref test hashes didn't match but diff passed" % self._test_name) > + if diff_result[0]: Ditto. > LayoutTests/ChangeLog:9 > + ImageDiff reports errors for different image sizes of expected and > + actual image of ref tests. They should have same window size. Can we make this change in a separate patch? Thank you for reviewing. (In reply to comment #11) > Comment on attachment 301022 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=301022&action=review > > > Tools/ChangeLog:3 > > + Nwtr ignores ImageDiff's errors for ref test > > Looking at the patch, but it's taking me some effort to understand all that > it changes. > > In particular, I'm still not sure what the problems with these two tests > were. What exactly makes them fail now? They clearly don't crash or hang. > When I run them locally, I get a warning saying "ref test hashes didn't > match but diff passed", but many more tests produce this warning, so what is > the difference with those? > > compositing/masks/compositing-clip-path.html -> ref test hashes didn't > match but diff passed > compositing/overlap-blending/children-opacity-huge.html -> ref test hashes > didn't match but diff passed > compositing/overlap-blending/children-opacity-no-overlap.html -> ref test > hashes didn't match but diff passed > compositing/overlap-blending/nested-non-overlap-clipping.html -> ref test > hashes didn't match but diff passed > compositing/transitions/transform-on-large-layer.html -> ref test hashes > didn't match but diff passed > > css3/filters/backdrop/backdrop-filter-with-border-radius-and-reflection-add. > html -> ref test hashes didn't match but diff passed > > css3/filters/backdrop/backdrop-filter-with-border-radius-and-reflection.html > -> ref test hashes didn't match but diff passed > css3/filters/filters-on-svg-root.html -> ref test hashes didn't match but > diff passed You didn't apply my patch. It's difficult to know what's an actual failure without my patch because nwtr doesn't report ImageDiff's error. You can see the ImageDiff's error message by applying out patch (Comment 7): > Error: test and reference images have different sizes. Test image is 800x600, reference image is 1600x977 And, ImageDiff doen't check any pixels if the comparing two image have different sizes. So, it can not fail the test by modifying the text in -expected.html. (In reply to comment #12) > Comment on attachment 301022 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=301022&action=review > > >> LayoutTests/ChangeLog:13 > >> + text. > > > > Why do we need it to overlap? > > I meant, why do we need it to not overlap? It caused pixel diffs while I was runnning nwtr of GTK port. I thought it was caused by z-index specifed in the test. But, I check the test again, there is no such z-index specifed, but just translate3d. This is my fault. I need to investigate the failure of GTK port. Thank you for pointing out. Thank you for reviewing. (In reply to comment #13) > Comment on attachment 301022 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=301022&action=review > > > Tools/ChangeLog:9 > > + expected. Bacause _compare_image checks returned err_str of > > We use a single space after period. Oh, this is Emacs default behavior of fill-paragraph. I changed my config (sentence-end-double-space) now. > > Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:339 > > diff_result = self._port.diff_image(reference_driver_output.image, actual_driver_output.image, tolerance=0) > > - if not diff_result[0]: > > - failures.append(test_failures.FailureReftestMismatchDidNotOccur(reference_filename)) > > + err_str = diff_result[2] > > + if err_str: > > + _log.warning(' %s : %s' % (self._test_name, err_str)) > > + failures.append(test_failures.FailureReftestNoImagesGenerated(reference_filename)) > > Can we extract this as a helper method with _ prefix so that we don't have > to repeat the same code twice? I found another problem (Bug 168221). The login of match and mismatch ref tests are similar but same. I think it's difficult. I'll revisit this bug after finishing Bug 168221. > > Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:343 > > else: > > - _log.warning(" %s -> ref test hashes matched but diff failed" % self._test_name) > > + if not diff_result[0]: > > + failures.append(test_failures.FailureReftestMismatchDidNotOccur(reference_filename)) > > Why don't we use elif here? > > > Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:355 > > else: > > - _log.warning(" %s -> ref test hashes didn't match but diff passed" % self._test_name) > > + if diff_result[0]: > > Ditto. I just copied the code of pixel tests. You're right. I'll revise. > > LayoutTests/ChangeLog:9 > > + ImageDiff reports errors for different image sizes of expected and > > + actual image of ref tests. They should have same window size. > > Can we make this change in a separate patch? That's good idea. Filed (Bug 168217 and Bug 168218). > You didn't apply my patch. It's difficult to know what's an > actual failure without my patch because nwtr doesn't report > ImageDiff's error. I don't think that this answers my question. Can you explain what is different about the two tests that you modified in this patch, in contrast with the others that currently have the same output? > And, ImageDiff doen't check any pixels if the comparing two image have different sizes. Are you sure? My recollection is that is does, at least on the Mac. (In reply to comment #17) > > You didn't apply my patch. It's difficult to know what's an > > actual failure without my patch because nwtr doesn't report > > ImageDiff's error. > > I don't think that this answers my question. Can you explain what is > different about the two tests that you modified in this patch, in contrast > with the others that currently have the same output? The two tests call window.resize, but others don't. Is this an answer to your question? I think I don't understand your question. > > And, ImageDiff doen't check any pixels if the comparing two image have different sizes. > > Are you sure? My recollection is that is does, at least on the Mac. I tested in my mac mini now. My result is different with yours. > fujii@fujihiro-mac-mini $ ./Tools/Scripts/run-webkit-tests compositing/transitions/transform-on-large-layer.html > Using port 'mac-elcapitan-wk2' > Test configuration: <elcapitan, x86_64, release> > Placing test results in /Users/fujii/work/webkit/g1/WebKitBuild/Release/layout-test-results > Baseline search path: platform/mac-wk2 -> platform/wk2 -> platform/mac-elcapitan -> platform/mac -> generic > Using Release build > Pixel tests disabled > Regular timeout: 30000, slow test timeout: 150000 > Command line: /Users/fujii/work/webkit/g1/WebKitBuild/Release/WebKitTestRunner - > > --lint-test-files warnings: > LayoutTests/TestExpectations:384 Path does not exist. imported/w3c/web-platform-tests/tools > > Found 1 test; running 1, skipping 0. > > Running 1 test > > Starting helper ...2017-02-13 00:00:03.439 LayoutTestHelper[28724:1359551] Could not determine current color profile, so it will not be reset after running the tests. Running 1 WebKitTestRunner. > > compositing/transitions/transform-on-large-layer.html -> ref test hashes didn't match but diff passed > The test ran as expected. > > fujii@fujihiro-mac-mini $ sed -ie 's/turns green/XXX/' LayoutTests/compositing/transitions/transform-on-large-layer-expected.html > fujii@fujihiro-mac-mini $ ./Tools/Scripts/run-webkit-tests compositing/transitions/transform-on-large-layer.html > Using port 'mac-elcapitan-wk2' > Test configuration: <elcapitan, x86_64, release> > Placing test results in /Users/fujii/work/webkit/g1/WebKitBuild/Release/layout-test-results > Baseline search path: platform/mac-wk2 -> platform/wk2 -> platform/mac-elcapitan -> platform/mac -> generic > Using Release build > Pixel tests disabled > Regular timeout: 30000, slow test timeout: 150000 > Command line: /Users/fujii/work/webkit/g1/WebKitBuild/Release/WebKitTestRunner - > > --lint-test-files warnings: > LayoutTests/TestExpectations:384 Path does not exist. imported/w3c/web-platform-tests/tools > > Found 1 test; running 1, skipping 0. > > Running 1 test > > Starting helper ...2017-02-13 00:02:06.395 LayoutTestHelper[28882:1360579] Could not determine current color profile, so it will not be reset after running the tests. Running 1 WebKitTestRunner. > > compositing/transitions/transform-on-large-layer.html -> ref test hashes didn't match but diff passed > The test ran as expected. > > fujii@fujihiro-mac-mini $ Here is a source code of ImageDiff of Mac port. https://trac.webkit.org/browser/trunk/Tools/DumpRenderTree/cg/ImageDiffCG.cpp?rev=198559#L216 It doesn't check any pixels if the sizes are different. I discovered old discussions. This problem was discussed two years ago. [webkit-dev] run-webkit-tests question; hashes when comparing ref test output https://lists.webkit.org/pipermail/webkit-dev/2015-January/027166.html Created attachment 301589 [details]
Patch
Created attachment 301593 [details]
Patch
Created attachment 301594 [details]
Patch
Created attachment 301694 [details]
Patch
Comment on attachment 301593 [details]
Patch
I've finished fixing depending bugs, please review again.
Comment on attachment 301593 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=301593&action=review > Tools/ChangeLog:8 > + On the other hand, pixel test fails and reports the error as It's strange to start your log by saying "on the other hand" given you haven't stated what on one hand happens. > Tools/ChangeLog:9 > + expected. Bacause _compare_image checks returned err_str of Typo: B"a"cause. > Tools/ChangeLog:16 > + Please also explain the change you're making to expected match case. > Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:340 > + err_str = diff_result[2] Please spell out "error" instead of abbreviating it as "err". Created attachment 303635 [details]
Patch
Comment on attachment 303635 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=303635&action=review Looks good to me. Can automated tests be added for this fix? > Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:276 > + error = diff_result[2] I think that error_string would be a better name, there are too many things that an "error" can contain. > Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:344 > + actual_driver_output.error = (actual_driver_output.error or '') + error I don't see any other place where we set actual_driver_output.error. Why is this needed? Comment on attachment 303635 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=303635&action=review >> Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:276 >> + error = diff_result[2] > > I think that error_string would be a better name, there are too many things that an "error" can contain. I agree. I'll revise. >> Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:344 >> + actual_driver_output.error = (actual_driver_output.error or '') + error > > I don't see any other place where we set actual_driver_output.error. Why is this needed? actual_driver_output is the return value of self._driver.run_test. Then, actual_driver_output.error may contain a error of executing WTR for actual case. (In reply to comment #27) > Looks good to me. Can automated tests be added for this fix? I don't think so. I have no idea. Created attachment 304227 [details]
Patch
CC'ing some folks who may have an idea for how to make a regression test for this fix. Comment on attachment 304227 [details] Patch Clearing flags on attachment: 304227 Committed r213918: <http://trac.webkit.org/changeset/213918> All reviewed patches have been landed. Closing bug. |