Bug 168033

Summary: Nwtr ignores ImageDiff's errors for ref test
Product: WebKit Reporter: Fujii Hironori <Hironori.Fujii>
Component: Tools / TestsAssignee: Fujii Hironori <Hironori.Fujii>
Status: RESOLVED FIXED    
Severity: Normal CC: aakash_jain, ap, buildbot, commit-queue, dbates, dino, glenn, lforschler, mcatanzaro, rniwa, simon.fraser, webkit
Priority: P2    
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 168217, 168221    
Bug Blocks:    
Attachments:
Description Flags
Patch
none
Archive of layout-test-results from ews117 for mac-elcapitan
none
Archive of layout-test-results from ews105 for mac-elcapitan-wk2
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch none

Description Fujii Hironori 2017-02-08 19:58:55 PST
nrwt: Reftests pass unexpectedly in case of ImageDiff's crash

>  fast/css/cascade/box-shadow-and-webkit-box-shadow-cascade-order.html -> ref test hashes didn't match but diff passed

On the other hand, pixel test fails and reports "ImageDiff crashed" as expected.

>  fast/css/background-clip-radius-values.html : ImageDiff crashed
>[1/1] fast/css/background-clip-radius-values.html failed unexpectedly (image diff)
Comment 1 Fujii Hironori 2017-02-08 20:23:33 PST
Created attachment 301004 [details]
Patch
Comment 2 Build Bot 2017-02-08 21:33:44 PST
Comment on attachment 301004 [details]
Patch

Attachment 301004 [details] did not pass mac-debug-ews (mac):
Output: http://webkit-queues.webkit.org/results/3030141

New failing tests:
compositing/transitions/transform-on-large-layer.html
fast/css/sticky/sticky-left-percentage.html
Comment 3 Build Bot 2017-02-08 21:33:47 PST
Created attachment 301013 [details]
Archive of layout-test-results from ews117 for mac-elcapitan

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews117  Port: mac-elcapitan  Platform: Mac OS X 10.11.6
Comment 4 Build Bot 2017-02-08 21:38:03 PST
Comment on attachment 301004 [details]
Patch

Attachment 301004 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.webkit.org/results/3030161

New failing tests:
compositing/transitions/transform-on-large-layer.html
fast/css/sticky/sticky-left-percentage.html
Comment 5 Build Bot 2017-02-08 21:38:06 PST
Created attachment 301015 [details]
Archive of layout-test-results from ews105 for mac-elcapitan-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews105  Port: mac-elcapitan-wk2  Platform: Mac OS X 10.11.6
Comment 6 Alexey Proskuryakov 2017-02-08 22:03:46 PST
Comment on attachment 301004 [details]
Patch

Looks like this breaks tests.

But also, why does ImageDiff crash? In a crazy situation like this, it's probably appropriate to terminate run-webkit-tests with an exception.
Comment 7 Fujii Hironori 2017-02-08 23:33:49 PST
(In reply to comment #4)
> New failing tests:
> compositing/transitions/transform-on-large-layer.html
> fast/css/sticky/sticky-left-percentage.html

This change discovers problems which have been hidden by false negative.

>   compositing/transitions/transform-on-large-layer.html : ImageDiff produced stderr output:
> Error: test and reference images have different sizes. Test image is 800x600, reference image is 1600x977
> [1/10] compositing/transitions/transform-on-large-layer.html failed unexpectedly (reference didn't generate pixel results.)
>   fast/css/sticky/sticky-left-percentage.html : ImageDiff produced stderr output:
> Error: test and reference images have different sizes. Test image is 800x700, reference image is 800x600
> [2/10] fast/css/sticky/sticky-left-percentage.html failed unexpectedly (reference didn't generate pixel results.)
Comment 8 Fujii Hironori 2017-02-08 23:54:16 PST
(In reply to comment #6)
> But also, why does ImageDiff crash? In a crazy situation like this, it's
> probably appropriate to terminate run-webkit-tests with an exception.

My ImageDiff crashed just because mis-configuration of my system (Bug 168036 comment 1).

But, this change isn't only for ImageDiff's crash.
Nwtr hides tree types of other problems at the moment:
* ImageDiff produced stderr output
* ImageDiff timed out
* ImageDiff crashed

https://trac.webkit.org/browser/trunk/Tools/Scripts/webkitpy/port/image_diff.py?rev=211931#L97
Comment 9 Fujii Hironori 2017-02-09 00:06:08 PST
Nwtr ignores ImageDiff's errors only for ref test.
Nwtr checks the err_str of pixel test.
Comment 10 Fujii Hironori 2017-02-09 00:57:13 PST
Created attachment 301022 [details]
Patch
Comment 11 Alexey Proskuryakov 2017-02-10 16:52:08 PST
Comment on attachment 301022 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=301022&action=review

> Tools/ChangeLog:3
> +        Nwtr ignores ImageDiff's errors for ref test

Looking at the patch, but it's taking me some effort to understand all that it changes.

In particular, I'm still not sure what the problems with these two tests were. What exactly makes them fail now? They clearly don't crash or hang. When I run them locally, I get a warning saying "ref test hashes didn't match but diff passed", but many more tests produce this warning, so what is the difference with those?

  compositing/masks/compositing-clip-path.html -> ref test hashes didn't match but diff passed
  compositing/overlap-blending/children-opacity-huge.html -> ref test hashes didn't match but diff passed
  compositing/overlap-blending/children-opacity-no-overlap.html -> ref test hashes didn't match but diff passed
  compositing/overlap-blending/nested-non-overlap-clipping.html -> ref test hashes didn't match but diff passed
  compositing/transitions/transform-on-large-layer.html -> ref test hashes didn't match but diff passed
  css3/filters/backdrop/backdrop-filter-with-border-radius-and-reflection-add.html -> ref test hashes didn't match but diff passed
  css3/filters/backdrop/backdrop-filter-with-border-radius-and-reflection.html -> ref test hashes didn't match but diff passed
  css3/filters/filters-on-svg-root.html -> ref test hashes didn't match but diff passed

> LayoutTests/ChangeLog:13
> +        Set a tiny margin-top to #sandbox not to overwrap with a preceding
> +        text.

Why do we need it to overlap?
Comment 12 Alexey Proskuryakov 2017-02-10 16:54:12 PST
Comment on attachment 301022 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=301022&action=review

>> LayoutTests/ChangeLog:13
>> +        text.
> 
> Why do we need it to overlap?

I meant, why do we need it to not overlap?
Comment 13 Ryosuke Niwa 2017-02-10 17:07:20 PST
Comment on attachment 301022 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=301022&action=review

> Tools/ChangeLog:9
> +        expected.  Bacause _compare_image checks returned err_str of

We use a single space after period.

> Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:339
>                  diff_result = self._port.diff_image(reference_driver_output.image, actual_driver_output.image, tolerance=0)
> -                if not diff_result[0]:
> -                    failures.append(test_failures.FailureReftestMismatchDidNotOccur(reference_filename))
> +                err_str = diff_result[2]
> +                if err_str:
> +                    _log.warning('  %s : %s' % (self._test_name, err_str))
> +                    failures.append(test_failures.FailureReftestNoImagesGenerated(reference_filename))

Can we extract this as a helper method with _ prefix so that we don't have to repeat the same code twice?

> Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:343
>                  else:
> -                    _log.warning("  %s -> ref test hashes matched but diff failed" % self._test_name)
> +                    if not diff_result[0]:
> +                        failures.append(test_failures.FailureReftestMismatchDidNotOccur(reference_filename))

Why don't we use elif here?

> Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:355
>              else:
> -                _log.warning("  %s -> ref test hashes didn't match but diff passed" % self._test_name)
> +                if diff_result[0]:

Ditto.

> LayoutTests/ChangeLog:9
> +        ImageDiff reports errors for different image sizes of expected and
> +        actual image of ref tests.  They should have same window size.

Can we make this change in a separate patch?
Comment 14 Fujii Hironori 2017-02-12 20:38:15 PST
Thank you for reviewing.

(In reply to comment #11)
> Comment on attachment 301022 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=301022&action=review
> 
> > Tools/ChangeLog:3
> > +        Nwtr ignores ImageDiff's errors for ref test
> 
> Looking at the patch, but it's taking me some effort to understand all that
> it changes.
> 
> In particular, I'm still not sure what the problems with these two tests
> were. What exactly makes them fail now? They clearly don't crash or hang.
> When I run them locally, I get a warning saying "ref test hashes didn't
> match but diff passed", but many more tests produce this warning, so what is
> the difference with those?
> 
>   compositing/masks/compositing-clip-path.html -> ref test hashes didn't
> match but diff passed
>   compositing/overlap-blending/children-opacity-huge.html -> ref test hashes
> didn't match but diff passed
>   compositing/overlap-blending/children-opacity-no-overlap.html -> ref test
> hashes didn't match but diff passed
>   compositing/overlap-blending/nested-non-overlap-clipping.html -> ref test
> hashes didn't match but diff passed
>   compositing/transitions/transform-on-large-layer.html -> ref test hashes
> didn't match but diff passed
>  
> css3/filters/backdrop/backdrop-filter-with-border-radius-and-reflection-add.
> html -> ref test hashes didn't match but diff passed
>  
> css3/filters/backdrop/backdrop-filter-with-border-radius-and-reflection.html
> -> ref test hashes didn't match but diff passed
>   css3/filters/filters-on-svg-root.html -> ref test hashes didn't match but
> diff passed

You didn't apply my patch. It's difficult to know what's an
actual failure without my patch because nwtr doesn't report
ImageDiff's error.

You can see the ImageDiff's error message by applying out patch (Comment 7):

> Error: test and reference images have different sizes. Test image is 800x600, reference image is 1600x977

And, ImageDiff doen't check any pixels if the comparing two image have different sizes.
So, it can not fail the test by modifying the text in -expected.html.
Comment 15 Fujii Hironori 2017-02-12 22:37:49 PST
(In reply to comment #12)
> Comment on attachment 301022 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=301022&action=review
> 
> >> LayoutTests/ChangeLog:13
> >> +        text.
> > 
> > Why do we need it to overlap?
> 
> I meant, why do we need it to not overlap?

It caused pixel diffs while I was runnning nwtr of GTK port.
I thought it was caused by z-index specifed in the test.
But, I check the test again, there is no such z-index specifed, but just translate3d.
This is my fault. I need to investigate the failure of GTK port.
Thank you for pointing out.
Comment 16 Fujii Hironori 2017-02-12 22:51:01 PST
Thank you for reviewing.

(In reply to comment #13)
> Comment on attachment 301022 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=301022&action=review
> 
> > Tools/ChangeLog:9
> > +        expected.  Bacause _compare_image checks returned err_str of
> 
> We use a single space after period.

Oh, this is Emacs default behavior of fill-paragraph. I changed
my config (sentence-end-double-space) now.

> > Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:339
> >                  diff_result = self._port.diff_image(reference_driver_output.image, actual_driver_output.image, tolerance=0)
> > -                if not diff_result[0]:
> > -                    failures.append(test_failures.FailureReftestMismatchDidNotOccur(reference_filename))
> > +                err_str = diff_result[2]
> > +                if err_str:
> > +                    _log.warning('  %s : %s' % (self._test_name, err_str))
> > +                    failures.append(test_failures.FailureReftestNoImagesGenerated(reference_filename))
> 
> Can we extract this as a helper method with _ prefix so that we don't have
> to repeat the same code twice?

I found another problem (Bug 168221).
The login of match and mismatch ref tests are similar but same.
I think it's difficult.
I'll revisit this bug after finishing Bug 168221.

> > Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:343
> >                  else:
> > -                    _log.warning("  %s -> ref test hashes matched but diff failed" % self._test_name)
> > +                    if not diff_result[0]:
> > +                        failures.append(test_failures.FailureReftestMismatchDidNotOccur(reference_filename))
> 
> Why don't we use elif here?
> 
> > Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:355
> >              else:
> > -                _log.warning("  %s -> ref test hashes didn't match but diff passed" % self._test_name)
> > +                if diff_result[0]:
> 
> Ditto.

I just copied the code of pixel tests. You're right. I'll revise.

> > LayoutTests/ChangeLog:9
> > +        ImageDiff reports errors for different image sizes of expected and
> > +        actual image of ref tests.  They should have same window size.
> 
> Can we make this change in a separate patch?

That's good idea. Filed (Bug 168217 and Bug 168218).
Comment 17 Alexey Proskuryakov 2017-02-12 23:14:35 PST
> You didn't apply my patch. It's difficult to know what's an
> actual failure without my patch because nwtr doesn't report
> ImageDiff's error.

I don't think that this answers my question. Can you explain what is different about the two tests that you modified in this patch, in contrast with the others that currently have the same output?

> And, ImageDiff doen't check any pixels if the comparing two image have different sizes.

Are you sure? My recollection is that is does, at least on the Mac.
Comment 18 Fujii Hironori 2017-02-13 00:09:59 PST
(In reply to comment #17)
> > You didn't apply my patch. It's difficult to know what's an
> > actual failure without my patch because nwtr doesn't report
> > ImageDiff's error.
> 
> I don't think that this answers my question. Can you explain what is
> different about the two tests that you modified in this patch, in contrast
> with the others that currently have the same output?

The two tests call window.resize, but others don't.
Is this an answer to your question? I think I don't understand your question.


> > And, ImageDiff doen't check any pixels if the comparing two image have different sizes.
> 
> Are you sure? My recollection is that is does, at least on the Mac.

I tested in my mac mini now. My result is different with yours.

> fujii@fujihiro-mac-mini $ ./Tools/Scripts/run-webkit-tests compositing/transitions/transform-on-large-layer.html
> Using port 'mac-elcapitan-wk2'
> Test configuration: <elcapitan, x86_64, release>
> Placing test results in /Users/fujii/work/webkit/g1/WebKitBuild/Release/layout-test-results
> Baseline search path: platform/mac-wk2 -> platform/wk2 -> platform/mac-elcapitan -> platform/mac -> generic
> Using Release build
> Pixel tests disabled
> Regular timeout: 30000, slow test timeout: 150000
> Command line: /Users/fujii/work/webkit/g1/WebKitBuild/Release/WebKitTestRunner -
> 
> --lint-test-files warnings:
> LayoutTests/TestExpectations:384 Path does not exist. imported/w3c/web-platform-tests/tools
> 
> Found 1 test; running 1, skipping 0.
> 
> Running 1 test
> 
> Starting helper ...2017-02-13 00:00:03.439 LayoutTestHelper[28724:1359551] Could not determine current color profile, so it will not be reset after running the tests.     Running 1 WebKitTestRunner.
> 
>   compositing/transitions/transform-on-large-layer.html -> ref test hashes didn't match but diff passed
> The test ran as expected.
> 
> fujii@fujihiro-mac-mini $ sed -ie 's/turns green/XXX/' LayoutTests/compositing/transitions/transform-on-large-layer-expected.html
> fujii@fujihiro-mac-mini $ ./Tools/Scripts/run-webkit-tests compositing/transitions/transform-on-large-layer.html
> Using port 'mac-elcapitan-wk2'
> Test configuration: <elcapitan, x86_64, release>
> Placing test results in /Users/fujii/work/webkit/g1/WebKitBuild/Release/layout-test-results
> Baseline search path: platform/mac-wk2 -> platform/wk2 -> platform/mac-elcapitan -> platform/mac -> generic
> Using Release build
> Pixel tests disabled
> Regular timeout: 30000, slow test timeout: 150000
> Command line: /Users/fujii/work/webkit/g1/WebKitBuild/Release/WebKitTestRunner -
> 
> --lint-test-files warnings:
> LayoutTests/TestExpectations:384 Path does not exist. imported/w3c/web-platform-tests/tools
> 
> Found 1 test; running 1, skipping 0.
> 
> Running 1 test
> 
> Starting helper ...2017-02-13 00:02:06.395 LayoutTestHelper[28882:1360579] Could not determine current color profile, so it will not be reset after running the tests.     Running 1 WebKitTestRunner.
> 
>   compositing/transitions/transform-on-large-layer.html -> ref test hashes didn't match but diff passed
> The test ran as expected.
> 
> fujii@fujihiro-mac-mini $


Here is a source code of ImageDiff of Mac port.
https://trac.webkit.org/browser/trunk/Tools/DumpRenderTree/cg/ImageDiffCG.cpp?rev=198559#L216

It doesn't check any pixels if the sizes are different.
Comment 19 Fujii Hironori 2017-02-13 18:41:54 PST
I discovered old discussions.
This problem was discussed two years ago.

  [webkit-dev] run-webkit-tests question; hashes when comparing ref test output
  https://lists.webkit.org/pipermail/webkit-dev/2015-January/027166.html
Comment 20 Fujii Hironori 2017-02-15 00:46:44 PST
Created attachment 301589 [details]
Patch
Comment 21 Fujii Hironori 2017-02-15 01:04:31 PST
Created attachment 301593 [details]
Patch
Comment 22 Fujii Hironori 2017-02-15 01:07:24 PST
Created attachment 301594 [details]
Patch
Comment 23 Fujii Hironori 2017-02-15 20:16:40 PST
Created attachment 301694 [details]
Patch
Comment 24 Fujii Hironori 2017-02-27 18:52:45 PST
Comment on attachment 301593 [details]
Patch

I've finished fixing depending bugs, please review again.
Comment 25 Ryosuke Niwa 2017-03-06 19:14:11 PST
Comment on attachment 301593 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=301593&action=review

> Tools/ChangeLog:8
> +        On the other hand, pixel test fails and reports the error as

It's strange to start your log by saying "on the other hand" given you haven't stated what on one hand happens.

> Tools/ChangeLog:9
> +        expected. Bacause _compare_image checks returned err_str of

Typo: B"a"cause.

> Tools/ChangeLog:16
> +

Please also explain the change you're making to expected match case.

> Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:340
> +            err_str = diff_result[2]

Please spell out "error" instead of abbreviating it as "err".
Comment 26 Fujii Hironori 2017-03-07 00:54:44 PST
Created attachment 303635 [details]
Patch
Comment 27 Alexey Proskuryakov 2017-03-09 16:13:51 PST
Comment on attachment 303635 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=303635&action=review

Looks good to me. Can automated tests be added for this fix?

> Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:276
> +            error = diff_result[2]

I think that error_string would be a better name, there are too many things that an "error" can contain.

> Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:344
> +                actual_driver_output.error = (actual_driver_output.error or '') + error

I don't see any other place where we set actual_driver_output.error. Why is this needed?
Comment 28 Fujii Hironori 2017-03-12 22:15:00 PDT
Comment on attachment 303635 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=303635&action=review

>> Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:276
>> +            error = diff_result[2]
> 
> I think that error_string would be a better name, there are too many things that an "error" can contain.

I agree. I'll revise.

>> Tools/Scripts/webkitpy/layout_tests/controllers/single_test_runner.py:344
>> +                actual_driver_output.error = (actual_driver_output.error or '') + error
> 
> I don't see any other place where we set actual_driver_output.error. Why is this needed?

actual_driver_output is the return value of self._driver.run_test.
Then, actual_driver_output.error may contain a error of executing WTR for actual case.
Comment 29 Fujii Hironori 2017-03-12 22:23:23 PDT
(In reply to comment #27)
> Looks good to me. Can automated tests be added for this fix?

I don't think so. I have no idea.
Comment 30 Fujii Hironori 2017-03-12 22:32:32 PDT
Created attachment 304227 [details]
Patch
Comment 31 Alexey Proskuryakov 2017-03-12 23:35:18 PDT
CC'ing some folks who may have an idea for how to make a regression test for this fix.
Comment 32 WebKit Commit Bot 2017-03-14 11:22:09 PDT
Comment on attachment 304227 [details]
Patch

Clearing flags on attachment: 304227

Committed r213918: <http://trac.webkit.org/changeset/213918>
Comment 33 WebKit Commit Bot 2017-03-14 11:22:19 PDT
All reviewed patches have been landed.  Closing bug.