Bug 9496 - Pixel tests failing on BuildBot
Summary: Pixel tests failing on BuildBot
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests
Version: 420+
Hardware: Macintosh OS X 10.4
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords: LayoutTestFailure
Depends on: 12862 9576 9801 9812 9830 9834 13412
Blocks:
Reported: 2006-06-18 10:03 PDT by mitz
Modified: 2009-03-02 11:51 PST

Attachments
download-differing-buildbot-pixel-results.pl (3.50 KB, text/x-perl-script)
2006-06-18 14:18 PDT, David Kilzer (:ddkilzer)
Pixel test differences for three machines (as of r19001) (27.23 KB, text/plain)
2007-01-29 08:18 PST, mitz
Pixel failures, classified (not including SVG) (18.48 KB, text/plain)
2007-01-29 09:34 PST, mitz
Pixel failures, annotated (SVG only) (5.20 KB, text/plain)
2007-01-29 11:53 PST, mitz
Pixel failures, annotated (SVG only) (7.48 KB, text/plain)
2007-01-29 12:01 PST, mitz
Patch that lets you ignore small differences (5.67 KB, patch)
2007-02-17 05:52 PST, mitz
Patch that lets you ignore small differences (6.34 KB, patch)
2007-11-07 17:42 PST, mitz

Description mitz 2006-06-18 10:03:32 PDT
142 pixel tests are reported as failing on BuildBot, which makes it hard to notice real regressions. The failures consist of:

1. Minor discrepancies due to color matching, image decoding or other graphics APIs (some of these pass locally, some fail differently):

css1/box_properties/float_elements_in_series
editing/selection/iframe
editing/selection/inline-table
fast/css/first-letter-detach
fast/css/imageTileOpacity
fast/selectors/159
tables/mozilla/bugs/bug5797
tables/mozilla/bugs/bug10565
tables/mozilla/bugs/bug11026
tables/mozilla/bugs/bug12908-1
tables/mozilla/bugs/bug12908-2
tables/mozilla/bugs/bug12910-2
tables/mozilla/bugs/bug13169
tables/mozilla/bugs/bug15544
tables/mozilla/bugs/bug17138
tables/mozilla/bugs/bug29314
tables/mozilla/bugs/bug82946-2
tables/mozilla/bugs/bug120107
tables/mozilla/bugs/bug196870
tables/mozilla/bugs/bug1271
tables/mozilla/bugs/bug25074
tables/mozilla/bugs/bug625
tables/mozilla/bugs/bug1188
tables/mozilla/bugs/bug1296
tables/mozilla/bugs/bug1430
tables/mozilla/bugs/bug2981-2
tables/mozilla/bugs/bug4093
tables/mozilla/bugs/bug4284
tables/mozilla/bugs/bug4427
tables/mozilla/bugs/bug4523
tables/mozilla/bugs/bug6404
tables/mozilla/bugs/bug50695-2
tables/mozilla/bugs/bug56563
tables/mozilla/core/bloomberg
tables/mozilla/core/col_widths_auto_autoFix
tables/mozilla/core/misc
tables/mozilla/marvin/tbody_valign_baseline
tables/mozilla/marvin/tbody_valign_bottom
tables/mozilla/marvin/tbody_valign_middle
tables/mozilla/marvin/tbody_valign_top
tables/mozilla/marvin/td_valign_baseline
tables/mozilla/marvin/td_valign_bottom
tables/mozilla/marvin/td_valign_middle
tables/mozilla/marvin/td_valign_top
tables/mozilla/marvin/tfoot_valign_baseline
tables/mozilla/marvin/tfoot_valign_bottom
tables/mozilla/marvin/tfoot_valign_middle
tables/mozilla/marvin/tfoot_valign_top
tables/mozilla/marvin/th_valign_baseline
tables/mozilla/marvin/th_valign_bottom
tables/mozilla/marvin/th_valign_middle
tables/mozilla/marvin/th_valign_top
tables/mozilla/marvin/thead_valign_baseline
tables/mozilla/marvin/thead_valign_bottom
tables/mozilla/marvin/thead_valign_middle
tables/mozilla/marvin/thead_valign_top
tables/mozilla/marvin/tr_valign_baseline
tables/mozilla/marvin/tr_valign_bottom
tables/mozilla/marvin/tr_valign_middle
tables/mozilla/marvin/tr_valign_top
tables/mozilla/other/cell_widths
tables/mozilla_expected_failures/bugs/bug6933
tables/mozilla_expected_failures/bugs/bug85016
tables/mozilla_expected_failures/bugs/bug101674
css2.1/t0804-c5510-padn-00-b-ag
css2.1/t100801-c544-valgn-02-d-agi
css2.1/t100801-c544-valgn-03-d-agi
css2.1/t100801-c544-valgn-04-d-agi
fast/backgrounds/size/backgroundSize10
fast/backgrounds/size/backgroundSize12
fast/backgrounds/size/backgroundSize18
fast/backgrounds/size/backgroundSize19
fast/box-sizing/percentage-height
fast/replaced/image-sizing
fast/replaced/maxheight-percent
fast/replaced/maxheight-pxs
fast/replaced/maxwidth-percent
fast/replaced/maxwidth-pxs
tables/mozilla/bugs/bug14929
tables/mozilla/bugs/bug16252
tables/mozilla/bugs/bug97383

2. Pixel results not updated after fixing bug 3297:

editing/selection/3690719
fast/invalid/018
fast/table/colspanMinWidth
tables/mozilla/bugs/bug6304
tables/mozilla/bugs/bug25086
tables/mozilla/bugs/bug28928
tables/mozilla/bugs/bug44523
tables/mozilla/bugs/bug97138
tables/mozilla/core/col_widths_fix_auto
tables/mozilla/core/row_span
tables/mozilla_expected_failures/bugs/bug1262
tables/mozilla_expected_failures/bugs/bug11945
tables/mozilla_expected_failures/bugs/bug23847
tables/mozilla_expected_failures/bugs/bug32205-1
tables/mozilla_expected_failures/marvin/backgr_border-table-cell
tables/mozilla_expected_failures/marvin/backgr_border-table-column-group
tables/mozilla_expected_failures/marvin/backgr_border-table
fast/encoding/utf-16-big-endian
fast/encoding/utf-16-little-endian
fast/table/cell-absolute-child

3. Pixel results not updated for r13868:

fast/css/word-space-extra
fast/overflow/image-selection-highlight

4. A possible regression:

editing/style/smoosh-styles-003

5. Expected results generated with a debug build, bot running a release build:

tables/mozilla_expected_failures/bugs/bug178855

6. SVG failures:

svg/W3C-SVG-1.1/coords-units-01-b
svg/W3C-SVG-1.1/coords-viewattr-02-b
svg/W3C-SVG-1.1/filters-blend-01-b
svg/W3C-SVG-1.1/filters-color-01-b
svg/W3C-SVG-1.1/filters-composite-02-b
svg/W3C-SVG-1.1/filters-comptran-01-b
svg/W3C-SVG-1.1/filters-diffuse-01-f
svg/W3C-SVG-1.1/filters-displace-01-f
svg/W3C-SVG-1.1/filters-example-01-b
svg/W3C-SVG-1.1/filters-gauss-01-b
svg/W3C-SVG-1.1/filters-image-01-b
svg/W3C-SVG-1.1/filters-light-01-f
svg/W3C-SVG-1.1/filters-offset-01-b
svg/W3C-SVG-1.1/filters-specular-01-f
svg/W3C-SVG-1.1/paths-data-04-t
svg/W3C-SVG-1.1/pservers-grad-02-b
svg/W3C-SVG-1.1/pservers-grad-04-b
svg/W3C-SVG-1.1/pservers-grad-05-b
svg/W3C-SVG-1.1/pservers-grad-06-b
svg/W3C-SVG-1.1/pservers-grad-11-b
svg/W3C-SVG-1.1/pservers-grad-12-b
svg/W3C-SVG-1.1/render-groups-01-b
svg/W3C-SVG-1.1/render-groups-03-t
svg/W3C-SVG-1.1/struct-image-01-t
svg/W3C-SVG-1.1/struct-image-02-b
svg/W3C-SVG-1.1/struct-image-04-t
svg/W3C-SVG-1.1/styling-inherit-01-b
svg/custom/feComponentTransfer-Discrete
svg/custom/feComponentTransfer-Gamma
svg/custom/feComponentTransfer-Linear
svg/custom/feComponentTransfer-Table
svg/custom/feDisplacementMap-01
svg/custom/filter-source-alpha
svg/custom/image-with-transform-clip-filter
svg/custom/invalid-css
svg/custom/text-filter
svg/custom/text-image-opacity

With the exception of group 4, I think the bot's current results should be checked in as the expected results. The tricky part is that if you run the tests on your own machine, some of your results may differ from the bot's (because of differences in architecture, OS build, graphics hardware or color profiles), so perhaps somebody with access to the build slaves can run the tests on one of them and pull the results from it.
Comment 1 David Kilzer (:ddkilzer) 2006-06-18 14:18:13 PDT
Created attachment 8907
download-differing-buildbot-pixel-results.pl

A handy Perl script to download all of the differing images from the BuildBot web site into your local WebKit/LayoutTests directory structure.  Note that it does not reset checksums; this will occur when rerunning run-webkit-tests if the image diff succeeds.  (I tried the css2.1 images but they did not work locally.)

It would be nice if we could figure out why some of the tests with no apparent pixel differences are failing to compare.
Comment 2 mitz 2006-06-18 14:50:08 PDT
(In reply to comment #1)

> It would be nice if we could figure out why some of the tests with no apparent
> pixel differences are failing to compare.

Tests with no visible pixel differences actually have small differences caused by color matching and image decoding. One issue is that image decoding and rescaling changed in 10.4.6, while the build slaves are on 10.4.5 or earlier, so they should probably be upgraded before proceeding. The next steps after upgrading would be to examine the new set of failing tests, then run run-webkit-tests --pixel --reset on a build slave and commit the generated results to the repository for all tests except the suspected regression (group 4 above).
Comment 3 Alexey Proskuryakov 2006-06-19 11:38:19 PDT
Comparing 
http://build.webkit.org/post-commit-pixel-powerpc-mac-os-x/builds/1007 
and
http://build.webkit.org/post-commit-pixel-powerpc-mac-os-x/builds/1008 
(coming from apple-slave-6 and apple-slave-5 respectively, no significant differences in the codebase), I see several pixel tests that fail only in one of these:

editing/selection/drag-in-iframe
editing/selection/drag-to-contenteditable-iframe
svg/W3C-SVG-1.1/pservers-grad-08-b
svg/custom/filter-source-alpha
tables/mozilla/bugs/bug86708
Comment 4 mitz 2007-01-29 08:18:07 PST
Created attachment 12741
Pixel test differences for three machines (as of r19001)

This file lists tests that failed on at least one of three machines (a build slave <http://build.webkit.org/post-commit-pixel-powerpc-mac-os-x/builds/3397>, an iMac G5 and a MacBook Pro, the latter two running Mac OS X 10.4.8). For each test, the three numbers are the distances from the expected result observed on each of the machines, sorted in decreasing order. The tests are listed in descending order of their biggest difference. The difference metric is the maximum, over all pixels, of the L_1 distance (the sum of the absolute differences of the R, G and B components) between the actual pixel and the expected pixel.
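
For illustration only, the metric described above can be sketched in Python as follows. This is not the code used to produce the attachment; the function name max_l1_distance and the representation of an image as a flat list of (R, G, B) tuples are assumptions made for the example.

```python
from typing import Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B) components, assumed to be 0..255


def max_l1_distance(actual: Sequence[Pixel], expected: Sequence[Pixel]) -> int:
    """Return the largest per-pixel L_1 distance between two images.

    Both images are flat sequences of RGB tuples of equal length
    (a hypothetical representation chosen for this sketch).
    """
    if len(actual) != len(expected):
        raise ValueError("images must have the same number of pixels")
    return max(
        # L_1 distance for one pixel: sum of absolute channel differences.
        (sum(abs(a - e) for a, e in zip(pa, pe)) for pa, pe in zip(actual, expected)),
        default=0,  # two empty images are treated as identical
    )


# Made-up pixel values, not taken from any real test result.
expected = [(255, 255, 255), (0, 0, 0), (10, 20, 30)]
actual = [(255, 254, 255), (3, 0, 4), (10, 20, 30)]
print(max_l1_distance(actual, expected))  # per-pixel distances are 1, 7, 0, so this prints 7
```

Because the metric takes the maximum rather than the sum, a single badly wrong pixel is reported just as prominently as a whole region of them.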

Hopefully I will follow up with some analysis. (What I really should have measured was the distances between the different machines' actual results, but laziness and total lack of {perl,python,ruby} fu have so far stopped me from doing it).
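
As a rough sketch of the measurement mentioned in the parenthetical, the same metric could be applied to every pair of machines' actual results instead of to each machine against the expected result. The helper names, the machine labels and the pixel values below are all hypothetical; nothing here comes from the attachment.

```python
from itertools import combinations
from typing import Dict, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B)


def max_l1(a: Sequence[Pixel], b: Sequence[Pixel]) -> int:
    """Maximum over all pixels of the sum of absolute channel differences."""
    return max((sum(abs(x - y) for x, y in zip(p, q)) for p, q in zip(a, b)), default=0)


def pairwise_distances(results: Dict[str, Sequence[Pixel]]) -> Dict[Tuple[str, str], int]:
    """Distance between the actual results of every pair of machines."""
    return {
        (m1, m2): max_l1(results[m1], results[m2])
        for m1, m2 in combinations(sorted(results), 2)
    }


# Made-up two-pixel "images" for one test on three machines.
results = {
    "build-slave": [(0, 0, 0), (200, 200, 200)],
    "imac-g5": [(0, 0, 0), (198, 200, 201)],
    "macbook-pro": [(2, 1, 1), (200, 200, 200)],
}
for pair, distance in pairwise_distances(results).items():
    print(pair, distance)
```

A small pairwise distance for a test that is far from its expected result would suggest a stale expected image rather than machine-specific rendering.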
Comment 5 mitz 2007-01-29 09:34:03 PST
Created attachment 12745
Pixel failures, classified (not including SVG)

This suggests that for non-SVG tests, an acceptance threshold can be set at or slightly above 10, provided that the 10 or so problematic tests are changed so that they do not rely on Arial, animated or otherwise problematic GIFs, or unpredictable caret visibility.
Comment 6 mitz 2007-01-29 11:53:02 PST
Created attachment 12748
Pixel failures, annotated (SVG only)

Looks like the biggest problem with SVG is fonts.
Comment 7 mitz 2007-01-29 12:01:44 PST
Created attachment 12749
Pixel failures, annotated (SVG only)

Changed tabs to spaces.
Comment 8 mitz 2007-02-17 05:52:19 PST
Created attachment 13211
Patch that lets you ignore small differences

I'm using this with --threshold 10
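
To make the effect of such a threshold concrete, here is a minimal Python sketch, not the patch itself. It assumes that a test is accepted when its distance is less than or equal to the threshold (whether the real comparison is inclusive is an assumption), and the test names and distances are invented for the example.

```python
from typing import Dict, List


def failing_tests(distances: Dict[str, int], threshold: int = 0) -> List[str]:
    """Return the names of tests whose distance exceeds the threshold.

    `distances` maps a test name to its distance from the expected image.
    A threshold of 0 means any difference at all counts as a failure.
    """
    return sorted(name for name, d in distances.items() if d > threshold)


# Hypothetical test names and distances, for illustration only.
sample = {"example/identical": 0, "example/color-matching": 7, "example/regression": 42}

print(failing_tests(sample))      # strict comparison: both differing tests fail
print(failing_tests(sample, 10))  # with a threshold of 10, only the large difference fails
```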
Comment 9 mitz 2007-11-07 17:42:10 PST
Created attachment 17115
Patch that lets you ignore small differences

Updated to merge with TOT.
Comment 10 Darin Adler 2007-11-07 17:44:38 PST
Comment on attachment 17115
Patch that lets you ignore small differences

r=me

Should we set an appropriate default threshold?
Comment 11 mitz 2007-11-07 17:48:49 PST
Comment on attachment 17115
Patch that lets you ignore small differences

This patch landed in r27584. Removing the review flag to keep it out of the commit queue.
Comment 12 mitz 2007-11-07 17:54:32 PST
(In reply to comment #10)
> Should we set an appropriate default threshold?

(A) default threshold(s) should be part of the plan to get people to run the tests on their own machines, but I would like to improve the reporting from run-webkit-tests before doing that (so that it tells you about "silent failures" and potentially suggests that you cache alternative checksums).
Comment 13 Simon Fraser (smfr) 2009-01-03 19:45:57 PST
How much of this is still relevant?
Comment 14 mitz 2009-01-03 19:53:23 PST
(In reply to comment #13)
> How much of this is still relevant?

Nothing that justifies keeping the bug open.