Bug 64812

Summary:

Buildbot marks a nrwt bot red when tests are missing results

Product:

WebKit

Reporter:

Ryosuke Niwa <rniwa>

Component:

Tools / Tests

Assignee:

Ryosuke Niwa <rniwa>

Status:

RESOLVED FIXED

Severity:

Critical

CC:

abarth, aroben, dpranke, eric, kbalazs, kkristof, ossy, tony, webkit.review.bot, wsiegrist

Priority:

Version:

528+ (Nightly build)

Hardware:

Unspecified

OS:

Unspecified

Bug Depends on:

67268, 67855, 68247

Bug Blocks:

34984, 64491

Attachments:

Description	Flags
fixes the bug	none
more fixes	none
Patch	none
fixes this bug for good	none
proposed fix	dpranke: review+, dpranke: commit-queue-
mark missing result as flakey	none
mark missing results as flakey	none
mark missing results as flakey	none

Ryosuke Niwa

Reported 2011-07-19 10:33:10 PDT

Missing expected results should only make the bots orange, not red. You can see this on: http://build.webkit.org/builders/Qt%20Linux%20Release/builds/35535

Attachments
fixes the bug (2.31 KB, patch) 2011-07-19 10:41 PDT, Ryosuke Niwa	no flags	Details Formatted Diff Diff
more fixes (2.58 KB, patch) 2011-07-19 13:51 PDT, Ryosuke Niwa	no flags	Details Formatted Diff Diff
Patch (1.45 KB, patch) 2011-09-08 16:15 PDT, Ryosuke Niwa	no flags	Details Formatted Diff Diff
fixes this bug for good (2.81 KB, patch) 2011-09-08 18:03 PDT, Ryosuke Niwa	no flags	Details Formatted Diff Diff
proposed fix (1.21 KB, patch) 2011-09-14 00:17 PDT, Kristóf Kosztyó	dpranke: review+ dpranke: commit-queue-	Details Formatted Diff Diff
mark missing result as flakey (1.31 KB, patch) 2011-09-16 04:47 PDT, Kristóf Kosztyó	no flags	Details Formatted Diff Diff
mark missing results as flakey (2.18 KB, text/plain) 2011-09-19 05:40 PDT, Kristóf Kosztyó	no flags	Details
mark missing results as flakey (2.18 KB, patch) 2011-09-19 05:42 PDT, Kristóf Kosztyó	no flags	Details Formatted Diff Diff
Show Obsolete (7) View All Add attachment proposed patch, testcase, etc.

Ryosuke Niwa

Comment 1 2011-07-19 10:41:51 PDT

Created attachment 101342 [details] fixes the bug

Tony Chang

Comment 2 2011-07-19 10:46:49 PDT

Comment on attachment 101342 [details] fixes the bug View in context: https://bugs.webkit.org/attachment.cgi?id=101342&action=review > Tools/BuildSlaveSupport/build.webkit.org-config/master.cfg:319 > - if result != FAILURE and (line.find('were flaky') >= 0 or line.find('new passes') >= 0): > + if result != FAILURE and (line.find('flakes') >= 0 or line.find('new passes') >= 0 or line.find('missing results') >= 0): > result = WARNINGS > else: > result = FAILURE Nit: Can we just return on FAILURE and remove the check in the warnings code?

Ryosuke Niwa

Comment 3 2011-07-19 11:18:02 PDT

Comment on attachment 101342 [details] fixes the bug View in context: https://bugs.webkit.org/attachment.cgi?id=101342&action=review >> Tools/BuildSlaveSupport/build.webkit.org-config/master.cfg:319 > > Nit: Can we just return on FAILURE and remove the check in the warnings code? Ok, will do.

Ryosuke Niwa

Comment 4 2011-07-19 11:21:56 PDT

Committed r91273: <http://trac.webkit.org/changeset/91273>

Adam Roben (:aroben)

Comment 5 2011-07-19 11:31:51 PDT

*** Bug 64514 has been marked as a duplicate of this bug. ***

Ryosuke Niwa

Comment 6 2011-07-19 11:49:34 PDT

Apparently, there is a bug.

Ryosuke Niwa

Comment 7 2011-07-19 13:51:15 PDT

Created attachment 101378 [details] more fixes

Ryosuke Niwa

Comment 8 2011-07-19 18:12:06 PDT

Ping reviwers

Ryosuke Niwa

Comment 9 2011-07-20 23:54:05 PDT

Committed r91448: <http://trac.webkit.org/changeset/91448>

Csaba Osztrogonác

Comment 10 2011-07-27 05:34:46 PDT

Reopen, because it doesn't work now. :(( http://build.webkit.org/builders/Qt%20Linux%20Release/builds/35809 There is one new test, but the bot is red.

Csaba Osztrogonác

Comment 11 2011-07-27 05:35:07 PDT

Comment on attachment 101378 [details] more fixes remove r+ from landed patch

Ryosuke Niwa

Comment 12 2011-07-27 09:41:57 PDT

(In reply to comment #10) > Reopen, because it doesn't work now. :(( > > http://build.webkit.org/builders/Qt%20Linux%20Release/builds/35809 > There is one new test, but the bot is red. This is because nrwt exited with 1: http://build.webkit.org/builders/Qt%20Linux%20Release/builds/35809/steps/layout-test/logs/stdio 2011-07-27 04:00:24,024 31623 run_webkit_tests.py:110 DEBUG Testing completed, Exit status: 1 program finished with exit code 1 elapsedTime=1324.734859 Adam, does nrwt return 1 when there are missing results?

Adam Barth

Comment 13 2011-07-27 10:52:23 PDT

Dunno. That's a question for Eric.

Tony Chang

Comment 14 2011-07-27 11:01:39 PDT

(In reply to comment #12) > Adam, does nrwt return 1 when there are missing results? Yes, because on build.chromium.org, missing results causes red (the gardener then adds it to test_expectations.txt or adds a baseline). Isn't it normally a mistake of the committer to forget expected results?

Csaba Osztrogonác

Comment 15 2011-07-27 11:04:44 PDT

(In reply to comment #14) > (In reply to comment #12) > > Adam, does nrwt return 1 when there are missing results? > > Yes, because on build.chromium.org, missing results causes red (the gardener then adds it to test_expectations.txt or adds a baseline). > > Isn't it normally a mistake of the committer to forget expected results? This is problem because committers land only mac specific expected results, and after it platform maintainers land platform specific results. We can't guarantee 0-24 gardeners for all ports.

Tony Chang

Comment 16 2011-07-27 11:22:59 PDT

(In reply to comment #15) > (In reply to comment #14) > > (In reply to comment #12) > > > Adam, does nrwt return 1 when there are missing results? > > > > Yes, because on build.chromium.org, missing results causes red (the gardener then adds it to test_expectations.txt or adds a baseline). > > > > Isn't it normally a mistake of the committer to forget expected results? > > This is problem because committers land only mac specific expected > results, and after it platform maintainers land platform specific results. > We can't guarantee 0-24 gardeners for all ports. I see, the chromium ports all fall back to the mac results, so it doesn't have this problem. Maybe we should special case the error if it's chromium.

Ryosuke Niwa

Comment 17 2011-07-27 11:40:36 PDT

(In reply to comment #16) > I see, the chromium ports all fall back to the mac results, so it doesn't have this problem. Maybe we should special case the error if it's chromium. That sounds like a good idea. I'll note that I've been frequently annoyed by this error because sometimes people just add -expected.txt and not -expected.png but we have to wait and ask patch authors if the pixel results are correct or not. So it might be worth changing the behavior on chromium as well.

Eric Seidel (no email)

Comment 18 2011-08-08 09:17:36 PDT

What is the behavioral difference? Does NRWT exit 1 while ORWT exit'd 0? Or is it just a parsing difference in the buildbot code?

Ryosuke Niwa

Comment 19 2011-08-08 09:20:56 PDT

(In reply to comment #18) > Does NRWT exit 1 while ORWT exit'd 0? Yes.

Eric Seidel (no email)

Comment 20 2011-08-08 09:24:57 PDT

Well, if we're dumping out new results by default like ORWT did, then it seems we could also exit 0. Should be a very simple change to make.

Ryosuke Niwa

Comment 21 2011-08-08 09:30:44 PDT

(In reply to comment #20) > Well, if we're dumping out new results by default like ORWT did, then it seems we could also exit 0. Should be a very simple change to make. Tony says we should probably keep the old for chromium port or should modify build.chromium.org/chromium.webkit/ to still make the bots turn red.

Eric Seidel (no email)

Comment 22 2011-08-08 09:42:39 PDT

Make it so. :)

Tony Chang

Comment 23 2011-08-08 09:50:14 PDT

(In reply to comment #21) > (In reply to comment #20) > > Well, if we're dumping out new results by default like ORWT did, then it seems we could also exit 0. Should be a very simple change to make. > > Tony says we should probably keep the old for chromium port or should modify build.chromium.org/chromium.webkit/ to still make the bots turn red. Specifically, if the chromium bots go red because of a missing result, it's normally because someone checked in a pixel test without any pixel results. This seems like a valid reason to go red. We could also test for the difference between a pixel test without any pixel results (tree red) vs a pixel test without pixel results for the platform being tested (tree orange), although I'm not sure it's worth the code complexity.

Csaba Osztrogonác

Comment 24 2011-08-25 11:04:03 PDT

It is critical for the Qt port. We don't want red bots because of new tests for hours. We don't have 0-24 gardener for checking new tests, so only the orange bot is acceptable. Kristóf, could you pick up this bug, and fix it ASAP? Please try to find a way make Qt and Chromium folks happy too.

Csaba Osztrogonác

Comment 25 2011-08-31 01:00:43 PDT

It should block "new-run-webkit-tests: MASTER BUG: Switch all webkit.org bots over" too. I'm sceptic a little bit why we (Qt) use NRWT, because it is slower than ORWT and it is always red because of new tests ...

Ryosuke Niwa

Comment 26 2011-09-08 16:07:03 PDT

Okay, let's just return the cmd.rc for now. I think that's a net win for now since we've never seen a case nrwt went wary and didn't spit out any line that doesn't match my regexes.

Ryosuke Niwa

Comment 27 2011-09-08 16:07:18 PDT

(In reply to comment #26) > Okay, let's just return the cmd.rc for now. I think that's a net win for now since we've never seen a case nrwt went wary and didn't spit out any line that doesn't match my regexes. I mean let's just ignore* cmd.rc.

Ryosuke Niwa

Comment 28 2011-09-08 16:15:22 PDT

Created attachment 106811 [details] Patch

Tony Chang

Comment 29 2011-09-08 16:24:01 PDT

Are we able to tell if the process crashed? E.g., someone made a change to NRWT that raises an exception?

Ryosuke Niwa

Comment 30 2011-09-08 16:29:50 PDT

(In reply to comment #29) > Are we able to tell if the process crashed? E.g., someone made a change to NRWT that raises an exception? We'll see the bots turn orange so we should be able to.

Ryosuke Niwa

Comment 31 2011-09-08 16:36:01 PDT

Comment on attachment 106811 [details] Patch I found a root cause of this. Let me fix it...

Ryosuke Niwa

Comment 32 2011-09-08 18:03:02 PDT

Created attachment 106823 [details] fixes this bug for good

Csaba Osztrogonác

Comment 33 2011-09-09 06:17:55 PDT

Comment on attachment 106823 [details] fixes this bug for good View in context: https://bugs.webkit.org/attachment.cgi?id=106823&action=review r- now, because it isn't the real fix. > Tools/BuildSlaveSupport/build.webkit.org-config/master.cfg:333 > + for line in self.incorrectLayoutLines: > + if line.find('flakes') >= 0 or line.find('new passes') >= 0 or line.find('missing results') >= 0: > + result = WARNINGS > + else: > + return FAILURE > > if cmd.rc != 0: > return FAILURE If there are 5 new test, but 0 real failure, then result will be WARNINGS, but cmd.rc == 5 will cause return FAILURE.

Kristóf Kosztyó

Comment 34 2011-09-09 06:27:15 PDT

(In reply to comment #33) We should subtract the count of new/flakes tests from the cmd.rc > (From update of attachment 106823 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=106823&action=review > > r- now, because it isn't the real fix. > > > Tools/BuildSlaveSupport/build.webkit.org-config/master.cfg:333 count_of_new_tests = 0 count_of_flakey_tests = 0 > > + for line in self.incorrectLayoutLines: count_of_new_tests += line.find('new passes') + line.find('missing results') >= 0 count_of_flakey_tests += line.find('flakes') if count_of_new_tests + count_of_flakey_tests >= 0: > > + result = WARNINGS > > + else: > > + return FAILURE > > if cmd.rc - (count_of_new_tests + count_of_flakey_tests) != 0: > > return FAILURE > > If there are 5 new test, but 0 real failure, then result will > be WARNINGS, but cmd.rc == 5 will cause return FAILURE.

Kristóf Kosztyó

Comment 35 2011-09-09 06:29:53 PDT

(In reply to comment #34) I mistyped. count_of_new_tests += line.find('new passes') + line.find('missing results')

Csaba Osztrogonác

Comment 36 2011-09-09 06:37:38 PDT

Kristof, the idea is good, but unfortunately the return code of ORWT can be 0 or 1. Only NRWT returns the number of the failing tests.

Kristóf Kosztyó

Comment 37 2011-09-09 07:30:56 PDT

(In reply to comment #36) > Kristof, the idea is good, but unfortunately the return code of ORWT can be 0 or 1. Only NRWT returns the number of the failing tests. In that case we have two way to fix it: * check which rwt has run maybe like here: def commandComplete(self, cmd): shell.Test.commandComplete(self, cmd) logText = cmd.logs['stdio'].getText() if logText.find("Collecting tests ...") >= 0: self._parseNewRunWebKitTestsOutput(logText) else: self._parseOldRunWebKitTestsOutput(logText) * the NRWT returns with the subtracted result instead of the number of the unexpected results

Adam Roben (:aroben)

Comment 38 2011-09-09 07:45:38 PDT

This seems like a silly difference to have between NRWT and ORWT. I'd be in favor of changing NRWT to match ORWT's behavior. But I guess ORWT is going away "soon".

Eric Seidel (no email)

Comment 39 2011-09-09 07:46:56 PDT

Soon is still a ways off, sadly. But I am actually working on it again. :)

Ryosuke Niwa

Comment 40 2011-09-09 10:10:38 PDT

Comment on attachment 106823 [details] fixes this bug for good View in context: https://bugs.webkit.org/attachment.cgi?id=106823&action=review >>> Tools/BuildSlaveSupport/build.webkit.org-config/master.cfg:333 >>> return FAILURE >> >> If there are 5 new test, but 0 real failure, then result will >> be WARNINGS, but cmd.rc == 5 will cause return FAILURE. > > count_of_new_tests = 0 > count_of_flakey_tests = 0 Nope. Because of recent nrwt change, nrwt no longer report missing. It always report them as flaky. Also, we should fix nrwt to return 0 when some tests are missing results.

Ryosuke Niwa

Comment 41 2011-09-09 10:13:46 PDT

I guess I can move that patch to elsewhere since it doesn't really match this bug.

Csaba Osztrogonác

Comment 42 2011-09-12 02:43:31 PDT

(In reply to comment #41) > I guess I can move that patch to elsewhere since it doesn't really match this bug. This change was necessary, but not enough to fix this bug. ( landed in http://trac.webkit.org/changeset/94868 )

Kristóf Kosztyó

Comment 43 2011-09-14 00:17:05 PDT

Created attachment 107299 [details] proposed fix

Csaba Osztrogonác

Comment 44 2011-09-14 00:41:27 PDT

(In reply to comment #43) > Created an attachment (id=107299) [details] > proposed fix LGTM. Ryosuke, what do you think about it?

Ryosuke Niwa

Comment 45 2011-09-14 00:43:08 PDT

Comment on attachment 107299 [details] proposed fix View in context: https://bugs.webkit.org/attachment.cgi?id=107299&action=review > Tools/Scripts/webkitpy/layout_tests/controllers/manager.py:151 > + elif 'MISSING' in actual: > + num_flaky += 1 I don't think we should be counting missing expectations as flaky. That's at least semantically incorrect. Eric or Dirk should take a look at this change.

Csaba Osztrogonác

Comment 46 2011-09-14 00:50:27 PDT

(In reply to comment #45) > (From update of attachment 107299 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=107299&action=review > > > Tools/Scripts/webkitpy/layout_tests/controllers/manager.py:151 > > + elif 'MISSING' in actual: > > + num_flaky += 1 > > I don't think we should be counting missing expectations as flaky. That's at least semantically incorrect. Eric or Dirk should take a look at this change. It's true. But now a missing result is reported as flakey or missing depends on horoscope. :) Kristof's patch would eliminate this flakiness. It isn't the best solution, but it would make the bot orange if there is a new test without result.

Csaba Osztrogonác

Comment 47 2011-09-14 00:52:49 PDT

Kristof, could you check bug67268?

Kristóf Kosztyó

Comment 48 2011-09-14 01:03:14 PDT

(In reply to comment #46) > (In reply to comment #45) > > (From update of attachment 107299 [details] [details]) > > View in context: https://bugs.webkit.org/attachment.cgi?id=107299&action=review > > > > > Tools/Scripts/webkitpy/layout_tests/controllers/manager.py:151 > > > + elif 'MISSING' in actual: > > > + num_flaky += 1 > > > > I don't think we should be counting missing expectations as flaky. That's at least semantically incorrect. Eric or Dirk should take a look at this change. > > It's true. > > But now a missing result is reported as flakey or missing depends on horoscope. :) Kristof's patch would eliminate this flakiness. It isn't the best solution, but it would make the bot orange if there is a new test without result. The new tests sometimes get 'missing pass' result, and sometimes only 'missing'. When 'missing pass' it report as flakey and the bot turn onrange, but when only 'missing' it report as regression and the bot turn red.

Balazs Kelemen

Comment 49 2011-09-14 01:45:55 PDT

> > The new tests sometimes get 'missing pass' result, and sometimes only 'missing'. When 'missing pass' it report as flakey and the bot turn onrange, but when only 'missing' it report as regression and the bot turn red. I'm not familiar with the problem neither with NRWT but it seems to me that you should teach NRWT that missing results are a separate category (not fail neither flaky) instead of hacking around smg that paints the bots to the color we would like.

Dirk Pranke

Comment 50 2011-09-14 15:13:30 PDT

Comment on attachment 107299 [details] proposed fix This bug has been open for too long and is generating too many non-Chromium red bots :( I think this fix itself is probably the wrong thing to do, but it's better than doing nothing. Please add a FIXME: to break MISSING results into a different category. It sounds like we want different behavior on Chromium and non-Chromium bots, but even that's still up in the air. At any rate, I'll be gardening Chromium for the next few days, so landing this will force me to fix things one way or another :)

Kristóf Kosztyó

Comment 51 2011-09-16 04:47:24 PDT

Created attachment 107636 [details] mark missing result as flakey

Csaba Osztrogonác

Comment 52 2011-09-16 05:03:59 PDT

Comment on attachment 107636 [details] mark missing result as flakey Landed in r95285.

Csaba Osztrogonác

Comment 53 2011-09-16 08:04:29 PDT

Reopen, because it broke a webkitpy test: File "/ramdisk/qt-linux-release/build/Tools/Scripts/webkitpy/layout_tests/run_webkit_tests_integrationtest.py", line 718, in test_missing_results self.assertEquals(res, 2) AssertionError: 0 != 2

Csaba Osztrogonác

Comment 54 2011-09-16 09:14:48 PDT

Reopened because it was rolled out

Kristóf Kosztyó

Comment 55 2011-09-19 05:40:15 PDT

Created attachment 107837 [details] mark missing results as flakey

WebKit Review Bot

Comment 56 2011-09-19 05:42:20 PDT

Attachment 107837 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Tools/ChangeLog', u'Tools/Scripts/webkitpy..." exit_code: 1 Tools/Scripts/webkitpy/layout_tests/controllers/manager.py:152: indentation is not a multiple of four [pep8/E111] [5] Total errors found: 1 in 3 files If any of these errors are false positives, please file a bug against check-webkit-style.

Kristóf Kosztyó

Comment 57 2011-09-19 05:42:38 PDT

Created attachment 107838 [details] mark missing results as flakey

WebKit Review Bot

Comment 58 2011-09-19 11:02:12 PDT

Comment on attachment 107838 [details] mark missing results as flakey Clearing flags on attachment: 107838 Committed r95441: <http://trac.webkit.org/changeset/95441>

WebKit Review Bot

Comment 59 2011-09-19 11:02:19 PDT

All reviewed patches have been landed. Closing bug.

Note You need to log in before you can comment on or make changes to this bug.