Bug 64812

Summary: Buildbot marks a nrwt bot red when tests are missing results
Product: WebKit Reporter: Ryosuke Niwa <rniwa>
Component: Tools / TestsAssignee: Ryosuke Niwa <rniwa>
Status: RESOLVED FIXED    
Severity: Critical CC: abarth, aroben, dpranke, eric, kbalazs, kkristof, ossy, tony, webkit.review.bot, wsiegrist
Priority: P1    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 67268, 67855, 68247    
Bug Blocks: 34984, 64491    
Attachments:
Description Flags
fixes the bug
none
more fixes
none
Patch
none
fixes this bug for good
none
proposed fix
dpranke: review+, dpranke: commit-queue-
mark missing result as flakey
none
mark missing results as flakey
none
mark missing results as flakey none

Ryosuke Niwa
Reported 2011-07-19 10:33:10 PDT
Missing expected results should only make the bots orange, not red. You can see this on: http://build.webkit.org/builders/Qt%20Linux%20Release/builds/35535
Attachments
fixes the bug (2.31 KB, patch)
2011-07-19 10:41 PDT, Ryosuke Niwa
no flags
more fixes (2.58 KB, patch)
2011-07-19 13:51 PDT, Ryosuke Niwa
no flags
Patch (1.45 KB, patch)
2011-09-08 16:15 PDT, Ryosuke Niwa
no flags
fixes this bug for good (2.81 KB, patch)
2011-09-08 18:03 PDT, Ryosuke Niwa
no flags
proposed fix (1.21 KB, patch)
2011-09-14 00:17 PDT, Kristóf Kosztyó
dpranke: review+
dpranke: commit-queue-
mark missing result as flakey (1.31 KB, patch)
2011-09-16 04:47 PDT, Kristóf Kosztyó
no flags
mark missing results as flakey (2.18 KB, text/plain)
2011-09-19 05:40 PDT, Kristóf Kosztyó
no flags
mark missing results as flakey (2.18 KB, patch)
2011-09-19 05:42 PDT, Kristóf Kosztyó
no flags
Ryosuke Niwa
Comment 1 2011-07-19 10:41:51 PDT
Created attachment 101342 [details] fixes the bug
Tony Chang
Comment 2 2011-07-19 10:46:49 PDT
Comment on attachment 101342 [details] fixes the bug View in context: https://bugs.webkit.org/attachment.cgi?id=101342&action=review > Tools/BuildSlaveSupport/build.webkit.org-config/master.cfg:319 > - if result != FAILURE and (line.find('were flaky') >= 0 or line.find('new passes') >= 0): > + if result != FAILURE and (line.find('flakes') >= 0 or line.find('new passes') >= 0 or line.find('missing results') >= 0): > result = WARNINGS > else: > result = FAILURE Nit: Can we just return on FAILURE and remove the check in the warnings code?
Ryosuke Niwa
Comment 3 2011-07-19 11:18:02 PDT
Comment on attachment 101342 [details] fixes the bug View in context: https://bugs.webkit.org/attachment.cgi?id=101342&action=review >> Tools/BuildSlaveSupport/build.webkit.org-config/master.cfg:319 > > Nit: Can we just return on FAILURE and remove the check in the warnings code? Ok, will do.
Ryosuke Niwa
Comment 4 2011-07-19 11:21:56 PDT
Adam Roben (:aroben)
Comment 5 2011-07-19 11:31:51 PDT
*** Bug 64514 has been marked as a duplicate of this bug. ***
Ryosuke Niwa
Comment 6 2011-07-19 11:49:34 PDT
Apparently, there is a bug.
Ryosuke Niwa
Comment 7 2011-07-19 13:51:15 PDT
Created attachment 101378 [details] more fixes
Ryosuke Niwa
Comment 8 2011-07-19 18:12:06 PDT
Ping reviwers
Ryosuke Niwa
Comment 9 2011-07-20 23:54:05 PDT
Csaba Osztrogonác
Comment 10 2011-07-27 05:34:46 PDT
Reopen, because it doesn't work now. :(( http://build.webkit.org/builders/Qt%20Linux%20Release/builds/35809 There is one new test, but the bot is red.
Csaba Osztrogonác
Comment 11 2011-07-27 05:35:07 PDT
Comment on attachment 101378 [details] more fixes remove r+ from landed patch
Ryosuke Niwa
Comment 12 2011-07-27 09:41:57 PDT
(In reply to comment #10) > Reopen, because it doesn't work now. :(( > > http://build.webkit.org/builders/Qt%20Linux%20Release/builds/35809 > There is one new test, but the bot is red. This is because nrwt exited with 1: http://build.webkit.org/builders/Qt%20Linux%20Release/builds/35809/steps/layout-test/logs/stdio 2011-07-27 04:00:24,024 31623 run_webkit_tests.py:110 DEBUG Testing completed, Exit status: 1 program finished with exit code 1 elapsedTime=1324.734859 Adam, does nrwt return 1 when there are missing results?
Adam Barth
Comment 13 2011-07-27 10:52:23 PDT
Dunno. That's a question for Eric.
Tony Chang
Comment 14 2011-07-27 11:01:39 PDT
(In reply to comment #12) > Adam, does nrwt return 1 when there are missing results? Yes, because on build.chromium.org, missing results causes red (the gardener then adds it to test_expectations.txt or adds a baseline). Isn't it normally a mistake of the committer to forget expected results?
Csaba Osztrogonác
Comment 15 2011-07-27 11:04:44 PDT
(In reply to comment #14) > (In reply to comment #12) > > Adam, does nrwt return 1 when there are missing results? > > Yes, because on build.chromium.org, missing results causes red (the gardener then adds it to test_expectations.txt or adds a baseline). > > Isn't it normally a mistake of the committer to forget expected results? This is problem because committers land only mac specific expected results, and after it platform maintainers land platform specific results. We can't guarantee 0-24 gardeners for all ports.
Tony Chang
Comment 16 2011-07-27 11:22:59 PDT
(In reply to comment #15) > (In reply to comment #14) > > (In reply to comment #12) > > > Adam, does nrwt return 1 when there are missing results? > > > > Yes, because on build.chromium.org, missing results causes red (the gardener then adds it to test_expectations.txt or adds a baseline). > > > > Isn't it normally a mistake of the committer to forget expected results? > > This is problem because committers land only mac specific expected > results, and after it platform maintainers land platform specific results. > We can't guarantee 0-24 gardeners for all ports. I see, the chromium ports all fall back to the mac results, so it doesn't have this problem. Maybe we should special case the error if it's chromium.
Ryosuke Niwa
Comment 17 2011-07-27 11:40:36 PDT
(In reply to comment #16) > I see, the chromium ports all fall back to the mac results, so it doesn't have this problem. Maybe we should special case the error if it's chromium. That sounds like a good idea. I'll note that I've been frequently annoyed by this error because sometimes people just add -expected.txt and not -expected.png but we have to wait and ask patch authors if the pixel results are correct or not. So it might be worth changing the behavior on chromium as well.
Eric Seidel (no email)
Comment 18 2011-08-08 09:17:36 PDT
What is the behavioral difference? Does NRWT exit 1 while ORWT exit'd 0? Or is it just a parsing difference in the buildbot code?
Ryosuke Niwa
Comment 19 2011-08-08 09:20:56 PDT
(In reply to comment #18) > Does NRWT exit 1 while ORWT exit'd 0? Yes.
Eric Seidel (no email)
Comment 20 2011-08-08 09:24:57 PDT
Well, if we're dumping out new results by default like ORWT did, then it seems we could also exit 0. Should be a very simple change to make.
Ryosuke Niwa
Comment 21 2011-08-08 09:30:44 PDT
(In reply to comment #20) > Well, if we're dumping out new results by default like ORWT did, then it seems we could also exit 0. Should be a very simple change to make. Tony says we should probably keep the old for chromium port or should modify build.chromium.org/chromium.webkit/ to still make the bots turn red.
Eric Seidel (no email)
Comment 22 2011-08-08 09:42:39 PDT
Make it so. :)
Tony Chang
Comment 23 2011-08-08 09:50:14 PDT
(In reply to comment #21) > (In reply to comment #20) > > Well, if we're dumping out new results by default like ORWT did, then it seems we could also exit 0. Should be a very simple change to make. > > Tony says we should probably keep the old for chromium port or should modify build.chromium.org/chromium.webkit/ to still make the bots turn red. Specifically, if the chromium bots go red because of a missing result, it's normally because someone checked in a pixel test without any pixel results. This seems like a valid reason to go red. We could also test for the difference between a pixel test without any pixel results (tree red) vs a pixel test without pixel results for the platform being tested (tree orange), although I'm not sure it's worth the code complexity.
Csaba Osztrogonác
Comment 24 2011-08-25 11:04:03 PDT
It is critical for the Qt port. We don't want red bots because of new tests for hours. We don't have 0-24 gardener for checking new tests, so only the orange bot is acceptable. Kristóf, could you pick up this bug, and fix it ASAP? Please try to find a way make Qt and Chromium folks happy too.
Csaba Osztrogonác
Comment 25 2011-08-31 01:00:43 PDT
It should block "new-run-webkit-tests: MASTER BUG: Switch all webkit.org bots over" too. I'm sceptic a little bit why we (Qt) use NRWT, because it is slower than ORWT and it is always red because of new tests ...
Ryosuke Niwa
Comment 26 2011-09-08 16:07:03 PDT
Okay, let's just return the cmd.rc for now. I think that's a net win for now since we've never seen a case nrwt went wary and didn't spit out any line that doesn't match my regexes.
Ryosuke Niwa
Comment 27 2011-09-08 16:07:18 PDT
(In reply to comment #26) > Okay, let's just return the cmd.rc for now. I think that's a net win for now since we've never seen a case nrwt went wary and didn't spit out any line that doesn't match my regexes. I mean let's just ignore* cmd.rc.
Ryosuke Niwa
Comment 28 2011-09-08 16:15:22 PDT
Tony Chang
Comment 29 2011-09-08 16:24:01 PDT
Are we able to tell if the process crashed? E.g., someone made a change to NRWT that raises an exception?
Ryosuke Niwa
Comment 30 2011-09-08 16:29:50 PDT
(In reply to comment #29) > Are we able to tell if the process crashed? E.g., someone made a change to NRWT that raises an exception? We'll see the bots turn orange so we should be able to.
Ryosuke Niwa
Comment 31 2011-09-08 16:36:01 PDT
Comment on attachment 106811 [details] Patch I found a root cause of this. Let me fix it...
Ryosuke Niwa
Comment 32 2011-09-08 18:03:02 PDT
Created attachment 106823 [details] fixes this bug for good
Csaba Osztrogonác
Comment 33 2011-09-09 06:17:55 PDT
Comment on attachment 106823 [details] fixes this bug for good View in context: https://bugs.webkit.org/attachment.cgi?id=106823&action=review r- now, because it isn't the real fix. > Tools/BuildSlaveSupport/build.webkit.org-config/master.cfg:333 > + for line in self.incorrectLayoutLines: > + if line.find('flakes') >= 0 or line.find('new passes') >= 0 or line.find('missing results') >= 0: > + result = WARNINGS > + else: > + return FAILURE > > if cmd.rc != 0: > return FAILURE If there are 5 new test, but 0 real failure, then result will be WARNINGS, but cmd.rc == 5 will cause return FAILURE.
Kristóf Kosztyó
Comment 34 2011-09-09 06:27:15 PDT
(In reply to comment #33) We should subtract the count of new/flakes tests from the cmd.rc > (From update of attachment 106823 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=106823&action=review > > r- now, because it isn't the real fix. > > > Tools/BuildSlaveSupport/build.webkit.org-config/master.cfg:333 count_of_new_tests = 0 count_of_flakey_tests = 0 > > + for line in self.incorrectLayoutLines: count_of_new_tests += line.find('new passes') + line.find('missing results') >= 0 count_of_flakey_tests += line.find('flakes') if count_of_new_tests + count_of_flakey_tests >= 0: > > + result = WARNINGS > > + else: > > + return FAILURE > > if cmd.rc - (count_of_new_tests + count_of_flakey_tests) != 0: > > return FAILURE > > If there are 5 new test, but 0 real failure, then result will > be WARNINGS, but cmd.rc == 5 will cause return FAILURE.
Kristóf Kosztyó
Comment 35 2011-09-09 06:29:53 PDT
(In reply to comment #34) I mistyped. count_of_new_tests += line.find('new passes') + line.find('missing results')
Csaba Osztrogonác
Comment 36 2011-09-09 06:37:38 PDT
Kristof, the idea is good, but unfortunately the return code of ORWT can be 0 or 1. Only NRWT returns the number of the failing tests.
Kristóf Kosztyó
Comment 37 2011-09-09 07:30:56 PDT
(In reply to comment #36) > Kristof, the idea is good, but unfortunately the return code of ORWT can be 0 or 1. Only NRWT returns the number of the failing tests. In that case we have two way to fix it: * check which rwt has run maybe like here: def commandComplete(self, cmd): shell.Test.commandComplete(self, cmd) logText = cmd.logs['stdio'].getText() if logText.find("Collecting tests ...") >= 0: self._parseNewRunWebKitTestsOutput(logText) else: self._parseOldRunWebKitTestsOutput(logText) * the NRWT returns with the subtracted result instead of the number of the unexpected results
Adam Roben (:aroben)
Comment 38 2011-09-09 07:45:38 PDT
This seems like a silly difference to have between NRWT and ORWT. I'd be in favor of changing NRWT to match ORWT's behavior. But I guess ORWT is going away "soon".
Eric Seidel (no email)
Comment 39 2011-09-09 07:46:56 PDT
Soon is still a ways off, sadly. But I am actually working on it again. :)
Ryosuke Niwa
Comment 40 2011-09-09 10:10:38 PDT
Comment on attachment 106823 [details] fixes this bug for good View in context: https://bugs.webkit.org/attachment.cgi?id=106823&action=review >>> Tools/BuildSlaveSupport/build.webkit.org-config/master.cfg:333 >>> return FAILURE >> >> If there are 5 new test, but 0 real failure, then result will >> be WARNINGS, but cmd.rc == 5 will cause return FAILURE. > > count_of_new_tests = 0 > count_of_flakey_tests = 0 Nope. Because of recent nrwt change, nrwt no longer report missing. It always report them as flaky. Also, we should fix nrwt to return 0 when some tests are missing results.
Ryosuke Niwa
Comment 41 2011-09-09 10:13:46 PDT
I guess I can move that patch to elsewhere since it doesn't really match this bug.
Csaba Osztrogonác
Comment 42 2011-09-12 02:43:31 PDT
(In reply to comment #41) > I guess I can move that patch to elsewhere since it doesn't really match this bug. This change was necessary, but not enough to fix this bug. ( landed in http://trac.webkit.org/changeset/94868 )
Kristóf Kosztyó
Comment 43 2011-09-14 00:17:05 PDT
Created attachment 107299 [details] proposed fix
Csaba Osztrogonác
Comment 44 2011-09-14 00:41:27 PDT
(In reply to comment #43) > Created an attachment (id=107299) [details] > proposed fix LGTM. Ryosuke, what do you think about it?
Ryosuke Niwa
Comment 45 2011-09-14 00:43:08 PDT
Comment on attachment 107299 [details] proposed fix View in context: https://bugs.webkit.org/attachment.cgi?id=107299&action=review > Tools/Scripts/webkitpy/layout_tests/controllers/manager.py:151 > + elif 'MISSING' in actual: > + num_flaky += 1 I don't think we should be counting missing expectations as flaky. That's at least semantically incorrect. Eric or Dirk should take a look at this change.
Csaba Osztrogonác
Comment 46 2011-09-14 00:50:27 PDT
(In reply to comment #45) > (From update of attachment 107299 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=107299&action=review > > > Tools/Scripts/webkitpy/layout_tests/controllers/manager.py:151 > > + elif 'MISSING' in actual: > > + num_flaky += 1 > > I don't think we should be counting missing expectations as flaky. That's at least semantically incorrect. Eric or Dirk should take a look at this change. It's true. But now a missing result is reported as flakey or missing depends on horoscope. :) Kristof's patch would eliminate this flakiness. It isn't the best solution, but it would make the bot orange if there is a new test without result.
Csaba Osztrogonác
Comment 47 2011-09-14 00:52:49 PDT
Kristof, could you check bug67268?
Kristóf Kosztyó
Comment 48 2011-09-14 01:03:14 PDT
(In reply to comment #46) > (In reply to comment #45) > > (From update of attachment 107299 [details] [details]) > > View in context: https://bugs.webkit.org/attachment.cgi?id=107299&action=review > > > > > Tools/Scripts/webkitpy/layout_tests/controllers/manager.py:151 > > > + elif 'MISSING' in actual: > > > + num_flaky += 1 > > > > I don't think we should be counting missing expectations as flaky. That's at least semantically incorrect. Eric or Dirk should take a look at this change. > > It's true. > > But now a missing result is reported as flakey or missing depends on horoscope. :) Kristof's patch would eliminate this flakiness. It isn't the best solution, but it would make the bot orange if there is a new test without result. The new tests sometimes get 'missing pass' result, and sometimes only 'missing'. When 'missing pass' it report as flakey and the bot turn onrange, but when only 'missing' it report as regression and the bot turn red.
Balazs Kelemen
Comment 49 2011-09-14 01:45:55 PDT
> > The new tests sometimes get 'missing pass' result, and sometimes only 'missing'. When 'missing pass' it report as flakey and the bot turn onrange, but when only 'missing' it report as regression and the bot turn red. I'm not familiar with the problem neither with NRWT but it seems to me that you should teach NRWT that missing results are a separate category (not fail neither flaky) instead of hacking around smg that paints the bots to the color we would like.
Dirk Pranke
Comment 50 2011-09-14 15:13:30 PDT
Comment on attachment 107299 [details] proposed fix This bug has been open for too long and is generating too many non-Chromium red bots :( I think this fix itself is probably the wrong thing to do, but it's better than doing nothing. Please add a FIXME: to break MISSING results into a different category. It sounds like we want different behavior on Chromium and non-Chromium bots, but even that's still up in the air. At any rate, I'll be gardening Chromium for the next few days, so landing this will force me to fix things one way or another :)
Kristóf Kosztyó
Comment 51 2011-09-16 04:47:24 PDT
Created attachment 107636 [details] mark missing result as flakey
Csaba Osztrogonác
Comment 52 2011-09-16 05:03:59 PDT
Comment on attachment 107636 [details] mark missing result as flakey Landed in r95285.
Csaba Osztrogonác
Comment 53 2011-09-16 08:04:29 PDT
Reopen, because it broke a webkitpy test: File "/ramdisk/qt-linux-release/build/Tools/Scripts/webkitpy/layout_tests/run_webkit_tests_integrationtest.py", line 718, in test_missing_results self.assertEquals(res, 2) AssertionError: 0 != 2
Csaba Osztrogonác
Comment 54 2011-09-16 09:14:48 PDT
Reopened because it was rolled out
Kristóf Kosztyó
Comment 55 2011-09-19 05:40:15 PDT
Created attachment 107837 [details] mark missing results as flakey
WebKit Review Bot
Comment 56 2011-09-19 05:42:20 PDT
Attachment 107837 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Tools/ChangeLog', u'Tools/Scripts/webkitpy..." exit_code: 1 Tools/Scripts/webkitpy/layout_tests/controllers/manager.py:152: indentation is not a multiple of four [pep8/E111] [5] Total errors found: 1 in 3 files If any of these errors are false positives, please file a bug against check-webkit-style.
Kristóf Kosztyó
Comment 57 2011-09-19 05:42:38 PDT
Created attachment 107838 [details] mark missing results as flakey
WebKit Review Bot
Comment 58 2011-09-19 11:02:12 PDT
Comment on attachment 107838 [details] mark missing results as flakey Clearing flags on attachment: 107838 Committed r95441: <http://trac.webkit.org/changeset/95441>
WebKit Review Bot
Comment 59 2011-09-19 11:02:19 PDT
All reviewed patches have been landed. Closing bug.
Note You need to log in before you can comment on or make changes to this bug.