Bug 64812

Summary: Buildbot marks a nrwt bot red when tests are missing results
Product: WebKit Reporter: Ryosuke Niwa <rniwa>
Component: Tools / TestsAssignee: Ryosuke Niwa <rniwa>
Status: RESOLVED FIXED    
Severity: Critical CC: abarth, aroben, dpranke, eric, kbalazs, kkristof, ossy, tony, webkit.review.bot, wsiegrist
Priority: P1    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 67268, 67855, 68247    
Bug Blocks: 34984, 64491    
Attachments:
Description Flags
fixes the bug
none
more fixes
none
Patch
none
fixes this bug for good
none
proposed fix
dpranke: review+, dpranke: commit-queue-
mark missing result as flakey
none
mark missing results as flakey
none
mark missing results as flakey none

Description Ryosuke Niwa 2011-07-19 10:33:10 PDT
Missing expected results should only make the bots orange, not red.

You can see this on:
http://build.webkit.org/builders/Qt%20Linux%20Release/builds/35535
Comment 1 Ryosuke Niwa 2011-07-19 10:41:51 PDT
Created attachment 101342 [details]
fixes the bug
Comment 2 Tony Chang 2011-07-19 10:46:49 PDT
Comment on attachment 101342 [details]
fixes the bug

View in context: https://bugs.webkit.org/attachment.cgi?id=101342&action=review

> Tools/BuildSlaveSupport/build.webkit.org-config/master.cfg:319
> -                if result != FAILURE and (line.find('were flaky') >= 0 or line.find('new passes') >= 0):
> +                if result != FAILURE and (line.find('flakes') >= 0 or line.find('new passes') >= 0 or line.find('missing results') >= 0):
>                      result = WARNINGS
>                  else:
>                      result = FAILURE

Nit: Can we just return on FAILURE and remove the check in the warnings code?
Comment 3 Ryosuke Niwa 2011-07-19 11:18:02 PDT
Comment on attachment 101342 [details]
fixes the bug

View in context: https://bugs.webkit.org/attachment.cgi?id=101342&action=review

>> Tools/BuildSlaveSupport/build.webkit.org-config/master.cfg:319

> 
> Nit: Can we just return on FAILURE and remove the check in the warnings code?

Ok, will do.
Comment 4 Ryosuke Niwa 2011-07-19 11:21:56 PDT
Committed r91273: <http://trac.webkit.org/changeset/91273>
Comment 5 Adam Roben (:aroben) 2011-07-19 11:31:51 PDT
*** Bug 64514 has been marked as a duplicate of this bug. ***
Comment 6 Ryosuke Niwa 2011-07-19 11:49:34 PDT
Apparently, there is a bug.
Comment 7 Ryosuke Niwa 2011-07-19 13:51:15 PDT
Created attachment 101378 [details]
more fixes
Comment 8 Ryosuke Niwa 2011-07-19 18:12:06 PDT
Ping reviwers
Comment 9 Ryosuke Niwa 2011-07-20 23:54:05 PDT
Committed r91448: <http://trac.webkit.org/changeset/91448>
Comment 10 Csaba Osztrogonác 2011-07-27 05:34:46 PDT
Reopen, because it doesn't work now. :((

http://build.webkit.org/builders/Qt%20Linux%20Release/builds/35809
There is one new test, but the bot is red.
Comment 11 Csaba Osztrogonác 2011-07-27 05:35:07 PDT
Comment on attachment 101378 [details]
more fixes

remove r+ from landed patch
Comment 12 Ryosuke Niwa 2011-07-27 09:41:57 PDT
(In reply to comment #10)
> Reopen, because it doesn't work now. :((
> 
> http://build.webkit.org/builders/Qt%20Linux%20Release/builds/35809
> There is one new test, but the bot is red.

This is because nrwt exited with 1:
http://build.webkit.org/builders/Qt%20Linux%20Release/builds/35809/steps/layout-test/logs/stdio

2011-07-27 04:00:24,024 31623 run_webkit_tests.py:110 DEBUG Testing completed, Exit status: 1
program finished with exit code 1
elapsedTime=1324.734859

Adam, does nrwt return 1 when there are missing results?
Comment 13 Adam Barth 2011-07-27 10:52:23 PDT
Dunno.  That's a question for Eric.
Comment 14 Tony Chang 2011-07-27 11:01:39 PDT
(In reply to comment #12)
> Adam, does nrwt return 1 when there are missing results?

Yes, because on build.chromium.org, missing results causes red (the gardener then adds it to test_expectations.txt or adds a baseline).

Isn't it normally a mistake of the committer to forget expected results?
Comment 15 Csaba Osztrogonác 2011-07-27 11:04:44 PDT
(In reply to comment #14)
> (In reply to comment #12)
> > Adam, does nrwt return 1 when there are missing results?
> 
> Yes, because on build.chromium.org, missing results causes red (the gardener then adds it to test_expectations.txt or adds a baseline).
> 
> Isn't it normally a mistake of the committer to forget expected results?

This is problem because committers land only mac specific expected 
results, and after it platform maintainers land platform specific results.
We can't guarantee 0-24 gardeners for all ports.
Comment 16 Tony Chang 2011-07-27 11:22:59 PDT
(In reply to comment #15)
> (In reply to comment #14)
> > (In reply to comment #12)
> > > Adam, does nrwt return 1 when there are missing results?
> > 
> > Yes, because on build.chromium.org, missing results causes red (the gardener then adds it to test_expectations.txt or adds a baseline).
> > 
> > Isn't it normally a mistake of the committer to forget expected results?
> 
> This is problem because committers land only mac specific expected 
> results, and after it platform maintainers land platform specific results.
> We can't guarantee 0-24 gardeners for all ports.

I see, the chromium ports all fall back to the mac results, so it doesn't have this problem.  Maybe we should special case the error if it's chromium.
Comment 17 Ryosuke Niwa 2011-07-27 11:40:36 PDT
(In reply to comment #16)
> I see, the chromium ports all fall back to the mac results, so it doesn't have this problem.  Maybe we should special case the error if it's chromium.

That sounds like a good idea.

I'll note that I've been frequently annoyed by this error because sometimes people just add -expected.txt and not -expected.png but we have to wait and ask patch authors if the pixel results are correct or not.  So it might be worth changing the behavior on chromium as well.
Comment 18 Eric Seidel (no email) 2011-08-08 09:17:36 PDT
What is the behavioral difference?  Does NRWT exit 1 while ORWT exit'd 0?  Or is it just a parsing difference in the buildbot code?
Comment 19 Ryosuke Niwa 2011-08-08 09:20:56 PDT
(In reply to comment #18)
> Does NRWT exit 1 while ORWT exit'd 0?

Yes.
Comment 20 Eric Seidel (no email) 2011-08-08 09:24:57 PDT
Well, if we're dumping out new results by default like ORWT did, then it seems we could also exit 0.  Should be a very simple change to make.
Comment 21 Ryosuke Niwa 2011-08-08 09:30:44 PDT
(In reply to comment #20)
> Well, if we're dumping out new results by default like ORWT did, then it seems we could also exit 0.  Should be a very simple change to make.

Tony says we should probably keep the old for chromium port or should modify build.chromium.org/chromium.webkit/ to still make the bots turn red.
Comment 22 Eric Seidel (no email) 2011-08-08 09:42:39 PDT
Make it so. :)
Comment 23 Tony Chang 2011-08-08 09:50:14 PDT
(In reply to comment #21)
> (In reply to comment #20)
> > Well, if we're dumping out new results by default like ORWT did, then it seems we could also exit 0.  Should be a very simple change to make.
> 
> Tony says we should probably keep the old for chromium port or should modify build.chromium.org/chromium.webkit/ to still make the bots turn red.

Specifically, if the chromium bots go red because of a missing result, it's normally because someone checked in a pixel test without any pixel results.  This seems like a valid reason to go red.

We could also test for the difference between a pixel test without any pixel results (tree red) vs a pixel test without pixel results for the platform being tested (tree orange), although I'm not sure it's worth the code complexity.
Comment 24 Csaba Osztrogonác 2011-08-25 11:04:03 PDT
It is critical for the Qt port. 

We don't want red bots because of new tests for hours. We don't have 0-24 gardener for checking new tests, so only the orange bot is acceptable.

Kristóf, could you pick up this bug, and fix it ASAP? Please 
try to find a way make Qt and Chromium folks happy too.
Comment 25 Csaba Osztrogonác 2011-08-31 01:00:43 PDT
It should block "new-run-webkit-tests: MASTER BUG: Switch all webkit.org bots over" too. I'm sceptic a little bit why we (Qt) use NRWT, because it is slower than ORWT and it is always red because of new tests ...
Comment 26 Ryosuke Niwa 2011-09-08 16:07:03 PDT
Okay, let's just return the cmd.rc for now.  I think that's a net win for now since we've never seen a case nrwt went wary and didn't spit out any line that doesn't match my regexes.
Comment 27 Ryosuke Niwa 2011-09-08 16:07:18 PDT
(In reply to comment #26)
> Okay, let's just return the cmd.rc for now.  I think that's a net win for now since we've never seen a case nrwt went wary and didn't spit out any line that doesn't match my regexes.

I mean let's just ignore* cmd.rc.
Comment 28 Ryosuke Niwa 2011-09-08 16:15:22 PDT
Created attachment 106811 [details]
Patch
Comment 29 Tony Chang 2011-09-08 16:24:01 PDT
Are we able to tell if the process crashed?  E.g., someone made a change to NRWT that raises an exception?
Comment 30 Ryosuke Niwa 2011-09-08 16:29:50 PDT
(In reply to comment #29)
> Are we able to tell if the process crashed?  E.g., someone made a change to NRWT that raises an exception?

We'll see the bots turn orange so we should be able to.
Comment 31 Ryosuke Niwa 2011-09-08 16:36:01 PDT
Comment on attachment 106811 [details]
Patch

I found a root cause of this.  Let me fix it...
Comment 32 Ryosuke Niwa 2011-09-08 18:03:02 PDT
Created attachment 106823 [details]
fixes this bug for good
Comment 33 Csaba Osztrogonác 2011-09-09 06:17:55 PDT
Comment on attachment 106823 [details]
fixes this bug for good

View in context: https://bugs.webkit.org/attachment.cgi?id=106823&action=review

r- now, because it isn't the real fix.

> Tools/BuildSlaveSupport/build.webkit.org-config/master.cfg:333
> +            for line in self.incorrectLayoutLines:
> +                if line.find('flakes') >= 0 or line.find('new passes') >= 0 or line.find('missing results') >= 0:
> +                    result = WARNINGS
> +                else:
> +                    return FAILURE
>  
>          if cmd.rc != 0:
>              return FAILURE

If there are 5 new test, but 0 real failure, then result will 
be WARNINGS, but cmd.rc == 5 will cause return FAILURE.
Comment 34 Kristóf Kosztyó 2011-09-09 06:27:15 PDT
(In reply to comment #33)
We should subtract the count of new/flakes tests from the cmd.rc
> (From update of attachment 106823 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=106823&action=review
> 
> r- now, because it isn't the real fix.
> 
> > Tools/BuildSlaveSupport/build.webkit.org-config/master.cfg:333
                 count_of_new_tests = 0
                 count_of_flakey_tests = 0
> > +            for line in self.incorrectLayoutLines:
                     count_of_new_tests += line.find('new passes') + line.find('missing results') >= 0
                     count_of_flakey_tests += line.find('flakes')
                     if count_of_new_tests + count_of_flakey_tests >= 0:
> > +                    result = WARNINGS
> > +                else:
> > +                    return FAILURE
> >  
            if cmd.rc - (count_of_new_tests + count_of_flakey_tests) != 0:
> >              return FAILURE
> 
> If there are 5 new test, but 0 real failure, then result will 
> be WARNINGS, but cmd.rc == 5 will cause return FAILURE.
Comment 35 Kristóf Kosztyó 2011-09-09 06:29:53 PDT
(In reply to comment #34)
I mistyped.
count_of_new_tests += line.find('new passes') + line.find('missing results')
Comment 36 Csaba Osztrogonác 2011-09-09 06:37:38 PDT
Kristof, the idea is good, but unfortunately the return code of ORWT can be 0 or 1. Only NRWT returns the number of the failing tests.
Comment 37 Kristóf Kosztyó 2011-09-09 07:30:56 PDT
(In reply to comment #36)
> Kristof, the idea is good, but unfortunately the return code of ORWT can be 0 or 1. Only NRWT returns the number of the failing tests.

In that case we have two way to fix it:
 * check which rwt has run
    maybe like here:
      def commandComplete(self, cmd):
          shell.Test.commandComplete(self, cmd)

          logText = cmd.logs['stdio'].getText()
          if logText.find("Collecting tests ...") >= 0:
              self._parseNewRunWebKitTestsOutput(logText)
          else:
              self._parseOldRunWebKitTestsOutput(logText)

 * the NRWT returns with the subtracted result instead of the number of the unexpected results
Comment 38 Adam Roben (:aroben) 2011-09-09 07:45:38 PDT
This seems like a silly difference to have between NRWT and ORWT. I'd be in favor of changing NRWT to match ORWT's behavior. But I guess ORWT is going away "soon".
Comment 39 Eric Seidel (no email) 2011-09-09 07:46:56 PDT
Soon is still a ways off, sadly.  But I am actually working on it again. :)
Comment 40 Ryosuke Niwa 2011-09-09 10:10:38 PDT
Comment on attachment 106823 [details]
fixes this bug for good

View in context: https://bugs.webkit.org/attachment.cgi?id=106823&action=review

>>> Tools/BuildSlaveSupport/build.webkit.org-config/master.cfg:333
>>>              return FAILURE
>> 
>> If there are 5 new test, but 0 real failure, then result will 
>> be WARNINGS, but cmd.rc == 5 will cause return FAILURE.
> 
> count_of_new_tests = 0
>                  count_of_flakey_tests = 0

Nope. Because of recent nrwt change, nrwt no longer report missing. It always report them as flaky.

Also, we should fix nrwt to return 0 when some tests are missing results.
Comment 41 Ryosuke Niwa 2011-09-09 10:13:46 PDT
I guess I can move that patch to elsewhere since it doesn't really match this bug.
Comment 42 Csaba Osztrogonác 2011-09-12 02:43:31 PDT
(In reply to comment #41)
> I guess I can move that patch to elsewhere since it doesn't really match this bug.

This change was necessary, but not enough to fix this bug.
( landed in http://trac.webkit.org/changeset/94868 )
Comment 43 Kristóf Kosztyó 2011-09-14 00:17:05 PDT
Created attachment 107299 [details]
proposed fix
Comment 44 Csaba Osztrogonác 2011-09-14 00:41:27 PDT
(In reply to comment #43)
> Created an attachment (id=107299) [details]
> proposed fix

LGTM. Ryosuke, what do you think about it?
Comment 45 Ryosuke Niwa 2011-09-14 00:43:08 PDT
Comment on attachment 107299 [details]
proposed fix

View in context: https://bugs.webkit.org/attachment.cgi?id=107299&action=review

> Tools/Scripts/webkitpy/layout_tests/controllers/manager.py:151
> +                elif 'MISSING' in actual:
> +                    num_flaky += 1

I don't think we should be counting missing expectations as flaky.  That's at least semantically incorrect.  Eric or Dirk should take a look at this change.
Comment 46 Csaba Osztrogonác 2011-09-14 00:50:27 PDT
(In reply to comment #45)
> (From update of attachment 107299 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=107299&action=review
> 
> > Tools/Scripts/webkitpy/layout_tests/controllers/manager.py:151
> > +                elif 'MISSING' in actual:
> > +                    num_flaky += 1
> 
> I don't think we should be counting missing expectations as flaky.  That's at least semantically incorrect.  Eric or Dirk should take a look at this change.

It's true. 

But now a missing result is reported as flakey or missing depends on horoscope. :) Kristof's patch would eliminate this flakiness. It isn't the best solution, but it would make the bot orange if there is a new test without result.
Comment 47 Csaba Osztrogonác 2011-09-14 00:52:49 PDT
Kristof, could you check bug67268?
Comment 48 Kristóf Kosztyó 2011-09-14 01:03:14 PDT
(In reply to comment #46)
> (In reply to comment #45)
> > (From update of attachment 107299 [details] [details])
> > View in context: https://bugs.webkit.org/attachment.cgi?id=107299&action=review
> > 
> > > Tools/Scripts/webkitpy/layout_tests/controllers/manager.py:151
> > > +                elif 'MISSING' in actual:
> > > +                    num_flaky += 1
> > 
> > I don't think we should be counting missing expectations as flaky.  That's at least semantically incorrect.  Eric or Dirk should take a look at this change.
> 
> It's true. 
> 
> But now a missing result is reported as flakey or missing depends on horoscope. :) Kristof's patch would eliminate this flakiness. It isn't the best solution, but it would make the bot orange if there is a new test without result.

The new tests sometimes get 'missing pass' result, and sometimes only 'missing'. When 'missing pass' it report as flakey and the bot turn onrange, but when only 'missing' it report as regression and the bot turn red.
Comment 49 Balazs Kelemen 2011-09-14 01:45:55 PDT
> 
> The new tests sometimes get 'missing pass' result, and sometimes only 'missing'. When 'missing pass' it report as flakey and the bot turn onrange, but when only 'missing' it report as regression and the bot turn red.

I'm not familiar with the problem neither with NRWT but it seems to me that you should teach NRWT that missing results are a separate category (not fail neither flaky) instead of hacking around smg that paints the bots to the color we would like.
Comment 50 Dirk Pranke 2011-09-14 15:13:30 PDT
Comment on attachment 107299 [details]
proposed fix

This bug has been open for too long and is generating too many non-Chromium red bots :(

I think this fix itself is probably the wrong thing to do, but it's better than doing nothing. Please add a FIXME: to break MISSING results into a different category. It sounds like we want different behavior on Chromium and non-Chromium bots, but even that's still up in the air.

At any rate, I'll be gardening Chromium for the next few days, so landing this will force me to fix things one way or another :)
Comment 51 Kristóf Kosztyó 2011-09-16 04:47:24 PDT
Created attachment 107636 [details]
mark missing result as flakey
Comment 52 Csaba Osztrogonác 2011-09-16 05:03:59 PDT
Comment on attachment 107636 [details]
mark missing result as flakey

Landed in r95285.
Comment 53 Csaba Osztrogonác 2011-09-16 08:04:29 PDT
Reopen, because it broke a webkitpy test:

 File "/ramdisk/qt-linux-release/build/Tools/Scripts/webkitpy/layout_tests/run_webkit_tests_integrationtest.py", line 718, in test_missing_results
    self.assertEquals(res, 2)
AssertionError: 0 != 2
Comment 54 Csaba Osztrogonác 2011-09-16 09:14:48 PDT
Reopened because it was rolled out
Comment 55 Kristóf Kosztyó 2011-09-19 05:40:15 PDT
Created attachment 107837 [details]
mark missing results as flakey
Comment 56 WebKit Review Bot 2011-09-19 05:42:20 PDT
Attachment 107837 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Tools/ChangeLog', u'Tools/Scripts/webkitpy..." exit_code: 1

Tools/Scripts/webkitpy/layout_tests/controllers/manager.py:152:  indentation is not a multiple of four  [pep8/E111] [5]
Total errors found: 1 in 3 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 57 Kristóf Kosztyó 2011-09-19 05:42:38 PDT
Created attachment 107838 [details]
mark missing results as flakey
Comment 58 WebKit Review Bot 2011-09-19 11:02:12 PDT
Comment on attachment 107838 [details]
mark missing results as flakey

Clearing flags on attachment: 107838

Committed r95441: <http://trac.webkit.org/changeset/95441>
Comment 59 WebKit Review Bot 2011-09-19 11:02:19 PDT
All reviewed patches have been landed.  Closing bug.