Summary: | Buildbot marks a nrwt bot red when tests are missing results | ||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Ryosuke Niwa <rniwa> | ||||||||||||||||||
Component: | Tools / Tests | Assignee: | Ryosuke Niwa <rniwa> | ||||||||||||||||||
Status: | RESOLVED FIXED | ||||||||||||||||||||
Severity: | Critical | CC: | abarth, aroben, dpranke, eric, kbalazs, kkristof, ossy, tony, webkit.review.bot, wsiegrist | ||||||||||||||||||
Priority: | P1 | ||||||||||||||||||||
Version: | 528+ (Nightly build) | ||||||||||||||||||||
Hardware: | Unspecified | ||||||||||||||||||||
OS: | Unspecified | ||||||||||||||||||||
Bug Depends on: | 67268, 67855, 68247 | ||||||||||||||||||||
Bug Blocks: | 34984, 64491 | ||||||||||||||||||||
Attachments: |
|
Description
Ryosuke Niwa
2011-07-19 10:33:10 PDT
Created attachment 101342 [details]
fixes the bug
Comment on attachment 101342 [details] fixes the bug View in context: https://bugs.webkit.org/attachment.cgi?id=101342&action=review > Tools/BuildSlaveSupport/build.webkit.org-config/master.cfg:319 > - if result != FAILURE and (line.find('were flaky') >= 0 or line.find('new passes') >= 0): > + if result != FAILURE and (line.find('flakes') >= 0 or line.find('new passes') >= 0 or line.find('missing results') >= 0): > result = WARNINGS > else: > result = FAILURE Nit: Can we just return on FAILURE and remove the check in the warnings code? Comment on attachment 101342 [details] fixes the bug View in context: https://bugs.webkit.org/attachment.cgi?id=101342&action=review >> Tools/BuildSlaveSupport/build.webkit.org-config/master.cfg:319 > > Nit: Can we just return on FAILURE and remove the check in the warnings code? Ok, will do. Committed r91273: <http://trac.webkit.org/changeset/91273> *** Bug 64514 has been marked as a duplicate of this bug. *** Apparently, there is a bug. Created attachment 101378 [details]
more fixes
Ping reviwers Committed r91448: <http://trac.webkit.org/changeset/91448> Reopen, because it doesn't work now. :(( http://build.webkit.org/builders/Qt%20Linux%20Release/builds/35809 There is one new test, but the bot is red. Comment on attachment 101378 [details]
more fixes
remove r+ from landed patch
(In reply to comment #10) > Reopen, because it doesn't work now. :(( > > http://build.webkit.org/builders/Qt%20Linux%20Release/builds/35809 > There is one new test, but the bot is red. This is because nrwt exited with 1: http://build.webkit.org/builders/Qt%20Linux%20Release/builds/35809/steps/layout-test/logs/stdio 2011-07-27 04:00:24,024 31623 run_webkit_tests.py:110 DEBUG Testing completed, Exit status: 1 program finished with exit code 1 elapsedTime=1324.734859 Adam, does nrwt return 1 when there are missing results? Dunno. That's a question for Eric. (In reply to comment #12) > Adam, does nrwt return 1 when there are missing results? Yes, because on build.chromium.org, missing results causes red (the gardener then adds it to test_expectations.txt or adds a baseline). Isn't it normally a mistake of the committer to forget expected results? (In reply to comment #14) > (In reply to comment #12) > > Adam, does nrwt return 1 when there are missing results? > > Yes, because on build.chromium.org, missing results causes red (the gardener then adds it to test_expectations.txt or adds a baseline). > > Isn't it normally a mistake of the committer to forget expected results? This is problem because committers land only mac specific expected results, and after it platform maintainers land platform specific results. We can't guarantee 0-24 gardeners for all ports. (In reply to comment #15) > (In reply to comment #14) > > (In reply to comment #12) > > > Adam, does nrwt return 1 when there are missing results? > > > > Yes, because on build.chromium.org, missing results causes red (the gardener then adds it to test_expectations.txt or adds a baseline). > > > > Isn't it normally a mistake of the committer to forget expected results? > > This is problem because committers land only mac specific expected > results, and after it platform maintainers land platform specific results. > We can't guarantee 0-24 gardeners for all ports. I see, the chromium ports all fall back to the mac results, so it doesn't have this problem. Maybe we should special case the error if it's chromium. (In reply to comment #16) > I see, the chromium ports all fall back to the mac results, so it doesn't have this problem. Maybe we should special case the error if it's chromium. That sounds like a good idea. I'll note that I've been frequently annoyed by this error because sometimes people just add -expected.txt and not -expected.png but we have to wait and ask patch authors if the pixel results are correct or not. So it might be worth changing the behavior on chromium as well. What is the behavioral difference? Does NRWT exit 1 while ORWT exit'd 0? Or is it just a parsing difference in the buildbot code? (In reply to comment #18) > Does NRWT exit 1 while ORWT exit'd 0? Yes. Well, if we're dumping out new results by default like ORWT did, then it seems we could also exit 0. Should be a very simple change to make. (In reply to comment #20) > Well, if we're dumping out new results by default like ORWT did, then it seems we could also exit 0. Should be a very simple change to make. Tony says we should probably keep the old for chromium port or should modify build.chromium.org/chromium.webkit/ to still make the bots turn red. Make it so. :) (In reply to comment #21) > (In reply to comment #20) > > Well, if we're dumping out new results by default like ORWT did, then it seems we could also exit 0. Should be a very simple change to make. > > Tony says we should probably keep the old for chromium port or should modify build.chromium.org/chromium.webkit/ to still make the bots turn red. Specifically, if the chromium bots go red because of a missing result, it's normally because someone checked in a pixel test without any pixel results. This seems like a valid reason to go red. We could also test for the difference between a pixel test without any pixel results (tree red) vs a pixel test without pixel results for the platform being tested (tree orange), although I'm not sure it's worth the code complexity. It is critical for the Qt port. We don't want red bots because of new tests for hours. We don't have 0-24 gardener for checking new tests, so only the orange bot is acceptable. Kristóf, could you pick up this bug, and fix it ASAP? Please try to find a way make Qt and Chromium folks happy too. It should block "new-run-webkit-tests: MASTER BUG: Switch all webkit.org bots over" too. I'm sceptic a little bit why we (Qt) use NRWT, because it is slower than ORWT and it is always red because of new tests ... Okay, let's just return the cmd.rc for now. I think that's a net win for now since we've never seen a case nrwt went wary and didn't spit out any line that doesn't match my regexes. (In reply to comment #26) > Okay, let's just return the cmd.rc for now. I think that's a net win for now since we've never seen a case nrwt went wary and didn't spit out any line that doesn't match my regexes. I mean let's just ignore* cmd.rc. Created attachment 106811 [details]
Patch
Are we able to tell if the process crashed? E.g., someone made a change to NRWT that raises an exception? (In reply to comment #29) > Are we able to tell if the process crashed? E.g., someone made a change to NRWT that raises an exception? We'll see the bots turn orange so we should be able to. Comment on attachment 106811 [details]
Patch
I found a root cause of this. Let me fix it...
Created attachment 106823 [details]
fixes this bug for good
Comment on attachment 106823 [details] fixes this bug for good View in context: https://bugs.webkit.org/attachment.cgi?id=106823&action=review r- now, because it isn't the real fix. > Tools/BuildSlaveSupport/build.webkit.org-config/master.cfg:333 > + for line in self.incorrectLayoutLines: > + if line.find('flakes') >= 0 or line.find('new passes') >= 0 or line.find('missing results') >= 0: > + result = WARNINGS > + else: > + return FAILURE > > if cmd.rc != 0: > return FAILURE If there are 5 new test, but 0 real failure, then result will be WARNINGS, but cmd.rc == 5 will cause return FAILURE. (In reply to comment #33) We should subtract the count of new/flakes tests from the cmd.rc > (From update of attachment 106823 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=106823&action=review > > r- now, because it isn't the real fix. > > > Tools/BuildSlaveSupport/build.webkit.org-config/master.cfg:333 count_of_new_tests = 0 count_of_flakey_tests = 0 > > + for line in self.incorrectLayoutLines: count_of_new_tests += line.find('new passes') + line.find('missing results') >= 0 count_of_flakey_tests += line.find('flakes') if count_of_new_tests + count_of_flakey_tests >= 0: > > + result = WARNINGS > > + else: > > + return FAILURE > > if cmd.rc - (count_of_new_tests + count_of_flakey_tests) != 0: > > return FAILURE > > If there are 5 new test, but 0 real failure, then result will > be WARNINGS, but cmd.rc == 5 will cause return FAILURE. (In reply to comment #34) I mistyped. count_of_new_tests += line.find('new passes') + line.find('missing results') Kristof, the idea is good, but unfortunately the return code of ORWT can be 0 or 1. Only NRWT returns the number of the failing tests. (In reply to comment #36) > Kristof, the idea is good, but unfortunately the return code of ORWT can be 0 or 1. Only NRWT returns the number of the failing tests. In that case we have two way to fix it: * check which rwt has run maybe like here: def commandComplete(self, cmd): shell.Test.commandComplete(self, cmd) logText = cmd.logs['stdio'].getText() if logText.find("Collecting tests ...") >= 0: self._parseNewRunWebKitTestsOutput(logText) else: self._parseOldRunWebKitTestsOutput(logText) * the NRWT returns with the subtracted result instead of the number of the unexpected results This seems like a silly difference to have between NRWT and ORWT. I'd be in favor of changing NRWT to match ORWT's behavior. But I guess ORWT is going away "soon". Soon is still a ways off, sadly. But I am actually working on it again. :) Comment on attachment 106823 [details] fixes this bug for good View in context: https://bugs.webkit.org/attachment.cgi?id=106823&action=review >>> Tools/BuildSlaveSupport/build.webkit.org-config/master.cfg:333 >>> return FAILURE >> >> If there are 5 new test, but 0 real failure, then result will >> be WARNINGS, but cmd.rc == 5 will cause return FAILURE. > > count_of_new_tests = 0 > count_of_flakey_tests = 0 Nope. Because of recent nrwt change, nrwt no longer report missing. It always report them as flaky. Also, we should fix nrwt to return 0 when some tests are missing results. I guess I can move that patch to elsewhere since it doesn't really match this bug. (In reply to comment #41) > I guess I can move that patch to elsewhere since it doesn't really match this bug. This change was necessary, but not enough to fix this bug. ( landed in http://trac.webkit.org/changeset/94868 ) Created attachment 107299 [details]
proposed fix
(In reply to comment #43) > Created an attachment (id=107299) [details] > proposed fix LGTM. Ryosuke, what do you think about it? Comment on attachment 107299 [details] proposed fix View in context: https://bugs.webkit.org/attachment.cgi?id=107299&action=review > Tools/Scripts/webkitpy/layout_tests/controllers/manager.py:151 > + elif 'MISSING' in actual: > + num_flaky += 1 I don't think we should be counting missing expectations as flaky. That's at least semantically incorrect. Eric or Dirk should take a look at this change. (In reply to comment #45) > (From update of attachment 107299 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=107299&action=review > > > Tools/Scripts/webkitpy/layout_tests/controllers/manager.py:151 > > + elif 'MISSING' in actual: > > + num_flaky += 1 > > I don't think we should be counting missing expectations as flaky. That's at least semantically incorrect. Eric or Dirk should take a look at this change. It's true. But now a missing result is reported as flakey or missing depends on horoscope. :) Kristof's patch would eliminate this flakiness. It isn't the best solution, but it would make the bot orange if there is a new test without result. Kristof, could you check bug67268? (In reply to comment #46) > (In reply to comment #45) > > (From update of attachment 107299 [details] [details]) > > View in context: https://bugs.webkit.org/attachment.cgi?id=107299&action=review > > > > > Tools/Scripts/webkitpy/layout_tests/controllers/manager.py:151 > > > + elif 'MISSING' in actual: > > > + num_flaky += 1 > > > > I don't think we should be counting missing expectations as flaky. That's at least semantically incorrect. Eric or Dirk should take a look at this change. > > It's true. > > But now a missing result is reported as flakey or missing depends on horoscope. :) Kristof's patch would eliminate this flakiness. It isn't the best solution, but it would make the bot orange if there is a new test without result. The new tests sometimes get 'missing pass' result, and sometimes only 'missing'. When 'missing pass' it report as flakey and the bot turn onrange, but when only 'missing' it report as regression and the bot turn red. >
> The new tests sometimes get 'missing pass' result, and sometimes only 'missing'. When 'missing pass' it report as flakey and the bot turn onrange, but when only 'missing' it report as regression and the bot turn red.
I'm not familiar with the problem neither with NRWT but it seems to me that you should teach NRWT that missing results are a separate category (not fail neither flaky) instead of hacking around smg that paints the bots to the color we would like.
Comment on attachment 107299 [details]
proposed fix
This bug has been open for too long and is generating too many non-Chromium red bots :(
I think this fix itself is probably the wrong thing to do, but it's better than doing nothing. Please add a FIXME: to break MISSING results into a different category. It sounds like we want different behavior on Chromium and non-Chromium bots, but even that's still up in the air.
At any rate, I'll be gardening Chromium for the next few days, so landing this will force me to fix things one way or another :)
Created attachment 107636 [details]
mark missing result as flakey
Comment on attachment 107636 [details] mark missing result as flakey Landed in r95285. Reopen, because it broke a webkitpy test: File "/ramdisk/qt-linux-release/build/Tools/Scripts/webkitpy/layout_tests/run_webkit_tests_integrationtest.py", line 718, in test_missing_results self.assertEquals(res, 2) AssertionError: 0 != 2 Reopened because it was rolled out Created attachment 107837 [details]
mark missing results as flakey
Attachment 107837 [details] did not pass style-queue:
Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Tools/ChangeLog', u'Tools/Scripts/webkitpy..." exit_code: 1
Tools/Scripts/webkitpy/layout_tests/controllers/manager.py:152: indentation is not a multiple of four [pep8/E111] [5]
Total errors found: 1 in 3 files
If any of these errors are false positives, please file a bug against check-webkit-style.
Created attachment 107838 [details]
mark missing results as flakey
Comment on attachment 107838 [details] mark missing results as flakey Clearing flags on attachment: 107838 Committed r95441: <http://trac.webkit.org/changeset/95441> All reviewed patches have been landed. Closing bug. |