Bug 62178

Summary: new-run-webkit-tests: Bot master should print useful information on waterfall/console for nrwt
Product: WebKit Reporter: Ryosuke Niwa <rniwa>
Component: Tools / TestsAssignee: Ryosuke Niwa <rniwa>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, aroben, dpranke, eric, ojan, sam, tony, wsiegrist
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on:    
Bug Blocks: 34984    
Attachments:
Description Flags
screenshot 1
none
screenshot 2
none
fixes the bug
none
screenshot 1'
none
screenshot 2'
none
shortened text
none
screenshot 1''
none
screenshot 2''
none
shortened text further
none
screenshot 1'''
none
screenshot 2'''
none
Patch tony: review+

Ryosuke Niwa
Reported 2011-06-06 19:27:45 PDT
Right now, waterfall and console on build.webkit.org only says "failed" for new-run-webkit-tests. This makes waterfall/console views much less useful. We should modify master.cfg to display more useful information.
Attachments
screenshot 1 (28.42 KB, image/png)
2011-06-06 19:29 PDT, Ryosuke Niwa
no flags
screenshot 2 (20.94 KB, image/png)
2011-06-06 19:29 PDT, Ryosuke Niwa
no flags
fixes the bug (2.67 KB, patch)
2011-06-06 19:38 PDT, Ryosuke Niwa
no flags
screenshot 1' (32.56 KB, image/png)
2011-06-06 22:13 PDT, Ryosuke Niwa
no flags
screenshot 2' (21.48 KB, image/png)
2011-06-06 22:13 PDT, Ryosuke Niwa
no flags
shortened text (2.79 KB, patch)
2011-06-06 22:17 PDT, Ryosuke Niwa
no flags
screenshot 1'' (12.18 KB, image/png)
2011-06-07 13:21 PDT, Ryosuke Niwa
no flags
screenshot 2'' (17.84 KB, image/png)
2011-06-07 13:22 PDT, Ryosuke Niwa
no flags
shortened text further (2.82 KB, patch)
2011-06-07 13:26 PDT, Ryosuke Niwa
no flags
screenshot 1''' (17.08 KB, image/png)
2011-06-07 16:24 PDT, Ryosuke Niwa
no flags
screenshot 2''' (19.46 KB, image/png)
2011-06-07 16:25 PDT, Ryosuke Niwa
no flags
Patch (2.73 KB, patch)
2011-06-07 16:46 PDT, Ryosuke Niwa
tony: review+
Ryosuke Niwa
Comment 1 2011-06-06 19:29:29 PDT
Created attachment 96179 [details] screenshot 1
Ryosuke Niwa
Comment 2 2011-06-06 19:29:46 PDT
Created attachment 96180 [details] screenshot 2
Adam Barth
Comment 3 2011-06-06 19:31:45 PDT
There's too much text there. Can we convey the information with less text?
Ryosuke Niwa
Comment 4 2011-06-06 19:34:17 PDT
(In reply to comment #3) > There's too much text there. Can we convey the information with less text? Yeah, I agree. Do you have a suggestion as to how we do it?
Ryosuke Niwa
Comment 5 2011-06-06 19:36:45 PDT
Maybe I can get rid of " mismatch ". I don't know how I can shorten "Expected to fail, but passed" and "Expected to timeout, but passed"
Ryosuke Niwa
Comment 6 2011-06-06 19:38:05 PDT
Created attachment 96181 [details] fixes the bug
Ryosuke Niwa
Comment 7 2011-06-06 22:13:13 PDT
Created attachment 96198 [details] screenshot 1'
Ryosuke Niwa
Comment 8 2011-06-06 22:13:43 PDT
Created attachment 96199 [details] screenshot 2'
Ryosuke Niwa
Comment 9 2011-06-06 22:17:07 PDT
Created attachment 96200 [details] shortened text
Ryosuke Niwa
Comment 10 2011-06-06 23:56:13 PDT
*** Bug 59894 has been marked as a duplicate of this bug. ***
Tony Chang
Comment 11 2011-06-07 09:48:29 PDT
This still seems like too much text and the wrapping makes it hard to read in the narrow view. How about just totals like: flaky: # failed: # And if there are no failed tests: unexpected pass: # flaky: # Alternately, we should just match the text of the other bots and work from there (i.e., work on improving all the bots output).
Ojan Vafai
Comment 12 2011-06-07 09:54:48 PDT
Comment on attachment 96200 [details] shortened text I don't think we should print out what kind of failure/flaky it is. Just a sum number for each of the following: # failures # flaky # unexpected passes Printing out more is too much information to make sense of. Also, perhaps as a separate patch, I'd really like to see us printout the first 10 failures as well like we do on the build.chromium.org waterfall. That really helps a lot in identifying when a test started failing.
Ryosuke Niwa
Comment 13 2011-06-07 11:18:54 PDT
(In reply to comment #11) > This still seems like too much text and the wrapping makes it hard to read in the narrow view. > > How about just totals like: > > flaky: # > failed: # I agree. We should shorten the text. But I'd like the number of failures be left-aligned. It makes a huge difference in readability. (In reply to comment #12) > (From update of attachment 96200 [details]) > I don't think we should print out what kind of failure/flaky it is. Just a sum number for each of the following: > # failures > # flaky > # unexpected passes Agreed. > Printing out more is too much information to make sense of. Also, perhaps as a separate patch, I'd really like to see us printout the first 10 failures as well like we do on the build.chromium.org waterfall. That really helps a lot in identifying when a test started failing. I personally find them a pure noise on build.chromium.org bots. Maybe 1-2 tests would be fine but when there are more than 3+ tests, it clutters the view (epecially when test names are long) and I can't make sense of it. Since we'd still have to look at diffs in order to determine what kind of failures they are, I don't see it as a critical feature at the moment. BTW, it'll be nice for Chromium port's results.html to have a link to flakiness dashboard.
Dirk Pranke
Comment 14 2011-06-07 12:07:14 PDT
(In reply to comment #13) > > Printing out more is too much information to make sense of. Also, perhaps as a separate patch, I'd really like to see us printout the first 10 failures as well like we do on the build.chromium.org waterfall. That really helps a lot in identifying when a test started failing. > > I personally find them a pure noise on build.chromium.org bots. Maybe 1-2 tests would be fine but when there are more than 3+ tests, it clutters the view (epecially when test names are long) and I can't make sense of it. I'd side with Ojan on this one. On a side note, I am working on splitting out all of the "useful" output from new-run-webkit-tests from the debug output, so the stdio output should become a lot less noisy soon.
Ojan Vafai
Comment 15 2011-06-07 12:10:53 PDT
When we list the failures it makes it super easy to tell from a quick glance at the waterfall (perhaps with some find-in-page) when a test started failing. Without that you either need to look at the flakiness dashboard or actually look at the stdio for each run.
Ryosuke Niwa
Comment 16 2011-06-07 12:19:21 PDT
(In reply to comment #15) > When we list the failures it makes it super easy to tell from a quick glance at the waterfall (perhaps with some find-in-page) when a test started failing. Without that you either need to look at the flakiness dashboard or actually look at the stdio for each run. Okay, I can see that people have different opinions on this. It'd be nice if this was configurable (the number of tests to show, etc...). But if we're making this change for new-run-webkit-tests, then we should be making the same change for old-run-webkit-tests because I don't like nrwt and orwt having different look & feel.
Ryosuke Niwa
Comment 17 2011-06-07 13:21:27 PDT
Created attachment 96289 [details] screenshot 1''
Ryosuke Niwa
Comment 18 2011-06-07 13:22:09 PDT
Created attachment 96290 [details] screenshot 2''
Ryosuke Niwa
Comment 19 2011-06-07 13:26:15 PDT
Created attachment 96291 [details] shortened text further
Tony Chang
Comment 20 2011-06-07 13:49:16 PDT
Comment on attachment 96291 [details] shortened text further View in context: https://bugs.webkit.org/attachment.cgi?id=96291&action=review > Tools/BuildSlaveSupport/build.webkit.org-config/master.cfg:287 > + match = expressions[name].match(line) Nit: Maybe use search() instead of match()? search() matches anywhere in the string and match() only matches the beginning of the string. > Tools/BuildSlaveSupport/build.webkit.org-config/master.cfg:292 > + if name not in testFailures: > + testFailures[name] = 0 > + testFailures[name] += int(match.group(1)) This could be testFailures[name] = testFailures.get(name, 0) + int(match.group(1)) > Tools/BuildSlaveSupport/build.webkit.org-config/master.cfg:313 > + return SUCCESS Did you mean to return result here?
Dirk Pranke
Comment 21 2011-06-07 13:53:42 PDT
I think the text is good enough as is, but if you were still looking to compress further, here are some suggestions: 1) omit "tests" ... "14 failed" is good enough 2) s/were flaky/flaky ... "10 flaky" is also good enough 3) you could omit both the flaky and unexpected pass numbers if you had actual failures. On the chromium bots, neither of those sets of results would produce a failed test step, so you be okay with skipping them. 4) I attempted to come up with something shorter than "passed unexpectedly", and failed. Perhaps we could coin a word like "unfailed" :) Personally, I'd probably make the changes in (1) and (2) and not the others. Up to you, though.
Tony Chang
Comment 22 2011-06-07 13:57:46 PDT
(In reply to comment #21) > I think the text is good enough as is, but if you were still looking to compress further, here are some suggestions: > > 1) omit "tests" ... "14 failed" is good enough > > 2) s/were flaky/flaky ... "10 flaky" is also good enough These also have the benefit of not having to worry about tenses. '1 tests failed' and '1 tests were flaky' is grammatically incorrect.
Ryosuke Niwa
Comment 23 2011-06-07 14:02:23 PDT
(In reply to comment #21) > 1) omit "tests" ... "14 failed" is good enough > > 2) s/were flaky/flaky ... "10 flaky" is also good enough Yeah, I experimented these two. Thought 14 failed / 14 passed sounded funny but so does "1 tests failed" so let's do this. > 3) you could omit both the flaky and unexpected pass numbers if you had actual failures. On the chromium bots, neither of those sets of results would produce a failed test step, so you be okay with skipping them. We definitely can do that in a followup patch if the current version proves to be too noisy. > 4) I attempted to come up with something shorter than "passed unexpectedly", and failed. Perhaps we could coin a word like "unfailed" :) Yeah, I tried and failed as well.
Ryosuke Niwa
Comment 24 2011-06-07 16:24:46 PDT
Created attachment 96324 [details] screenshot 1'''
Ryosuke Niwa
Comment 25 2011-06-07 16:25:09 PDT
Created attachment 96325 [details] screenshot 2'''
Ryosuke Niwa
Comment 26 2011-06-07 16:46:08 PDT
Ryosuke Niwa
Comment 27 2011-06-07 18:38:03 PDT
Ryosuke Niwa
Comment 28 2011-06-07 18:41:27 PDT
wms, aroben: could either one of you reconfig the master to pick up r88310?
William Siegrist
Comment 29 2011-06-08 09:03:04 PDT
Master updated.
Ryosuke Niwa
Comment 30 2011-06-08 09:40:38 PDT
(In reply to comment #29) > Master updated. Thanks!
Adam Roben (:aroben)
Comment 31 2011-06-20 22:04:30 PDT
(In reply to comment #12) > Also, perhaps as a separate patch, I'd really like to see us printout the first 10 failures as well like we do on the build.chromium.org waterfall. That really helps a lot in identifying when a test started failing. You might be interested in bug 59521 and bug 61877.
Note You need to log in before you can comment on or make changes to this bug.