Bug 62178

Summary:

new-run-webkit-tests: Bot master should print useful information on waterfall/console for nrwt

Product:

WebKit

Reporter:

Ryosuke Niwa <rniwa>

Component:

Tools / Tests

Assignee:

Ryosuke Niwa <rniwa>

Status:

RESOLVED FIXED

Severity:

Normal

CC:

abarth, aroben, dpranke, eric, ojan, sam, tony, wsiegrist

Priority:

Version:

528+ (Nightly build)

Hardware:

Unspecified

OS:

Unspecified

Bug Depends on:

Bug Blocks:

34984

Attachments:

Description	Flags
screenshot 1	none
screenshot 2	none
fixes the bug	none
screenshot 1'	none
screenshot 2'	none
shortened text	none
screenshot 1''	none
screenshot 2''	none
shortened text further	none
screenshot 1'''	none
screenshot 2'''	none
Patch	tony: review+

Ryosuke Niwa

Reported 2011-06-06 19:27:45 PDT

Right now, waterfall and console on build.webkit.org only says "failed" for new-run-webkit-tests. This makes waterfall/console views much less useful. We should modify master.cfg to display more useful information.

Attachments
screenshot 1 (28.42 KB, image/png) 2011-06-06 19:29 PDT, Ryosuke Niwa	no flags	Details
screenshot 2 (20.94 KB, image/png) 2011-06-06 19:29 PDT, Ryosuke Niwa	no flags	Details
fixes the bug (2.67 KB, patch) 2011-06-06 19:38 PDT, Ryosuke Niwa	no flags	Details Formatted Diff Diff
screenshot 1' (32.56 KB, image/png) 2011-06-06 22:13 PDT, Ryosuke Niwa	no flags	Details
screenshot 2' (21.48 KB, image/png) 2011-06-06 22:13 PDT, Ryosuke Niwa	no flags	Details
shortened text (2.79 KB, patch) 2011-06-06 22:17 PDT, Ryosuke Niwa	no flags	Details Formatted Diff Diff
screenshot 1'' (12.18 KB, image/png) 2011-06-07 13:21 PDT, Ryosuke Niwa	no flags	Details
screenshot 2'' (17.84 KB, image/png) 2011-06-07 13:22 PDT, Ryosuke Niwa	no flags	Details
shortened text further (2.82 KB, patch) 2011-06-07 13:26 PDT, Ryosuke Niwa	no flags	Details Formatted Diff Diff
screenshot 1''' (17.08 KB, image/png) 2011-06-07 16:24 PDT, Ryosuke Niwa	no flags	Details
screenshot 2''' (19.46 KB, image/png) 2011-06-07 16:25 PDT, Ryosuke Niwa	no flags	Details
Patch (2.73 KB, patch) 2011-06-07 16:46 PDT, Ryosuke Niwa	tony: review+	Details Formatted Diff Diff
Show Obsolete (9) View All Add attachment proposed patch, testcase, etc.

Ryosuke Niwa

Comment 1 2011-06-06 19:29:29 PDT

Created attachment 96179 [details] screenshot 1

Ryosuke Niwa

Comment 2 2011-06-06 19:29:46 PDT

Created attachment 96180 [details] screenshot 2

Adam Barth

Comment 3 2011-06-06 19:31:45 PDT

There's too much text there. Can we convey the information with less text?

Ryosuke Niwa

Comment 4 2011-06-06 19:34:17 PDT

(In reply to comment #3) > There's too much text there. Can we convey the information with less text? Yeah, I agree. Do you have a suggestion as to how we do it?

Ryosuke Niwa

Comment 5 2011-06-06 19:36:45 PDT

Maybe I can get rid of " mismatch ". I don't know how I can shorten "Expected to fail, but passed" and "Expected to timeout, but passed"

Ryosuke Niwa

Comment 6 2011-06-06 19:38:05 PDT

Created attachment 96181 [details] fixes the bug

Ryosuke Niwa

Comment 7 2011-06-06 22:13:13 PDT

Created attachment 96198 [details] screenshot 1'

Ryosuke Niwa

Comment 8 2011-06-06 22:13:43 PDT

Created attachment 96199 [details] screenshot 2'

Ryosuke Niwa

Comment 9 2011-06-06 22:17:07 PDT

Created attachment 96200 [details] shortened text

Ryosuke Niwa

Comment 10 2011-06-06 23:56:13 PDT

*** Bug 59894 has been marked as a duplicate of this bug. ***

Tony Chang

Comment 11 2011-06-07 09:48:29 PDT

This still seems like too much text and the wrapping makes it hard to read in the narrow view. How about just totals like: flaky: # failed: # And if there are no failed tests: unexpected pass: # flaky: # Alternately, we should just match the text of the other bots and work from there (i.e., work on improving all the bots output).

Ojan Vafai

Comment 12 2011-06-07 09:54:48 PDT

Comment on attachment 96200 [details] shortened text I don't think we should print out what kind of failure/flaky it is. Just a sum number for each of the following: # failures # flaky # unexpected passes Printing out more is too much information to make sense of. Also, perhaps as a separate patch, I'd really like to see us printout the first 10 failures as well like we do on the build.chromium.org waterfall. That really helps a lot in identifying when a test started failing.

Ryosuke Niwa

Comment 13 2011-06-07 11:18:54 PDT

(In reply to comment #11) > This still seems like too much text and the wrapping makes it hard to read in the narrow view. > > How about just totals like: > > flaky: # > failed: # I agree. We should shorten the text. But I'd like the number of failures be left-aligned. It makes a huge difference in readability. (In reply to comment #12) > (From update of attachment 96200 [details]) > I don't think we should print out what kind of failure/flaky it is. Just a sum number for each of the following: > # failures > # flaky > # unexpected passes Agreed. > Printing out more is too much information to make sense of. Also, perhaps as a separate patch, I'd really like to see us printout the first 10 failures as well like we do on the build.chromium.org waterfall. That really helps a lot in identifying when a test started failing. I personally find them a pure noise on build.chromium.org bots. Maybe 1-2 tests would be fine but when there are more than 3+ tests, it clutters the view (epecially when test names are long) and I can't make sense of it. Since we'd still have to look at diffs in order to determine what kind of failures they are, I don't see it as a critical feature at the moment. BTW, it'll be nice for Chromium port's results.html to have a link to flakiness dashboard.

Dirk Pranke

Comment 14 2011-06-07 12:07:14 PDT

(In reply to comment #13) > > Printing out more is too much information to make sense of. Also, perhaps as a separate patch, I'd really like to see us printout the first 10 failures as well like we do on the build.chromium.org waterfall. That really helps a lot in identifying when a test started failing. > > I personally find them a pure noise on build.chromium.org bots. Maybe 1-2 tests would be fine but when there are more than 3+ tests, it clutters the view (epecially when test names are long) and I can't make sense of it. I'd side with Ojan on this one. On a side note, I am working on splitting out all of the "useful" output from new-run-webkit-tests from the debug output, so the stdio output should become a lot less noisy soon.

Ojan Vafai

Comment 15 2011-06-07 12:10:53 PDT

When we list the failures it makes it super easy to tell from a quick glance at the waterfall (perhaps with some find-in-page) when a test started failing. Without that you either need to look at the flakiness dashboard or actually look at the stdio for each run.

Ryosuke Niwa

Comment 16 2011-06-07 12:19:21 PDT

(In reply to comment #15) > When we list the failures it makes it super easy to tell from a quick glance at the waterfall (perhaps with some find-in-page) when a test started failing. Without that you either need to look at the flakiness dashboard or actually look at the stdio for each run. Okay, I can see that people have different opinions on this. It'd be nice if this was configurable (the number of tests to show, etc...). But if we're making this change for new-run-webkit-tests, then we should be making the same change for old-run-webkit-tests because I don't like nrwt and orwt having different look & feel.

Ryosuke Niwa

Comment 17 2011-06-07 13:21:27 PDT

Created attachment 96289 [details] screenshot 1''

Ryosuke Niwa

Comment 18 2011-06-07 13:22:09 PDT

Created attachment 96290 [details] screenshot 2''

Ryosuke Niwa

Comment 19 2011-06-07 13:26:15 PDT

Created attachment 96291 [details] shortened text further

Tony Chang

Comment 20 2011-06-07 13:49:16 PDT

Comment on attachment 96291 [details] shortened text further View in context: https://bugs.webkit.org/attachment.cgi?id=96291&action=review > Tools/BuildSlaveSupport/build.webkit.org-config/master.cfg:287 > + match = expressions[name].match(line) Nit: Maybe use search() instead of match()? search() matches anywhere in the string and match() only matches the beginning of the string. > Tools/BuildSlaveSupport/build.webkit.org-config/master.cfg:292 > + if name not in testFailures: > + testFailures[name] = 0 > + testFailures[name] += int(match.group(1)) This could be testFailures[name] = testFailures.get(name, 0) + int(match.group(1)) > Tools/BuildSlaveSupport/build.webkit.org-config/master.cfg:313 > + return SUCCESS Did you mean to return result here?

Dirk Pranke

Comment 21 2011-06-07 13:53:42 PDT

I think the text is good enough as is, but if you were still looking to compress further, here are some suggestions: 1) omit "tests" ... "14 failed" is good enough 2) s/were flaky/flaky ... "10 flaky" is also good enough 3) you could omit both the flaky and unexpected pass numbers if you had actual failures. On the chromium bots, neither of those sets of results would produce a failed test step, so you be okay with skipping them. 4) I attempted to come up with something shorter than "passed unexpectedly", and failed. Perhaps we could coin a word like "unfailed" :) Personally, I'd probably make the changes in (1) and (2) and not the others. Up to you, though.

Tony Chang

Comment 22 2011-06-07 13:57:46 PDT

(In reply to comment #21) > I think the text is good enough as is, but if you were still looking to compress further, here are some suggestions: > > 1) omit "tests" ... "14 failed" is good enough > > 2) s/were flaky/flaky ... "10 flaky" is also good enough These also have the benefit of not having to worry about tenses. '1 tests failed' and '1 tests were flaky' is grammatically incorrect.

Ryosuke Niwa

Comment 23 2011-06-07 14:02:23 PDT

(In reply to comment #21) > 1) omit "tests" ... "14 failed" is good enough > > 2) s/were flaky/flaky ... "10 flaky" is also good enough Yeah, I experimented these two. Thought 14 failed / 14 passed sounded funny but so does "1 tests failed" so let's do this. > 3) you could omit both the flaky and unexpected pass numbers if you had actual failures. On the chromium bots, neither of those sets of results would produce a failed test step, so you be okay with skipping them. We definitely can do that in a followup patch if the current version proves to be too noisy. > 4) I attempted to come up with something shorter than "passed unexpectedly", and failed. Perhaps we could coin a word like "unfailed" :) Yeah, I tried and failed as well.

Ryosuke Niwa

Comment 24 2011-06-07 16:24:46 PDT

Created attachment 96324 [details] screenshot 1'''

Ryosuke Niwa

Comment 25 2011-06-07 16:25:09 PDT

Created attachment 96325 [details] screenshot 2'''

Ryosuke Niwa

Comment 26 2011-06-07 16:46:08 PDT

Created attachment 96329 [details] Patch

Ryosuke Niwa

Comment 27 2011-06-07 18:38:03 PDT

Committed r88310: <http://trac.webkit.org/changeset/88310>

Ryosuke Niwa

Comment 28 2011-06-07 18:41:27 PDT

wms, aroben: could either one of you reconfig the master to pick up r88310?

William Siegrist

Comment 29 2011-06-08 09:03:04 PDT

Master updated.

Ryosuke Niwa

Comment 30 2011-06-08 09:40:38 PDT

(In reply to comment #29) > Master updated. Thanks!

Adam Roben (:aroben)

Comment 31 2011-06-20 22:04:30 PDT

(In reply to comment #12) > Also, perhaps as a separate patch, I'd really like to see us printout the first 10 failures as well like we do on the build.chromium.org waterfall. That really helps a lot in identifying when a test started failing. You might be interested in bug 59521 and bug 61877.

Note You need to log in before you can comment on or make changes to this bug.