Bug 62178

Summary: new-run-webkit-tests: Bot master should print useful information on waterfall/console for nrwt
Product: WebKit Reporter: Ryosuke Niwa <rniwa>
Component: Tools / TestsAssignee: Ryosuke Niwa <rniwa>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, aroben, dpranke, eric, ojan, sam, tony, wsiegrist
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on:    
Bug Blocks: 34984    
Attachments:
Description Flags
screenshot 1
none
screenshot 2
none
fixes the bug
none
screenshot 1'
none
screenshot 2'
none
shortened text
none
screenshot 1''
none
screenshot 2''
none
shortened text further
none
screenshot 1'''
none
screenshot 2'''
none
Patch tony: review+

Description Ryosuke Niwa 2011-06-06 19:27:45 PDT
Right now, waterfall and console on build.webkit.org only says "failed" for new-run-webkit-tests.  This makes waterfall/console views much less useful.  We should modify master.cfg to display more useful information.
Comment 1 Ryosuke Niwa 2011-06-06 19:29:29 PDT
Created attachment 96179 [details]
screenshot 1
Comment 2 Ryosuke Niwa 2011-06-06 19:29:46 PDT
Created attachment 96180 [details]
screenshot 2
Comment 3 Adam Barth 2011-06-06 19:31:45 PDT
There's too much text there.  Can we convey the information with less text?
Comment 4 Ryosuke Niwa 2011-06-06 19:34:17 PDT
(In reply to comment #3)
> There's too much text there.  Can we convey the information with less text?

Yeah, I agree.  Do you have a suggestion as to how we do it?
Comment 5 Ryosuke Niwa 2011-06-06 19:36:45 PDT
Maybe I can get rid of " mismatch ".  I don't know how I can shorten "Expected to fail, but passed" and "Expected to timeout, but passed"
Comment 6 Ryosuke Niwa 2011-06-06 19:38:05 PDT
Created attachment 96181 [details]
fixes the bug
Comment 7 Ryosuke Niwa 2011-06-06 22:13:13 PDT
Created attachment 96198 [details]
screenshot 1'
Comment 8 Ryosuke Niwa 2011-06-06 22:13:43 PDT
Created attachment 96199 [details]
screenshot 2'
Comment 9 Ryosuke Niwa 2011-06-06 22:17:07 PDT
Created attachment 96200 [details]
shortened text
Comment 10 Ryosuke Niwa 2011-06-06 23:56:13 PDT
*** Bug 59894 has been marked as a duplicate of this bug. ***
Comment 11 Tony Chang 2011-06-07 09:48:29 PDT
This still seems like too much text and the wrapping makes it hard to read in the narrow view.

How about just totals like:

flaky: #
failed: #

And if there are no failed tests:

unexpected pass: #
flaky: #


Alternately, we should just match the text of the other bots and work from there (i.e., work on improving all the bots output).
Comment 12 Ojan Vafai 2011-06-07 09:54:48 PDT
Comment on attachment 96200 [details]
shortened text

I don't think we should print out what kind of failure/flaky it is. Just a sum number for each of the following:
# failures
# flaky
# unexpected passes

Printing out more is too much information to make sense of. Also, perhaps as a separate patch, I'd really like to see us printout the first 10 failures as well like we do on the build.chromium.org waterfall. That really helps a lot in identifying when a test started failing.
Comment 13 Ryosuke Niwa 2011-06-07 11:18:54 PDT
(In reply to comment #11)
> This still seems like too much text and the wrapping makes it hard to read in the narrow view.
> 
> How about just totals like:
> 
> flaky: #
> failed: #

I agree. We should shorten the text.  But I'd like the number of failures be left-aligned.  It makes a huge difference in readability.

(In reply to comment #12)
> (From update of attachment 96200 [details])
> I don't think we should print out what kind of failure/flaky it is. Just a sum number for each of the following:
> # failures
> # flaky
> # unexpected passes

Agreed.

> Printing out more is too much information to make sense of. Also, perhaps as a separate patch, I'd really like to see us printout the first 10 failures as well like we do on the build.chromium.org waterfall. That really helps a lot in identifying when a test started failing.

I personally find them a pure noise on build.chromium.org bots.  Maybe 1-2 tests would be fine but when there are more than 3+ tests, it clutters the view (epecially when test names are long) and I can't make sense of it.

Since we'd still have to look at diffs in order to determine what kind of failures they are, I don't see it as a critical feature at the moment.

BTW, it'll be nice for Chromium port's results.html to have a link to flakiness dashboard.
Comment 14 Dirk Pranke 2011-06-07 12:07:14 PDT
(In reply to comment #13)
> > Printing out more is too much information to make sense of. Also, perhaps as a separate patch, I'd really like to see us printout the first 10 failures as well like we do on the build.chromium.org waterfall. That really helps a lot in identifying when a test started failing.
> 
> I personally find them a pure noise on build.chromium.org bots.  Maybe 1-2 tests would be fine but when there are more than 3+ tests, it clutters the view (epecially when test names are long) and I can't make sense of it.

I'd side with Ojan on this one.

On a side note, I am working on splitting out all of the "useful" output from new-run-webkit-tests from the debug output, so the stdio output should become a lot less noisy soon.
Comment 15 Ojan Vafai 2011-06-07 12:10:53 PDT
When we list the failures it makes it super easy to tell from a quick glance at the waterfall (perhaps with some find-in-page) when a test started failing. Without that you either need to look at the flakiness dashboard or actually look at the stdio for each run.
Comment 16 Ryosuke Niwa 2011-06-07 12:19:21 PDT
(In reply to comment #15)
> When we list the failures it makes it super easy to tell from a quick glance at the waterfall (perhaps with some find-in-page) when a test started failing. Without that you either need to look at the flakiness dashboard or actually look at the stdio for each run.

Okay, I can see that people have different opinions on this. It'd be nice if this was configurable (the number of tests to show, etc...).

But if we're making this change for new-run-webkit-tests, then we should be making the same change for old-run-webkit-tests because I don't like nrwt and orwt having different look & feel.
Comment 17 Ryosuke Niwa 2011-06-07 13:21:27 PDT
Created attachment 96289 [details]
screenshot 1''
Comment 18 Ryosuke Niwa 2011-06-07 13:22:09 PDT
Created attachment 96290 [details]
screenshot 2''
Comment 19 Ryosuke Niwa 2011-06-07 13:26:15 PDT
Created attachment 96291 [details]
shortened text further
Comment 20 Tony Chang 2011-06-07 13:49:16 PDT
Comment on attachment 96291 [details]
shortened text further

View in context: https://bugs.webkit.org/attachment.cgi?id=96291&action=review

> Tools/BuildSlaveSupport/build.webkit.org-config/master.cfg:287
> +                match = expressions[name].match(line)

Nit: Maybe use search() instead of match()?  search() matches anywhere in the string and match() only matches the beginning of the string.

> Tools/BuildSlaveSupport/build.webkit.org-config/master.cfg:292
> +                if name not in testFailures:
> +                    testFailures[name] = 0
> +                testFailures[name] += int(match.group(1))

This could be testFailures[name] = testFailures.get(name, 0) + int(match.group(1))

> Tools/BuildSlaveSupport/build.webkit.org-config/master.cfg:313
> +        return SUCCESS

Did you mean to return result here?
Comment 21 Dirk Pranke 2011-06-07 13:53:42 PDT
I think the text is good enough as is, but if you were still looking to compress further, here are some suggestions:

1) omit "tests" ... "14 failed" is good enough

2) s/were flaky/flaky ... "10 flaky" is also good enough

3) you could omit both the flaky and unexpected pass numbers if you had actual failures. On the chromium bots, neither of those sets of results would produce a failed test step, so you be okay with skipping them.

4) I attempted to come up with something shorter than "passed unexpectedly", and failed. Perhaps we could coin a word like "unfailed" :)

Personally, I'd probably make the changes in (1) and (2) and not the others. Up to you, though.
Comment 22 Tony Chang 2011-06-07 13:57:46 PDT
(In reply to comment #21)
> I think the text is good enough as is, but if you were still looking to compress further, here are some suggestions:
> 
> 1) omit "tests" ... "14 failed" is good enough
> 
> 2) s/were flaky/flaky ... "10 flaky" is also good enough

These also have the benefit of not having to worry about tenses.  '1 tests failed' and '1 tests were flaky' is grammatically incorrect.
Comment 23 Ryosuke Niwa 2011-06-07 14:02:23 PDT
(In reply to comment #21)
> 1) omit "tests" ... "14 failed" is good enough
> 
> 2) s/were flaky/flaky ... "10 flaky" is also good enough

Yeah, I experimented these two. Thought 14 failed / 14 passed sounded funny but so does "1 tests failed" so let's do this.

> 3) you could omit both the flaky and unexpected pass numbers if you had actual failures. On the chromium bots, neither of those sets of results would produce a failed test step, so you be okay with skipping them.

We definitely can do that in a followup patch if the current version proves to be too noisy.

> 4) I attempted to come up with something shorter than "passed unexpectedly", and failed. Perhaps we could coin a word like "unfailed" :)

Yeah, I tried and failed as well.
Comment 24 Ryosuke Niwa 2011-06-07 16:24:46 PDT
Created attachment 96324 [details]
screenshot 1'''
Comment 25 Ryosuke Niwa 2011-06-07 16:25:09 PDT
Created attachment 96325 [details]
screenshot 2'''
Comment 26 Ryosuke Niwa 2011-06-07 16:46:08 PDT
Created attachment 96329 [details]
Patch
Comment 27 Ryosuke Niwa 2011-06-07 18:38:03 PDT
Committed r88310: <http://trac.webkit.org/changeset/88310>
Comment 28 Ryosuke Niwa 2011-06-07 18:41:27 PDT
wms, aroben: could either one of you reconfig the master to pick up r88310?
Comment 29 William Siegrist 2011-06-08 09:03:04 PDT
Master updated.
Comment 30 Ryosuke Niwa 2011-06-08 09:40:38 PDT
(In reply to comment #29)
> Master updated.

Thanks!
Comment 31 Adam Roben (:aroben) 2011-06-20 22:04:30 PDT
(In reply to comment #12)
> Also, perhaps as a separate patch, I'd really like to see us printout the first 10 failures as well like we do on the build.chromium.org waterfall. That really helps a lot in identifying when a test started failing.

You might be interested in bug 59521 and bug 61877.