Bug 123385 - New flakiness dashboard shouldn't treat tests with right expectations as failing
Summary: New flakiness dashboard shouldn't treat tests with right expectations as failing
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKit Website
Version: 528+ (Nightly build)
Hardware: Unspecified OS: Unspecified
Importance: P2 Normal
Assignee: Ryosuke Niwa
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-10-25 23:29 PDT by Ryosuke Niwa
Modified: 2013-10-27 19:56 PDT
CC List: 6 users

See Also:


Attachments
Changes the behavior (2.24 KB, patch)
2013-10-25 23:33 PDT, Ryosuke Niwa

Description Ryosuke Niwa 2013-10-25 23:29:20 PDT
Right now, if you select "failing" tests on the builder pane, the new flakiness dashboard lists all failing tests, including ones that already have the right test expectation.
It should instead list only the tests that are failing without a matching expectation, i.e. the ones that are actually making bots red.
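The intended filtering can be sketched roughly as follows (a minimal illustration only, not the actual patch; the function name, result fields, and expectation values here are hypothetical):

```javascript
// Hypothetical sketch: a failing test whose actual result is already covered
// by its listed expectations is not "making bots red", so it is filtered out.
function testsMakingBotsRed(results) {
    return results.filter(function (result) {
        if (!result.isFailure)
            return false; // passing tests are never listed
        var expected = result.expectations || [];
        // Keep only failures whose actual result was NOT expected.
        return expected.indexOf(result.actual) === -1;
    });
}
```

Under this sketch, a test that is expected to fail (e.g. marked Failure in TestExpectations) and does fail would no longer appear in the "failing" list, while an unexpected Crash still would.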
Comment 1 Ryosuke Niwa 2013-10-25 23:33:19 PDT
Created attachment 215240 [details]
Changes the behavior
Comment 2 Alexey Proskuryakov 2013-10-26 16:52:45 PDT
Comment on attachment 215240 [details]
Changes the behavior

I've never used this feature on the old dashboard, so it's not clear to me if either behavior is useful. What are the use cases? If this is a replacement for regular dashboard, then we should consider just removing the duplicate functionality.

r=me
Comment 3 Ryosuke Niwa 2013-10-26 17:21:01 PDT
(In reply to comment #2)
> (From update of attachment 215240 [details])
> I've never used this feature on the old dashboard, so it's not clear to me if either behavior is useful. What are the use cases? If this is a replacement for regular dashboard, then we should consider just removing the duplicate functionality.

This shows the list of failing tests on the bots.
Comment 4 WebKit Commit Bot 2013-10-26 17:45:38 PDT
Comment on attachment 215240 [details]
Changes the behavior

Clearing flags on attachment: 215240

Committed r158093: <http://trac.webkit.org/changeset/158093>
Comment 5 WebKit Commit Bot 2013-10-26 17:45:40 PDT
All reviewed patches have been landed.  Closing bug.
Comment 6 Alexey Proskuryakov 2013-10-27 10:06:03 PDT
> This shows the list of failing tests on the bots.

I don't think that this answers my question about use cases. Listing tests that are currently failing is not a job for the dashboard, which is for historic analysis of results.
Comment 7 Ryosuke Niwa 2013-10-27 11:04:55 PDT
(In reply to comment #6)
> > This shows the list of failing tests on the bots.
> 
> I don't think that this answers my question about use cases. Listing tests that are currently failing is not a job for the dashboard, which is for historic analysis of results.

If you're talking about http://build.webkit.org/dashboard/, I find it impossible to use because it doesn't have links to the builders' pages and it applies -webkit-user-select: none, along with dozens of other problems.
Comment 8 Alexey Proskuryakov 2013-10-27 11:41:48 PDT
Can you please file bugs for those? That is the tool intended to be used for looking at immediate state of the bots, and adding duplicate functionality to other tools is not the best path forward. We'll just end up with a set of tools that no one but their creators understand or use.

build.webkit.org/dashboard is also meant to be the primary entry point into the regression test bot system for most people, because checking historic flakiness is an activity that is secondary to checking immediate state. Buildbot waterfall and console certainly have their use, but mostly for people who administer the system, not for WebKit developers in my opinion.

There are a bunch of bugs and enhancement requests filed already; you can find them by searching for "build.webkit.org/dashboard" in Bugzilla titles.

I encourage you to file bugs in terms of use cases that aren't addressed well (i.e. not simply "please remove user-select:none", but "I often need to do XXX when bot watching, and it's difficult to do now").
Comment 9 Ryosuke Niwa 2013-10-27 12:03:58 PDT
(In reply to comment #8)
> build.webkit.org/dashboard is also meant to be the primary entry point into the regression test bot system for most people, because checking historic flakiness is an activity that is secondary to checking immediate state. Buildbot waterfall and console certainly have their use, but mostly for people who administer the system, not for WebKit developers in my opinion.

I don't see a point in doing that, given that I'm satisfied with what build.webkit.org/waterfall and build.webkit.org/console provide.  Those two pages provide exactly the kind of information I need.
Comment 10 Alexey Proskuryakov 2013-10-27 19:19:43 PDT
> I'm satisfied with what build.webkit.org/waterfall and build.webkit.org/console provides

In this case, can we just get rid of the "failing" display in the new flakiness dashboard?
Comment 11 Ryosuke Niwa 2013-10-27 19:24:24 PDT
(In reply to comment #10)
> > I'm satisfied with what build.webkit.org/waterfall and build.webkit.org/console provides
> 
> In this case, can we just get rid of the "failing" display in the new flakiness dashboard?

Why?  The historical results of currently failing tests are exactly what bot watchers need to see to determine which patch caused a failure and whether the tests have been flaky.
Comment 12 Ryosuke Niwa 2013-10-27 19:56:29 PDT
I think I'm disagreeing with the statement that "checking historic flakiness is an activity that is secondary to checking immediate state".

In my experience, viewing the historical results of a test has been essential in determining the culprit and the correct test expectation to add.

Knowing how many tests are failing on a builder doesn't get me anywhere as a bot watcher because my primary job as a bot watcher (contacting the patch author, etc…) cannot be carried out until the culprit is determined.

I don't know what revision number http://build.webkit.org/dashboard/ is showing, but automatically determining the culprit has already been tried by TestFailures and garden-o-matic.  They have both failed miserably to deliver on that promise.  Tasks of this sort are best done by humans.