Bug 142425 - Crash count should be listed separately from failure count when layout tests fail
Summary: Crash count should be listed separately from failure count when layout tests ...
Status: RESOLVED WONTFIX
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: David Kilzer (:ddkilzer)
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2015-03-06 20:05 PST by David Kilzer (:ddkilzer)
Modified: 2016-06-21 11:41 PDT (History)
11 users (show)

See Also:


Attachments
Patch v1 (3.40 KB, patch)
2015-03-06 20:23 PST, David Kilzer (:ddkilzer)
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description David Kilzer (:ddkilzer) 2015-03-06 20:05:52 PST
When a layout tests fail, it's useful to print the crash count separately (as a subset of the failure count).

This is because crashes are more severe than most other failures, and when running tests with ASan or GuardMalloc, the primary intent is to find tests that crash since those are the most interesting results for those bots.
Comment 1 David Kilzer (:ddkilzer) 2015-03-06 20:07:36 PST
<rdar://problem/20078699>
Comment 2 David Kilzer (:ddkilzer) 2015-03-06 20:23:30 PST
Created attachment 248137 [details]
Patch v1
Comment 3 Csaba Osztrogonác 2015-03-07 01:50:26 PST
(In reply to comment #0)
> When a layout tests fail, it's useful to print the crash count separately
> (as a subset of the failure count).
> 
> This is because crashes are more severe than most other failures, and when
> running tests with ASan or GuardMalloc, the primary intent is to find tests
> that crash since those are the most interesting results for those bots.

I like the idea to show crashes separately, but in my opinion counting 
them twice (as crashed and failures too) is very confusing if we don't
change the output format. I could imagine a more detailed output, but
it should be evident what means what. And maybe the bot watchers dashboard
should be fixed to be able parse the new output.

for example: (where x = y + u + v)
x unexpected regressions: y crashes, u timeouts, v failures
j new passes, k flakes, l missing results

For example let's see the latest result from the GTK bot,
where we can found all kind of failures:

actual output: 22 failures 32 new passes 10 flakes 8 missing results

proposed output, which is confusing for me:
22 failures 4 crashes 32 new passes 10 flakes 8 missing results

Or what about: 22 failures (4 of them crashes) ... ?


detailed results (actual):
--------------------------
Expected to crash, but passed: (1)
Expected to timeout, but passed: (2)
Expected to fail, but passed: (29)
==> 32 new passes

Unexpected flakiness: text-only failures (4)
Unexpected flakiness: image-only failures (1)
Unexpected flakiness: timeouts (5)
==> 10 flakes

Regressions: Unexpected text-only failures (4)
Regressions: Unexpected image-only failures (6)
Regressions: Unexpected crashes (4)
Regressions: Unexpected timeouts (8)
==> 22 failures

Regressions: Unexpected missing results (8)
==> 8 missing results
Comment 4 Alexey Proskuryakov 2015-03-07 08:03:24 PST
Comment on attachment 248137 [details]
Patch v1

View in context: https://bugs.webkit.org/attachment.cgi?id=248137&action=review

> Tools/BuildSlaveSupport/build.webkit.org-config/master.cfg:377
> +        # Add crash count to failure count.

I don't understand this part. Isn't it already included?
Comment 5 Alexey Proskuryakov 2015-03-07 08:07:00 PST
Ossy, I think that the intention here is to display crashes for crashesOnly queues (already supported by the dashboard), such as a potential GuardMalloc queue. 

I would object to displaying crash count separately for normal queues, that's too much information.
Comment 6 David Kilzer (:ddkilzer) 2015-03-08 15:27:15 PDT
(In reply to comment #4)
> Comment on attachment 248137 [details]
> Patch v1
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=248137&action=review
> 
> > Tools/BuildSlaveSupport/build.webkit.org-config/master.cfg:377
> > +        # Add crash count to failure count.
> 
> I don't understand this part. Isn't it already included?

Prior to this change, crashes were counted as failures but not listed separately.

This piece of code keeps that invariant the same so that crash count is listed separately, but crashes are still also counted as failures.

FWIW, this matches how we report crashes on our internal buildbot instances (although the code accomplishes this slightly differently).

I don't have strong opinions on whether crash count should be a subset of failure count or not (I originally implemented them as separate counts).

(In reply to comment #5)
> Ossy, I think that the intention here is to display crashes for crashesOnly
> queues (already supported by the dashboard), such as a potential GuardMalloc
> queue. 

Yes, that's the intention.

> I would object to displaying crash count separately for normal queues,
> that's too much information.

I don't see the potential harm here, though.  Can you explain what exactly your concern is other than "too much information"?  Is it that the Info column might wrap to a second line?  How is having a crash count a bad thing in this situation?

Personally, I think crashes are an interesting enough subset of failures that I'd like to have a separate count in the summary text.  It's especially useful for GuardMalloc or ASan queues, but I feel crashes in other queues are interesting as well.
Comment 7 Alexey Proskuryakov 2015-03-09 12:30:00 PDT
> > > Tools/BuildSlaveSupport/build.webkit.org-config/master.cfg:377
> > > +        # Add crash count to failure count.
> > 
> > I don't understand this part. Isn't it already included?
> 
> Prior to this change, crashes were counted as failures but not listed separately.

What I'm saying is that failures come from all "Regressions: Unexpected" lines, and crashes come from "Regressions: Unexpected crash" lines. Which seems to mean that failures already include crashes, but then you add crashes to failures.

Aren't they double-counted as a result? Am I misreading the code?

> Can you explain what exactly your concern is other than "too much information"?

Too much information is exactly what I'm concerned about. We do have a bug about improving summary strings for failures though - bug 122901 - please feel free to add the idea of separating crashes if you think that it would help those looking at the page.
Comment 8 Alexey Proskuryakov 2015-03-09 12:31:36 PDT
One thing we could do without adding noise is to say "N crashes" when all the failures are crashes, and to say "N failures" when there is a mix of crashes, text failures, audio failures and/or image failures.
Comment 9 Brian Burg 2015-11-23 15:22:39 PST
(In reply to comment #8)
> One thing we could do without adding noise is to say "N crashes" when all
> the failures are crashes, and to say "N failures" when there is a mix of
> crashes, text failures, audio failures and/or image failures.

So, I think whether crashes should be included in the big red number or not is a dashboard UI issue. It would be nice to simply have the crash count included in the raw data (I would prefer it be subtracted from failures, unless we double count other types of failures already).
Comment 10 David Kilzer (:ddkilzer) 2016-06-21 11:39:52 PDT
Comment on attachment 248137 [details]
Patch v1

Clearing review flags.
Comment 11 David Kilzer (:ddkilzer) 2016-06-21 11:41:05 PDT
Apparently I was trying to solve a problem that no one was having, so moving to RESOLVED/WONTFIX.