Bug 138743

Summary: Having 30+ flaky failures breaks EWS
Product: WebKit Reporter: Alexey Proskuryakov <ap>
Component: Tools / TestsAssignee: Jake Nielsen <jake.nielsen.webkit>
Status: RESOLVED FIXED    
Severity: Normal CC: commit-queue, glenn, jake.nielsen.webkit
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugs.webkit.org/show_bug.cgi?id=137223
Attachments:
Description Flags
Adds test cases, and fixes the behavior of TaskAnalysisPatch
none
Style fix
none
Fixes needless logical complexity
jake.nielsen.webkit: review-, jake.nielsen.webkit: commit-queue-
Fixes logical error
none
Removes the unnecessary not clauses, and removes the patch file from the patch. none

Description Alexey Proskuryakov 2014-11-14 10:04:09 PST
When EWS detects 30+ failures on the first run, it doesn't re-test the patch, and only tries a clean build. But the very nature of flakiness is that the clean build will likely pass, and so we'll erroneously reject the patch.

This causes false rejections, like in bug 138728.
Comment 1 Jake Nielsen 2014-11-14 15:12:17 PST
Created attachment 241630 [details]
Adds test cases, and fixes the behavior of TaskAnalysisPatch
Comment 2 WebKit Commit Bot 2014-11-14 15:14:47 PST
Attachment 241630 [details] did not pass style-queue:


ERROR: Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:237:  too many blank lines (2)  [pep8/E303] [5]
Total errors found: 1 in 3 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 3 Jake Nielsen 2014-11-14 15:16:09 PST
Created attachment 241631 [details]
Style fix
Comment 4 Alexey Proskuryakov 2014-11-14 15:41:49 PST
Comment on attachment 241631 [details]
Style fix

View in context: https://bugs.webkit.org/attachment.cgi?id=241631&action=review

> Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:241
> +        if second_results.did_exceed_test_failure_limit() or first_results.did_exceed_test_failure_limit():
> +            if not first_results.did_exceed_test_failure_limit():
> +                self._should_defer_patch_or_throw(first_results.failing_test_results(), first_results_archive, first_script_error, first_failure_status_id)
> +            else:
> +                self._should_defer_patch_or_throw(second_results.failing_test_results(), second_results_archive, second_script_error, second_failure_status_id)
> +            return False

I think that this condition is unnecessarily convoluted. Can you split it in two?
if first_results.did_exceed_test_failure_limit():
  ...
if second_results.did_exceed_test_failure_limit():
  ...

Also, it's confusing that _should_defer_patch_or_throw doesn't return a result (or do we ignore it?). It sounds like a function whose job is to return a boolean without an side effects.
Comment 5 Jake Nielsen 2014-11-14 16:08:24 PST
(In reply to comment #4)
> Comment on attachment 241631 [details]
> Style fix
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=241631&action=review
> 
> > Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:241
> > +        if second_results.did_exceed_test_failure_limit() or first_results.did_exceed_test_failure_limit():
> > +            if not first_results.did_exceed_test_failure_limit():
> > +                self._should_defer_patch_or_throw(first_results.failing_test_results(), first_results_archive, first_script_error, first_failure_status_id)
> > +            else:
> > +                self._should_defer_patch_or_throw(second_results.failing_test_results(), second_results_archive, second_script_error, second_failure_status_id)
> > +            return False
> 
> I think that this condition is unnecessarily convoluted. Can you split it in
> two?
> if first_results.did_exceed_test_failure_limit():
>   ...
> if second_results.did_exceed_test_failure_limit():
>   ...
> 
> Also, it's confusing that _should_defer_patch_or_throw doesn't return a
> result (or do we ignore it?). It sounds like a function whose job is to
> return a boolean without an side effects.

Good point, yeah this should be written in a way that's easier on the eyes and brain. I'll change it.
 
_should_defer_patch_or_throw is a strange one. It will either return True, for "Yes, you should defer the patch", False for "You shouldn't necessarily defer the patch", or it will throw if it knows that the patch should be rejected. In this case, there's enough information to know that if _should_defer_patch_or_throw doesn't throw, then we should always defer.
Comment 6 Jake Nielsen 2014-11-14 16:13:07 PST
Created attachment 241638 [details]
Fixes needless logical complexity
Comment 7 Jake Nielsen 2014-11-14 16:15:42 PST
Comment on attachment 241638 [details]
Fixes needless logical complexity

Hold up. I broke something.
Comment 8 Jake Nielsen 2014-11-14 16:19:00 PST
Created attachment 241640 [details]
Fixes logical error
Comment 9 Alexey Proskuryakov 2014-11-14 22:26:12 PST
Comment on attachment 241640 [details]
Fixes logical error

View in context: https://bugs.webkit.org/attachment.cgi?id=241640&action=review

Something is wrong with this patch, I suspect that it applies in an unexpected way:

++_lots_of_failing_tests = map(lambda num: "test-%s.html" % num, range(0, 100))
++
++

> Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:225
> +        if second_results.did_exceed_test_failure_limit() and first_results.did_exceed_test_failure_limit():

I'd put first_results first, and second_results second.

> Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:236
> +        if second_results.did_exceed_test_failure_limit() and not first_results.did_exceed_test_failure_limit():

Why do you need this "and"? The case where they both exceeded the limit was already handled.

> Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:240
> +        if first_results.did_exceed_test_failure_limit() and not second_results.did_exceed_test_failure_limit():

Ditto.
Comment 10 Jake Nielsen 2014-11-17 09:48:25 PST
(In reply to comment #9)
> Comment on attachment 241640 [details]
> Fixes logical error
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=241640&action=review
> 
> Something is wrong with this patch, I suspect that it applies in an
> unexpected way:
This looks fine to me. _lots_of_failing_tests was added as a convenience variable to make it more concise to add test cases for which the failure has been exceeded.
> 
> ++_lots_of_failing_tests = map(lambda num: "test-%s.html" % num, range(0,
> 100))
> ++
> ++
> 
> > Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:225
> > +        if second_results.did_exceed_test_failure_limit() and first_results.did_exceed_test_failure_limit():
> 
> I'd put first_results first, and second_results second.
> 
> > Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:236
> > +        if second_results.did_exceed_test_failure_limit() and not first_results.did_exceed_test_failure_limit():
> 
> Why do you need this "and"? The case where they both exceeded the limit was
> already handled.
Touche
> 
> > Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:240
> > +        if first_results.did_exceed_test_failure_limit() and not second_results.did_exceed_test_failure_limit():
> 
> Ditto.
Comment 11 Jake Nielsen 2014-11-17 09:50:36 PST
(In reply to comment #10)
> (In reply to comment #9)
> > Comment on attachment 241640 [details]
> > Fixes logical error
> > 
> > View in context:
> > https://bugs.webkit.org/attachment.cgi?id=241640&action=review
> > 
> > Something is wrong with this patch, I suspect that it applies in an
> > unexpected way:
> This looks fine to me. _lots_of_failing_tests was added as a convenience
> variable to make it more concise to add test cases for which the failure has
> been exceeded.
> > 
> > ++_lots_of_failing_tests = map(lambda num: "test-%s.html" % num, range(0,
> > 100))
> > ++
> > ++
> > 
> > > Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:225
> > > +        if second_results.did_exceed_test_failure_limit() and first_results.did_exceed_test_failure_limit():
> > 
> > I'd put first_results first, and second_results second.
> > 
> > > Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:236
> > > +        if second_results.did_exceed_test_failure_limit() and not first_results.did_exceed_test_failure_limit():
> > 
> > Why do you need this "and"? The case where they both exceeded the limit was
> > already handled.
> Touche
> > 
> > > Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:240
> > > +        if first_results.did_exceed_test_failure_limit() and not second_results.did_exceed_test_failure_limit():
> > 
> > Ditto.
Oh, I see the issue here. I included the patch file in the patch. I'll fix that.
Comment 12 Jake Nielsen 2014-11-17 09:53:48 PST
Created attachment 241720 [details]
Removes the unnecessary not clauses, and removes the patch file from the patch.
Comment 13 Alexey Proskuryakov 2014-11-17 10:27:09 PST
Comment on attachment 241720 [details]
Removes the unnecessary not clauses, and removes the patch file from the patch.

View in context: https://bugs.webkit.org/attachment.cgi?id=241720&action=review

> Tools/Scripts/webkitpy/tool/bot/commitqueuetask_unittest.py:162
> +_lots_of_failing_tests = map(lambda num: "test-%s.html" % num, range(0, 100))

What is the failure limit for unit tests? For regular tests, it's 500 failures or 50 crashes/timeouts, and for EWS it's 30 failures.

> Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:237
> +            self._should_defer_patch_or_throw(first_results.failing_test_results(), first_results_archive, first_script_error, first_failure_status_id)

This is still somewhat unreadable, and in need of refactoring. Which should be easier, now that we have test coverage.
Comment 14 WebKit Commit Bot 2014-11-17 11:06:38 PST
Comment on attachment 241720 [details]
Removes the unnecessary not clauses, and removes the patch file from the patch.

Clearing flags on attachment: 241720

Committed r176211: <http://trac.webkit.org/changeset/176211>
Comment 15 WebKit Commit Bot 2014-11-17 11:06:42 PST
All reviewed patches have been landed.  Closing bug.
Comment 16 Jake Nielsen 2014-11-17 12:12:44 PST
(In reply to comment #13)
> Comment on attachment 241720 [details]
> Removes the unnecessary not clauses, and removes the patch file from the
> patch.
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=241720&action=review
> 
> > Tools/Scripts/webkitpy/tool/bot/commitqueuetask_unittest.py:162
> > +_lots_of_failing_tests = map(lambda num: "test-%s.html" % num, range(0, 100))
> 
> What is the failure limit for unit tests? For regular tests, it's 500
> failures or 50 crashes/timeouts, and for EWS it's 30 failures.
For the unit tests it's 10, so I guess this could be 20 rather than 100. There was a prior test that used 100 tests generated this way to communicate that there were many failing tests, so I just kept it and made it global.
> 
> > Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:237
> > +            self._should_defer_patch_or_throw(first_results.failing_test_results(), first_results_archive, first_script_error, first_failure_status_id)
> 
> This is still somewhat unreadable, and in need of refactoring. Which should
> be easier, now that we have test coverage.
I agree, I'll open a bug for it.