Bug 138743

Summary:

Having 30+ flaky failures breaks EWS

Product:

WebKit

Reporter:

Alexey Proskuryakov <ap>

Component:

Tools / Tests

Assignee:

Jake Nielsen <jake.nielsen.webkit>

Status:

RESOLVED FIXED

Severity:

Normal

CC:

commit-queue, glenn, jake.nielsen.webkit

Priority:

Version:

528+ (Nightly build)

Hardware:

Unspecified

OS:

Unspecified

See Also:

https://bugs.webkit.org/show_bug.cgi?id=137223

Attachments:

Description	Flags
Adds test cases, and fixes the behavior of TaskAnalysisPatch	none
Style fix	none
Fixes needless logical complexity	jake.nielsen.webkit: review-, jake.nielsen.webkit: commit-queue-
Fixes logical error	none
Removes the unnecessary not clauses, and removes the patch file from the patch.	none

Alexey Proskuryakov

Reported 2014-11-14 10:04:09 PST

When EWS detects 30+ failures on the first run, it doesn't re-test the patch, and only tries a clean build. But the very nature of flakiness is that the clean build will likely pass, and so we'll erroneously reject the patch. This causes false rejections, like in bug 138728.

Attachments
Adds test cases, and fixes the behavior of TaskAnalysisPatch (11.16 KB, patch) 2014-11-14 15:12 PST, Jake Nielsen	no flags	Details Formatted Diff Diff
Style fix (11.15 KB, patch) 2014-11-14 15:16 PST, Jake Nielsen	no flags	Details Formatted Diff Diff
Fixes needless logical complexity (11.11 KB, patch) 2014-11-14 16:13 PST, Jake Nielsen	jake.nielsen.webkit: review- jake.nielsen.webkit: commit-queue-	Details Formatted Diff Diff
Fixes logical error (22.87 KB, patch) 2014-11-14 16:19 PST, Jake Nielsen	no flags	Details Formatted Diff Diff
Removes the unnecessary not clauses, and removes the patch file from the patch. (11.04 KB, patch) 2014-11-17 09:53 PST, Jake Nielsen	no flags	Details Formatted Diff Diff
Show Obsolete (4) View All Add attachment proposed patch, testcase, etc.

Jake Nielsen

Comment 1 2014-11-14 15:12:17 PST

Created attachment 241630 [details] Adds test cases, and fixes the behavior of TaskAnalysisPatch

WebKit Commit Bot

Comment 2 2014-11-14 15:14:47 PST

Attachment 241630 [details] did not pass style-queue: ERROR: Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:237: too many blank lines (2) [pep8/E303] [5] Total errors found: 1 in 3 files If any of these errors are false positives, please file a bug against check-webkit-style.

Jake Nielsen

Comment 3 2014-11-14 15:16:09 PST

Created attachment 241631 [details] Style fix

Alexey Proskuryakov

Comment 4 2014-11-14 15:41:49 PST

Comment on attachment 241631 [details] Style fix View in context: https://bugs.webkit.org/attachment.cgi?id=241631&action=review > Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:241 > + if second_results.did_exceed_test_failure_limit() or first_results.did_exceed_test_failure_limit(): > + if not first_results.did_exceed_test_failure_limit(): > + self._should_defer_patch_or_throw(first_results.failing_test_results(), first_results_archive, first_script_error, first_failure_status_id) > + else: > + self._should_defer_patch_or_throw(second_results.failing_test_results(), second_results_archive, second_script_error, second_failure_status_id) > + return False I think that this condition is unnecessarily convoluted. Can you split it in two? if first_results.did_exceed_test_failure_limit(): ... if second_results.did_exceed_test_failure_limit(): ... Also, it's confusing that _should_defer_patch_or_throw doesn't return a result (or do we ignore it?). It sounds like a function whose job is to return a boolean without an side effects.

Jake Nielsen

Comment 5 2014-11-14 16:08:24 PST

(In reply to comment #4) > Comment on attachment 241631 [details] > Style fix > > View in context: > https://bugs.webkit.org/attachment.cgi?id=241631&action=review > > > Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:241 > > + if second_results.did_exceed_test_failure_limit() or first_results.did_exceed_test_failure_limit(): > > + if not first_results.did_exceed_test_failure_limit(): > > + self._should_defer_patch_or_throw(first_results.failing_test_results(), first_results_archive, first_script_error, first_failure_status_id) > > + else: > > + self._should_defer_patch_or_throw(second_results.failing_test_results(), second_results_archive, second_script_error, second_failure_status_id) > > + return False > > I think that this condition is unnecessarily convoluted. Can you split it in > two? > if first_results.did_exceed_test_failure_limit(): > ... > if second_results.did_exceed_test_failure_limit(): > ... > > Also, it's confusing that _should_defer_patch_or_throw doesn't return a > result (or do we ignore it?). It sounds like a function whose job is to > return a boolean without an side effects. Good point, yeah this should be written in a way that's easier on the eyes and brain. I'll change it. _should_defer_patch_or_throw is a strange one. It will either return True, for "Yes, you should defer the patch", False for "You shouldn't necessarily defer the patch", or it will throw if it knows that the patch should be rejected. In this case, there's enough information to know that if _should_defer_patch_or_throw doesn't throw, then we should always defer.

Jake Nielsen

Comment 6 2014-11-14 16:13:07 PST

Created attachment 241638 [details] Fixes needless logical complexity

Jake Nielsen

Comment 7 2014-11-14 16:15:42 PST

Comment on attachment 241638 [details] Fixes needless logical complexity Hold up. I broke something.

Jake Nielsen

Comment 8 2014-11-14 16:19:00 PST

Created attachment 241640 [details] Fixes logical error

Alexey Proskuryakov

Comment 9 2014-11-14 22:26:12 PST

Comment on attachment 241640 [details] Fixes logical error View in context: https://bugs.webkit.org/attachment.cgi?id=241640&action=review Something is wrong with this patch, I suspect that it applies in an unexpected way: ++_lots_of_failing_tests = map(lambda num: "test-%s.html" % num, range(0, 100)) ++ ++ > Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:225 > + if second_results.did_exceed_test_failure_limit() and first_results.did_exceed_test_failure_limit(): I'd put first_results first, and second_results second. > Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:236 > + if second_results.did_exceed_test_failure_limit() and not first_results.did_exceed_test_failure_limit(): Why do you need this "and"? The case where they both exceeded the limit was already handled. > Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:240 > + if first_results.did_exceed_test_failure_limit() and not second_results.did_exceed_test_failure_limit(): Ditto.

Jake Nielsen

Comment 10 2014-11-17 09:48:25 PST

(In reply to comment #9) > Comment on attachment 241640 [details] > Fixes logical error > > View in context: > https://bugs.webkit.org/attachment.cgi?id=241640&action=review > > Something is wrong with this patch, I suspect that it applies in an > unexpected way: This looks fine to me. _lots_of_failing_tests was added as a convenience variable to make it more concise to add test cases for which the failure has been exceeded. > > ++_lots_of_failing_tests = map(lambda num: "test-%s.html" % num, range(0, > 100)) > ++ > ++ > > > Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:225 > > + if second_results.did_exceed_test_failure_limit() and first_results.did_exceed_test_failure_limit(): > > I'd put first_results first, and second_results second. > > > Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:236 > > + if second_results.did_exceed_test_failure_limit() and not first_results.did_exceed_test_failure_limit(): > > Why do you need this "and"? The case where they both exceeded the limit was > already handled. Touche > > > Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:240 > > + if first_results.did_exceed_test_failure_limit() and not second_results.did_exceed_test_failure_limit(): > > Ditto.

Jake Nielsen

Comment 11 2014-11-17 09:50:36 PST

(In reply to comment #10) > (In reply to comment #9) > > Comment on attachment 241640 [details] > > Fixes logical error > > > > View in context: > > https://bugs.webkit.org/attachment.cgi?id=241640&action=review > > > > Something is wrong with this patch, I suspect that it applies in an > > unexpected way: > This looks fine to me. _lots_of_failing_tests was added as a convenience > variable to make it more concise to add test cases for which the failure has > been exceeded. > > > > ++_lots_of_failing_tests = map(lambda num: "test-%s.html" % num, range(0, > > 100)) > > ++ > > ++ > > > > > Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:225 > > > + if second_results.did_exceed_test_failure_limit() and first_results.did_exceed_test_failure_limit(): > > > > I'd put first_results first, and second_results second. > > > > > Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:236 > > > + if second_results.did_exceed_test_failure_limit() and not first_results.did_exceed_test_failure_limit(): > > > > Why do you need this "and"? The case where they both exceeded the limit was > > already handled. > Touche > > > > > Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:240 > > > + if first_results.did_exceed_test_failure_limit() and not second_results.did_exceed_test_failure_limit(): > > > > Ditto. Oh, I see the issue here. I included the patch file in the patch. I'll fix that.

Jake Nielsen

Comment 12 2014-11-17 09:53:48 PST

Created attachment 241720 [details] Removes the unnecessary not clauses, and removes the patch file from the patch.

Alexey Proskuryakov

Comment 13 2014-11-17 10:27:09 PST

Comment on attachment 241720 [details] Removes the unnecessary not clauses, and removes the patch file from the patch. View in context: https://bugs.webkit.org/attachment.cgi?id=241720&action=review > Tools/Scripts/webkitpy/tool/bot/commitqueuetask_unittest.py:162 > +_lots_of_failing_tests = map(lambda num: "test-%s.html" % num, range(0, 100)) What is the failure limit for unit tests? For regular tests, it's 500 failures or 50 crashes/timeouts, and for EWS it's 30 failures. > Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:237 > + self._should_defer_patch_or_throw(first_results.failing_test_results(), first_results_archive, first_script_error, first_failure_status_id) This is still somewhat unreadable, and in need of refactoring. Which should be easier, now that we have test coverage.

WebKit Commit Bot

Comment 14 2014-11-17 11:06:38 PST

Comment on attachment 241720 [details] Removes the unnecessary not clauses, and removes the patch file from the patch. Clearing flags on attachment: 241720 Committed r176211: <http://trac.webkit.org/changeset/176211>

WebKit Commit Bot

Comment 15 2014-11-17 11:06:42 PST

All reviewed patches have been landed. Closing bug.

Jake Nielsen

Comment 16 2014-11-17 12:12:44 PST

(In reply to comment #13) > Comment on attachment 241720 [details] > Removes the unnecessary not clauses, and removes the patch file from the > patch. > > View in context: > https://bugs.webkit.org/attachment.cgi?id=241720&action=review > > > Tools/Scripts/webkitpy/tool/bot/commitqueuetask_unittest.py:162 > > +_lots_of_failing_tests = map(lambda num: "test-%s.html" % num, range(0, 100)) > > What is the failure limit for unit tests? For regular tests, it's 500 > failures or 50 crashes/timeouts, and for EWS it's 30 failures. For the unit tests it's 10, so I guess this could be 20 rather than 100. There was a prior test that used 100 tests generated this way to communicate that there were many failing tests, so I just kept it and made it global. > > > Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:237 > > + self._should_defer_patch_or_throw(first_results.failing_test_results(), first_results_archive, first_script_error, first_failure_status_id) > > This is still somewhat unreadable, and in need of refactoring. Which should > be easier, now that we have test coverage. I agree, I'll open a bug for it.

Note You need to log in before you can comment on or make changes to this bug.