RESOLVED FIXED 138743
Having 30+ flaky failures breaks EWS
https://bugs.webkit.org/show_bug.cgi?id=138743
Summary Having 30+ flaky failures breaks EWS
Alexey Proskuryakov
Reported 2014-11-14 10:04:09 PST
When EWS detects 30+ failures on the first run, it doesn't re-test the patch, and only tries a clean build. But the very nature of flakiness is that the clean build will likely pass, and so we'll erroneously reject the patch. This causes false rejections, like in bug 138728.
Attachments
Adds test cases, and fixes the behavior of TaskAnalysisPatch (11.16 KB, patch)
2014-11-14 15:12 PST, Jake Nielsen
no flags
Style fix (11.15 KB, patch)
2014-11-14 15:16 PST, Jake Nielsen
no flags
Fixes needless logical complexity (11.11 KB, patch)
2014-11-14 16:13 PST, Jake Nielsen
jake.nielsen.webkit: review-
jake.nielsen.webkit: commit-queue-
Fixes logical error (22.87 KB, patch)
2014-11-14 16:19 PST, Jake Nielsen
no flags
Removes the unnecessary not clauses, and removes the patch file from the patch. (11.04 KB, patch)
2014-11-17 09:53 PST, Jake Nielsen
no flags
Jake Nielsen
Comment 1 2014-11-14 15:12:17 PST
Created attachment 241630 [details] Adds test cases, and fixes the behavior of TaskAnalysisPatch
WebKit Commit Bot
Comment 2 2014-11-14 15:14:47 PST
Attachment 241630 [details] did not pass style-queue: ERROR: Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:237: too many blank lines (2) [pep8/E303] [5] Total errors found: 1 in 3 files If any of these errors are false positives, please file a bug against check-webkit-style.
Jake Nielsen
Comment 3 2014-11-14 15:16:09 PST
Created attachment 241631 [details] Style fix
Alexey Proskuryakov
Comment 4 2014-11-14 15:41:49 PST
Comment on attachment 241631 [details] Style fix View in context: https://bugs.webkit.org/attachment.cgi?id=241631&action=review > Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:241 > + if second_results.did_exceed_test_failure_limit() or first_results.did_exceed_test_failure_limit(): > + if not first_results.did_exceed_test_failure_limit(): > + self._should_defer_patch_or_throw(first_results.failing_test_results(), first_results_archive, first_script_error, first_failure_status_id) > + else: > + self._should_defer_patch_or_throw(second_results.failing_test_results(), second_results_archive, second_script_error, second_failure_status_id) > + return False I think that this condition is unnecessarily convoluted. Can you split it in two? if first_results.did_exceed_test_failure_limit(): ... if second_results.did_exceed_test_failure_limit(): ... Also, it's confusing that _should_defer_patch_or_throw doesn't return a result (or do we ignore it?). It sounds like a function whose job is to return a boolean without an side effects.
Jake Nielsen
Comment 5 2014-11-14 16:08:24 PST
(In reply to comment #4) > Comment on attachment 241631 [details] > Style fix > > View in context: > https://bugs.webkit.org/attachment.cgi?id=241631&action=review > > > Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:241 > > + if second_results.did_exceed_test_failure_limit() or first_results.did_exceed_test_failure_limit(): > > + if not first_results.did_exceed_test_failure_limit(): > > + self._should_defer_patch_or_throw(first_results.failing_test_results(), first_results_archive, first_script_error, first_failure_status_id) > > + else: > > + self._should_defer_patch_or_throw(second_results.failing_test_results(), second_results_archive, second_script_error, second_failure_status_id) > > + return False > > I think that this condition is unnecessarily convoluted. Can you split it in > two? > if first_results.did_exceed_test_failure_limit(): > ... > if second_results.did_exceed_test_failure_limit(): > ... > > Also, it's confusing that _should_defer_patch_or_throw doesn't return a > result (or do we ignore it?). It sounds like a function whose job is to > return a boolean without an side effects. Good point, yeah this should be written in a way that's easier on the eyes and brain. I'll change it. _should_defer_patch_or_throw is a strange one. It will either return True, for "Yes, you should defer the patch", False for "You shouldn't necessarily defer the patch", or it will throw if it knows that the patch should be rejected. In this case, there's enough information to know that if _should_defer_patch_or_throw doesn't throw, then we should always defer.
Jake Nielsen
Comment 6 2014-11-14 16:13:07 PST
Created attachment 241638 [details] Fixes needless logical complexity
Jake Nielsen
Comment 7 2014-11-14 16:15:42 PST
Comment on attachment 241638 [details] Fixes needless logical complexity Hold up. I broke something.
Jake Nielsen
Comment 8 2014-11-14 16:19:00 PST
Created attachment 241640 [details] Fixes logical error
Alexey Proskuryakov
Comment 9 2014-11-14 22:26:12 PST
Comment on attachment 241640 [details] Fixes logical error View in context: https://bugs.webkit.org/attachment.cgi?id=241640&action=review Something is wrong with this patch, I suspect that it applies in an unexpected way: ++_lots_of_failing_tests = map(lambda num: "test-%s.html" % num, range(0, 100)) ++ ++ > Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:225 > + if second_results.did_exceed_test_failure_limit() and first_results.did_exceed_test_failure_limit(): I'd put first_results first, and second_results second. > Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:236 > + if second_results.did_exceed_test_failure_limit() and not first_results.did_exceed_test_failure_limit(): Why do you need this "and"? The case where they both exceeded the limit was already handled. > Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:240 > + if first_results.did_exceed_test_failure_limit() and not second_results.did_exceed_test_failure_limit(): Ditto.
Jake Nielsen
Comment 10 2014-11-17 09:48:25 PST
(In reply to comment #9) > Comment on attachment 241640 [details] > Fixes logical error > > View in context: > https://bugs.webkit.org/attachment.cgi?id=241640&action=review > > Something is wrong with this patch, I suspect that it applies in an > unexpected way: This looks fine to me. _lots_of_failing_tests was added as a convenience variable to make it more concise to add test cases for which the failure has been exceeded. > > ++_lots_of_failing_tests = map(lambda num: "test-%s.html" % num, range(0, > 100)) > ++ > ++ > > > Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:225 > > + if second_results.did_exceed_test_failure_limit() and first_results.did_exceed_test_failure_limit(): > > I'd put first_results first, and second_results second. > > > Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:236 > > + if second_results.did_exceed_test_failure_limit() and not first_results.did_exceed_test_failure_limit(): > > Why do you need this "and"? The case where they both exceeded the limit was > already handled. Touche > > > Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:240 > > + if first_results.did_exceed_test_failure_limit() and not second_results.did_exceed_test_failure_limit(): > > Ditto.
Jake Nielsen
Comment 11 2014-11-17 09:50:36 PST
(In reply to comment #10) > (In reply to comment #9) > > Comment on attachment 241640 [details] > > Fixes logical error > > > > View in context: > > https://bugs.webkit.org/attachment.cgi?id=241640&action=review > > > > Something is wrong with this patch, I suspect that it applies in an > > unexpected way: > This looks fine to me. _lots_of_failing_tests was added as a convenience > variable to make it more concise to add test cases for which the failure has > been exceeded. > > > > ++_lots_of_failing_tests = map(lambda num: "test-%s.html" % num, range(0, > > 100)) > > ++ > > ++ > > > > > Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:225 > > > + if second_results.did_exceed_test_failure_limit() and first_results.did_exceed_test_failure_limit(): > > > > I'd put first_results first, and second_results second. > > > > > Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:236 > > > + if second_results.did_exceed_test_failure_limit() and not first_results.did_exceed_test_failure_limit(): > > > > Why do you need this "and"? The case where they both exceeded the limit was > > already handled. > Touche > > > > > Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:240 > > > + if first_results.did_exceed_test_failure_limit() and not second_results.did_exceed_test_failure_limit(): > > > > Ditto. Oh, I see the issue here. I included the patch file in the patch. I'll fix that.
Jake Nielsen
Comment 12 2014-11-17 09:53:48 PST
Created attachment 241720 [details] Removes the unnecessary not clauses, and removes the patch file from the patch.
Alexey Proskuryakov
Comment 13 2014-11-17 10:27:09 PST
Comment on attachment 241720 [details] Removes the unnecessary not clauses, and removes the patch file from the patch. View in context: https://bugs.webkit.org/attachment.cgi?id=241720&action=review > Tools/Scripts/webkitpy/tool/bot/commitqueuetask_unittest.py:162 > +_lots_of_failing_tests = map(lambda num: "test-%s.html" % num, range(0, 100)) What is the failure limit for unit tests? For regular tests, it's 500 failures or 50 crashes/timeouts, and for EWS it's 30 failures. > Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:237 > + self._should_defer_patch_or_throw(first_results.failing_test_results(), first_results_archive, first_script_error, first_failure_status_id) This is still somewhat unreadable, and in need of refactoring. Which should be easier, now that we have test coverage.
WebKit Commit Bot
Comment 14 2014-11-17 11:06:38 PST
Comment on attachment 241720 [details] Removes the unnecessary not clauses, and removes the patch file from the patch. Clearing flags on attachment: 241720 Committed r176211: <http://trac.webkit.org/changeset/176211>
WebKit Commit Bot
Comment 15 2014-11-17 11:06:42 PST
All reviewed patches have been landed. Closing bug.
Jake Nielsen
Comment 16 2014-11-17 12:12:44 PST
(In reply to comment #13) > Comment on attachment 241720 [details] > Removes the unnecessary not clauses, and removes the patch file from the > patch. > > View in context: > https://bugs.webkit.org/attachment.cgi?id=241720&action=review > > > Tools/Scripts/webkitpy/tool/bot/commitqueuetask_unittest.py:162 > > +_lots_of_failing_tests = map(lambda num: "test-%s.html" % num, range(0, 100)) > > What is the failure limit for unit tests? For regular tests, it's 500 > failures or 50 crashes/timeouts, and for EWS it's 30 failures. For the unit tests it's 10, so I guess this could be 20 rather than 100. There was a prior test that used 100 tests generated this way to communicate that there were many failing tests, so I just kept it and made it global. > > > Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:237 > > + self._should_defer_patch_or_throw(first_results.failing_test_results(), first_results_archive, first_script_error, first_failure_status_id) > > This is still somewhat unreadable, and in need of refactoring. Which should > be easier, now that we have test coverage. I agree, I'll open a bug for it.
Note You need to log in before you can comment on or make changes to this bug.