When EWS detects 30+ failures on the first run, it doesn't re-test the patch, and only tries a clean build. But the very nature of flakiness is that the clean build will likely pass, and so we'll erroneously reject the patch. This causes false rejections, like in bug 138728.
Created attachment 241630 [details] Adds test cases, and fixes the behavior of TaskAnalysisPatch
Attachment 241630 [details] did not pass style-queue: ERROR: Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:237: too many blank lines (2) [pep8/E303] [5] Total errors found: 1 in 3 files If any of these errors are false positives, please file a bug against check-webkit-style.
Created attachment 241631 [details] Style fix
Comment on attachment 241631 [details] Style fix View in context: https://bugs.webkit.org/attachment.cgi?id=241631&action=review > Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:241 > + if second_results.did_exceed_test_failure_limit() or first_results.did_exceed_test_failure_limit(): > + if not first_results.did_exceed_test_failure_limit(): > + self._should_defer_patch_or_throw(first_results.failing_test_results(), first_results_archive, first_script_error, first_failure_status_id) > + else: > + self._should_defer_patch_or_throw(second_results.failing_test_results(), second_results_archive, second_script_error, second_failure_status_id) > + return False I think that this condition is unnecessarily convoluted. Can you split it in two? if first_results.did_exceed_test_failure_limit(): ... if second_results.did_exceed_test_failure_limit(): ... Also, it's confusing that _should_defer_patch_or_throw doesn't return a result (or do we ignore it?). It sounds like a function whose job is to return a boolean without an side effects.
(In reply to comment #4) > Comment on attachment 241631 [details] > Style fix > > View in context: > https://bugs.webkit.org/attachment.cgi?id=241631&action=review > > > Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:241 > > + if second_results.did_exceed_test_failure_limit() or first_results.did_exceed_test_failure_limit(): > > + if not first_results.did_exceed_test_failure_limit(): > > + self._should_defer_patch_or_throw(first_results.failing_test_results(), first_results_archive, first_script_error, first_failure_status_id) > > + else: > > + self._should_defer_patch_or_throw(second_results.failing_test_results(), second_results_archive, second_script_error, second_failure_status_id) > > + return False > > I think that this condition is unnecessarily convoluted. Can you split it in > two? > if first_results.did_exceed_test_failure_limit(): > ... > if second_results.did_exceed_test_failure_limit(): > ... > > Also, it's confusing that _should_defer_patch_or_throw doesn't return a > result (or do we ignore it?). It sounds like a function whose job is to > return a boolean without an side effects. Good point, yeah this should be written in a way that's easier on the eyes and brain. I'll change it. _should_defer_patch_or_throw is a strange one. It will either return True, for "Yes, you should defer the patch", False for "You shouldn't necessarily defer the patch", or it will throw if it knows that the patch should be rejected. In this case, there's enough information to know that if _should_defer_patch_or_throw doesn't throw, then we should always defer.
Created attachment 241638 [details] Fixes needless logical complexity
Comment on attachment 241638 [details] Fixes needless logical complexity Hold up. I broke something.
Created attachment 241640 [details] Fixes logical error
Comment on attachment 241640 [details] Fixes logical error View in context: https://bugs.webkit.org/attachment.cgi?id=241640&action=review Something is wrong with this patch, I suspect that it applies in an unexpected way: ++_lots_of_failing_tests = map(lambda num: "test-%s.html" % num, range(0, 100)) ++ ++ > Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:225 > + if second_results.did_exceed_test_failure_limit() and first_results.did_exceed_test_failure_limit(): I'd put first_results first, and second_results second. > Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:236 > + if second_results.did_exceed_test_failure_limit() and not first_results.did_exceed_test_failure_limit(): Why do you need this "and"? The case where they both exceeded the limit was already handled. > Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:240 > + if first_results.did_exceed_test_failure_limit() and not second_results.did_exceed_test_failure_limit(): Ditto.
(In reply to comment #9) > Comment on attachment 241640 [details] > Fixes logical error > > View in context: > https://bugs.webkit.org/attachment.cgi?id=241640&action=review > > Something is wrong with this patch, I suspect that it applies in an > unexpected way: This looks fine to me. _lots_of_failing_tests was added as a convenience variable to make it more concise to add test cases for which the failure has been exceeded. > > ++_lots_of_failing_tests = map(lambda num: "test-%s.html" % num, range(0, > 100)) > ++ > ++ > > > Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:225 > > + if second_results.did_exceed_test_failure_limit() and first_results.did_exceed_test_failure_limit(): > > I'd put first_results first, and second_results second. > > > Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:236 > > + if second_results.did_exceed_test_failure_limit() and not first_results.did_exceed_test_failure_limit(): > > Why do you need this "and"? The case where they both exceeded the limit was > already handled. Touche > > > Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:240 > > + if first_results.did_exceed_test_failure_limit() and not second_results.did_exceed_test_failure_limit(): > > Ditto.
(In reply to comment #10) > (In reply to comment #9) > > Comment on attachment 241640 [details] > > Fixes logical error > > > > View in context: > > https://bugs.webkit.org/attachment.cgi?id=241640&action=review > > > > Something is wrong with this patch, I suspect that it applies in an > > unexpected way: > This looks fine to me. _lots_of_failing_tests was added as a convenience > variable to make it more concise to add test cases for which the failure has > been exceeded. > > > > ++_lots_of_failing_tests = map(lambda num: "test-%s.html" % num, range(0, > > 100)) > > ++ > > ++ > > > > > Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:225 > > > + if second_results.did_exceed_test_failure_limit() and first_results.did_exceed_test_failure_limit(): > > > > I'd put first_results first, and second_results second. > > > > > Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:236 > > > + if second_results.did_exceed_test_failure_limit() and not first_results.did_exceed_test_failure_limit(): > > > > Why do you need this "and"? The case where they both exceeded the limit was > > already handled. > Touche > > > > > Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:240 > > > + if first_results.did_exceed_test_failure_limit() and not second_results.did_exceed_test_failure_limit(): > > > > Ditto. Oh, I see the issue here. I included the patch file in the patch. I'll fix that.
Created attachment 241720 [details] Removes the unnecessary not clauses, and removes the patch file from the patch.
Comment on attachment 241720 [details] Removes the unnecessary not clauses, and removes the patch file from the patch. View in context: https://bugs.webkit.org/attachment.cgi?id=241720&action=review > Tools/Scripts/webkitpy/tool/bot/commitqueuetask_unittest.py:162 > +_lots_of_failing_tests = map(lambda num: "test-%s.html" % num, range(0, 100)) What is the failure limit for unit tests? For regular tests, it's 500 failures or 50 crashes/timeouts, and for EWS it's 30 failures. > Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:237 > + self._should_defer_patch_or_throw(first_results.failing_test_results(), first_results_archive, first_script_error, first_failure_status_id) This is still somewhat unreadable, and in need of refactoring. Which should be easier, now that we have test coverage.
Comment on attachment 241720 [details] Removes the unnecessary not clauses, and removes the patch file from the patch. Clearing flags on attachment: 241720 Committed r176211: <http://trac.webkit.org/changeset/176211>
All reviewed patches have been landed. Closing bug.
(In reply to comment #13) > Comment on attachment 241720 [details] > Removes the unnecessary not clauses, and removes the patch file from the > patch. > > View in context: > https://bugs.webkit.org/attachment.cgi?id=241720&action=review > > > Tools/Scripts/webkitpy/tool/bot/commitqueuetask_unittest.py:162 > > +_lots_of_failing_tests = map(lambda num: "test-%s.html" % num, range(0, 100)) > > What is the failure limit for unit tests? For regular tests, it's 500 > failures or 50 crashes/timeouts, and for EWS it's 30 failures. For the unit tests it's 10, so I guess this could be 20 rather than 100. There was a prior test that used 100 tests generated this way to communicate that there were many failing tests, so I just kept it and made it global. > > > Tools/Scripts/webkitpy/tool/bot/patchanalysistask.py:237 > > + self._should_defer_patch_or_throw(first_results.failing_test_results(), first_results_archive, first_script_error, first_failure_status_id) > > This is still somewhat unreadable, and in need of refactoring. Which should > be easier, now that we have test coverage. I agree, I'll open a bug for it.