commit-queue will report constant failures as flaky if other tests flake

See https://bugs.webkit.org/show_bug.cgi?id=51236#c9

    first_failing_tests = self._failing_tests_from_last_run()
    if self._test():
        self._report_flaky_tests(first_failing_tests)
        return True

    second_failing_tests = self._failing_tests_from_last_run()
    if first_failing_tests != second_failing_tests:
        self._report_flaky_tests(first_failing_tests + second_failing_tests)
        return False

Notice that if second_failing_tests differs from first_failing_tests, every test in both lists gets reported as flaky. Only the tests that differ between the first and second runs should be reported.
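A minimal sketch of "report only the differences", assuming the failure lists are plain lists of test paths. The helper name flaky_candidates is hypothetical and is not part of webkitpy.

```python
def flaky_candidates(first_failing_tests, second_failing_tests):
    """Return the tests that failed in exactly one of the two runs.

    Hypothetical helper, not part of webkitpy. Tests that failed in
    both runs are treated as constant failures and are left out.
    """
    first = set(first_failing_tests)
    second = set(second_failing_tests)
    # Symmetric difference: present in one run's failures but not the other's.
    return sorted(first ^ second)


candidates = flaky_candidates(
    ["fast/a.html", "fast/b.html"],
    ["fast/b.html", "fast/c.html"],
)
```

Here fast/b.html failed both times, so only fast/a.html and fast/c.html remain as candidates.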
Fixing this may also fix bug 50263
We can look at the test ordering and only report a test as flaky if the second run gets further than the first.
Oh, I wasn't even considering that. I figured we'd just intersect the failure lists. But you're right, it's possible we'd still report a "late" constant failure as flaky if an earlier test flaked the second time.
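To make the "late constant failure" case concrete, here is a toy model. It assumes a run stops at the first failing test, which is a simplification for illustration. The function and test names are hypothetical.

```python
def run_until_first_failure(test_order, failing):
    """Toy model of a test run that stops at the first failing test.

    Assumption for illustration only: the real runner's exit policy
    may differ. Returns the list of failures the run reported.
    """
    for test in test_order:
        if test in failing:
            return [test]
    return []


order = ["a.html", "b.html", "z.html"]

# Run 1: only the constant failure z.html fails, late in the order.
first = run_until_first_failure(order, {"z.html"})

# Run 2: a.html flakes early, so the run never reaches z.html.
second = run_until_first_failure(order, {"a.html", "z.html"})

# Comparing the two lists without ordering flags both tests,
# even though z.html fails on every run.
misreported = set(first) ^ set(second)
```

In this model z.html ends up in misreported alongside a.html, even though it fails every time it is reached. Comparing the lists alone cannot tell a test that passed from a test that was never run.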
How would you suggest we look at the test ordering? I'm not sure we have a class to do that.
Eh, we can write an approximate sort function. I'll take a whack at this.
Created attachment 77078 [details] Patch
*** Bug 50263 has been marked as a duplicate of this bug. ***
Comment on attachment 77078 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=77078&action=review

The test ordering code is too fragile. As the project evolves, it's going to be subtly wrong.

> Tools/Scripts/webkitpy/common/net/layouttestresults.py:99
> +    # This is intended to match run-webkit-tests behavior.
> +    @classmethod
> +    def test_order_compare(cls, test1, test2):

Really? I don't think this function is right. What about WebSocket tests?

> Tools/Scripts/webkitpy/tool/bot/commitqueuetask.py:181
> +        compare_result = LayoutTestResults.test_order_compare(first_failures[-1], second_failures[-1])
> +        if compare_result < 0:  # First run was shorter
> +            return different_failures.difference(second_failures)
> +        elif compare_result > 0:
> +            return different_failures.difference(first_failures)

I don't know about this design. You didn't like the idea of only reporting a flake if the second run was all-success?
(In reply to comment #8)
> (From update of attachment 77078 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=77078&action=review
>
> The test ordering code is too fragile. As the project evolves, it's going to be subtly wrong.

I agree, that's a risk. However, if we shared this code with NRWT it would be fine. :)

> > Tools/Scripts/webkitpy/common/net/layouttestresults.py:99
> > +    # This is intended to match run-webkit-tests behavior.
> > +    @classmethod
> > +    def test_order_compare(cls, test1, test2):
>
> Really? I don't think this function is right. What about WebSocket tests?

Again, this should be shared if we're going to go this way.

> > Tools/Scripts/webkitpy/tool/bot/commitqueuetask.py:181
> > +        compare_result = LayoutTestResults.test_order_compare(first_failures[-1], second_failures[-1])
> > +        if compare_result < 0:  # First run was shorter
> > +            return different_failures.difference(second_failures)
> > +        elif compare_result > 0:
> > +            return different_failures.difference(first_failures)
>
> I don't know about this design. You didn't like the idea of only reporting a flake if the second run was all-success?

Oh, that's fine. Certainly simpler. It just produces this state where we retry w/o reporting any flaky tests, which is OK, just less than ideal.
> Oh, that's fine. Certainly simpler. It just produces this state where we retry w/o reporting any flaky tests, which is OK, just less than ideal.

It generates a strange response curve: if things are really, really flaky, we never file bugs, but if things are just a bit flaky, we'll be good at detecting them and filing bugs.
Basically, I ended up writing the test sorting function first, and then fleshed out the other details. The other details ended up rather complicated. :) This is certainly not a simple solution. Then again, flaky tests are not simple to deal with. :)

But I think I'll write a new (much, much simpler) patch which just removes the attempt to report flaky tests when we have a double-flake.
Created attachment 77084 [details] Patch
Comment on attachment 77084 [details] Patch Thanks.
Comment on attachment 77084 [details]
Patch

Rejecting attachment 77084 [details] from commit-queue.

Failed to run "['./Tools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '--bot-id=cr-jail-4', 'build-and-test', '--no-clean', '--no-update', '--test', '--non-interactive']" exit_code: 2

Last 500 characters of output:
t/webkit-commit-queue/Tools/Scripts/webkitpy/tool/bot/commitqueuetask_unittest.py", line 114, in test_update_failure
    self._run_through_task(commit_queue, expected_stderr)
  File "/mnt/git/webkit-commit-queue/Tools/Scripts/webkitpy/tool/bot/commitqueuetask_unittest.py", line 76, in _run_through_task
    self.assertEqual(success, not expect_retry)
AssertionError: False != True

----------------------------------------------------------------------
Ran 755 tests in 19.168s

FAILED (failures=2)

Full output: http://queues.webkit.org/results/7304078
Created attachment 77090 [details] Patch for landing
The commit-queue encountered the following flaky tests while processing attachment 77090 [details]:

fast/preloader/script.html bug 50879 (author: abarth@webkit.org)
animations/play-state-suspend.html bug 50959 (author: cmarrin@apple.com)

The commit-queue is continuing to process your patch.
Comment on attachment 77090 [details]
Patch for landing

Clearing flags on attachment: 77090

Committed r74408: <http://trac.webkit.org/changeset/74408>
All reviewed patches have been landed. Closing bug.