51272 – commit-queue will report constant failures as flaky if other tests flake

RESOLVED FIXED 51272

commit-queue will report constant failures as flaky if other tests flake

https://bugs.webkit.org/show_bug.cgi?id=51272

Summary commit-queue will report constant failures as flaky if other tests flake

Eric Seidel (no email)

Reported 2010-12-17 12:56:05 PST

commit-queue will report constant failures as flaky if other tests flake See https://bugs.webkit.org/show_bug.cgi?id=51236#c9 first_failing_tests = self._failing_tests_from_last_run() if self._test(): self._report_flaky_tests(first_failing_tests) return True second_failing_tests = self._failing_tests_from_last_run() if first_failing_tests != second_failing_tests: self._report_flaky_tests(first_failing_tests + second_failing_tests) return False Notice how if second_failing_tests is different from first_failing_tests, all of them get reported. What should happen is only the differences between first and second should be reported.

Attachments
Patch (7.72 KB, patch) 2010-12-20 20:14 PST, Eric Seidel (no email)	no flags	Details Formatted Diff Diff
Patch (8.44 KB, patch) 2010-12-20 22:19 PST, Eric Seidel (no email)	no flags	Details Formatted Diff Diff
Patch for landing (9.00 KB, patch) 2010-12-21 00:21 PST, Eric Seidel (no email)	no flags	Details Formatted Diff Diff
Show Obsolete (2) View All Add attachment proposed patch, testcase, etc.

Eric Seidel (no email)

Comment 1 2010-12-17 12:56:58 PST

Fixing this may also fix bug 50263

Adam Barth

Comment 2 2010-12-17 13:02:28 PST

We can look at the test ordering and only report a flaky if we get further the second time.

Eric Seidel (no email)

Comment 3 2010-12-17 13:11:30 PST

Oh, I wasn't even considering that. I figured we'd just intersect the failure lists. But you're right, it's possible we'd still report a "late" constant failure as flaky if an earlier test flaked the second time.

Eric Seidel (no email)

Comment 4 2010-12-17 23:49:03 PST

How would you suggest we look at the test ordering? I'm not sure we have a class to do that.

Eric Seidel (no email)

Comment 5 2010-12-17 23:49:46 PST

Eh, we can write an approximate sort function. I'll take a whack at this.

Eric Seidel (no email)

Comment 6 2010-12-20 20:14:10 PST

Created attachment 77078 [details] Patch

Eric Seidel (no email)

Comment 7 2010-12-20 20:35:00 PST

*** Bug 50263 has been marked as a duplicate of this bug. ***

Adam Barth

Comment 8 2010-12-20 21:00:28 PST

Comment on attachment 77078 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=77078&action=review The test ordering code is too fragile. As the project evolves, its going to be subtly wrong. > Tools/Scripts/webkitpy/common/net/layouttestresults.py:99 > + # This is intended to match run-webkit-tests behavior. > + @classmethod > + def test_order_compare(cls, test1, test2): Really? I don't think this function is right. What about WebSocket tests? > Tools/Scripts/webkitpy/tool/bot/commitqueuetask.py:181 > + compare_result = LayoutTestResults.test_order_compare(first_failures[-1], second_failures[-1]) > + if compare_result < 0: # First run was shorter > + return different_failures.difference(second_failures) > + elif compare_result > 0: > + return different_failures.difference(first_failures) I don't know about this design. You didn't like the idea of only reporting a flak if the second run was all-success?

Eric Seidel (no email)

Comment 9 2010-12-20 21:26:30 PST

(In reply to comment #8) > (From update of attachment 77078 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=77078&action=review > > The test ordering code is too fragile. As the project evolves, its going to be subtly wrong. I agree, that's a risk. However, if we shared this code with NRWT it would be fine. :) > > Tools/Scripts/webkitpy/common/net/layouttestresults.py:99 > > + # This is intended to match run-webkit-tests behavior. > > + @classmethod > > + def test_order_compare(cls, test1, test2): > > Really? I don't think this function is right. What about WebSocket tests? Again, should be shared if we're going to go this way. > > Tools/Scripts/webkitpy/tool/bot/commitqueuetask.py:181 > > + compare_result = LayoutTestResults.test_order_compare(first_failures[-1], second_failures[-1]) > > + if compare_result < 0: # First run was shorter > > + return different_failures.difference(second_failures) > > + elif compare_result > 0: > > + return different_failures.difference(first_failures) > > I don't know about this design. You didn't like the idea of only reporting a flak if the second run was all-success? Oh, that's fine. Certainly simpler. It just produces this state where we retry w/o reporting any flaky tests which is OK just less than idea.

Adam Barth

Comment 10 2010-12-20 21:30:38 PST

> Oh, that's fine. Certainly simpler. It just produces this state where we retry w/o reporting any flaky tests which is OK just less than idea. It generates this strange response curve where if things are really really flaky, we never file bugs, but if things are just a bit flaky, then we'll be good at detecting them and filing bugs.

Eric Seidel (no email)

Comment 11 2010-12-20 21:34:39 PST

Basically, I ended up writing the test sorting function first, and then fleshed out the other details. The other details ended up rather complicated. :) This is certainly not a simple solution. Then again, flaky tests are not simple to deal with. :) But I think I'll write a new (much much simpler) patch which just removes the attempt flaky tests when we have a double-flake.

Eric Seidel (no email)

Comment 12 2010-12-20 22:19:07 PST

Created attachment 77084 [details] Patch

Adam Barth

Comment 13 2010-12-20 22:20:32 PST

Comment on attachment 77084 [details] Patch Thanks.

WebKit Commit Bot

Comment 14 2010-12-21 00:06:31 PST

Comment on attachment 77084 [details] Patch Rejecting attachment 77084 [details] from commit-queue. Failed to run "['./Tools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '--bot-id=cr-jail-4', 'build-and-test', '--no-clean', '--no-update', '--test', '--non-interactive']" exit_code: 2 Last 500 characters of output: t/webkit-commit-queue/Tools/Scripts/webkitpy/tool/bot/commitqueuetask_unittest.py", line 114, in test_update_failure self._run_through_task(commit_queue, expected_stderr) File "/mnt/git/webkit-commit-queue/Tools/Scripts/webkitpy/tool/bot/commitqueuetask_unittest.py", line 76, in _run_through_task self.assertEqual(success, not expect_retry) AssertionError: False != True ---------------------------------------------------------------------- Ran 755 tests in 19.168s FAILED (failures=2) Full output: http://queues.webkit.org/results/7304078

Eric Seidel (no email)

Comment 15 2010-12-21 00:21:24 PST

Created attachment 77090 [details] Patch for landing

WebKit Commit Bot

Comment 16 2010-12-21 01:38:20 PST

The commit-queue encountered the following flaky tests while processing attachment 77090 [details]: fast/preloader/script.html bug 50879 (author: abarth@webkit.org) animations/play-state-suspend.html bug 50959 (author: cmarrin@apple.com) The commit-queue is continuing to process your patch.

WebKit Commit Bot

Comment 17 2010-12-21 04:56:00 PST

Comment on attachment 77090 [details] Patch for landing Clearing flags on attachment: 77090 Committed r74408: <http://trac.webkit.org/changeset/74408>

WebKit Commit Bot

Comment 18 2010-12-21 04:56:09 PST

All reviewed patches have been landed. Closing bug.

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware PC

OS OS X 10.5

Product WebKit

Component Tools / Tests

Assignee

Nobody

Reported

2010-12-17 12:56 PST

Modified

2010-12-21 04:56 PST History

CC List

6 users Show

URL

Keywords

Duplicates (1)

50263 View as bug list

Depends on

Blocks

50263

Dependencies

tree graph