Bug 51272 - commit-queue will report constant failures as flaky if other tests flake
Summary: commit-queue will report constant failures as flaky if other tests flake
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests
Version: 528+ (Nightly build)
Hardware: PC OS X 10.5
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords:
Duplicates: 50263
Depends on:
Blocks: 50263
Reported: 2010-12-17 12:56 PST by Eric Seidel (no email)
Modified: 2010-12-21 04:56 PST

See Also:


Attachments
Patch (7.72 KB, patch)
2010-12-20 20:14 PST, Eric Seidel (no email)
Patch (8.44 KB, patch)
2010-12-20 22:19 PST, Eric Seidel (no email)
Patch for landing (9.00 KB, patch)
2010-12-21 00:21 PST, Eric Seidel (no email)

Description Eric Seidel (no email) 2010-12-17 12:56:05 PST
commit-queue will report constant failures as flaky if other tests flake

See https://bugs.webkit.org/show_bug.cgi?id=51236#c9

        first_failing_tests = self._failing_tests_from_last_run()
        if self._test():
            self._report_flaky_tests(first_failing_tests)
            return True

        second_failing_tests = self._failing_tests_from_last_run()
        if first_failing_tests != second_failing_tests:
            self._report_flaky_tests(first_failing_tests + second_failing_tests)
            return False

Notice that if second_failing_tests differs from first_failing_tests, every test from both lists gets reported.  Only the tests that differ between the first and second runs should be reported.
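
A minimal sketch of the reporting described above, assuming the failure lists are plain lists of test-name strings. `flaky_candidates` is a hypothetical helper for illustration, not a function in webkitpy; it keeps only the tests that failed in exactly one of the two runs:

```python
def flaky_candidates(first_failing_tests, second_failing_tests):
    """Return the tests that failed in exactly one of the two runs, sorted.

    Hypothetical helper for illustration; not part of webkitpy.
    """
    first = set(first_failing_tests)
    second = set(second_failing_tests)
    # Symmetric difference: a test that failed in both runs is treated as a
    # constant failure and is left out of the flaky report.
    return sorted(first ^ second)


first_run = ["fast/a.html", "fast/b.html"]
second_run = ["fast/b.html", "fast/c.html"]
print(flaky_candidates(first_run, second_run))  # ['fast/a.html', 'fast/c.html']
```

Here fast/b.html failed both times, so it is not reported as flaky.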
Comment 1 Eric Seidel (no email) 2010-12-17 12:56:58 PST
Fixing this may also fix bug 50263
Comment 2 Adam Barth 2010-12-17 13:02:28 PST
We can look at the test ordering and only report a flaky test if we get further the second time.
Comment 3 Eric Seidel (no email) 2010-12-17 13:11:30 PST
Oh, I wasn't even considering that.  I figured we'd just intersect the failure lists.  But you're right, it's possible we'd still report a "late" constant failure as flaky if an earlier test flaked the second time.
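
The ordering problem can be illustrated with a small simulation. This is only a sketch: it assumes a test run stops after a fixed number of failures, and `run_tests` and the test names are hypothetical, not webkitpy code.

```python
def run_tests(ordered_tests, failing, exit_after_n_failures):
    """Simulate a run that stops once it has seen enough failures.

    Hypothetical model for illustration; the early-exit limit is an assumption.
    """
    failures = []
    for test in ordered_tests:
        if test in failing:
            failures.append(test)
            if len(failures) >= exit_after_n_failures:
                break  # The run never reaches the later tests.
    return failures


tests = ["a.html", "b.html", "z.html"]
# z.html fails constantly; a.html flakes only in the second run.
first = run_tests(tests, {"z.html"}, exit_after_n_failures=1)
second = run_tests(tests, {"a.html", "z.html"}, exit_after_n_failures=1)
print(sorted(set(first) ^ set(second)))  # ['a.html', 'z.html']
```

Because the second run stops at a.html, z.html never shows up in its failure list, so comparing the two lists alone makes the constant failure look flaky.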
Comment 4 Eric Seidel (no email) 2010-12-17 23:49:03 PST
How would you suggest we look at the test ordering?  I'm not sure we have a class to do that.
Comment 5 Eric Seidel (no email) 2010-12-17 23:49:46 PST
Eh, we can write an approximate sort function.  I'll take a whack at this.
Comment 6 Eric Seidel (no email) 2010-12-20 20:14:10 PST
Created attachment 77078 [details]
Patch
Comment 7 Eric Seidel (no email) 2010-12-20 20:35:00 PST
*** Bug 50263 has been marked as a duplicate of this bug. ***
Comment 8 Adam Barth 2010-12-20 21:00:28 PST
Comment on attachment 77078 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=77078&action=review

The test ordering code is too fragile.  As the project evolves, it's going to be subtly wrong.

> Tools/Scripts/webkitpy/common/net/layouttestresults.py:99
> +    # This is intended to match run-webkit-tests behavior.
> +    @classmethod
> +    def test_order_compare(cls, test1, test2):

Really?  I don't think this function is right.  What about WebSocket tests?

> Tools/Scripts/webkitpy/tool/bot/commitqueuetask.py:181
> +        compare_result = LayoutTestResults.test_order_compare(first_failures[-1], second_failures[-1])
> +        if compare_result < 0:  # First run was shorter
> +            return different_failures.difference(second_failures)
> +        elif compare_result > 0:
> +            return different_failures.difference(first_failures)

I don't know about this design.  You didn't like the idea of only reporting a flake if the second run was all-success?
Comment 9 Eric Seidel (no email) 2010-12-20 21:26:30 PST
(In reply to comment #8)
> (From update of attachment 77078 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=77078&action=review
> 
> The test ordering code is too fragile.  As the project evolves, it's going to be subtly wrong.

I agree, that's a risk.  However, if we shared this code with NRWT it would be fine. :)

> > Tools/Scripts/webkitpy/common/net/layouttestresults.py:99
> > +    # This is intended to match run-webkit-tests behavior.
> > +    @classmethod
> > +    def test_order_compare(cls, test1, test2):
> 
> Really?  I don't think this function is right.  What about WebSocket tests?

Again, should be shared if we're going to go this way.

> > Tools/Scripts/webkitpy/tool/bot/commitqueuetask.py:181
> > +        compare_result = LayoutTestResults.test_order_compare(first_failures[-1], second_failures[-1])
> > +        if compare_result < 0:  # First run was shorter
> > +            return different_failures.difference(second_failures)
> > +        elif compare_result > 0:
> > +            return different_failures.difference(first_failures)
> 
> I don't know about this design.  You didn't like the idea of only reporting a flake if the second run was all-success?

Oh, that's fine.  Certainly simpler.  It just produces this state where we retry w/o reporting any flaky tests, which is OK, just less than ideal.
Comment 10 Adam Barth 2010-12-20 21:30:38 PST
> Oh, that's fine.  Certainly simpler.  It just produces this state where we retry w/o reporting any flaky tests, which is OK, just less than ideal.

It generates this strange response curve where if things are really really flaky, we never file bugs, but if things are just a bit flaky, then we'll be good at detecting them and filing bugs.
Comment 11 Eric Seidel (no email) 2010-12-20 21:34:39 PST
Basically, I ended up writing the test sorting function first, and then fleshed out the other details.  The other details ended up rather complicated. :)

This is certainly not a simple solution.  Then again, flaky tests are not simple to deal with. :)  But I think I'll write a new (much much simpler) patch which just removes the attempt to report flaky tests when we have a double-flake.
Comment 12 Eric Seidel (no email) 2010-12-20 22:19:07 PST
Created attachment 77084 [details]
Patch
Comment 13 Adam Barth 2010-12-20 22:20:32 PST
Comment on attachment 77084 [details]
Patch

Thanks.
Comment 14 WebKit Commit Bot 2010-12-21 00:06:31 PST
Comment on attachment 77084 [details]
Patch

Rejecting attachment 77084 [details] from commit-queue.

Failed to run "['./Tools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '--bot-id=cr-jail-4', 'build-and-test', '--no-clean', '--no-update', '--test', '--non-interactive']" exit_code: 2
Last 500 characters of output:
t/webkit-commit-queue/Tools/Scripts/webkitpy/tool/bot/commitqueuetask_unittest.py", line 114, in test_update_failure
    self._run_through_task(commit_queue, expected_stderr)
  File "/mnt/git/webkit-commit-queue/Tools/Scripts/webkitpy/tool/bot/commitqueuetask_unittest.py", line 76, in _run_through_task
    self.assertEqual(success, not expect_retry)
AssertionError: False != True

----------------------------------------------------------------------
Ran 755 tests in 19.168s

FAILED (failures=2)

Full output: http://queues.webkit.org/results/7304078
Comment 15 Eric Seidel (no email) 2010-12-21 00:21:24 PST
Created attachment 77090 [details]
Patch for landing
Comment 16 WebKit Commit Bot 2010-12-21 01:38:20 PST
The commit-queue encountered the following flaky tests while processing attachment 77090 [details]:

fast/preloader/script.html bug 50879 (author: abarth@webkit.org)
animations/play-state-suspend.html bug 50959 (author: cmarrin@apple.com)
The commit-queue is continuing to process your patch.
Comment 17 WebKit Commit Bot 2010-12-21 04:56:00 PST
Comment on attachment 77090 [details]
Patch for landing

Clearing flags on attachment: 77090

Committed r74408: <http://trac.webkit.org/changeset/74408>
Comment 18 WebKit Commit Bot 2010-12-21 04:56:09 PST
All reviewed patches have been landed.  Closing bug.