Bug 138217 - PatchAnalysisTask._test_patch() should have more tests that examine cases where the failure limit is exceeded.
Summary: PatchAnalysisTask._test_patch() should have more tests that examine cases whe...
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Srinivasan Vijayaraghavan
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-10-30 11:41 PDT by Jake Nielsen
Modified: 2016-04-23 13:35 PDT (History)
8 users (show)

See Also:


Attachments
Adds more test cases, and fixes a bug. (10.44 KB, patch)
2014-10-30 13:32 PDT, Jake Nielsen
dbates: review-
dbates: commit-queue-
Details | Formatted Diff | Diff
Removes useless test (9.97 KB, patch)
2014-11-07 15:10 PST, Jake Nielsen
dbates: review-
dbates: commit-queue-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Jake Nielsen 2014-10-30 11:41:33 PDT
commitqueuetask_unittest.py tests this code path. Add more tests there.
Comment 1 Jake Nielsen 2014-10-30 13:32:09 PDT
Created attachment 240691 [details]
Adds more test cases, and fixes a bug.
Comment 2 Alexey Proskuryakov 2014-10-30 13:39:02 PDT
Comment on attachment 240691 [details]
Adds more test cases, and fixes a bug.

View in context: https://bugs.webkit.org/attachment.cgi?id=240691&action=review

> Tools/Scripts/webkitpy/tool/bot/commitqueuetask_unittest.py:524
> +            first_test_failures=[],
> +            second_test_failures=[],
> +            clean_test_failures=_lots_of_failing_tests)
> +
> +        self._run_and_expect_patch_analysis_result(commit_queue, PatchAnalysisResult.PASS)

We shouldn't be testing clean tests in this case, should we?
Comment 3 Jake Nielsen 2014-10-30 13:41:51 PDT
(In reply to comment #2)
> Comment on attachment 240691 [details]
> Adds more test cases, and fixes a bug.
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=240691&action=review
> 
> > Tools/Scripts/webkitpy/tool/bot/commitqueuetask_unittest.py:524
> > +            first_test_failures=[],
> > +            second_test_failures=[],
> > +            clean_test_failures=_lots_of_failing_tests)
> > +
> > +        self._run_and_expect_patch_analysis_result(commit_queue, PatchAnalysisResult.PASS)
> 
> We shouldn't be testing clean tests in this case, should we?
expect_clean_tests_to_run defaults to False, I can make it explicit if you want though.
Comment 4 Jake Nielsen 2014-10-31 11:42:53 PDT
bump
Comment 5 Jake Nielsen 2014-11-07 11:34:25 PST
double bump
Comment 6 Ryosuke Niwa 2014-11-07 13:53:48 PST
Comment on attachment 240691 [details]
Adds more test cases, and fixes a bug.

rs=me
Comment 7 Daniel Bates 2014-11-07 14:07:19 PST
Comment on attachment 240691 [details]
Adds more test cases, and fixes a bug.

View in context: https://bugs.webkit.org/attachment.cgi?id=240691&action=review

>>> Tools/Scripts/webkitpy/tool/bot/commitqueuetask_unittest.py:524
>>> +        self._run_and_expect_patch_analysis_result(commit_queue, PatchAnalysisResult.PASS)
>> 
>> We shouldn't be testing clean tests in this case, should we?
> 
> expect_clean_tests_to_run defaults to False, I can make it explicit if you want though.

From my understanding, Alexey is remarking about the usefulness of this unit test given that we will neither build and run layout tests on the "patch" a second time (*) nor build and run layout tests without the "patch" (since there were no test failures when we built and tested the "patch" on the first try); => only the value of the first argument, first_test_failures, is important for the purpose of this test; => the name of this function is a misnomer. One way to resolve this issue is to rename this function to better describe its purpose, maybe test_patch_passes_on_first_try? You may also want to consider adding a comment and/or pick the empty list for the value of argument clean_test_failures to imply to the reader the arbitrariness of the value of the arguments second_test_failures and clean_test_failures.

On another note, I'm unclear how beneficial this test case is given that we have the test case CommitQueueTaskTest.test_success_case(). 

(*) You commented on this behavior on line 510 of this patch. Is this comment incorrect?
Comment 8 Daniel Bates 2014-11-07 14:18:14 PST
Comment on attachment 240691 [details]
Adds more test cases, and fixes a bug.

View in context: https://bugs.webkit.org/attachment.cgi?id=240691&action=review

> Tools/Scripts/webkitpy/tool/bot/commitqueuetask_unittest.py:532
> +        # Unfortunately there are cases where the clean build will randomly fail enough tests to hit the failure limit.

Can you provide a concrete example? I know we may not know an exact cause, but do we have an idea what the cause is when the a clean build randomly fails enough tests to hit the failure limit? Do we have so many non-skipped flaky tests?
Comment 9 Jake Nielsen 2014-11-07 15:03:22 PST
(In reply to comment #7)
> Comment on attachment 240691 [details]
> Adds more test cases, and fixes a bug.
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=240691&action=review
> 
> >>> Tools/Scripts/webkitpy/tool/bot/commitqueuetask_unittest.py:524
> >>> +        self._run_and_expect_patch_analysis_result(commit_queue, PatchAnalysisResult.PASS)
> >> 
> >> We shouldn't be testing clean tests in this case, should we?
> > 
> > expect_clean_tests_to_run defaults to False, I can make it explicit if you want though.
> 
> From my understanding, Alexey is remarking about the usefulness of this unit
> test given that we will neither build and run layout tests on the "patch" a
> second time (*) nor build and run layout tests without the "patch" (since
> there were no test failures when we built and tested the "patch" on the
> first try); => only the value of the first argument, first_test_failures, is
> important for the purpose of this test; => the name of this function is a
> misnomer. One way to resolve this issue is to rename this function to better
> describe its purpose, maybe test_patch_passes_on_first_try? You may also
> want to consider adding a comment and/or pick the empty list for the value
> of argument clean_test_failures to imply to the reader the arbitrariness of
> the value of the arguments second_test_failures and clean_test_failures.
> 
> On another note, I'm unclear how beneficial this test case is given that we
> have the test case CommitQueueTaskTest.test_success_case().
That's a good point, I'll take it out.
> 
> (*) You commented on this behavior on line 510 of this patch. Is this
> comment incorrect?
Yes, this comment is correct. The test case in question (test_tree_failure_limit_with_patch_that_fixes_redness) ensures the correctness of the comment, but as you pointed out, so does test_success_case.
Comment 10 Jake Nielsen 2014-11-07 15:05:53 PST
(In reply to comment #8)
> Comment on attachment 240691 [details]
> Adds more test cases, and fixes a bug.
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=240691&action=review
> 
> > Tools/Scripts/webkitpy/tool/bot/commitqueuetask_unittest.py:532
> > +        # Unfortunately there are cases where the clean build will randomly fail enough tests to hit the failure limit.
> 
> Can you provide a concrete example? I know we may not know an exact cause,
> but do we have an idea what the cause is when the a clean build randomly
> fails enough tests to hit the failure limit? Do we have so many non-skipped
> flaky tests?
Alexey mentioned this case to me in person. As I recall there's a case where the browser can get resized and essentially any test that expects elements to end up at a specific location will fail. I may be misunderstanding/misremembering. 
Did I get that right Alexey?
Comment 11 Jake Nielsen 2014-11-07 15:10:20 PST
Created attachment 241211 [details]
Removes useless test
Comment 12 Alexey Proskuryakov 2014-11-07 15:26:13 PST
It's not flaky tests, it's some flakiness in WebKitTestRunner, or WindowServer, or who know where. See e.g. <https://build.webkit.org/results/Apple%20Mavericks%20Debug%20WK1%20(Tests)/r175752%20(8770)/results.html> - that happens multiple times per week. People were looking into this, but didn't find the root cause yet.
Comment 13 Daniel Bates 2016-04-23 13:18:07 PDT
Comment on attachment 241211 [details]
Removes useless test

View in context: https://bugs.webkit.org/attachment.cgi?id=241211&action=review

This patch is reasonable. It seems excessive to represent _lots_of_failing_tests with 100 failing tests. It seems sufficient to represents this with three failures and use the name of the variable to convey that it represents many test failures. If we make _lots_of_failing_tests be an array of three test names then we should update the other test cases such that we pass an array with less than three entries for {first, second, clean}_test_failures as appropriate for the test.

> Tools/Scripts/webkitpy/tool/bot/commitqueuetask_unittest.py:518
> +    def test_tree_failure_limit_with_patch_that_potentially_fixes_some_redness(self):

How does this test case represent a patch that "potentially fixes some redness"? I mean, the patch fails the same tests in both the first and second tries. The result of the clean build represents an anomaly that we should not experience in theory, but seems to hit in practice according to you and Alexey. We should determine the root cause and fix this anomaly. I can understand the motivation for including a test for this anomaly. I do not see the benefit of this test case other than to serve as documentation that the anomaly can occur. This test case is logically not sound. If the purpose of this test is for documentation purposes then it would be better to document this anomaly in a bug. So that we can fix track it in a searchable manner and ultimately fix it. I do not see the benefit of validating in the webkitpy code that we can handle such a case because if a machine were to get into a state materially consistent with the state excised by this test case (maybe a cosmic ray hit it?) then the machine is messed up and I do not see how we can trust results from it, including results from the continued execution of the webkitpy code.
Comment 14 Daniel Bates 2016-04-23 13:23:29 PDT
(In reply to comment #13)
> Comment on attachment 241211 [details]
> Removes useless test
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=241211&action=review
> 
> This patch is reasonable. [...]

To clarify, I'm r-'ing this patch because this patch is several month old and I would like to see a rebased patch. I would also like to see us simplify this test based on remark in the first paragraph of comment #13 and I would like us to consider either renaming or removing the the test case test_tree_failure_limit_with_patch_that_potentially_fixes_some_redness().