The TestFailures page blames arbitrary revisions for breaking flaky tests. It would be a lot more helpful if it would instead identify the tests as flaky.
<rdar://problem/9452796>
Created attachment 99100 [details] Teach TestFailures to detect possibly flaky tests and list them separately
Comment on attachment 99100 [details] Teach TestFailures to detect possibly flaky tests and list them separately Unofficial r=me! Excited to see this land.
Comment on attachment 99100 [details] Teach TestFailures to detect possibly flaky tests and list them separately View in context: https://bugs.webkit.org/attachment.cgi?id=99100&action=review This looks sane to me. > Tools/BuildSlaveSupport/build.webkit.org-config/public_html/TestFailures/LayoutTestHistoryAnalyzer.js:36 > * Preiodically calls callback until all current failures have been explained. Callback is Preiodically => Periodically (Just thought to mention the misspelling even though this text isn't part of the actual changes) > Tools/BuildSlaveSupport/build.webkit.org-config/public_html/TestFailures/LayoutTestHistoryAnalyzer.js:66 > * Each build contains just the failures that a) are still occuring on the bots, and b) were new occuring => occurring (Just thought to mention the misspelling even though this text isn't part of the actual changes) > Tools/BuildSlaveSupport/build.webkit.org-config/public_html/TestFailures/LayoutTestHistoryAnalyzer.js:133 > + const minimumRequiredTestRunsWithoutInterestingChanges = 5; Note: IE doesn't support const (*). I take it we don't expect to access this web app from IE. (*) The decision against const support in IE9 was remarked in <http://blogs.msdn.com/b/ie/archive/2010/08/25/chakra-interoperability-means-more-than-just-standards.aspx>.
View in context: https://bugs.webkit.org/attachment.cgi?id=99100&action=review > Tools/BuildSlaveSupport/build.webkit.org-config/public_html/TestFailures/FlakyLayoutTestDetector.js:35 > + return false; Since the expected output of this function is an Array, it might make more sense to return null (still falsey). > Tools/BuildSlaveSupport/build.webkit.org-config/public_html/TestFailures/LayoutTestHistoryAnalyzer.js:108 > + var newFlakyTests = self._flakinessDetector.incorporateTestResults(nextBuildName, tests, tooManyFailures); > + if (newFlakyTests.length) { If there were tooManyFailures, then this would return "false" (or maybe "null") in the future and the .length check would through an error. This should probably become: if (newFlakyTests && newFlakyTests.length) { ... } > Tools/ChangeLog:31 > + We tell our caller to keep calling until all current failures have been explained and we've > + gone through 5 builds without any new flaky tests being identified. How far back does this normally go? Just 5-10 builds? 100 builds? Is there a limit on how far back this looks or could it go back to the start (if things were flaky).
(In reply to comment #5) > View in context: https://bugs.webkit.org/attachment.cgi?id=99100&action=review > > > Tools/BuildSlaveSupport/build.webkit.org-config/public_html/TestFailures/FlakyLayoutTestDetector.js:35 > > + return false; > > Since the expected output of this function is an Array, it might make more sense to return null (still falsey). > > > Tools/BuildSlaveSupport/build.webkit.org-config/public_html/TestFailures/LayoutTestHistoryAnalyzer.js:108 > > + var newFlakyTests = self._flakinessDetector.incorporateTestResults(nextBuildName, tests, tooManyFailures); > > + if (newFlakyTests.length) { > > If there were tooManyFailures, then this would return "false" (or maybe "null") in the future and the .length check would through an error. This should probably become: > > if (newFlakyTests && newFlakyTests.length) { ... } I think I'll change to returning an empty array in this case. The false is leftover from an earlier version of this function that just returned a boolean. > > Tools/ChangeLog:31 > > + We tell our caller to keep calling until all current failures have been explained and we've > > + gone through 5 builds without any new flaky tests being identified. > > How far back does this normally go? Just 5-10 builds? 100 builds? Is there a limit on how far back this looks or could it go back to the start (if things were flaky). It could go all the way back to the start. In practice it only goes back fewer than 20 builds in most cases.
Comment on attachment 99100 [details] Teach TestFailures to detect possibly flaky tests and list them separately View in context: https://bugs.webkit.org/attachment.cgi?id=99100&action=review >> Tools/BuildSlaveSupport/build.webkit.org-config/public_html/TestFailures/LayoutTestHistoryAnalyzer.js:36 >> * Preiodically calls callback until all current failures have been explained. Callback is > > Preiodically => Periodically > > (Just thought to mention the misspelling even though this text isn't part of the actual changes) I'll fix this. >> Tools/BuildSlaveSupport/build.webkit.org-config/public_html/TestFailures/LayoutTestHistoryAnalyzer.js:66 >> * Each build contains just the failures that a) are still occuring on the bots, and b) were new > > occuring => occurring > > (Just thought to mention the misspelling even though this text isn't part of the actual changes) I'll fix this. >> Tools/BuildSlaveSupport/build.webkit.org-config/public_html/TestFailures/LayoutTestHistoryAnalyzer.js:133 >> + const minimumRequiredTestRunsWithoutInterestingChanges = 5; > > Note: IE doesn't support const (*). I take it we don't expect to access this web app from IE. > > (*) The decision against const support in IE9 was remarked in <http://blogs.msdn.com/b/ie/archive/2010/08/25/chakra-interoperability-means-more-than-just-standards.aspx>. Correct.
Committed r90054: <http://trac.webkit.org/changeset/90054>