Bug 203698

Summary: [EWS] Layout testers can go in an infinite RETRY loop when there are 30+ failures on trunk
Product: WebKit Reporter: Aakash Jain <aakash_jain>
Component: Tools / TestsAssignee: Aakash Jain <aakash_jain>
Status: RESOLVED FIXED    
Severity: Normal CC: aakash_jain, ap, ddkilzer, jbedard, ryanhaddad, webkit-bot-watchers-bugzilla, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: Other   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugs.webkit.org/show_bug.cgi?id=212755
https://bugs.webkit.org/show_bug.cgi?id=216046
https://bugs.webkit.org/show_bug.cgi?id=216487
Attachments:
Description Flags
Patch
none
Patch for landing none

Aakash Jain
Reported 2019-10-31 14:28:23 PDT
Layout testers can go in an infinite RETRY loop when there are 30+ build failures on trunk. 30 is the layout-test failure limit we use in ews (--exit-after-n-failures 30). In case there are 30+ failures on trunk, layout-tests, re-run-layout-tests, and run-layout-tests-without-patch will all fail. EWS will simply retry the build, since in case of so many failures, EWS can't reliably tell if the patch introduce any new failures or not. However, the retried build will re-use the already built archive (built on a builder queue) and test using the same revision (on which builder queue built the patch). It will not build again with ToT. (This was done to save the building time) However, since EWS will keep retrying the build on same revision, it will keep failing. Even if the failures are fixed in ToT later, the tester queue will keep retrying the same patch on a given revision, and keep failing, and wouldn't be able to automatically recover (until the patch becomes obsolete, r- or bug is closed). For example, all of the following are retry of same patch on same revision: https://ews-build.webkit.org/#/builders/24/builds/3152 https://ews-build.webkit.org/#/builders/24/builds/3162 https://ews-build.webkit.org/#/builders/24/builds/3168 https://ews-build.webkit.org/#/builders/24/builds/3174 https://ews-build.webkit.org/#/builders/24/builds/3181 https://ews-build.webkit.org/#/builders/24/builds/3189 https://ews-build.webkit.org/#/builders/24/builds/3195
Attachments
Patch (13.45 KB, patch)
2019-11-22 15:23 PST, Aakash Jain
no flags
Patch for landing (44.37 KB, patch)
2020-09-14 09:42 PDT, Aakash Jain
no flags
Aakash Jain
Comment 2 2019-11-22 15:23:42 PST
Jonathan Bedard
Comment 3 2019-11-22 17:44:26 PST
Comment on attachment 384207 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=384207&action=review > Tools/BuildSlaveSupport/ews-build/loadConfig.py:142 > + if not isTriggerUsedByAnyBuilder(config, scheduler['name']) and 'build' not in scheduler['name'].lower(): Is this saying that testers cannot trigger other testers? > Tools/BuildSlaveSupport/ews-build/steps.py:494 > + self.include_revision = include_revision What is 'revision' for EWS? The git hash? > Tools/BuildSlaveSupport/ews-build/steps.py:1412 > + self.finished(SUCCESS) Won't this turn the bubble green for a bit? Also, we still have an infinite loop problem is someone broke the build, right?
Aakash Jain
Comment 4 2019-11-22 18:06:00 PST
Comment on attachment 384207 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=384207&action=review >> Tools/BuildSlaveSupport/ews-build/loadConfig.py:142 >> + if not isTriggerUsedByAnyBuilder(config, scheduler['name']) and 'build' not in scheduler['name'].lower(): > > Is this saying that testers cannot trigger other testers? This was testing that we do not have any unused triggers in config.json. But now it's possible to have a valid case of triggers which no-one in config.json will use directly (since these triggers would be triggered dynamically). Effectively yes, we don't have such a scenario (testers triggering other testers) yet. >> Tools/BuildSlaveSupport/ews-build/steps.py:494 >> + self.include_revision = include_revision > > What is 'revision' for EWS? The git hash? yeah, it's git hash >> Tools/BuildSlaveSupport/ews-build/steps.py:1412 >> + self.finished(SUCCESS) > > Won't this turn the bubble green for a bit? > > Also, we still have an infinite loop problem is someone broke the build, right? This will turn only the step green. Build would be red/FAILED because of previous layout-test steps failing.
Radar WebKit Bug Importer
Comment 5 2020-07-27 05:29:43 PDT
Jonathan Bedard
Comment 6 2020-09-04 11:30:50 PDT
Comment on attachment 384207 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=384207&action=review > Tools/BuildSlaveSupport/ews-build/config.json:349 > + "triggered_by": ["ios-13-sim-build-ews"], This makes sense to me, but I don't understand why this isn't already here, since ios-simulator-13 has been triggered by ios-13-sim-build-ews for months.
Aakash Jain
Comment 7 2020-09-04 11:40:55 PDT
Comment on attachment 384207 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=384207&action=review >> Tools/BuildSlaveSupport/ews-build/config.json:349 >> + "triggered_by": ["ios-13-sim-build-ews"], > > This makes sense to me, but I don't understand why this isn't already here, since ios-simulator-13 has been triggered by ios-13-sim-build-ews for months. This 'triggered_by' key is used just for passing information. I wasn't able to dynamically extract this information (about who triggered current build) using buildbot. So I am passing this information in a static manner. This information is used by tester build, to trigger back the builder. Yes, iOS-13-Simulator-WK2-Tests-EWS is triggered by iOS-13-Simulator-Build-EWS using the 'triggers' key.
Jonathan Bedard
Comment 8 2020-09-04 12:43:13 PDT
Comment on attachment 384207 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=384207&action=review >>> Tools/BuildSlaveSupport/ews-build/config.json:349 >>> + "triggered_by": ["ios-13-sim-build-ews"], >> >> This makes sense to me, but I don't understand why this isn't already here, since ios-simulator-13 has been triggered by ios-13-sim-build-ews for months. > > This 'triggered_by' key is used just for passing information. I wasn't able to dynamically extract this information (about who triggered current build) using buildbot. So I am passing this information in a static manner. This information is used by tester build, to trigger back the builder. > > Yes, iOS-13-Simulator-WK2-Tests-EWS is triggered by iOS-13-Simulator-Build-EWS using the 'triggers' key. That's unfortunate, but does answer my question!
Aakash Jain
Comment 9 2020-09-14 09:42:24 PDT
Created attachment 408716 [details] Patch for landing
Aakash Jain
Comment 10 2020-09-14 09:48:26 PDT
Comment on attachment 408716 [details] Patch for landing View in context: https://bugs.webkit.org/attachment.cgi?id=408716&action=review Did some more testing, and updated patch accordingly. Also re-based it on ToT. > Tools/BuildSlaveSupport/ews-app/ews/views/statusbubble.py:156 > + bubble['state'] = 'provisional-fail' Updated status-bubble code here to ensure that the status-bubble is orange when the build is waiting to be retried. > Tools/BuildSlaveSupport/ews-build/factories_unittest.py:50 > def test_generic_factory(self): Updated factories_unittest.py accordingly. > Tools/BuildSlaveSupport/ews-build/steps.py:2186 > + self.setProperty('build_summary', message) Added this line to ensure that the buildbot build summary is set properly.
Aakash Jain
Comment 11 2020-09-14 09:52:27 PDT
Sample run: Builder build: https://ews-build.webkit-uat.org/#/builders/14/builds/1594 Corresponding tester build: https://ews-build.webkit-uat.org/#/builders/16/builds/122, it correctly triggered back the builder in step #30. Re-triggered builder build: https://ews-build.webkit-uat.org/#/builders/14/builds/1607 , it correctly updated to ToT and triggered tester with ToT.
Aakash Jain
Comment 12 2020-09-14 09:55:33 PDT
Corresponding status-bubble for ios-wk2 is correctly orange and hover-over message is also correct in https://ews.webkit-uat.org/status-bubble/407969/
EWS
Comment 13 2020-09-14 10:47:38 PDT
Committed r267022: <https://trac.webkit.org/changeset/267022> All reviewed patches have been landed. Closing bug and clearing flags on attachment 408716 [details].
Aakash Jain
Comment 14 2020-09-18 10:14:04 PDT
This change seems to be working fine. e.g.: https://ews-build.webkit.org/#/builders/24/builds/26185 failed with 30+ failures. It correctly re-trigerred the parent builder in: https://ews-build.webkit.org/#/builders/23/builds/27400. This correctly triggered the tester again with ToT in https://ews-build.webkit.org/#/builders/24/builds/26223, which passed.
David Kilzer (:ddkilzer)
Comment 15 2020-09-22 10:03:46 PDT
(In reply to Aakash Jain from comment #14) > This change seems to be working fine. > > e.g.: https://ews-build.webkit.org/#/builders/24/builds/26185 failed with > 30+ failures. It correctly re-trigerred the parent builder in: > https://ews-build.webkit.org/#/builders/23/builds/27400. This correctly > triggered the tester again with ToT in > https://ews-build.webkit.org/#/builders/24/builds/26223, which passed. Awesome! Thanks Aakash!
Note You need to log in before you can comment on or make changes to this bug.