Bug 202424 - [ews] Add API endpoint to retry failed builds for a patch
Summary: [ews] Add API endpoint to retry failed builds for a patch
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests (show other bugs)
Version: Other
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Aakash Jain
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2019-10-01 12:37 PDT by Aakash Jain
Modified: 2019-10-02 13:08 PDT (History)
4 users (show)

See Also:


Attachments
Patch (7.05 KB, patch)
2019-10-01 13:27 PDT, Aakash Jain
no flags Details | Formatted Diff | Diff
Patch (7.08 KB, patch)
2019-10-02 11:37 PDT, Aakash Jain
jbedard: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Aakash Jain 2019-10-01 12:37:57 PDT
Add API endpoint to retry failed builds for a patch. This would allow us to implement the functionality required in Bug 196599: 'EWS should have a way to retry a patch'
Comment 1 Aakash Jain 2019-10-01 13:27:54 PDT
Created attachment 379942 [details]
Patch
Comment 2 Jonathan Bedard 2019-10-01 14:18:43 PDT
Comment on attachment 379942 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=379942&action=review

> Tools/BuildSlaveSupport/ews-app/ews/views/retrypatch.py:48
> +        except:

Can we be be more specific about the kind of errors we're catching here? At the moment, we're catching everything.

> Tools/BuildSlaveSupport/ews-app/ews/views/retrypatch.py:51
> +        builds_to_retry = StatusBubble().find_failed_builds_for_patch(patch_id)

I understand the logic here, I just question it's wisdom. We're automatically retrying only the builds which failed, not the successful ones. Particularly with layout tests, I'm not sure this is a good idea....imagine a case where someone lands a patch which fixes your patch on iOS, but breaks it on MacOS.

> Tools/BuildSlaveSupport/ews-app/ews/views/retrypatch.py:58
> +            rc = Buildbot.retry_build(build.builder_id, build.number)

I don't think rc is very useful here.

> Tools/BuildSlaveSupport/ews-app/ews/views/statusbubble.py:212
> +    def find_failed_builds_for_patch(self, patch_id):

What are our plans to test this function? I feel like we should be able to test this function now.
Comment 3 Aakash Jain 2019-10-02 05:14:25 PDT
Comment on attachment 379942 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=379942&action=review

>> Tools/BuildSlaveSupport/ews-app/ews/views/retrypatch.py:48
>> +        except:
> 
> Can we be be more specific about the kind of errors we're catching here? At the moment, we're catching everything.

This is to catch any errors in converting patch_id to integer. It can be ValueError(when patch_id is not a integer), or TypeError (when patch_id is missing).

I agree that it's good to catch specific exceptions. However, in this case being generic also seems fine, since if we can't get the patch_id for any reason, there is no point in continuing or even crashing.

>> Tools/BuildSlaveSupport/ews-app/ews/views/retrypatch.py:51
>> +        builds_to_retry = StatusBubble().find_failed_builds_for_patch(patch_id)
> 
> I understand the logic here, I just question it's wisdom. We're automatically retrying only the builds which failed, not the successful ones. Particularly with layout tests, I'm not sure this is a good idea....imagine a case where someone lands a patch which fixes your patch on iOS, but breaks it on MacOS.

This one is not an automatic retry. This is a manual retry, done by pressing a button by engineers. (See #4 in https://lists.webkit.org/pipermail/webkit-dev/2019-September/030804.html)

Note that EWS is run on every patch for every configuration by default. If some EWS bubbles are red for a given patch, and any engineer believes that EWS is incorrect (maybe because of flaky tests, infrastructure issue, patch being part of patch-series which was processed too soon etc.), then he/she can press the 'Retry failed builds' button to retry those builds.

A new uploaded patch (e.g.: to fix previous failures) is always processed on every EWS queue by default.

>> Tools/BuildSlaveSupport/ews-app/ews/views/retrypatch.py:58
>> +            rc = Buildbot.retry_build(build.builder_id, build.number)
> 
> I don't think rc is very useful here.

The other option I considered was following (but didn't find it as readable as this one). Let me know if you prefer this.

if not Buildbot.retry_build(build.builder_id, build.number):

>> Tools/BuildSlaveSupport/ews-app/ews/views/statusbubble.py:212
>> +    def find_failed_builds_for_patch(self, patch_id):
> 
> What are our plans to test this function? I feel like we should be able to test this function now.

I tested in manually with various scenarios (using http://ews.webkit-uat.org/retry/). We don't have unit-tests for django app yet. I will add unit-tests separately in https://bugs.webkit.org/show_bug.cgi?id=202452
Comment 4 Jonathan Bedard 2019-10-02 07:38:14 PDT
Comment on attachment 379942 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=379942&action=review

>>> Tools/BuildSlaveSupport/ews-app/ews/views/retrypatch.py:48
>>> +        except:
>> 
>> Can we be be more specific about the kind of errors we're catching here? At the moment, we're catching everything.
> 
> This is to catch any errors in converting patch_id to integer. It can be ValueError(when patch_id is not a integer), or TypeError (when patch_id is missing).
> 
> I agree that it's good to catch specific exceptions. However, in this case being generic also seems fine, since if we can't get the patch_id for any reason, there is no point in continuing or even crashing.

I think the bigger problem is that since Python is interpreted, this will also catch syntax errors. In that case, the response will not be indicative of the actual error.

>>> Tools/BuildSlaveSupport/ews-app/ews/views/retrypatch.py:51
>>> +        builds_to_retry = StatusBubble().find_failed_builds_for_patch(patch_id)
>> 
>> I understand the logic here, I just question it's wisdom. We're automatically retrying only the builds which failed, not the successful ones. Particularly with layout tests, I'm not sure this is a good idea....imagine a case where someone lands a patch which fixes your patch on iOS, but breaks it on MacOS.
> 
> This one is not an automatic retry. This is a manual retry, done by pressing a button by engineers. (See #4 in https://lists.webkit.org/pipermail/webkit-dev/2019-September/030804.html)
> 
> Note that EWS is run on every patch for every configuration by default. If some EWS bubbles are red for a given patch, and any engineer believes that EWS is incorrect (maybe because of flaky tests, infrastructure issue, patch being part of patch-series which was processed too soon etc.), then he/she can press the 'Retry failed builds' button to retry those builds.
> 
> A new uploaded patch (e.g.: to fix previous failures) is always processed on every EWS queue by default.

I'm ok with this as is, but we should keep an eye on it. I can see it potentially being a problem.

>>> Tools/BuildSlaveSupport/ews-app/ews/views/retrypatch.py:58
>>> +            rc = Buildbot.retry_build(build.builder_id, build.number)
>> 
>> I don't think rc is very useful here.
> 
> The other option I considered was following (but didn't find it as readable as this one). Let me know if you prefer this.
> 
> if not Buildbot.retry_build(build.builder_id, build.number):

I would prefer this other option, but not particularly strongly. (Also note that the WebKit Style guide would say to use 'not rc' instead of 'rc != True'
Comment 5 Aakash Jain 2019-10-02 11:37:29 PDT
Created attachment 380043 [details]
Patch
Comment 6 Aakash Jain 2019-10-02 13:07:41 PDT
Committed r250623: <https://trac.webkit.org/changeset/250623>
Comment 7 Radar WebKit Bug Importer 2019-10-02 13:08:19 PDT
<rdar://problem/55921240>