Bug 210425

Summary: Add slack-aware WKR implementation
Product: WebKit Reporter: Yusuke Suzuki <ysuzuki>
Component: Tools / TestsAssignee: Yusuke Suzuki <ysuzuki>
Status: RESOLVED FIXED    
Severity: Normal CC: aakash_jain, ap, don.olmstead, jbedard, lingho, mark.lam, repstein, rniwa, ryanhaddad, saam, webkit-bot-watchers-bugzilla, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 211682, 211685, 211687, 211688, 211689, 211690, 211710, 214175    
Bug Blocks:    
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch ap: review+

Description Yusuke Suzuki 2020-04-13 07:50:42 PDT
Currently, it is running in my home server :P
Comment 1 Yusuke Suzuki 2020-04-13 08:17:04 PDT
Created attachment 396281 [details]
Patch
Comment 2 Yusuke Suzuki 2020-04-13 08:24:17 PDT
Created attachment 396285 [details]
Patch
Comment 3 Yusuke Suzuki 2020-04-13 08:27:20 PDT
Created attachment 396286 [details]
Patch
Comment 4 Yusuke Suzuki 2020-04-13 08:37:04 PDT
The resulted message is something like this. https://webkit.slack.com/archives/CU5MVMDT6/p1586791477092600

We can extend it further, but for now, this patch is focusing on implementing the mechanism to post things :)
Comment 5 Saam Barati 2020-04-13 09:39:50 PDT
Comment on attachment 396286 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=396286&action=review

> Tools/ChangeLog:3
> +        Add slack-aware WKR implementation

What does WKR stand for?

> Tools/WKR/WKR.mjs:31
> +import axios from "axios"

Is this a slack package?

> Tools/WKR/WKR.mjs:65
> +const contributors = new Contributors(JSON.parse(fs.readFileSync(path.join(rootDirectory, "..", "Scripts/webkitpy/common/config/contributors.json"))));

It almost feels like this should be pulled from the web on each commit

> Tools/WKR/WKR.mjs:197
> +        this.revision = await storage.getItem("revision");

How does this get bootstrapped? Or does it just start as undefined/null and handled with that branch above?
Comment 6 Yusuke Suzuki 2020-04-13 09:43:23 PDT
Comment on attachment 396286 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=396286&action=review

>> Tools/ChangeLog:3
>> +        Add slack-aware WKR implementation
> 
> What does WKR stand for?

TBH, I'm not sure too :)
Historically, bot posting changes to IRC, was named as WKR. So this is just following to that name. My best guess is WebKit Revisions => WKR

>> Tools/WKR/WKR.mjs:31
>> +import axios from "axios"
> 
> Is this a slack package?

This is a package to easily does HTTP requests. https://github.com/axios/axios

>> Tools/WKR/WKR.mjs:65
>> +const contributors = new Contributors(JSON.parse(fs.readFileSync(path.join(rootDirectory, "..", "Scripts/webkitpy/common/config/contributors.json"))));
> 
> It almost feels like this should be pulled from the web on each commit

Sounds good.

>> Tools/WKR/WKR.mjs:197
>> +        this.revision = await storage.getItem("revision");
> 
> How does this get bootstrapped? Or does it just start as undefined/null and handled with that branch above?

Yes, it starts with undefined and well handled in "action" code.
Comment 7 Alexey Proskuryakov 2020-04-13 09:47:51 PDT
Comment on attachment 396286 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=396286&action=review

Nice. OK to land as, since it's already running on your server, and thus should be checked in. But I do have comments.

> Tools/ChangeLog:10
> +        this bot is introducing post-commit hook, but for now, this polling strategy just works.

We'll want a post-commit hook mode before deploying to webkit.org I think.

> Tools/ChangeLog:14
> +        * WKR/WKR.mjs: Added.

node.js is not a technology that we use anywhere in infrastructure (other than PLT code), so not sure if we want to add another thing to learn how to maintain. I'd be very interested in opinions from people who would be keeping this up and running.

How does one test a new deployment or a new version, other than putting it in production and waiting for commits to show up on Slack?

> Tools/WKR/ReadMe.md:1
> +# WKR

We have almost zero text in Markdown format in WebKit, should we start using it everywhere?

./PerformanceTests/SixSpeed/ReadMe.md
./ReadMe.md
./Websites/perf.webkit.org/ReadMe.md

> Tools/WKR/ReadMe.md:6
> +## Steps to run

Not sure how useful this is. The real deployment needs an Ansible playbook, not manual steps.

> Tools/WKR/ReadMe.md:14
> +The lastest posted revision data is stored in `data/` directory. You can clean up state if you remove files in `data/`.

This means that the service will depend on frequent backups for recovery, as otherwise it will re-announce everything since last backup.

What happens when the service is brought up from scratch without a data directory? Will it just start announcing new commits, and nothing from the past? It's worth documenting operational aspects like this one a bit more.

> Tools/WKR/WKR.mjs:36
> +const DEBUG = false;

How is this debugging mode used? Does one need to change the code, or is there a way to trigger it at runtime?

> Tools/WKR/WKR.mjs:65
> +const contributors = new Contributors(JSON.parse(fs.readFileSync(path.join(rootDirectory, "..", "Scripts/webkitpy/common/config/contributors.json"))));

I don't see where the checkout is being updated. In fact, we probably shouldn't rely on a checkout, and should fetch the data from the repository or from Trac directly each time.

Can git do this?

svn cat https://svn.webkit.org/repository/webkit/trunk/Tools/Scripts/webkitpy/common/config/contributors.json

> Tools/WKR/WKR.mjs:126
> +                // to-be-implemented.

Please make this comment more specific.

> Tools/WKR/WKR.mjs:150
> +        this.interval = 60 * 1000;
> +        this.url = "https://git.webkit.org/?p=WebKit-https.git;a=atom";

These look like beginning of file constants.

> Tools/WKR/WKR.mjs:162
> +        await sleep(Math.random() * 1000);

Why the random wait?

> Tools/WKR/WKR.mjs:186
> +            // Exponential back-off capped with 1 hour.

That's a huge cap. Why is it desirable to delay recovery by up to an hour?

> Tools/WKR/package-lock.json:8
> +      "version": "0.19.2",

How were all these hardcoded versions chosen? Should we be upgrading anything ever, and if so, what will the process be?
Comment 8 Alexey Proskuryakov 2020-04-13 09:50:47 PDT
> TBH, I'm not sure too :)
> Historically, bot posting changes to IRC, was named as WKR. So this is just
> following to that name. My best guess is WebKit Revisions => WKR

I also don't know. FWIW, it was named in https://trac.webkit.org/r143126 by Ryosuke.
Comment 9 Saam Barati 2020-04-13 11:21:06 PDT
(In reply to Alexey Proskuryakov from comment #8)
> > TBH, I'm not sure too :)
> > Historically, bot posting changes to IRC, was named as WKR. So this is just
> > following to that name. My best guess is WebKit Revisions => WKR
> 
> I also don't know. FWIW, it was named in https://trac.webkit.org/r143126 by
> Ryosuke.

Maybe we can do everyone a favor and name it something which isn't an acronym
Comment 10 Saam Barati 2020-04-13 11:32:53 PDT
Comment on attachment 396286 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=396286&action=review

> Tools/WKR/ReadMe.md:9
> +2. Put `.env` file, which includes `slackURL="<Slack Endpoint URL>"`

can you say how to find this endpoint?

>> Tools/WKR/ReadMe.md:14
>> +The lastest posted revision data is stored in `data/` directory. You can clean up state if you remove files in `data/`.
> 
> This means that the service will depend on frequent backups for recovery, as otherwise it will re-announce everything since last backup.
> 
> What happens when the service is brought up from scratch without a data directory? Will it just start announcing new commits, and nothing from the past? It's worth documenting operational aspects like this one a bit more.

I believe it'll only announce new commits from my understanding of the code.

> Tools/WKR/WKR.mjs:72
> +            change = change.replace(regExp, "");

what's the point of this assignment here? Isn't "change" a string?

>> Tools/WKR/WKR.mjs:150
>> +        this.url = "https://git.webkit.org/?p=WebKit-https.git;a=atom";
> 
> These look like beginning of file constants.

nit: You could pick a more descriptive name here than "url". Maybe "commitEndpointURL"? Or something that describes what the URL is for.

> Tools/WKR/WKR.mjs:170
> +            let response = await parser.parseURL(this.url);

This library named their function very badly :-)
(The url isn't being parsed...)

>> Tools/WKR/WKR.mjs:186
>> +            // Exponential back-off capped with 1 hour.
> 
> That's a huge cap. Why is it desirable to delay recovery by up to an hour?

I also don't really see the point of this. It'd make more sense if this were a resource constrained server, but it's not.

>> Tools/WKR/package-lock.json:8
>> +      "version": "0.19.2",
> 
> How were all these hardcoded versions chosen? Should we be upgrading anything ever, and if so, what will the process be?

I believe npm has a way of generating this file, but I don't remember how.
Comment 11 lingho@apple.com 2020-04-13 11:37:38 PDT
To have this application considered for deployment on webkit.org, please open a Radar and assign to me.

The server needs to be Centos 7 based and verified to work under Centos 7, and packages need to be installed using Yum, not NPM.

Otherwise, a lot of reworks may need to be done.
Comment 12 Ryosuke Niwa 2020-04-13 11:41:10 PDT
FWIW, it was named WKR because there was CIA before my not and I wanted to keep three letter name. I was initially running it at home so the name of it was never discussed.
Comment 13 Yusuke Suzuki 2020-04-13 12:40:05 PDT
Comment on attachment 396286 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=396286&action=review

Thanks, will upload the revised version soon.

>> Tools/ChangeLog:10
>> +        this bot is introducing post-commit hook, but for now, this polling strategy just works.
> 
> We'll want a post-commit hook mode before deploying to webkit.org I think.

Yes, I can help when doing this. I think large part of this implementation can be reused (parsing etc.).

>> Tools/ChangeLog:14
>> +        * WKR/WKR.mjs: Added.
> 
> node.js is not a technology that we use anywhere in infrastructure (other than PLT code), so not sure if we want to add another thing to learn how to maintain. I'd be very interested in opinions from people who would be keeping this up and running.
> 
> How does one test a new deployment or a new version, other than putting it in production and waiting for commits to show up on Slack?

Yeah, I've just hacked up this one with node.js just because JSC team was expertised with JavaScript :)

>> Tools/WKR/ReadMe.md:1
>> +# WKR
> 
> We have almost zero text in Markdown format in WebKit, should we start using it everywhere?
> 
> ./PerformanceTests/SixSpeed/ReadMe.md
> ./ReadMe.md
> ./Websites/perf.webkit.org/ReadMe.md

We recently start using ReadMe.md as I showed in slack :)

>> Tools/WKR/ReadMe.md:6
>> +## Steps to run
> 
> Not sure how useful this is. The real deployment needs an Ansible playbook, not manual steps.

I think this is useful for trying this locally for enhancement.

>> Tools/WKR/ReadMe.md:9
>> +2. Put `.env` file, which includes `slackURL="<Slack Endpoint URL>"`
> 
> can you say how to find this endpoint?

Yes, documented.

>>> Tools/WKR/ReadMe.md:14
>>> +The lastest posted revision data is stored in `data/` directory. You can clean up state if you remove files in `data/`.
>> 
>> This means that the service will depend on frequent backups for recovery, as otherwise it will re-announce everything since last backup.
>> 
>> What happens when the service is brought up from scratch without a data directory? Will it just start announcing new commits, and nothing from the past? It's worth documenting operational aspects like this one a bit more.
> 
> I believe it'll only announce new commits from my understanding of the code.

If it is brought up from the scratch, then what happens is that we skip everything and record the latest revision as ToT. And after this ToT, this bot starts reporting.

>> Tools/WKR/WKR.mjs:36
>> +const DEBUG = false;
> 
> How is this debugging mode used? Does one need to change the code, or is there a way to trigger it at runtime?

It is just added to suppress slack posting. If this is true, bot works without posting to slack.

>>>> Tools/WKR/WKR.mjs:65
>>>> +const contributors = new Contributors(JSON.parse(fs.readFileSync(path.join(rootDirectory, "..", "Scripts/webkitpy/common/config/contributors.json"))));
>>> 
>>> It almost feels like this should be pulled from the web on each commit
>> 
>> Sounds good.
> 
> I don't see where the checkout is being updated. In fact, we probably shouldn't rely on a checkout, and should fetch the data from the repository or from Trac directly each time.
> 
> Can git do this?
> 
> svn cat https://svn.webkit.org/repository/webkit/trunk/Tools/Scripts/webkitpy/common/config/contributors.json

Yeah, sounds fine. I'm currently doing `await axios.get("https://svn.webkit.org/repository/webkit/trunk/Tools/Scripts/webkitpy/common/config/contributors.json");`, as we are doing this in https://webkit.org/team/ page.

>> Tools/WKR/WKR.mjs:72
>> +            change = change.replace(regExp, "");
> 
> what's the point of this assignment here? Isn't "change" a string?

Oops, right. I should return [change, matched[1]];

>>> Tools/WKR/WKR.mjs:150
>>> +        this.url = "https://git.webkit.org/?p=WebKit-https.git;a=atom";
>> 
>> These look like beginning of file constants.
> 
> nit: You could pick a more descriptive name here than "url". Maybe "commitEndpointURL"? Or something that describes what the URL is for.

Sounds nice, fixed.

>> Tools/WKR/WKR.mjs:162
>> +        await sleep(Math.random() * 1000);
> 
> Why the random wait?

To avoid DDoS to slack, I introduced some wait here.

>> Tools/WKR/WKR.mjs:170
>> +            let response = await parser.parseURL(this.url);
> 
> This library named their function very badly :-)
> (The url isn't being parsed...)

Yeah, agree :)

>>> Tools/WKR/WKR.mjs:186
>>> +            // Exponential back-off capped with 1 hour.
>> 
>> That's a huge cap. Why is it desirable to delay recovery by up to an hour?
> 
> I also don't really see the point of this. It'd make more sense if this were a resource constrained server, but it's not.

Maybe, some network failure could occur, then, we do exponential back-off here, 1min, 2min, 4min, 8min, ... and 1hour cap.

>>> Tools/WKR/package-lock.json:8
>>> +      "version": "0.19.2",
>> 
>> How were all these hardcoded versions chosen? Should we be upgrading anything ever, and if so, what will the process be?
> 
> I believe npm has a way of generating this file, but I don't remember how.

This file iis automatically generated by npm. Updating process of package is

1. Changing package.json's versions
2. Run `npm install`

This is following to the default procedure of node.js project.

So long as this package-lock.json is commited & package.json is not changed, `npm install` will pick the same version packages.
And this package-lock.json is intented to be landed into source tree.

https://docs.npmjs.com/files/package-lock.json
> This file is intended to be committed into source repositories
Comment 14 Yusuke Suzuki 2020-04-13 12:51:07 PDT
Comment on attachment 396286 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=396286&action=review

>>>> Tools/WKR/WKR.mjs:186
>>>> +            // Exponential back-off capped with 1 hour.
>>> 
>>> That's a huge cap. Why is it desirable to delay recovery by up to an hour?
>> 
>> I also don't really see the point of this. It'd make more sense if this were a resource constrained server, but it's not.
> 
> Maybe, some network failure could occur, then, we do exponential back-off here, 1min, 2min, 4min, 8min, ... and 1hour cap.

One of the example is maintenance. Then during maintenance, we will just request once per 1hour, so after the maintenance finishes, it gets the result, and interval becomes normal 60s again.
Comment 15 Yusuke Suzuki 2020-04-13 13:35:03 PDT
Created attachment 396326 [details]
Patch
Comment 16 Yusuke Suzuki 2020-04-13 13:47:41 PDT
Created attachment 396328 [details]
Patch
Comment 17 Yusuke Suzuki 2020-04-13 13:50:07 PDT
Created attachment 396329 [details]
Patch
Comment 18 Yusuke Suzuki 2020-04-13 13:51:32 PDT
Comment on attachment 396329 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=396329&action=review

> Tools/WKR/WKR.mjs:176
> +            let parser = new RSS;
> +            let response = await parser.parseURL(commitEndpointURL);
> +            let commits = response.items.map((feedItem) => new Commit(feedItem, contributors));

We can parallel them, but I didn't do that, maybe, not worth doing since it is already so quick.

let [contributors, response] = await Promise.all([Contributors.create(), (new RSS).parseURL(commitEndpointURL)]);

for example.
Comment 19 Alexey Proskuryakov 2020-04-13 14:09:00 PDT
Comment on attachment 396329 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=396329&action=review

I'm going to say r+ because this is already running in production, and the patch isn't introducing anything terrible like security problems. It does appear that substantial changes or even a complete rewrite will be necessary before moving to webkit.org infrastructure.

> Tools/WKR/WKR.mjs:36
> +const DEBUG = false;

You explained what this does, but my question was about how this is used. Would a comment like this be accurate: "// Change to true before starting to avoid posting notifications to Slack."?

> Tools/WKR/WKR.mjs:166
> +        await sleep(Math.random() * 1000);

I don't understand how using random() here "avoids DDoS to Slack".

> Tools/WKR/WKR.mjs:191
> +            // Exponential back-off capped with 1 hour.

You explained how "exponential back-off" behaves, but did not explain why a cap of 1 hour is desirable. I still think that it is undesirable to delay coming back up for up to an hour.

I think that this is addressing multiple scenarios:

1. git.webkit.org is not responding. It would be perfectly OK to keep making the request every minute until it comes back up.

2. slack.com is not responding. It might be a problem if the whole world was retrying every minute, to overwhelm Slack with all accumulated requests once it's back up. Still, an hour is excessive.

3. Someone lands a syntax error in contributors.json, so it fails to parse. It is desirable to keep checking frequently, no reason for exponential back-off. Maybe WKR should handle this gracefully, by reusing the previous version of contributors.json if trunk doesn't parse.
Comment 20 Yusuke Suzuki 2020-04-16 13:14:29 PDT
Comment on attachment 396329 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=396329&action=review

>> Tools/WKR/WKR.mjs:36
>> +const DEBUG = false;
> 
> You explained what this does, but my question was about how this is used. Would a comment like this be accurate: "// Change to true before starting to avoid posting notifications to Slack."?

Yes, this is for debugging purpose boolean in the local development. Added the comment.

>> Tools/WKR/WKR.mjs:166
>> +        await sleep(Math.random() * 1000);
> 
> I don't understand how using random() here "avoids DDoS to Slack".

I'm mentioning to adding some wait to stop repeatedly posting messages without any wait. Changed to inserting 500ms wait.

>> Tools/WKR/WKR.mjs:191
>> +            // Exponential back-off capped with 1 hour.
> 
> You explained how "exponential back-off" behaves, but did not explain why a cap of 1 hour is desirable. I still think that it is undesirable to delay coming back up for up to an hour.
> 
> I think that this is addressing multiple scenarios:
> 
> 1. git.webkit.org is not responding. It would be perfectly OK to keep making the request every minute until it comes back up.
> 
> 2. slack.com is not responding. It might be a problem if the whole world was retrying every minute, to overwhelm Slack with all accumulated requests once it's back up. Still, an hour is excessive.
> 
> 3. Someone lands a syntax error in contributors.json, so it fails to parse. It is desirable to keep checking frequently, no reason for exponential back-off. Maybe WKR should handle this gracefully, by reusing the previous version of contributors.json if trunk doesn't parse.

I think it is OK to remove this exponential back-off if we are OK to repeatedly accessing to webkit.org & slack.com, and maybe this is OK.
Comment 21 Yusuke Suzuki 2020-04-16 13:42:23 PDT
Committed r260220: <https://trac.webkit.org/changeset/260220>
Comment 22 Radar WebKit Bug Importer 2020-04-16 13:43:17 PDT
<rdar://problem/61901789>
Comment 23 Yusuke Suzuki 2020-05-09 21:11:13 PDT
Re-opened since this is blocked by bug 211682
Comment 24 Yusuke Suzuki 2020-05-10 00:16:20 PDT
I was using this bug to test rollout feature of webkitbot. I'll try the last one, and if it worked well, I don't do further.
Comment 25 Yusuke Suzuki 2020-05-10 00:23:56 PDT
Tested. Done.
Comment 26 Yusuke Suzuki 2020-05-10 21:02:19 PDT
Re-opened since this is blocked by bug 211710
Comment 27 Yusuke Suzuki 2020-07-09 23:51:57 PDT
Re-opened since this is blocked by bug 214175