Bug 222540

Summary: [Tools] Make run-buildbot-test compatible with buildbot 2.10.5
Product: WebKit Reporter: Carlos Alberto Lopez Perez <clopez>
Component: Tools / TestsAssignee: Carlos Alberto Lopez Perez <clopez>
Status: RESOLVED FIXED    
Severity: Normal CC: aakash_jain, angelos, ap, dean_johnson, jbedard, ryanhaddad, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugs.webkit.org/show_bug.cgi?id=225268
Attachments:
Description Flags
Patch
none
Patch none

Description Carlos Alberto Lopez Perez 2021-03-01 03:41:32 PST
The tool run-buildbot-test stopped working since the upgrade of buildbot to  2.10.1 at https://build.webkit.org
Comment 1 Radar WebKit Bug Importer 2021-03-08 03:42:13 PST
<rdar://problem/75164639>
Comment 2 Carlos Alberto Lopez Perez 2021-04-08 19:17:04 PDT
Created attachment 425570 [details]
Patch
Comment 3 Aakash Jain 2021-04-14 08:50:13 PDT
Comment on attachment 425570 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=425570&action=review

> Tools/CISupport/ews-build/loadConfig.py:103
> +    c['schedulers'].append(forceScheduler)

This shouldn't be enabled in production EWS (atleast for now). Should be under "if is_test_mode_enabled:"

> Tools/CISupport/ews-build/steps.py:319
> +            shell.ShellCommand.start(self)

Is this part tested (locally)?

> Tools/CISupport/start-buildbot-server-virtualenv.py:166
> +                for worker in self._get_list_workers():

Instead of creating ~100 workers on local machine, it's better to use create just one worker named 'local-worker'. We already have support for 'local-worker' in EWS configuration and I am adding the support in build.webkit.org in https://bugs.webkit.org/show_bug.cgi?id=224551. local-worker attaches to all the builders. It is enabled automatically in test mode (when the BUILDBOT_PRODUCTION env variable is not explicitly set). See https://github.com/WebKit/WebKit/blob/main/Tools/CISupport/ews-build/loadConfig.py#L56

> Tools/CISupport/start-buildbot-server-virtualenv.py:167
> +                    worker_runner = multiprocessing.Process(target=self._start_worker, args=(worker,))

Don't need multiprocessing, since one worker is enough.

> Tools/CISupport/start-buildbot-server-virtualenv.py:319
> +    def _get_list_workers(self):

This can be deleted. Just one worker named 'local-worker' is sufficient.

> Tools/CISupport/start-buildbot-server-virtualenv.py:328
> +    def _start_worker(self, worker):

Don't need multiprocessiung here. Just one worker is enough.
Comment 4 Aakash Jain 2021-04-14 09:01:21 PDT
By the way, overall this is great. Thanks Carlos for taking care of this.
Comment 5 Carlos Alberto Lopez Perez 2021-04-14 09:08:12 PDT
Comment on attachment 425570 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=425570&action=review

>> Tools/CISupport/ews-build/loadConfig.py:103
>> +    c['schedulers'].append(forceScheduler)
> 
> This shouldn't be enabled in production EWS (atleast for now). Should be under "if is_test_mode_enabled:"

mmm, right

>> Tools/CISupport/ews-build/steps.py:319
>> +            shell.ShellCommand.start(self)
> 
> Is this part tested (locally)?

Yes. I assume wget is a command available on any developer machine.

>> Tools/CISupport/start-buildbot-server-virtualenv.py:166
>> +                for worker in self._get_list_workers():
> 
> Instead of creating ~100 workers on local machine, it's better to use create just one worker named 'local-worker'. We already have support for 'local-worker' in EWS configuration and I am adding the support in build.webkit.org in https://bugs.webkit.org/show_bug.cgi?id=224551. local-worker attaches to all the builders. It is enabled automatically in test mode (when the BUILDBOT_PRODUCTION env variable is not explicitly set). See https://github.com/WebKit/WebKit/blob/main/Tools/CISupport/ews-build/loadConfig.py#L56

Interesting stuff. Will try to use it, thanks.
Comment 6 Aakash Jain 2021-04-14 09:41:33 PDT
Comment on attachment 425570 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=425570&action=review

> Tools/ChangeLog:8
> +        This renames the previous tool run-buildbot-test to start-buildbot-server-virtualenv

Don't need virtualenv in the file name. 
You can also consider naming it: start-local-buildmaster/start-local-buildbot

> Tools/CISupport/ews-build/loadConfig.py:101
> +                    StringParameter(name="ews_revision", label="WebKit git sha1 hash to checkout before trying patch", required=True, maxsize=40)],

I believe it can be optional, and ews will just use ToT.

> Tools/CISupport/start-buildbot-server-virtualenv.py:205
> +                        print(f.read())

Can consider printing last few lines instead of complete file here.

> Tools/CISupport/start-buildbot-server-virtualenv.py:287
> +            print('Starting the twistd process ...')

Would be better to say buildbot instead of twistd here, to keep it easy to understand for users.
Comment 7 Carlos Alberto Lopez Perez 2021-04-14 09:59:36 PDT
Comment on attachment 425570 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=425570&action=review

>> Tools/CISupport/ews-build/loadConfig.py:101
>> +                    StringParameter(name="ews_revision", label="WebKit git sha1 hash to checkout before trying patch", required=True, maxsize=40)],
> 
> I believe it can be optional, and ews will just use ToT.

Ok, will try with that.

>> Tools/CISupport/start-buildbot-server-virtualenv.py:205
>> +                        print(f.read())
> 
> Can consider printing last few lines instead of complete file here.

This is only printed when it fails to start, so the log will be small in any case (it couldn't even start the server process).
Also when this happens many times there are useful backtraces on the lines above the failure... so in that case seeing the whole log is useful.
Initially I was not printing anything but I got tired of re-running the command with the --no-clean option to cat manually the server.log file (the log file is by default automatically deleted when this program ends).

>> Tools/CISupport/start-buildbot-server-virtualenv.py:287
>> +            print('Starting the twistd process ...')
> 
> Would be better to say buildbot instead of twistd here, to keep it easy to understand for users.

Indeed
Comment 8 Carlos Alberto Lopez Perez 2021-04-14 14:32:05 PDT
(In reply to Carlos Alberto Lopez Perez from comment #5)
> Comment on attachment 425570 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=425570&action=review
> 
> >> Tools/CISupport/ews-build/loadConfig.py:103
> >> +    c['schedulers'].append(forceScheduler)
> > 
> > This shouldn't be enabled in production EWS (atleast for now). Should be under "if is_test_mode_enabled:"
> 
> mmm, right
> 
> >> Tools/CISupport/ews-build/steps.py:319
> >> +            shell.ShellCommand.start(self)
> > 
> > Is this part tested (locally)?
> 
> Yes. I assume wget is a command available on any developer machine.
> 

mmm.. maybe I should use curl instead

It seems wget is not available on Mac by default but curl is, isn't it?
Comment 9 Aakash Jain 2021-04-15 07:08:06 PDT
(In reply to Carlos Alberto Lopez Perez from comment #8)
> It seems wget is not available on Mac by default but curl is
That's right.
Comment 10 Carlos Alberto Lopez Perez 2021-04-29 18:22:59 PDT
(In reply to Aakash Jain from comment #3)
> 
> Instead of creating ~100 workers on local machine, it's better to use create
> just one worker named 'local-worker'. We already have support for
> 'local-worker' in EWS configuration and I am adding the support in
> build.webkit.org in https://bugs.webkit.org/show_bug.cgi?id=224551.
> local-worker attaches to all the builders. It is enabled automatically in
> test mode (when the BUILDBOT_PRODUCTION env variable is not explicitly set).
> See
> https://github.com/WebKit/WebKit/blob/main/Tools/CISupport/ews-build/
> loadConfig.py#L56
> 

I have experimented with this, and it works. But it doesn't behave exactly as I as expecting.

- With local-worker only one build at a time works. If I click on starting a new build (on queue B) meanwhile one build is already happening on queue A, then the request to start the job in queue B is held in a confusing state as it doesn't say it is waiting for queue A. I learned that if you cancel the on-going job on queue A then the one in queue A will start, but is far from intuitive.

- The dashboard with only local-worker shows all workers as offline except this local-worker connected that you don't see on the production deployment. See example screenshots: https://people.igalia.com/clopez/wkbug/222540
This is nitpick but to me it looks less confusing the image where I see all workers connected but the local-worker one.

However I can understand the benefit of not starting 100 workers by default, so I guess I will add a new option to the tool to start all workers instead of the local-worker. Meanwhile keeping the option to only start one local-worker as the default option, because it seems is the option that can be more useful for quick testing and uses less resources
Comment 11 Carlos Alberto Lopez Perez 2021-04-29 18:25:24 PDT
(In reply to Carlos Alberto Lopez Perez from comment #10)
> (In reply to Aakash Jain from comment #3)
> > 
> > Instead of creating ~100 workers on local machine, it's better to use create
> > just one worker named 'local-worker'. We already have support for
> > 'local-worker' in EWS configuration and I am adding the support in
> > build.webkit.org in https://bugs.webkit.org/show_bug.cgi?id=224551.
> > local-worker attaches to all the builders. It is enabled automatically in
> > test mode (when the BUILDBOT_PRODUCTION env variable is not explicitly set).
> > See
> > https://github.com/WebKit/WebKit/blob/main/Tools/CISupport/ews-build/
> > loadConfig.py#L56
> > 
> 
> I have experimented with this, and it works. But it doesn't behave exactly
> as I as expecting.
> 
> - With local-worker only one build at a time works. If I click on starting a
> new build (on queue B) meanwhile one build is already happening on queue A,
> then the request to start the job in queue B is held in a confusing state as
> it doesn't say it is waiting for queue A. I learned that if you cancel the
> on-going job on queue A then the one in queue A will start, but is far from
> intuitive.
> 
^^ I wanted to mean: I learned that if you cancel the on-going job on queue A then the one in queue *B* will start, but is far from intuitive.
Comment 12 Carlos Alberto Lopez Perez 2021-04-29 20:35:36 PDT
Created attachment 427397 [details]
Patch

Patch v2. Address reviewer comments. Diff from previous version: http://ix.io/3lmQ/diff
Comment 13 Aakash Jain 2021-04-30 09:37:15 PDT
Comment on attachment 427397 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=427397&action=review

Looks good overall.

> Tools/ChangeLog:24
> +        * CISupport/start-buildbot-server-test: Added.

I don't like the word 'test' in this filename, it makes it feel like this file contains some kind of tests (unit-test or regression test). But file name doesn't need to block this patch. We can rename it in subsequent patch.
Comment 14 Aakash Jain 2021-04-30 09:37:59 PDT
Comment on attachment 427397 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=427397&action=review

> Tools/ChangeLog:3
> +        [tools] Make run-buildbot-test compatible with buildbot 2.10.1

Nit: 2.10.5
Comment 15 Carlos Alberto Lopez Perez 2021-04-30 10:03:46 PDT
(In reply to Aakash Jain from comment #13)
> Comment on attachment 427397 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=427397&action=review
> 
> Looks good overall.
> 
> > Tools/ChangeLog:24
> > +        * CISupport/start-buildbot-server-test: Added.
> 
> I don't like the word 'test' in this filename, it makes it feel like this
> file contains some kind of tests (unit-test or regression test). But file
> name doesn't need to block this patch. We can rename it in subsequent patch.

I used 'test' because the tool launches the buildbot server for testing. Is the end purpose of this tool.. to easy testing how the buildbot server will behave after doing changes to the config.

I can rename before landing

I think just 'start-buildbot-server' is maybe confusing in the sense that may imply this tool is for starting the builbot for production, which is not.

perhaps start-buildbot-server-staging ?
Comment 16 Aakash Jain 2021-04-30 10:07:29 PDT
(In reply to Carlos Alberto Lopez Perez from comment #15)
> I think just 'start-buildbot-server' is maybe confusing in the sense that
> may imply this tool is for starting the builbot for production, which is not.
That name is fine with me.
> 
> perhaps start-buildbot-server-staging ?
Maybe start-local-buildbot-server
Comment 17 Carlos Alberto Lopez Perez 2021-04-30 18:12:15 PDT
(In reply to Aakash Jain from comment #16)
> (In reply to Carlos Alberto Lopez Perez from comment #15)
> > I think just 'start-buildbot-server' is maybe confusing in the sense that
> > may imply this tool is for starting the builbot for production, which is not.
> That name is fine with me.
> > 
> > perhaps start-buildbot-server-staging ?
> Maybe start-local-buildbot-server

That one seems ok to me. Let's go with it
Comment 18 Carlos Alberto Lopez Perez 2021-04-30 18:21:10 PDT
Committed r276868 (237214@main): <https://commits.webkit.org/237214@main>