Bug 217534 - [webkitscmpy] Add `git-webkit find`
Summary: [webkitscmpy] Add `git-webkit find`
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Jonathan Bedard
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2020-10-09 14:03 PDT by Jonathan Bedard
Modified: 2020-10-13 18:10 PDT (History)
6 users (show)

See Also:


Attachments
Patch (39.75 KB, patch)
2020-10-09 17:21 PDT, Jonathan Bedard
no flags Details | Formatted Diff | Diff
Patch (36.80 KB, patch)
2020-10-11 10:20 PDT, Jonathan Bedard
no flags Details | Formatted Diff | Diff
Patch (37.19 KB, patch)
2020-10-12 13:35 PDT, Jonathan Bedard
no flags Details | Formatted Diff | Diff
Patch (37.55 KB, patch)
2020-10-12 16:14 PDT, Jonathan Bedard
no flags Details | Formatted Diff | Diff
Patch (37.29 KB, patch)
2020-10-13 09:19 PDT, Jonathan Bedard
no flags Details | Formatted Diff | Diff
Patch (37.41 KB, patch)
2020-10-13 09:22 PDT, Jonathan Bedard
no flags Details | Formatted Diff | Diff
Patch (37.41 KB, patch)
2020-10-13 13:58 PDT, Jonathan Bedard
no flags Details | Formatted Diff | Diff
Patch (31.46 KB, patch)
2020-10-13 15:48 PDT, Jonathan Bedard
no flags Details | Formatted Diff | Diff
Patch for landing (1.82 KB, patch)
2020-10-13 17:32 PDT, Jonathan Bedard
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Jonathan Bedard 2020-10-09 14:03:41 PDT
webkitscmpy supports identifiers, but we need a way to expose those identifiers to the user. The eventual intention is to have these identifiers supported across a handful of scripts, but to start with, we want a script that can simply map identifiers, revisions and hashes to one another.
Comment 1 Radar WebKit Bug Importer 2020-10-09 14:04:02 PDT
<rdar://problem/70152431>
Comment 2 Jonathan Bedard 2020-10-09 17:21:17 PDT
Created attachment 410988 [details]
Patch
Comment 3 Jonathan Bedard 2020-10-09 17:22:11 PDT
(In reply to Jonathan Bedard from comment #2)
> Created attachment 410988 [details]
> Patch

Also has the contents of https://bugs.webkit.org/show_bug.cgi?id=217533, which I intend to land first, but wanted it up for discussion.
Comment 4 Dean Johnson 2020-10-09 17:42:47 PDT
Comment on attachment 410988 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=410988&action=review

Looks pretty good, though I've got a couple questions on the entry point for `git-webkit` and the default output when no arguments are provided.

> Tools/Scripts/git-webkit:40
> +        Contributor.by_email[email] = contributor

Instead of building an implicit dictionary on `Contributor`, could we instead call a memoized function with the json file path to get a list of contributors back? It wasn't clear to me reading `git-webkit` initially that it's using a static class to store all Contributor information.

> Tools/Scripts/git-webkit:42
> +sys.exit(program.main(repository=local.Scm.from_path(scripts)))

Also, maybe it makes more sense for this to be a symlink to the same file in `webkitscmpy`?:
Tools/Scripts/libraries/webkitscmpy/git-webkit

> Tools/Scripts/libraries/webkitcorepy/webkitcorepy/arguments.py:57
> +def LoggingGroup(parser, loggers=None, default=logging.WARNING, help='{} amount of logging'):

This is awesome!

> Tools/Scripts/libraries/webkitscmpy/webkitscmpy/commit.py:270
> +    class Encoder(json.JSONEncoder):

Nit: I think Python encourages placing class definitions at the top of the block they're defined in (when they're defined within another class). You could also move it outside of the class and call it CommitEncoder if that is more clear.

> Tools/Scripts/libraries/webkitscmpy/webkitscmpy/local/scm.py:94
> +            return self.commit()

IMO if there's no argument then we should fail / exit. This could easily cause mistakes by automation / scripts if they forget to provide an argument, whereas failing would be more correct.

> Tools/Scripts/libraries/webkitscmpy/webkitscmpy/program/__init__.py:26
> +from webkitscmpy.program.main import main

Why is code being imported if it's not being used?
Comment 5 Jonathan Bedard 2020-10-09 20:37:10 PDT
Comment on attachment 410988 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=410988&action=review

>> Tools/Scripts/git-webkit:40
>> +        Contributor.by_email[email] = contributor
> 
> Instead of building an implicit dictionary on `Contributor`, could we instead call a memoized function with the json file path to get a list of contributors back? It wasn't clear to me reading `git-webkit` initially that it's using a static class to store all Contributor information.

I’m a bit torn on the right way to handle this.

Some background: the reason we want a contributor mapping is because sometimes git and SVN have incomplete authors (missing an email, email without a name) and contributors.json let’s us normalize that. And that’s really all we’re using contributors.json for in this script.

The catch is that contributors.json is associated with the WebKit repository specifically, but webkitscmpy is supposed to be project agnostic. That’s why I’m loading contributors.json here and not in the library....seems like maybe the right solution is giving an ‘add’ method to the class

>> Tools/Scripts/git-webkit:42
>> +sys.exit(program.main(repository=local.Scm.from_path(scripts)))
> 
> Also, maybe it makes more sense for this to be a symlink to the same file in `webkitscmpy`?:
> Tools/Scripts/libraries/webkitscmpy/git-webkit

Similar story as above. We definitely don’t want things like contributors.json attached to webkitscmpy. I think the real question is if the script in webkitscmpy should be called got-WebKit.

>> Tools/Scripts/libraries/webkitscmpy/webkitscmpy/commit.py:270
>> +    class Encoder(json.JSONEncoder):
> 
> Nit: I think Python encourages placing class definitions at the top of the block they're defined in (when they're defined within another class). You could also move it outside of the class and call it CommitEncoder if that is more clear.

Will move!

>> Tools/Scripts/libraries/webkitscmpy/webkitscmpy/local/scm.py:94
>> +            return self.commit()
> 
> IMO if there's no argument then we should fail / exit. This could easily cause mistakes by automation / scripts if they forget to provide an argument, whereas failing would be more correct.

Is that stance for just this function, or the whole command?

>> Tools/Scripts/libraries/webkitscmpy/webkitscmpy/program/__init__.py:26
>> +from webkitscmpy.program.main import main
> 
> Why is code being imported if it's not being used?

So that ‘from webkitscmpy.program import main’ works.
Comment 6 Jonathan Bedard 2020-10-11 10:20:09 PDT
Created attachment 411055 [details]
Patch
Comment 7 Dean Johnson 2020-10-12 10:59:37 PDT
(In reply to Jonathan Bedard from comment #5)
> Comment on attachment 410988 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=410988&action=review
> 
> >> Tools/Scripts/git-webkit:40
> >> +        Contributor.by_email[email] = contributor
> > 
> > Instead of building an implicit dictionary on `Contributor`, could we instead call a memoized function with the json file path to get a list of contributors back? It wasn't clear to me reading `git-webkit` initially that it's using a static class to store all Contributor information.
> 
> I’m a bit torn on the right way to handle this.
> 
> Some background: the reason we want a contributor mapping is because
> sometimes git and SVN have incomplete authors (missing an email, email
> without a name) and contributors.json let’s us normalize that. And that’s
> really all we’re using contributors.json for in this script.
> 
> The catch is that contributors.json is associated with the WebKit repository
> specifically, but webkitscmpy is supposed to be project agnostic. That’s why
> I’m loading contributors.json here and not in the library....seems like
> maybe the right solution is giving an ‘add’ method to the class
My concern was less about how contributors were added, and more that we were modifying the Contributor class implicitly for any other modules which may use it. I really dislike working with code which implicitly modifies state on a module I may be importing / using later, since it's more difficult to set expectations for what the module does and how it behaves.

One way to address that is to build your contributor list in git-webkit, then pass around the list of contributors to use.
> 
> >> Tools/Scripts/git-webkit:42
> >> +sys.exit(program.main(repository=local.Scm.from_path(scripts)))
> > 
> > Also, maybe it makes more sense for this to be a symlink to the same file in `webkitscmpy`?:
> > Tools/Scripts/libraries/webkitscmpy/git-webkit
> 
> Similar story as above. We definitely don’t want things like
> contributors.json attached to webkitscmpy. I think the real question is if
> the script in webkitscmpy should be called got-WebKit.
> 
> >> Tools/Scripts/libraries/webkitscmpy/webkitscmpy/commit.py:270
> >> +    class Encoder(json.JSONEncoder):
> > 
> > Nit: I think Python encourages placing class definitions at the top of the block they're defined in (when they're defined within another class). You could also move it outside of the class and call it CommitEncoder if that is more clear.
> 
> Will move!
> 
> >> Tools/Scripts/libraries/webkitscmpy/webkitscmpy/local/scm.py:94
> >> +            return self.commit()
> > 
> > IMO if there's no argument then we should fail / exit. This could easily cause mistakes by automation / scripts if they forget to provide an argument, whereas failing would be more correct.
> 
> Is that stance for just this function, or the whole command?
Whole command. If someone types `git webkit find` I don't think we should return any results -- imagine if any code does something like:

```
get_commit_info_command = ['git', 'webkit', 'find']
identifier = some_func_to_get_commit()
if identifier:
    get_commit_info_command.append(commit)
return identifier
```

if some_func_to_get_identifier ever has a bug that causes it to return None, then we could start returning the repository's current hash every time. A realistic case of this happening is when some_func_to_get_commit() is a function which talks to a web service and happens to have a `try/except Exception` added for flaky failures and forgets to re-raise after catching an exception after hitting the retry limit.
> 
> >> Tools/Scripts/libraries/webkitscmpy/webkitscmpy/program/__init__.py:26
> >> +from webkitscmpy.program.main import main
> > 
> > Why is code being imported if it's not being used?
> 
> So that ‘from webkitscmpy.program import main’ works.
Ah, makes sense.
Comment 8 Dean Johnson 2020-10-12 11:01:01 PDT
Sorry, partially rewrote example code while I was replying.

Updated code example since it was incomplete:
```
get_commit_info_command = ['git', 'webkit', 'find']
commit = some_func_to_get_commit()
if commit:
    get_commit_info_command.append(commit)
identifier = run_command(get_commit_info_command)
return identifier
```
Comment 9 Jonathan Bedard 2020-10-12 11:42:45 PDT
(In reply to Dean Johnson from comment #7)
> ...
>
> My concern was less about how contributors were added, and more that we were
> modifying the Contributor class implicitly for any other modules which may
> use it. I really dislike working with code which implicitly modifies state
> on a module I may be importing / using later, since it's more difficult to
> set expectations for what the module does and how it behaves.
> 
> One way to address that is to build your contributor list in git-webkit,
> then pass around the list of contributors to use.

I'm actually going to remove the contributors part of this patch, and put it in a stand-alone change so we can discuss further. This patch is big already, I didn't expect the contributors.json to be controversial (so I added it), but it's not required for git-webkit to work.

> ...
> if some_func_to_get_identifier ever has a bug that causes it to return None,
> then we could start returning the repository's current hash every time. A
> realistic case of this happening is when some_func_to_get_commit() is a
> function which talks to a web service and happens to have a `try/except
> Exception` added for flaky failures and forgets to re-raise after catching
> an exception after hitting the retry limit.

How would you feel about supporting "HEAD" instead of making None correspond to HEAD? Suppose that means we would also have to support HEAD~ and friends, but that's not too hard.
Comment 10 Jonathan Bedard 2020-10-12 13:35:42 PDT
Created attachment 411150 [details]
Patch
Comment 11 Dean Johnson 2020-10-12 15:08:14 PDT
Comment on attachment 411150 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=411150&action=review

Thanks for splitting out the patch. Added a couple more comments that suggest moving parsing calls and repository checks outside of the command itself.

> Tools/Scripts/libraries/webkitscmpy/webkitscmpy/local/git.py:91
> +            if 'master' in candidates:

Nit: Do we want to prefer `master` over `main`? Maybe we should error out if a default isn't defined, but both are in `self.branches`?

> Tools/Scripts/libraries/webkitscmpy/webkitscmpy/local/scm.py:41
> +    HEAD_RE = re.compile(r'HEAD(~\d*)*')

Technically HEAD^ is also a valid git hash, though it seems unlikely to be used.

> Tools/Scripts/libraries/webkitscmpy/webkitscmpy/local/scm.py:104
> +                count += int(s) if s else 1

Maybe it would be better to pass the argument through to `git`, get that hash, then convert using the hash? Then we don't need to handle cases like HEAD^ or write any parsing logic ourselves.

> Tools/Scripts/libraries/webkitscmpy/webkitscmpy/local/svn.py:40
> +    executable = '/usr/bin/svn' if os.path.exists('/usr/bin/svn') else '/usr/local/bin/svn'

Do we have any checks for this executable existing? Maybe it'd be better to make `executable` a property on Svn and use that function to find the appropriate path.

> Tools/Scripts/libraries/webkitscmpy/webkitscmpy/program/command.py:35
> +            raise NotImplementedError("'{}' does not have a  help message")

Nit: Extra space in 'a  help'

> Tools/Scripts/libraries/webkitscmpy/webkitscmpy/program/find.py:63
> +        if repository.is_svn:

As a user I don't like it when the arguments available to me in a script change based on my CWD. Can we make the options available no matter what, then error out if we fail to parse?

> Tools/Scripts/libraries/webkitscmpy/webkitscmpy/program/find.py:65
> +                '--revision', '-r',

Have you considered using `type` for these arguments? You could set them to their respective: Commit._parse_revision, Commit._parse_hash, Commit._parse_identifier, etc. then just use the returned reference. And we could set the type of the option which does not use arguments to be the upper part of main(), thereby removing the majority of the parsing logic (and making it re-usable for any project).

This would remove a decent amount of logic from the command parser, so we know that anything to do with identifiers/revisions/hashes should be fully dealt with webkitscmpy.
Comment 12 Jonathan Bedard 2020-10-12 15:20:04 PDT
Comment on attachment 411150 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=411150&action=review

>> Tools/Scripts/libraries/webkitscmpy/webkitscmpy/local/git.py:91
>> +            if 'master' in candidates:
> 
> Nit: Do we want to prefer `master` over `main`? Maybe we should error out if a default isn't defined, but both are in `self.branches`?

Most of our repos won't hit this (I encountered it with a different version of git) because the rev-parse command will give us what we need

I think if we're being realistic, a repository that has both branches probably has "master" as the default branch since that's what the industry has been doing for many years (which is why I did it this way). Not a big deal either way though, most checkouts will support the rev-parse command

>> Tools/Scripts/libraries/webkitscmpy/webkitscmpy/local/scm.py:104
>> +                count += int(s) if s else 1
> 
> Maybe it would be better to pass the argument through to `git`, get that hash, then convert using the hash? Then we don't need to handle cases like HEAD^ or write any parsing logic ourselves.

Doesn't work for SVN repositories, though, and I'm trying to keep the API the same.

>> Tools/Scripts/libraries/webkitscmpy/webkitscmpy/local/svn.py:40
>> +    executable = '/usr/bin/svn' if os.path.exists('/usr/bin/svn') else '/usr/local/bin/svn'
> 
> Do we have any checks for this executable existing? Maybe it'd be better to make `executable` a property on Svn and use that function to find the appropriate path.

I thought about the function, but we really only want to set it once, I think. Also, making a property classmethod is nasty.

>> Tools/Scripts/libraries/webkitscmpy/webkitscmpy/program/find.py:63
>> +        if repository.is_svn:
> 
> As a user I don't like it when the arguments available to me in a script change based on my CWD. Can we make the options available no matter what, then error out if we fail to parse?

These options control the output, not the input, so we would basically be unconditionally failing the moment the user passed them...that's why I filtered them out as options. If we removed this, we would trigger the error on line 117

>> Tools/Scripts/libraries/webkitscmpy/webkitscmpy/program/find.py:65
>> +                '--revision', '-r',
> 
> Have you considered using `type` for these arguments? You could set them to their respective: Commit._parse_revision, Commit._parse_hash, Commit._parse_identifier, etc. then just use the returned reference. And we could set the type of the option which does not use arguments to be the upper part of main(), thereby removing the majority of the parsing logic (and making it re-usable for any project).
> 
> This would remove a decent amount of logic from the command parser, so we know that anything to do with identifiers/revisions/hashes should be fully dealt with webkitscmpy.

These commands don't control the type of argument in, they control the type of argument out, `type` won't help us much.
Comment 13 Jonathan Bedard 2020-10-12 16:14:08 PDT
Created attachment 411181 [details]
Patch
Comment 14 Jonathan Bedard 2020-10-13 09:19:12 PDT
Created attachment 411225 [details]
Patch
Comment 15 Jonathan Bedard 2020-10-13 09:22:15 PDT
Created attachment 411226 [details]
Patch
Comment 16 dewei_zhu 2020-10-13 13:37:12 PDT
Comment on attachment 411226 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=411226&action=review

> Tools/Scripts/libraries/webkitscmpy/git-webkit:30
> +sys.exit(program.main())

My understanding is the difference between this file and 'Tools/Scripts/git-webkit' is this script can be use for other repo if necessary, right?

> Tools/Scripts/libraries/webkitscmpy/webkitscmpy/commit.py:36
> +    DISPLAY_SIZE = 12

Display size seems ambiguous to me at first glance. Can we use a variable something like `HASH_LABEL_SIZE` or something similar?

> Tools/Scripts/libraries/webkitscmpy/webkitscmpy/program/find.py:24
> +import logging

Is this module used anywhere in this file?

> Tools/Scripts/libraries/webkitscmpy/webkitscmpy/program/find.py:64
> +            sys.stderr.write(str(exception) + '\n')

Could you tell me more about choosing sys.stderr & print rather than logging module?
Comment 17 Jonathan Bedard 2020-10-13 13:45:42 PDT
Comment on attachment 411226 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=411226&action=review

>> Tools/Scripts/libraries/webkitscmpy/git-webkit:30
>> +sys.exit(program.main())
> 
> My understanding is the difference between this file and 'Tools/Scripts/git-webkit' is this script can be use for other repo if necessary, right?

That is correct.

In a future change, we will hook the WebKit one up to contributors json, Bugzilla,  ect.

>> Tools/Scripts/libraries/webkitscmpy/webkitscmpy/program/find.py:64
>> +            sys.stderr.write(str(exception) + '\n')
> 
> Could you tell me more about choosing sys.stderr & print rather than logging module?

Logging will add the level in the log message by default (CRITICAL, in this case). Really no other reason than that.
Comment 18 Jonathan Bedard 2020-10-13 13:58:53 PDT
Created attachment 411250 [details]
Patch
Comment 19 dewei_zhu 2020-10-13 14:48:14 PDT
r=me
Comment 20 Jonathan Bedard 2020-10-13 15:48:30 PDT
Created attachment 411273 [details]
Patch
Comment 21 EWS 2020-10-13 16:53:40 PDT
Committed r268433: <https://trac.webkit.org/changeset/268433>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 411273 [details].
Comment 22 Jonathan Bedard 2020-10-13 17:32:17 PDT
Reopening to attach new patch.
Comment 23 Jonathan Bedard 2020-10-13 17:32:18 PDT
Created attachment 411285 [details]
Patch for landing
Comment 24 EWS 2020-10-13 18:10:21 PDT
Committed r268440: <https://trac.webkit.org/changeset/268440>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 411285 [details].