Bug 217534

Summary: [webkitscmpy] Add `git-webkit find`
Product: WebKit Reporter: Jonathan Bedard <jbedard>
Component: Tools / TestsAssignee: Jonathan Bedard <jbedard>
Status: RESOLVED FIXED    
Severity: Normal CC: aakash_jain, dean_johnson, dewei_zhu, kocsen_chung, slewis, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugs.webkit.org/show_bug.cgi?id=215862
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch for landing none

Jonathan Bedard
Reported 2020-10-09 14:03:41 PDT
webkitscmpy supports identifiers, but we need a way to expose those identifiers to the user. The eventual intention is to have these identifiers supported across a handful of scripts, but to start with, we want a script that can simply map identifiers, revisions and hashes to one another.
Attachments
Patch (39.75 KB, patch)
2020-10-09 17:21 PDT, Jonathan Bedard
no flags
Patch (36.80 KB, patch)
2020-10-11 10:20 PDT, Jonathan Bedard
no flags
Patch (37.19 KB, patch)
2020-10-12 13:35 PDT, Jonathan Bedard
no flags
Patch (37.55 KB, patch)
2020-10-12 16:14 PDT, Jonathan Bedard
no flags
Patch (37.29 KB, patch)
2020-10-13 09:19 PDT, Jonathan Bedard
no flags
Patch (37.41 KB, patch)
2020-10-13 09:22 PDT, Jonathan Bedard
no flags
Patch (37.41 KB, patch)
2020-10-13 13:58 PDT, Jonathan Bedard
no flags
Patch (31.46 KB, patch)
2020-10-13 15:48 PDT, Jonathan Bedard
no flags
Patch for landing (1.82 KB, patch)
2020-10-13 17:32 PDT, Jonathan Bedard
no flags
Radar WebKit Bug Importer
Comment 1 2020-10-09 14:04:02 PDT
Jonathan Bedard
Comment 2 2020-10-09 17:21:17 PDT
Jonathan Bedard
Comment 3 2020-10-09 17:22:11 PDT
(In reply to Jonathan Bedard from comment #2) > Created attachment 410988 [details] > Patch Also has the contents of https://bugs.webkit.org/show_bug.cgi?id=217533, which I intend to land first, but wanted it up for discussion.
Dean Johnson
Comment 4 2020-10-09 17:42:47 PDT
Comment on attachment 410988 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=410988&action=review Looks pretty good, though I've got a couple questions on the entry point for `git-webkit` and the default output when no arguments are provided. > Tools/Scripts/git-webkit:40 > + Contributor.by_email[email] = contributor Instead of building an implicit dictionary on `Contributor`, could we instead call a memoized function with the json file path to get a list of contributors back? It wasn't clear to me reading `git-webkit` initially that it's using a static class to store all Contributor information. > Tools/Scripts/git-webkit:42 > +sys.exit(program.main(repository=local.Scm.from_path(scripts))) Also, maybe it makes more sense for this to be a symlink to the same file in `webkitscmpy`?: Tools/Scripts/libraries/webkitscmpy/git-webkit > Tools/Scripts/libraries/webkitcorepy/webkitcorepy/arguments.py:57 > +def LoggingGroup(parser, loggers=None, default=logging.WARNING, help='{} amount of logging'): This is awesome! > Tools/Scripts/libraries/webkitscmpy/webkitscmpy/commit.py:270 > + class Encoder(json.JSONEncoder): Nit: I think Python encourages placing class definitions at the top of the block they're defined in (when they're defined within another class). You could also move it outside of the class and call it CommitEncoder if that is more clear. > Tools/Scripts/libraries/webkitscmpy/webkitscmpy/local/scm.py:94 > + return self.commit() IMO if there's no argument then we should fail / exit. This could easily cause mistakes by automation / scripts if they forget to provide an argument, whereas failing would be more correct. > Tools/Scripts/libraries/webkitscmpy/webkitscmpy/program/__init__.py:26 > +from webkitscmpy.program.main import main Why is code being imported if it's not being used?
Jonathan Bedard
Comment 5 2020-10-09 20:37:10 PDT
Comment on attachment 410988 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=410988&action=review >> Tools/Scripts/git-webkit:40 >> + Contributor.by_email[email] = contributor > > Instead of building an implicit dictionary on `Contributor`, could we instead call a memoized function with the json file path to get a list of contributors back? It wasn't clear to me reading `git-webkit` initially that it's using a static class to store all Contributor information. I’m a bit torn on the right way to handle this. Some background: the reason we want a contributor mapping is because sometimes git and SVN have incomplete authors (missing an email, email without a name) and contributors.json let’s us normalize that. And that’s really all we’re using contributors.json for in this script. The catch is that contributors.json is associated with the WebKit repository specifically, but webkitscmpy is supposed to be project agnostic. That’s why I’m loading contributors.json here and not in the library....seems like maybe the right solution is giving an ‘add’ method to the class >> Tools/Scripts/git-webkit:42 >> +sys.exit(program.main(repository=local.Scm.from_path(scripts))) > > Also, maybe it makes more sense for this to be a symlink to the same file in `webkitscmpy`?: > Tools/Scripts/libraries/webkitscmpy/git-webkit Similar story as above. We definitely don’t want things like contributors.json attached to webkitscmpy. I think the real question is if the script in webkitscmpy should be called got-WebKit. >> Tools/Scripts/libraries/webkitscmpy/webkitscmpy/commit.py:270 >> + class Encoder(json.JSONEncoder): > > Nit: I think Python encourages placing class definitions at the top of the block they're defined in (when they're defined within another class). You could also move it outside of the class and call it CommitEncoder if that is more clear. Will move! >> Tools/Scripts/libraries/webkitscmpy/webkitscmpy/local/scm.py:94 >> + return self.commit() > > IMO if there's no argument then we should fail / exit. This could easily cause mistakes by automation / scripts if they forget to provide an argument, whereas failing would be more correct. Is that stance for just this function, or the whole command? >> Tools/Scripts/libraries/webkitscmpy/webkitscmpy/program/__init__.py:26 >> +from webkitscmpy.program.main import main > > Why is code being imported if it's not being used? So that ‘from webkitscmpy.program import main’ works.
Jonathan Bedard
Comment 6 2020-10-11 10:20:09 PDT
Dean Johnson
Comment 7 2020-10-12 10:59:37 PDT
(In reply to Jonathan Bedard from comment #5) > Comment on attachment 410988 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=410988&action=review > > >> Tools/Scripts/git-webkit:40 > >> + Contributor.by_email[email] = contributor > > > > Instead of building an implicit dictionary on `Contributor`, could we instead call a memoized function with the json file path to get a list of contributors back? It wasn't clear to me reading `git-webkit` initially that it's using a static class to store all Contributor information. > > I’m a bit torn on the right way to handle this. > > Some background: the reason we want a contributor mapping is because > sometimes git and SVN have incomplete authors (missing an email, email > without a name) and contributors.json let’s us normalize that. And that’s > really all we’re using contributors.json for in this script. > > The catch is that contributors.json is associated with the WebKit repository > specifically, but webkitscmpy is supposed to be project agnostic. That’s why > I’m loading contributors.json here and not in the library....seems like > maybe the right solution is giving an ‘add’ method to the class My concern was less about how contributors were added, and more that we were modifying the Contributor class implicitly for any other modules which may use it. I really dislike working with code which implicitly modifies state on a module I may be importing / using later, since it's more difficult to set expectations for what the module does and how it behaves. One way to address that is to build your contributor list in git-webkit, then pass around the list of contributors to use. > > >> Tools/Scripts/git-webkit:42 > >> +sys.exit(program.main(repository=local.Scm.from_path(scripts))) > > > > Also, maybe it makes more sense for this to be a symlink to the same file in `webkitscmpy`?: > > Tools/Scripts/libraries/webkitscmpy/git-webkit > > Similar story as above. We definitely don’t want things like > contributors.json attached to webkitscmpy. I think the real question is if > the script in webkitscmpy should be called got-WebKit. > > >> Tools/Scripts/libraries/webkitscmpy/webkitscmpy/commit.py:270 > >> + class Encoder(json.JSONEncoder): > > > > Nit: I think Python encourages placing class definitions at the top of the block they're defined in (when they're defined within another class). You could also move it outside of the class and call it CommitEncoder if that is more clear. > > Will move! > > >> Tools/Scripts/libraries/webkitscmpy/webkitscmpy/local/scm.py:94 > >> + return self.commit() > > > > IMO if there's no argument then we should fail / exit. This could easily cause mistakes by automation / scripts if they forget to provide an argument, whereas failing would be more correct. > > Is that stance for just this function, or the whole command? Whole command. If someone types `git webkit find` I don't think we should return any results -- imagine if any code does something like: ``` get_commit_info_command = ['git', 'webkit', 'find'] identifier = some_func_to_get_commit() if identifier: get_commit_info_command.append(commit) return identifier ``` if some_func_to_get_identifier ever has a bug that causes it to return None, then we could start returning the repository's current hash every time. A realistic case of this happening is when some_func_to_get_commit() is a function which talks to a web service and happens to have a `try/except Exception` added for flaky failures and forgets to re-raise after catching an exception after hitting the retry limit. > > >> Tools/Scripts/libraries/webkitscmpy/webkitscmpy/program/__init__.py:26 > >> +from webkitscmpy.program.main import main > > > > Why is code being imported if it's not being used? > > So that ‘from webkitscmpy.program import main’ works. Ah, makes sense.
Dean Johnson
Comment 8 2020-10-12 11:01:01 PDT
Sorry, partially rewrote example code while I was replying. Updated code example since it was incomplete: ``` get_commit_info_command = ['git', 'webkit', 'find'] commit = some_func_to_get_commit() if commit: get_commit_info_command.append(commit) identifier = run_command(get_commit_info_command) return identifier ```
Jonathan Bedard
Comment 9 2020-10-12 11:42:45 PDT
(In reply to Dean Johnson from comment #7) > ... > > My concern was less about how contributors were added, and more that we were > modifying the Contributor class implicitly for any other modules which may > use it. I really dislike working with code which implicitly modifies state > on a module I may be importing / using later, since it's more difficult to > set expectations for what the module does and how it behaves. > > One way to address that is to build your contributor list in git-webkit, > then pass around the list of contributors to use. I'm actually going to remove the contributors part of this patch, and put it in a stand-alone change so we can discuss further. This patch is big already, I didn't expect the contributors.json to be controversial (so I added it), but it's not required for git-webkit to work. > ... > if some_func_to_get_identifier ever has a bug that causes it to return None, > then we could start returning the repository's current hash every time. A > realistic case of this happening is when some_func_to_get_commit() is a > function which talks to a web service and happens to have a `try/except > Exception` added for flaky failures and forgets to re-raise after catching > an exception after hitting the retry limit. How would you feel about supporting "HEAD" instead of making None correspond to HEAD? Suppose that means we would also have to support HEAD~ and friends, but that's not too hard.
Jonathan Bedard
Comment 10 2020-10-12 13:35:42 PDT
Dean Johnson
Comment 11 2020-10-12 15:08:14 PDT
Comment on attachment 411150 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=411150&action=review Thanks for splitting out the patch. Added a couple more comments that suggest moving parsing calls and repository checks outside of the command itself. > Tools/Scripts/libraries/webkitscmpy/webkitscmpy/local/git.py:91 > + if 'master' in candidates: Nit: Do we want to prefer `master` over `main`? Maybe we should error out if a default isn't defined, but both are in `self.branches`? > Tools/Scripts/libraries/webkitscmpy/webkitscmpy/local/scm.py:41 > + HEAD_RE = re.compile(r'HEAD(~\d*)*') Technically HEAD^ is also a valid git hash, though it seems unlikely to be used. > Tools/Scripts/libraries/webkitscmpy/webkitscmpy/local/scm.py:104 > + count += int(s) if s else 1 Maybe it would be better to pass the argument through to `git`, get that hash, then convert using the hash? Then we don't need to handle cases like HEAD^ or write any parsing logic ourselves. > Tools/Scripts/libraries/webkitscmpy/webkitscmpy/local/svn.py:40 > + executable = '/usr/bin/svn' if os.path.exists('/usr/bin/svn') else '/usr/local/bin/svn' Do we have any checks for this executable existing? Maybe it'd be better to make `executable` a property on Svn and use that function to find the appropriate path. > Tools/Scripts/libraries/webkitscmpy/webkitscmpy/program/command.py:35 > + raise NotImplementedError("'{}' does not have a help message") Nit: Extra space in 'a help' > Tools/Scripts/libraries/webkitscmpy/webkitscmpy/program/find.py:63 > + if repository.is_svn: As a user I don't like it when the arguments available to me in a script change based on my CWD. Can we make the options available no matter what, then error out if we fail to parse? > Tools/Scripts/libraries/webkitscmpy/webkitscmpy/program/find.py:65 > + '--revision', '-r', Have you considered using `type` for these arguments? You could set them to their respective: Commit._parse_revision, Commit._parse_hash, Commit._parse_identifier, etc. then just use the returned reference. And we could set the type of the option which does not use arguments to be the upper part of main(), thereby removing the majority of the parsing logic (and making it re-usable for any project). This would remove a decent amount of logic from the command parser, so we know that anything to do with identifiers/revisions/hashes should be fully dealt with webkitscmpy.
Jonathan Bedard
Comment 12 2020-10-12 15:20:04 PDT
Comment on attachment 411150 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=411150&action=review >> Tools/Scripts/libraries/webkitscmpy/webkitscmpy/local/git.py:91 >> + if 'master' in candidates: > > Nit: Do we want to prefer `master` over `main`? Maybe we should error out if a default isn't defined, but both are in `self.branches`? Most of our repos won't hit this (I encountered it with a different version of git) because the rev-parse command will give us what we need I think if we're being realistic, a repository that has both branches probably has "master" as the default branch since that's what the industry has been doing for many years (which is why I did it this way). Not a big deal either way though, most checkouts will support the rev-parse command >> Tools/Scripts/libraries/webkitscmpy/webkitscmpy/local/scm.py:104 >> + count += int(s) if s else 1 > > Maybe it would be better to pass the argument through to `git`, get that hash, then convert using the hash? Then we don't need to handle cases like HEAD^ or write any parsing logic ourselves. Doesn't work for SVN repositories, though, and I'm trying to keep the API the same. >> Tools/Scripts/libraries/webkitscmpy/webkitscmpy/local/svn.py:40 >> + executable = '/usr/bin/svn' if os.path.exists('/usr/bin/svn') else '/usr/local/bin/svn' > > Do we have any checks for this executable existing? Maybe it'd be better to make `executable` a property on Svn and use that function to find the appropriate path. I thought about the function, but we really only want to set it once, I think. Also, making a property classmethod is nasty. >> Tools/Scripts/libraries/webkitscmpy/webkitscmpy/program/find.py:63 >> + if repository.is_svn: > > As a user I don't like it when the arguments available to me in a script change based on my CWD. Can we make the options available no matter what, then error out if we fail to parse? These options control the output, not the input, so we would basically be unconditionally failing the moment the user passed them...that's why I filtered them out as options. If we removed this, we would trigger the error on line 117 >> Tools/Scripts/libraries/webkitscmpy/webkitscmpy/program/find.py:65 >> + '--revision', '-r', > > Have you considered using `type` for these arguments? You could set them to their respective: Commit._parse_revision, Commit._parse_hash, Commit._parse_identifier, etc. then just use the returned reference. And we could set the type of the option which does not use arguments to be the upper part of main(), thereby removing the majority of the parsing logic (and making it re-usable for any project). > > This would remove a decent amount of logic from the command parser, so we know that anything to do with identifiers/revisions/hashes should be fully dealt with webkitscmpy. These commands don't control the type of argument in, they control the type of argument out, `type` won't help us much.
Jonathan Bedard
Comment 13 2020-10-12 16:14:08 PDT
Jonathan Bedard
Comment 14 2020-10-13 09:19:12 PDT
Jonathan Bedard
Comment 15 2020-10-13 09:22:15 PDT
dewei_zhu
Comment 16 2020-10-13 13:37:12 PDT
Comment on attachment 411226 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=411226&action=review > Tools/Scripts/libraries/webkitscmpy/git-webkit:30 > +sys.exit(program.main()) My understanding is the difference between this file and 'Tools/Scripts/git-webkit' is this script can be use for other repo if necessary, right? > Tools/Scripts/libraries/webkitscmpy/webkitscmpy/commit.py:36 > + DISPLAY_SIZE = 12 Display size seems ambiguous to me at first glance. Can we use a variable something like `HASH_LABEL_SIZE` or something similar? > Tools/Scripts/libraries/webkitscmpy/webkitscmpy/program/find.py:24 > +import logging Is this module used anywhere in this file? > Tools/Scripts/libraries/webkitscmpy/webkitscmpy/program/find.py:64 > + sys.stderr.write(str(exception) + '\n') Could you tell me more about choosing sys.stderr & print rather than logging module?
Jonathan Bedard
Comment 17 2020-10-13 13:45:42 PDT
Comment on attachment 411226 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=411226&action=review >> Tools/Scripts/libraries/webkitscmpy/git-webkit:30 >> +sys.exit(program.main()) > > My understanding is the difference between this file and 'Tools/Scripts/git-webkit' is this script can be use for other repo if necessary, right? That is correct. In a future change, we will hook the WebKit one up to contributors json, Bugzilla, ect. >> Tools/Scripts/libraries/webkitscmpy/webkitscmpy/program/find.py:64 >> + sys.stderr.write(str(exception) + '\n') > > Could you tell me more about choosing sys.stderr & print rather than logging module? Logging will add the level in the log message by default (CRITICAL, in this case). Really no other reason than that.
Jonathan Bedard
Comment 18 2020-10-13 13:58:53 PDT
dewei_zhu
Comment 19 2020-10-13 14:48:14 PDT
r=me
Jonathan Bedard
Comment 20 2020-10-13 15:48:30 PDT
EWS
Comment 21 2020-10-13 16:53:40 PDT
Committed r268433: <https://trac.webkit.org/changeset/268433> All reviewed patches have been landed. Closing bug and clearing flags on attachment 411273 [details].
Jonathan Bedard
Comment 22 2020-10-13 17:32:17 PDT
Reopening to attach new patch.
Jonathan Bedard
Comment 23 2020-10-13 17:32:18 PDT
Created attachment 411285 [details] Patch for landing
EWS
Comment 24 2020-10-13 18:10:21 PDT
Committed r268440: <https://trac.webkit.org/changeset/268440> All reviewed patches have been landed. Closing bug and clearing flags on attachment 411285 [details].
Note You need to log in before you can comment on or make changes to this bug.