216403 – [webkitscmpy] Add Commit object

RESOLVED FIXED216403

[webkitscmpy] Add Commit object

https://bugs.webkit.org/show_bug.cgi?id=216403

Summary [webkitscmpy] Add Commit object

Jonathan Bedard

Reported 2020-09-11 04:31:29 PDT

Add a sortable commit object that includes our the new monotonically increasing identifier.

Attachments
Patch (18.69 KB, patch) 2020-09-15 16:05 PDT, Jonathan Bedard	no flags	Details Formatted Diff Diff
Patch (19.11 KB, patch) 2020-09-18 18:10 PDT, Jonathan Bedard	no flags	Details Formatted Diff Diff
Patch (19.76 KB, patch) 2020-09-23 11:12 PDT, Jonathan Bedard	no flags	Details Formatted Diff Diff
Patch (20.71 KB, patch) 2020-09-25 21:18 PDT, Jonathan Bedard	no flags	Details Formatted Diff Diff
Patch (20.92 KB, patch) 2020-09-28 14:06 PDT, Jonathan Bedard	no flags	Details Formatted Diff Diff
Patch (22.85 KB, patch) 2020-09-29 15:36 PDT, Jonathan Bedard	no flags	Details Formatted Diff Diff
Patch (22.78 KB, patch) 2020-09-29 16:57 PDT, Jonathan Bedard	no flags	Details Formatted Diff Diff
Patch (22.93 KB, patch) 2020-09-30 13:49 PDT, Jonathan Bedard	no flags	Details Formatted Diff Diff
Patch (22.96 KB, patch) 2020-09-30 17:13 PDT, Jonathan Bedard	no flags	Details Formatted Diff Diff
Show Obsolete (8) View All Add attachment proposed patch, testcase, etc.

Radar WebKit Bug Importer

Comment 1 2020-09-11 04:31:47 PDT

<rdar://problem/68702779>

Jonathan Bedard

Comment 2 2020-09-11 04:37:35 PDT

This change is being used to support https://bugs.webkit.org/show_bug.cgi?id=216404

Jonathan Bedard

Comment 3 2020-09-15 16:05:17 PDT

Created attachment 408874 [details] Patch

Jonathan Bedard

Comment 4 2020-09-15 16:07:06 PDT

This change doesn't cover actually generating the identifiers, but does cover what those identifiers would look like as strings, how we parse them out, how we determine whether a particular string is an identifier/revision/git hash.

Jonathan Bedard

Comment 5 2020-09-18 18:10:39 PDT

Created attachment 409184 [details] Patch

Jonathan Bedard

Comment 6 2020-09-23 11:12:16 PDT

Created attachment 409486 [details] Patch

Dean Johnson

Comment 7 2020-09-23 11:40:16 PDT

Comment on attachment 409486 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=409486&action=review > Tools/Scripts/libraries/webkitscmpy/webkitscmpy/commit.py:33 > + IDENTIFIER_RE = re.compile(r'^i?(?P<identifier>-?\d+)(?P<branch>@\S+)?$') Should we be capturing '@' as part of the branch group? That seems like it will lead to a ton of branch.lstrip('@') in caller's code. > Tools/Scripts/libraries/webkitscmpy/webkitscmpy/commit.py:37 > + def standardize_hash(cls, hash, do_assert=False): Can we rename "standardize" to "parse" (for all standardize_* functions). > Tools/Scripts/libraries/webkitscmpy/webkitscmpy/commit.py:41 > + if not isinstance(hash, str) and not isinstance(hash, unicode): For Py2/3 compatibility it would be better to use six.string_types: `if not isinstance(hash, six.string_types)` > Tools/Scripts/libraries/webkitscmpy/webkitscmpy/commit.py:62 > + elif cls.NUMBER_RE.match(revision): I think you can also call `<str_object>.isdigit()` instead of parsing via regex. > Tools/Scripts/libraries/webkitscmpy/webkitscmpy/commit.py:79 > + return revision I think providing a float causes this code to return what was provided. > Tools/Scripts/libraries/webkitscmpy/webkitscmpy/commit.py:94 > + elif cls.NUMBER_RE.match(identifier): I'm a bit confused -- how will we know if a number is an ID, a revision, or a hash? Also, you can <str_object>.isdigit(). > Tools/Scripts/libraries/webkitscmpy/webkitscmpy/commit.py:106 > + return identifier, branch The return types for `standardize_identifier` is different than `standardize_revision` and `standardize_hash`. That is prone to lead to confusion and bugs if someone uses the code. Can we find a way to keep the return types the same? e.g. vending a new Commit object based on the parsed data. > Tools/Scripts/libraries/webkitscmpy/webkitscmpy/commit.py:133 > + if branch and not isinstance(branch, str) and not isinstance(branch, unicode): Ditto to six.string_types comment. > Tools/Scripts/libraries/webkitscmpy/webkitscmpy/commit.py:136 > + raise ValueError("Caller passed both 'branch' and 'identifier', but specified different branches") Can we also include the invalid values to assist with debugging? > Tools/Scripts/libraries/webkitscmpy/webkitscmpy/commit.py:139 > + raise TypeError("Expected 'timestamp' to be of type {}".format(int)) Can we also include the invalid values to assist with debugging? Also, I don't think `int` needs to be formatted into the string. > Tools/Scripts/libraries/webkitscmpy/webkitscmpy/commit.py:143 > + raise TypeError("Expected 'author' to be of type {}".format(Contributor)) Ditto. Also, it seems like author should be able to be an email (for example) that is turned into a Contributor object. > Tools/Scripts/libraries/webkitscmpy/webkitscmpy/commit.py:146 > + if message and not isinstance(message, str) and not isinstance(message, unicode): Ditto to six.string_types comment. > Tools/Scripts/libraries/webkitscmpy/webkitscmpy/commit.py:156 > + result += ' SVN revision: r{}'.format(self.revision) Do we need to prefix with spaces? Since this is across multiple lines it will be messy to reformat for any caller. Can we rename this function to "pretty_print" after Python's prettyprinter? to_string (to me) seems like the output you return for __repr__.

Jonathan Bedard

Comment 8 2020-09-23 11:51:21 PDT

Comment on attachment 409486 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=409486&action=review >> Tools/Scripts/libraries/webkitscmpy/webkitscmpy/commit.py:33 >> + IDENTIFIER_RE = re.compile(r'^i?(?P<identifier>-?\d+)(?P<branch>@\S+)?$') > > Should we be capturing '@' as part of the branch group? That seems like it will lead to a ton of branch.lstrip('@') in caller's code. We don't really want to make the @ optional, though. All the callers access this through the "standardize_identifier" functions anyways, so they already get a striped branch. >> Tools/Scripts/libraries/webkitscmpy/webkitscmpy/commit.py:62 >> + elif cls.NUMBER_RE.match(revision): > > I think you can also call `<str_object>.isdigit()` instead of parsing via regex. That doesn't support negative numbers....which is fine for svn revisions, I suppose. >> Tools/Scripts/libraries/webkitscmpy/webkitscmpy/commit.py:79 >> + return revision > > I think providing a float causes this code to return what was provided. Oops, yeah, it would >> Tools/Scripts/libraries/webkitscmpy/webkitscmpy/commit.py:94 >> + elif cls.NUMBER_RE.match(identifier): > > I'm a bit confused -- how will we know if a number is an ID, a revision, or a hash? > > Also, you can <str_object>.isdigit(). That's handled in "parse". If you provide a raw integer, we assume identifier first, then revision and lastly a hash. Identifiers can be negative, isdigit() won't work here >> Tools/Scripts/libraries/webkitscmpy/webkitscmpy/commit.py:106 >> + return identifier, branch > > The return types for `standardize_identifier` is different than `standardize_revision` and `standardize_hash`. That is prone to lead to confusion and bugs if someone uses the code. > > Can we find a way to keep the return types the same? e.g. vending a new Commit object based on the parsed data. We can't vend a new Commit object because these functions are really helper functions used to construct the commit. Maybe it's best to preface these functions with an _ to indicate that they really aren't meant to be API >> Tools/Scripts/libraries/webkitscmpy/webkitscmpy/commit.py:156 >> + result += ' SVN revision: r{}'.format(self.revision) > > Do we need to prefix with spaces? Since this is across multiple lines it will be messy to reformat for any caller. > > Can we rename this function to "pretty_print" after Python's prettyprinter? to_string (to me) seems like the output you return for __repr__. If callers were going to reformat, I assume they would either split the lines or construct their own string. What would you envision a caller would do with this output? I figured most of the time it would be printed directly. pretty_print is a better name, will rename.

Jonathan Bedard

Comment 9 2020-09-25 21:18:13 PDT

Created attachment 409768 [details] Patch

Jonathan Bedard

Comment 10 2020-09-28 14:06:28 PDT

Created attachment 409920 [details] Patch

Jonathan Bedard

Comment 11 2020-09-29 15:36:47 PDT

Created attachment 410060 [details] Patch

Jonathan Bedard

Comment 12 2020-09-29 15:48:52 PDT

Ryosuke had an idea to embed the branch point from the default branch into the identifier. I think this improves the human readability of the identifiers, but the additional information is actually not required to define which a commit. The current patch will print the branch point, if it's provided, but an identifier without a branch point will still be parsed.

Jonathan Bedard

Comment 13 2020-09-29 16:57:02 PDT

Created attachment 410068 [details] Patch

dewei_zhu

Comment 14 2020-09-30 12:10:07 PDT

Comment on attachment 410068 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=410068&action=review > Tools/Scripts/libraries/webkitscmpy/webkitscmpy/commit.py:33 > + IDENTIFIER_RE = re.compile(r'^[Ii]?((?P<branch_point>\d+)\.)?(?P<identifier>-?\d+)(@(?P<branch>\S+))?$') For the identifier, will it be a string prefixed '-'? m = IDENTIFIER_RE.match('123.-456@main') m.group('identifier') will be '-456', is this expected? Do we intentionally keep a '.' after branch point? > Tools/Scripts/libraries/webkitscmpy/webkitscmpy/commit.py:82 > + def _parse_identifier(cls, identifier, do_assert=False): This function have inconsistent return value type, it can be: None, int, string (int, int), string. This makes parsing the results more complicated as you did on line 134. > Tools/Scripts/libraries/webkitscmpy/webkitscmpy/test/commit_unittest.py:87 > + self.assertEqual(None, Commit._parse_identifier(3.141592)) Do we have test case like `'i123.-124@main` if this is supported identifier? If not, we should programmatically disable it.

Jonathan Bedard

Comment 15 2020-09-30 12:31:55 PDT

Comment on attachment 410068 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=410068&action=review >> Tools/Scripts/libraries/webkitscmpy/webkitscmpy/commit.py:33 >> + IDENTIFIER_RE = re.compile(r'^[Ii]?((?P<branch_point>\d+)\.)?(?P<identifier>-?\d+)(@(?P<branch>\S+))?$') > > For the identifier, will it be a string prefixed '-'? > m = IDENTIFIER_RE.match('123.-456@main') > m.group('identifier') will be '-456', is this expected? > Do we intentionally keep a '.' after branch point? The weirdness here is because the branch point is helpful for humans, but it's not actually required to define a commit. The example you provided is an invalid identifier, but not because of the negative value, it's because of the magnitude of the negative value (it's larger than the branch point) and the fact that "main" is likely the default branch. If you provide a negative identifier with a branch point (like i123.-4@branch-a), what you're really saying is "I want the commit 4 before the branch point on the default branch", (i119@main, in this case). Negative identifiers are valid arguments to scripts, but they are never going to be canonical identifiers of commits. If you passed something like "i123.-4@branch-a" to any service, we would fetch the commit matching those arguments, recompute the identifiers and refer to the commit as "i119@main". I'm working on the specific rules for canonical identifiers in https://bugs.webkit.org/show_bug.cgi?id=216404, those are separate from this change because they rely on your specific repository (namely, what your default branch is) The "branch_point" intentionally is grouped with the '.' because commits on the default branch don't have a branch point >> Tools/Scripts/libraries/webkitscmpy/webkitscmpy/commit.py:82 >> + def _parse_identifier(cls, identifier, do_assert=False): > > This function have inconsistent return value type, > it can be: > None, > int, string > (int, int), string. > This makes parsing the results more complicated as you did on line 134. I'm open to changing this, I'm just not sure the right way to do so. Returning None is handy for quickly testing, we could make "int, string" into "(None, int), string" to standardize it. Although in that case, I suppose it would be better to just return "int, int, string" and "None, int, string"

Jonathan Bedard

Comment 16 2020-09-30 13:49:42 PDT

Created attachment 410152 [details] Patch

dewei_zhu

Comment 17 2020-09-30 15:48:00 PDT

r=me Had some offline discussion with Johnathan, there will be a follow-up change to perform more comprehensive sanity check regarding negative identifier and branching points.

Jonathan Bedard

Comment 18 2020-09-30 17:13:37 PDT

Created attachment 410175 [details] Patch

EWS

Comment 19 2020-09-30 17:46:01 PDT

Committed r267815: <https://trac.webkit.org/changeset/267815> All reviewed patches have been landed. Closing bug and clearing flags on attachment 410175 [details].

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version WebKit Nightly Build

Hardware Unspecified

OS Unspecified

Product WebKit

Component Accessibility

Assignee

Jonathan Bedard

Reported

2020-09-11 04:31 PDT

Modified

2020-09-30 17:46 PDT History

CC List

7 users Show

URL

Keywords InRadar

Depends on

Blocks