Bug 221982

Summary: Add support for syncing repo with commit revision label
Product: WebKit Reporter: Zhifei Fang <zhifei_fang>
Component: New BugsAssignee: Zhifei Fang <zhifei_fang>
Status: RESOLVED FIXED    
Severity: Normal CC: aakash_jain, darin, dewei_zhu, jbedard, rniwa, webkit-bug-importer, zhifei_fang
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 222897    
Bug Blocks:    
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch none

Description Zhifei Fang 2021-02-16 11:45:31 PST
Add support for syncing repo with commit identifier
Comment 1 Zhifei Fang 2021-02-16 11:54:18 PST
Created attachment 420515 [details]
Patch
Comment 2 Ryosuke Niwa 2021-02-16 13:10:14 PST
Comment on attachment 420515 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=420515&action=review

> Websites/perf.webkit.org/tools/sync-commits.py:263
> +    IDENTIFIER_RE = re.compile(r'Canonical link: https://commits.webkit.org/([\d]+@[a-zA-Z\s\-_]+)\n')

Use lowercase and prefix with _ to be consistent with other code in this file.

> Websites/perf.webkit.org/tools/sync-commits.py:284
> +        identifier = self.IDENTIFIER_RE.search(message).group(1)

Please use named group: ?P<~>

> Websites/perf.webkit.org/tools/sync-commits.py:290
> +        res = {

Please don't abbreviate result / response as "res".

> Websites/perf.webkit.org/tools/sync-commits.py:298
> +        if previous_identifier:
> +            res['previousCommit'] = previous_identifier

Please just always put previousCommit as we do in GitRepository's version.
Comment 3 Zhifei Fang 2021-02-16 13:12:55 PST
Comment on attachment 420515 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=420515&action=review

>> Websites/perf.webkit.org/tools/sync-commits.py:298
>> +            res['previousCommit'] = previous_identifier
> 
> Please just always put previousCommit as we do in GitRepository's version.

This doesn't work for the first commit, please see validate_commits, it will validate the revision if the key exist. Either I change the API or change it here
Comment 4 Ryosuke Niwa 2021-02-16 13:50:40 PST
(In reply to Zhifei Fang from comment #3)
> Comment on attachment 420515 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=420515&action=review
> 
> >> Websites/perf.webkit.org/tools/sync-commits.py:298
> >> +            res['previousCommit'] = previous_identifier
> > 
> > Please just always put previousCommit as we do in GitRepository's version.
> 
> This doesn't work for the first commit, please see validate_commits, it will
> validate the revision if the key exist. Either I change the API or change it
> here

That seems to suggest that GitRepository._revision_from_tokens is also broken? I suggest we fix the API to allow null being specified in the previousCommit (with a test). It's awkward to disallow null there.
Comment 5 Zhifei Fang 2021-02-16 13:51:55 PST
Comment on attachment 420515 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=420515&action=review

>>>> Websites/perf.webkit.org/tools/sync-commits.py:298
>>>> +            res['previousCommit'] = previous_identifier
>>> 
>>> Please just always put previousCommit as we do in GitRepository's version.
>> 
>> This doesn't work for the first commit, please see validate_commits, it will validate the revision if the key exist. Either I change the API or change it here
> 
> That seems to suggest that GitRepository._revision_from_tokens is also broken? I suggest we fix the API to allow null being specified in the previousCommit (with a test). It's awkward to disallow null there.

Yes... OK, I will change the API then
Comment 6 Zhifei Fang 2021-02-16 14:57:07 PST
Created attachment 420545 [details]
Patch
Comment 7 Ryosuke Niwa 2021-02-16 15:18:05 PST
Comment on attachment 420545 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=420545&action=review

> Websites/perf.webkit.org/public/include/commit-updater.php:158
> -        require_format('Revision', $commit_info['revision'], '/^[A-Za-z0-9 \.]+$/');
> +        require_format('Revision', $commit_info['revision'], '/^[@A-Za-z0-9 \.]+$/');

I don't think this is quite right. We don't want to allow arbitrary number of @s.
We should be using a similar expression to the one in sync-commits.py

> Websites/perf.webkit.org/public/include/commit-updater.php:161
> +        if (array_key_exists('previousCommit', $commit_info) && $commit_info['previousCommit'])

Please add a test. Also, doesn't this allow previousCommit of 0 or false to be treated like null?
I think we should probably use is_null here or === NULL.

> Websites/perf.webkit.org/tools/sync-commits.py:263
> +    _IDENTIFIER_RE = re.compile(r'Canonical link: https://commits.webkit.org/(?P<identifier>[\d]+@[a-zA-Z\s\-_]+)\n')

Use lowercase?
Comment 8 dewei_zhu 2021-02-16 15:42:47 PST
Comment on attachment 420545 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=420545&action=review

>> Websites/perf.webkit.org/tools/sync-commits.py:263
>> +    _IDENTIFIER_RE = re.compile(r'Canonical link: https://commits.webkit.org/(?P<identifier>[\d]+@[a-zA-Z\s\-_]+)\n')
> 
> Use lowercase?

If this is meant to be a constant, I think python PEP8 recommends it to be all upper case per https://www.python.org/dev/peps/pep-0008/#constants
Comment 9 Ryosuke Niwa 2021-02-16 15:51:26 PST
(In reply to dewei_zhu from comment #8)
> Comment on attachment 420545 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=420545&action=review
> 
> >> Websites/perf.webkit.org/tools/sync-commits.py:263
> >> +    _IDENTIFIER_RE = re.compile(r'Canonical link: https://commits.webkit.org/(?P<identifier>[\d]+@[a-zA-Z\s\-_]+)\n')
> > 
> > Use lowercase?
> 
> If this is meant to be a constant, I think python PEP8 recommends it to be
> all upper case per https://www.python.org/dev/peps/pep-0008/#constants

Huh, I'm pretty sure we used not to do that but I guess we're doing that now. Okay.
Comment 10 Ryosuke Niwa 2021-02-16 15:51:56 PST
Comment on attachment 420545 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=420545&action=review

>>>> Websites/perf.webkit.org/tools/sync-commits.py:263
>>>> +    _IDENTIFIER_RE = re.compile(r'Canonical link: https://commits.webkit.org/(?P<identifier>[\d]+@[a-zA-Z\s\-_]+)\n')
>>> 
>>> Use lowercase?
>> 
>> If this is meant to be a constant, I think python PEP8 recommends it to be all upper case per https://www.python.org/dev/peps/pep-0008/#constants
> 
> Huh, I'm pretty sure we used not to do that but I guess we're doing that now. Okay.

It should probably be IDENTIFIER_PATTERN in that case per PEP8 style.
Comment 11 Zhifei Fang 2021-02-17 17:02:16 PST
Created attachment 420762 [details]
Patch
Comment 12 Zhifei Fang 2021-02-17 17:03:32 PST
I will add unit test in another patch, will also need firstly update the unit test to make it can work with up to date nodejs

(In reply to Ryosuke Niwa from comment #7)
> Comment on attachment 420545 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=420545&action=review
> 
> > Websites/perf.webkit.org/public/include/commit-updater.php:158
> > -        require_format('Revision', $commit_info['revision'], '/^[A-Za-z0-9 \.]+$/');
> > +        require_format('Revision', $commit_info['revision'], '/^[@A-Za-z0-9 \.]+$/');
> 
> I don't think this is quite right. We don't want to allow arbitrary number
> of @s.
> We should be using a similar expression to the one in sync-commits.py
> 
> > Websites/perf.webkit.org/public/include/commit-updater.php:161
> > +        if (array_key_exists('previousCommit', $commit_info) && $commit_info['previousCommit'])
> 
> Please add a test. Also, doesn't this allow previousCommit of 0 or false to
> be treated like null?
> I think we should probably use is_null here or === NULL.
> 
> > Websites/perf.webkit.org/tools/sync-commits.py:263
> > +    _IDENTIFIER_RE = re.compile(r'Canonical link: https://commits.webkit.org/(?P<identifier>[\d]+@[a-zA-Z\s\-_]+)\n')
> 
> Use lowercase?
Comment 13 dewei_zhu 2021-02-17 18:21:03 PST
Maybe we want to land the change that fixes unit tests to be compatible with newer version of nodejs first.

I think it would be cleaner to keep the fix and the unit tests those are associated with the fix under the same patch.
Comment 14 Ryosuke Niwa 2021-02-17 19:49:05 PST
(In reply to dewei_zhu from comment #13)
> Maybe we want to land the change that fixes unit tests to be compatible with
> newer version of nodejs first.
> 
> I think it would be cleaner to keep the fix and the unit tests those are
> associated with the fix under the same patch.

Yes, we should do that.
Comment 15 Radar WebKit Bug Importer 2021-02-23 11:46:36 PST
<rdar://problem/74653254>
Comment 16 Zhifei Fang 2021-03-12 14:31:04 PST
Created attachment 423083 [details]
Patch
Comment 17 Ryosuke Niwa 2021-03-31 19:36:50 PDT
Comment on attachment 423083 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=423083&action=review

r- because I don't think we want to land this as is.

> Websites/perf.webkit.org/tools/sync-commits.py:46
> +        return GitRepository(name=repository['name'], git_url=repository['url'], git_checkout=repository['gitCheckout'], git_branch=repository.get('branch'), fetch_revision_label=repository.get("fetchRevisionLabel", False))

This isn't right. We aren't really fetching anything.
If anything this is really about whether repository has revision identifier or not.

> Websites/perf.webkit.org/tools/sync-commits.py:194
> +    REVISION_LABEL_RE = re.compile(r'Canonical link: https://commits.webkit.org/(?P<revision_label>\d+@\S+)\n')

I don't think it's right that we're hard-coding into the script.
This script is supposed to work with any repository.
This should probably be specified in fetch_revision_label in the config file.

> Websites/perf.webkit.org/tools/sync-commits.py:245
> +        if self._fetch_revision_label:
> +            revision_label = self.REVISION_LABEL_RE.search(message).group('revision_label')
> +            result['revisionLabel'] = revision_label

This is unnecessarily complex. Given our API is perfectly capable of accepting null, we should just do this:
'revisionIdentifier': ~
Comment 18 Zhifei Fang 2021-04-07 22:36:29 PDT
Created attachment 425478 [details]
Patch
Comment 19 dewei_zhu 2021-04-08 00:25:30 PDT
Comment on attachment 425478 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=425478&action=review

> Websites/perf.webkit.org/tools/sync-commits.py:198
> +    def __init__(self, name, git_checkout, git_url, git_branch=None, revision_identifier=False):

`revision_identifier` sounds like it an identifier, but it's actually a boolean. So how about `include_revision_identifier`?
Also, do we expect changes on `load_repository` ?
Stepping back a bit, do we really need this variable? Can we always include `revisionIdentifier` when it's available rather than using this flag?
Comment 20 Zhifei Fang 2021-04-08 11:00:45 PDT
(In reply to dewei_zhu from comment #19)
> Comment on attachment 425478 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=425478&action=review
> 
> > Websites/perf.webkit.org/tools/sync-commits.py:198
> > +    def __init__(self, name, git_checkout, git_url, git_branch=None, revision_identifier=False):
> 
> `revision_identifier` sounds like it an identifier, but it's actually a
> boolean. So how about `include_revision_identifier`?
> Also, do we expect changes on `load_repository` ?
> Stepping back a bit, do we really need this variable? Can we always include
> `revisionIdentifier` when it's available rather than using this flag?

I did this because this code will be shared with other git repo, it won't include this in the commit message, running regular expression against something that I know it won't exist will be a waste of time. I think I should rename this to get_revision_identifier_from_commit_message.
Comment 21 Zhifei Fang 2021-04-08 11:03:41 PDT
Created attachment 425522 [details]
Patch
Comment 22 Zhifei Fang 2021-04-14 12:17:18 PDT
Created attachment 426029 [details]
Patch
Comment 23 Zhifei Fang 2021-04-14 12:29:10 PDT
Any further comments about this ?

I have made this script raise an error when we cannot find the revision identifier, I know our API is smart enough to handle if we have revision identifier or not, but the real issue is if we expected there is a revision identifier but we never find it, there must be a issue in our infra, the commit with out revision identifier will be rewrite, this will gives us a new git hash. I think we should not report this bad commit.
Comment 24 Zhifei Fang 2021-04-14 12:35:15 PDT
Created attachment 426032 [details]
Patch
Comment 25 Zhifei Fang 2021-04-14 12:36:35 PDT
Created attachment 426034 [details]
Patch
Comment 26 Zhifei Fang 2021-04-14 12:38:00 PDT
Also, add the sync option for the SVN as well, we currently still sync with SVN instead of the git repo
Comment 27 Ryosuke Niwa 2021-04-14 15:22:31 PDT
Comment on attachment 426034 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=426034&action=review

> Websites/perf.webkit.org/tools/sync-commits.py:21
> +# Canonical link: https://commits.webkit.org/https://commits.webkit.org/232477@main
> +REVISION_IDENTIFIER_RE = re.compile(r'Canonical link: (https://commits.webkit.org/)+(?P<revision_identifier>\d+@[\w\.\-]+)\n')

Surely, you meant `(https://commits.webkit.org/)?`, not (~)+.

> Websites/perf.webkit.org/tools/sync-commits.py:131
> -    def __init__(self, name, svn_url, should_trust_certificate, use_server_auth, account_name_script_path):
> +    def __init__(self, name, svn_url, should_trust_certificate, use_server_auth, account_name_script_path, get_revision_identifier_from_commit_msg):

I'd call this report_revision_identifier_in_commit_msg instead.
But it doesn't seem like this is ever set?

> Websites/perf.webkit.org/tools/sync-commits.py:171
> +            if revision_identifier_match:

WebKit's preferred way of structuring code like this is to put unusual path as an early return.
So we should have `if not revision_identifier_match` then raise and shouldn't have any else clause.

> Websites/perf.webkit.org/tools/sync-commits.py:177
> +                # one useless. If we detect this, we should not report this commit.

This comment is very wordy. It's probably better this information in the exception itself.

> Websites/perf.webkit.org/tools/sync-commits.py:178
> +                raise ValueError('Expected commit message includes revision identifier, but cannot find it')

Something like this:
`Expected commit message includes revision identifier, but did not find - may have been caused by a history rewrite`

> Websites/perf.webkit.org/tools/sync-commits.py:255
> +            if revision_identifier_match:

Ditto.

> Websites/perf.webkit.org/tools/sync-commits.py:258
> +                # This is due to we may have sync error for the commit, if we don't have the revision

Ditto.
Comment 28 Zhifei Fang 2021-04-14 17:20:20 PDT
(In reply to Ryosuke Niwa from comment #27)
> Comment on attachment 426034 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=426034&action=review
> 
> > Websites/perf.webkit.org/tools/sync-commits.py:21
> > +# Canonical link: https://commits.webkit.org/https://commits.webkit.org/232477@main
> > +REVISION_IDENTIFIER_RE = re.compile(r'Canonical link: (https://commits.webkit.org/)+(?P<revision_identifier>\d+@[\w\.\-]+)\n')
> 
> Surely, you meant `(https://commits.webkit.org/)?`, not (~)+.
> 
> > Websites/perf.webkit.org/tools/sync-commits.py:131
> > -    def __init__(self, name, svn_url, should_trust_certificate, use_server_auth, account_name_script_path):
> > +    def __init__(self, name, svn_url, should_trust_certificate, use_server_auth, account_name_script_path, get_revision_identifier_from_commit_msg):
> 
> I'd call this report_revision_identifier_in_commit_msg instead.
> But it doesn't seem like this is ever set?
> 
> > Websites/perf.webkit.org/tools/sync-commits.py:171
> > +            if revision_identifier_match:
> 
> WebKit's preferred way of structuring code like this is to put unusual path
> as an early return.
> So we should have `if not revision_identifier_match` then raise and
> shouldn't have any else clause.
> 
> > Websites/perf.webkit.org/tools/sync-commits.py:177
> > +                # one useless. If we detect this, we should not report this commit.
> 
> This comment is very wordy. It's probably better this information in the
> exception itself.
> 
> > Websites/perf.webkit.org/tools/sync-commits.py:178
> > +                raise ValueError('Expected commit message includes revision identifier, but cannot find it')
> 
> Something like this:
> `Expected commit message includes revision identifier, but did not find -
> may have been caused by a history rewrite`

OK, but this won't be caused by a history rewrite, we will need a history rewrite to fix this, therefore this commit is invalid after that rewrite.

I will change this to something like: 
`Expected commit message includes revision identifier, but did not find - need a history rewrite to fix it`
> 
> > Websites/perf.webkit.org/tools/sync-commits.py:255
> > +            if revision_identifier_match:
> 
> Ditto.
> 
> > Websites/perf.webkit.org/tools/sync-commits.py:258
> > +                # This is due to we may have sync error for the commit, if we don't have the revision
> 
> Ditto.
Comment 29 Ryosuke Niwa 2021-04-14 20:16:37 PDT
(In reply to Zhifei Fang from comment #28)
> (In reply to Ryosuke Niwa from comment #27)
> > Comment on attachment 426034 [details]
> > Patch
> > 
> > View in context:
> > https://bugs.webkit.org/attachment.cgi?id=426034&action=review
> > 
> > > Websites/perf.webkit.org/tools/sync-commits.py:21
> > > +# Canonical link: https://commits.webkit.org/https://commits.webkit.org/232477@main
> > > +REVISION_IDENTIFIER_RE = re.compile(r'Canonical link: (https://commits.webkit.org/)+(?P<revision_identifier>\d+@[\w\.\-]+)\n')
> > 
> > Surely, you meant `(https://commits.webkit.org/)?`, not (~)+.
> > 
> > > Websites/perf.webkit.org/tools/sync-commits.py:131
> > > -    def __init__(self, name, svn_url, should_trust_certificate, use_server_auth, account_name_script_path):
> > > +    def __init__(self, name, svn_url, should_trust_certificate, use_server_auth, account_name_script_path, get_revision_identifier_from_commit_msg):
> > 
> > I'd call this report_revision_identifier_in_commit_msg instead.
> > But it doesn't seem like this is ever set?
> > 
> > > Websites/perf.webkit.org/tools/sync-commits.py:171
> > > +            if revision_identifier_match:
> > 
> > WebKit's preferred way of structuring code like this is to put unusual path
> > as an early return.
> > So we should have `if not revision_identifier_match` then raise and
> > shouldn't have any else clause.
> > 
> > > Websites/perf.webkit.org/tools/sync-commits.py:177
> > > +                # one useless. If we detect this, we should not report this commit.
> > 
> > This comment is very wordy. It's probably better this information in the
> > exception itself.
> > 
> > > Websites/perf.webkit.org/tools/sync-commits.py:178
> > > +                raise ValueError('Expected commit message includes revision identifier, but cannot find it')
> > 
> > Something like this:
> > `Expected commit message includes revision identifier, but did not find -
> > may have been caused by a history rewrite`
> 
> OK, but this won't be caused by a history rewrite, we will need a history
> rewrite to fix this, therefore this commit is invalid after that rewrite.
> 
> I will change this to something like: 
> `Expected commit message includes revision identifier, but did not find -
> need a history rewrite to fix it`

I see, yeah, that's a way clearer.
Comment 30 Zhifei Fang 2021-04-14 22:03:43 PDT
Created attachment 426074 [details]
Patch
Comment 31 Zhifei Fang 2021-04-14 22:05:32 PDT
After test I find that in the SVN repo we won't include the revision identifier, so we will need to report SVN revision and revision identifier from the git repo.

After we switch everything to git only, just switch off the new option I added, it will sync with git hash again.
Comment 32 Zhifei Fang 2021-04-16 13:17:18 PDT
Any more comments ?
Comment 33 Ryosuke Niwa 2021-04-17 01:08:48 PDT
Comment on attachment 426074 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=426074&action=review

Sorry for the delay. I got caught up with other things.

> Websites/perf.webkit.org/ChangeLog:2
> +2021-04-14  Zhifei Fang  <zhifei_fang@apple.com>
> +Add support for syncing repo with commit revision label

Something went awfully wrong with this change log entry.

> Websites/perf.webkit.org/tools/sync-commits.py:39
>          for repository in repositories:
> -            try:
>                  repository.fetch_commits_and_submit(server_config, args.max_fetch_count, args.max_ancestor_fetch_count)

Nit: wrong indentation.

> Websites/perf.webkit.org/tools/sync-commits.py:47
> -        return GitRepository(name=repository['name'], git_url=repository['url'], git_checkout=repository['gitCheckout'], git_branch=repository.get('branch'))
> +        return GitRepository(name=repository['name'], git_url=repository['url'], git_checkout=repository['gitCheckout'], git_branch=repository.get('branch'), report_revision_identifier_in_commit_msg=repository.get('reportRevisionIdentifier'), report_svn_revison=repository.get('reportSVNRevision'))

This line is way too long!
Please insert a hard line break somewhere.

> Websites/perf.webkit.org/tools/sync-commits.py:212
> +                # translate the last svn revision to git hash

This kind of "what" comment is a code smell.
We should be extracting this logic as a helper method instead.

> Websites/perf.webkit.org/tools/sync-commits.py:216
> +                last_fetched_git_hash = self._run_git_command(['svn', 'find-rev', 'r{}'.format(last_fetched)]).strip()
> +                if not last_fetched_git_hash:
> +                    self._fetch_all_hashes()
> +                    last_fetched_git_hash = self._run_git_command(['svn', 'find-rev', 'r{}'.format(last_fetched)]).strip()

What is this retry logic about? When is this useful?

> Websites/perf.webkit.org/tools/sync-commits.py:219
> +                last_fetched = last_fetched_git_hash

Why are we updating last_fetched? This is just wrong.
last_fetched comes from the server, it's nothing to do with the local state of git clone or git svn state.
This script is meant to be fault tolerant, meaning that if the script had failed / got killed mid way,
it can continue syncing the commit info from where it left off.
r- because of this.

> Websites/perf.webkit.org/tools/sync-commits.py:235
> +    def _get_svn_revision_from_message(self, message):

Nit: our coding style guidelines mandates that we do not prefix a getter function with "get" unless it has an out argument.
Here, it should read _svn_revision_from_commit_message instead
https://webkit.org/code-style-guidelines/#names-out-argument

> Websites/perf.webkit.org/tools/sync-commits.py:239
> +        if svn_revision_match:
> +            return svn_revision_match.group('svn_revision')
> +        return None

We prefer early exit over nested if. i.e. normal flow of logic should continue.
So this should read:
if not svn_revision_match:
    return None
return svn_revision_match.group('svn_revision')

or better yet:
return svn_revision_match.group('svn_revision') if svn_revision_match else None

> Websites/perf.webkit.org/tools/sync-commits.py:260
> +            svn_revision = self._get_svn_revision_from_message(message)

Why are we parsing subversion revision from git commit?
Can't we just run git svn find-rev instead?
It can convert git hash to subversion revision as well as subversion to git hash.

> Websites/perf.webkit.org/tools/sync-commits.py:269
> +            'revision': current_hash if not svn_revision else svn_revision,
> +            'revisionIdentifier': revision_identifier,
> +            'previousCommit': previous_hash if not previous_revision else previous_revision,

This isn't right. We can't switch between git hashes and revisions in a single repository.
That would cause all sorts of havoc down the line.
We should instead error and exit if _report_svn_revision is specified but we didn't find a subversion information.
r- because of this.
Comment 34 Zhifei Fang 2021-05-03 14:31:07 PDT
Created attachment 427603 [details]
Patch
Comment 35 Darin Adler 2021-05-03 14:41:05 PDT
Comment on attachment 427603 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=427603&action=review

> Websites/perf.webkit.org/ChangeLog:3
> +2021-04-14  Zhifei Fang  <zhifei_fang@apple.com>
> +Add support for syncing repo with commit revision label
> +        https://bugs.webkit.org/show_bug.cgi?id=221982

This is not formatted correctly.

> Websites/perf.webkit.org/tools/sync-commits.py:21
> +REVISION_IDENTIFIER_RE = re.compile(r'Canonical link: (https://commits.webkit.org/)+(?P<revision_identifier>\d+@[\w\.\-]+)\n')

I believe the period characters in "commits.webkig.org" are going to compile as single character wild cards, not periods, right? I am pretty sure that is not what we want.
Comment 36 Zhifei Fang 2021-05-03 15:41:11 PDT
ah, good catch!

(In reply to Darin Adler from comment #35)
> Comment on attachment 427603 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=427603&action=review
> 
> > Websites/perf.webkit.org/ChangeLog:3
> > +2021-04-14  Zhifei Fang  <zhifei_fang@apple.com>
> > +Add support for syncing repo with commit revision label
> > +        https://bugs.webkit.org/show_bug.cgi?id=221982
> 
> This is not formatted correctly.
> 
> > Websites/perf.webkit.org/tools/sync-commits.py:21
> > +REVISION_IDENTIFIER_RE = re.compile(r'Canonical link: (https://commits.webkit.org/)+(?P<revision_identifier>\d+@[\w\.\-]+)\n')
> 
> I believe the period characters in "commits.webkig.org" are going to compile
> as single character wild cards, not periods, right? I am pretty sure that is
> not what we want.
Comment 37 Zhifei Fang 2021-05-03 16:01:00 PDT
Created attachment 427610 [details]
Patch
Comment 38 dewei_zhu 2021-05-03 16:20:08 PDT
Comment on attachment 427610 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=427610&action=review

> Websites/perf.webkit.org/ChangeLog:9
> +        Add new syncing option reportSVNRevision, which will let the script try to report SVN revision from a git-svn repo

Nit: missing '.' by the end of this sentence.

> Websites/perf.webkit.org/tools/sync-commits.py:38
>                  repository.fetch_commits_and_submit(server_config, args.max_fetch_count, args.max_ancestor_fetch_count)

Do we need to change the indentation if we remove `try - except` block?
Also, is the idea of removing those to stop the script if any error occurs?
Comment 39 Ryosuke Niwa 2021-05-03 17:27:49 PDT
Comment on attachment 427610 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=427610&action=review

>> Websites/perf.webkit.org/tools/sync-commits.py:38
>>                  repository.fetch_commits_and_submit(server_config, args.max_fetch_count, args.max_ancestor_fetch_count)
> 
> Do we need to change the indentation if we remove `try - except` block?
> Also, is the idea of removing those to stop the script if any error occurs?

I don't think we want to make this change.

> Websites/perf.webkit.org/tools/sync-commits.py:46
> -        return GitRepository(name=repository['name'], git_url=repository['url'], git_checkout=repository['gitCheckout'], git_branch=repository.get('branch'))
> +        return GitRepository(name=repository['name'], git_url=repository['url'], git_checkout=repository['gitCheckout'], git_branch=repository.get('branch'), report_revision_identifier_in_commit_msg=repository.get('reportRevisionIdentifier'), report_svn_revison=repository.get('reportSVNRevision'))

Please insert a line break somewhere. This is awfully long.

> Websites/perf.webkit.org/tools/sync-commits.py:211
> +                # translate the last svn revision to git hash

Rather than adding a comment like this, we should move the code inside this if block as a helper method:
_fetch_git_hash_from_svn_revision

> Websites/perf.webkit.org/tools/sync-commits.py:214
> +                    self._fetch_all_hashes()

I don't think we want to tokenize the entire history just because we're missing a few subversion revisions.
Just run git svn fetch instead.

> Websites/perf.webkit.org/tools/sync-commits.py:217
> +                        raise ValueError('Cannot find git hash for last fetched svn revision')

Cannot find *the* git hash for *the* last fetched svn revision.

> Websites/perf.webkit.org/tools/sync-commits.py:234
> +    def _git_hash_to_svn_revision(self, git_hash):

To be consistent, call this _svn_revision_from_git_hash

> Websites/perf.webkit.org/tools/sync-commits.py:237
> +    def _svn_revision_to_git_hash(self, revision):

And _git_hash_from_svn_revision

> Websites/perf.webkit.org/tools/sync-commits.py:253
> +                raise ValueError('Expected commit message includes revision identifier, but cannot find it, will need a history rewrite to fix it')

Expected commit message *to include* a revision identifier but did not find one. Need a history rewrite to fix it.
"cannot find" doesn't sound right because it's not like we failed to fetch the commit message.
Also, we need to end the sentence after "find one". It's a run-on sentence as is.

> Websites/perf.webkit.org/tools/sync-commits.py:257
> +        svn_revision = None
> +        previous_revision = None

I don't think there is a need to define a new set of local variables.
Just rename the existing variables to current_revision & previous_revision
since "revision" is the term we're using in the perf dashboard.

> Websites/perf.webkit.org/tools/sync-commits.py:261
> +                raise ValueError('Expected this is a git svn repo, but cannot find SVN revison for {}'.format(svn_revision))

I don't think there is any need to say we expected this to be a git svn repository.
Just say "Cannot find svn revision for X.
Also, we should be including current_hash, not svn_revision, which is what we couldn't figure out.

> Websites/perf.webkit.org/tools/sync-commits.py:265
> +                    raise ValueError('Expected this is a git svn repo, but cannot find SVN revison for {}'.format(previous_hash))

Ditto.

> Websites/perf.webkit.org/tools/sync-commits.py:269
> +            'revision': current_hash if not svn_revision else svn_revision,

We don't want to rely on the value of revision we found to determine whether we use hash vs. revision.
This can lead to bugs like submitting hash when we should be submitting a revision
when _git_hash_to_svn_revision starts returning falsey values or some code change is made above
such that we don't always set a value to svn_revision when self._report_svn_revision is set.
Instead, check the value of self._report_svn_revision directly or merge these two variables as I suggested above.

> Websites/perf.webkit.org/tools/sync-commits.py:271
> +            'previousCommit': previous_hash if not previous_revision else previous_revision,

Ditto.
Comment 40 Zhifei Fang 2021-05-04 14:20:09 PDT
Created attachment 427699 [details]
Patch
Comment 41 Ryosuke Niwa 2021-05-04 15:19:25 PDT
Comment on attachment 427699 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=427699&action=review

r-. Please add all my comments above and below.

> Websites/perf.webkit.org/tools/sync-commits.py:-38
> -            try:
>                  repository.fetch_commits_and_submit(server_config, args.max_fetch_count, args.max_ancestor_fetch_count)
> -            except Exception as error:
> -                print "Failed to fetch and sync:", error

Again, I don't think we want to make this change.

> Websites/perf.webkit.org/tools/sync-commits.py:48
> +        return GitRepository(name=repository['name'], git_url=repository['url'], git_checkout=repository['gitCheckout'],
> +                             git_branch=repository.get('branch'), report_revision_identifier_in_commit_msg=repository.get('reportRevisionIdentifier'),
> +                             report_svn_revison=repository.get('reportSVNRevision'))

Nit: wrong indentation. Despite of what PEP8 says, we don't align arguments.
We always indent by 4 spaces instead.

> Websites/perf.webkit.org/tools/sync-commits.py:214
> +                # translate the last svn revision to git hash
> +                last_fetched_git_hash = self._git_hash_from_svn_revision(last_fetched)

Again, this should be extracted as a helper method instead of adding a comment.

> Websites/perf.webkit.org/tools/sync-commits.py:216
> +                    self._fetch_remote()

Again, we don't want to fetch all git commits just because we can't find some svn revisions.
Comment 42 Jonathan Bedard 2021-05-04 15:50:16 PDT
Comment on attachment 427699 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=427699&action=review

>> Websites/perf.webkit.org/tools/sync-commits.py:216
>> +                    self._fetch_remote()
> 
> Again, we don't want to fetch all git commits just because we can't find some svn revisions.

I'm not sure I understand the objection here. If we fail to find the git hash from the provided Svn revision, we clearly need to update our checkout.
Comment 43 Zhifei Fang 2021-05-04 15:52:09 PDT
Comment on attachment 427699 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=427699&action=review

>> Websites/perf.webkit.org/tools/sync-commits.py:216
>> +                    self._fetch_remote()
> 
> Again, we don't want to fetch all git commits just because we can't find some svn revisions.

I don't understand here, if we cannot find it at the first time, this means we don't have it locally, we should pull the remote to get the commits. this happens when you have a git repo out of sync with database. In such case, if we don't fetch the remote, you can find nothing by using git svn
Comment 44 Ryosuke Niwa 2021-05-04 15:53:45 PDT
Comment on attachment 427699 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=427699&action=review

>>>> Websites/perf.webkit.org/tools/sync-commits.py:216
>>>> +                    self._fetch_remote()
>>> 
>>> Again, we don't want to fetch all git commits just because we can't find some svn revisions.
>> 
>> I'm not sure I understand the objection here. If we fail to find the git hash from the provided Svn revision, we clearly need to update our checkout.
> 
> I don't understand here, if we cannot find it at the first time, this means we don't have it locally, we should pull the remote to get the commits. this happens when you have a git repo out of sync with database. In such case, if we don't fetch the remote, you can find nothing by using git svn

Oh, sorry, I misread the code. This new code is fine.
Comment 45 Zhifei Fang 2021-05-04 15:55:13 PDT
Created attachment 427705 [details]
Patch
Comment 46 Zhifei Fang 2021-05-04 17:45:15 PDT
Comment on attachment 427699 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=427699&action=review

>> Websites/perf.webkit.org/tools/sync-commits.py:214
>> +                last_fetched_git_hash = self._git_hash_from_svn_revision(last_fetched)
> 
> Again, this should be extracted as a helper method instead of adding a comment.

Not really understand what should be extracted as a helper method, Do you mean translate the svn revision to git hash ? That is already a function. Or do you mean the line from 214 to 220? If so I am also confused, I think you have suggested me that we should not make any functions if there is only one caller, it will have some potential callers in future. And the logic here seems have no other caller.
Comment 47 Zhifei Fang 2021-05-04 17:46:27 PDT
Created attachment 427715 [details]
Patch
Comment 48 Ryosuke Niwa 2021-05-04 18:52:19 PDT
(In reply to Zhifei Fang from comment #46)
> Comment on attachment 427699 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=427699&action=review
> 
> >> Websites/perf.webkit.org/tools/sync-commits.py:214
> >> +                last_fetched_git_hash = self._git_hash_from_svn_revision(last_fetched)
> > 
> > Again, this should be extracted as a helper method instead of adding a comment.
> 
> Not really understand what should be extracted as a helper method, Do you
> mean translate the svn revision to git hash ? That is already a function. Or
> do you mean the line from 214 to 220? If so I am also confused, I think you
> have suggested me that we should not make any functions if there is only one
> caller, it will have some potential callers in future. And the logic here
> seems have no other caller.

Lines 217 through 224. Either that should be extracted as a helper method or the comment should be removed. It's not okay to add a "what" comment like that.
Comment 49 Ryosuke Niwa 2021-05-04 18:52:36 PDT
Comment on attachment 427715 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=427715&action=review

> Websites/perf.webkit.org/tools/sync-commits.py:217
> +                # translate the last svn revision to git hash

This comment should be either removed or the following code should be extracted as a helper method.
r- because of this.
Comment 50 Zhifei Fang 2021-05-04 18:53:33 PDT
Created attachment 427723 [details]
Patch
Comment 51 Zhifei Fang 2021-05-04 18:54:01 PDT
Removed.
(In reply to Ryosuke Niwa from comment #49)
> Comment on attachment 427715 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=427715&action=review
> 
> > Websites/perf.webkit.org/tools/sync-commits.py:217
> > +                # translate the last svn revision to git hash
> 
> This comment should be either removed or the following code should be
> extracted as a helper method.
> r- because of this.
Comment 52 Ryosuke Niwa 2021-05-04 18:58:25 PDT
Comment on attachment 427723 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=427723&action=review

> Websites/perf.webkit.org/tools/sync-commits.py:240
> +        return self._run_git_command(['svn', 'find-rev', '{}'.format(git_hash)]).strip()

What is the point of calling format on '{}'? Why isn't this simply git_hash?
Comment 53 Zhifei Fang 2021-05-04 19:00:08 PDT
Created attachment 427726 [details]
Patch
Comment 54 Ryosuke Niwa 2021-05-04 19:02:26 PDT
Comment on attachment 427726 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=427726&action=review

> Websites/perf.webkit.org/tools/sync-commits.py:266
> +                raise ValueError('Cannot find SVN revison for {}'.format(current_revision))

This should be current_hash
Comment 55 Zhifei Fang 2021-05-04 20:06:15 PDT
Created attachment 427728 [details]
Patch
Comment 56 Ryosuke Niwa 2021-05-04 23:13:06 PDT
Comment on attachment 427728 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=427728&action=review

> Websites/perf.webkit.org/tools/sync-commits.py:258
> +                raise ValueError('Expected commit message to include revision identifier, but cannot find it, will need a history rewrite to fix it')

Ugh... sorry, I keep noticing new issues. Please see my earlier comment about revising this message.
Comment 57 Zhifei Fang 2021-05-07 16:24:02 PDT
(In reply to Ryosuke Niwa from comment #56)
Please see my previous comment about it:

"""

> > Websites/perf.webkit.org/tools/sync-commits.py:178
> > +                raise ValueError('Expected commit message includes revision identifier, but cannot find it')
> 
> Something like this:
> `Expected commit message includes revision identifier, but did not find -
> may have been caused by a history rewrite`


OK, but this won't be caused by a history rewrite, we will need a history rewrite to fix this, therefore this commit is invalid after that rewrite.

I will change this to something like: 
`Expected commit message includes revision identifier, but did not find - need a history rewrite to fix it`
"""

> Comment on attachment 427728 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=427728&action=review
> 
> > Websites/perf.webkit.org/tools/sync-commits.py:258
> > +                raise ValueError('Expected commit message to include revision identifier, but cannot find it, will need a history rewrite to fix it')
> 
> Ugh... sorry, I keep noticing new issues. Please see my earlier comment
> about revising this message.
Comment 58 EWS 2021-05-07 18:08:16 PDT
Committed r277220 (237491@main): <https://commits.webkit.org/237491@main>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 427728 [details].
Comment 59 Zhifei Fang 2021-06-14 11:31:49 PDT
*** Bug 222608 has been marked as a duplicate of this bug. ***