Bug 214957 - Replace 'http://svn.webkit.org' with 'https://svn.webkit.org' in webkitpy scripts
Summary: Replace 'http://svn.webkit.org' with 'https://svn.webkit.org' in webkitpy scr...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Fujii Hironori
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2020-07-29 20:58 PDT by Fujii Hironori
Modified: 2021-01-07 08:50 PST (History)
13 users (show)

See Also:


Attachments
Patch (7.52 KB, patch)
2020-07-29 21:21 PDT, Fujii Hironori
dbates: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Fujii Hironori 2020-07-29 20:58:42 PDT
Replace 'http://svn.webkit.org' with 'https://svn.webkit.org' in webkitpy scripts

There is a problem I found

* "webkit-patch land" asks username and passoword everytime because SVNRepository.has_authorization_for_realm fails to find ream string.
Comment 1 Fujii Hironori 2020-07-29 21:21:39 PDT
Created attachment 405544 [details]
Patch
Comment 2 Daniel Bates 2020-07-29 22:28:32 PDT
Comment on attachment 405544 [details]
Patch

Patch looks good.
Comment 3 Fujii Hironori 2020-07-29 23:58:32 PDT
Committed r265077: <https://trac.webkit.org/changeset/265077>
Comment 4 Radar WebKit Bug Importer 2020-07-29 23:59:20 PDT
<rdar://problem/66313904>
Comment 5 Sergio Villar Senin 2020-08-20 02:01:51 PDT
It stills asks for username/password to me everytime I try to land a patch.
Comment 6 Alexey Proskuryakov 2020-08-20 08:39:24 PDT
Could you please file a new bug? The patch here was landed and not reverted, there is no evidence that the issue is related in any way other than the symptom.
Comment 7 Carlos Alberto Lopez Perez 2020-10-15 18:46:50 PDT
This patch caused me a few headaches.

It broke webkit-patch land for me because I had configured the svn server of the git repository (git-svn) to be http://svn.webkit.org (not https). I did the setup some years ago when https was not an option. So the config I had in ~/subversion is for http://svn.webkit.org and not for https://svn.webkit.org, therefore after this patch SVNRepository.has_authorization_for_realm failed for me causing webkit-patch to exit with a cryptic error:

COMMIT FAILED: 
Traceback (most recent call last):
  File "Tools/Scripts/webkit-patch", line 80, in <module>
    main()
  File "Tools/Scripts/webkit-patch", line 75, in main
    WebKitPatch(os.path.abspath(__file__)).main()
  File "/home/clopez/webkit/webkit/Tools/Scripts/webkitpy/tool/multicommandtool.py", line 305, in main
    result = command.check_arguments_and_execute(options, args, self)
  File "/home/clopez/webkit/webkit/Tools/Scripts/webkitpy/tool/multicommandtool.py", line 123, in check_arguments_and_execute
    return self.execute(options, args, tool) or 0
  File "/home/clopez/webkit/webkit/Tools/Scripts/webkitpy/tool/commands/abstractsequencedcommand.py", line 55, in execute
    self._sequence.run_and_handle_errors(tool, options, state)
  File "/home/clopez/webkit/webkit/Tools/Scripts/webkitpy/tool/commands/stepsequence.py", line 73, in run_and_handle_errors
    self._run(tool, options, state)
  File "/home/clopez/webkit/webkit/Tools/Scripts/webkitpy/tool/commands/stepsequence.py", line 67, in _run
    step(tool, options).run(state)
  File "/home/clopez/webkit/webkit/Tools/Scripts/webkitpy/tool/steps/commit.py", line 89, in run
    svn_revision = scm.svn_revision_from_commit_text(commit_text)
  File "/home/clopez/webkit/webkit/Tools/Scripts/webkitpy/common/checkout/scm/scm.py", line 107, in svn_revision_from_commit_text
    return match.group('svn_revision')
AttributeError: 'NoneType' object has no attribute 'group'


I tried (hard) to re-configure my setup to use https://svn.webkit.org but there is no way git-svn likes this. I even tried to purge all of my .git/svn* data and resync everything but it doesn't work. I suspect it may be related with the fact that the svn string we log in all git commits (at the end, where it says git-svn-id:) has actually a http:// prefix and not an https:// one.

I ended workarounding this by adding manually the config for https://svn.webkit.org on ~/subversion to make SVNRepository.has_authorization_for_realm happy

Now, i wonder.. i'm the only one that had trouble with this? Does "webkit-patch land" work fine for you from a git repository?
Comment 8 Carlos Alberto Lopez Perez 2020-10-15 18:49:48 PDT
(In reply to Sergio Villar Senin from comment #5)
> It stills asks for username/password to me everytime I try to land a patch.

This is likely caused by default svn config.

Check that you don't have "store-passwords = no" on ~/.subversion/config
You can set also "password-stores = gnome-keyring" to not store it on plaintext
More info: https://stackoverflow.com/q/2899209
Comment 9 Fujii Hironori 2020-10-15 19:35:32 PDT
Use WebKit-https.git. Follow this 
https://webkit.org/getting-the-code/#checking-out-with-git

As far as I tested, webkit-patch setup-git-clone fails for WebKit.git.
So, all commiters should use WebKit-https.git.
Comment 10 Carlos Alberto Lopez Perez 2020-10-15 20:47:15 PDT
(In reply to Fujii Hironori from comment #9)
> Use WebKit-https.git. Follow this 
> https://webkit.org/getting-the-code/#checking-out-with-git
> 
> As far as I tested, webkit-patch setup-git-clone fails for WebKit.git.
> So, all commiters should use WebKit-https.git.

Oook.. that explains it.. so there is a git://git.webkit.org/WebKit-https.git repository that has the entries for git-svn-id: with https instead of that. Didn't know that, interesting.

The problem I see with this is that this repository has a different git hash for every commit (because the commit log changes due to using https for git-svn-id)
So if you keep several remotes (like I do) for example to easily push/pull from different forks on github, gitlab, etc.. using this WebKit-https.git repo is a problem since the hashes don't match with every other fork of the webkit repository out there and git can't merge the common commits :( so I guess this will cause issues when rebasing/merging branches
Comment 11 Sergio Villar Senin 2020-10-16 01:25:27 PDT
(In reply to Carlos Alberto Lopez Perez from comment #8)
> (In reply to Sergio Villar Senin from comment #5)
> > It stills asks for username/password to me everytime I try to land a patch.
> 
> This is likely caused by default svn config.
> 
> Check that you don't have "store-passwords = no" on ~/.subversion/config
> You can set also "password-stores = gnome-keyring" to not store it on
> plaintext
> More info: https://stackoverflow.com/q/2899209

I doubt it's a problem in my svn config because I haven't touched it and it has been working fine for years.

BTW store-passwords is not set to no and I had already set "password-stores = gnome-keyring,kwallet" so that is not the problem either.
Comment 12 Jonathan Bedard 2020-10-16 08:41:14 PDT
I don't think sorting out the http vs https and different hashes in the git clone is going to be very worthwhile. The git clone, as is, has a number of deficiencies (namely, it doesn't actually have any branches)

At the moment, I'm doing a deep git clone of WebKit for the GitHub transition, and I intend to add commit identifiers into the commit messages of all historical commits (which obviously breaks existing hashes).
Comment 13 Carlos Alberto Lopez Perez 2020-10-16 08:52:29 PDT
(In reply to Jonathan Bedard from comment #12)
> I don't think sorting out the http vs https and different hashes in the git
> clone is going to be very worthwhile. The git clone, as is, has a number of
> deficiencies (namely, it doesn't actually have any branches)
> 
> At the moment, I'm doing a deep git clone of WebKit for the GitHub
> transition, and I intend to add commit identifiers into the commit messages
> of all historical commits (which obviously breaks existing hashes).

why adding commit identifiers to the current repo is needed? those are already tagged with the git-svn-id tag. The new tagging system can be used for the commits after the migration, but the older ones are already tagged with the revision number.

I foresee pain for developers if we switch to a new git repo that breakes the hashes of the de-facto webkit repository (the one mirrored in github). To start with all the forks we have in github will be completely broken and will not longer be a fork but a disjoint repository with no commits in common.
Comment 14 Jonathan Bedard 2020-10-16 09:15:53 PDT
(In reply to Carlos Alberto Lopez Perez from comment #13)
> (In reply to Jonathan Bedard from comment #12)
> > I don't think sorting out the http vs https and different hashes in the git
> > clone is going to be very worthwhile. The git clone, as is, has a number of
> > deficiencies (namely, it doesn't actually have any branches)
> > 
> > At the moment, I'm doing a deep git clone of WebKit for the GitHub
> > transition, and I intend to add commit identifiers into the commit messages
> > of all historical commits (which obviously breaks existing hashes).
> 
> why adding commit identifiers to the current repo is needed? those are
> already tagged with the git-svn-id tag. The new tagging system can be used
> for the commits after the migration, but the older ones are already tagged
> with the revision number.
> 
> I foresee pain for developers if we switch to a new git repo that breakes
> the hashes of the de-facto webkit repository (the one mirrored in github).
> To start with all the forks we have in github will be completely broken and
> will not longer be a fork but a disjoint repository with no commits in
> common.

The GitHub mirror is explicitly advertised as an unofficial one. And SVN revisions will be dying in the very near future, so everyone will find they are not of much use in the near future. We're trying to optimize for the future, I'm not aware of anyone actively tracking hashes in the WebKit mirror, breaking forks seems like an acceptable trade-off in exchange for a consistent labeling scheme.
Comment 15 Ryosuke Niwa 2020-10-16 14:56:11 PDT
(In reply to Jonathan Bedard from comment #14)
> (In reply to Carlos Alberto Lopez Perez from comment #13)
> > (In reply to Jonathan Bedard from comment #12)
> > > I don't think sorting out the http vs https and different hashes in the git
> > > clone is going to be very worthwhile. The git clone, as is, has a number of
> > > deficiencies (namely, it doesn't actually have any branches)
> > > 
> > > At the moment, I'm doing a deep git clone of WebKit for the GitHub
> > > transition, and I intend to add commit identifiers into the commit messages
> > > of all historical commits (which obviously breaks existing hashes).
> > 
> > why adding commit identifiers to the current repo is needed? those are
> > already tagged with the git-svn-id tag. The new tagging system can be used
> > for the commits after the migration, but the older ones are already tagged
> > with the revision number.
> > 
> > I foresee pain for developers if we switch to a new git repo that breakes
> > the hashes of the de-facto webkit repository (the one mirrored in github).
> > To start with all the forks we have in github will be completely broken and
> > will not longer be a fork but a disjoint repository with no commits in
> > common.
> 
> The GitHub mirror is explicitly advertised as an unofficial one. And SVN
> revisions will be dying in the very near future, so everyone will find they
> are not of much use in the near future. We're trying to optimize for the
> future, I'm not aware of anyone actively tracking hashes in the WebKit
> mirror, breaking forks seems like an acceptable trade-off in exchange for a
> consistent labeling scheme.

FWIW, we can probably archive the existing GitHub mirror as DepreactedWebKitMirror or something for a while so that people who have links, etc... can reference the old mirror. Alternatively, we can build some conversion tool / service to map old Git hashes to new ones.
Comment 16 Emilio Cobos Álvarez (:emilio) 2020-10-17 17:07:07 PDT
(In reply to Jonathan Bedard from comment #14)
> I'm not aware of anyone actively tracking hashes in the WebKit
> mirror.

I don't think it necessarily changes the trade-offs, but permalinks to https://webkit-search.igalia.com do use the commit hash.
Comment 17 Konstantin Tokarev 2020-10-18 16:28:11 PDT
(In reply to Jonathan Bedard from comment #12)
> At the moment, I'm doing a deep git clone of WebKit for the GitHub
> transition, and I intend to add commit identifiers into the commit messages
> of all historical commits (which obviously breaks existing hashes).

I think this is completely unnecessary and doesn't justify breaking 2000 existing forks. There are at least two better alternatives:

1. Start adding new identifiers in new commits and just leave old ones alone. They were made in "svn era" when svn repo was a source of truth. These commits are addressed by their revision number in existing products, and AFAIU adding new post-factum identifiers won't solve any real world task.

2. Use lightweight tags and keep new identifiers completely separate from commit messages. Decorated git log will show these new identifiers for all commits.
Comment 18 Konstantin Tokarev 2020-10-18 16:30:49 PDT
(In reply to Konstantin Tokarev from comment #17)
> 2. Use lightweight tags and keep new identifiers completely separate from
> commit messages. Decorated git log will show these new identifiers for all
> commits.

It would be also possible to use new identifiers for addressing commits in git commands, which seems to me as must have feature for any alternative identifiers system.
Comment 19 Ryosuke Niwa 2020-10-18 16:32:23 PDT
(In reply to Konstantin Tokarev from comment #18)
> (In reply to Konstantin Tokarev from comment #17)
> > 2. Use lightweight tags and keep new identifiers completely separate from
> > commit messages. Decorated git log will show these new identifiers for all
> > commits.
> 
> It would be also possible to use new identifiers for addressing commits in
> git commands, which seems to me as must have feature for any alternative
> identifiers system.

No, we really need to be address all past commits the same way. That's simply not acceptable. We track down regressions multiple years, even decades all the time.
Comment 20 Konstantin Tokarev 2020-10-18 16:35:05 PDT
Did you quote the right part of my comment? Because using lightweight tags is exactly a way which allows to add new identifiers to old commits without changing history.
Comment 21 Konstantin Tokarev 2020-10-18 16:55:03 PDT
Another possibility is using git notes [1] instead of tags. They can also be show in git log (though they won't work as native commit identifiers)

[1] https://git-scm.com/docs/git-notes
Comment 22 Ryosuke Niwa 2020-10-18 17:23:53 PDT
I really don't think we should give a high priority to preserving the history with the existing forks. Having the information of existing branches, etc... is way more important for most WebKit contributors. I certainly don't care that all of my got clones need to be re-created.
Comment 23 Konstantin Tokarev 2020-10-18 17:34:50 PDT
>Having the information of existing branches, etc... is way more important for most WebKit contributors.

Branches can be easily added to existing GitHub mirror simply by changing git-svn configuration, see e.g. https://github.com/qtwebkit/webkit-mirror

>I certainly don't care that all of my got clones need to be re-created.

So you don't care, and those who do should suffer? Okay.
Comment 24 Ryosuke Niwa 2020-10-18 17:44:52 PDT
(In reply to Konstantin Tokarev from comment #23)
> >Having the information of existing branches, etc... is way more important for most WebKit contributors.
> 
> Branches can be easily added to existing GitHub mirror simply by changing
> git-svn configuration, see e.g. https://github.com/qtwebkit/webkit-mirror
> 
> >I certainly don't care that all of my got clones need to be re-created.
> 
> So you don't care, and those who do should suffer? Okay.

That's right. The consideration we give should be proportional to the contributions they make back to the WebKit project. If they're making a fork and making changes there, I don't really see why we'd give much considerations for them.
Comment 25 Konstantin Tokarev 2020-10-18 18:11:55 PDT
We have to make forks because there was a rule that upstream WebKit port should have at least three full-time people working on it. Any plan to change this rule so forks are not needed anymore?
Comment 26 Ryosuke Niwa 2020-10-18 18:53:40 PDT
(In reply to Konstantin Tokarev from comment #25)
> We have to make forks because there was a rule that upstream WebKit port
> should have at least three full-time people working on it. Any plan to
> change this rule so forks are not needed anymore?

No plan to change that. Again, the requirement is there to avoid situations in which ports that don't contribute much beyond maintaining their own port will put undue burden onto the rest of the community with a high maintenance cost. If a port has less than 3 full time people, then this will most certainly happen.

The same goes for this Git hash concerns. The concerns of the majority of contributors come first because they're the ones making forward progress in the project. We can't make their experience worse / make them less productive to care for forks of unofficial Git mirrors. Right now, the official VCS of the WebKit project is Subversion. If you're using anything else to have your own fork, you're on your own.
Comment 27 Konstantin Tokarev 2020-10-18 19:14:27 PDT
>We can't make their experience worse / make them less productive to care for forks of unofficial Git mirrors.

How does using git tags or git notes make them less productive then rewriting commit messages? The latter way requires parsing identifiers back from commit messages to make any use of them, which is far from being "productive" in my book.

>If you're using anything else to have your own fork, you're on your own.

Sorry, but using Subversion for maintaining forks is extremely non-productive because of inferior tooling.
Comment 28 Jonathan Bedard 2020-10-19 08:51:12 PDT
(In reply to Konstantin Tokarev from comment #17)
> (In reply to Jonathan Bedard from comment #12)
> > At the moment, I'm doing a deep git clone of WebKit for the GitHub
> > transition, and I intend to add commit identifiers into the commit messages
> > of all historical commits (which obviously breaks existing hashes).
> 
> I think this is completely unnecessary and doesn't justify breaking 2000
> existing forks. There are at least two better alternatives:
> 
> 1. Start adding new identifiers in new commits and just leave old ones
> alone. They were made in "svn era" when svn repo was a source of truth.
> These commits are addressed by their revision number in existing products,
> and AFAIU adding new post-factum identifiers won't solve any real world task.
> 
> 2. Use lightweight tags and keep new identifiers completely separate from
> commit messages. Decorated git log will show these new identifiers for all
> commits.

I noticed this weekend that the GitHub clone is the http clone anyways, so we're defiantly going to be breaking hashes on clones of that version of the repository (since the deep clone will be https).
Comment 29 Jonathan Bedard 2020-10-19 08:57:05 PDT
(In reply to Emilio Cobos Álvarez (:emilio) from comment #16)
> (In reply to Jonathan Bedard from comment #14)
> > I'm not aware of anyone actively tracking hashes in the WebKit
> > mirror.
> 
> I don't think it necessarily changes the trade-offs, but permalinks to
> https://webkit-search.igalia.com do use the commit hash.

Are those using the GitHub http hashes or the https hashes?
Comment 30 Konstantin Tokarev 2020-10-19 09:02:13 PDT
> I noticed this weekend that the GitHub clone is the http clone anyways, so
> we're defiantly going to be breaking hashes on clones of that version of the
> repository (since the deep clone will be https).

The only difference is having "https" urls in git-svn-id messages instead of "http". Does it really matter?
Comment 31 Jonathan Bedard 2020-10-19 09:40:39 PDT
(In reply to Konstantin Tokarev from comment #30)
> > I noticed this weekend that the GitHub clone is the http clone anyways, so
> > we're defiantly going to be breaking hashes on clones of that version of the
> > repository (since the deep clone will be https).
> 
> The only difference is having "https" urls in git-svn-id messages instead of
> "http". Does it really matter?

We already have an https git clone in use, so unifying the repository in GitHub will be breaking someone regardless of what we do. Which is why I'm an advocate for making things the best they can be for the future, even if that means some temporary pain.
Comment 32 Carlos Alberto Lopez Perez 2020-10-19 10:34:01 PDT
(In reply to Jonathan Bedard from comment #31)
> (In reply to Konstantin Tokarev from comment #30)
> > > I noticed this weekend that the GitHub clone is the http clone anyways, so
> > > we're defiantly going to be breaking hashes on clones of that version of the
> > > repository (since the deep clone will be https).
> > 
> > The only difference is having "https" urls in git-svn-id messages instead of
> > "http". Does it really matter?
> 
> We already have an https git clone in use, so unifying the repository in
> GitHub will be breaking someone regardless of what we do. Which is why I'm
> an advocate for making things the best they can be for the future, even if
> that means some temporary pain.

I noticed past week that we have an alternative git repository with "https" for the git-svn-id hashes. 

Not sure how popular this alternative https-git repository is, but I would bet it isn't nowhere as popular as the one in github (the one with "http" in the git-svn-id), which is also the original git repository provided by this project as an alternative to svn at git.webkit.org/git/WebKit.git

That repository in github is forked 2k times. I wouldn't want to break 2k forks if there is an alternative to keep the future repository compatible with those existing forks.
Comment 33 Konstantin Tokarev 2020-10-19 11:09:41 PDT
FWIW, if we settle on breaking old history, I'm going to set up alternative automatic mirror which will have new commits rebased onto old history. Anyone will be able to use it instead of official one for keeping their forks up to date.
Comment 34 Alex 2021-01-05 07:58:04 PST
Hi!
Haiku project here!

We're suffering a *LOT* of pain on this one as well.

Webkit is placing new platforms into a weird catch-22 situation.

1. No new platform code accepted until testers + builders
2. Testers + builders won't work until new code in tree
3. Git history churn means you can't keep new platform support forks for very long.


Our team spends 99% of it's time now just dealing with git churn and rewrites at webkit and not making any port progress.

Please help by stopping this.  This isn't how git works.
Comment 35 Jonathan Bedard 2021-01-05 08:05:11 PST
(In reply to Alex from comment #34)
> Hi!
> Haiku project here!
> 
> We're suffering a *LOT* of pain on this one as well.
> 
> Webkit is placing new platforms into a weird catch-22 situation.
> 
> 1. No new platform code accepted until testers + builders
> 2. Testers + builders won't work until new code in tree
> 3. Git history churn means you can't keep new platform support forks for
> very long.
> 
> 
> Our team spends 99% of it's time now just dealing with git churn and
> rewrites at webkit and not making any port progress.
> 
> Please help by stopping this.  This isn't how git works.

We had two different git histories, the http and https one. The work over the last month on the new GitHub mirror was about cleaning up our history so we can move forward, eventually deprecating SVN in favor of git. We are done editing history now (at least on main, but we aren't canonically tracking other branches in Git yet anyways).

If you need help rebasing branches on the new history, let me know. It shouldn't be too difficult, and will be a one-time operation.
Comment 36 Alex 2021-01-05 08:11:58 PST
We can always use a hand :-)

https://github.com/haiku/webkit/tree/rebased

I got in over my head pretty quickly trying to rebase on master lol.
Comment 37 Jonathan Bedard 2021-01-05 08:21:05 PST
(In reply to Alex from comment #36)
> We can always use a hand :-)
> 
> https://github.com/haiku/webkit/tree/rebased
> 
> I got in over my head pretty quickly trying to rebase on master lol.

I have a partial set of instructions to do the rebase which I was going to send out to everyone who forked the project this week, give me a bit to double-check that process against your fork, and I will post it to this bug.
Comment 38 Jonathan Bedard 2021-01-07 08:50:58 PST
Here is the set of instructions to rebase onto the new primary branch:

Checkout the latest updated WebKit into your fork:
	git remote add canonical https://github.com/WebKit/WebKit.git
	git fetch canonical

Determine the identifier of your branch point, called “branch-point-old”:
	git checkout canonical/main
	Tools/Scripts/git-webkit find <branch-point-old>
where “branch-point-old” is a git reference (either hash or branch, usually) to the point your fork diverged from WebKit’s main branch. This should give you an identifier of the form <integer>@<branch>.

Map that branch point onto the canonical repository, whose’s ref we will call “branch-point-new” :
	Tools/Scripts/git-webkit find <integer>@main

Create a new branch and rebase your changes from the old repository onto the new:
	git branch -f <my-branch-new> <branch-point-new>
	git rebase —onto <my-branch-new> <branch-point-old> <my-branch-old>
where “my-branch-new” is a new branch name.

The issue with this approach (as I discovered working with Alex) is that it breaks-down with merge-commits, if your fork has been combining WebKit changes with your own changes, rather than rebasing your changes on top of WebKit's changes, things are going to be more difficult.