Bug 32834 - svn-apply should handle git patches with similarity index, rename and copy directives
Summary: svn-apply should handle git patches with similarity index, rename and copy di...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Chris Jerdonek
URL:
Keywords:
Depends on: 38628
Blocks:
  Show dependency treegraph
 
Reported: 2009-12-21 12:09 PST by David Kilzer (:ddkilzer)
Modified: 2010-05-08 19:54 PDT (History)
7 users (show)

See Also:


Attachments
Proposed patch (21.62 KB, patch)
2010-05-07 16:57 PDT, Chris Jerdonek
cjerdonek: commit-queue-
Details | Formatted Diff | Diff
Proposed patch 2 (21.68 KB, patch)
2010-05-07 17:01 PDT, Chris Jerdonek
dbates: review+
cjerdonek: commit-queue-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description David Kilzer (:ddkilzer) 2009-12-21 12:09:42 PST
* SUMMARY
If you use the "[diff] renames = copies" git config setting (or the --find-copies-harder switch) with git diff, git will include information in the patch that hints at the history of the code and may significantly reduce the size of the patch.  While git uses only heuristics to accomplish this, it would be nice if svn-apply understood these directives so that it could map them to equivalent svn commands.

* EXAMPLES
$ git mv Makefile NMakefile$ git diff --find-copies-harder HEAD
diff --git a/Makefile b/NMakefile
similarity index 100%
rename from Makefile
rename to NMakefile

$ git diff --find-copies-harder HEAD
diff --git a/Makefile b/NMakefile
similarity index 100%
copy from Makefile
copy to NMakefile

$ git mv WebCore/ChangeLog WebCore/ChangeLog-2009-12-21
$ vi WebCore/ChangeLog-2009-12-21
$ git add WebCore/ChangeLog-2009-12-21
$ git diff --find-copies-harder --cached
diff --git a/WebCore/ChangeLog b/WebCore/ChangeLog-2009-12-21
similarity index 99%
rename from WebCore/ChangeLog
rename to WebCore/ChangeLog-2009-12-21
index ab18e51..8cb2b03 100644
--- a/WebCore/ChangeLog
+++ b/WebCore/ChangeLog-2009-12-21
@@ -72516,7 +72516,7 @@ form controls that have different validity states.
         * bindings/v8/custom/V8CustomBinding.h:
         * bindings/v8/custom/V8HTMLDataGridElementCustom.cpp: Added.
 
-2009-06-24  David Kilzer  <ddkilzer@apple.com>
+2009-06-24  David Kilzer  <ddkilzer@webkit.org>
 
         Build fixes for ENABLE(PLUGIN_PROXY_FOR_VIDEO)
 

NOTE: For small files (under a certain number of lines), the rename detection doesn't work.
Comment 1 Chris Jerdonek 2010-01-12 13:52:16 PST
I have a general question about our approach to improving support for Git.

For the purpose of submitting Git patches, it seems there can be (at least) two approaches:

(1) Make a "create-patch" script that supports creating a patch in standard format from both SVN and Git.

(2) Make svn-apply and svn-unapply support both Git and SVN format.

(Combining both approaches may also make sense -- e.g. to address the issue in this bug report.)

From what I can tell, we currently do (2) to some extent, but not (1).  Has anyone given much thought to these approaches?  Are there any reasons to prefer (2) over (1) or vice versa, or to not do (1)?
Comment 2 David Kilzer (:ddkilzer) 2010-01-13 08:36:20 PST
(In reply to comment #1)
> For the purpose of submitting Git patches, it seems there can be (at least) two
> approaches:
> 
> (1) Make a "create-patch" script that supports creating a patch in standard
> format from both SVN and Git.
> 
> (2) Make svn-apply and svn-unapply support both Git and SVN format.
> 
> (Combining both approaches may also make sense -- e.g. to address the
> issue in this bug report.)
> 
> From what I can tell, we currently do (2) to some extent, but not (1).  Has
> anyone given much thought to these approaches?  Are there any reasons to
> prefer (2) over (1) or vice versa, or to not do (1)?

I think (2) is a higher priority than (1) because git's own diff subcommand (with the --binary switch) is sufficient to capture all of the changes in a patch.

Because the project's source repository is subversion (with a moderate number of developers using git), I think it's slightly more important that git patches work with the tools to apply patches to an svn working directory than vice-versa.
Comment 3 Chris Jerdonek 2010-01-28 04:17:52 PST
(In reply to comment #2)
> (In reply to comment #1)
> > For the purpose of submitting Git patches, it seems there can be (at least) two
> > approaches:
> > 
> > (1) Make a "create-patch" script that supports creating a patch in standard
> > format from both SVN and Git.
> > 
> > (2) Make svn-apply and svn-unapply support both Git and SVN format.
> > 
> > (Combining both approaches may also make sense -- e.g. to address the
> > issue in this bug report.)
> > 
> > From what I can tell, we currently do (2) to some extent, but not (1).  Has
> > anyone given much thought to these approaches?  Are there any reasons to
> > prefer (2) over (1) or vice versa, or to not do (1)?
> 
> I think (2) is a higher priority than (1) because git's own diff subcommand
> (with the --binary switch) is sufficient to capture all of the changes in a
> patch.

I gave an example here of a case where the "git diff" command doesn't generate a patch in the standard format (at least with the way that I was running it):

https://lists.webkit.org/pipermail/webkit-dev/2010-January/011495.html

The example occurred because "git diff" doesn't run fixChangeLogPatch.

> Because the project's source repository is subversion (with a moderate number
> of developers using git), I think it's slightly more important that git patches
> work with the tools to apply patches to an svn working directory than
> vice-versa.
Comment 4 David Kilzer (:ddkilzer) 2010-01-28 08:08:58 PST
(In reply to comment #3)
> (In reply to comment #2)
> > (In reply to comment #1)
> > > For the purpose of submitting Git patches, it seems there can be (at least) two
> > > approaches:
> > > 
> > > (1) Make a "create-patch" script that supports creating a patch in standard
> > > format from both SVN and Git.
> > > 
> > > (2) Make svn-apply and svn-unapply support both Git and SVN format.
> > > 
> > > (Combining both approaches may also make sense -- e.g. to address the
> > > issue in this bug report.)
> > > 
> > > From what I can tell, we currently do (2) to some extent, but not (1).  Has
> > > anyone given much thought to these approaches?  Are there any reasons to
> > > prefer (2) over (1) or vice versa, or to not do (1)?
> > 
> > I think (2) is a higher priority than (1) because git's own diff subcommand
> > (with the --binary switch) is sufficient to capture all of the changes in a
> > patch.
> 
> I gave an example here of a case where the "git diff" command doesn't generate
> a patch in the standard format (at least with the way that I was running it):
> 
> https://lists.webkit.org/pipermail/webkit-dev/2010-January/011495.html
> 
> The example occurred because "git diff" doesn't run fixChangeLogPatch.

It'd be great if you could post the example here.  I think I know what you're referring to, but it's always best to have a concrete example.

I don't think this has been an issue in practice because:

- There is no "loss of information" if the unaltered ChangeLog patch is not fixed up, and it may optionally be fixed up at any point in the future.  All the pieces are there, just not in the optimal order for applying it to the top of the ChangeLog file.

- The commit queue uses svn-apply to apply patches which fixes up the ChangeLog patch before applying it.  Individual contributors may also be using svn-apply to apply patches, or they may be using resolve-ChangeLogs afterwards.

- Even in the case where the patch is completely rejected, there's still enough information in the *.rej file to reconstruct the original patch, fix it up, then apply it.  (I don't think fixChangeLogPatch() works with non-unified diff format, though, so this may not work properly today.)

Having said that, if you feel that (1) is more important than (2), then I'd focus on that first.

For whatever reason, this issue hasn't come up much.  Maybe that's because the ChangeLog patches end up getting applied successfully to the middle of the ChangeLog file instead of at the beginning, but look "okay" otherwise, and no one has noticed/complained about it yet.  (I think I've seen this behavior in the commit queue, but I though it was due to something else at the time.)
Comment 5 Chris Jerdonek 2010-01-28 08:24:12 PST
(In reply to comment #4)

> It'd be great if you could post the example here.  I think I know what you're
> referring to, but it's always best to have a concrete example.

Here is a concrete example (same submitter on same day):

https://bugs.webkit.org/show_bug.cgi?id=34249#c1

https://bug-34249-attachments.webkit.org/attachment.cgi?id=47606

I needed to manually commit it since I had to make changes anyways.

> I don't think this has been an issue in practice because:

I recognize that this might be totally innocuous, but just wanted to raise it as something else that can come up.  I want to avoid having to use wrapper scripts if possible, like you.

> - There is no "loss of information" if the unaltered ChangeLog patch is not
> fixed up, and it may optionally be fixed up at any point in the future.  All
> the pieces are there, just not in the optimal order for applying it to the top
> of the ChangeLog file.
> 
> - The commit queue uses svn-apply to apply patches which fixes up the ChangeLog
> patch before applying it.  Individual contributors may also be using svn-apply
> to apply patches, or they may be using resolve-ChangeLogs afterwards.

If this is the case, it seems like it might not be necessary to call fixChangeLogPatch in svn-create-patch.
Comment 6 David Kilzer (:ddkilzer) 2010-01-28 08:51:56 PST
(In reply to comment #5)
> (In reply to comment #4)
> > It'd be great if you could post the example here.  I think I know what you're
> > referring to, but it's always best to have a concrete example.
> 
> Here is a concrete example (same submitter on same day):
> 
> https://bugs.webkit.org/show_bug.cgi?id=34249#c1
> 
> https://bug-34249-attachments.webkit.org/attachment.cgi?id=47606
> 
> I needed to manually commit it since I had to make changes anyways.

This one is relatively minor.  If you have a patch series with the same date/author/email line followed by the same "Reviewed by NOBODY(OOPS!)." line, then you'll have even more of an offset.  (FWIW, this is another reason I like bug titles before the "Reviewed by" line, but that's really neither here nor there.)

> > - There is no "loss of information" if the unaltered ChangeLog patch is not
> > fixed up, and it may optionally be fixed up at any point in the future.  All
> > the pieces are there, just not in the optimal order for applying it to the top
> > of the ChangeLog file.
> > 
> > - The commit queue uses svn-apply to apply patches which fixes up the ChangeLog
> > patch before applying it.  Individual contributors may also be using svn-apply
> > to apply patches, or they may be using resolve-ChangeLogs afterwards.
> 
> If this is the case, it seems like it might not be necessary to call
> fixChangeLogPatch in svn-create-patch.

It's nice to clean up the ChangeLog patches, though, as long as you're using svn-create-patch.  The corrected patch shows the intent of the ChangeLog patch much better than the uncorrected version.
Comment 7 Chris Jerdonek 2010-01-28 09:07:21 PST
(In reply to comment #6)
> This one is relatively minor.  If you have a patch series with the same
> date/author/email line followed by the same "Reviewed by NOBODY(OOPS!)." line,
> then you'll have even more of an offset.  (FWIW, this is another reason I like
> bug titles before the "Reviewed by" line, but that's really neither here nor
> there.)

Have you considered any schemes to break the symmetry even more cleanly -- e.g. including the time in the date stamp?
Comment 8 David Kilzer (:ddkilzer) 2010-01-28 09:44:14 PST
(In reply to comment #7)
> (In reply to comment #6)
> > This one is relatively minor.  If you have a patch series with the same
> > date/author/email line followed by the same "Reviewed by NOBODY(OOPS!)." line,
> > then you'll have even more of an offset.  (FWIW, this is another reason I like
> > bug titles before the "Reviewed by" line, but that's really neither here nor
> > there.)
> 
> Have you considered any schemes to break the symmetry even more cleanly -- e.g.
> including the time in the date stamp?

No, I haven't.  I don't think that a timestamp would add any useful information to the date/name/email line, and it would complicate the parsing of that line.
Comment 9 Chris Jerdonek 2010-05-04 16:00:03 PDT
I was a bit surprised after comparing the behavior of the -C, -M, and
--find-copies-harder options of git-diff.

It seems that in some cases the --find-copies-harder option will do a worse
job of detecting renames than the -C and -M options (which are less
expensive at least for a smaller number of changed files).  See below, for
example (it's the last of the three scenarios where the difference can
be observed).

# Make a copy.
cp Makefile NMakefile
git add NMakefile
git commit -a

git diff --find-copies-harder HEAD^  # detects the copy
git diff -C HEAD^                    # does not detect the copy

# Turn the copy into a move.
rm Makefile
git commit -a --amend

git diff --find-copies-harder HEAD^  # detects the rename
git diff -C HEAD^                    # detects the rename

# Turn the move into a move-with-changes.
echo "more content" >> NMakefile 
git commit -a --amend

git diff --find-copies-harder HEAD^  # does not detect the rename
git diff -C HEAD^                    # does detect the rename (output is below)
git diff -M HEAD^                    # same result as -C
diff --git a/Makefile b/NMakefile
similarity index 99%
rename from Makefile
rename to NMakefile
index 1e50d1d..1459d21 100644
--- a/Makefile
+++ b/NMakefile
@@ -15,3 +15,4 @@ release r deployment dep deploy:
 clean:
        @for dir in $(MODULES); do ${MAKE} $@ -C $$dir; exit_status=$$?; \
        if [ $$exit_status -ne 0 ]; then exit $$exit_status; fi; done
+test


From the git-diff manual, this seems to be because --find-copies-hard
searches for potential sources by examining *only* those files that haven't
changed, and not both files that have and haven't changed.

Consequently, it may make sense in more cases to prefer using -C (or -M)
over --find-copies-harder.  This is in part because I imagine moves and
moves-with-changes are more common for contributors than straight copying
of files (without also deleting the source).

Also, based on the above, it looks like it might not be possible to
detect all copies and moves-with-changes for some diffs using a single
"git-diff" command.
Comment 10 Chris Jerdonek 2010-05-07 16:57:34 PDT
Created attachment 55434 [details]
Proposed patch
Comment 11 WebKit Review Bot 2010-05-07 17:00:56 PDT
Attachment 55434 [details] did not pass style-queue:

Failed to run "['WebKitTools/Scripts/check-webkit-style', '--no-squash']" exit_code: 1
WebKitTools/Scripts/webkitperl/VCSUtils_unittest/parseSvnDiffHeader.pl:122:  Line contains tab character.  [whitespace/tab] [5]
WebKitTools/Scripts/webkitperl/VCSUtils_unittest/parseSvnDiffHeader.pl:132:  Line contains tab character.  [whitespace/tab] [5]
Total errors found: 2 in 6 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 12 Chris Jerdonek 2010-05-07 17:01:38 PDT
Created attachment 55435 [details]
Proposed patch 2

Cleaned up ChangeLog.
Comment 13 WebKit Review Bot 2010-05-07 17:07:05 PDT
Attachment 55435 [details] did not pass style-queue:

Failed to run "['WebKitTools/Scripts/check-webkit-style', '--no-squash']" exit_code: 1
WebKitTools/Scripts/webkitperl/VCSUtils_unittest/parseSvnDiffHeader.pl:122:  Line contains tab character.  [whitespace/tab] [5]
WebKitTools/Scripts/webkitperl/VCSUtils_unittest/parseSvnDiffHeader.pl:132:  Line contains tab character.  [whitespace/tab] [5]
Total errors found: 2 in 6 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 14 Daniel Bates 2010-05-08 18:40:42 PDT
Comment on attachment 55435 [details]
Proposed patch 2

We should update the documentation of sub parseDiff to reflect the fact that we now return an array.

Looks sane.

r=me.
Comment 15 Chris Jerdonek 2010-05-08 19:54:11 PDT
Committed:

http://trac.webkit.org/changeset/59044