Bug 34033

Summary: svn-apply: Change svn-apply and svn-unapply to use new parsePatch().
Product: WebKit Reporter: Chris Jerdonek <cjerdonek>
Component: Tools / TestsAssignee: Chris Jerdonek <cjerdonek>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, aroben, cjerdonek, darin, dbates, ddkilzer, eric
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Bug Depends on: 38094    
Bug Blocks: 27204    
Attachments:
Description Flags
Proposed patch
eric: review-, cjerdonek: commit-queue-
Proposed patch 2 eric: review+, cjerdonek: commit-queue-

Description Chris Jerdonek 2010-01-22 20:35:31 PST
This will simplify svn-apply and svn-unapply, increase their test coverage, and make them more similar.  It also paves the way for supporting SVN properties, e.g.:

https://bugs.webkit.org/show_bug.cgi?id=27204

parsePatch() is the unit-tested method introduced here:

https://bugs.webkit.org/show_bug.cgi?id=33475
Comment 1 Chris Jerdonek 2010-01-24 12:25:22 PST
Created attachment 47299 [details]
Proposed patch

I discovered Test::More's is_deeply function in this patch, which simplifies the unit tests a lot.  (I can use it to simplify some of the other unit test files, the next time I make a change to one of those.)

cq- so I can be around when this goes out.
Comment 2 Chris Jerdonek 2010-01-28 02:00:23 PST
Oops, like I messed up last weekend and added my check-webkit-style reviewers list instead of my svn-apply reviewer(s).  Adding you now, David K.  Thanks!
Comment 3 Eric Seidel (no email) 2010-02-04 15:14:08 PST
Comment on attachment 47299 [details]
Proposed patch

I tend to make long loop blocks into their own functions (well, long blocks in general):
 616     for my $diffHashRef (@diffHashRefs) {

But perhaps this one makes sense inline like this.


Why is shouldForce supported here?
 627                 die "Index path $indexPath appears twice as a copy target." unless $shouldForce;

Would that ever be a valid diff that we wanted to support processing of?  It seems more like such might be caused by programmer error.  If would have probably written it as a one-line perl if.

prepareParsedPatch seems mostly a sanity-checking method.  One which splits out into the various arrays of hashs.

Maybe it should be called something with "sort" in the name, since it sorts into buckets. Although that might get confusing as it's not sorting the order of the diffs in any way.


I guess if you did split the for innards out into a separate function, it could return the name of the hash and the outer loop could actually do the push.

copiedFiles seems uneeded.  You could do a linear search in all 3 cases, instead of having the special copiedFiles?  Or you could have a separate pass over the list of hashes (either before or after splitting them out into buckets) which makes sure that none of the index paths are doubled.  A separate pass strikes me as cleaner than wrapping it into this loop here.  Maybe python has just gotten me used to that pattern with all of it's map/reduce/list-comprehension style processing where you tend to make lots of separate passes over your data to do different things.

I assume we don't support applying multipel files, or do we?
29 my @diffHashRefs = parsePatch(*ARGV);
 130 
 131 print "Parsed " . @diffHashRefs . " diffs from patch file.\n";

If we do, we need to update the print message.


I'm sad that we do so much of svn-apply at the root level instead of using nicely named subroutines.

the move-copy for loop is one example of a case where I'm surprised we don't just make it a sub routine and call it:
 151 for my $copyDiffHashRef (@copyDiffHashRefs) {
Maybe that's a separate patch. :)

Can't perl return a list and decompose it on the fly?
 133 my $preparedPatchHash = prepareParsedPatch($force, @diffHashRefs);

So you could write that:

my (@copyDiffHashRefs, @ nonCopyDiffHashRefs, % sourceRevisions) = prepareParsedPatch();

Or am I dreaming of python?

Why don't we just call the hash-key "svnFormattedDiff":
 305     my $patch = $diffHashRef->{svnConvertedText}; # SVN-formatted diff

instead of adding a comment?

Does this work on windows to?  I guess we use CYGWIN?
 54 # Relative to root
 55 my $pattern = "WebKitTools/Scripts/webkitperl/*_unittest/*.pl";
4656 

Is this semantically differnet from having never defined the key?
 34     copiedFromPath => undef,

I'm not sure my little brain can parse those unit tests.  I guess I'll have to trust you that they're sane.

Overall this looks like a great change.  I'd love to see your response to my naming thoughts/questions before giving this a final r+ though.  Marking this r- for now, you can re-mark it r? if you believe it should land unchanged, or i'm happy to review any revised version now that I've read this one through and understand it! :)
Comment 4 Chris Jerdonek 2010-02-06 11:54:55 PST
(In reply to comment #3)

Thanks a lot for taking a crack at this, Eric!

Just a quick note on this patch since I agree with a lot of your
sentiments below.  My main goals with this patch were to connect up
the unit-tested code with more unit-tested code, while minimizing
the non-unit-tested changes to svn-apply and svn-unapply.  In other
words, I was trying to resist doing any other refactoring of
svn-apply/unapply, since as you have also noticed, there is a
lot of stuff that can be done. :)

A consequence of this decision was keeping those hashes and arrays
they were already looping through: the versions, copied files, and
non-copied files.  Unfortunately, this made for a unit-tested
function with a somewhat awkward purpose and return value.

> (From update of attachment 47299 [details])
> I tend to make long loop blocks into their own functions (well, long blocks in
> general):
>  616     for my $diffHashRef (@diffHashRefs) {
> 
> But perhaps this one makes sense inline like this.

Yes, I agree with your instinct here.  But since I'm not sure we'll 
want to keep prepareParsedPatch() in its current form as the
refactoring of svn-apply/unapply continues, it's probably not worth
the extra effort to break the inside out.  It's also possible that the
inside function would have an unavoidably messy signature and return
value given what prepareParsedPatch() does, but I'm not sure.

> Why is shouldForce supported here?
>  627                 die "Index path $indexPath appears twice as a copy
> target." unless $shouldForce;
> 
> Would that ever be a valid diff that we wanted to support processing of?  It
> seems more like such might be caused by programmer error.

Good question.  The original code actually proceeds without any
interruption in this case.  But it didn't seem quite right to me.
The compromise I made was to die, but to give the user the ability
to continue if it was wrong to die (that sounds weird :) ).  That
way we will also be able to find out whether this case occurs in
the wild.  It's possible this situation can come up if patches are
chained together -- I'm not sure.  If we are going to make any change
to this line, I would probably just take it out altogether.  It's
always a bit risky to constrain/add more validation to a script after
the fact, in the absence of detailed documentation.

> If would have
> probably written it as a one-line perl if.

The prevailing style I've noticed in these scripts is to use unless,
e.g. the following line which is visible in the patch file:

> if ($merge) {
>    die "--merge is currently only supported for SVN" unless isSVN();


> prepareParsedPatch seems mostly a sanity-checking method.  One which splits out
> into the various arrays of hashs.

It was more to prepare the variables so it can be "dropped in" to
svn-apply/unapply with little changes.  The sanity-checking was added
mostly after the fact.

> Maybe it should be called something with "sort" in the name, since it sorts
> into buckets. Although that might get confusing as it's not sorting the order
> of the diffs in any way.

Or perhaps "divide" or "split".  The dictionary of revision numbers
is orthogonal to the splitting up, so it's really doing a couple things.
Basically, it's not the most natural function.  I expect it will go away
over time.

> I guess if you did split the for innards out into a separate function, it could
> return the name of the hash and the outer loop could actually do the push.
> 
> copiedFiles seems uneeded.  You could do a linear search in all 3 cases,
> instead of having the special copiedFiles?

Yes, I thought about that.  I ended up choosing the route that allows
the code path to be unit tested.  Once we have more test coverage of
svn-apply/unapply, we can refactor in a more natural way. 

> I assume we don't support applying multipel files, or do we?
> 29 my @diffHashRefs = parsePatch(*ARGV);
>  130 
>  131 print "Parsed " . @diffHashRefs . " diffs from patch file.\n";
> 
> If we do, we need to update the print message.

Yes, I believe that can be done (although I'm not sure if anyone
does it), so you're right on changing the message.  Applying
multiple patches is one case that might have implications
for the sanity-checking (as I said above), so this is another case
where we're not sure on the scope of use.

> I'm sad that we do so much of svn-apply at the root level instead of using
> nicely named subroutines.

Yes, I agree.  I can envision svn-apply/unapply both being reduced
to a single call to the same function -- with a boolean $shouldReverse
parameter.


> the move-copy for loop is one example of a case where I'm surprised we don't
> just make it a sub routine and call it:
>  151 for my $copyDiffHashRef (@copyDiffHashRefs) {
> Maybe that's a separate patch. :)

Yes, we can do more later.

> Can't perl return a list and decompose it on the fly?
>  133 my $preparedPatchHash = prepareParsedPatch($force, @diffHashRefs);
> 
> So you could write that:
> 
> my (@copyDiffHashRefs, @ nonCopyDiffHashRefs, % sourceRevisions) =
> prepareParsedPatch();

Perl flattens arrays, so you can only rely on an array at the end.
To do nested arrays, you need to use references.  I chose to return
(a reference to) a hash, though, to avoid having to rely on the
ordering of the return value.  I probably should be calling that
"$preparedPatchHashRef".  It's possible there's a way to dereference
multiple hash values at once, but I'm not sure.  It might make
for a messy line.

> Why don't we just call the hash-key "svnFormattedDiff":
>  305     my $patch = $diffHashRef->{svnConvertedText}; # SVN-formatted diff
> 
> instead of adding a comment?

The containing "object" is itself called a diff, so it didn't make
as much sense for me to call one of its "properties" a diff.  So I
chose to call it the text associated to the diff.  As for converted
versus formatted, I chose converted since it is doing more than just
formatting.  For example, there can be a loss of information in the
conversion, etc. (e.g. the file modes that Git diffs have and 
other Git metadata).

> Does this work on windows to?  I guess we use CYGWIN?
>  54 # Relative to root
>  55 my $pattern = "WebKitTools/Scripts/webkitperl/*_unittest/*.pl";
> 4656 

I tried this on Windows, and it worked (no CYGWIN).  David K. has also
reviewed and commented on this specific line, and I'm sure he would
have mentioned something if it wasn't cross-platform.

> Is this semantically differnet from having never defined the key?
>  34     copiedFromPath => undef,

Yes, for example--

> my $hashRef = {key1 => undef};

means that--

exists($hashRef->{key1}) is true while
exists($hashRef->{key2}) is not true.

I will make some adjustments and re-submit after thinking more about your comments.  Thanks again!
Comment 5 Chris Jerdonek 2010-02-06 12:00:03 PST
(In reply to comment #4)
> (In reply to comment #3)

> > Why don't we just call the hash-key "svnFormattedDiff":
> >  305     my $patch = $diffHashRef->{svnConvertedText}; # SVN-formatted diff
> > 
> > instead of adding a comment?
> 
> The containing "object" is itself called a diff, so it didn't make
> as much sense for me to call one of its "properties" a diff.  So I
> chose to call it the text associated to the diff.  As for converted
> versus formatted, I chose converted since it is doing more than just
> formatting.  For example, there can be a loss of information in the
> conversion, etc. (e.g. the file modes that Git diffs have and 
> other Git metadata).

I meant to add here that I should probably just take that comment out
since it is redundant.  Thanks for noticing that.
Comment 6 Chris Jerdonek 2010-02-06 12:18:53 PST
(In reply to comment #4)
> (In reply to comment #3)
> > Is this semantically differnet from having never defined the key?
> >  34     copiedFromPath => undef,
> 
> Yes, for example--
> 
> > my $hashRef = {key1 => undef};
> 
> means that--
> 
> exists($hashRef->{key1}) is true while
> exists($hashRef->{key2}) is not true.

One more thing.  For completeness, you can use--

defined($hashRef->{key1})

rather than "exists" to distinguish "undef" from other values.  Both key1 and key2 have defined false in the above example.
Comment 7 Chris Jerdonek 2010-04-19 01:38:49 PDT
This is a heads up that I should be reposting a rebased patch within 1 or 2 days, incorporating whatever comments make sense at this time from Eric's comments.

This was spurred by conversation with Dan Bates from this report:

https://bugs.webkit.org/show_bug.cgi?id=27204#c28
Comment 8 Chris Jerdonek 2010-04-21 19:40:43 PDT
Created attachment 54016 [details]
Proposed patch 2

Resubmitting in response to Eric's comments in comment 3.

It's been quite a while since patch #1 was submitted.  Since it's been such a long time, below are the differences this patch has with patch #1:

diff --git a/WebKitTools/Scripts/VCSUtils.pm b/WebKitTools/Scripts/VCSUtils.pm
index 9c0b5bd..6c6f938 100644
--- a/WebKitTools/Scripts/VCSUtils.pm
+++ b/WebKitTools/Scripts/VCSUtils.pm
@@ -611,7 +611,7 @@ sub prepareParsedPatch($@)
     my %copiedFiles;
 
     # Return values
-    my @copyDiffHashRefs = (); # FIXME: Remove these initializers?
+    my @copyDiffHashRefs = ();
     my @nonCopyDiffHashRefs = ();
     my %sourceRevisionHash = ();
     for my $diffHashRef (@diffHashRefs) {
@@ -624,9 +624,9 @@ sub prepareParsedPatch($@)
             # Then the diff is a copy operation.
             $sourcePath = $copiedFromPath;
 
-            if (exists($copiedFiles{$indexPath})) {
-                die "Index path $indexPath appears twice as a copy target." unless $shouldForce;
-            }
+            # FIXME: Consider printing a warning or exiting if
+            #        exists($copiedFiles{$indexPath}) is true -- i.e. if
+            #        $indexPath appears twice as a copy target.
             $copiedFiles{$indexPath} = $sourcePath;
 
             push @copyDiffHashRefs, $diffHashRef;
diff --git a/WebKitTools/Scripts/svn-apply b/WebKitTools/Scripts/svn-apply
index 20a3f06..13b215b 100755
--- a/WebKitTools/Scripts/svn-apply
+++ b/WebKitTools/Scripts/svn-apply
@@ -128,7 +128,7 @@ my %copiedFiles;
 # otherwise get a bareword error.
 my @diffHashRefs = parsePatch(*ARGV);
 
-print "Parsed " . @diffHashRefs . " diffs from patch file.\n";
+print "Parsed " . @diffHashRefs . " diffs from patch file(s).\n";
 
 my $preparedPatchHash = prepareParsedPatch($force, @diffHashRefs);
 
@@ -303,7 +303,7 @@ sub patch($)
 {
     my ($diffHashRef) = @_;
 
-    my $patch = $diffHashRef->{svnConvertedText}; # SVN-formatted diff
+    my $patch = $diffHashRef->{svnConvertedText};
 
     unless ($patch =~ m|^Index: ([^\r\n]+)|) {
         my $separator = '-' x 67;
diff --git a/WebKitTools/Scripts/svn-unapply b/WebKitTools/Scripts/svn-unapply
index ec66b33..d5b95e8 100755
--- a/WebKitTools/Scripts/svn-unapply
+++ b/WebKitTools/Scripts/svn-unapply
@@ -101,7 +101,7 @@ my %directoriesToCheck;
 # otherwise get a bareword error.
 my @diffHashRefs = parsePatch(*ARGV);
 
-print "Parsed " . @diffHashRefs . " diffs from patch file.\n";
+print "Parsed " . @diffHashRefs . " diffs from patch file(s).\n";
 
 my $preparedPatchHash = prepareParsedPatch($force, @diffHashRefs);
 
@@ -139,7 +139,7 @@ sub patch($)
 {
     my ($diffHashRef) = @_;
 
-    my $patch = $diffHashRef->{svnConvertedText}; # SVN-formatted diff
+    my $patch = $diffHashRef->{svnConvertedText};
 
     unless ($patch =~ m|^Index: ([^\r\n]+)|) {
         my $separator = '-' x 67;
Comment 9 Chris Jerdonek 2010-04-23 11:22:42 PDT
(In reply to comment #8)
> Created an attachment (id=54016) [details]
> Proposed patch 2

Hi Eric, can you r+ this change?  A few months back (in comment 3), you more or less said you were willing to r+ it:

> I'd love to see your response to my
> naming thoughts/questions before giving this a final r+ though.  Marking this
> r- for now, you can re-mark it r? if you believe it should land unchanged

The update I just submitted is largely the same as the first patch with only minor differences in response to your comments.  Thanks!
Comment 10 Eric Seidel (no email) 2010-04-23 12:50:34 PDT
Comment on attachment 54016 [details]
Proposed patch 2

Looks right as far as I can tell.  Please be sure you ran test-webkitpy --all as those have more tests of svn-apply.
Comment 11 Chris Jerdonek 2010-04-23 13:11:44 PDT
(In reply to comment #10)
> (From update of attachment 54016 [details])
> Looks right as far as I can tell.  Please be sure you ran test-webkitpy --all
> as those have more tests of svn-apply.

Will do, thanks.  Will also be monitoring after landing.
Comment 12 Daniel Bates 2010-04-23 13:14:12 PDT
Awesome. Then we can look at bug #27204.

(In reply to comment #11)
> (In reply to comment #10)
> > (From update of attachment 54016 [details] [details])
> > Looks right as far as I can tell.  Please be sure you ran test-webkitpy --all
> > as those have more tests of svn-apply.
> 
> Will do, thanks.  Will also be monitoring after landing.
Comment 13 Chris Jerdonek 2010-04-23 13:16:14 PDT
(In reply to comment #12)
> Awesome. Then we can look at bug #27204.

Definitely!  I'll be touch with you about it off-line.

Thanks again for your patience. :)
Comment 14 Chris Jerdonek 2010-04-25 07:14:28 PDT
(In reply to comment #10)
> (From update of attachment 54016 [details])
> Looks right as far as I can tell.  Please be sure you ran test-webkitpy --all
> as those have more tests of svn-apply.

Good suggestion to run "test-webkitpy --all".

It turns out the new parsing code didn't support the format used by binary diffs.  I've added that support in a patch on another bug (bug 38094), so that "test-webkitpy --all" should now pass with this change.

Once bug 38094 lands, I should be able to land this as is.
Comment 15 Chris Jerdonek 2010-04-29 03:35:14 PDT
Committed:

http://trac.webkit.org/changeset/58495