Bug 38093

Summary: svn-apply: Remove unnecessary dividing line (i.e. "====...") logic
Product: WebKit Reporter: Chris Jerdonek <cjerdonek>
Component: Tools / TestsAssignee: Chris Jerdonek <cjerdonek>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, cjerdonek, dbates, ddkilzer, eric
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Bug Depends on:    
Bug Blocks: 38094    
Attachments:
Description Flags
Proposed patch
cjerdonek: commit-queue-
Proposed patch 2
cjerdonek: commit-queue-
Proposed patch 3
cjerdonek: commit-queue-
Proposed patch 4
cjerdonek: commit-queue-
Proposed patch 5 dbates: review+, cjerdonek: commit-queue-

Description Chris Jerdonek 2010-04-25 05:23:23 PDT
This report is to simplify the code called by svn-apply and svn-unapply in the following way:

The code for parsing git-formatted patches contains logic for adding a dividing line (i.e. "====...") to make the patch look more SVN-like:

http://trac.webkit.org/browser/trunk/WebKitTools/Scripts/VCSUtils.pm?rev=57453#L374

This code is cosmetic and isn't really necessary.  In fact, it doesn't do what it purports to do because it doesn't handle lines lacking the 6-character file mode string or lines containing more than 7 characters of the git hash.

Removing this code will simplify the code somewhat and make some subsequent changes somewhat easier.
Comment 1 Chris Jerdonek 2010-04-25 05:47:36 PDT
Created attachment 54236 [details]
Proposed patch
Comment 2 Chris Jerdonek 2010-04-25 05:49:19 PDT
(In reply to comment #1)
> Created an attachment (id=54236) [details]
> Proposed patch

By the way, I ran "test-webkitpy --all" and the svn-apply unit tests there checked out okay.
Comment 3 Chris Jerdonek 2010-04-25 06:37:21 PDT
Created attachment 54238 [details]
Proposed patch 2

Added the same correction to parseDiff.pl that I made to parseDiffHeader.pl.
Comment 4 Chris Jerdonek 2010-04-25 06:42:17 PDT
Created attachment 54239 [details]
Proposed patch 3

Fixed typo in ChangeLog (somwhat).
Comment 5 Daniel Bates 2010-04-26 12:36:00 PDT
Comment on attachment 54239 [details]
Proposed patch 3

> diff --git a/WebKitTools/Scripts/VCSUtils.pm b/WebKitTools/Scripts/VCSUtils.pm
> index 25a319b..7e09839 100644
> --- a/WebKitTools/Scripts/VCSUtils.pm
> +++ b/WebKitTools/Scripts/VCSUtils.pm
> @@ -371,10 +371,6 @@ sub gitdiff2svndiff($)
>      if (m#^diff --git \w/(.+) \w/([^\r\n]+)#) {
>          return "Index: $1$POSTMATCH";
>      }
> -    if (m#^index [0-9a-f]{7}\.\.[0-9a-f]{7} [0-9]{6}#) {
> -        # FIXME: No need to return dividing line once parseDiffHeader() is used.
> -        return "===================================================================$POSTMATCH";
> -    }

I agree, this doesn't seem needed. Moreover, as we discussed in person, by keeping this Git diff extension we can error check file mode changes.

> -            $_ = "=" x 67 . "$eol$_"; # Prepend dividing line ===....

(*)

>          } elsif (s/^\+\+\+ \S+/+++ $indexPath/) {
>              # +++
>              $foundHeaderEnding = 1;
> -        } else {
> -            # Skip unrecognized lines.
> -            next;
>          }

Now that we do not remove the git diff extension of the form "index [0-9a-f]{7}\.\.[0-9a-f]{7} [0-9]{6}", these lines (as well as other unrecognized lines such Git file mode changes) can appear in the SVN converted text. So, the SVN converted text may not correspond to a properly formatted SVN patch. If we want to keep a SVN converted form of the patch around then I would suggest that the converted text reflect a properly formatted SVN patch. Or at least, we should change the comments about svnConvertedText to reflect the fact that it represents a reasonable conversion.

Some additional notes...
From testing, the patch command does not seem to mind if extra lines appear in the diff header, so long as the ---/+++ lines exist. Notice, if we keep the "else" statement and (*) then the SVN converted text will reflect a properly formatted SVN patch, as produced by svn diff.
Comment 6 Chris Jerdonek 2010-04-26 13:11:12 PDT
(In reply to comment #5)
> (From update of attachment 54239 [details])

Thanks for the comments on the patch, Daniel.  As you probably noticed, I need to make this change and the change in bug 38094 before landing bug 34033.

> >          } elsif (s/^\+\+\+ \S+/+++ $indexPath/) {
> >              # +++
> >              $foundHeaderEnding = 1;
> > -        } else {
> > -            # Skip unrecognized lines.
> > -            next;
> >          }

> Now that we do not remove the git diff extension of the form "index
> [0-9a-f]{7}\.\.[0-9a-f]{7} [0-9]{6}", these lines (as well as other
> unrecognized lines such Git file mode changes) can appear in the SVN converted
> text. So, the SVN converted text may not correspond to a properly formatted SVN
> patch.

It turns out that the current code already allows some "non-SVN" lines to pass through in certain cases.  That's what I meant in comment 0 when I said that the line in gitdiff2svndiff() "doesn't do what it purports to do...".

For example, some git diffs leave out the 6-character file mode string (I think for new files, for example), and other git diffs include full git hashes and not just 7 characters.  The current code -- which doesn't use parseDiffHeader() -- allows those lines to pass through.

Maybe I'll dig up a couple examples from bugs.webkit.org and post links to them here.

> If we want to keep a SVN converted form of the patch around then I would
> suggest that the converted text reflect a properly formatted SVN patch. Or at
> least, we should change the comments about svnConvertedText to reflect the fact
> that it represents a reasonable conversion.

The latter is probably better since it might not make sense to speak of converting a git header into an SVN header in all cases (since they store different information).

Note that your point can also be made of gitdiff2svndiff() since it allows any lines to pass through that don't match any of the if-block regular expressions:

http://trac.webkit.org/browser/trunk/WebKitTools/Scripts/VCSUtils.pm?rev=57453#L365

Really, what it's doing is converting a *subset* of the git diff lines to lines recognizable by the SVN-parsing code.  It just happens to look like it's converting the header into a properly formatted SVN header because that's what it does for the most common case.

> Some additional notes...
> From testing, the patch command does not seem to mind if extra lines appear in
> the diff header, so long as the ---/+++ lines exist.

Yes, that's what I've found, too.

In the long-term, it may make sense just to get rid of the svnConvertedText property altogether.  Whatever information is needed from the header we can store in a structured way as properties of the object, so we don't lose any information.  That way the calling code doesn't have to reparse the lines, and we won't be storing the same information in multiple spots (the data in the object will be more "normalized").
Comment 7 Chris Jerdonek 2010-04-26 13:16:26 PDT
(In reply to comment #6)
> Maybe I'll dig up a couple examples from bugs.webkit.org and post links to them
> here.

A couple examples can be seen in this attachment:

https://bug-38099-attachments.webkit.org/attachment.cgi?id=54257

Here the 6-character file mode does not appear in the index line:

diff --git a/LayoutTests/platform/chromium-linux/fast/css-generated-content/after-duplicated-after-split-expected.checksum b/LayoutTests/platform/chromium-linux/fast/css-generated-content/after-duplicated-after-split-expected.checksum
new file mode 100644
index 0000000..273b433
--- /dev/null
+++ b/LayoutTests/platform/chromium-linux/fast/css-generated-content/after-duplicated-after-split-expected.checksum
@@ -0,0 +1 @@


And here, full git hashes are being used:

diff --git a/LayoutTests/platform/chromium-linux/fast/css-generated-content/after-duplicated-after-split-expected.png b/LayoutTests/platform/chromium-linux/fast/css-generated-content/after-duplicated-after-split-expected.png
new file mode 100644
index 0000000000000000000000000000000000000000..809d7df4235478cc35d5378fb2b181997a06e356
GIT binary patch
Comment 8 Daniel Bates 2010-04-26 14:03:52 PDT
(In reply to comment #6)
> (In reply to comment #5)
> > (From update of attachment 54239 [details] [details])
> 
> Thanks for the comments on the patch, Daniel.  As you probably noticed, I need
> to make this change and the change in bug 38094 before landing bug 34033.
> 
> > >          } elsif (s/^\+\+\+ \S+/+++ $indexPath/) {
> > >              # +++
> > >              $foundHeaderEnding = 1;
> > > -        } else {
> > > -            # Skip unrecognized lines.
> > > -            next;
> > >          }
> 
> > Now that we do not remove the git diff extension of the form "index
> > [0-9a-f]{7}\.\.[0-9a-f]{7} [0-9]{6}", these lines (as well as other
> > unrecognized lines such Git file mode changes) can appear in the SVN converted
> > text. So, the SVN converted text may not correspond to a properly formatted SVN
> > patch.
> 
> It turns out that the current code already allows some "non-SVN" lines to pass
> through in certain cases.  That's what I meant in comment 0 when I said that
> the line in gitdiff2svndiff() "doesn't do what it purports to do...".

Correct, and as you pointed out, gitdiff2svndiff only recognizes and translates some subset of git diff extensions to the equivalent SVN diff extensions. Shouldn't we skip unrecognized lines then (i.e. keep the "else {... next; }" clause (*) in parseDiffHeader())? Then svnConvertedText represents a properly formatted SVN diff header.

> For example, some git diffs leave out the 6-character file mode string (I think
> for new files, for example), and other git diffs include full git hashes and
> not just 7 characters.  The current code -- which doesn't use parseDiffHeader()
> -- allows those lines to pass through.

Right. When we switch over to using the parsePatch()/parseDiffHeader() code path and if we keep (*) then such lines would not be part of  svnConvertedText.

> In the long-term, it may make sense just to get rid of the svnConvertedText
> property altogether.  Whatever information is needed from the header we can
> store in a structured way as properties of the object, so we don't lose any
> information.  That way the calling code doesn't have to reparse the lines, and
> we won't be storing the same information in multiple spots (the data in the
> object will be more "normalized").

Right. We seem to only keep it around as a by-product of using gitdiff2svndiff and for unit testing.
Comment 9 Chris Jerdonek 2010-04-26 14:58:41 PDT
(In reply to comment #8)

> Correct, and as you pointed out, gitdiff2svndiff only recognizes and translates
> some subset of git diff extensions to the equivalent SVN diff extensions.
> Shouldn't we skip unrecognized lines then (i.e. keep the "else {... next; }"
> clause (*) in parseDiffHeader())? Then svnConvertedText represents a properly
> formatted SVN diff header.

For the benefit of others reading the thread, Daniel and I discussed this on IRC.  One of the reasons that parseDiffHeader() needs to preserve unrecognized lines (at least in some cases) is that svn-apply has code which relies on certain Git-specific lines being present in the lines "converted" by gitdiff2svndiff(), e.g.

http://trac.webkit.org/browser/trunk/WebKitTools/Scripts/svn-apply?rev=57455#L347

Eventually, we should probably refactor that parsing code into parseDiffHeader() so that svn-apply doesn't need to do any parsing itself.

I included this as a FIXME in the patches for bug 34033.  See line 321 of the svn-apply portion of this patch, for instance:

https://bugs.webkit.org/attachment.cgi?id=54016&action=prettypatch

That is the reason why, for simplicity, I thought it would be best to preserve the existing behavior for now and allow all lines to pass through (even the unrecognized Git-specific ones).
Comment 10 Daniel Bates 2010-04-26 23:27:45 PDT
> --- a/WebKitTools/Scripts/webkitperl/VCSUtils_unittest/parseDiff.pl
> +++ b/WebKitTools/Scripts/webkitperl/VCSUtils_unittest/parseDiff.pl
> [...]
> @@ -288,19 +288,19 @@ sub testParseDiffAssertionArgs($)
>      my $testNameStart = "parseDiff(): [$testCaseHashRef->{diffName}] ";
>  
>      my @assertionArgsArrayRefs; # Return value
> -    my @assertionArgs;
> +
>      foreach my $diffHashRefKey (@diffHashRefKeys) {
>          my $testName = "${testNameStart}key=\"$diffHashRefKey\"";
> -        @assertionArgs = ($diffHashRef->{$diffHashRefKey}, $testCaseHashRef->{$diffHashRefKey}, $testName);
> -        push(@assertionArgsArrayRefs, \@assertionArgs);
> +        my @assertionArgs1 = ($diffHashRef->{$diffHashRefKey}, $testCaseHashRef->{$diffHashRefKey}, $testName);
> +        push(@assertionArgsArrayRefs, \@assertionArgs1);
>      }
>  
> -    @assertionArgs = ($lastReadLine, $testCaseHashRef->{lastReadLine}, "${testNameStart}lastReadLine");
> -    push(@assertionArgsArrayRefs, \@assertionArgs);
> +    my @assertionArgs2 = ($lastReadLine, $testCaseHashRef->{lastReadLine}, "${testNameStart}lastReadLine");
> +    push(@assertionArgsArrayRefs, \@assertionArgs2);
>  
>      my $nextLine = <$fileHandle>;
> -    @assertionArgs = ($nextLine, $testCaseHashRef->{nextLine}, "${testNameStart}nextLine");
> -    push(@assertionArgsArrayRefs, \@assertionArgs);
> +    my @assertionArgs3 = ($nextLine, $testCaseHashRef->{nextLine}, "${testNameStart}nextLine");
> +    push(@assertionArgsArrayRefs, \@assertionArgs3);
>  

From my understanding of (2) of section Making References of PerlRef <http://perldoc.perl.org/perlref.html>, we do not actually have to define new variables @assertionArgs1, @assertionArgs2, @assertionArgs3. Instead, we can write this portion of the patch using square-bracket notation (which returns a reference to an anonymous array) as follows:

...
my $assertion;

foreach my $diffHashRefKey (@diffHashRefKeys) {
    my $testName = "${testNameStart}key=\"$diffHashRefKey\"";
    $assertion = [$diffHashRef->{$diffHashRefKey}, $testCaseHashRef->{$diffHashRefKey}, $testName];
    push(@assertionArgsArrayRefs, $assertion);
}

$assertion = [$lastReadLine, $testCaseHashRef->{lastReadLine}, "${testNameStart}lastReadLine"];
push(@assertionArgsArrayRefs, $assertion);

my $nextLine = <$fileHandle>;
$assertion = [$nextLine, $testCaseHashRef->{nextLine}, "${testNameStart}nextLine"];
push(@assertionArgsArrayRefs, $assertion);

>      return @assertionArgsArrayRefs;
>  }
> diff --git a/WebKitTools/Scripts/webkitperl/VCSUtils_unittest/parseDiffHeader.pl b/WebKitTools/Scripts/webkitperl/VCSUtils_unittest/parseDiffHeader.pl
> index a7a3c26..a828800 100644
> --- a/WebKitTools/Scripts/webkitperl/VCSUtils_unittest/parseDiffHeader.pl
> +++ b/WebKitTools/Scripts/webkitperl/VCSUtils_unittest/parseDiffHeader.pl
> [...]
> @@ -247,19 +248,19 @@ sub testParseDiffHeaderAssertionArgs($)
>      my $testNameStart = "parseDiffHeader(): [$testCaseHashRef->{diffName}] ";
>  
>      my @assertionArgsArrayRefs; # Return value
> -    my @assertionArgs;
> +
>      foreach my $diffHeaderHashRefKey (@diffHeaderHashRefKeys) {
>          my $testName = "${testNameStart}key=\"$diffHeaderHashRefKey\"";
> -        @assertionArgs = ($headerHashRef->{$diffHeaderHashRefKey}, $testCaseHashRef->{$diffHeaderHashRefKey}, $testName);
> -        push(@assertionArgsArrayRefs, \@assertionArgs);
> +        my @assertionArgs1 = ($headerHashRef->{$diffHeaderHashRefKey}, $testCaseHashRef->{$diffHeaderHashRefKey}, $testName);
> +        push(@assertionArgsArrayRefs, \@assertionArgs1);
>      }
>  
> -    @assertionArgs = ($lastReadLine, $testCaseHashRef->{lastReadLine}, "${testNameStart}lastReadLine");
> -    push(@assertionArgsArrayRefs, \@assertionArgs);
> +    my @assertionArgs2 = ($lastReadLine, $testCaseHashRef->{lastReadLine}, "${testNameStart}lastReadLine");
> +    push(@assertionArgsArrayRefs, \@assertionArgs2);
>  
>      my $nextLine = <$fileHandle>;
> -    @assertionArgs = ($nextLine, $testCaseHashRef->{nextLine}, "${testNameStart}nextLine");
> -    push(@assertionArgsArrayRefs, \@assertionArgs);
> +    my @assertionArgs3 = ($nextLine, $testCaseHashRef->{nextLine}, "${testNameStart}nextLine");
> +    push(@assertionArgsArrayRefs, \@assertionArgs3);

Similarly, we can write this portion of the patch using the square-bracket notation.
Comment 11 Chris Jerdonek 2010-04-28 06:18:37 PDT
Created attachment 54558 [details]
Proposed patch 4

Incorporated Daniel's suggestion to create references to anonymous arrays instead of creating arrays.  (Great suggestion, Daniel!)
Comment 12 Chris Jerdonek 2010-04-28 06:25:06 PDT
Created attachment 54559 [details]
Proposed patch 5

Renamed $assertionArgs variables to $assertionArgsRef since they now represent references to arrays rather than arrays.
Comment 13 Daniel Bates 2010-04-28 09:34:52 PDT
Comment on attachment 54559 [details]
Proposed patch 5

Thanks Chris! I didn't see the changes that we discussed about updating the svnConvertedText comments to reflect the fact that this variable does not correspond to a properly formatted SVN diff.

r=me.
Comment 14 Chris Jerdonek 2010-04-28 09:50:50 PDT
(In reply to comment #13)

Thanks for the review, Daniel!

> (From update of attachment 54559 [details])
> Thanks Chris! I didn't see the changes that we discussed about updating the
> svnConvertedText comments to reflect the fact that this variable does not
> correspond to a properly formatted SVN diff.

You're right.  I overlooked that.  I'll be sure to update the code comment prior to landing.
Comment 15 Chris Jerdonek 2010-04-28 19:33:21 PDT
Committed:

http://trac.webkit.org/changeset/58464