Bug 30683

Summary: svn-apply's fixChangeLogPatch function seems broken
Product: WebKit Reporter: Eric Seidel (no email) <eric>
Component: Tools / TestsAssignee: Eric Seidel (no email) <eric>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, aroben, cjerdonek, commit-queue, darin, dbates, ddkilzer, dglazkov, levin, otte
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: PC   
OS: OS X 10.5   
Bug Depends on: 30687, 30791, 36257    
Bug Blocks: 30634, 31092    
Attachments:
Description Flags
Patch v1
none
Share this code in more places
none
Updated ChangeLog
none
Fix as suggested by Dave. Still includes method moves so as not to depend on bug 30791.
none
Update after bug 30791 none

Description Eric Seidel (no email) 2009-10-22 11:39:45 PDT
svn-apply's fixChangeLogPatch function seems broken

I've not been able to get it to work yet in my testing.  I'm going to see if I can fix it...
Comment 1 Eric Seidel (no email) 2009-10-22 11:59:23 PDT
Actually, looks like it just doesn't work if more than the dateline are the same (like the reviewer line, as is the case when you have multiple unreviewed patches in ones git checkout).
Comment 2 Eric Seidel (no email) 2009-10-22 16:31:12 PDT
Created attachment 41695 [details]
Patch v1
Comment 3 Eric Seidel (no email) 2009-10-22 16:32:17 PDT
I'm not quite sure how I go about testing this.  I would love to test it, but we have no unit testing for our perl scripts. :( It's possible I could use the python infrastructure for testing some of this.
Comment 4 Eric Seidel (no email) 2009-10-23 13:23:02 PDT
Created attachment 41745 [details]
Share this code in more places
Comment 5 Eric Seidel (no email) 2009-10-23 13:23:30 PDT
I can't believe we had FOUR copies of this function.  No wonder our perl scripts are a sewer.
Comment 6 Eric Seidel (no email) 2009-10-23 13:24:36 PDT
Eh, my comment above was unfair.  I can understand the desire to have these scripts be stand-alone, but I don't think that it's worth the cost of having copy/paste code like this.  At least not functions as huge and complicated as this one.
Comment 7 Eric Seidel (no email) 2009-10-23 13:39:08 PDT
Created attachment 41746 [details]
Updated ChangeLog
Comment 8 Eric Seidel (no email) 2009-10-23 14:25:49 PDT
*** Bug 26730 has been marked as a duplicate of this bug. ***
Comment 9 Eric Seidel (no email) 2009-10-23 14:29:17 PDT
*** Bug 26865 has been marked as a duplicate of this bug. ***
Comment 10 David Kilzer (:ddkilzer) 2009-10-23 15:45:25 PDT
(In reply to comment #6)
> Eh, my comment above was unfair.  I can understand the desire to have these
> scripts be stand-alone, but I don't think that it's worth the cost of having
> copy/paste code like this.  At least not functions as huge and complicated as
> this one.

At the time I wrote this code, I was told that it was more important for the scripts to be standalone than to share code.
Comment 11 Eric Seidel (no email) 2009-10-23 15:48:29 PDT
(In reply to comment #10)
> (In reply to comment #6)
> > Eh, my comment above was unfair.  I can understand the desire to have these
> > scripts be stand-alone, but I don't think that it's worth the cost of having
> > copy/paste code like this.  At least not functions as huge and complicated as
> > this one.
> 
> At the time I wrote this code, I was told that it was more important for the
> scripts to be standalone than to share code.

Hum... It would that now that other code is being shared using VCSUtils.pm we should re-visit that decision.  If we decide to start sharing more code there is definitely a bunch more we can share yet.  I am very interested in your input on such a decision.  Obviously my bias above has already been stated. :)
Comment 12 David Kilzer (:ddkilzer) 2009-10-23 16:56:21 PDT
(In reply to comment #11)
> 
> Hum... It would that now that other code is being shared using VCSUtils.pm we
> should re-visit that decision.  If we decide to start sharing more code there
> is definitely a bunch more we can share yet.  I am very interested in your
> input on such a decision.  Obviously my bias above has already been stated. :)

Since we now have a way to share code using VCSUtils.pm, I'm all for doing that.  I originally wanted to share the code as well.
Comment 13 David Kilzer (:ddkilzer) 2009-10-23 17:31:34 PDT
Comment on attachment 41746 [details]
Updated ChangeLog

I'm on vacation through Oct. 28, so I can't really give a detailed review until then.

However, I see issues with this patch that require an r-:

- ChangeLog doesn't mention all the scripts affected.
- The new method in VCSUtils.pm has a debugging print() statement in it.
- The method Name added to the list of exported methods should be alphabetized.
- The method in VCSUtils.pm completely breaks the contract of the original method--that the entire patch should be returned unaltered if the method can't fix it.
- The method in VCSUtils.pm makes too many assumptions about the format of the patch, e.g., that the patch's first line of code is the fourth line of the text passed in.  While rare, there are some patches that don't include an "Index" or "diff" preamble line.

I think the best approach to fix this is to move the unaltered method into VCSUtils.pm first, then write a simple test harness in Perl that uses the VCSUtils module to read a patch from stdin and write the altered result to stdout.  Then this can be used to write test cases for different types of ChangeLog patches, then the method can be fixed for the case that this bug was filed for.
Comment 14 Eric Seidel (no email) 2009-10-26 15:00:48 PDT
(In reply to comment #13)

Thanks for the review!

> - The new method in VCSUtils.pm has a debugging print() statement in it.
Oops.  Fixed.

> - The method Name added to the list of exported methods should be alphabetized.
Fixed.

> - The method in VCSUtils.pm completely breaks the contract of the original
> method--that the entire patch should be returned unaltered if the method can't
> fix it.
My bad.  Fixed.

> - The method in VCSUtils.pm makes too many assumptions about the format of the
> patch, e.g., that the patch's first line of code is the fourth line of the text
> passed in.  While rare, there are some patches that don't include an "Index" or
> "diff" preamble line.

OK.  Fixed.

> I think the best approach to fix this is to move the unaltered method into
> VCSUtils.pm first

Ok.  bug 30791.

> then write a simple test harness in Perl that uses the
> VCSUtils module to read a patch from stdin and write the altered result to
> stdout.  Then this can be used to write test cases for different types of
> ChangeLog patches, then the method can be fixed for the case that this bug was
> filed for.

Eh.  Given our complete lack of unit testing in our perl scripts, perl's
seeming lack of built-in unit testing, I have little desire to invent a system.

So I've added some tests to scm_unittest.py instead.  Those drive svn-apply as
part of scm.py and should provide adequate coverage for this change.

Patch coming up!
Comment 15 Eric Seidel (no email) 2009-10-26 15:25:01 PDT
Created attachment 41906 [details]
Fix as suggested by Dave.  Still includes method moves so as not to depend on bug 30791.
Comment 16 David Kilzer (:ddkilzer) 2009-10-27 00:01:33 PDT
(In reply to comment #14)
> Eh.  Given our complete lack of unit testing in our perl scripts, perl's
> seeming lack of built-in unit testing, I have little desire to invent a system.
> 
> So I've added some tests to scm_unittest.py instead.  Those drive svn-apply as
> part of scm.py and should provide adequate coverage for this change.

Perl does have built-in testing.  See manpages for Test::Harness or Test::Simple--one of them comes with Perl.
Comment 17 Eric Seidel (no email) 2009-10-27 09:01:31 PDT
Created attachment 41956 [details]
Update after bug 30791
Comment 18 Eric Seidel (no email) 2009-10-27 11:58:55 PDT
(In reply to comment #16)
> (In reply to comment #14)
> > Eh.  Given our complete lack of unit testing in our perl scripts, perl's
> > seeming lack of built-in unit testing, I have little desire to invent a system.
> > 
> > So I've added some tests to scm_unittest.py instead.  Those drive svn-apply as
> > part of scm.py and should provide adequate coverage for this change.
> 
> Perl does have built-in testing.  See manpages for Test::Harness or
> Test::Simple--one of them comes with Perl.

Interesting.  It would appear that both are included on Mac OS X Leopard.  I do not know about other distros, but that's good to know!

I think the scm_unittest.py testing should be sufficient for this fix.  It covers the desired change, and has the added benefit of being a full-system test of svn-apply (which we did not have before).  But it's good to know that Test::* seem to be installed on mac os x for the future.
Comment 19 Eric Seidel (no email) 2009-11-03 12:03:42 PST
Ping?
Comment 20 David Kilzer (:ddkilzer) 2009-11-03 12:57:53 PST
(In reply to comment #19)
> Ping?

I'll try to look at this today.
Comment 21 David Kilzer (:ddkilzer) 2009-11-04 10:35:43 PST
Comment on attachment 41956 [details]
Update after bug 30791

> diff --git a/WebKitTools/Scripts/VCSUtils.pm b/WebKitTools/Scripts/VCSUtils.pm
> index e1e0bc2..ebf1190 100644
> --- a/WebKitTools/Scripts/VCSUtils.pm
> +++ b/WebKitTools/Scripts/VCSUtils.pm
> @@ -347,56 +347,64 @@ sub gitdiff2svndiff($)
>      return $_;
>  }
>  
> +# The diff(1) command is greedy when matching lines, so a new ChangeLog entry will
> +# have lines of context at the top of a patch when the existing entry has the same
> +# date and author as the new entry.  Alter the ChangeLog patch so
> +# that the added lines ("+") in the patch always start at the beginning of the
> +# patch and there are no initial lines of context.
>  sub fixChangeLogPatch($)
>  {
> +    my $patch = shift; # $patch will only contain patch fragments for ChangeLog.
> +
> +    $patch =~ /(\r?\n)/;
> +    my $lineEnding = $1;
> +    my @patchLines = split(/$lineEnding/, $patch);
> +
> +    # e.g. 2009-06-03  Eric Seidel  <eric@webkit.org>
> +    my $dateLineRegexpString = '^\+(\d{4}-\d{2}-\d{2})' # Consume the leading '+' and the date.
> +                             . '\s+(.+)\s+' # Consume the name.
> +                             . '<([^<>]+)>$'; # And finally the email address.
> +
> +    # Figure out where the patch contents start and stop.
> +    my $patchHeaderIndex;
> +    my $firstContentIndex;
> +    my $trailingContextIndex;
> +    my $dateIndex;
> +    my $patchEndIndex = scalar(@patchLines);
> +    for (my $index = 0; $index < @patchLines; ++$index) {
> +        my $line = $patchLines[$index];
> +        if ($line =~ /^\@\@ -\d+,\d+ \+\d+,\d+ \@\@$/) { # e.g. @@ -1,5 +1,18 @@
> +            if ($patchHeaderIndex) {
> +                $patchEndIndex = $index; # We only bother to fix up the first patch fragment.
> +                last;
>              }
> +            $patchHeaderIndex = $index;
>          }
> +        $firstContentIndex = $index if ($patchHeaderIndex && !$firstContentIndex && $line =~ /^\+[^+]/); # Only match after finding patchHeaderIndex, otherwise we'd match "+++".
> +        $dateIndex = $index if ($line =~ /$dateLineRegexpString/);
> +        $trailingContextIndex = $index if ($firstContentIndex && !$trailingContextIndex && $line =~ /^ /);
>      }
> +    my $contentLineCount = $trailingContextIndex - $firstContentIndex;
> +    my $trailingContextLineCount = $patchEndIndex - $trailingContextIndex;
> +
> +    # If we didn't find a date line in the content then this is not a patch we should try and fix.
> +    return $patch if (!$dateIndex);
> +
> +    # We only need to do anything if the date line is not the first content line.
> +    return $patch if ($dateIndex == $firstContentIndex);
> +
> +    # Write the new patch.
> +    my $totalNewContentLines = $contentLineCount + $trailingContextLineCount;
> +    $patchLines[$patchHeaderIndex] = "@@ -1,$trailingContextLineCount +1,$totalNewContentLines @@"; # Write a new header.
> +    my @repeatedLines = splice(@patchLines, $dateIndex, $trailingContextIndex - $dateIndex); # The date line and all the content after it that diff saw as repeated.
> +    splice(@patchLines, $firstContentIndex, 0, @repeatedLines); # Move the repeated content to the top.
> +    foreach my $line (@repeatedLines) {
> +        $line =~ s/^\+/ /;
> +    }
> +    splice(@patchLines, $trailingContextIndex, $patchEndIndex, @repeatedLines); # Replace trailing context with the repeated content.
> +    splice(@patchLines, $patchHeaderIndex + 1, $firstContentIndex - $patchHeaderIndex - 1); # Remove any leading context.
>  
> +    return join($lineEnding, @patchLines) . "\n"; # patch(1) expects an extra trailing newline.
>  }

After thinking about this some more, there's an even easier way to mangle the patch.  Basically, assuming the patch is 0-or-more unchanged lines, 1-or-more added lines (+), and then 1-or-more unchanged lines, all you really have to do is separate out the added lines from the unchanged lines, reorder the added lines starting with the date line and wrapping around to the top, then output the reordered added lines plus the first three lines of unchanged lines.  (This works because the added lines always contain the complete patch, but because of the vagaries of diff, they are sometimes out of order.)

r=me to unblock other issues.  We can revisit this later if needed.
Comment 22 Eric Seidel (no email) 2009-11-04 22:18:36 PST
(In reply to comment #21)
> After thinking about this some more, there's an even easier way to mangle the
> patch.  Basically, assuming the patch is 0-or-more unchanged lines, 1-or-more
> added lines (+), and then 1-or-more unchanged lines, all you really have to do
> is separate out the added lines from the unchanged lines, reorder the added
> lines starting with the date line and wrapping around to the top, then output
> the reordered added lines plus the first three lines of unchanged lines.  (This
> works because the added lines always contain the complete patch, but because of
> the vagaries of diff, they are sometimes out of order.)
> 
> r=me to unblock other issues.  We can revisit this later if needed.

I'm not sure I understand.  What you describe sounds a lot like what I tried to do in the code.  There must be some subtly I'm missing in your explanation.
Comment 23 WebKit Commit Bot 2009-11-04 23:07:08 PST
Comment on attachment 41956 [details]
Update after bug 30791

Clearing flags on attachment: 41956

Committed r50547: <http://trac.webkit.org/changeset/50547>
Comment 24 WebKit Commit Bot 2009-11-04 23:07:15 PST
All reviewed patches have been landed.  Closing bug.
Comment 25 David Kilzer (:ddkilzer) 2009-11-05 05:20:09 PST
(In reply to comment #22)
> (In reply to comment #21)
> > After thinking about this some more, there's an even easier way to mangle the
> > patch.  Basically, assuming the patch is 0-or-more unchanged lines, 1-or-more
> > added lines (+), and then 1-or-more unchanged lines, all you really have to do
> > is separate out the added lines from the unchanged lines, reorder the added
> > lines starting with the date line and wrapping around to the top, then output
> > the reordered added lines plus the first three lines of unchanged lines.  (This
> > works because the added lines always contain the complete patch, but because of
> > the vagaries of diff, they are sometimes out of order.)
> 
> I'm not sure I understand.  What you describe sounds a lot like what I tried to
> do in the code.  There must be some subtly I'm missing in your explanation.

Yes, but it seemed like the algorithm was more complicated than it needed to be in that it still had too many state variables.  Still, it's an improvement over my previous code, which was even more complex.  I don't feel any particular need to rewrite it again, though.  :)
Comment 26 Eric Seidel (no email) 2009-11-10 11:34:19 PST
*** Bug 30687 has been marked as a duplicate of this bug. ***
Comment 27 Chris Jerdonek 2009-12-27 10:44:05 PST
FYI, I have a proposed revision to fixChangeLogPatch:

https://bugs.webkit.org/show_bug.cgi?id=32919

I've added unit tests as suggested here.