Bug 38885

Summary: Add infrastructure to parse SVN property changes
Product: WebKit Reporter: Daniel Bates <dbates>
Component: Tools / TestsAssignee: Daniel Bates <dbates>
Status: RESOLVED FIXED    
Severity: Normal CC: cjerdonek, eric, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Bug Depends on: 39170, 39184    
Bug Blocks: 39409    
Attachments:
Description Flags
Patch with unit tests
none
Patch with unit tests
none
Patch with unit tests
none
Patch with unit tests cjerdonek: review+

Description Daniel Bates 2010-05-10 20:52:58 PDT
Towards adding executable bit support to svn-apply and svn-unapply, we need to add support for parsing SVN property change diffs.

SVN property change diffs have the following forms:

Property changes on: Makefile
___________________________________________________________________
Added: svn:executable
   + *

OR

Property changes on: Makefile
___________________________________________________________________
Deleted: svn:executable
   + *

OR

Property changes on: Makefile
___________________________________________________________________
Name: svn:executable
   + *
Comment 1 Daniel Bates 2010-05-10 20:54:48 PDT
Created attachment 55654 [details]
Patch with unit tests
Comment 2 Daniel Bates 2010-05-10 21:07:21 PDT
Comment on attachment 55654 [details]
Patch with unit tests

Noticed an issue with this patch, will correct.
Comment 3 Daniel Bates 2010-05-10 21:17:45 PDT
Created attachment 55655 [details]
Patch with unit tests
Comment 4 Chris Jerdonek 2010-05-10 21:26:20 PDT
(In reply to comment #0)
> SVN property change diffs have the following forms:
> 
> Property changes on: Makefile
> ___________________________________________________________________
> Added: svn:executable
>    + *

For completeness, can you also include enough examples so the pattern is clear when there are multiple additions or, say, additions combined with deletions, etc?  In particular, does each added property get its own "Added" line, or do multiple additions get a single "Added" line?
Comment 5 Daniel Bates 2010-05-10 22:20:06 PDT
(In reply to comment #4)
> (In reply to comment #0)
> > SVN property change diffs have the following forms:
> > 
> > Property changes on: Makefile
> > ___________________________________________________________________
> > Added: svn:executable
> >    + *
> 
> For completeness, can you also include enough examples so the pattern is clear when there are multiple additions or, say, additions combined with deletions, etc?  In particular, does each added property get its own "Added" line, or do multiple additions get a single "Added" line?

Sure, I'll add more test cases.

Yes, each property gets its own Added/Deleted/Name line followed by a line of the form: +/- V, where V is the the value of the property. Take V := '*' for the svn:executable property.
Comment 6 Daniel Bates 2010-05-10 23:20:41 PDT
Created attachment 55669 [details]
Patch with unit tests

Added additional unit tests. Also, while testing found that property change lines can be separated by whitespace/empty lines, so added support for this.
Comment 7 WebKit Review Bot 2010-05-10 23:27:59 PDT
Attachment 55669 [details] did not pass style-queue:

Failed to run "['WebKitTools/Scripts/check-webkit-style', '--no-squash']" exit_code: 1
WebKitTools/Scripts/webkitperl/VCSUtils_unittest/parseSvnDiffFooter.pl:140:  Line contains tab character.  [whitespace/tab] [5]
Total errors found: 1 in 4 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 8 Chris Jerdonek 2010-05-11 09:26:09 PDT
(In reply to comment #6)
> Created an attachment (id=55669) [details]

Some comments on the first portion of the patch:

> Index: WebKitTools/ChangeLog
> +        * Scripts/VCSUtils.pm:

In the ChangeLog, it is generally good to comment on the per-file changes
for each file.  For example, can you at least include the names of the new
subroutines added to VCSUtils.pm?

> Index: WebKitTools/Scripts/VCSUtils.pm
>  my $gitDiffStartRegEx = qr#^diff --git (\w/)?(.+) (\w/)?([^\r\n]+)#;
>  my $svnDiffStartRegEx = qr#^Index: ([^\r\n]+)#;
> +my $svnFooterStartRegEx = qr#^Property changes on: ([^\r\n]+)#;
> +my $svnPropertyNameRegEx = qr#^(Name|Added|Deleted): ([^\r\n]+)#;

It might be good to comment these two variables.  For example, it might be
nice to say here what $1 is (will normally be the same as the index path).
For the second, it might be good to state what $1 is and also clarify that the
reg ex marks the beginning of a change to a particular property instead of
to a section of additions (for example).

For the second, might $svnPropertyChangeStartRegEx or $svnPropertyStartRegEx
be clearer since the line marks the beginning of a property change in the
"Property changes" section and when parseSvnProperty() should be called?

> +# Args:
> +#   $fileHandle: advanced so the last line read from the handle is either the
> +#                the first occurence of an empty line or the first line of the

occurrence

Since this is in the args section, the $fileHandle comment should be explaining
the state of the file handle going into the subroutine.  Can you clarify the scoping
in the phrase "first occurrence of an empty line"?  This could be taken to mean
the first empty line in the diff, which is obviously not right.  Also, do we really
want to be saying that an empty line is what marks the beginning of the footer?
It seems like we can always require it to be the "Property changes" line.

> +#                next header to parse.  For SVN-formatted diffs, this is a line
> +#                beginning with "Index:".  For Git, this is a line beginning
> +#                with "diff --git".

It looks like you need to rewrite this in terms of footer language.

> +#   $line: the line last read from $fileHandle

Missing a period.

> +# Returns ($footerHashRef, $lastReadLine):
> +#   $footerHashRef: a hash reference representing a diff footer
> +#     propertyPath: the path of the target file.

It might be good to add that this should normally be the same as the
"index path" for the diff.

> +sub parseSvnDiffFooter($$)

> +        if (/$svnFooterStartRegEx/) {
> +            $propertyPath = $1;

Would something like "filePath" be clearer?  propertyPath makes it seem like
it is a path to a property.

> +        } elsif (/$svnPropertyNameRegEx/) {
> +            ($propertyHashRef, $_) = parseSvnProperty($fileHandle, $_);
> +            $footer{executableBitDelta} = $propertyHashRef->{propertyChangeDelta} if $propertyHashRef->{name} eq "svn:executable";

Since it doesn't look like you're storing any property changes other than
the executable bit one, it might be good to add a FIXME somewhere (like in
or near the description of the subroutine) to add support for parsing
all properties.

When this method is modified to support all properties, it seems like it
would make sense to store all the properties as a generic hash of key-value
pairs, so that this subroutine will not need to have any special knowledge of
the property representing the executable bit.  Then the caller could be
responsible for examining the hash and converting that information into an
executableBitDelta for the diff.  Maybe you can include a FIXME here
suggesting at future changes to this subroutine.

> +        } elsif (/^\s*$/x || $_ eq '_' x 67) {
> +            # Skip empty/whitespace-only lines and the divider line.
> +        } else {
> +            # Some other content, such as the binary portion of an SVN binary diff.
> +            last;

Since the binary lines are getting skipped over, how are they getting preserved
so the binary changes can later be applied?

> +        }
> +
> +        $_ = <$fileHandle>; # Not defined if end-of-file reached.
> +
> +        last if (!defined($_) || /$svnDiffStartRegEx/ || /$gitDiffStartRegEx/);

Are we supporting patches with Git and SVN diffs mixed together?  It doesn't
seem like this SVN-specific method needs to know about Git diffs.

> +    }
> +
> +    if (!$propertyPath) {
> +        die("First line of SVN diff does not begin with \"Property changes on \": \"$_\".");

It seems like the logic above doesn't require the first line to be the
"Property changes" line.  It seems like it only requires that some line
be a "Property changes" line.  Is there any reason not to check the first
line at the beginning of the method?
Comment 9 Chris Jerdonek 2010-05-11 09:48:38 PDT
(In reply to comment #5)
> Sure, I'll add more test cases.
> 
> Yes, each property gets its own Added/Deleted/Name line followed by a line of the form: +/- V, where V is the the value of the property. Take V := '*' for the svn:executable property.

Thanks for this extra information and for the added test cases.  Do you know when "Name" is used versus Added/Deleted?

It looks like "Modified" is also a possibility, and for there to be both a + and - line after the first line.  For example, I tried modifying the svn:ignore property for a directory and got this:

Property changes on: WebKitTools/Scripts/webkitpy
___________________________________________________________________
Modified: svn:ignore
   - commands
init
irc
patch
steps
*.pyc
autoinstall.cache.d

   + commands
init
patch
steps
*.pyc
autoinstall.cache.d
test-add
Comment 10 Daniel Bates 2010-05-11 15:16:01 PDT
(In reply to comment #8)
> (In reply to comment #6)
> > Created an attachment (id=55669) [details] [details]
> 
> Some comments on the first portion of the patch:
> 
> > Index: WebKitTools/ChangeLog
> > +        * Scripts/VCSUtils.pm:
> 
> In the ChangeLog, it is generally good to comment on the per-file changes
> for each file.  For example, can you at least include the names of the new
> subroutines added to VCSUtils.pm?

Yes, I agree. Will change.

> 
> > Index: WebKitTools/Scripts/VCSUtils.pm
> >  my $gitDiffStartRegEx = qr#^diff --git (\w/)?(.+) (\w/)?([^\r\n]+)#;
> >  my $svnDiffStartRegEx = qr#^Index: ([^\r\n]+)#;
> > +my $svnFooterStartRegEx = qr#^Property changes on: ([^\r\n]+)#;
> > +my $svnPropertyNameRegEx = qr#^(Name|Added|Deleted): ([^\r\n]+)#;
> 
> It might be good to comment these two variables.  For example, it might be
> nice to say here what $1 is (will normally be the same as the index path).

Will add comment.

> For the second, it might be good to state what $1 is and also clarify that the
> reg ex marks the beginning of a change to a particular property instead of
> to a section of additions (for example).

Will add comment.

> 
> For the second, might $svnPropertyChangeStartRegEx or $svnPropertyStartRegEx
> be clearer since the line marks the beginning of a property change in the
> "Property changes" section and when parseSvnProperty() should be called?

Yeah, I was not sure what to call this. I'll go with $svnPropertyChangeStartRegEx.

> 
> > +# Args:
> > +#   $fileHandle: advanced so the last line read from the handle is either the
> > +#                the first occurence of an empty line or the first line of the
> 
> occurrence
> 
> Since this is in the args section, the $fileHandle comment should be explaining
> the state of the file handle going into the subroutine.  Can you clarify the scoping
> in the phrase "first occurrence of an empty line"?  This could be taken to mean
> the first empty line in the diff, which is obviously not right.  Also, do we really
> want to be saying that an empty line is what marks the beginning of the footer?
> It seems like we can always require it to be the "Property changes" line.

Oops, this used to describe the state of the fileHandle after the subroutine returns (in the old patch: <https://bugs.webkit.org/attachment.cgi?id=55655>). The empty line comment no longer applies to <https://bugs.webkit.org/attachment.cgi?id=55669>. Will update comment. 

> 
> > +#                next header to parse.  For SVN-formatted diffs, this is a line
> > +#                beginning with "Index:".  For Git, this is a line beginning
> > +#                with "diff --git".
> 
> It looks like you need to rewrite this in terms of footer language.

See above response. Will update comment.

> 
> > +#   $line: the line last read from $fileHandle
> 
> Missing a period.

Will fix.

> 
> > +# Returns ($footerHashRef, $lastReadLine):
> > +#   $footerHashRef: a hash reference representing a diff footer
> > +#     propertyPath: the path of the target file.
> 
> It might be good to add that this should normally be the same as the
> "index path" for the diff.

Will update comment to reflect this.

> 
> > +sub parseSvnDiffFooter($$)
> 
> > +        if (/$svnFooterStartRegEx/) {
> > +            $propertyPath = $1;
> 
> Would something like "filePath" be clearer?  propertyPath makes it seem like
> it is a path to a property.
> 
> > +        } elsif (/$svnPropertyNameRegEx/) {
> > +            ($propertyHashRef, $_) = parseSvnProperty($fileHandle, $_);
> > +            $footer{executableBitDelta} = $propertyHashRef->{propertyChangeDelta} if $propertyHashRef->{name} eq "svn:executable";
> 
> Since it doesn't look like you're storing any property changes other than
> the executable bit one, it might be good to add a FIXME somewhere (like in
> or near the description of the subroutine) to add support for parsing
> all properties.

Will do.

> 
> When this method is modified to support all properties, it seems like it
> would make sense to store all the properties as a generic hash of key-value
> pairs, so that this subroutine will not need to have any special knowledge of
> the property representing the executable bit.  Then the caller could be
> responsible for examining the hash and converting that information into an
> executableBitDelta for the diff.  Maybe you can include a FIXME here
> suggesting at future changes to this subroutine.

Yes, this was my original thought as well. Will add a FIXME for now.

> 
> > +        } elsif (/^\s*$/x || $_ eq '_' x 67) {
> > +            # Skip empty/whitespace-only lines and the divider line.
> > +        } else {
> > +            # Some other content, such as the binary portion of an SVN binary diff.
> > +            last;
> 
> Since the binary lines are getting skipped over, how are they getting preserved
> so the binary changes can later be applied?

The binary lines should not be skipped over. Disregarding the case where the caller of parseSvnFooterDiff does not pass the start of an SVN property change entry (i.e. starts with "Property changes on ..."), by the last "else" clause we break out of the loop and will return this line so that the caller (say, parseDiff() can continue processing the diff). Therefore, it will be the caller's responsibility to to preserve the binary patch contents. See test cases in parseSvnDiffFooter.pl: "binary file (diff snippet) with executable bit set", and  "binary file (diff snippet) with executable bit set (with empty line)".

> 
> > +        }
> > +
> > +        $_ = <$fileHandle>; # Not defined if end-of-file reached.
> > +
> > +        last if (!defined($_) || /$svnDiffStartRegEx/ || /$gitDiffStartRegEx/);
> 
> Are we supporting patches with Git and SVN diffs mixed together?  It doesn't
> seem like this SVN-specific method needs to know about Git diffs.

I agree, we shouldn't have mixing. Will change.

> 
> > +    }
> > +
> > +    if (!$propertyPath) {
> > +        die("First line of SVN diff does not begin with \"Property changes on \": \"$_\".");
> 
> It seems like the logic above doesn't require the first line to be the
> "Property changes" line.  It seems like it only requires that some line
> be a "Property changes" line.  Is there any reason not to check the first
> line at the beginning of the method?

In the process of refactoring this, I wound up moving it into the loop. I believe we can factor this out of the loop and do the testing at the beginning of the method.
Comment 11 Daniel Bates 2010-05-11 15:19:03 PDT
(In reply to comment #9)
> (In reply to comment #5)
> > Sure, I'll add more test cases.
> > 
> > Yes, each property gets its own Added/Deleted/Name line followed by a line of the form: +/- V, where V is the the value of the property. Take V := '*' for the svn:executable property.
> 
> Thanks for this extra information and for the added test cases.  Do you know when "Name" is used versus Added/Deleted?

For some versions of SVN < 1.6.5, such as 1.4.4 (I did not determine the exact SVN version that changed the format from "Name" to "Added", "Deleted", "Modified".

> 
> It looks like "Modified" is also a possibility, and for there to be both a + and - line after the first line.  For example, I tried modifying the svn:ignore property for a directory and got this:
> 
> Property changes on: WebKitTools/Scripts/webkitpy
> ___________________________________________________________________
> Modified: svn:ignore
>    - commands
> init
> irc
> patch
> steps
> *.pyc
> autoinstall.cache.d
> 
>    + commands
> init
> patch
> steps
> *.pyc
> autoinstall.cache.d
> test-add

Will add "Modified".
Comment 12 Chris Jerdonek 2010-05-11 16:13:28 PDT
(In reply to comment #10)

Thanks a lot for your responses, Dan, and for incorporating that feedback.

> > > +        } else {
> > > +            # Some other content, such as the binary portion of an SVN binary diff.
> > > +            last;
> > 
> > Since the binary lines are getting skipped over, how are they getting preserved
> > so the binary changes can later be applied?
> 
> The binary lines should not be skipped over. Disregarding the case where the 

Sorry for the confusion on this point.  I had misread that "last" to be "next".
Comment 13 Daniel Bates 2010-05-19 23:35:32 PDT
Created attachment 56567 [details]
Patch with unit tests
Comment 14 Chris Jerdonek 2010-05-20 05:30:16 PDT
Comment on attachment 56567 [details]
Patch with unit tests

Thanks for the revised patch.  Some suggestions (mostly about comments) to consider before landing.

> +# Parse the next SVN diff footer from the given file handle, and advance
> +# the handle so the last line read is the first line after the footer.
> +#
> +# This subroutine dies if given leading junk.

I would just say "Parse an SVN diff footer...."  I would also not use the
phrase "leading junk" here since it is usually used in the context of
leading junk at the beginning of a patch (e.g. the e-mail stuff that appears
at the beginning of git-format-patch) rather than in the middle.

It might be nice to add a comment here that for the special case of an
SVN binary diff, there can be binary contents after the footer (so
technically speaking, the "Property changes" section is not the full
footer).  Should we be calling this parseSvnPropertyChanges() since it's
not really always the footer?

> +    if (defined($_) && /^\Q$separator\E[\r\n]+$/) {

Do you need to be using \Q/\E here?  If so, I would add a comment why since
it's a bit more obscure.

> +    # FIXME: We should expand this to support other SVN properties.
> +    # Notice, we keep processing until we hit end-of-file or some

I would add a blank line after FIXME line so it's clearer that the second
and subsequent lines is not part of the FIXME.

Also, did you want to hint at possible implementation for other SVN
properties -- e.g. by returning a hash of property key-values that
represents all properties?  The line you wrote makes it seem like the 
FIXME is to make the function explicitly more aware of other properties.
In some sense, returning a hash makes the function less aware of the 
specific properties since the special handling of svn:executable will be
going in the calling code with the implementation we've discussed
(returning a hash).

> +            # FIXME: Should the diff specify multiple values for svn:executable
> +            #        (hence be malformed) then we set the value to the last
> +            #        svn:executable property. We may want to consider some
> +            #        better error handling.

I probably wouldn't bother with this FIXME.  We shouldn't need to be validating
every aspect of a patch.  And "last wins" is a valid logic to follow.

> +####
> +# Simple test cases
> +##

This looks good.  I like how you start the text closer in. :)

> +{
> +    # New test
> +    diffName => "add svn:executable, followed by empty line and start of next property diff",
> +    inputText => <<'END',
> +Property changes on: FileA
> +___________________________________________________________________
> +Added: svn:executable
> +   + *
> +
> +Property changes on: Makefile.shared
> +END

Can this case ever happen?  Is this what happens if a property change
is the only change to a file?

> +####
> +# Property value followed by empty line and start of binary patch
> +##

I would say "binary contents" as binary patch makes it seem like it's the
start of a new diff.

> +Added: documentation
> +   + A sentence.

This is not a complete sentence.  JK. :)
Comment 15 Daniel Bates 2010-05-21 21:27:50 PDT
(In reply to comment #14)
> (From update of attachment 56567 [details])
> Thanks for the revised patch.  Some suggestions (mostly about comments) to consider before landing.
> 
> > +# Parse the next SVN diff footer from the given file handle, and advance
> > +# the handle so the last line read is the first line after the footer.
> > +#
> > +# This subroutine dies if given leading junk.
> 
> I would just say "Parse an SVN diff footer...."  I would also not use the
> phrase "leading junk" here since it is usually used in the context of
> leading junk at the beginning of a patch (e.g. the e-mail stuff that appears
> at the beginning of git-format-patch) rather than in the middle.

Will change.

> It might be nice to add a comment here that for the special case of an
> SVN binary diff, there can be binary contents after the footer (so
> technically speaking, the "Property changes" section is not the full
> footer).  Should we be calling this parseSvnPropertyChanges() since it's
> not really always the footer?

I can add a comment about the binary diff. With regards to the name, what are your thoughts on parseSvnDiffPropertyChanges? This would follow our naming of parseDiff, parseDiffHeader, parseSvnDiffHeader, et cetera.

> 
> > +    if (defined($_) && /^\Q$separator\E[\r\n]+$/) {
> 
> Do you need to be using \Q/\E here?  If so, I would add a comment why since
> it's a bit more obscure.

I'll remove the \Q and \E.

> 
> > +    # FIXME: We should expand this to support other SVN properties.
> > +    # Notice, we keep processing until we hit end-of-file or some
> 
> I would add a blank line after FIXME line so it's clearer that the second
> and subsequent lines is not part of the FIXME.

Will change.

> Also, did you want to hint at possible implementation for other SVN
> properties -- e.g. by returning a hash of property key-values that
> represents all properties?  The line you wrote makes it seem like the 
> FIXME is to make the function explicitly more aware of other properties.
> In some sense, returning a hash makes the function less aware of the 
> specific properties since the special handling of svn:executable will be
> going in the calling code with the implementation we've discussed
> (returning a hash).

Sure, I'll elaborate a bit further in the FIXME.

> 
> > +            # FIXME: Should the diff specify multiple values for svn:executable
> > +            #        (hence be malformed) then we set the value to the last
> > +            #        svn:executable property. We may want to consider some
> > +            #        better error handling.
> 
> I probably wouldn't bother with this FIXME.  We shouldn't need to be validating
> every aspect of a patch.  And "last wins" is a valid logic to follow.

Will remove.

> 
> > +####
> > +# Simple test cases
> > +##
> 
> This looks good.  I like how you start the text closer in. :)

:-)

> 
> > +{
> > +    # New test
> > +    diffName => "add svn:executable, followed by empty line and start of next property diff",
> > +    inputText => <<'END',
> > +Property changes on: FileA
> > +___________________________________________________________________
> > +Added: svn:executable
> > +   + *
> > +
> > +Property changes on: Makefile.shared
> > +END
> 
> Can this case ever happen?  Is this what happens if a property change
> is the only change to a file?

Yes, it can happen for the reason you stated. That is, a property change may be the only change to a file.

> > +####
> > +# Property value followed by empty line and start of binary patch
> > +##
> 
> I would say "binary contents" as binary patch makes it seem like it's the
> start of a new diff.

Will change.

> 
> > +Added: documentation
> > +   + A sentence.
> 
> This is not a complete sentence.  JK. :)

Will change.
Comment 16 Chris Jerdonek 2010-05-21 22:00:26 PDT
(In reply to comment #15)
> > It might be nice to add a comment here that for the special case of an
> > SVN binary diff, there can be binary contents after the footer (so
> > technically speaking, the "Property changes" section is not the full
> > footer).  Should we be calling this parseSvnPropertyChanges() since it's
> > not really always the footer?
> 
> I can add a comment about the binary diff. With regards to the name, what are your thoughts on parseSvnDiffPropertyChanges? This would follow our naming of parseDiff, parseDiffHeader, parseSvnDiffHeader, et cetera.

In that case, perhaps parseSvnDiffProperties() to keep it a tad shorter?

> > > +Added: documentation
> > > +   + A sentence.
> > 
> > This is not a complete sentence.  JK. :)
> 
> Will change.

That was a joke, so no need (JK = just kidding).  But you can if you want.
Comment 17 Daniel Bates 2010-05-22 14:40:55 PDT
Committed r60015: <http://trac.webkit.org/changeset/60015>