RESOLVED FIXED 111066
svn-apply cannot apply patches which is generated by git to files that contain space characters in their path
https://bugs.webkit.org/show_bug.cgi?id=111066
Summary svn-apply cannot apply patches which is generated by git to files that contai...
Yuki Sekiguchi
Reported 2013-02-28 03:50:37 PST
WebKit Review Bot fail to apply the patch: https://bug-111042-attachments.webkit.org/attachment.cgi?id=190663 > Failed to run "['/mnt/git/webkit-commit-queue/Tools/Scripts/webkit-patch', '--status-host=webkit-commit-queue.appspot.com', '--bot-id=gce-cq-03', 'apply-attachment', '--no-update', '--non-interactive', 190663, '--port=chromium-xvfb']" exit_code: 2 cwd: /mnt/git/webkit-commit-queue > > Logging in as webkit.review.bot@gmail.com... > Fetching: https://bugs.webkit.org/attachment.cgi?id=190663&action=edit > Fetching: https://bugs.webkit.org/show_bug.cgi?id=111042&ctype=xml&excludefield=attachmentdata > Processing 1 patch from 1 bug. > Processing patch 190663 from bug 111042. > Failed to run "[u'/mnt/git/webkit-commit-queue/Tools/Scripts/svn-apply', '--force', '--reviewer', 'Alexey Proskuryakov']" exit_code: 1 cwd: /mnt/git/webkit-commit-queue > > Parsed 2 diffs from patch file(s). > patching file ChangeLog > patching file 'WebProcess).xcscheme' > Hunk #1 FAILED at 161. > 1 out of 1 hunk FAILED -- saving rejects to file 'WebProcess).xcscheme.rej' > > Failed to run "[u'/mnt/git/webkit-commit-queue/Tools/Scripts/svn-apply', '--force', '--reviewer', 'Alexey Proskuryakov']" exit_code: 1 cwd: /mnt/git/webkit-commit-queue
Attachments
Patch (2.52 KB, patch)
2013-02-28 04:12 PST, Yuki Sekiguchi
no flags
Patch (14.53 KB, patch)
2013-02-28 22:07 PST, Yuki Sekiguchi
no flags
Patch (18.48 KB, patch)
2013-05-23 22:17 PDT, Yuki Sekiguchi
no flags
Archive of layout-test-results from webkit-ews-09 for mac-mountainlion-wk2 (683.94 KB, application/zip)
2013-05-24 00:29 PDT, Build Bot
no flags
Patch (20.46 KB, patch)
2013-05-26 23:18 PDT, Yuki Sekiguchi
no flags
Patch (18.23 KB, patch)
2013-06-04 04:54 PDT, Yuki Sekiguchi
no flags
Patch (20.48 KB, patch)
2013-06-05 01:31 PDT, Yuki Sekiguchi
no flags
Patch (20.54 KB, patch)
2013-06-05 23:56 PDT, Yuki Sekiguchi
no flags
Yuki Sekiguchi
Comment 1 2013-02-28 04:12:57 PST
Yuki Sekiguchi
Comment 2 2013-02-28 04:21:39 PST
Hi Daniel, Could you review this patch? I heard you are the best person to review svn-apply from Vivek.
Daniel Bates
Comment 3 2013-02-28 16:55:56 PST
Comment on attachment 190699 [details] Patch We need a test for this change. See the existing tests for this function in Tools/Scripts/webkitperl/VCSUtils_unittest/parseGitDiffHeader.pl. Run Tools/Scripts/test-webkitperl to run the Perl unit tests.
Yuki Sekiguchi
Comment 4 2013-02-28 22:07:30 PST
Yuki Sekiguchi
Comment 5 2013-02-28 22:22:51 PST
Hi Daniel, Thank you for reviewing. (In reply to comment #3) > (From update of attachment 190699 [details]) > We need a test for this change. See the existing tests for this function in Tools/Scripts/webkitperl/VCSUtils_unittest/parseGitDiffHeader.pl. Run Tools/Scripts/test-webkitperl to run the Perl unit tests. I added test cases and update expectations. I found that my fix break --no-prefix diff. I added support --no-prefix diff with empty file path.
Daniel Bates
Comment 6 2013-05-19 23:30:34 PDT
*** Bug 115281 has been marked as a duplicate of this bug. ***
Daniel Bates
Comment 7 2013-05-19 23:33:11 PDT
Comment on attachment 190883 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=190883&action=review Please excuse my tardy reply. Work has been very hectic. Regardless, I will work to be more responsive moving forward. Feel free to ping me via either email or IRC to expedite a response. This patch won't work for git diffs generated using: git diff --no-prefix --find-renames --find-copies. For example, the following diff renames the file named "Primary File" to "Secondary File": diff --git Primary File Secondary File similarity index 100% rename from Primary File rename to Secondary File Maybe it's uncommon to rename or copy a file whose path contains a space character? > Tools/Scripts/VCSUtils.pm:753 > + $_ = "--- $indexPath\t(revision 0)"; # Convert to SVN format. We should add a comment above this line that explains that we explicitly emit the suffix "\t(revision 0)" so that both the command patch(1) won't complain when $indexPath contains a space character and to more closely match the SVN diff format. > Tools/Scripts/VCSUtils.pm:755 > + $_ = "+++ $indexPath\t(revision 0)"; # Convert to SVN format. I suggest we emit the suffix "(working copy)" instead of "(revision 0)" in this line so as match the output of the command svn diff. Additionally, we should add a comment above this line that explains that we explicitly emit the suffix "\t(working copy)" so that both the command patch(1) won't complain when $indexPath contains a space character and to more closely match the SVN diff format.
Alexey Proskuryakov
Comment 8 2013-05-20 09:24:07 PDT
Yuki Sekiguchi
Comment 9 2013-05-23 22:17:58 PDT
Yuki Sekiguchi
Comment 10 2013-05-23 23:16:25 PDT
Hi Daniel, (In reply to comment #7) > (From update of attachment 190883 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=190883&action=review > > Please excuse my tardy reply. Work has been very hectic. Regardless, I will work to be more responsive moving forward. Feel free to ping me via either email or IRC to expedite a response. No problem. Thank your for your help. > This patch won't work for git diffs generated using: git diff --no-prefix --find-renames --find-copies. For example, the following diff renames the file named "Primary File" to "Secondary File": Fixed that case. > > Tools/Scripts/VCSUtils.pm:753 > > + $_ = "--- $indexPath\t(revision 0)"; # Convert to SVN format. > > We should add a comment above this line that explains that we explicitly emit the suffix "\t(revision 0)" so that both the command patch(1) won't complain when $indexPath contains a space character and to more closely match the SVN diff format. > > > Tools/Scripts/VCSUtils.pm:755 > > + $_ = "+++ $indexPath\t(revision 0)"; # Convert to SVN format. > > I suggest we emit the suffix "(working copy)" instead of "(revision 0)" in this line so as match the output of the command svn diff. > > Additionally, we should add a comment above this line that explains that we explicitly emit the suffix "\t(working copy)" so that both the command patch(1) won't complain when $indexPath contains a space character and to more closely match the SVN diff format. Changed suffix to "(working copy)" and added comment.
Build Bot
Comment 11 2013-05-24 00:29:25 PDT
Comment on attachment 202766 [details] Patch Attachment 202766 [details] did not pass mac-wk2-ews (mac-wk2): Output: http://webkit-queues.appspot.com/results/695004 New failing tests: editing/style/5065910.html
Build Bot
Comment 12 2013-05-24 00:29:27 PDT
Created attachment 202776 [details] Archive of layout-test-results from webkit-ews-09 for mac-mountainlion-wk2 The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews. Bot: webkit-ews-09 Port: mac-mountainlion-wk2 Platform: Mac OS X 10.8.3
Daniel Bates
Comment 13 2013-05-24 12:33:33 PDT
Comment on attachment 202766 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=202766&action=review Thank you Yuki for updating the patch. The idea presented in this patch is good. I suggest that we encapsulate the parsing machinery to determine the path to the target file into a separate function, say parseGitDiffHeaderForIndexPath, that takes a reference to a file handle F and the last line read from F as input and returns the path to the target file as output (or the empty string on error). parseGitDiffHeaderForIndexPath() will guarantee that F isn't advanced when it returns. That is, if F is at byte b_0 when parseGitDiffHeaderForIndexPath() is called then F will be at byte b_0 when the call returns. This function would have the form: # Parse the Git diff header for the index path from the given file handle. # # This subroutine doesn't advance the handle. # # Args: # $fileHandle: a file handle; this should be at a line beginning # with "diff --git". The handle will not be advanced. # $line: the line last read from $fileHandle # # Returns $indexPath: the path of the target file. sub parseGitDiffHeaderForIndexPath($$) { my ($fileHandle, $line) = @_; $_ = $line; if (/$gitDiffStartWithPrefixRegEx/ || /$gitDiffStartWithoutPrefixNoSpaceRegEx/) { return $2; } # Assume the diff was generated with --no-prefix (e.g. git diff --no-prefix). my $sourcePath = ""; my $destinationPath = ""; my $savedPosition = tell($fileHandle); while (1) { # Temporarily strip off any end-of-line characters to simplify # regex matching below. s/([\n\r]+)$//; my $foundHeaderEnding; if (/^(copy|rename) to ([^\t\r\n]+)/) { $destinationPath = $2; } elsif (/^--- ([^\t\r\n]+)/) { $sourcePath = $1; } elsif (/^\+\+\+ ([^\t\r\n]+)/) { $destinationPath = $1; $foundHeaderEnding = 1; } elsif (/^GIT binary patch$/) { $foundHeaderEnding = 1; } $_ = <$fileHandle>; # Not defined if end-of-file reached. last if (!defined($_) || /$gitDiffStartRegEx/ || $foundHeaderEnding); } seek($fileHandle, $savedPosition, 0); my $indexPath = ""; if ($destinationPath eq "/dev/null") { # Deletion $indexPath = $sourcePath; } else { # Addition, modification, copy, or rename. $indexPath = $destinationPath; } return $indexPath; } Then in parseGitDiffHeader() we can write $indexPath = adjustPathForRecentRenamings(parseGitDiffHeaderForIndexPath($fileHandle, $line)) when $_ matches $gitDiffStartRegEx. When $indexPath is a non-empty string we construct the "Index:" line taking care to append the correct line ending (we'll need to save $POSTMATCH before calling parseGitDiffHeaderForIndexPath() as it will change on function return). Otherwise, we die() with an error message, say "Could not determine target file path from diff." > Tools/Scripts/VCSUtils.pm:110 > my $gitDiffStartRegEx = qr#^diff --git (\w/)?(.+) (\w/)?([^\r\n]+)#; We seem to only use this regular expression to match the prefix "diff --git" and to extract the end of line character. Can we simplify this expression? > Tools/Scripts/VCSUtils.pm:111 > +my $gitDiffStartWithPrefixRegEx = qr#^diff --git a/(.+) b/([^\r\n]+)#; This is OK as-is. Notice that the source- (a/) and destination- prefix (b/) can be modified by specifying to the git diff command the optional command line arguments --src-prefix and --dst-prefix per <https://www.kernel.org/pub/software/scm/git/docs/v1.7.3/git-diff.html>. I assume this influenced the choice of the special escape \w in the regular expression $gitDiffStartRegEx as opposed to hardcoding the prefixes. The use of the special escape \w is insufficient to match an arbitrary source and destination prefix. Obviously the use of the special escape \w matches more variants of the diff --git line than a diff --git line with hardcoded source and destination prefixes. > Tools/Scripts/VCSUtils.pm:701 > + if (/^\+\+\+ (.+)/) { > + $indexPath = adjustPathForRecentRenamings($1); Consider the following non-prefixed diff that deletes the file named "TWO WORDS.txt": diff --git TWO WORDS.txt TWO WORDS.txt deleted file mode 100644 index 03a05ce..0000000 --- TWO WORDS.txt +++ /dev/null @@ -1 +0,0 @@ -Sample Then $indexPath will be /dev/null. But it should be "TWO WORDS.txt". > Tools/Scripts/VCSUtils.pm:704 > + $indexPath = $1; Don't we need to adjust the index path here (i.e. call adjustPathForRecentRenamings())? > Tools/Scripts/VCSUtils.pm:707 > + $indexPath = $1; Ditto. > Tools/Scripts/VCSUtils.pm:748 > + } elsif (/^copy from ([^\r\n]+)/) { I suggest we strengthen this regular expression and exclude tab characters from being considered in a path. > Tools/Scripts/VCSUtils.pm:750 > + } elsif (/^rename from ([^\r\n]+)/) { Ditto. > Tools/Scripts/VCSUtils.pm:762 > + # The patch(1) command thinks a file path is characters before the parenthesis. Is this correct? I thought patch(1) reads all characters up to the first tab character. > Tools/Scripts/VCSUtils.pm:768 > + # Please watch the comments in the "elsif (/^--- \S+/)" line for detail. I think it would be clearer to duplicate the last two lines of the comment for the "elsif (/^--- \S+/)" branch than to refer to it. If you don't like such duplication then I suggest updating this sentence to read (*): See the comment for the /^--- \S+/ case (above) for more details. (*) It's unusual to say "watch the comments" as the word "watch" implies that a person should keep the comment in view as if it were to change at any moment.
Yuki Sekiguchi
Comment 14 2013-05-26 23:18:00 PDT
Yuki Sekiguchi
Comment 15 2013-05-27 00:16:37 PDT
Hi Daniel, Thank you for reviewing. (In reply to comment #13) > (From update of attachment 202766 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=202766&action=review > > Thank you Yuki for updating the patch. > > The idea presented in this patch is good. I suggest that we encapsulate the parsing machinery to determine the path to the target file into a separate function, say parseGitDiffHeaderForIndexPath, that takes a reference to a file handle F and the last line read from F as input and returns the path to the target file as output (or the empty string on error). parseGitDiffHeaderForIndexPath() will guarantee that F isn't advanced when it returns. That is, if F is at byte b_0 when parseGitDiffHeaderForIndexPath() is called then F will be at byte b_0 when the call returns. This function would have the form: Refactoring is good, but why should we use file handle as input of parseGitDiffHeaderForIndexPath()? We have to read file to create @diffLines. I think creating @diffLines and using it from parseGitDiffHeaderForIndexPath() and parseGitDiffHeader() is better than reading file twice. My patch use @diffLines for the input of parseGitDiffHeaderForIndexPath(). > Then in parseGitDiffHeader() we can write $indexPath = adjustPathForRecentRenamings(parseGitDiffHeaderForIndexPath($fileHandle, $line)) when $_ matches $gitDiffStartRegEx. When $indexPath is a non-empty string we construct the "Index:" line taking care to append the correct line ending (we'll need to save $POSTMATCH before calling parseGitDiffHeaderForIndexPath() as it will change on function return). Otherwise, we die() with an error message, say "Could not determine target file path from diff." Fixed. > > > Tools/Scripts/VCSUtils.pm:110 > > my $gitDiffStartRegEx = qr#^diff --git (\w/)?(.+) (\w/)?([^\r\n]+)#; > > We seem to only use this regular expression to match the prefix "diff --git" and to extract the end of line character. Can we simplify this expression? Yes. Simplified. > > Tools/Scripts/VCSUtils.pm:111 > > +my $gitDiffStartWithPrefixRegEx = qr#^diff --git a/(.+) b/([^\r\n]+)#; > > This is OK as-is. Notice that the source- (a/) and destination- prefix (b/) can be modified by specifying to the git diff command the optional command line arguments --src-prefix and --dst-prefix per <https://www.kernel.org/pub/software/scm/git/docs/v1.7.3/git-diff.html>. I assume this influenced the choice of the special escape \w in the regular expression $gitDiffStartRegEx as opposed to hardcoding the prefixes. The use of the special escape \w is insufficient to match an arbitrary source and destination prefix. Obviously the use of the special escape \w matches more variants of the diff --git line than a diff --git line with hardcoded source and destination prefixes. Used \w+, and added comment the limitation. > > Tools/Scripts/VCSUtils.pm:701 > > + if (/^\+\+\+ (.+)/) { > > + $indexPath = adjustPathForRecentRenamings($1); > > Consider the following non-prefixed diff that deletes the file named "TWO WORDS.txt": > > diff --git TWO WORDS.txt TWO WORDS.txt > deleted file mode 100644 > index 03a05ce..0000000 > --- TWO WORDS.txt > +++ /dev/null > @@ -1 +0,0 @@ > -Sample > > Then $indexPath will be /dev/null. But it should be "TWO WORDS.txt". I added a test case and confirmed that your suggested function fixed this. Thanks! > > Tools/Scripts/VCSUtils.pm:704 > > + $indexPath = $1; > > Don't we need to adjust the index path here (i.e. call adjustPathForRecentRenamings())? > > > Tools/Scripts/VCSUtils.pm:707 > > + $indexPath = $1; > > Ditto. Yes, we need. > > Tools/Scripts/VCSUtils.pm:748 > > + } elsif (/^copy from ([^\r\n]+)/) { > > I suggest we strengthen this regular expression and exclude tab characters from being considered in a path. > > > Tools/Scripts/VCSUtils.pm:750 > > + } elsif (/^rename from ([^\r\n]+)/) { > > Ditto. Added \t. > > Tools/Scripts/VCSUtils.pm:762 > > + # The patch(1) command thinks a file path is characters before the parenthesis. > > Is this correct? I thought patch(1) reads all characters up to the first tab character. This is not correct. You are right. In GNU patch code, fetchname() @ util.c do that. > > Tools/Scripts/VCSUtils.pm:768 > > + # Please watch the comments in the "elsif (/^--- \S+/)" line for detail. > > I think it would be clearer to duplicate the last two lines of the comment for the "elsif (/^--- \S+/)" branch than to refer to it. If you don't like such duplication then I suggest updating this sentence to read (*): See the comment for the /^--- \S+/ case (above) for more details. Just copied commets. > (*) It's unusual to say "watch the comments" as the word "watch" implies that a person should keep the comment in view as if it were to change at any moment. English is very difficult for me like perl :(
Yuki Sekiguchi
Comment 16 2013-05-31 04:50:22 PDT
Hi Daniel, Could you review my patch?
Daniel Bates
Comment 17 2013-06-01 14:05:37 PDT
I spoke with Chris Jerdonek, who wrote most of the Git diff parsing machinery, about this patch. We felt that it's sufficient to start with a simple solution and iterate on it as needed. One such simple solution is to use the first path segment w of the source file path to determine where the target file path begins. Consider the following diff --git line: diff --git WebKit.xcworkspace/xcshareddata/xcschemes/All Source (target WebProcess).xcscheme WebKit.xcworkspace/xcshareddata/xcschemes/All Source (target WebProcess)2.xcscheme Take w := "WebKit.xcworkspace/". Then locate the second occurrence of w and split the string accordingly. The source and destination file paths would then be "WebKit.xcworkspace/xcshareddata/xcschemes/All Source (target WebProcess).xcscheme", and "WebKit.xcworkspace/xcshareddata/xcschemes/All Source (target WebProcess)2.xcscheme", respectively. Such a solution would have the form: sub parseGitDiffHeader($$) { ... my $indexPath; if (/$gitDiffStartRegEx/) { ... $indexPath = adjustPathForRecentRenamings(parseGitDiffStartLineForTargetPath($_)); # Use $POSTMATCH to preserve the end-of-line character. $_ = "Index: $indexPath$POSTMATCH"; # Convert to SVN format. } else { die("Could not parse leading \"diff --git\" line: \"$line\"."); } ... while (1) { ... } elsif (/^--- \S+/) { ... $_ = "--- $indexPath\t(revision 0)"; } elsif (/^\+\+\+ \S+/) { ... $_ = "+++ $indexPath\t(working copy)"; } ... } ... } sub parseGitDiffStartLine($) { ... if (/$gitDiffStartWithPrefixRegEx/ || /$gitDiffStartWithoutPrefixNoSpaceRegEx/) { return $2; } ... } We expect that this approach will handle almost all cases in practice. However, initially it won't handle cases where the top-level directory changes or certain rare cases where the first path segment may occur later in the source path (e.g. diff --git A/B A/C.txt A/D.txt; source path = "A/B A/C.txt", target path = "A/D.txt"). Moreover, if we ever want to handle non-prefix Git diffs for binary files, the approach of parsing only the first line will be necessary because such diffs only reference the source and target path in this line. Starting with a simple solution is attractive as the implementation tends to be concise and maintains the readability and hackability of the code base. Should the simple solution turn out to be insufficient we can iterate on it towards a more comprehensive solution.
David Farler
Comment 18 2013-06-03 16:28:25 PDT
(In reply to comment #17) > I spoke with Chris Jerdonek, who wrote most of the Git diff parsing machinery, about this patch. We felt that it's sufficient to start with a simple solution and iterate on it as needed. One such simple solution is to use the first path segment w of the source file path to determine where the target file path begins. Consider the following diff --git line: > > diff --git WebKit.xcworkspace/xcshareddata/xcschemes/All Source (target WebProcess).xcscheme WebKit.xcworkspace/xcshareddata/xcschemes/All Source (target WebProcess)2.xcscheme > > Take w := "WebKit.xcworkspace/". Then locate the second occurrence of w and split the string accordingly. The source and destination file paths would then be "WebKit.xcworkspace/xcshareddata/xcschemes/All Source (target WebProcess).xcscheme", and "WebKit.xcworkspace/xcshareddata/xcschemes/All Source (target WebProcess)2.xcscheme", respectively. > > Such a solution would have the form: > > sub parseGitDiffHeader($$) > { > ... > my $indexPath; > if (/$gitDiffStartRegEx/) { > ... > $indexPath = adjustPathForRecentRenamings(parseGitDiffStartLineForTargetPath($_)); > # Use $POSTMATCH to preserve the end-of-line character. > $_ = "Index: $indexPath$POSTMATCH"; # Convert to SVN format. > } else { > die("Could not parse leading \"diff --git\" line: \"$line\"."); > } > ... > while (1) { > ... > } elsif (/^--- \S+/) { > ... > $_ = "--- $indexPath\t(revision 0)"; > } elsif (/^\+\+\+ \S+/) { > ... > $_ = "+++ $indexPath\t(working copy)"; > } > ... > } > ... > } > > sub parseGitDiffStartLine($) > { > ... > if (/$gitDiffStartWithPrefixRegEx/ || /$gitDiffStartWithoutPrefixNoSpaceRegEx/) { > return $2; > } > ... > } > > We expect that this approach will handle almost all cases in practice. However, initially it won't handle cases where the top-level directory changes or certain rare cases where the first path segment may occur later in the source path (e.g. diff --git A/B A/C.txt A/D.txt; source path = "A/B A/C.txt", target path = "A/D.txt"). Moreover, if we ever want to handle non-prefix Git diffs for binary files, the approach of parsing only the first line will be necessary because such diffs only reference the source and target path in this line. > > Starting with a simple solution is attractive as the implementation tends to be concise and maintains the readability and hackability of the code base. Should the simple solution turn out to be insufficient we can iterate on it towards a more comprehensive solution. If we like, we can also just see how git does it: builtin/apply.c static int find_header(const char *line, unsigned long size, int *hdrsize, struct patch *patch) :1490 static int parse_git_header(const char *line, int len, unsigned int size, struct patch *patch) :1280 static char *git_header_name(const char *line, int llen)
Yuki Sekiguchi
Comment 19 2013-06-04 04:54:43 PDT
Yuki Sekiguchi
Comment 20 2013-06-04 05:05:22 PDT
Hi Daniel, (In reply to comment #17) > I spoke with Chris Jerdonek, who wrote most of the Git diff parsing machinery, about this patch. We felt that it's sufficient to start with a simple solution and iterate on it as needed. One such simple solution is to use the first path segment w of the source file path to determine where the target file path begins. Consider the following diff --git line: > > diff --git WebKit.xcworkspace/xcshareddata/xcschemes/All Source (target WebProcess).xcscheme WebKit.xcworkspace/xcshareddata/xcschemes/All Source (target WebProcess)2.xcscheme > > Take w := "WebKit.xcworkspace/". Then locate the second occurrence of w and split the string accordingly. The source and destination file paths would then be "WebKit.xcworkspace/xcshareddata/xcschemes/All Source (target WebProcess).xcscheme", and "WebKit.xcworkspace/xcshareddata/xcschemes/All Source (target WebProcess)2.xcscheme", respectively. OK. Clean and simple code is good :) > We expect that this approach will handle almost all cases in practice. However, initially it won't handle cases where the top-level directory changes or certain rare cases where the first path segment may occur later in the source path (e.g. diff --git A/B A/C.txt A/D.txt; source path = "A/B A/C.txt", target path = "A/D.txt"). Moreover, if we ever want to handle non-prefix Git diffs for binary files, the approach of parsing only the first line will be necessary because such diffs only reference the source and target path in this line. > > Starting with a simple solution is attractive as the implementation tends to be concise and maintains the readability and hackability of the code base. Should the simple solution turn out to be insufficient we can iterate on it towards a more comprehensive solution. I found that moving a file through sub directories of top directory is not supported, too (e.g diff --git A/B.txt C/B.txt). However, I think this is also rare case. I partial reverted your suggestion in Comment #13 which support --src-prefix and --dst-prefix. We cannot distinguish the prefix from path prefix, so I think a single character is --src-prefix or --dst-prefix.
Yuki Sekiguchi
Comment 21 2013-06-04 05:15:39 PDT
(In reply to comment #18) > If we like, we can also just see how git does it: > > builtin/apply.c > > static int find_header(const char *line, unsigned long size, int *hdrsize, struct patch *patch) > :1490 > static int parse_git_header(const char *line, int len, unsigned int size, struct patch *patch) > :1280 > static char *git_header_name(const char *line, int llen) Thank you for good information. I looked the code. Summary: * git get paths from "git --diff" line only if the file is changed. * When the file is renamed, copied or deleted, git gets paths from diff file (e.g. "+++ foo.h"). * To get paths from a diff line which contains paths which have space, git splits the diff line with space or tab, and it checks the two paths is same. * To support --src-prefix and --dst-prefix, git use -p parameter which is same as patch(1) or heuristically guess -p parameter of a diff using the diff and relative path from the .git directory .
Daniel Bates
Comment 22 2013-06-04 14:54:00 PDT
Comment on attachment 203682 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=203682&action=review This patch looks good. I have some minor nits. > Tools/ChangeLog:8 > + If the path in a diff has space characters, parseGitDiffHeader() considers last nonspace characters as file path. I'm unclear what you mean by this sentence. > Tools/ChangeLog:9 > + When the diff have prefix, we consider next characters after "b/" as file path. Is this sentence correct? I thought we currently considered the next characters after "b/" as part of a file path? > Tools/ChangeLog:12 > + We only support --src-prefix and --dst-prefix don't have a \W and end with '/' because we cannot distinguish the prefix from directory path. For your consideration, I suggest we write out the meaning of \W in English and hence change "have a \W" to "contain a non-word character (\W)". > Tools/Scripts/VCSUtils.pm:111 > +my $gitDiffStartWithPrefixRegEx = qr#^diff --git \w/(.+) \w/([^\r\n]+)#; # We suppose that --src-prefix and --dst-prefix don't have a \W and end with '/'. For your consideration, I suggest we write out the meaning of \W in English and hence change "have a \W" to "contain a non-word character (\W)". > Tools/Scripts/VCSUtils.pm:114 > +my $gitDiffPathPrefix = qr#^diff --git ([^\/]+\/)#; It's unnecessary to escape the '/' character in this regular expression since we use '#' as the delimiter. Thus, we can simplify the regular expression in this line to read: ^diff --git ([^/]+/) On another note, I wish we could come up with a more descriptive name for this variable. Maybe $gitDiffStartWithoutPrefixFirstDirectoryNameRegExp? or $gitDiffStartWithoutPrefixSourceDirectoryPrefixRegExp? or $gitDiffStartPathPrefixRegExp? > Tools/Scripts/VCSUtils.pm:624 > +# $line: the line last read from $fileHandle The description of this argument is incorrect. We expect $line to be the diff --git line. > Tools/Scripts/VCSUtils.pm:626 > +# Returns $indexPath: the path of the target file. There is no such local variable called $indexPath. I would update this line to read: Returns the path of the target file. > Tools/Scripts/VCSUtils.pm:636 > + # Moving top directory file is not supported (e.g diff --git A.txt B.txt). We should make this a FIXME comment. > Tools/Scripts/VCSUtils.pm:637 > + die("Could not find '/' in \"diff --git\" line: \"$line\".\nPlease try git diff without --no-prefix"); I suggest we further elaborate on the reason we failed similar to the comment on the line above. Maybe something like: "Could not find '/' in \"diff --git\" line: \"$line\"; only non-prefixed git diffs (i.e. not generated with --no-prefix) that move a top-level directory file are supported." > Tools/Scripts/VCSUtils.pm:640 > + if (!/^diff --git $pathPrefix.+ ($pathPrefix.+)$/) { We should disable all metacharacters in $pathPrefix when interpolating its value by surrounding it with the special escapes \Q, \E: /^diff --git \Q$pathPrefix\E.+ (\Q$pathPrefix\E.+)$/ For completeness, the special escape characters \Q and \E are discussed in <http://perldoc.perl.org/perlop.html#Quote-and-Quote-like-Operators>. > Tools/Scripts/VCSUtils.pm:641 > + # Moving a file through sub directories of top directory is not supported (e.g diff --git A/B.txt C/B.txt). We should make this a FIXME coment. > Tools/Scripts/VCSUtils.pm:642 > + die("Could not find '/' in \"diff --git\" line: \"$line\".\nPlease try git diff without --no-prefix"); I suggest we further elaborate on the reason we failed similar to the comment on the line above. Maybe something like: "Could not find '/' in \"diff --git\" line: \"$line\"; only non-prefixed git diffs (i.e. not generated with --no-prefix) that move a file between top-level directories are supported." > Tools/Scripts/VCSUtils.pm:740 > + # This suffix make our diff more closely match the SVN diff format. Nit: make => makes > Tools/Scripts/VCSUtils.pm:746 > + # This suffix make our diff more closely match the SVN diff format. Nit: make => makes > Tools/Scripts/webkitperl/VCSUtils_unittest/parseGitDiffHeader.pl:84 > + diffName => "Modified file which have empty characters in path", Nit: "empty" => "space" > Tools/Scripts/webkitperl/VCSUtils_unittest/parseGitDiffHeader.pl:107 > +{ # New test These test cases are good. For your consideration, I suggest we create a test case that has a diff --git line identical to one in attachment #190663 [details] (which is the patch that revealed the bug in our Git diff parsing code): diff --git a/WebKit.xcworkspace/xcshareddata/xcschemes/All Source (target WebProcess).xcscheme b/WebKit.xcworkspace/xcshareddata/xcschemes/All Source (target WebProcess).xcscheme > Tools/Scripts/webkitperl/VCSUtils_unittest/parseGitDiffHeader.pl:108 > + diffName => "Modified file which have empty characters in path using --no-prefix", Nit: "empty" => "space" > Tools/Scripts/webkitperl/VCSUtils_unittest/parseGitDiffHeader.pl:186 > + diffName => "delete file which have empty characters in path using --no-prefix", Nit: "empty" => "space" > Tools/Scripts/webkitperl/VCSUtils_unittest/parseGitDiffHeader.pl:264 > + diffName => "copy file which have empty characters in path using --no-prefix (with similarity index 100%)", Ditto. > Tools/Scripts/webkitperl/VCSUtils_unittest/parseGitDiffHeader.pl:338 > + diffName => "rename file which have empty characters in path using --no-prefix (with similarity index 100%)", Ditto.
Yuki Sekiguchi
Comment 23 2013-06-05 01:31:25 PDT
Yuki Sekiguchi
Comment 24 2013-06-05 01:43:57 PDT
Hi Daniel, Thank you for kindly review. I fixed codes which you comment. (In reply to comment #22) > (From update of attachment 203682 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=203682&action=review > > This patch looks good. I have some minor nits. > > > Tools/ChangeLog:8 > > + If the path in a diff has space characters, parseGitDiffHeader() considers last nonspace characters as file path. > > I'm unclear what you mean by this sentence. I want to describe bug cause. I changed this to the past tense. Is this make sense? > > > Tools/ChangeLog:9 > > + When the diff have prefix, we consider next characters after "b/" as file path. > > Is this sentence correct? I thought we currently considered the next characters after "b/" as part of a file path? Sorry, I cannot understand your comment. I added "part of a", but my fix might be wrong. > > Tools/Scripts/VCSUtils.pm:114 > > +my $gitDiffPathPrefix = qr#^diff --git ([^\/]+\/)#; > > It's unnecessary to escape the '/' character in this regular expression since we use '#' as the delimiter. Thus, we can simplify the regular expression in this line to read: ^diff --git ([^/]+/) > > On another note, I wish we could come up with a more descriptive name for this variable. Maybe $gitDiffStartWithoutPrefixFirstDirectoryNameRegExp? or $gitDiffStartWithoutPrefixSourceDirectoryPrefixRegExp? or $gitDiffStartPathPrefixRegExp? I like $gitDiffStartWithoutPrefixSourceDirectoryPrefixRegExp :)
Daniel Bates
Comment 25 2013-06-05 05:07:39 PDT
Comment on attachment 203682 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=203682&action=review >>> Tools/ChangeLog:8 >>> + If the path in a diff has space characters, parseGitDiffHeader() considers last nonspace characters as file path. >> >> I'm unclear what you mean by this sentence. > > I want to describe bug cause. > I changed this to the past tense. > Is this make sense? I think a better way to phrase this sentence is to write: Fixes an issue where parseGitDiffHeader() would extract the wrong substring of the diff --git line as the target file path when the source file path contains a space character. >>> Tools/ChangeLog:9 >>> + When the diff have prefix, we consider next characters after "b/" as file path. >> >> Is this sentence correct? I thought we currently considered the next characters after "b/" as part of a file path? > > Sorry, I cannot understand your comment. > I added "part of a", but my fix might be wrong. OK. This sentence reads better with the proposed correction.
Daniel Bates
Comment 26 2013-06-05 05:09:49 PDT
Comment on attachment 203769 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=203769&action=review > Tools/ChangeLog:8 > + If the path in diff start line has space characters, parseGitDiffHeader() considered last non-space characters as file path. As aforementioned in comment #25, I think a better way to phrase this sentence would be: Fixes an issue where parseGitDiffHeader() would extract the wrong substring of the diff --git line as the target file path when the source file path contains a space character.
Yuki Sekiguchi
Comment 27 2013-06-05 23:56:17 PDT
Yuki Sekiguchi
Comment 28 2013-06-06 00:07:11 PDT
Hi Daniel, (In reply to comment #26) > (From update of attachment 203769 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=203769&action=review > > > Tools/ChangeLog:8 > > + If the path in diff start line has space characters, parseGitDiffHeader() considered last non-space characters as file path. > > As aforementioned in comment #25, I think a better way to phrase this sentence would be: > > Fixes an issue where parseGitDiffHeader() would extract the wrong substring of the diff --git line as the target file path when the source file path contains a space character. Thank you. Fixed.
Daniel Bates
Comment 29 2013-06-06 07:16:36 PDT
Comment on attachment 203907 [details] Patch r=me
WebKit Commit Bot
Comment 30 2013-06-06 16:37:25 PDT
Comment on attachment 203907 [details] Patch Clearing flags on attachment: 203907 Committed r151300: <http://trac.webkit.org/changeset/151300>
WebKit Commit Bot
Comment 31 2013-06-06 16:37:29 PDT
All reviewed patches have been landed. Closing bug.
Note You need to log in before you can comment on or make changes to this bug.