Bug 110356

Summary: Parse author names with commas in ChangeLogs
Product: WebKit Reporter: Tony Chang <tony>
Component: New BugsAssignee: Tony Chang <tony>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, cgarcia, dpranke, eric, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on:    
Bug Blocks: 109303    
Attachments:
Description Flags
Patch
none
Patch
none
Patch for landing none

Tony Chang
Reported 2013-02-20 10:46:27 PST
Parse author names with commas in ChangeLogs
Attachments
Patch (3.52 KB, patch)
2013-02-20 10:48 PST, Tony Chang
no flags
Patch (4.68 KB, patch)
2013-02-20 12:15 PST, Tony Chang
no flags
Patch for landing (4.90 KB, patch)
2013-02-20 13:44 PST, Tony Chang
no flags
Tony Chang
Comment 1 2013-02-20 10:48:57 PST
Tony Chang
Comment 2 2013-02-20 10:49:25 PST
This is kind of gross, alternate suggestions welcome.
Tony Chang
Comment 3 2013-02-20 10:50:06 PST
Dirk Pranke
Comment 4 2013-02-20 11:52:16 PST
Comment on attachment 189339 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=189339&action=review > Tools/ChangeLog:8 > + PaweÅ's name has a comma in it, which was confusing the ChangeLog parser. interesting that this is showing up as \u212b (angstrom sign) rather than \u0142 (latin small l with stroke) in the bugzilla diff. What is it in the file? Perhaps there's a separate bug for some character encoding conversion here? > Tools/Scripts/webkitpy/common/checkout/changelog.py:150 > + return names It looks like this function is used in two places, in the author text and in the reviewer text. If I'm correctly understanding this patch, this will work for the author_text case (where we want name_and_email=True), but I think if Pawel put his full name as one of the reviewers, we'd probably still be stuck? Also, it seems like maybe we should just split this into two different functions, and maybe define the regex as a constant or something. Combining them into one function makes it hard to follow (especially lines 148-149).
Tony Chang
Comment 5 2013-02-20 12:15:33 PST
Tony Chang
Comment 6 2013-02-20 12:16:15 PST
(In reply to comment #4) > (From update of attachment 189339 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=189339&action=review > > > Tools/ChangeLog:8 > > + PaweÅ's name has a comma in it, which was confusing the ChangeLog parser. > > interesting that this is showing up as \u212b (angstrom sign) rather than \u0142 (latin small l with stroke) in the bugzilla diff. What is it in the file? Perhaps there's a separate bug for some character encoding conversion here? This is a bug in PrettyPatch.rb. The ruby script isn't unicode aware. > > Tools/Scripts/webkitpy/common/checkout/changelog.py:150 > > + return names > > It looks like this function is used in two places, in the author text and in the reviewer text. If I'm correctly understanding this patch, this will work for the author_text case (where we want name_and_email=True), but I think if Pawel put his full name as one of the reviewers, we'd probably still be stuck? That's true, it will fail for reviewers. I'm OK with punting this problem until Pawel becomes a reviewer. > Also, it seems like maybe we should just split this into two different functions, and maybe define the regex as a constant or something. Combining them into one function makes it hard to follow (especially lines 148-149). Done.
Tony Chang
Comment 7 2013-02-20 13:44:31 PST
Created attachment 189366 [details] Patch for landing
Dirk Pranke
Comment 8 2013-02-20 13:45:06 PST
(In reply to comment #6) > (In reply to comment #4) > > (From update of attachment 189339 [details] [details]) > > View in context: https://bugs.webkit.org/attachment.cgi?id=189339&action=review > > > > > Tools/ChangeLog:8 > > > + PaweÅ's name has a comma in it, which was confusing the ChangeLog parser. > > > > interesting that this is showing up as \u212b (angstrom sign) rather than \u0142 (latin small l with stroke) in the bugzilla diff. What is it in the file? Perhaps there's a separate bug for some character encoding conversion here? > > This is a bug in PrettyPatch.rb. The ruby script isn't unicode aware. > Is there a bug filed for that? > > > Tools/Scripts/webkitpy/common/checkout/changelog.py:150 > > > + return names > > > > It looks like this function is used in two places, in the author text and in the reviewer text. If I'm correctly understanding this patch, this will work for the author_text case (where we want name_and_email=True), but I think if Pawel put his full name as one of the reviewers, we'd probably still be stuck? > > That's true, it will fail for reviewers. I'm OK with punting this problem until Pawel becomes a reviewer. > > > Also, it seems like maybe we should just split this into two different functions, and maybe define the regex as a constant or something. Combining them into one function makes it hard to follow (especially lines 148-149). > > Done. As we discussed over IM, I think there are three problems here: there's the obvious one (we can't handle phajdan.jr's name), and two less obvious ones: 1) We don't expect the parser to be able to handle all possible combinations of stuff, and when it does fail to parse, the CQ chokes on this rather than failing semi-robustly somehow. We should file a bug to fix this. 2) The existing approach uses very complicated (IMO) regexps and a couple of ad-hoc workarounds to try and do the best it can. The code is unmaintainable (to me) and this patch makes it slightly worse. The only way I personally would fix this bug is to throw out the code and rewrite it to use a proper parser, but that's a lot more work. I don't want to require it now, so I've r+'ed this patch (so that we can handle phajdan.jr in the meantime), but we should file a separate bug to rework this code to make it more maintainable and/or correct. does that capture everything?
WebKit Review Bot
Comment 9 2013-02-20 14:05:32 PST
Comment on attachment 189366 [details] Patch for landing Clearing flags on attachment: 189366 Committed r143504: <http://trac.webkit.org/changeset/143504>
WebKit Review Bot
Comment 10 2013-02-20 14:05:37 PST
All reviewed patches have been landed. Closing bug.
Tony Chang
Comment 11 2013-02-20 14:13:31 PST
Carlos Garcia Campos
Comment 12 2013-04-10 00:36:29 PDT
*** Bug 84582 has been marked as a duplicate of this bug. ***
Note You need to log in before you can comment on or make changes to this bug.