Bug 110356

Summary: Parse author names with commas in ChangeLogs
Product: WebKit Reporter: Tony Chang <tony>
Component: New BugsAssignee: Tony Chang <tony>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, cgarcia, dpranke, eric, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on:    
Bug Blocks: 109303    
Attachments:
Description Flags
Patch
none
Patch
none
Patch for landing none

Description Tony Chang 2013-02-20 10:46:27 PST
Parse author names with commas in ChangeLogs
Comment 1 Tony Chang 2013-02-20 10:48:57 PST
Created attachment 189339 [details]
Patch
Comment 2 Tony Chang 2013-02-20 10:49:25 PST
This is kind of gross, alternate suggestions welcome.
Comment 3 Tony Chang 2013-02-20 10:50:06 PST
You can see the failure on https://bugs.webkit.org/show_bug.cgi?id=109303 .
Comment 4 Dirk Pranke 2013-02-20 11:52:16 PST
Comment on attachment 189339 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=189339&action=review

> Tools/ChangeLog:8
> +        PaweÅ's name has a comma in it, which was confusing the ChangeLog parser.

interesting that this is showing up as \u212b (angstrom sign) rather than \u0142 (latin small l with stroke) in the bugzilla diff. What is it in the file? Perhaps there's a separate bug for some character encoding conversion here?

> Tools/Scripts/webkitpy/common/checkout/changelog.py:150
> +        return names

It looks like this function is used in two places, in the author text and in the reviewer text. If I'm correctly understanding this patch, this will work for the author_text case (where we want name_and_email=True), but I think if Pawel put his full name as one of the reviewers, we'd probably still be stuck?

Also, it seems like maybe we should just split this into two different functions, and maybe define the regex as a constant or something. Combining them into one function makes it hard to follow (especially lines 148-149).
Comment 5 Tony Chang 2013-02-20 12:15:33 PST
Created attachment 189354 [details]
Patch
Comment 6 Tony Chang 2013-02-20 12:16:15 PST
(In reply to comment #4)
> (From update of attachment 189339 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=189339&action=review
> 
> > Tools/ChangeLog:8
> > +        PaweÅ's name has a comma in it, which was confusing the ChangeLog parser.
> 
> interesting that this is showing up as \u212b (angstrom sign) rather than \u0142 (latin small l with stroke) in the bugzilla diff. What is it in the file? Perhaps there's a separate bug for some character encoding conversion here?

This is a bug in PrettyPatch.rb.  The ruby script isn't unicode aware.

> > Tools/Scripts/webkitpy/common/checkout/changelog.py:150
> > +        return names
> 
> It looks like this function is used in two places, in the author text and in the reviewer text. If I'm correctly understanding this patch, this will work for the author_text case (where we want name_and_email=True), but I think if Pawel put his full name as one of the reviewers, we'd probably still be stuck?

That's true, it will fail for reviewers. I'm OK with punting this problem until Pawel becomes a reviewer.

> Also, it seems like maybe we should just split this into two different functions, and maybe define the regex as a constant or something. Combining them into one function makes it hard to follow (especially lines 148-149).

Done.
Comment 7 Tony Chang 2013-02-20 13:44:31 PST
Created attachment 189366 [details]
Patch for landing
Comment 8 Dirk Pranke 2013-02-20 13:45:06 PST
(In reply to comment #6)
> (In reply to comment #4)
> > (From update of attachment 189339 [details] [details])
> > View in context: https://bugs.webkit.org/attachment.cgi?id=189339&action=review
> > 
> > > Tools/ChangeLog:8
> > > +        PaweÅ's name has a comma in it, which was confusing the ChangeLog parser.
> > 
> > interesting that this is showing up as \u212b (angstrom sign) rather than \u0142 (latin small l with stroke) in the bugzilla diff. What is it in the file? Perhaps there's a separate bug for some character encoding conversion here?
> 
> This is a bug in PrettyPatch.rb.  The ruby script isn't unicode aware.
> 

Is there a bug filed for that?

> > > Tools/Scripts/webkitpy/common/checkout/changelog.py:150
> > > +        return names
> > 
> > It looks like this function is used in two places, in the author text and in the reviewer text. If I'm correctly understanding this patch, this will work for the author_text case (where we want name_and_email=True), but I think if Pawel put his full name as one of the reviewers, we'd probably still be stuck?
> 
> That's true, it will fail for reviewers. I'm OK with punting this problem until Pawel becomes a reviewer.
> 
> > Also, it seems like maybe we should just split this into two different functions, and maybe define the regex as a constant or something. Combining them into one function makes it hard to follow (especially lines 148-149).
> 
> Done.

As we discussed over IM, I think there are three problems here: there's the obvious one (we can't handle phajdan.jr's name), and two less obvious ones:

1) We don't expect the parser to be able to handle all possible combinations of stuff, and when it does fail to parse, the CQ chokes on this rather than failing semi-robustly somehow. We should file a bug to fix this.

2) The existing approach uses very complicated (IMO) regexps and a couple of ad-hoc workarounds to try and do the best it can. The code is unmaintainable (to me) and this patch makes it slightly worse. The only way I personally would fix this bug is to throw out the code and rewrite it to use a proper parser, but that's a lot more work. I don't want to require it now, so I've r+'ed this patch (so that we can handle phajdan.jr in the meantime), but we should file a separate bug to rework this code to make it more maintainable and/or correct.

does that capture everything?
Comment 9 WebKit Review Bot 2013-02-20 14:05:32 PST
Comment on attachment 189366 [details]
Patch for landing

Clearing flags on attachment: 189366

Committed r143504: <http://trac.webkit.org/changeset/143504>
Comment 10 WebKit Review Bot 2013-02-20 14:05:37 PST
All reviewed patches have been landed.  Closing bug.
Comment 11 Tony Chang 2013-02-20 14:13:31 PST
https://bugs.webkit.org/show_bug.cgi?id=110380 filed for (1).
Comment 12 Carlos Garcia Campos 2013-04-10 00:36:29 PDT
*** Bug 84582 has been marked as a duplicate of this bug. ***