Bug 110356 - Parse author names with commas in ChangeLogs
Summary: Parse author names with commas in ChangeLogs
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Tony Chang
URL:
Keywords:
: 84582 (view as bug list)
Depends on:
Blocks: 109303
  Show dependency treegraph
 
Reported: 2013-02-20 10:46 PST by Tony Chang
Modified: 2013-04-10 00:36 PDT (History)
5 users (show)

See Also:


Attachments
Patch (3.52 KB, patch)
2013-02-20 10:48 PST, Tony Chang
no flags Details | Formatted Diff | Diff
Patch (4.68 KB, patch)
2013-02-20 12:15 PST, Tony Chang
no flags Details | Formatted Diff | Diff
Patch for landing (4.90 KB, patch)
2013-02-20 13:44 PST, Tony Chang
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Tony Chang 2013-02-20 10:46:27 PST
Parse author names with commas in ChangeLogs
Comment 1 Tony Chang 2013-02-20 10:48:57 PST
Created attachment 189339 [details]
Patch
Comment 2 Tony Chang 2013-02-20 10:49:25 PST
This is kind of gross, alternate suggestions welcome.
Comment 3 Tony Chang 2013-02-20 10:50:06 PST
You can see the failure on https://bugs.webkit.org/show_bug.cgi?id=109303 .
Comment 4 Dirk Pranke 2013-02-20 11:52:16 PST
Comment on attachment 189339 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=189339&action=review

> Tools/ChangeLog:8
> +        PaweÅ's name has a comma in it, which was confusing the ChangeLog parser.

interesting that this is showing up as \u212b (angstrom sign) rather than \u0142 (latin small l with stroke) in the bugzilla diff. What is it in the file? Perhaps there's a separate bug for some character encoding conversion here?

> Tools/Scripts/webkitpy/common/checkout/changelog.py:150
> +        return names

It looks like this function is used in two places, in the author text and in the reviewer text. If I'm correctly understanding this patch, this will work for the author_text case (where we want name_and_email=True), but I think if Pawel put his full name as one of the reviewers, we'd probably still be stuck?

Also, it seems like maybe we should just split this into two different functions, and maybe define the regex as a constant or something. Combining them into one function makes it hard to follow (especially lines 148-149).
Comment 5 Tony Chang 2013-02-20 12:15:33 PST
Created attachment 189354 [details]
Patch
Comment 6 Tony Chang 2013-02-20 12:16:15 PST
(In reply to comment #4)
> (From update of attachment 189339 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=189339&action=review
> 
> > Tools/ChangeLog:8
> > +        PaweÅ's name has a comma in it, which was confusing the ChangeLog parser.
> 
> interesting that this is showing up as \u212b (angstrom sign) rather than \u0142 (latin small l with stroke) in the bugzilla diff. What is it in the file? Perhaps there's a separate bug for some character encoding conversion here?

This is a bug in PrettyPatch.rb.  The ruby script isn't unicode aware.

> > Tools/Scripts/webkitpy/common/checkout/changelog.py:150
> > +        return names
> 
> It looks like this function is used in two places, in the author text and in the reviewer text. If I'm correctly understanding this patch, this will work for the author_text case (where we want name_and_email=True), but I think if Pawel put his full name as one of the reviewers, we'd probably still be stuck?

That's true, it will fail for reviewers. I'm OK with punting this problem until Pawel becomes a reviewer.

> Also, it seems like maybe we should just split this into two different functions, and maybe define the regex as a constant or something. Combining them into one function makes it hard to follow (especially lines 148-149).

Done.
Comment 7 Tony Chang 2013-02-20 13:44:31 PST
Created attachment 189366 [details]
Patch for landing
Comment 8 Dirk Pranke 2013-02-20 13:45:06 PST
(In reply to comment #6)
> (In reply to comment #4)
> > (From update of attachment 189339 [details] [details])
> > View in context: https://bugs.webkit.org/attachment.cgi?id=189339&action=review
> > 
> > > Tools/ChangeLog:8
> > > +        PaweÅ's name has a comma in it, which was confusing the ChangeLog parser.
> > 
> > interesting that this is showing up as \u212b (angstrom sign) rather than \u0142 (latin small l with stroke) in the bugzilla diff. What is it in the file? Perhaps there's a separate bug for some character encoding conversion here?
> 
> This is a bug in PrettyPatch.rb.  The ruby script isn't unicode aware.
> 

Is there a bug filed for that?

> > > Tools/Scripts/webkitpy/common/checkout/changelog.py:150
> > > +        return names
> > 
> > It looks like this function is used in two places, in the author text and in the reviewer text. If I'm correctly understanding this patch, this will work for the author_text case (where we want name_and_email=True), but I think if Pawel put his full name as one of the reviewers, we'd probably still be stuck?
> 
> That's true, it will fail for reviewers. I'm OK with punting this problem until Pawel becomes a reviewer.
> 
> > Also, it seems like maybe we should just split this into two different functions, and maybe define the regex as a constant or something. Combining them into one function makes it hard to follow (especially lines 148-149).
> 
> Done.

As we discussed over IM, I think there are three problems here: there's the obvious one (we can't handle phajdan.jr's name), and two less obvious ones:

1) We don't expect the parser to be able to handle all possible combinations of stuff, and when it does fail to parse, the CQ chokes on this rather than failing semi-robustly somehow. We should file a bug to fix this.

2) The existing approach uses very complicated (IMO) regexps and a couple of ad-hoc workarounds to try and do the best it can. The code is unmaintainable (to me) and this patch makes it slightly worse. The only way I personally would fix this bug is to throw out the code and rewrite it to use a proper parser, but that's a lot more work. I don't want to require it now, so I've r+'ed this patch (so that we can handle phajdan.jr in the meantime), but we should file a separate bug to rework this code to make it more maintainable and/or correct.

does that capture everything?
Comment 9 WebKit Review Bot 2013-02-20 14:05:32 PST
Comment on attachment 189366 [details]
Patch for landing

Clearing flags on attachment: 189366

Committed r143504: <http://trac.webkit.org/changeset/143504>
Comment 10 WebKit Review Bot 2013-02-20 14:05:37 PST
All reviewed patches have been landed.  Closing bug.
Comment 11 Tony Chang 2013-02-20 14:13:31 PST
https://bugs.webkit.org/show_bug.cgi?id=110380 filed for (1).
Comment 12 Carlos Garcia Campos 2013-04-10 00:36:29 PDT
*** Bug 84582 has been marked as a duplicate of this bug. ***