Summary: | Parse author names with commas in ChangeLogs | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Tony Chang <tony> | ||||||||
Component: | New Bugs | Assignee: | Tony Chang <tony> | ||||||||
Status: | RESOLVED FIXED | ||||||||||
Severity: | Normal | CC: | abarth, cgarcia, dpranke, eric, webkit.review.bot | ||||||||
Priority: | P2 | ||||||||||
Version: | 528+ (Nightly build) | ||||||||||
Hardware: | Unspecified | ||||||||||
OS: | Unspecified | ||||||||||
Bug Depends on: | |||||||||||
Bug Blocks: | 109303 | ||||||||||
Attachments: |
|
Description
Tony Chang
2013-02-20 10:46:27 PST
Created attachment 189339 [details]
Patch
This is kind of gross, alternate suggestions welcome. You can see the failure on https://bugs.webkit.org/show_bug.cgi?id=109303 . Comment on attachment 189339 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=189339&action=review > Tools/ChangeLog:8 > + PaweÅ's name has a comma in it, which was confusing the ChangeLog parser. interesting that this is showing up as \u212b (angstrom sign) rather than \u0142 (latin small l with stroke) in the bugzilla diff. What is it in the file? Perhaps there's a separate bug for some character encoding conversion here? > Tools/Scripts/webkitpy/common/checkout/changelog.py:150 > + return names It looks like this function is used in two places, in the author text and in the reviewer text. If I'm correctly understanding this patch, this will work for the author_text case (where we want name_and_email=True), but I think if Pawel put his full name as one of the reviewers, we'd probably still be stuck? Also, it seems like maybe we should just split this into two different functions, and maybe define the regex as a constant or something. Combining them into one function makes it hard to follow (especially lines 148-149). Created attachment 189354 [details]
Patch
(In reply to comment #4) > (From update of attachment 189339 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=189339&action=review > > > Tools/ChangeLog:8 > > + PaweÅ's name has a comma in it, which was confusing the ChangeLog parser. > > interesting that this is showing up as \u212b (angstrom sign) rather than \u0142 (latin small l with stroke) in the bugzilla diff. What is it in the file? Perhaps there's a separate bug for some character encoding conversion here? This is a bug in PrettyPatch.rb. The ruby script isn't unicode aware. > > Tools/Scripts/webkitpy/common/checkout/changelog.py:150 > > + return names > > It looks like this function is used in two places, in the author text and in the reviewer text. If I'm correctly understanding this patch, this will work for the author_text case (where we want name_and_email=True), but I think if Pawel put his full name as one of the reviewers, we'd probably still be stuck? That's true, it will fail for reviewers. I'm OK with punting this problem until Pawel becomes a reviewer. > Also, it seems like maybe we should just split this into two different functions, and maybe define the regex as a constant or something. Combining them into one function makes it hard to follow (especially lines 148-149). Done. Created attachment 189366 [details]
Patch for landing
(In reply to comment #6) > (In reply to comment #4) > > (From update of attachment 189339 [details] [details]) > > View in context: https://bugs.webkit.org/attachment.cgi?id=189339&action=review > > > > > Tools/ChangeLog:8 > > > + PaweÅ's name has a comma in it, which was confusing the ChangeLog parser. > > > > interesting that this is showing up as \u212b (angstrom sign) rather than \u0142 (latin small l with stroke) in the bugzilla diff. What is it in the file? Perhaps there's a separate bug for some character encoding conversion here? > > This is a bug in PrettyPatch.rb. The ruby script isn't unicode aware. > Is there a bug filed for that? > > > Tools/Scripts/webkitpy/common/checkout/changelog.py:150 > > > + return names > > > > It looks like this function is used in two places, in the author text and in the reviewer text. If I'm correctly understanding this patch, this will work for the author_text case (where we want name_and_email=True), but I think if Pawel put his full name as one of the reviewers, we'd probably still be stuck? > > That's true, it will fail for reviewers. I'm OK with punting this problem until Pawel becomes a reviewer. > > > Also, it seems like maybe we should just split this into two different functions, and maybe define the regex as a constant or something. Combining them into one function makes it hard to follow (especially lines 148-149). > > Done. As we discussed over IM, I think there are three problems here: there's the obvious one (we can't handle phajdan.jr's name), and two less obvious ones: 1) We don't expect the parser to be able to handle all possible combinations of stuff, and when it does fail to parse, the CQ chokes on this rather than failing semi-robustly somehow. We should file a bug to fix this. 2) The existing approach uses very complicated (IMO) regexps and a couple of ad-hoc workarounds to try and do the best it can. The code is unmaintainable (to me) and this patch makes it slightly worse. The only way I personally would fix this bug is to throw out the code and rewrite it to use a proper parser, but that's a lot more work. I don't want to require it now, so I've r+'ed this patch (so that we can handle phajdan.jr in the meantime), but we should file a separate bug to rework this code to make it more maintainable and/or correct. does that capture everything? Comment on attachment 189366 [details] Patch for landing Clearing flags on attachment: 189366 Committed r143504: <http://trac.webkit.org/changeset/143504> All reviewed patches have been landed. Closing bug. https://bugs.webkit.org/show_bug.cgi?id=110380 filed for (1). *** Bug 84582 has been marked as a duplicate of this bug. *** |