Bug 77886 - KeywordLookupGenerator.py script fails in some cases
Summary: KeywordLookupGenerator.py script fails in some cases
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC Windows 7
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-02-06 11:08 PST by Ashod Nakashian
Modified: 2012-02-12 21:00 PST (History)
12 users (show)

See Also:


Attachments
Patch (1.26 KB, patch)
2012-02-06 12:27 PST, Ashod Nakashian
no flags Details | Formatted Diff | Diff
Patch (1.24 KB, patch)
2012-02-06 20:46 PST, Ashod Nakashian
no flags Details | Formatted Diff | Diff
Patch (4.09 KB, patch)
2012-02-11 05:09 PST, Ashod Nakashian
no flags Details | Formatted Diff | Diff
Patch (3.86 KB, patch)
2012-02-12 05:43 PST, Ashod Nakashian
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Ashod Nakashian 2012-02-06 11:08:39 PST
KeywordLookupGenerator.py splits the input file by newline characters ('\n') using the split() function. The code also trims each line of whitespaces after splitting. However, carriage-return ('\r') isn't necessarily a whitespace character, and, if it exists in the input file, it remains dangling at the end of the the lines. This is not an issue for most lines, except for the first and last, where they are compared verbatim with the expected values.

On my build system (Win7/CygWin/Python2.6) the last line is '@end\r' which fails to compare with the expected '@end' and thereby the complete input file fails to parse and the script exists with an exception. This in turn breaks the build.

The generic fix is to also assume the input may contain both carriage-returns and split by both newline and carriage-return characters (empty lines are removed anyway).
Comment 1 Ashod Nakashian 2012-02-06 12:27:14 PST
Created attachment 125688 [details]
Patch
Comment 2 Benjamin Poulain 2012-02-06 13:08:18 PST
Comment on attachment 125688 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=125688&action=review

I am not sure why the input would use the Windows line ending. Is that converted automatically by git/svn?

> Source/JavaScriptCore/ChangeLog:7
> +        Reviewed by NOBODY (OOPS!).
> +

Missing explanation in the ChangeLog.

> Source/JavaScriptCore/ChangeLog:8
> +        * JavaScriptCore.vcproj/JavaScriptCore/JavaScriptCore.def:

???
Comment 3 Eric Seidel (no email) 2012-02-06 13:09:32 PST
Comment on attachment 125688 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=125688&action=review

This whole split cascade is a bit odd.  Is this file not unittested?  Normally we test all changes in WebKit, including python changes.

> Source/JavaScriptCore/ChangeLog:8
> +        * JavaScriptCore.vcproj/JavaScriptCore/JavaScriptCore.def:

Looks like you didn't actually change this file.
Comment 4 Ashod Nakashian 2012-02-06 20:46:38 PST
Created attachment 125760 [details]
Patch
Comment 5 Ashod Nakashian 2012-02-06 21:48:13 PST
(In reply to comment #2)
> (From update of attachment 125688 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=125688&action=review
> 
> I am not sure why the input would use the Windows line ending. Is that converted automatically by git/svn?

I wouldn't know, but I went back half a dozen or so commits of the input file and found no changes to the last line. This suggests the file did contain cr and it wasn't causing problems.

Considering that the split also trims the strings of whitespace, and considering that in the past I have successfully run this script on WinXP, I'm suspecting it has something to do with the default whitespace list of python in the given version/build/system. There might be some other explanation, but I find this the most reasonable at this point.

At any rate, I think non-print chars like this are bound to creep-in considering that we work on multiple platforms. My suggested fix is to take cr into account and treat them as splitting character as well. There are other solutions too, but probably not as generic.

> 
> > Source/JavaScriptCore/ChangeLog:7
> > +        Reviewed by NOBODY (OOPS!).
> > +
> 
> Missing explanation in the ChangeLog.
Fixed.

> 
> > Source/JavaScriptCore/ChangeLog:8
> > +        * JavaScriptCore.vcproj/JavaScriptCore/JavaScriptCore.def:
> 
> ???
Removed. No idea why the patch script thought this modified.
Comment 6 Ashod Nakashian 2012-02-06 21:51:46 PST
(In reply to comment #3)
> (From update of attachment 125688 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=125688&action=review
> 
> This whole split cascade is a bit odd.  Is this file not unittested?  Normally we test all changes in WebKit, including python changes.

Agreed on the odd part, but considering that line breaks aren't predictable, we either have to remove carriage-returns (before or after splitting) or we have to be more generic and break at CR as well as LF. After all, we *can* have lines with CR breaks only.

> 
> > Source/JavaScriptCore/ChangeLog:8
> > +        * JavaScriptCore.vcproj/JavaScriptCore/JavaScriptCore.def:
> 
> Looks like you didn't actually change this file.
Fixed.
Comment 7 Darin Adler 2012-02-07 12:26:50 PST
(In reply to comment #6)
> line breaks aren't predictable

Maybe true for the programming world at large, but need not be true for the WebKit project. We can make whatever rule we like about line breaks. There is no good reason for us to support CR or CRLF line breaks in project tools; we should consistently set properties so our source files all use LF line breaks except for the small number of files that may be subject to specific platform requirements on Windows that might require something different.

> After all, we *can* have lines with CR breaks only.

Not a goal for the project.
Comment 8 Ashod Nakashian 2012-02-07 20:23:10 PST
(In reply to comment #7)
> (In reply to comment #6)
> > line breaks aren't predictable
> 
> Maybe true for the programming world at large, but need not be true for the WebKit project. We can make whatever rule we like about line breaks. There is no good reason for us to support CR or CRLF line breaks in project tools; we should consistently set properties so our source files all use LF line breaks except for the small number of files that may be subject to specific platform requirements on Windows that might require something different.
> 
> > After all, we *can* have lines with CR breaks only.
> 
> Not a goal for the project.

Fair enough. However, I'd like to avoid accidental mistakes as much as possible. I see 3 possible solutions:

1) Force LF only via tests and convert the input file.
2) Force LF only on the input file via a commit-time hook (git would make that a breeze, alas).
3) Accept my patch which by only adding one line supports the current input file as-is.

I don't know whether or not we have tests for build scripts and specifically for KeywordLookupGenerator.py. Any tips are welcome.
I don't know how to add SVN commit-time hooks on the client.
I've submitted a working patch as the third alternative.

Of course we can always manually fix the input file, until it fails again on someone... or pick on of the above (or another suggestion?)

Let me know what's the preferred action in a similar case for WebKit.
Comment 9 Benjamin Poulain 2012-02-07 20:37:10 PST
> Fair enough. However, I'd like to avoid accidental mistakes as much as possible. I see 3 possible solutions:
> 
> 1) Force LF only via tests and convert the input file.

Yep, you can do that with check-webkit-style. Optionally you can add support for converting automatically the input file through webkit-patch.

> Of course we can always manually fix the input file, until it fails again on someone... or pick on of the above (or another suggestion?)

Given no port nor bot require support for this, I think Darin has a fair point about not supporting CRLF here.
Comment 10 Ashod Nakashian 2012-02-07 21:35:53 PST
(In reply to comment #9)
> > Fair enough. However, I'd like to avoid accidental mistakes as much as possible. I see 3 possible solutions:
> > 
> > 1) Force LF only via tests and convert the input file.
> 
> Yep, you can do that with check-webkit-style. Optionally you can add support for converting automatically the input file through webkit-patch.

Great. I'll make the appropriate changes in at least check-webkit-style and possibly auto-convert in webkit-patch and resubmit.

> 
> > Of course we can always manually fix the input file, until it fails again on someone... or pick on of the above (or another suggestion?)
> 
> Given no port nor bot require support for this, I think Darin has a fair point about not supporting CRLF here.

Sure.
Comment 11 Ashod Nakashian 2012-02-11 05:09:16 PST
Created attachment 126629 [details]
Patch
Comment 12 Benjamin Poulain 2012-02-11 14:05:22 PST
Comment on attachment 126629 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=126629&action=review

Some nitpicking:

> Source/JavaScriptCore/ChangeLog:12
> +        Keywords.table has been converted to LF-only.
> +
> +        * parser/Keywords.table:

The line "* parser/Keywords.table:" is made for the comments about that file. You should have: 

* parser/Keywords.table: Converted to LF-only.

> Source/JavaScriptCore/parser/Keywords.table:38
> -# reserved for future use
> +# reserved for future use.

Our rule is "Make comments look like sentences by starting with a capital letter and ending with a period (punctation)."

If you fix the period, you should fix the capital letter  :)

> Source/JavaScriptCore/parser/Keywords.table:49
> -# reserved for future use in strict code
> +# reserved for future use in strict code.

ditto

> Tools/ChangeLog:10
> +        Keywords.table has been converted to LF-only.

This should not be part of this particular ChangeLog.

> Tools/Scripts/webkitpy/style/checker.py:243
> +      # This is only critical for the first and last lines
> +      # which are compared verbatim.

This is describing an implementation detail in an other script, it is a bad idea to have that in a comment. You can drop those lines.

> Tools/Scripts/webkitpy/style/checker_unittest.py:273
> +        path = "Source/JavaScriptCore/parser/Keywords.table"
> +        assertCheck(path, "whitespace/carriage_return")

No need for the variable "path", follow the style of this function passing strings as literal.
Comment 13 Ashod Nakashian 2012-02-12 05:14:14 PST
(In reply to comment #12)
> (From update of attachment 126629 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=126629&action=review
> 
> Some nitpicking:
> 
Thanks for all the help. All of your notes make sense. Since some of them are project-specific, without knowing what WebKit specifically prefers, I err on the safer side of being as conforming as much as possible (hence not fixing the case of the first letter and creating a superfluous path variable.)
Comment 14 Ashod Nakashian 2012-02-12 05:43:10 PST
Created attachment 126675 [details]
Patch
Comment 15 Benjamin Poulain 2012-02-12 18:47:43 PST
Comment on attachment 126675 [details]
Patch

Looks good. Thanks for the patch.
Comment 16 WebKit Review Bot 2012-02-12 21:00:40 PST
Comment on attachment 126675 [details]
Patch

Clearing flags on attachment: 126675

Committed r107527: <http://trac.webkit.org/changeset/107527>
Comment 17 WebKit Review Bot 2012-02-12 21:00:46 PST
All reviewed patches have been landed.  Closing bug.