Bug 128422 - Web Inspector: update check-webkit-style to flag single quotes in WebInspectorUI projects
Summary: Web Inspector: update check-webkit-style to flag single quotes in WebInspecto...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Web Inspector (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Nobody
URL:
Keywords: InRadar
: 119901 (view as bug list)
Depends on:
Blocks:
 
Reported: 2014-02-07 17:22 PST by James Craig
Modified: 2014-03-02 12:46 PST (History)
6 users (show)

See Also:


Attachments
Patch (5.91 KB, patch)
2014-02-13 03:02 PST, Diego Pino
no flags Details | Formatted Diff | Diff
Patch (6.13 KB, patch)
2014-02-18 01:41 PST, Diego Pino
no flags Details | Formatted Diff | Diff
Patch (5.69 KB, patch)
2014-02-20 05:17 PST, Diego Pino
no flags Details | Formatted Diff | Diff
Patch (5.63 KB, patch)
2014-02-21 02:28 PST, Diego Pino
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description James Craig 2014-02-07 17:22:21 PST
Since WebInspectorUI style has the additional requirement to use double-quotes only, update check-webkit-style to flag single-quotes in WebInspectorUI files. I've forgotten this several times. Seems like an easy win.
Comment 1 Radar WebKit Bug Importer 2014-02-07 17:22:45 PST
<rdar://problem/16017987>
Comment 2 Diego Pino 2014-02-13 03:02:32 PST
Created attachment 224055 [details]
Patch
Comment 3 Diego Pino 2014-02-13 03:12:16 PST
The checker does the following:

   * Multi line and single line comments are removed. Single quotes should be valid in comments.
   * Check for single quotes. Single quotes that happen within strings ("") or regular expressions should be valid.
   * Otherwise error.

The current js.py checker is only used to validate the JavaScript files of the Web Inspector, so added validation for single quotes to that file.

In addition to this, there are currently some files from the Inspector that include single quotes. Here is the list:

CookieStorageContentView.js(254):     if (cookieDomain.charAt(0) !== '.')
DOMTreeElement.js(539):             tag.textContent = '';
DOMTreeElement.js(540):             tag.appendChild(document.createTextNode('<'+nodeName));
DOMTreeElement.js(542):             tag.appendChild(document.createTextNode('>'));
DOMTreeElement.js(752):             tagNameElement.removeEventListener('keyup', keyupListener, false);
DOMTreeElement.js(752):             tagNameElement.removeEventListener('keyup', keyupListener, false);
DOMTreeElement.js(762):         tagNameElement.addEventListener('keyup', keyupListener, false);
DataGrid.js(678):             emptyData[identifier] = '';
ColorPicker.js(49):     this._opacityPattern = 'url("data:image/svg+xml;base64,' + btoa('<svg xmlns="http://www.w3.org/2000/svg" width="6" height="6" fill="rgb(204, 204, 204)"><rect width="3" height="3" /><rect x="3" y="3" width="3" height="3"/></svg>') + '")';
DOMTreeOutline.js(380):         event.dataTransfer.dropEffect = 'move';
Main.js(491):         console.assert(profileTitle[0] === '/');

I have concerns for ColorPicker.js because single quotes are used to wrap double quotes. It looks a reasonable use of single quotes. Either we escape the double quotes or add logic to the checker to handle this case.

The rest of files I think they should be modified. And don't know if it's better to do this in this patch or to file a new bug to remove single quotes from these files.
Comment 4 Joseph Pecoraro 2014-02-13 10:55:59 PST
(In reply to comment #3)
> The checker does the following:
> 
>    * Multi line and single line comments are removed. Single quotes should be valid in comments.
>    * Check for single quotes. Single quotes that happen within strings ("") or regular expressions should be valid.
>    * Otherwise error.

Awesome!


> The rest of files I think they should be modified. And don't know if it's better to do this in this patch or to file a new bug to remove single quotes from these files.

Either approach works. The cleanest would be separate patches.
Comment 5 Joseph Pecoraro 2014-02-17 17:06:03 PST
Comment on attachment 224055 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=224055&action=review

> Tools/Scripts/webkitpy/style/checkers/js.py:64
> +            """FIXME: Despite stripping comments, there are two single-line comments
> +            not stripped in CodeMirrorAdditions.js. """
> +            if (line.strip().find("//") == 0):
> +                continue

Oh, what comments are these? The first part of the regex very clearly handles '//.*?$' so its surprising to me that some might have been missed.

> Tools/Scripts/webkitpy/style/checkers/js.py:67
> +            """Single-quotes within strings and regular expressions are valid."""
> +            if (self._within_string(line, "'") or self._within_regex(line, "'")):
> +                continue

Nit: You can just use python "# foo" comment syntax here. Unless there is an advantage to the """....""" syntax that I don't know about. These just look like generic inline comments.

> Tools/Scripts/webkitpy/style/checkers/js.py:71
> +            if (pos != -1):
> +                line_number, original_line = self._find_original_line(content, line)
> +                self._handle_style_error(line_number, "js/syntax", 5, "Line contains single-quote character.")

This is pretty basic, which may be okay. It checks for the first single quote and verifies that it is not within the first double quoted string on the list. Likewise for regex. I don't think it can give false positives, but issues that I think would go undetected include:

    foo("a'b", 'c');
    foo("a/'b", '/c');
    foo(/'/, 'c');

You might be able to detect these by (1) checking all the single quotes on a line and (2) if you do a find and rfind around the position of the character itself:
http://docs.python.org/2/library/string.html#string.find

Alternatively, now that comments are removed you could just search for single quoted strings. The regular expression in _remove_comments looks like it would already be sufficient if you ignore double quoted strings.

However, that being said our code tends to be rather straightforward, so this looks good to me.
Comment 6 James Craig 2014-02-17 17:30:50 PST
(In reply to comment #5)
> Nit: You can just use python "# foo" comment syntax here. Unless there is an advantage to the """....""" syntax that I don't know about. These just look like generic inline comments.

Convention is to use triple-quote comments for docstrings:
http://en.wikibooks.org/wiki/Python_Programming/Source_Documentation_and_Comments#Documentation_Strings
Comment 7 Joseph Pecoraro 2014-02-17 17:59:32 PST
(In reply to comment #6)
> (In reply to comment #5)
> > Nit: You can just use python "# foo" comment syntax here. Unless there is an advantage to the """....""" syntax that I don't know about. These just look like generic inline comments.
> 
> Convention is to use triple-quote comments for docstrings:
> http://en.wikibooks.org/wiki/Python_Programming/Source_Documentation_and_Comments#Documentation_Strings

To quote the documentation:

> Fortunately, Python has such a capability. Documentation strings (or docstrings)
> are used to create easily-accessible documentation. You can add a docstring to a
> function, class, or module by adding a string as the first indented statement.

In these cases it is never the first indented line statement. Hence my confusion on why the syntax is being used.
Comment 8 Joseph Pecoraro 2014-02-17 18:00:42 PST
And having a "FIXME: ..." be a docstring is also confusing to me =)
Comment 9 Diego Pino 2014-02-18 01:22:11 PST
Comment on attachment 224055 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=224055&action=review

>> Tools/Scripts/webkitpy/style/checkers/js.py:64
>> +                continue
> 
> Oh, what comments are these? The first part of the regex very clearly handles '//.*?$' so its surprising to me that some might have been missed.

Yes, I don't know because it seems to be working OK for more comments like this even if this very same file. If I remove that checking I got:

CodeMirrorAdditions.js(44):         // If the stream isn't at the end of line then we found the end quote.
CodeMirrorAdditions.js(111):             // We don't expect other tokens between attribute and string since

Actually, it's not happening for just these two lines (there are two because they contain a single-quote), it's happening for more lines in CodeMirrorAdditions.js. I isolated the function in its own file, run it and I got:

        // If the stream isn't at the end of line then we found the end quote.
        // In the case, change _linkTokenize to parse the end of the link next.
        // Otherwise _linkTokenize will stay as-is to parse more of the link.
        // Eat the quote character to style it with the base style.
        // Clean up the state.
            // Call the link tokenizer instead.
        // Remember the start position so we can rewind if needed.
            // Look for "href" or "src" attributes. If found then we should
            // expect a string later that should get the "link" style instead.
            // This is a link, so setup the state to process it next.
            // The attribute may or may not be quoted.
            // Rewind the steam to the start of this token.
            // Eat the open quote of the string so the string style
            // will be used for the quote character.
            // We don't expect other tokens between attribute and string since

>> Tools/Scripts/webkitpy/style/checkers/js.py:71
>> +                self._handle_style_error(line_number, "js/syntax", 5, "Line contains single-quote character.")
> 
> This is pretty basic, which may be okay. It checks for the first single quote and verifies that it is not within the first double quoted string on the list. Likewise for regex. I don't think it can give false positives, but issues that I think would go undetected include:
> 
>     foo("a'b", 'c');
>     foo("a/'b", '/c');
>     foo(/'/, 'c');
> 
> You might be able to detect these by (1) checking all the single quotes on a line and (2) if you do a find and rfind around the position of the character itself:
> http://docs.python.org/2/library/string.html#string.find
> 
> Alternatively, now that comments are removed you could just search for single quoted strings. The regular expression in _remove_comments looks like it would already be sufficient if you ignore double quoted strings.
> 
> However, that being said our code tends to be rather straightforward, so this looks good to me.

OK, I think I will leave like this then. Thanks for pointing those cases.
Comment 10 Diego Pino 2014-02-18 01:32:35 PST
(In reply to comment #6)
> (In reply to comment #5)
> > Nit: You can just use python "# foo" comment syntax here. Unless there is an advantage to the """....""" syntax that I don't know about. These just look like generic inline comments.
> 
> Convention is to use triple-quote comments for docstrings:
> http://en.wikibooks.org/wiki/Python_Programming/Source_Documentation_and_Comments#Documentation_Strings

I didn't know about this convention. I checked a few .py file of the codebase and I only found examples of these type of comments being used, so I thought hash comments were not allowed (which I was clearly wrong because they are used extensively in other files such as cpp.py).

I will change the comments to "# foo" where appropriate.
Comment 11 Diego Pino 2014-02-18 01:41:39 PST
Created attachment 224482 [details]
Patch
Comment 12 Joseph Pecoraro 2014-02-18 11:13:58 PST
Comment on attachment 224482 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=224482&action=review

> Tools/Scripts/webkitpy/style/checkers/js.py:30
> +This checker is only used to check WebInspector Javascript files.

Typo: "Javascript" => "JavaScript".

> Tools/Scripts/webkitpy/style/checkers/js.py:63
> +            # FIXME: Despite stripping comments, there are two single-line comments

Apparently this comment is wrong. There are more then two single-line comments that are not stripped.

> Tools/Scripts/webkitpy/style/checkers/js.py:100
> +        pattern = re.compile(r'//.*?$|/\*.*?\*/|\'(?:\\.|[^\\\'])*\'|"(?:\\.|[^\\"])*"', re.DOTALL | re.MULTILINE)

I wonder if the string regexes are too greedy. Given that they are multiline.

The string regex components are:

    \'(?:\\.|[^\\\'])*\'

and:

    "(?:\\.|[^\\"])*"

Both of these are greedy. I believe the pattern /A(.)*A/ should be /A(.)*?A/. This may explain why some things are not getting stripped properly.

Also, the regex attempts to avoid escape sequences. If you're in a single quoted string it attempts to avoid the sequence |\'| in the regex with [^\\\']. However, that is wrong.

Maybe we should rethink this.
Comment 13 Diego Pino 2014-02-19 05:33:22 PST
(In reply to comment #12)
> (From update of attachment 224482 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=224482&action=review
> 
> > Tools/Scripts/webkitpy/style/checkers/js.py:30
> > +This checker is only used to check WebInspector Javascript files.
> 
> Typo: "Javascript" => "JavaScript".
> 

Ok

> > Tools/Scripts/webkitpy/style/checkers/js.py:63
> > +            # FIXME: Despite stripping comments, there are two single-line comments
> 
> Apparently this comment is wrong. There are more then two single-line comments that are not stripped.
> 

Ok

> > Tools/Scripts/webkitpy/style/checkers/js.py:100
> > +        pattern = re.compile(r'//.*?$|/\*.*?\*/|\'(?:\\.|[^\\\'])*\'|"(?:\\.|[^\\"])*"', re.DOTALL | re.MULTILINE)
> 
> I wonder if the string regexes are too greedy. Given that they are multiline.
> 
> The string regex components are:
> 
>     \'(?:\\.|[^\\\'])*\'
> 
> and:
> 
>     "(?:\\.|[^\\"])*"
> 
> Both of these are greedy. I believe the pattern /A(.)*A/ should be /A(.)*?A/. This may explain why some things are not getting stripped properly.
>

I tried non-greedy and got the same results.
 
> Also, the regex attempts to avoid escape sequences. If you're in a single quoted string it attempts to avoid the sequence |\'| in the regex with [^\\\']. However, that is wrong.
>

My fault, the snippet of code was intented for removing comments in C. In addition, to have it completely ready for JavaScript it should match regular expressions too (in case a // or /* happens within a regex) :/
 
> Maybe we should rethink this.

I spent some time yesterday but I still got nothing. Before sending this patch I used a tool called jsstrip (https://code.google.com/p/jsstrip/wiki/UsagePython) to remove the comments. It worked well but it seemed to me a little bit overkill to include an external tool just for that (and there might be problems with licenses, etc). I've been taking a look at jsmin too (http://code.google.com/p/v8/source/browse/branches/bleeding_edge/tools/jsmin.py?r=3565) to see what it does for handling comments.
Comment 14 Timothy Hatcher 2014-02-19 10:43:55 PST
(In reply to comment #13)
> (In reply to comment #12)
> > (From update of attachment 224482 [details] [details])
> I spent some time yesterday but I still got nothing. Before sending this patch I used a tool called jsstrip (https://code.google.com/p/jsstrip/wiki/UsagePython) to remove the comments. It worked well but it seemed to me a little bit overkill to include an external tool just for that (and there might be problems with licenses, etc). I've been taking a look at jsmin too (http://code.google.com/p/v8/source/browse/branches/bleeding_edge/tools/jsmin.py?r=3565) to see what it does for handling comments.

We try to be conservative in pulling in external libraries. We do have a copy of jsmin.py in JavaScriptCore/inspector/scripts/jsmin.py.
Comment 15 Diego Pino 2014-02-20 05:17:41 PST
Created attachment 224749 [details]
Patch
Comment 16 Diego Pino 2014-02-20 05:28:23 PST
I recoded the whole thing. 

Now what the script does is iterating through lines, checks if line is in a multi-line-comment (it assumes that a line starting with '*' is a line within a multi-line-comment). If line is not a multi-line-comment, removes the content of strings, single-line-comment (if any) and regular expressions. If after that, there's a single quote character returns error.

I also added the other tests suggested by Joseph.

After checking against the whole Web Inspector code base, the results are the following:

CookieStorageContentView.js(254):     if (cookieDomain.charAt(0) !== '.')
DOMTreeElement.js(539):             tag.textContent = '';
DOMTreeElement.js(540):             tag.appendChild(document.createTextNode('<'+nodeName));
DOMTreeElement.js(542):             tag.appendChild(document.createTextNode('>'));
DOMTreeElement.js(752):             tagNameElement.removeEventListener('keyup', keyupListener, false);
DOMTreeElement.js(758):             tagNameElement.removeEventListener('keyup', keyupListener, false);
DOMTreeElement.js(762):         tagNameElement.addEventListener('keyup', keyupListener, false);
DataGrid.js(660):             emptyData[identifier] = '';
DataGrid.js(1111):                     var columnTitle = this.dataGrid.columns.get(columnIdentifier).get('title');
ColorPicker.js(49):     this._opacityPattern = 'url("data:image/svg+xml;base64,' + btoa('<svg xmlns="http://www.w3.org/2000/svg" width="6" height="6" fill="rgb(204, 204, 204)"><rect width="3" height="3" /><rect x="3" y="3" width="3" height="3"/></svg>') + '")';
DOMTreeOutline.js(380):         event.dataTransfer.dropEffect = 'move';
Main.js(491):         console.assert(profileTitle[0] === '/');

Same results as before (the single quote in DataGrid:1111 was added recently). Hope this approach can work.
Comment 17 Timothy Hatcher 2014-02-20 09:08:27 PST
Comment on attachment 224749 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=224749&action=review

Looking good to me, but I will let Joe review too.

> Tools/Scripts/webkitpy/style/checkers/js.py:67
> +            if (line.endswith("*/")):

We have had cases where there is a space after the comment ends that this wouldn't catch.

> Tools/Scripts/webkitpy/style/checkers/js.py:71
> +            if (line.startswith("/*") or line.startswith("*")):

Not a foolproof check for multi-line comments, but we don't use them much at all. This will take care of the license comments.
Comment 18 Joseph Pecoraro 2014-02-20 12:10:15 PST
Comment on attachment 224749 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=224749&action=review

r=me! I like this much better! Thanks for looking into it again.

> Tools/Scripts/webkitpy/style/checkers/js.py:75
> +            # Remove strings

Nit: Remove "double quoted" strings.

> Tools/Scripts/webkitpy/style/checkers/js.py:76
> +            line = re.compile(r'"(?:[^"\\]|\\.)*"', re.MULTILINE).sub('""', line)

Nit: I don't think you need re.MULTILINE. The input string (line) is only a single line anyways right?

> Tools/Scripts/webkitpy/style/checkers/js.py:78
> +            # Remove single line comment if any

Nit: Comment should be full sentences that end in a period. That applies to all these # comments.

> Tools/Scripts/webkitpy/style/checkers/js.py:88
> +            line = re.compile('/.+?/', re.MULTILINE).sub('//', line)

Ditto regarding MULTILINE.
Comment 19 Diego Pino 2014-02-21 02:28:12 PST
Created attachment 224848 [details]
Patch
Comment 20 Diego Pino 2014-02-21 02:35:03 PST
Comment on attachment 224749 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=224749&action=review

>> Tools/Scripts/webkitpy/style/checkers/js.py:67
>> +            if (line.endswith("*/")):
> 
> We have had cases where there is a space after the comment ends that this wouldn't catch.

Line is stripped, that means that leading and trailing whitespace characters are removed before the check.
http://docs.python.org/2/library/string.html

>> Tools/Scripts/webkitpy/style/checkers/js.py:76
>> +            line = re.compile(r'"(?:[^"\\]|\\.)*"', re.MULTILINE).sub('""', line)
> 
> Nit: I don't think you need re.MULTILINE. The input string (line) is only a single line anyways right?

OK
Comment 21 Joseph Pecoraro 2014-02-21 10:41:00 PST
Comment on attachment 224848 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=224848&action=review

> Tools/Scripts/webkitpy/style/checkers/js.py:74
> +            #  Remove "double quoted" strings.

Double space in comment.

> Tools/Scripts/webkitpy/style/checkers/js.py:80
> +                    line = line[:single_line_comment_pos]

Double indent.

> Tools/Scripts/webkitpy/style/checkers/js.py:91
> +                self._handle_style_error(line_number, "js/syntax", 5, "Line contains single-quote character.")

It might be nice to dump the line at the same time.
Comment 22 WebKit Commit Bot 2014-02-21 11:12:08 PST
Comment on attachment 224848 [details]
Patch

Clearing flags on attachment: 224848

Committed r164487: <http://trac.webkit.org/changeset/164487>
Comment 23 WebKit Commit Bot 2014-02-21 11:12:12 PST
All reviewed patches have been landed.  Closing bug.
Comment 24 Diego Pino 2014-02-24 04:26:42 PST
(In reply to comment #21)
> (From update of attachment 224848 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=224848&action=review
> 
> > Tools/Scripts/webkitpy/style/checkers/js.py:74
> > +            #  Remove "double quoted" strings.
> 
> Double space in comment.
> 
> > Tools/Scripts/webkitpy/style/checkers/js.py:80
> > +                    line = line[:single_line_comment_pos]
> 
> Double indent.
> 

Fixed in r164582.

> > Tools/Scripts/webkitpy/style/checkers/js.py:91
> > +                self._handle_style_error(line_number, "js/syntax", 5, "Line contains single-quote character.")
> 
> It might be nice to dump the line at the same time.

Agree, but I think that in case of solving that it makes to do it in for all type of errors, that means to change how handle_style_error works.
Comment 25 BJ Burg 2014-03-02 12:46:00 PST
*** Bug 119901 has been marked as a duplicate of this bug. ***