Bug 214954

Summary: check-webkit-style should enforce acronym capitalization at start/end of an identifier
Product: WebKit Reporter: David Kilzer (:ddkilzer) <ddkilzer>
Component: Tools / TestsAssignee: David Kilzer (:ddkilzer) <ddkilzer>
Status: RESOLVED FIXED    
Severity: Normal CC: darin, ews-watchlist, glenn, jbedard, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: Other   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugs.webkit.org/show_bug.cgi?id=214756
Bug Depends on:    
Bug Blocks: 215026    
Attachments:
Description Flags
Patch v1 none

Description David Kilzer (:ddkilzer) 2020-07-29 18:27:58 PDT
check-webkit-style should enforce acronym capitalization at start/end of an identifier.

Specifically, for "URL".  See Bug 214756 Comment #19.
Comment 1 David Kilzer (:ddkilzer) 2020-07-29 18:30:13 PDT
We can now find the incorrectly capitalized identifiers in DumpRenderTree.mm:

$ ./Tools/Scripts/check-webkit-style --filter-rules=-,+readability/naming/acronym Tools/DumpRenderTree/mac/DumpRenderTree.mm 
ERROR: Tools/DumpRenderTree/mac/DumpRenderTree.mm:297:  The identifier name "URLString" starts with a acronym that is not all lowercase.  [readability/naming/acronym] [5]
ERROR: Tools/DumpRenderTree/mac/DumpRenderTree.mm:1648:  The identifier name "testPathOrUrl" ends with a acronym that is not all uppercase.  [readability/naming/acronym] [5]
ERROR: Tools/DumpRenderTree/mac/DumpRenderTree.mm:1650:  The identifier name "localPathOrUrl" ends with a acronym that is not all uppercase.  [readability/naming/acronym] [5]
Total errors found: 3 in 1 files

And verify that Source/WTF has no incorrectly named variables (might be false negatives, but no false positives):

$ ./Tools/Scripts/check-webkit-style --filter-rules=-,+readability/naming/acronym Source/WTF
Total errors found: 0 in 828 files
Comment 2 David Kilzer (:ddkilzer) 2020-07-29 21:09:07 PDT
Created attachment 405543 [details]
Patch v1
Comment 3 David Kilzer (:ddkilzer) 2020-07-29 21:52:30 PDT
BTW, I don't plan on running the script on existing code and fixing issues right away since I have other priorities to work on.
Comment 4 Jonathan Bedard 2020-07-30 07:28:47 PDT
Comment on attachment 405543 [details]
Patch v1

View in context: https://bugs.webkit.org/attachment.cgi?id=405543&action=review

> Tools/Scripts/webkitpy/style/checkers/cpp.py:1820
> +    # FIXME: Hard to check middle words without knowing that the word to the left doesn't end with an acronym.

We already have a list of acronyms, can't we just do 'starts with' iterating through our list?
Comment 5 David Kilzer (:ddkilzer) 2020-07-30 10:20:22 PDT
Comment on attachment 405543 [details]
Patch v1

View in context: https://bugs.webkit.org/attachment.cgi?id=405543&action=review

> Tools/Scripts/webkitpy/style/checkers/cpp.py:1805
> +    acronyms = '|'.join(['MIME', 'URL'])

@Jonathan: This list of acronyms could be replaced by the other list you mentioned.  Can you tell me where the list is defined?

>> Tools/Scripts/webkitpy/style/checkers/cpp.py:1820
>> +    # FIXME: Hard to check middle words without knowing that the word to the left doesn't end with an acronym.
> 
> We already have a list of acronyms, can't we just do 'starts with' iterating through our list?

Where is the list of acronyms?  I wasn't aware of it until you mentioned it.  (See previous comment.)

Also, I'm not sure how a list of acronyms would help with this FIXME.  The case I'm concerned about is something (made-up) like:

    canUnfurlFlag
    canCurlHair

Searching the middle of a variable for "url" (in this case) might return a false positive, which is what the comment was about.

I was thinking more about this after posting the patch, and I think a better algorithm may be to:

- Split camelCase identifier into individual words using the start of a capital letter followed by a lowercase letter (after the initial word which would be all-uppercase or all-lowercase).
- Compare each individual word in the list to a list of acronyms, and determine whether its case is wrong whether it's first or not-first in the list.

The algorithm wouldn't be perfect, but would probably do a better job (and be easier to understand) than this code:

- HTMLDOMDocument => (HTMLDOM, Document)
- canUnfurlFlag => (can, Unfurl, Flag)
- canCurlHair => (can Curl Hair)
- URLString => (URL, String)
- localPathOrUrl => (local, Path, Or, Url)
- MIMEStringForURL => (MIME, String, For, URL)

Now this algorithm assumes each word is spelled correctly, but we're not trying to solve correct spelling of camelCase words, so that's okay.
Comment 6 Jonathan Bedard 2020-07-30 10:27:29 PDT
Comment on attachment 405543 [details]
Patch v1

View in context: https://bugs.webkit.org/attachment.cgi?id=405543&action=review

>> Tools/Scripts/webkitpy/style/checkers/cpp.py:1805
>> +    acronyms = '|'.join(['MIME', 'URL'])
> 
> @Jonathan: This list of acronyms could be replaced by the other list you mentioned.  Can you tell me where the list is defined?

I was actually referring to this list here, although you explanation of the FIXME and false-positive matches makes it more clear the cases you had in mind.

>>> Tools/Scripts/webkitpy/style/checkers/cpp.py:1820
>>> +    # FIXME: Hard to check middle words without knowing that the word to the left doesn't end with an acronym.
>> 
>> We already have a list of acronyms, can't we just do 'starts with' iterating through our list?
> 
> Where is the list of acronyms?  I wasn't aware of it until you mentioned it.  (See previous comment.)
> 
> Also, I'm not sure how a list of acronyms would help with this FIXME.  The case I'm concerned about is something (made-up) like:
> 
>     canUnfurlFlag
>     canCurlHair
> 
> Searching the middle of a variable for "url" (in this case) might return a false positive, which is what the comment was about.
> 
> I was thinking more about this after posting the patch, and I think a better algorithm may be to:
> 
> - Split camelCase identifier into individual words using the start of a capital letter followed by a lowercase letter (after the initial word which would be all-uppercase or all-lowercase).
> - Compare each individual word in the list to a list of acronyms, and determine whether its case is wrong whether it's first or not-first in the list.
> 
> The algorithm wouldn't be perfect, but would probably do a better job (and be easier to understand) than this code:
> 
> - HTMLDOMDocument => (HTMLDOM, Document)
> - canUnfurlFlag => (can, Unfurl, Flag)
> - canCurlHair => (can Curl Hair)
> - URLString => (URL, String)
> - localPathOrUrl => (local, Path, Or, Url)
> - MIMEStringForURL => (MIME, String, For, URL)
> 
> Now this algorithm assumes each word is spelled correctly, but we're not trying to solve correct spelling of camelCase words, so that's okay.

I was referring to your above list. (seems like DOM and HTML should be in it, though)

If we want to do the more thorough camelCase splitting, that really feels like a different patch, since it's quite a bit more complicated then the startswith case I suggested.
Comment 7 David Kilzer (:ddkilzer) 2020-07-30 14:32:59 PDT
Comment on attachment 405543 [details]
Patch v1

View in context: https://bugs.webkit.org/attachment.cgi?id=405543&action=review

>>> Tools/Scripts/webkitpy/style/checkers/cpp.py:1805
>>> +    acronyms = '|'.join(['MIME', 'URL'])
>> 
>> @Jonathan: This list of acronyms could be replaced by the other list you mentioned.  Can you tell me where the list is defined?
> 
> I was actually referring to this list here, although you explanation of the FIXME and false-positive matches makes it more clear the cases you had in mind.

I should note that I wanted to start with a small list that was easy for others to add to.  I don't have time to check for false positives for a ton of acronyms up-front.

>>>> Tools/Scripts/webkitpy/style/checkers/cpp.py:1820
>>>> +    # FIXME: Hard to check middle words without knowing that the word to the left doesn't end with an acronym.
>>> 
>>> We already have a list of acronyms, can't we just do 'starts with' iterating through our list?
>> 
>> Where is the list of acronyms?  I wasn't aware of it until you mentioned it.  (See previous comment.)
>> 
>> Also, I'm not sure how a list of acronyms would help with this FIXME.  The case I'm concerned about is something (made-up) like:
>> 
>>     canUnfurlFlag
>>     canCurlHair
>> 
>> Searching the middle of a variable for "url" (in this case) might return a false positive, which is what the comment was about.
>> 
>> I was thinking more about this after posting the patch, and I think a better algorithm may be to:
>> 
>> - Split camelCase identifier into individual words using the start of a capital letter followed by a lowercase letter (after the initial word which would be all-uppercase or all-lowercase).
>> - Compare each individual word in the list to a list of acronyms, and determine whether its case is wrong whether it's first or not-first in the list.
>> 
>> The algorithm wouldn't be perfect, but would probably do a better job (and be easier to understand) than this code:
>> 
>> - HTMLDOMDocument => (HTMLDOM, Document)
>> - canUnfurlFlag => (can, Unfurl, Flag)
>> - canCurlHair => (can Curl Hair)
>> - URLString => (URL, String)
>> - localPathOrUrl => (local, Path, Or, Url)
>> - MIMEStringForURL => (MIME, String, For, URL)
>> 
>> Now this algorithm assumes each word is spelled correctly, but we're not trying to solve correct spelling of camelCase words, so that's okay.
> 
> I was referring to your above list. (seems like DOM and HTML should be in it, though)
> 
> If we want to do the more thorough camelCase splitting, that really feels like a different patch, since it's quite a bit more complicated then the startswith case I suggested.

Okay, I can do a follow-up patch.  This code seems like such a hack, though.  :)
Comment 8 EWS 2020-07-30 14:44:38 PDT
Committed r265096: <https://trac.webkit.org/changeset/265096>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 405543 [details].
Comment 9 Radar WebKit Bug Importer 2020-07-30 14:46:26 PDT
<rdar://problem/66348710>