Bug 163048

Summary: Modify check-webkit-style to prohibit sensitive phrases
Product: WebKit Reporter: John Wilander <wilander>
Component: Tools / TestsAssignee: John Wilander <wilander>
Status: RESOLVED FIXED    
Severity: Normal CC: bfulgham, commit-queue, dbates, ddkilzer, glenn, lforschler, webkit-bug-importer, wilander
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Patch
none
Patch none

Description John Wilander 2016-10-06 11:35:08 PDT
Warn about security-sensitive terms in ChangeLog entries.
Comment 1 John Wilander 2016-10-06 11:42:50 PDT
Created attachment 290839 [details]
Patch
Comment 2 John Wilander 2016-10-06 11:44:16 PDT
rdar://problem/28289755
Comment 3 Brent Fulgham 2016-10-06 12:30:12 PDT
Comment on attachment 290839 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=290839&action=review

This change looks fine to me. I'd like Dan or someone more familiar with webkitpy to see if there's anything missing.

> Tools/Scripts/webkitpy/style/checkers/changelog.py:111
> +           # bug, bypass, crash, dereference, error, failure, heap, leak, overflow, stack

This is useful information, but I don't think it needs to be encoded as a comment in the file. Can you just mention this in the Bugzilla notes?

> Tools/Scripts/webkitpy/style/checkers/changelog.py:114
> +        # Make sure words are separated by a single space

I don't think this comment is necessary. An even clearer way to express this idea would be to make a new function "collapseWhitespace" or "ensureOneSpaceBetweenWords" or something.

> Tools/Scripts/webkitpy/style/checkers/changelog.py:143
> +                                    "Please don't use the security-sensitive '" + ("terms " if more_than_one_unwanted_term_found else "term ") + all_unwanted_terms_used + "' in change logs or in commits.")

This line was a little difficult to read. It seems like a "format_dangerous_terms_message" function would be nice here.
Comment 4 John Wilander 2016-10-06 13:02:16 PDT
Thanks, Brent! I'll wait for Dan's comments too.
Comment 5 Daniel Bates 2016-10-06 14:20:39 PDT
Comment on attachment 290839 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=290839&action=review

I am worried about the false positive rate for this change. How did you come up with this list of "unwanted security terms"? Did you build it up from looking at the ChangeLog entries for security bug fixes? Do we have any metrics on the false positive rate of running this filter against the existing ChangeLog files?

> Tools/Scripts/webkitpy/style/checkers/changelog.py:98
> +    def check_for_unwanted_security_terms(self, first_line_checked, segments):

The name segments is ambiguous. I suggest that we use the same name that the caller uses, entry_lines, to describe the array of lines of text from the change log entry.

> Tools/Scripts/webkitpy/style/checkers/changelog.py:99
> +        unwanted_security_terms = [

Maybe unwanted_phrases would be a more descriptive name for this variable as it contains phrases such as "buffer underrun".

> Tools/Scripts/webkitpy/style/checkers/changelog.py:103
> +           "memory corruption", "null dereference", "null pointer dereference",

From my understanding we do not consider a null pointer deference as a security bug. So, I would remove the terms "null dereference" and "null pointer dereference".

> Tools/Scripts/webkitpy/style/checkers/changelog.py:107
> +           "fuzz",         # Captures fuzzer, fuzzing, fuzztest

This will also match the adjective "fuzzy" and "jsfunfuzz" - the name of a JavaScript fuzzer. I guess we do not mention either often enough.

Minor: No change needed. I do not see the need to space out this comment and the one below such that they are both left aligned.

> Tools/Scripts/webkitpy/style/checkers/changelog.py:115
> +        trimmed_segments = []

Maybe a better name for this variable would be cleansed_lines.

> Tools/Scripts/webkitpy/style/checkers/changelog.py:116
> +        for untrimmed_segment in segments:

I suggest that we rename untrimmed_segment to line to better describe what it is, a line from a ChangeLog entry.

> Tools/Scripts/webkitpy/style/checkers/changelog.py:141
> +        class FoundUnwantedSecurityTerm:
> +            def __init__(self, term, line_number):
> +                self.term = term
> +                self.line_number = line_number
> +
> +        found_unwanted_security_terms = []
> +        for unwanted_term in unwanted_security_terms:
> +            for segment_index, segment in enumerate(trimmed_segments):
> +                if (searchIgnorecase(unwanted_term, segment)):
> +                    found_unwanted_security_terms.append(FoundUnwantedSecurityTerm(unwanted_term, first_line_checked + segment_index))
> +
> +        if len(found_unwanted_security_terms) > 0:
> +            all_unwanted_terms_used = ""
> +            first_line_number_with_unwanted_term = maxsize
> +            more_than_one_unwanted_term_found = False
> +            for found_unwanted_security_term in found_unwanted_security_terms:
> +                first_line_number_with_unwanted_term = min(first_line_number_with_unwanted_term, found_unwanted_security_term.line_number)
> +                if (all_unwanted_terms_used == ""):
> +                    all_unwanted_terms_used = found_unwanted_security_term.term
> +                else:
> +                    more_than_one_unwanted_term_found = True
> +                    all_unwanted_terms_used += ", " + found_unwanted_security_term.term
> +            self.handle_style_error(first_line_number_with_unwanted_term,

Making use of webkitpy.tool.grammar.pluralize() to do the pluralization logic of "term", and removing the remark "or in commits" from the error message (*), I would write this code as:

# FIXME: Handle hyphens used to divide words.
unwanted_phrase_and_line_number_pairs = []
for i, line in enumerate(cleansed_lines):
    for phrase in unwanted_phrases:
        if (searchIgnorecase(phrase, line)):
            line_number = first_line_checked + i
            unwanted_phrase_and_line_number_pairs.append((phrase, line_number))

if not unwanted_phrase_and_line_number_pairs:
    return

unwanted_phrases = []
for phrase, line_number in unwanted_phrase_and_line_number_pairs:
    unwanted_phrases.append(phrase)
first_line_number_with_unwanted_phrase = unwanted_phrase_and_line_number_pairs[0][1]
self.handle_style_error(first_line_number_with_unwanted_phrase,
                        "changelog/unwantedsecurityterms", 3,
                        "Please don't use the following security-sensitive {} in change logs: {}".format(pluralize(len(unwanted_phrase_and_line_number_pairs), "term"), " ".join(unwanted_phrases))

Notice that we do not need to use sys.maxsize.

(*) It seem unnecessary to add the remark "or in commits" as we assume a committer knows the project conventions such that they will always use the ChangeLog message as their commit message. For commit queue landed changes, the commit queue does this automatically.

> Tools/Scripts/webkitpy/style/checkers/changelog.py:142
> +                                    "changelog/unwantedsecurityterms", 3,

Do we need to add the category "changelog/unwantedsecurityterms" to ChangeLogChecker.categories?

> Tools/Scripts/webkitpy/style/checkers/changelog_unittest.py:190
> +                          '        A buffer overflow existed in code.')

I suggest that we add a newline character to the end of this line to more closely match how a real ChangeLog entry would be formatted.

> Tools/Scripts/webkitpy/style/checkers/changelog_unittest.py:202
> +                          '        This patch addresses a great number of issues.'
> +                          '        Therefore there is a lot to say here about a great'
> +                          '        many things such as the weather, the latest and'
> +                          '        greatest in sports, and the mood of fiction'
> +                          '        characters. Anyway the patch fixes a null pointer'
> +                          '        dereference which is not good. Well, it is good'
> +                          '        that it is fixed but not good that it existed.')

I suggest that we add newline characters to the end of each line in this section to make it straightforward to reason about the correctness of this test.

> Tools/Scripts/webkitpy/style/checkers/changelog_unittest.py:209
> +                          '        This patch addresses a pretty bad buffer '
> +                          '        overflow in ')

Ditto.

> Tools/Scripts/webkitpy/style/checkers/changelog_unittest.py:221
> +                          '        Bug found through fuzzing.')

I suggest that we add a newline character to the end of this line to more closely match how a real ChangeLog entry would be formatted.

> Tools/Scripts/webkitpy/style/checkers/changelog_unittest.py:228
> +                          '\n'

I suggest that we either move this '\n' to the end of the previous line (line 227) or add a newline character to the end of line 227 to make it straightforward to reason about the correctness of this test.

> Tools/Scripts/webkitpy/style/checkers/changelog_unittest.py:230
> +                          '\n'

I suggest that we either move this '\n' to the end of the previous line (line 229) or add a newline character to the end of line 229 to make it straightforward to reason about the correctness of this test.

> Tools/Scripts/webkitpy/style/checkers/changelog_unittest.py:240
> +                          '\n'

I suggest that we either move this '\n' to the end of the previous line (line 239) or add a newline character to the end of line 239 to make it straightforward to reason about the correctness of this test.

> Tools/Scripts/webkitpy/style/checkers/changelog_unittest.py:242
> +                          '\n'

I suggest that we either move this '\n' to the end of the previous line (line 241) or add a newline character to the end of line 241 to make it straightforward to reason about the correctness of this test.
Comment 6 Brent Fulgham 2016-10-06 14:39:10 PDT
Comment on attachment 290839 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=290839&action=review

>> Tools/Scripts/webkitpy/style/checkers/changelog.py:103
>> +           "memory corruption", "null dereference", "null pointer dereference",
> 
> From my understanding we do not consider a null pointer deference as a security bug. So, I would remove the terms "null dereference" and "null pointer dereference".

That's a good point. I agree we should remove the terms "null pointer" and "null pointer dereference" from the list.

>> Tools/Scripts/webkitpy/style/checkers/changelog.py:107
>> +           "fuzz",         # Captures fuzzer, fuzzing, fuzztest
> 
> This will also match the adjective "fuzzy" and "jsfunfuzz" - the name of a JavaScript fuzzer. I guess we do not mention either often enough.
> 
> Minor: No change needed. I do not see the need to space out this comment and the one below such that they are both left aligned.

We probably should not match "fuzzy", but we should probably avoid "jsfunfuzz", since anything related to jsfunfuzz is likely to be of interest to people looking for vulnerabilities.
Comment 7 Daniel Bates 2016-10-06 14:51:56 PDT
(In reply to comment #5)
> # FIXME: Handle hyphens used to divide words.
> unwanted_phrase_and_line_number_pairs = []
> for i, line in enumerate(cleansed_lines):
>     for phrase in unwanted_phrases:
>         if (searchIgnorecase(phrase, line)):
>             line_number = first_line_checked + i
>             unwanted_phrase_and_line_number_pairs.append((phrase,
> line_number))
> 
> if not unwanted_phrase_and_line_number_pairs:
>     return
> 
> unwanted_phrases = []
> for phrase, line_number in unwanted_phrase_and_line_number_pairs:
>     unwanted_phrases.append(phrase)
> first_line_number_with_unwanted_phrase =
> unwanted_phrase_and_line_number_pairs[0][1]
> self.handle_style_error(first_line_number_with_unwanted_phrase,
>                         "changelog/unwantedsecurityterms", 3,
>                         "Please don't use the following security-sensitive
> {} in change logs:
> {}".format(pluralize(len(unwanted_phrase_and_line_number_pairs), "term"), "
> ".join(unwanted_phrases))

Disregard this code snippet as it does not find unwanted phrases across line breaks.
Comment 8 John Wilander 2016-10-06 17:21:57 PDT
Created attachment 290873 [details]
Patch
Comment 9 John Wilander 2016-10-06 17:24:14 PDT
Brent and Dan, thanks for both of your reviews. I have made the changes you suggested and simplified the code.

I also searched through all of WebCore's change logs to see if we had phrases that would cause noise/false positives. Those phrases were removed from the blacklist.
Comment 10 John Wilander 2016-10-06 17:26:04 PDT
Created attachment 290874 [details]
Patch
Comment 11 John Wilander 2016-10-06 17:26:32 PDT
Removed unnecessary import statement.
Comment 12 John Wilander 2016-10-10 16:43:33 PDT
Anyone up for a review of this?
Comment 13 Brent Fulgham 2016-10-11 09:18:17 PDT
Comment on attachment 290874 [details]
Patch

Looks good. Let's see how this goes!
Comment 14 WebKit Commit Bot 2016-10-11 09:41:09 PDT
Comment on attachment 290874 [details]
Patch

Clearing flags on attachment: 290874

Committed r207146: <http://trac.webkit.org/changeset/207146>
Comment 15 WebKit Commit Bot 2016-10-11 09:41:13 PDT
All reviewed patches have been landed.  Closing bug.
Comment 16 Daniel Bates 2016-10-11 09:54:04 PDT
Comment on attachment 290874 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=290874&action=review

> Tools/Scripts/webkitpy/style/checkers/changelog.py:121
> +        lines_with_single_spaces = []
> +        for line in lines:
> +            lines_with_single_spaces.append(" ".join(line.split()))
> +
> +        found_unwanted_security_phrases = []
> +        last_index = len(lines_with_single_spaces) - 1
> +        first_line_number_with_unwanted_phrase = maxsize
> +        for unwanted_phrase in unwanted_security_phrases:
> +            for line_index, line in enumerate(lines_with_single_spaces):
> +                next_line = "" if line_index >= last_index else lines_with_single_spaces[line_index + 1]

Is there much value in finding the line number of the first unwanted phrase given that the emitted error contains the path to the offending ChangeLog file and we list the found phrases?

We can simplify this code by concatenating the lines in lines_with_single_spaces together such that each line is delimited from the next line by a single space character (and, optionally, removing hyphens), call this changelog_entry. Then search for unwanted phrase p in changelog_entry for each p in unwanted_security_phrases, appending p to found_unwanted_security_phrases if changelog_entry contains p. If you have your heart set on providing the line number of the first occurrence of an unwanted phrase then we can build up a list of the starting positions of each line in changelog_entry when building up changelog_entry and use this list to map back the starting position of the first found unwanted phrase to its line number.

> Tools/Scripts/webkitpy/style/checkers/changelog.py:129
> +                                    "Please consider whether the use of security-sensitive phrasing could help someone exploit WebKit: {}".format(", ".join(found_unwanted_security_phrases)))

The presence of these unwanted phrases in a ChangeLog message does not necessarily help someone "exploit WebKit" as much as they attract attention to a security sensitive change. I suggest that we phrase this message in terms of attention attracting capabilities of using these words. Maybe a message of the form: "ChangeLog contains the words: X, Y. Avoid using these words and analogous language as they can attract unnecessary attention to the security implications of this change."