Bug 163048

Summary:

Modify check-webkit-style to prohibit sensitive phrases

Product:

WebKit

Reporter:

John Wilander <wilander>

Component:

Tools / Tests

Assignee:

John Wilander <wilander>

Status:

RESOLVED FIXED

Severity:

Normal

CC:

bfulgham, commit-queue, dbates, ddkilzer, glenn, lforschler, webkit-bug-importer, wilander

Priority:

Keywords:

InRadar

Version:

WebKit Nightly Build

Hardware:

Unspecified

OS:

Unspecified

Attachments:

Description	Flags
Patch	none
Patch	none
Patch	none

John Wilander

Reported 2016-10-06 11:35:08 PDT

Warn about security-sensitive terms in ChangeLog entries.

Attachments
Patch (9.63 KB, patch) 2016-10-06 11:42 PDT, John Wilander	no flags	Details Formatted Diff Diff
Patch (9.46 KB, patch) 2016-10-06 17:21 PDT, John Wilander	no flags	Details Formatted Diff Diff
Patch (9.38 KB, patch) 2016-10-06 17:26 PDT, John Wilander	no flags	Details Formatted Diff Diff
Show Obsolete (2) View All Add attachment proposed patch, testcase, etc.

John Wilander

Comment 1 2016-10-06 11:42:50 PDT

Created attachment 290839 [details] Patch

John Wilander

Comment 2 2016-10-06 11:44:16 PDT

rdar://problem/28289755

Brent Fulgham

Comment 3 2016-10-06 12:30:12 PDT

Comment on attachment 290839 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=290839&action=review This change looks fine to me. I'd like Dan or someone more familiar with webkitpy to see if there's anything missing. > Tools/Scripts/webkitpy/style/checkers/changelog.py:111 > + # bug, bypass, crash, dereference, error, failure, heap, leak, overflow, stack This is useful information, but I don't think it needs to be encoded as a comment in the file. Can you just mention this in the Bugzilla notes? > Tools/Scripts/webkitpy/style/checkers/changelog.py:114 > + # Make sure words are separated by a single space I don't think this comment is necessary. An even clearer way to express this idea would be to make a new function "collapseWhitespace" or "ensureOneSpaceBetweenWords" or something. > Tools/Scripts/webkitpy/style/checkers/changelog.py:143 > + "Please don't use the security-sensitive '" + ("terms " if more_than_one_unwanted_term_found else "term ") + all_unwanted_terms_used + "' in change logs or in commits.") This line was a little difficult to read. It seems like a "format_dangerous_terms_message" function would be nice here.

John Wilander

Comment 4 2016-10-06 13:02:16 PDT

Thanks, Brent! I'll wait for Dan's comments too.

Daniel Bates

Comment 5 2016-10-06 14:20:39 PDT

Comment on attachment 290839 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=290839&action=review I am worried about the false positive rate for this change. How did you come up with this list of "unwanted security terms"? Did you build it up from looking at the ChangeLog entries for security bug fixes? Do we have any metrics on the false positive rate of running this filter against the existing ChangeLog files? > Tools/Scripts/webkitpy/style/checkers/changelog.py:98 > + def check_for_unwanted_security_terms(self, first_line_checked, segments): The name segments is ambiguous. I suggest that we use the same name that the caller uses, entry_lines, to describe the array of lines of text from the change log entry. > Tools/Scripts/webkitpy/style/checkers/changelog.py:99 > + unwanted_security_terms = [ Maybe unwanted_phrases would be a more descriptive name for this variable as it contains phrases such as "buffer underrun". > Tools/Scripts/webkitpy/style/checkers/changelog.py:103 > + "memory corruption", "null dereference", "null pointer dereference", From my understanding we do not consider a null pointer deference as a security bug. So, I would remove the terms "null dereference" and "null pointer dereference". > Tools/Scripts/webkitpy/style/checkers/changelog.py:107 > + "fuzz", # Captures fuzzer, fuzzing, fuzztest This will also match the adjective "fuzzy" and "jsfunfuzz" - the name of a JavaScript fuzzer. I guess we do not mention either often enough. Minor: No change needed. I do not see the need to space out this comment and the one below such that they are both left aligned. > Tools/Scripts/webkitpy/style/checkers/changelog.py:115 > + trimmed_segments = [] Maybe a better name for this variable would be cleansed_lines. > Tools/Scripts/webkitpy/style/checkers/changelog.py:116 > + for untrimmed_segment in segments: I suggest that we rename untrimmed_segment to line to better describe what it is, a line from a ChangeLog entry. > Tools/Scripts/webkitpy/style/checkers/changelog.py:141 > + class FoundUnwantedSecurityTerm: > + def __init__(self, term, line_number): > + self.term = term > + self.line_number = line_number > + > + found_unwanted_security_terms = [] > + for unwanted_term in unwanted_security_terms: > + for segment_index, segment in enumerate(trimmed_segments): > + if (searchIgnorecase(unwanted_term, segment)): > + found_unwanted_security_terms.append(FoundUnwantedSecurityTerm(unwanted_term, first_line_checked + segment_index)) > + > + if len(found_unwanted_security_terms) > 0: > + all_unwanted_terms_used = "" > + first_line_number_with_unwanted_term = maxsize > + more_than_one_unwanted_term_found = False > + for found_unwanted_security_term in found_unwanted_security_terms: > + first_line_number_with_unwanted_term = min(first_line_number_with_unwanted_term, found_unwanted_security_term.line_number) > + if (all_unwanted_terms_used == ""): > + all_unwanted_terms_used = found_unwanted_security_term.term > + else: > + more_than_one_unwanted_term_found = True > + all_unwanted_terms_used += ", " + found_unwanted_security_term.term > + self.handle_style_error(first_line_number_with_unwanted_term, Making use of webkitpy.tool.grammar.pluralize() to do the pluralization logic of "term", and removing the remark "or in commits" from the error message (*), I would write this code as: # FIXME: Handle hyphens used to divide words. unwanted_phrase_and_line_number_pairs = [] for i, line in enumerate(cleansed_lines): for phrase in unwanted_phrases: if (searchIgnorecase(phrase, line)): line_number = first_line_checked + i unwanted_phrase_and_line_number_pairs.append((phrase, line_number)) if not unwanted_phrase_and_line_number_pairs: return unwanted_phrases = [] for phrase, line_number in unwanted_phrase_and_line_number_pairs: unwanted_phrases.append(phrase) first_line_number_with_unwanted_phrase = unwanted_phrase_and_line_number_pairs[0][1] self.handle_style_error(first_line_number_with_unwanted_phrase, "changelog/unwantedsecurityterms", 3, "Please don't use the following security-sensitive {} in change logs: {}".format(pluralize(len(unwanted_phrase_and_line_number_pairs), "term"), " ".join(unwanted_phrases)) Notice that we do not need to use sys.maxsize. (*) It seem unnecessary to add the remark "or in commits" as we assume a committer knows the project conventions such that they will always use the ChangeLog message as their commit message. For commit queue landed changes, the commit queue does this automatically. > Tools/Scripts/webkitpy/style/checkers/changelog.py:142 > + "changelog/unwantedsecurityterms", 3, Do we need to add the category "changelog/unwantedsecurityterms" to ChangeLogChecker.categories? > Tools/Scripts/webkitpy/style/checkers/changelog_unittest.py:190 > + ' A buffer overflow existed in code.') I suggest that we add a newline character to the end of this line to more closely match how a real ChangeLog entry would be formatted. > Tools/Scripts/webkitpy/style/checkers/changelog_unittest.py:202 > + ' This patch addresses a great number of issues.' > + ' Therefore there is a lot to say here about a great' > + ' many things such as the weather, the latest and' > + ' greatest in sports, and the mood of fiction' > + ' characters. Anyway the patch fixes a null pointer' > + ' dereference which is not good. Well, it is good' > + ' that it is fixed but not good that it existed.') I suggest that we add newline characters to the end of each line in this section to make it straightforward to reason about the correctness of this test. > Tools/Scripts/webkitpy/style/checkers/changelog_unittest.py:209 > + ' This patch addresses a pretty bad buffer ' > + ' overflow in ') Ditto. > Tools/Scripts/webkitpy/style/checkers/changelog_unittest.py:221 > + ' Bug found through fuzzing.') I suggest that we add a newline character to the end of this line to more closely match how a real ChangeLog entry would be formatted. > Tools/Scripts/webkitpy/style/checkers/changelog_unittest.py:228 > + '\n' I suggest that we either move this '\n' to the end of the previous line (line 227) or add a newline character to the end of line 227 to make it straightforward to reason about the correctness of this test. > Tools/Scripts/webkitpy/style/checkers/changelog_unittest.py:230 > + '\n' I suggest that we either move this '\n' to the end of the previous line (line 229) or add a newline character to the end of line 229 to make it straightforward to reason about the correctness of this test. > Tools/Scripts/webkitpy/style/checkers/changelog_unittest.py:240 > + '\n' I suggest that we either move this '\n' to the end of the previous line (line 239) or add a newline character to the end of line 239 to make it straightforward to reason about the correctness of this test. > Tools/Scripts/webkitpy/style/checkers/changelog_unittest.py:242 > + '\n' I suggest that we either move this '\n' to the end of the previous line (line 241) or add a newline character to the end of line 241 to make it straightforward to reason about the correctness of this test.

Brent Fulgham

Comment 6 2016-10-06 14:39:10 PDT

Comment on attachment 290839 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=290839&action=review >> Tools/Scripts/webkitpy/style/checkers/changelog.py:103 >> + "memory corruption", "null dereference", "null pointer dereference", > > From my understanding we do not consider a null pointer deference as a security bug. So, I would remove the terms "null dereference" and "null pointer dereference". That's a good point. I agree we should remove the terms "null pointer" and "null pointer dereference" from the list. >> Tools/Scripts/webkitpy/style/checkers/changelog.py:107 >> + "fuzz", # Captures fuzzer, fuzzing, fuzztest > > This will also match the adjective "fuzzy" and "jsfunfuzz" - the name of a JavaScript fuzzer. I guess we do not mention either often enough. > > Minor: No change needed. I do not see the need to space out this comment and the one below such that they are both left aligned. We probably should not match "fuzzy", but we should probably avoid "jsfunfuzz", since anything related to jsfunfuzz is likely to be of interest to people looking for vulnerabilities.

Daniel Bates

Comment 7 2016-10-06 14:51:56 PDT

(In reply to comment #5) > # FIXME: Handle hyphens used to divide words. > unwanted_phrase_and_line_number_pairs = [] > for i, line in enumerate(cleansed_lines): > for phrase in unwanted_phrases: > if (searchIgnorecase(phrase, line)): > line_number = first_line_checked + i > unwanted_phrase_and_line_number_pairs.append((phrase, > line_number)) > > if not unwanted_phrase_and_line_number_pairs: > return > > unwanted_phrases = [] > for phrase, line_number in unwanted_phrase_and_line_number_pairs: > unwanted_phrases.append(phrase) > first_line_number_with_unwanted_phrase = > unwanted_phrase_and_line_number_pairs[0][1] > self.handle_style_error(first_line_number_with_unwanted_phrase, > "changelog/unwantedsecurityterms", 3, > "Please don't use the following security-sensitive > {} in change logs: > {}".format(pluralize(len(unwanted_phrase_and_line_number_pairs), "term"), " > ".join(unwanted_phrases)) Disregard this code snippet as it does not find unwanted phrases across line breaks.

John Wilander

Comment 8 2016-10-06 17:21:57 PDT

Created attachment 290873 [details] Patch

John Wilander

Comment 9 2016-10-06 17:24:14 PDT

Brent and Dan, thanks for both of your reviews. I have made the changes you suggested and simplified the code. I also searched through all of WebCore's change logs to see if we had phrases that would cause noise/false positives. Those phrases were removed from the blacklist.

John Wilander

Comment 10 2016-10-06 17:26:04 PDT

Created attachment 290874 [details] Patch

John Wilander

Comment 11 2016-10-06 17:26:32 PDT

Removed unnecessary import statement.

John Wilander

Comment 12 2016-10-10 16:43:33 PDT

Anyone up for a review of this?

Brent Fulgham

Comment 13 2016-10-11 09:18:17 PDT

Comment on attachment 290874 [details] Patch Looks good. Let's see how this goes!

WebKit Commit Bot

Comment 14 2016-10-11 09:41:09 PDT

Comment on attachment 290874 [details] Patch Clearing flags on attachment: 290874 Committed r207146: <http://trac.webkit.org/changeset/207146>

WebKit Commit Bot

Comment 15 2016-10-11 09:41:13 PDT

All reviewed patches have been landed. Closing bug.

Daniel Bates

Comment 16 2016-10-11 09:54:04 PDT

Comment on attachment 290874 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=290874&action=review > Tools/Scripts/webkitpy/style/checkers/changelog.py:121 > + lines_with_single_spaces = [] > + for line in lines: > + lines_with_single_spaces.append(" ".join(line.split())) > + > + found_unwanted_security_phrases = [] > + last_index = len(lines_with_single_spaces) - 1 > + first_line_number_with_unwanted_phrase = maxsize > + for unwanted_phrase in unwanted_security_phrases: > + for line_index, line in enumerate(lines_with_single_spaces): > + next_line = "" if line_index >= last_index else lines_with_single_spaces[line_index + 1] Is there much value in finding the line number of the first unwanted phrase given that the emitted error contains the path to the offending ChangeLog file and we list the found phrases? We can simplify this code by concatenating the lines in lines_with_single_spaces together such that each line is delimited from the next line by a single space character (and, optionally, removing hyphens), call this changelog_entry. Then search for unwanted phrase p in changelog_entry for each p in unwanted_security_phrases, appending p to found_unwanted_security_phrases if changelog_entry contains p. If you have your heart set on providing the line number of the first occurrence of an unwanted phrase then we can build up a list of the starting positions of each line in changelog_entry when building up changelog_entry and use this list to map back the starting position of the first found unwanted phrase to its line number. > Tools/Scripts/webkitpy/style/checkers/changelog.py:129 > + "Please consider whether the use of security-sensitive phrasing could help someone exploit WebKit: {}".format(", ".join(found_unwanted_security_phrases))) The presence of these unwanted phrases in a ChangeLog message does not necessarily help someone "exploit WebKit" as much as they attract attention to a security sensitive change. I suggest that we phrase this message in terms of attention attracting capabilities of using these words. Maybe a message of the form: "ChangeLog contains the words: X, Y. Avoid using these words and analogous language as they can attract unnecessary attention to the security implications of this change."

Note You need to log in before you can comment on or make changes to this bug.