Bug 27610 - Add check for line-breaking rule #3 to cpplint
Summary: Add check for line-breaking rule #3 to cpplint
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC All
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-07-23 11:19 PDT by Jakob Petsovits
Modified: 2009-07-23 15:51 PDT (History)
4 users (show)

See Also:


Attachments
Add check for line-breaking rule #3 to cpplint (8.57 KB, patch)
2009-07-23 11:38 PDT, Jakob Petsovits
manyoso: review-
Details | Formatted Diff | Diff
Add check for line-breaking rule #3 to cpplint (try 2) (11.21 KB, patch)
2009-07-23 14:42 PDT, Jakob Petsovits
no flags Details | Formatted Diff | Diff
Add check for line-breaking rule #3 to cpplint (13.48 KB, patch)
2009-07-23 15:35 PDT, Jakob Petsovits
manyoso: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Jakob Petsovits 2009-07-23 11:19:39 PDT
According to the styleguide:
"An else if statement should be written as an if statement when the prior if concludes with a return statement."

And yes, I've got a patch for that! See below.
Comment 1 Jakob Petsovits 2009-07-23 11:38:26 PDT
Created attachment 33347 [details]
Add check for line-breaking rule #3 to cpplint
Comment 2 Adam Treat 2009-07-23 12:14:09 PDT
Comment on attachment 33347 [details]
Add check for line-breaking rule #3 to cpplint

First, I want to say that this is awesome.  I wasn't sure we'd be able to easily check this style rule.

> +    readability/else_if

So you are adding an entirely new category.  Far enough.  At this point I want to say I think this test should check for the following situation too:

if (true) {
    foo();
    return;
} else {
    bar();
}

The style guide doesn't explicitly say it, but I think it is implied.  Why on earth would we allow the above, but deny for 'else if' ?

As such, I think the name of the category as well as the rest of the code should be changed to reflect this case as well and test cases added.

The style guide should probably be updated to make it explicit as well.

> +def check_else_if_with_prior_return(filename, clean_lines, line_number, error):

See above re: naming of method.

> +    elseif_match = match(r'(?P<elseif_indentation>\s*)(?P<brace>\}\s*)?else\s+if\s*\(', line)
> +    if not elseif_match:
> +        return
> +
> +    elseif_indentation = elseif_match.group('elseif_indentation')
> +    inner_indentation = elseif_indentation + ' ' * 4
> +    expect_closing_brace = elseif_match.start('brace') == -1

Ok, I follow.  You are going to rely upon indentation to tell the story.  This is going to produce false positives for places where the code is improperly indented and it is going to miss
a lot of cases too potentially.  That is ok, though as we understand cpplint has false positives and is only meant as a guide in addition to review.

> +        # Skip not only empty lines but also those with preprocessor directives
> +        # and goto labels.
> +        if current_line.strip() == '' or current_line.startswith('#') or match(r'\w+\s*:\s*$', current_line):
> +            continue

Perhaps it'd be good to break these regex's out where they are used in multiple places by cpplint.  RE_PATTERN_INCLUDE does this with good success.

Maybe RE_PATTERN_GOTO and RE_PATTERN_PREPROCESSOR too?  Good refactoring possibility to think about.

> +        # Skip lines with closing braces on the original indentation level.
> +        # Even though the styleguide says they should be on the same line as
> +        # the "else if" statement, we also want to check for instances where
> +        # the current code does not comply with the coding style. Thus, ignore
> +        # these lines and proceed to the line before that.
> +        if expect_closing_brace and current_line == elseif_indentation + '}':
> +            expect_closing_brace = False
> +            continue

After this you don't expect a closing brace, right?  Where is 'expect_closing_brace' used again?  I don't see it...  Is this variable necessary?  What happens when you expect a closing brace and it never comes because the prior if doesn't have braces?

> +        # As we're going up the lines, the first real statement to encounter
> +        # has to be a return statement - otherwise, this check doesn't apply.
> +        if not encountered_return:
> +            if current_line.startswith(inner_indentation + 'return ') or current_line.startswith(inner_indentation + 'return;'):
> +                encountered_return = True
> +                continue
> +            else:
> +                break

Why not just, "if current_line.startswith(inner_indentation + 'return')" ?

> +        # When code execution reaches this point, we have found a "return" line
> +        # as last statement of the previous block. Now we only need to make
> +        # sure that the block belongs to an "if", and we can throw an error.

What else is it going to belong to? an 'else if'? if it is an 'else if'  shouldn't we complain for that too?

    if (condition1) {
        foo1();
    } else if (condition2) {
        foo2();
        return;
    } else if (condition3 {
        foo3();
    } else {
        foo4();
    }

That should be rewritten too, no?

    if (!condition1 && condition2) {
        foo2();
        return;
    }

    if (condition1) {
        foo1();
    } else if (condition3 {
        foo3();
    } else {
        foo4();
    }

What do others think?

> +        # Skip lines with opening braces on the original indentation level,
> +        # similar to the closing braces check above.
> +        if current_line == elseif_indentation + '{':
> +            continue

I don't follow this...

> +        current_indentation_match = match(r'(?P<indentation>\s*)(?P<remaining_line>.*)$', current_line);
> +        current_indentation = current_indentation_match.group('indentation')
> +        remaining_line = current_indentation_match.group('remaining_line')

Guess you are looking for original 'if'...

> +        # Skip everything that's further indented than our "else if".
> +        if current_indentation.startswith(elseif_indentation) and current_indentation != elseif_indentation:
> +            continue
> +
> +        # So we've got a line with same (or less) indentation. Is it an "if"?
> +        # If yes: throw an error. If no: don't throw an error.
> +        # Whatever the outcome, this is the end of our loop.
> +        if match(r'if\s*\(', remaining_line):
> +            error(filename, line_number + line_offset, 'readability/else_if', 4,
> +                  'An else if statement should be written as an if statement when the prior if concludes with a return statement.')
> +        break

Can be greatly simplified if we also consider above I think.  Then you won't have to go searching for original if.

I'd like to get some others feedback if possible.
Comment 3 Adam Treat 2009-07-23 12:18:16 PDT
Also, I think your algorithm will fail for following example:

    if (condition) {
        foo();
        return m_foo
                   && m_bar
                   && m_baz;
    } else if (condition2)
        bar();

Am I right?
Comment 4 Jakob Petsovits 2009-07-23 12:26:37 PDT
re #3:

> Also, I think your algorithm will fail for following example:
> (snip)
> Am I right?

Yes. But that's pretty much unavoidable, if you consider the test case with the nested if. Would take lots more complicated code to distinguish between these two cases, and I'd prefer to keep it clear of false positives.
Comment 5 Jakob Petsovits 2009-07-23 13:01:58 PDT
(In reply to comment #2)
> (From update of attachment 33347 [details])
> First, I want to say that this is awesome.  I wasn't sure we'd be able to
> easily check this style rule.
> 
> > +    readability/else_if
> 
> So you are adding an entirely new category.  Far enough.

I just didn't find any existing one that was appropriate. Any suggestions for a better category name are highly appreciated.

> At this point I want
> to say I think this test should check for the following situation too:
> 
> if (true) {
>     foo();
>     return;
> } else {
>     bar();
> }
> 
> The style guide doesn't explicitly say it, but I think it is implied.
> Why on earth would we allow the above, but deny for 'else if' ?

Seems right to me. A confirmation from someone else would be nice though, and addition to the style guide even nicer.

> > +    elseif_match = match(r'(?P<elseif_indentation>\s*)(?P<brace>\}\s*)?else\s+if\s*\(', line)
> > +    if not elseif_match:
> > +        return
> > +
> > +    elseif_indentation = elseif_match.group('elseif_indentation')
> > +    inner_indentation = elseif_indentation + ' ' * 4
> > +    expect_closing_brace = elseif_match.start('brace') == -1
> 
> Ok, I follow.  You are going to rely upon indentation to tell the story.
> This is going to produce false positives for places where the code is
> improperly indented and it is going to miss a lot of cases too potentially. 
> That is ok, though as we understand cpplint has false positives and is
> only meant as a guide in addition to review.

Yep. This check relies on indentation as much as my other ones (namespace and switch checks) do, and if code is not indented correctly then there should probably another check catching that. I think the percentage of code doing both wrong at once will be pretty low in practice, so it probably shouldn't be that much of a problem.

> > +        # Skip not only empty lines but also those with preprocessor directives
> > +        # and goto labels.
> > +        if current_line.strip() == '' or current_line.startswith('#') or match(r'\w+\s*:\s*$', current_line):
> > +            continue
> 
> Perhaps it'd be good to break these regex's out where they are used in multiple
> places by cpplint.  RE_PATTERN_INCLUDE does this with good success.
> 
> Maybe RE_PATTERN_GOTO and RE_PATTERN_PREPROCESSOR too?  Good refactoring
> possibility to think about.

Apart from the fact that the goto label pattern in the namespace indentation check looks a bit differently because it checks on raw lines instead of cleaned ones, you've still got a point here. Worth investigating.

> > +        # Skip lines with closing braces on the original indentation level.
> > +        # Even though the styleguide says they should be on the same line as
> > +        # the "else if" statement, we also want to check for instances where
> > +        # the current code does not comply with the coding style. Thus, ignore
> > +        # these lines and proceed to the line before that.
> > +        if expect_closing_brace and current_line == elseif_indentation + '}':
> > +            expect_closing_brace = False
> > +            continue
> 
> After this you don't expect a closing brace, right?  Where is
> 'expect_closing_brace' used again?  I don't see it...  Is this variable
> necessary?  What happens when you expect a closing brace and it never comes
> because the prior if doesn't have braces?

It's used directly in the if that you quoted. Essentially, I came up with this in order to avoid triggering on code like this:

if (condition) {
    foo();
}
{
    return;
}

But looking at it again, it's not possible anyways to have an else if following that second block. So I'll get rid of this variable.

> > +        # As we're going up the lines, the first real statement to encounter
> > +        # has to be a return statement - otherwise, this check doesn't apply.
> > +        if not encountered_return:
> > +            if current_line.startswith(inner_indentation + 'return ') or current_line.startswith(inner_indentation + 'return;'):
> > +                encountered_return = True
> > +                continue
> > +            else:
> > +                break
> 
> Why not just, "if current_line.startswith(inner_indentation + 'return')" ?

if (condition)
    returnValue = foo;
else if (otherCondition)
    returnValue = bar;

return returnValue;

Yeah, probably pretty uncommon, but again, better less positives than more false positives.

> > +        # When code execution reaches this point, we have found a "return" line
> > +        # as last statement of the previous block. Now we only need to make
> > +        # sure that the block belongs to an "if", and we can throw an error.
> 
> What else is it going to belong to? an 'else if'? if it is an 'else if' 
> shouldn't we complain for that too?
> 
>     if (condition1) {
>         foo1();
>     } else if (condition2) {
>         foo2();
>         return;
>     } else if (condition3) {
>         foo3();
>     } else {
>         foo4();
>     }
> 
> That should be rewritten too, no?
> 
>     if (!condition1 && condition2) {
>         foo2();
>         return;
>     }
> 
>     if (condition1) {
>         foo1();
>     } else if (condition3) {
>         foo3();
>     } else {
>         foo4();
>     }
> 
> What do others think?

I don't think we should catch this case. On the one hand, it necessitates checking on condition1 twice instead of once. On the other hand, if the logic has more of these cases, the resulting code might be even harder to read as the original one with the simple else-if distinction - after all, the else-if makes it clear that all of the code belongs together logically. This should be up to the coder's judgment imho, I'd rather not check on this.

> > +        # Skip lines with opening braces on the original indentation level,
> > +        # similar to the closing braces check above.
> > +        if current_line == elseif_indentation + '{':
> > +            continue
> 
> I don't follow this...

if (condition)
{
    return;
}
else
    foo();

If I don't skip this line, the loop doesn't get all the way up to the if and therefore the check won't trigger. So I skip this line, and the check triggers.

> > +        current_indentation_match = match(r'(?P<indentation>\s*)(?P<remaining_line>.*)$', current_line);
> > +        current_indentation = current_indentation_match.group('indentation')
> > +        remaining_line = current_indentation_match.group('remaining_line')
> 
> Guess you are looking for original 'if'...
> 
> > +        # Skip everything that's further indented than our "else if".
> > +        if current_indentation.startswith(elseif_indentation) and current_indentation != elseif_indentation:
> > +            continue
> > +
> > +        # So we've got a line with same (or less) indentation. Is it an "if"?
> > +        # If yes: throw an error. If no: don't throw an error.
> > +        # Whatever the outcome, this is the end of our loop.
> > +        if match(r'if\s*\(', remaining_line):
> > +            error(filename, line_number + line_offset, 'readability/else_if', 4,
> > +                  'An else if statement should be written as an if statement when the prior if concludes with a return statement.')
> > +        break
> 
> Can be greatly simplified if we also consider above I think.
> Then you won't have to go searching for original if.

If we want the above, then yes. Considering that I disagree with it, let's have a look what others have to say.
Comment 6 David Levin 2009-07-23 13:31:26 PDT
I haven't looked over the code in any detail, but about the rule itself:

I believe the spirit of the rule applies to break/continue as well (not just return -- though return is the most common example).
Comment 7 Jakob Petsovits 2009-07-23 14:42:12 PDT
Created attachment 33379 [details]
Add check for line-breaking rule #3 to cpplint (try 2)

New version, incorporating most feedback from Adam Treat and Dave Levin, both from the comments in here and from comments in IRC.
Now checks for both "else" and "else if" statements, as well as "break" and "continue" in addition to "return".

This version does not trigger on exit statements in else-if blocks (only in if blocks) and also does not (yet?) unify regular expressions for goto labels (which was suggested by Adam). Apart from that, it's pretty neat though. Please have another look.
Comment 8 Jakob Petsovits 2009-07-23 15:35:59 PDT
Created attachment 33386 [details]
Add check for line-breaking rule #3 to cpplint

Third try. Changes this time:
- Adding goto as fourth exit statement that triggers this check.
- Updating not only the code but also comments to reflect the new situation with multiple exit statements.
- Make the (otherwise unrelated) label indentation check more permissive, because currently it doesn't allow for goto labels at the very start of a line.
- Even more unit tests. If I didn't know these are all necessary, I'd say it's too many of them. But no, these are all necessary.
Comment 9 Adam Treat 2009-07-23 15:45:06 PDT
Comment on attachment 33386 [details]
Add check for line-breaking rule #3 to cpplint

Ok, I think you've suffered enough given the test cases we're now seeing.  It looks pretty damn robust and well thoughtout.  Cheers!
Comment 10 Adam Treat 2009-07-23 15:51:10 PDT
Landed with r46292.