Bug 180259 - Options::maximumInliningCallerSize does not meaningfully limit codeBlock size, and is cliff-like
Summary: Options::maximumInliningCallerSize does not meaningfully limit codeBlock size, and is cliff-like
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore
Version: WebKit Nightly Build
Hardware: Unspecified
OS: Unspecified
Importance: P2 Normal
Assignee: Robin Morisset
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-12-01 10:16 PST by Robin Morisset
Modified: 2019-10-07 16:44 PDT
CC List: 6 users

See Also:


Attachments
Patch (13.06 KB, patch)
2017-12-01 10:25 PST, Robin Morisset
no flags Details | Formatted Diff | Diff
Patch (13.06 KB, patch)
2017-12-01 10:33 PST, Robin Morisset
no flags Details | Formatted Diff | Diff
Patch (6.67 KB, patch)
2019-10-07 15:56 PDT, Robin Morisset
no flags Details | Formatted Diff | Diff

Description Robin Morisset 2017-12-01 10:16:31 PST
Currently, Options::maximumInliningCallerSize has a value of 10000, and has the following consequences:
- A function with an instructionCount() of 10001 can never have any non-intrinsic function inlined into it
- A function with an instructionCount() of 9999 can have just as much inlining as a function of size 10.

This cliff-like behavior is not only surprising, it also goes against the comment justifying this option:
/* Maximum size of a caller for enabling inlining. This is purely to protect us */
/* from super long compiles that take a lot of memory. */
since it puts no cap on how large a function can become through inlining, only on how large it can be before inlining makes it larger.

I would like to do two things about this:
- First make this option an actual inlining budget, so that if a function is of size 6000, only up to 4000 instruction bytes can be inlined into it (making the comment above true)
- Split this option into two, one for DFG compilation and one for FTL compilation. Since we care less about compile time in FTL than in DFG, we can give it a higher inlining budget.

My preliminary attempt at this seems to be performance-neutral on the old jsc-only benchmarks (Octane, Kraken, SunSpider), or at most a small non-significant speedup.
I would like to benchmark it against ARES/Speedometer to tune the default values for these options before landing it.
Comment 1 Robin Morisset 2017-12-01 10:25:18 PST
Created attachment 328115 [details]
Patch
Comment 2 Robin Morisset 2017-12-01 10:33:14 PST
Created attachment 328117 [details]
Patch
Comment 3 Mark Lam 2017-12-01 11:20:22 PST
Comment on attachment 328115 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=328115&action=review

> Source/JavaScriptCore/ChangeLog:15
> +        I split Options::maximumInliningCallerSize in two (DFG and FTL versions), and changed its behaviour.
> +        It now provides a cap on the maximum size that a function can reach through inlining, instead of being just a threshold below which inlining was unconstrained.
> +        This should help keep compile time/memory under control.
> +        Interestingly, the comment for this option already implied this behaviour.
> +
> +        This patch also does two very minor changes:
> +        - It removes CodeBlock::numberOfInstructions() which was a duplicate of CodeBlock::instructionCount
> +        - It makes the style of noticeIncomingCall more uniform.

I'm not convinced that this new inlining policy is an improvement.  Here's why: let's say you have a function like this:
    function foo() {
        bar(); // large codeBlock
        baz1(); // small codeBlock
        baz2(); // small codeBlock
        baz3(); // small codeBlock
    }

The original policy will reject bar() and inline baz1(), baz2(), and baz3().
The new policy may inline bar(), and as a consequence of exhausting the budget, now cease to inline baz1(), baz2(), and baz3().

It is not clear that the new policy is a win.  My intuition says that inlining small functions tend to be a win (e.g. getter/setters), while inlining large functions is a grab bag.  Please do some benchmarking with this new policy.

If this does lead to a progression, I wonder if we can do something about the pathology in my example above where the large inline target crowds out potentially more profitable ones.  Perhaps some weighting to ensure that no one inline target gets to hog the whole inline budget.  The larger the inline target, the greater the bias against it being inlined.  What do you think?

> Source/JavaScriptCore/bytecode/CodeBlock.cpp:2090
> +    if (callerCodeBlock->instructionCount() > Options::maximumInliningCallerSizeInFTL()) {

This implies that Options::maximumInliningCallerSizeInFTL() > Options::maximumInliningCallerSizeInDFG().  I suggest adding a check in Options.cpp's recomputeDependentOptions() to ensure that this is the case.  Either RELEASE_ASSERT it, or dataLog the discrepancy and cap the DFG limit to the FTL one.

> Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1490
>          codeBlock->m_shouldAlwaysBeInlined = false;
>          VERBOSE_LOG("    Failing because the caller is too large.\n");

Is this still true?  We're only failing to inline because it exceeds the current budget.  That doesn't mean that this target codeBlock can't be inlined in another function.

> Source/JavaScriptCore/runtime/Options.h:300
> +    v(unsigned, maximumInliningCallerSizeInDFG, 10000, Normal, nullptr) \
> +    v(unsigned, maximumInliningCallerSizeInFTL, 20000, Normal, nullptr) \

I suggest just renaming these to maximumDFGInliningCallerSize and maximumFTLInliningCallerSize respectively.
Comment 4 Robin Morisset 2017-12-01 12:02:54 PST
(In reply to Mark Lam from comment #3)
> Comment on attachment 328115 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=328115&action=review
> 
> > Source/JavaScriptCore/ChangeLog:15
> > +        I split Options::maximumInliningCallerSize in two (DFG and FTL versions), and changed its behaviour.
> > +        It now provides a cap on the maximum size that a function can reach through inlining, instead of being just a threshold below which inlining was unconstrained.
> > +        This should help keep compile time/memory under control.
> > +        Interestingly, the comment for this option already implied this behaviour.
> > +
> > +        This patch also does two very minor changes:
> > +        - It removes CodeBlock::numberOfInstructions() which was a duplicate of CodeBlock::instructionCount
> > +        - It makes the style of noticeIncomingCall more uniform.
> 
> I'm not convinced that this new inlining policy is an improvement.  Here's
> why: let's say you have a function like this:
>     function foo() {
>         bar(); // large codeBlock
>         baz1(); // small codeBlock
>         baz2(); // small codeBlock
>         baz3(); // small codeBlock
>     }
> 
> The original policy will reject bar() and inline baz1(), baz2(), and baz3().
> The new policy may inline bar(), and as a consequence of exhausting the
> budget, now cease to inline baz1(), baz2(), and baz3().
The original policy would have either inlined all 4, or rejected all 4 (ignoring the per-callee inlining balance/threshold, which is left untouched by this patch).
However, it is true that we can go from inlining all 4 to inlining only bar() and rejecting the rest because of having exhausted the budget. I agree it is something to look out for.

> It is not clear that the new policy is a win.  My intuition says that
> inlining small functions tend to be a win (e.g. getter/setters), while
> inlining large functions is a grab bag.  Please do some benchmarking with
> this new policy.
I will definitely do more benchmarking on this before landing anything. It is just delayed for a bit because of some build issues I am dealing with.

> If this does lead to a progression, I wonder if we can do something about
> the pathology in my example above where the large inline target crowds out
> potentially more profitable ones.  Perhaps some weighting to ensure that no
> one inline target gets to hog the whole inline budget.  The larger the
> inline target, the greater the bias against it being inlined.  What do you
> think?
Yes, this sounds good. I will test something along those lines.

> > Source/JavaScriptCore/bytecode/CodeBlock.cpp:2090
> > +    if (callerCodeBlock->instructionCount() > Options::maximumInliningCallerSizeInFTL()) {
> 
> This implies that Options::maximumInliningCallerSizeInFTL() >
> Options::maximumInliningCallerSizeInDFG().  I suggest adding a check in
> Options.cpp's recomputeDependentOptions() to ensure that this is the case. 
> Either RELEASE_ASSERT it, or dataLog the discrepancy and cap the DFG limit
> to the FTL one.
Good catch, I will fix this.

> > Source/JavaScriptCore/dfg/DFGByteCodeParser.cpp:1490
> >          codeBlock->m_shouldAlwaysBeInlined = false;
> >          VERBOSE_LOG("    Failing because the caller is too large.\n");
> 
> Is this still true?  We're only failing to inline because it exceeds the
> current budget.  That doesn't mean that this target codeBlock can't be
> inlined in another function.
It can be inlined into another function, but it should still be compiled independently, since we will be calling it (instead of inlining it). m_shouldAlwaysBeInlined tags a function that is never called directly (because it is always inlined) and so never needs to be compiled independently. I think it is appropriate to set it to false here.

> > Source/JavaScriptCore/runtime/Options.h:300
> > +    v(unsigned, maximumInliningCallerSizeInDFG, 10000, Normal, nullptr) \
> > +    v(unsigned, maximumInliningCallerSizeInFTL, 20000, Normal, nullptr) \
> 
> I suggest just renaming these to maximumDFGInliningCallerSize and
> maximumFTLInliningCallerSize respectively.
Thanks for the suggestion, I agree it sounds/looks nicer.
Comment 5 Robin Morisset 2019-10-07 15:56:50 PDT
Created attachment 380369 [details]
Patch

I've become interested in this again because there is a function in JetStream2/tagcloud-SP that frequently takes ~1s just for register allocation, and I suspect this is due to pathological inlining blowing it up.
So I've rebased the patch and sent it to our perf infrastructure, now that I know how to use it. I'll update this bug once I know whether it is a speedup as hoped.