Bug 125881 - frameRegisterCount() should include maxFrameExtentForSlowPathCall.
Summary: frameRegisterCount() should include maxFrameExtentForSlowPathCall.
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Mark Lam
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-12-17 16:07 PST by Mark Lam
Modified: 2013-12-17 21:58 PST (History)
3 users (show)

See Also:


Attachments
the patch. (8.52 KB, patch)
2013-12-17 16:14 PST, Mark Lam
ggaren: review-
Details | Formatted Diff | Diff
patch 2: applied requested fixes. (8.10 KB, patch)
2013-12-17 17:29 PST, Mark Lam
ggaren: review+
Details | Formatted Diff | Diff
Patch for commit. Extra bits reviewed by msaboff. (12.77 KB, patch)
2013-12-17 18:45 PST, Mark Lam
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Mark Lam 2013-12-17 16:07:21 PST
In https://bugs.webkit.org/show_bug.cgi?id=125849#c8, Pizlo said, "The *entire point* of frameRegisterCount() is that it should already include maxFrameExtentForSlowPathCall."

Patch coming to make frameRegisterCount() include maxFrameExtentForSlowPathCall.
Comment 1 Mark Lam 2013-12-17 16:14:20 PST
Created attachment 219463 [details]
the patch.
Comment 2 Filip Pizlo 2013-12-17 16:26:09 PST
Comment on attachment 219463 [details]
the patch.

View in context: https://bugs.webkit.org/attachment.cgi?id=219463&action=review

> Source/JavaScriptCore/bytecode/CodeBlock.cpp:3376
> +        return jitCode()->dfgCommon()->frameRegisterCountWithSlowPathFrameExtent();
> +

This is wrong.  This doesn't actually change the DFG's stack layout behavior.  The DFG never calls CodeBlock::frameRegisterCount().  (Nor should it - the DFG knows that it's the DFG so it will call Graph::frameRegisterCount() directly.  Also, during DFG compilation we don't have a JITCode yet, so this method would just segfault.)

Just change Graph::frameRegisterCount() to include the parameter slow path thingy.  And make sure that it does the right thing for both JS calls and slow path calls.  Right now the DFG lays out a parameter area for itself already.  The only thing missing is that it only accounts for the number of parameter slots needed for JS->JS calls (Graph::m_parameterSlots).  So, just use something like std::max(m_parameterSlots, maxFrameExtentForSlowPathCall).  Of course there will probably be some alignment or other arithmetic.

> Source/JavaScriptCore/dfg/DFGCommonData.cpp:83
> +unsigned CommonData::frameRegisterCountWithSlowPathFrameExtent()
> +{
> +    size_t registerCount = frameRegisterCount + maxFrameExtentForSlowPathCallInRegisters;
> +    ASSERT(registerCount == WTF::roundUpToMultipleOf(stackAlignmentRegisters(), registerCount));
> +    return registerCount;
> +}
> +

Remove this.

> Source/JavaScriptCore/dfg/DFGCommonData.h:94
> +    unsigned frameRegisterCountWithSlowPathFrameExtent();
> +

Remove.

> Source/JavaScriptCore/jit/JIT.h:251
> +        static unsigned frameRegisterCountInBytesFor(CodeBlock* codeBlock)
>          {
> -            ASSERT(!(codeBlock->m_numCalleeRegisters & 1));
> -            return codeBlock->m_numCalleeRegisters;
> +            return -virtualRegisterForLocal(frameRegisterCountFor(codeBlock) - 1).offset() * sizeof(Register);
>          }

Why does this exist?  Why can't you just always say frameRegisterCountFor(codeBlock) * sizeof(Register)?
Comment 3 Geoffrey Garen 2013-12-17 16:59:12 PST
Comment on attachment 219463 [details]
the patch.

View in context: https://bugs.webkit.org/attachment.cgi?id=219463&action=review

> Source/JavaScriptCore/bytecode/CodeBlock.cpp:3378
> +        return jitCode()->dfgCommon()->frameRegisterCountWithSlowPathFrameExtent();
> +
>      case JITCode::FTLJIT:
>          return jitCode()->dfgCommon()->frameRegisterCount;

This smells fishy.

I don't think the DFG wants a frameRegisterCount, and a separate frameRegisterCountWithSlowPathFrameExtent helper function, where the former is an invitation to do things wrong, and the latter is the right answer. I think we just want to put the right answer into frameRegisterCount. My understanding is that's the whole point of that data member.

frameRegisterCount is already initialized for us. The bug you need to fix is that DFG::Graph::frameRegisterCount() does not add in the max slow path thingy. (This also means that the DFG is using the wrong value for SP right now.)

> Source/JavaScriptCore/jit/JIT.cpp:785
> +    ASSERT(!(codeBlock->m_numCalleeRegisters & 1));

Let's remove this.

> Source/JavaScriptCore/jit/JIT.h:250
> +            return -virtualRegisterForLocal(frameRegisterCountFor(codeBlock) - 1).offset() * sizeof(Register);

Instead of frameRegisterCountInBytes, it look like what we need is for VirtualRegister to provide an offsetInBytes() function.

We don't want to directly encode the "-" here, because that makes an assumption about the number line used by VirtualRegister, and the whole point of virtualRegisterForLocal is to encapsulate that number line.

> Source/JavaScriptCore/jit/JITOpcodes.cpp:640
> +    size_t frameExtent = JIT::frameRegisterCountInBytesFor(codeBlock());

Let's rename the LHS to match the RHS.
Comment 4 Filip Pizlo 2013-12-17 17:20:23 PST
(In reply to comment #3)
> (From update of attachment 219463 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=219463&action=review
> 
> > Source/JavaScriptCore/bytecode/CodeBlock.cpp:3378
> > +        return jitCode()->dfgCommon()->frameRegisterCountWithSlowPathFrameExtent();
> > +
> >      case JITCode::FTLJIT:
> >          return jitCode()->dfgCommon()->frameRegisterCount;
> 
> This smells fishy.
> 
> I don't think the DFG wants a frameRegisterCount, and a separate frameRegisterCountWithSlowPathFrameExtent helper function, where the former is an invitation to do things wrong, and the latter is the right answer. I think we just want to put the right answer into frameRegisterCount. My understanding is that's the whole point of that data member.

Yup.

> 
> frameRegisterCount is already initialized for us. The bug you need to fix is that DFG::Graph::frameRegisterCount() does not add in the max slow path thingy. (This also means that the DFG is using the wrong value for SP right now.)

Yes.

> 
> > Source/JavaScriptCore/jit/JIT.cpp:785
> > +    ASSERT(!(codeBlock->m_numCalleeRegisters & 1));
> 
> Let's remove this.
> 
> > Source/JavaScriptCore/jit/JIT.h:250
> > +            return -virtualRegisterForLocal(frameRegisterCountFor(codeBlock) - 1).offset() * sizeof(Register);
> 
> Instead of frameRegisterCountInBytes, it look like what we need is for VirtualRegister to provide an offsetInBytes() function.
> 
> We don't want to directly encode the "-" here, because that makes an assumption about the number line used by VirtualRegister, and the whole point of virtualRegisterForLocal is to encapsulate that number line.
> 
> > Source/JavaScriptCore/jit/JITOpcodes.cpp:640
> > +    size_t frameExtent = JIT::frameRegisterCountInBytesFor(codeBlock());
> 
> Let's rename the LHS to match the RHS.
Comment 5 Mark Lam 2013-12-17 17:29:44 PST
Created attachment 219479 [details]
patch 2: applied requested fixes.
Comment 6 Geoffrey Garen 2013-12-17 17:41:42 PST
Comment on attachment 219479 [details]
patch 2: applied requested fixes.

Let's remove the JSStack::CallerFrameAndPCSize line. The slow path helper, by definition, must include those. (That value seems to have been added in <http://trac.webkit.org/changeset/160506> as a crude estimate of maxFrameExtentForSlowPathCallInRegisters.)
Comment 7 Michael Saboff 2013-12-17 18:15:06 PST
(In reply to comment #6)
> (From update of attachment 219479 [details])
> Let's remove the JSStack::CallerFrameAndPCSize line. The slow path helper, by definition, must include those. (That value seems to have been added in <http://trac.webkit.org/changeset/160506> as a crude estimate of maxFrameExtentForSlowPathCallInRegisters.)

CallerFrameAndPCSize was added in <http://trac.webkit.org/changeset/160506> to account for the caller's frame pointer and return PC for outgoing calls.  If we remove CallerFrameAndPCSize from this place, then maxFrameExtentForSlowPathCallInRegisters should be adjusted up on all platforms to account for the same.  I believe that Mark will do this in a separate patch.
Comment 8 Filip Pizlo 2013-12-17 18:26:02 PST
Comment on attachment 219479 [details]
patch 2: applied requested fixes.

View in context: https://bugs.webkit.org/attachment.cgi?id=219479&action=review

I smell weirdness.  Can you comment on what is going on in the DFG with CallerFrameAndPCSize, and why the other engines don't do similar?

> Source/JavaScriptCore/dfg/DFGGraph.cpp:708
> +    unsigned maxOutgoingArgsSlots = std::max(m_parameterSlots, static_cast<unsigned>(JSStack::CallerFrameAndPCSize));

Isn't the point here that you're trying to account for the outgoing frame header size for calls on platforms that can pass all arguments in registers?

In that case, why isn't this just part of maxFrameExtentForSlowPathCallInRegisters?

And why doesn't the baseline JIT also use this as part of its frameRegisterCount() calculation?

> Source/JavaScriptCore/jit/JIT.cpp:787
> +    size_t registerCount = codeBlock->m_numCalleeRegisters + maxFrameExtentForSlowPathCallInRegisters;
> +    ASSERT(registerCount == WTF::roundUpToMultipleOf(stackAlignmentRegisters(), registerCount));
> +    return registerCount;

This feels like it should also take into account JSStack::CallerFrameAndPCSize, just like you're doing in the DFG.

Or, even better, neither should take it into account because it would be factored into maxFrameExtentForSlowPathCallInRegisters or some other constant.
Comment 9 Filip Pizlo 2013-12-17 18:27:26 PST
(In reply to comment #7)
> (In reply to comment #6)
> > (From update of attachment 219479 [details] [details])
> > Let's remove the JSStack::CallerFrameAndPCSize line. The slow path helper, by definition, must include those. (That value seems to have been added in <http://trac.webkit.org/changeset/160506> as a crude estimate of maxFrameExtentForSlowPathCallInRegisters.)
> 
> CallerFrameAndPCSize was added in <http://trac.webkit.org/changeset/160506> to account for the caller's frame pointer and return PC for outgoing calls.  If we remove CallerFrameAndPCSize from this place, then maxFrameExtentForSlowPathCallInRegisters should be adjusted up on all platforms to account for the same.  I believe that Mark will do this in a separate patch.

OK, but it would really be better to do this as part of this patch.  I don't like how this patch leaves the code in an inconsistent state.  It'll make revision archeology very confusing.
Comment 10 Mark Lam 2013-12-17 18:45:37 PST
Created attachment 219488 [details]
Patch for commit.  Extra bits reviewed by msaboff.
Comment 11 Filip Pizlo 2013-12-17 18:46:37 PST
Comment on attachment 219488 [details]
Patch for commit.  Extra bits reviewed by msaboff.

View in context: https://bugs.webkit.org/attachment.cgi?id=219488&action=review

> Source/JavaScriptCore/dfg/DFGGraph.cpp:708
> -    unsigned result = m_nextMachineLocal + std::max(m_parameterSlots, static_cast<unsigned>(JSStack::CallerFrameAndPCSize));
> +    unsigned result = m_nextMachineLocal + std::max(m_parameterSlots, static_cast<unsigned>(maxFrameExtentForSlowPathCallInRegisters));

OMG so much better!
Comment 12 Filip Pizlo 2013-12-17 18:46:50 PST
Comment on attachment 219488 [details]
Patch for commit.  Extra bits reviewed by msaboff.

r=me too
Comment 13 Mark Lam 2013-12-17 19:35:53 PST
Thanks for all the reviews.  Landed in r160745: <http://trac.webkit.org/r160745>.
Comment 14 Geoffrey Garen 2013-12-17 20:18:12 PST
> CallerFrameAndPCSize was added in <http://trac.webkit.org/changeset/160506> to account for the caller's frame pointer and return PC for outgoing calls.

Do you mean outgoing calls to VM helper functions, or outgoing calls to JavaScript functions?

If we have an outgoing call to a JavaScript function, m_parameterSlots should be non-0, and maxing with CallerFrameAndPCSize should be a no-op. Also, it would be supremely weird to encode the stack frame needs of a JavaScript function call in a variable named maxFrameExtentForSlowPathCall.

If we have an outgoing call to a VM helper function, the return PC and return BP of the host frame are pushed after the call, and so they do not influence what frameRegisterCount should be set to, since frameRegisterCount is the value that sp should hold before the call instruction.

Either way, it smells like this patch covered up a really nasty bug by arbitrarily making the stack frame a tiny bit bigger.
Comment 15 Michael Saboff 2013-12-17 21:15:51 PST
(In reply to comment #14)
> > CallerFrameAndPCSize was added in <http://trac.webkit.org/changeset/160506> to account for the caller's frame pointer and return PC for outgoing calls.
> 
> Do you mean outgoing calls to VM helper functions, or outgoing calls to JavaScript functions?
> 
> If we have an outgoing call to a JavaScript function, m_parameterSlots should be non-0, and maxing with CallerFrameAndPCSize should be a no-op. Also, it would be supremely weird to encode the stack frame needs of a JavaScript function call in a variable named maxFrameExtentForSlowPathCall.
> 
> If we have an outgoing call to a VM helper function, the return PC and return BP of the host frame are pushed after the call, and so they do not influence what frameRegisterCount should be set to, since frameRegisterCount is the value that sp should hold before the call instruction.
> 
> Either way, it smells like this patch covered up a really nasty bug by arbitrarily making the stack frame a tiny bit bigger.

It was only for outgoing calls to VM helper functions.

frameRegisterCount needs to include the size of the largest outgoing frame and it includes the return PC and return BP that the callee will push.  When there aren't any outgoing JS calls, we need to include space for the return PC and BP of any outgoing VM helper calls.  Therefore the max() of m_parameterSlots and CallerFrameAndPCSize. When the stack pointer is set up, we don't include (i.e. subtract out) CallerFrameAndPCSize as the callee will push return PC and return BP.  The current patch rightly doesn't change the setting of the stack pointer, we are still subtracting out CallerFrameAndPCSize.  The current patch didn't change any of the calculations results for the DFG, it just moved CallerFrameAndPCSize to be part of maxFrameExtentForSlowPathCall.

If frameRegisterCount is the value that sp should set to before the call, we create a stack frame that has two extra slots.  This is due to m_parameterSlots including the size of a full frame header.  See ByteCodeParser::addCall() where we have:
    m_parameterSlots = JSStack::ThisArgument + argCount

The stack check must check at least frameRegisterCount without subtracting CallerFrameAndPCSize.
Comment 16 Geoffrey Garen 2013-12-17 21:58:52 PST
> If frameRegisterCount is the value that sp should set to before the call, we create a stack frame that has
> two extra slots.  This is due to m_parameterSlots including the size of a full frame header.  See
> ByteCodeParser::addCall() where we have:
>     m_parameterSlots = JSStack::ThisArgument + argCount

Can we fix this by initializing m_parameterSlots to a more appropriate value (JSStack::CallFrameHeaderSize - CallerFrameAndPCSize + argCount)? What's the motivation behind pretending that we have two more outgoing parameter slots than we have actually allocated in our stack frame? If somebody ever tried to iterate from 0 to m_parameterSlots, they would corrupt the stack. That doesn't seem so good.

> The stack check must check at least frameRegisterCount without subtracting CallerFrameAndPCSize.

Sure, the stack check has special needs. But the stack check computation is distinct from setting sp. The purpose of frameRegisterCount is to set sp.

I believe our stated plan for accommodating CallerFrameAndPCSize in C++ helper functions was just to fold it into in the redzone we already need for C++ functions generally. Knowing that you have 16 bytes free on the stack does you no good if you're about to call a C++ function that uses that 16 bytes for CallerFrameAndPCSize plus an additional 1kB for its local variables.