Bug 127071 - CStack Branch: X86-32 Fix LLInt
Summary: CStack Branch: X86-32 Fix LLInt
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Michael Saboff
URL:
Keywords:
Depends on:
Blocks: 127086 127201
  Show dependency treegraph
 
Reported: 2014-01-15 15:36 PST by Michael Saboff
Modified: 2014-01-17 21:01 PST (History)
1 user (show)

See Also:


Attachments
Patch (32.55 KB, patch)
2014-01-15 15:51 PST, Michael Saboff
mark.lam: review-
Details | Formatted Diff | Diff
Updated patch (23.09 KB, patch)
2014-01-16 17:30 PST, Michael Saboff
mark.lam: review-
Details | Formatted Diff | Diff
proposed patch with changes in areas where I felt we need more clarity. ChangeLog not included. (23.84 KB, patch)
2014-01-17 17:38 PST, Mark Lam
no flags Details | Formatted Diff | Diff
diff between the 2 patches (9.24 KB, application/octet-stream)
2014-01-17 17:39 PST, Mark Lam
no flags Details
Patch with updates after conversation with Mark (24.57 KB, patch)
2014-01-17 20:49 PST, Michael Saboff
fpizlo: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Michael Saboff 2014-01-15 15:36:55 PST
The C Branch currently doesn't compile for X86-32.  This is to get it to compile and then run in the LLInt.
Comment 1 Michael Saboff 2014-01-15 15:51:22 PST
Created attachment 221312 [details]
Patch
Comment 2 Mark Lam 2014-01-15 18:51:33 PST
Comment on attachment 221312 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=221312&action=review

Thanks for doing all this work.  I think your alignment logic is off (and I’ve made comments where I spotted them).  Basically, you seem to expect the stack to be aligned before a call instruction.  I think the sp should be 8 bytes off alignment just before the call instruction.  The call instruction will push 4 bytes for the return PC, and the callee prologue will push another 4 bytes for the saved cfr.  Thereafter, the sp becomes aligned again.  Can you please take another look through this?

As for the JIT parts, I’m not super familiar with them, but I did what I could by comparing them against the 64-bit version as a trusted version.  You might want to get a second pair of eyes on the JIT parts in case I missed any subtleties.  Thanks.

> Source/JavaScriptCore/jit/JITCall32_64.cpp:198
> +    addPtr(TrustedImm32(sizeof(CallerFrameAndPC)), regT3, stackPointerRegister);

Should regT3 here be regT4 instead?  This is different compared to what I’m seeing in the 64-bit version.

> Source/JavaScriptCore/jit/JITCall32_64.cpp:228
>      emitLoad(JSStack::Callee, regT1, regT0);

This looks like it will trash the result of the 2 loadPtrs above.  Also, the 64-bit version doesn’t have this line.

> Source/JavaScriptCore/jit/JITOpcodes32_64.cpp:-53
> +    emitFunctionPrologue();
>      emitPutImmediateToCallFrameHeader(0, JSStack::CodeBlock);
> -    storePtr(callFrameRegister, &m_vm->topCallFrame);

The 64-bit version of JIT::privateCompileCTINativeCall() just returns vm->getCTIStub(nativeCallGenerator).  Is there any reason the 32-bit version should behave differently or should be structured differently for this function?

> Source/JavaScriptCore/jit/ThunkGenerators.cpp:518
> +    // FIXME: I don't think currentReturnThunkPC is used and should be deleted.
> +#  if 0
> +    jit.loadPtr(&vm->currentReturnThunkPC, GPRInfo::regT2);
> +    jit.storePtr(GPRInfo::regT2, JSInterfaceJIT::Address(JSInterfaceJIT::callFrameRegister, CallFrame::returnPCOffset()));
> +#   else
> +    jit.storePtr(GPRInfo::regT5, JSInterfaceJIT::Address(JSInterfaceJIT::callFrameRegister, CallFrame::returnPCOffset()));
> +#   endif

FIXME to be addressed?  I noticed that this isn’t in the 64-bit code.

> Source/JavaScriptCore/llint/LowLevelInterpreter32_64.asm:105
> +        subp 8, sp

Is this necessary?  Assuming the sp is currently aligned, we push 2 args, the call pushes the retAddress, and the callee pushes %rbp ... with that, we’re aligned again.  So, I think this sp bump is not needed here.

> Source/JavaScriptCore/llint/LowLevelInterpreter32_64.asm:109
> +        addp 16, sp

In light of the above, this should be "addp 8, sp”.

> Source/JavaScriptCore/llint/LowLevelInterpreter32_64.asm:132
> +        push arg4
> +        push arg3
> +        push arg2
> +        push arg1

I think we need to “subp 8, sp” before we push the args here.

> Source/JavaScriptCore/llint/LowLevelInterpreter32_64.asm:134
> +        addp 16, sp

Accordingly, this should be “addp 24, sp”.

> Source/JavaScriptCore/llint/LowLevelInterpreter32_64.asm:210
> +    callToJavaScriptPrologue()

See comments for callToJavaScriptPrologue() in LowLevelInterpreter.asm below.  Add the “subp 8, sp” after this line with the comment about this being the alignment adjustment for the VMEntrySentinelFrame which is only 5 Registers in size.

> Source/JavaScriptCore/llint/LowLevelInterpreter32_64.asm:337
> +        # Put callee frame pointer on stack as arg0, also put it in ecx for "fastcall" targets
> +        move sp, t2 # t2 is ecx
> +        move cfr, [sp] # put caller frame pointer into callee frame since callee prologue can't
> +        subp 12, sp
> +        push t2

Shouldn’t this be “sub 4, sp” instead of “sub 12, sp”?  The sp is aligned to begin with, we push 1 arg, the call pushes the retAddress, the callee pushes rbp.  That means we’re only 4 bytes off for alignment, not 12.

> Source/JavaScriptCore/llint/LowLevelInterpreter32_64.asm:344
> +        addp 16, sp

Ditto.  This should be "addp 8, sp” instead of “addp 16, sp”.

> Source/JavaScriptCore/llint/LowLevelInterpreter32_64.asm:-2114
> -        subp 16 - 8, sp
>          loadi Callee + PayloadOffset[cfr], t1
>          loadp JSFunction::m_executable[t1], t1
> +        checkStackPointerAlignment(t3, 0xdead0001)
>          call executableOffsetToFunction[t1]
> -        addp 16 - 8, sp

The native function expects 1 ExecState* arg.  Hence, before the call, we need to: subtract the sp by 4 for alignment, and push the cfr.  With this 2 and the pushed return PC and cfr to follow, the sp will be aligned again in the callee.  Similarly after the call returns, we need to add 8 to the sp.

> Source/JavaScriptCore/llint/LowLevelInterpreter.asm:307
>          push t0

I think you should make this part of pushCalleeSaves so that it’s easier to reason about the stack alignment in one place.

> Source/JavaScriptCore/llint/LowLevelInterpreter.asm:311
>          pushLRAndFP

This question is not directly due to this patch, but since it’s in this area, I thought I’d ask: is the lr register not accessible as a general purpose register on ARM64 like it is on ARM?  If lr is just like any other GPR, why don’t we just fold this ARM64 section into the ARM one, and remove the need for pushLRAndFP.  Similarly for popLRAndFP.  Let me know if I’m wrong about this.  Thanks.

> Source/JavaScriptCore/llint/LowLevelInterpreter.asm:319
> +    if X86
> +        subp 12, sp
> +    end

You should make this part of pushCalleeSaves instead of doing it here.  Either that or we get rid of pushCalleeSaves and push the registers explicitly here.  Otherwise, it’s one step removed to be able to see whether we have retained stack alignment or not.

At first, looking at this code, I thought we need to "subp 4, sp” instead of “subp 12, sp” because we pushed 3 callee saved regs bringing the total to 12 and needing 4 more bytes for alignment.  And then I realized that we need to add another 8 to compensate for the fact that the 32-bit CallFrameHeader only has 5 Registers in size and that we need to push a VMEntrySentinelFrame soon.

The pushCalleeSaves number of words may vary architecture to architecture, but we’ll always want to subtract the sp by 8 here (for ARM and other 32-bit archs as well) to compensate for the CallFrameHeader being only 5 Registers in size and that we need the stack aligned after we push the VMEntrySentinelFrame.  So, here’s what I propose we do to make things more sane:

1. After we push cfr, the sp should be aligned (as one would expect in the ABI).
2. After we push the callee saved registers, let’s keep the sp aligned.  Hence, in pushCalleeSaves for X86, let’s subtract another 4 bytes from sp.
3. In doCallToJavaScript(), subtract the sp by 8 there before pushing the VMEntrySentinelFrame.  Also add a comment to explain that that 8 bytes adjustment is needed because we need to keep the sp aligned after we push the VMEntrySentinelFrame and the VmEntrySentinelFrame being of the size of a CallFrameHeader is 8 bytes short of alignment.

This way, it’ll be easier to reason about the stack alignment at each step.  Each part takes care of their own requirement.  And if pushCalleeSaves vary for different ports, they only need to worry about their own number of saved registers and align that.

> Source/JavaScriptCore/llint/LowLevelInterpreter.asm:329
> +    if X86
> +        addp 12, sp
> +    end
> +

You should do this as part of popCalleeSaves. Same as above, add 4 instead of 12.

> Source/JavaScriptCore/llint/LowLevelInterpreter.asm:332
> +        pop t2

Ditto.  Make this part of popCalleeSaves so that it’s easier to reason about the stack alignment in one place.

> Source/JavaScriptCore/runtime/CommonSlowPaths.cpp:2
> - * Copyright (C) 2011, 2012, 2013 Apple Inc. All rights reserved.
> + * Copyright (C) 2011, 2012, 2013, 2014 Apple Inc. All rights reserved.

Not needed because the only change is the white space change below.

> Source/JavaScriptCore/runtime/CommonSlowPaths.cpp:-364
>      BEGIN();
> -    

I don’t think this white space only change is needed.
Comment 3 Mark Lam 2014-01-16 14:34:42 PST
Comment on attachment 221312 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=221312&action=review

Michael pointed out that the ABI requirement is to have the sp aligned before the call instruction, and the cfr as a result will be off by 8 (and need not be aligned).  Updating / adding some of the review comments based on this new info.  The other review comments still stand.

>> Source/JavaScriptCore/llint/LowLevelInterpreter32_64.asm:105
>> +        subp 8, sp
> 
> Is this necessary?  Assuming the sp is currently aligned, we push 2 args, the call pushes the retAddress, and the callee pushes %rbp ... with that, we’re aligned again.  So, I think this sp bump is not needed here.

Disregard this comment.  The "subp 9, sp" is needed.

>> Source/JavaScriptCore/llint/LowLevelInterpreter32_64.asm:109
>> +        addp 16, sp
> 
> In light of the above, this should be "addp 8, sp”.

Disregard this comment.  The "addp 16, sp" is correct.

>> Source/JavaScriptCore/llint/LowLevelInterpreter32_64.asm:132
>> +        push arg1
> 
> I think we need to “subp 8, sp” before we push the args here.

Disregard this comment.

>> Source/JavaScriptCore/llint/LowLevelInterpreter32_64.asm:134
>> +        addp 16, sp
> 
> Accordingly, this should be “addp 24, sp”.

Disregard this comment.

>> Source/JavaScriptCore/llint/LowLevelInterpreter32_64.asm:210
>> +    callToJavaScriptPrologue()
> 
> See comments for callToJavaScriptPrologue() in LowLevelInterpreter.asm below.  Add the “subp 8, sp” after this line with the comment about this being the alignment adjustment for the VMEntrySentinelFrame which is only 5 Registers in size.

I'm not sure if we need to do any adjustments here.  It depends on what makes sense to go into pushCalleeSaves().  One requirement is that the sp needs to be aligned by the time we make the call to _llint_throw_stack_overflow_error() below.

>> Source/JavaScriptCore/llint/LowLevelInterpreter32_64.asm:337
>> +        push t2
> 
> Shouldn’t this be “sub 4, sp” instead of “sub 12, sp”?  The sp is aligned to begin with, we push 1 arg, the call pushes the retAddress, the callee pushes rbp.  That means we’re only 4 bytes off for alignment, not 12.

Disregard this comment.

>> Source/JavaScriptCore/llint/LowLevelInterpreter32_64.asm:344
>> +        addp 16, sp
> 
> Ditto.  This should be "addp 8, sp” instead of “addp 16, sp”.

Disregard this comment.

> Source/JavaScriptCore/llint/LowLevelInterpreter32_64.asm:2111
>          loadp MarkedBlock::m_weakSet + WeakSet::m_vm[t3], t3

BTW, just above this code, the "load ScopeChain[cfr], t3" is redundant.  The ScopeChain is already in t1 (loaded in the instructions above it).  You can remove the "loadp ScopeChain[cfr], t3" and use t1 instead in the 2 andp and loadp instructions for loading the VM*.

> Source/JavaScriptCore/llint/LowLevelInterpreter32_64.asm:-2110
> -        subp 16 - 8, sp

You need to push the cfr here as well.  Hence:

subp 12, sp
push cfr

>> Source/JavaScriptCore/llint/LowLevelInterpreter32_64.asm:-2114
>> -        addp 16 - 8, sp
> 
> The native function expects 1 ExecState* arg.  Hence, before the call, we need to: subtract the sp by 4 for alignment, and push the cfr.  With this 2 and the pushed return PC and cfr to follow, the sp will be aligned again in the callee.  Similarly after the call returns, we need to add 8 to the sp.

Here, you'll need:
addp 16, sp

>> Source/JavaScriptCore/llint/LowLevelInterpreter.asm:319
>> +    end
> 
> You should make this part of pushCalleeSaves instead of doing it here.  Either that or we get rid of pushCalleeSaves and push the registers explicitly here.  Otherwise, it’s one step removed to be able to see whether we have retained stack alignment or not.
> 
> At first, looking at this code, I thought we need to "subp 4, sp” instead of “subp 12, sp” because we pushed 3 callee saved regs bringing the total to 12 and needing 4 more bytes for alignment.  And then I realized that we need to add another 8 to compensate for the fact that the 32-bit CallFrameHeader only has 5 Registers in size and that we need to push a VMEntrySentinelFrame soon.
> 
> The pushCalleeSaves number of words may vary architecture to architecture, but we’ll always want to subtract the sp by 8 here (for ARM and other 32-bit archs as well) to compensate for the CallFrameHeader being only 5 Registers in size and that we need the stack aligned after we push the VMEntrySentinelFrame.  So, here’s what I propose we do to make things more sane:
> 
> 1. After we push cfr, the sp should be aligned (as one would expect in the ABI).
> 2. After we push the callee saved registers, let’s keep the sp aligned.  Hence, in pushCalleeSaves for X86, let’s subtract another 4 bytes from sp.
> 3. In doCallToJavaScript(), subtract the sp by 8 there before pushing the VMEntrySentinelFrame.  Also add a comment to explain that that 8 bytes adjustment is needed because we need to keep the sp aligned after we push the VMEntrySentinelFrame and the VmEntrySentinelFrame being of the size of a CallFrameHeader is 8 bytes short of alignment.
> 
> This way, it’ll be easier to reason about the stack alignment at each step.  Each part takes care of their own requirement.  And if pushCalleeSaves vary for different ports, they only need to worry about their own number of saved registers and align that.

Since there is no requirement on the alignment of the cfr, the only requirement is that the sp is aligned by the time you get to the call to llint_throw_stack_overflow_error() in doCallToJavaScriptCore().  I think you should still do all the adjustments in pushCalleeSaves and document the reasons for any padding you add there for alignment.

Thinking about this some more, I feel that we should have pushCalleeSaves as a macro in the LLINT asm files.  Currently, having to look into the offline asm files (in order to figure out what pushCalleeSaves does) makes it harder to reason about whether we've met the alignment requirement or not.  Can you retire the pushCalleeSaves and popCalleeSaves instructions and replace them with LLINT macros instead?

>> Source/JavaScriptCore/llint/LowLevelInterpreter.asm:329
>> +
> 
> You should do this as part of popCalleeSaves. Same as above, add 4 instead of 12.

Same … do this as part of a LLINT popCalleeSaves() macro, but I'm not sure about the amount to align yet.  Need to see what the resultant pushCalleeSaves() and popCalleeSaves() macros look like.
Comment 4 Michael Saboff 2014-01-16 17:30:37 PST
Created attachment 221432 [details]
Updated patch

(In reply to comment #3)
> (From update of attachment 221312 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=221312&action=review
> 
> Michael pointed out that the ABI requirement is to have the sp aligned before the call instruction, and the cfr as a result will be off by 8 (and need not be aligned).  Updating / adding some of the review comments based on this new info.  The other review comments still stand.
> >> Source/JavaScriptCore/llint/LowLevelInterpreter32_64.asm:210
> >> +    callToJavaScriptPrologue()
> > 
> > See comments for callToJavaScriptPrologue() in LowLevelInterpreter.asm below.  Add the “subp 8, sp” after this line with the comment about this being the alignment adjustment for the VMEntrySentinelFrame which is only 5 Registers in size.
> 
> I'm not sure if we need to do any adjustments here.  It depends on what makes sense to go into pushCalleeSaves().  One requirement is that the sp needs to be aligned by the time we make the call to _llint_throw_stack_overflow_error() below.

Changed the move cfr, sp to subp cfr, 8, sp.

> > Source/JavaScriptCore/llint/LowLevelInterpreter32_64.asm:2111
> >          loadp MarkedBlock::m_weakSet + WeakSet::m_vm[t3], t3
> 
> BTW, just above this code, the "load ScopeChain[cfr], t3" is redundant.  The ScopeChain is already in t1 (loaded in the instructions above it).  You can remove the "loadp ScopeChain[cfr], t3" and use t1 instead in the 2 andp and loadp instructions for loading the VM*.
> 
> > Source/JavaScriptCore/llint/LowLevelInterpreter32_64.asm:-2110
> > -        subp 16 - 8, sp
> 
> You need to push the cfr here as well.  Hence:
> 
> subp 12, sp
> push cfr

Actually we just did a functionPrologue above, so that isn't needed.

> >> Source/JavaScriptCore/llint/LowLevelInterpreter32_64.asm:-2114
> >> -        addp 16 - 8, sp
> > 
> > The native function expects 1 ExecState* arg.  Hence, before the call, we need to: subtract the sp by 4 for alignment, and push the cfr.  With this 2 and the pushed return PC and cfr to follow, the sp will be aligned again in the callee.  Similarly after the call returns, we need to add 8 to the sp.
> 
> Here, you'll need:
> addp 16, sp

Not needed per prior comment.

> >> Source/JavaScriptCore/llint/LowLevelInterpreter.asm:319
> >> +    end
> > 
> > You should make this part of pushCalleeSaves instead of doing it here.  Either that or we get rid of pushCalleeSaves and push the registers explicitly here.  Otherwise, it’s one step removed to be able to see whether we have retained stack alignment or not.
> > 
> > At first, looking at this code, I thought we need to "subp 4, sp” instead of “subp 12, sp” because we pushed 3 callee saved regs bringing the total to 12 and needing 4 more bytes for alignment.  And then I realized that we need to add another 8 to compensate for the fact that the 32-bit CallFrameHeader only has 5 Registers in size and that we need to push a VMEntrySentinelFrame soon.
> > 
> > The pushCalleeSaves number of words may vary architecture to architecture, but we’ll always want to subtract the sp by 8 here (for ARM and other 32-bit archs as well) to compensate for the CallFrameHeader being only 5 Registers in size and that we need the stack aligned after we push the VMEntrySentinelFrame.  So, here’s what I propose we do to make things more sane:
> > 
> > 1. After we push cfr, the sp should be aligned (as one would expect in the ABI).
> > 2. After we push the callee saved registers, let’s keep the sp aligned.  Hence, in pushCalleeSaves for X86, let’s subtract another 4 bytes from sp.
> > 3. In doCallToJavaScript(), subtract the sp by 8 there before pushing the VMEntrySentinelFrame.  Also add a comment to explain that that 8 bytes adjustment is needed because we need to keep the sp aligned after we push the VMEntrySentinelFrame and the VmEntrySentinelFrame being of the size of a CallFrameHeader is 8 bytes short of alignment.
> > 
> > This way, it’ll be easier to reason about the stack alignment at each step.  Each part takes care of their own requirement.  And if pushCalleeSaves vary for different ports, they only need to worry about their own number of saved registers and align that.
> 
> Since there is no requirement on the alignment of the cfr, the only requirement is that the sp is aligned by the time you get to the call to llint_throw_stack_overflow_error() in doCallToJavaScriptCore().  I think you should still do all the adjustments in pushCalleeSaves and document the reasons for any padding you add there for alignment.
>
> Thinking about this some more, I feel that we should have pushCalleeSaves as a macro in the LLINT asm files.  Currently, having to look into the offline asm files (in order to figure out what pushCalleeSaves does) makes it harder to reason about whether we've met the alignment requirement or not.  Can you retire the pushCalleeSaves and popCalleeSaves instructions and replace them with LLINT macros instead?

It probably makes sense to break out pushCalleeSaves as individual pushes and corresponding pops for popCalleeSaves.  That is probably best deferred to after landing on trunk as there exists the prerequisite changes to ARM64 push and pop working on pairs of registers.  Files <https://bugs.webkit.org/show_bug.cgi?id=127155> to track this.

> >> Source/JavaScriptCore/llint/LowLevelInterpreter.asm:329
> >> +
> > 
> > You should do this as part of popCalleeSaves. Same as above, add 4 instead of 12.
> 
> Same … do this as part of a LLINT popCalleeSaves() macro, but I'm not sure about the amount to align yet.  Need to see what the resultant pushCalleeSaves() and popCalleeSaves() macros look like.
Comment 5 Mark Lam 2014-01-17 17:37:43 PST
Comment on attachment 221432 [details]
Updated patch

View in context: https://bugs.webkit.org/attachment.cgi?id=221432&action=review

> Source/JavaScriptCore/assembler/MaxFrameExtentForSlowPathCall.h:50
>  // 6 args on stack (24 bytes) + 8 bytes to align the stack.

This comment is out of date.

> Source/JavaScriptCore/interpreter/ProtoCallFrame.cpp:49
> -    paddedArgsCount = WTF::roundUpToMultipleOf(stackAlignmentRegisters(), paddedArgsCount);
> -    this->setPaddedArgsCount(paddedArgsCount);
> +    paddedArgsCount = roundArgumentCountToAlignFrame(paddedArgsCount);
> +    this->setPaddedArgCount(paddedArgsCount);

I find roundArgumentCountToAlignFrame() to be unclear as to what it’s trying to do here.  I’ll upload a patch shortly with a different way to compute this which I think is more clear about the intent.

> Source/JavaScriptCore/llint/LLIntEntrypoint.cpp:122
> -    size_t registerCount = codeBlock->m_numCalleeRegisters + maxFrameExtentForSlowPathCallInRegisters;
> +    size_t registerCount = codeBlock->m_numCalleeRegisters;
>      ASSERT(registerCount == WTF::roundUpToMultipleOf(stackAlignmentRegisters(), registerCount));
> +    registerCount += maxFrameExtentForSlowPathCallInRegisters;

This weakens the assertion that was previously there.  I think we can do better.  I’ll upload a patch shortly for my proposal on how.

> Source/JavaScriptCore/runtime/StackAlignment.h:48
> +// Align argument count taking into account the CallFrameHeaderSize may be unaligned
> +inline unsigned roundArgumentCountToAlignFrame(unsigned argumentCount)
> +{
> +    unsigned callFrameHeaderSizeBias = JSStack::CallFrameHeaderSize % 2;
> +    return WTF::roundUpToMultipleOf(stackAlignmentRegisters(), argumentCount + callFrameHeaderSizeBias) - callFrameHeaderSizeBias;
> +}

This function is confusing to me.  2 main issues:
1. Why % 2?  Why not % stackAlignmentRegisters()?  I think this logic will break if stackAlignmentRegisters() is not 2.
2. We’re not clear about the end goal of this alignment exercise?  Hence, it is hard to tell whether we’re doing it right.

I’ll propose an alternative in a patch I’ll upload shortly.
Comment 6 Mark Lam 2014-01-17 17:38:40 PST
Created attachment 221518 [details]
proposed patch with changes in areas where I felt we need more clarity. ChangeLog not included.
Comment 7 Mark Lam 2014-01-17 17:39:58 PST
Created attachment 221519 [details]
diff between the 2 patches
Comment 8 Michael Saboff 2014-01-17 20:49:05 PST
Created attachment 221527 [details]
Patch with updates after conversation with Mark

Fixed the rounding math to use JSStack::CallFrameHeaderSize for argument count rounding and JSStack::CallerFrameAndPCSize for local register count rounding.  What Mark was calling framePointerAlignmentOvershoot is really roundUpToMultipleOf(stackAlignmentRegisters(), JSStack::CallerFrameAndPCSize).  I used that directly in the local register rounding computation.
Comment 9 Michael Saboff 2014-01-17 21:01:26 PST
Committed r162240: <http://trac.webkit.org/changeset/162240>