Bug 136436 - REGRESSION(r173031): crashes during run-layout-jsc on x86/Linux
Summary: REGRESSION(r173031): crashes during run-layout-jsc on x86/Linux
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Michael Saboff
URL:
Keywords:
: 132740 (view as bug list)
Depends on:
Blocks: 108645
  Show dependency treegraph
 
Reported: 2014-09-01 14:41 PDT by Julien Brianceau
Modified: 2014-09-05 00:49 PDT (History)
5 users (show)

See Also:


Attachments
run-layout-jsc results (tested on r173161) (133.04 KB, application/x-compressed-tar)
2014-09-01 14:44 PDT, Julien Brianceau
no flags Details
Workaround proposal (1.33 KB, patch)
2014-09-01 14:56 PDT, Julien Brianceau
no flags Details | Formatted Diff | Diff
Fix proposal (1.75 KB, patch)
2014-09-02 15:00 PDT, Julien Brianceau
no flags Details | Formatted Diff | Diff
Patch (1.89 KB, patch)
2014-09-02 15:22 PDT, Michael Saboff
no flags Details | Formatted Diff | Diff
Updated patch from review comments. (1.98 KB, patch)
2014-09-02 16:11 PDT, Michael Saboff
no flags Details | Formatted Diff | Diff
Updated patch after discussion with ggaren (9.83 KB, patch)
2014-09-04 12:52 PDT, Michael Saboff
ggaren: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Julien Brianceau 2014-09-01 14:41:22 PDT
Since r173031, 200+ crashes seen when performing run-layout-jsc on x86 32-bit/Linux/gcc 4.7.3
Comment 1 Julien Brianceau 2014-09-01 14:44:08 PDT
Created attachment 237463 [details]
run-layout-jsc results (tested on r173161)
Comment 2 Julien Brianceau 2014-09-01 14:56:08 PDT
Created attachment 237465 [details]
Workaround proposal

It appears that in my env (Ubuntu Linux x86 32-bit / gcc 4.7.3):

    execCallee->setScope(exec->scope());
    // exec pointer is valid at this point
    execCallee->setCodeBlock(0);
    // exec pointer is null now
    execCallee->setCallerFrame(exec); // setting null caller frame
Comment 3 Julien Brianceau 2014-09-01 14:57:27 PDT
With this patch, run-layout-jsc is running fine again:

    All 523 tests passed!
Comment 4 Akos Kiss 2014-09-01 15:22:08 PDT
Could you provide a disassembly of operationCallEval? That might help understanding why exec is overwritten. I'm just curious.
Comment 5 Julien Brianceau 2014-09-01 15:47:26 PDT
(In reply to comment #4)
> Could you provide a disassembly of operationCallEval? That might help understanding why exec is overwritten. I'm just curious.
Sure. Here is the disassembly, taken from Source/JavaScriptCore/CMakeFiles/JavaScriptCore.dir/jit/JITOperations.cpp.o file, in release mode with -g:

00003233 <operationCallEval>:

EncodedJSValue JIT_OPERATION operationCallEval(ExecState* exec, ExecState* execCallee)
{
    3233:       55                      push   %ebp
    3234:       89 e5                   mov    %esp,%ebp
    3236:       53                      push   %ebx
    3237:       83 ec 34                sub    $0x34,%esp
    323a:       e8 fc ff ff ff          call   323b <operationCallEval+0x8>
    323f:       81 c3 02 00 00 00       add    $0x2,%ebx
    ASSERT(exec->codeBlock()->codeType() != FunctionCode
    3245:       8b 45 08                mov    0x8(%ebp),%eax
    3248:       89 04 24                mov    %eax,(%esp)
    324b:       e8 fc ff ff ff          call   324c <operationCallEval+0x19>
    3250:       89 04 24                mov    %eax,(%esp)
    3253:       e8 fc ff ff ff          call   3254 <operationCallEval+0x21>
        || !exec->codeBlock()->needsActivation()
        || exec->hasActivation());
    3258:       83 f8 02                cmp    $0x2,%eax
    325b:       75 58                   jne    32b5 <operationCallEval+0x82>
    directPutByVal(exec, asObject(baseValue), subscript, value);
}

EncodedJSValue JIT_OPERATION operationCallEval(ExecState* exec, ExecState* execCallee)
{
    ASSERT(exec->codeBlock()->codeType() != FunctionCode
    325d:       8b 45 08                mov    0x8(%ebp),%eax
    3260:       89 04 24                mov    %eax,(%esp)
    3263:       e8 fc ff ff ff          call   3264 <operationCallEval+0x31>
    3268:       89 04 24                mov    %eax,(%esp)
    326b:       e8 fc ff ff ff          call   326c <operationCallEval+0x39>
    3270:       84 c0                   test   %al,%al
    3272:       74 41                   je     32b5 <operationCallEval+0x82>
    3274:       8b 45 08                mov    0x8(%ebp),%eax
    3277:       89 04 24                mov    %eax,(%esp)
    327a:       e8 fc ff ff ff          call   327b <operationCallEval+0x48>
    327f:       83 f0 01                xor    $0x1,%eax
    3282:       84 c0                   test   %al,%al
    3284:       74 2f                   je     32b5 <operationCallEval+0x82>
    3286:       8d 83 c0 21 00 00       lea    0x21c0(%ebx),%eax
    328c:       89 44 24 0c             mov    %eax,0xc(%esp)
    3290:       8d 83 00 c0 00 00       lea    0xc000(%ebx),%eax
    3296:       89 44 24 08             mov    %eax,0x8(%esp)
    329a:       c7 44 24 04 62 02 00    movl   $0x262,0x4(%esp)
    32a1:       00 
    32a2:       8d 83 90 20 00 00       lea    0x2090(%ebx),%eax
    32a8:       89 04 24                mov    %eax,(%esp)
    32ab:       e8 fc ff ff ff          call   32ac <operationCallEval+0x79>
    32b0:       e8 fc ff ff ff          call   32b1 <operationCallEval+0x7e>
        || !exec->codeBlock()->needsActivation()
        || exec->hasActivation());

    execCallee->setScope(exec->scope());
    32b5:       8b 45 08                mov    0x8(%ebp),%eax
    32b8:       89 04 24                mov    %eax,(%esp)
    32bb:       e8 fc ff ff ff          call   32bc <operationCallEval+0x89>
    32c0:       89 44 24 04             mov    %eax,0x4(%esp)
    32c4:       8b 45 0c                mov    0xc(%ebp),%eax
    32c7:       89 04 24                mov    %eax,(%esp)
    32ca:       e8 fc ff ff ff          call   32cb <operationCallEval+0x98>
    execCallee->setCodeBlock(0);
    32cf:       c7 44 24 04 00 00 00    movl   $0x0,0x4(%esp)
    32d6:       00 
    32d7:       8b 45 0c                mov    0xc(%ebp),%eax
    32da:       89 04 24                mov    %eax,(%esp)
    32dd:       e8 fc ff ff ff          call   32de <operationCallEval+0xab>
    execCallee->setCallerFrame(exec);
    32e2:       8b 45 08                mov    0x8(%ebp),%eax
    32e5:       89 44 24 04             mov    %eax,0x4(%esp)
    32e9:       8b 45 0c                mov    0xc(%ebp),%eax
    32ec:       89 04 24                mov    %eax,(%esp)
    32ef:       e8 fc ff ff ff          call   32f0 <operationCallEval+0xbd>

    if (!isHostFunction(execCallee->calleeAsValue(), globalFuncEval))
    32f4:       8d 45 f0                lea    -0x10(%ebp),%eax
    32f7:       8b 55 0c                mov    0xc(%ebp),%edx
    32fa:       89 54 24 04             mov    %edx,0x4(%esp)
    32fe:       89 04 24                mov    %eax,(%esp)
    3301:       e8 fc ff ff ff          call   3302 <operationCallEval+0xcf>
    [...]
Comment 6 Akos Kiss 2014-09-02 08:18:57 PDT
Something very nasty is happening here. I've put my hands on an x86/32-bit chroot and built an EFL/debug jsc. When I run jsc --useLLInt=false eval.js, where eval.js is:

var c = "";
eval(c);

I also get the segfault. Running it in gdb and setting a breakpoint at JSC::operationCallEval reveals the following:

(gdb) disas  JSC::operationCallEval
Dump of assembler code for function JSC::operationCallEval(JSC::ExecState*, JSC::ExecState*):
   0x08808885 <+0>:     push   %ebp
   0x08808886 <+1>:     mov    %esp,%ebp
   0x08808888 <+3>:     push   %ebx
   0x08808889 <+4>:     sub    $0x34,%esp
   0x0880888c <+7>:     call   0x86bfa30 <__x86.get_pc_thunk.bx>
   0x08808891 <+12>:    add    $0xc8476f,%ebx
=> 0x08808897 <+18>:    mov    0x8(%ebp),%eax

At this point, esp=0xffffcda0 and ebp=0xffffcdd8 (the difference between them is 0x38, just as expected from the code above). The contents of the stack from *ebp are:

0xffffcdd8: 0xffffce28 // <=ebp, this is the result of push %ebp, i.e., before calling operationCallEval, ebp was 0xffffce28
0xffffcddc: 0xf5f1eef0 // this is the return address pushed on stack by the call to operationCallEval
0xffffcde0: 0xffffce28 // this is the first argument of operationCallEval, i.e., ExecState* exec (the previous ebp)
0xffffcde4: 0xffffcdd8 // this is the second argument of operationCallEval, i.e., ExecState* execCallee

And this last line is strange: execCallee should point to a region of memory which has the layout of { callerFrame, returnAddress, codeBlock, scopeChain, callee, argCount, thisArg, args... }. However, in this case, the arguments of operationCallEval (i.e., the pointers exec and execCallee) on the stack are overlapping with the contents of execCallee itself. So, calling execCallee->setCodeBlock(0) zeros out execCallee[2], which is also exec.

"Interestingly", this was also happening before the patch landed in r173031, but fortunately, exec was not used after the call to setCodeBlock, so the problem did not reveal itself. So, swapping the lines acts as a workaround but there is something nastier underneath, which would be worth of investigation. IMHO.
Comment 7 Julien Brianceau 2014-09-02 08:24:24 PDT
(In reply to comment #6)
> "Interestingly", this was also happening before the patch landed in r173031, but fortunately, exec was not used after the call to setCodeBlock, so the problem did not reveal itself. So, swapping the lines acts as a workaround but there is something nastier underneath, which would be worth of investigation. IMHO.
I fully agree with you, that's why I named my patch "workaround" and not "fix" :D
Comment 8 Michael Saboff 2014-09-02 10:07:09 PDT
(In reply to comment #6)
> Something very nasty is happening here. I've put my hands on an x86/32-bit chroot and built an EFL/debug jsc. When I run jsc --useLLInt=false eval.js, where eval.js is:
> 
> var c = "";
> eval(c);
> 
> I also get the segfault. Running it in gdb and setting a breakpoint at JSC::operationCallEval reveals the following:
> 
> (gdb) disas  JSC::operationCallEval
> Dump of assembler code for function JSC::operationCallEval(JSC::ExecState*, JSC::ExecState*):
>    0x08808885 <+0>:     push   %ebp
>    0x08808886 <+1>:     mov    %esp,%ebp
>    0x08808888 <+3>:     push   %ebx
>    0x08808889 <+4>:     sub    $0x34,%esp
>    0x0880888c <+7>:     call   0x86bfa30 <__x86.get_pc_thunk.bx>
>    0x08808891 <+12>:    add    $0xc8476f,%ebx
> => 0x08808897 <+18>:    mov    0x8(%ebp),%eax
> 
> At this point, esp=0xffffcda0 and ebp=0xffffcdd8 (the difference between them is 0x38, just as expected from the code above). The contents of the stack from *ebp are:
> 
> 0xffffcdd8: 0xffffce28 // <=ebp, this is the result of push %ebp, i.e., before calling operationCallEval, ebp was 0xffffce28
> 0xffffcddc: 0xf5f1eef0 // this is the return address pushed on stack by the call to operationCallEval
> 0xffffcde0: 0xffffce28 // this is the first argument of operationCallEval, i.e., ExecState* exec (the previous ebp)
> 0xffffcde4: 0xffffcdd8 // this is the second argument of operationCallEval, i.e., ExecState* execCallee
> 
> And this last line is strange: execCallee should point to a region of memory which has the layout of { callerFrame, returnAddress, codeBlock, scopeChain, callee, argCount, thisArg, args... }. However, in this case, the arguments of operationCallEval (i.e., the pointers exec and execCallee) on the stack are overlapping with the contents of execCallee itself. So, calling execCallee->setCodeBlock(0) zeros out execCallee[2], which is also exec.
> 
> "Interestingly", this was also happening before the patch landed in r173031, but fortunately, exec was not used after the call to setCodeBlock, so the problem did not reveal itself. So, swapping the lines acts as a workaround but there is something nastier underneath, which would be worth of investigation. IMHO.

I don't think that this is due to r173031.  The issue is likely that the space pointed to by execCallee overlaps the stacked argument location on architectures that pass arguments via the stack.  In JIT::compileCallEval(), we use the stack pointer as the calleeFrame pointer and then make the call via callOperationNoExceptionCheck().  For X86-32, we use poke(reg, offset) where the offset is a positive offset of the stack pointer to pass the arguments.

I think requires that we either move up execCallee in the stack (harder) or move down the outgoing argument area for architectures that use stacked arguments (easier).
Comment 9 Julien Brianceau 2014-09-02 15:00:59 PDT
Created attachment 237517 [details]
Fix proposal
Comment 10 Michael Saboff 2014-09-02 15:22:35 PDT
Created attachment 237519 [details]
Patch

Well, we think alike.  I was working on this one and testing when you posted.  I prefer keying this off of argument registers and not CPU type.

BTW, X86-32 on Mac works with the old code on debug because the arguments are moved to local stack variables and for release because the args are moved into registers.
Comment 11 Julien Brianceau 2014-09-02 15:44:46 PDT
(In reply to comment #10)
> Created an attachment (id=237519) [details]
> Patch
> 
> Well, we think alike.  I was working on this one and testing when you posted.  I prefer keying this off of argument registers and not CPU type.
> 
> BTW, X86-32 on Mac works with the old code on debug because the arguments are moved to local stack variables and for release because the args are moved into registers.

Looks good to me.

As the stack must remain aligned (otherwise it will crash in an alignment check in the LLINT), do you think it's worth to assert something like this too ?

   ASSERT(!((sizeof(Register)*4) % stackAlignmentBytes()));
Comment 12 Akos Kiss 2014-09-02 15:51:53 PDT
(In reply to comment #10)
> Created an attachment (id=237519) [details]
> Patch

I was wondering why to subtract sizeof(Register) * 4 from sp, why not sizeof(CallerFrameAndPC)? (And then, the arithmetic is not even necessary, since the result is already in regT1.)

A stack/call frame layout problem also exists on ARM32. It would be worth looking at https://bugs.webkit.org/show_bug.cgi?id=132740 . There, a highly similar approach has been suggested (not reviewed yet / titled dirty hack).
Comment 13 Michael Saboff 2014-09-02 16:11:44 PDT
Created attachment 237526 [details]
Updated patch from review comments.
Comment 14 Geoffrey Garen 2014-09-03 11:48:00 PDT
Comment on attachment 237526 [details]
Updated patch from review comments.

View in context: https://bugs.webkit.org/attachment.cgi?id=237526&action=review

> Source/JavaScriptCore/jit/JITCall32_64.cpp:227
> +#if NUMBER_OF_ARGUMENT_REGISTERS < 4
> +    // Add stack space for arguments for architectures that pass most / all args on the stack
> +    int32_t addedStackSpace = (sizeof(GPRReg) * 4 + stackAlignmentBytes() - 1) & ~(stackAlignmentBytes() - 1);
> +    addPtr(TrustedImm32(-addedStackSpace), stackPointerRegister);

Who actually writes these four GPRs to the stack? Why isn't that function responsible for making room on the stack?
Comment 15 Michael Saboff 2014-09-04 12:52:23 PDT
Created attachment 237640 [details]
Updated patch after discussion with ggaren
Comment 16 Geoffrey Garen 2014-09-04 13:50:31 PDT
Comment on attachment 237640 [details]
Updated patch after discussion with ggaren

View in context: https://bugs.webkit.org/attachment.cgi?id=237640&action=review

r=me

> Source/JavaScriptCore/ChangeLog:10
> +        That stack pointer provides space for the worse case number of stacked

"worst"

> Source/JavaScriptCore/jit/JITOperations.cpp:-615
> -    execCallee->setCallerFrame(exec);

Does the LLInt need updating to perform this callerFrame store as well, or does it do so already?
Comment 17 Michael Saboff 2014-09-04 14:03:37 PDT
(In reply to comment #16)
> Does the LLInt need updating to perform this callerFrame store as well, or does it do so already?

The LLInt has its own slow path that includes the setting the callerFrame.
Comment 18 Michael Saboff 2014-09-04 14:23:56 PDT
Committed r173282: <http://trac.webkit.org/changeset/173282>
Comment 19 Csaba Osztrogonác 2014-09-05 00:47:16 PDT
*** Bug 132740 has been marked as a duplicate of this bug. ***