Bug 127777

Summary: [GTK][Windows] Fix the regression caused by the jsCStack branch merge
Product: WebKit Reporter: Csaba Osztrogonác <ossy>
Component: JavaScriptCoreAssignee: Nobody <webkit-unassigned>
Status: NEW ---    
Severity: Blocker CC: ap, barraclough, bfulgham, bugs-noreply, eric.carlson, ggaren, gyuyoung.kim, mcatanzaro, mhahnenberg, mrobinson, msaboff, oliver, optional.anne, ossy, roger_fong, sergio, zan
Priority: P1    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 127898, 127902, 127909, 127940, 128434, 128961, 129099, 132740    
Bug Blocks: 127763    
Attachments:
Description Flags
release test log with stress test
none
release test log without stress test
none
debug test log without stress test
none
debug test log with stress test
none
Disassembled JSC::Interpreter::execute none

Description Csaba Osztrogonác 2014-01-28 07:01:04 PST
After adding the missing files to the cmake buildsystem and removed
the dead code I managed to build EFLWebKit. But it seems that the
merge caused a serious regression in tests.

( I don't know the impact of the merge for the layout tests, because it 
is impossible to build a WebKitEFL which is able to run layout tests.
See https://bugs.webkit.org/show_bug.cgi?id=127776 for details.)

I'll attach the detailed results.
Comment 1 Csaba Osztrogonác 2014-01-28 07:04:25 PST
Created attachment 222443 [details]
release test log with stress test
Comment 2 Mark Hahnenberg 2014-01-28 07:21:01 PST
(In reply to comment #1)
> Created an attachment (id=222443) [details]
> run-javascriptcore-tests log

Hmm, there are no failing tests on Mac, so I don't know why this seems so broken for EFL.
Comment 3 Csaba Osztrogonác 2014-01-28 08:59:00 PST
Created attachment 222452 [details]
release test log without stress test

command to reproduce: $ Tools/Scripts/run-javascriptcore-tests --release --efl --no-jsc-stress --no-build
Comment 4 Csaba Osztrogonác 2014-01-28 09:01:47 PST
Created attachment 222455 [details]
debug test log without stress test

command to reproduce: $ Tools/Scripts/run-javascriptcore-tests --debug --efl --no-jsc-stress --no-build
(after I removed forcing us to use SHARED_CORE=ON in Source/cmake/OptionsEfl.cmake and build-jsc)

All tests pass in debug mode. It's so strange.
Comment 5 Csaba Osztrogonác 2014-01-28 09:17:43 PST
Created attachment 222460 [details]
debug test log with stress test
Comment 6 Csaba Osztrogonác 2014-01-28 09:24:18 PST
short summary:
- release mode with stress test: 2096 failures
- debug mode with stress test: 2 failures
- release mode without stress test: 35 regressions
- debug mode without stress test: 0 regressions

I try to produce a release build with debug 
symbols to be able produce a useful crash log.
Comment 7 Zan Dobersek 2014-01-28 15:44:08 PST
Same issue on the GTK port.

Given the fact that the changes work well on debug builds but fail on release builds, I played with using the system allocator as the default on release builds (since it's used in debug builds). Turns out this is _not_ the reason behind these regressions.
Comment 8 Zan Dobersek 2014-01-29 03:52:17 PST
The crashes disappear when optimizations in release builds are disabled (i.e. switching from -O2 to -O0).
Comment 9 Csaba Osztrogonác 2014-01-29 06:28:59 PST
I managed to get a release build with debug symbols and could reproduce
the crash:

Program received signal SIGSEGV, Segmentation fault.
JSC::eval (callFrame=0x7fffffffcf90) at /home/ossy/WebKit/Source/JavaScriptCore/interpreter/Interpreter.cpp:111
111	    EvalExecutable* eval = callerCodeBlock->evalCodeCache().tryGet(callerCodeBlock->isStrictMode(), programSource, callerScopeChain);
(gdb) bt
#0  JSC::eval (callFrame=0x7fffffffcf90) at /home/ossy/WebKit/Source/JavaScriptCore/interpreter/Interpreter.cpp:111
#1  0x00000000006121a4 in JSC::operationCallEval (exec=0x7fffffffcf90, execCallee=0x7fffffffd060)
    at /home/ossy/WebKit/Source/JavaScriptCore/jit/JITOperations.cpp:625
#2  0x00007fffab12dac2 in ?? ()
#3  0x0000000000000000 in ?? ()

(gdb) p callerCodeBlock
$1 = (JSC::CodeBlock *) 0x0

Source/JavaScriptCore/interpreter/Interpreter.cpp:111
------------------------------------------------------
eval = callerCodeBlock->evalCodeCache().getSlow(callFrame, callerCodeBlock->ownerExecutable(), callerCodeBlock->isStrictMode(), programSource, callerScopeChain);


Any pointer how can I continue debugging?
Comment 10 Zan Dobersek 2014-01-29 06:50:46 PST
Created attachment 222570 [details]
Disassembled JSC::Interpreter::execute

This is the backtrace as provided by GDB:

#0  JSC::Interpreter::execute (this=0x7ffff7ee1960, eval=0x7fffb195c270, callFrame=callFrame@entry=0x7fffffffc140, thisValue=..., scope=scope@entry=0x0) at ../../Source/JavaScriptCore/heap/MarkedBlock.h:262
No locals.
#1  0x00007ffff77ff50e in JSC::eval (callFrame=<optimized out>) at ../../Source/JavaScriptCore/interpreter/Interpreter.cpp:138
        topCallFrame = {vm = @0x7ffff7ed5000, oldCallFrame = 0x7fffffffc2f0}
        programSource = "Y()"
        eval = <optimized out>
        program = <optimized out>
        interpreter = <optimized out>
#2  0x00007ffff783789b in JSC::operationCallEval (exec=0x7ffff7ee1960, execCallee=0x7fffb195c270) at ../../Source/JavaScriptCore/jit/JITOperations.cpp:625
        result = {static numberOfInt52Bits = 52, static int52ShiftAmount = 12, u = {asInt64 = 0, ptr = 0x0, asBits = {payload = 0, tag = 0}}}
#3  0x00007fffb3a8359e in ?? ()
No symbol table info available.
#4  0x0000000000000000 in ?? ()
No symbol table info available.

Attached is the disassembly of JSC::Interpreter::execute.
Comment 11 Zan Dobersek 2014-01-29 12:13:44 PST
Testing the various invocations of the sunspider-1.0/date-format-tofte.js test:


The following arguments to jsc result in a crash with the backtrace detailed in comment #9:
1. --useLLInt=false

The following extra arguments to jsc (besides the path to test) result in a crash with the backtrace detailed in comment #10:
1. none (just the path to test)
2. --minHeapUtilization=2.0 --minCopiedBlockUtilization=2.0
3. --enableConcurrentJIT=false --validateBytecode=true --validateGraphAtEachPhase=true
4. --thresholdForJITAfterWarmUp=10 --thresholdForJITSoon=10 --thresholdForOptimizeAfterWarmUp=20 --thresholdForOptimizeAfterLongWarmUp=20 --thresholdForOptimizeSoon=20 --thresholdForFTLOptimizeAfterWarmUp=20 --thresholdForFTLOptimizeSoon=20
5. --enableConcurrentJIT=false --validateGraph=true --thresholdForJITAfterWarmUp=10 --thresholdForJITSoon=10 --thresholdForOptimizeAfterWarmUp=20 --thresholdForOptimizeAfterLongWarmUp=20 --thresholdForOptimizeSoon=20 --thresholdForFTLOptimizeAfterWarmUp=20 --thresholdForFTLOptimizeSoon=20
Comment 12 Csaba Osztrogonác 2014-01-29 14:26:25 PST
It made all tests fail on Windows too:
http://build.webkit.org/builders/Apple%20Win%207%20Release%20%28Tests%29/builds/42167

Now Apple Mac is the only working WebKit port :(
Comment 13 Zan Dobersek 2014-01-29 14:59:53 PST
One of the problems is in operationCallEval.
http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/jit/JITOperations.cpp#L612

Here's the disassembled function for the current code:
Dump of assembler code for function operationCallEval:
=> 0x00007ffff7837840 <+0>:	mov    0x20(%rsi),%rax
   0x00007ffff7837844 <+4>:	mov    0x18(%rdi),%rdx
   0x00007ffff7837848 <+8>:	movabs $0xffff000000000002,%rcx
   0x00007ffff7837852 <+18>:	movq   $0x0,0x10(%rsi)
   0x00007ffff783785a <+26>:	test   %rcx,%rax
   0x00007ffff783785d <+29>:	mov    %rdx,0x18(%rsi)
   0x00007ffff7837861 <+33>:	jne    0x7ffff783786c <operationCallEval+44>
   0x00007ffff7837863 <+35>:	mov    (%rax),%rcx
   0x00007ffff7837866 <+38>:	cmpb   $0x13,0x5c(%rcx)
   0x00007ffff783786a <+42>:	je     0x7ffff7837870 <operationCallEval+48>
   0x00007ffff783786c <+44>:	xor    %eax,%eax
   0x00007ffff783786e <+46>:	retq   
   0x00007ffff783786f <+47>:	nop
   0x00007ffff7837870 <+48>:	mov    0x18(%rax),%rax
   0x00007ffff7837874 <+52>:	mov    0x8(%rax),%ecx
   0x00007ffff7837877 <+55>:	test   %ecx,%ecx
   0x00007ffff7837879 <+57>:	jne    0x7ffff783786c <operationCallEval+44>
   0x00007ffff783787b <+59>:	push   %rbx
   0x00007ffff783787c <+60>:	mov    0x34f6ed(%rip),%rbx        # 0x7ffff7b86f70
   0x00007ffff7837883 <+67>:	cmp    %rbx,0x50(%rax)
   0x00007ffff7837887 <+71>:	jne    0x7ffff78378a5 <operationCallEval+101>
   0x00007ffff7837889 <+73>:	xor    %dx,%dx
   0x00007ffff783788c <+76>:	mov    %rsi,%rdi
   0x00007ffff783788f <+79>:	mov    0x468(%rdx),%rbx
   0x00007ffff7837896 <+86>:	callq  0x7ffff760e790 <_ZN3JSC4evalEPNS_9ExecStateE@plt>
   0x00007ffff783789b <+91>:	cmpq   $0x0,0xbdb0(%rbx)
   0x00007ffff78378a3 <+99>:	je     0x7ffff78378b0 <operationCallEval+112>
   0x00007ffff78378a5 <+101>:	xor    %eax,%eax
   0x00007ffff78378a7 <+103>:	nopw   0x0(%rax,%rax,1)
   0x00007ffff78378b0 <+112>:	pop    %rbx
   0x00007ffff78378b1 <+113>:	retq   
End of assembler dump.

Throwing a simple fprintf() call into the mix, in this case before isHostFunction() call, seems to shift some registers around, and the number of JSC stress test failures drops to 50. Here's the resulting disassembled function:
Dump of assembler code for function operationCallEval:
=> 0x00007ffff78378a0 <+0>:	push   %rbp
   0x00007ffff78378a1 <+1>:	lea    0x26d0b6(%rip),%rdx        # 0x7ffff7aa495e
   0x00007ffff78378a8 <+8>:	push   %rbx
   0x00007ffff78378a9 <+9>:	mov    %rsi,%rbx
   0x00007ffff78378ac <+12>:	sub    $0x8,%rsp
   0x00007ffff78378b0 <+16>:	mov    0x18(%rdi),%rax
   0x00007ffff78378b4 <+20>:	movq   $0x0,0x10(%rsi)
   0x00007ffff78378bc <+28>:	mov    %rax,0x18(%rsi)
   0x00007ffff78378c0 <+32>:	mov    0x34f6a9(%rip),%rax        # 0x7ffff7b86f70
   0x00007ffff78378c7 <+39>:	mov    $0x1,%esi
   0x00007ffff78378cc <+44>:	mov    (%rax),%rdi
   0x00007ffff78378cf <+47>:	xor    %eax,%eax
   0x00007ffff78378d1 <+49>:	callq  0x7ffff760e790 <__fprintf_chk@plt>
   0x00007ffff78378d6 <+54>:	mov    0x20(%rbx),%rax
   0x00007ffff78378da <+58>:	movabs $0xffff000000000002,%rdx
   0x00007ffff78378e4 <+68>:	test   %rdx,%rax
   0x00007ffff78378e7 <+71>:	jne    0x7ffff78378f2 <operationCallEval+82>
   0x00007ffff78378e9 <+73>:	mov    (%rax),%rdx
   0x00007ffff78378ec <+76>:	cmpb   $0x13,0x5c(%rdx)
   0x00007ffff78378f0 <+80>:	je     0x7ffff7837900 <operationCallEval+96>
   0x00007ffff78378f2 <+82>:	add    $0x8,%rsp
   0x00007ffff78378f6 <+86>:	xor    %eax,%eax
   0x00007ffff78378f8 <+88>:	pop    %rbx
   0x00007ffff78378f9 <+89>:	pop    %rbp
   0x00007ffff78378fa <+90>:	retq   
   0x00007ffff78378fb <+91>:	nopl   0x0(%rax,%rax,1)
   0x00007ffff7837900 <+96>:	mov    0x18(%rax),%rax
   0x00007ffff7837904 <+100>:	mov    0x8(%rax),%edx
   0x00007ffff7837907 <+103>:	test   %edx,%edx
   0x00007ffff7837909 <+105>:	jne    0x7ffff78378f2 <operationCallEval+82>
   0x00007ffff783790b <+107>:	mov    0x34f666(%rip),%rcx        # 0x7ffff7b86f78
   0x00007ffff7837912 <+114>:	cmp    %rcx,0x50(%rax)
   0x00007ffff7837916 <+118>:	jne    0x7ffff78378f2 <operationCallEval+82>
   0x00007ffff7837918 <+120>:	mov    0x18(%rbx),%rax
   0x00007ffff783791c <+124>:	mov    %rbx,%rdi
   0x00007ffff783791f <+127>:	xor    %ax,%ax
   0x00007ffff7837922 <+130>:	mov    0x468(%rax),%rbp
   0x00007ffff7837929 <+137>:	callq  0x7ffff760e7a0 <_ZN3JSC4evalEPNS_9ExecStateE@plt>
   0x00007ffff783792e <+142>:	cmpq   $0x0,0xbdb0(%rbp)
   0x00007ffff7837936 <+150>:	jne    0x7ffff78378f2 <operationCallEval+82>
   0x00007ffff7837938 <+152>:	add    $0x8,%rsp
   0x00007ffff783793c <+156>:	pop    %rbx
   0x00007ffff783793d <+157>:	pop    %rbp
   0x00007ffff783793e <+158>:	retq   
End of assembler dump.
Comment 14 Zan Dobersek 2014-01-29 15:23:53 PST
Some further clarification on comment #13:

JSC::eval() and JSC::Interpreter::execute() rely on a non-null JSScope object being accessible through the relevant CallFrame. Currently the callerScopeChain variable in JSC::eval()[1] is assigned a null value, and the changes described in comment #13 (adding the extra fprintf call) somehow fix this.


[1] http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/interpreter/Interpreter.cpp#L110
Comment 15 Michael Saboff 2014-01-29 16:28:57 PST
(In reply to comment #14)
> Some further clarification on comment #13:
> 
> JSC::eval() and JSC::Interpreter::execute() rely on a non-null JSScope object being accessible through the relevant CallFrame. Currently the callerScopeChain variable in JSC::eval()[1] is assigned a null value, and the changes described in comment #13 (adding the extra fprintf call) somehow fix this.
> 
> 
> [1] http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/interpreter/Interpreter.cpp#L110

I believe the issue may be that the optimized version is not pushing rbp at the top of the function.  The top of the stack after the push of rbp is the exec pointer passed in as the first argument.  Without the push of rbp, the chain of caller frames is not complete.  In the case of operationCallEval() we pass in both the caller and callee, but the callee's "CallerFrame" doesn't get set up without the push rbp.

Are you compiling with the gcc option -fomit-frame-pointer?  If so, you may want to eliminate that option and see what that does.
Comment 16 Zan Dobersek 2014-01-30 00:39:06 PST
The -O2 option implies -fomit-frame-pointer, and passing -fno-omit-frame-pointer explicitly helps greatly, reducing the number of failures down to 25.

I'll deduce which additional optimization flags that are implied by -O2 are causing regressions.
Comment 17 Csaba Osztrogonác 2014-01-30 02:22:03 PST
I checked EFL compiling with -fno-omit-frame-pointer and all jsc tests pass without stress tests and only two failing stress tests remained.
Comment 18 Csaba Osztrogonác 2014-01-30 02:27:27 PST
(In reply to comment #17)
> I checked EFL compiling with -fno-omit-frame-pointer and all jsc tests pass without stress tests and only two failing stress tests remained.

Let's build EFL with -fno-omit-frame-pointer: https://bugs.webkit.org/show_bug.cgi?id=127898
Comment 19 Gyuyoung Kim 2014-02-05 22:29:19 PST
(In reply to comment #18)
> (In reply to comment #17)
> > I checked EFL compiling with -fno-omit-frame-pointer and all jsc tests pass without stress tests and only two failing stress tests remained.
> 
> Let's build EFL with -fno-omit-frame-pointer: https://bugs.webkit.org/show_bug.cgi?id=127898

Ossy, thank you for fixing this problem on EFL port ! However, it looks this js test regression still happen on efl bots and my local env. Do I need to do something to run js test for EFL port ?
Comment 20 Csaba Osztrogonác 2014-02-06 03:36:17 PST
(In reply to comment #19)
> Ossy, thank you for fixing this problem on EFL port ! However, it looks this js test regression still happen on efl bots and my local env. Do I need to do something to run js test for EFL port ?

There are still 2 failing tests on GTK and EFL too - https://bugs.webkit.org/show_bug.cgi?id=127902

Maybe -fno-tree-dce could solve the other failures occur on the bots
similar to GTK's fix in https://bugs.webkit.org/show_bug.cgi?id=127909 .
Unfortunately I can't try if it helps, I haven't a machine with GCC 4.8.