Bug 124409

Summary: [Win] JavaScript crashes on 64-bit with JIT enabled.
Product: WebKit Reporter: peavo
Component: JavaScriptCoreAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: achristensen, bfulgham, commit-queue, eflews.bot, ggaren, gyuyoung.kim, msaboff, rego+ews, xan.lopez
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: PC   
OS: Unspecified   
Bug Depends on:    
Bug Blocks: 124450    
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Patch none

Description peavo 2013-11-15 06:52:55 PST
I'm currently getting several different crashes when running WinCairo with JIT enabled on 64-bit.

I've found the following issues:

The saving of the frame pointer in callToJavaScript in JITStubsMSVC64.asm is wrong.
The move instruction has flipped the arguments, compared to the GCC version.

The registers rsi and rdi in callToJavaScript needs to be saved and restored.
This is required by the Windows 64-bit ABI. The caller stores local variables in them,
and if not saved and restored, we crash when returning to the caller.

The getHostCallReturnValue function needs to be updated according to it's GCC counterpart.

The methods JIT::appendCallWithExceptionCheck, and JIT::appendCallWithCallFrameRollbackOnException needs
to allocate stack space for the 4 argument registers, as required by the Windows 64-bit ABI.
Comment 1 peavo 2013-11-15 07:09:19 PST
Created attachment 217049 [details]
Patch
Comment 2 Michael Saboff 2013-11-15 09:54:28 PST
Comment on attachment 217049 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=217049&action=review

Thanks for the work.  Looking pretty good.  It needs a couple of changes.
In addition to what is noted inline, the value that we sub/add to sp in callToJavaScript / returnFromJavaScript need to be adjusted for the additional pushes of rbi/rdi and for the space needed to make calls out.  The value should be large enough for the space needed to call AND result in a 32 byte aligned SP.  I think that means the new value should be 38h.   In addition to changing 28h -> 38h, update the comment to include that the calling convention requires space for 4 Dwords.

> Source/JavaScriptCore/jit/JITInlines.h:123
> +
> +#if OS(WINDOWS) && CPU(X86_64)
> +    // According to Windows 64-bit ABI, we need to allocate space on the stack for the 4 argument registers.
> +    subPtr(TrustedImm32(32), stackPointerRegister);
> +#endif
> +

Instead of making space here, we should increase the amount that we decrement from SP in callToJavaScript.

> Source/JavaScriptCore/jit/JITInlines.h:142
> +
> +#if OS(WINDOWS) && CPU(X86_64)
> +    // According to Windows 64-bit ABI, we need to allocate space on the stack for the 4 argument registers.
> +    subPtr(TrustedImm32(32), stackPointerRegister);
> +#endif
> +

Ditto.

> Source/JavaScriptCore/jit/JITStubsMSVC64.asm:36
> -    mov rbp, rax ; Save previous frame pointer
> +    mov rax, rbp ; Save previous frame pointer

This fix was taken care of in r159290: <http://trac.webkit.org/changeset/159290>
Comment 3 peavo 2013-11-15 11:00:14 PST
(In reply to comment #2)
> (From update of attachment 217049 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=217049&action=review
> 
> Thanks for the work.  Looking pretty good.  It needs a couple of changes.
> In addition to what is noted inline, the value that we sub/add to sp in callToJavaScript / returnFromJavaScript need to be adjusted for the additional pushes of rbi/rdi and for the space needed to make calls out.  The value should be large enough for the space needed to call AND result in a 32 byte aligned SP.  I think that means the new value should be 38h.   In addition to changing 28h -> 38h, update the comment to include that the calling convention requires space for 4 Dwords.

Thanks for the feedback :)

I updated the value to 38h, and removed the other two calls, but then I get a crash in:

void JIT_OPERATION operationPutByIdNonStrictOptimize(ExecState* exec, StructureStubInfo* stubInfo, EncodedJSValue encodedValue, EncodedJSValue encodedBase, StringImpl* uid)

I assume the 5. parameter here goes on the stack.

According to http://msdn.microsoft.com/en-us/library/ms235286.aspx, the stack should be aligned to 16 bytes, but I'm not sure what's correct?
Comment 4 Michael Saboff 2013-11-15 11:09:37 PST
(In reply to comment #3)
> (In reply to comment #2)
> > (From update of attachment 217049 [details] [details])
> > View in context: https://bugs.webkit.org/attachment.cgi?id=217049&action=review
> > 
> > Thanks for the work.  Looking pretty good.  It needs a couple of changes.
> > In addition to what is noted inline, the value that we sub/add to sp in callToJavaScript / returnFromJavaScript need to be adjusted for the additional pushes of rbi/rdi and for the space needed to make calls out.  The value should be large enough for the space needed to call AND result in a 32 byte aligned SP.  I think that means the new value should be 38h.   In addition to changing 28h -> 38h, update the comment to include that the calling convention requires space for 4 Dwords.
> 
> Thanks for the feedback :)
> 
> I updated the value to 38h, and removed the other two calls, but then I get a crash in:
> 
> void JIT_OPERATION operationPutByIdNonStrictOptimize(ExecState* exec, StructureStubInfo* stubInfo, EncodedJSValue encodedValue, EncodedJSValue encodedBase, StringImpl* uid)
> 
> I assume the 5. parameter here goes on the stack.
> 
> According to http://msdn.microsoft.com/en-us/library/ms235286.aspx, the stack should be aligned to 16 bytes, but I'm not sure what's correct?

I take it things worked with your earlier patch.  My math could be wrong on the 38h.  Certainly 38h though is large enough for up to 7 values.  Can you provide a stack trace for the crash and the faulting instruction?

IIRC, The 32 byte requirement is for spilling xmm registers.
Comment 5 peavo 2013-11-15 11:57:50 PST
(In reply to comment #4)
> I take it things worked with your earlier patch.  My math could be wrong on the 38h.  Certainly 38h though is large enough for up to 7 values.  Can you provide a stack trace for the crash and the faulting instruction?
> 
> IIRC, The 32 byte requirement is for spilling xmm registers.

Yes, it worked with the previous patch. I had the exact same crash before, it was what made me add the SP subPtr/addPtr adjustments.

The faulting instruction is the last instruction in the disassembly:

void JIT_OPERATION operationPutByIdNonStrictOptimize(ExecState* exec, StructureStubInfo* stubInfo, EncodedJSValue encodedValue, EncodedJSValue encodedBase, StringImpl* uid)
{
0000000002D01100  mov         qword ptr [rsp+18h],rbx  
0000000002D01105  push        rbp  
0000000002D01106  push        rsi  
0000000002D01107  push        rdi  
0000000002D01108  push        r12  
0000000002D0110A  push        r13  
0000000002D0110C  sub         rsp,50h  
    VM* vm = &exec->vm();
0000000002D01110  mov         rax,qword ptr [rcx+18h]  
    NativeCallFrameTracer tracer(vm, exec);
    
    Identifier ident(vm, uid);
0000000002D01114  mov         rbx,qword ptr [uid]  
0000000002D0111C  mov         rdi,rdx  
0000000002D0111F  and         rax,0FFFFFFFFFFFF0000h  
0000000002D01125  mov         rsi,r9  
0000000002D01128  mov         r13,r8  
0000000002D0112B  mov         rdx,qword ptr [rax+468h]  
0000000002D01132  mov         rbp,rcx  
0000000002D01135  mov         qword ptr [rdx+9DB0h],rcx  
0000000002D0113C  mov         eax,dword ptr [rbx+18h]  


Stacktrace of crash:

>	JavaScriptCore.dll!operationPutByIdNonStrictOptimize(JSC::ExecState * exec, JSC::StructureStubInfo * stubInfo, __int64 encodedValue, __int64 encodedBase, WTF::StringImpl * uid)  Line 300 + 0x28 bytes	C++
 	0000000003d43e32()	
 	0000000006035420()	
 	000000000012ded8()	
 	0000000005f2ff98()	
 	JavaScriptCore.dll!JSC::getCallLinkInfoReturnLocation(JSC::CallLinkInfo * callLinkInfo)  Line 105	C++
 	130a29aa8dcc1fb4()	
 	0000000003c4f9b0()	
 	0000e6c375e3872c()	
 	000000000012ded8()	
 	0000000005f2ff98()	
 	0000000005f2ff98()	
 	00000000ffffffff()	
 	000000000012f288()	
 	0000000003c4f9b0()	
 	0000000001c40e68()	
 	000000000012df30()	
 	JavaScriptCore.dll!JSC::JITCode::execute(JSC::JSStack * stack, JSC::ExecState * callFrame, JSC::VM * vm)  Line 49 + 0x19 bytes	C++
 	JavaScriptCore.dll!JSC::Interpreter::execute(JSC::ProgramExecutable * program, JSC::ExecState * callFrame, JSC::JSObject * thisObj)  Line 883 + 0x28 bytes	C++
 	JavaScriptCore.dll!JSC::evaluate(JSC::ExecState * exec, const JSC::SourceCode & source, JSC::JSValue thisValue, JSC::JSValue * returnedException)  Line 85	C++
 	WebKit.dll!WebCore::ScriptController::evaluateInWorld(const WebCore::ScriptSourceCode & sourceCode, WebCore::DOMWrapperWorld & world)  Line 145 + 0x41 bytes	C++
 	WebKit.dll!WebCore::ScriptController::evaluate(const WebCore::ScriptSourceCode & sourceCode)  Line 162	C++
 	WebKit.dll!WebCore::ScriptElement::executeScript(const WebCore::ScriptSourceCode & sourceCode)  Line 310 + 0x14 bytes	C++
 	WebKit.dll!WebCore::ScriptElement::prepareScript(const WTF::TextPosition & scriptStartPosition, WebCore::ScriptElement::LegacyTypeSupport supportLegacyTypes)  Line 241 + 0x3d bytes	C++
 	WebKit.dll!WebCore::HTMLScriptRunner::runScript(WebCore::Element * script, const WTF::TextPosition & scriptStartPosition)  Line 304	C++
 	WebKit.dll!WebCore::HTMLScriptRunner::execute(WTF::PassRefPtr<WebCore::Element> * scriptElement, const WTF::TextPosition & scriptStartPosition)  Line 177	C++
 	WebKit.dll!WebCore::HTMLDocumentParser::runScriptsForPausedTreeBuilder()  Line 264	C++
 	WebKit.dll!WebCore::HTMLDocumentParser::pumpTokenizer(WebCore::HTMLDocumentParser::SynchronousMode mode)  Line 527 + 0x41 bytes	C++
 	WebKit.dll!WebCore::HTMLDocumentParser::resumeParsingAfterYield()  Line 252	C++
 	WebKit.dll!WebCore::ThreadTimers::sharedTimerFiredInternal()  Line 130	C++
 	WebKit.dll!WebCore::TimerWindowWndProc(HWND__ * hWnd, unsigned int message, unsigned __int64 wParam, __int64 lParam)  Line 111	C++
Comment 6 Michael Saboff 2013-11-15 12:08:39 PST
(In reply to comment #5)

What are the register values?
Comment 7 peavo 2013-11-15 12:17:16 PST
(In reply to comment #6)
> (In reply to comment #5)
> 
> What are the register values?

RAX = 0000000003FA0000 RBX = 130A29AA8DCC1FB4 RCX = 0000000005F2FF48 RDX = 0000000001C30FD0 RSI = 0000000003DDFFB0 RDI = 0000000006026AA0 R8  = 0000000003D7DAF0 R9  = 0000000003DDFFB0 R10 = 0000000005F3E390 R11 = 0000000002D01100 R12 = 0000000001C40E68 R13 = 0000000003D7DAF0 R14 = FFFF000000000000 R15 = FFFF000000000002 RIP = 0000000002D0113C RSP = 000000000012DD00 RBP = 0000000005F2FF48 EFL = 00010204
Comment 8 peavo 2013-11-15 13:34:05 PST
I have probably misunderstood something, but below is the disassembly which calls operationPutByIdNonStrictOptimize; I thought we had to also adjust the stack pointer before this call, not only before the call in callToJavaScript?

...
...
00000000053D3E00  mov         eax,edx  
00000000053D3E02  mov         r9,rax  
00000000053D3E05  mov         rdx,60E91B0h  
00000000053D3E0F  mov         rcx,rbp  
00000000053D3E12  mov         dword ptr [rbp+2Ch],1Eh  
00000000053D3E19  mov         r11,1C0AD80h  
00000000053D3E23  mov         qword ptr [r11],rbp  
00000000053D3E26  mov         r11,2CE1100h  
00000000053D3E30  call        r11               <     calls operationPutByIdNonStrictOptimize
Comment 9 Michael Saboff 2013-11-15 14:27:42 PST
(In reply to comment #8)
> I have probably misunderstood something, but below is the disassembly which calls operationPutByIdNonStrictOptimize; I thought we had to also adjust the stack pointer before this call, not only before the call in callToJavaScript?
> 
> ...
> ...
> 00000000053D3E00  mov         eax,edx  
> 00000000053D3E02  mov         r9,rax  
> 00000000053D3E05  mov         rdx,60E91B0h  
> 00000000053D3E0F  mov         rcx,rbp  
> 00000000053D3E12  mov         dword ptr [rbp+2Ch],1Eh  
> 00000000053D3E19  mov         r11,1C0AD80h  
> 00000000053D3E23  mov         qword ptr [r11],rbp  
> 00000000053D3E26  mov         r11,2CE1100h  
> 00000000053D3E30  call        r11               <     calls operationPutByIdNonStrictOptimize

We shouldn't have to because the stack pointer shouldn't be changed by LLInt or JIT code.  From the register dump, rbx is garbage.  Could you provide more disassembly for the caller?  I'm looking for where arg 5 is put on the stack.
Comment 10 Michael Saboff 2013-11-15 14:29:27 PST
Actually, the value for rbx is the same as the PC for the caller of getCallLinkInfoReturnLocation() in the stack trace.
Comment 11 peavo 2013-11-15 14:47:28 PST
(In reply to comment #9)

> We shouldn't have to because the stack pointer shouldn't be changed by LLInt or JIT code.  From the register dump, rbx is garbage.  Could you provide more disassembly for the caller?  I'm looking for where arg 5 is put on the stack.

0000000005463CE8  add         byte ptr [rax-75h],cl  
0000000005463CEB  xor         byte ptr [r8-75h],r9b  
0000000005463CEF  push        rbp  
0000000005463CF0  or          byte ptr [rax-75h],cl  
0000000005463CF3  ins         dword ptr [rdi],dx  
0000000005463CF4  add         byte ptr [rdx-3Dh],dl  
0000000005463CF7  mov         dword ptr [rbp+2Ch],0Ah  
0000000005463CFE  mov         r11,1CCAD80h  
0000000005463D08  mov         qword ptr [r11],rbp  
0000000005463D0B  add         rsp,0FFFFFFFFFFFFFFF0h  
0000000005463D0F  mov         rcx,rsp  
0000000005463D12  mov         rdx,rbp  
0000000005463D15  mov         r8,606C520h  
0000000005463D1F  mov         r11,2E55200h  
0000000005463D29  call        r11  
0000000005463D2C  pop         rax  
0000000005463D2D  pop         rdx  
0000000005463D2E  mov         r11,1CCC830h  
0000000005463D38  mov         r11,qword ptr [r11]  
0000000005463D3B  test        r11,r11  
0000000005463D3E  jne         0000000005463FD2  
0000000005463D44  mov         rax,qword ptr [rbp-28h]  
0000000005463D48  jmp         0000000005463AA1  
0000000005463D4D  mov         dword ptr [rbp+2Ch],0Dh  
0000000005463D54  mov         r11,1CCAD80h  
0000000005463D5E  mov         qword ptr [r11],rbp  
0000000005463D61  add         rsp,0FFFFFFFFFFFFFFF0h  
0000000005463D65  mov         rcx,rsp  
0000000005463D68  mov         rdx,rbp  
0000000005463D6B  mov         r8,606C538h  
0000000005463D75  mov         r11,2E58C70h  
0000000005463D7F  call        r11  
0000000005463D82  pop         rax  
0000000005463D83  pop         rdx  
0000000005463D84  mov         r11,1CCC830h  
0000000005463D8E  mov         r11,qword ptr [r11]  
0000000005463D91  test        r11,r11  
0000000005463D94  jne         0000000005463FD2  
0000000005463D9A  mov         rax,qword ptr [rbp+30h]  
0000000005463D9E  jmp         0000000005463ADD  
0000000005463DA3  mov         rdx,5F68120h  
0000000005463DAD  mov         rcx,rbp  
0000000005463DB0  mov         dword ptr [rbp+2Ch],11h  
0000000005463DB7  mov         r11,1CCAD80h  
0000000005463DC1  mov         qword ptr [r11],rbp  
0000000005463DC4  mov         r11,2D95510h  
0000000005463DCE  call        r11  
0000000005463DD1  mov         r11,1CCC830h  
0000000005463DDB  mov         r11,qword ptr [r11]  
0000000005463DDE  test        r11,r11  
0000000005463DE1  jne         0000000005463FD2  
0000000005463DE7  mov         qword ptr [rbp-18h],rax  
0000000005463DEB  jmp         0000000005463B18  
0000000005463DF0  mov         r11,6065810h  
0000000005463DFA  mov         qword ptr [rsp],r11  
0000000005463DFE  mov         r8,rdx  
0000000005463E01  mov         r9,rax  
0000000005463E04  mov         rdx,6065540h  
0000000005463E0E  mov         rcx,rbp  
0000000005463E11  mov         dword ptr [rbp+2Ch],1Eh  
0000000005463E18  mov         r11,1CCAD80h  
0000000005463E22  mov         qword ptr [r11],rbp  
0000000005463E25  mov         r11,2D91100h  
0000000005463E2F  call        r11
Comment 12 peavo 2013-11-15 15:20:12 PST
(In reply to comment #10)
> Actually, the value for rbx is the same as the PC for the caller of getCallLinkInfoReturnLocation() in the stack trace.

I found the 5. parameter (StringImpl *) on the stack, it's located 20h off from where we actually read.
Comment 13 Michael Saboff 2013-11-15 15:35:26 PST
(In reply to comment #12)
> (In reply to comment #10)
> > Actually, the value for rbx is the same as the PC for the caller of getCallLinkInfoReturnLocation() in the stack trace.
> 
> I found the 5. parameter (StringImpl *) on the stack, it's located 20h off from where we actually read.

Makes sense.  Doesn't appear that we accounted for the first 4 args before poking arg5.  The poke for arg5 is: 
0000000005463DFA  mov         qword ptr [rsp],r11
That should be 0000000005463DFA  mov         qword ptr [rsp + 0x20],r11

I think the C call helper for 5 args for X86 win needs to account for the first 4 args.
Comment 14 peavo 2013-11-15 15:48:03 PST
(In reply to comment #13)
> (In reply to comment #12)
> > (In reply to comment #10)
> > > Actually, the value for rbx is the same as the PC for the caller of getCallLinkInfoReturnLocation() in the stack trace.
> > 
> > I found the 5. parameter (StringImpl *) on the stack, it's located 20h off from where we actually read.
> 
> Makes sense.  Doesn't appear that we accounted for the first 4 args before poking arg5.  The poke for arg5 is: 
> 0000000005463DFA  mov         qword ptr [rsp],r11
> That should be 0000000005463DFA  mov         qword ptr [rsp + 0x20],r11
> 
> I think the C call helper for 5 args for X86 win needs to account for the first 4 args.

Ah, I see... Thanks, will update the patch :)
Comment 15 peavo 2013-11-15 16:07:15 PST
Created attachment 217094 [details]
Patch
Comment 16 EFL EWS Bot 2013-11-15 16:11:55 PST
Comment on attachment 217094 [details]
Patch

Attachment 217094 [details] did not pass efl-ews (efl):
Output: http://webkit-queues.appspot.com/results/24538011
Comment 17 Michael Saboff 2013-11-15 16:14:40 PST
Comment on attachment 217094 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=217094&action=review

> Source/JavaScriptCore/jit/JITStubsMSVC64.asm:35
> +    mov r10, qword ptr[esp]

Shouldn't this be rsp?
Comment 18 Michael Saboff 2013-11-15 16:23:22 PST
Comment on attachment 217094 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=217094&action=review

> Source/JavaScriptCore/jit/CCallHelpers.h:733
> +#if CPU(MIPS) || OS(WINDOWS) && CPU(X86_64)

Parenthesize this expression.
Comment 19 peavo 2013-11-15 16:28:48 PST
Created attachment 217097 [details]
Patch
Comment 20 peavo 2013-11-15 16:32:37 PST
Created attachment 217099 [details]
Patch
Comment 21 peavo 2013-11-15 16:34:08 PST
(In reply to comment #17)
> (From update of attachment 217094 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=217094&action=review
> 
> > Source/JavaScriptCore/jit/JITStubsMSVC64.asm:35
> > +    mov r10, qword ptr[esp]
> 
> Shouldn't this be rsp?

Yes :)
Comment 22 Michael Saboff 2013-11-15 16:34:30 PST
r=me.  Let's watch the EWS bots and then I can cq+
Comment 23 peavo 2013-11-15 16:35:38 PST
(In reply to comment #22)
> r=me.  Let's watch the EWS bots and then I can cq+

Thanks, I hope I got it right now :)
Comment 24 peavo 2013-11-15 17:22:02 PST
(In reply to comment #22)
> r=me.  Let's watch the EWS bots and then I can cq+

Seems to build successfully now :)
Comment 25 WebKit Commit Bot 2013-11-15 19:23:31 PST
Comment on attachment 217099 [details]
Patch

Clearing flags on attachment: 217099

Committed r159376: <http://trac.webkit.org/changeset/159376>
Comment 26 WebKit Commit Bot 2013-11-15 19:23:34 PST
All reviewed patches have been landed.  Closing bug.