Bug 150936

Summary: Crash in sorting-benchmark.js on ARM64 with full op_sub FTL coverage
Product: WebKit Reporter: Mark Lam <mark.lam>
Component: JavaScriptCoreAssignee: Mark Lam <mark.lam>
Status: RESOLVED FIXED    
Severity: Normal CC: benjamin, fpizlo, ggaren, keith_miller, msaboff, ossy, saam, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: WebKit Local Build   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on:    
Bug Blocks: 150712    
Attachments:
Description Flags
the patch. msaboff: review+

Description Mark Lam 2015-11-05 09:56:32 PST
I think this is a latent bug that has been masked by the fact that we didn't previously had FTL coverage for op_sub untyped operands.  With the add support for op_sub untyped operands, we now get the following crash on ARM64:

ASSERTION FAILED: exception
/Volumes/Data/ws2/OpenSource/Source/JavaScriptCore/jit/JITExceptions.cpp(54) : void JSC::genericUnwind(JSC::VM *, JSC::ExecState *, JSC::UnwindStart)
1   0x100bf13b0 WTFCrash
2   0x1007a9200 JSC::genericUnwind(JSC::VM*, JSC::ExecState*, JSC::UnwindStart)
3   0x1007bc33c lookupExceptionHandler
4   0x13f517234
5   0x13f4ec974
6   0x10096e8a8 llint_entry
7   0x100968bc8 vmEntryToJavaScript
8   0x1007a6160 JSC::JITCode::execute(JSC::VM*, JSC::ProtoCallFrame*)
9   0x100778030 JSC::Interpreter::execute(JSC::ProgramExecutable*, JSC::ExecState*, JSC::JSObject*)
10  0x1001f968c JSC::evaluate(JSC::ExecState*, JSC::SourceCode const&, JSC::JSValue, WTF::NakedPtr<JSC::Exception>&)
11  0x100009f0c runWithScripts(GlobalObject*, WTF::Vector<Script, 0ul, WTF::CrashOnOverflow, 16ul> const&, bool, bool)
12  0x100008f38 jscmain(int, char**)
13  0x100008b80 main
14  0x18148a8b8 <redacted>

Some facts:
1. This only manifests if function benchmark#ERdD87 is FTL compiled.
   Previously, this function was prevented from being compiled due to inadequate FTL coverage of op_sub untyped operands.
2. The failure is due to the vm->exception() returning a NULL value in genericUnwind().
   Some debugging reviews that no exception was actually thrown (hence, the NULL vm->exception() value).  We got this crash probably because of some erroneous jump (or fall thru code) in the FTL.
Comment 1 Mark Lam 2015-11-05 09:58:04 PST
I should add:
3. This issue only manifests on ARM64.
   It does not manifest on X64_64.
Comment 2 Radar WebKit Bug Importer 2015-11-05 09:58:37 PST
<rdar://problem/23413932>
Comment 3 Mark Lam 2015-11-05 12:29:11 PST
Investigation notes:

1. Minimum DFG white list:
benchmark#Be0R5z
bottom_up_merge_sort#A5p6f2

2. Minimum inlining depth for reproduction: 2

3. JS stack trace at point of failure:

      frame 0x16fd79780 {
         name: benchmark
         sourceURL: sorting-benchmark.js
         isInlinedFrame: false
         callee: 0x102c605b0
         returnPC: 0x11d3ca250
         callerFrame: 0x16fd79880
         rawLocationBits: 0 0x0
         codeBlock: 0x102cf4d00 benchmark#Be0R5z:[0x102cf4d00->0x102c66500->0x102c74900, FTLFunctionCall, 182]
            hasCodeOrigins: true
            callSiteIndex: 0 of 5
            line: 117
            column: 6
      }
      frame 0x16fd79880 {
         name: main
         sourceURL: sorting-benchmark.js
         isInlinedFrame: false
         callee: 0x102c604f0
         returnPC: 0x100a091a8
         callerFrame: 0x16fd798f0
         rawLocationBits: 288 0x120
         codeBlock: 0x102c67700 main#BBIyow:[0x102c67700->0x102c74600, BaselineFunctionCall, 848]
            bytecodeOffset: 288 of 848
            line: 159
            column: 14
      }
      frame 0x16fd798f0 {
         name: global code
         sourceURL: sorting-benchmark.js
         isInlinedFrame: false
         callee: 0x102c2bde0
         returnPC: 0x100a034c8
         callerFrame: 0x0
         rawLocationBits: 288 0x120
         codeBlock: 0x102c67d00 <global>#AA6tQf:[0x102c67d00->0x102c7da00, LLIntGlobal, 299]
            bytecodeOffset: 288 of 299
            line: 164
            column: 3
      }

4. After eliminating all uses of the subtraction operator from being DFG (and therefore FTL) compiled, I can still reproduce the crash.  This confirms that this issue is indeed not related to the support for op_sub in the FTL.

   Details: to do this experiment, I did the following:
   1. Use the DFG whitelist to only compile the 2 functions above: benchmark and bottom_up_merge_sort.
   2. noInline() every other function that uses the subtraction operator.
   3. Change the Date subtraction in benchmark() to an addition instead.
Comment 4 Mark Lam 2015-11-09 11:58:43 PST
I managed to reduce the test case a bit more and only DFG & FTL compile 1 function: bottom_up_merge_sort.

With the above test case, I was observing that register x26 was getting trashed.  So, on Geoff's suggestion, before each call in the baseline JIT (plus call ICs, and a few other places) I stash away the value of x26 and x27 (a tag register) in a global vector, and overwrote them with a synthesized value.  On return from the call, I verify that x26 and x27 should contain the values that I put there, and then I restore the original values that I stashed away in the global vector.  The intent is to confirm that each callee is actually restoring the callee save registers.  x26 and x27 are both callee saved, and hence, should be restored.

With this test, I am able to catch restoration issues.  Some facts about the issue:

5. FTL compilation is necessary to reproduce the issue.

6. DFG / FTL OSR entry is not needed.

7. x26 was not restored after an OSR exit from the FTL function.

8. Added probe logging shows:

    mlam FTL Prologue: x27:0x104008a78 x26:0x10000003               // First line in FTL prologue. x26 is what I set it to be.
    mlam FTL Prologue 2000: x27:0x104008a78 x26:0xffff000000000002  // Further down in the FTL prologue, x26 is already trashed.

    // Generating the OSR exit off-ramp:
Compiling OSR exit with exitID = 2
    Owning block: bottom_up_merge_sort#CdOV1a:[0x1040a9f00->0x1040aae00->0x104060600, FTLFunctionCall, 619]  // OSR exiting.

    // Running the OSR exit off-ramp:
    mlam OSR EXIT TOP by FTL OSRExitCompiler::compileStub: x27:0x104008a78 x26:0xffff000000000002 @ bottom_up_merge_sort#CdOV1a:[0x1040a9f00->0x1040aae00->0x104060600, FTLFunctionCall, 619]
    mlam OSR EXIT BOTTOM by FTL OSRExitCompiler::compileStub: x27:0xffff000000000000 x26:0xffff000000000002 @ bottom_up_merge_sort#CdOV1a:[0x1040a9f00->0x1040aae00->0x104060600, FTLFunctionCall, 619]

    // At the end of the OSR exit, we find that x26 was not restored by the OSR exit off-ramp.
    mlam ERROR @ AFTER [3] EXIT BOTTOM by FTL OSRExitCompiler::compileStub: x26: expect 0x10000003, actual 0xffff000000000002 @ codeBlock bottom_up_merge_sort#CdOV1a:[0x1040a9f00->0x1040aae00->0x104060600, FTLFunctionCall, 619] bottom_up_merge_sort#CdOV1a:[0x1040a9

9. The OSR off-ramp disassembly only shows the following references to x26:

    Code at [0x1c02a7c80, 0x1c02a8520):
         ...
         0x1c02a7d5c: 	stur	x26, [x0, #208]
         ...

   i.e. the off-ramp thinks that there is a value in x26 that needs to be flushed to the stack.  However, it does not think there's a value in x26 that it needs to restore.
Comment 5 Michael Saboff 2015-11-09 12:44:15 PST
(In reply to comment #4)
> I managed to reduce the test case a bit more and only DFG & FTL compile 1
> function: bottom_up_merge_sort.
> 
> With the above test case, I was observing that register x26 was getting
> trashed.  So, on Geoff's suggestion, before each call in the baseline JIT
> (plus call ICs, and a few other places) I stash away the value of x26 and
> x27 (a tag register) in a global vector, and overwrote them with a
> synthesized value.  On return from the call, I verify that x26 and x27
> should contain the values that I put there, and then I restore the original
> values that I stashed away in the global vector.  The intent is to confirm
> that each callee is actually restoring the callee save registers.  x26 and
> x27 are both callee saved, and hence, should be restored.
> 
> With this test, I am able to catch restoration issues.  Some facts about the
> issue:
> 
> 5. FTL compilation is necessary to reproduce the issue.
> 
> 6. DFG / FTL OSR entry is not needed.
> 
> 7. x26 was not restored after an OSR exit from the FTL function.
> 
> 8. Added probe logging shows:
> 
>     mlam FTL Prologue: x27:0x104008a78 x26:0x10000003               // First
> line in FTL prologue. x26 is what I set it to be.
>     mlam FTL Prologue 2000: x27:0x104008a78 x26:0xffff000000000002  //
> Further down in the FTL prologue, x26 is already trashed.
> 
>     // Generating the OSR exit off-ramp:
> Compiling OSR exit with exitID = 2
>     Owning block:
> bottom_up_merge_sort#CdOV1a:[0x1040a9f00->0x1040aae00->0x104060600,
> FTLFunctionCall, 619]  // OSR exiting.
> 
>     // Running the OSR exit off-ramp:
>     mlam OSR EXIT TOP by FTL OSRExitCompiler::compileStub: x27:0x104008a78
> x26:0xffff000000000002 @
> bottom_up_merge_sort#CdOV1a:[0x1040a9f00->0x1040aae00->0x104060600,
> FTLFunctionCall, 619]
>     mlam OSR EXIT BOTTOM by FTL OSRExitCompiler::compileStub:
> x27:0xffff000000000000 x26:0xffff000000000002 @
> bottom_up_merge_sort#CdOV1a:[0x1040a9f00->0x1040aae00->0x104060600,
> FTLFunctionCall, 619]
> 
>     // At the end of the OSR exit, we find that x26 was not restored by the
> OSR exit off-ramp.
>     mlam ERROR @ AFTER [3] EXIT BOTTOM by FTL OSRExitCompiler::compileStub:
> x26: expect 0x10000003, actual 0xffff000000000002 @ codeBlock
> bottom_up_merge_sort#CdOV1a:[0x1040a9f00->0x1040aae00->0x104060600,
> FTLFunctionCall, 619] bottom_up_merge_sort#CdOV1a:[0x1040a9
> 
> 9. The OSR off-ramp disassembly only shows the following references to x26:
> 
>     Code at [0x1c02a7c80, 0x1c02a8520):
>          ...
>          0x1c02a7d5c: 	stur	x26, [x0, #208]
>          ...
> 
>    i.e. the off-ramp thinks that there is a value in x26 that needs to be
> flushed to the stack.  However, it does not think there's a value in x26
> that it needs to restore.

I believe that is the code in the OSRExit handler that saves every register to a scratch buffer.  If that is the only reference to x26, then it sounds like the FTL doesn't think that it used x26.


Did the FTL compiled function save x26 and is it actively using it?  The unwind information for the function should show that it was saved and we turn that into the a RegisterAtOffsetList in parseUnwindInfo().  We use that registerAtOffset list to manage our callee saves, including during OSR exit processing.

You can also look at the FTL disassembly to see if the FTL function saves x26 and where it materializes the tag mask constant into x26.
Comment 6 Mark Lam 2015-11-09 13:33:58 PST
(In reply to comment #5)
> You can also look at the FTL disassembly to see if the FTL function saves
> x26 and where it materializes the tag mask constant into x26.

From the FTL disassembly:

Generated LLVM code after stackmap-based fix-up for bottom_up_merge_sort#CdOV1a:[0x104899f00->0x10489ae00->0x10486c600, FTLFunctionCall, 619] in FTLMode #0, __text:
         0x10de45ae0: 	stp	x28, x27, [sp, #-96]!
         0x10de45ae4: 	stp	x26, x25, [sp, #16]        // x26 saved right at the start of the function.
         0x10de45ae8: 	stp	x24, x23, [sp, #32]
         0x10de45aec: 	stp	x22, x21, [sp, #48]
         0x10de45af0: 	stp	x20, x19, [sp, #64]
         0x10de45af4: 	stp	x29, x30, [sp, #80]
         0x10de45af8: 	add	x29, sp, #80
         0x10de45afc: 	sub	sp, sp, #112
         ...
         0x10de45cac: 	movz	x26, #0xffff, lsl #48      // Next use still in the FTL prologue code at the block for checkArguments: no more saving to the stack before loading a value into it.
         0x10de45cb0: 	movk	x26, #0x2
         ...

Looks like the FTL did save it, but our OSR exit compiler is not aware of it.  The exit off-ramp contains the following:

         0x10de482bc: 	movz	x17, #0xcf68
         0x10de482c0: 	movk	x17, #0x490, lsl #16
         0x10de482c4: 	movk	x17, #0x1, lsl #32
         0x10de482c8: 	ldr	 x19, [x17, xzr]
         0x10de482cc: 	ldur	x20, [x17, #8]
         0x10de482d0: 	ldur	x21, [x17, #16]
         0x10de482d4: 	ldur	x22, [x17, #24]
         0x10de482d8: 	ldur	x23, [x17, #32]
         0x10de482dc: 	ldur	x24, [x17, #40]
         0x10de482e0: 	ldur	x25, [x17, #48]

x27 and x28 gets loaded with tag values early on but are also used to store other temp values later.  I didn't trace it too carefully.  x26 gets completely left out.  I'll look into the unwind info.
Comment 7 Mark Lam 2015-11-09 13:40:56 PST
parseUnwindInfo() saw UNWIND_ARM64_FRAME_X25_X26_PAIR.  The unwind info dump says:

Unwind info for bottom_up_merge_sort#CdOV1a:[0x104899f00->0x10489ae00->0x10486c600, FTLFunctionCall, 619]:
    %r19 at -8, %r20 at -16, %r21 at -24, %r22 at -32, %r23 at -40, %r24 at -48, %r25 at -56, %r26 at -64, %r27 at -72, %r28 at -80, %fp at 0

i.e. x26 is included.
Comment 8 Michael Saboff 2015-11-09 14:03:17 PST
(In reply to comment #7)
> parseUnwindInfo() saw UNWIND_ARM64_FRAME_X25_X26_PAIR.  The unwind info dump
> says:
> 
> Unwind info for
> bottom_up_merge_sort#CdOV1a:[0x104899f00->0x10489ae00->0x10486c600,
> FTLFunctionCall, 619]:
>     %r19 at -8, %r20 at -16, %r21 at -24, %r22 at -32, %r23 at -40, %r24 at
> -48, %r25 at -56, %r26 at -64, %r27 at -72, %r28 at -80, %fp at 0
> 
> i.e. x26 is included.

Looks like this function uses all callee saves.

x27 & x28 are filled with the tag contents, because we know that is what the baseline expects.

x26 will not be restored itself, because it is also callee save for the baseline.  The baseline never uses this register, but if it did, it could put whatever it wants in that register as it will restore the prior contents when it returns.

The contents at stack offsets -64, -72 and -80 for x26, x27 and x28 should be copied to the appropriate callee save locations for the baseline frame.  From memory these should be -24[%fp] for x26, -16[%fp] for x27 and -8[%fp] for x28.  When the baseline function we are OSR exiting to returns, it will restore the prior contents of x26, x27 and x28 of the caller.

I'd look to make sure that all callee save values where copied from the stack offsets given in the unwind info to the scratch buffer.  Looks like this happens in the loop near FTLOSRExitCompiler.cpp::compileStub::lines 376-382.

Then make sure that the contents saved in that loop are restored to the right register or stack offset depending on whether or not the register is a callee save for the baseline.  That is done in the loop near FTLOSRExitCompiler.cpp::compileStub::lines 430-466.  For the callee saves, we should load into x0, storing from the same.

Verify that the generated assembly makes sense.
Comment 9 Mark Lam 2015-11-09 14:55:37 PST
(In reply to comment #8)
> Verify that the generated assembly makes sense.

Yes, I think it make sense.  The x26 trail I was on turns out to be a red herring.  x26 turns out to also be a baseline callee save regs.  The FTL OSR exit stores the restore value in the baseline's callee save location on the stack.  When the baseline version of the function returns, x26 was indeed restored.  Hence, it was not right for me to assert the value of x26 on OSR exit.

Once I corrected that assertion, I'm getting a crash elsewhere.  I'll continue to investigate.
Comment 10 Mark Lam 2015-11-11 09:28:40 PST
Here's the latest data on what is happening on the path to manifesting this issue:

1. There are 2 functions: benchmark() and bottom_up_merge_sort().  Let's call the A() and B() for brevity.
2. A() is FTL compiled.
3. B() is FTL compiled.
4. FTL A() calls FTL B().  At B()'s FTL prologue, x26 contains the value I planted there.
5. FTL B() loops for a long time.
6. FTL B() goes through an OSR exit, and deopts to baseline.  The OSR exit off-ramp puts x26's original value in the baseline stash location in its stack frame.
7. Baseline B() gets DFG optimized, and we OSR enter into DFG B().
8. DFG B() returns to FTL A().  This is where I detect that x26 has not been restored to the value that I planted there.

The issue has to be in step (7) or (8).  My guess is that (7) is the culprit.  Digging into that next.
Comment 11 Mark Lam 2015-11-11 10:06:19 PST
(In reply to comment #10)
> 7. Baseline B() gets DFG optimized, and we OSR enter into DFG B().
> 8. DFG B() returns to FTL A().  This is where I detect that x26 has not been
> restored to the value that I planted there.
> 
> The issue has to be in step (7) or (8).  My guess is that (7) is the
> culprit.  Digging into that next.

I verified that (8) did not think x26 needs to be restored.  In the DFG return site, emitRestoreCalleeSaves() only restored x27 and x28 because codeBlock->calleeSaveRegisters() did not include x26.

This means that the OSR entry ramp into DFG B() should have restored x26 on OSR entry, or DFG B() should have included x26 in its callee saves.  Looking into the OSR entry ramp next.
Comment 12 Mark Lam 2015-11-11 13:42:55 PST
(In reply to comment #11)
> This means that the OSR entry ramp into DFG B() should have restored x26 on
> OSR entry, or DFG B() should have included x26 in its callee saves.  Looking
> into the OSR entry ramp next.

Michael confirmed that the issue is in the OSR entry thunk.  The generic fix is for the entry trunk to restore the appropriate registers from the VM scratch buffer.
Comment 13 Mark Lam 2015-11-11 16:18:06 PST
Created attachment 265329 [details]
the patch.

Still doing a full test run locally.
Comment 14 Michael Saboff 2015-11-11 16:29:02 PST
Comment on attachment 265329 [details]
the patch.

View in context: https://bugs.webkit.org/attachment.cgi?id=265329&action=review

r=me

> Source/JavaScriptCore/ChangeLog:8
> +        The OSR entry thunk does not currently restore the callee saved registers.  As a result,

I think it is more accurate to say "The OSR entry thunk does not restore callee saved registers that aren't saved by the DFG".
Comment 15 Mark Lam 2015-11-11 23:07:05 PST
Thanks.  Landed in r192352: <http://trac.webkit.org/r192352>.
Comment 16 Csaba Osztrogon√°c 2015-11-12 01:58:01 PST
*** Bug 149061 has been marked as a duplicate of this bug. ***