Bug 153996

Summary: Changed WTFCrash to not trash the crash site register state.
Product: WebKit Reporter: Mark Lam <mark.lam>
Component: JavaScriptCoreAssignee: Mark Lam <mark.lam>
Status: RESOLVED FIXED    
Severity: Normal CC: achristensen, benjamin, cdumez, cmarcelo, commit-queue, ggaren, mitz
Priority: P2    
Version: WebKit Local Build   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
proposed patch.
none
x86_64 benchmark result.
none
proposed patch 2: added volatile attribute to asm statements.
ggaren: review+
patch for landing. none

Description Mark Lam 2016-02-08 11:31:40 PST
When doing post-mortem crash site analysis using data from crash reports, it is immensely valuable to be able to infer the crashing program's state from the register values at crash time.  However, for RELEASE_ASSERT failures, we crash using WTFCrash(), and WTFCrash() is currently implemented as a function call that, in turn, calls a lot of other functions to do crash handling before actually crashing.  As a result, the register values captured in the crash reports are not likely to still contain the values used by the caller function that failed the RELEASE_ASSERT.

This patch aims to remedy this issue for non-debug builds on OS(DARWIN) ports.  It does so by changing WTFCrash() into an inlined function that has an inlined asm statement to issues the CPU specific breakpoint trap instruction.  As a result, for non-debug OS(DARWIN) builds, crashes due to failed RELEASE_ASSERTs will now show up in crash reports as crashing due to EXC_BREAKPOINT (SIGTRAP) instead of a EXC_BAD_ACCESS (SIGSEGV) on address 0xbbadbeef.

For debug and non-DARWIN builds, WTFCrash() behavior currently remains unchanged.
Comment 1 Mark Lam 2016-02-08 11:36:53 PST
Created attachment 270871 [details]
proposed patch.
Comment 2 Mark Lam 2016-02-08 11:37:53 PST
Created attachment 270872 [details]
x86_64 benchmark result.
Comment 3 Mark Lam 2016-02-08 11:40:57 PST
A benchmark run (on a 13" Retina MacBook Pro) shows that perf is effectively neutral with this change.

A size comparison on the jsc framework binary shows that this patch is a progression: 

Without patch:
$ size old/WebKitBuild/Release/JavaScriptCore.framework/JavaScriptCore 
__TEXT	__DATA	__OBJC	others	dec	hex
10776576	225280	0	4739072	15740928	f03000	

With patch:
$ size new/WebKitBuild/Release/JavaScriptCore.framework/JavaScriptCore 
__TEXT	__DATA	__OBJC	others	dec	hex
10752000	225280	0	4730880	15708160	efb000
Comment 4 mitz 2016-02-08 12:01:44 PST
A nice feature of the current implementation of WTFCrash() is that if the debugger is attached to the program when it calls WTFCrash(), it’s possible  to continue execution by doing something like “thread return” followed by “continue”. Is there a similar way to continue execution after hitting an int3 instruction?
Comment 5 Alex Christensen 2016-02-08 12:12:45 PST
Right now it's easy to look in a crash report for 0xbbadbeef and know with almost certainty that that crash was from WTFCrash.  If I look at a crash report after this change, will there be any easy way to distinguish it from a non-WTFCrash-crash?
Comment 6 Mark Lam 2016-02-08 12:17:06 PST
(In reply to comment #4)
> A nice feature of the current implementation of WTFCrash() is that if the
> debugger is attached to the program when it calls WTFCrash(), it’s possible 
> to continue execution by doing something like “thread return” followed by
> “continue”. Is there a similar way to continue execution after hitting an
> int3 instruction?

That's interesting.  I don't believe there is (at least not without manually rigging some program state).  We can step over the int3, but there's nothing to return from.  The generated code implements the call to WTFCrash as a jump instead as a call because it is a NO_RETURN function.

May I ask why this continuation feature is needed / useful (considering we just crashed due to a RELEASE_ASSERT failure)?

(In reply to comment #5)
> Right now it's easy to look in a crash report for 0xbbadbeef and know with
> almost certainty that that crash was from WTFCrash.  If I look at a crash
> report after this change, will there be any easy way to distinguish it from
> a non-WTFCrash-crash?

Yes, look for EXC_BREAKPOINT (SIGTRAP) instead of a EXC_BAD_ACCESS (SIGSEGV) on address 0xbbadbeef.  You shouldn't be getting SIGTRAP crashes from any other sources.
Comment 7 mitz 2016-02-08 12:55:26 PST
(In reply to comment #6)
> (In reply to comment #4)
> > A nice feature of the current implementation of WTFCrash() is that if the
> > debugger is attached to the program when it calls WTFCrash(), it’s possible 
> > to continue execution by doing something like “thread return” followed by
> > “continue”. Is there a similar way to continue execution after hitting an
> > int3 instruction?
> 
> That's interesting.  I don't believe there is (at least not without manually
> rigging some program state).  We can step over the int3, but there's nothing
> to return from.  The generated code implements the call to WTFCrash as a
> jump instead as a call because it is a NO_RETURN function.
> 
> May I ask why this continuation feature is needed / useful (considering we
> just crashed due to a RELEASE_ASSERT failure)?

I’ve found it useful in debugging when either my changes or the bug are causing an assertion to fail, but I would still like to see what the code would have done if the ASSERT wasn’t there. I could recompile the code without the assertion(s), but doing this in the debugger is sometimes more convenient.
Comment 8 Michael Catanzaro 2016-02-08 13:20:31 PST
Comment on attachment 270871 [details]
proposed patch.

View in context: https://bugs.webkit.org/attachment.cgi?id=270871&action=review

> Source/WTF/wtf/Assertions.h:172
> +    asm("int3");

I think this should work on Linux as well, so it probably doesn't need to be Darwin-specific?
Comment 9 Mark Lam 2016-02-08 14:11:45 PST
(In reply to comment #8)
> Comment on attachment 270871 [details]
> proposed patch.
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=270871&action=review
> 
> > Source/WTF/wtf/Assertions.h:172
> > +    asm("int3");
> 
> I think this should work on Linux as well, so it probably doesn't need to be
> Darwin-specific?

It should.  But this patch changes the behavior of WTFCrash() such that it no longer invokes the globalHook nor calls WTFReportBacktrace().  My understanding is that those were added by linux folks.  Is it OK to by pass those for linux as well?
Comment 10 Mark Lam 2016-02-08 18:22:58 PST
(In reply to comment #7)
> (In reply to comment #6)
> > (In reply to comment #4)
> > > A nice feature of the current implementation of WTFCrash() is that if the
> > > debugger is attached to the program when it calls WTFCrash(), it’s possible 
> > > to continue execution by doing something like “thread return” followed by
> > > “continue”. Is there a similar way to continue execution after hitting an
> > > int3 instruction?
> > 
> > That's interesting.  I don't believe there is (at least not without manually
> > rigging some program state).  We can step over the int3, but there's nothing
> > to return from.  The generated code implements the call to WTFCrash as a
> > jump instead as a call because it is a NO_RETURN function.
> > 
> > May I ask why this continuation feature is needed / useful (considering we
> > just crashed due to a RELEASE_ASSERT failure)?
> 
> I’ve found it useful in debugging when either my changes or the bug are
> causing an assertion to fail, but I would still like to see what the code
> would have done if the ASSERT wasn’t there. I could recompile the code
> without the assertion(s), but doing this in the debugger is sometimes more
> convenient.

I just tried doing "thread return" after a WTFCrash() on a release build without my patch.  With that, I see that the debugger assumes that the instruction after the call to WTFCrash() is the return address.  However, this turns out to not be true.  Because WTFCrash() has the attribute NO_RETURN_DUE_TO_CRASH, you were never actually able to just continue execution.  The compiler emits the call to WTFCrash() out of line, and does not expect it to return to the mainline code.  Hence, it is not followed by any code that transitions execution back to mainline.  You may have been lucked out, and the program continue to executed, but that is just a fluke and it may not be executing what you expected.  Note: this is on a non-debug build (which is where I'm applying this change).

Based on this, I think there's no loss of functionality with this patch after all.
Comment 11 mitz 2016-02-08 18:42:08 PST
Makes sense. My experience is largely with Debug builds.
Comment 12 Geoffrey Garen 2016-02-09 11:09:53 PST
> I’ve found it useful in debugging when either my changes or the bug are
> causing an assertion to fail, but I would still like to see what the code
> would have done if the ASSERT wasn’t there. I could recompile the code
> without the assertion(s), but doing this in the debugger is sometimes more
> convenient.

A point that I think has been lost here: The proposed int3 instruction is a breakpoint instruction, which the debugger should be able to catch and continue from directly, just like any other breakpoint.
Comment 13 Mark Lam 2016-02-09 11:16:18 PST
(In reply to comment #12)
> > I’ve found it useful in debugging when either my changes or the bug are
> > causing an assertion to fail, but I would still like to see what the code
> > would have done if the ASSERT wasn’t there. I could recompile the code
> > without the assertion(s), but doing this in the debugger is sometimes more
> > convenient.
> 
> A point that I think has been lost here: The proposed int3 instruction is a
> breakpoint instruction, which the debugger should be able to catch and
> continue from directly, just like any other breakpoint.

You can continue, but there's nothing to make you continue in the caller of WTFCrash (which is what I think we hoped the continue would do).  Since WTFCrash is a NO_RETURN function, the C++ compiler implements it as a jump to out of line code where it calls WTFCrash (or inline the int3 in this case).  It is expected to never continue thereafter.  Hence, the instructions that follow after it may have nothing to do with the call site of WTFCrash.
Comment 14 Geoffrey Garen 2016-02-09 11:18:09 PST
Comment on attachment 270871 [details]
proposed patch.

View in context: https://bugs.webkit.org/attachment.cgi?id=270871&action=review

> Source/WTF/wtf/Assertions.h:176
> +    asm("brk #0");

I think you need to mark your asm blocks volatile. Otherwise, the compiler might "optimize away" some assertions that they test for seemingly impossible or undefined behavior.
Comment 15 Geoffrey Garen 2016-02-09 11:32:57 PST
Comment on attachment 270871 [details]
proposed patch.

View in context: https://bugs.webkit.org/attachment.cgi?id=270871&action=review

> Source/WTF/wtf/Assertions.h:165
> +#if defined(NDEBUG) && OS(DARWIN)

Honestly, I think this behavior is better in debug builds too. The tradeoff is: You lose the printf backtrace, but you gain stopping in the debugger exactly at the point of the crash.
Comment 16 Mark Lam 2016-02-10 09:45:03 PST
(In reply to comment #15)
> Comment on attachment 270871 [details]
> proposed patch.
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=270871&action=review
> 
> > Source/WTF/wtf/Assertions.h:165
> > +#if defined(NDEBUG) && OS(DARWIN)
> 
> Honestly, I think this behavior is better in debug builds too. The tradeoff
> is: You lose the printf backtrace, but you gain stopping in the debugger
> exactly at the point of the crash.

With the pre-existing WTFCrash():
1. Without the debugger attached, the crash trace will be conveniently printed for us on the console.
2. With the debugger attached,  we can still easily inspect the crashing frame by going 1 frame up.

I feel that we didn't lose much convenience in terms of getting to the point of crash, but we certainly would lose more convenience in getting the crash trace when the debugger is not attached (will have to consult crash logs to now find the trace).

In both cases, there's no loss of functionality, just some tradeoff in convenience.  Does anyone else have an opinion on this?
Comment 17 Mark Lam 2016-02-10 10:03:10 PST
Created attachment 270999 [details]
proposed patch 2: added volatile attribute to asm statements.

Applied Geoff's feedback on the need for the volatile attribute.
Comment 18 Geoffrey Garen 2016-02-10 13:57:40 PST
Comment on attachment 270999 [details]
proposed patch 2: added volatile attribute to asm statements.

r=me
Comment 19 Geoffrey Garen 2016-02-10 13:58:01 PST
You should email webkit-dev with instructions for identifying an ASSERT crash before you land this.
Comment 20 Mark Lam 2016-02-10 14:59:00 PST
Comment on attachment 270999 [details]
proposed patch 2: added volatile attribute to asm statements.

View in context: https://bugs.webkit.org/attachment.cgi?id=270999&action=review

> Source/WTF/wtf/Assertions.h:176
> +#if CPU(X86_64) || CPU(X86)
> +    asm("int3");
> +#elif CPU(ARM_THUMB2)
> +    asm("bkpt #0");
> +#elif CPU(ARM64)
> +    asm("brk #0");

Eeek.  I uploaded the wrong patch here without the "volatile" changes.  Will land the proper one with the change included.
Comment 21 Mark Lam 2016-02-10 15:00:29 PST
Created attachment 271034 [details]
patch for landing.
Comment 22 Mark Lam 2016-02-10 15:03:23 PST
Landed in r196397: <http://trac.webkit.org/r196397>.