Bug 130539

Summary: REGRESSION(r164205): WebKit crash @StructureIDTable::get
Product: WebKit Reporter: Antonio Gomes <tonikitoo>
Component: JavaScriptCoreAssignee: Mark Lam <mark.lam>
Status: RESOLVED FIXED    
Severity: Normal CC: anthony.liot, fpizlo, lquinn, mark.lam, mhahnenberg, oliver, webkit-bug-importer
Priority: P1 Keywords: InRadar
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on:    
Bug Blocks: 128883    
Attachments:
Description Flags
test case - load issuetest.html
none
the patch.
fpizlo: review-
patch 2: applied feedback. ggaren: review+

Description Antonio Gomes 2014-03-20 14:48:34 PDT
Created attachment 227338 [details]
test case - load issuetest.html

Load attached test case (by Anthony Liot!)

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x000000027f8301e0
0x0000000103158f53 in JSC::StructureIDTable::get (this=0x112360848, structureID=796288356) at StructureIDTable.h:86
86 return table()[structureID].structure;
(gdb) bt
#0 0x0000000103158f53 in JSC::StructureIDTable::get (this=0x112360848, structureID=796288356) at StructureIDTable.h:86
#1 0x000000010315fd76 in JSC::JSCell::structure (this=0x112360fbd, vm=@0x112360570) at JSCellInlines.h:104
#2 0x0000000103164581 in JSC::JSCell::classInfo (this=0x112360fbd) at JSDestructibleObject.h:37
#3 0x000000010315b8e9 in JSC::JSCell::inherits (this=0x112360fbd, info=0x103cf7c88) at JSCellInlines.h:211
#4 0x00000001037f7465 in JSC::jsDynamicCast<JSC::JSString*, JSC::JSCell> (from=0x112360fbd) at JSCell.h:244
#5 0x00000001037f6f95 in JSC::speculationFromCell (cell=0x112360fbd) at /Users/a1.gomes/Devel/Samsung/webcl-webkit/Source/JavaScriptCore/bytecode/SpeculatedType.cpp:332
#6 0x00000001037f7113 in JSC::speculationFromValue (value={static numberOfInt52Bits = <optimized out>, static int52ShiftAmount = <optimized out>, u = {asInt64 = 4600500157, ptr = 0x112360fbd, asBits = {payload = 305532861, tag = 1}}}) at /Users/a1.gomes/Devel/Samsung/webcl-webkit/Source/JavaScriptCore/bytecode/SpeculatedType.cpp:357
#7 0x00000001033c435e in JSC::DFG::AbstractValue::validate (this=0x10f0dd780, value={static numberOfInt52Bits = <optimized out>, static int52ShiftAmount = <optimized out>, u = {asInt64 = 4600500157, ptr = 0x112360fbd, asBits = {payload = 305532861, tag = 1}}}) at DFGAbstractValue.h:212
#8 0x00000001033c398a in JSC::DFG::prepareOSREntry (exec=0x7fff5fbfcdc0, codeBlock=0x1123e18f0, bytecodeIndex=189) at /Users/a1.gomes/Devel/Samsung/webcl-webkit/Source/JavaScriptCore/dfg/DFGOSREntry.cpp:172
#9 0x000000010353aad0 in operationOptimize (exec=0x7fff5fbfcdc0, bytecodeIndex=189) at /Users/a1.gomes/Devel/Samsung/webcl-webkit/Source/JavaScriptCore/jit/JITOperations.cpp:1216
#10 0x0000339f358e30d7 in ?? ()
#11 0x0000339f3583c8bc in ?? ()
#12 0x0000339f3583b96b in ?? ()
#13 0x0000000103694a86 in llint_op_call ()
#14 0x0000000103694a86 in llint_op_call ()
#15 0x0000000103694a86 in llint_op_call ()
#16 0x0000000103694a86 in llint_op_call ()
#17 0x0000000103694a86 in llint_op_call ()
#18 0x0000000103694a86 in llint_op_call ()
#19 0x000000010368eb44 in callToJavaScript ()
#20 0x000000010352a46d in JSC::JITCode::execute (this=0x112397520, vm=0x10d80be00, protoCallFrame=0x7fff5fbfda28) at /Users/a1.gomes/Devel/Samsung/webcl-webkit/Source/JavaScriptCore/jit/JITCode.cpp:47
#21 0x000000010350ef19 in JSC::Interpreter::executeCall (this=0x11000afc0, callFrame=0x10cf9f4b0, function=0x126bf9f30, callType=JSC::CallTypeJS, callData=@0x7fff5fbfdd08, thisValue={static numberOfInt52Bits = <optimized out>, static int52ShiftAmount = <optimized out>, u = {asInt64 = 4303355824, ptr = 0x1007fffb0, asBits = {payload = 8388528, tag = 1}}}, args=@0x7fff5fbfdc48) at /Users/a1.gomes/Devel/Samsung/webcl-webkit/Source/JavaScriptCore/interpreter/Interpreter.cpp:994
#22 0x00000001031e315e in JSC::call (exec=0x10cf9f4b0, functionObject={static numberOfInt52Bits = <optimized out>, static int52ShiftAmount = <optimized out>, u = {asInt64 = 4945059632, ptr = 0x126bf9f30, asBits = {payload = 650092336, tag = 1}}}, callType=JSC::CallTypeJS, callData=@0x7fff5fbfdd08, thisValue={static numberOfInt52Bits = <optimized out>, static int52ShiftAmount = <optimized out>, u = {asInt64 = 4303355824, ptr = 0x1007fffb0, asBits = {payload = 8388528, tag = 1}}}, args=@0x7fff5fbfdc48) at /Users/a1.gomes/Devel/Samsung/webcl-webkit/Source/JavaScriptCore/runtime/CallData.cpp:39
#23 0x00000001058ed79b in WebCore::JSMainThreadExecState::call (exec=0x10cf9f4b0, functionObject={static numberOfInt52Bits = <optimized out>, static int52ShiftAmount = <optimized out>, u = {asInt64 = 4945059632, ptr = 0x126bf9f30, asBits = {payload = 650092336, tag = 1}}}, callType=JSC::CallTypeJS, callData=@0x7fff5fbfdd08, thisValue={static numberOfInt52Bits = <optimized out>, static int52ShiftAmount = <optimized out>, u = {asInt64 = 4303355824, ptr = 0x1007fffb0, asBits = {payload = 8388528, tag = 1}}}, args=@0x7fff5fbfdc48) at JSMainThreadExecState.h:55
#24 0x000000010656dd51 in WebCore::ScheduledAction::executeFunctionInContext (this=0x119a3fe40, globalObject=0x10cf9f470, thisValue={static numberOfInt52Bits = <optimized out>, static int52ShiftAmount = <optimized out>, u = {asInt64 = 4303355824, ptr = 0x1007fffb0, asBits = {payload = 8388528, tag = 1}}}, context=0x10d01c0a0) at /Users/a1.gomes/Devel/Samsung/webcl-webkit/Source/WebCore/bindings/js/ScheduledAction.cpp:103
#25 0x000000010656d942 in WebCore::ScheduledAction::execute (this=0x119a3fe40, document=0x10d01c000) at /Users/a1.gomes/Devel/Samsung/webcl-webkit/Source/WebCore/bindings/js/ScheduledAction.cpp:124
#26 0x000000010656d7d4 in WebCore::ScheduledAction::execute (this=0x119a3fe40, context=0x10d01c0a0) at /Users/a1.gomes/Devel/Samsung/webcl-webkit/Source/WebCore/bindings/js/ScheduledAction.cpp:78
#27 0x0000000105169228 in WebCore::DOMTimer::fired (this=0x119a45180) at /Users/a1.gomes/Devel/Samsung/webcl-webkit/Source/WebCore/page/DOMTimer.cpp:182
#28 0x000000010692a29c in WebCore::ThreadTimers::sharedTimerFiredInternal (this=0x10e829880) at /Users/a1.gomes/Devel/Samsung/webcl-webkit/Source/WebCore/platform/ThreadTimers.cpp:132
#29 0x0000000106929f59 in WebCore::ThreadTimers::sharedTimerFired () at /Users/a1.gomes/Devel/Samsung/webcl-webkit/Source/WebCore/platform/ThreadTimers.cpp:107
#30 0x000000010666c2b3 in WebCore::timerFired () at /Users/a1.gomes/Devel/Samsung/webcl-webkit/Source/WebCore/platform/mac/SharedTimerMac.mm:133
#31 0x00007fff92500804 in __CFRUNLOOP_IS_CALLING_OUT_TO_A_TIMER_CALLBACK_FUNCTION__ ()
#32 0x00007fff9250031d in __CFRunLoopDoTimer ()
#33 0x00007fff924e5ad9 in __CFRunLoopRun ()
#34 0x00007fff924e50e2 in CFRunLoopRunSpecific ()
#35 0x00007fff92ce2eb4 in RunCurrentEventLoopInMode ()
#36 0x00007fff92ce2c52 in ReceiveNextEventCommon ()
#37 0x00007fff92ce2ae3 in BlockUntilNextEventMatchingListInMode ()
#38 0x00007fff93678533 in _DPSNextEvent ()
#39 0x00007fff93677df2 in -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] ()
#40 0x00007fff9366f1a3 in -[NSApplication run] ()
#41 0x000000010144ba7f in WebKit::WebContentProcessMainDelegate::startRunLoop (this=0x7fff5fbff5c0) at /Users/a1.gomes/Devel/Samsung/webcl-webkit/Source/WebKit2/WebProcess/EntryPoint/mac/LegacyProcess/WebContentProcessMain.mm:183
#42 0x000000010144aaaf in WebKit::ChildProcessMain<WebKit::WebProcess, WebKit::WebContentProcessMainDelegate> (argc=6, argv=0x7fff5fbff6e8) at ChildProcessEntryPoint.h:93
#43 0x000000010144a7fb in WebContentProcessMain (argc=6, argv=0x7fff5fbff6e8) at /Users/a1.gomes/Devel/Samsung/webcl-webkit/Source/WebKit2/WebProcess/EntryPoint/mac/LegacyProcess/WebContentProcessMain.mm:198
#44 0x0000000100000cc1 in WebKit::BootstrapMain (argc=6, argv=0x7fff5fbff6e8) at /Users/a1.gomes/Devel/Samsung/webcl-webkit/Source/WebKit2/Shared/EntryPointUtilities/mac/LegacyProcess/ChildProcessMain.mm:81
#45 0x0000000100000ae2 in main (argc=6, argv=0x7fff5fbff6e8) at /Users/a1.gomes/Devel/Samsung/webcl-webkit/Source

For completeness, it works fine on stock Safari (OSX 10.9.x and 10.8.x), Firefox, Chrome.
Comment 1 Radar WebKit Bug Importer 2014-03-20 15:02:15 PDT
<rdar://problem/16384199>
Comment 2 Antonio Gomes 2014-03-20 22:43:51 PDT
A quick bisect has shown this revision as culprit: r164207.

commit 51614ccfbb0d9c4e68e4de129620093619d18b81
Author: fpizlo@apple.com <fpizlo@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Date:   Mon Feb 17 06:35:32 2014 +0000

    FTL should inline polymorphic heap accesses
    https://bugs.webkit.org/show_bug.cgi?id=128795

Trying to confirm it now.
Comment 3 Antonio Gomes 2014-03-21 09:47:25 PDT
(In reply to comment #2)
Ok, comment #2 is wrong. I properly bisected and found the offending commit:

commit 6fd929fb654f1c0b85bdb051ece45c1a5bb2ab9c
Author: fpizlo@apple.com <fpizlo@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Date:   Mon Feb 17 06:25:05 2014 +0000

    DFG::prepareOSREntry should be nice to the stack
    https://bugs.webkit.org/show_bug.cgi?id=128883
    
    Reviewed by Oliver Hunt.


Raising the priority as well.
Comment 4 Mark Lam 2014-04-18 08:27:17 PDT
Here's what I know so far:
1. This issue manifests with the following disabled, and hence is probably not due to these:
     a. FTL JIT
     b. Inlining
     c. Concurrent JIT

2. No GC activity was seen in the vicinity of the issue.

3. The issue is dependent on stack layout of local variables.
    a. Removing unused local vars can cause the issue to stop manifesting.
    b. Removing code that stores unused values to those vars do NOT stop the issue from manifesting.

4. The issue is intermittent, and is probably dependent on what value happen to be on the stack.
    - If the issue does not manifest, running a few more times often helps get it to manifest.
Comment 5 Mark Lam 2014-04-18 10:07:31 PDT
Here's some more data about the problem area I've been investigating:

1. There's an OSR exit followed by a OSR entry.  We crash in the OSR entry because we see a bogus JSCell value in a CallFrame local and try to access its classInfo or structure.

2. I confirmed that that bogus JSCell value came from the OSR exit which was writing out the value of a local that is "DisplacedInJSStack".
Comment 6 Mark Lam 2014-04-18 10:58:33 PDT
More data: before the OSR exit was a previous OSR enter that entered the DFG function.  I now have evidence that the local variable was not initialized on entry, and we encounter a speculation check failure that caused us to exit before that local can be initialized.  Whatever random value was on the stack at the time of OSR entry gets propagated forwards resulting in the intermittent failure I've been seeing.
Comment 7 Mark Lam 2014-04-18 17:47:28 PDT
Found the smoking gun:

In prepareOSREntry()'s step 3, we do the work of saving off the locals values from the old baseline frame there.  Later on in step 4, we'll use these values to populate the locals of the new DFG frame.  Unfortunately, the frameSize used in step 3 is too small.  As a result, some parts of the pivot buffer remains uninitialized, and step 4 may end up writing the random stuff in the pivot buffer into the DFG frame's locals.  This is the root cause of the issue.

Next question: what should frameSize actually be?  Grepping now ...
Comment 8 Mark Lam 2014-04-18 18:04:27 PDT
(In reply to comment #7)
> Next question: what should frameSize actually be?  Grepping now ...

Found it.  entry->m_expectedValues.numberOfLocals() should give us the number of locals in the baseline frame.  For justification that this is the right value to use, see step 1 in prepareOSREntry().  There, we're using entry->m_expectedValues.numberOfLocals() as the number of incoming locals (from the baseline frame) to validate our type inference expectations against.
Comment 9 Mark Lam 2014-04-18 18:39:42 PDT
Created attachment 229700 [details]
the patch.
Comment 10 Filip Pizlo 2014-04-18 18:54:36 PDT
Comment on attachment 229700 [details]
the patch.

View in context: https://bugs.webkit.org/attachment.cgi?id=229700&action=review

> Source/JavaScriptCore/dfg/DFGOSREntry.cpp:249
> +    for (unsigned i = baselineFrameSize; i--;) {

This is wrong - at this point we have shuffled to using the DFG's local numbering. So we want frameSize here.
Comment 11 Mark Lam 2014-04-18 19:53:58 PDT
Comment on attachment 229700 [details]
the patch.

View in context: https://bugs.webkit.org/attachment.cgi?id=229700&action=review

>> Source/JavaScriptCore/dfg/DFGOSREntry.cpp:249
>> +    for (unsigned i = baselineFrameSize; i--;) {
> 
> This is wrong - at this point we have shuffled to using the DFG's local numbering. So we want frameSize here.

Hmmm.  I was thinking of it as the clearing the parts of the baseline frame that the DFG frame isn’t using, and was also assuming that frameSize < baselineFrameSize always.  But, in retrospect, I realize that I don’t actually know if that is true.  If frameSize can be greater than baselineFrameSize, then using frameSize makes more sense.  But then, in the allocation of the scratch buffer above, we should be using max(frameSize, baselineFrameSize) instead.

Can you please clarify if it is possible for frameSize to be greater than baselineFrameSize?  If not, why is it more proper to use frameSize here?  Thanks.
Comment 12 Geoffrey Garen 2014-04-18 20:42:00 PDT
Comment on attachment 229700 [details]
the patch.

View in context: https://bugs.webkit.org/attachment.cgi?id=229700&action=review

>>> Source/JavaScriptCore/dfg/DFGOSREntry.cpp:249
>>> +    for (unsigned i = baselineFrameSize; i--;) {
>> 
>> This is wrong - at this point we have shuffled to using the DFG's local numbering. So we want frameSize here.
> 
> Hmmm.  I was thinking of it as the clearing the parts of the baseline frame that the DFG frame isn’t using, and was also assuming that frameSize < baselineFrameSize always.  But, in retrospect, I realize that I don’t actually know if that is true.  If frameSize can be greater than baselineFrameSize, then using frameSize makes more sense.  But then, in the allocation of the scratch buffer above, we should be using max(frameSize, baselineFrameSize) instead.
> 
> Can you please clarify if it is possible for frameSize to be greater than baselineFrameSize?  If not, why is it more proper to use frameSize here?  Thanks.

The point of the scratch buffer is that the DFG will copy out of it in order to initialize the locals in its stack frame. It's nonsense for the scratch buffer to contain null values beyond the DFG stack frame size because the DFG will never copy them.

I agree that we should allocate the scratch buffer  to "max(frameSize, baselineFrameSize)".
Comment 13 Filip Pizlo 2014-04-18 20:43:15 PDT
(In reply to comment #12)
> (From update of attachment 229700 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=229700&action=review
> 
> >>> Source/JavaScriptCore/dfg/DFGOSREntry.cpp:249
> >>> +    for (unsigned i = baselineFrameSize; i--;) {
> >> 
> >> This is wrong - at this point we have shuffled to using the DFG's local numbering. So we want frameSize here.
> > 
> > Hmmm.  I was thinking of it as the clearing the parts of the baseline frame that the DFG frame isn’t using, and was also assuming that frameSize < baselineFrameSize always.  But, in retrospect, I realize that I don’t actually know if that is true.  If frameSize can be greater than baselineFrameSize, then using frameSize makes more sense.  But then, in the allocation of the scratch buffer above, we should be using max(frameSize, baselineFrameSize) instead.
> > 
> > Can you please clarify if it is possible for frameSize to be greater than baselineFrameSize?  If not, why is it more proper to use frameSize here?  Thanks.
> 
> The point of the scratch buffer is that the DFG will copy out of it in order to initialize the locals in its stack frame. It's nonsense for the scratch buffer to contain null values beyond the DFG stack frame size because the DFG will never copy them.
> 
> I agree that we should allocate the scratch buffer  to "max(frameSize, baselineFrameSize)".

frameSize could be bigger than baselineFrameSize. Probably you want to use the max of the two.
Comment 14 Mark Lam 2014-04-18 22:51:48 PDT
(In reply to comment #12)
> The point of the scratch buffer is that the DFG will copy out of it in order to initialize the locals in its stack frame. It's nonsense for the scratch buffer to contain null values beyond the DFG stack frame size because the DFG will never copy them.

Yeah, once I realized that that code is clearing out the scratch buffer and not the stack frame, and also that the OSR entry thunk would operate using frameSize and not baselineFrameSize, I came to the same conclusion as well.  I’ll fix up the patch.
Comment 15 Mark Lam 2014-04-18 23:03:10 PDT
Created attachment 229725 [details]
patch 2: applied feedback.
Comment 16 Geoffrey Garen 2014-04-18 23:33:56 PDT
Comment on attachment 229725 [details]
patch 2: applied feedback.

r=me
Comment 17 Mark Lam 2014-04-18 23:55:50 PDT
Thanks.  Landed in r167532: <http://trac.webkit.org/r167532>.