Bug 128766 - Web Inspector: CRASH when evaluating in console of JSContext RWI with disabled breakpoints
Summary: Web Inspector: CRASH when evaluating in console of JSContext RWI with disable...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Web Inspector (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Mark Lam
URL:
Keywords: InRadar
Depends on: 129265
Blocks:
  Show dependency treegraph
 
Reported: 2014-02-13 14:24 PST by Joseph Pecoraro
Modified: 2014-02-25 18:08 PST (History)
14 users (show)

See Also:


Attachments
the patch. (13.70 KB, patch)
2014-02-22 18:52 PST, Mark Lam
ggaren: review-
Details | Formatted Diff | Diff
patch 2 (11.42 KB, patch)
2014-02-25 00:31 PST, Mark Lam
no flags Details | Formatted Diff | Diff
patch 3: require ordered re-entry into the VM (4.15 KB, patch)
2014-02-25 17:54 PST, Mark Lam
ggaren: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Joseph Pecoraro 2014-02-13 14:24:34 PST
* STEPS TO REPRODUCE:
1. Build and Run JSPong
2. Open JSContext RWI
3. Add breakpoint to nextMove
4. Disable all breakpoints
5. Ensure the specific breakpoint is "enabled" (click if needed)
6. Evaluate "1+1" in console
  => CRASH

* CRASH:
ASSERTION FAILED: vm().entryScope
/Volumes/Data/Code/safari/OpenSource/Source/JavaScriptCore/interpreter/CallFrame.cpp(134) : JSC::JSGlobalObject *JSC::ExecState::vmEntryGlobalObject()
1   0x100719120 WTFCrash
2   0x1000943ee JSC::ExecState::vmEntryGlobalObject()
3   0x1003983cc JSC::Interpreter::debug(JSC::ExecState*, JSC::DebugHookID)
4   0x1003c21d4 operationDebug
5   0x2c3082002a97
6   0x10050c854 callToJavaScript
7   0x1003b223d JSC::JITCode::execute(JSC::VM*, JSC::ProtoCallFrame*)
8   0x1003976c9 JSC::Interpreter::executeCall(JSC::ExecState*, JSC::JSObject*, JSC::CallType, JSC::CallData const&, JSC::JSValue, JSC::ArgList const&)
9   0x100093f0e JSC::call(JSC::ExecState*, JSC::JSValue, JSC::CallType, JSC::CallData const&, JSC::JSValue, JSC::ArgList const&)
10  0x100469fcc JSObjectCallAsFunction
11  0x100004f05 -[PongAI nextMove]
12  0x100003cb6 -[PongController update]
13  0x7fff915700f4 __NSFireTimer
14  0x7fff8ead2564 __CFRUNLOOP_IS_CALLING_OUT_TO_A_TIMER_CALLBACK_FUNCTION__
15  0x7fff8ead209f __CFRunLoopDoTimer
16  0x7fff8eb435aa __CFRunLoopDoTimers
17  0x7fff8ea8d8e5 __CFRunLoopRun
18  0x7fff8ea8d0b5 CFRunLoopRunSpecific
19  0x7fff8dae5a0d RunCurrentEventLoopInMode
20  0x7fff8dae57b7 ReceiveNextEventCommon
21  0x7fff8dae55bc _BlockUntilNextEventMatchingListInModeWithFilter
22  0x7fff94dfb3de _DPSNextEvent
23  0x7fff94dfaa2b -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:]
24  0x7fff94deeb2c -[NSApplication run]
25  0x7fff94dd9913 NSApplicationMain
26  0x100001522 main
27  0x7fff8c9f85fd start
28  0x3
Comment 1 Radar WebKit Bug Importer 2014-02-13 14:25:38 PST
<rdar://problem/16063681>
Comment 2 Joseph Pecoraro 2014-02-13 14:36:09 PST
Hmm, here is another, slightly different crash. Just evaluating JS in the console of a new RWI. This time breakpoints were enabled. So I really don't know what is going on here.

(lldb) bt
* thread #2: tid = 0x3ac3a, 0x0000000100048fcc JavaScriptCore`WTF::RefPtr<JSC::JITCode>::get(this=0x00000000000000da) const + 12 at RefPtr.h:57, queue = 'com.apple.JavaScriptCore.remote-inspector-xpc-connection', stop reason = EXC_BAD_ACCESS (code=1, address=0xda)
  * frame #0: 0x0000000100048fcc JavaScriptCore`WTF::RefPtr<JSC::JITCode>::get(this=0x00000000000000da) const + 12 at RefPtr.h:57
    frame #1: 0x00000001000442cc JavaScriptCore`JSC::CodeBlock::jitType(this=0x000000000000000a) const + 28 at CodeBlock.h:271
    frame #2: 0x0000000100094d35 JavaScriptCore`JSC::CodeBlock::hasCodeOrigins(this=0x000000000000000a) + 21 at CodeBlock.h:555
    frame #3: 0x0000000100675ee3 JavaScriptCore`JSC::StackVisitor::readFrame(this=0x000000010197e018, callFrame=0x000000010197fa30) + 179 at StackVisitor.cpp:79
    frame #4: 0x0000000100676034 JavaScriptCore`JSC::StackVisitor::gotoNextFrame(this=0x000000010197e018) + 116 at StackVisitor.cpp:54
    frame #5: 0x000000010039ad58 JavaScriptCore`void JSC::StackVisitor::visit<JSC::GetStackTraceFunctor>(startFrame=0x000000010197e470, functor=0x000000010197e0a0) + 104 at StackVisitor.h:126
    frame #6: 0x000000010039957d JavaScriptCore`void JSC::ExecState::iterate<JSC::GetStackTraceFunctor>(this=0x000000010197e470, functor=0x000000010197e0a0) + 29 at CallFrame.h:313
    frame #7: 0x000000010039590a JavaScriptCore`JSC::Interpreter::getStackTrace(this=0x00006000000d2910, results=0x000000010197e268, maxStackSize=18446744073709551615) + 170 at Interpreter.cpp:571
    frame #8: 0x00000001006b68e2 JavaScriptCore`JSC::VM::throwException(this=0x0000000102820200, exec=0x000000010197e470, error=JSValue at 0x000000010197e280) + 306 at VM.cpp:664
    frame #9: 0x00000001006b7458 JavaScriptCore`JSC::VM::throwException(this=0x0000000102820200, exec=0x000000010197e470, error=0x0000000101dcfd70) + 72 at VM.cpp:713
    frame #10: 0x0000000100509641 JavaScriptCore`llint_slow_path_get_from_scope(exec=0x000000010197e470, pc=0x0000000101b05240) + 385 at LLIntSlowPaths.cpp:1379
Comment 3 Mark Lam 2014-02-22 12:22:34 PST
This issue is because we now allow more than 1 JS threads to drop the VM lock.  Consider the following scenario:

1. Thread T1 locks the VM and enters the VM to execute some JS code.
2. On entry, T1 detects that VM::m_entryScope is null i.e. this is the first time it entered the VM.
    T1 sets VM::m_entryScope to T1's entryScope.
3. T1 drops all locks.

4. Thread T2 locks the VM and enters the VM to execute some JS code.
    On entry, T2 sees that VM::m_entryScope is NOT null, and therefore does not set the entryScope.
5. T2 drops all locks.

6. T1 re-grabs locks.
7. T1 returns all the way out of JS code.  On exit from the outer most JS function, T1 clears VM::m_entryScope (because T1 was the one who set it).
8. T1 unlocks the VM.

9. T2 re-grabs locks.
10. T2 proceeds to execute some code and expects VM::m_entryScope to be NOT null, but it turns out to be null.  Assertion failures and crashes ensue.

I think the proper fix for each thread to set its own VMEntryScope and hence, JSGlobalObject at VM entry.  Since we're not requiring different threads to be synchronized on the order in which locks are re-grabbed after being dropped, the act of switching JS execution from T1 to T2 can be thought of conceptually as a thread context switch (T1 and T2's stacks are not synchronized), whereas previously, it was like an RPC call (T1 and T2's stacks are synchronized at the point of the VM lock changing hands).  As such, both T1 and T2 should have their own VMEntryScope just like they would have their own per-thread context.

What this means, in terms of implementation of the fix, is:

1. JSLock::dropAllLocks() saves VM::m_entryScope in thread local data, and sets VM::m_entryScope to null.
2. JSLock::grabAllLocks() sets VM::m_entryScope to the saved entryScope in thread local data.

After we dropped the locks, if a second thread enters the VM for the first time, it will see a null VM::m_entryScope and install its own as expected.  If the second thread unlocks the VM (either because it has exited out of the VM altogether or it dropped the locks), the original thread can re-grab the lock and re-install its own entryScope and continue executing as expected.
Comment 4 Mark Lam 2014-02-22 18:46:31 PST
Correction: now that the VMEntryScope is thread-specific, it doesn't make sense to store it in the VM.  Trying to store it in the VM results in more complexity to keep the value sync'ed with the one that belongs to the thread.  Note: many of our regression tests were failing when I had implemented the naive solution (described in my previous comment) without all the complexity to sync the VM's entryScope value with the thread specific one.

Instead of storing the entryScope in the VM, we can just store it in thread local data.  With that, all the regressions go away, and the extra complexity is not needed.

In this solution, I ended up introducing the VMEntryScope* as a void* to WTFThreadData.  This feels like a layer violation to me.  Also, every time we need to add more VM thread local data, we'll end up modifying WTFThreadData.h.  For now, I'll implement the fix as is by modifying WTFThreadData.  In a subsequent patch, I'll look into making WTFThreadData extensible in some way, and make use of that to extend it to hold additional JSC VM specific thread data.
Comment 5 Mark Lam 2014-02-22 18:52:36 PST
Created attachment 224988 [details]
the patch.
Comment 6 Geoffrey Garen 2014-02-23 09:28:14 PST
Comment on attachment 224988 [details]
the patch.

View in context: https://bugs.webkit.org/attachment.cgi?id=224988&action=review

> Source/JavaScriptCore/ChangeLog:8
> +        Now that we allow more than one thread to drop VM locks, it is no longer

We always allowed this, so you need to update the premise in your description.

> Source/JavaScriptCore/runtime/VM.cpp:894
> +VMEntryScope* VM::entryScope() const
> +{
> +    WTFThreadData& threadData = wtfThreadData();
> +    return reinterpret_cast<VMEntryScope*>(threadData.vmEntryScope());
> +}

It doesn't make sense to turn entryScope into a per-thread concept, for two reasons:

(1) A thread can use more than one VM, and the VMs will stomp each others' per-thread entry scope values.

(2) A null entryScope is used as a promise that nobody is using the VM right now, which allows us to do things like destructively discard code. That signal is only valid because entryScope is a global per-VM concept.
Comment 7 Mark Lam 2014-02-25 00:31:17 PST
Created attachment 225126 [details]
patch 2

While testing patch 2 after rebasing to ToT, I saw some new failures.  I'll have to do some additional testing to make sure that these failures are not due to this patch.
Comment 8 Geoffrey Garen 2014-02-25 09:30:10 PST
Comment on attachment 225126 [details]
patch 2

Previously, the value of vm->entryScope()->globalObject() was the global object that initiated execution. In this patch, the value of vm->entryScope()->globalObject() is random based on thread scheduling.

Was that change intentional? If so, why is it desirable?
Comment 9 Geoffrey Garen 2014-02-25 09:31:16 PST
Also, what is it about the web inspector that ends up running JavaScript on multiple threads for the same VM?
Comment 10 Joseph Pecoraro 2014-02-25 10:49:21 PST
(In reply to comment #9)
> Also, what is it about the web inspector that ends up running JavaScript on multiple threads for the same VM?

Web Inspector always creates a background dispatch queue to receive and dispatch messages from a remote inspector. So a (1) JSContext being used on any thread and (2) a dispatch queue for a RWI debugging it.

I wish I had attached a complete crash log, they are since gone on my system. Mark Lam may have a complete crash log. The backtraces of all threads would have shown the main thread as in Comment #1 and the RWI dispatch_queue somewhere under an APIEntryShim, but having released the lock trying an allocation while still with work to do in the VM.
Comment 11 Geoffrey Garen 2014-02-25 11:54:40 PST
> Web Inspector always creates a background dispatch queue to receive and dispatch messages from a remote inspector. So a (1) JSContext being used on any thread and (2) a dispatch queue for a RWI debugging it.

Does this mean that, when attached to a non-Safari WebKit app, the Web Inspector will end up using the WebCore VM on a secondary thread?
Comment 12 Joseph Pecoraro 2014-02-25 12:08:24 PST
(In reply to comment #11)
> > Web Inspector always creates a background dispatch queue to receive and dispatch messages from a remote inspector. So a (1) JSContext being used on any thread and (2) a dispatch queue for a RWI debugging it.
> 
> Does this mean that, when attached to a non-Safari WebKit app, the Web Inspector will end up using the WebCore VM on a secondary thread?

No. Sorry. For JSContext remote inspection it creates a background dispatch queue. For normal WebCore::Page inspection it deliberately uses the MainThread via the main queue on Mac, or WebThread on iOS. See RemoteInspectorDebuggableConnection::dispatchAsyncOnDebuggable.
Comment 13 Joseph Pecoraro 2014-02-25 12:09:17 PST
RemoteInspectorDebuggableConnection::RemoteInspectorDebuggableConnection constructor has a comment to this effect.
Comment 14 Mark Lam 2014-02-25 17:54:06 PST
Created attachment 225205 [details]
patch 3: require ordered re-entry into the VM

I talked with Geoff offline, and we decided to switch tactics.  Instead of continuing with allowing unordered re-entry into the VM, we're switching to requiring ordered re-entry (which is how we used to roll back in the day i.e. a few weeks ago).  This means making JSLock::grabAllLocks() work the same way it does for the C loop LLINT.  We currently cannot think of any clients (or potential clients) that would require and significantly benefit from ordered re-entry.  Hence, we're moving forward with the simplest solution of just making JSLock::grabAllLocks() ordered.
Comment 15 Geoffrey Garen 2014-02-25 17:55:48 PST
Comment on attachment 225205 [details]
patch 3: require ordered re-entry into the VM

r=me
Comment 16 Mark Lam 2014-02-25 18:08:06 PST
Thanks.  Landed jn r164687: <http://trac.webkit.org/r164687>.