* STEPS TO REPRODUCE:
1. Build and Run JSPong
2. Open JSContext RWI
3. Add breakpoint to nextMove
4. Disable all breakpoints
5. Ensure the specific breakpoint is "enabled" (click if needed)
6. Evaluate "1+1" in console
ASSERTION FAILED: vm().entryScope
1 0x100719120 WTFCrash
2 0x1000943ee JSC::ExecState::vmEntryGlobalObject()
3 0x1003983cc JSC::Interpreter::debug(JSC::ExecState*, JSC::DebugHookID)
4 0x1003c21d4 operationDebug
7 0x1003b223d JSC::JITCode::execute(JSC::VM*, JSC::ProtoCallFrame*)
8 0x1003976c9 JSC::Interpreter::executeCall(JSC::ExecState*, JSC::JSObject*, JSC::CallType, JSC::CallData const&, JSC::JSValue, JSC::ArgList const&)
9 0x100093f0e JSC::call(JSC::ExecState*, JSC::JSValue, JSC::CallType, JSC::CallData const&, JSC::JSValue, JSC::ArgList const&)
10 0x100469fcc JSObjectCallAsFunction
11 0x100004f05 -[PongAI nextMove]
12 0x100003cb6 -[PongController update]
13 0x7fff915700f4 __NSFireTimer
14 0x7fff8ead2564 __CFRUNLOOP_IS_CALLING_OUT_TO_A_TIMER_CALLBACK_FUNCTION__
15 0x7fff8ead209f __CFRunLoopDoTimer
16 0x7fff8eb435aa __CFRunLoopDoTimers
17 0x7fff8ea8d8e5 __CFRunLoopRun
18 0x7fff8ea8d0b5 CFRunLoopRunSpecific
19 0x7fff8dae5a0d RunCurrentEventLoopInMode
20 0x7fff8dae57b7 ReceiveNextEventCommon
21 0x7fff8dae55bc _BlockUntilNextEventMatchingListInModeWithFilter
22 0x7fff94dfb3de _DPSNextEvent
23 0x7fff94dfaa2b -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:]
24 0x7fff94deeb2c -[NSApplication run]
25 0x7fff94dd9913 NSApplicationMain
26 0x100001522 main
27 0x7fff8c9f85fd start
Hmm, here is another, slightly different crash. Just evaluating JS in the console of a new RWI. This time breakpoints were enabled. So I really don't know what is going on here.
This issue is because we now allow more than 1 JS threads to drop the VM lock. Consider the following scenario:
1. Thread T1 locks the VM and enters the VM to execute some JS code.
2. On entry, T1 detects that VM::m_entryScope is null i.e. this is the first time it entered the VM.
T1 sets VM::m_entryScope to T1's entryScope.
3. T1 drops all locks.
4. Thread T2 locks the VM and enters the VM to execute some JS code.
On entry, T2 sees that VM::m_entryScope is NOT null, and therefore does not set the entryScope.
5. T2 drops all locks.
6. T1 re-grabs locks.
7. T1 returns all the way out of JS code. On exit from the outer most JS function, T1 clears VM::m_entryScope (because T1 was the one who set it).
8. T1 unlocks the VM.
9. T2 re-grabs locks.
10. T2 proceeds to execute some code and expects VM::m_entryScope to be NOT null, but it turns out to be null. Assertion failures and crashes ensue.
I think the proper fix for each thread to set its own VMEntryScope and hence, JSGlobalObject at VM entry. Since we're not requiring different threads to be synchronized on the order in which locks are re-grabbed after being dropped, the act of switching JS execution from T1 to T2 can be thought of conceptually as a thread context switch (T1 and T2's stacks are not synchronized), whereas previously, it was like an RPC call (T1 and T2's stacks are synchronized at the point of the VM lock changing hands). As such, both T1 and T2 should have their own VMEntryScope just like they would have their own per-thread context.
What this means, in terms of implementation of the fix, is:
1. JSLock::dropAllLocks() saves VM::m_entryScope in thread local data, and sets VM::m_entryScope to null.
2. JSLock::grabAllLocks() sets VM::m_entryScope to the saved entryScope in thread local data.
After we dropped the locks, if a second thread enters the VM for the first time, it will see a null VM::m_entryScope and install its own as expected. If the second thread unlocks the VM (either because it has exited out of the VM altogether or it dropped the locks), the original thread can re-grab the lock and re-install its own entryScope and continue executing as expected.
Correction: now that the VMEntryScope is thread-specific, it doesn't make sense to store it in the VM. Trying to store it in the VM results in more complexity to keep the value sync'ed with the one that belongs to the thread. Note: many of our regression tests were failing when I had implemented the naive solution (described in my previous comment) without all the complexity to sync the VM's entryScope value with the thread specific one.
Instead of storing the entryScope in the VM, we can just store it in thread local data. With that, all the regressions go away, and the extra complexity is not needed.
In this solution, I ended up introducing the VMEntryScope* as a void* to WTFThreadData. This feels like a layer violation to me. Also, every time we need to add more VM thread local data, we'll end up modifying WTFThreadData.h. For now, I'll implement the fix as is by modifying WTFThreadData. In a subsequent patch, I'll look into making WTFThreadData extensible in some way, and make use of that to extend it to hold additional JSC VM specific thread data.
Created attachment 224988 [details]
Comment on attachment 224988 [details]
View in context: https://bugs.webkit.org/attachment.cgi?id=224988&action=review
> + Now that we allow more than one thread to drop VM locks, it is no longer
We always allowed this, so you need to update the premise in your description.
> +VMEntryScope* VM::entryScope() const
> + WTFThreadData& threadData = wtfThreadData();
> + return reinterpret_cast<VMEntryScope*>(threadData.vmEntryScope());
It doesn't make sense to turn entryScope into a per-thread concept, for two reasons:
(1) A thread can use more than one VM, and the VMs will stomp each others' per-thread entry scope values.
(2) A null entryScope is used as a promise that nobody is using the VM right now, which allows us to do things like destructively discard code. That signal is only valid because entryScope is a global per-VM concept.
Created attachment 225126 [details]
While testing patch 2 after rebasing to ToT, I saw some new failures. I'll have to do some additional testing to make sure that these failures are not due to this patch.
Comment on attachment 225126 [details]
Previously, the value of vm->entryScope()->globalObject() was the global object that initiated execution. In this patch, the value of vm->entryScope()->globalObject() is random based on thread scheduling.
Was that change intentional? If so, why is it desirable?
(In reply to comment #9)
Web Inspector always creates a background dispatch queue to receive and dispatch messages from a remote inspector. So a (1) JSContext being used on any thread and (2) a dispatch queue for a RWI debugging it.
I wish I had attached a complete crash log, they are since gone on my system. Mark Lam may have a complete crash log. The backtraces of all threads would have shown the main thread as in Comment #1 and the RWI dispatch_queue somewhere under an APIEntryShim, but having released the lock trying an allocation while still with work to do in the VM.
> Web Inspector always creates a background dispatch queue to receive and dispatch messages from a remote inspector. So a (1) JSContext being used on any thread and (2) a dispatch queue for a RWI debugging it.
Does this mean that, when attached to a non-Safari WebKit app, the Web Inspector will end up using the WebCore VM on a secondary thread?
(In reply to comment #11)
> > Web Inspector always creates a background dispatch queue to receive and dispatch messages from a remote inspector. So a (1) JSContext being used on any thread and (2) a dispatch queue for a RWI debugging it.
> Does this mean that, when attached to a non-Safari WebKit app, the Web Inspector will end up using the WebCore VM on a secondary thread?
No. Sorry. For JSContext remote inspection it creates a background dispatch queue. For normal WebCore::Page inspection it deliberately uses the MainThread via the main queue on Mac, or WebThread on iOS. See RemoteInspectorDebuggableConnection::dispatchAsyncOnDebuggable.
RemoteInspectorDebuggableConnection::RemoteInspectorDebuggableConnection constructor has a comment to this effect.
Created attachment 225205 [details]
patch 3: require ordered re-entry into the VM
I talked with Geoff offline, and we decided to switch tactics. Instead of continuing with allowing unordered re-entry into the VM, we're switching to requiring ordered re-entry (which is how we used to roll back in the day i.e. a few weeks ago). This means making JSLock::grabAllLocks() work the same way it does for the C loop LLINT. We currently cannot think of any clients (or potential clients) that would require and significantly benefit from ordered re-entry. Hence, we're moving forward with the simplest solution of just making JSLock::grabAllLocks() ordered.
Comment on attachment 225205 [details]
patch 3: require ordered re-entry into the VM
Thanks. Landed jn r164687: <http://trac.webkit.org/r164687>.