* SUMMARY GuardMalloc crash on DFGWorkList thread in JSC::Heap::isCollecting for destroyed Web Worker. * STEPS TO REPRODUCE shell> run-webkit-tests --release --no-retry-failures --time-out-ms=30000 -g -1 --iterations=50 LayoutTests/inspector/heap/snapshot.html * NOTES - In Release it is very quick to crash (almost always the 2nd test run). - In Debug it is very rare to crash, I got it after ~20 runs. - The Web Inspector was recently made to use Web Workers here <http://trac.webkit.org/changeset/198353> * CRASH SNIPPET Crashed Thread: 15 FTL Worklist Worker Thread Exception Type: EXC_BAD_ACCESS (SIGSEGV) Exception Codes: KERN_INVALID_ADDRESS at 0x0000000358989b30 Exception Note: EXC_CORPSE_NOTIFY Application Specific Information: CRASHING TEST: inspector/heap/snapshot.html This process is running with libgmalloc.dylib (GuardMalloc) which may have forced the crash due to a memory access error. Thread 15 Crashed:: FTL Worklist Worker Thread 0 com.apple.JavaScriptCore 0x000000010bb7fc9e JSC::Heap::isCollecting() + 14 1 com.apple.JavaScriptCore 0x000000010c1ab0c5 JSC::DFG::Worklist::runThread(JSC::DFG::ThreadData*) + 757 2 com.apple.JavaScriptCore 0x000000010c1a9404 JSC::DFG::Worklist::threadFunction(void*) + 36 3 com.apple.JavaScriptCore 0x000000010ca53dc9 WTF::createThread(void (*)(void*), void*, char const*)::$_0::operator()() const + 25 4 com.apple.JavaScriptCore 0x000000010ca53d9d void std::__1::__invoke_void_return_wrapper<void>::__call<WTF::createThread(void (*)(void*), void*, char const*)::$_0&>(WTF::createThread(void (*)(void*), void*, char const*)::$_0&&&) + 45 5 com.apple.JavaScriptCore 0x000000010ca53d3c std::__1::__function::__func<WTF::createThread(void (*)(void*), void*, char const*)::$_0, std::__1::allocator<WTF::createThread(void (*)(void*), void*, char const*)::$_0>, void ()>::operator()() + 44 6 com.apple.JavaScriptCore 0x000000010c2eef5a std::__1::function<void ()>::operator()() const + 26 7 com.apple.JavaScriptCore 0x000000010ca5299e WTF::threadEntryPoint(void*) + 158 8 com.apple.JavaScriptCore 0x000000010ca54331 WTF::wtfThreadEntryPoint(void*) + 289 9 libsystem_pthread.dylib 0x00007fff966b799d _pthread_body + 131 10 libsystem_pthread.dylib 0x00007fff966b791a _pthread_start + 168 11 libsystem_pthread.dylib 0x00007fff966b5351 thread_start + 13
<rdar://problem/25251439>
Caught in lldb I was able to confirm that this is a Cancelled DFG::Plan holding a reference to now gone Web Worker VM that has since destructed. 1. VM::~VM for the Web Worker on thread #289: (lldb) bt thread #289: tid = 0x69ded, 0x0000000109170b10 JavaScriptCore`JSC::VM::~VM(this=0x0000001715b67a90) + 16 at VM.cpp:335, name = 'WebCore: Worker', stop reason = breakpoint 3.2 frame #0: JavaScriptCore`JSC::VM::~VM(this=0x0000001715b67a90) + 16 at VM.cpp:335 frame #1: JavaScriptCore`WTF::ThreadSafeRefCounted<JSC::VM>::deref frame #2: JavaScriptCore`void WTF::derefIfNotNull<JSC::VM> frame #3: JavaScriptCore`WTF::RefPtr<JSC::VM>::operator= frame #4: JavaScriptCore`JSC::JSLockHolder::~JSLockHolder frame #5: JavaScriptCore`JSC::JSLockHolder::~JSLockHolder frame #6: WebCore`WebCore::WorkerScriptController::~WorkerScriptController frame #7: WebCore`WebCore::WorkerScriptController::~WorkerScriptController frame #8: WebCore`WebCore::WorkerGlobalScope::clearScript frame #9: WebCore`WebCore::WorkerGlobalScope::clearScript frame #10: WebCore`WebCore::WorkerGlobalScope::clearScript frame #11: WebCore`WebCore::WorkerGlobalScope::clearScript frame #12: WebCore`WebCore::WorkerThread::stop (lldb) c 2. Hitting the Crash (lldb) Process 35098 stopped * thread #41: tid = 0x6934f, 0x000000010845c6ae JavaScriptCore`JSC::Heap::isCollecting(this=0x0000001715b67aa8) + 14 at HeapInlines.h:58, name = 'FTL Worklist Worker Thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x1715b67b30) frame #0: 0x000000010845c6ae JavaScriptCore`JSC::Heap::isCollecting(this=0x0000001715b67aa8) + 14 at HeapInlines.h:58 55 56 inline bool Heap::isCollecting() 57 { -> 58 return m_operationInProgress == FullCollection || m_operationInProgress == EdenCollection; 59 } 60 61 inline Heap* Heap::heap(const JSCell* cell) 3. Inspect the DFG::Plan it contains the Web Worker's VM, and the Plan has been Cancelled, so none of its members are safe to use. (lldb) up frame #1: 0x0000000108a75e95 JavaScriptCore`JSC::DFG::Worklist::runThread(this=0x00000003c4c73ef0, data=0x00000003c4ee5fe0) + 757 at DFGWorklist.cpp:370 367 368 RELEASE_ASSERT(!plan->vm.heap.isCollecting()); 369 plan->compileInThread(longLivedState, data); -> 370 RELEASE_ASSERT(!plan->vm.heap.isCollecting()); 371 372 { 373 LockHolder locker(m_lock); (lldb) frame variable plan.m_ptr->vm (JSC::VM &) plan.m_ptr->vm = 0x0000001715b67a90: {...} (lldb) frame variable plan.m_ptr->stage (JSC::DFG::Plan::Stage) plan.m_ptr->stage = Cancelled
Created attachment 274500 [details] [PATCH] Proposed Fix Proposed fix. This DFG::Plan is not seen by VM::~VM's attempt to remove all plans for the destructing VM because that path loops over `m_plans`. A Cancelled DFG::Plan would have already been removed from `m_plans`.
(In reply to comment #2) > Caught in lldb I was able to confirm that this is a Cancelled DFG::Plan > holding a reference to now gone Web Worker VM that has since destructed. > > 1. VM::~VM for the Web Worker on thread #289: > > (lldb) bt > thread #289: tid = 0x69ded, 0x0000000109170b10 > JavaScriptCore`JSC::VM::~VM(this=0x0000001715b67a90) + 16 at VM.cpp:335, > name = 'WebCore: Worker', stop reason = breakpoint 3.2 > frame #0: JavaScriptCore`JSC::VM::~VM(this=0x0000001715b67a90) + 16 > at VM.cpp:335 > frame #1: JavaScriptCore`WTF::ThreadSafeRefCounted<JSC::VM>::deref > frame #2: JavaScriptCore`void WTF::derefIfNotNull<JSC::VM> > frame #3: JavaScriptCore`WTF::RefPtr<JSC::VM>::operator= > frame #4: JavaScriptCore`JSC::JSLockHolder::~JSLockHolder > frame #5: JavaScriptCore`JSC::JSLockHolder::~JSLockHolder > frame #6: > WebCore`WebCore::WorkerScriptController::~WorkerScriptController > frame #7: > WebCore`WebCore::WorkerScriptController::~WorkerScriptController > frame #8: WebCore`WebCore::WorkerGlobalScope::clearScript > frame #9: WebCore`WebCore::WorkerGlobalScope::clearScript > frame #10: WebCore`WebCore::WorkerGlobalScope::clearScript > frame #11: WebCore`WebCore::WorkerGlobalScope::clearScript > frame #12: WebCore`WebCore::WorkerThread::stop > (lldb) c > > > 2. Hitting the Crash > > (lldb) Process 35098 stopped > * thread #41: tid = 0x6934f, 0x000000010845c6ae > JavaScriptCore`JSC::Heap::isCollecting(this=0x0000001715b67aa8) + 14 at > HeapInlines.h:58, name = 'FTL Worklist Worker Thread', stop reason = > EXC_BAD_ACCESS (code=1, address=0x1715b67b30) > frame #0: 0x000000010845c6ae > JavaScriptCore`JSC::Heap::isCollecting(this=0x0000001715b67aa8) + 14 at > HeapInlines.h:58 > 55 > 56 inline bool Heap::isCollecting() > 57 { > -> 58 return m_operationInProgress == FullCollection || > m_operationInProgress == EdenCollection; > 59 } > 60 > 61 inline Heap* Heap::heap(const JSCell* cell) > > 3. Inspect the DFG::Plan it contains the Web Worker's VM, and the Plan has > been Cancelled, so none of its members are safe to use. > > > (lldb) up > frame #1: 0x0000000108a75e95 > JavaScriptCore`JSC::DFG::Worklist::runThread(this=0x00000003c4c73ef0, > data=0x00000003c4ee5fe0) + 757 at DFGWorklist.cpp:370 > 367 > 368 RELEASE_ASSERT(!plan->vm.heap.isCollecting()); > 369 plan->compileInThread(longLivedState, data); > -> 370 RELEASE_ASSERT(!plan->vm.heap.isCollecting()); > 371 > 372 { > 373 LockHolder locker(m_lock); > > (lldb) frame variable plan.m_ptr->vm > (JSC::VM &) plan.m_ptr->vm = 0x0000001715b67a90: {...} > > (lldb) frame variable plan.m_ptr->stage > (JSC::DFG::Plan::Stage) plan.m_ptr->stage = Cancelled Fascinating. VM destruction should wait for all plans for that VM to be retired from the worklist. I believe that this code is fine. But note that the GC has a magical power, which I believe comes into play here: it can run concurrently to compilation and cancel a plan while that plan is compiling. We take advantage of the fact that some parts of the compilation pipeline, like the B3 prepareForGeneration() phase, are both long and have no need for the VM& or any JS-ey things. In other words, we can run GC during this "safe" phase. If the GC finds that the plan is compiling something that is garbage, the GC can cancel the plan. B3 will keep running, but when it finishes, we will "exit" the safepoint, which will wait for GC to finish, and then observe that the plan has been cancelled. The plan will then just tear down all of its data and return. And then this awesomeness happens: the VM has also been destroyed in the meantime. Normally, the VM would wait for compilation threads to finish. But it doesn't wait for cancelled ones to finish. So, the VM was destroyed and then this idiotic assertion happens! It's probably the only darn thing on the cancelled-plan-gives-up-and-dies path that requires the VM! Since it's a release assertion we always access the isCollecting bits, and so in this case we die. The fix is to remove the assertion. :-) That way, there won't be anything on the "give-up-and-die" path that needs the VM to still exist. And yes, I just read the code for this, and it does look like everyone else just speedily bails without touching anything, except for these bogus assertions. There are a handful of them. I believe they should all be burned.
Comment on attachment 274500 [details] [PATCH] Proposed Fix Nice. Better than removing the assertion.
http://trac.webkit.org/changeset/198477
This is not a security issue since it resulted in a RELEASE_ASSERT().