I'm still investigating what's going on. I think I might have found it, but I'm about to run tests. If you put the tests from bug #166707 into a directory, and run them using run-jsc-stress-tests, about a quarter of the run-jsc-stress-tests invocations will hit a crash.
Ok, I think there is a place where we may be forgetting to writeBarrier. Testing out locally now.
<rdar://problem/29983526>
(In reply to comment #1) > Ok, I think there is a place where we may be forgetting to writeBarrier. > Testing out locally now. Ok, ran tests for over an hour with no crashes, so I think I've found it. The bug is, we have code like this: ``` ALWAYS_INLINE PropertyOffset JSObject::prepareToPutDirectWithoutTransition(VM& vm, PropertyName propertyName, unsigned attributes, StructureID structureID, Structure* structure) { unsigned oldOutOfLineCapacity = structure->outOfLineCapacity(); PropertyOffset result; structure->addPropertyWithoutTransition( vm, propertyName, attributes, [&] (const GCSafeConcurrentJSLocker&, PropertyOffset offset, PropertyOffset newLastOffset) { unsigned newOutOfLineCapacity = Structure::outOfLineCapacity(newLastOffset); if (newOutOfLineCapacity != oldOutOfLineCapacity) { Butterfly* butterfly = allocateMoreOutOfLineStorage(vm, oldOutOfLineCapacity, newOutOfLineCapacity); nukeStructureAndSetButterfly(vm, structureID, butterfly); structure->setLastOffset(newLastOffset); WTF::storeStoreFence(); setStructureIDDirectly(structureID); } else structure->setLastOffset(newLastOffset); result = offset; }); return result; } ``` However, we're missing a writeBarrier after re-setting the StructureID to be non-nuked. So the code should be something like: ``` ALWAYS_INLINE PropertyOffset JSObject::prepareToPutDirectWithoutTransition(VM& vm, PropertyName propertyName, unsigned attributes, StructureID structureID, Structure* structure) { unsigned oldOutOfLineCapacity = structure->outOfLineCapacity(); PropertyOffset result; structure->addPropertyWithoutTransition( vm, propertyName, attributes, [&] (const GCSafeConcurrentJSLocker&, PropertyOffset offset, PropertyOffset newLastOffset) { unsigned newOutOfLineCapacity = Structure::outOfLineCapacity(newLastOffset); if (newOutOfLineCapacity != oldOutOfLineCapacity) { Butterfly* butterfly = allocateMoreOutOfLineStorage(vm, oldOutOfLineCapacity, newOutOfLineCapacity); nukeStructureAndSetButterfly(vm, structureID, butterfly); structure->setLastOffset(newLastOffset); WTF::storeStoreFence(); setStructureIDDirectly(structureID); } else structure->setLastOffset(newLastOffset); result = offset; vm.heap.writeBarrier(this); }); return result; } ``` Not doing the writeBarrier could cause some weird issues. I believe this would be one: (Mutator Thread: M, Collector Thread: C, assuming sequential consistency via proper barriers) M: allocate new butterfly M: Set nuked structure ID M: Set butterfly (this does a barrier) C: Start scanning O C: load structure ID C: See it's nuked and bail, relying on the barrier to rescan Note, we never rescan, so the butterfly that we just allocated could not be kept alive, and could be allocated over! I think there are probably other bugs, but I haven't worked out all the possibilities.
It almost seems like setStructureIDDirectly should fire a write barrier by default, and the variant of it that we use in nukeBlahBlah should be a special one that doesn't barrier. Maybe that one can have an extra funky name like setStructureIDDirectlyWithoutWriteBarrier.
This looks like the same issue as my bug <rdar://problem/29779787>.
Created attachment 298709 [details] WIP RFC. Going to write up a changelog.
Comment on attachment 298709 [details] WIP View in context: https://bugs.webkit.org/attachment.cgi?id=298709&action=review I think that this is right. > Source/JavaScriptCore/heap/Heap.cpp:531 > + // FIXME: maybe debug assert? Release asserts are better. > Source/JavaScriptCore/heap/Heap.cpp:2257 > + "Mms", "Mutator/Race Mark Stack", The abbreviation codes are meant to be created like this: - Take first letter of each word on the long version. - Capitalize the first letter. - Downcase all other letters. In this convention, the short name should become "Mrms". The use of first-letter-capital means that we can print these without delimiters. For example the GC might print "WrhCbMrms(100)", which means that it evaluated the weak reference harvester (Wrh), CodeBlock (Cb), and Mutator/Race Mark Stack (Mrms) constraints in that order.
Comment on attachment 298709 [details] WIP View in context: https://bugs.webkit.org/attachment.cgi?id=298709&action=review > Source/JavaScriptCore/heap/Heap.cpp:2257 > + "Mms", "Mutator/Race Mark Stack", How about "Mutator+Race" which is a more accurate description. > Source/JavaScriptCore/runtime/Structure.cpp:798 > + vm.heap.writeBarrier(object); Is this still needed given that SlotVisitor::didRace() now ensures re-scanning on races?
Comment on attachment 298709 [details] WIP View in context: https://bugs.webkit.org/attachment.cgi?id=298709&action=review >> Source/JavaScriptCore/runtime/Structure.cpp:798 >> + vm.heap.writeBarrier(object); > > Is this still needed given that SlotVisitor::didRace() now ensures re-scanning on races? In general, write barriers are supposed to happen after stores rather than before. This change corrects the ordering.
Comment on attachment 298709 [details] WIP View in context: https://bugs.webkit.org/attachment.cgi?id=298709&action=review >>> Source/JavaScriptCore/runtime/Structure.cpp:798 >>> + vm.heap.writeBarrier(object); >> >> Is this still needed given that SlotVisitor::didRace() now ensures re-scanning on races? > > In general, write barriers are supposed to happen after stores rather than before. This change corrects the ordering. Yes, but I think the purpose of this writeBarrier is because we're changing the structureID, and may race with the GC. The GC now automatically takes care this with the raceMarkStack. Hence, there shouldn't be a need for these manually issue barriers anymore. That is unless there's another reason we need this here (which is what I'm really asking)?
Created attachment 298716 [details] patch
Comment on attachment 298716 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=298716&action=review > Source/JavaScriptCore/heap/SlotVisitor.cpp:705 > + heap()->addToRaceMarkStack(race.cell()); You should just inline addToRaceMarkStack here and make SlotVisitor a friend of Heap if it isn't already. I bet it already is.
Comment on attachment 298716 [details] patch r=me too.
Using the work in progress patch with a debug build I got the crash below on the 8th run. Test: DYLD_FRAMEWORK_PATH=WebKitBuild/Debug/ WebKitBuild/Debug/jsc --useFTLJIT\=false --useFunctionDotArguments\=true --maxPerThreadStackUsage\=1572864 --thresholdForJITAfterWarmUp\=10 --thresholdForJITSoon\=10 --thresholdForOptimizeAfterWarmUp\=20 --thresholdForOptimizeAfterLongWarmUp\=20 --thresholdForOptimizeSoon\=20 --thresholdForFTLOptimizeAfterWarmUp\=20 --thresholdForFTLOptimizeSoon\=20 --maximumEvalCacheableSourceLength\=150000 --useEagerCodeBlockJettisonTiming\=true --collectContinuously\=true --useGenerationalGC\=false JSTests/stress/has-own-property-name-cache-string-keys.js Backtrace: ASSERTION FAILED: cell->cellState() == CellState::PossiblyGrey /Volumes/Data/src/webkit/Source/JavaScriptCore/heap/SlotVisitor.cpp(287) : void JSC::SlotVisitor::appendToMarkStack(ContainerType &, JSC::JSCell *) [ContainerType = JSC::MarkedBlock] 1 0x108e6d71d WTFCrash 2 0x108cafa4a void JSC::SlotVisitor::appendToMarkStack<JSC::MarkedBlock>(JSC::MarkedBlock&, JSC::JSCell*) 3 0x108caf82d void JSC::SlotVisitor::setMarkedAndAppendToMarkStack<JSC::MarkedBlock>(JSC::MarkedBlock&, JSC::JSCell*) 4 0x108caf44a JSC::SlotVisitor::setMarkedAndAppendToMarkStack(JSC::JSCell*) 5 0x108caf4dc JSC::SlotVisitor::appendHidden(JSC::JSValue) 6 0x1088d4c25 void JSC::SlotVisitor::appendHidden<JSC::Unknown>(JSC::WriteBarrierBase<JSC::Unknown> const&) 7 0x108975ca6 JSC::JSSegmentedVariableObject::visitChildren(JSC::JSCell*, JSC::SlotVisitor&) 8 0x108cafe04 JSC::SlotVisitor::visitChildren(JSC::JSCell const*) 9 0x108cb02cb JSC::SlotVisitor::drain(WTF::MonotonicTime) 10 0x108cb0b78 JSC::SlotVisitor::drainFromShared(JSC::SlotVisitor::SharedDrainMode, WTF::MonotonicTime) 11 0x10870900e JSC::Heap::markToFixpoint(double)::$_2::operator()() const 12 0x108708c19 WTF::SharedTaskFunctor<void (), JSC::Heap::markToFixpoint(double)::$_2>::run() 13 0x108ea921d WTF::ParallelHelperClient::runTask(WTF::RefPtr<WTF::SharedTask<void ()> >) 14 0x108ea9f8f WTF::ParallelHelperPool::Thread::work() 15 0x108e7b86d WTF::AutomaticThread::start(WTF::Locker<WTF::LockBase> const&)::$_0::operator()() const 16 0x108e7b62d void std::__1::__invoke_void_return_wrapper<void>::__call<WTF::AutomaticThread::start(WTF::Locker<WTF::LockBase> const&)::$_0&>(WTF::AutomaticThread::start(WTF::Locker<WTF::LockBase> const&)::$_0&&&) 17 0x108e7b3c9 std::__1::__function::__func<WTF::AutomaticThread::start(WTF::Locker<WTF::LockBase> const&)::$_0, std::__1::allocator<WTF::AutomaticThread::start(WTF::Locker<WTF::LockBase> const&)::$_0>, void ()>::operator()() 18 0x10846586a std::__1::function<void ()>::operator()() const 19 0x108edeaa7 WTF::threadEntryPoint(void*) 20 0x108ee0471 WTF::wtfThreadEntryPoint(void*) 21 0x7fffbfc9faab _pthread_body 22 0x7fffbfc9f9f7 _pthread_body 23 0x7fffbfc9f221 thread_start Segmentation fault: 11
For my crash, the main thread had this backtrace. All other threads were waiting. Thread 0:: Dispatch queue: com.apple.main-thread 0 com.apple.JavaScriptCore 0x0000000108cdb991 JSC::Structure::pin(JSC::VM&, JSC::PropertyTable*) + 1 (Structure.cpp:804) 1 com.apple.JavaScriptCore 0x0000000107a97622 int JSC::Structure::addPropertyWithoutTransition<JSC::JSObject::prepareToPutDirectWithoutTransition(JSC::VM&, JSC::PropertyName, unsigned int, unsigned int, JSC::Structure*)::'lambda'(JSC::GCSafeConcurrentJSLocker const&, int, int)>(JSC::VM&, JSC::PropertyName, unsigned int, JSC::JSObject::prepareToPutDirectWithoutTransition(JSC::VM&, JSC::PropertyName, unsigned int, unsigned int, JSC::Structure*)::'lambda'(JSC::GCSafeConcurrentJSLocker const&, int, int) const&) + 82 (StructureInlines.h:354) 2 com.apple.JavaScriptCore 0x0000000107a9606a JSC::JSObject::prepareToPutDirectWithoutTransition(JSC::VM&, JSC::PropertyName, unsigned int, unsigned int, JSC::Structure*) + 138 (JSObjectInlines.h:192) 3 com.apple.JavaScriptCore 0x00000001080e90c0 bool JSC::JSObject::putDirectInternal<(JSC::JSObject::PutMode)0>(JSC::VM&, JSC::PropertyName, JSC::JSValue, unsigned int, JSC::PutPropertySlot&) + 928 (JSObjectInlines.h:278) 4 com.apple.JavaScriptCore 0x00000001080e89dc JSC::JSObject::putInline(JSC::JSCell*, JSC::ExecState*, JSC::PropertyName, JSC::JSValue, JSC::PutPropertySlot&) + 924 (JSObjectInlines.h:215) 5 com.apple.JavaScriptCore 0x000000010893f845 JSC::JSObject::put(JSC::JSCell*, JSC::ExecState*, JSC::PropertyName, JSC::JSValue, JSC::PutPropertySlot&) + 69 (JSObject.cpp:751) 6 com.apple.JavaScriptCore 0x00000001080e32fe JSC::JSValue::put(JSC::ExecState*, JSC::PropertyName, JSC::JSValue, JSC::PutPropertySlot&) + 206 (JSCJSValueInlines.h:867) 7 com.apple.JavaScriptCore 0x00000001083caeb7 void JSC::DFG::operationPutByValInternal<false, false>(JSC::ExecState*, long long, long long, long long) + 807 (DFGOperations.cpp:145) 8 com.apple.JavaScriptCore 0x00000001083cab7b operationPutByValNonStrict + 75 (DFGOperations.cpp:624) 9 ??? 0x00003def31a03067 0 + 68097539059815 10 com.apple.JavaScriptCore 0x0000000108a3928e vmEntryToJavaScript + 334 11 com.apple.JavaScriptCore 0x000000010884d269 JSC::JITCode::execute(JSC::VM*, JSC::ProtoCallFrame*) + 329 (JITCode.cpp:81) 12 com.apple.JavaScriptCore 0x0000000108805534 JSC::Interpreter::execute(JSC::ProgramExecutable*, JSC::ExecState*, JSC::JSObject*) + 4788 (Interpreter.cpp:871) 13 com.apple.JavaScriptCore 0x00000001080ebd8d JSC::evaluate(JSC::ExecState*, JSC::SourceCode const&, JSC::JSValue, WTF::NakedPtr<JSC::Exception>&) + 669 (Completion.cpp:110) 14 jsc 0x000000010798a9eb runWithScripts(GlobalObject*, WTF::Vector<Script, 0ul, WTF::CrashOnOverflow, 16ul> const&, WTF::String const&, bool, bool, bool) + 2123 (jsc.cpp:2957) 15 jsc 0x000000010797c46a runJSC(JSC::VM*, CommandLine) + 1370 (jsc.cpp:3237) 16 jsc 0x000000010797afcd jscmain(int, char**) + 157 (jsc.cpp:3292) 17 jsc 0x000000010797af1e main + 46 (jsc.cpp:2804) 18 libdyld.dylib 0x00007fffbfa88255 start + 1
Created attachment 298729 [details] patch for landing Discussed with Fil and other how the race could happen over IRC and why that assert is wrong: [15:36:24] <&pizlo> 1) mutator fires write barrier on black-unmarked object during full GC [15:36:33] <&pizlo> 2) mutator CASes the color of the object from black to white [15:36:37] <&pizlo> 3) GC marks the object [15:36:43] <&pizlo> 4) GC sets the object's color to grey [15:36:54] <&pizlo> 5) mutator realizes that the object is now marked [15:36:59] <&pizlo> 6) mutator sets the object's color to black [15:37:07] <&pizlo> 7) GC crashes because the color of the object is not grey Also note, that it could end up being white, since the setState(PossiblyGrey) isn't CAS and can leave it being white in a not sequentially consistent world.
Comment on attachment 298729 [details] patch for landing Clearing flags on attachment: 298729 Committed r210694: <http://trac.webkit.org/changeset/210694>
All reviewed patches have been landed. Closing bug.