Bug 166960

Summary: Concurrent GC has a bug where we would detect a race but fail to rescan the object
Product: WebKit Reporter: Saam Barati <saam>
Component: JavaScriptCoreAssignee: Saam Barati <saam>
Status: RESOLVED FIXED    
Severity: Normal CC: benjamin, commit-queue, fpizlo, ggaren, gskachkov, jfbastien, keith_miller, mark.lam, msaboff, oliver, ticaiolima, webkit-bug-importer, ysuzuki
Priority: P2 Keywords: InRadar
Version: WebKit Local Build   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on:    
Bug Blocks: 165909, 166944, 166989    
Attachments:
Description Flags
WIP
none
patch
fpizlo: review+
patch for landing none

Saam Barati
Reported 2017-01-11 19:01:28 PST
I'm still investigating what's going on. I think I might have found it, but I'm about to run tests. If you put the tests from bug #166707 into a directory, and run them using run-jsc-stress-tests, about a quarter of the run-jsc-stress-tests invocations will hit a crash.
Attachments
WIP (8.43 KB, patch)
2017-01-12 14:31 PST, Saam Barati
no flags
patch (12.56 KB, patch)
2017-01-12 15:14 PST, Saam Barati
fpizlo: review+
patch for landing (12.28 KB, patch)
2017-01-12 16:09 PST, Saam Barati
no flags
Saam Barati
Comment 1 2017-01-11 19:05:59 PST
Ok, I think there is a place where we may be forgetting to writeBarrier. Testing out locally now.
Radar WebKit Bug Importer
Comment 2 2017-01-11 19:36:56 PST
Saam Barati
Comment 3 2017-01-11 20:10:09 PST
(In reply to comment #1) > Ok, I think there is a place where we may be forgetting to writeBarrier. > Testing out locally now. Ok, ran tests for over an hour with no crashes, so I think I've found it. The bug is, we have code like this: ``` ALWAYS_INLINE PropertyOffset JSObject::prepareToPutDirectWithoutTransition(VM& vm, PropertyName propertyName, unsigned attributes, StructureID structureID, Structure* structure) { unsigned oldOutOfLineCapacity = structure->outOfLineCapacity(); PropertyOffset result; structure->addPropertyWithoutTransition( vm, propertyName, attributes, [&] (const GCSafeConcurrentJSLocker&, PropertyOffset offset, PropertyOffset newLastOffset) { unsigned newOutOfLineCapacity = Structure::outOfLineCapacity(newLastOffset); if (newOutOfLineCapacity != oldOutOfLineCapacity) { Butterfly* butterfly = allocateMoreOutOfLineStorage(vm, oldOutOfLineCapacity, newOutOfLineCapacity); nukeStructureAndSetButterfly(vm, structureID, butterfly); structure->setLastOffset(newLastOffset); WTF::storeStoreFence(); setStructureIDDirectly(structureID); } else structure->setLastOffset(newLastOffset); result = offset; }); return result; } ``` However, we're missing a writeBarrier after re-setting the StructureID to be non-nuked. So the code should be something like: ``` ALWAYS_INLINE PropertyOffset JSObject::prepareToPutDirectWithoutTransition(VM& vm, PropertyName propertyName, unsigned attributes, StructureID structureID, Structure* structure) { unsigned oldOutOfLineCapacity = structure->outOfLineCapacity(); PropertyOffset result; structure->addPropertyWithoutTransition( vm, propertyName, attributes, [&] (const GCSafeConcurrentJSLocker&, PropertyOffset offset, PropertyOffset newLastOffset) { unsigned newOutOfLineCapacity = Structure::outOfLineCapacity(newLastOffset); if (newOutOfLineCapacity != oldOutOfLineCapacity) { Butterfly* butterfly = allocateMoreOutOfLineStorage(vm, oldOutOfLineCapacity, newOutOfLineCapacity); nukeStructureAndSetButterfly(vm, structureID, butterfly); structure->setLastOffset(newLastOffset); WTF::storeStoreFence(); setStructureIDDirectly(structureID); } else structure->setLastOffset(newLastOffset); result = offset; vm.heap.writeBarrier(this); }); return result; } ``` Not doing the writeBarrier could cause some weird issues. I believe this would be one: (Mutator Thread: M, Collector Thread: C, assuming sequential consistency via proper barriers) M: allocate new butterfly M: Set nuked structure ID M: Set butterfly (this does a barrier) C: Start scanning O C: load structure ID C: See it's nuked and bail, relying on the barrier to rescan Note, we never rescan, so the butterfly that we just allocated could not be kept alive, and could be allocated over! I think there are probably other bugs, but I haven't worked out all the possibilities.
Filip Pizlo
Comment 4 2017-01-11 22:07:49 PST
It almost seems like setStructureIDDirectly should fire a write barrier by default, and the variant of it that we use in nukeBlahBlah should be a special one that doesn't barrier. Maybe that one can have an extra funky name like setStructureIDDirectlyWithoutWriteBarrier.
Michael Saboff
Comment 5 2017-01-11 22:13:23 PST
This looks like the same issue as my bug <rdar://problem/29779787>.
Saam Barati
Comment 6 2017-01-12 14:31:14 PST
Created attachment 298709 [details] WIP RFC. Going to write up a changelog.
Filip Pizlo
Comment 7 2017-01-12 14:35:44 PST
Comment on attachment 298709 [details] WIP View in context: https://bugs.webkit.org/attachment.cgi?id=298709&action=review I think that this is right. > Source/JavaScriptCore/heap/Heap.cpp:531 > + // FIXME: maybe debug assert? Release asserts are better. > Source/JavaScriptCore/heap/Heap.cpp:2257 > + "Mms", "Mutator/Race Mark Stack", The abbreviation codes are meant to be created like this: - Take first letter of each word on the long version. - Capitalize the first letter. - Downcase all other letters. In this convention, the short name should become "Mrms". The use of first-letter-capital means that we can print these without delimiters. For example the GC might print "WrhCbMrms(100)", which means that it evaluated the weak reference harvester (Wrh), CodeBlock (Cb), and Mutator/Race Mark Stack (Mrms) constraints in that order.
Mark Lam
Comment 8 2017-01-12 14:42:58 PST
Comment on attachment 298709 [details] WIP View in context: https://bugs.webkit.org/attachment.cgi?id=298709&action=review > Source/JavaScriptCore/heap/Heap.cpp:2257 > + "Mms", "Mutator/Race Mark Stack", How about "Mutator+Race" which is a more accurate description. > Source/JavaScriptCore/runtime/Structure.cpp:798 > + vm.heap.writeBarrier(object); Is this still needed given that SlotVisitor::didRace() now ensures re-scanning on races?
Geoffrey Garen
Comment 9 2017-01-12 14:47:02 PST
Comment on attachment 298709 [details] WIP View in context: https://bugs.webkit.org/attachment.cgi?id=298709&action=review >> Source/JavaScriptCore/runtime/Structure.cpp:798 >> + vm.heap.writeBarrier(object); > > Is this still needed given that SlotVisitor::didRace() now ensures re-scanning on races? In general, write barriers are supposed to happen after stores rather than before. This change corrects the ordering.
Mark Lam
Comment 10 2017-01-12 14:49:36 PST
Comment on attachment 298709 [details] WIP View in context: https://bugs.webkit.org/attachment.cgi?id=298709&action=review >>> Source/JavaScriptCore/runtime/Structure.cpp:798 >>> + vm.heap.writeBarrier(object); >> >> Is this still needed given that SlotVisitor::didRace() now ensures re-scanning on races? > > In general, write barriers are supposed to happen after stores rather than before. This change corrects the ordering. Yes, but I think the purpose of this writeBarrier is because we're changing the structureID, and may race with the GC. The GC now automatically takes care this with the raceMarkStack. Hence, there shouldn't be a need for these manually issue barriers anymore. That is unless there's another reason we need this here (which is what I'm really asking)?
Saam Barati
Comment 11 2017-01-12 15:14:24 PST
Filip Pizlo
Comment 12 2017-01-12 15:16:22 PST
Comment on attachment 298716 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=298716&action=review > Source/JavaScriptCore/heap/SlotVisitor.cpp:705 > + heap()->addToRaceMarkStack(race.cell()); You should just inline addToRaceMarkStack here and make SlotVisitor a friend of Heap if it isn't already. I bet it already is.
Mark Lam
Comment 13 2017-01-12 15:19:26 PST
Comment on attachment 298716 [details] patch r=me too.
Michael Saboff
Comment 14 2017-01-12 15:28:27 PST
Using the work in progress patch with a debug build I got the crash below on the 8th run. Test: DYLD_FRAMEWORK_PATH=WebKitBuild/Debug/ WebKitBuild/Debug/jsc --useFTLJIT\=false --useFunctionDotArguments\=true --maxPerThreadStackUsage\=1572864 --thresholdForJITAfterWarmUp\=10 --thresholdForJITSoon\=10 --thresholdForOptimizeAfterWarmUp\=20 --thresholdForOptimizeAfterLongWarmUp\=20 --thresholdForOptimizeSoon\=20 --thresholdForFTLOptimizeAfterWarmUp\=20 --thresholdForFTLOptimizeSoon\=20 --maximumEvalCacheableSourceLength\=150000 --useEagerCodeBlockJettisonTiming\=true --collectContinuously\=true --useGenerationalGC\=false JSTests/stress/has-own-property-name-cache-string-keys.js Backtrace: ASSERTION FAILED: cell->cellState() == CellState::PossiblyGrey /Volumes/Data/src/webkit/Source/JavaScriptCore/heap/SlotVisitor.cpp(287) : void JSC::SlotVisitor::appendToMarkStack(ContainerType &, JSC::JSCell *) [ContainerType = JSC::MarkedBlock] 1 0x108e6d71d WTFCrash 2 0x108cafa4a void JSC::SlotVisitor::appendToMarkStack<JSC::MarkedBlock>(JSC::MarkedBlock&, JSC::JSCell*) 3 0x108caf82d void JSC::SlotVisitor::setMarkedAndAppendToMarkStack<JSC::MarkedBlock>(JSC::MarkedBlock&, JSC::JSCell*) 4 0x108caf44a JSC::SlotVisitor::setMarkedAndAppendToMarkStack(JSC::JSCell*) 5 0x108caf4dc JSC::SlotVisitor::appendHidden(JSC::JSValue) 6 0x1088d4c25 void JSC::SlotVisitor::appendHidden<JSC::Unknown>(JSC::WriteBarrierBase<JSC::Unknown> const&) 7 0x108975ca6 JSC::JSSegmentedVariableObject::visitChildren(JSC::JSCell*, JSC::SlotVisitor&) 8 0x108cafe04 JSC::SlotVisitor::visitChildren(JSC::JSCell const*) 9 0x108cb02cb JSC::SlotVisitor::drain(WTF::MonotonicTime) 10 0x108cb0b78 JSC::SlotVisitor::drainFromShared(JSC::SlotVisitor::SharedDrainMode, WTF::MonotonicTime) 11 0x10870900e JSC::Heap::markToFixpoint(double)::$_2::operator()() const 12 0x108708c19 WTF::SharedTaskFunctor<void (), JSC::Heap::markToFixpoint(double)::$_2>::run() 13 0x108ea921d WTF::ParallelHelperClient::runTask(WTF::RefPtr<WTF::SharedTask<void ()> >) 14 0x108ea9f8f WTF::ParallelHelperPool::Thread::work() 15 0x108e7b86d WTF::AutomaticThread::start(WTF::Locker<WTF::LockBase> const&)::$_0::operator()() const 16 0x108e7b62d void std::__1::__invoke_void_return_wrapper<void>::__call<WTF::AutomaticThread::start(WTF::Locker<WTF::LockBase> const&)::$_0&>(WTF::AutomaticThread::start(WTF::Locker<WTF::LockBase> const&)::$_0&&&) 17 0x108e7b3c9 std::__1::__function::__func<WTF::AutomaticThread::start(WTF::Locker<WTF::LockBase> const&)::$_0, std::__1::allocator<WTF::AutomaticThread::start(WTF::Locker<WTF::LockBase> const&)::$_0>, void ()>::operator()() 18 0x10846586a std::__1::function<void ()>::operator()() const 19 0x108edeaa7 WTF::threadEntryPoint(void*) 20 0x108ee0471 WTF::wtfThreadEntryPoint(void*) 21 0x7fffbfc9faab _pthread_body 22 0x7fffbfc9f9f7 _pthread_body 23 0x7fffbfc9f221 thread_start Segmentation fault: 11
Michael Saboff
Comment 15 2017-01-12 15:33:44 PST
For my crash, the main thread had this backtrace. All other threads were waiting. Thread 0:: Dispatch queue: com.apple.main-thread 0 com.apple.JavaScriptCore 0x0000000108cdb991 JSC::Structure::pin(JSC::VM&, JSC::PropertyTable*) + 1 (Structure.cpp:804) 1 com.apple.JavaScriptCore 0x0000000107a97622 int JSC::Structure::addPropertyWithoutTransition<JSC::JSObject::prepareToPutDirectWithoutTransition(JSC::VM&, JSC::PropertyName, unsigned int, unsigned int, JSC::Structure*)::'lambda'(JSC::GCSafeConcurrentJSLocker const&, int, int)>(JSC::VM&, JSC::PropertyName, unsigned int, JSC::JSObject::prepareToPutDirectWithoutTransition(JSC::VM&, JSC::PropertyName, unsigned int, unsigned int, JSC::Structure*)::'lambda'(JSC::GCSafeConcurrentJSLocker const&, int, int) const&) + 82 (StructureInlines.h:354) 2 com.apple.JavaScriptCore 0x0000000107a9606a JSC::JSObject::prepareToPutDirectWithoutTransition(JSC::VM&, JSC::PropertyName, unsigned int, unsigned int, JSC::Structure*) + 138 (JSObjectInlines.h:192) 3 com.apple.JavaScriptCore 0x00000001080e90c0 bool JSC::JSObject::putDirectInternal<(JSC::JSObject::PutMode)0>(JSC::VM&, JSC::PropertyName, JSC::JSValue, unsigned int, JSC::PutPropertySlot&) + 928 (JSObjectInlines.h:278) 4 com.apple.JavaScriptCore 0x00000001080e89dc JSC::JSObject::putInline(JSC::JSCell*, JSC::ExecState*, JSC::PropertyName, JSC::JSValue, JSC::PutPropertySlot&) + 924 (JSObjectInlines.h:215) 5 com.apple.JavaScriptCore 0x000000010893f845 JSC::JSObject::put(JSC::JSCell*, JSC::ExecState*, JSC::PropertyName, JSC::JSValue, JSC::PutPropertySlot&) + 69 (JSObject.cpp:751) 6 com.apple.JavaScriptCore 0x00000001080e32fe JSC::JSValue::put(JSC::ExecState*, JSC::PropertyName, JSC::JSValue, JSC::PutPropertySlot&) + 206 (JSCJSValueInlines.h:867) 7 com.apple.JavaScriptCore 0x00000001083caeb7 void JSC::DFG::operationPutByValInternal<false, false>(JSC::ExecState*, long long, long long, long long) + 807 (DFGOperations.cpp:145) 8 com.apple.JavaScriptCore 0x00000001083cab7b operationPutByValNonStrict + 75 (DFGOperations.cpp:624) 9 ??? 0x00003def31a03067 0 + 68097539059815 10 com.apple.JavaScriptCore 0x0000000108a3928e vmEntryToJavaScript + 334 11 com.apple.JavaScriptCore 0x000000010884d269 JSC::JITCode::execute(JSC::VM*, JSC::ProtoCallFrame*) + 329 (JITCode.cpp:81) 12 com.apple.JavaScriptCore 0x0000000108805534 JSC::Interpreter::execute(JSC::ProgramExecutable*, JSC::ExecState*, JSC::JSObject*) + 4788 (Interpreter.cpp:871) 13 com.apple.JavaScriptCore 0x00000001080ebd8d JSC::evaluate(JSC::ExecState*, JSC::SourceCode const&, JSC::JSValue, WTF::NakedPtr<JSC::Exception>&) + 669 (Completion.cpp:110) 14 jsc 0x000000010798a9eb runWithScripts(GlobalObject*, WTF::Vector<Script, 0ul, WTF::CrashOnOverflow, 16ul> const&, WTF::String const&, bool, bool, bool) + 2123 (jsc.cpp:2957) 15 jsc 0x000000010797c46a runJSC(JSC::VM*, CommandLine) + 1370 (jsc.cpp:3237) 16 jsc 0x000000010797afcd jscmain(int, char**) + 157 (jsc.cpp:3292) 17 jsc 0x000000010797af1e main + 46 (jsc.cpp:2804) 18 libdyld.dylib 0x00007fffbfa88255 start + 1
Saam Barati
Comment 16 2017-01-12 16:09:20 PST
Created attachment 298729 [details] patch for landing Discussed with Fil and other how the race could happen over IRC and why that assert is wrong: [15:36:24] <&pizlo> 1) mutator fires write barrier on black-unmarked object during full GC [15:36:33] <&pizlo> 2) mutator CASes the color of the object from black to white [15:36:37] <&pizlo> 3) GC marks the object [15:36:43] <&pizlo> 4) GC sets the object's color to grey [15:36:54] <&pizlo> 5) mutator realizes that the object is now marked [15:36:59] <&pizlo> 6) mutator sets the object's color to black [15:37:07] <&pizlo> 7) GC crashes because the color of the object is not grey Also note, that it could end up being white, since the setState(PossiblyGrey) isn't CAS and can leave it being white in a not sequentially consistent world.
WebKit Commit Bot
Comment 17 2017-01-12 19:13:14 PST
Comment on attachment 298729 [details] patch for landing Clearing flags on attachment: 298729 Committed r210694: <http://trac.webkit.org/changeset/210694>
WebKit Commit Bot
Comment 18 2017-01-12 19:13:21 PST
All reviewed patches have been landed. Closing bug.
Note You need to log in before you can comment on or make changes to this bug.