Bug 227757

Summary: Fix uses of Dependency::fence with respect to the compiler outsmarting us
Product: WebKit Reporter: Saam Barati <saam>
Component: JavaScriptCoreAssignee: Saam Barati <saam>
Status: RESOLVED FIXED    
Severity: Normal CC: benjamin, cdumez, cmarcelo, ews-watchlist, keith_miller, mark.lam, msaboff, rmorisset, tzagallo, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
patch I'm using to test
none
patch
rmorisset: review+
patch for landing none

Description Saam Barati 2021-07-07 11:22:00 PDT
stress/compare-number-and-other.js.ftl-eager: GC Verifier: ERROR cell 0x102e49ba0 was not marked
stress/compare-number-and-other.js.ftl-eager: test_script_33113: line 2: 39217 Trace/BPT trap: 5       ( "$@" ../../.vm/JavaScriptCore.framework/Helpers/jsc --useFTLJIT\=false --useFunctionDotArguments\=true --validateExceptionChecks\=true --useDollarVM\=true --maxPerThreadStackUsage\=1572864 --airForceBriggsAllocator\=true --useRandomizingExecutableIslandAllocation\=true --forcePolyProto\=true --useDataIC\=true --useFTLJIT\=true --thresholdForJITAfterWarmUp\=10 --thresholdForJITSoon\=10 --thresholdForOptimizeAfterWarmUp\=20 --thresholdForOptimizeAfterLongWarmUp\=20 --thresholdForOptimizeSoon\=20 --thresholdForFTLOptimizeAfterWarmUp\=20 --thresholdForFTLOptimizeSoon\=20 --thresholdForOMGOptimizeAfterWarmUp\=20 --thresholdForOMGOptimizeSoon\=20 --maximumEvalCacheableSourceLength\=150000 --useEagerCodeBlockJettisonTiming\=true --repatchBufferingCountdown\=0 --collectContinuously\=true --useGenerationalGC\=false --verifyGC\=true compare-number-and-other.js )

Running on an M1 Mac.
Comment 1 Radar WebKit Bug Importer 2021-07-07 11:34:29 PDT
<rdar://problem/80280931>
Comment 2 Saam Barati 2021-07-13 12:37:37 PDT
Been looking into this since yesterday. Adding more logging. Current status is that I'm pretty confused, and not sure what's going on. My logging is indicating something different than what the verifier is saying. Some of my logging is saying an object is marked or that we're marking it, but the verifier is disagreeing
Comment 3 Saam Barati 2021-07-14 17:24:32 PDT
Still getting some weird data. We're seeing a cell that is marked, but then somehow the verifier sees it as unmarked when it look at the normal mark bits.

Need to spend some more time verifying that this is actually what's happening, because it's very peculiar.
Comment 4 Saam Barati 2021-07-15 16:59:29 PDT
Issue seems to go away if I turn off parallel marking constraint solver
Comment 5 Saam Barati 2021-07-16 16:35:40 PDT
(In reply to Saam Barati from comment #4)
> Issue seems to go away if I turn off parallel marking constraint solver

This is wrong. It doesn't go away. But it becomes less common.
Comment 6 Saam Barati 2021-07-20 19:17:39 PDT
What I'm seeing is truly bizarre at this point. Starting to look into the low level details. We're seeing an isMarked check return true then return false for the same object, inside the same function.

Like:

if (visitor.isMarked(this)) {
   // do some things
   RELEASE_ASSERT(visitor.isMarked(this)); // <---- crashes!!!!
}
Comment 7 Saam Barati 2021-07-20 19:29:29 PDT
Created attachment 433916 [details]
patch I'm using to test
Comment 8 Saam Barati 2021-07-20 22:56:33 PDT
The only change needed to apply to ToT to get things to crash is:

Index: Source/JavaScriptCore/bytecode/CodeBlock.cpp
===================================================================
--- Source/JavaScriptCore/bytecode/CodeBlock.cpp	(revision 280124)
+++ Source/JavaScriptCore/bytecode/CodeBlock.cpp	(working copy)
@@ -1214,8 +1214,16 @@ void CodeBlock::determineLiveness(const 
     
 #if ENABLE(DFG_JIT)
     VM& vm = *m_vm;
-    if (visitor.isMarked(this))
+    if (visitor.isMarked(this)) {
+        WTF::compilerFence();
+        if (!visitor.isMarked(this)) {
+            dataLogLn("bit flip, crashing");
+            dataLogLn();
+            RELEASE_ASSERT_NOT_REACHED();
+        }
+
         return;
+    }
     
     // In rare and weird cases, this could be called on a baseline CodeBlock. One that I found was
     // that we might decide that the CodeBlock should be jettisoned due to old age, so the
Comment 9 Saam Barati 2021-07-21 17:59:27 PDT
Created attachment 433981 [details]
patch
Comment 10 Saam Barati 2021-07-21 18:01:01 PDT
Comment on attachment 433981 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=433981&action=review

> Source/WTF/wtf/Atomics.h:439
> +    static Dependency loadAndFence(T* pointer, T& output)

I could also change the API to be pair based instead of using an out parameter.
Comment 11 Robin Morisset 2021-07-22 10:01:56 PDT
Comment on attachment 433981 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=433981&action=review

r=me

> Source/JavaScriptCore/runtime/SparseArrayValueMap.cpp:-170
> -JSValue SparseArrayEntry::getConcurrently() const

I'm a bit surprised that you had to remove const here.
Could this be fixed by also having a version of loadAndFence that takes a const T* ?

>> Source/WTF/wtf/Atomics.h:439
>> +    static Dependency loadAndFence(T* pointer, T& output)
> 
> I could also change the API to be pair based instead of using an out parameter.

It would be cleaner, but considering that this is going to be on very hot paths, and my lack of trust in LLVM's abilitiy to optimize a struct return type, I'm in favor of sticking with the out param.
Comment 12 Saam Barati 2021-07-22 11:21:34 PDT
Comment on attachment 433981 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=433981&action=review

>> Source/JavaScriptCore/runtime/SparseArrayValueMap.cpp:-170
>> -JSValue SparseArrayEntry::getConcurrently() const
> 
> I'm a bit surprised that you had to remove const here.
> Could this be fixed by also having a version of loadAndFence that takes a const T* ?

Good point. I think it'll be as easy as taking 'const T*' in that function.

>>> Source/WTF/wtf/Atomics.h:439
>>> +    static Dependency loadAndFence(T* pointer, T& output)
>> 
>> I could also change the API to be pair based instead of using an out parameter.
> 
> It would be cleaner, but considering that this is going to be on very hot paths, and my lack of trust in LLVM's abilitiy to optimize a struct return type, I'm in favor of sticking with the out param.

ok, I'll leave it as is for now.
Comment 13 Saam Barati 2021-07-22 12:02:58 PDT
Created attachment 434025 [details]
patch for landing
Comment 14 EWS 2021-07-22 13:05:16 PDT
Committed r280195 (239883@main): <https://commits.webkit.org/239883@main>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 434025 [details].