Bug 227757 - Fix uses of Dependency::fence with respect to the compiler outsmarting us
Summary: Fix uses of Dependency::fence with respect to the compiler outsmarting us
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Saam Barati
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2021-07-07 11:22 PDT by Saam Barati
Modified: 2021-07-22 13:05 PDT (History)
10 users (show)

See Also:


Attachments
patch I'm using to test (17.04 KB, patch)
2021-07-20 19:29 PDT, Saam Barati
no flags Details | Formatted Diff | Diff
patch (12.95 KB, patch)
2021-07-21 17:59 PDT, Saam Barati
rmorisset: review+
Details | Formatted Diff | Diff
patch for landing (12.02 KB, patch)
2021-07-22 12:02 PDT, Saam Barati
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Saam Barati 2021-07-07 11:22:00 PDT
stress/compare-number-and-other.js.ftl-eager: GC Verifier: ERROR cell 0x102e49ba0 was not marked
stress/compare-number-and-other.js.ftl-eager: test_script_33113: line 2: 39217 Trace/BPT trap: 5       ( "$@" ../../.vm/JavaScriptCore.framework/Helpers/jsc --useFTLJIT\=false --useFunctionDotArguments\=true --validateExceptionChecks\=true --useDollarVM\=true --maxPerThreadStackUsage\=1572864 --airForceBriggsAllocator\=true --useRandomizingExecutableIslandAllocation\=true --forcePolyProto\=true --useDataIC\=true --useFTLJIT\=true --thresholdForJITAfterWarmUp\=10 --thresholdForJITSoon\=10 --thresholdForOptimizeAfterWarmUp\=20 --thresholdForOptimizeAfterLongWarmUp\=20 --thresholdForOptimizeSoon\=20 --thresholdForFTLOptimizeAfterWarmUp\=20 --thresholdForFTLOptimizeSoon\=20 --thresholdForOMGOptimizeAfterWarmUp\=20 --thresholdForOMGOptimizeSoon\=20 --maximumEvalCacheableSourceLength\=150000 --useEagerCodeBlockJettisonTiming\=true --repatchBufferingCountdown\=0 --collectContinuously\=true --useGenerationalGC\=false --verifyGC\=true compare-number-and-other.js )

Running on an M1 Mac.
Comment 1 Radar WebKit Bug Importer 2021-07-07 11:34:29 PDT
<rdar://problem/80280931>
Comment 2 Saam Barati 2021-07-13 12:37:37 PDT
Been looking into this since yesterday. Adding more logging. Current status is that I'm pretty confused, and not sure what's going on. My logging is indicating something different than what the verifier is saying. Some of my logging is saying an object is marked or that we're marking it, but the verifier is disagreeing
Comment 3 Saam Barati 2021-07-14 17:24:32 PDT
Still getting some weird data. We're seeing a cell that is marked, but then somehow the verifier sees it as unmarked when it look at the normal mark bits.

Need to spend some more time verifying that this is actually what's happening, because it's very peculiar.
Comment 4 Saam Barati 2021-07-15 16:59:29 PDT
Issue seems to go away if I turn off parallel marking constraint solver
Comment 5 Saam Barati 2021-07-16 16:35:40 PDT
(In reply to Saam Barati from comment #4)
> Issue seems to go away if I turn off parallel marking constraint solver

This is wrong. It doesn't go away. But it becomes less common.
Comment 6 Saam Barati 2021-07-20 19:17:39 PDT
What I'm seeing is truly bizarre at this point. Starting to look into the low level details. We're seeing an isMarked check return true then return false for the same object, inside the same function.

Like:

if (visitor.isMarked(this)) {
   // do some things
   RELEASE_ASSERT(visitor.isMarked(this)); // <---- crashes!!!!
}
Comment 7 Saam Barati 2021-07-20 19:29:29 PDT
Created attachment 433916 [details]
patch I'm using to test
Comment 8 Saam Barati 2021-07-20 22:56:33 PDT
The only change needed to apply to ToT to get things to crash is:

Index: Source/JavaScriptCore/bytecode/CodeBlock.cpp
===================================================================
--- Source/JavaScriptCore/bytecode/CodeBlock.cpp	(revision 280124)
+++ Source/JavaScriptCore/bytecode/CodeBlock.cpp	(working copy)
@@ -1214,8 +1214,16 @@ void CodeBlock::determineLiveness(const 
     
 #if ENABLE(DFG_JIT)
     VM& vm = *m_vm;
-    if (visitor.isMarked(this))
+    if (visitor.isMarked(this)) {
+        WTF::compilerFence();
+        if (!visitor.isMarked(this)) {
+            dataLogLn("bit flip, crashing");
+            dataLogLn();
+            RELEASE_ASSERT_NOT_REACHED();
+        }
+
         return;
+    }
     
     // In rare and weird cases, this could be called on a baseline CodeBlock. One that I found was
     // that we might decide that the CodeBlock should be jettisoned due to old age, so the
Comment 9 Saam Barati 2021-07-21 17:59:27 PDT
Created attachment 433981 [details]
patch
Comment 10 Saam Barati 2021-07-21 18:01:01 PDT
Comment on attachment 433981 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=433981&action=review

> Source/WTF/wtf/Atomics.h:439
> +    static Dependency loadAndFence(T* pointer, T& output)

I could also change the API to be pair based instead of using an out parameter.
Comment 11 Robin Morisset 2021-07-22 10:01:56 PDT
Comment on attachment 433981 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=433981&action=review

r=me

> Source/JavaScriptCore/runtime/SparseArrayValueMap.cpp:-170
> -JSValue SparseArrayEntry::getConcurrently() const

I'm a bit surprised that you had to remove const here.
Could this be fixed by also having a version of loadAndFence that takes a const T* ?

>> Source/WTF/wtf/Atomics.h:439
>> +    static Dependency loadAndFence(T* pointer, T& output)
> 
> I could also change the API to be pair based instead of using an out parameter.

It would be cleaner, but considering that this is going to be on very hot paths, and my lack of trust in LLVM's abilitiy to optimize a struct return type, I'm in favor of sticking with the out param.
Comment 12 Saam Barati 2021-07-22 11:21:34 PDT
Comment on attachment 433981 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=433981&action=review

>> Source/JavaScriptCore/runtime/SparseArrayValueMap.cpp:-170
>> -JSValue SparseArrayEntry::getConcurrently() const
> 
> I'm a bit surprised that you had to remove const here.
> Could this be fixed by also having a version of loadAndFence that takes a const T* ?

Good point. I think it'll be as easy as taking 'const T*' in that function.

>>> Source/WTF/wtf/Atomics.h:439
>>> +    static Dependency loadAndFence(T* pointer, T& output)
>> 
>> I could also change the API to be pair based instead of using an out parameter.
> 
> It would be cleaner, but considering that this is going to be on very hot paths, and my lack of trust in LLVM's abilitiy to optimize a struct return type, I'm in favor of sticking with the out param.

ok, I'll leave it as is for now.
Comment 13 Saam Barati 2021-07-22 12:02:58 PDT
Created attachment 434025 [details]
patch for landing
Comment 14 EWS 2021-07-22 13:05:16 PDT
Committed r280195 (239883@main): <https://commits.webkit.org/239883@main>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 434025 [details].