CodeBlocks get traced during garbage collection when their owner Executable* is found to be live and then traced. But CodeBlocks may also be traced because they are owned by DFGCodeBlocks. The latter is necessary because a DFG CodeBlock may be jettisoned - removed from its owner Executable* - because of recompilation. In fact, there may be multiple DFG CodeBlocks that had once belonged to the same Executable* that are no longer owned by it, but are owned by DFGCodeBlocks. This might happen if first the Executable's DFG CodeBlock is thrown out due to recompilation (because its speculations were found to be ineffective), but the CodeBlock was kept alive because there were still functions executing on the JS stack that were using that CodeBlock. Then another DFG CodeBlock may be compiled, and again thrown out, but still kept because of stack references.
This mechanism is implemented by always associating all DFG CodeBlocks with the DFGCodeBlocks data structure, which is part of the Heap. This way the Heap always knows what the set of DFG CodeBlocks (i.e. CodeBlocks that may be jettisoned and may need to have references to them from the stack tracked). Since it tracks both jettisoned and non-jettisoned CodeBlocks, some care must be taken to treat these differently in some cases.
If a CodeBlock is jettisoned but still has references to it from the stack, DFGCodeBlocks must trace it - and in fact it does so correctly. Unfortunately, it fails to check whether the CodeBlock was jettisoned. Because it has references to non-jettisoned CodeBlocks as well, it may end up tracing a CodeBlock that is also going to be traced by virtue of being owned by an Executable that was found to be live.
Our tracing methods (visitChildren, visitAggregate, and the like) are designed to be idempotent: if you call them more than once during GC, then the effect (modulo wasted CPU cycles) is the same as if you had executed them once. So in most cases, this bug would be harmless.
Except that the CodeBlock::visitAggregate method is actually only idempotent if called serially. If the same CodeBlock instance's visitAggregate method is called concurrently from multiple GC threads - something that can happen in a parallel GC - then the method may cause bizarre state corruption.
The state corruption arises from the one place where CodeBlock::visitAggregate may cause state changes other than marking: namely, its calls to ValueProfile::computeUpdatedPrediction(). This method will read a JSValue, do some things to it, and then store an empty JSValue back. On 64-bit, this turns out to be racy and imprecise if computeUpdatedPrediction() is called concurrently. But on 32-bit, you can get a crash due to JSValue tearing: as one thread bangs on the JSValue, the other thread may see the tag of the old value and payload of the new one, or vice-versa. For example the method might be lead to believe it has a heap pointer (based on the tag) when the payload actually contains some integer.
Hence this is a benign bug on 64-bit but a potential show-stopper on 32-bit.
And the fix is super simple: just have DFGCodeBlocks only call CodeBlock::visitAggregate() for those CodeBlocks that are known to be jettisoned and live, rather than calling that method for all live CodeBlocks.
Created attachment 129815 [details]
Comment on attachment 129815 [details]
I just realized that this fix is inadequate. The reason why we were scanning the CodeBlocks even if they were not jettisoned is that they might become jettisoned later in the GC!
The correct fix is to have a separate mechanism for ensuring that a CodeBlock is only scanned once. I will work on this...
Created attachment 129825 [details]
Landed in http://trac.webkit.org/changeset/109519
I wonder if a better long-term approach would be something along the lines of our unconditional finalizers: anything that needs serial post-GC fixup would register an unconditional serial finalizer to do the fixup.
(In reply to comment #6)
> I wonder if a better long-term approach would be something along the lines of our unconditional finalizers: anything that needs serial post-GC fixup would register an unconditional serial finalizer to do the fixup.
I'm not sure how that would work. The problem is that the outcome of the GC's trace is affected by what objects are marked, and CodeBlock::visitAggregate will mark objects. So if DFGCodeBlocks knows that a CodeBlock needs to do marking, then it should tell that CodeBlock to do marking.
But that same CodeBlock may at any time during marking be found to be live because its ownerExecutable is live, in which case it will also end up doing marking.
Because all of this is part of a fixpoint, I can't see an easy way to impose any sort of phasing.