Bug 157324 - References from code to Structures should be stronger than weak
Summary: References from code to Structures should be stronger than weak
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: WebKit Nightly Build
Hardware: All All
: P2 Normal
Assignee: Filip Pizlo
URL:
Keywords:
: 128072 (view as bug list)
Depends on:
Blocks: 157334
  Show dependency treegraph
 
Reported: 2016-05-03 14:02 PDT by Filip Pizlo
Modified: 2016-05-19 14:07 PDT (History)
6 users (show)

See Also:


Attachments
possible patch (7.72 KB, patch)
2016-05-03 14:34 PDT, Filip Pizlo
no flags Details | Formatted Diff | Diff
the patch (13.53 KB, patch)
2016-05-03 16:34 PDT, Filip Pizlo
mark.lam: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Filip Pizlo 2016-05-03 14:02:09 PDT
If code refers to a Structure and the Structure dies, then currently we'll kill the code.  This makes sense because the Structure could be the only thing left referring to some global object or prototype.

But this also causes unnecessary churn.  Sometimes there will be a structure that we just haven't really done anything with recently and so it appears dead.  The approach we use elsewhere in our type inference is that the type that the code uses is general enough to handle every past execution.  Having the GC clear code when some Structure it uses dies means that we forget that the code used that Structure.  We'll either assume that the code is more monomorphic than it really is (because after GC we patch in some other structure but not the deleted one, so it looks like we only ever saw the new structure), or we'll assume that it's crazier than it really is (because we'll remember that there had been some structure that caused deletion, so we'll assume that deletions might happen in the future, so we'll use a fully dynamic IC).

We should have a more nuanced policy: if it's cheap to mark a dead Structure then we should mark it just so that all of the code that refers to it remembers that there had been this exact Structure in the past.  If the code often goes through different Structures then we already have great mechanisms to realize that the code is nutty (namely, the PolymorphicAccess size limit).  But if the code just does this a handful of times then remembering this old Structure is probably net good:

- It obeys the "handle all past executions" law.
- It preserves the history of the property access, allowing a precise measure of its past polymorphism.
- It makes the code ready to run fast if the user decides to use that Structure again. Marking the Structure means it will stay in whatever property transition tables it was in, so if the program does the same thing it did in the past, it will get this old Structure.

Right now it looks like this is a progression in gbemu and it makes gbemu perform more deterministically. I still need to do more experiments to validate this.
Comment 1 Filip Pizlo 2016-05-03 14:34:21 PDT
Created attachment 278026 [details]
possible patch
Comment 2 Filip Pizlo 2016-05-03 16:34:09 PDT
Created attachment 278044 [details]
the patch
Comment 3 Mark Lam 2016-05-03 17:31:20 PDT
Comment on attachment 278044 [details]
the patch

View in context: https://bugs.webkit.org/attachment.cgi?id=278044&action=review

r=me with comments.

> Source/JavaScriptCore/ChangeLog:23
> +        This change introduces a more nuanced policy a more nuanced policy: if it's cheap to mark a

duplicate "a more nuanced policy".

> Source/JavaScriptCore/bytecode/PolymorphicAccess.cpp:562
> +    if (m_structure)
> +        result &= m_structure->markIfCheap(visitor);

If markIfCheap() returns true here, does it still make sense to continue below?  We might end up appending m_structure twice, but I suppose that is harmless in terms of correctness.

Out of curiosity, can Heap::isMarked(m_structure->previousID()) ever be false if m_structure->markIfCheap() is true?

> Source/JavaScriptCore/runtime/Structure.cpp:1141
> +    return (!m_globalObject || Heap::isMarked(m_globalObject.get()))
> +        && (!storedPrototypeObject() || Heap::isMarked(storedPrototypeObject()));

Can you clarify why this is considered to be "cheap"?  Is it because these are the conditions that do not require traversing the graphs of m_globalObject and storedPrototypeObject()?  I see.  So, "cheap" here means, the GC doesn't have to do much work for marking this structure, yes?
Comment 4 Filip Pizlo 2016-05-03 19:16:03 PDT
(In reply to comment #3)
> Comment on attachment 278044 [details]
> the patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=278044&action=review
> 
> r=me with comments.
> 
> > Source/JavaScriptCore/ChangeLog:23
> > +        This change introduces a more nuanced policy a more nuanced policy: if it's cheap to mark a
> 
> duplicate "a more nuanced policy".

Fixed.

> 
> > Source/JavaScriptCore/bytecode/PolymorphicAccess.cpp:562
> > +    if (m_structure)
> > +        result &= m_structure->markIfCheap(visitor);
> 
> If markIfCheap() returns true here, does it still make sense to continue
> below?  We might end up appending m_structure twice, but I suppose that is
> harmless in terms of correctness.

You're right!

I wanted to structure the code according to the intuition behind it: first handle all transitions that don't care about m_type and then switch on m_type to handle the per-type things.  That's the structure we use elsewhere in AccessCase, so I felt like it would be helpful for others to be consistent.

> 
> Out of curiosity, can Heap::isMarked(m_structure->previousID()) ever be
> false if m_structure->markIfCheap() is true?

Sure!  We could have just marked the global object and prototype, but have yet to trace beyond this and mark either m_structure or m_structure->previousID().

The propagateTransitions() code could be called at any stage in tracing and we must assume that objects could be in any state so long as the stack is black and the tri-color invariant holds.  As a reminder, the tri-color theory of marking says that: white = not marked, grey = marked but not yet visited, black = marked and visited. The invariant is that black objects cannot refer to white ones, because at the moment that we visit an object, we mark anything it refers to. Marking changes white objects to grey, but it doesn't change the color of objects that were already grey or black. In our GC, isMarked() is true when the object is not white. A grey object can be distinguished from a black one based on whether it's in the worklist. If it's not in the worklist anymore, then we have visited it, so it's black. This information can also be computed, under the right circumstances, using a combination of the mark bit and data in the object's header. A useful property of the tri-color invariant is that it's an exact property of our GC: just as the tri-color invariant doesn't prohibit grey or white objects from pointing to objects of any color, so too must our GC always assume that so long as the job of marking is not yet complete, it's possible for white or grey objects to refer to objects of any color.

So, m_structure and m_structure->previousID() could both be white, in which case m_structure->previousID() would return false. markIfCheap() would return true if m_structure->globalObject and m_structure->storedPrototype are both grey or black. That's totally possible, since white objects are allowed to reference grey or black objects. It doesn't violate the tri-color invariant, therefore it is possible.

> 
> > Source/JavaScriptCore/runtime/Structure.cpp:1141
> > +    return (!m_globalObject || Heap::isMarked(m_globalObject.get()))
> > +        && (!storedPrototypeObject() || Heap::isMarked(storedPrototypeObject()));
> 
> Can you clarify why this is considered to be "cheap"?  Is it because these
> are the conditions that do not require traversing the graphs of
> m_globalObject and storedPrototypeObject()?  I see.  So, "cheap" here means,
> the GC doesn't have to do much work for marking this structure, yes?

I'll write a comment about this.  Here's a detailed explanation.

A structure by itself is not an expensive thing to have in the heap. We control how big it is, and we're quite good at ensuring that those structures that we cache in this code are unlikely to be large. Structures are a VM-internal data structure that the user cannot directly manipulate, so even if we keep alive one that isn't of use to the user, we can be sure that this "leak" is bounded in size: we control the bound because we control the object. But, a structure strongly references its objects' global object and prototype. The global object and prototype can be arbitrarily large, and they may refer to an arbitrarily large user-constructed graph of other arbitrarily large objects.

Hence, if a VM-internal optimization like inline caching marks a structure even though there are no objects in the heap that have that structure as their structure, then the risk is that we will keep a global object or prototype alive too long. Ordinarily a structure would keep a global object or prototype alive because the structure is referenced from some object O and that object has G and P as their global object and prototype, respectively. We're semantically required to keep G and P alive if O is alive so the fact that this happens via some structure that acts as an intermediary is just fine.

But if there does not exist any object that uses the structure and the VM keeps it alive due to caching, then it's possible that this will keep an otherwise-dead global object and prototype alive.  That's bad, since this means that the VM's optimization may make a program that runs fine (because the ginormous global object got deallocated on next GC) into one that doesn't (it swaps or jetsams because the VM kept alive a ginormous global object across many GCs because the VM internally cached some otherwise-dead structure).

I've argued that structures by themselves are cheap to mark because the structure object itself is small and completely under our control. I've also argued that structures may be extremely expensive to mark because the structure may refer to a ginormous global object or prototype, and the structure may be the last reference to those ginormous things - so marking the structure would be uniquely responsible for increasing our memory footprint by a lot.

Hence, being "cheap" to mark means: marking this structure will definitely only add a small and bounded amount of bytes to the heap size.  This is true if the global object and prototype are already marked, since in that case, marking the structure will only increase footprint by the size of the structure itself, which has a small and bounded size.
Comment 5 Filip Pizlo 2016-05-03 19:23:11 PDT
Landed in http://trac.webkit.org/changeset/200405
Comment 6 Filip Pizlo 2016-05-19 14:07:05 PDT
*** Bug 128072 has been marked as a duplicate of this bug. ***