Bug 154937

Summary: Heap Snapshot should include different Edge types and data (Property, Index, Variable)
Product: WebKit Reporter: Joseph Pecoraro <joepeck>
Component: JavaScriptCoreAssignee: Joseph Pecoraro <joepeck>
Status: RESOLVED FIXED    
Severity: Normal CC: bburg, commit-queue, fpizlo, ggaren, joepeck, keith_miller, kling, mark.lam, msaboff, saam, timothy
Priority: P2    
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
[PATCH] Proposed Fix
joepeck: review-
[PATCH] Proposed Fix
none
[PATCH] Proposed Fix none

Description Joseph Pecoraro 2016-03-02 16:28:43 PST
* SUMMARY
Heap Snapshot should include different Edge types and data (Property, Index, Variable).

This will be useful in visualizing exactly what the retaining edge it.

* Property Edge
- data is the property name string
- JS*Object with inline storage ({x:"x", y:"y"})
- JS*Object with out of line storage (objects with a bunch of properties, dictionary mode structures)
- JS*Object with index > 2^32 => becomes a string property

We should mark the edge from "Object" cell (myObject) to "string" cell ("value") is a Property with name "propertyName" in example:

    var myObject = {};
    myObject.propertyName = "value";


* Index Edge
- data is the uint32_t index
- JS*Object with indexed values (o[0])

We should mark the edge from "Object" cell (myArray) to "string" cell ("value") is an Index with index 100 in example:

    var myArray = [];
    myArray[23] = "value";


* Variable Edge
- data is the variable name string
- JSScope subclasses such as (JSGlobalLexicalEnvironment, JSLexicalEnvironment, JSSegmentedVariableObject)
- a closure variable name in global scope
- a closure variable name in function scope

We should mark the edge from "Function" cell's scope (myFunction) to "string" cell ("value") is a Variable with name "closureVariable" in example:

      var closureVariable = "value";
      function myFunction() {
          console.log(closureVariable);
      }
Comment 1 Joseph Pecoraro 2016-03-02 16:34:42 PST
Created attachment 272702 [details]
[PATCH] Proposed Fix
Comment 2 Geoffrey Garen 2016-03-02 16:53:12 PST
Comment on attachment 272702 [details]
[PATCH] Proposed Fix

View in context: https://bugs.webkit.org/attachment.cgi?id=272702&action=review

> Source/JavaScriptCore/heap/HeapSnapshotBuilder.cpp:97
> +    std::lock_guard<Lock> lock(m_appendingEdgeMutex);

The m_appendingEdgeMutex mutex guards the data in m_edges, so we should call it m_edgesMutex. You would take the mutex even if you were not appending. For example, you would take the mutex if you were reading.

> Source/JavaScriptCore/runtime/JSEnvironmentRecord.cpp:47
> +        ConcurrentJITLocker locker(thisObject->symbolTable()->m_lock);

I don't think you need this because you're not writing to a value that the JIT will read.

> Source/JavaScriptCore/runtime/JSObject.cpp:197
> +        if (mode == VisitChildrenModeNormal)
> +            visitor.appendValues(butterfly->contiguous().data(), butterfly->publicLength());
> +        else
> +            visitor.appendAndSnapshotIndexedProperties(butterfly->contiguous().data(), butterfly->publicLength());
>          break;
>      case ALL_ARRAY_STORAGE_INDEXING_TYPES:
> -        visitor.appendValues(butterfly->arrayStorage()->m_vector, butterfly->arrayStorage()->vectorLength());
> +        if (mode == VisitChildrenModeNormal)
> +            visitor.appendValues(butterfly->arrayStorage()->m_vector, butterfly->arrayStorage()->vectorLength());
> +        else
> +            visitor.appendAndSnapshotIndexedProperties(butterfly->arrayStorage()->m_vector, butterfly->arrayStorage()->vectorLength());

I'm worried about snapshotting intruding into the logic of GC. GC is complex and easy to get wrong, and I'd like to keep it as simple as possible. I'd really prefer not to intermingle "do this or else we'll crash and/or have a security bug" code with "please gather this diagnostic information" code.

Can we do better here at separating the logic of GC from the logic of recording diagnostic edges?

Here's one idea:

Add a new methodTable function: heapSnapshot. At the end of a heap snapshotting GC, iterate all live objects in the heap and call heapSnapshot on them.

This may even make it possible to perform a snapshot without running a GC.

What do you think?

> Source/JavaScriptCore/runtime/JSSegmentedVariableObject.cpp:72
> +        ConcurrentJITLocker locker(thisObject->symbolTable()->m_lock);

I don't think you need this because you're not writing to a value that the JIT will read.
Comment 3 Joseph Pecoraro 2016-03-02 17:43:19 PST
Comment on attachment 272702 [details]
[PATCH] Proposed Fix

View in context: https://bugs.webkit.org/attachment.cgi?id=272702&action=review

>> Source/JavaScriptCore/heap/HeapSnapshotBuilder.cpp:97
>> +    std::lock_guard<Lock> lock(m_appendingEdgeMutex);
> 
> The m_appendingEdgeMutex mutex guards the data in m_edges, so we should call it m_edgesMutex. You would take the mutex even if you were not appending. For example, you would take the mutex if you were reading.

At this point we only have simultaneous access to m_edges during building. When building is complete we don't use a lock to access the list. I don't foresee a need to access edges while building, but perhaps the name "m_buildingEdgesMutex" would be clearer? Then if someone repurposes this lock it would be clear that they should rename it.

>> Source/JavaScriptCore/runtime/JSObject.cpp:197
>> +            visitor.appendAndSnapshotIndexedProperties(butterfly->arrayStorage()->m_vector, butterfly->arrayStorage()->vectorLength());
> 
> I'm worried about snapshotting intruding into the logic of GC. GC is complex and easy to get wrong, and I'd like to keep it as simple as possible. I'd really prefer not to intermingle "do this or else we'll crash and/or have a security bug" code with "please gather this diagnostic information" code.
> 
> Can we do better here at separating the logic of GC from the logic of recording diagnostic edges?
> 
> Here's one idea:
> 
> Add a new methodTable function: heapSnapshot. At the end of a heap snapshotting GC, iterate all live objects in the heap and call heapSnapshot on them.
> 
> This may even make it possible to perform a snapshot without running a GC.
> 
> What do you think?

I think that will work well!

- Still have SlotVisitor::append add a basic edge during snapshotting GC (gets us all internal edges)
- For these few places where we have things we want to better label:
  - SlotVisitor::appendHidden (name TBD) behaves the same as SlotVisitor::append without informing the snapshot builder
  - Method Table function heapSnapshot on live cells to name the hidden edges

My original concern was making sure snapshotting edges had 100% accurate data so I put it in visitChildren. However, since there are only a handful of cases that want extra data it would be easy to move that out of visitChildren. Also, it is an easy pattern to follow.

---

I also have plans to do a micro snapshot without running GC by looking at the heap. However the plans for that are only for node data and not edges. I'm not sure how useful it would be to gather edge data outside of a GC.
Comment 4 Joseph Pecoraro 2016-03-02 19:52:30 PST
Comment on attachment 272702 [details]
[PATCH] Proposed Fix

View in context: https://bugs.webkit.org/attachment.cgi?id=272702&action=review

>> Source/JavaScriptCore/runtime/JSSegmentedVariableObject.cpp:72
>> +        ConcurrentJITLocker locker(thisObject->symbolTable()->m_lock);
> 
> I don't think you need this because you're not writing to a value that the JIT will read.

These locks are only used to get the iterator begin/end. I don't see any way to iterate the symbolTable() without a ConcurrentJITLocker. I may have some questions about this tomorrow.
Comment 5 Joseph Pecoraro 2016-03-02 19:53:13 PST
Created attachment 272722 [details]
[PATCH] Proposed Fix
Comment 6 Joseph Pecoraro 2016-03-02 19:57:18 PST
Comment on attachment 272722 [details]
[PATCH] Proposed Fix

View in context: https://bugs.webkit.org/attachment.cgi?id=272722&action=review

> Source/JavaScriptCore/heap/HeapSnapshotBuilder.cpp:97
> +void HeapSnapshotBuilder::appendPropertyNameEdge(JSCell* from, JSCell* to, UniquedStringImpl* propertyName)
> +{
> +    ASSERT(m_profiler.activeSnapshotBuilder() == this);
> +    ASSERT(to);
> +
> +    std::lock_guard<Lock> lock(m_buildingEdgeMutex);

With gathering extra heap snapshot data (named edges) no longer happening via SlotVisitor it is no longer in parallel, so we could avoid locking in these specialized cases. However it may be worth adding a parallel way to visit live cells.
Comment 7 Geoffrey Garen 2016-03-03 08:59:05 PST
> >> Source/JavaScriptCore/heap/HeapSnapshotBuilder.cpp:97
> >> +    std::lock_guard<Lock> lock(m_appendingEdgeMutex);
> > 
> > The m_appendingEdgeMutex mutex guards the data in m_edges, so we should call it m_edgesMutex. You would take the mutex even if you were not appending. For example, you would take the mutex if you were reading.
> 
> At this point we only have simultaneous access to m_edges during building.

Okeedokee.

> - For these few places where we have things we want to better label:
>   - SlotVisitor::appendHidden (name TBD) behaves the same as
> SlotVisitor::append without informing the snapshot builder
>   - Method Table function heapSnapshot on live cells to name the hidden edges

Is the purpose of appendHidden() just to say "don't label this normally, because I'm going to label it specially later in heapSnapshot()?" If so, can we remove the distinction between append() and appendHidden(), and just have heapSnapshot() override or remove the edge added by append()? This will serve two goals: First, it will further reduce the overlap between the GC algorithm and the snapshotting algorithm; and second, it will ensure accurate accounting, by preventing cases where appendHidden() and heapSnapshot() get out of sync.
Comment 8 Joseph Pecoraro 2016-03-03 23:50:56 PST
> > - For these few places where we have things we want to better label:
> >   - SlotVisitor::appendHidden (name TBD) behaves the same as
> > SlotVisitor::append without informing the snapshot builder
> >   - Method Table function heapSnapshot on live cells to name the hidden edges
> 
> Is the purpose of appendHidden() just to say "don't label this normally,
> because I'm going to label it specially later in heapSnapshot()?"

Yep.

> If so, can
> we remove the distinction between append() and appendHidden(), and just have
> heapSnapshot() override or remove the edge added by append()? This will
> serve two goals: First, it will further reduce the overlap between the GC
> algorithm and the snapshotting algorithm; and second, it will ensure
> accurate accounting, by preventing cases where appendHidden() and
> heapSnapshot() get out of sync.

I can move in that direction. To do it efficiently I think I will want to sort and de/dup the list of edges for serialization. Sorting the edges is something I had already considered doing for other reasons.

Do you want to see an updated patch taking this approach or can this be done in a follow-up?
Comment 9 Joseph Pecoraro 2016-03-03 23:59:05 PST
Created attachment 272839 [details]
[PATCH] Proposed Fix

Rebaselined.
Comment 10 Geoffrey Garen 2016-03-07 14:52:26 PST
Comment on attachment 272839 [details]
[PATCH] Proposed Fix

r=me
Comment 11 WebKit Commit Bot 2016-03-07 15:45:28 PST
Comment on attachment 272839 [details]
[PATCH] Proposed Fix

Clearing flags on attachment: 272839

Committed r197712: <http://trac.webkit.org/changeset/197712>
Comment 12 WebKit Commit Bot 2016-03-07 15:45:33 PST
All reviewed patches have been landed.  Closing bug.