154937 – Heap Snapshot should include different Edge types and data (Property, Index, Variable)

RESOLVED FIXED 154937

Heap Snapshot should include different Edge types and data (Property, Index, Variable)

https://bugs.webkit.org/show_bug.cgi?id=154937

Summary Heap Snapshot should include different Edge types and data (Property, Index, ...

Joseph Pecoraro

Reported 2016-03-02 16:28:43 PST

* SUMMARY Heap Snapshot should include different Edge types and data (Property, Index, Variable). This will be useful in visualizing exactly what the retaining edge it. * Property Edge - data is the property name string - JS*Object with inline storage ({x:"x", y:"y"}) - JS*Object with out of line storage (objects with a bunch of properties, dictionary mode structures) - JS*Object with index > 2^32 => becomes a string property We should mark the edge from "Object" cell (myObject) to "string" cell ("value") is a Property with name "propertyName" in example: var myObject = {}; myObject.propertyName = "value"; * Index Edge - data is the uint32_t index - JS*Object with indexed values (o[0]) We should mark the edge from "Object" cell (myArray) to "string" cell ("value") is an Index with index 100 in example: var myArray = []; myArray[23] = "value"; * Variable Edge - data is the variable name string - JSScope subclasses such as (JSGlobalLexicalEnvironment, JSLexicalEnvironment, JSSegmentedVariableObject) - a closure variable name in global scope - a closure variable name in function scope We should mark the edge from "Function" cell's scope (myFunction) to "string" cell ("value") is a Variable with name "closureVariable" in example: var closureVariable = "value"; function myFunction() { console.log(closureVariable); }

Attachments
[PATCH] Proposed Fix (28.32 KB, patch) 2016-03-02 16:34 PST, Joseph Pecoraro	joepeck: review-	Details Formatted Diff Diff
[PATCH] Proposed Fix (36.90 KB, patch) 2016-03-02 19:53 PST, Joseph Pecoraro	no flags	Details Formatted Diff Diff
[PATCH] Proposed Fix (37.34 KB, patch) 2016-03-03 23:59 PST, Joseph Pecoraro	no flags	Details Formatted Diff Diff
Show Obsolete (2) View All Add attachment proposed patch, testcase, etc.

Joseph Pecoraro

Comment 1 2016-03-02 16:34:42 PST

Created attachment 272702 [details] [PATCH] Proposed Fix

Geoffrey Garen

Comment 2 2016-03-02 16:53:12 PST

Comment on attachment 272702 [details] [PATCH] Proposed Fix View in context: https://bugs.webkit.org/attachment.cgi?id=272702&action=review > Source/JavaScriptCore/heap/HeapSnapshotBuilder.cpp:97 > + std::lock_guard<Lock> lock(m_appendingEdgeMutex); The m_appendingEdgeMutex mutex guards the data in m_edges, so we should call it m_edgesMutex. You would take the mutex even if you were not appending. For example, you would take the mutex if you were reading. > Source/JavaScriptCore/runtime/JSEnvironmentRecord.cpp:47 > + ConcurrentJITLocker locker(thisObject->symbolTable()->m_lock); I don't think you need this because you're not writing to a value that the JIT will read. > Source/JavaScriptCore/runtime/JSObject.cpp:197 > + if (mode == VisitChildrenModeNormal) > + visitor.appendValues(butterfly->contiguous().data(), butterfly->publicLength()); > + else > + visitor.appendAndSnapshotIndexedProperties(butterfly->contiguous().data(), butterfly->publicLength()); > break; > case ALL_ARRAY_STORAGE_INDEXING_TYPES: > - visitor.appendValues(butterfly->arrayStorage()->m_vector, butterfly->arrayStorage()->vectorLength()); > + if (mode == VisitChildrenModeNormal) > + visitor.appendValues(butterfly->arrayStorage()->m_vector, butterfly->arrayStorage()->vectorLength()); > + else > + visitor.appendAndSnapshotIndexedProperties(butterfly->arrayStorage()->m_vector, butterfly->arrayStorage()->vectorLength()); I'm worried about snapshotting intruding into the logic of GC. GC is complex and easy to get wrong, and I'd like to keep it as simple as possible. I'd really prefer not to intermingle "do this or else we'll crash and/or have a security bug" code with "please gather this diagnostic information" code. Can we do better here at separating the logic of GC from the logic of recording diagnostic edges? Here's one idea: Add a new methodTable function: heapSnapshot. At the end of a heap snapshotting GC, iterate all live objects in the heap and call heapSnapshot on them. This may even make it possible to perform a snapshot without running a GC. What do you think? > Source/JavaScriptCore/runtime/JSSegmentedVariableObject.cpp:72 > + ConcurrentJITLocker locker(thisObject->symbolTable()->m_lock); I don't think you need this because you're not writing to a value that the JIT will read.

Joseph Pecoraro

Comment 3 2016-03-02 17:43:19 PST

Comment on attachment 272702 [details] [PATCH] Proposed Fix View in context: https://bugs.webkit.org/attachment.cgi?id=272702&action=review >> Source/JavaScriptCore/heap/HeapSnapshotBuilder.cpp:97 >> + std::lock_guard<Lock> lock(m_appendingEdgeMutex); > > The m_appendingEdgeMutex mutex guards the data in m_edges, so we should call it m_edgesMutex. You would take the mutex even if you were not appending. For example, you would take the mutex if you were reading. At this point we only have simultaneous access to m_edges during building. When building is complete we don't use a lock to access the list. I don't foresee a need to access edges while building, but perhaps the name "m_buildingEdgesMutex" would be clearer? Then if someone repurposes this lock it would be clear that they should rename it. >> Source/JavaScriptCore/runtime/JSObject.cpp:197 >> + visitor.appendAndSnapshotIndexedProperties(butterfly->arrayStorage()->m_vector, butterfly->arrayStorage()->vectorLength()); > > I'm worried about snapshotting intruding into the logic of GC. GC is complex and easy to get wrong, and I'd like to keep it as simple as possible. I'd really prefer not to intermingle "do this or else we'll crash and/or have a security bug" code with "please gather this diagnostic information" code. > > Can we do better here at separating the logic of GC from the logic of recording diagnostic edges? > > Here's one idea: > > Add a new methodTable function: heapSnapshot. At the end of a heap snapshotting GC, iterate all live objects in the heap and call heapSnapshot on them. > > This may even make it possible to perform a snapshot without running a GC. > > What do you think? I think that will work well! - Still have SlotVisitor::append add a basic edge during snapshotting GC (gets us all internal edges) - For these few places where we have things we want to better label: - SlotVisitor::appendHidden (name TBD) behaves the same as SlotVisitor::append without informing the snapshot builder - Method Table function heapSnapshot on live cells to name the hidden edges My original concern was making sure snapshotting edges had 100% accurate data so I put it in visitChildren. However, since there are only a handful of cases that want extra data it would be easy to move that out of visitChildren. Also, it is an easy pattern to follow. --- I also have plans to do a micro snapshot without running GC by looking at the heap. However the plans for that are only for node data and not edges. I'm not sure how useful it would be to gather edge data outside of a GC.

Joseph Pecoraro

Comment 4 2016-03-02 19:52:30 PST

Comment on attachment 272702 [details] [PATCH] Proposed Fix View in context: https://bugs.webkit.org/attachment.cgi?id=272702&action=review >> Source/JavaScriptCore/runtime/JSSegmentedVariableObject.cpp:72 >> + ConcurrentJITLocker locker(thisObject->symbolTable()->m_lock); > > I don't think you need this because you're not writing to a value that the JIT will read. These locks are only used to get the iterator begin/end. I don't see any way to iterate the symbolTable() without a ConcurrentJITLocker. I may have some questions about this tomorrow.

Joseph Pecoraro

Comment 5 2016-03-02 19:53:13 PST

Created attachment 272722 [details] [PATCH] Proposed Fix

Joseph Pecoraro

Comment 6 2016-03-02 19:57:18 PST

Comment on attachment 272722 [details] [PATCH] Proposed Fix View in context: https://bugs.webkit.org/attachment.cgi?id=272722&action=review > Source/JavaScriptCore/heap/HeapSnapshotBuilder.cpp:97 > +void HeapSnapshotBuilder::appendPropertyNameEdge(JSCell* from, JSCell* to, UniquedStringImpl* propertyName) > +{ > + ASSERT(m_profiler.activeSnapshotBuilder() == this); > + ASSERT(to); > + > + std::lock_guard<Lock> lock(m_buildingEdgeMutex); With gathering extra heap snapshot data (named edges) no longer happening via SlotVisitor it is no longer in parallel, so we could avoid locking in these specialized cases. However it may be worth adding a parallel way to visit live cells.

Geoffrey Garen

Comment 7 2016-03-03 08:59:05 PST

> >> Source/JavaScriptCore/heap/HeapSnapshotBuilder.cpp:97 > >> + std::lock_guard<Lock> lock(m_appendingEdgeMutex); > > > > The m_appendingEdgeMutex mutex guards the data in m_edges, so we should call it m_edgesMutex. You would take the mutex even if you were not appending. For example, you would take the mutex if you were reading. > > At this point we only have simultaneous access to m_edges during building. Okeedokee. > - For these few places where we have things we want to better label: > - SlotVisitor::appendHidden (name TBD) behaves the same as > SlotVisitor::append without informing the snapshot builder > - Method Table function heapSnapshot on live cells to name the hidden edges Is the purpose of appendHidden() just to say "don't label this normally, because I'm going to label it specially later in heapSnapshot()?" If so, can we remove the distinction between append() and appendHidden(), and just have heapSnapshot() override or remove the edge added by append()? This will serve two goals: First, it will further reduce the overlap between the GC algorithm and the snapshotting algorithm; and second, it will ensure accurate accounting, by preventing cases where appendHidden() and heapSnapshot() get out of sync.

Joseph Pecoraro

Comment 8 2016-03-03 23:50:56 PST

> > - For these few places where we have things we want to better label: > > - SlotVisitor::appendHidden (name TBD) behaves the same as > > SlotVisitor::append without informing the snapshot builder > > - Method Table function heapSnapshot on live cells to name the hidden edges > > Is the purpose of appendHidden() just to say "don't label this normally, > because I'm going to label it specially later in heapSnapshot()?" Yep. > If so, can > we remove the distinction between append() and appendHidden(), and just have > heapSnapshot() override or remove the edge added by append()? This will > serve two goals: First, it will further reduce the overlap between the GC > algorithm and the snapshotting algorithm; and second, it will ensure > accurate accounting, by preventing cases where appendHidden() and > heapSnapshot() get out of sync. I can move in that direction. To do it efficiently I think I will want to sort and de/dup the list of edges for serialization. Sorting the edges is something I had already considered doing for other reasons. Do you want to see an updated patch taking this approach or can this be done in a follow-up?

Joseph Pecoraro

Comment 9 2016-03-03 23:59:05 PST

Created attachment 272839 [details] [PATCH] Proposed Fix Rebaselined.

Geoffrey Garen

Comment 10 2016-03-07 14:52:26 PST

Comment on attachment 272839 [details] [PATCH] Proposed Fix r=me

WebKit Commit Bot

Comment 11 2016-03-07 15:45:28 PST

Comment on attachment 272839 [details] [PATCH] Proposed Fix Clearing flags on attachment: 272839 Committed r197712: <http://trac.webkit.org/changeset/197712>

WebKit Commit Bot

Comment 12 2016-03-07 15:45:33 PST

All reviewed patches have been landed. Closing bug.

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version WebKit Nightly Build

Hardware Unspecified

OS Unspecified

Product WebKit

Component JavaScriptCore

Assignee

Joseph Pecoraro

Reported

2016-03-02 16:28 PST

Modified

2016-03-07 15:45 PST History

CC List

11 users Show

URL

Keywords

Depends on

Blocks