Bug 154192

Summary: bmalloc: Unify VMHeap and Heap LargeObjects free lists to reduce fragmentation
Product: WebKit
Reporter: Michael Saboff <msaboff>
Component: JavaScriptCore
Assignee: Michael Saboff <msaboff>
Status: RESOLVED FIXED
Severity: Normal
CC: commit-queue, ggaren, mark.lam
Priority: P2
Version: Other
Hardware: Unspecified
OS: Unspecified
Bug Depends on: 154113
Bug Blocks:
Attachments:
Patch (ggaren: review-)
Updated patch (ggaren: review-)
Patch with suggested changes and updates (ggaren: review-)
Changed based on feedback (ggaren: review+)

Description Michael Saboff 2016-02-12 13:42:31 PST
From https://bugs.webkit.org/show_bug.cgi?id=154113#c5:

Let's establish an invariant that the Heap never splits an object returned by the VMHeap. Our current splitting behavior for aligned allocation needlessly commits some dirty memory. Also, extra splitting might create surprising merge opportunities when moving objects between heaps, causing our fragmentation ASSERTs to fire.

I think we can take the "split and allocate" code in Heap and move it into helper functions. Then, both Heap and VMHeap can have identical "split and allocate" logic, and Heap can just immediately return whatever VMHeap returns, without worrying about further splits. It looks like the inputs to the function are always a large object and a segregated free list.
Comment 1 Michael Saboff 2016-02-12 22:47:32 PST
Created attachment 271273 [details]
Patch

I don't have a measure of how much memory this might save.
Comment 2 WebKit Commit Bot 2016-02-12 22:49:24 PST
Attachment 271273 [details] did not pass style-queue:


ERROR: Source/bmalloc/bmalloc/Heap.cpp:364:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/bmalloc/bmalloc/Heap.cpp:392:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/bmalloc/bmalloc/VMHeap.h:163:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/bmalloc/bmalloc/VMHeap.h:181:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
Total errors found: 4 in 3 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 3 Geoffrey Garen 2016-02-13 14:51:53 PST
Comment on attachment 271273 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=271273&action=review

> Source/bmalloc/bmalloc/VMHeap.h:74
> +inline LargeObject& splitAndAllocate(LargeObject& largeObject, SegregatedFreeList& freeList, size_t size, DoAllocationFunctor doAllocationFunctor)

You can simplify these functions and make them truer to their names if you change them to call setFree(false) inline. This will take care of silencing any insertion ASSERTs, while also removing some code from Heap.cpp and eliminating the need for a DoAllocationFunctor template parameter.

Another good reason to guarantee that the returned object is not free is that it would be nice for the call to vmAllocatePhysicalPagesSloppy() to drop the heap lock -- but we can't do that unless we mark our region as unmergeable first.

> Source/bmalloc/bmalloc/VMHeap.h:180
> +    largeObject = splitAndAllocate(largeObject, m_largeObjects, alignment, size,

I think the new behavior here is incompatible with the behavior of vmAllocatePhysicalPagesSloppy(), which comments that "Allocation must proceed left-to-right". If the passed-in alignment requirement is less than a page, we might split off a left overhang without committing its physical pages, creating a garbage memory problem.

Deallocation of physical pages must not deallocate a shared page when it overhangs left or right, and allocation of physical pages must allocate a shared page if it overhangs left or right. The current code assumes that an allocation overhang to the left is impossible, and as a result it optimizes out up to 16 calls to vmAllocatePhysicalPages() when allocating 1kB large objects on a 16kB page system.

Please change vmAllocatePhysicalPagesSloppy() to round begin *down* instead of *up*, and then test MallocBench with the page size artificially set to 16kB. If performance is OK, then we can change vmAllocatePhysicalPagesSloppy() to be conservative, and this patch will work.

Otherwise, we need to think a little harder about how to make the patch work.
Comment 4 Michael Saboff 2016-02-17 22:26:52 PST
After several discussions with Geoff and experiments with various changes to improve fragmentation, we found that unifying the VMHeap and Heap free lists into one free list stored in the Heap reduces fragmentation.  As part of this change, I'm going to repurpose the Owner value to describe the state of pages in a LargeObject, with three values: Virtual means the pages are decommitted, Physical means the pages have been committed, and Mixed means the LargeObject contains a mix of the two.
Comment 5 Michael Saboff 2016-02-17 22:56:45 PST
Created attachment 271637 [details]
Updated patch
Comment 6 Geoffrey Garen 2016-02-18 10:56:46 PST
Comment on attachment 271637 [details]
Updated patch

View in context: https://bugs.webkit.org/attachment.cgi?id=271637&action=review

> Source/bmalloc/bmalloc/FreeList.cpp:40
> +LargeObject FreeList::takeGreedy()

Let's rename this to takeNonVirtual or takeNonVirtualGreedy to indicate the condition you've added.

But see below about removing the virtual check.

> Source/bmalloc/bmalloc/Heap.cpp:361
> +        vmAllocatePhysicalPagesSloppy(largeObject.begin(), largeObject.size());

You should add a comment explaining that we commit before splitting in order to avoid pathological split/merge commit/decommit churn.

> Source/bmalloc/bmalloc/Heap.cpp:413
> +void Heap::deallocateLargeObjectMemory(std::unique_lock<StaticMutex>& lock, LargeObject largeObject)

Let's call this deallocatePhysicalPages for consistency.

But see below for doing something different.

> Source/bmalloc/bmalloc/Heap.cpp:419
> +        // If we couldn't merge with our neighbors before because they were in the
> +        // VM heap, we can merge with them now.

This comment is wrong now. You should remove it. The comment above is why we merge.

> Source/bmalloc/bmalloc/Heap.cpp:434
> +    m_largeObjects.insert(largeObject);

It's a little gross that we're doing O(N^2) work here.

Also, the relationship between Heap and VMHeap is kind of muddled now, since the VMHeap claims to allocate large objects, but really it only allocates large chunks, and meanwhile the Heap now holds purely virtual objects.

Also, the fact that we don't prefer Physical and Mixed objects over Virtual objects at allocation time means that we'll sometimes commit physical pages needlessly.

I wasn't sure how to fix this yesterday, but now I am:

Maintain the SegregatedFreeList in the VMHeap, and move objects from Heap to VMHeap when we scavenge them. (Heap will still be responsible for committing physical pages to large object allocations if needed.)

You can do this by making VMState a two-bit field instead of an enum, with 1 bit for Virtual and 1 bit for Physical (and both bits implying Mixed). Merging states will just OR. The 0 state is invalid. Instead of an Owner, a SegregatedFreeList will have an expected bit value for the Physical bit. The Heap's will be 1 and the VMHeap's will be 0. Any free list entry is invalid if the Physical bit is wrong. Allocation must vmAllocatePhysicalPages if the Virtual bit is set. 

This solves our O(N^2) problem and adds some clarity. It also makes it so that allocations prefer Physical and Mixed objects over Virtual objects, which is a nice side benefit.

> Source/bmalloc/bmalloc/Heap.h:134
> +inline LargeObject& Heap::splitAndAllocate(LargeObject& largeObject, SegregatedFreeList& freeList, size_t size)

These functions should go in the .cpp file since they're private and only called from the .cpp file.

> Source/bmalloc/bmalloc/Heap.h:156
> +inline LargeObject& Heap::splitAndAllocate(LargeObject& largeObject, SegregatedFreeList& freeList, size_t alignment, size_t size)

Ditto.

> Source/bmalloc/bmalloc/VMState.h:37
> +inline VMState mergeVMStates(VMState vmState1, VMState vmState2)

You can just call this merge. The argument types indicate that you're merging VM states.
Comment 7 Michael Saboff 2016-02-18 12:58:29 PST
(In reply to comment #6)

> I wasn't sure how to fix this yesterday, but now I am:
> 
> Maintain the SegregatedFreeList in the VMHeap, and move objects from Heap to
> VMHeap when we scavenge them. (Heap will still be responsible for committing
> physical pages to large object allocations if needed.)
> 
> You can do this by making VMState a two-bit field instead of an enum, with 1
> bit for Virtual and 1 bit for Physical (and both bits implying Mixed).
> Merging states will just OR. The 0 state is invalid. Instead of an Owner, a
> SegregatedFreeList will have an expected bit value for the Physical bit. The
> Heap's will be 1 and the VMHeap's will be 0. Any free list entry is invalid
> if the Physical bit is wrong. Allocation must vmAllocatePhysicalPages if the
> Virtual bit is set. 
> 
> This solves our O(N^2) problem and adds some clarity. It also makes it so
> that allocations prefer Physical and Mixed objects over Virtual objects,
> which is a nice side benefit.

I have coded this up, but I have one question about the semantics of merging.  When we go to merge a largeObject that has the Physical bit set, can we merge it with a neighbor that only has the Virtual bit set?  If so, we will effectively take it from the VMHeap free list by making the entry there invalid.  If we don't merge with a Virtual-only object, then we end up with only Physical objects in the Heap free list.
Comment 8 Geoffrey Garen 2016-02-18 16:12:30 PST
> When we go to merge a largeObject that has the Physical bit set,
> can we merge it with a neighbor that only has the Virtual bit set?

Yes, we must do this or we reintroduce our original bug.

To be clear, this requirement trumps all others, so if the algorithm I suggested contradicts this requirement, I made a mistake.

> If so,
> we will effectively take it from the VMHeap free list, by making the entry
> there invalid.

That's OK.

It's OK for a free to the Heap to make an entry in the VMHeap invalid (and vice versa). As long as the entry ends up valid and in some free list, we're cool.

It's not OK (a leak) if we ever free to the Heap in the Virtual state or free to the VMHeap in the Physical or Mixed state -- then our only record of a memory location is invalid, and we leak it.

Can you think of a case where this might happen?

We should ASSERT that objects are in the Virtual state when they enter the VMHeap free list, and vice versa when they enter the Heap free list.

We can get close to this bad state in the VMHeap deallocation loop, since it might put an object in the Virtual state, release the lock, deallocate the range, acquire the lock, and then merge into the Mixed state. *But*, it will also go around the loop again and end up in the Virtual state before inserting into the free list. 

So, I don't see a problem, but maybe I missed something and you see the problem, and at least we should ASSERT that we didn't miss anything.

You said by email:

> A large object with the Physical bit clear can only merge with another Virtual only object.  This would only happen for an object in the VMHeap.

It is *legal* for anything to merge with anything. This is our most important invariant, and the only thing that solves our bug.

But I think you're pointing out that we expect Virtual+Mixed and Virtual+Physical merges never to happen in practice?

Of course Virtual+Physical merges happen. The Heap does them and produces Mixed objects.

But Virtual+Mixed merges can also happen, in the VMHeap edge case I described above. And that's OK.

The one case that should never happen is that 0 should never merge with anything, since 0 is an invalid state.
Comment 9 Michael Saboff 2016-02-18 23:18:40 PST
Created attachment 271737 [details]
Patch with suggested changes and updates
Comment 10 Geoffrey Garen 2016-02-19 08:50:24 PST
Comment on attachment 271737 [details]
Patch with suggested changes and updates

View in context: https://bugs.webkit.org/attachment.cgi?id=271737&action=review

> Source/bmalloc/bmalloc/FreeList.cpp:40
> -LargeObject FreeList::takeGreedy(Owner owner)
> +LargeObject FreeList::takeGreedy()

This doesn't look right. take*() needs to accept a boolean expected value for the VMState physical bit, and test ((vmState & VMState::Physical) == expected). Let's call the boolean hasPhysical or expectedHasPhysical.

> Source/bmalloc/bmalloc/FreeList.cpp:56
> -LargeObject FreeList::take(Owner owner, size_t size)
> +LargeObject FreeList::take(size_t size)

Ditto.

> Source/bmalloc/bmalloc/FreeList.cpp:83
> -LargeObject FreeList::take(Owner owner, size_t alignment, size_t size, size_t unalignedSize)
> +LargeObject FreeList::take(size_t alignment, size_t size, size_t unalignedSize)

Ditto.

> Source/bmalloc/bmalloc/FreeList.cpp:116
> -void FreeList::removeInvalidAndDuplicateEntries(Owner owner)
> +void FreeList::removeInvalidAndDuplicateEntries()

Ditto.

> Source/bmalloc/bmalloc/LargeObject.h:127
> +    return m_beginTag->prev()->isFree()
> +        && (physicalState.hasPhysical() || (!m_beginTag->prev()->vmState().hasPhysical()));

This is wrong. It reintroduces the bug.

> Source/bmalloc/bmalloc/LargeObject.h:133
> +    return m_endTag->next()->isFree()
> +        && (physicalState.hasPhysical() || (!m_endTag->next()->vmState().hasPhysical()));

This is wrong. It reintroduces the bug.

> Source/bmalloc/bmalloc/LargeObject.h:-179
> -    if (m_beginTag->owner() != expectedOwner)
> -        return false;

isValidAndFree needs to test the hasPhysical boolean against VMState.

> Source/bmalloc/bmalloc/LargeObject.h:192
> +    // If this object has Physical pages, it can merge with any other object.
> +    // Otherwise the other object cannot have Physical pages.
> +    if (prev->isFree() && (vmState.hasPhysical() || (!prev->vmState().hasPhysical()))) {

This is wrong. It reintroduces the bug.

> Source/bmalloc/bmalloc/LargeObject.h:205
> +    // Same Physical page merging constraint as above.
> +    if (next->isFree() && (vmState.hasPhysical() || (!next->vmState().hasPhysical()))) {

This is wrong. It reintroduces the bug.

> Source/bmalloc/bmalloc/SegregatedFreeList.h:59
> +    VMState validPhysicalState() const { return m_vmPhysicalState; }

Let's call this hasPhysical, to match VMState's function.

> Source/bmalloc/bmalloc/SegregatedFreeList.h:63
> +    VMState m_vmPhysicalState;

Let's not use VMState to represent this boolean condition. Using VMState makes it tempting to think that the value is a VMState and can be compared == or != to a VMState. But that's not how it behaves.

> Source/bmalloc/bmalloc/VMHeap.h:128
> +    // We don't change the VMState of our possibly growing largeObject because we don't
> +    // want to precluding merging based on not having any physical pages.

This is wrong.

> Source/bmalloc/bmalloc/VMHeap.h:145
> +    } while (largeObject.prevCanMerge(VMState::withPhysical()) || largeObject.nextCanMerge(VMState::withPhysical()));

Merging should not consider physical state, or we have reintroduced the bug.

> Source/bmalloc/bmalloc/VMState.h:48
> +    VMState(unsigned vmState)
> +        : m_state(static_cast<state>(vmState))
> +    {
> +    }

This breaks type safety. Let's not do this. If we must do this, let's use an explicit constructor.

> Source/bmalloc/bmalloc/VMState.h:58
> +    static VMState withPhysical()
> +    {
> +        return VMState(Physical);
> +    }
> +
> +    static VMState withoutPhysical()
> +    {
> +        return VMState(Invalid);
> +    }

You can get rid of these functions if you represent expected physical state as a boolean.

> Source/bmalloc/bmalloc/VMState.h:63
> +    inline bool matchesPhysical(VMState otherVMState)
> +    {
> +        return !((m_state ^ otherVMState.m_state) & state::Physical);
> +    }

You can get rid of this function if you represent expected physical state as a boolean, and just call hasPhysical().

> Source/bmalloc/bmalloc/VMState.h:80
> +    operator unsigned() const { return m_state; }

This breaks type safety. Let's not do this. If we must do this, let's make it explicit operator unsigned to avoid accidental casts.
Comment 11 Michael Saboff 2016-02-19 14:24:58 PST
Created attachment 271800 [details]
Changed based on feedback

On the web page http://nim-lang.org/docs/lib.html on an iPhone 6, this reduces peak memory by ~17% (106.7MB -> 88MB) and virtual memory by 63% (823MB -> 303MB).
Comment 12 Geoffrey Garen 2016-02-19 14:30:54 PST
Comment on attachment 271800 [details]
Changed based on feedback

View in context: https://bugs.webkit.org/attachment.cgi?id=271800&action=review

r=me

> Source/bmalloc/bmalloc/VMAllocate.h:186
>  // Expands requests that are un-page-aligned. NOTE: Allocation must proceed left-to-right.

Please remove the left-to-right comment.

> Source/bmalloc/bmalloc/VMHeap.h:156
> +    largeObject.setVMState(VMState::Virtual);

Please remove the comment above that says:

    // If we couldn't merge with our neighbors before because they were in the
    // VM heap, we can merge with them now.
Comment 13 Michael Saboff 2016-02-19 15:27:36 PST
Committed r196840: <http://trac.webkit.org/changeset/196840>