196240 – [bmalloc] bmalloc should handle realloc for larger size well

Bug 196240 - [bmalloc] bmalloc should handle realloc for larger size well

Summary: [bmalloc] bmalloc should handle realloc for larger size well

Status:	NEW

Alias:	None

Product:	WebKit
Classification:	Unclassified
Component:	bmalloc (show other bugs)
Version:	WebKit Nightly Build
Hardware:	Unspecified Unspecified

Importance:	P2 Normal
Assignee:	Yusuke Suzuki

URL:
Keywords:

Depends on:
Blocks:

Reported:	2019-03-25 21:00 PDT by Yusuke Suzuki
Modified:	2019-03-26 16:19 PDT (History)
CC List:	1 user (show)

See Also:

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Yusuke Suzuki 2019-03-25 21:00:56 PDT

We should explore the chance to use existing contiguous regions when "realloc" is called for large objects.
It can avoid deallocations & allocations. It is particularly nice for current and peak memory footprint.

1. For current memory footprint, "realloc" avoids deallocations. If deallocations happen, we do not see memory progression immediately. It will be delayed by asynchronous scavenger. So avoiding deallocations gets pure win in terms of "current" memory footprint.
2. For peak memory footprint, we do not have "new" and "old" memory region at the same time. So we can make peak memory footprint small.


In system malloc, using "realloc" aggressively in JSC Butterfly allocations shows 3% progression in RAMification non-JIT tests.

Comment 1 Geoffrey Garen 2019-03-26 14:27:58 PDT

One way to think about this proposal is:

Currently, bmalloc does all large allocations "first fit", picking the lowest address that can fit the memory allocation. Sometimes, this results in more memory being committed in the short term, if a better fit or an already-committed fit is available at a higher address. However, in the long run, "first fit" tends to result in less memory use, by compacting allocations toward lower addresses, reducing fragmentation.

The proposal here is to use "best fit" if and only if we're doing a realloc. But what is it about realloc that is special?

Avoiding a large memcpy seems like one reason that realloc might be special. I think that's the key.

If we believed that something about this change would reduce footprint, maybe we should consider making the same change for all large allocations.

Comment 2 Yusuke Suzuki 2019-03-26 16:19:31 PDT

(In reply to Geoffrey Garen from comment #1)
> One way to think about this proposal is:
> 
> Currently, bmalloc does all large allocations "first fit", picking the
> lowest address that can fit the memory allocation. Sometimes, this results
> in more memory being committed in the short term, if a better fit or an
> already-committed fit is available at a higher address. However, in the long
> run, "first fit" tends to result in less memory use, by compacting
> allocations toward lower addresses, reducing fragmentation.

Interesting, I think we can use "realloc"'s semantics to further enhance this policy.

> The proposal here is to use "best fit" if and only if we're doing a realloc.
> But what is it about realloc that is special?

I think the key in realloc is that we know that the given range is about to be destroyed and can be considered as free memory range.
And this would offer 2 ways to enhance the current bmalloc's large allocation.

1. Merge existing contiguous region

We can consider the passed memory range as a free range. And it offers an additional candidate to bmalloc Large memory allocator, which can have lower address.
We first have one large range, split it into two, and use the former one.

[ Used 2MB ][ Free 2MB ]

And later, we call realloc(Used, 4MB). In this case, we have a chance to merge Used and Free into 4MB and return it as a realloc's result.

[  Used 4MB            ]

If we call "malloc(4MB), memcpy(Used, New), free(Used)", the layout would become like this.

[  Free 4MB            ]        ...       [   New 4MB          ]

It is possible to have a chance to expand the current region. For example, repeatedly appending something to a Vector.

2. Extend existing entire Large range.

Another way is extending the existing VM region to larger size. If the existing memory ranges are not fit into the requested size, and if the passed memory range is only used by itself, we can extend this VM region into a large size and continue using it (if we are not in a Gigacage).


I'm not sure whether these are actually profitable in RAMification and other applications, but I think it would be potentially profitable, but anyway, not sure until exploring it actually.

> Avoiding a large memcpy seems like one reason that realloc might be special.
> I think that's the key.
> 
> If we believed that something about this change would reduce footprint,
> maybe we should consider making the same change for all large allocations.