Bug 192389 - bmalloc uses more memory on iOS compared to macOS due to physical page size differences
Status: REOPENED
Alias: None
Product: WebKit
Classification: Unclassified
Component: bmalloc
Version: WebKit Nightly Build
Hardware: Unspecified
OS: Unspecified
Importance: P2 Normal
Assignee: Michael Saboff
URL:
Keywords: InRadar
Depends on: 194547 194550
Blocks:
Reported: 2018-12-04 17:39 PST by Michael Saboff
Modified: 2019-02-12 12:19 PST
CC: 6 users

See Also:


Attachments
Work in progress patch (12.66 KB, patch) - 2018-12-04 18:08 PST, Michael Saboff - no flags
Patch (12.67 KB, patch) - 2018-12-11 10:17 PST, Michael Saboff - ggaren: review-
Patch addressing review issues (12.63 KB, patch) - 2019-02-04 16:44 PST, Michael Saboff - no flags
Patch addressing review comments (12.71 KB, patch) - 2019-02-04 17:05 PST, Michael Saboff - no flags
Patch with some perf improvements. (12.87 KB, patch) - 2019-02-06 12:29 PST, Michael Saboff - mark.lam: review-
Updated patch responding to review comments (11.01 KB, patch) - 2019-02-06 17:20 PST, Michael Saboff - mark.lam: review+
Patch for Landing (10.80 KB, patch) - 2019-02-06 18:57 PST, Michael Saboff - ggaren: review-
Updated patch responding to review comments (11.57 KB, patch) - 2019-02-07 13:32 PST, Michael Saboff - ggaren: review+

Description Michael Saboff 2018-12-04 17:39:34 PST
bmalloc uses physical page size chunks when creating size classes.  Given that iOS has a 16K page size while macOS has a 4K page size, under certain memory loads, iOS will use more memory than macOS.

One fix is to decouple the allocation of size classes from the physical page size.
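To illustrate the report with a hedged sketch (this is not bmalloc's code, and the helper name below is hypothetical): when each size class's page is rounded up to the physical page size, the same backing requirement can cost four times as much on a 16K-page iOS device as on a 4K-page Mac.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, for illustration only: round a size class's backing
// requirement up to the physical page size.
constexpr size_t roundUpToPhysicalPage(size_t needed, size_t physicalPageSize)
{
    return (needed + physicalPageSize - 1) / physicalPageSize * physicalPageSize;
}

// A class needing only 4K of backing occupies 4K on a 4K-page macOS system
// but a full 16K on a 16K-page iOS system: up to 12K of waste per page.
```

Decoupling size-class allocation from the physical page size lets both systems use the same small logical page sizes, trading that waste for finer-grained commit/decommit bookkeeping.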
Comment 1 Michael Saboff 2018-12-04 17:40:03 PST
<rdar://problem/45465291>
Comment 2 Michael Saboff 2018-12-04 18:08:18 PST
Created attachment 356564 [details]
Work in progress patch

Want to run the EWS bots.  Also running performance tests.
Comment 3 EWS Watchlist 2018-12-04 18:10:06 PST
Attachment 356564 [details] did not pass style-queue:


ERROR: Source/bmalloc/bmalloc/Heap.cpp:127:  Multi line control clauses should use braces.  [whitespace/braces] [4]
Total errors found: 1 in 5 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 4 Michael Saboff 2018-12-11 10:17:52 PST
Created attachment 357062 [details]
Patch
Comment 5 EWS Watchlist 2018-12-11 10:18:58 PST
Attachment 357062 [details] did not pass style-queue:


ERROR: Source/bmalloc/bmalloc/Heap.cpp:127:  Multi line control clauses should use braces.  [whitespace/braces] [4]
Total errors found: 1 in 5 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 6 Geoffrey Garen 2018-12-11 10:29:56 PST
Comment on attachment 356564 [details]
Work in progress patch

View in context: https://bugs.webkit.org/attachment.cgi?id=356564&action=review

> Source/bmalloc/bmalloc/Heap.cpp:258
> +struct Heap::SmallPageRange Heap::findSmallPageRangeSharingPhysicalPages(Lock&, SmallPage* page, size_t pageSize, Function includeSmallPageAt)

This seems like an overly complicated way to find a set that will always be either "page" or "page" and three of its neighbors. I would have expected the function to use operations similar to Chunk::offset and Chunk::page, along with masking, to iterate from the first VM-page-size-aligned page to its (0 or 3) neighbors.

I'm also not sure we want an abstract find operation with a predicate. I think we just want a function to return the set of (1 or 4) pages.

> Source/bmalloc/bmalloc/Heap.cpp:316
> +        for (auto* page : chunk->freePages()) {
> +            if (pageToCheck == page)
> +                return pageToCheck->hasPhysicalPages();
> +        }

I'm not sure why we need to do this -- but it's O(n^2), so we should find another way.
Comment 7 Geoffrey Garen 2018-12-12 20:21:00 PST
Comment on attachment 357062 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=357062&action=review

> Source/bmalloc/bmalloc/Heap.cpp:129
> -        for (size_t pageSize = m_vmPageSizePhysical;
> +        for (size_t pageSize = smallPageSize;
>              pageSize < pageSizeMax;
> -            pageSize += m_vmPageSizePhysical) {
> +            pageSize += smallPageSize) {

I think you need some code here to ensure that you don't use 12kB pages on 16kB page systems. On 16kB page systems, we want page sizes of [ 4kB, 8kB, 16kB ... ]

Also worth testing whether including 2kB in the mix reduces memory use without causing a throughput regression.

I think you also need some code to ensure that you don't use the 20kB page size, etc. on 16kB systems. Those are really awkward values that will produce a lot of physical page size waste by defeating the scavenger.

The simplest solution is just to use [ 4kB, 8kB, 16kB, 32kB ... ]. But it might also be useful to include 2kB.
Comment 8 Michael Saboff 2019-02-04 16:44:19 PST
Created attachment 361127 [details]
Patch addressing review issues

This is a 6.5% reduction in memory on iOS running RAMification tests.  Running performance and more memory tests.
Comment 9 EWS Watchlist 2019-02-04 16:46:32 PST
Attachment 361127 [details] did not pass style-queue:


ERROR: Source/bmalloc/ChangeLog:1:  ChangeLog entry has no bug number  [changelog/bugnumber] [5]
Total errors found: 1 in 6 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 10 Michael Saboff 2019-02-04 17:05:36 PST
Created attachment 361133 [details]
Patch addressing review comments
Comment 11 Michael Saboff 2019-02-06 12:29:52 PST
Created attachment 361315 [details]
Patch with some perf improvements.
Comment 12 Mark Lam 2019-02-06 14:56:50 PST
Comment on attachment 361315 [details]
Patch with some perf improvements.

View in context: https://bugs.webkit.org/attachment.cgi?id=361315&action=review

r- because I think there are many fixes that should be done.

> Source/bmalloc/bmalloc/Heap.cpp:47
> +static_assert(bmalloc::isPowerOfTwo(smallPageSize), "");

No need for bmalloc:: qualifier.  We're in bmalloc namespace.

> Source/bmalloc/bmalloc/Heap.cpp:276
> +void Heap::tryDecommitSmallPageSlow(std::lock_guard<Mutex>&, BulkDecommit& decommitter, SmallPage* smallPage, size_t pageSize)
> +{
> +    Chunk* chunk = Chunk::get(smallPage);

I suggest you pass in the chunk since it's always known in the caller.

> Source/bmalloc/bmalloc/Heap.cpp:300
> +    SmallPage* firstPageInRange = chunk->page(chunk->offset(physicalPagesBegin));
> +    SmallPage* lastPageInRange = chunk->page(chunk->offset(physicalPagesBegin + pageSize * (smallPageCount - 1)));
> +    
> +    for (auto* page : chunk->freePages()) {
> +        if (page >= firstPageInRange && page <= lastPageInRange) {
> +            matchingPageCount++;
> +            if (matchingPageCount == smallPageCount) {
> +                firstPageToDecommit = firstPageInRange;
> +                pagesToDecommit = matchingPageCount;
> +                break;
> +            }
> +        }
> +    }
> +
> +    if (!firstPageToDecommit || !pagesToDecommit)
> +        return;

Instead of doing this, can we instead do the following:
1. In class SmallPage, add a bit: isCommitted.
2. In Heap::commitSmallPage():
    if the SmallPage does not have a physical page yet, allocate the physical page, and set all of its SmallPage isCommitted bits to false.
    Always set the requested SmallPage's isCommitted bit to true.
3. Here in Heap::tryDecommitSmallPage():
    Assert that all SmallPages within this physical page's bounds have a physical page.
    Clear the isCommitted bit for this SmallPage.
    Check if every SmallPage's isCommitted bit is cleared.  If so, decommit the physical page and clear all the m_hasPhysicalPages bits.

> Source/bmalloc/bmalloc/Heap.cpp:327
> +void Heap::commitSmallPageSlow(std::unique_lock<Mutex>&, SmallPage* page, size_t pageSize)
> +{
> +    Chunk* chunk = Chunk::get(page);

I suggest you pass in the chunk since it's always known in the caller.

> Source/bmalloc/bmalloc/Heap.cpp:331
> +    char* physicalPagesBegin = roundDownToMultipleOf(m_vmPageSizePhysical, page->begin()->begin());

Can you add the following above this?
    BASSERT(isPowerOfTwo(pageSize));
    BASSERT(pageSize < m_vmPageSizePhysical);

> Source/bmalloc/bmalloc/Heap.cpp:335
> +    unsigned smallPageCount = m_vmPageSizePhysical / pageSize;
> +
> +    size_t firstPageOffset = chunk->offset(physicalPagesBegin);
> +    size_t lastPageOffset = firstPageOffset + smallPageCount * pageSize;

You can do away with smallPageCount and just compute lastPageOffset as:
    size_t lastPageOffset = firstPageOffset + m_vmPageSizePhysical;

Since we make reference to both a SmallPage and a physical page in this body of code, I think it would be clearer if we named the variables to distinguish which one we're referring to.  I suggest renaming firstPageOffset to physicalPageBeginOffset, and lastPageOffset to physicalPageEndOffset.

Ditto for firstPageToCommit: I suggest renaming to firstSmallPageToCommit.
Ditto for pagesToCommit: I suggest renaming to smallPagesToCommit.

> Source/bmalloc/bmalloc/Heap.cpp:350
> +    for (auto it = begin; it + pageSize <= end; it = it + pageSize) {
> +        if (!firstPageToCommit) {
> +            if (!it.page()->hasPhysicalPages()) {
> +                firstPageToCommit = it.page();
> +                pagesToCommit = 1;
> +            }
> +        } else if (!it.page()->hasPhysicalPages())
> +            pagesToCommit++;
> +        else
> +            break;
> +    }

I don't understand this.  My understanding is that either all SmallPages within this physical page's bounds have a physical page or they don't.  Why not just check the requested page and be done?  That lessens the number of cache lines we touch too.  You can turn this into a debug assert loop verifying that all SmallPages in these bounds match the requested page in terms of whether they have a physical page or not.

> Source/bmalloc/bmalloc/Heap.cpp:352
> +    BASSERT(firstPageToCommit && pagesToCommit);

This assert is invalid in light of the observation about the loop above.

> Source/bmalloc/bmalloc/Heap.cpp:374
> +    char* firstPageBegin = firstPageToCommit->begin()->begin();
> +    size_t commitSize = physicalPageSizeSloppyRoundUp(firstPageBegin, pagesToCommit * pageSize);
> +    m_footprint += commitSize;
> +
> +    vmAllocatePhysicalPagesSloppy(firstPageBegin, commitSize);
> +
> +    if (pagesToCommit == 1)
> +        firstPageToCommit->setHasPhysicalPages(true);
> +    else {
> +        size_t firstPageOffset = chunk->offset(firstPageBegin);
> +        size_t lastPageOffset = firstPageOffset + pagesToCommit * pageSize;
> +
> +        Object begin(chunk, firstPageOffset);
> +        Object end(chunk, lastPageOffset);
> +
> +        for (auto it = begin; it + pageSize <= end; it = it + pageSize)
> +            it.page()->setHasPhysicalPages(true);
> +    }
> +#if ENABLE_PHYSICAL_PAGE_MAP
> +    m_physicalPageMap.commit(firstPageToCommit, commitSize);
> +#endif

This is also no longer valid.  This can be simplified as follows:

    if (!page->hasPhysicalPages()) {
        commit physical page;
        m_footprint += m_vmPageSizePhysical;
#if ENABLE_PHYSICAL_PAGE_MAP
        m_physicalPageMap.commit(physicalPageBegin, m_vmPageSizePhysical);
#endif
    }

> Source/bmalloc/bmalloc/Heap.h:124
> +    void tryDecommitSmallPageSlow(std::lock_guard<Mutex>&, BulkDecommit& decommitter, SmallPage*, size_t pageSize);
> +    void commitSmallPageSlow(std::unique_lock<Mutex>&, SmallPage*, size_t pageSize);

I think you can drop the "Slow" in the names since there is no fast version of this.
Comment 13 Mark Lam 2019-02-06 15:18:34 PST
Comment on attachment 361315 [details]
Patch with some pref improvements.

View in context: https://bugs.webkit.org/attachment.cgi?id=361315&action=review

>> Source/bmalloc/bmalloc/Heap.cpp:300
>> +        return;
> 
> Instead of doing this, can we instead do the following:
> 1. In class SmallPage, add a bit: isCommitted.
> 2. In Heap::commitSmallPage():
>     if the SmallPage does not have a physical page yet, allocate the physical page, and set all of its SmallPage isCommitted bits to false.
>     Always set the requested SmallPage's isCommitted bit to true.
> 3. Here in Heap::tryDecommitSmallPage():
>     Assert that all SmallPages within this physical page's bounds have a physical page.
>     Clear the isCommitted bit for this SmallPage.
>     Check if every SmallPage's isCommitted bit is cleared.  If so, decommit the physical page and clear all the m_hasPhysicalPages bits.

My comment here is invalid.  I misunderstood and thought that this Heap::tryDecommitSmallPageSlow() is called symmetrically to Heap::commitSmallPageSlow(), like free() to malloc().  I also missed that Heap::commitSmallPageSlow() is not called on all SmallPages, only those that do not already have a physical page.  Hence, we don't have a convenient opportunity to set and clear my "isCommitted" flag (which is also a misnomer; I meant "isInUse").
Comment 14 Michael Saboff 2019-02-06 17:20:38 PST
Created attachment 361348 [details]
Updated patch responding to review comments
Comment 15 Mark Lam 2019-02-06 17:39:35 PST
Comment on attachment 361348 [details]
Updated patch responding to review comments

View in context: https://bugs.webkit.org/attachment.cgi?id=361348&action=review

r=me with suggestions.

> Source/bmalloc/ChangeLog:4
> +        https://bugs.webkit.org/show_bug.cgi?id=192389
> +        bmalloc uses more memory on iOS compared to macOS due to physical page size differences

This is inverted.  Michael said he'll fix locally.

> Source/bmalloc/bmalloc/Heap.cpp:280
> +    size_t matchingPageCount = 0;

nit: let's use the { 0 } initializer style to be consistent with the lines above.  Or change the lines above to be consistent with this line.  It just looks strange.

> Source/bmalloc/bmalloc/Heap.cpp:283
> +    unsigned smallPageCount = m_vmPageSizePhysical / pageSize;

I suggest renaming smallPageCount to smallPagesPerPhysicalPage.  See below for reason.

> Source/bmalloc/bmalloc/Heap.cpp:296
> +                firstPageToDecommit = firstPageInRange;
> +                pagesToDecommit = matchingPageCount;

We don't need these.  All we need to know is that matchingPageCount == smallPageCount.  If so, we will decommit the page.

Given that you can remove firstPageToDecommit and pagesToDecommit here, I suggest renaming matchingPageCount to pagesToDecommit.  I think that reads better and focuses on the idea that we'll only decommit if pagesToDecommit == smallPagesPerPhysicalPage.  You can decide.

> Source/bmalloc/bmalloc/Heap.cpp:303
> +    if (firstPageToDecommit != firstPageInRange || pagesToDecommit != smallPageCount)
> +        return;

This can check for (matchingPageCount < smallPageCount) instead ... or (pagesToDecommit < smallPagesPerPhysicalPage) if you choose my suggested renaming.

> Source/bmalloc/bmalloc/Heap.cpp:316
> +    m_physicalPageMap.decommit(firstPageToDecommit, decommitSize);

Just use physicalPagesBegin here instead of firstPageToDecommit.
Comment 16 Michael Saboff 2019-02-06 18:57:44 PST
Created attachment 361362 [details]
Patch for Landing

Has changes in response to review.  Waiting on performance tests before landing.
Comment 17 Geoffrey Garen 2019-02-06 21:57:28 PST
Comment on attachment 361362 [details]
Patch for Landing

View in context: https://bugs.webkit.org/attachment.cgi?id=361362&action=review

> Source/bmalloc/bmalloc/Heap.cpp:136
> +        // We only want power of 2 pageSizes. Given that smallPageSize is a power of 2, we just double it when we want a larger size.
> +        for (size_t pageSize = smallPageSize; pageSize < pageSizeMax; pageSize *= 2) {

This isn't quite right. At page sizes above { 4kB on Mac, 16kB on iOS }, you have a wasteful increase in page size.

We don't actually want powers of two. We want either even divisors of m_vmPageSizePhysical or even multiples of m_vmPageSizePhysical. That's what guarantees that we can madvise each physical page without waste.

You can get that effect by doing *= 2 up to m_vmPageSizePhysical and += m_vmPageSizePhysical above m_vmPageSizePhysical. I would write that as two loops.
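[Editorial note: the two-loop progression suggested above could be sketched as follows. The parameter names mirror the discussion, but the function itself is an illustration, not the landed code.]

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of the suggested size progression: powers of two below the physical
// page size, then even multiples of it above. Every resulting page size
// either evenly divides or is an even multiple of the physical page size,
// so each physical page can be madvised without waste.
std::vector<size_t> candidatePageSizes(size_t smallPageSize, size_t vmPageSizePhysical, size_t pageSizeMax)
{
    std::vector<size_t> sizes;
    size_t pageSize = smallPageSize;
    for (; pageSize < vmPageSizePhysical; pageSize *= 2)
        sizes.push_back(pageSize);
    for (; pageSize < pageSizeMax; pageSize += vmPageSizePhysical)
        sizes.push_back(pageSize);
    return sizes;
}
```

On a 16kB-page system with a 4kB smallPageSize this yields 4kB, 8kB, 16kB, 32kB, ..., skipping the awkward 12kB and 20kB sizes flagged above; on a 4kB-page system the first loop is empty and the progression is 4kB, 8kB, 12kB, ..., all multiples of the physical page.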

> Source/bmalloc/bmalloc/Heap.cpp:294
> +    for (auto* page : chunk->freePages()) {
> +        if (page >= beginPageRange && page < endPageRange) {
> +            smallPagesReadyToDecommit++;
> +            if (smallPagesReadyToDecommit == smallPagePerPhysicalPage)
> +                break;
> +        }
> +    }

We don't need a linear search of the free page list to find { 2 or 4 } pages that are aligned neighbors.

Chunk needs this API:

    size_t pageNumber(SmallPage*);

You can make a mask like this:

    auto mask = smallPagePerPhysicalPage - 1;

Then you can iterate the pages you need like this:

auto pages = chunk->pages()[chunk->pageNumber(smallPage) & mask];
for (size_t i = 0; i < smallPagePerPhysicalPage; ++i) {
    auto page = pages[i];
    ...
}

Also, there's no need to count smallPagesReadyToDecommit. You can just return if any page says it has a non-zero refCount().
Comment 18 Geoffrey Garen 2019-02-07 07:26:22 PST
Actually, even better than masking the page and iterating all neighbors, you should just skip any page that isn’t already aligned to the mask. 

That way, the first page in a run of pages is in charge of madvising the run, and the rest of the pages can be no-ops. If you never iterate the first page, that’s ok, that proves you would not have madvised the run.
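[Editorial note: the alignment trick described in this comment could be sketched as follows, with assumed names; it is illustrative only, not bmalloc's code.]

```cpp
#include <cassert>
#include <cstddef>

// With smallPagesPerPhysicalPage a power of two, a small page whose page
// number has no low mask bits set is the first page of its physical-page
// run. That page is responsible for scanning the run and madvising it;
// every other page in the run can be a no-op.
bool isFirstPageInPhysicalRun(size_t pageNumber, size_t smallPagesPerPhysicalPage)
{
    size_t mask = smallPagesPerPhysicalPage - 1;
    return !(pageNumber & mask);
}
```

If the first page of a run is never visited, that itself proves the run would not have been madvised, so skipping unaligned pages loses nothing.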
Comment 19 Michael Saboff 2019-02-07 11:53:13 PST
(In reply to Geoffrey Garen from comment #17)
> Comment on attachment 361362 [details]
> Patch for Landing
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=361362&action=review
> 
> > Source/bmalloc/bmalloc/Heap.cpp:136
> > +        // We only want power of 2 pageSizes. Given that smallPageSize is a power of 2, we just double it when we want a larger size.
> > +        for (size_t pageSize = smallPageSize; pageSize < pageSizeMax; pageSize *= 2) {
> 
> This isn't quite right. At page sizes above { 4kB on Mac, 16kB on iOS }, you
> have a wasteful increase in page size.
> 
> We don't actually want powers of two. We want either even divisors of
> m_vmPageSizePhysical or even multiples of m_vmPageSizePhysical. That's what
> guarantees that we can madvise each physical page without waste.
> 
> You can get that effect by doing *= 2 up to m_vmPageSizePhysical and +=
> m_vmPageSizePhysical above m_vmPageSizePhysical. I would write that as two
> loops.

Done.

> > Source/bmalloc/bmalloc/Heap.cpp:294
> > +    for (auto* page : chunk->freePages()) {
> > +        if (page >= beginPageRange && page < endPageRange) {
> > +            smallPagesReadyToDecommit++;
> > +            if (smallPagesReadyToDecommit == smallPagePerPhysicalPage)
> > +                break;
> > +        }
> > +    }
> 
> We don't need a linear search of the free page list to find { 2 or 4 } pages
> that are aligned neighbors.
> 
> Chunk needs this API:
> 
>     size_t pageNumber(SmallPage*);
> 
> You can make a mask like this:
> 
>     auto mask = smallPagePerPhysicalPage - 1;
> 
> Then you can iterate the pages you need like this:
> 
> auto pages = chunk->pages()[chunk->pageNumber(smallPage) & mask];
> for (size_t i = 0; i < smallPagePerPhysicalPage; ++i) {
>     auto page = pages[i];
>     ...
> }

We can't iterate by incrementing the page number since pages that are bigger than smallPageSize use multiple SmallPage entries in Chunk::m_pages (and set Page::m_slide accordingly).  Calculating a stride is possible, but ugly.  We should use a for loop of Objects with the pageSize just like elsewhere in the code.

It still makes sense to let the first page in the physical page manage the madvising.
Comment 20 Geoffrey Garen 2019-02-07 12:03:53 PST
> We can't iterate by incrementing the page number since pages that are bigger
> than smallPageSize use multiple SmallPage entries in Chunk::m_pages (and set
> Page::m_slide accordingly).  Calculating a stride is possible, but ugly.  We
> should use a for loop of Objects with the pageSize just like elsewhere in
> the code.
> 
> It still makes sense to let the first page in the physical page manage the
> madvising.

Good point! Yeah, let's use masking to identify the first page and make the first page responsible for scanning the rest, but let's also use Object-based iteration rather than ++i.
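[Editorial note: a toy model of the decommit decision agreed on here, not bmalloc code. Real bmalloc walks Objects within the Chunk at the given pageSize; a bit-vector of free flags stands in for that below.]

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The physical page may be returned to the OS only when every small page
// inside its bounds is free; any in-use small page pins the whole run.
bool canDecommitPhysicalRun(const std::vector<bool>& smallPageIsFree, size_t firstPageInRun, size_t smallPagesPerPhysicalRun)
{
    for (size_t i = 0; i < smallPagesPerPhysicalRun; ++i) {
        if (!smallPageIsFree[firstPageInRun + i])
            return false; // one committed neighbor blocks the decommit
    }
    return true;
}
```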
Comment 21 Michael Saboff 2019-02-07 13:32:57 PST
Created attachment 361436 [details]
Updated patch responding to review comments
Comment 22 Geoffrey Garen 2019-02-07 14:29:00 PST
Comment on attachment 361436 [details]
Updated patch responding to review comments

r=me
Comment 23 Michael Saboff 2019-02-07 18:30:24 PST
Committed r241182: <https://trac.webkit.org/changeset/241182>
Comment 24 Saam Barati 2019-02-10 19:48:54 PST
It seems possible this is a Speedometer2 regression on iOS.
Comment 25 WebKit Commit Bot 2019-02-12 11:17:43 PST
Re-opened since this is blocked by bug 194547