REOPENED Bug 192389
bmalloc uses more memory on iOS compared to macOS due to physical page size differences
https://bugs.webkit.org/show_bug.cgi?id=192389
Michael Saboff
Reported 2018-12-04 17:39:34 PST
bmalloc uses physical page size chunks when creating size classes. Given that iOS has a 16K page size while macOS has a 4K page size, under certain memory loads, iOS will use more memory than macOS. One fix is to decouple the allocation of size classes from the physical page size.
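The memory cost described here can be illustrated with a minimal sketch (hypothetical constants and helper, not bmalloc's actual API): when size classes are carved out of physical pages, the smallest unit the heap can commit or decommit is one physical page, so a single live allocation pins four times as much memory on a 16K-page iOS system as on a 4K-page macOS system.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative constants; the real values come from the kernel at runtime.
constexpr size_t macOSPhysicalPageSize = 4 * 1024;
constexpr size_t iOSPhysicalPageSize = 16 * 1024;

// Hypothetical helper: with page-granular commit/decommit, one live object
// keeps its entire physical page committed, regardless of object size.
constexpr size_t memoryPinnedByOneLiveObject(size_t physicalPageSize)
{
    return physicalPageSize;
}
```

Decoupling size classes from the physical page size shrinks this per-object worst case back toward the macOS figure.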
Attachments
Work in progress patch (12.66 KB, patch)
2018-12-04 18:08 PST, Michael Saboff
no flags
Patch (12.67 KB, patch)
2018-12-11 10:17 PST, Michael Saboff
ggaren: review-
Patch addressing review issues (12.63 KB, patch)
2019-02-04 16:44 PST, Michael Saboff
no flags
Patch addressing review comments (12.71 KB, patch)
2019-02-04 17:05 PST, Michael Saboff
no flags
Patch with some perf improvements. (12.87 KB, patch)
2019-02-06 12:29 PST, Michael Saboff
mark.lam: review-
Updated patch responding to review comments (11.01 KB, patch)
2019-02-06 17:20 PST, Michael Saboff
mark.lam: review+
Patch for Landing (10.80 KB, patch)
2019-02-06 18:57 PST, Michael Saboff
ggaren: review-
Updated patch responding to review comments (11.57 KB, patch)
2019-02-07 13:32 PST, Michael Saboff
ggaren: review+
Michael Saboff
Comment 1 2018-12-04 17:40:03 PST
Michael Saboff
Comment 2 2018-12-04 18:08:18 PST
Created attachment 356564 [details] Work in progress patch Want to run EWS bots. Also running performance tests.
EWS Watchlist
Comment 3 2018-12-04 18:10:06 PST
Attachment 356564 [details] did not pass style-queue: ERROR: Source/bmalloc/bmalloc/Heap.cpp:127: Multi line control clauses should use braces. [whitespace/braces] [4] Total errors found: 1 in 5 files If any of these errors are false positives, please file a bug against check-webkit-style.
Michael Saboff
Comment 4 2018-12-11 10:17:52 PST
EWS Watchlist
Comment 5 2018-12-11 10:18:58 PST
Attachment 357062 [details] did not pass style-queue: ERROR: Source/bmalloc/bmalloc/Heap.cpp:127: Multi line control clauses should use braces. [whitespace/braces] [4] Total errors found: 1 in 5 files If any of these errors are false positives, please file a bug against check-webkit-style.
Geoffrey Garen
Comment 6 2018-12-11 10:29:56 PST
Comment on attachment 356564 [details] Work in progress patch View in context: https://bugs.webkit.org/attachment.cgi?id=356564&action=review > Source/bmalloc/bmalloc/Heap.cpp:258 > +struct Heap::SmallPageRange Heap::findSmallPageRangeSharingPhysicalPages(Lock&, SmallPage* page, size_t pageSize, Function includeSmallPageAt) This seems like an overly complicated way to find a set that will always be either "page" or "page" and three of its neighbors. I would have expected the function to use operations similar to Chunk::offset and Chunk::page, along with masking, to iterate from the first VM-page-size-aligned page to its (0 or 3) neighbors. I'm also not sure we want an abstract find operation with a predicate. I think we just want a function to return the set of (1 or 4) pages. > Source/bmalloc/bmalloc/Heap.cpp:316 > + for (auto* page : chunk->freePages()) { > + if (pageToCheck == page) > + return pageToCheck->hasPhysicalPages(); > + } I'm not sure why we need to do this -- but it's O(n^2), so we should find another way.
Geoffrey Garen
Comment 7 2018-12-12 20:21:00 PST
Comment on attachment 357062 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=357062&action=review > Source/bmalloc/bmalloc/Heap.cpp:129 > - for (size_t pageSize = m_vmPageSizePhysical; > + for (size_t pageSize = smallPageSize; > pageSize < pageSizeMax; > - pageSize += m_vmPageSizePhysical) { > + pageSize += smallPageSize) { I think you need some code here to ensure that you don't use 12kB pages on 16kB page systems. On 16kB page systems, we want page sizes of [ 4kB, 8kB, 16kB ... ] Also worth testing whether including 2kB in the mix reduces memory use without causing a throughput regression. I think you also need some code to ensure that you don't use the 20kB page size, etc. on 16kB systems. Those are really awkward values that will produce a lot of physical page size waste by defeating the scavenger. The simplest solution is just to use [ 4kB, 8kB, 16kB, 32kB ... ]. But it might also be useful to include 2kB.
Michael Saboff
Comment 8 2019-02-04 16:44:19 PST
Created attachment 361127 [details] Patch addressing review issues This is a 6.5% reduction in memory on iOS running RAMification tests. Running performance and more memory tests.
EWS Watchlist
Comment 9 2019-02-04 16:46:32 PST
Attachment 361127 [details] did not pass style-queue: ERROR: Source/bmalloc/ChangeLog:1: ChangeLog entry has no bug number [changelog/bugnumber] [5] Total errors found: 1 in 6 files If any of these errors are false positives, please file a bug against check-webkit-style.
Michael Saboff
Comment 10 2019-02-04 17:05:36 PST
Created attachment 361133 [details] Patch addressing review comments
Michael Saboff
Comment 11 2019-02-06 12:29:52 PST
Created attachment 361315 [details] Patch with some perf improvements.
Mark Lam
Comment 12 2019-02-06 14:56:50 PST
Comment on attachment 361315 [details] Patch with some perf improvements. View in context: https://bugs.webkit.org/attachment.cgi?id=361315&action=review r- because I think there are many fixes that should be done. > Source/bmalloc/bmalloc/Heap.cpp:47 > +static_assert(bmalloc::isPowerOfTwo(smallPageSize), ""); No need for the bmalloc:: qualifier. We're in the bmalloc namespace. > Source/bmalloc/bmalloc/Heap.cpp:276 > +void Heap::tryDecommitSmallPageSlow(std::lock_guard<Mutex>&, BulkDecommit& decommitter, SmallPage* smallPage, size_t pageSize) > +{ > + Chunk* chunk = Chunk::get(smallPage); I suggest you pass in the chunk since it's always known in the caller. > Source/bmalloc/bmalloc/Heap.cpp:300 > + SmallPage* firstPageInRange = chunk->page(chunk->offset(physicalPagesBegin)); > + SmallPage* lastPageInRange = chunk->page(chunk->offset(physicalPagesBegin + pageSize * (smallPageCount - 1))); > + > + for (auto* page : chunk->freePages()) { > + if (page >= firstPageInRange && page <= lastPageInRange) { > + matchingPageCount++; > + if (matchingPageCount == smallPageCount) { > + firstPageToDecommit = firstPageInRange; > + pagesToDecommit = matchingPageCount; > + break; > + } > + } > + } > + > + if (!firstPageToDecommit || !pagesToDecommit) > + return; Instead of doing this, can we instead do the following: 1. In class SmallPage, add a bit: isCommitted. 2. In Heap::commitSmallPage(): if the SmallPage does not have a physical page yet, allocate the physical page, and set all of its SmallPage isCommitted bits to false. Always set the requested SmallPage's isCommitted bit to true. 3. Here in Heap::tryDecommitSmallPage(): Assert that all SmallPages within this physical page's bounds have a physical page. Clear the isCommitted bit for this SmallPage. Check if all SmallPages' isCommitted bits are cleared. If so, decommit the physical page and clear all the m_hasPhysicalPages bits.
> Source/bmalloc/bmalloc/Heap.cpp:327 > +void Heap::commitSmallPageSlow(std::unique_lock<Mutex>&, SmallPage* page, size_t pageSize) > +{ > + Chunk* chunk = Chunk::get(page); I suggest you pass in the chunk since it's always known in the caller. > Source/bmalloc/bmalloc/Heap.cpp:331 > + char* physicalPagesBegin = roundDownToMultipleOf(m_vmPageSizePhysical, page->begin()->begin()); Can you add the following above this? BASSERT(isPowerOfTwo(pageSize)); BASSERT(pageSize < m_vmPageSizePhysical); > Source/bmalloc/bmalloc/Heap.cpp:335 > + unsigned smallPageCount = m_vmPageSizePhysical / pageSize; > + > + size_t firstPageOffset = chunk->offset(physicalPagesBegin); > + size_t lastPageOffset = firstPageOffset + smallPageCount * pageSize; You can do away with smallPageCount and just compute lastPageOffset as: size_t lastPageOffset = firstPageOffset + m_vmPageSizePhysical; Since we make reference to both a SmallPage and a physical page in this body of code, I think it would be clearer if we named variables to distinguish which one we're referring to. I suggest renaming firstPageOffset to physicalPageBeginOffset, and lastPageOffset to physicalPageEndOffset. Ditto for firstPageToCommit: I suggest renaming to firstSmallPageToCommit. Ditto for pagesToCommit: I suggest renaming to smallPagesToCommit. > Source/bmalloc/bmalloc/Heap.cpp:350 > + for (auto it = begin; it + pageSize <= end; it = it + pageSize) { > + if (!firstPageToCommit) { > + if (!it.page()->hasPhysicalPages()) { > + firstPageToCommit = it.page(); > + pagesToCommit = 1; > + } > + } else if (!it.page()->hasPhysicalPages()) > + pagesToCommit++; > + else > + break; > + } I don't understand this. My understanding is that either all SmallPages within this physical page's bounds have a physical page or they don't. Why not just check the requested page and be done? That lessens the number of cache lines we touch too.
You can turn this into a debug assert loop to verify that all SmallPages within these bounds are the same as the requested page in terms of whether they have a physical page or not. > Source/bmalloc/bmalloc/Heap.cpp:352 > + BASSERT(firstPageToCommit && pagesToCommit); This assert is invalid in light of the observation about the loop above. > Source/bmalloc/bmalloc/Heap.cpp:374 > + char* firstPageBegin = firstPageToCommit->begin()->begin(); > + size_t commitSize = physicalPageSizeSloppyRoundUp(firstPageBegin, pagesToCommit * pageSize); > + m_footprint += commitSize; > + > + vmAllocatePhysicalPagesSloppy(firstPageBegin, commitSize); > + > + if (pagesToCommit == 1) > + firstPageToCommit->setHasPhysicalPages(true); > + else { > + size_t firstPageOffset = chunk->offset(firstPageBegin); > + size_t lastPageOffset = firstPageOffset + pagesToCommit * pageSize; > + > + Object begin(chunk, firstPageOffset); > + Object end(chunk, lastPageOffset); > + > + for (auto it = begin; it + pageSize <= end; it = it + pageSize) > + it.page()->setHasPhysicalPages(true); > + } > +#if ENABLE_PHYSICAL_PAGE_MAP > + m_physicalPageMap.commit(firstPageToCommit, commitSize); > +#endif This is also no longer valid. This can be simplified as follows: if (!page->hasPhysicalPages()) { commit physical page; m_footprint += m_vmPageSizePhysical; #if ENABLE_PHYSICAL_PAGE_MAP m_physicalPageMap.commit(physicalPageBegin, m_vmPageSizePhysical); #endif } > Source/bmalloc/bmalloc/Heap.h:124 > + void tryDecommitSmallPageSlow(std::lock_guard<Mutex>&, BulkDecommit& decommitter, SmallPage*, size_t pageSize); > + void commitSmallPageSlow(std::unique_lock<Mutex>&, SmallPage*, size_t pageSize); I think you can drop the "Slow" in the names since there is no fast version of this.
Mark Lam
Comment 13 2019-02-06 15:18:34 PST
Comment on attachment 361315 [details] Patch with some perf improvements. View in context: https://bugs.webkit.org/attachment.cgi?id=361315&action=review >> Source/bmalloc/bmalloc/Heap.cpp:300 >> + return; > > Instead of doing this, can we instead do the following: > 1. In class SmallPage, add a bit: isCommitted. > 2. In Heap::commitSmallPage(): > if the SmallPage does not have a physical page yet, allocate the physical page, and set all of its SmallPage isCommitted bits to false. > Always set the requested SmallPage's isCommitted bit to true. > 3. Here in Heap::tryDecommitSmallPage(): > Assert that all SmallPages within this physical page's bounds have a physical page. > Clear the isCommitted bit for this SmallPage. > Check if all SmallPages' isCommitted bits are cleared. If so, decommit the physical page and clear all the m_hasPhysicalPages bits. My comment here is invalid. I misunderstood and thought that this Heap::tryDecommitSmallPageSlow() is called symmetrically to Heap::commitSmallPageSlow() like free() to malloc(). I also missed that Heap::commitSmallPageSlow() is not called on all SmallPages, only those that do not already have a physical page. Hence, we don't have a convenient opportunity to set and clear my "isCommitted" flag (which is also a misnomer, I meant "isInUse").
Michael Saboff
Comment 14 2019-02-06 17:20:38 PST
Created attachment 361348 [details] Updated patch responding to review comments
Mark Lam
Comment 15 2019-02-06 17:39:35 PST
Comment on attachment 361348 [details] Updated patch responding to review comments View in context: https://bugs.webkit.org/attachment.cgi?id=361348&action=review r=me with suggestions. > Source/bmalloc/ChangeLog:4 > + https://bugs.webkit.org/show_bug.cgi?id=192389 > + bmalloc uses more memory on iOS compared to macOS due to physical page size differences This is inverted. Michael said he'll fix locally. > Source/bmalloc/bmalloc/Heap.cpp:280 > + size_t matchingPageCount = 0; nit: let's use the { 0 } initializer style to be consistent with the lines above. Or change the lines above to be consistent with this line. It just looks strange. > Source/bmalloc/bmalloc/Heap.cpp:283 > + unsigned smallPageCount = m_vmPageSizePhysical / pageSize; I suggest renaming smallPageCount to smallPagesPerPhysicalPage. See below for the reason. > Source/bmalloc/bmalloc/Heap.cpp:296 > + firstPageToDecommit = firstPageInRange; > + pagesToDecommit = matchingPageCount; We don't need these. All we need to know is that matchingPageCount == smallPageCount. If so, we will decommit the page. Given that you can remove firstPageToDecommit and pagesToDecommit here, I suggest renaming matchingPageCount to pagesToDecommit. I think that reads better, and focuses on the idea that we'll only decommit if pagesToDecommit == smallPagesPerPhysicalPage. You can decide. > Source/bmalloc/bmalloc/Heap.cpp:303 > + if (firstPageToDecommit != firstPageInRange || pagesToDecommit != smallPageCount) > + return; This can check for (matchingPageCount < smallPageCount) instead ... or (pagesToDecommit < smallPagesPerPhysicalPage) if you choose my suggested renaming. > Source/bmalloc/bmalloc/Heap.cpp:316 > + m_physicalPageMap.decommit(firstPageToDecommit, decommitSize); Just use physicalPagesBegin here instead of firstPageToDecommit.
Michael Saboff
Comment 16 2019-02-06 18:57:44 PST
Created attachment 361362 [details] Patch for Landing Has changes in response to review. Waiting on performance tests before landing.
Geoffrey Garen
Comment 17 2019-02-06 21:57:28 PST
Comment on attachment 361362 [details] Patch for Landing View in context: https://bugs.webkit.org/attachment.cgi?id=361362&action=review > Source/bmalloc/bmalloc/Heap.cpp:136 > + // We only want power of 2 pageSizes. Given that smallPageSize is a power of 2, we just double it when we want a larger size. > + for (size_t pageSize = smallPageSize; pageSize < pageSizeMax; pageSize *= 2) { This isn't quite right. At page sizes above { 4kB on Mac, 16kB on iOS }, you have a wasteful increase in page size. We don't actually want powers of two. We want either even divisors of m_vmPageSizePhysical or even multiples of m_vmPageSizePhysical. That's what guarantees that we can madvise each physical page without waste. You can get that effect by doing *= 2 up to m_vmPageSizePhysical and += m_vmPageSizePhysical above m_vmPageSizePhysical. I would write that as two loops. > Source/bmalloc/bmalloc/Heap.cpp:294 > + for (auto* page : chunk->freePages()) { > + if (page >= beginPageRange && page < endPageRange) { > + smallPagesReadyToDecommit++; > + if (smallPagesReadyToDecommit == smallPagePerPhysicalPage) > + break; > + } > + } We don't need a linear search of the free page list to find { 2 or 4 } pages that are aligned neighbors. Chunk needs this API: size_t pageNumber(SmallPage*); You can make a mask like this: auto mask = smallPagePerPhysicalPage - 1; Then you can iterate the pages you need like this: auto pages = chunk->pages()[chunk->pageNumber(smallPage) & mask]; for (size_t i = 0; i < smallPagePerPhysicalPage; ++i) { auto page = pages[i]; ... } Also, there's no need to count smallPagesReadyToDecommit. You can just return if any page says it has a non-zero refCount().
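The two-loop size-class scheme described in the review can be sketched as follows (assumed standalone helper with illustrative constants, not bmalloc's actual code): doubling up to the physical page size and then stepping by the physical page size yields only even divisors and even multiples of the physical page, so every small page run lines up with physical page boundaries for madvise.

```cpp
#include <cstddef>
#include <vector>

// Sketch of the size-class generation under review. On a 16KB-page system
// with a 4KB smallPageSize this produces 4K, 8K, 16K, 32K, 48K, ... and
// never the awkward 12K or 20K sizes that would defeat the scavenger.
std::vector<size_t> pageSizeClasses(size_t smallPageSize, size_t physicalPageSize, size_t pageSizeMax)
{
    std::vector<size_t> sizes;
    size_t pageSize = smallPageSize;
    // Even divisors of the physical page size: double until we reach it.
    for (; pageSize < physicalPageSize; pageSize *= 2)
        sizes.push_back(pageSize);
    // Even multiples of the physical page size: step by it from there on.
    for (; pageSize < pageSizeMax; pageSize += physicalPageSize)
        sizes.push_back(pageSize);
    return sizes;
}
```

On a 4KB-page macOS system the first loop is skipped entirely and the sizes simply step by 4KB, matching the pre-patch behavior.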
Geoffrey Garen
Comment 18 2019-02-07 07:26:22 PST
Actually, even better than masking the page and iterating all neighbors, you should just skip any page that isn’t already aligned to the mask. That way, the first page in a run of pages is in charge of madvising the run, and the rest of the pages can be no-ops. If you never iterate the first page, that’s ok, that proves you would not have madvised the run.
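The alignment test behind "skip any page that isn't already aligned to the mask" can be sketched with a hypothetical helper (the name and free-standing form are illustrative; in the patch this logic lives inside Heap with Chunk::pageNumber): since smallPagesPerPhysicalPage is a power of two, a page whose number has any low mask bits set is not the first page of its physical-page run and can no-op.

```cpp
#include <cstddef>

// True if this small page is the first page of its physical-page run and
// is therefore responsible for madvising the run; all other pages in the
// run see a nonzero masked page number and do nothing.
bool ownsPhysicalPageRun(size_t pageNumber, size_t smallPagesPerPhysicalPage)
{
    size_t mask = smallPagesPerPhysicalPage - 1; // power of two minus one
    return !(pageNumber & mask);
}
```

With 4 small pages per physical page, pages 0, 4, 8, ... own their runs; pages 1-3, 5-7, ... are no-ops, which is exactly the "first page is in charge" rule.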
Michael Saboff
Comment 19 2019-02-07 11:53:13 PST
(In reply to Geoffrey Garen from comment #17) > Comment on attachment 361362 [details] > Patch for Landing > > View in context: > https://bugs.webkit.org/attachment.cgi?id=361362&action=review > > > Source/bmalloc/bmalloc/Heap.cpp:136 > > + // We only want power of 2 pageSizes. Given that smallPageSize is a power of 2, we just double it when we want a larger size. > > + for (size_t pageSize = smallPageSize; pageSize < pageSizeMax; pageSize *= 2) { > > This isn't quite right. At page sizes above { 4kB on Mac, 16kB on iOS }, you > have a wasteful increase in page size. > > We don't actually want powers of two. We want either even divisors of > m_vmPageSizePhysical or even multiples of m_vmPageSizePhysical. That's what > guarantees that we can madvise each physical page without waste. > > You can get that effect by doing *= 2 up to m_vmPageSizePhysical and += > m_vmPageSizePhysical above m_vmPageSizePhysical. I would write that as two > loops. Done. > > Source/bmalloc/bmalloc/Heap.cpp:294 > > + for (auto* page : chunk->freePages()) { > > + if (page >= beginPageRange && page < endPageRange) { > > + smallPagesReadyToDecommit++; > > + if (smallPagesReadyToDecommit == smallPagePerPhysicalPage) > > + break; > > + } > > + } > > We don't need a linear search of the free page list to find { 2 or 4 } pages > that are aligned neighbors. > > Chunk needs this API: > > size_t pageNumber(SmallPage*); > > You can make a mask like this: > > auto mask = smallPagePerPhysicalPage - 1; > > Then you can iterate the pages you need like this: > > auto pages = chunk->pages()[chunk->pageNumber(smallPage) & mask]; > for (size_t i = 0; i < smallPagePerPhysicalPage; ++i) { > auto page = pages[i]; > ... > } We can't iterate by incrementing the page number since pages that are bigger than smallPageSize use multiple SmallPage entries in Chunk::m_pages (and set Page::m_slide accordingly). Calculating a stride is possible, but ugly.
We should use a for loop of Objects with the pageSize just like elsewhere in the code. It still makes sense to let the first page in the physical page manage the madvising.
Geoffrey Garen
Comment 20 2019-02-07 12:03:53 PST
> We can't iterate by incrementing the page number since pages that are bigger > than smallPageSize use multiple SmallPage entries in Chunk::m_pages (and set > Page::m_slide accordingly). Calculating a stride is possible, but ugly. We > should use a for loop of Objects with the pageSize just like elsewhere in > the code. > > It still makes sense to let the first page in the physical page manage the > madvising. Good point! Yeah, let's use masking to identify the first page and make the first page responsible for scanning the rest, but let's also use Object-based iteration rather than ++i.
Michael Saboff
Comment 21 2019-02-07 13:32:57 PST
Created attachment 361436 [details] Updated patch responding to review comments
Geoffrey Garen
Comment 22 2019-02-07 14:29:00 PST
Comment on attachment 361436 [details] Updated patch responding to review comments r=me
Michael Saboff
Comment 23 2019-02-07 18:30:24 PST
Saam Barati
Comment 24 2019-02-10 19:48:54 PST
It seems possible this is a Speedometer2 regression on iOS.
WebKit Commit Bot
Comment 25 2019-02-12 11:17:43 PST
Re-opened since this is blocked by bug 194547