ASSIGNED 230143
[bmalloc] Merge large frees after making them eligible
https://bugs.webkit.org/show_bug.cgi?id=230143
Summary [bmalloc] Merge large frees after making them eligible
Basuke Suzuki
Reported 2021-09-09 20:00:06 PDT
Current implementation tries merging of two mergeable large frees when free is added to largeMap. If one is not eligible, that region is no chance to be merged with other regions in the map. It is possible to improve the chance of merging after all regions are marked as eligible.
Attachments
patch (7.39 KB, patch)
2021-09-09 21:39 PDT, Basuke Suzuki
ews-feeder: commit-queue-
PATCH (8.33 KB, patch)
2021-09-10 15:16 PDT, Basuke Suzuki
ews-feeder: commit-queue-
PATCH (8.33 KB, patch)
2021-09-10 23:20 PDT, Basuke Suzuki
no flags
Patch for landing (2.57 KB, patch)
2021-09-11 11:31 PDT, Basuke Suzuki
simon.fraser: review-
ews-feeder: commit-queue-
Patch for landing 2 (10.90 KB, patch)
2021-09-12 11:45 PDT, Basuke Suzuki
no flags
patch (6.34 KB, patch)
2021-09-18 13:19 PDT, Basuke Suzuki
ggaren: review-
Basuke Suzuki
Comment 1 2021-09-09 21:39:58 PDT
Basuke Suzuki
Comment 2 2021-09-09 21:49:48 PDT
There's no fundamental number differences between current one and patched on testmem: main: end score: 6.9931 MB peak score: 10.5662 MB total memory score: 8.596 MB time score: 482.92 ms patched: end score: 6.8855 MB peak score: 10.6945 MB total memory score: 8.5812 MB time score: 482.6 ms
Basuke Suzuki
Comment 3 2021-09-10 15:16:09 PDT
Created attachment 437914 [details] PATCH Found a bug during add(). m_free.insert can invalidate iterator. It had better not touching that there. usedSinceLastScavenge should be managed by the caller of add(). Also it should be true if the created large range has physical pages so that it can be set in constructor. Another thing is that most usages are simply full of physical pages attached, so make the constructor simple.
Basuke Suzuki
Comment 4 2021-09-10 23:20:02 PDT
Created attachment 437941 [details] PATCH Wrong BASSERT was added. Fixed
Basuke Suzuki
Comment 5 2021-09-11 11:31:31 PDT
Created attachment 437960 [details] Patch for landing
Simon Fraser (smfr)
Comment 6 2021-09-12 08:54:11 PDT
Comment on attachment 437960 [details] Patch for landing This patch only contains the changelog.
Basuke Suzuki
Comment 7 2021-09-12 11:45:37 PDT
Created attachment 437992 [details] Patch for landing 2
Basuke Suzuki
Comment 8 2021-09-12 11:49:49 PDT
(In reply to Simon Fraser (smfr) from comment #6) > Comment on attachment 437960 [details] > Patch for landing > > This patch only contains the changelog. Thanks. I don't know why this happens. I'm so sad that I took my decent time to check EWS results for unchanged tree.
Geoffrey Garen
Comment 9 2021-09-14 12:39:21 PDT
Comment on attachment 437992 [details] Patch for landing 2 View in context: https://bugs.webkit.org/attachment.cgi?id=437992&action=review Can you separate the refactoring changes from the behavior changes in this patch, and post them separately? It's subtle to figure out which is which here. > Source/bmalloc/ChangeLog:11 > + Deallocated larges are stored in Heap's LargeFree. They're merged if they are neighbors and > + has same states. But chances are only when they added to the LargeFree map. When they are > + not eligible (it is likely possible) at that time, there's no chance to be merged except > + other neighbors are inserted. I think what you're trying to say is that the scavenger may cause temporary fragmentation when it calls setEligible(false). Is that right? Did you discover this behavior in some workload? What were the specifics? > Source/bmalloc/bmalloc/LargeMap.cpp:-76 > - merged = merge(merged, m_free.pop(i--)); Why is it OK to remove this merging step? If your plan is to wait for the call to mergeNeighbors(), that's not OK, because it's not guaranteed to happen soon.
Geoffrey Garen
Comment 10 2021-09-14 12:40:06 PDT
+Michael, since I think he added setEligible().
Basuke Suzuki
Comment 11 2021-09-14 15:45:14 PDT
(In reply to Geoffrey Garen from comment #9) > Comment on attachment 437992 [details] > Patch for landing 2 > > View in context: > https://bugs.webkit.org/attachment.cgi?id=437992&action=review > > Can you separate the refactoring changes from the behavior changes in this > patch, and post them separately? It's subtle to figure out which is which > here. Okay, I do that first.
Basuke Suzuki
Comment 12 2021-09-14 16:31:10 PDT
LargeRange related refactoring is opened in this bug. https://bugs.webkit.org/show_bug.cgi?id=230283
Basuke Suzuki
Comment 13 2021-09-14 16:58:42 PDT
(In reply to Geoffrey Garen from comment #9) > > Source/bmalloc/ChangeLog:11 > > + Deallocated larges are stored in Heap's LargeFree. They're merged if they are neighbors and > > + has same states. But chances are only when they added to the LargeFree map. When they are > > + not eligible (it is likely possible) at that time, there's no chance to be merged except > > + other neighbors are inserted. > > I think what you're trying to say is that the scavenger may cause temporary > fragmentation when it calls setEligible(false). Is that right? Think about this situation: Range A and B are attached and can be merged if they have same state. Here Range A is eligible in LargeFree. Range C is also in LargeFree and is not attached to B. Now Range B which is not eligible and about to be added to LargeFree. It cannot be merged with A because of the eligibility. So it will be appended at the end for LargeFree. Range A [eligible] Range C [...] Range B [non-eligible] Sooner or later, scavenger will make all eligible, but there're no chance to merge A and B any more until they are used and returned. > Did you discover this behavior in some workload? What were the specifics? We are still investigation of the situation, but we are seeing the issue that some region in large free are sit there without decommiting whole pages. The situation happens after playing long video playing. One reason we guess is related to the question I've asked in the slack channel https://webkit.slack.com/archives/CU64U6FDW/p1630628124106600 and related to the size of largeAlignment. This patch is related to the other solution that we are trying to add API to bmalloc to release largeFree in case of low memory situation. [https://bugs.webkit.org/show_bug.cgi?id=230126] The sorted ranges in LargeFree helps to detect freeable LargeRange in LargeFree efficiently. Also increasing the chance to be merged with other range may increase the possibility to be reused. > > Source/bmalloc/bmalloc/LargeMap.cpp:-76 > > - merged = merge(merged, m_free.pop(i--)); > > Why is it OK to remove this merging step? If your plan is to wait for the > call to mergeNeighbors(), that's not OK, because it's not guaranteed to > happen soon. That is correct. I'll recover merge trial while appending to FreeLarge.
Radar WebKit Bug Importer
Comment 14 2021-09-16 20:01:14 PDT
Basuke Suzuki
Comment 15 2021-09-18 13:19:31 PDT
Basuke Suzuki
Comment 16 2021-09-18 13:26:07 PDT
(In reply to Geoffrey Garen from comment #9) > Comment on attachment 437992 [details] > Patch for landing 2 > > View in context: > https://bugs.webkit.org/attachment.cgi?id=437992&action=review > > Can you separate the refactoring changes from the behavior changes in this > patch, and post them separately? It's subtle to figure out which is which > here. This new patch just contains large free related patch.
Geoffrey Garen
Comment 17 2021-10-06 15:30:36 PDT
Comment on attachment 438567 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=438567&action=review > Source/bmalloc/ChangeLog:22 > + Here's a quick result from testmem on MacBook Air M1. The differences are tiny. This patch seems to include some subtleties, including a potential throughput penalty at the end of scavenging. Can we point to a clearer memory improvement that would justify the subtleties and the potential throughput penalty? We generally prefer not to make changes for performance unless we can measure the impact. > Source/bmalloc/bmalloc/LargeMap.cpp:69 > void LargeMap::add(const LargeRange& range) Why does LargeMap::add change at all in this patch? Why isn't the heap always back in a fully merged, consistent state after LargeMap::mergeNeighbors() has returned? > Source/bmalloc/bmalloc/LargeMap.cpp:90 > + auto* next = it + 1; > + if (next != m_free.end() && canMerge(*it, *next)) { > + *it = merge(*it, *next); > + m_free.remove(next); > + } I don't think it's right to check only the immediate neighbor (it + 1) for merging. m_free is not sorted, so I'm not sure what this would achieve. > Source/bmalloc/bmalloc/LargeMap.cpp:112 > + auto* next = &range + 1; > + > + while (next < m_free.end() && canMerge(range, *next)) > + range = merge(range, *next++); Same comment about it + 1. > Source/bmalloc/bmalloc/Vector.h:174 > + BASSERT(from < to); > + size_t moveCount = end() - to; > + if (moveCount) > + std::memmove(from, to, moveCount * sizeof(T)); No need to check moveCount here.
Basuke Suzuki
Comment 18 2021-10-06 19:41:06 PDT
Comment on attachment 438567 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=438567&action=review Thanks for reviewing. >> Source/bmalloc/ChangeLog:22 >> + Here's a quick result from testmem on MacBook Air M1. The differences are tiny. > > This patch seems to include some subtleties, including a potential throughput penalty at the end of scavenging. Can we point to a clearer memory improvement that would justify the subtleties and the potential throughput penalty? We generally prefer not to make changes for performance unless we can measure the impact. Okay. I prepare such statistics. While waiting that stats, please review this code with the information I've replied. I tried to describe your question. >> Source/bmalloc/bmalloc/LargeMap.cpp:69 >> void LargeMap::add(const LargeRange& range) > > Why does LargeMap::add change at all in this patch? Why isn't the heap always back in a fully merged, consistent state after LargeMap::mergeNeighbors() has returned? > Why does LargeMap::add change at all in this patch? They are now kept sorted while adding > Why isn't the heap always back in a fully merged, consistent state after LargeMap::mergeNeighbors() has returned? Sorry I don't get it. It can't be fully merged because they are split into pieces. Not all regions are returned. >> Source/bmalloc/bmalloc/LargeMap.cpp:90 >> + } > > I don't think it's right to check only the immediate neighbor (it + 1) for merging. m_free is not sorted, so I'm not sure what this would achieve. Because they are sorted, there's only three chances to be merged: - end of copied is the beginning of some of ranges. - beginning of the copied is the end of some of ranges and end of the copied is not attached to any ranges. - both ends are attached to two ranges. Other cases, it will be inserted into the proper position. >> Source/bmalloc/bmalloc/LargeMap.cpp:112 >> + range = merge(range, *next++); > > Same comment about it + 1. Same here. Because they are sorted. >> Source/bmalloc/bmalloc/Vector.h:174 >> + std::memmove(from, to, moveCount * sizeof(T)); > > No need to check moveCount here. Got it.
Geoffrey Garen
Comment 19 2021-11-18 14:09:11 PST
Comment on attachment 438567 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=438567&action=review >>> Source/bmalloc/bmalloc/LargeMap.cpp:90 >>> + } >> >> I don't think it's right to check only the immediate neighbor (it + 1) for merging. m_free is not sorted, so I'm not sure what this would achieve. > > Because they are sorted, there's only three chances to be merged: > - end of copied is the beginning of some of ranges. > - beginning of the copied is the end of some of ranges and end of the copied is not attached to any ranges. > - both ends are attached to two ranges. > > Other cases, it will be inserted into the proper position. I see: I hadn't noticed that some of the changes to add() were motivated by changing the invariant from an unsorted list to a sorted list. That's a great thing to call out in the ChangeLog. Is it OK that LargeMap::remove() still calls m_free.pop(candidate)? I believe that will un-sort-ify the list. FWIW, I find it easier to reason about an unsorted list, because it has fewer invariants, and therefore fewer opportunities to violate an invariant. But perhaps the perf numbers will show that a sorted list is better in the end. Or perhaps mergeNeighbors() should sort and then merge through "two-finger compaction".
Note You need to log in before you can comment on or make changes to this bug.