Bug 230143 - [bmalloc] Merge large frees after making them eligible
Summary: [bmalloc] Merge large frees after making them eligible
Status: ASSIGNED
Alias: None
Product: WebKit
Classification: Unclassified
Component: bmalloc (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Basuke Suzuki
URL:
Keywords: InRadar
Depends on:
Blocks: 230126
  Show dependency treegraph
 
Reported: 2021-09-09 20:00 PDT by Basuke Suzuki
Modified: 2021-11-18 14:09 PST (History)
7 users (show)

See Also:


Attachments
patch (7.39 KB, patch)
2021-09-09 21:39 PDT, Basuke Suzuki
ews-feeder: commit-queue-
Details | Formatted Diff | Diff
PATCH (8.33 KB, patch)
2021-09-10 15:16 PDT, Basuke Suzuki
ews-feeder: commit-queue-
Details | Formatted Diff | Diff
PATCH (8.33 KB, patch)
2021-09-10 23:20 PDT, Basuke Suzuki
no flags Details | Formatted Diff | Diff
Patch for landing (2.57 KB, patch)
2021-09-11 11:31 PDT, Basuke Suzuki
simon.fraser: review-
ews-feeder: commit-queue-
Details | Formatted Diff | Diff
Patch for landing 2 (10.90 KB, patch)
2021-09-12 11:45 PDT, Basuke Suzuki
no flags Details | Formatted Diff | Diff
patch (6.34 KB, patch)
2021-09-18 13:19 PDT, Basuke Suzuki
ggaren: review-
Basuke.Suzuki: commit-queue?
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Basuke Suzuki 2021-09-09 20:00:06 PDT
Current implementation tries merging of two mergeable large frees when free is added to largeMap. If one is not eligible, that region is no chance to be merged with other regions in the map. It is possible to improve the chance of merging after all regions are marked as eligible.
Comment 1 Basuke Suzuki 2021-09-09 21:39:58 PDT
Created attachment 437835 [details]
patch
Comment 2 Basuke Suzuki 2021-09-09 21:49:48 PDT
There's no fundamental number differences between current one and patched on testmem:

main:

end score:           6.9931 MB
peak score:          10.5662 MB
total memory score:  8.596 MB
time score:          482.92 ms


patched:

end score:           6.8855 MB
peak score:          10.6945 MB
total memory score:  8.5812 MB
time score:          482.6 ms
Comment 3 Basuke Suzuki 2021-09-10 15:16:09 PDT
Created attachment 437914 [details]
PATCH

Found a bug during add(). m_free.insert can invalidate iterator. It had better not touching that there. usedSinceLastScavenge should be managed by the caller of add().

Also it should be true if the created large range has physical pages so that it can be set in constructor.

Another thing is that most usages are simply full of physical pages attached, so make the constructor simple.
Comment 4 Basuke Suzuki 2021-09-10 23:20:02 PDT
Created attachment 437941 [details]
PATCH

Wrong BASSERT was added. Fixed
Comment 5 Basuke Suzuki 2021-09-11 11:31:31 PDT
Created attachment 437960 [details]
Patch for landing
Comment 6 Simon Fraser (smfr) 2021-09-12 08:54:11 PDT
Comment on attachment 437960 [details]
Patch for landing

This patch only contains the changelog.
Comment 7 Basuke Suzuki 2021-09-12 11:45:37 PDT
Created attachment 437992 [details]
Patch for landing 2
Comment 8 Basuke Suzuki 2021-09-12 11:49:49 PDT
(In reply to Simon Fraser (smfr) from comment #6)
> Comment on attachment 437960 [details]
> Patch for landing
> 
> This patch only contains the changelog.

Thanks. I don't know why this happens. I'm so sad that I took my decent time to check EWS results for unchanged tree.
Comment 9 Geoffrey Garen 2021-09-14 12:39:21 PDT
Comment on attachment 437992 [details]
Patch for landing 2

View in context: https://bugs.webkit.org/attachment.cgi?id=437992&action=review

Can you separate the refactoring changes from the behavior changes in this patch, and post them separately? It's subtle to figure out which is which here.

> Source/bmalloc/ChangeLog:11
> +        Deallocated larges are stored in Heap's LargeFree. They're merged if they are neighbors and
> +        has same states. But chances are only when they added to the LargeFree map. When they are
> +        not eligible (it is likely possible) at that time, there's no chance to be merged except
> +        other neighbors are inserted.

I think what you're trying to say is that the scavenger may cause temporary fragmentation when it calls setEligible(false). Is that right?

Did you discover this behavior in some workload? What were the specifics?

> Source/bmalloc/bmalloc/LargeMap.cpp:-76
> -        merged = merge(merged, m_free.pop(i--));

Why is it OK to remove this merging step? If your plan is to wait for the call to mergeNeighbors(), that's not OK, because it's not guaranteed to happen soon.
Comment 10 Geoffrey Garen 2021-09-14 12:40:06 PDT
+Michael, since I think he added setEligible().
Comment 11 Basuke Suzuki 2021-09-14 15:45:14 PDT
(In reply to Geoffrey Garen from comment #9)
> Comment on attachment 437992 [details]
> Patch for landing 2
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=437992&action=review
> 
> Can you separate the refactoring changes from the behavior changes in this
> patch, and post them separately? It's subtle to figure out which is which
> here.

Okay, I do that first.
Comment 12 Basuke Suzuki 2021-09-14 16:31:10 PDT
LargeRange related refactoring is opened in this bug. https://bugs.webkit.org/show_bug.cgi?id=230283
Comment 13 Basuke Suzuki 2021-09-14 16:58:42 PDT
(In reply to Geoffrey Garen from comment #9)
> > Source/bmalloc/ChangeLog:11
> > +        Deallocated larges are stored in Heap's LargeFree. They're merged if they are neighbors and
> > +        has same states. But chances are only when they added to the LargeFree map. When they are
> > +        not eligible (it is likely possible) at that time, there's no chance to be merged except
> > +        other neighbors are inserted.
> 
> I think what you're trying to say is that the scavenger may cause temporary
> fragmentation when it calls setEligible(false). Is that right?

Think about this situation: Range A and B are attached and can be merged if they have same state. Here Range A is eligible in LargeFree. Range C is also in LargeFree and is not attached to B. Now Range B which is not eligible and about to be added to LargeFree. It cannot be merged with A because of the eligibility. So it will be appended at the end for LargeFree.

Range A [eligible]
Range C [...]
Range B [non-eligible]

Sooner or later, scavenger will make all eligible, but there're no chance to merge A and B any more until they are used and returned.

> Did you discover this behavior in some workload? What were the specifics?

We are still investigation of the situation, but we are seeing the issue that some region in large free are sit there without decommiting whole pages. The situation happens after playing long video playing.

One reason we guess is related to the question I've asked in the slack channel https://webkit.slack.com/archives/CU64U6FDW/p1630628124106600 and related to the size of largeAlignment.

This patch is related to the other solution that we are trying to add API to bmalloc to release largeFree in case of low memory situation. [https://bugs.webkit.org/show_bug.cgi?id=230126] The sorted ranges in LargeFree helps to detect freeable LargeRange in LargeFree efficiently.

Also increasing the chance to be merged with other range may increase the possibility to be reused.


> > Source/bmalloc/bmalloc/LargeMap.cpp:-76
> > -        merged = merge(merged, m_free.pop(i--));
> 
> Why is it OK to remove this merging step? If your plan is to wait for the
> call to mergeNeighbors(), that's not OK, because it's not guaranteed to
> happen soon.

That is correct. I'll recover merge trial while appending to FreeLarge.
Comment 14 Radar WebKit Bug Importer 2021-09-16 20:01:14 PDT
<rdar://problem/83224160>
Comment 15 Basuke Suzuki 2021-09-18 13:19:31 PDT
Created attachment 438567 [details]
patch
Comment 16 Basuke Suzuki 2021-09-18 13:26:07 PDT
(In reply to Geoffrey Garen from comment #9)
> Comment on attachment 437992 [details]
> Patch for landing 2
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=437992&action=review
> 
> Can you separate the refactoring changes from the behavior changes in this
> patch, and post them separately? It's subtle to figure out which is which
> here.

This new patch just contains large free related patch.
Comment 17 Geoffrey Garen 2021-10-06 15:30:36 PDT
Comment on attachment 438567 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=438567&action=review

> Source/bmalloc/ChangeLog:22
> +        Here's a quick result from testmem on MacBook Air M1. The differences are tiny.

This patch seems to include some subtleties, including a potential throughput penalty at the end of scavenging. Can we point to a clearer memory improvement that would justify the subtleties and the potential throughput penalty? We generally prefer not to make changes for performance unless we can measure the impact.

> Source/bmalloc/bmalloc/LargeMap.cpp:69
>  void LargeMap::add(const LargeRange& range)

Why does LargeMap::add change at all in this patch? Why isn't the heap always back in a fully merged, consistent state after LargeMap::mergeNeighbors() has returned?

> Source/bmalloc/bmalloc/LargeMap.cpp:90
> +            auto* next = it + 1;
> +            if (next != m_free.end() && canMerge(*it, *next)) {
> +                *it = merge(*it, *next);
> +                m_free.remove(next);
> +            }

I don't think it's right to check only the immediate neighbor (it + 1) for merging. m_free is not sorted, so I'm not sure what this would achieve.

> Source/bmalloc/bmalloc/LargeMap.cpp:112
> +        auto* next = &range + 1;
> +
> +        while (next < m_free.end() && canMerge(range, *next))
> +            range = merge(range, *next++);

Same comment about it + 1.

> Source/bmalloc/bmalloc/Vector.h:174
> +    BASSERT(from < to);
> +    size_t moveCount = end() - to;
> +    if (moveCount)
> +        std::memmove(from, to, moveCount * sizeof(T));

No need to check moveCount here.
Comment 18 Basuke Suzuki 2021-10-06 19:41:06 PDT
Comment on attachment 438567 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=438567&action=review

Thanks for reviewing.

>> Source/bmalloc/ChangeLog:22
>> +        Here's a quick result from testmem on MacBook Air M1. The differences are tiny.
> 
> This patch seems to include some subtleties, including a potential throughput penalty at the end of scavenging. Can we point to a clearer memory improvement that would justify the subtleties and the potential throughput penalty? We generally prefer not to make changes for performance unless we can measure the impact.

Okay. I prepare such statistics. While waiting that stats, please review this code with the information I've replied. I tried to describe your question.

>> Source/bmalloc/bmalloc/LargeMap.cpp:69
>>  void LargeMap::add(const LargeRange& range)
> 
> Why does LargeMap::add change at all in this patch? Why isn't the heap always back in a fully merged, consistent state after LargeMap::mergeNeighbors() has returned?

> Why does LargeMap::add change at all in this patch?
They are now kept sorted while adding

> Why isn't the heap always back in a fully merged, consistent state after LargeMap::mergeNeighbors() has returned?
Sorry I don't get it. It can't be fully merged because they are split into pieces. Not all regions are returned.

>> Source/bmalloc/bmalloc/LargeMap.cpp:90
>> +            }
> 
> I don't think it's right to check only the immediate neighbor (it + 1) for merging. m_free is not sorted, so I'm not sure what this would achieve.

Because they are sorted, there's only three chances to be merged:
- end of copied is the beginning of some of ranges.
- beginning of the copied is the end of some of ranges and end of the copied is not attached to any ranges.
- both ends are attached to two ranges.

Other cases, it will be inserted into the proper position.

>> Source/bmalloc/bmalloc/LargeMap.cpp:112
>> +            range = merge(range, *next++);
> 
> Same comment about it + 1.

Same here. Because they are sorted.

>> Source/bmalloc/bmalloc/Vector.h:174
>> +        std::memmove(from, to, moveCount * sizeof(T));
> 
> No need to check moveCount here.

Got it.
Comment 19 Geoffrey Garen 2021-11-18 14:09:11 PST
Comment on attachment 438567 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=438567&action=review

>>> Source/bmalloc/bmalloc/LargeMap.cpp:90
>>> +            }
>> 
>> I don't think it's right to check only the immediate neighbor (it + 1) for merging. m_free is not sorted, so I'm not sure what this would achieve.
> 
> Because they are sorted, there's only three chances to be merged:
> - end of copied is the beginning of some of ranges.
> - beginning of the copied is the end of some of ranges and end of the copied is not attached to any ranges.
> - both ends are attached to two ranges.
> 
> Other cases, it will be inserted into the proper position.

I see: I hadn't noticed that some of the changes to add() were motivated by changing the invariant from an unsorted list to a sorted list. That's a great thing to call out in the ChangeLog.

Is it OK that LargeMap::remove() still calls m_free.pop(candidate)? I believe that will un-sort-ify the list.

FWIW, I find it easier to reason about an unsorted list, because it has fewer invariants, and therefore fewer opportunities to violate an invariant. But perhaps the perf numbers will show that a sorted list is better in the end. Or perhaps mergeNeighbors() should sort and then merge through "two-finger compaction".