Bug 226237 - [bmalloc] Make adaptive scavenging more precise
Summary: [bmalloc] Make adaptive scavenging more precise
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore
Version: WebKit Nightly Build
Hardware: Unspecified
OS: macOS 11
Importance: P2 Normal
Assignee: Michael Saboff
URL:
Keywords:
Depends on:
Blocks: 203987
Reported: 2021-05-25 13:00 PDT by Michael Saboff
Modified: 2021-07-06 14:20 PDT
CC List: 3 users

See Also:


Attachments
Work in progress patch (32.56 KB, patch), 2021-05-25 13:13 PDT, Michael Saboff, no flags
Updated patch (33.71 KB, patch), 2021-05-26 00:27 PDT, Michael Saboff, ggaren: review+, ews-feeder: commit-queue-
Patch for Landing. (36.45 KB, patch), 2021-05-27 16:54 PDT, Michael Saboff, no flags
Patch with mini-mode fix. (36.93 KB, patch), 2021-06-09 11:26 PDT, Michael Saboff, ews-feeder: commit-queue-
Updated patch with fix (37.36 KB, patch), 2021-06-23 15:20 PDT, Michael Saboff, ggaren: review+

Description Michael Saboff 2021-05-25 13:00:06 PDT
The adaptive scavenger introduced many more madvise() calls.  Some of those calls are expensive since the scavenger uses a conservative approach to madvise()'ing.

Instead, we should minimize the range of the address space we madvise().  We can do this by keeping track of the allocated extent of large regions.
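
As a rough illustration of the approach (the types here are hypothetical; bmalloc's real LargeRange is discussed in the comments below, and the exact madvise() flavor differs by platform):

    #include <sys/mman.h>
    #include <cstddef>

    // Hypothetical free-range record with a tracked physical high-water mark.
    struct FreeRange {
        char* begin;       // start of the free virtual range
        char* end;         // one past the end of the free virtual range
        char* physicalEnd; // one past the last byte backed by physical memory
    };

    void scavenge(const FreeRange& range)
    {
        // A conservative scavenger would madvise all of [begin, end), including
        // pages that were never committed.  Stopping at physicalEnd bounds the
        // call to memory that can actually be returned to the OS.
        if (size_t length = range.physicalEnd - range.begin)
            madvise(range.begin, length, MADV_FREE); // flavor is an assumption
    }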
Comment 1 Michael Saboff 2021-05-25 13:00:35 PDT
<rdar://78188389>
Comment 2 Michael Saboff 2021-05-25 13:13:02 PDT
Created attachment 429685 [details]
Work in progress patch
Comment 3 Geoffrey Garen 2021-05-25 14:12:54 PDT
Comment on attachment 429685 [details]
Work in progress patch

View in context: https://bugs.webkit.org/attachment.cgi?id=429685&action=review

> Source/bmalloc/ChangeLog:9
> +        It also tries to be more precise when calling madvise() by keeping track

You can probably be less coy and say that it measured as 1000X more precise in practice -- due to the specific pathology of Gigacage.

> Source/bmalloc/bmalloc/Heap.cpp:111
> +#if 0
> +    static unsigned initNoPhys = 0;
> +#endif

revert

> Source/bmalloc/bmalloc/Heap.cpp:253
> -    m_largeFree.add(LargeRange(chunk, size, startPhysicalSize, totalPhysicalSize));
> +    physicalExtent = totalPhysicalSize == size ? chunk->bytes() + totalPhysicalSize : physicalExtent;
> +    m_largeFree.add(LargeRange(chunk, size, startPhysicalSize, totalPhysicalSize, physicalExtent));

I think it's always correct just to use physicalExtent, so let's do that to simplify.

Relatedly, I think there's a maybe-pre-existing bug here that's maybe worse now. When we first allocate a Chunk, there's code in allocateSmallChunk() that decommits any unused space at the beginning and end of the Chunk. But here we're telling the large free set that the whole Chunk is physical pages. That's wrong. We either need to stop decommitting those two pieces of the Chunk (I'm not sure if that code accomplishes anything in practice or not), or change the physical pages reported here.

> Source/bmalloc/bmalloc/LargeRange.h:90
> +    // This the last address of physical memory in this range.

This the => This is the

> Source/bmalloc/bmalloc/LargeRange.h:92
> +    char* physicalExtent() const { return m_physicalExtent; }

Maybe call this physicalEnd? 'end' is the word c++ usually uses for a pointer just past the end.

> Source/bmalloc/bmalloc/LargeRange.h:94
> +    void resetPhysicalExtent() { m_physicalExtent = begin(); }

reset => clear
Comment 4 Michael Saboff 2021-05-25 20:09:45 PDT
(In reply to Geoffrey Garen from comment #3)
> Comment on attachment 429685 [details]
> Work in progress patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=429685&action=review
> 
> > Source/bmalloc/ChangeLog:9
> > +        It also tries to be more precise when calling madvise() by keeping track
> 
> You can probably be less coy and say that it measured as 1000X more precise
> in practice -- due to the specific pathology of Gigacage.
> 
> > Source/bmalloc/bmalloc/Heap.cpp:111
> > +#if 0
> > +    static unsigned initNoPhys = 0;
> > +#endif
> 
> revert

Will do.  Was used for debugging.

> > Source/bmalloc/bmalloc/Heap.cpp:253
> > -    m_largeFree.add(LargeRange(chunk, size, startPhysicalSize, totalPhysicalSize));
> > +    physicalExtent = totalPhysicalSize == size ? chunk->bytes() + totalPhysicalSize : physicalExtent;
> > +    m_largeFree.add(LargeRange(chunk, size, startPhysicalSize, totalPhysicalSize, physicalExtent));
> 
> I think it's always correct just to use physicalExtent, so let's do that to
> simplify.

Actually, the rounding up to totalPhysicalSize is needed.  It took me a little while to find this bug.  Consider that the small chunk we are deallocating has a page class that doesn't work out to an even number of pages.  I found this with a 12K page class size.  The calculated physical extent of the last page was 4K less (one page on x86) than totalPhysicalSize.  I think it is only an issue for the last page in the chunk.  This is the unused space at the end of the chunk in your next comment.
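
For concreteness, here's the shape of the mismatch with made-up numbers (this is not bmalloc's actual chunk geometry; 4K system pages are assumed):

    size_t chunkSize     = 1024 * 1024; // 1 MB chunk (assumed)
    size_t metadataSize  = 12 * 1024;   // metadata at the head (assumed)
    size_t pageClassSize = 12 * 1024;   // the 12K page class mentioned above

    size_t usable    = chunkSize - metadataSize; // 1,036,288 bytes (1012 KB)
    size_t pageCount = usable / pageClassSize;   // 84 pages, covering 1008 KB
    size_t leftover  = usable % pageClassSize;   // 4 KB never assigned to a page

    // A physical extent derived from the last iterated page therefore ends
    // 4 KB (one x86 system page) short of totalPhysicalSize, hence the
    // rounding up described above.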

> Relatedly, I think there's a maybe-pre-existing bug here that's maybe worse
> now. When we first allocate a Chunk, there's code in allocateSmallChunk()
> that decommits any unused space at the beginning and end of the Chunk. But
> here we're telling the large free set that the whole Chunk is physical
> pages. That's wrong. We either need to stop decommitting those two pieces of
> the Chunk (I'm not sure if that code accomplishes anything in practice or
> not), or change the physical pages reported here.

So I think my change above takes care of the unused space at the end.  I'll look into the unused space at the beginning.  Maybe we need to rethink the handling of unused space on both sides of the chunk.

I think there is another bug here that I was going to address separately.  We calculate startPhysicalSize based on hasPhysicalPages.  The code above sets hasPhysicalPages to false for any page that doesn't have physical pages.  Seems to me that some page(s) at the end of the chunk could have been recommitted and will result in startPhysicalSize being set to 0.  I think the startPhysicalSize needs to be calculated based on how many pages at the front of the chunk still have physical memory.

> > Source/bmalloc/bmalloc/LargeRange.h:90
> > +    // This the last address of physical memory in this range.
> 
> This the => This is the

Fixed

> > Source/bmalloc/bmalloc/LargeRange.h:92
> > +    char* physicalExtent() const { return m_physicalExtent; }
> 
> Maybe call this physicalEnd? 'end' is the word c++ usually uses for a
> pointer just past the end.

Agree that physicalEnd is a better name.

> > Source/bmalloc/bmalloc/LargeRange.h:94
> > +    void resetPhysicalExtent() { m_physicalExtent = begin(); }
> 
> reset => clear

Changed.
Comment 5 Geoffrey Garen 2021-05-25 20:43:14 PDT
Comment on attachment 429685 [details]
Work in progress patch

View in context: https://bugs.webkit.org/attachment.cgi?id=429685&action=review

>>> Source/bmalloc/bmalloc/Heap.cpp:253
>>> +    m_largeFree.add(LargeRange(chunk, size, startPhysicalSize, totalPhysicalSize, physicalExtent));
>> 
>> I think it's always correct just to use physicalExtent, so let's do that to simplify.
>> 
>> Relatedly, I think there's a maybe-pre-existing bug here that's maybe worse now. When we first allocate a Chunk, there's code in allocateSmallChunk() that decommits any unused space at the beginning and end of the Chunk. But here we're telling the large free set that the whole Chunk is physical pages. That's wrong. We either need to stop decommitting those two pieces of the Chunk (I'm not sure if that code accomplishes anything in practice or not), or change the physical pages reported here.
> 
> Actually, the rounding up to totalPhysicalSize is needed.  It took me a little while to find this bug.  Consider that the small chunk we are deallocating has a page class that doesn't work out to an even number of pages.  I found this with a 12K page class size.  The calculated physical extent of the last page was 4K less (one page on x86) than totalPhysicalSize.  I think it is only an issue for the last page in the chunk.  This is the unused space at the end of the chunk in your next comment.

I don't understand. In the 12kB page size case, forEachPage should iterate all 12kB ranges, and then stop. If there's an 8kB range at the end of the Chunk, forEachPage should not iterate it. Can you explain more about what caused a 12kB page to look like an 8kB page in this code? Seems like it might be a bug we should fix some other way.

Here's another bug: physicalExtent is initialized to 'chunk', but a Chunk has physical pages at its head that contain metadata, so the minimum physicalExtent should actually be chunk + Chunk::metadataSize(pageSize). Getting this wrong will prevent us from decommitting Chunk metadata. This bug will trigger in the case where all pages are already decommitted, because physicalExtent will remain at its initial value.

Put another way, I think we need to make sure that physicalExtent is always correct; otherwise it is a liability. So, if there is a case where physicalExtent is calculated incorrectly, and then we fix it up by checking totalPhysicalSize, I'd like to know more about that case, and I'd like us to just calculate physicalExtent correctly instead.
Comment 6 Michael Saboff 2021-05-26 00:27:08 PDT
(In reply to Geoffrey Garen from comment #5)
> Comment on attachment 429685 [details]
> Work in progress patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=429685&action=review
> 
> >>> Source/bmalloc/bmalloc/Heap.cpp:253
> >>> +    m_largeFree.add(LargeRange(chunk, size, startPhysicalSize, totalPhysicalSize, physicalExtent));
> >> 
> >> I think it's always correct just to use physicalExtent, so let's do that to simplify.
> >> 
> >> Relatedly, I think there's a maybe-pre-existing bug here that's maybe worse now. When we first allocate a Chunk, there's code in allocateSmallChunk() that decommits any unused space at the beginning and end of the Chunk. But here we're telling the large free set that the whole Chunk is physical pages. That's wrong. We either need to stop decommitting those two pieces of the Chunk (I'm not sure if that code accomplishes anything in practice or not), or change the physical pages reported here.
> > 
> > Actually, the rounding up to totalPhysicalSize is needed.  It took me a little while to find this bug.  Consider that the small chunk we are deallocating has a page class that doesn't work out to an even number of pages.  I found this with a 12K page class size.  The calculated physical extent of the last page was 4K less (one page on x86) than totalPhysicalSize.  I think it is only an issue for the last page in the chunk.  This is the unused space at the end of the chunk in your next comment.
> 
> I don't understand. In the 12kB page size case, forEachPage should iterate
> all 12kB ranges, and then stop. If there's an 8kB range at the end of the
> Chunk, forEachPage should not iterate it. Can you explain more about what
> caused a 12kB page to look like an 8kB page in this code? Seems like it
> might be a bug we should fix some other way.

In the 12KB page size case, forEachPage does iterate through all 12KB ranges.  There is 4KB at the end of the chunk that wasn't allocated to any page.  In allocateSmallChunk, we calculate that unused extra in the line:
    auto decommitSize = chunkSize - metadataSize - accountedInFreeable;
And then we decommit that amount at the end of the chunk.  Here in deallocateSmallChunk, we want to "recover" that unused part of the chunk.

> Here's another bug: physicalExtent is initialized to 'chunk', but a Chunk
> has physical pages at its head that contain metadata, so the minimum
> physicalExtent should actually be chunk + Chunk::metadataSize(pageSize).
> Getting this wrong will prevent us from decommitting Chunk metadata. This
> bug will trigger in the case where all pages are already decommitted,
> because physicalExtent will remain at its initial value.

In the patch I'll post shortly, I no longer decommit the metadata at the front of the chunk and the unused space at the end in allocateSmallChunk.  In deallocateSmallChunk, I also start physicalEnd (nee physicalExtent) as chunk + Chunk::metadataSize(pageSize).  I'm open to suggestions on how to add back the unused region at the end of the chunk.
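
In code, that initialization looks roughly like this (a sketch; the exact signature of Chunk::metadataSize is assumed from this thread):

    // The chunk's metadata pages are physically present while the chunk is
    // live, so the minimum extent must cover them; otherwise, when every
    // small page is already decommitted, the metadata itself could never be
    // decommitted on scavenge.
    char* physicalEnd = chunk->bytes() + Chunk::metadataSize(pageSize);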

> Put another way, I think we need to make sure that physicalExtent is always
> correct; otherwise it is a liability. So, if there is a case where
> physicalExtent is calculated incorrectly, and then we fix it up by checking
> totalPhysicalSize, I'd like to know more about that case, and I'd like us to
> just calculate physicalExtent correctly instead.
Comment 7 Michael Saboff 2021-05-26 00:27:47 PDT
Created attachment 429734 [details]
Updated patch
Comment 8 Geoffrey Garen 2021-05-27 11:53:20 PDT
Comment on attachment 429734 [details]
Updated patch

View in context: https://bugs.webkit.org/attachment.cgi?id=429734&action=review

r=me

> Source/bmalloc/ChangeLog:20
> +        There is the possible future optimization where we also keep track of the
> +        first address of physically mapped memory in a LargeRange.  Since bmalloc
> +        allocates memory from lower addresses first, it is thought that the change
> +        in this patch is sufficient to reduce not only the number of madvise calls,
> +        but the time it takes to make those calls.

Might be worth mentioning that you tested this thought, and what the results were.

> Source/bmalloc/bmalloc/Heap.cpp:-239
> -        auto metadataSize = Chunk::metadataSize(pageSize);
> -        vmDeallocatePhysicalPagesSloppy(chunk->address(sizeof(Chunk)), metadataSize - sizeof(Chunk));
> -
> -        auto decommitSize = chunkSize - metadataSize - accountedInFreeable;
> -        if (decommitSize > 0)
> -            vmDeallocatePhysicalPagesSloppy(chunk->address(chunkSize - decommitSize), decommitSize);

I thought about this for a while and did not come up with a better idea than removing these VM deallocations. But I think this change means we need to A/B test Membuster and PLUM before we land.

> Source/bmalloc/bmalloc/Heap.cpp:231
> +            if (calculatingStartPhysicalSize) {

I think this would be clearer if we had a local variable 'SmallPage* firstPageWithoutPhysicalPages = nullptr'.

Then here we can have:

if (!firstPageWithoutPhysicalPages)
    firstPageWithoutPhysicalPages = page;

And below we can have:

size_t startPhysicalSize = firstPageWithoutPhysicalPages ? firstPageWithoutPhysicalPages->begin()->begin() - chunk->bytes() : size;

That way, it's more explicit what the loop is searching for, and there's never a time when startPhysicalSize holds an invalid value.
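
Putting the suggestion together, the search would look roughly like this (a sketch only; the forEachPage and SmallPage interfaces are assumed from the fragments quoted above):

    SmallPage* firstPageWithoutPhysicalPages = nullptr;
    forEachPage(chunk, pageSize, [&](SmallPage* page) {
        if (!page->hasPhysicalPages() && !firstPageWithoutPhysicalPages)
            firstPageWithoutPhysicalPages = page;
    });
    // Either we found the first decommitted page, or every page is physical
    // and the whole size counts; startPhysicalSize never holds an invalid value.
    size_t startPhysicalSize = firstPageWithoutPhysicalPages
        ? firstPageWithoutPhysicalPages->begin()->begin() - chunk->bytes()
        : size;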

> Source/bmalloc/bmalloc/LargeRange.h:90
> +    // This is the last address of physical memory in this range.

last => past the end
Comment 9 Michael Saboff 2021-05-27 16:54:05 PDT
Created attachment 429956 [details]
Patch for Landing.

Will run some A-B perf tests locally before landing.
Comment 10 Michael Saboff 2021-05-31 07:58:20 PDT
PLUM results run on three iPhone variants and an iPad are neutral.

Membuster results run on two Mac variants show a progression of between 0.8-1.0%.
Comment 11 Michael Saboff 2021-05-31 08:04:29 PDT
Committed r278278 (238315@main): <https://commits.webkit.org/238315@main>
Comment 12 Michael Saboff 2021-06-03 20:47:22 PDT
Rolled out in https://trac.webkit.org/changeset/278447/webkit.

It made some JSC mini mode and other tests flaky.
Comment 13 Michael Saboff 2021-06-09 11:26:40 PDT
Created attachment 430982 [details]
Patch with mini-mode fix.

The mini mode issue was fixed by addressing the FIXME bug https://bugs.webkit.org/show_bug.cgi?id=203987 - "[bmalloc] Mini-mode should also adjust scavenger interval".
Comment 14 Geoffrey Garen 2021-06-14 16:56:01 PDT
Comment on attachment 430982 [details]
Patch with mini-mode fix.

View in context: https://bugs.webkit.org/attachment.cgi?id=430982&action=review

> Source/bmalloc/ChangeLog:10
> +        Reland the adaptive scavenger change for macOS with a fix to mini mode.

A policy to always use the min interval in mini mode seems sensible. But why did the previous interval make correctness tests flaky? And why do we want a below-minimum initial value to trigger the scavenger right after startup?

Those are the things we need to know in order to evaluate whether this approach is sensible.
Comment 15 Geoffrey Garen 2021-06-14 16:59:01 PDT
mac-as-debug-wk2 claims to have found some crashes that only happen with this patch applied. Is it right about that?
Comment 16 Michael Saboff 2021-06-14 18:20:30 PDT
(In reply to Geoffrey Garen from comment #15)
> mac-as-debug-wk2 claims to have found some crashes that only happen with
> this patch applied. Is it right about that?

I should have put a comment in.

I reran the mac-AS-debug-wk2 tests.  There are indeed a few crashes with the patch.  The testing without the patch seems to have died: the total time running WK tests without the patch was 23 seconds, while the two runs with the patch ran for 1:03 (1 hour and 3 minutes).  The first run of the mac-AS-debug-wk2 tests showed the same pattern: 21 sec without the patch and 1:03 with the patch.

Therefore that bot is useless in my mind.  I built and ran Mac-AS Release WK2 tests on my system.  I'll run the Debug version.
Comment 17 Michael Saboff 2021-06-14 18:47:30 PDT
(In reply to Geoffrey Garen from comment #14)
> Comment on attachment 430982 [details]
> Patch with mini-mode fix.
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=430982&action=review
> 
> > Source/bmalloc/ChangeLog:10
> > +        Reland the adaptive scavenger change for macOS with a fix to mini mode.
> 
> A policy to always use the min interval in mini mode seems sensible. But why
> did the
> previous interval make correctness tests flaky? And why do we want a
> below-minimum initial value to trigger the scavenger right after startup?

I was only able to catch the crash with a release build in the debugger, never with a debug build.  When I caught the crash, it was in a "tryAllocate...()" path that bubbled up to a butterfly allocation that didn't expect an allocation failure.  See ButterflyInlines.h Butterfly::create() where it calls Butterfly::tryCreate().  So that crash isn't necessarily a correctness issue.  I filed https://bugs.webkit.org/show_bug.cgi?id=227003 - "Butterfly::create() does not handle allocation failures.".

As for the initial scavenging value, it has always been that way as far as I can remember.  My belief is that it was done that way to effectively scavenge immediately, but asynchronously via the timer.
 
> Those are the things we need to know in order to evaluate whether this
> approach is sensible.
Comment 18 Michael Saboff 2021-06-15 11:49:01 PDT
(In reply to Michael Saboff from comment #16)
> (In reply to Geoffrey Garen from comment #15)
> > mac-as-debug-wk2 claims to have found some crashes that only happen with
> > this patch applied. Is it right about that?
> 
> I should have put a comment in.
> 
> I reran the mac-AS-debug-wk2 tests.  There are indeed a few crashes with the
> patch.  The testing without the patch seems to have died: the total time
> running WK tests without the patch was 23 seconds, while the two runs with
> the patch ran for 1:03 (1 hour and 3 minutes).  The first run of the
> mac-AS-debug-wk2 tests showed the same pattern: 21 sec without the patch and
> 1:03 with the patch.
> 
> Therefore that bot is useless in my mind.  I built and ran Mac-AS Release WK2
> tests on my system.  I'll run the Debug version.

Debug WebKit2 tests on a local AS device had numerous crashes both with and without the patch: mostly crashes in LaunchServices trying to launch a service worker (including in IndexedDB tests), with the rest being ASSERTion failures.
Comment 19 Geoffrey Garen 2021-06-16 13:37:05 PDT
So.... the previous patch didn't actually regress mini mode, and this patch doesn't actually fix anything about mini mode?

If so, at a minimum you should fix the patch title and ChangeLog to say that.

But also: Why did the patch change at all? Why not re-land the original?
Comment 20 Michael Saboff 2021-06-17 12:58:28 PDT
(In reply to Geoffrey Garen from comment #19)
> So.... the previous patch didn't actually regress mini mode, and this patch
> doesn't actually fix anything about mini mode?

I believe the previous patch did regress mini mode on macOS, because it eliminated the partial scavenging code.  From my reading of the partial scavenging code, we'd partial scavenge every 12 seconds (x86) or 8 seconds (AS), irrespective of mini mode.  I recall from the testing I did when working on the adaptive code that I rarely saw full scavenges.  We were almost always doing partial scavenges.

The previous patch eliminated the partial scavenging path and introduced the 10msec between full scavenges for mini mode.  This patch changes the minimum time between full scavenges for mini mode to match what we do when not in mini mode.
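
In pseudo-configuration form (the 10 msec figure comes from this thread; the regular-mode minimum shown is a placeholder, not bmalloc's real value):

    #include <chrono>
    using namespace std::chrono_literals;

    // Old behavior once partial scavenging was removed: mini mode allowed a
    // full scavenge as often as every 10 msec.
    constexpr auto oldMiniModeMinimumPause = 10ms;

    // New behavior: mini mode shares the regular minimum pause between full
    // scavenges (250ms here is a placeholder value).
    constexpr auto regularModeMinimumPause = 250ms;
    constexpr auto miniModeMinimumPause    = regularModeMinimumPause;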

> If so, at a minimum you should fix the patch title and ChangeLog to say that.
> 
> But also: Why did the patch change at all? Why not re-land the original?

Because there appears to be an issue in mini mode doing full scavenges every 10 msec.
Comment 21 Filip Pizlo 2021-06-17 15:37:38 PDT
(In reply to Michael Saboff from comment #20)
> (In reply to Geoffrey Garen from comment #19)
> > So.... the previous patch didn't actually regress mini mode, and this patch
> > doesn't actually fix anything about mini mode?
> 
> I believe the previous patch did regress mini mode on macOS, because it
> eliminated the partial scavenging code.  From my reading of the partial
> scavenging code, we'd partial scavenge every 12 seconds (x86) or 8 seconds
> (AS), irrespective of mini mode.  I recall from the testing I did when
> working on the adaptive code that I rarely saw full scavenges.  We were almost
> always doing partial scavenges.
> 
> The previous patch eliminated the partial scavenging path and introduced the
> 10msec between full scavenges for mini mode.  This patch changes the minimum
> time between full scavenges for mini mode to match what we do when not in
> mini mode.
> 
> > If so, at a minimum you should fix the patch title and ChangeLog to say that.
> > 
> > But also: Why did the patch change at all? Why not re-land the original?
> 
> Because there appears to be an issue in mini mode doing full scavenges every
> 10 msec.

What's the issue?
Comment 22 Michael Saboff 2021-06-22 17:27:24 PDT
(In reply to Filip Pizlo from comment #21)
> (In reply to Michael Saboff from comment #20)
> > (In reply to Geoffrey Garen from comment #19)
> > > So.... the previous patch didn't actually regress mini mode, and this patch
> > > doesn't actually fix anything about mini mode?
> > 
> > I believe the previous patch did regress mini mode on macOS, because it
> > eliminated the partial scavenging code.  From my reading of the partial
> > scavenging code, we'd partial scavenge every 12 seconds (x86) or 8 seconds
> > (AS), irrespective of mini mode.  I recall from the testing I did when
> > working on the adaptive code that I rarely saw full scavenges.  We were almost
> > always doing partial scavenges.
> > 
> > The previous patch eliminated the partial scavenging path and introduced the
> > 10msec between full scavenges for mini mode.  This patch changes the minimum
> > time between full scavenges for mini mode to match what we do when not in
> > mini mode.
> > 
> > > If so, at a minimum you should fix the patch title and ChangeLog to say that.
> > > 
> > > But also: Why did the patch change at all? Why not re-land the original?
> > 
> > Because there appears to be an issue in mini mode doing full scavenges every
> > 10 msec.
> 
> What's the issue?

Found the issue: we clear the "can merge" flag during scavenging, but it doesn't get set again later.  This is due to an if that avoids unnecessary deallocation of physical pages; the code that deallocates physical pages is also what sets the can-merge flag back to true.

I will post a fix shortly after some more testing.
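
The pattern, roughly (all names are hypothetical; this is not bmalloc's actual scavenger loop, just the shape of the bug):

    void scavenge(LargeRange& range)
    {
        range.setCanMerge(false); // cleared up front during scavenging

        if (!range.hasPhysicalPages())
            return; // the "if" that avoids unnecessary deallocation: bailing
                    // out here skips setCanMerge(true) below, so the range
                    // never becomes mergeable again

        deallocatePhysicalPages(range);
        range.setCanMerge(true); // only the deallocation path restores the flag
    }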
Comment 23 Michael Saboff 2021-06-23 15:20:59 PDT
Created attachment 432096 [details]
Updated patch with fix
Comment 24 Geoffrey Garen 2021-07-02 13:03:09 PDT
Comment on attachment 432096 [details]
Updated patch with fix

r=me

Glad we found the bug.
Comment 25 Michael Saboff 2021-07-06 14:20:57 PDT
Committed r279621 (239439@main): <https://commits.webkit.org/239439@main>