Bug 226237 - [bmalloc] Make adaptive scavenging more precise
Summary: [bmalloc] Make adaptive scavenging more precise
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore
Version: WebKit Nightly Build
Hardware: Unspecified
OS: macOS 11
Importance: P2 Normal
Assignee: Michael Saboff
URL:
Keywords:
Depends on:
Blocks: 203987
Reported: 2021-05-25 13:00 PDT by Michael Saboff
Modified: 2021-07-06 14:20 PDT
CC List: 3 users

See Also:


Attachments
Work in progress patch (32.56 KB, patch), 2021-05-25 13:13 PDT, Michael Saboff, no flags
Updated patch (33.71 KB, patch), 2021-05-26 00:27 PDT, Michael Saboff, ggaren: review+, ews-feeder: commit-queue-
Patch for Landing. (36.45 KB, patch), 2021-05-27 16:54 PDT, Michael Saboff, no flags
Patch with mini-mode fix. (36.93 KB, patch), 2021-06-09 11:26 PDT, Michael Saboff, ews-feeder: commit-queue-
Updated patch with fix (37.36 KB, patch), 2021-06-23 15:20 PDT, Michael Saboff, ggaren: review+

Description Michael Saboff 2021-05-25 13:00:06 PDT
The adaptive scavenger introduced many more madvise() calls.  Some of those calls are expensive since the scavenger uses a conservative approach to madvise()'ing.

Instead, we should minimize the range of the address space we madvise().  We can do this by keeping track of the allocated extent of large regions.
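
As a rough illustration of the approach (the types here are hypothetical; bmalloc's real LargeRange is discussed in the comments below, and the exact madvise() flavor differs by platform):

    #include <sys/mman.h>
    #include <cstddef>

    // Hypothetical free-range record with a tracked physical high-water mark.
    struct FreeRange {
        char* begin;       // start of the free virtual range
        char* end;         // one past the end of the free virtual range
        char* physicalEnd; // one past the last byte backed by physical memory
    };

    void scavenge(const FreeRange& range)
    {
        // A conservative scavenger would madvise all of [begin, end), including
        // pages that were never committed.  Stopping at physicalEnd bounds the
        // call to memory that can actually be returned to the OS.
        if (size_t length = range.physicalEnd - range.begin)
            madvise(range.begin, length, MADV_FREE); // flavor is an assumption
    }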
Comment 1 Michael Saboff 2021-05-25 13:00:35 PDT
<rdar://78188389>
Comment 2 Michael Saboff 2021-05-25 13:13:02 PDT
Created attachment 429685 [details]
Work in progress patch
Comment 3 Geoffrey Garen 2021-05-25 14:12:54 PDT
Comment on attachment 429685 [details]
Work in progress patch

View in context: https://bugs.webkit.org/attachment.cgi?id=429685&action=review

> Source/bmalloc/ChangeLog:9
> +        It also tries to be more precise when calling madvise() by keeping track

You can probably be less coy and say that it measured as 1000X more precise in practice -- due to the specific pathology of Gigacage.

> Source/bmalloc/bmalloc/Heap.cpp:111
> +#if 0
> +    static unsigned initNoPhys = 0;
> +#endif

revert

> Source/bmalloc/bmalloc/Heap.cpp:253
> -    m_largeFree.add(LargeRange(chunk, size, startPhysicalSize, totalPhysicalSize));
> +    physicalExtent = totalPhysicalSize == size ? chunk->bytes() + totalPhysicalSize : physicalExtent;
> +    m_largeFree.add(LargeRange(chunk, size, startPhysicalSize, totalPhysicalSize, physicalExtent));

I think it's always correct just to use physicalExtent, so let's do that to simplify.

Relatedly, I think there's a maybe-pre-existing bug here that's maybe worse now. When we first allocate a Chunk, there's code in allocateSmallChunk() that decommits any unused space at the beginning and end of the Chunk. But here we're telling the large free set that the whole Chunk is physical pages. That's wrong. We either need to stop decommitting those two pieces of the Chunk (I'm not sure if that code accomplishes anything in practice or not), or change the physical pages reported here.

> Source/bmalloc/bmalloc/LargeRange.h:90
> +    // This the last address of physical memory in this range.

This the => This is the

> Source/bmalloc/bmalloc/LargeRange.h:92
> +    char* physicalExtent() const { return m_physicalExtent; }

Maybe call this physicalEnd? 'end' is the word c++ usually uses for a pointer just past the end.

> Source/bmalloc/bmalloc/LargeRange.h:94
> +    void resetPhysicalExtent() { m_physicalExtent = begin(); }

reset => clear
Comment 4 Michael Saboff 2021-05-25 20:09:45 PDT
(In reply to Geoffrey Garen from comment #3)
> Comment on attachment 429685 [details]
> Work in progress patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=429685&action=review
> 
> > Source/bmalloc/ChangeLog:9
> > +        It also tries to be more precise when calling madvise() by keeping track
> 
> You can probably be less coy and say that it measured as 1000X more precise
> in practice -- due to the specific pathology of Gigacage.
> 
> > Source/bmalloc/bmalloc/Heap.cpp:111
> > +#if 0
> > +    static unsigned initNoPhys = 0;
> > +#endif
> 
> revert

Will do.  Was used for debugging.

> > Source/bmalloc/bmalloc/Heap.cpp:253
> > -    m_largeFree.add(LargeRange(chunk, size, startPhysicalSize, totalPhysicalSize));
> > +    physicalExtent = totalPhysicalSize == size ? chunk->bytes() + totalPhysicalSize : physicalExtent;
> > +    m_largeFree.add(LargeRange(chunk, size, startPhysicalSize, totalPhysicalSize, physicalExtent));
> 
> I think it's always correct just to use physicalExtent, so let's do that to
> simplify.

Actually, the rounding up to totalPhysicalSize is needed.  It took me a little while to find this bug.  Consider that the small chunk we are deallocating has a page class that doesn't work out to an even number of pages.  I found this with a 12K page class size.  The calculated physical extent of the last page was 4K less (one page on x86) than totalPhysicalSize.  I think it is only an issue for the last page in the chunk.  This is the unused space at the end of the chunk in your next comment.
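
For concreteness, here's the shape of the mismatch with made-up numbers (this is not bmalloc's actual chunk geometry; 4K system pages are assumed):

    size_t chunkSize     = 1024 * 1024; // 1 MB chunk (assumed)
    size_t metadataSize  = 12 * 1024;   // metadata at the head (assumed)
    size_t pageClassSize = 12 * 1024;   // the 12K page class mentioned above

    size_t usable    = chunkSize - metadataSize; // 1,036,288 bytes (1012 KB)
    size_t pageCount = usable / pageClassSize;   // 84 pages, covering 1008 KB
    size_t leftover  = usable % pageClassSize;   // 4 KB never assigned to a page

    // A physical extent derived from the last iterated page therefore ends
    // 4 KB (one x86 system page) short of totalPhysicalSize, hence the
    // rounding up described above.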

> Relatedly, I think there's a maybe-pre-existing bug here that's maybe worse
> now. When we first allocate a Chunk, there's code in allocateSmallChunk()
> that decommits any unused space at the beginning and end of the Chunk. But
> here we're telling the large free set that the whole Chunk is physical
> pages. That's wrong. We either need to stop decommitting those two pieces of
> the Chunk (I'm not sure if that code accomplishes anything in practice or
> not), or change the physical pages reported here.

So I think my change above takes care of the unused space at the end.  I'll look into the unused space at the beginning.  Maybe we need to rethink the handling of unused space on both sides of the chunk.

I think there is another bug here that I was going to address separately.  We calculate startPhysicalSize based on hasPhysicalPages.  The code above sets hasPhysicalPages to false for any page that doesn't have physical pages.  Seems to me that some page(s) at the end of the chunk could have been recommitted and will result in startPhysicalSize being set to 0.  I think the startPhysicalSize needs to be calculated based on how many pages at the front of the chunk still have physical memory.

> > Source/bmalloc/bmalloc/LargeRange.h:90
> > +    // This the last address of physical memory in this range.
> 
> This the => This is the

Fixed

> > Source/bmalloc/bmalloc/LargeRange.h:92
> > +    char* physicalExtent() const { return m_physicalExtent; }
> 
> Maybe call this physicalEnd? 'end' is the word c++ usually uses for a
> pointer just past the end.

Agree that physicalEnd is a better name.

> > Source/bmalloc/bmalloc/LargeRange.h:94
> > +    void resetPhysicalExtent() { m_physicalExtent = begin(); }
> 
> reset => clear

Changed.
Comment 5 Geoffrey Garen 2021-05-25 20:43:14 PDT
Comment on attachment 429685 [details]
Work in progress patch

View in context: https://bugs.webkit.org/attachment.cgi?id=429685&action=review

>>> Source/bmalloc/bmalloc/Heap.cpp:253
>>> +    m_largeFree.add(LargeRange(chunk, size, startPhysicalSize, totalPhysicalSize, physicalExtent));
>> 
>> I think it's always correct just to use physicalExtent, so let's do that to simplify.
>> 
>> Relatedly, I think there's a maybe-pre-existing bug here that's maybe worse now. When we first allocate a Chunk, there's code in allocateSmallChunk() that decommits any unused space at the beginning and end of the Chunk. But here we're telling the large free set that the whole Chunk is physical pages. That's wrong. We either need to stop decommitting those two pieces of the Chunk (I'm not sure if that code accomplishes anything in practice or not), or change the physical pages reported here.
> 
> Actually, the rounding up to totalPhysicalSize is needed.  It took me a little while to find this bug.  Consider that the small chunk we are deallocating has a page class that doesn't work out to an even number of pages.  I found this with a 12K page class size.  The calculated physical extent of the last page was 4K less (one page on x86) than totalPhysicalSize.  I think it is only an issue for the last page in the chunk.  This is the unused space at the end of the chunk in your next comment.

I don't understand. In the 12kB page size case, forEachPage should iterate all 12kB ranges, and then stop. If there's an 8kB range at the end of the Chunk, forEachPage should not iterate it. Can you explain more about what caused a 12kB page to look like an 8kB page in this code? Seems like it might be a bug we should fix some other way.

Here's another bug: physicalExtent is initialized to 'chunk', but a Chunk has physical pages at its head that contain metadata, so the minimum physicalExtent should actually be chunk + Chunk::metadataSize(pageSize). Getting this wrong will prevent us from decommitting Chunk metadata. This bug will trigger in the case where all pages are already decommitted, because physicalExtent will remain at its initial value.

Put another way, I think we need to make sure that physicalExtent is always correct; otherwise it is a liability. So, if there is a case where physicalExtent is calculated incorrectly, and then we fix it up by checking totalPhysicalSize, I'd like to know more about that case, and I'd like us to just calculate physicalExtent correctly instead.
Comment 6 Michael Saboff 2021-05-26 00:27:08 PDT
(In reply to Geoffrey Garen from comment #5)
> Comment on attachment 429685 [details]
> Work in progress patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=429685&action=review
> 
> >>> Source/bmalloc/bmalloc/Heap.cpp:253
> >>> +    m_largeFree.add(LargeRange(chunk, size, startPhysicalSize, totalPhysicalSize, physicalExtent));
> >> 
> >> I think it's always correct just to use physicalExtent, so let's do that to simplify.
> >> 
> >> Relatedly, I think there's a maybe-pre-existing bug here that's maybe worse now. When we first allocate a Chunk, there's code in allocateSmallChunk() that decommits any unused space at the beginning and end of the Chunk. But here we're telling the large free set that the whole Chunk is physical pages. That's wrong. We either need to stop decommitting those two pieces of the Chunk (I'm not sure if that code accomplishes anything in practice or not), or change the physical pages reported here.
> > 
> > Actually, the rounding up to totalPhysicalSize is needed.  It took me a little while to find this bug.  Consider that the small chunk we are deallocating has a page class that doesn't work out to an even number of pages.  I found this with a 12K page class size.  The calculated physical extent of the last page was 4K less (one page on x86) than totalPhysicalSize.  I think it is only an issue for the last page in the chunk.  This is the unused space at the end of the chunk in your next comment.
> 
> I don't understand. In the 12kB page size case, forEachPage should iterate
> all 12kB ranges, and then stop. If there's an 8kB range at the end of the
> Chunk, forEachPage should not iterate it. Can you explain more about what
> caused a 12kB page to look like an 8kB page in this code? Seems like it
> might be a bug we should fix some other way.

In the 12KB page size case, forEachPage does iterate through all 12KB ranges.  There is 4KB at the end of the chunk that wasn't allocated to any page.  In allocateSmallChunk, we calculate that unused extra in the line:
    auto decommitSize = chunkSize - metadataSize - accountedInFreeable;
And then we decommit that amount at the end of the chunk.  Here in deallocateSmallChunk, we want to "recover" that unused part of the chunk.

> Here's another bug: physicalExtent is initialized to 'chunk', but a Chunk
> has physical pages at its head that contain metadata, so the minimum
> physicalExtent should actually be chunk + Chunk::metadataSize(pageSize).
> Getting this wrong will prevent us from decommitting Chunk metadata. This
> bug will trigger in the case where all pages are already decommitted,
> because physicalExtent will remain at its initial value.

In the patch I'll post shortly, I no longer decommit the metadata at the front of the chunk and the unused space at the end in allocateSmallChunk.  In deallocateSmallChunk, I also start physicalEnd (nee physicalExtent) as chunk + Chunk::metadataSize(pageSize).  I'm open to suggestions on how to add back the unused region at the end of the chunk.
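
In code, that initialization looks roughly like this (a sketch; the exact signature of Chunk::metadataSize is assumed from this thread):

    // The chunk's metadata pages are physically present while the chunk is
    // live, so the minimum extent must cover them; otherwise, when every
    // small page is already decommitted, the metadata itself could never be
    // decommitted on scavenge.
    char* physicalEnd = chunk->bytes() + Chunk::metadataSize(pageSize);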

> Put another way, I think we need to make sure that physicalExtent is always
> correct; otherwise it is a liability. So, if there is a case where
> physicalExtent is calculated incorrectly, and then we fix it up by checking
> totalPhysicalSize, I'd like to know more about that case, and I'd like us to
> just calculate physicalExtent correctly instead.
Comment 7 Michael Saboff 2021-05-26 00:27:47 PDT
Created attachment 429734 [details]
Updated patch
Comment 8 Geoffrey Garen 2021-05-27 11:53:20 PDT
Comment on attachment 429734 [details]
Updated patch

View in context: https://bugs.webkit.org/attachment.cgi?id=429734&action=review

r=me

> Source/bmalloc/ChangeLog:20
> +        There is the possible future optimization where we also keep track of the
> +        first address of physically mapped memory in a LargeRange.  Since bmalloc
> +        allocates memory from lower addresses first, it is thought that the change
> +        in this patch is sufficient to reduce not only the number of madvise calls,
> +        but the time it takes to make those calls.

Might be worth mentioning that you tested this thought, and what the results were.

> Source/bmalloc/bmalloc/Heap.cpp:-239
> -        auto metadataSize = Chunk::metadataSize(pageSize);
> -        vmDeallocatePhysicalPagesSloppy(chunk->address(sizeof(Chunk)), metadataSize - sizeof(Chunk));
> -
> -        auto decommitSize = chunkSize - metadataSize - accountedInFreeable;
> -        if (decommitSize > 0)
> -            vmDeallocatePhysicalPagesSloppy(chunk->address(chunkSize - decommitSize), decommitSize);

I thought about this for a while and did not come up with a better idea than removing these VM deallocations. But I think this change means we need to A/B test Membuster and PLUM before we land.

> Source/bmalloc/bmalloc/Heap.cpp:231
> +            if (calculatingStartPhysicalSize) {

I think this would be clearer if we had a local variable 'SmallPage* firstPageWithoutPhysicalPages = nullptr'.

Then here we can have:

if (!firstPageWithoutPhysicalPages)
    firstPageWithoutPhysicalPages = page;

And below we can have:

size_t startPhysicalSize = firstPageWithoutPhysicalPages ? firstPageWithoutPhysicalPages->begin()->begin() - chunk->bytes() : size;

That way, it's more explicit what the loop is searching for, and there's never a time when startPhysicalSize holds an invalid value.
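
Putting the suggestion together, the search would look roughly like this (a sketch only; the forEachPage and SmallPage interfaces are assumed from the fragments quoted above):

    SmallPage* firstPageWithoutPhysicalPages = nullptr;
    forEachPage(chunk, pageSize, [&](SmallPage* page) {
        if (!page->hasPhysicalPages() && !firstPageWithoutPhysicalPages)
            firstPageWithoutPhysicalPages = page;
    });
    // Either we found the first decommitted page, or every page is physical
    // and the whole size counts; startPhysicalSize never holds an invalid value.
    size_t startPhysicalSize = firstPageWithoutPhysicalPages
        ? firstPageWithoutPhysicalPages->begin()->begin() - chunk->bytes()
        : size;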

> Source/bmalloc/bmalloc/LargeRange.h:90
> +    // This is the last address of physical memory in this range.

last => past the end
Comment 9 Michael Saboff 2021-05-27 16:54:05 PDT
Created attachment 429956 [details]
Patch for Landing.

Will run some A-B perf tests locally before landing.
Comment 10 Michael Saboff 2021-05-31 07:58:20 PDT
PLUM results run on three iPhone variants and an iPad are neutral.

Membuster results run on two Mac variants show a progression of between 0.8-1.0%.
Comment 11 Michael Saboff 2021-05-31 08:04:29 PDT
Committed r278278 (238315@main): <https://commits.webkit.org/238315@main>
Comment 12 Michael Saboff 2021-06-03 20:47:22 PDT
Rolled out in https://trac.webkit.org/changeset/278447/webkit.

It made some JSC mini mode and other tests flaky.
Comment 13 Michael Saboff 2021-06-09 11:26:40 PDT
Created attachment 430982 [details]
Patch with mini-mode fix.

The mini mode issue was fixed by addressing the FIXME bug https://bugs.webkit.org/show_bug.cgi?id=203987 - "[bmalloc] Mini-mode should also adjust scavenger interval".
Comment 14 Geoffrey Garen 2021-06-14 16:56:01 PDT
Comment on attachment 430982 [details]
Patch with mini-mode fix.

View in context: https://bugs.webkit.org/attachment.cgi?id=430982&action=review

> Source/bmalloc/ChangeLog:10
> +        Reland the adaptive scavenger change for macOS with a fix to mini mode.

A policy to always use the min interval in mini mode seems sensible. But why did the previous interval make correctness tests flaky? And why do we want a below-minimum initial value to trigger the scavenger right after startup?

Those are the things we need to know in order to evaluate whether this approach is sensible.
Comment 15 Geoffrey Garen 2021-06-14 16:59:01 PDT
mac-as-debug-wk2 claims to have found some crashes that only happen with this patch applied. Is it right about that?
Comment 16 Michael Saboff 2021-06-14 18:20:30 PDT
(In reply to Geoffrey Garen from comment #15)
> mac-as-debug-wk2 claims to have found some crashes that only happen with
> this patch applied. Is it right about that?

I should have put a comment in.

I reran the mac-AS-debug-wk2 tests.  There are indeed a few crashes with the patch.  The testing without the patch seems to have died: the total time running WK tests without the patch was 23 seconds, while the two runs with the patch ran for 1:03 (1 hour and 3 minutes).  The first run of the mac-AS-debug-wk2 tests showed the same pattern: 21 sec without the patch and 1:03 with the patch.

Therefore that bot is useless in my mind.  I built and ran Mac-AS Release WK2 tests on my system.  I'll run the Debug version.
Comment 17 Michael Saboff 2021-06-14 18:47:30 PDT
(In reply to Geoffrey Garen from comment #14)
> Comment on attachment 430982 [details]
> Patch with mini-mode fix.
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=430982&action=review
> 
> > Source/bmalloc/ChangeLog:10
> > +        Reland the adaptive scavenger change for macOS with a fix to mini mode.
> 
> A policy to always use the min interval in mini mode seems sensible. But why
> did the
> previous interval make correctness tests flaky? And why do we want a
> below-minimum initial value to trigger the scavenger right after startup?

I was only able to catch the crash with a release build in the debugger, never with a debug build.  When I caught the crash, it was in a "tryAllocate...()" path that bubbled up to a butterfly allocation that didn't expect an allocation failure.  See ButterflyInlines.h Butterfly::create() where it calls Butterfly::tryCreate().  So that crash isn't necessarily a correctness issue.  I filed https://bugs.webkit.org/show_bug.cgi?id=227003 - "Butterfly::create() does not handle allocation failures.".

As for the initial scavenging value, it has always been that way as far as I can remember.  My belief is that it was done that way to effectively scavenge immediately, but asynchronously via the timer.
 
> Those are the things we need to know in order to evaluate whether this
> approach is sensible.
Comment 18 Michael Saboff 2021-06-15 11:49:01 PDT
(In reply to Michael Saboff from comment #16)
> (In reply to Geoffrey Garen from comment #15)
> > mac-as-debug-wk2 claims to have found some crashes that only happen with
> > this patch applied. Is it right about that?
> 
> I should have put a comment in.
> 
> I reran the mac-AS-debug-wk2 tests.  There are indeed a few crashes with the
> patch.  The testing without the patch seems to have died: the total time
> running WK tests without the patch was 23 seconds, while the two runs with
> the patch ran for 1:03 (1 hour and 3 minutes).  The first run of the
> mac-AS-debug-wk2 tests showed the same pattern: 21 sec without the patch and
> 1:03 with the patch.
> 
> Therefore that bot is useless in my mind.  I built and ran Mac-AS Release WK2
> tests on my system.  I'll run the Debug version.

Debug WebKit2 tests on a local AS device had numerous crashes both with and without the patch: mostly crashes in LaunchServices trying to launch a service worker (including in IndexedDB tests), with the rest being ASSERTion failures.
Comment 19 Geoffrey Garen 2021-06-16 13:37:05 PDT
So.... the previous patch didn't actually regress mini mode, and this patch doesn't actually fix anything about mini mode?

If so, at a minimum you should fix the patch title and ChangeLog to say that.

But also: Why did the patch change at all? Why not re-land the original?
Comment 20 Michael Saboff 2021-06-17 12:58:28 PDT
(In reply to Geoffrey Garen from comment #19)
> So.... the previous patch didn't actually regress mini mode, and this patch
> doesn't actually fix anything about mini mode?

I believe the previous patch did regress mini mode on macOS, because it eliminated the partial scavenging code.  From my reading of the partial scavenging code, we'd partial scavenge every 12 seconds (x86) or 8 seconds (AS), irrespective of mini mode.  I recall from the testing I did when working on the adaptive code that I rarely saw full scavenges.  We were almost always doing partial scavenges.

The previous patch eliminated the partial scavenging path and introduced the 10msec between full scavenges for mini mode.  This patch changes the minimum time between full scavenges for mini mode to match what we do when not in mini mode.
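
In pseudo-configuration form (the 10 msec figure comes from this thread; the regular-mode minimum shown is a placeholder, not bmalloc's real value):

    #include <chrono>
    using namespace std::chrono_literals;

    // Old behavior once partial scavenging was removed: mini mode allowed a
    // full scavenge as often as every 10 msec.
    constexpr auto oldMiniModeMinimumPause = 10ms;

    // New behavior: mini mode shares the regular minimum pause between full
    // scavenges (250ms here is a placeholder value).
    constexpr auto regularModeMinimumPause = 250ms;
    constexpr auto miniModeMinimumPause    = regularModeMinimumPause;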

> If so, at a minimum you should fix the patch title and ChangeLog to say that.
> 
> But also: Why did the patch change at all? Why not re-land the original?

Because there appears to be an issue in mini mode doing full scavenges every 10 msec.
Comment 21 Filip Pizlo 2021-06-17 15:37:38 PDT
(In reply to Michael Saboff from comment #20)
> (In reply to Geoffrey Garen from comment #19)
> > So.... the previous patch didn't actually regress mini mode, and this patch
> > doesn't actually fix anything about mini mode?
> 
> I believe the previous patch did regress mini mode on macOS, because it
> eliminated the partial scavenging code.  From my reading of the partial
> scavenging code, we'd partial scavenge every 12 seconds (x86) or 8 seconds
> (AS), irrespective of mini mode.  I recall from the testing I did when
> working on the adaptive code that I rarely saw full scavenges.  We were almost
> always doing partial scavenges.
> 
> The previous patch eliminated the partial scavenging path and introduced the
> 10msec between full scavenges for mini mode.  This patch changes the minimum
> time between full scavenges for mini mode to match what we do when not in
> mini mode.
> 
> > If so, at a minimum you should fix the patch title and ChangeLog to say that.
> > 
> > But also: Why did the patch change at all? Why not re-land the original?
> 
> Because there appears to be an issue in mini mode doing full scavenges every
> 10 msec.

What's the issue?
Comment 22 Michael Saboff 2021-06-22 17:27:24 PDT
(In reply to Filip Pizlo from comment #21)
> (In reply to Michael Saboff from comment #20)
> > (In reply to Geoffrey Garen from comment #19)
> > > So.... the previous patch didn't actually regress mini mode, and this patch
> > > doesn't actually fix anything about mini mode?
> > 
> > I believe the previous patch did regress mini mode on macOS, because it
> > eliminated the partial scavenging code.  From my reading of the partial
> > scavenging code, we'd partial scavenge every 12 seconds (x86) or 8 seconds
> > (AS), irrespective of mini mode.  I recall from the testing I did when
> > working on the adaptive code that I rarely saw full scavenges.  We were almost
> > always doing partial scavenges.
> > 
> > The previous patch eliminated the partial scavenging path and introduced the
> > 10msec between full scavenges for mini mode.  This patch changes the minimum
> > time between full scavenges for mini mode to match what we do when not in
> > mini mode.
> > 
> > > If so, at a minimum you should fix the patch title and ChangeLog to say that.
> > > 
> > > But also: Why did the patch change at all? Why not re-land the original?
> > 
> > Because there appears to be an issue in mini mode doing full scavenges every
> > 10 msec.
> 
> What's the issue?

Found the issue: we clear the "can merge" flag during scavenging, but it doesn't get set again later.  This is due to an if that avoids unnecessary deallocation of physical pages; the code that deallocates physical pages is also what sets the can-merge flag back to true.

I will post a fix shortly after some more testing.
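
The pattern, roughly (all names are hypothetical; this is not bmalloc's actual scavenger loop, just the shape of the bug):

    void scavenge(LargeRange& range)
    {
        range.setCanMerge(false); // cleared up front during scavenging

        if (!range.hasPhysicalPages())
            return; // the "if" that avoids unnecessary deallocation: bailing
                    // out here skips setCanMerge(true) below, so the range
                    // never becomes mergeable again

        deallocatePhysicalPages(range);
        range.setCanMerge(true); // only the deallocation path restores the flag
    }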
Comment 23 Michael Saboff 2021-06-23 15:20:59 PDT
Created attachment 432096 [details]
Updated patch with fix
Comment 24 Geoffrey Garen 2021-07-02 13:03:09 PDT
Comment on attachment 432096 [details]
Updated patch with fix

r=me

Glad we found the bug.
Comment 25 Michael Saboff 2021-07-06 14:20:57 PDT
Committed r279621 (239439@main): <https://commits.webkit.org/239439@main>