RESOLVED FIXED 171384
bmalloc scavenger should know what page classes are allocating
https://bugs.webkit.org/show_bug.cgi?id=171384
Summary bmalloc scavenger should know what page classes are allocating
Michael Saboff
Reported 2017-04-27 11:37:44 PDT
The scavenging process has one flag to indicate that bmalloc is allocating VM heap memory for any class. When we scavenge, we check this flag and wait for any allocations to complete. Instead we should keep a flag for every page size class and skip classes that are currently allocating and come back to them later. <rdar://problem/31868122>
Attachments
Patch (8.24 KB, patch)
2017-04-27 11:45 PDT, Michael Saboff
ggaren: review-
Updated patch addressing review comments (9.35 KB, patch)
2017-04-27 14:07 PDT, Michael Saboff
ggaren: review+
Michael Saboff
Comment 1 2017-04-27 11:45:58 PDT
Geoffrey Garen
Comment 2 2017-04-27 12:20:14 PDT
Comment on attachment 308418 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=308418&action=review We need to measure MallocBench with this change. > Source/bmalloc/bmalloc/Heap.cpp:119 > + m_isPageClassAllocating.fill(false); The default constructor does this for us. You can remove this. > Source/bmalloc/bmalloc/Heap.cpp:-136 > - waitUntilFalse(lock, sleepDuration, m_isAllocatingPages); I think we still want this initial wait. Otherwise, we'll reclaim everything before we have time to learn that we're still allocating pages. > Source/bmalloc/bmalloc/Heap.cpp:159 > + allocationInProgress = true; > + break; Instead of taking an out parameter, we should just reschedule ourselves here when we learn that we have more work to do. > Source/bmalloc/bmalloc/Heap.cpp:163 > m_vmHeap.deallocateSmallPage(lock, pageClass, page); I think it's a pre-existing bug that we don't drop the lock around the system call, like we do in scavengeLargeObjects. We should probably fix that too. > Source/bmalloc/bmalloc/Heap.cpp:183 > + allocationInProgress = true; > + break; Ditto. Also, I would put this check at the top of the loop. Otherwise, you always deallocate one. > Source/bmalloc/bmalloc/Heap.h:120 > + std::array<bool, pageClassCount + 1> m_isPageClassAllocating; // Extra class is for large objects. Let's call this m_isAllocatingPages and only use pageClassCount items. > Source/bmalloc/bmalloc/Heap.h:121 > +#define isAllocatingLargePages m_isPageClassAllocating[pageClassCount] Let's make this a data member instead of a #define: m_largeIsAllocatingPages. > Source/bmalloc/bmalloc/bmalloc.h:79 > - PerProcess<Heap>::get()->scavenge(lock, std::chrono::milliseconds(0)); > + PerProcess<Heap>::get()->scavenge(lock); I think we've lost the behavior where this function should synchronously guarantee (or nearly guarantee) a complete scavenge before any new allocations took place. That's an important thing to do in memory measurement and also in our critical memory warning handler. One option, which is a little gross, is to pass a fake mutex to our callee, so the lock never actually unlocks. Another option is to pass a bool/enum parameter that says whether to unlock or not. Or maybe pass some kind of helper lambda or object for unlocking.
Michael Saboff
Comment 3 2017-04-27 14:06:44 PDT
Comment on attachment 308418 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=308418&action=review >> Source/bmalloc/bmalloc/Heap.cpp:163 >> m_vmHeap.deallocateSmallPage(lock, pageClass, page); > > I think it's a pre-existing bug that we don't drop the lock around the system call, like we do in scavengeLargeObjects. We should probably fix that too. We do drop the lock in VMHeap::deallocateSmallPage().
Michael Saboff
Comment 4 2017-04-27 14:07:18 PDT
Created attachment 308441 [details] Updated patch addressing review comments
Michael Saboff
Comment 5 2017-04-27 14:07:52 PDT
I'll run MallocBench shortly and post the results before/after.
Geoffrey Garen
Comment 6 2017-04-27 14:40:29 PDT
Comment on attachment 308441 [details] Updated patch addressing review comments View in context: https://bugs.webkit.org/attachment.cgi?id=308441&action=review r=me > Source/bmalloc/bmalloc/VMHeap.h:44 > +typedef enum { DeallocateSynchronously, DeallocateAsynchronously } DeallocationMode; This should be an enum class. Then you get better type safety and you don't need to put the class name in the value names. I'd call this "enum class { Sync, Async } ScavengeMode".
Michael Saboff
Comment 7 2017-04-27 16:44:51 PDT
Here are the MallocBench results running on an iOS device: Baseline WithPatch Execution Time: message_one 155ms 164ms message_many 162ms 143ms churn --parallel 41ms 38ms list_allocate --parallel 76ms 81ms tree_allocate --parallel 96ms 115ms tree_churn --parallel 76ms 75ms facebook --parallel 732ms 775ms reddit --parallel 422ms 465ms flickr --parallel 456ms 486ms theverge --parallel 604ms 651ms <geometric mean> 186ms 193ms <arithmetic mean> 282ms 299ms <harmonic mean> 123ms 124ms Peak Memory: message_one 6,176kB 5,872kB message_many 18,672kB 10,992kB churn --parallel 2,032kB 2,112kB list_allocate --parallel 3,680kB 3,696kB tree_allocate --parallel 5,856kB 5,456kB tree_churn --parallel 5,296kB 5,168kB facebook --parallel 399,568kB 406,000kB reddit --parallel 77,552kB 80,576kB flickr --parallel 147,904kB 148,944kB theverge --parallel 159,408kB 160,112kB <geometric mean> 21,335kB 20,159kB <arithmetic mean> 82,614kB 82,893kB <harmonic mean> 7,313kB 7,093kB Memory at End: message_one 1,120kB 1,056kB message_many 1,824kB 1,760kB churn --parallel 1,152kB 1,232kB list_allocate --parallel 1,520kB 1,456kB tree_allocate --parallel 1,792kB 1,744kB tree_churn --parallel 2,240kB 2,336kB facebook --parallel 15,872kB 16,560kB reddit --parallel 9,872kB 10,720kB flickr --parallel 14,000kB 14,464kB theverge --parallel 11,568kB 11,424kB <geometric mean> 3,599kB 3,631kB <arithmetic mean> 6,096kB 6,275kB <harmonic mean> 2,329kB 2,312kB
Michael Saboff
Comment 8 2017-04-27 17:04:17 PDT
Here are the full set of MallocBench tests: Baseline WithPatch Execution Time: churn 56ms 56ms list_allocate 46ms 46ms tree_allocate 53ms 53ms tree_churn 51ms 51ms fragment 41ms 42ms fragment_iterate 40ms 40ms medium 191ms 190ms big 53ms 64ms facebook 139ms 137ms reddit 86ms 86ms flickr 89ms 89ms theverge 109ms 108ms nimlang 81ms 81ms message_one 165ms 165ms message_many 211ms 212ms churn --parallel 39ms 42ms list_allocate --parallel 82ms 82ms tree_allocate --parallel 106ms 98ms tree_churn --parallel 80ms 69ms fragment --parallel 46ms 39ms fragment_iterate --parallel 19ms 17ms medium --parallel 112ms 114ms big --parallel 90ms 84ms facebook --parallel 824ms 860ms reddit --parallel 480ms 470ms flickr --parallel 484ms 497ms theverge --parallel 641ms 669ms <geometric mean> 100ms 100ms <arithmetic mean> 163ms 165ms <harmonic mean> 72ms 71ms Peak Memory: churn 1,760kB 1,760kB list_allocate 3,024kB 3,024kB tree_allocate 6,384kB 6,496kB tree_churn 5,824kB 5,824kB fragment 7,904kB 7,952kB fragment_iterate 26,672kB 26,640kB medium 1,194,560kB 1,193,968kB big 1,088,224kB 1,088,192kB facebook 74,496kB 74,592kB reddit 15,648kB 15,712kB flickr 28,000kB 28,000kB theverge 28,656kB 28,656kB nimlang 178,016kB 178,096kB message_one 6,336kB 6,256kB message_many 23,776kB 22,720kB churn --parallel 2,128kB 2,048kB list_allocate --parallel 3,728kB 3,664kB tree_allocate --parallel 5,632kB 5,504kB tree_churn --parallel 5,216kB 5,280kB fragment --parallel 8,096kB 8,144kB fragment_iterate --parallel 26,960kB 27,056kB medium --parallel 1,159,216kB 1,162,448kB big --parallel 1,046,768kB 1,064,368kB facebook --parallel 407,200kB 407,456kB reddit --parallel 79,888kB 80,064kB flickr --parallel 147,472kB 148,336kB theverge --parallel 157,200kB 159,744kB <geometric mean> 32,446kB 32,394kB <arithmetic mean> 212,548kB 213,407kB <harmonic mean> 8,869kB 8,799kB Memory at End: churn 864kB 864kB list_allocate 864kB 864kB tree_allocate 816kB 928kB tree_churn 928kB 928kB fragment 880kB 928kB fragment_iterate 1,216kB 1,184kB medium 19,824kB 19,712kB big 2,112kB 2,080kB facebook 1,328kB 1,360kB reddit 1,024kB 1,088kB flickr 1,152kB 1,152kB theverge 1,264kB 1,264kB nimlang 59,712kB 59,792kB message_one 1,264kB 1,168kB message_many 1,888kB 2,000kB churn --parallel 1,248kB 1,168kB list_allocate --parallel 1,504kB 1,440kB tree_allocate --parallel 1,680kB 1,760kB tree_churn --parallel 2,304kB 2,192kB fragment --parallel 2,464kB 2,160kB fragment_iterate --parallel 1,680kB 1,696kB medium --parallel 25,328kB 27,472kB big --parallel 24,800kB 22,784kB facebook --parallel 15,392kB 16,016kB reddit --parallel 10,512kB 10,656kB flickr --parallel 14,144kB 13,840kB theverge --parallel 11,152kB 11,424kB <geometric mean> 2,913kB 2,915kB <arithmetic mean> 7,679kB 7,701kB <harmonic mean> 1,717kB 1,727kB
Michael Saboff
Comment 9 2017-04-27 17:37:52 PDT
Note You need to log in before you can comment on or make changes to this bug.