Bug 171384 - bmalloc scavenger should know what page classes are allocating
Summary: bmalloc scavenger should know what page classes are allocating
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: bmalloc (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Michael Saboff
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2017-04-27 11:37 PDT by Michael Saboff
Modified: 2017-04-27 17:37 PDT (History)
2 users (show)

See Also:


Attachments
Patch (8.24 KB, patch)
2017-04-27 11:45 PDT, Michael Saboff
ggaren: review-
Details | Formatted Diff | Diff
Updated patch addressing review comments (9.35 KB, patch)
2017-04-27 14:07 PDT, Michael Saboff
ggaren: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Michael Saboff 2017-04-27 11:37:44 PDT
The scavenging process has one flag to indicate that bmalloc is allocating VM heap memory for any class.  When we scavenge, we check this flag and wait for any allocations to complete.  Instead we should keep a flag for every page size class and skip classes that are currently allocating and come back to them later.

<rdar://problem/31868122>
Comment 1 Michael Saboff 2017-04-27 11:45:58 PDT
Created attachment 308418 [details]
Patch
Comment 2 Geoffrey Garen 2017-04-27 12:20:14 PDT
Comment on attachment 308418 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=308418&action=review

We need to measure MallocBench with this change.

> Source/bmalloc/bmalloc/Heap.cpp:119
> +    m_isPageClassAllocating.fill(false);

The default constructor does this for us. You can remove this.

> Source/bmalloc/bmalloc/Heap.cpp:-136
> -    waitUntilFalse(lock, sleepDuration, m_isAllocatingPages);

I think we still want this initial wait. Otherwise, we'll reclaim everything before we have time to learn that we're still allocating pages.

> Source/bmalloc/bmalloc/Heap.cpp:159
> +                allocationInProgress = true;
> +                break;

Instead of taking an out parameter, we should just reschedule ourselves here when we learn that we have more work to do.

> Source/bmalloc/bmalloc/Heap.cpp:163
>              m_vmHeap.deallocateSmallPage(lock, pageClass, page);

I think it's a pre-existing bug that we don't drop the lock around the system call, like we do in scavengeLargeObjects. We should probably fix that too.

> Source/bmalloc/bmalloc/Heap.cpp:183
> +            allocationInProgress = true;
> +            break;

Ditto.

Also, I would put this check at the top of the loop. Otherwise, you always deallocate one.

> Source/bmalloc/bmalloc/Heap.h:120
> +    std::array<bool, pageClassCount + 1> m_isPageClassAllocating; // Extra class is for large objects.

Let's call this m_isAllocatingPages and only use pageClassCount items.

> Source/bmalloc/bmalloc/Heap.h:121
> +#define isAllocatingLargePages m_isPageClassAllocating[pageClassCount]

Let's make this a data member instead of a #define: m_largeIsAllocatingPages.

> Source/bmalloc/bmalloc/bmalloc.h:79
> -    PerProcess<Heap>::get()->scavenge(lock, std::chrono::milliseconds(0));
> +    PerProcess<Heap>::get()->scavenge(lock);

I think we've lost the behavior where this function should synchronously guarantee (or nearly guarantee) a complete scavenge before any new allocations took place. That's an important thing to do in memory measurement and also in our critical memory warning handler.

One option, which is a little gross, is to pass a fake mutex to our callee, so the lock never actually unlocks. Another option is to pass a bool/enum parameter that says whether to unlock or not. Or maybe pass some kind of helper lambda or object for unlocking.
Comment 3 Michael Saboff 2017-04-27 14:06:44 PDT
Comment on attachment 308418 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=308418&action=review

>> Source/bmalloc/bmalloc/Heap.cpp:163
>>              m_vmHeap.deallocateSmallPage(lock, pageClass, page);
> 
> I think it's a pre-existing bug that we don't drop the lock around the system call, like we do in scavengeLargeObjects. We should probably fix that too.

We do drop the lock in VMHeap::deallocateSmallPage().
Comment 4 Michael Saboff 2017-04-27 14:07:18 PDT
Created attachment 308441 [details]
Updated patch addressing review comments
Comment 5 Michael Saboff 2017-04-27 14:07:52 PDT
I'll run MallocBench shortly and post the results before/after.
Comment 6 Geoffrey Garen 2017-04-27 14:40:29 PDT
Comment on attachment 308441 [details]
Updated patch addressing review comments

View in context: https://bugs.webkit.org/attachment.cgi?id=308441&action=review

r=me

> Source/bmalloc/bmalloc/VMHeap.h:44
> +typedef enum { DeallocateSynchronously, DeallocateAsynchronously } DeallocationMode;

This should be an enum class. Then you get better type safety and you don't need to put the class name in the value names.

I'd call this "enum class { Sync, Async } ScavengeMode".
Comment 7 Michael Saboff 2017-04-27 16:44:51 PDT
Here are the MallocBench results running on an iOS device:
                                                Baseline	 WithPatch
Execution Time:	
    message_one                                    155ms	     164ms
    message_many                                   162ms	     143ms
    churn --parallel                                41ms	      38ms
    list_allocate --parallel                        76ms	      81ms
    tree_allocate --parallel                        96ms	     115ms
    tree_churn --parallel                           76ms	      75ms
    facebook --parallel                            732ms	     775ms
    reddit --parallel                              422ms	     465ms
    flickr --parallel                              456ms	     486ms
    theverge --parallel                            604ms	     651ms
	
    <geometric mean>                               186ms	     193ms
    <arithmetic mean>                              282ms	     299ms
    <harmonic mean>                                123ms	     124ms
	
Peak Memory:	
    message_one                                  6,176kB	   5,872kB
    message_many                                18,672kB	  10,992kB
    churn --parallel                             2,032kB	   2,112kB
    list_allocate --parallel                     3,680kB	   3,696kB
    tree_allocate --parallel                     5,856kB	   5,456kB
    tree_churn --parallel                        5,296kB	   5,168kB
    facebook --parallel                        399,568kB	 406,000kB
    reddit --parallel                           77,552kB	  80,576kB
    flickr --parallel                          147,904kB	 148,944kB
    theverge --parallel                        159,408kB	 160,112kB
	
    <geometric mean>                            21,335kB	  20,159kB
    <arithmetic mean>                           82,614kB	  82,893kB
    <harmonic mean>                              7,313kB	   7,093kB
	
Memory at End:	
    message_one                                  1,120kB	   1,056kB
    message_many                                 1,824kB	   1,760kB
    churn --parallel                             1,152kB	   1,232kB
    list_allocate --parallel                     1,520kB	   1,456kB
    tree_allocate --parallel                     1,792kB	   1,744kB
    tree_churn --parallel                        2,240kB	   2,336kB
    facebook --parallel                         15,872kB	  16,560kB
    reddit --parallel                            9,872kB	  10,720kB
    flickr --parallel                           14,000kB	  14,464kB
    theverge --parallel                         11,568kB	  11,424kB
	
    <geometric mean>                             3,599kB	   3,631kB
    <arithmetic mean>                            6,096kB	   6,275kB
    <harmonic mean>                              2,329kB	   2,312kB
Comment 8 Michael Saboff 2017-04-27 17:04:17 PDT
Here are the full set of MallocBench tests:
                                                      Baseline	       WithPatch
Execution Time:	
    churn                                                 56ms	            56ms
    list_allocate                                         46ms	            46ms
    tree_allocate                                         53ms	            53ms
    tree_churn                                            51ms	            51ms
    fragment                                              41ms	            42ms
    fragment_iterate                                      40ms	            40ms
    medium                                               191ms	           190ms
    big                                                   53ms	            64ms
    facebook                                             139ms	           137ms
    reddit                                                86ms	            86ms
    flickr                                                89ms	            89ms
    theverge                                             109ms	           108ms
    nimlang                                               81ms	            81ms
    message_one                                          165ms	           165ms
    message_many                                         211ms	           212ms
    churn --parallel                                      39ms	            42ms
    list_allocate --parallel                              82ms	            82ms
    tree_allocate --parallel                             106ms	            98ms
    tree_churn --parallel                                 80ms	            69ms
    fragment --parallel                                   46ms	            39ms
    fragment_iterate --parallel                           19ms	            17ms
    medium --parallel                                    112ms	           114ms
    big --parallel                                        90ms	            84ms
    facebook --parallel                                  824ms	           860ms
    reddit --parallel                                    480ms	           470ms
    flickr --parallel                                    484ms	           497ms
    theverge --parallel                                  641ms	           669ms
	
    <geometric mean>                                     100ms	           100ms
    <arithmetic mean>                                    163ms	           165ms
    <harmonic mean>                                       72ms	            71ms
	
Peak Memory:	
    churn                                              1,760kB	         1,760kB
    list_allocate                                      3,024kB	         3,024kB
    tree_allocate                                      6,384kB	         6,496kB
    tree_churn                                         5,824kB	         5,824kB
    fragment                                           7,904kB	         7,952kB
    fragment_iterate                                  26,672kB	        26,640kB
    medium                                         1,194,560kB	     1,193,968kB
    big                                            1,088,224kB	     1,088,192kB
    facebook                                          74,496kB	        74,592kB
    reddit                                            15,648kB	        15,712kB
    flickr                                            28,000kB	        28,000kB
    theverge                                          28,656kB	        28,656kB
    nimlang                                          178,016kB	       178,096kB
    message_one                                        6,336kB	         6,256kB
    message_many                                      23,776kB	        22,720kB
    churn --parallel                                   2,128kB	         2,048kB
    list_allocate --parallel                           3,728kB	         3,664kB
    tree_allocate --parallel                           5,632kB	         5,504kB
    tree_churn --parallel                              5,216kB	         5,280kB
    fragment --parallel                                8,096kB	         8,144kB
    fragment_iterate --parallel                       26,960kB	        27,056kB
    medium --parallel                              1,159,216kB	     1,162,448kB
    big --parallel                                 1,046,768kB	     1,064,368kB
    facebook --parallel                              407,200kB	       407,456kB
    reddit --parallel                                 79,888kB	        80,064kB
    flickr --parallel                                147,472kB	       148,336kB
    theverge --parallel                              157,200kB	       159,744kB
	
    <geometric mean>                                  32,446kB	        32,394kB
    <arithmetic mean>                                212,548kB	       213,407kB
    <harmonic mean>                                    8,869kB	         8,799kB
	
Memory at End:	
    churn                                                864kB	           864kB
    list_allocate                                        864kB	           864kB
    tree_allocate                                        816kB	           928kB
    tree_churn                                           928kB	           928kB
    fragment                                             880kB	           928kB
    fragment_iterate                                   1,216kB	         1,184kB
    medium                                            19,824kB	        19,712kB
    big                                                2,112kB	         2,080kB
    facebook                                           1,328kB	         1,360kB
    reddit                                             1,024kB	         1,088kB
    flickr                                             1,152kB	         1,152kB
    theverge                                           1,264kB	         1,264kB
    nimlang                                           59,712kB	        59,792kB
    message_one                                        1,264kB	         1,168kB
    message_many                                       1,888kB	         2,000kB
    churn --parallel                                   1,248kB	         1,168kB
    list_allocate --parallel                           1,504kB	         1,440kB
    tree_allocate --parallel                           1,680kB	         1,760kB
    tree_churn --parallel                              2,304kB	         2,192kB
    fragment --parallel                                2,464kB	         2,160kB
    fragment_iterate --parallel                        1,680kB	         1,696kB
    medium --parallel                                 25,328kB	        27,472kB
    big --parallel                                    24,800kB	        22,784kB
    facebook --parallel                               15,392kB	        16,016kB
    reddit --parallel                                 10,512kB	        10,656kB
    flickr --parallel                                 14,144kB	        13,840kB
    theverge --parallel                               11,152kB	        11,424kB
	
    <geometric mean>                                   2,913kB	         2,915kB
    <arithmetic mean>                                  7,679kB	         7,701kB
    <harmonic mean>                                    1,717kB	         1,727kB
Comment 9 Michael Saboff 2017-04-27 17:37:52 PDT
Committed r215909: <http://trac.webkit.org/changeset/215909>