Bug 172880 - bmalloc: Small and large objects should share memory
Summary: bmalloc: Small and large objects should share memory
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Geoffrey Garen
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-06-02 16:35 PDT by Geoffrey Garen
Modified: 2017-06-05 21:40 PDT (History)
4 users (show)

See Also:


Attachments
Patch (25.58 KB, patch)
2017-06-02 16:52 PDT, Geoffrey Garen
sam: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Geoffrey Garen 2017-06-02 16:35:10 PDT
bmalloc: Small and large objects should share memory
Comment 1 Geoffrey Garen 2017-06-02 16:52:59 PDT
Created attachment 311885 [details]
Patch
Comment 2 Geoffrey Garen 2017-06-02 16:54:27 PDT
Benchmark results:

~/OpenSource/Source/bmalloc> ~/OpenSource/PerformanceTests/MallocBench/run-malloc-benchmarks --memory_warning Baseline:~/OpenSource/WebKitBuildBaseline/Release/ Patch:~/OpenSource/WebKitBuild/Release/
                                                                                     
                                                                Baseline                               Patch                                   Δ
Peak Memory:
    reddit_memory_warning --runs 0                              15,320kB                            15,268kB                      ^ 1.0x smaller
    flickr_memory_warning --runs 0                              27,980kB                            27,296kB                     ^ 1.03x smaller
    theverge_memory_warning --runs 0                            28,256kB                            28,300kB                       ! 1.0x bigger

    <geometric mean>                                            22,965kB                            22,763kB                     ^ 1.01x smaller
    <arithmetic mean>                                           23,852kB                            23,621kB                     ^ 1.01x smaller
    <harmonic mean>                                             21,993kB                            21,823kB                     ^ 1.01x smaller

Memory at End:
    reddit_memory_warning --runs 0                               8,880kB                             8,804kB                     ^ 1.01x smaller
    flickr_memory_warning --runs 0                              12,144kB                            12,140kB                      ^ 1.0x smaller
    theverge_memory_warning --runs 0                            20,900kB                            20,944kB                       ! 1.0x bigger

    <geometric mean>                                            13,111kB                            13,081kB                      ^ 1.0x smaller
    <arithmetic mean>                                           13,975kB                            13,963kB                      ^ 1.0x smaller
    <harmonic mean>                                             12,356kB                            12,310kB                      ^ 1.0x smaller

=====
~/OpenSource/Source/bmalloc> ~/OpenSource/PerformanceTests/MallocBench/run-malloc-benchmarks --memory Baseline:~/OpenSource/WebKitBuildBaseline/Release/ Patch:~/OpenSource/WebKitBuild/Release/
                                                                                
                                  Baseline                Patch                    Δ
Peak Memory:
    facebook --runs 0             73,492kB             75,292kB       ! 1.02x bigger
    reddit --runs 0               15,360kB             15,348kB       ^ 1.0x smaller
    flickr --runs 0               27,856kB             28,208kB       ! 1.01x bigger
    theverge --runs 0             28,304kB             28,888kB       ! 1.02x bigger
    nimlang --runs 0             119,808kB            107,236kB      ^ 1.12x smaller

    <geometric mean>              40,325kB             39,888kB      ^ 1.01x smaller
    <arithmetic mean>             52,964kB             50,994kB      ^ 1.04x smaller
    <harmonic mean>               31,588kB             31,680kB        ! 1.0x bigger

Memory at End:
    facebook --runs 0              2,764kB              2,392kB      ^ 1.16x smaller
    reddit --runs 0                1,824kB              1,632kB      ^ 1.12x smaller
    flickr --runs 0                2,800kB              2,532kB      ^ 1.11x smaller
    theverge --runs 0              2,852kB              2,560kB      ^ 1.11x smaller
    nimlang --runs 0               8,556kB              8,392kB      ^ 1.02x smaller

    <geometric mean>               3,217kB              2,920kB       ^ 1.1x smaller
    <arithmetic mean>              3,759kB              3,502kB      ^ 1.07x smaller
    <harmonic mean>                2,882kB              2,583kB      ^ 1.12x smaller

=====
~/OpenSource/Source/bmalloc> 



~/OpenSource/Source/bmalloc>  ~/OpenSource/PerformanceTests/MallocBench/run-malloc-benchmarks Baseline:~/OpenSource/WebKitBuildBaseline/Release/ Patch:~/OpenSource/WebKitBuild/Release/
                                                                                
                                                      Baseline                          Patch                              Δ
Execution Time:
    churn                                                 81ms                           81ms                               
    list_allocate                                         75ms                           74ms                 ^ 1.01x faster
    tree_allocate                                         80ms                           83ms                 ! 1.04x slower
    tree_churn                                            99ms                           90ms                  ^ 1.1x faster
    fragment                                              72ms                           74ms                 ! 1.03x slower
    fragment_iterate                                      80ms                           81ms                 ! 1.01x slower
    medium                                               157ms                          164ms                 ! 1.04x slower
    big                                                  138ms                          143ms                 ! 1.04x slower
    facebook                                             228ms                          234ms                 ! 1.03x slower
    reddit                                               115ms                          119ms                 ! 1.03x slower
    flickr                                               125ms                          118ms                 ^ 1.06x faster
    theverge                                             154ms                          157ms                 ! 1.02x slower
    nimlang                                              127ms                          125ms                 ^ 1.02x faster
    message_one                                          195ms                          192ms                 ^ 1.02x faster
    message_many                                         120ms                          125ms                 ! 1.04x slower
    churn --parallel                                      38ms                           38ms                               
    list_allocate --parallel                              71ms                           71ms                               
    tree_allocate --parallel                              87ms                           87ms                               
    tree_churn --parallel                                 91ms                           90ms                 ^ 1.01x faster
    fragment --parallel                                   54ms                           55ms                 ! 1.02x slower
    fragment_iterate --parallel                           34ms                           34ms                               
    medium --parallel                                    167ms                          159ms                 ^ 1.05x faster
    big --parallel                                       159ms                          155ms                 ^ 1.03x faster
    facebook --parallel                                  676ms                          642ms                 ^ 1.05x faster
    reddit --parallel                                    327ms                          321ms                 ^ 1.02x faster
    flickr --parallel                                    324ms                          321ms                 ^ 1.01x faster
    theverge --parallel                                  441ms                          431ms                 ^ 1.02x faster

    <geometric mean>                                     123ms                          123ms                  ^ 1.0x faster
    <arithmetic mean>                                    160ms                          158ms                 ^ 1.01x faster
    <harmonic mean>                                      100ms                          100ms                  ! 1.0x slower

Peak Memory:
    churn                                              2,308kB                        2,300kB                 ^ 1.0x smaller
    list_allocate                                      3,580kB                        3,588kB                  ! 1.0x bigger
    tree_allocate                                      6,944kB                        7,400kB                 ! 1.07x bigger
    tree_churn                                         6,184kB                        6,208kB                  ! 1.0x bigger
    fragment                                           8,484kB                        9,452kB                 ! 1.11x bigger
    fragment_iterate                                  27,072kB                       27,112kB                  ! 1.0x bigger
    medium                                         1,183,944kB                    1,190,800kB                 ! 1.01x bigger
    big                                            1,085,132kB                    1,090,592kB                 ! 1.01x bigger
    facebook                                          74,412kB                       81,216kB                 ! 1.09x bigger
    reddit                                            15,432kB                       15,388kB                 ^ 1.0x smaller
    flickr                                            27,936kB                       29,384kB                 ! 1.05x bigger
    theverge                                          28,436kB                       28,952kB                 ! 1.02x bigger
    nimlang                                          175,228kB                      167,908kB                ^ 1.04x smaller
    message_one                                        6,828kB                        6,624kB                ^ 1.03x smaller
    message_many                                       4,712kB                        4,604kB                ^ 1.02x smaller
    churn --parallel                                   2,412kB                        2,396kB                ^ 1.01x smaller
    list_allocate --parallel                           3,644kB                        3,692kB                 ! 1.01x bigger
    tree_allocate --parallel                           4,748kB                        4,768kB                  ! 1.0x bigger
    tree_churn --parallel                              4,400kB                        4,420kB                  ! 1.0x bigger
    fragment --parallel                                8,584kB                        9,556kB                 ! 1.11x bigger
    fragment_iterate --parallel                       27,196kB                       27,996kB                 ! 1.03x bigger
    medium --parallel                              1,182,568kB                    1,192,292kB                 ! 1.01x bigger
    big --parallel                                 1,082,116kB                    1,083,532kB                  ! 1.0x bigger
    facebook --parallel                              280,920kB                      280,972kB                  ! 1.0x bigger
    reddit --parallel                                 53,828kB                       56,428kB                 ! 1.05x bigger
    flickr --parallel                                102,724kB                      103,980kB                 ! 1.01x bigger
    theverge --parallel                              107,100kB                      110,152kB                 ! 1.03x bigger

    <geometric mean>                                  29,516kB                       30,058kB                 ! 1.02x bigger
    <arithmetic mean>                                204,329kB                      205,619kB                 ! 1.01x bigger
    <harmonic mean>                                    8,965kB                        9,059kB                 ! 1.01x bigger

Memory at End:
    churn                                                492kB                          472kB                ^ 1.04x smaller
    list_allocate                                        488kB                          472kB                ^ 1.03x smaller
    tree_allocate                                        516kB                          464kB                ^ 1.11x smaller
    tree_churn                                           480kB                          456kB                ^ 1.05x smaller
    fragment                                             528kB                          472kB                ^ 1.12x smaller
    fragment_iterate                                     684kB                          472kB                ^ 1.45x smaller
    medium                                            12,088kB                          532kB               ^ 22.72x smaller
    big                                                1,276kB                          536kB                ^ 2.38x smaller
    facebook                                           2,896kB                        2,528kB                ^ 1.15x smaller
    reddit                                             1,888kB                        1,664kB                ^ 1.13x smaller
    flickr                                             2,872kB                        2,664kB                ^ 1.08x smaller
    theverge                                           2,972kB                        2,624kB                ^ 1.13x smaller
    nimlang                                           58,248kB                       58,644kB                 ! 1.01x bigger
    message_one                                          780kB                          748kB                ^ 1.04x smaller
    message_many                                       1,196kB                        1,212kB                 ! 1.01x bigger
    churn --parallel                                     576kB                          560kB                ^ 1.03x smaller
    list_allocate --parallel                             604kB                          632kB                 ! 1.05x bigger
    tree_allocate --parallel                             768kB                          804kB                 ! 1.05x bigger
    tree_churn --parallel                                872kB                          812kB                ^ 1.07x smaller
    fragment --parallel                                  804kB                          824kB                 ! 1.02x bigger
    fragment_iterate --parallel                          852kB                          656kB                 ^ 1.3x smaller
    medium --parallel                                 17,084kB                        8,472kB                ^ 2.02x smaller
    big --parallel                                    19,708kB                       30,400kB                 ! 1.54x bigger
    facebook --parallel                               11,916kB                       11,984kB                 ! 1.01x bigger
    reddit --parallel                                  6,880kB                        7,044kB                 ! 1.02x bigger
    flickr --parallel                                 11,620kB                       11,564kB                 ^ 1.0x smaller
    theverge --parallel                               11,320kB                       11,048kB                ^ 1.02x smaller

    <geometric mean>                                   2,098kB                        1,698kB                ^ 1.24x smaller
    <arithmetic mean>                                  6,311kB                        5,880kB                ^ 1.07x smaller
    <harmonic mean>                                    1,089kB                          911kB                 ^ 1.2x smaller
Comment 3 Sam Weinig 2017-06-03 13:09:28 PDT
Comment on attachment 311885 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=311885&action=review

> Source/bmalloc/ChangeLog:10
> +        This reduces our high-watermark memory usage on JetStream on macOS
> +        by 10%-20%.

Very nice!

Probably worth stating there performance change or that it had no effect.

Also, I would find it useful (and future readers of the code might as well) to have an explanation up top of the changes in the algorithm,

> Source/bmalloc/ChangeLog:20
> +        (bmalloc::forEachPage):A new helper function for iterating the pages

Missing space after the : which is personally killing me.

> Source/bmalloc/ChangeLog:47
> +        (bmalloc::Heap::deallocateSmallLine):  Updated for new APIs. Note that
> +        we save one chunk per page class a little cache. This avoids churn

I don't follow the second sentence.  I feel like maybe a word or some punctuation is missing.

> Source/bmalloc/bmalloc/Chunk.h:85
> +    for (Object it = begin; it + pageSize <= end; it = it + pageSize)

I'm not sure what your auto usage is like in bmalloc, but, you could replace Object it with auto it.

> Source/bmalloc/bmalloc/Heap.cpp:172
> +        for (auto chunk = m_freePages[pageClass].begin(); chunk != m_freePages[pageClass].end(); ++chunk) {
> +            for (auto page = chunk->freePages().begin(); page != chunk->freePages().end(); ++page) {

I think these two loops could be simplified to:

for (auto& chunk : m_freePages[pageClass]) {
    for (auto& page = chunk.freePages()) {
        ...
    }
}

> Source/bmalloc/bmalloc/Heap.cpp:183
> +    for (size_t pageClass = 0; pageClass < pageClassCount; pageClass++) {

We usually use ++pageClass.

> Source/bmalloc/bmalloc/LargeRange.h:54
> +    size_t physicalSize() const { return m_physicalSize; } // True physical size can be larger.

This comment is confusing. Perhaps you could clarify what "True physical size" is and why it can be larger.
Comment 4 Geoffrey Garen 2017-06-05 19:21:13 PDT
Committed r217811: <http://trac.webkit.org/changeset/217811>
Comment 5 Carlos Alberto Lopez Perez 2017-06-05 21:40:42 PDT
(In reply to Geoffrey Garen from comment #4)
> Committed r217811: <http://trac.webkit.org/changeset/217811>

It seems this fails to build with GCC-4.9 (our minimum version of the supported compiler on WebKitGTK+).

The log: https://build.webkit.org/builders/GTK%20Linux%2064-bit%20Release%20Debian%20Stable%20%28Build%29/builds/2454/steps/compile-webkit/logs/stdio