Bug 150218 - bmalloc: per-thread cache data structure should be smaller
Summary: bmalloc: per-thread cache data structure should be smaller
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
Importance: P2 Normal
Assignee: Geoffrey Garen
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-10-15 17:56 PDT by Geoffrey Garen
Modified: 2015-10-16 13:00 PDT

Attachments
Patch (9.67 KB, patch)
2015-10-15 18:00 PDT, Geoffrey Garen
kling: review+

Description Geoffrey Garen 2015-10-15 17:56:39 PDT
bmalloc: per-thread cache data structure should be smaller
Comment 1 Geoffrey Garen 2015-10-15 18:00:04 PDT
Created attachment 263234
Patch
Comment 2 Geoffrey Garen 2015-10-15 18:00:28 PDT
~/OpenSource/WebKitBuild> ~/OpenSource/PerformanceTests/MallocBench/run-malloc-benchmarks Baseline:~/OpenSource/WebKitBuild/ReleaseBaseline/ Patch:~/OpenSource/WebKitBuild/Release/
                                                                                
                                                      Baseline                          Patch                              Δ
Execution Time:
    churn                                                 72ms                           67ms                 ^ 1.07x faster
    list_allocate                                         69ms                           67ms                 ^ 1.03x faster
    tree_allocate                                         66ms                           65ms                 ^ 1.02x faster
    tree_churn                                            76ms                           75ms                 ^ 1.01x faster
    fragment                                              61ms                           61ms                               
    fragment_iterate                                      51ms                           51ms                               
    medium                                               165ms                          164ms                 ^ 1.01x faster
    big                                                  123ms                          124ms                 ! 1.01x slower
    facebook                                             152ms                          153ms                 ! 1.01x slower
    reddit                                                72ms                           72ms                               
    flickr                                                79ms                           79ms                               
    theverge                                              95ms                           97ms                 ! 1.02x slower
    message_one                                          199ms                          200ms                 ! 1.01x slower
    message_many                                         934ms                          937ms                  ! 1.0x slower
    churn --parallel                                      88ms                           90ms                 ! 1.02x slower
    list_allocate --parallel                             223ms                          224ms                  ! 1.0x slower
    tree_allocate --parallel                           1,183ms                        1,191ms                 ! 1.01x slower
    tree_churn --parallel                              1,184ms                        1,219ms                 ! 1.03x slower
    fragment --parallel                                  108ms                          111ms                 ! 1.03x slower
    fragment_iterate --parallel                           13ms                           12ms                 ^ 1.08x faster
    medium --parallel                                    264ms                          267ms                 ! 1.01x slower
    big --parallel                                        83ms                           83ms                               

    <geometric mean>                                     127ms                          127ms                  ^ 1.0x faster
    <arithmetic mean>                                    244ms                          246ms                 ! 1.01x slower
    <harmonic mean>                                       80ms                           78ms                 ^ 1.03x faster

Peak Memory:
    churn                                                900kB                          888kB                ^ 1.01x smaller
    list_allocate                                      2,204kB                        2,192kB                ^ 1.01x smaller
    tree_allocate                                      5,632kB                        5,620kB                 ^ 1.0x smaller
    tree_churn                                         4,900kB                        4,912kB                  ! 1.0x bigger
    fragment                                           7,160kB                        7,148kB                 ^ 1.0x smaller
    fragment_iterate                                  25,928kB                       25,916kB                 ^ 1.0x smaller
    medium                                         1,070,432kB                    1,070,420kB                 ^ 1.0x smaller
    big                                            1,062,424kB                    1,062,412kB                 ^ 1.0x smaller
    facebook                                          77,648kB                       77,620kB                 ^ 1.0x smaller
    reddit                                            15,084kB                       15,076kB                 ^ 1.0x smaller
    flickr                                            27,488kB                       27,484kB                 ^ 1.0x smaller
    theverge                                          28,716kB                       28,700kB                 ^ 1.0x smaller
    message_one                                        4,568kB                        4,556kB                 ^ 1.0x smaller
    message_many                                       2,900kB                        2,852kB                ^ 1.02x smaller
    churn --parallel                                   1,960kB                        1,668kB                ^ 1.18x smaller
    list_allocate --parallel                           3,412kB                        3,124kB                ^ 1.09x smaller
    tree_allocate --parallel                          13,764kB                       12,996kB                ^ 1.06x smaller
    tree_churn --parallel                             13,524kB                       13,204kB                ^ 1.02x smaller
    fragment --parallel                                7,296kB                        7,040kB                ^ 1.04x smaller
    fragment_iterate --parallel                       27,020kB                       26,560kB                ^ 1.02x smaller
    medium --parallel                              1,042,576kB                    1,040,032kB                 ^ 1.0x smaller
    big --parallel                                 1,011,296kB                      996,568kB                ^ 1.01x smaller

    <geometric mean>                                  19,877kB                       19,481kB                ^ 1.02x smaller
    <arithmetic mean>                                202,583kB                      201,681kB                 ^ 1.0x smaller
    <harmonic mean>                                    5,546kB                        5,341kB                ^ 1.04x smaller

Memory at End:
    churn                                                500kB                          488kB                ^ 1.02x smaller
    list_allocate                                        520kB                          508kB                ^ 1.02x smaller
    tree_allocate                                        612kB                          600kB                ^ 1.02x smaller
    tree_churn                                           584kB                          572kB                ^ 1.02x smaller
    fragment                                             612kB                          600kB                ^ 1.02x smaller
    fragment_iterate                                     940kB                          928kB                ^ 1.01x smaller
    medium                                             6,804kB                        6,792kB                 ^ 1.0x smaller
    big                                                6,780kB                        7,136kB                 ! 1.05x bigger
    facebook                                           3,476kB                        3,456kB                ^ 1.01x smaller
    reddit                                             2,136kB                        2,124kB                ^ 1.01x smaller
    flickr                                             3,192kB                        3,172kB                ^ 1.01x smaller
    theverge                                           3,252kB                        3,240kB                 ^ 1.0x smaller
    message_one                                          904kB                          860kB                ^ 1.05x smaller
    message_many                                       1,296kB                        1,312kB                 ! 1.01x bigger
    churn --parallel                                   1,544kB                        1,260kB                ^ 1.23x smaller
    list_allocate --parallel                           1,992kB                        1,740kB                ^ 1.14x smaller
    tree_allocate --parallel                           2,488kB                        2,084kB                ^ 1.19x smaller
    tree_churn --parallel                              4,896kB                        4,644kB                ^ 1.05x smaller
    fragment --parallel                                1,840kB                        1,544kB                ^ 1.19x smaller
    fragment_iterate --parallel                        2,232kB                        1,932kB                ^ 1.16x smaller
    medium --parallel                                  7,672kB                        7,408kB                ^ 1.04x smaller
    big --parallel                                     7,492kB                        7,112kB                ^ 1.05x smaller

    <geometric mean>                                   1,932kB                        1,838kB                ^ 1.05x smaller
    <arithmetic mean>                                  2,807kB                        2,705kB                ^ 1.04x smaller
    <harmonic mean>                                    1,331kB                        1,273kB                ^ 1.05x smaller

=====
~/OpenSource/WebKitBuild>
Comment 3 Geoffrey Garen 2015-10-15 18:02:18 PDT
So, a bit faster for single-threaded code, a bit slower for 24-wide multi-threaded code, and definitely smaller. I think this is a good tradeoff.
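For readers checking the Δ column: the figures in the output above appear to be the ratio of the larger value to the smaller one, rounded to two decimals, with "^" marking an improvement and "!" a regression. The following is a minimal Python sketch under that assumption. The function name `delta` and its parameters are hypothetical and are not taken from run-malloc-benchmarks. The real tool presumably works from unrounded measurements, so rows whose displayed values are equal (for example the geometric mean of execution time) cannot be reproduced from the table alone.

```python
def delta(baseline: float, patch: float,
          better: str = "faster", worse: str = "slower") -> str:
    """Describe `patch` relative to `baseline`, assuming lower is better.

    Hypothetical helper: mirrors the apparent format of the delta column,
    not the actual implementation in run-malloc-benchmarks.
    """
    if baseline == patch:
        return ""  # the table leaves the column blank when nothing changed
    if patch < baseline:
        # Improvement: report how many times larger the baseline value was.
        return f"^ {round(baseline / patch, 2)}x {better}"
    # Regression: report how many times larger the patched value is.
    return f"! {round(patch / baseline, 2)}x {worse}"


# Rows taken from the output above.
print(delta(72, 67))                           # churn, execution time
print(delta(88, 90))                           # churn --parallel, execution time
print(delta(1960, 1668, "smaller", "bigger"))  # churn --parallel, peak memory
```

Applied to the parallel rows, the same arithmetic shows where the saving concentrates: memory at end for churn --parallel goes from 1,544kB to 1,260kB, a ratio of about 1.23, while most single-threaded rows change by about 12kB.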
Comment 4 Andreas Kling 2015-10-15 20:45:13 PDT
Comment on attachment 263234
Patch

r=me, awesome!
Comment 5 Geoffrey Garen 2015-10-16 13:00:04 PDT
Committed r191196: <http://trac.webkit.org/changeset/191196>