RESOLVED FIXED 150218
bmalloc: per-thread cache data structure should be smaller
https://bugs.webkit.org/show_bug.cgi?id=150218
Geoffrey Garen
Reported 2015-10-15 17:56:39 PDT
bmalloc: per-thread cache data structure should be smaller
Attachments
Patch (9.67 KB, patch)
2015-10-15 18:00 PDT, Geoffrey Garen
kling: review+
Geoffrey Garen
Comment 1 2015-10-15 18:00:04 PDT
Geoffrey Garen
Comment 2 2015-10-15 18:00:28 PDT
~/OpenSource/WebKitBuild> ~/OpenSource/PerformanceTests/MallocBench/run-malloc-benchmarks Baseline:~/OpenSource/WebKitBuild/ReleaseBaseline/ Patch:~/OpenSource/WebKitBuild/Release/

                                    Baseline         Patch    Δ
Execution Time:
    churn                               72ms          67ms    ^ 1.07x faster
    list_allocate                       69ms          67ms    ^ 1.03x faster
    tree_allocate                       66ms          65ms    ^ 1.02x faster
    tree_churn                          76ms          75ms    ^ 1.01x faster
    fragment                            61ms          61ms
    fragment_iterate                    51ms          51ms
    medium                             165ms         164ms    ^ 1.01x faster
    big                                123ms         124ms    ! 1.01x slower
    facebook                           152ms         153ms    ! 1.01x slower
    reddit                              72ms          72ms
    flickr                              79ms          79ms
    theverge                            95ms          97ms    ! 1.02x slower
    message_one                        199ms         200ms    ! 1.01x slower
    message_many                       934ms         937ms    ! 1.0x slower
    churn --parallel                    88ms          90ms    ! 1.02x slower
    list_allocate --parallel           223ms         224ms    ! 1.0x slower
    tree_allocate --parallel         1,183ms       1,191ms    ! 1.01x slower
    tree_churn --parallel            1,184ms       1,219ms    ! 1.03x slower
    fragment --parallel                108ms         111ms    ! 1.03x slower
    fragment_iterate --parallel         13ms          12ms    ^ 1.08x faster
    medium --parallel                  264ms         267ms    ! 1.01x slower
    big --parallel                      83ms          83ms

    <geometric mean>                   127ms         127ms    ^ 1.0x faster
    <arithmetic mean>                  244ms         246ms    ! 1.01x slower
    <harmonic mean>                     80ms          78ms    ^ 1.03x faster

Peak Memory:
    churn                              900kB         888kB    ^ 1.01x smaller
    list_allocate                    2,204kB       2,192kB    ^ 1.01x smaller
    tree_allocate                    5,632kB       5,620kB    ^ 1.0x smaller
    tree_churn                       4,900kB       4,912kB    ! 1.0x bigger
    fragment                         7,160kB       7,148kB    ^ 1.0x smaller
    fragment_iterate                25,928kB      25,916kB    ^ 1.0x smaller
    medium                       1,070,432kB   1,070,420kB    ^ 1.0x smaller
    big                          1,062,424kB   1,062,412kB    ^ 1.0x smaller
    facebook                        77,648kB      77,620kB    ^ 1.0x smaller
    reddit                          15,084kB      15,076kB    ^ 1.0x smaller
    flickr                          27,488kB      27,484kB    ^ 1.0x smaller
    theverge                        28,716kB      28,700kB    ^ 1.0x smaller
    message_one                      4,568kB       4,556kB    ^ 1.0x smaller
    message_many                     2,900kB       2,852kB    ^ 1.02x smaller
    churn --parallel                 1,960kB       1,668kB    ^ 1.18x smaller
    list_allocate --parallel         3,412kB       3,124kB    ^ 1.09x smaller
    tree_allocate --parallel        13,764kB      12,996kB    ^ 1.06x smaller
    tree_churn --parallel           13,524kB      13,204kB    ^ 1.02x smaller
    fragment --parallel              7,296kB       7,040kB    ^ 1.04x smaller
    fragment_iterate --parallel     27,020kB      26,560kB    ^ 1.02x smaller
    medium --parallel            1,042,576kB   1,040,032kB    ^ 1.0x smaller
    big --parallel               1,011,296kB     996,568kB    ^ 1.01x smaller

    <geometric mean>                19,877kB      19,481kB    ^ 1.02x smaller
    <arithmetic mean>              202,583kB     201,681kB    ^ 1.0x smaller
    <harmonic mean>                  5,546kB       5,341kB    ^ 1.04x smaller

Memory at End:
    churn                              500kB         488kB    ^ 1.02x smaller
    list_allocate                      520kB         508kB    ^ 1.02x smaller
    tree_allocate                      612kB         600kB    ^ 1.02x smaller
    tree_churn                         584kB         572kB    ^ 1.02x smaller
    fragment                           612kB         600kB    ^ 1.02x smaller
    fragment_iterate                   940kB         928kB    ^ 1.01x smaller
    medium                           6,804kB       6,792kB    ^ 1.0x smaller
    big                              6,780kB       7,136kB    ! 1.05x bigger
    facebook                         3,476kB       3,456kB    ^ 1.01x smaller
    reddit                           2,136kB       2,124kB    ^ 1.01x smaller
    flickr                           3,192kB       3,172kB    ^ 1.01x smaller
    theverge                         3,252kB       3,240kB    ^ 1.0x smaller
    message_one                        904kB         860kB    ^ 1.05x smaller
    message_many                     1,296kB       1,312kB    ! 1.01x bigger
    churn --parallel                 1,544kB       1,260kB    ^ 1.23x smaller
    list_allocate --parallel         1,992kB       1,740kB    ^ 1.14x smaller
    tree_allocate --parallel         2,488kB       2,084kB    ^ 1.19x smaller
    tree_churn --parallel            4,896kB       4,644kB    ^ 1.05x smaller
    fragment --parallel              1,840kB       1,544kB    ^ 1.19x smaller
    fragment_iterate --parallel      2,232kB       1,932kB    ^ 1.16x smaller
    medium --parallel                7,672kB       7,408kB    ^ 1.04x smaller
    big --parallel                   7,492kB       7,112kB    ^ 1.05x smaller

    <geometric mean>                 1,932kB       1,838kB    ^ 1.05x smaller
    <arithmetic mean>                2,807kB       2,705kB    ^ 1.04x smaller
    <harmonic mean>                  1,331kB       1,273kB    ^ 1.05x smaller

=====
~/OpenSource/WebKitBuild>
Geoffrey Garen
Comment 3 2015-10-15 18:02:18 PDT
So, a bit faster for single-threaded code, a bit slower for 24-wide multi-threaded code, and definitively smaller. I think this is a good tradeoff.
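For readers checking the numbers behind this summary: the Δ entries in the benchmark output appear to be the ratio of the larger value to the smaller one, rounded to two decimal places, with `^` marking an improvement and `!` a regression. The sketch below reproduces a few of the reported entries under that assumption. The function name `delta` and its parameters are hypothetical and are not taken from the `run-malloc-benchmarks` script.

```python
def delta(baseline: float, patch: float,
          better: str = "faster", worse: str = "slower") -> str:
    """Format a delta cell, assuming it is larger/smaller rounded to 2 places.

    Hypothetical helper for illustration; not the benchmark script's code.
    """
    if baseline == patch:
        return ""  # equal values are shown with an empty delta cell
    if patch < baseline:
        return f"^ {round(baseline / patch, 2)}x {better}"
    return f"! {round(patch / baseline, 2)}x {worse}"


# Execution time: churn, 72ms -> 67ms
print(delta(72, 67))                          # ^ 1.07x faster
# Execution time: tree_churn --parallel, 1,184ms -> 1,219ms
print(delta(1184, 1219))                      # ! 1.03x slower
# Memory at end: churn --parallel, 1,544kB -> 1,260kB
print(delta(1544, 1260, "smaller", "bigger")) # ^ 1.23x smaller
```

Ratios that round to 1.00 print as `1.0x`, which matches entries such as `message_many` in the execution-time results. The mean rows cannot be checked this way, since they are presumably computed from unrounded values.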
Andreas Kling
Comment 4 2015-10-15 20:45:13 PDT
Comment on attachment 263234 (Patch)
r=me, awesome!
Geoffrey Garen
Comment 5 2015-10-16 13:00:04 PDT