RESOLVED FIXED 173552
bmalloc: Add a per-thread line cache
https://bugs.webkit.org/show_bug.cgi?id=173552
Geoffrey Garen
Reported 2017-06-19 10:34:32 PDT
bmalloc: Add a per-thread line cache
Attachments
Patch (14.19 KB, patch)
2017-06-19 20:13 PDT, Geoffrey Garen
darin: review+
Geoffrey Garen
Comment 1 2017-06-19 20:13:14 PDT
Geoffrey Garen
Comment 2 2017-06-19 20:14:11 PDT
MacBook Air MallocBench results:

~/OpenSource/Source/bmalloc> ~/OpenSource/PerformanceTests/MallocBench/run-malloc-benchmarks Baseline:~/OpenSource/WebKitBuildBaseline/Release/ Patch:~/OpenSource/WebKitBuild/Release/

                                 Baseline     Patch        Δ
Execution Time:
    churn                        80ms         78ms         ^ 1.03x faster
    list_allocate                70ms         73ms         ! 1.04x slower
    tree_allocate                74ms         76ms         ! 1.03x slower
    tree_churn                   82ms         81ms         ^ 1.01x faster
    fragment                     70ms         70ms
    fragment_iterate             78ms         77ms         ^ 1.01x faster
    medium                       157ms        160ms        ! 1.02x slower
    big                          139ms        138ms        ^ 1.01x faster
    facebook                     221ms        219ms        ^ 1.01x faster
    reddit                       112ms        113ms        ! 1.01x slower
    flickr                       115ms        117ms        ! 1.02x slower
    theverge                     144ms        150ms        ! 1.04x slower
    nimlang                      119ms        117ms        ^ 1.02x faster
    message_one                  190ms        188ms        ^ 1.01x faster
    message_many                 122ms        119ms        ^ 1.03x faster
    churn --parallel             37ms         37ms
    list_allocate --parallel     68ms         69ms         ! 1.01x slower
    tree_allocate --parallel     84ms         80ms         ^ 1.05x faster
    tree_churn --parallel        83ms         73ms         ^ 1.14x faster
    fragment --parallel          53ms         50ms         ^ 1.06x faster
    fragment_iterate --parallel  33ms         33ms
    medium --parallel            155ms        153ms        ^ 1.01x faster
    big --parallel               148ms        140ms        ^ 1.06x faster
    facebook --parallel          635ms        628ms        ^ 1.01x faster
    reddit --parallel            316ms        277ms        ^ 1.14x faster
    flickr --parallel            312ms        282ms        ^ 1.11x faster
    theverge --parallel          412ms        368ms        ^ 1.12x faster

    <geometric mean>             118ms        115ms        ^ 1.02x faster
    <arithmetic mean>            152ms        147ms        ^ 1.04x faster
    <harmonic mean>              96ms         94ms         ^ 1.02x faster

Peak Memory:
    churn                        2,296kB      2,288kB      ^ 1.0x smaller
    list_allocate                3,584kB      3,588kB      ! 1.0x bigger
    tree_allocate                7,404kB      7,408kB      ! 1.0x bigger
    tree_churn                   6,224kB      6,224kB
    fragment                     9,452kB      9,460kB      ! 1.0x bigger
    fragment_iterate             27,124kB     27,120kB     ^ 1.0x smaller
    medium                       1,190,816kB  1,190,812kB  ^ 1.0x smaller
    big                          1,090,788kB  1,090,588kB  ^ 1.0x smaller
    facebook                     81,136kB     80,624kB     ^ 1.01x smaller
    reddit                       15,412kB     15,412kB
    flickr                       29,324kB     29,320kB     ^ 1.0x smaller
    theverge                     28,976kB     28,940kB     ^ 1.0x smaller
    nimlang                      166,900kB    166,344kB    ^ 1.0x smaller
    message_one                  6,612kB      6,632kB      ! 1.0x bigger
    message_many                 4,272kB      4,444kB      ! 1.04x bigger
    churn --parallel             2,420kB      2,428kB      ! 1.0x bigger
    list_allocate --parallel     3,684kB      3,684kB
    tree_allocate --parallel     4,764kB      4,752kB      ^ 1.0x smaller
    tree_churn --parallel        4,412kB      4,416kB      ! 1.0x bigger
    fragment --parallel          9,560kB      9,576kB      ! 1.0x bigger
    fragment_iterate --parallel  27,984kB     28,004kB     ! 1.0x bigger
    medium --parallel            1,191,240kB  1,193,328kB  ! 1.0x bigger
    big --parallel               1,087,676kB  1,089,688kB  ! 1.0x bigger
    facebook --parallel          286,476kB    284,320kB    ^ 1.01x smaller
    reddit --parallel            56,480kB     56,548kB     ! 1.0x bigger
    flickr --parallel            101,908kB    101,936kB    ! 1.0x bigger
    theverge --parallel          110,156kB    109,852kB    ^ 1.0x smaller

    <geometric mean>             29,976kB     30,008kB     ! 1.0x bigger
    <arithmetic mean>            205,818kB    205,842kB    ! 1.0x bigger
    <harmonic mean>              9,014kB      9,043kB      ! 1.0x bigger

Memory at End:
    churn                        464kB        456kB        ^ 1.02x smaller
    list_allocate                464kB        468kB        ! 1.01x bigger
    tree_allocate                464kB        468kB        ! 1.01x bigger
    tree_churn                   468kB        468kB
    fragment                     468kB        476kB        ! 1.02x bigger
    fragment_iterate             480kB        476kB        ^ 1.01x smaller
    medium                       544kB        540kB        ^ 1.01x smaller
    big                          536kB        536kB
    facebook                     2,444kB      2,444kB
    reddit                       1,684kB      1,684kB
    flickr                       2,600kB      2,596kB      ^ 1.0x smaller
    theverge                     2,644kB      2,608kB      ^ 1.01x smaller
    nimlang                      58,460kB     58,544kB     ! 1.0x bigger
    message_one                  740kB        748kB        ! 1.01x bigger
    message_many                 1,324kB      1,148kB      ^ 1.15x smaller
    churn --parallel             580kB        588kB        ! 1.01x bigger
    list_allocate --parallel     620kB        608kB        ^ 1.02x smaller
    tree_allocate --parallel     820kB        828kB        ! 1.01x bigger
    tree_churn --parallel        868kB        780kB        ^ 1.11x smaller
    fragment --parallel          716kB        1,008kB      ! 1.41x bigger
    fragment_iterate --parallel  652kB        644kB        ^ 1.01x smaller
    medium --parallel            5,752kB      6,744kB      ! 1.17x bigger
    big --parallel               38,308kB     29,200kB     ^ 1.31x smaller
    facebook --parallel          12,392kB     11,956kB     ^ 1.04x smaller
    reddit --parallel            6,972kB      6,808kB      ^ 1.02x smaller
    flickr --parallel            11,432kB     11,524kB     ! 1.01x bigger
    theverge --parallel          10,848kB     10,996kB     ! 1.01x bigger

    <geometric mean>             1,689kB      1,685kB      ^ 1.0x smaller
    <arithmetic mean>            6,065kB      5,753kB      ^ 1.05x smaller
    <harmonic mean>              910kB        916kB        ! 1.01x bigger
Geoffrey Garen
Comment 3 2017-06-19 20:15:24 PDT
Mac Pro MallocBench results:

~/OpenSource/Source/bmalloc> ~/OpenSource/PerformanceTests/MallocBench/run-malloc-benchmarks Baseline:~/OpenSource/WebKitBuildBaseline/Release/ Patch:~/OpenSource/WebKitBuild/Release/

                                 Baseline     Patch        Δ
Execution Time:
    churn                        71ms         71ms
    list_allocate                63ms         65ms         ! 1.03x slower
    tree_allocate                63ms         64ms         ! 1.02x slower
    tree_churn                   76ms         75ms         ^ 1.01x faster
    fragment                     61ms         61ms
    fragment_iterate             66ms         66ms
    medium                       138ms        138ms
    big                          119ms        121ms        ! 1.02x slower
    facebook                     184ms        184ms
    reddit                       100ms        100ms
    flickr                       104ms        105ms        ! 1.01x slower
    theverge                     132ms        133ms        ! 1.01x slower
    nimlang                      117ms        114ms        ^ 1.03x faster
    message_one                  176ms        174ms        ^ 1.01x faster
    message_many                 953ms        911ms        ^ 1.05x faster
    churn --parallel             33ms         32ms         ^ 1.03x faster
    list_allocate --parallel     146ms        116ms        ^ 1.26x faster
    tree_allocate --parallel     805ms        613ms        ^ 1.31x faster
    tree_churn --parallel        1,009ms      354ms        ^ 2.85x faster
    fragment --parallel          82ms         62ms         ^ 1.32x faster
    fragment_iterate --parallel  12ms         12ms
    medium --parallel            119ms        117ms        ^ 1.02x faster
    big --parallel               116ms        115ms        ^ 1.01x faster
    facebook --parallel          4,719ms      4,104ms      ^ 1.15x faster
    reddit --parallel            3,852ms      2,753ms      ^ 1.4x faster
    flickr --parallel            4,126ms      2,532ms      ^ 1.63x faster
    theverge --parallel          4,456ms      3,289ms      ^ 1.35x faster

    <geometric mean>             199ms        177ms        ^ 1.12x faster
    <arithmetic mean>            811ms        610ms        ^ 1.33x faster
    <harmonic mean>              88ms         85ms         ^ 1.03x faster

Peak Memory:
    churn                        1,024kB      1,036kB      ! 1.01x bigger
    list_allocate                2,324kB      2,324kB
    tree_allocate                6,132kB      6,144kB      ! 1.0x bigger
    tree_churn                   4,960kB      4,948kB      ^ 1.0x smaller
    fragment                     8,176kB      8,176kB
    fragment_iterate             25,840kB     25,852kB     ! 1.0x bigger
    medium                       1,189,528kB  1,189,528kB
    big                          1,089,316kB  1,089,548kB  ! 1.0x bigger
    facebook                     79,400kB     79,816kB     ! 1.01x bigger
    reddit                       14,092kB     14,080kB     ^ 1.0x smaller
    flickr                       28,044kB     27,976kB     ^ 1.0x smaller
    theverge                     27,608kB     27,608kB
    nimlang                      165,688kB    166,184kB    ! 1.0x bigger
    message_one                  5,484kB      5,460kB      ^ 1.0x smaller
    message_many                 2,916kB      2,924kB      ! 1.0x bigger
    churn --parallel             1,696kB      1,760kB      ! 1.04x bigger
    list_allocate --parallel     3,136kB      3,296kB      ! 1.05x bigger
    tree_allocate --parallel     12,972kB     12,800kB     ^ 1.01x smaller
    tree_churn --parallel        13,024kB     13,964kB     ! 1.07x bigger
    fragment --parallel          7,216kB      7,528kB      ! 1.04x bigger
    fragment_iterate --parallel  28,176kB     28,348kB     ! 1.01x bigger
    medium --parallel            1,134,232kB  1,159,656kB  ! 1.02x bigger
    big --parallel               1,038,960kB  1,024,420kB  ^ 1.01x smaller
    facebook --parallel          1,582,944kB  1,595,348kB  ! 1.01x bigger
    reddit --parallel            291,172kB    300,924kB    ! 1.03x bigger
    flickr --parallel            550,712kB    555,568kB    ! 1.01x bigger
    theverge --parallel          602,324kB    614,916kB    ! 1.02x bigger

    <geometric mean>             36,479kB     36,866kB     ! 1.01x bigger
    <arithmetic mean>            293,226kB    295,190kB    ! 1.01x bigger
    <harmonic mean>              6,982kB      7,089kB      ! 1.02x bigger

Memory at End:
    churn                        572kB        584kB        ! 1.02x bigger
    list_allocate                584kB        584kB
    tree_allocate                572kB        584kB        ! 1.02x bigger
    tree_churn                   584kB        572kB        ^ 1.02x smaller
    fragment                     572kB        572kB
    fragment_iterate             572kB        584kB        ! 1.02x bigger
    medium                       632kB        632kB
    big                          632kB        636kB        ! 1.01x bigger
    facebook                     2,560kB      2,508kB      ^ 1.02x smaller
    reddit                       1,748kB      1,736kB      ^ 1.01x smaller
    flickr                       2,704kB      2,636kB      ^ 1.03x smaller
    theverge                     2,660kB      2,660kB
    nimlang                      58,548kB     58,440kB     ^ 1.0x smaller
    message_one                  1,012kB      984kB        ^ 1.03x smaller
    message_many                 1,488kB      1,556kB      ! 1.05x bigger
    churn --parallel             1,260kB      1,324kB      ! 1.05x bigger
    list_allocate --parallel     1,596kB      1,652kB      ! 1.04x bigger
    tree_allocate --parallel     2,560kB      2,764kB      ! 1.08x bigger
    tree_churn --parallel        2,836kB      2,520kB      ^ 1.13x smaller
    fragment --parallel          2,608kB      2,828kB      ! 1.08x bigger
    fragment_iterate --parallel  1,716kB      2,128kB      ! 1.24x bigger
    medium --parallel            49,384kB     31,760kB     ^ 1.55x smaller
    big --parallel               79,960kB     77,192kB     ^ 1.04x smaller
    facebook --parallel          40,992kB     39,532kB     ^ 1.04x smaller
    reddit --parallel            30,316kB     29,752kB     ^ 1.02x smaller
    flickr --parallel            37,020kB     34,972kB     ^ 1.06x smaller
    theverge --parallel          32,012kB     32,288kB     ! 1.01x bigger

    <geometric mean>             3,081kB      3,055kB      ^ 1.01x smaller
    <arithmetic mean>            13,248kB     12,370kB     ^ 1.07x smaller
    <harmonic mean>              1,334kB      1,349kB      ! 1.01x bigger
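
A note on reading the Δ column: the printed factors look consistent with dividing the larger of the two values by the smaller and rounding to two decimals (for example, 1,009ms / 354ms ≈ 2.85 for tree_churn --parallel). The sketch below shows only that arithmetic. `deltaRatio` is a hypothetical helper, not a function from run-malloc-benchmarks, and the real script may work from unrounded measurements, so small differences are possible.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper (not part of run-malloc-benchmarks): returns the
// larger value divided by the smaller one, rounded to two decimals.
// Assumes both inputs are positive.
inline double deltaRatio(double baseline, double patch)
{
    double ratio = std::max(baseline, patch) / std::min(baseline, patch);
    return std::round(ratio * 100.0) / 100.0;
}
```

Under this reading, a row such as "1.0x bigger" means the two values differ but the ratio rounds to 1.00.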
Darin Adler
Comment 4 2017-06-19 22:57:54 PDT
Comment on attachment 313353
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=313353&action=review

> Source/bmalloc/bmalloc/List.h:118
> +    static void remove(ListNode<T>* node)

The insertAfter function could also be marked static; why not?

> Source/bmalloc/bmalloc/SmallPage.h:71
> +typedef std::array<List<SmallPage>, sizeClassCount> LineCache;

In new code, we’ve been preferring using to typedef.
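
For illustration, the two review points above can be sketched roughly as follows. This is a minimal intrusive list written for this note, not the actual contents of List.h or SmallPage.h: the members `m_root`, `push`, `head`, and `isEmpty`, the empty `SmallPage` body, and the value of `sizeClassCount` are all assumptions. It shows that a helper which only touches the nodes passed to it can be static, and how the typedef reads as an alias declaration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for illustration; not the real bmalloc types.
template<typename T>
struct ListNode {
    ListNode() : prev(this), next(this) { }
    ListNode<T>* prev;
    ListNode<T>* next;
};

template<typename T>
class List {
public:
    bool isEmpty() const { return m_root.next == &m_root; }

    T* head()
    {
        assert(!isEmpty()); // The root node is not a T.
        return static_cast<T*>(m_root.next);
    }

    // Appends at the tail of the circular list.
    void push(T* node) { insertAfter(m_root.prev, node); }

    // Neither helper below reads m_root, so both can be static.
    static void insertAfter(ListNode<T>* prev, ListNode<T>* node)
    {
        ListNode<T>* next = prev->next;
        node->next = next;
        next->prev = node;
        node->prev = prev;
        prev->next = node;
    }

    static void remove(ListNode<T>* node)
    {
        ListNode<T>* next = node->next;
        ListNode<T>* prev = node->prev;
        next->prev = prev;
        prev->next = next;
        node->prev = node; // Leave the removed node self-linked.
        node->next = node;
    }

private:
    ListNode<T> m_root;
};

struct SmallPage : ListNode<SmallPage> { };

constexpr std::size_t sizeClassCount = 4; // Placeholder value for the sketch.

// Alias declaration equivalent to the typedef quoted in the review.
using LineCache = std::array<List<SmallPage>, sizeClassCount>;
```

With a static `remove`, a page can be unlinked without knowing which list currently holds it, for example `List<SmallPage>::remove(&page)`.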
Geoffrey Garen
Comment 5 2017-06-24 13:14:35 PDT
Saam Barati
Comment 6 2017-06-26 12:04:44 PDT
This is a 4% progression on wasm benchmarks too.