Bug 172880

Summary: bmalloc: Small and large objects should share memory
Product: WebKit Reporter: Geoffrey Garen <ggaren>
Component: New BugsAssignee: Geoffrey Garen <ggaren>
Status: RESOLVED FIXED    
Severity: Normal CC: bugs-noreply, clopez, msaboff, sam
Priority: P2    
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch sam: review+

Geoffrey Garen
Reported 2017-06-02 16:35:10 PDT
bmalloc: Small and large objects should share memory
Attachments
Patch (25.58 KB, patch)
2017-06-02 16:52 PDT, Geoffrey Garen
sam: review+
Geoffrey Garen
Comment 1 2017-06-02 16:52:59 PDT
Geoffrey Garen
Comment 2 2017-06-02 16:54:27 PDT
Benchmark results: ~/OpenSource/Source/bmalloc> ~/OpenSource/PerformanceTests/MallocBench/run-malloc-benchmarks --memory_warning Baseline:~/OpenSource/WebKitBuildBaseline/Release/ Patch:~/OpenSource/WebKitBuild/Release/ Baseline Patch Δ Peak Memory: reddit_memory_warning --runs 0 15,320kB 15,268kB ^ 1.0x smaller flickr_memory_warning --runs 0 27,980kB 27,296kB ^ 1.03x smaller theverge_memory_warning --runs 0 28,256kB 28,300kB ! 1.0x bigger <geometric mean> 22,965kB 22,763kB ^ 1.01x smaller <arithmetic mean> 23,852kB 23,621kB ^ 1.01x smaller <harmonic mean> 21,993kB 21,823kB ^ 1.01x smaller Memory at End: reddit_memory_warning --runs 0 8,880kB 8,804kB ^ 1.01x smaller flickr_memory_warning --runs 0 12,144kB 12,140kB ^ 1.0x smaller theverge_memory_warning --runs 0 20,900kB 20,944kB ! 1.0x bigger <geometric mean> 13,111kB 13,081kB ^ 1.0x smaller <arithmetic mean> 13,975kB 13,963kB ^ 1.0x smaller <harmonic mean> 12,356kB 12,310kB ^ 1.0x smaller ===== ~/OpenSource/Source/bmalloc> ~/OpenSource/PerformanceTests/MallocBench/run-malloc-benchmarks --memory Baseline:~/OpenSource/WebKitBuildBaseline/Release/ Patch:~/OpenSource/WebKitBuild/Release/ Baseline Patch Δ Peak Memory: facebook --runs 0 73,492kB 75,292kB ! 1.02x bigger reddit --runs 0 15,360kB 15,348kB ^ 1.0x smaller flickr --runs 0 27,856kB 28,208kB ! 1.01x bigger theverge --runs 0 28,304kB 28,888kB ! 1.02x bigger nimlang --runs 0 119,808kB 107,236kB ^ 1.12x smaller <geometric mean> 40,325kB 39,888kB ^ 1.01x smaller <arithmetic mean> 52,964kB 50,994kB ^ 1.04x smaller <harmonic mean> 31,588kB 31,680kB ! 1.0x bigger Memory at End: facebook --runs 0 2,764kB 2,392kB ^ 1.16x smaller reddit --runs 0 1,824kB 1,632kB ^ 1.12x smaller flickr --runs 0 2,800kB 2,532kB ^ 1.11x smaller theverge --runs 0 2,852kB 2,560kB ^ 1.11x smaller nimlang --runs 0 8,556kB 8,392kB ^ 1.02x smaller <geometric mean> 3,217kB 2,920kB ^ 1.1x smaller <arithmetic mean> 3,759kB 3,502kB ^ 1.07x smaller <harmonic mean> 2,882kB 2,583kB ^ 1.12x smaller ===== ~/OpenSource/Source/bmalloc> ~/OpenSource/Source/bmalloc> ~/OpenSource/PerformanceTests/MallocBench/run-malloc-benchmarks Baseline:~/OpenSource/WebKitBuildBaseline/Release/ Patch:~/OpenSource/WebKitBuild/Release/ Baseline Patch Δ Execution Time: churn 81ms 81ms list_allocate 75ms 74ms ^ 1.01x faster tree_allocate 80ms 83ms ! 1.04x slower tree_churn 99ms 90ms ^ 1.1x faster fragment 72ms 74ms ! 1.03x slower fragment_iterate 80ms 81ms ! 1.01x slower medium 157ms 164ms ! 1.04x slower big 138ms 143ms ! 1.04x slower facebook 228ms 234ms ! 1.03x slower reddit 115ms 119ms ! 1.03x slower flickr 125ms 118ms ^ 1.06x faster theverge 154ms 157ms ! 1.02x slower nimlang 127ms 125ms ^ 1.02x faster message_one 195ms 192ms ^ 1.02x faster message_many 120ms 125ms ! 1.04x slower churn --parallel 38ms 38ms list_allocate --parallel 71ms 71ms tree_allocate --parallel 87ms 87ms tree_churn --parallel 91ms 90ms ^ 1.01x faster fragment --parallel 54ms 55ms ! 1.02x slower fragment_iterate --parallel 34ms 34ms medium --parallel 167ms 159ms ^ 1.05x faster big --parallel 159ms 155ms ^ 1.03x faster facebook --parallel 676ms 642ms ^ 1.05x faster reddit --parallel 327ms 321ms ^ 1.02x faster flickr --parallel 324ms 321ms ^ 1.01x faster theverge --parallel 441ms 431ms ^ 1.02x faster <geometric mean> 123ms 123ms ^ 1.0x faster <arithmetic mean> 160ms 158ms ^ 1.01x faster <harmonic mean> 100ms 100ms ! 1.0x slower Peak Memory: churn 2,308kB 2,300kB ^ 1.0x smaller list_allocate 3,580kB 3,588kB ! 1.0x bigger tree_allocate 6,944kB 7,400kB ! 1.07x bigger tree_churn 6,184kB 6,208kB ! 1.0x bigger fragment 8,484kB 9,452kB ! 1.11x bigger fragment_iterate 27,072kB 27,112kB ! 1.0x bigger medium 1,183,944kB 1,190,800kB ! 1.01x bigger big 1,085,132kB 1,090,592kB ! 1.01x bigger facebook 74,412kB 81,216kB ! 1.09x bigger reddit 15,432kB 15,388kB ^ 1.0x smaller flickr 27,936kB 29,384kB ! 1.05x bigger theverge 28,436kB 28,952kB ! 1.02x bigger nimlang 175,228kB 167,908kB ^ 1.04x smaller message_one 6,828kB 6,624kB ^ 1.03x smaller message_many 4,712kB 4,604kB ^ 1.02x smaller churn --parallel 2,412kB 2,396kB ^ 1.01x smaller list_allocate --parallel 3,644kB 3,692kB ! 1.01x bigger tree_allocate --parallel 4,748kB 4,768kB ! 1.0x bigger tree_churn --parallel 4,400kB 4,420kB ! 1.0x bigger fragment --parallel 8,584kB 9,556kB ! 1.11x bigger fragment_iterate --parallel 27,196kB 27,996kB ! 1.03x bigger medium --parallel 1,182,568kB 1,192,292kB ! 1.01x bigger big --parallel 1,082,116kB 1,083,532kB ! 1.0x bigger facebook --parallel 280,920kB 280,972kB ! 1.0x bigger reddit --parallel 53,828kB 56,428kB ! 1.05x bigger flickr --parallel 102,724kB 103,980kB ! 1.01x bigger theverge --parallel 107,100kB 110,152kB ! 1.03x bigger <geometric mean> 29,516kB 30,058kB ! 1.02x bigger <arithmetic mean> 204,329kB 205,619kB ! 1.01x bigger <harmonic mean> 8,965kB 9,059kB ! 1.01x bigger Memory at End: churn 492kB 472kB ^ 1.04x smaller list_allocate 488kB 472kB ^ 1.03x smaller tree_allocate 516kB 464kB ^ 1.11x smaller tree_churn 480kB 456kB ^ 1.05x smaller fragment 528kB 472kB ^ 1.12x smaller fragment_iterate 684kB 472kB ^ 1.45x smaller medium 12,088kB 532kB ^ 22.72x smaller big 1,276kB 536kB ^ 2.38x smaller facebook 2,896kB 2,528kB ^ 1.15x smaller reddit 1,888kB 1,664kB ^ 1.13x smaller flickr 2,872kB 2,664kB ^ 1.08x smaller theverge 2,972kB 2,624kB ^ 1.13x smaller nimlang 58,248kB 58,644kB ! 1.01x bigger message_one 780kB 748kB ^ 1.04x smaller message_many 1,196kB 1,212kB ! 1.01x bigger churn --parallel 576kB 560kB ^ 1.03x smaller list_allocate --parallel 604kB 632kB ! 1.05x bigger tree_allocate --parallel 768kB 804kB ! 1.05x bigger tree_churn --parallel 872kB 812kB ^ 1.07x smaller fragment --parallel 804kB 824kB ! 1.02x bigger fragment_iterate --parallel 852kB 656kB ^ 1.3x smaller medium --parallel 17,084kB 8,472kB ^ 2.02x smaller big --parallel 19,708kB 30,400kB ! 1.54x bigger facebook --parallel 11,916kB 11,984kB ! 1.01x bigger reddit --parallel 6,880kB 7,044kB ! 1.02x bigger flickr --parallel 11,620kB 11,564kB ^ 1.0x smaller theverge --parallel 11,320kB 11,048kB ^ 1.02x smaller <geometric mean> 2,098kB 1,698kB ^ 1.24x smaller <arithmetic mean> 6,311kB 5,880kB ^ 1.07x smaller <harmonic mean> 1,089kB 911kB ^ 1.2x smaller
Sam Weinig
Comment 3 2017-06-03 13:09:28 PDT
Comment on attachment 311885 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=311885&action=review > Source/bmalloc/ChangeLog:10 > + This reduces our high-watermark memory usage on JetStream on macOS > + by 10%-20%. Very nice! Probably worth stating there performance change or that it had no effect. Also, I would find it useful (and future readers of the code might as well) to have an explanation up top of the changes in the algorithm, > Source/bmalloc/ChangeLog:20 > + (bmalloc::forEachPage):A new helper function for iterating the pages Missing space after the : which is personally killing me. > Source/bmalloc/ChangeLog:47 > + (bmalloc::Heap::deallocateSmallLine): Updated for new APIs. Note that > + we save one chunk per page class a little cache. This avoids churn I don't follow the second sentence. I feel like maybe a word or some punctuation is missing. > Source/bmalloc/bmalloc/Chunk.h:85 > + for (Object it = begin; it + pageSize <= end; it = it + pageSize) I'm not sure what your auto usage is like in bmalloc, but, you could replace Object it with auto it. > Source/bmalloc/bmalloc/Heap.cpp:172 > + for (auto chunk = m_freePages[pageClass].begin(); chunk != m_freePages[pageClass].end(); ++chunk) { > + for (auto page = chunk->freePages().begin(); page != chunk->freePages().end(); ++page) { I think these two loops could be simplified to: for (auto& chunk : m_freePages[pageClass]) { for (auto& page = chunk.freePages()) { ... } } > Source/bmalloc/bmalloc/Heap.cpp:183 > + for (size_t pageClass = 0; pageClass < pageClassCount; pageClass++) { We usually use ++pageClass. > Source/bmalloc/bmalloc/LargeRange.h:54 > + size_t physicalSize() const { return m_physicalSize; } // True physical size can be larger. This comment is confusing. Perhaps you could clarify what "True physical size" is and why it can be larger.
Geoffrey Garen
Comment 4 2017-06-05 19:21:13 PDT
Carlos Alberto Lopez Perez
Comment 5 2017-06-05 21:40:42 PDT
(In reply to Geoffrey Garen from comment #4) > Committed r217811: <http://trac.webkit.org/changeset/217811> It seems this fails to build with GCC-4.9 (our minimum version of the supported compiler on WebKitGTK+). The log: https://build.webkit.org/builders/GTK%20Linux%2064-bit%20Release%20Debian%20Stable%20%28Build%29/builds/2454/steps/compile-webkit/logs/stdio
Note You need to log in before you can comment on or make changes to this bug.