177856 – [Linux] Port MallocBench

Yusuke Suzuki

Reported 2017-10-04 01:55:25 PDT

We would like to run MallocBench on Linux to check whether new bmalloc improvements work well there, including changing bmalloc's StaticMutex to an adaptive mutex backed by futex!
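An adaptive mutex spins briefly in user space and only asks the kernel to put the thread to sleep when the lock stays contended. The following is a minimal Linux-only sketch of that idea using the `futex(2)` system call. It follows the well-known three-state design from Ulrich Drepper's "Futexes Are Tricky" (unlocked, locked, locked with possible waiters) with a bounded spin phase in front. It is not bmalloc's StaticMutex code. The class name `AdaptiveMutex` and the spin limit of 40 are hypothetical choices.

```cpp
#include <atomic>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <thread>
#include <unistd.h>

// Hypothetical adaptive mutex: spin first, then park the thread with futex.
// State: 0 = unlocked, 1 = locked with no waiters, 2 = locked with possible waiters.
class AdaptiveMutex {
public:
    void lock()
    {
        // Fast path: uncontended acquire, no system call.
        int expected = 0;
        if (m_state.compare_exchange_strong(expected, 1, std::memory_order_acquire))
            return;

        // Adaptive phase: retry a bounded number of times, yielding between attempts,
        // in case the holder releases the lock soon.
        for (int i = 0; i < spinLimit; ++i) {
            expected = 0;
            if (m_state.compare_exchange_weak(expected, 1, std::memory_order_acquire))
                return;
            std::this_thread::yield();
        }

        // Slow path: mark the lock contended and sleep in the kernel while it stays held.
        while (m_state.exchange(2, std::memory_order_acquire) != 0)
            futexSyscall(FUTEX_WAIT_PRIVATE, 2);
    }

    void unlock()
    {
        // An old state of 2 means a thread may be sleeping, so wake one waiter.
        if (m_state.exchange(0, std::memory_order_release) == 2)
            futexSyscall(FUTEX_WAKE_PRIVATE, 1);
    }

private:
    static constexpr int spinLimit = 40; // Arbitrary illustrative value.
    static_assert(sizeof(std::atomic<int>) == sizeof(int), "futex needs a plain 32-bit word");

    long futexSyscall(int op, int value)
    {
        // glibc has no futex() wrapper, so call the system call directly.
        return syscall(SYS_futex, reinterpret_cast<int*>(&m_state), op, value, nullptr, nullptr, 0);
    }

    std::atomic<int> m_state { 0 };
};
```

An uncontended lock/unlock pair never enters the kernel, short critical sections are caught by the spin phase, and only waiters that outlast it pay for `FUTEX_WAIT`. The `_PRIVATE` futex operations assume the mutex is not shared between processes.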

Yusuke Suzuki

Comment 1 2017-10-04 03:35:37 PDT

Created attachment 322649: Patch

Yusuke Suzuki

Comment 2 2017-10-04 07:15:08 PDT

Created attachment 322666: WIP

Build Bot

Comment 3 2017-10-04 07:16:42 PDT

Attachment 322666 did not pass style-queue:

ERROR: PerformanceTests/MallocBench/MallocBench/message.cpp:130: More than one command on the same line [whitespace/newline] [4]
ERROR: PerformanceTests/MallocBench/MallocBench/message.cpp:135: Use 'WTFMove()' instead of 'std::move()'. [runtime/wtf_move] [4]
ERROR: PerformanceTests/MallocBench/MallocBench/message.cpp:142: Place brace on its own line for function definitions. [whitespace/braces] [4]
ERROR: PerformanceTests/MallocBench/MallocBench/message.cpp:173: More than one command on the same line [whitespace/newline] [4]
ERROR: PerformanceTests/MallocBench/MallocBench/Memory.cpp:27: Found header this file implements before WebCore config.h. Should be: config.h, primary header, blank line, and then alphabetically sorted. [build/include_order] [4]
ERROR: PerformanceTests/MallocBench/MallocBench/Memory.cpp:43: vm_info is incorrectly named. Don't use underscores in your identifier names. [readability/naming/underscores] [4]
ERROR: PerformanceTests/MallocBench/MallocBench/Memory.cpp:44: vm_size is incorrectly named. Don't use underscores in your identifier names. [readability/naming/underscores] [4]
Total errors found: 7 in 21 files

If any of these errors are false positives, please file a bug against check-webkit-style.

Yusuke Suzuki

Comment 4 2017-10-04 08:55:44 PDT

Created attachment 322677: WIP

Build Bot

Comment 5 2017-10-04 08:58:23 PDT

Attachment 322677 did not pass style-queue:

ERROR: PerformanceTests/MallocBench/MallocBench/Memory.cpp:27: Found header this file implements before WebCore config.h. Should be: config.h, primary header, blank line, and then alphabetically sorted. [build/include_order] [4]
ERROR: PerformanceTests/MallocBench/MallocBench/Memory.cpp:28: Streams are highly discouraged. [readability/streams] [3]
ERROR: PerformanceTests/MallocBench/MallocBench/Memory.cpp:44: vm_info is incorrectly named. Don't use underscores in your identifier names. [readability/naming/underscores] [4]
ERROR: PerformanceTests/MallocBench/MallocBench/Memory.cpp:45: vm_size is incorrectly named. Don't use underscores in your identifier names. [readability/naming/underscores] [4]
ERROR: PerformanceTests/MallocBench/MallocBench/CMakeLists.txt:49: Use lowercase command "set" [command/lowercase] [5]
ERROR: PerformanceTests/MallocBench/MallocBench/message.cpp:126: More than one command on the same line [whitespace/newline] [4]
ERROR: PerformanceTests/MallocBench/MallocBench/message.cpp:131: Use 'WTFMove()' instead of 'std::move()'. [runtime/wtf_move] [4]
ERROR: PerformanceTests/MallocBench/MallocBench/message.cpp:138: Place brace on its own line for function definitions. [whitespace/braces] [4]
ERROR: PerformanceTests/MallocBench/MallocBench/message.cpp:169: More than one command on the same line [whitespace/newline] [4]
Total errors found: 9 in 22 files

If any of these errors are false positives, please file a bug against check-webkit-style.

Yusuke Suzuki

Comment 6 2017-10-04 09:13:20 PDT

macOS results (SystemMalloc vs. bmalloc), to make sure MallocBench still works there. Measured on a MacBook Pro:

                              SystemMalloc  patched       Δ
Execution Time:
churn                         547ms         86ms          ^ 6.36x faster
list_allocate                 570ms         62ms          ^ 9.19x faster
tree_allocate                 490ms         73ms          ^ 6.71x faster
tree_churn                    194ms         77ms          ^ 2.52x faster
fragment                      597ms         69ms          ^ 8.65x faster
fragment_iterate              252ms         31ms          ^ 8.13x faster
medium                        531ms         164ms         ^ 3.24x faster
big                           500ms         88ms          ^ 5.68x faster
facebook                      383ms         184ms         ^ 2.08x faster
reddit                        270ms         103ms         ^ 2.62x faster
flickr                        270ms         106ms         ^ 2.55x faster
theverge                      329ms         135ms         ^ 2.44x faster
nimlang                       673ms         98ms          ^ 6.87x faster
message_one                   973ms         110ms         ^ 8.85x faster
message_many                  2,613ms       197ms         ^ 13.26x faster
churn --parallel              120ms         32ms          ^ 3.75x faster
list_allocate --parallel      320ms         60ms          ^ 5.33x faster
tree_allocate --parallel      561ms         132ms         ^ 4.25x faster
tree_churn --parallel         245ms         82ms          ^ 2.99x faster
fragment --parallel           181ms         47ms          ^ 3.85x faster
fragment_iterate --parallel   80ms          11ms          ^ 7.27x faster
medium --parallel             262ms         144ms         ^ 1.82x faster
big --parallel                177ms         73ms          ^ 2.42x faster
facebook --parallel           1,119ms       799ms         ^ 1.4x faster
reddit --parallel             893ms         596ms         ^ 1.5x faster
flickr --parallel             991ms         565ms         ^ 1.75x faster
theverge --parallel           1,254ms       729ms         ^ 1.72x faster
<geometric mean>              423ms         110ms         ^ 3.86x faster
<arithmetic mean>             570ms         180ms         ^ 3.17x faster
<harmonic mean>               318ms         71ms          ^ 4.5x faster

Peak Memory:
churn                         976kB         1,172kB       ! 1.2x bigger
list_allocate                 9,732kB       2,468kB       ^ 3.94x smaller
tree_allocate                 10,828kB      6,284kB       ^ 1.72x smaller
tree_churn                    10,944kB      5,112kB       ^ 2.14x smaller
fragment                      11,244kB      8,332kB       ^ 1.35x smaller
fragment_iterate              26,204kB      25,996kB      ^ 1.01x smaller
medium                        1,145,984kB   1,189,664kB   ! 1.04x bigger
big                           1,076,456kB   1,089,456kB   ! 1.01x bigger
facebook                      130,808kB     80,076kB      ^ 1.63x smaller
reddit                        32,212kB      14,340kB      ^ 2.25x smaller
flickr                        53,976kB      28,296kB      ^ 1.91x smaller
theverge                      51,852kB      27,864kB      ^ 1.86x smaller
nimlang                       232,784kB     165,592kB     ^ 1.41x smaller
message_one                   8,584kB       4,528kB       ^ 1.9x smaller
message_many                  11,672kB      4,036kB       ^ 2.89x smaller
churn --parallel              1,076kB       1,712kB       ! 1.59x bigger
list_allocate --parallel      4,120kB       2,844kB       ^ 1.45x smaller
tree_allocate --parallel      11,204kB      6,816kB       ^ 1.64x smaller
tree_churn --parallel         6,904kB       5,696kB       ^ 1.21x smaller
fragment --parallel           10,756kB      7,816kB       ^ 1.38x smaller
fragment_iterate --parallel   28,728kB      27,284kB      ^ 1.05x smaller
medium --parallel             1,113,536kB   1,185,820kB   ! 1.06x bigger
big --parallel                1,009,408kB   1,086,988kB   ! 1.08x bigger
facebook --parallel           522,520kB     550,512kB     ! 1.05x bigger
reddit --parallel             91,368kB      108,564kB     ! 1.19x bigger
flickr --parallel             185,648kB     192,304kB     ! 1.04x bigger
theverge --parallel           199,736kB     213,148kB     ! 1.07x bigger
<geometric mean>              40,410kB      30,197kB      ^ 1.34x smaller
<arithmetic mean>             222,195kB     223,804kB     ! 1.01x bigger
<harmonic mean>               8,227kB       7,019kB       ^ 1.17x smaller

Memory at End:
churn                         436kB         584kB         ! 1.34x bigger
list_allocate                 520kB         592kB         ! 1.14x bigger
tree_allocate                 604kB         592kB         ^ 1.02x smaller
tree_churn                    584kB         604kB         ! 1.03x bigger
fragment                      1,588kB       592kB         ^ 2.68x smaller
fragment_iterate              780kB         592kB         ^ 1.32x smaller
medium                        1,308kB       640kB         ^ 2.04x smaller
big                           1,716kB       640kB         ^ 2.68x smaller
facebook                      1,276kB       2,632kB       ! 2.06x bigger
reddit                        904kB         1,856kB       ! 2.05x bigger
flickr                        1,016kB       2,820kB       ! 2.78x bigger
theverge                      1,052kB       2,776kB       ! 2.64x bigger
nimlang                       61,028kB      58,820kB      ^ 1.04x smaller
message_one                   736kB         708kB         ^ 1.04x smaller
message_many                  748kB         864kB         ! 1.16x bigger
churn --parallel              536kB         1,140kB       ! 2.13x bigger
list_allocate --parallel      576kB         1,216kB       ! 2.11x bigger
tree_allocate --parallel      672kB         1,640kB       ! 2.44x bigger
tree_churn --parallel         628kB         1,540kB       ! 2.45x bigger
fragment --parallel           1,672kB       1,768kB       ! 1.06x bigger
fragment_iterate --parallel   940kB         1,300kB       ! 1.38x bigger
medium --parallel             1,524kB       11,804kB      ! 7.75x bigger
big --parallel                2,036kB       41,716kB      ! 20.49x bigger
facebook --parallel           2,088kB       24,016kB      ! 11.5x bigger
reddit --parallel             1,284kB       13,616kB      ! 10.6x bigger
flickr --parallel             1,424kB       23,560kB      ! 16.54x bigger
theverge --parallel           1,604kB       22,260kB      ! 13.88x bigger
<geometric mean>              1,142kB       2,391kB       ! 2.09x bigger
<arithmetic mean>             3,307kB       8,181kB       ! 2.47x bigger
<harmonic mean>               917kB         1,223kB       ! 1.33x bigger
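Judging from the numbers, each Δ cell is the larger of the two values divided by the smaller, rounded to two decimals, marked `^` when the patched build (bmalloc) wins and `!` when SystemMalloc wins; lower is better for both time and memory. That reading is inferred from the table rather than taken from MallocBench's reporting script, so the helper below (`Delta` and `computeDelta` are hypothetical names) is only a sketch for checking rows by hand.

```cpp
#include <cmath>

// Hypothetical helper for reading one Δ cell. Lower is better for time and memory,
// so the patched build "wins" when its value is below the SystemMalloc baseline.
struct Delta {
    double ratio; // Larger value divided by smaller value, rounded to 2 decimals.
    char marker;  // '^' = patched is faster/smaller, '!' = patched is bigger/slower.
};

Delta computeDelta(double systemMalloc, double patched)
{
    bool patchedWins = patched < systemMalloc;
    double raw = patchedWins ? systemMalloc / patched : patched / systemMalloc;
    return { std::round(raw * 100.0) / 100.0, patchedWins ? '^' : '!' };
}
```

For example, `computeDelta(547, 86)` gives 6.36 with `^`, matching the `churn` execution-time row, and `computeDelta(2036, 41716)` gives 20.49 with `!`, matching `big --parallel` under Memory at End. The report also drops trailing zeros (`1.2x`, not `1.20x`) and leaves the cell empty when both values are identical (as for `tree_churn` peak memory on Linux); the sketch models neither detail.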

Yusuke Suzuki

Comment 7 2017-10-04 09:30:54 PDT

And here is the Linux version! bmalloc gives about a 2x performance improvement on Linux too (geometric mean of execution time). glibc's malloc holds up better than the macOS system malloc does, but bmalloc is still beneficial :)

                              SystemMalloc  patched       Δ
Execution Time:
churn                         167ms         106ms         ^ 1.58x faster
list_allocate                 118ms         69ms          ^ 1.71x faster
tree_allocate                 138ms         72ms          ^ 1.92x faster
tree_churn                    122ms         63ms          ^ 1.94x faster
fragment                      308ms         86ms          ^ 3.58x faster
fragment_iterate              330ms         57ms          ^ 5.79x faster
medium                        406ms         235ms         ^ 1.73x faster
big                           407ms         160ms         ^ 2.54x faster
facebook                      319ms         220ms         ^ 1.45x faster
reddit                        151ms         88ms          ^ 1.72x faster
flickr                        152ms         88ms          ^ 1.73x faster
theverge                      188ms         108ms         ^ 1.74x faster
nimlang                       215ms         192ms         ^ 1.12x faster
message_one                   740ms         123ms         ^ 6.02x faster
message_many                  747ms         134ms         ^ 5.57x faster
churn --parallel              138ms         42ms          ^ 3.29x faster
list_allocate --parallel      186ms         73ms          ^ 2.55x faster
tree_allocate --parallel      142ms         83ms          ^ 1.71x faster
tree_churn --parallel         78ms          58ms          ^ 1.34x faster
fragment --parallel           213ms         149ms         ^ 1.43x faster
fragment_iterate --parallel   126ms         46ms          ^ 2.74x faster
medium --parallel             535ms         159ms         ^ 3.36x faster
big --parallel                220ms         152ms         ^ 1.45x faster
facebook --parallel           821ms         722ms         ^ 1.14x faster
reddit --parallel             281ms         261ms         ^ 1.08x faster
flickr --parallel             282ms         265ms         ^ 1.06x faster
theverge --parallel           353ms         329ms         ^ 1.07x faster
<geometric mean>              240ms         120ms         ^ 2.0x faster
<arithmetic mean>             292ms         153ms         ^ 1.9x faster
<harmonic mean>               203ms         100ms         ^ 2.04x faster

Peak Memory:
churn                         228kB         392kB         ! 1.72x bigger
list_allocate                 2,720kB       1,676kB       ^ 1.62x smaller
tree_allocate                 6,680kB       5,276kB       ^ 1.27x smaller
tree_churn                    4,296kB       4,296kB
fragment                      7,216kB       7,328kB       ! 1.02x bigger
fragment_iterate              29,460kB      25,992kB      ^ 1.13x smaller
medium                        1,069,036kB   1,191,972kB   ! 1.11x bigger
big                           1,051,064kB   1,102,476kB   ! 1.05x bigger
facebook                      67,728kB      95,340kB      ! 1.41x bigger
reddit                        11,384kB      14,016kB      ! 1.23x bigger
flickr                        21,492kB      28,344kB      ! 1.32x bigger
theverge                      22,024kB      28,508kB      ! 1.29x bigger
nimlang                       302,112kB     196,392kB     ^ 1.54x smaller
message_one                   4,220kB       2,272kB       ^ 1.86x smaller
message_many                  9,920kB       1,396kB       ^ 7.11x smaller
churn --parallel              284kB         380kB         ! 1.34x bigger
list_allocate --parallel      2,840kB       1,404kB       ^ 2.02x smaller
tree_allocate --parallel      3,512kB       2,432kB       ^ 1.44x smaller
tree_churn --parallel         2,388kB       1,400kB       ^ 1.71x smaller
fragment --parallel           5,444kB       6,520kB       ! 1.2x bigger
fragment_iterate --parallel   27,816kB      24,952kB      ^ 1.11x smaller
medium --parallel             1,068,852kB   1,192,120kB   ! 1.12x bigger
big --parallel                1,048,908kB   1,103,460kB   ! 1.05x bigger
facebook --parallel           267,464kB     294,996kB     ! 1.1x bigger
reddit --parallel             42,712kB      53,484kB      ! 1.25x bigger
flickr --parallel             83,748kB      101,860kB     ! 1.22x bigger
theverge --parallel           85,868kB      107,980kB     ! 1.26x bigger
<geometric mean>              22,041kB      20,172kB      ^ 1.09x smaller
<arithmetic mean>             194,423kB     207,284kB     ! 1.07x bigger
<harmonic mean>               2,525kB       2,765kB       ! 1.1x bigger

Memory at End:
churn                         228kB         392kB         ! 1.72x bigger
list_allocate                 224kB         392kB         ! 1.75x bigger
tree_allocate                 224kB         392kB         ! 1.75x bigger
tree_churn                    224kB         396kB         ! 1.77x bigger
fragment                      228kB         396kB         ! 1.74x bigger
fragment_iterate              224kB         392kB         ! 1.75x bigger
medium                        224kB         476kB         ! 2.13x bigger
big                           224kB         476kB         ! 2.13x bigger
facebook                      220kB         2,304kB       ! 10.47x bigger
reddit                        224kB         1,492kB       ! 6.66x bigger
flickr                        224kB         2,432kB       ! 10.86x bigger
theverge                      228kB         2,456kB       ! 10.77x bigger
nimlang                       62,172kB      58,200kB      ^ 1.07x smaller
message_one                   240kB         408kB         ! 1.7x bigger
message_many                  264kB         436kB         ! 1.65x bigger
churn --parallel              284kB         380kB         ! 1.34x bigger
list_allocate --parallel      2,840kB       380kB         ^ 7.47x smaller
tree_allocate --parallel      3,512kB       384kB         ^ 9.15x smaller
tree_churn --parallel         2,388kB       376kB         ^ 6.35x smaller
fragment --parallel           788kB         376kB         ^ 2.1x smaller
fragment_iterate --parallel   788kB         376kB         ^ 2.1x smaller
medium --parallel             17,092kB      468kB         ^ 36.52x smaller
big --parallel                2,840kB       476kB         ^ 5.97x smaller
facebook --parallel           8,456kB       8,072kB       ^ 1.05x smaller
reddit --parallel             884kB         4,800kB       ! 5.43x bigger
flickr --parallel             888kB         8,548kB       ! 9.63x bigger
theverge --parallel           892kB         8,652kB       ! 9.7x bigger
<geometric mean>              705kB         960kB         ! 1.36x bigger
<arithmetic mean>             3,964kB       3,864kB       ^ 1.03x smaller
<harmonic mean>               373kB         577kB         ! 1.55x bigger
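The three summary rows can be recomputed from the per-benchmark values with the textbook definitions of the arithmetic, geometric, and harmonic means. The sketch below does this for the Linux Execution Time column. The function names are hypothetical, and it is an assumption that MallocBench's reporting code uses exactly these formulas, although the results match the table to within rounding (about 292/153 ms, 240/120 ms, and 203/100 ms).

```cpp
#include <cmath>
#include <vector>

// Textbook definitions of the three summary rows (hypothetical helper names).
double arithmeticMean(const std::vector<double>& values)
{
    double sum = 0;
    for (double value : values)
        sum += value;
    return sum / values.size();
}

double geometricMean(const std::vector<double>& values)
{
    // Average the logarithms rather than multiplying 27 values together.
    double logSum = 0;
    for (double value : values)
        logSum += std::log(value);
    return std::exp(logSum / values.size());
}

double harmonicMean(const std::vector<double>& values)
{
    double reciprocalSum = 0;
    for (double value : values)
        reciprocalSum += 1.0 / value;
    return values.size() / reciprocalSum;
}

// Linux "Execution Time" rows (ms) from the table above, in table order.
const std::vector<double> linuxSystemMallocMs = {
    167, 118, 138, 122, 308, 330, 406, 407, 319, 151, 152, 188, 215, 740, 747,
    138, 186, 142, 78, 213, 126, 535, 220, 821, 281, 282, 353,
};
const std::vector<double> linuxPatchedMs = {
    106, 69, 72, 63, 86, 57, 235, 160, 220, 88, 88, 108, 192, 123, 134,
    42, 73, 83, 58, 149, 46, 159, 152, 722, 261, 265, 329,
};
```

Because the geometric mean of per-benchmark ratios equals the ratio of the geometric means, 240 / 120 gives the headline 2.0x. The arithmetic mean, by contrast, is dominated by the slowest runs such as `facebook --parallel`.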

Yusuke Suzuki

Comment 8 2017-10-04 09:39:55 PDT

Created attachment 322682: Patch

Build Bot

Comment 9 2017-10-04 09:41:06 PDT

Attachment 322682 did not pass style-queue:

ERROR: PerformanceTests/MallocBench/MallocBench/Memory.cpp:27: Found header this file implements before WebCore config.h. Should be: config.h, primary header, blank line, and then alphabetically sorted. [build/include_order] [4]
ERROR: PerformanceTests/MallocBench/MallocBench/Memory.cpp:28: Streams are highly discouraged. [readability/streams] [3]
ERROR: PerformanceTests/MallocBench/MallocBench/Memory.cpp:44: vm_info is incorrectly named. Don't use underscores in your identifier names. [readability/naming/underscores] [4]
ERROR: PerformanceTests/MallocBench/MallocBench/Memory.cpp:45: vm_size is incorrectly named. Don't use underscores in your identifier names. [readability/naming/underscores] [4]
ERROR: PerformanceTests/MallocBench/MallocBench/CMakeLists.txt:49: Use lowercase command "set" [command/lowercase] [5]
ERROR: PerformanceTests/MallocBench/MallocBench/message.cpp:126: More than one command on the same line [whitespace/newline] [4]
ERROR: PerformanceTests/MallocBench/MallocBench/message.cpp:131: Use 'WTFMove()' instead of 'std::move()'. [runtime/wtf_move] [4]
ERROR: PerformanceTests/MallocBench/MallocBench/message.cpp:138: Place brace on its own line for function definitions. [whitespace/braces] [4]
ERROR: PerformanceTests/MallocBench/MallocBench/message.cpp:169: More than one command on the same line [whitespace/newline] [4]
Total errors found: 9 in 22 files

If any of these errors are false positives, please file a bug against check-webkit-style.

Yusuke Suzuki

Comment 10 2017-10-05 00:05:54 PDT

Committed r222900: <http://trac.webkit.org/changeset/222900>

Radar WebKit Bug Importer

Comment 11 2017-10-05 00:06:55 PDT

<rdar://problem/34828861>

Michael Catanzaro

Comment 12 2017-10-05 09:31:57 PDT

This broke the build with Clang due to use of a VLA:

../../PerformanceTests/MallocBench/MallocBench/message.cpp:212:38: error: variable length array of non-POD element type 'std::unique_ptr<WorkQueue>'
    std::unique_ptr<WorkQueue> queues[queueCount];
                                     ^
1 error generated.

I'm tempted to say the array should be heap-allocated instead... but I don't know if that will mess up the thing it's trying to benchmark. :D

Carlos Alberto Lopez Perez

Comment 13 2017-10-05 11:23:26 PDT

(In reply to Michael Catanzaro from comment #12)
> This broke the build with Clang due to use of a VLA:
>
> ../../PerformanceTests/MallocBench/MallocBench/message.cpp:212:38: error:
> variable length array of non-POD element type 'std::unique_ptr<WorkQueue>'
> std::unique_ptr<WorkQueue> queues[queueCount];
> ^
> 1 error generated.
>
> I'm tempted to say the array should be heap-allocated instead... but I don't
> know if that will mess up the thing it's trying to benchmark. :D

It seems this also broke the 32-bit builds:
https://build.webkit.org/builders/GTK%20Linux%2032-bit%20Release/builds/4306/steps/compile-webkit/logs/stdio
https://build.webkit.org/builders/GTK%20Linux%20ARM%20Release/builds/1287/steps/compile-webkit/logs/stdio

Yusuke Suzuki

Comment 14 2017-10-05 16:58:21 PDT

(In reply to Michael Catanzaro from comment #12)
> This broke the build with Clang due to use of a VLA:
>
> ../../PerformanceTests/MallocBench/MallocBench/message.cpp:212:38: error:
> variable length array of non-POD element type 'std::unique_ptr<WorkQueue>'
> std::unique_ptr<WorkQueue> queues[queueCount];
> ^
> 1 error generated.
>
> I'm tempted to say the array should be heap-allocated instead... but I don't
> know if that will mess up the thing it's trying to benchmark. :D

I’ll fix it with std::vector here.
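GCC accepts variable-length arrays in C++ as an extension, which is presumably why the original code built there, while Clang rejects them for non-POD element types, as the error shows. A sketch of the std::vector replacement, assuming a hypothetical `WorkQueue` stand-in (the real class in message.cpp is not shown in this thread) and a hypothetical `makeQueues` helper:

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for MallocBench's WorkQueue.
struct WorkQueue {
    int processed = 0;
};

// Before (rejected by Clang; VLAs are not standard C++):
//     std::unique_ptr<WorkQueue> queues[queueCount];
// After: a runtime-sized, heap-allocated container of owning pointers.
std::vector<std::unique_ptr<WorkQueue>> makeQueues(std::size_t queueCount)
{
    std::vector<std::unique_ptr<WorkQueue>> queues(queueCount);
    for (auto& queue : queues)
        queue = std::make_unique<WorkQueue>();
    return queues;
}
```

The vector buffer lives on the heap instead of the stack, which is the trade-off Michael raised; whether that matters for the benchmark is not settled in this thread.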

Yusuke Suzuki

Comment 15 2017-10-05 18:15:41 PDT

Committed r222948: <http://trac.webkit.org/changeset/222948>

Darin Adler

Comment 16 2017-10-07 15:12:25 PDT

(In reply to Yusuke Suzuki from comment #14)
> I’ll fix it with std::vector here.

No need to change this test code, but I think in standard C++, if resizing is not needed, then std::unique_ptr<std::unique_ptr<WorkQueue>[]> would be preferred. std::vector has additional overhead to make resizing efficient.
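One concrete form of that overhead: a std::vector handle typically stores a size and a capacity alongside its data pointer, while std::unique_ptr<T[]> with the default deleter is typically a single pointer. The sketch below builds the suggested type and records both handle sizes; `WorkQueue` and `makeQueueArray` are hypothetical, and the exact sizes are implementation details rather than guarantees of the standard.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

struct WorkQueue { }; // Hypothetical stand-in for MallocBench's WorkQueue.

// The suggested fixed-size alternative: an owning array of owning pointers.
// make_unique<T[]>(n) value-initializes, so each element starts as nullptr.
std::unique_ptr<std::unique_ptr<WorkQueue>[]> makeQueueArray(std::size_t queueCount)
{
    auto queues = std::make_unique<std::unique_ptr<WorkQueue>[]>(queueCount);
    for (std::size_t i = 0; i < queueCount; ++i)
        queues[i] = std::make_unique<WorkQueue>();
    return queues;
}

// Handle sizes on the compiler at hand (implementation details, not standard guarantees).
constexpr std::size_t arrayHandleSize = sizeof(std::unique_ptr<std::unique_ptr<WorkQueue>[]>);
constexpr std::size_t vectorHandleSize = sizeof(std::vector<std::unique_ptr<WorkQueue>>);
```

The price of the smaller handle is that the array does not know its own length, so the caller has to keep `queueCount` around.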

Yusuke Suzuki

Comment 17 2017-10-07 19:53:12 PDT

(In reply to Darin Adler from comment #16)
> (In reply to Yusuke Suzuki from comment #14)
> > I’ll fix it with std::vector here.
>
> No need to change this test code, but I think in standard C++, if resizing
> is not needed, then std::unique_ptr<std::unique_ptr<WorkQueue>[]> would be
> preferred. std::vector has additional overhead to make resizing efficient.

Thank you. I've changed it to `make_unique<WorkQueue[]>(queueCount)`; it should be good for this use case :)
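`make_unique<WorkQueue[]>(queueCount)` goes one step further than the suggestion: it drops the inner pointers and allocates all `WorkQueue` objects in a single heap array, which requires `WorkQueue` to be default-constructible. A minimal sketch, where `WorkQueue`, `process`, and `runQueues` are hypothetical stand-ins rather than MallocBench code:

```cpp
#include <cstddef>
#include <memory>

// Hypothetical stand-in; make_unique<T[]> needs T to be default-constructible.
struct WorkQueue {
    int processed = 0;
    void process() { ++processed; }
};

// One heap allocation holds queueCount WorkQueue objects; they are all destroyed
// when the unique_ptr goes out of scope. queueCount can be any runtime value.
int runQueues(std::size_t queueCount, int itemsPerQueue)
{
    auto queues = std::make_unique<WorkQueue[]>(queueCount);
    for (std::size_t i = 0; i < queueCount; ++i) {
        for (int j = 0; j < itemsPerQueue; ++j)
            queues[i].process();
    }
    int total = 0;
    for (std::size_t i = 0; i < queueCount; ++i)
        total += queues[i].processed;
    return total;
}
```

Compared with the vector of `unique_ptr`s, this performs one allocation instead of `queueCount + 1`, and elements are accessed directly rather than through an extra pointer.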

Yusuke Suzuki

Comment 18 2017-10-07 19:55:51 PDT

Committed r223026: <http://trac.webkit.org/changeset/223026>

Attachments
Patch (20.64 KB, patch) 2017-10-04 03:35 PDT, Yusuke Suzuki, no flags
WIP (29.86 KB, patch) 2017-10-04 07:15 PDT, Yusuke Suzuki, no flags
WIP (39.19 KB, patch) 2017-10-04 08:55 PDT, Yusuke Suzuki, no flags
Patch (39.86 KB, patch) 2017-10-04 09:39 PDT, Yusuke Suzuki, fpizlo: review+