Bug 154720 - bmalloc: Added a fast XLarge allocator
Summary: bmalloc: Added a fast XLarge allocator
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: bmalloc (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Geoffrey Garen
URL:
Keywords:
Depends on: 154762
Blocks:
  Show dependency treegraph
 
Reported: 2016-02-25 22:52 PST by Geoffrey Garen
Modified: 2016-03-03 20:04 PST (History)
6 users (show)

See Also:


Attachments
Patch (55.57 KB, patch)
2016-02-25 23:36 PST, Geoffrey Garen
no flags Details | Formatted Diff | Diff
Patch (55.95 KB, patch)
2016-02-25 23:43 PST, Geoffrey Garen
no flags Details | Formatted Diff | Diff
Patch (56.07 KB, patch)
2016-02-25 23:48 PST, Geoffrey Garen
no flags Details | Formatted Diff | Diff
Patch (56.27 KB, patch)
2016-02-26 00:13 PST, Geoffrey Garen
kling: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Geoffrey Garen 2016-02-25 22:52:42 PST
bmalloc: Added a fast XLarge allocator
Comment 1 Geoffrey Garen 2016-02-25 23:36:01 PST
Created attachment 272306 [details]
Patch
Comment 2 Geoffrey Garen 2016-02-25 23:36:42 PST
~/OpenSource/PerformanceTests/MallocBench> cat results.txt 
~/OpenSource/PerformanceTests/MallocBench> ./run-malloc-benchmarks Baseline:~/OpenSource/WebKitBuildBaseline/Release/ Patch:~/OpenSource/WebKitBuild/Release/
                                                                                
                                                      Baseline                          Patch                              Δ
Execution Time:
    churn                                                 85ms                           82ms                 ^ 1.04x faster
    list_allocate                                         70ms                           76ms                 ! 1.09x slower
    tree_allocate                                         82ms                           86ms                 ! 1.05x slower
    tree_churn                                            85ms                           88ms                 ! 1.04x slower
    fragment                                              97ms                           77ms                 ^ 1.26x faster
    fragment_iterate                                      84ms                           82ms                 ^ 1.02x faster
    medium                                               206ms                          188ms                  ^ 1.1x faster
    big                                                  160ms                          146ms                  ^ 1.1x faster
    facebook                                             473ms                          252ms                 ^ 1.88x faster
    reddit                                               120ms                          122ms                 ! 1.02x slower
    flickr                                               128ms                          127ms                 ^ 1.01x faster
    theverge                                             158ms                          157ms                 ^ 1.01x faster
    nimlang                                              137ms                          133ms                 ^ 1.03x faster
    message_one                                          218ms                          223ms                 ! 1.02x slower
    message_many                                         120ms                          119ms                 ^ 1.01x faster
    churn --parallel                                      39ms                           39ms                               
    list_allocate --parallel                              72ms                           75ms                 ! 1.04x slower
    tree_allocate --parallel                              85ms                           90ms                 ! 1.06x slower
    tree_churn --parallel                                 87ms                           85ms                 ^ 1.02x faster
    fragment --parallel                                   53ms                           52ms                 ^ 1.02x faster
    fragment_iterate --parallel                           34ms                           32ms                 ^ 1.06x faster
    medium --parallel                                    200ms                          182ms                  ^ 1.1x faster
    big --parallel                                       135ms                          133ms                 ^ 1.02x faster

    <geometric mean>                                     107ms                          102ms                 ^ 1.05x faster
    <arithmetic mean>                                    127ms                          115ms                 ^ 1.11x faster
    <harmonic mean>                                       91ms                           89ms                 ^ 1.02x faster

Peak Memory:
    churn                                                788kB                          804kB                 ! 1.02x bigger
    list_allocate                                      2,100kB                        2,104kB                  ! 1.0x bigger
    tree_allocate                                      5,488kB                        5,492kB                  ! 1.0x bigger
    tree_churn                                         4,800kB                        4,800kB                               
    fragment                                           7,004kB                        7,008kB                  ! 1.0x bigger
    fragment_iterate                                  25,568kB                       25,596kB                  ! 1.0x bigger
    medium                                         1,068,412kB                    1,068,420kB                  ! 1.0x bigger
    big                                            1,060,468kB                    1,060,464kB                 ^ 1.0x smaller
    facebook                                          73,928kB                       73,916kB                 ^ 1.0x smaller
    reddit                                            13,920kB                       13,896kB                 ^ 1.0x smaller
    flickr                                            25,840kB                       25,852kB                  ! 1.0x bigger
    theverge                                          27,012kB                       26,960kB                 ^ 1.0x smaller
    nimlang                                          215,548kB                      217,276kB                 ! 1.01x bigger
    message_one                                        4,704kB                        5,304kB                 ! 1.13x bigger
    message_many                                       4,964kB                        4,908kB                ^ 1.01x smaller
    churn --parallel                                     956kB                          968kB                 ! 1.01x bigger
    list_allocate --parallel                           2,216kB                        2,232kB                 ! 1.01x bigger
    tree_allocate --parallel                           3,380kB                        3,332kB                ^ 1.01x smaller
    tree_churn --parallel                              3,004kB                        3,032kB                 ! 1.01x bigger
    fragment --parallel                                7,144kB                        7,156kB                  ! 1.0x bigger
    fragment_iterate --parallel                       25,680kB                       25,748kB                  ! 1.0x bigger
    medium --parallel                              1,062,960kB                    1,066,388kB                  ! 1.0x bigger
    big --parallel                                 1,055,320kB                    1,058,988kB                  ! 1.0x bigger

    <geometric mean>                                  18,511kB                       18,643kB                 ! 1.01x bigger
    <arithmetic mean>                                204,400kB                      204,811kB                  ! 1.0x bigger
    <harmonic mean>                                    4,418kB                        4,472kB                 ! 1.01x bigger

Memory at End:
    churn                                                380kB                          392kB                 ! 1.03x bigger
    list_allocate                                        408kB                          408kB                               
    tree_allocate                                        468kB                          468kB                               
    tree_churn                                           456kB                          452kB                ^ 1.01x smaller
    fragment                                             468kB                          460kB                ^ 1.02x smaller
    fragment_iterate                                     644kB                          640kB                ^ 1.01x smaller
    medium                                             4,612kB                        4,612kB                               
    big                                                4,616kB                        4,596kB                 ^ 1.0x smaller
    facebook                                           3,336kB                        3,288kB                ^ 1.01x smaller
    reddit                                             2,296kB                        2,272kB                ^ 1.01x smaller
    flickr                                             3,276kB                        3,252kB                ^ 1.01x smaller
    theverge                                           3,320kB                        3,268kB                ^ 1.02x smaller
    nimlang                                           34,256kB                       34,132kB                 ^ 1.0x smaller
    message_one                                          688kB                          700kB                 ! 1.02x bigger
    message_many                                       1,108kB                        1,180kB                 ! 1.06x bigger
    churn --parallel                                     548kB                          552kB                 ! 1.01x bigger
    list_allocate --parallel                             640kB                          624kB                ^ 1.03x smaller
    tree_allocate --parallel                             724kB                          704kB                ^ 1.03x smaller
    tree_churn --parallel                              1,168kB                        1,180kB                 ! 1.01x bigger
    fragment --parallel                                1,312kB                        1,352kB                 ! 1.03x bigger
    fragment_iterate --parallel                          848kB                          852kB                  ! 1.0x bigger
    medium --parallel                                  4,720kB                        4,740kB                  ! 1.0x bigger
    big --parallel                                     4,724kB                        4,720kB                 ^ 1.0x smaller

    <geometric mean>                                   1,424kB                        1,426kB                  ! 1.0x bigger
    <arithmetic mean>                                  3,262kB                        3,254kB                 ^ 1.0x smaller
    <harmonic mean>                                      911kB                          913kB                  ! 1.0x bigger

=====
~/OpenSource/PerformanceTests/MallocBench> 
=====
~/OpenSource/PerformanceTests/MallocBench>
Comment 3 WebKit Commit Bot 2016-02-25 23:37:13 PST
Attachment 272306 [details] did not pass style-queue:


ERROR: Source/bmalloc/bmalloc/XLargeMap.cpp:26:  Found header this file implements before WebCore config.h. Should be: config.h, primary header, blank line, and then alphabetically sorted.  [build/include_order] [4]
ERROR: Source/bmalloc/bmalloc/XLargeMap.cpp:108:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/bmalloc/bmalloc/XLargeMap.cpp:109:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/bmalloc/bmalloc/XLargeMap.cpp:117:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/bmalloc/bmalloc/XLargeMap.cpp:131:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/bmalloc/bmalloc/SortedVector.h:150:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
Total errors found: 6 in 19 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 4 Geoffrey Garen 2016-02-25 23:37:54 PST
Facebook sees the biggest impact because it allocates and frees some 2MB regions:

    facebook                                             473ms                          252ms                 ^ 1.88x faster
Comment 5 Geoffrey Garen 2016-02-25 23:43:32 PST
Created attachment 272307 [details]
Patch
Comment 6 WebKit Commit Bot 2016-02-25 23:46:03 PST
Attachment 272307 [details] did not pass style-queue:


ERROR: Source/bmalloc/bmalloc/XLargeMap.cpp:26:  Found header this file implements before WebCore config.h. Should be: config.h, primary header, blank line, and then alphabetically sorted.  [build/include_order] [4]
ERROR: Source/bmalloc/bmalloc/XLargeMap.cpp:108:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/bmalloc/bmalloc/XLargeMap.cpp:109:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/bmalloc/bmalloc/XLargeMap.cpp:117:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/bmalloc/bmalloc/XLargeMap.cpp:131:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/bmalloc/bmalloc/SortedVector.h:150:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
Total errors found: 6 in 20 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 7 Geoffrey Garen 2016-02-25 23:48:58 PST
Created attachment 272309 [details]
Patch
Comment 8 WebKit Commit Bot 2016-02-25 23:51:05 PST
Attachment 272309 [details] did not pass style-queue:


ERROR: Source/bmalloc/bmalloc/XLargeMap.cpp:26:  Found header this file implements before WebCore config.h. Should be: config.h, primary header, blank line, and then alphabetically sorted.  [build/include_order] [4]
ERROR: Source/bmalloc/bmalloc/XLargeMap.cpp:108:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/bmalloc/bmalloc/XLargeMap.cpp:109:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/bmalloc/bmalloc/XLargeMap.cpp:117:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/bmalloc/bmalloc/XLargeMap.cpp:131:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/bmalloc/bmalloc/SortedVector.h:150:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
Total errors found: 6 in 20 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 9 Geoffrey Garen 2016-02-26 00:13:01 PST
Created attachment 272311 [details]
Patch
Comment 10 WebKit Commit Bot 2016-02-26 00:14:34 PST
Attachment 272311 [details] did not pass style-queue:


ERROR: Source/bmalloc/bmalloc/XLargeMap.cpp:26:  Found header this file implements before WebCore config.h. Should be: config.h, primary header, blank line, and then alphabetically sorted.  [build/include_order] [4]
ERROR: Source/bmalloc/bmalloc/XLargeMap.cpp:108:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/bmalloc/bmalloc/XLargeMap.cpp:109:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/bmalloc/bmalloc/XLargeMap.cpp:117:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/bmalloc/bmalloc/XLargeMap.cpp:131:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
ERROR: Source/bmalloc/bmalloc/SortedVector.h:156:  Place brace on its own line for function definitions.  [whitespace/braces] [4]
Total errors found: 6 in 20 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 11 Andreas Kling 2016-02-26 08:21:40 PST
Comment on attachment 272311 [details]
Patch

r=me

(Should we worry about that 13% increase in peak on message_one?)
Comment 12 Geoffrey Garen 2016-02-26 09:42:44 PST
> (Should we worry about that 13% increase in peak on message_one?)

message_one is multithreaded, so its peak memory is a unavoidably racy.

I verified in a debugger that message_one does not perform XLarge allocations.
Comment 13 Geoffrey Garen 2016-02-26 09:57:30 PST
Committed r197174: <http://trac.webkit.org/changeset/197174>
Comment 14 WebKit Commit Bot 2016-02-26 16:56:02 PST
Re-opened since this is blocked by bug 154762
Comment 15 Ryan Haddad 2016-02-26 17:01:27 PST
This result from ios-simulator tests shows the two different crashes that started after this change was landed:

<https://build.webkit.org/results/Apple%20iOS%209%20Simulator%20Release%20WK2%20(Tests)/r197214%20(3458)/results.html>
Comment 16 Geoffrey Garen 2016-03-03 16:02:31 PST
It looks like the crash is an ASSERT:

JavaScriptCore[0x6c9b6e] <+62>:  leaq   0x3fc000(%rax), %r14
JavaScriptCore[0x6c9b75] <+69>:  leaq   0x1fffff(%rax), %r12
JavaScriptCore[0x6c9b7c] <+76>:  andq   $-0x200000, %r12
JavaScriptCore[0x6c9b83] <+83>:  leaq   0x200000(%r12), %r15
JavaScriptCore[0x6c9b8b] <+91>:  cmpq   %r14, %r15
JavaScriptCore[0x6c9b8e] <+94>:  ja     0x6c9c5b                  ; <+299>

This corresponds to tryVMAllocate():

RELEASE_BASSERT(alignedEnd <= mappedEnd);
Comment 17 Geoffrey Garen 2016-03-03 16:05:31 PST
Values for 'mapped' (the mmap return value) when ASSERTing include:

0x0000000110202000
0x0000000112a01000
0x0000000115201000
0x0000000115801000
0x0000000119803000
0x000000011c001000
0x000000011c801000
0x0000000122402000
0x0000000122a02000
0x0000000128801000
Comment 18 Geoffrey Garen 2016-03-03 16:16:48 PST
The problem seems to be that the iOS simulator returns VM ranges that are not 16kB aligned. I suppose this could also happen on 32bit iOS devices, maybe?
Comment 19 Geoffrey Garen 2016-03-03 20:04:35 PST
Committed r197545: <http://trac.webkit.org/changeset/197545>