Bug 195691 - [bmalloc] Add StaticPerProcess for known types to save pages
Summary: [bmalloc] Add StaticPerProcess for known types to save pages
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
Importance: P2 Normal
Assignee: Yusuke Suzuki
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2019-03-13 12:05 PDT by Yusuke Suzuki
Modified: 2019-03-14 01:19 PDT
CC List: 2 users

See Also:


Attachments
Patch (37.69 KB, patch)
2019-03-13 15:22 PDT, Yusuke Suzuki
mark.lam: review+

Description Yusuke Suzuki 2019-03-13 12:05:43 PDT
[bmalloc] Add StaticPerProcess for known types to save pages
Comment 1 Yusuke Suzuki 2019-03-13 12:33:32 PDT
This is a revival of SafePerProcess for known types. The initial memory footprint of a VM plus a JSGlobalObject is now 488KB of dirty memory in the fast malloc heap (with JSC_useJIT=0 and Malloc=1), so the pages allocated for PerProcess objects are costly by comparison. For example, under Malloc=1 mode, we still need to allocate PerProcess<DebugHeap> and PerProcess<Environment>. sizeof(Environment) is only 1 (a single bool flag) and sizeof(DebugHeap) is 120, yet we allocate a page for them. Since the page size on iOS is 16KB, these 121 bytes consume 16KB of dirty memory, which is not negligible given that the current fast malloc heap size is 488KB.
Putting them into the __DATA section, close to the other mutable data, can save a bit of memory.
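The idea can be sketched in C++ roughly as follows. This is a minimal illustration, not the code from the patch: StaticSingleton, its members, and the one-field Environment struct are hypothetical names, and the sketch assumes C++17 inline static members. Storage for T is a static byte array, so it sits with the program's other writable data instead of in a freshly mapped page, and the object is constructed lazily, exactly once, under a mutex.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <new>

// Hypothetical stand-in for a tiny per-process object (a single bool flag).
struct Environment {
    bool isDebugHeapEnabled { false };
};

// Hypothetical sketch: a per-process singleton whose storage is allocated
// statically rather than by mapping a dedicated page at runtime.
template<typename T>
class StaticSingleton {
public:
    static T* get()
    {
        // Fast path: the object has already been constructed.
        T* object = s_object.load(std::memory_order_acquire);
        if (!object)
            object = getSlowCase();
        return object;
    }

private:
    static T* getSlowCase()
    {
        // Slow path: take the lock and re-check, so that only one thread
        // ever runs the constructor.
        std::lock_guard<std::mutex> lock(s_mutex);
        T* object = s_object.load(std::memory_order_relaxed);
        if (!object) {
            // Placement new into the static buffer; no heap or page allocation.
            object = new (s_memory) T();
            s_object.store(object, std::memory_order_release);
        }
        return object;
    }

    static inline std::atomic<T*> s_object { nullptr };
    static inline std::mutex s_mutex;
    // Statically allocated backing storage, correctly aligned for T.
    alignas(T) static inline unsigned char s_memory[sizeof(T)];
};
```

Calling StaticSingleton<Environment>::get() repeatedly returns the same pointer each time. With this layout the per-type cost is roughly sizeof(T) plus the pointer and mutex, rather than a whole page.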
Comment 2 Yusuke Suzuki 2019-03-13 12:35:59 PDT
==== Summary for process 31657
ReadOnly portion of Libraries: Total=355.8M resident=132.7M(37%) swapped_out_or_unallocated=223.2M(63%)
Writable regions: Total=1.1G written=1032K(0%) resident=1200K(0%) swapped_out=0K(0%) unallocated=1.1G(100%)

                                VIRTUAL RESIDENT    DIRTY  SWAPPED VOLATILE   NONVOL    EMPTY   REGION
REGION TYPE                        SIZE     SIZE     SIZE     SIZE     SIZE     SIZE     SIZE    COUNT (non-coalesced)
===========                     ======= ========    =====  ======= ========   ======    =====  =======
Dispatch continuations            56.0M      24K      24K       0K       0K       0K       0K        1
JS JIT generated code              1.0G       0K       0K       0K       0K       0K       0K        3
Kernel Alloc Once                    8K       4K       4K       0K       0K       0K       0K        1
MALLOC guard page                   32K       0K       0K       0K       0K       0K       0K        8
MALLOC metadata                    324K     308K     308K       0K       0K       0K       0K        9
MALLOC_SMALL                      24.0M     444K     444K       0K       0K       0K       0K        3         see MALLOC ZONE table below
MALLOC_SMALL (empty)              8192K      12K      12K       0K       0K       0K       0K        1         see MALLOC ZONE table below
MALLOC_TINY                       6144K     212K     212K       0K       0K       0K       0K        4         see MALLOC ZONE table below
MALLOC_TINY (empty)               1024K      12K      12K       0K       0K       0K       0K        1         see MALLOC ZONE table below
STACK GUARD                       56.0M       0K       0K       0K       0K       0K       0K        3
Stack                             9232K      80K      80K       0K       0K       0K       0K        4
WebKit Malloc                        4K       4K       4K       0K       0K       0K       0K        1
__DATA                            15.9M    11.2M     756K       0K       0K       0K       0K      186
__FONT_DATA                          4K       0K       0K       0K       0K       0K       0K        1
__LINKEDIT                       228.9M    64.2M       0K       0K       0K       0K       0K        4
__TEXT                           126.9M    68.4M       0K       0K       0K       0K       0K      187
__UNICODE                          564K     492K       0K       0K       0K       0K       0K        1
mapped file                          4K       4K       0K       0K       0K       0K       0K        1
shared memory                       28K      28K      28K       0K       0K       0K       0K        5
===========                     ======= ========    =====  ======= ========   ======    =====  =======
TOTAL                              1.5G   145.4M    1884K       0K       0K       0K       0K      424

                                          VIRTUAL   RESIDENT      DIRTY    SWAPPED ALLOCATION      BYTES DIRTY+SWAP          REGION
MALLOC ZONE                                  SIZE       SIZE       SIZE       SIZE      COUNT  ALLOCATED  FRAG SIZE  % FRAG   COUNT
===========                               =======  =========  =========  =========  =========  =========  =========  ======  ======
DefaultMallocZone_0x10ab95000               29.0M       192K       192K         0K        865       109K        83K     44%       6
WebKit Using System Malloc_0x10abc5000      10.0M       488K       488K         0K       1537       561K         0K      0%       3
===========                               =======  =========  =========  =========  =========  =========  =========  ======  ======
TOTAL                                       39.0M       680K       680K         0K       2402       670K        10K      2%       9
Comment 3 Yusuke Suzuki 2019-03-13 12:36:44 PDT
(In reply to Yusuke Suzuki from comment #2)
Note that these numbers are from macOS.
Comment 4 Yusuke Suzuki 2019-03-13 15:22:10 PDT
Created attachment 364582
Patch
Comment 5 Mark Lam 2019-03-13 17:58:03 PDT
Comment on attachment 364582
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=364582&action=review

r=me

> Source/bmalloc/ChangeLog:11
> +        size if we keep in mind that the current fast malloc heap size is 488KB. Putting them into a __DATA, close to the other mutable data, can save this page.

/into a __DATA/into the __DATA section/
/, can save this page/, we can avoid allocating this page/

> Source/bmalloc/ChangeLog:13
> +        This patch revives SafePerProcess concept in r228107. We add "StaticPerProcess<T>", which allocates underlying storage statically in __DATA section instead of

...revives the SafePerProcess...
... in the __DATA section ...

> Source/bmalloc/bmalloc/Gigacage.cpp:67
> +}

nit: Add // namespace bmalloc

> Source/bmalloc/bmalloc/StaticPerProcess.h:34
> +// StaticPerProcess<T> behaves like PerProcess<T>, but we must need to explicitly define a storage for T with EXTERN.

/must need to/need to/ and /define a storage/define storage/

> Source/bmalloc/bmalloc/StaticPerProcess.h:35
> +// In this way, we allocate a storage for a per-process object statically instead of allocating memory at runtime.

/allocate a storage/allocate storage/

> Source/bmalloc/bmalloc/StaticPerProcess.h:51
> +// Object will be instantiated only once, even in the face of concurrency.

/the face of/the presence of/
Comment 6 Yusuke Suzuki 2019-03-14 01:02:10 PDT
Committed r242938: <https://trac.webkit.org/changeset/242938>
Comment 7 Radar WebKit Bug Importer 2019-03-14 01:10:23 PDT
<rdar://problem/48880249>
Comment 8 Yusuke Suzuki 2019-03-14 01:19:44 PDT
Comment on attachment 364582
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=364582&action=review

>> Source/bmalloc/ChangeLog:11
>> +        size if we keep in mind that the current fast malloc heap size is 488KB. Putting them into a __DATA, close to the other mutable data, can save this page.
> 
> /into a __DATA/into the __DATA section/
> /, can save this page/, we can avoid allocating this page/

Fixed.

>> Source/bmalloc/ChangeLog:13
>> +        This patch revives SafePerProcess concept in r228107. We add "StaticPerProcess<T>", which allocates underlying storage statically in __DATA section instead of
> 
> ...revives the SafePerProcess...
> ... in the __DATA section ...

Fixed.

>> Source/bmalloc/bmalloc/Gigacage.cpp:67
>> +}
> 
> nit: Add // namespace bmalloc

Fixed.

>> Source/bmalloc/bmalloc/StaticPerProcess.h:34
>> +// StaticPerProcess<T> behaves like PerProcess<T>, but we must need to explicitly define a storage for T with EXTERN.
> 
> /must need to/need to/ and /define a storage/define storage/

Fixed.

>> Source/bmalloc/bmalloc/StaticPerProcess.h:35
>> +// In this way, we allocate a storage for a per-process object statically instead of allocating memory at runtime.
> 
> /allocate a storage/allocate storage/

Fixed.

>> Source/bmalloc/bmalloc/StaticPerProcess.h:51
>> +// Object will be instantiated only once, even in the face of concurrency.
> 
> /the face of/the presence of/

Fixed.