[bmalloc] Add StaticPerProcess for known types to save pages
This revives SafePerProcess for known types. The initial memory footprint of VM + JSGlobalObject is now 488KB of dirty fast malloc memory (with JSC_useJIT=0 and Malloc=1), so the pages allocated for PerProcess are costly by comparison. For example, even in Malloc=1 mode we still need to allocate PerProcess<DebugHeap> and PerProcess<Environment>. sizeof(Environment) is only 1 (a bool flag!) and sizeof(DebugHeap) is 120, yet we allocate one page for them. Since the page size on iOS is 16KB, these 121 bytes consume 16KB of dirty memory, which is not negligible given that the current fast malloc heap size is 488KB. Putting them into the __DATA section, close to the other mutable data, can save some of that memory.
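To illustrate the idea, here is a minimal C++ sketch, not the actual bmalloc code. It assumes a holder that keeps the object's bytes in a static buffer (which the linker places in the binary's writable data) and constructs the object lazily, exactly once. The names StaticPerProcessSketch, StorageFor, EnvironmentStandIn and DebugHeapStandIn are hypothetical, and the stand-in types only mirror the sizes quoted above.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <new>

// Hypothetical stand-ins that only mirror the sizes quoted above (1 and 120 bytes).
struct EnvironmentStandIn { bool flag; };
struct DebugHeapStandIn { unsigned char state[120]; };

// Raw, suitably aligned bytes for one T. As a static object this lives in the
// binary's writable data instead of in a page obtained at runtime.
template<typename T>
struct StorageFor {
    alignas(T) unsigned char bytes[sizeof(T)];
};

// Assumed shape of a statically backed per-process holder (illustration only).
template<typename T>
class StaticPerProcessSketch {
public:
    static T* get()
    {
        // Fast path: the object has already been constructed.
        T* object = s_object.load(std::memory_order_acquire);
        if (object)
            return object;

        // Slow path: construct at most once, even with concurrent callers.
        std::lock_guard<std::mutex> lock(s_mutex);
        object = s_object.load(std::memory_order_relaxed);
        if (!object) {
            object = new (s_storage.bytes) T(); // placement new, no heap allocation
            s_object.store(object, std::memory_order_release);
        }
        return object;
    }

private:
    static std::atomic<T*> s_object;
    static std::mutex s_mutex;
    static StorageFor<T> s_storage;
};

template<typename T> std::atomic<T*> StaticPerProcessSketch<T>::s_object { nullptr };
template<typename T> std::mutex StaticPerProcessSketch<T>::s_mutex;
template<typename T> StorageFor<T> StaticPerProcessSketch<T>::s_storage;

// The two stand-ins together need 121 bytes, far less than one 16KB page.
static_assert(sizeof(EnvironmentStandIn) + sizeof(DebugHeapStandIn) == 121,
    "stand-in sizes should match the numbers in the description");

// Usage check: repeated calls hand back the same statically stored object.
inline void checkSingleInstance()
{
    assert(StaticPerProcessSketch<EnvironmentStandIn>::get()
        == StaticPerProcessSketch<EnvironmentStandIn>::get());
}
```

In this sketch the per-type cost is the object itself plus a pointer and a mutex, all sharing pages with other mutable globals, rather than a dedicated dirty page.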
==== Summary for process 31657
ReadOnly portion of Libraries: Total=355.8M resident=132.7M(37%) swapped_out_or_unallocated=223.2M(63%)
Writable regions: Total=1.1G written=1032K(0%) resident=1200K(0%) swapped_out=0K(0%) unallocated=1.1G(100%)

                          VIRTUAL RESIDENT    DIRTY  SWAPPED VOLATILE   NONVOL    EMPTY   REGION
REGION TYPE                  SIZE     SIZE     SIZE     SIZE     SIZE     SIZE     SIZE    COUNT (non-coalesced)
===========               ======= ========    =====  ======= ========   ======    =====  =======
Dispatch continuations      56.0M      24K      24K       0K       0K       0K       0K        1
JS JIT generated code        1.0G       0K       0K       0K       0K       0K       0K        3
Kernel Alloc Once              8K       4K       4K       0K       0K       0K       0K        1
MALLOC guard page             32K       0K       0K       0K       0K       0K       0K        8
MALLOC metadata              324K     308K     308K       0K       0K       0K       0K        9
MALLOC_SMALL                24.0M     444K     444K       0K       0K       0K       0K        3 see MALLOC ZONE table below
MALLOC_SMALL (empty)        8192K      12K      12K       0K       0K       0K       0K        1 see MALLOC ZONE table below
MALLOC_TINY                 6144K     212K     212K       0K       0K       0K       0K        4 see MALLOC ZONE table below
MALLOC_TINY (empty)         1024K      12K      12K       0K       0K       0K       0K        1 see MALLOC ZONE table below
STACK GUARD                 56.0M       0K       0K       0K       0K       0K       0K        3
Stack                       9232K      80K      80K       0K       0K       0K       0K        4
WebKit Malloc                  4K       4K       4K       0K       0K       0K       0K        1
__DATA                      15.9M    11.2M     756K       0K       0K       0K       0K      186
__FONT_DATA                    4K       0K       0K       0K       0K       0K       0K        1
__LINKEDIT                 228.9M    64.2M       0K       0K       0K       0K       0K        4
__TEXT                     126.9M    68.4M       0K       0K       0K       0K       0K      187
__UNICODE                    564K     492K       0K       0K       0K       0K       0K        1
mapped file                    4K       4K       0K       0K       0K       0K       0K        1
shared memory                 28K      28K      28K       0K       0K       0K       0K        5
===========               ======= ========    =====  ======= ========   ======    =====  =======
TOTAL                        1.5G   145.4M    1884K       0K       0K       0K       0K      424

                                          VIRTUAL  RESIDENT     DIRTY   SWAPPED ALLOCATION     BYTES DIRTY+SWAP         REGION
MALLOC ZONE                                  SIZE      SIZE      SIZE      SIZE      COUNT ALLOCATED  FRAG SIZE % FRAG   COUNT
===========                               ======= ========= ========= ========= ========== ========= ========== ====== =======
DefaultMallocZone_0x10ab95000               29.0M      192K      192K        0K        865      109K        83K    44%       6
WebKit Using System Malloc_0x10abc5000      10.0M      488K      488K        0K       1537      561K         0K     0%       3
===========                               ======= ========= ========= ========= ========== ========= ========== ====== =======
TOTAL                                       39.0M      680K      680K        0K       2402      670K        10K     2%       9
(In reply to Yusuke Suzuki from comment #2)
Note that these numbers are from macOS.
Created attachment 364582 [details] Patch
Comment on attachment 364582 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=364582&action=review

r=me

> Source/bmalloc/ChangeLog:11
> + size if we keep in mind that the current fast malloc heap size is 488KB. Putting them into a __DATA, close to the other mutable data, can save this page.

/into a __DATA/into the __DATA section/
/, can save this page/, we can avoid allocating this page/

> Source/bmalloc/ChangeLog:13
> + This patch revives SafePerProcess concept in r228107. We add "StaticPerProcess<T>", which allocates underlying storage statically in __DATA section instead of

...revives the SafePerProcess...
... in the __DATA section ...

> Source/bmalloc/bmalloc/Gigacage.cpp:67
> +}

nit: Add // namespace bmalloc

> Source/bmalloc/bmalloc/StaticPerProcess.h:34
> +// StaticPerProcess<T> behaves like PerProcess<T>, but we must need to explicitly define a storage for T with EXTERN.

/must need to/need to/ and /define a storage/define storage/

> Source/bmalloc/bmalloc/StaticPerProcess.h:35
> +// In this way, we allocate a storage for a per-process object statically instead of allocating memory at runtime.

/allocate a storage/allocate storage/

> Source/bmalloc/bmalloc/StaticPerProcess.h:51
> +// Object will be instantiated only once, even in the face of concurrency.

/the face of/the presence of/
Committed r242938: <https://trac.webkit.org/changeset/242938>
<rdar://problem/48880249>
Comment on attachment 364582 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=364582&action=review

>> Source/bmalloc/ChangeLog:11
>> + size if we keep in mind that the current fast malloc heap size is 488KB. Putting them into a __DATA, close to the other mutable data, can save this page.
>
> /into a __DATA/into the __DATA section/
> /, can save this page/, we can avoid allocating this page/

Fixed.

>> Source/bmalloc/ChangeLog:13
>> + This patch revives SafePerProcess concept in r228107. We add "StaticPerProcess<T>", which allocates underlying storage statically in __DATA section instead of
>
> ...revives the SafePerProcess...
> ... in the __DATA section ...

Fixed.

>> Source/bmalloc/bmalloc/Gigacage.cpp:67
>> +}
>
> nit: Add // namespace bmalloc

Fixed.

>> Source/bmalloc/bmalloc/StaticPerProcess.h:34
>> +// StaticPerProcess<T> behaves like PerProcess<T>, but we must need to explicitly define a storage for T with EXTERN.
>
> /must need to/need to/ and /define a storage/define storage/

Fixed.

>> Source/bmalloc/bmalloc/StaticPerProcess.h:35
>> +// In this way, we allocate a storage for a per-process object statically instead of allocating memory at runtime.
>
> /allocate a storage/allocate storage/

Fixed.

>> Source/bmalloc/bmalloc/StaticPerProcess.h:51
>> +// Object will be instantiated only once, even in the face of concurrency.
>
> /the face of/the presence of/

Fixed.