Bug 170343

Summary: WebAssembly: recycle fast memories more better
Product: WebKit
Reporter: JF Bastien <jfbastien>
Component: JavaScriptCore
Assignee: Nobody <webkit-unassigned>
Status: NEW
Severity: Normal
CC: clopez, fpizlo, ggaren, jfbastien, keith_miller, mark.lam, msaboff, saam
Priority: P2
Version: WebKit Nightly Build
Hardware: Unspecified
OS: Unspecified
See Also: https://bugs.webkit.org/show_bug.cgi?id=163600
          https://bugs.webkit.org/show_bug.cgi?id=175150
Bug Blocks: 159775

Description JF Bastien 2017-03-31 09:11:17 PDT
We could be smarter about how WasmMemory.cpp recycles fast memories. Right now (sketched in code after this list) we:

 - Put up to 4 of them on a free list
 - memset to 0
 - PROT_NONE the entire range
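
A minimal sketch of that scheme, with hypothetical names (the real logic lives in WasmMemory.cpp and guards the free list with a lock):

    #include <sys/mman.h>
    #include <cstddef>
    #include <cstring>
    #include <vector>

    struct FastMemory {
        void* base;          // start of the large virtual reservation
        size_t reservedSize; // full reservation, redzone included
    };

    static constexpr size_t maxCachedFastMemories = 4;
    static std::vector<FastMemory> s_freeList; // lock omitted for brevity

    void releaseFastMemory(FastMemory memory, size_t dirtySize)
    {
        if (s_freeList.size() < maxCachedFastMemories) {
            memset(memory.base, 0, dirtySize);                     // zero, synchronously
            mprotect(memory.base, memory.reservedSize, PROT_NONE); // make the whole range inaccessible
            s_freeList.push_back(memory);
            return;
        }
        munmap(memory.base, memory.reservedSize); // cache full: hand it back to the OS
    }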

A few random ideas:

 1. We do this synchronously, and I'm not sure that's a great idea. Maybe it's fine; it would be good to measure.
 2. Is it even worth considering bzero versus memset 0?
 3. Move the memset-to-0 to re-allocation instead of de-allocation (we still need to PROT_NONE and remember how many pages were dirty; on reallocation, un-PROT_NONE the range, zero it out, then honor the new allocation's protection wishes, ugh). Sketched in code after this list.
 4. We could then consider using madvise. On macOS we have MADV_ZERO_WIRED_PAGES, which sounds pretty interesting. See kern_mman.c: it maps to VM_BEHAVIOR_ZERO_WIRED_PAGES in vm_map.c:vm_map_behavior_set, which sets zero_wired_pages; that asynchronously causes vm_fault.c:vm_fault_unwire to pmap_zero_page(VM_PAGE_GET_PHYS_PAGE(result_page)).
 5. We could return the fast memories if we GC a few times and never reuse them, or if we get a "low-memory" signal from the system.
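
A sketch of idea 3., assuming we record the dirty size at free time and that all sizes are page-aligned (reuseFastMemory is a hypothetical helper, not the actual WebKit code):

    #include <sys/mman.h>
    #include <cstddef>
    #include <cstring>

    struct CachedFastMemory {
        void* base;
        size_t reservedSize;
        size_t dirtySize; // bytes the previous owner touched, recorded at free time
    };

    void* reuseFastMemory(CachedFastMemory& cached, size_t initialAccessibleSize)
    {
        // Undo the PROT_NONE from deallocation time, but only over the dirty prefix.
        mprotect(cached.base, cached.dirtySize, PROT_READ | PROT_WRITE);
        memset(cached.base, 0, cached.dirtySize); // pay the zeroing cost on reuse, not on free
        // Now honor the new allocation's wishes: only its initial size is accessible.
        mprotect(cached.base, cached.reservedSize, PROT_NONE);
        mprotect(cached.base, initialAccessibleSize, PROT_READ | PROT_WRITE);
        cached.dirtySize = 0;
        return cached.base;
    }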

I think 4. and 5. are really the interesting ones, but then we need to:

 - For 4.: figure out how to tell, when we're ready to reuse a fast memory, whether its pages have actually been zeroed yet (see the sketch after this list).
 - For 5.: is holding on to virtually-allocated but not physically-wired pages that bad? It makes the kernel spend more memory on page tables and can make TLB misses a tiny bit slower.
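
For 4., a hedged sketch of what the madvise path could look like (adviseAsyncZero is a hypothetical helper; MADV_ZERO_WIRED_PAGES is Darwin-only and, per the description above, only zeroes pages as they are unwired, which is exactly why the reuse path still needs to confirm the zeroing happened):

    #include <sys/mman.h>
    #include <cstddef>
    #include <cstring>

    void adviseAsyncZero(void* base, size_t dirtySize)
    {
    #if defined(MADV_ZERO_WIRED_PAGES)
        // Darwin: the kernel pmap_zero_page()s each wired page as it is
        // unwired, off the deallocation path (kern_mman.c / vm_map.c trail above).
        madvise(base, dirtySize, MADV_ZERO_WIRED_PAGES);
    #else
        memset(base, 0, dirtySize); // elsewhere, fall back to synchronous zeroing
    #endif
    }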

Let's measure whether that's important at all:

 - Is deallocation measurably slow right now?
 - Does it get faster?
 - Do small-memory systems like us more because our number of dirty pages goes down faster?

I've measured dirty pages on a limited-memory system with a test that does work and reuses memories, and the number of dirty pages definitely goes up and down between tests, so we're not faring that badly right now.
Comment 1 Geoffrey Garen 2017-03-31 13:09:04 PDT
For large allocations that need zero-backing, we generally prefer MADV_FREE_REUSABLE with explicit zeroing because it avoids the (very large) cost of page faults. A sketch of that pattern follows.
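
A minimal sketch of the reusable-memory pattern (Darwin-specific flags; hypothetical helper names, not our actual allocator code):

    #include <sys/mman.h>
    #include <cstddef>
    #include <cstring>

    void markReusable(void* base, size_t size)
    {
    #if defined(MADV_FREE_REUSABLE)
        // Pages stop counting against the process footprint; the kernel may
        // reclaim them at any time.
        madvise(base, size, MADV_FREE_REUSABLE);
    #endif
    }

    void markReused(void* base, size_t size)
    {
    #if defined(MADV_FREE_REUSE)
        // Declare we are using the pages again...
        madvise(base, size, MADV_FREE_REUSE);
    #endif
        // ...then zero explicitly: pages the kernel reclaimed come back
        // zero-filled, but pages it didn't still hold stale contents.
        memset(base, 0, size);
    }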

JF pointed out that Emscripten doesn't support grow-on-demand heaps, so lots of WASM programs demand very large heaps that they don't use. If that's true, WASM may be an exception to our general strategy, and it may benefit from MADV_ZERO_WIRED_PAGES or some other madvise/mmap API that forces on-demand page-fault-and-zero-fill behavior, since the memory cost of huge overcommits may be too high.

>  5. We could return the fast memories if we GC a few times and never reuse them, or if we get a "low-memory" signal from the system.

There's no point to unmapping cached fast memories in response to low memory warnings because they don't hold any physical pages.