Bug 269937 - MegaBug: WASM memory, SAB and Gigacage fragmentation, causes unpredictable wasm memory allocation
Status: RESOLVED DUPLICATE of bug 272171
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords: InRadar
Duplicates: 221530 269777
Depends on:
Blocks:
 
Reported: 2024-02-22 12:34 PST by Justin Michaud
Modified: 2024-05-15 23:30 PDT
CC List: 8 users

See Also:


Description Justin Michaud 2024-02-22 12:34:40 PST
MegaBug: WASM memory, SAB and Gigacage fragmentation, causes unpredictable wasm memory allocation
Comment 1 Justin Michaud 2024-02-22 12:35:49 PST
<rdar://105896689>
Comment 2 Justin Michaud 2024-02-22 12:36:33 PST
*** Bug 269777 has been marked as a duplicate of this bug. ***
Comment 3 Justin Michaud 2024-02-22 12:37:32 PST
*** Bug 221530 has been marked as a duplicate of this bug. ***
Comment 4 Justin Michaud 2024-02-22 12:42:52 PST
We need to make wasm memory more predictably allocatable. This may involve disabling the gigacage.

We also need a WASM spec story for web developers to avoid being OOM-killed because they use too much non-wasm memory.

We have seen multiple bugs where developers of large applications have their app killed without warning on iOS. This is bad.

1) Websites should be able to degrade gracefully.

This is a core principle of the web. As WebAssembly welcomes developers who are new to the web and may not understand the features that have made the platform so successful, we should set a positive example.

If your application is killed for using too much memory while users are doing important work, you have three solutions today:

- Let users lose work
- Limit users to the most limited device that you are willing to support / test on. That is, even though my iPad has 8 GB of RAM, I won't be able to open a 4K image because an iPhone XR can't handle it without crashing
- Refuse to support any device / browser that isn't the latest and greatest (the current most common approach)

None of these solutions is good for users, the web platform, developers, or the environment, and all of them disproportionately harm the experience of users with older devices.

2) Web developers should receive actionable feedback

Let's tell websites when there is memory pressure so that they can go dirty a bunch of memory! Let's expose the precise amount of memory on their system!

This is a hard problem and we can't expect other vendors or even application developers to come up with a good API here.

3) This API shouldn't expose new fingerprinting opportunities
Comment 5 jujjyl 2024-02-23 08:10:17 PST
With respect to the point

> Websites should be able to degrade gracefully.

there is an important note to make: running out of memory does not necessarily mean a degraded experience. How so?

There are at least two scenarios that currently exist in Wasm/Emscripten code that interact with OOMs.

1) In WebAssembly, growing the size of the Memory object can be somewhat expensive, and can cause hiccups in otherwise smooth game rendering if wasmMemory.grow() takes a few extra milliseconds.

Because of that, Emscripten employs a geometric growth rate to "pre-grow" the Memory size. Even if the compiled code was malloc()ing only a small allocation in its linear Memory and the WebAssembly Memory needed to .grow() to accommodate it, the Emscripten runtime will grow by a geometric factor, e.g. 1.10x, 1.25x, 1.5x or 2.0x, to get std::vector-like growth that does not have to do a Memory.grow() on every single malloc() in the compiled code.

Then, if that geometric reservation cannot be satisfied, Emscripten gradually cuts back the reservation: if a 1.5x grow() fails, it might try to grow by only 1.25x, then 1.10x, and finally by the minimum amount needed to satisfy the allocation.

This kind of scheme works gracefully in Chrome and Firefox: the browser gracefully tells Wasm "you can't grow by that much", the page may well be content to grow by less, and this will not result in an OOM.
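To illustrate, the fallback scheme above might look like the following (a hypothetical sketch, not Emscripten's actual code; the function name and factor list are invented):

```javascript
// Hypothetical sketch of geometric Memory.grow() with graceful fallback.
// Sizes are in 64 KiB wasm pages; the factor list is illustrative.
function growFor(memory, neededPages) {
  const currentPages = memory.buffer.byteLength / 65536;
  for (const factor of [1.5, 1.25, 1.1, 1.0]) {
    // Over-allocate geometrically, but never less than the minimum needed.
    const targetPages = Math.max(
      Math.ceil(currentPages * factor),
      currentPages + neededPages
    );
    try {
      memory.grow(targetPages - currentPages); // throws RangeError if refused
      return true;
    } catch (e) {
      if (!(e instanceof RangeError)) throw e;
      // Refused: retry with a smaller factor.
    }
  }
  return false; // even the minimum growth was refused; caller handles OOM
}
```

A refused grow() surfaces as a catchable RangeError, so the loop can retreat to smaller factors without the growth request itself ever OOMing the page.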

I.e. not getting the initial asked memory growth might not actually mean that anything on the page would have degraded.

But in Safari this kind of geometric reservation scheme is risky: the Safari watchdog(?) may decide that the page is asking for too much, and the page might get killed for it.

Emscripten provides detailed build-time knobs to configure these geometric (or additive) growth factors. This Safari watchdog behavior was a major reason why these knobs are exposed to Unity game developers as advanced settings: they can try to come up with "optimal"/conservative factors to appease Safari, and artificially limit the max Memory size to what they think Safari will tolerate on users' devices.

If the watchdog problem didn't exist, developers would not need to spend time tuning these kinds of heuristics.

2) A common tricky question about mark-and-sweep garbage collectors is when it would be a good time to initiate a garbage collection.

For example, Unity compiles the Boehm garbage collector as part of the C# runtime to Unity's web games.

When the page runs out of the initial free Wasm Memory, the Boehm garbage collector needs to decide: should it be content to just initiate a collection, or should it instead go ahead and .grow() the Wasm Memory?

Earlier, we had a scheme where we would always run a GC first, resorting to .grow()ing the Memory only if GC didn't reclaim enough. But this gave rise to pathological behavior: when the Wasm Memory has very little free managed C# memory left (e.g. on a relatively tiny 32 MB Wasm Memory) and memory usage is dominated by temporary C# allocations, performance can sink because practically every GC_malloc() triggers a collection just to get rid of a small amount of previous temp garbage to fit the new allocation.

Deciding to .grow() the Memory instead would help the collector performance immensely, as it would then be able to GC_malloc() more temp memory before needing to do a full GC sweep.

So we have a typical heuristic: first try to GC before .grow(), but keep track of how many allocations are made and how frequent such GCs become; when GCs become too frequent, instead of settling for such a small memory reservation, the heap attempts to .grow() to improve performance and reduce the frequency of GCs needed.
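The decision rule above can be sketched roughly as follows (a hypothetical JavaScript sketch with invented thresholds; this is not Unity's or Boehm's actual logic):

```javascript
// Hypothetical sketch of the GC-vs-grow decision on allocation failure.
// Threshold values are invented for illustration.
function decideOnAllocationFailure(stats) {
  // stats.gcsSinceLastGrow: full collections since the heap last grew
  // stats.allocsSinceLastGc: allocations served since the last collection
  // If collections fire often while serving little work in between,
  // the reservation is too small: prefer growing the Memory.
  const gcTooFrequent =
    stats.gcsSinceLastGrow >= 4 && stats.allocsSinceLastGc < 1000;
  return gcTooFrequent ? "grow" : "gc";
}
```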

But in Safari, this attempt to .grow() might be too much to the watchdog and the page gets reloaded.

In Chrome and Firefox, when the browser gracefully refuses to let the Wasm Memory .grow(), the Wasm content recognizes this and gracefully settles for whatever size of GC-managed heap it was able to build up for C# allocations, making that a performance ceiling for the collector.

So in this case as well, not being able to .grow() doesn't mean that content would be degraded in functionality - just in managed GC collection performance inside the Wasm Memory. The page could still run fine/well enough for the user's liking, as performance is very subjective.

In summary, both cases 1) and 2) above are situations where the content might be perfectly happy with observing a limit to the amount of memory that was given, as long as attempting to ask more would not lead to catastrophic consequences.

I wonder if there might be a way to impose graceful hard limits on Wasm Memory growth? Even if it were not possible to make the limit the full 4GB due to technical limitations, such graceful limits would already fix many use cases with Unity content.
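For comparison, the `maximum` declared when creating a WebAssembly.Memory already acts as one graceful, spec-level hard limit: growing past it throws a catchable RangeError rather than killing the page. The request here is for the engine's own dynamic limit, below that declared maximum, to fail equally gracefully. A minimal demonstration:

```javascript
// A Memory with a declared maximum of 2 pages (64 KiB each).
const mem = new WebAssembly.Memory({ initial: 1, maximum: 2 });
mem.grow(1); // ok: now at the 2-page maximum

let refusedGracefully = false;
try {
  mem.grow(1); // exceeds the declared maximum
} catch (e) {
  refusedGracefully = e instanceof RangeError;
}
// The page keeps running; the memory is unchanged at 2 pages.
```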

3) Last, a related note: Mozilla pitched the idea of a way to discard individual pages inside a WebAssembly Memory. This is something that could be incorporated into the low-level malloc() allocators in Emscripten (I did a quick prototype at https://github.com/juj/emscripten/commit/f9b9523f8d751e440a938b6974b08d6046667395 last year). If such a feature would help in this area, we would definitely be able to integrate it with Unity WebAssembly content.
Comment 6 Justin Michaud 2024-02-23 11:06:54 PST
Thanks for the extra context!

> This kind of scheme works gracefully in Chrome and Firefox: the browser
> gracefully tells Wasm "you can't grow by that much", and the page might be
> very well content to grow by less, and this will not result in an OOM.

Asking for more memory doesn't cause an OOM. Of course, it is possible that you can use almost all of your available memory, and then other allocations can push it over the edge. If you observe a jetsam immediately after calling grow, then I would definitely immediately investigate that.

Unfortunately, since there is no way for us to ask you to take back some wasm memory, it really is the application's responsibility to not ask for too much. If you ask for 1gb, and your tab only has 1.3gb of memory free, we have no way to know if you plan to use an extra 300mb of memory outside of wasm. Again, if you ask for 1.3gb of memory, you shouldn't be immediately killed. That would be a WebKit bug. On the other hand, asking for 1gb, getting it, then allocating 300mb elsewhere and getting killed absolutely is the expected behaviour.

The web platform should provide applications with a way to manage memory expectations. It is just a really hard problem. All of the obvious solutions that I have seen are deeply flawed in one way or another, but that doesn't mean that we shouldn't try!

One common issue (and one reason why it has been so difficult to provide a good API here) is that when applications learn that they are under memory pressure, sometimes they choose to free things. This can easily cause you to page in pages that were already compressed, causing you to be OOM-killed. We really don't want developers to do this, and at the same time, JS+Wasm makes it super hard to predict across all browsers what might accidentally page-in data. For those coming from a native background (I am sure Unity has a lot of native code to handle this case), the web can be a bit harder to predict unfortunately.

> 
> I.e. not getting the initial asked memory growth might not actually mean
> that anything on the page would have degraded.
> 
> But in Safari this kind of geometric reservation scheme is risky due to
> Safari watchdog(?) thinking that the page might be asking for too much, and
> might get killed due to it.

This scheme may be risky in the sense that WASM memory and SAB use the same small protected region (called the gigacage), so if you allocate too much wasm memory, you may run out of memory when allocating a shared array buffer or typed array later. This shouldn't cause you to get killed though, if it does then that is totally wrong.

> 2) A common tricky question about mark-and-sweep garbage collectors is when
> it would be a good time to initiate a garbage collection.
> 
> For example, Unity compiles the Boehm garbage collector as part of the C#
> runtime to Unity's web games.
> 
> When the page runs out of the initial free Wasm Memory, the Boehm garbage
> collector needs to decide: should it contend to just initiate a collection?
> Or should it instead go ahead and .grow() the Wasm Memory?
> 
> Earlier, we would have a scheme where we would always first run a GC before
> resorting to .grow()ing the Memory if GC didn't reclaim enough. But this
> gave rise to pathological behavior: when the Wasm Memory has just very
> little amount of free managed C# memory left in it (maybe e.g. on a
> relatively tiny 32MB Wasm Memory), and memory usage is dominated by
> temporary C# allocations, then it can happen that performance sinks because
> practically almost every GC_malloc() would trigger a collection to get rid
> of a small amount of previous temp garbage to fit the new allocation.

Not only that, but this may actually get you killed. If a page isn't touched in a while, it will be compressed. I really wish we had a better story here.

> Deciding to .grow() the Memory instead would help the collector performance
> immensely, as it would then be able to GC_malloc() more temp memory before
> needing to do a full GC sweep.
> 
> So we have a typical heuristic, where we first try to GC before .grow(), but
> keep track of how many allocations are made, or when such GCs become too
> frequent, and then instead of settling with such a small memory
> reservations, the heap attempts to .grow() to improve performance and reduce
> the frequency of GCs that would be needed.

That sounds right

> 
> But in Safari, this attempt to .grow() might be too much to the watchdog and
> the page gets reloaded.

Again, this really shouldn't be happening. Are you sure that you aren't paging in a bunch of memory at the same time? How much dirty memory is your web content process using before this happens?

> 
> 
> I wonder if there might be a way to impose graceful hard limits to Wasm
> Memory growths? Even if it was not possible to make the limit be full 4GB if
> there are technical limitations, such graceful limits would already fix many
> use cases with Unity content.
> 
> 3) Last, a related note: Mozilla pitched the idea of a way to discard
> individual pages inside a WebAssembly Memory. This is something that would
> be possible to incorporate into the low-level malloc() allocators in
> Emscripten (I did a quick prototype at
> https://github.com/juj/emscripten/commit/
> f9b9523f8d751e440a938b6974b08d6046667395 last year). If such a feature would
> be of any help in this area, we would definitely be able to integrate that
> with Unity WebAssembly content.

I was interested in that, but I seem to remember that on Windows there was some pathology where it forced the page to be zeroed or something like that. I forget now. It might be worth investigating again.

-----

Moving forward, having info about how much dirty memory your web content process is using before getting killed would be super helpful. Could you take a vmmap right before? https://developer.apple.com/library/archive/documentation/Performance/Conceptual/ManagingMemory/Articles/VMPages.html

Thanks again for reaching out! We really do need a better solution here.
Comment 7 Ben Nham 2024-03-13 10:27:48 PDT
*** Bug 268816 has been marked as a duplicate of this bug. ***
Comment 8 Mark Lam 2024-05-15 10:39:34 PDT
We've increased the Gigacage to accommodate Wasm memory in https://bugs.webkit.org/show_bug.cgi?id=272171.

If this is not sufficient, please file another bug with new data.  Thanks.

*** This bug has been marked as a duplicate of bug 272171 ***
Comment 9 Mark Lam 2024-05-15 10:40:39 PDT
Also see https://bugs.webkit.org/show_bug.cgi?id=272232.
Comment 10 jujjyl 2024-05-15 23:30:05 PDT
Thanks Mark!

Is there a way to know which Safari version number will first carry the landed fix?