Bug 302711

Summary: WebGPU Crash on iOS with Time-Varying Mesh Access using instancing vertex buffers
Product: WebKit
Reporter: s.todchuk
Component: WebGPU
Assignee: Mike Wyrzykowski <mwyrzykowski>
Status: RESOLVED FIXED
Severity: Major
CC: mwyrzykowski, tzagallo, webkit-bug-importer
Priority: P1
Keywords: InRadar
Version: Safari 26
Hardware: iPhone / iPad
OS: iOS 26

s.todchuk
Reported 2025-11-18 06:08:27 PST
Created attachment 477420
HTML file demonstrating the crash (takes around 70 sec)

Rendering with instancing vertex buffers crashes after ~70s when mesh access order varies per frame. Works fine with fixed order or storage buffers.

# WebGPU Crash on iOS with Time-Varying Mesh Access Patterns

## Summary

Safari on iOS crashes when rendering multiple meshes with **mesh access order that varies per frame** while using instancing vertex buffers for transforms. The crash is **time-dependent** (accumulates over ~70 seconds) and **scales with mesh count and object count**, indicating a memory corruption or resource tracking bug in iOS WebGPU's Metal backend.

**Does NOT crash on macOS Safari** - iOS-specific bug.

## Environment

- **Device**: iPhone 15 Pro/16 Pro Max, iOS 26.0/26.1
- **Browser**: Safari (WebKit Metal backend)
- **Cross-platform**: Does NOT crash on Chrome/Edge/Firefox (Windows/Android/macOS) or on Safari macOS

## Reproduction Steps

### Test Case: `iPhoneWGPUCrash.html`

Standalone HTML file demonstrating the crash.
Error after ~70 seconds: `"InvalidStateError: GPUCommandEncoder.finish: Unable to finish."`

**Minimum crash conditions:**

- 1000+ unique meshes, 10000+ object instances
- Instancing vertex buffer for per-object transforms
- **Mesh access order that changes every frame** (random shuffle OR random start offset)

**Config:** `MESH_COUNT`, `GRID_SIZE`, `VARY_RQ_HEAD`, `DO_RQ_SHUFFLE` (see code comments)

### Key Findings

**What triggers the crash:**

- **Time-varying mesh access patterns** - order changes per frame (whether via `firstIndex` OR buffer binding)
- Instancing vertex buffer for transforms
- Scales with mesh count (min: 1000) and object count (min: 4096 on iPhone 15 Pro)
- Time-dependent failure (~70s), not immediate validation error

**What does NOT affect crash:**

- Mesh complexity or vertex format
- `baseVertex` parameter
- Buffer layout (shared vs separate buffers)
- Draw call count (batching improves FPS but doesn't prevent crash)

### Workarounds Tested

| Workaround | Result |
|------------|--------|
| Bake vertex offsets, `baseVertex=0` | ❌ Crashes |
| Separate buffers per mesh | ❌ Crashes |
| Batch draws via instancing | ❌ Crashes |
| Static transform buffers (no updates) | ❌ Crashes |
| Indirect drawing (`drawIndexedIndirect`) | ❌ Crashes |
| **Fixed mesh order per frame** (sequential OR constant random) | ✅ Works |
| **Storage/constant buffer for transforms** (instead of instancing vertex buffer) | ✅ Works |

**Root cause:** Instancing vertex buffer + time-varying mesh access order = crash. Fixed order (even if non-sequential) works fine.
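
The two ways of varying mesh access order named above (random shuffle, random start offset) can be sketched as follows. This is an illustrative reconstruction, not code from the attached `iPhoneWGPUCrash.html`: the function names and the `varyHead` / `doShuffle` parameters are hypothetical, and their correspondence to `VARY_RQ_HEAD` / `DO_RQ_SHUFFLE` is an assumption.

```typescript
// Hypothetical sketch: produce a per-frame mesh draw order.
// `rand` must return a number in [0, 1), like Math.random.

// Fisher-Yates shuffle, in place.
function shuffleInPlace(order: number[], rand: () => number): void {
  for (let i = order.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [order[i], order[j]] = [order[j], order[i]];
  }
}

function frameDrawOrder(
  meshCount: number,
  varyHead: boolean,   // assumed analogue of VARY_RQ_HEAD
  doShuffle: boolean,  // assumed analogue of DO_RQ_SHUFFLE
  rand: () => number,
): number[] {
  const order = Array.from({ length: meshCount }, (_, i) => i);
  if (doShuffle) {
    shuffleInPlace(order, rand);
  }
  if (varyHead) {
    // Rotate the queue so that drawing starts at a random mesh.
    const head = Math.floor(rand() * meshCount);
    return order.slice(head).concat(order.slice(0, head));
  }
  return order;
}
```

Calling `frameDrawOrder` once per frame with either flag set gives an order that differs between frames. Computing it once and reusing it corresponds to the "constant random" case listed as working in the table above.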

## Impact on Real-World Applications

This bug **blocks all standard 3D engine techniques** that vary rendering order per frame:

- **Frustum Culling** - rendering only visible objects
- **Depth Sorting** - transparent object ordering
- **Material Batching** - grouping by material/shader
- **LOD Systems** - dynamic mesh detail switching
- **Dynamic Scenes** - adding/removing objects

**Result:** iOS WebGPU is effectively unusable for production 3D applications.

## Business Impact

**Blocking delivery to enterprise customers:** ConocoPhillips and AkerBP (major oil & gas companies) are waiting for our 3D engine product, which uses these exact rendering patterns. We cannot ship a product that crashes on iOS.

**Storage buffer workaround limitations:**

- Requires major architectural changes
- Reduces rendering efficiency (alignment overhead, suboptimal memory access)
- Limits hardware/browser compatibility (stricter size limits)

## Request

Please investigate this memory corruption/resource tracking bug in iOS WebGPU's Metal backend. This is a **critical, reproducible issue** blocking legitimate 3D rendering techniques and enterprise product deliveries.
Attachments
HTML file demonstrating the crash (takes around 70 sec) (27.66 KB, text/html)
2025-11-18 06:08 PST, s.todchuk
no flags
patch from April (26.53 KB, patch)
2025-11-20 10:03 PST, Mike Wyrzykowski
no flags
s.todchuk
Comment 1 2025-11-18 06:31:08 PST
https://webgpu.github.io/webgpu-samples/sample/animometer/ also crashes after some time (~2 minutes) with numTriangles=20000, renderBundles=false, dynamicOffsets=true
Radar WebKit Bug Importer
Comment 2 2025-11-18 22:09:08 PST
Mike Wyrzykowski
Comment 3 2025-11-18 22:21:28 PST
Thank you for the repro case. The main difference between iOS and macOS is that iOS will terminate due to memory pressure whereas macOS will not until it reaches much higher thresholds. In the repro, memory starts at 2GB and climbs to 3GB in WebKit's GPU process on macOS as well, so the retain issue is reproducible on macOS. It seems some large amount of memory is being retained when it should not be. I will take a look.
Mike Wyrzykowski
Comment 4 2025-11-18 22:29:25 PST
checking memgraphs for the com.apple.WebKit.GPU and com.apple.WebKit.WebKit processes
Mike Wyrzykowski
Comment 5 2025-11-18 22:30:00 PST
memory usage seems about half or less on Chrome for reference
Mike Wyrzykowski
Comment 6 2025-11-18 22:32:21 PST
My guess, based on the report details, is that this repro constantly triggers vertex buffer validation and we have a retain issue with that.
Mike Wyrzykowski
Comment 7 2025-11-19 14:59:36 PST
Skipping vertex buffer validation keeps memory usage stable around ~500MB over the same 70 second time period. It certainly appears we have unbounded memory growth due to buffer validation.
Mike Wyrzykowski
Comment 8 2025-11-19 15:00:00 PST
Great bug report by the way, thank you.
Mike Wyrzykowski
Comment 9 2025-11-19 15:35:33 PST
So it appears our cache quickly approaches ~5 million elements:
https://github.com/WebKit/WebKit/blob/7d08e130dc4395638075edb553966d2b4a6659b9/Source/WebGPU/WebGPU/Buffer.h#L183
Either we are incorrectly missing the cache or we need to clear it.
Mike Wyrzykowski
Comment 10 2025-11-19 15:56:30 PST
Perf wise, buffer validation is severely impacting performance here. With buffer validation disabled I observe ~45fps. With buffer validation enabled I observe ~9fps. Chrome on the same Mac is ~30fps. There is more occurring than just an out-of-control cache: even when limiting the cache size to 10 elements, memory usage is still 1.6GB.
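
For readers unfamiliar with the idea of limiting a cache's size, a minimal generic sketch follows. It is not WebKit's implementation (the cache in `Buffer.h` is C++ and its structure is not shown in this thread). The class name, the least-recently-used eviction policy, and the key/value types are all assumptions made for illustration.

```typescript
// Hypothetical bounded cache with least-recently-used eviction.
// Relies on Map preserving insertion order: the first key is the oldest.
class BoundedCache<K, V> {
  private readonly entries = new Map<K, V>();
  private readonly maxEntries: number;

  constructor(maxEntries: number) {
    this.maxEntries = maxEntries;
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) {
      return undefined;
    }
    const value = this.entries.get(key) as V;
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxEntries) {
      // Evict the least recently used entry.
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}
```

With a cap in place, the entry count cannot grow past `maxEntries` no matter how many distinct keys a workload produces. As the comment above notes, capping the cache alone did not bring memory usage down in this repro.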
Mike Wyrzykowski
Comment 11 2025-11-20 09:06:26 PST
The memory usage appears to originate from Vertex : Vertex memory barriers we emit, of which there are several thousand per frame
Mike Wyrzykowski
Comment 12 2025-11-20 09:31:10 PST
We can emit a single memory barrier by switching to MTLParallelRenderCommandEncoder. I.e., perform vertex buffer validation for all draws first, then proceed with standard rendering. I will try migrating to MTLParallelRenderCommandEncoder
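
The reordering described above can be pictured with a toy model that only counts steps. It makes no Metal calls and does not represent WebKit's actual encoder logic. The step names and both schedule functions are hypothetical, and the one-barrier-per-draw figure for the interleaved case is an assumption based on the "several thousand per frame" observation in Comment 11.

```typescript
// Toy model: compare how many barriers each ordering would need.
type Step = "validate" | "barrier" | "draw";

// Interleaved: every draw is preceded by its own validation and barrier.
function interleavedSchedule(drawCount: number): Step[] {
  const steps: Step[] = [];
  for (let i = 0; i < drawCount; i++) {
    steps.push("validate", "barrier", "draw");
  }
  return steps;
}

// Batched: validate all draws first, emit one barrier, then render.
function batchedSchedule(drawCount: number): Step[] {
  const steps: Step[] = [];
  for (let i = 0; i < drawCount; i++) {
    steps.push("validate");
  }
  if (drawCount > 0) {
    steps.push("barrier");
  }
  for (let i = 0; i < drawCount; i++) {
    steps.push("draw");
  }
  return steps;
}

function countSteps(steps: Step[], kind: Step): number {
  return steps.filter((s) => s === kind).length;
}
```

For a frame with N draws, the interleaved schedule contains N barriers and the batched schedule contains one, while both contain the same N validations and N draws.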
Mike Wyrzykowski
Comment 13 2025-11-20 10:03:54 PST
Created attachment 477453
patch from April

I wrote a patch in April to switch to parallel command encoding for this purpose but there were some bugs and it was deemed too risky. I'm going to try and clean it up so we remove all but one of the memory barriers.
Mike Wyrzykowski
Comment 14 2025-11-20 14:48:12 PST
MTLParallelRenderCommandEncoder and limiting the cache size resolve the memory growth. Perf is still not great. Maybe there is something easy to resolve that too.
Mike Wyrzykowski
Comment 15 2025-11-20 15:21:25 PST
Oh nice, using the ring buffer allocator gets us ~35 fps and Chrome is ~33 fps on the same Mac, so virtually identical. Going to make an iOS build to ensure the issue is fully addressed.
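
As background on the general technique, a ring buffer allocator hands out offsets from one fixed-size region and wraps to the start when a request does not fit in the remaining tail. The sketch below is a generic, simplified illustration, not the allocator used in WebKit. The class name and the default alignment are assumptions, and it omits the in-flight tracking a real GPU allocator needs before reusing a region.

```typescript
// Hypothetical ring (wrap-around) sub-allocator over a fixed-size region.
// Simplification: does not check whether wrapped-over data is still in use.
class RingAllocator {
  private head = 0;
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  // Returns the byte offset of a region of `size` bytes.
  allocate(size: number, alignment: number = 4): number {
    if (size > this.capacity) {
      throw new RangeError("request exceeds ring capacity");
    }
    // Round the current head up to the requested alignment.
    let offset = Math.ceil(this.head / alignment) * alignment;
    if (offset + size > this.capacity) {
      // Not enough room in the tail: wrap to the start of the region.
      offset = 0;
    }
    this.head = offset + size;
    return offset;
  }
}
```

The appeal of this pattern is that repeated small allocations reuse one long-lived region instead of creating a new buffer each time.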
Mike Wyrzykowski
Comment 16 2025-11-20 15:31:25 PST
Mike Wyrzykowski
Comment 17 2025-11-24 12:30:26 PST
Seems fine on an iPhone 13 mini, no crashes after several minutes
EWS
Comment 18 2025-12-04 16:20:06 PST
Committed 303942@main (df6c49376568): <https://commits.webkit.org/303942@main> Reviewed commits have been landed. Closing PR #54279 and removing active labels.