RESOLVED FIXED Bug 237180
REGRESSION (iOS 15.4 beta): Unity WASM builds fail to load
https://bugs.webkit.org/show_bug.cgi?id=237180
Summary REGRESSION (iOS 15.4 beta): Unity WASM builds fail to load
Brendan Duncan
Reported 2022-02-24 20:27:39 PST
Created attachment 453171 [details] Unity project build that loads in iOS 15.3 but not in 15.4 While testing iOS 15.4 Safari for Unity WebGL builds, Unity projects no longer load. iOS 15.3 works correctly. There are a lot of function not implemented errors, and sometimes memory heap corruption errors. I have not yet been able to diagnose the reason for the failures.
Attachments
Unity project build that loads in iOS 15.3 but not in 15.4 (4.25 MB, application/zip)
2022-02-24 20:27 PST, Brendan Duncan
no flags
Javascript WebGL playback recording of the Unity frame render (3.21 MB, text/html)
2022-02-26 09:47 PST, Brendan Duncan
no flags
This build does not crash, possibly because the WASM is < 9MB. (2.75 MB, application/zip)
2022-03-08 21:17 PST, Brendan Duncan
no flags
This build crashes, possibly because the WASM is > 9MB. (3.28 MB, application/zip)
2022-03-08 21:17 PST, Brendan Duncan
no flags
Working Unity project build with the problem function compiled with no optimizations (6.42 MB, application/zip)
2022-03-21 15:25 PDT, Brendan Duncan
no flags
Crashing Unity project build with the problem function compiled with optimizations (6.42 MB, application/zip)
2022-03-21 15:27 PDT, Brendan Duncan
no flags
Kenneth Russell
Comment 1 2022-02-24 20:30:27 PST
CC'ing WebGL / ANGLE folks in case failures to create or work with the WebGL context are affecting the behavior here.
Kenneth Russell
Comment 2 2022-02-24 20:37:35 PST
On an M1 Mac this example shows a white square on a blue background in Safari 15.3 (17612.4.9.1.8), but renders solid black with no obvious errors in the JavaScript console in Safari Technology Preview Release 140 (Safari 15.4, WebKit 17614.1.1.5).
Kenneth Russell
Comment 3 2022-02-24 21:19:01 PST
Seems to render correctly on top-of-tree WebKit run inside Safari (run-safari --debug) on M1 hardware - assuming my build is being picked up properly.
Radar WebKit Bug Importer
Comment 4 2022-02-25 09:55:21 PST
Kenneth Russell
Comment 5 2022-02-25 15:41:16 PST
+ysuzuki in case anyone from the JSC team knows who should triage this - thanks in advance.
Yusuke Suzuki
Comment 6 2022-02-25 15:46:04 PST
(In reply to Kenneth Russell from comment #3) > Seems to render correctly on top-of-tree WebKit run inside Safari > (run-safari --debug) on M1 hardware - assuming my build is being picked up > properly. Probably this sounds like a WebGL error, Kimmo, do you have an idea?
Yusuke Suzuki
Comment 7 2022-02-25 15:46:52 PST
Also CCing Justin if it is related to wasm, (but probably it is not for 15.4 release).
Brendan Duncan
Comment 8 2022-02-25 15:49:59 PST
The errors on iOS seem to be different than on desktop Safari Tech Preview 15.4. They seem to be WASM related (function not found, memory heap corruption). But maybe a WebGL problem is causing the corruption.
Brendan Duncan
Comment 9 2022-02-26 09:47:58 PST
Created attachment 453298 [details] Javascript WebGL playback recording of the Unity frame render
Brendan Duncan
Comment 10 2022-02-26 09:54:59 PST
Here's another datapoint indicating the problem is coming from WASM execution rather than Angle/WebGL. I used my tool that does a playback recording of all the WebGL commands executed by Unity to render a frame. It saves all the commands into an html that can be opened to recreate the frame render in pure Javascript. The javascript version renders in both Safari Tech Preview 15.4 and iOS 15.4 beta, but the WASM version does not. There are a bunch of WebGL errors in WebKit that do not occur in Chrome, but they don't affect the render. This indicates to me the WASM execution is causing the problems that cause the memory corruption / stack errors on iOS, and the black screen on STP.
Brendan Duncan
Comment 11 2022-02-26 09:55:45 PST
I attached the frame recording html to the issue.
Kenneth Russell
Comment 12 2022-02-28 10:57:30 PST
Upgrading to P1, Major - could someone from WebKit's Wasm team please triage this? It would be an unfortunate regression. Thanks.
Brendan Duncan
Comment 13 2022-03-04 09:42:55 PST
Another datapoint: The just-released Safari Tech Preview on MacOS (Release 141, Safari 15.4, WebKit 17614.1.3.8) is now working. The previous version of STP had the black screen. iOS 15.4 beta 5 is still not working with WASM memory errors.
Brendan Duncan
Comment 14 2022-03-08 21:16:10 PST
Here is some more information about this crash. With iOS 15.4 Release Candidate still being broken, this means all Unity web builds will be broken on iOS when 15.4 ships. The crash seems to be related to the size of the WASM file. It seems WASM files > ~9MB crash, while WASM files < 9MB work. I attached two builds of the same Unity project, which does nothing but set the clear color of the canvas. unity_ios15_4_working.zip has a build where the WASM file size is 6.2MB. unity_ios_15_4_crash.zip is the same project, but built in development mode, and the WASM file size is 9.1MB. I did another test where I built the project with different compiler settings that resulted in a 8.4MB WASM, which still worked.
Brendan Duncan
Comment 15 2022-03-08 21:17:14 PST
Created attachment 454189 [details] This build does not crash, possibly because the WASM is < 9MB.
Brendan Duncan
Comment 16 2022-03-08 21:17:44 PST
Created attachment 454190 [details] This build crashes, possibly because the WASM is > 9MB.
connell
Comment 17 2022-03-16 12:24:54 PDT
Our customers and end-users are encountering this issue in production now that iOS 15.4 has started rolling out. We'd really appreciate a point release that could resolve this issue as soon as possible and some more transparency about the work going on to resolve it. This is a vital technology for us and this regression is likely to have a significant impact on our business and that of our customers. @Brendan Duncan - I don't think we can influence the version of emsdk that Unity uses from our side, but is it possible to try earlier versions, e.g. 2.0.16? We're using that version with other platforms to avoid issues with earlier iOS versions that appear with later emsdk versions.
Simon Taylor
Comment 18 2022-03-16 12:37:01 PDT
Another potential workaround to explore Brendan - we previously encountered a Chrome bug triggered by memory growth: https://bugs.chromium.org/p/v8/issues/detail?id=11863 The workaround we used for that one was to allocate a large WebAssembly.Memory up-front - more details here: https://github.com/emscripten-core/emscripten/issues/14538#issuecomment-873605760 As Connell says this bug is going to have a massively negative impact on us and our customers, and it unfortunately wasn't one of the ones I was tracking during the betas... it's proving very challenging to keep track of all the new bugs affecting our products that Safari is introducing at the moment :(
Simon Taylor
Comment 19 2022-03-17 03:37:20 PDT
(In reply to Brendan Duncan from comment #14) > I did another test where I built the project with different compiler > settings that resulted in a 8.4MB WASM, which still worked. 8 MiB (8 * 1024 * 1024, 1 << 23) is 8388608 bytes, which finder displays as 8.4MB, so that seems a plausible threshold to me. I'm trying out a few things now to see if I can find any workarounds - will update here if so.
Mike Welsh
Comment 20 2022-03-19 19:15:36 PDT
I believe we're also hitting this regression in Ruffle: https://ruffle.rs/demo/ iOS Safari 15.4, iPad Air 4th Gen. Throwing "RuntimeError: Out of bounds memory access" or other errors. iOS Safari 15.3 and earlier worked correctly. macos Safari 15.4 works correctly (at least on my x86 2017 MacBook). If I can make a minimal repro case, I'll post it here.
Brendan Duncan
Comment 21 2022-03-21 13:51:29 PDT
Here are some more notes from researching the bug in Unity. Memory errors are hard to figure out. I found a specific function, and a specific chunk of code within that function, that starts the cascade of failures that result in the crash. The name of the function is GenericMetadata::InflateIfNeeded, and the previously attached builds should have symbols in their WASM files. If I wrap that function with #pragma clang optimize off ... #pragma clang optimize on Then the crash goes away. The code is normally built with -O2. Only the iOS15.4 WASM VM has the crash. The function is a switch statement, with small blocks of code for each case. If I move the body of code for the case where the crash originates to its own function, the crash goes away, regardless of optimization level. So it seems to have to do with the optimization of that switch, along with something the iOS15.4 WASM VM is doing. If I add printf debugging to the section of code where the error occurs, then the values of what is printed can change depending on where I print it from, even though the value I'm printing shouldn't change. So, observation can change the behavior, though doesn't stop the crash. This indicates maybe some alignment issue, where the added code is shifting the alignment.
Saam Barati
Comment 22 2022-03-21 14:54:24 PDT
(In reply to Brendan Duncan from comment #21) > Here are some more notes from researching the bug in Unity. Memory errors > are hard to figure out. > > I found a specific function, and a specific chunk of code within that > function, that starts the cascade of failures that result in the crash. The > name of the function is GenericMetadata::InflateIfNeeded, and the previously > attached builds should have symbols in their WASM files. > > If I wrap that function with > #pragma clang optimize off > ... > #pragma clang optimize on > > Then the crash goes away. The code is normally built with -O2. > > Only the iOS15.4 WASM VM has the crash. > > The function is a switch statement, with small blocks of code for each case. > If I move the body of code for the case where the crash originates to its > own function, the crash goes away, regardless of optimization level. > > So it seems to have to do with the optimization of that switch, along with > something the iOS15.4 WASM VM is doing. > > If I add printf debugging to the section of code where the error occurs, > then the values of what is printed can change depending on where I print it > from, even though the value I'm printing shouldn't change. So, observation > can change the behavior, though doesn't stop the crash. This indicates maybe > some alignment issue, where the added code is shifting the alignment. Thanks for this info, it's super helpful.
Saam Barati
Comment 23 2022-03-21 15:03:01 PDT
(In reply to Mike Welsh from comment #20) > I believe we're also hitting this regression in Ruffle: > https://ruffle.rs/demo/ > iOS Safari 15.4, iPad Air 4th Gen. > Throwing "RuntimeError: Out of bounds memory access" or other errors. > > iOS Safari 15.3 and earlier worked correctly. macos Safari 15.4 works > correctly (at least on my x86 2017 MacBook). If I can make a minimal repro > case, I'll post it here. Thanks this is also helpful.
Brendan Duncan
Comment 24 2022-03-21 15:25:59 PDT
Created attachment 455282 [details] Working Unity project build with the problem function compiled with no optimizations I'm attaching two builds of a Unity project that are identical except for the one function I found to seemingly trigger the crash. In this working version, the function is compiled with `#pragma clang optimize off`. The problem function is GenericMetadata::InflateIfNeeded. The WASM has symbols, and the function should be defined as il2cpp::metadata::GenericMetadata::InflateIfNeeded_Il2CppType_const*__Il2CppGenericContext_const*__bool_.
Brendan Duncan
Comment 25 2022-03-21 15:27:05 PDT
Created attachment 455284 [details] Crashing Unity project build with the problem function compiled with optimizations I'm attaching two builds of a Unity project that are identical except for the one function I found to seemingly trigger the crash. In this crashing version, the function is compiled with `#pragma clang optimize on`. The problem function is GenericMetadata::InflateIfNeeded. The WASM has symbols, and the function should be defined as il2cpp::metadata::GenericMetadata::InflateIfNeeded_Il2CppType_const*__Il2CppGenericContext_const*__bool_.
Brendan Duncan
Comment 26 2022-03-21 15:38:40 PDT
I attached two new builds, one that seems to work without crashing, and the other that crashes. The only difference between the two builds, is the one function I added #pragma clang optimize off to disable compiler optimizations. The function itself is fairly simple. A switch statement, where one of the cases, when inlined in the switch and with optimizations, causes the crash. That case is defined as: Il2CppType* genericType = (Il2CppType*)malloc(sizeof(Il2CppType)); memcpy(genericType, type, sizeof(Il2CppType)); genericType->data.generic_class = genericClass; ++il2cpp_runtime_stats.inflated_type_count; return genericType; That case block in that function will be called 260 times, and then on the 261st time, something is different and object has bad data. sizeof(Il2CppType) is 8, and Il2Cpp is defined as: typedef struct Il2CppType { union { void* dummy; TypeDefinitionIndex __klassIndex; Il2CppMetadataTypeHandle typeHandle; const Il2CppType *type; Il2CppArrayType *array; GenericParameterIndex __genericParameterIndex; Il2CppMetadataGenericParameterHandle genericParameterHandle; Il2CppGenericClass *generic_class; } data; unsigned int attrs : 16; Il2CppTypeEnum type : 8; unsigned int num_mods : 5; unsigned int byref : 1; unsigned int pinned : 1; unsigned int valuetype : 1; } Il2CppType; If I add a printf to the code, as: Il2CppType* genericType = (Il2CppType*)malloc(sizeof(Il2CppType)); memcpy(genericType, type, sizeof(Il2CppType)); genericType->data.generic_class = genericClass; ++il2cpp_runtime_stats.inflated_type_count; printf("%d == %d\n", genericType->type, type->type); return genericType; Then after 260 prints, it starts printing 0 == 21. Nothing in the code should be changing genericType->type from type->type; If I move the printf statement up, such as: Il2CppType* genericType = (Il2CppType*)malloc(sizeof(Il2CppType)); memcpy(genericType, type, sizeof(Il2CppType)); printf("%d == %d\n", genericType->type, type->type); genericType->data.generic_class = genericClass; ++il2cpp_runtime_stats.inflated_type_count; return genericType; Then it always prints 21 == 21, and it still crashes. And even stranger, if I print in both places: Il2CppType* genericType = (Il2CppType*)malloc(sizeof(Il2CppType)); memcpy(genericType, type, sizeof(Il2CppType)); printf("%d == %d\n", genericType->type, type->type); genericType->data.generic_class = genericClass; ++il2cpp_runtime_stats.inflated_type_count; printf("%d == %d\n", genericType->type, type->type); return genericType; Then it always prints 21 == 21 in both places, and still crashes.
Carlo Piovesan
Comment 27 2022-03-23 03:58:01 PDT
I was checking the diff between the latest Wasm files posted by Brendan Duncan, and in the crashing case the very first switch case sometimes falls back to the following case: -------------------- br_table 2 4 5 6 6 6 //... end local.get 3 i32.const 15 i32.ne br_if 5 end //go on with flow IFF the third local was exactly 15 //... -------------------- This reminded me of this other bug: https://bugs.webkit.org/show_bug.cgi?id=210105 connected to a similar problem while treating switches. If this is the actual problem, on the producer side it might be possible to avoid this construct.
Justin Michaud
Comment 28 2022-03-23 18:37:58 PDT
Testing against safari-613.1.17.0-branch on iOS 15.4, I have confirmed that commenting out the mutations to the program from CSE’s processWasmAddressValue fixes ruffle.rs and the crashing test case. The original (black screen) test case is not fixed by this. The WasmAddress CSE bug is being tracked by https://bugs.webkit.org/show_bug.cgi?id=238302 and I will have a patch up soon for that. I think this bug should continue to track the original unity black screen test case, the cause of which is still unknown.
Brendan Duncan
Comment 29 2022-03-23 19:56:42 PDT
That's fantastic if this is the fix! There are a number of WebGL graphics regressions that are currently being patched by the Angle WebGL developers, so if it's not crashing and is a black screen, then I suspect that's the separate WebGL issue.
Justin Michaud
Comment 30 2022-04-06 11:02:39 PDT
All of the WASM changes have landed, and the ANGLE changes landed a little while back. I verified that the test cases on iOS and on macOS (with __XPC_JSC_useWebAssemblyFastMemory=0 to trigger the bug in the first place) are fixed with ToT plus the patch that just landed. Thank you for taking the time to report these bugs!
paul
Comment 31 2022-04-08 01:57:19 PDT
*** Bug 238662 has been marked as a duplicate of this bug. ***
Simon Taylor
Comment 32 2022-04-21 01:39:48 PDT
Just a note - I am still experiencing these wasm bugs on the latest 15.5 beta 2 build (tested on an iPod touch 7th Gen). The ANGLE fixes for anti-aliased canvases do seem to have made it in to that build at least.
Mike Welsh
Comment 33 2022-04-29 21:21:14 PDT
Also noting that I am still experiencing the Wasm issues on iPad OS 15.5 beta 3 on https://ruffle.rs/demo (select a demo), in case it is at all helpful. Thank you!
Mike Welsh
Comment 34 2022-05-03 11:55:01 PDT
Update, the fix seems to have made it into iOS 15.4 beta 4, released today (May 3). Thanks everyone :-)
Simon Taylor
Comment 35 2022-05-04 09:28:24 PDT
Confirming this looks fixed to me too in the 15.5 beta 4 build. Just to note the mention of 15.4 beta 4 in the previous comment was likely a typo - 15.4 introduced the bug, 15.5 beta 4 seems to be when the fix made it into an iOS build, so hopefully it stays fixed until the 15.5 final release goes out!
Sam Sneddon [:gsnedders]
Comment 36 2022-05-16 15:40:08 PDT
Just to let everyone know, the fix for this has shipped in Safari 15.5.
Note You need to log in before you can comment on or make changes to this bug.