Bug 237180

Summary: REGRESSION (iOS 15.4 beta): Unity WASM builds fail to load
Product: WebKit Reporter: Brendan Duncan <brendanduncan>
Component: WebAssemblyAssignee: Justin Michaud <justin_michaud>
Status: RESOLVED FIXED    
Severity: Major CC: anthony.bowker, carlo, connell, dino, johncunningham, justin_michaud, kbr, kkinnunen, kpiddington, mwelsh, paul, roman.wache, saam, simontaylor1, webkit-bug-importer, ysuzuki
Priority: P1 Keywords: InRadar
Version: Safari 15   
Hardware: iPhone / iPad   
OS: iOS 15   
Bug Depends on: 238302    
Bug Blocks: 242912    
Attachments:
Description Flags
Unity project build that loads in iOS 15.3 but not in 15.4
none
Javascript WebGL playback recording of the Unity frame render
none
This build does not crash, possibly because the WASM is < 9MB.
none
This build crashes, possibly because the WASM is > 9MB.
none
Working Unity project build with the problem function compiled with no optimizations
none
Crashing Unity project build with the problem function compiled with optimizations none

Description Brendan Duncan 2022-02-24 20:27:39 PST
Created attachment 453171 [details]
Unity project build that loads in iOS 15.3 but not in 15.4

While testing iOS 15.4 Safari for Unity WebGL builds, Unity projects no longer load. iOS 15.3 works correctly. There are a lot of function not implemented errors, and sometimes memory heap corruption errors. I have not yet been able to diagnose the reason for the failures.
Comment 1 Kenneth Russell 2022-02-24 20:30:27 PST
CC'ing WebGL / ANGLE folks in case failures to create or work with the WebGL context are affecting the behavior here.
Comment 2 Kenneth Russell 2022-02-24 20:37:35 PST
On an M1 Mac this example shows a white square on a blue background in Safari 15.3 (17612.4.9.1.8), but renders solid black with no obvious errors in the JavaScript console in Safari Technology Preview Release 140 (Safari 15.4, WebKit 17614.1.1.5).
Comment 3 Kenneth Russell 2022-02-24 21:19:01 PST
Seems to render correctly on top-of-tree WebKit run inside Safari (run-safari --debug) on M1 hardware - assuming my build is being picked up properly.
Comment 4 Radar WebKit Bug Importer 2022-02-25 09:55:21 PST
<rdar://problem/89479422>
Comment 5 Kenneth Russell 2022-02-25 15:41:16 PST
+ysuzuki in case anyone from the JSC team knows who should triage this - thanks in advance.
Comment 6 Yusuke Suzuki 2022-02-25 15:46:04 PST
(In reply to Kenneth Russell from comment #3)
> Seems to render correctly on top-of-tree WebKit run inside Safari
> (run-safari --debug) on M1 hardware - assuming my build is being picked up
> properly.

Probably this sounds like a WebGL error, Kimmo, do you have an idea?
Comment 7 Yusuke Suzuki 2022-02-25 15:46:52 PST
Also CCing Justin if it is related to wasm, (but probably it is not for 15.4 release).
Comment 8 Brendan Duncan 2022-02-25 15:49:59 PST
The errors on iOS seem to be different than on desktop Safari Tech Preview 15.4. They seem to be WASM related (function not found, memory heap corruption). But maybe a WebGL problem is causing the corruption.
Comment 9 Brendan Duncan 2022-02-26 09:47:58 PST
Created attachment 453298 [details]
Javascript WebGL playback recording of the Unity frame render
Comment 10 Brendan Duncan 2022-02-26 09:54:59 PST
Here's another datapoint indicating the problem is coming from WASM execution rather than Angle/WebGL. I used my tool that does a playback recording of all the WebGL commands executed by Unity to render a frame. It saves all the commands into an html that can be opened to recreate the frame render in pure Javascript.

The javascript version renders in both Safari Tech Preview 15.4 and iOS 15.4 beta, but the WASM version does not. There are a bunch of WebGL errors in WebKit that do not occur in Chrome, but they don't affect the render.

This indicates to me the WASM execution is causing the problems that cause the memory corruption / stack errors on iOS, and the black screen on STP.
Comment 11 Brendan Duncan 2022-02-26 09:55:45 PST
I attached the frame recording html to the issue.
Comment 12 Kenneth Russell 2022-02-28 10:57:30 PST
Upgrading to P1, Major - could someone from WebKit's Wasm team please triage this? It would be an unfortunate regression. Thanks.
Comment 13 Brendan Duncan 2022-03-04 09:42:55 PST
Another datapoint: 

The just-released Safari Tech Preview on MacOS (Release 141, Safari 15.4, WebKit 17614.1.3.8) is now working. The previous version of STP had the black screen.

iOS 15.4 beta 5 is still not working with WASM memory errors.
Comment 14 Brendan Duncan 2022-03-08 21:16:10 PST
Here is some more information about this crash. With iOS 15.4 Release Candidate still being broken, this means all Unity web builds will be broken on iOS when 15.4 ships.

The crash seems to be related to the size of the WASM file. It seems WASM files > ~9MB crash, while WASM files < 9MB work.

I attached two builds of the same Unity project, which does nothing but set the clear color of the canvas. 

unity_ios15_4_working.zip has a build where the WASM file size is 6.2MB.
unity_ios_15_4_crash.zip is the same project, but built in development mode, and the WASM file size is 9.1MB. 

I did another test where I built the project with different compiler settings that resulted in a 8.4MB WASM, which still worked.
Comment 15 Brendan Duncan 2022-03-08 21:17:14 PST
Created attachment 454189 [details]
This build does not crash, possibly because the WASM is < 9MB.
Comment 16 Brendan Duncan 2022-03-08 21:17:44 PST
Created attachment 454190 [details]
This build crashes, possibly because the WASM is > 9MB.
Comment 17 connell 2022-03-16 12:24:54 PDT
Our customers and end-users are encountering this issue in production now that iOS 15.4 has started rolling out. We'd really appreciate a point release that could resolve this issue as soon as possible and some more transparency about the work going on to resolve it. This is a vital technology for us and this regression is likely to have a significant impact on our business and that of our customers.

@Brendan Duncan - I don't think we can influence the version of emsdk that Unity uses from our side, but is it possible to try earlier versions, e.g. 2.0.16? We're using that version with other platforms to avoid issues with earlier iOS versions that appear with later emsdk versions.
Comment 18 Simon Taylor 2022-03-16 12:37:01 PDT
Another potential workaround to explore Brendan - we previously encountered a Chrome bug triggered by memory growth:
https://bugs.chromium.org/p/v8/issues/detail?id=11863

The workaround we used for that one was to allocate a large WebAssembly.Memory up-front - more details here:
https://github.com/emscripten-core/emscripten/issues/14538#issuecomment-873605760

As Connell says this bug is going to have a massively negative impact on us and our customers, and it unfortunately wasn't one of the ones I was tracking during the betas... it's proving very challenging to keep track of all the new bugs affecting our products that Safari is introducing at the moment :(
Comment 19 Simon Taylor 2022-03-17 03:37:20 PDT
(In reply to Brendan Duncan from comment #14)
> I did another test where I built the project with different compiler
> settings that resulted in a 8.4MB WASM, which still worked.

8 MiB (8 * 1024 * 1024, 1 << 23) is 8388608 bytes, which finder displays as 8.4MB, so that seems a plausible threshold to me.

I'm trying out a few things now to see if I can find any workarounds - will update here if so.
Comment 20 Mike Welsh 2022-03-19 19:15:36 PDT
I believe we're also hitting this regression in Ruffle:
https://ruffle.rs/demo/
iOS Safari 15.4, iPad Air 4th Gen.
Throwing "RuntimeError: Out of bounds memory access" or other errors.

iOS Safari 15.3 and earlier worked correctly. macos Safari 15.4 works correctly (at least on my x86 2017 MacBook). If I can make a minimal repro case, I'll post it here.
Comment 21 Brendan Duncan 2022-03-21 13:51:29 PDT
Here are some more notes from researching the bug in Unity. Memory errors are hard to figure out.

I found a specific function, and a specific chunk of code within that function, that starts the cascade of failures that result in the crash. The name of the function is GenericMetadata::InflateIfNeeded, and the previously attached builds should have symbols in their WASM files.

If I wrap that function with
#pragma clang optimize off
...
#pragma clang optimize on

Then the crash goes away. The code is normally built with -O2.

Only the iOS15.4 WASM VM has the crash.

The function is a switch statement, with small blocks of code for each case.
If I move the body of code for the case where the crash originates to its own function, the crash goes away, regardless of optimization level.

So it seems to have to do with the optimization of that switch, along with something the iOS15.4 WASM VM is doing.

If I add printf debugging to the section of code where the error occurs, then the values of what is printed can change depending on where I print it from, even though the value I'm printing shouldn't change. So, observation can change the behavior, though doesn't stop the crash. This indicates maybe some alignment issue, where the added code is shifting the alignment.
Comment 22 Saam Barati 2022-03-21 14:54:24 PDT
(In reply to Brendan Duncan from comment #21)
> Here are some more notes from researching the bug in Unity. Memory errors
> are hard to figure out.
> 
> I found a specific function, and a specific chunk of code within that
> function, that starts the cascade of failures that result in the crash. The
> name of the function is GenericMetadata::InflateIfNeeded, and the previously
> attached builds should have symbols in their WASM files.
> 
> If I wrap that function with
> #pragma clang optimize off
> ...
> #pragma clang optimize on
> 
> Then the crash goes away. The code is normally built with -O2.
> 
> Only the iOS15.4 WASM VM has the crash.
> 
> The function is a switch statement, with small blocks of code for each case.
> If I move the body of code for the case where the crash originates to its
> own function, the crash goes away, regardless of optimization level.
> 
> So it seems to have to do with the optimization of that switch, along with
> something the iOS15.4 WASM VM is doing.
> 
> If I add printf debugging to the section of code where the error occurs,
> then the values of what is printed can change depending on where I print it
> from, even though the value I'm printing shouldn't change. So, observation
> can change the behavior, though doesn't stop the crash. This indicates maybe
> some alignment issue, where the added code is shifting the alignment.

Thanks for this info, it's super helpful.
Comment 23 Saam Barati 2022-03-21 15:03:01 PDT
(In reply to Mike Welsh from comment #20)
> I believe we're also hitting this regression in Ruffle:
> https://ruffle.rs/demo/
> iOS Safari 15.4, iPad Air 4th Gen.
> Throwing "RuntimeError: Out of bounds memory access" or other errors.
> 
> iOS Safari 15.3 and earlier worked correctly. macos Safari 15.4 works
> correctly (at least on my x86 2017 MacBook). If I can make a minimal repro
> case, I'll post it here.

Thanks this is also helpful.
Comment 24 Brendan Duncan 2022-03-21 15:25:59 PDT
Created attachment 455282 [details]
Working Unity project build with the problem function compiled with no optimizations

I'm attaching two builds of a Unity project that are identical except for the one function I found to seemingly trigger the crash. In this working version, the function is compiled with `#pragma clang optimize off`.
The problem function is GenericMetadata::InflateIfNeeded. The WASM has symbols, and the function should be defined as il2cpp::metadata::GenericMetadata::InflateIfNeeded_Il2CppType_const*__Il2CppGenericContext_const*__bool_.
Comment 25 Brendan Duncan 2022-03-21 15:27:05 PDT
Created attachment 455284 [details]
Crashing Unity project build with the problem function compiled with optimizations

I'm attaching two builds of a Unity project that are identical except for the one function I found to seemingly trigger the crash. In this crashing version, the function is compiled with `#pragma clang optimize on`.
The problem function is GenericMetadata::InflateIfNeeded. The WASM has symbols, and the function should be defined as il2cpp::metadata::GenericMetadata::InflateIfNeeded_Il2CppType_const*__Il2CppGenericContext_const*__bool_.
Comment 26 Brendan Duncan 2022-03-21 15:38:40 PDT
I attached two new builds, one that seems to work without crashing, and the other that crashes. The only difference between the two builds, is the one function I added #pragma clang optimize off to disable compiler optimizations.

The function itself is fairly simple. A switch statement, where one of the cases, when inlined in the switch and with optimizations, causes the crash.

That case is defined as:
    Il2CppType* genericType = (Il2CppType*)malloc(sizeof(Il2CppType));
    memcpy(genericType, type, sizeof(Il2CppType));
    genericType->data.generic_class = genericClass;
    ++il2cpp_runtime_stats.inflated_type_count;
    return genericType;

That case block in that function will be called 260 times, and then on the 261st time, something is different and object has bad data.

sizeof(Il2CppType) is 8, and Il2Cpp is defined as:
typedef struct Il2CppType
{
    union
    {
        void* dummy;
        TypeDefinitionIndex __klassIndex;
        Il2CppMetadataTypeHandle typeHandle;
        const Il2CppType *type;
        Il2CppArrayType *array;
        GenericParameterIndex __genericParameterIndex;
        Il2CppMetadataGenericParameterHandle genericParameterHandle;
        Il2CppGenericClass *generic_class;
    } data;
    unsigned int attrs    : 16;
    Il2CppTypeEnum type     : 8;
    unsigned int num_mods : 5;
    unsigned int byref    : 1;
    unsigned int pinned   : 1;
    unsigned int valuetype : 1;
} Il2CppType;


If I add a printf to the code, as:
    Il2CppType* genericType = (Il2CppType*)malloc(sizeof(Il2CppType));
    memcpy(genericType, type, sizeof(Il2CppType));
    genericType->data.generic_class = genericClass;
    ++il2cpp_runtime_stats.inflated_type_count;
    printf("%d == %d\n", genericType->type, type->type);
    return genericType;

Then after 260 prints, it starts printing 0 == 21. Nothing in the code should be changing genericType->type from type->type;


If I move the printf statement up, such as:
    Il2CppType* genericType = (Il2CppType*)malloc(sizeof(Il2CppType));
    memcpy(genericType, type, sizeof(Il2CppType));
    printf("%d == %d\n", genericType->type, type->type);
    genericType->data.generic_class = genericClass;
    ++il2cpp_runtime_stats.inflated_type_count;
    return genericType;

Then it always prints 21 == 21, and it still crashes.

And even stranger, if I print in both places:
    Il2CppType* genericType = (Il2CppType*)malloc(sizeof(Il2CppType));
    memcpy(genericType, type, sizeof(Il2CppType));
    printf("%d == %d\n", genericType->type, type->type);
    genericType->data.generic_class = genericClass;
    ++il2cpp_runtime_stats.inflated_type_count;
    printf("%d == %d\n", genericType->type, type->type);
    return genericType;

Then it always prints 21 == 21 in both places, and still crashes.
Comment 27 Carlo Piovesan 2022-03-23 03:58:01 PDT
I was checking the diff between the latest Wasm files posted by Brendan Duncan, and in the crashing case the very first switch case sometimes falls back to the following case:

--------------------
    br_table 2 4 5 6 6 6 //...
  end
  local.get 3
  i32.const 15
  i32.ne
  br_if 5
end //go on with flow IFF the third local was exactly 15
//...
--------------------

This reminded me of this other bug: https://bugs.webkit.org/show_bug.cgi?id=210105 connected to a similar problem while treating switches.

If this is the actual problem, on the producer side it might be possible to avoid this construct.
Comment 28 Justin Michaud 2022-03-23 18:37:58 PDT
Testing against safari-613.1.17.0-branch on iOS 15.4, I have confirmed that commenting out the mutations to the program from CSE’s processWasmAddressValue fixes ruffle.rs and the crashing test case. The original (black screen) test case is not fixed by this.

The WasmAddress CSE bug is being tracked by https://bugs.webkit.org/show_bug.cgi?id=238302 and I will have a patch up soon for that.

I think this bug should continue to track the original unity black screen test case, the cause of which is still unknown.
Comment 29 Brendan Duncan 2022-03-23 19:56:42 PDT
That's fantastic if this is the fix!
There are a number of WebGL graphics regressions that are currently being patched by the Angle WebGL developers, so if it's not crashing and is a black screen, then I suspect that's the separate WebGL issue.
Comment 30 Justin Michaud 2022-04-06 11:02:39 PDT
All of the WASM changes have landed, and the ANGLE changes landed a little while back.

I verified that the test cases on iOS and on macOS (with __XPC_JSC_useWebAssemblyFastMemory=0 to trigger the bug in the first place) are fixed with ToT plus the patch that just landed.

Thank you for taking the time to report these bugs!
Comment 31 paul 2022-04-08 01:57:19 PDT
*** Bug 238662 has been marked as a duplicate of this bug. ***
Comment 32 Simon Taylor 2022-04-21 01:39:48 PDT
Just a note - I am still experiencing these wasm bugs on the latest 15.5 beta 2 build (tested on an iPod touch 7th Gen). The ANGLE fixes for anti-aliased canvases do seem to have made it in to that build at least.
Comment 33 Mike Welsh 2022-04-29 21:21:14 PDT
Also noting that I am still experiencing the Wasm issues on iPad OS 15.5 beta 3 on https://ruffle.rs/demo (select a demo), in case it is at all helpful. Thank you!
Comment 34 Mike Welsh 2022-05-03 11:55:01 PDT
Update, the fix seems to have made it into iOS 15.4 beta 4, released today (May 3). Thanks everyone :-)
Comment 35 Simon Taylor 2022-05-04 09:28:24 PDT
Confirming this looks fixed to me too in the 15.5 beta 4 build.

Just to note the mention of 15.4 beta 4 in the previous comment was likely a typo - 15.4 introduced the bug, 15.5 beta 4 seems to be when the fix made it into an iOS build, so hopefully it stays fixed until the 15.5 final release goes out!
Comment 36 Sam Sneddon [:gsnedders] 2022-05-16 15:40:08 PDT
Just to let everyone know, the fix for this has shipped in Safari 15.5.