Bug 230749 - REGRESSION (Safari 15): Poor WebGL performance on https://downloads.scirra.com/labs/particles
Summary: REGRESSION (Safari 15): Poor WebGL performance on https://downloads.scirra.co...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: ANGLE (show other bugs)
Version: Other
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Kimmo Kinnunen
URL:
Keywords: InRadar
Depends on:
Blocks: anglemetalregr
  Show dependency treegraph
 
Reported: 2021-09-24 05:52 PDT by Ashley Gullen
Modified: 2022-03-18 16:50 PDT (History)
15 users (show)

See Also:


Attachments
Patch (2.20 KB, patch)
2021-10-26 04:25 PDT, Kimmo Kinnunen
no flags Details | Formatted Diff | Diff
Patch (2.21 KB, patch)
2021-10-26 04:27 PDT, Kimmo Kinnunen
no flags Details | Formatted Diff | Diff
Patch for landing (2.22 KB, patch)
2021-10-27 06:26 PDT, Kimmo Kinnunen
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Ashley Gullen 2021-09-24 05:52:48 PDT
Safari 15 appears to have a significant performance regression rendering points in WebGL relative to Safari 14. Perhaps this is related to the new Metal backend.

Demo URL: https://downloads.scirra.com/labs/particles/

On an iPad Pro with Safari 15, this starts off very slow, around 12 FPS. There are only a few hundred point sprites being rendered, which should be well within the capability of the hardware. It seems to gradually warm up and end up rendering at either 30 FPS or 60 FPS. The CPU measurement (based on timing how long is spent in JS execution) looks very high.

Conversely, an iPhone 7 Plus with iOS 14.7.1, and a Pixel 3 running Chrome for Android, both run the same demo at a steady 60 FPS right from the start. Safari 15 clearly has unexpectedly bad performance in this case.

In our engine, the particles are drawn with gl.drawArrays(gl.POINTS, ...). The GLSL references both gl_PointSize and gl_PointCoord. We have not found any other performance issues with any of our other WebGL content. I speculate that this points to a performance regression specifically with point rendering in Safari 15's new Metal backend.
Comment 1 Ashley Gullen 2021-09-24 06:03:15 PDT
Hmm, actually forcing a normal quads rendering path does not seem to improve performance much. So perhaps this is not to do with gl.POINTS after all. I'll leave it up to a browser engineer to profile!
Comment 2 Ashley Gullen 2021-09-24 06:30:24 PDT
Here's the equivalent demo rendering with quads instead of points: https://downloads.scirra.com/labs/particles-quads/

For unrelated reasons, we already have a quad draw path for particles, so in this test I've just flipped the flag to use that. This will instead use lots of calls to gl.drawElements(gl.TRIANGLES, ...) with an index buffer. It's less efficient with draw calls than our points rendering path, but it still runs at 60 FPS on both the iPhone 7 Plus with iOS 14.7.1 and the Pixel 3, and is still very slow on the iPad Pro with Safari 15. In other words, the same performance characteristics as before.

We need both paths to be fast for Construct content, as we use both rendering techniques depending on the content.
Comment 3 Ashley Gullen 2021-09-24 06:33:17 PDT
One more thing, do let me know if you can identify a workaround - if possible I'm keen to update our engine to avoid the performance cliff for the time being (assuming it will be 6 months until the next Safari update rolls out).
Comment 4 Kimmo Kinnunen 2021-09-24 08:20:48 PDT
Thanks for the report! I tried recent build, and it seems to work 60fps.
If the demo uses index buffers, the root cause might have been bug 230107.

If you want to further report observations:
- Use Settings->General->Information to check which iPad you have, and report the model name.
- Check if "WebGL via Metal" == Off has an effect on the FPS. You find the setting from Settings -> Safari -> Advanced -> Experimental settings -> WebGL via Metal.
Remember to turn it back on.
Comment 5 Ashley Gullen 2021-09-24 10:12:54 PDT
The iPad Pro with Safari 15 that exhibits poor performance is an "iPad Pro (12.9-inch)", model number ML0H2ZP/A.

I tried turning off "WebGL via Metal" as advised, but it didn't seem to make much difference. Both samples still run very poorly. I'm not sure if the device needs a reboot in between or something?

I'd also point out the first sample uses dl.drawArrays(gl.POINTS) and does not make use of an index buffer.
Comment 6 Gregg Tavares 2021-09-24 17:28:49 PDT
> This will instead use lots of calls to gl.drawElements(gl.TRIANGLES, ...) with an index buffer. It's less efficient with draw calls than our points rendering path,

Not a bug fix but you can emulate points with instancing and no extra data (well you need a single unitQuad. You'll get the benefit that there's no limit to their size and their offscreen clipping will be consistent across devices, something you don't get with gl.POINTS

https://jsgist.org/?src=535da7687726e14dc26884e085de7629
Comment 7 Alexey Proskuryakov 2021-09-26 14:35:27 PDT
> The iPad Pro with Safari 15 that exhibits poor performance is an "iPad Pro (12.9-inch)", model number ML0H2ZP/A.

This is the original iPad Pro from 2015. I have this model with iOS 14.6 installed, and can confirm that it renders at 60 FPS according to what the webpage measures.

> If the demo uses index buffers, the root cause might have been bug 230107.

I don't think that I can easily get such a build to try out at this point.
Comment 8 Ashley Gullen 2021-09-27 07:59:51 PDT
@Gregg - is there any evidence instancing actually performs better in Safari 15? The performance pitfall also seems to apply to just quads, and since it's not clear why or when the performance pitfall happens, it seems it could also affect instancing.
Comment 9 Radar WebKit Bug Importer 2021-09-27 09:19:40 PDT
<rdar://problem/83576271>
Comment 10 Gregg Tavares 2021-09-27 16:23:57 PDT
Sorry, I didn't mean to imply that solution solves this particular perf issue. 

Instead I just meant to point out, if you're using points, as the demo linked above shows then 

(1) lots of your users are having those images disappear when their center hits the edge of the view and others are not. You really shouldn't be using points at all if you want things to run the same across devices. Checking my own devices I can see the particles are disappearing early on my AMD Mac but moving offscreen smoothly on my NVidia PC.

(2) emulating points is trivial, solves that issue, and only requires a single draw call, same as POINTS, so whatever overhead there is per draw call should disappear  ... separate from any perf bugs in WebKit, ANGLE, the driver.
Comment 11 Richard Davey 2021-09-27 16:29:52 PDT
We're seeing an inconsistent but significant performance decrease in games made with our library (Phaser 3). Games that run at 60fps solid under iOS14 now struggle to hit 20fps. The inconsistency is that it doesn't impact them all, so it can't be our default shaders. We also don't use GL_POINTS at all, so it's unrelated to that. We have simple quads and index buffers.

I will start trying to debug this tomorrow and produce a test case for you but just wanted to add this is a very real issue. It's also not device-specific as we've had reports from across pretty much all modern Apple hardware. It's almost as if something forces Safari into low-power mode, even though that is disabled in the General Settings.
Comment 12 Alexey Proskuryakov 2021-09-27 16:43:25 PDT
> Games that run at 60fps solid under iOS14 now struggle to hit 20fps. 

Please file a new bug about that, unless you are very certain that it's the same root cause as with https://downloads.scirra.com/labs/particles/
Comment 13 Ashley Gullen 2021-09-28 05:21:00 PDT
@Gregg - I tried hacking together code using instancing instead, and it's also still slow. So I don't think instancing has anything to do with the performance issue reported here.

I think the issue title might be misleading now - the test case appears to show a serious performance regression in Safari 15 with points rendering, indexed quads, and instancing.
Comment 14 Richard Davey 2021-09-28 06:20:39 PDT
(In reply to Alexey Proskuryakov from comment #12)
> > Games that run at 60fps solid under iOS14 now struggle to hit 20fps. 
> 
> Please file a new bug about that, unless you are very certain that it's the
> same root cause as with https://downloads.scirra.com/labs/particles/

If it turns out to be different, I will. However, as no one actually knows what causes this in this issue yet, it's too early to say. My guess is that's it's all related.
Comment 15 Ashley Gullen 2021-09-28 06:26:59 PDT
Edited issue title myself as it does not appear to be specific to gl.POINTS.
Comment 16 Kimmo Kinnunen 2021-09-29 05:44:21 PDT
Note: the original report is about bad perf on startup, e.g. slow frame rate during first 10 secs or so.
I can repro the slowness during startup on said iPad Pro model.
Posted the trace to the rdar.
Comment 17 tjkoury@gmail.com 2021-09-29 07:18:16 PDT
Seeing the same kind of performance hit. FPS are cut in half on all platforms (MacBook, iPhone, iPad):

 https://celestrak.com/cesium/orbit-viz.php?tle=/pub/TLE/catalog.txt&satcat=/pub/satcat.txt&referenceFrame=1
Comment 18 Alexey Proskuryakov 2021-09-29 10:08:47 PDT
>  Seeing the same kind of performance hit. FPS are cut in half on all platforms (MacBook, iPhone, iPad):

Please file a new bug report about this. Discussing all WebGL issues in one report makes it LESS likely that they will be fixed soon (because of inevitable confusion), not more likely.
Comment 19 Kimmo Kinnunen 2021-10-25 05:51:38 PDT
This is most likely due to the same root cause as bug 231607.
The test case appears to ramp up the amount of vertices drawn with drawArrays.
This causes the conversion attribute buffers to be re-allocated for each draw.

To work around the issue, you could try to store each vertex attribute in a separate buffer.
Comment 20 Ashley Gullen 2021-10-25 08:06:02 PDT
IIRC, in the provided demos, each vertex attribute already uses its own separate buffer. So I don't think that suggested workaround will help here.
Comment 21 Kimmo Kinnunen 2021-10-26 04:15:08 PDT
Thanks for the info.
Yes, I can see the root cause is not that.
Comment 22 Kimmo Kinnunen 2021-10-26 04:25:05 PDT
Created attachment 442478 [details]
Patch
Comment 23 EWS Watchlist 2021-10-26 04:26:26 PDT
Note that there are important steps to take when updating ANGLE. See https://trac.webkit.org/wiki/UpdatingANGLE
Comment 24 Kimmo Kinnunen 2021-10-26 04:27:37 PDT
Created attachment 442479 [details]
Patch
Comment 25 Kenneth Russell 2021-10-26 11:19:24 PDT
Comment on attachment 442479 [details]
Patch

Fantastic diagnosis Kimmo. Let's make sure this gets upstreamed. r+
Comment 26 Kenneth Russell 2021-10-26 11:35:05 PDT
Filed:

Metal: Upstream index buffer cache mapping performance improvement 
https://bugs.chromium.org/p/angleproject/issues/detail?id=6620

about upstreaming this patch.
Comment 27 Le Hoang Quyen 2021-10-26 18:59:30 PDT
(In reply to Kimmo Kinnunen from comment #24)
> Created attachment 442479 [details]
> Patch

Hi, I am interested in this finding.

Would it be more correct to change this condition:
```
if (!noSync && isCPUReadMemNeedSync())
```
to:
```
if (!noSync && (isCPUReadMemNeedSync() || !readonly))
```

?

The synchronization should be done if `noSync` is not requested AND
1. either the buffer's CPU memory needs synchronization due to ongoing GPU modifications.
2. or we are mapping within an intention to WRITE (even if the GPU is not modifying it).

* If we don't synchronize in the 2nd case then the buffer would be modified while its old content is being drawn by the GPU -> undefined results.

* `isCPUReadMemNeedSync()` flag is only used to indicate that the buffer is being modified by GPU and this modification needs to be made visible to CPU in order for it to read the update-to-date data.
Comment 28 Le Hoang Quyen 2021-10-26 19:05:20 PDT
btw, I didn't notice that ANGLE needs to update the index range every time `drawElements` is called with a new range.
Previously I mostly worked with "normal" mode OpenGL of ANGLE.
Apparently, ANGLE will always do index range check when it operates in "WebGL compatibility" mode.
Comment 29 Kimmo Kinnunen 2021-10-27 06:26:01 PDT
Created attachment 442584 [details]
Patch for landing
Comment 30 Kimmo Kinnunen 2021-10-27 06:27:28 PDT
(In reply to Le Hoang Quyen from comment #27)
> (In reply to Kimmo Kinnunen from comment #24)
> > Created attachment 442479 [details]
> > Patch
> 
> Hi, I am interested in this finding.
> 
> Would it be more correct to change this condition:
> ```
> if (!noSync && isCPUReadMemNeedSync())
> ```
> to:
> ```
> if (!noSync && (isCPUReadMemNeedSync() || !readonly))
> ```
> 

Yes, exactly! Thanks for review. (I couldn't put you as the reviewer due to tooling, sorry).

Taking the liberty to land with Ken's approval and above amends.
Comment 31 EWS 2021-10-27 10:43:45 PDT
Committed r284927 (243598@main): <https://commits.webkit.org/243598@main>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 442584 [details].
Comment 32 osoker008 2021-11-23 04:16:52 PST
How long time does is take for this BugFixed Update in user's iOS Safari 15?
Comment 33 Ashley Gullen 2022-01-03 06:35:39 PST
This issue is marked fixed but it still reproduces in Safari 15.2. Is it really fixed? Should I file a new issue again?
Comment 34 Alexey Proskuryakov 2022-01-03 08:38:27 PST
This fix is not yet included in any Safari release or beta (other than Safari Technology Preview for macOS). We can not comment about future releases.
Comment 35 Ashley Gullen 2022-01-03 10:14:44 PST
I know this is probably out of your hands, but it is a pretty developer-hostile policy. It leaves us having to manually test every release that comes out to find out for ourselves as nobody at Apple is allowed to tell us anything about when anything will actually ship.