RESOLVED FIXED Bug 230749
REGRESSION (Safari 15): Poor WebGL performance on https://downloads.scirra.com/labs/particles
https://bugs.webkit.org/show_bug.cgi?id=230749
Summary REGRESSION (Safari 15): Poor WebGL performance on https://downloads.scirra.co...
Ashley Gullen
Reported 2021-09-24 05:52:48 PDT
Safari 15 appears to have a significant performance regression rendering points in WebGL relative to Safari 14. Perhaps this is related to the new Metal backend. Demo URL: https://downloads.scirra.com/labs/particles/ On an iPad Pro with Safari 15, this starts off very slow, around 12 FPS. There are only a few hundred point sprites being rendered, which should be well within the capability of the hardware. It seems to gradually warm up and end up rendering at either 30 FPS or 60 FPS. The CPU measurement (based on timing how long is spent in JS execution) looks very high. Conversely, an iPhone 7 Plus with iOS 14.7.1, and a Pixel 3 running Chrome for Android, both run the same demo at a steady 60 FPS right from the start. Safari 15 clearly has unexpectedly bad performance in this case. In our engine, the particles are drawn with gl.drawArrays(gl.POINTS, ...). The GLSL references both gl_PointSize and gl_PointCoord. We have not found any other performance issues with any of our other WebGL content. I speculate that this points to a performance regression specifically with point rendering in Safari 15's new Metal backend.
Attachments
Patch (2.20 KB, patch)
2021-10-26 04:25 PDT, Kimmo Kinnunen
no flags
Patch (2.21 KB, patch)
2021-10-26 04:27 PDT, Kimmo Kinnunen
no flags
Patch for landing (2.22 KB, patch)
2021-10-27 06:26 PDT, Kimmo Kinnunen
no flags
Ashley Gullen
Comment 1 2021-09-24 06:03:15 PDT
Hmm, actually forcing a normal quads rendering path does not seem to improve performance much. So perhaps this is not to do with gl.POINTS after all. I'll leave it up to a browser engineer to profile!
Ashley Gullen
Comment 2 2021-09-24 06:30:24 PDT
Here's the equivalent demo rendering with quads instead of points: https://downloads.scirra.com/labs/particles-quads/ For unrelated reasons, we already have a quad draw path for particles, so in this test I've just flipped the flag to use that. This will instead use lots of calls to gl.drawElements(gl.TRIANGLES, ...) with an index buffer. It's less efficient with draw calls than our points rendering path, but it still runs at 60 FPS on both the iPhone 7 Plus with iOS 14.7.1 and the Pixel 3, and is still very slow on the iPad Pro with Safari 15. In other words, the same performance characteristics as before. We need both paths to be fast for Construct content, as we use both rendering techniques depending on the content.
Ashley Gullen
Comment 3 2021-09-24 06:33:17 PDT
One more thing, do let me know if you can identify a workaround - if possible I'm keen to update our engine to avoid the performance cliff for the time being (assuming it will be 6 months until the next Safari update rolls out).
Kimmo Kinnunen
Comment 4 2021-09-24 08:20:48 PDT
Thanks for the report! I tried recent build, and it seems to work 60fps. If the demo uses index buffers, the root cause might have been bug 230107. If you want to further report observations: - Use Settings->General->Information to check which iPad you have, and report the model name. - Check if "WebGL via Metal" == Off has an effect on the FPS. You find the setting from Settings -> Safari -> Advanced -> Experimental settings -> WebGL via Metal. Remember to turn it back on.
Ashley Gullen
Comment 5 2021-09-24 10:12:54 PDT
The iPad Pro with Safari 15 that exhibits poor performance is an "iPad Pro (12.9-inch)", model number ML0H2ZP/A. I tried turning off "WebGL via Metal" as advised, but it didn't seem to make much difference. Both samples still run very poorly. I'm not sure if the device needs a reboot in between or something? I'd also point out the first sample uses dl.drawArrays(gl.POINTS) and does not make use of an index buffer.
Gregg Tavares
Comment 6 2021-09-24 17:28:49 PDT
> This will instead use lots of calls to gl.drawElements(gl.TRIANGLES, ...) with an index buffer. It's less efficient with draw calls than our points rendering path, Not a bug fix but you can emulate points with instancing and no extra data (well you need a single unitQuad. You'll get the benefit that there's no limit to their size and their offscreen clipping will be consistent across devices, something you don't get with gl.POINTS https://jsgist.org/?src=535da7687726e14dc26884e085de7629
Alexey Proskuryakov
Comment 7 2021-09-26 14:35:27 PDT
> The iPad Pro with Safari 15 that exhibits poor performance is an "iPad Pro (12.9-inch)", model number ML0H2ZP/A. This is the original iPad Pro from 2015. I have this model with iOS 14.6 installed, and can confirm that it renders at 60 FPS according to what the webpage measures. > If the demo uses index buffers, the root cause might have been bug 230107. I don't think that I can easily get such a build to try out at this point.
Ashley Gullen
Comment 8 2021-09-27 07:59:51 PDT
@Gregg - is there any evidence instancing actually performs better in Safari 15? The performance pitfall also seems to apply to just quads, and since it's not clear why or when the performance pitfall happens, it seems it could also affect instancing.
Radar WebKit Bug Importer
Comment 9 2021-09-27 09:19:40 PDT
Gregg Tavares
Comment 10 2021-09-27 16:23:57 PDT
Sorry, I didn't mean to imply that solution solves this particular perf issue. Instead I just meant to point out, if you're using points, as the demo linked above shows then (1) lots of your users are having those images disappear when their center hits the edge of the view and others are not. You really shouldn't be using points at all if you want things to run the same across devices. Checking my own devices I can see the particles are disappearing early on my AMD Mac but moving offscreen smoothly on my NVidia PC. (2) emulating points is trivial, solves that issue, and only requires a single draw call, same as POINTS, so whatever overhead there is per draw call should disappear ... separate from any perf bugs in WebKit, ANGLE, the driver.
Richard Davey
Comment 11 2021-09-27 16:29:52 PDT
We're seeing an inconsistent but significant performance decrease in games made with our library (Phaser 3). Games that run at 60fps solid under iOS14 now struggle to hit 20fps. The inconsistency is that it doesn't impact them all, so it can't be our default shaders. We also don't use GL_POINTS at all, so it's unrelated to that. We have simple quads and index buffers. I will start trying to debug this tomorrow and produce a test case for you but just wanted to add this is a very real issue. It's also not device-specific as we've had reports from across pretty much all modern Apple hardware. It's almost as if something forces Safari into low-power mode, even though that is disabled in the General Settings.
Alexey Proskuryakov
Comment 12 2021-09-27 16:43:25 PDT
> Games that run at 60fps solid under iOS14 now struggle to hit 20fps. Please file a new bug about that, unless you are very certain that it's the same root cause as with https://downloads.scirra.com/labs/particles/
Ashley Gullen
Comment 13 2021-09-28 05:21:00 PDT
@Gregg - I tried hacking together code using instancing instead, and it's also still slow. So I don't think instancing has anything to do with the performance issue reported here. I think the issue title might be misleading now - the test case appears to show a serious performance regression in Safari 15 with points rendering, indexed quads, and instancing.
Richard Davey
Comment 14 2021-09-28 06:20:39 PDT
(In reply to Alexey Proskuryakov from comment #12) > > Games that run at 60fps solid under iOS14 now struggle to hit 20fps. > > Please file a new bug about that, unless you are very certain that it's the > same root cause as with https://downloads.scirra.com/labs/particles/ If it turns out to be different, I will. However, as no one actually knows what causes this in this issue yet, it's too early to say. My guess is that's it's all related.
Ashley Gullen
Comment 15 2021-09-28 06:26:59 PDT
Edited issue title myself as it does not appear to be specific to gl.POINTS.
Kimmo Kinnunen
Comment 16 2021-09-29 05:44:21 PDT
Note: the original report is about bad perf on startup, e.g. slow frame rate during first 10 secs or so. I can repro the slowness during startup on said iPad Pro model. Posted the trace to the rdar.
tjkoury@gmail.com
Comment 17 2021-09-29 07:18:16 PDT
Seeing the same kind of performance hit. FPS are cut in half on all platforms (MacBook, iPhone, iPad): https://celestrak.com/cesium/orbit-viz.php?tle=/pub/TLE/catalog.txt&satcat=/pub/satcat.txt&referenceFrame=1
Alexey Proskuryakov
Comment 18 2021-09-29 10:08:47 PDT
> Seeing the same kind of performance hit. FPS are cut in half on all platforms (MacBook, iPhone, iPad): Please file a new bug report about this. Discussing all WebGL issues in one report makes it LESS likely that they will be fixed soon (because of inevitable confusion), not more likely.
Kimmo Kinnunen
Comment 19 2021-10-25 05:51:38 PDT
This is most likely due to the same root cause as bug 231607. The test case appears to ramp up the amount of vertices drawn with drawArrays. This causes the conversion attribute buffers to be re-allocated for each draw. To work around the issue, you could try to store each vertex attribute in a separate buffer.
Ashley Gullen
Comment 20 2021-10-25 08:06:02 PDT
IIRC, in the provided demos, each vertex attribute already uses its own separate buffer. So I don't think that suggested workaround will help here.
Kimmo Kinnunen
Comment 21 2021-10-26 04:15:08 PDT
Thanks for the info. Yes, I can see the root cause is not that.
Kimmo Kinnunen
Comment 22 2021-10-26 04:25:05 PDT
EWS Watchlist
Comment 23 2021-10-26 04:26:26 PDT
Note that there are important steps to take when updating ANGLE. See https://trac.webkit.org/wiki/UpdatingANGLE
Kimmo Kinnunen
Comment 24 2021-10-26 04:27:37 PDT
Kenneth Russell
Comment 25 2021-10-26 11:19:24 PDT
Comment on attachment 442479 [details] Patch Fantastic diagnosis Kimmo. Let's make sure this gets upstreamed. r+
Kenneth Russell
Comment 26 2021-10-26 11:35:05 PDT
Filed: Metal: Upstream index buffer cache mapping performance improvement https://bugs.chromium.org/p/angleproject/issues/detail?id=6620 about upstreaming this patch.
Le Hoang Quyen
Comment 27 2021-10-26 18:59:30 PDT
(In reply to Kimmo Kinnunen from comment #24) > Created attachment 442479 [details] > Patch Hi, I am interested in this finding. Would it be more correct to change this condition: ``` if (!noSync && isCPUReadMemNeedSync()) ``` to: ``` if (!noSync && (isCPUReadMemNeedSync() || !readonly)) ``` ? The synchronization should be done if `noSync` is not requested AND 1. either the buffer's CPU memory needs synchronization due to ongoing GPU modifications. 2. or we are mapping within an intention to WRITE (even if the GPU is not modifying it). * If we don't synchronize in the 2nd case then the buffer would be modified while its old content is being drawn by the GPU -> undefined results. * `isCPUReadMemNeedSync()` flag is only used to indicate that the buffer is being modified by GPU and this modification needs to be made visible to CPU in order for it to read the update-to-date data.
Le Hoang Quyen
Comment 28 2021-10-26 19:05:20 PDT
btw, I didn't notice that ANGLE needs to update the index range every time `drawElements` is called with a new range. Previously I mostly worked with "normal" mode OpenGL of ANGLE. Apparently, ANGLE will always do index range check when it operates in "WebGL compatibility" mode.
Kimmo Kinnunen
Comment 29 2021-10-27 06:26:01 PDT
Created attachment 442584 [details] Patch for landing
Kimmo Kinnunen
Comment 30 2021-10-27 06:27:28 PDT
(In reply to Le Hoang Quyen from comment #27) > (In reply to Kimmo Kinnunen from comment #24) > > Created attachment 442479 [details] > > Patch > > Hi, I am interested in this finding. > > Would it be more correct to change this condition: > ``` > if (!noSync && isCPUReadMemNeedSync()) > ``` > to: > ``` > if (!noSync && (isCPUReadMemNeedSync() || !readonly)) > ``` > Yes, exactly! Thanks for review. (I couldn't put you as the reviewer due to tooling, sorry). Taking the liberty to land with Ken's approval and above amends.
EWS
Comment 31 2021-10-27 10:43:45 PDT
Committed r284927 (243598@main): <https://commits.webkit.org/243598@main> All reviewed patches have been landed. Closing bug and clearing flags on attachment 442584 [details].
osoker008
Comment 32 2021-11-23 04:16:52 PST
How long time does is take for this BugFixed Update in user's iOS Safari 15?
Ashley Gullen
Comment 33 2022-01-03 06:35:39 PST
This issue is marked fixed but it still reproduces in Safari 15.2. Is it really fixed? Should I file a new issue again?
Alexey Proskuryakov
Comment 34 2022-01-03 08:38:27 PST
This fix is not yet included in any Safari release or beta (other than Safari Technology Preview for macOS). We can not comment about future releases.
Ashley Gullen
Comment 35 2022-01-03 10:14:44 PST
I know this is probably out of your hands, but it is a pretty developer-hostile policy. It leaves us having to manually test every release that comes out to find out for ourselves as nobody at Apple is allowed to tell us anything about when anything will actually ship.
Note You need to log in before you can comment on or make changes to this bug.