Bug 231031 - REGRESSION (Safari 15): WebGL Video Texture Performance Regression - Looks GPU-process related
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebGL
Version: Safari 15
Hardware: All All
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords: InRadar
Depends on: 227586 231354 231359 231424 231425
Blocks:
Reported: 2021-09-30 11:09 PDT by Simon Taylor
Modified: 2022-02-28 23:33 PST
CC List: 9 users

See Also:


Attachments
Screen recording of apparent rAF throttling (18.12 MB, video/quicktime)
2021-12-14 04:52 PST, Simon Taylor
Screenshot of System Trace from Instruments (852.73 KB, image/png)
2022-01-05 08:24 PST, Simon Taylor

Description Simon Taylor 2021-09-30 11:09:58 PDT
The default "Experimental Settings" in Safari 15 (both macOS, Version 15.0 (16612.1.29.41.4, 16612), and iOS 15.0) appear to have "GPU Process: Media" enabled and "GPU Process: WebGL" disabled.

In this configuration, texture upload performance from video elements appears to have regressed significantly.

A previous texImage2D performance regression (bug 216250) turned out to be caused by changes that made the software (SW) fallback upload path be used instead of the hardware (HW) one.

My suspicion is that the HW GPU upload path isn't used with this default mixed configuration of GPU Process settings.

The test pages from bug 216250 show the issue:
https://tango-bravo.net/WebGLVideoPerformance/texImage2d_video.html (uses gl.RGB format and internal format)
https://tango-bravo.net/WebGLVideoPerformance/texImage2d_video_rgba.html (uses gl.RGBA format and internal format)
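The two test pages are described as differing in the format and internal format passed to the upload. A minimal sketch of such a per-frame upload, assuming a WebGL 1 context `gl`, a playing `video` element and an already-created texture (the helper name `uploadVideoFrame` is hypothetical, not taken from the test pages' source):

```javascript
// Uploads the current frame of `video` into `texture`.
// useRGBA selects gl.RGBA for both internal format and format; otherwise gl.RGB.
function uploadVideoFrame(gl, texture, video, useRGBA) {
  const format = useRGBA ? gl.RGBA : gl.RGB;
  gl.bindTexture(gl.TEXTURE_2D, texture);
  // Full-image upload using the TexImageSource overload of texImage2D.
  gl.texImage2D(gl.TEXTURE_2D, 0, format, format, gl.UNSIGNED_BYTE, video);
}
```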

Also of note is bug 203148, which seems to be specific to the HW upload path and so can serve as a way to tell whether or not the HW path is being used.

Before getting to some sample numbers, a couple of points:
1) Benchmarking mobile devices is hard, as they downclock aggressively to save battery (especially when content can comfortably render at 60 fps). Anecdotally I've noticed the timings seem generally lowest and most stable on iOS when opening the app switcher (the Safari content still renders in the app switcher) - that's where I've taken the numbers from. I also had the device plugged in.
2) I've navigated between the pages a bit and these differences appear consistent.
3) It seems a device reboot is a good idea when toggling any of the experimental settings.

Some numbers:

iPod touch 7th gen, iOS 15

 GPU Process: Media | GPU Process: WebGL | RGB Test Upload Time | RGBA Test Upload Time | Bug 203148 Reproducible
 On (Default)       | Off (Default)      | 6-7 ms               | 4-5 ms                | No
 Off                | Off                | 4 ms                 | 2-3 ms                | Yes
 On                 | On                 | 4 ms                 | 2-3 ms                | Yes


Intel MacBook Pro (2017, i5)

Same basic pattern - in the default mixed mode bug 203148 doesn't reproduce, and uploads are slower - typically around 2ms, which drops to around 0.5ms when toggling GPU Process: Media off in Experimental settings in the Develop menu.

(ps: Safari 15 needs adding to the Version field)
Comment 1 Radar WebKit Bug Importer 2021-09-30 12:08:33 PDT
<rdar://problem/83731769>
Comment 2 Kimmo Kinnunen 2021-10-07 03:21:30 PDT
Thanks for the report and good benchmark.

If you have time, please update the title and comments with the following details:
You mention that the performance is a regression. Please state which characteristic regressed, and compared to which release. For macOS reports, it's important to include both the Safari version and the OS version.


To simplify the reporting, please use only the default flags. Non-default flags are not intended for use.

For the benchmark:
WebGL calls complete asynchronously, so you cannot benchmark a call using this pattern:
 var t = performance.now();
 gl.texImage2D(...);
 var dt = performance.now() - t;
This is especially true in the GPU process case, where the JS call may return before the GPU Process has even started running the command.

If you want to see how long the upload took, you can use either of the following:
a) Call gl.finish();
b) Increase the amount of work done in rAF callback until rAF starts missing 60fps mark.
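Option a) could look roughly like the sketch below. It is a hypothetical helper (`timeUpload` and its injectable `now` clock are illustrative, not from any test page) that reports both the JS-side cost of the call and the time until gl.finish() returns. Because gl.finish() blocks until previously issued commands complete, it stalls the pipeline and is only suitable for measurement.

```javascript
// Times `upload` (e.g. () => gl.texImage2D(..., video)) including GPU completion.
// `now` defaults to performance.now() and is injectable for testing.
function timeUpload(gl, upload, now = () => performance.now()) {
  const start = now();
  upload();
  const afterCall = now(); // JS-side cost of the call itself
  gl.finish();             // wait for the queued GL work to complete
  const end = now();
  return { callMs: afterCall - start, totalMs: end - start };
}
```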

Option b is preferred. In this case, the test would increase the number of uploads (to different textures) per frame until the rAF callback is no longer called at ~16ms intervals. Care should be taken not to hit the optimisation or the bug discussed in bug 203148. One strategy would be to upload two different videos on alternate steps.
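Option b) can be sketched as a small ramp controller driven by rAF timestamps. All names and thresholds here (a 4 ms slack over the ~16.7 ms frame budget, 30 on-time frames per step) are illustrative assumptions. Uploads strictly alternate between two videos, following the strategy above, and each upload in a frame targets a different texture.

```javascript
// Raises the number of uploads per frame until rAF callbacks stop arriving
// at ~60 fps. Thresholds are illustrative assumptions.
class UploadRamp {
  constructor({ budgetMs = 1000 / 60, slackMs = 4, framesPerStep = 30 } = {}) {
    this.limitMs = budgetMs + slackMs;  // an interval above this counts as a missed frame
    this.framesPerStep = framesPerStep; // on-time frames required before adding an upload
    this.uploads = 1;                   // uploads to perform in the current frame
    this.goodFrames = 0;
    this.lastTimestamp = null;
    this.result = null;                 // highest count that held the frame rate
  }

  // Takes the rAF timestamp; returns the upload count for this frame,
  // or 0 once a missed frame has ended the ramp.
  onFrame(timestamp) {
    if (this.result !== null) return 0;
    if (this.lastTimestamp !== null) {
      if (timestamp - this.lastTimestamp > this.limitMs) {
        // Treat the current count as too many and report one fewer.
        this.result = this.uploads - 1;
        return 0;
      }
      this.goodFrames += 1;
      if (this.goodFrames >= this.framesPerStep) {
        this.uploads += 1;
        this.goodFrames = 0;
      }
    }
    this.lastTimestamp = timestamp;
    return this.uploads;
  }
}

// Performs `count` uploads into different textures, alternating between the
// two videos by a running upload index (intended to avoid the optimisation/bug
// discussed in bug 203148). Returns the next upload index.
function doUploads(gl, textures, videos, count, uploadIndex) {
  for (let i = 0; i < count; i++) {
    const video = videos[(uploadIndex + i) % 2];
    gl.bindTexture(gl.TEXTURE_2D, textures[i % textures.length]);
    gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, video);
  }
  return uploadIndex + count;
}

// Drives the ramp from rAF; `raf` is injectable for testing outside a browser.
function runRamp(gl, textures, videos, onDone, raf = (cb) => requestAnimationFrame(cb)) {
  const ramp = new UploadRamp();
  let uploadIndex = 0;
  function tick(timestamp) {
    const count = ramp.onFrame(timestamp);
    if (count === 0) {
      onDone(ramp.result);
      return;
    }
    uploadIndex = doUploads(gl, textures, videos, count, uploadIndex);
    raf(tick);
  }
  raf(tick);
}
```

In a page, something like `runRamp(gl, textures, [videoA, videoB], (n) => console.log(n))` (with hypothetical `videoA`/`videoB` elements) would log the highest per-frame upload count that kept up. A single janky frame ends the ramp early, so repeated runs are advisable.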

To compare against the SW code path:
Currently only full texture uploads go through the GPU path, so doing a texSubImage2D with width w-1 will force the SW code path.
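A sketch of that SW-path comparison, assuming a WebGL 2 context (the WebGL 1 texSubImage2D overload for video sources takes no width/height, so it cannot express the w-1 upload). The texture is allocated once with texImage2D and a null source, then all but the last column of the video frame is uploaded. `uploadPartialFrame` is a hypothetical name.

```javascript
// Assumes a WebGL 2 context. Allocates storage when `allocate` is true, then
// uploads a (width - 1) x height sub-rectangle of the current video frame.
function uploadPartialFrame(gl, texture, video, allocate) {
  const w = video.videoWidth;
  const h = video.videoHeight;
  gl.bindTexture(gl.TEXTURE_2D, texture);
  if (allocate) {
    // Storage only: a null source uploads no pixel data.
    gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, w, h, 0, gl.RGBA, gl.UNSIGNED_BYTE, null);
  }
  // width - 1 makes this a partial upload rather than a full-frame one.
  gl.texSubImage2D(gl.TEXTURE_2D, 0, 0, 0, w - 1, h, gl.RGBA, gl.UNSIGNED_BYTE, video);
}
```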


For RGB vs RGBA difference:
There are certain quirks that make RGB a bit slower than RGBA, especially on older hardware that uses the OpenGL and OpenGL ES backends. This difference is known and has lower priority. If you notice the same behaviour on newer hardware or newer macOS, that is higher priority.
Comment 3 Kimmo Kinnunen 2021-10-07 04:45:02 PDT
The iOS regression can be attributed to bug 231354.

iPad Pro (A9, 2015)

iOS 14.8 (OpenGL ES backend)
-RGB ~5ms
-RGBA ~2ms

iOS 15.1 (Metal backend)
-RGB 6ms
-RGBA 6ms

(Curiously, the OpenGL ES backend on 15.1 has also regressed, but here we should focus on fixing the Metal backend.)
Comment 4 Simon Taylor 2021-10-07 07:46:36 PDT
Thanks Kimmo,

The regression on iOS is with respect to the test cases in bug 216250: my baseline was some iOS 13 numbers, which gave around 2ms for the texImage2D call.

I take your point about timing around the texImage2D call not necessarily capturing the full GPU workload. However, as the main JS thread is the only one that can make WebGL calls, and the game loop logic also needs to run within 16ms to maintain 60 FPS, the time individual WebGL functions take on the main JS thread is also critical to performance. [Aside: on Chrome, for example, there were some glGet calls that involved a GPU process round trip with a horrible performance penalty.]

I usually try out Bug 203148 on each iOS update, and noticed it was no longer reproducible on iOS 15. I then also noticed the upload timings looked higher than I remembered with previous iOS versions. A short comment in Bug 216250 wondering whether iOS 15 changes had potentially regressed video upload performance received a response requesting I add a new bug, hence this one.

Performance with default settings is all I personally care about, but I noticed those seemingly new GPU Process settings that sounded potentially relevant and they do appear to have an impact on the timings. It sounds from your latest comments that it’s more likely a Metal backend change rather than the GPU Process stuff though.

Default iOS 15 settings have "GPU Process: Media" turned on, and "GPU Process: WebGL" turned off. I wondered whether the HW GPU upload path was still expected to be useable in that case? I guess there is some XPC overhead still involved, so perhaps that overhead explains the difference between the timings and that's just the price to pay for the improved security (?) of moving Media decoding to a separate process.
Comment 5 Dustin Kerstein 2021-10-23 05:23:56 PDT
See here for another simple replication test case - https://jsfiddle.net/qz9ka6xm (using RGBA textures from a very simple cycling color video).

On stock Safari 15, the latest Safari 15.1 beta and Safari Technology Preview, on my 2018 MacBook Pro i9 I consistently see around 18-20fps in the test above, regardless of whether GPU Process: WebGL and/or WebGL via Metal are enabled.

However, on a 13-inch M1 MacBook, enabling WebGL Metal on Safari 15 seems to fix the issue entirely, with fps >50. When WebGL Metal is disabled (i.e. the default setting), I see 4fps, which breaks many WebGL-dependent websites.

Has there been any work / investigation into these serious performance regressions?
Comment 6 Dustin Kerstein 2021-10-23 05:32:18 PDT
It might also be worth looking into these Feedback Assistant bugs, as some of these issues may be related to core VideoToolbox issues:

FB9688897
FB9666426
FB9554184
Comment 7 Simon Taylor 2021-12-14 04:50:03 PST
No noticeable improvement in iOS 15.2 on the iPod touch 7 that I generally use for testing.

There's also some weirdness I have noticed where the page doesn't always call rAF at 60FPS, even though frame timings remain ~5ms and Timelines in web inspector suggest there are sufficient resources - just looks like the callback rate is being throttled.

This may not be new in 15.2 (I've seen oddly throttled simple WebGL content in earlier iOS 15 releases too), but it seems readily reproducible here: navigating to one of the demo pages by tapping the links in this bug report seems to load it in some sort of throttled mode, which survives refresh, back/forward navigation, etc. After killing and re-opening Safari, we're back up to 60FPS. It's almost as if each navigation ends up with a different rAF callback rate. I'll attach a screen recording.
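One way to quantify the throttling described above is to record the timestamps passed to rAF callbacks and compute the effective callback rate. A minimal sketch with hypothetical helper names; the median interval is used so a single long frame does not dominate the estimate.

```javascript
// Estimates the effective rAF callback rate (fps) from the timestamps passed
// to successive callbacks, using the median frame interval.
function estimateRafFps(timestamps) {
  if (timestamps.length < 2) return NaN;
  const intervals = [];
  for (let i = 1; i < timestamps.length; i++) {
    intervals.push(timestamps[i] - timestamps[i - 1]);
  }
  intervals.sort((a, b) => a - b);
  const mid = Math.floor(intervals.length / 2);
  const median = intervals.length % 2 === 1
    ? intervals[mid]
    : (intervals[mid - 1] + intervals[mid]) / 2;
  return 1000 / median;
}

// Records `frames` rAF timestamps, then reports the estimated rate.
// `raf` is injectable so this can be exercised outside a browser.
function sampleRafRate(frames, onResult, raf = (cb) => requestAnimationFrame(cb)) {
  const stamps = [];
  function tick(timestamp) {
    stamps.push(timestamp);
    if (stamps.length < frames) raf(tick);
    else onResult(estimateRafFps(stamps));
  }
  raf(tick);
}
```

Comparing this rate against the per-frame work time (around 5ms in the report) helps separate a throttled callback rate from frames that are genuinely over budget.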

Is this just the new normal with GPU process and Metal backend or is there any point hoping for improvements? iOS 15 has definitely made life much harder for those of us trying to ship smooth interactive WebGL experiences for mobile.
Comment 8 Simon Taylor 2021-12-14 04:52:23 PST
Created attachment 447124
Screen recording of apparent rAF throttling

Screen recording showing some navigations seem to get throttled rAF callback rates. When not screen recording the throttled rates seem a little lower - 40-45 FPS, but even here it's apparent we're not getting the solid 60 FPS I'd expect for this content until Safari is restarted.
Comment 9 Kimmo Kinnunen 2021-12-14 10:46:16 PST
(In reply to Simon Taylor from comment #7)
> No noticeable improvement in iOS 15.2 on the iPod touch 7 that I generally
> use for testing.

Thanks for the report. I'll check it out.

> There's also some weirdness I have noticed where the page doesn't always
> call rAF at 60FPS, even though frame timings remain ~5ms and Timelines in
> web inspector suggest there are sufficient resources - just looks like the
> callback rate is being throttled.

Filed this as bug 234303
Comment 10 Simon Taylor 2022-01-05 08:24:06 PST
Created attachment 448389
Screenshot of System Trace from Instruments

To correct my earlier comment - it does look like the optimization to skip much of the work if the frame hasn't changed is now in place in iOS 15.2, so thanks for that.

I've attached an Instruments screenshot marked up with the texImage2D calls in two frames. The second call involves significantly less work, but both still block the main thread on IPC with the GPU Process. That's probably unavoidable with media decode running in the GPU Process and WebGL in the content process.

There are a couple of other thread-blocking calls back in the content process (such as the [_MTLCommandBuffer waitUntilScheduled] shown in the screenshot; another of those is responsible for the earlier block too).

Sharing in the hope there might be some quick wins to be had (or a way to move some of this work out of the JS texImage2D call). Sounds like better performance would come if/when WebGL is also moved into the GPU Process, so feel free to close this if there's nothing else that can be done to improve things with the split-process model.

As an aside, I also tried createImageBitmap(video) in the hope that it would kick off the conversion and resolve the promise with what is effectively a handle to the converted RGBA texture, but it seems to resolve quickly and most of the actual work still happens in the following texImage2D(imageBitmap) call.
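The createImageBitmap experiment above can be instrumented as in this sketch, which separates the time for the createImageBitmap(video) promise to resolve from the time spent in the following texImage2D(imageBitmap) call. The names are hypothetical, `now` and `createBitmap` are injected only so the function can run outside a browser, and the upload figure covers only the JS-side call (per comment 2, the GPU work may complete later).

```javascript
// Splits the cost of createImageBitmap(video) from the subsequent upload.
async function timeBitmapUpload(gl, texture, video, {
  now = () => performance.now(),
  createBitmap = (src) => createImageBitmap(src),
} = {}) {
  const t0 = now();
  const bitmap = await createBitmap(video);
  const t1 = now();
  gl.bindTexture(gl.TEXTURE_2D, texture);
  gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, bitmap);
  const t2 = now();
  // Release the bitmap's resources once it has been uploaded.
  if (typeof bitmap.close === "function") bitmap.close();
  return { bitmapMs: t1 - t0, uploadMs: t2 - t1 };
}
```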
Comment 11 Simon Taylor 2022-02-28 02:31:32 PST
I've closed this as "Resolved Fixed" - in iOS 15.4 beta 4 I think things are working within the range of performance seen in iOS 14.
Comment 12 Kimmo Kinnunen 2022-02-28 23:33:14 PST
(In reply to Simon Taylor from comment #11)
> I've closed this as "Resolved Fixed" - in iOS 15.4 beta 4 I think things are
> working within the range of performance seen in iOS 14.

Thanks for confirming and the investigation!