NEW 283820
REGRESSION(286605@main): [WPE] Massive performance regression on postercircle
https://bugs.webkit.org/show_bug.cgi?id=283820
Miguel Gomez
Reported 2024-11-29 04:47:54 PST
The change to implement the new preserve3D rendering has caused a massive performance regression on postercircle on embedded devices. On the rpi3, the framerate has dropped from 60fps to less than 10fps when running at 720p. I've measured that the compositions are taking more than 100ms. I'm aware that the calculations done for the new rendering are expensive, but we need to optimize them somehow. Postercircle is a very well-known test for 3D rendering and we cannot afford to show such low performance on embedded devices.
Jani Hautakangas
Comment 1 2024-11-29 07:19:47 PST
I don't think it's the calculations causing this; they are not that expensive. There's something else going wrong. I'll start looking into it.
Adrian Perez
Comment 2 2024-11-29 07:39:46 PST
For the record, I had backported 286605@main to the webkitglib/2.46 release branch before the .4 release; I have reverted it in the release branch for now.
Jani Hautakangas
Comment 3 2024-11-30 03:02:50 PST
I'm able to reproduce this on rpi3. The problem is less noticeable in desktop environments due to their better processing power.

The issue isn't with the calculations or even the new preserve3d logic, but rather with how TextureMapper renders intermediate surfaces. Currently, TextureMapper attempts to minimize the size of intermediate surfaces by considering clip regions and only rendering overlapping parts necessary for the surface. This logic dates back to 2013 (see https://bugs.webkit.org/show_bug.cgi?id=110762). However, with modern hardware and use cases, this optimization may now cause more harm than good in terms of performance.

TextureMapper doesn’t cache intermediate surfaces during a single paint call. Instead, it renders regions on demand, often causing the same content to be rendered multiple times into slightly different-sized intermediate surfaces. This behavior is particularly problematic for translucent layers, such as the poster circle, which require intermediate surfaces. When these layers are clipped, multiple small intermediate surfaces are created, leading to reduced performance. This issue is not limited to the poster circle; it also affects other scenarios where layers requiring intermediate surfaces are rendered multiple times, such as backdrops and replicas.

The solution is to pre-render the full layer’s intermediate surface at the start of the paint call. This pre-rendered surface can be reused multiple times within the same paint call for rendering masked splits from that layer and released at the end of the paint call. This logic is already implemented for flattened layer surfaces and can be extended to more general use cases. In fact, this approach was discussed during the implementation of the new preserve3d logic, but I haven’t yet had the opportunity to implement it.

I'll start working on this.
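The caching idea described in this comment could look roughly like the sketch below. It is only an illustration, not TextureMapper code: `Surface`, `LayerID` and `PaintCallSurfaceCache` are hypothetical names, and the actual rendering is stood in for by a callback. The full-layer surface is rendered on first request, reused for every later request in the same paint call, and dropped when the paint call ends.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <unordered_map>

// Hypothetical stand-in for an offscreen render target; not a WebKit type.
struct Surface {
    int width;
    int height;
};

// Hypothetical layer identifier.
using LayerID = std::uint64_t;

// Keeps one full-size intermediate surface per layer for the duration of a
// single paint call, so clipped or split pieces of the same layer can share it.
class PaintCallSurfaceCache {
public:
    // Returns the cached surface for the layer, invoking renderFullLayer
    // only the first time the layer is requested.
    std::shared_ptr<Surface> surfaceFor(LayerID layer,
        const std::function<std::shared_ptr<Surface>()>& renderFullLayer)
    {
        auto it = m_surfaces.find(layer);
        if (it != m_surfaces.end())
            return it->second;

        auto surface = renderFullLayer();
        assert(surface);
        m_surfaces.emplace(layer, surface);
        ++m_renderCount;
        return surface;
    }

    // Call at the end of the paint call to release every cached surface.
    void clear() { m_surfaces.clear(); }

    int renderCount() const { return m_renderCount; }
    std::size_t size() const { return m_surfaces.size(); }

private:
    std::unordered_map<LayerID, std::shared_ptr<Surface>> m_surfaces;
    int m_renderCount { 0 };
};
```

With a cache like this, each masked split of a layer asks for the same surface instead of re-rendering a slightly different-sized one, at the cost of holding a full-size surface per layer until `clear()` is called.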
Fujii Hironori
Comment 4 2024-11-30 04:51:00 PST
Sounds like a good idea. BTW, can we skip stencil clipping if there are no intersecting 3D layers?
Nikolas Zimmermann
Comment 5 2024-12-02 02:32:50 PST
Great Jani! That sounds like a plan. Thanks for the clarification.
Jani Hautakangas
Comment 6 2024-12-02 03:09:32 PST
(In reply to Fujii Hironori from comment #4)
> Sounds a good idea.
> BTW, can we skip stencil clipping if no intersecting 3D layers?

Yes, in that case clipping can be skipped.
Jani Hautakangas
Comment 7 2024-12-02 03:30:52 PST
I identified several additional issues that need attention:

1. Unnecessary Splits: BSP calculation includes layers that have a size but produce no visual output, resulting in unnecessary splits. This should be straightforward to fix.

2. Stencil VBO Caching: Stencil VBOs are not cached for polygon clipping. In dynamic scenes, the 3D splitting and clipping often vary between frames, leading to frequent uploads of polygon clip vertices using glBufferData and glBufferSubData. On rpi3, these calls are notably slow. Even after addressing unnecessary splits and caching intermediate surfaces, rendering performance remains bad. Stencil logic for polygon clipping requires improvement.

3. Avoiding Stencil Use: In many cases, stencils could be avoided by rendering only specific portions of a layer/texture. Implementing this would require updated shaders and new antialiasing logic to allow for selective antialiased edges.

4. Improved Preprocessing in TextureMapper: Currently, TextureMapper processes the layer tree by rendering each layer on demand, which is often suboptimal for the OpenGL pipeline. Enhanced preprocessing and batching of rendering calls would provide significant performance benefits.

I'll create separate tasks for each of these issues.

Additionally, I was also thinking that it might be worth exploring potential adoption of Skia in TextureMapper. Skia's built-in batching, clipping, antialiasing, and effects handling could simplify many of the above challenges. Also, with its backends, a smooth transition to Vulkan is possible if desired.
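As an illustration of item 1, the filtering step could be sketched as below. This is a simplified assumption of what "produces no visual output" might mean, not the actual patch: `LayerInfo`, its fields, `producesVisualOutput` and `layersForBSP` are all hypothetical names introduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical, simplified layer description; the fields are assumptions
// made for this sketch and do not mirror TextureMapperLayer.
struct LayerInfo {
    int id;
    float width;
    float height;
    float opacity;
    bool drawsContent; // true if the layer paints something of its own
};

// A layer only needs to take part in BSP splitting if it can end up
// contributing pixels to the final image.
bool producesVisualOutput(const LayerInfo& layer)
{
    return layer.drawsContent
        && layer.opacity > 0.0f
        && layer.width > 0.0f
        && layer.height > 0.0f;
}

// Returns only the layers worth inserting into the BSP tree, preserving order.
std::vector<LayerInfo> layersForBSP(const std::vector<LayerInfo>& layers)
{
    std::vector<LayerInfo> result;
    std::copy_if(layers.begin(), layers.end(), std::back_inserter(result),
        producesVisualOutput);
    return result;
}
```

Dropping such layers before the tree is built means their planes can no longer split visible polygons, which in turn reduces the number of clip polygons uploaded per frame.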
Fujii Hironori
Comment 8 2024-12-02 12:14:59 PST
How about adopting Chromium's CC? https://chromium.googlesource.com/chromium/src.git/+/refs/heads/main/cc/

Coltor Chen was doing that. https://webkit.slack.com/archives/CU64U6FDW/p1699517524700329?thread_ts=1639455227.471100&cid=CU64U6FDW

The Windows and Playstation ports are still using cairo; they have not finished the Skia migration yet. If you plan to use Skia in TextureMapper, please keep the old implementation by using #if USE(SKIA) or by copying code. I will remove the legacy code after the Windows and Playstation ports finish the Skia migration.
Jani Hautakangas
Comment 9 2024-12-02 22:45:31 PST
Adopting Chromium's CC is an interesting idea. There were quite a few different ideas in that discussion thread. I wonder what the state of those is nowadays. But I think adopting Chromium's cc is a huge amount of work and brings in a lot of code.

Regarding Skia in TextureMapper, it was more of a conceptual idea rather than a concrete plan. I was thinking of bringing back the TextureMapper interface. The OpenGL implementation used to be in TextureMapperGL, and a Skia implementation would go into TextureMapperSkia. However, at the moment, I don’t have the time or resources to implement a Skia-based approach.
Nikolas Zimmermann
Comment 10 2024-12-05 08:09:03 PST
The focus should really be on the poster circle performance regression, @fuji. I’m afraid we will otherwise have to roll out all the patches, since we cannot afford that kind of regression on low-end embedded devices, even though it’s clearly more correct now.

Jani, of all your optimization ideas, the top one would be the pre-rendering, and you are trying this right now? Did I understand this correctly? I’m just trying to get a sense of how involved it is…
Jani Hautakangas
Comment 11 2024-12-05 08:30:28 PST
I'm focusing on fixing the regression and I have patches coming. The first patch is already under review (refer to bullet 1 in the above comments and Bug 284026). I plan to submit another patch soon to address bullet 2. Together, these patches should effectively resolve the most critical issues and are likely sufficient to close this bug. Given that the root causes are identified and fixable, I don't think a revert plan is necessary at this point.
Jani Hautakangas
Comment 12 2024-12-05 08:35:07 PST
See Bug 284027 for bullet 2
Jani Hautakangas
Comment 13 2024-12-09 00:34:40 PST
@magomez, could you check how the landed patches perform in your environment? I understand that your RPi3 setup is quite optimized. On my end, the RPi3 setup struggles to maintain 60fps with the landed patches and eventually stabilizes around 40fps. I'm currently working on Bug 284250, which improves the situation to some extent but doesn’t fully resolve it. The critical missing piece seems to be Bug 284279, which I’ve been prototyping. It immediately achieves 60fps on my setup. However, its implementation is a bit complex as it impacts almost all rendering paths. To address this, I plan to push it as a series of small, incremental patches to ensure it covers all scenarios smoothly, starting with the poster-circle scenario. In the meantime, it would be helpful to know if the already landed patches are sufficient to close this regression issue.
Jani Hautakangas
Comment 14 2024-12-17 02:12:16 PST
The new approach in Bug 284250 fixes the regression, and poster-circle now runs at a steady 60fps on RPi3.
Nikolas Zimmermann
Comment 15 2024-12-18 03:33:17 PST
Great news @jani! Thanks a lot for continuing to work on it and pushing towards a fix!