Summary: [GTK] Webpages completely slow down when CSS blur filter is in use
Product: WebKit
Component: WebKitGTK
Version: WebKit Nightly Build
Hardware: PC
OS: Linux
Status: NEW
Severity: Normal
Priority: P3
Keywords: Gtk
Reporter: tri.voxel
Assignee: Nobody <webkit-unassigned>
CC: adam.gamble, bugs-noreply, cgarcia, Hironori.Fujii, kdwkleung, magomez, mcatanzaro, nekohayo, zdobersek
Bug Blocks: 245783
Attachments: Test case (attachment 462701)
Description
tri.voxel 2022-05-24 04:23:00 PDT
I should also note that after the element has been loaded for a very long time, it may begin to run smoothly again. However, resizing the element or the window will make the page slow down again.

Pretty sure somebody else reported this on a different bug, though I can't immediately find it. At least you're not alone...

Carlos Garcia Campos (comment #3): It seems the blur filter done by the CPU is slow. We might try to ensure it's always done by the GPU, but unfortunately that doesn't seem to work properly. Mac uses the Accelerate library, which I guess makes it a lot faster. I'll submit a test case.

Carlos Garcia Campos (comment #4): Created attachment 462701 [details]
Test case
There are two files in the tarball:
- test-cpu.html: it has several images with a blur filter. It's slow; it's easier to reproduce by resizing the window than by just scrolling.
- test-gpu: it's the same, but adding a translateZ transform to force the images to be accelerated. Performance is much better, but there are two issues:
  + Rendering is not the same as the software version; it looks pixelated.
  + Images disappear when scrolling and aren't rendered anymore, even if the window is resized.
The pixelated issue is tracked by bug #231653.

I've been looking for a while at how the blur filter works with OpenGL, but I need to switch to another task, so I'll do a brain dump here of the things I found out, to catch up when I have more time.

The blur filter is applied as a weighted average of the values of the pixels around the target pixel. The weights are calculated from a Gaussian distribution, so a weight gets smaller as its pixel gets further from the target pixel. The filter radius specifies how many pixels to the left/right/above/below are taken into account for the calculation. Instead of accessing all the pixel values needed for each target pixel, this is commonly implemented as a 2-pass rendering, where the first pass only takes the horizontal pixels into account and the second pass only the vertical ones. The combination of the two passes produces the same result with fewer pixel accesses, which improves performance. This is what the TextureMapper implementation does.

Then, regarding the specific implementation details:

- TextureMapperGL is the one creating the Gaussian values and passing them to the TextureMapperShaderProgram in prepareFilterProgram.

- In theory, we should be sampling as many pixels in each direction as the radius, but we're not doing so. We are always sampling 10 pixels in each direction (defined in GaussianKernelHalfWidth, which is 11, but it includes the target pixel, so it really means 10). This means that we sample 10 pixels, the original pixel, and then another 10 pixels, and this happens in both passes, horizontally and vertically.

- If the radius is somewhat close to those 10 pixels, the result is going to be quite similar to the cairo implementation. But as the radius gets bigger and bigger, the result diverges and the pixelated effect starts to show. This happens because, as the radius grows, the 10 pixels that we sample get more and more separated from each other, so we end up sampling pixels that are not close to the original pixel, and whose contents are unrelated to it.

Example to show the effect: a 20x20 image, getting the result for pixel (10,10) by sampling 3 pixels (one to the left and one to the right, instead of the ten that we have in the real implementation). The Gaussian weights are (0.25, 0.5, 0.25). Talking only about the horizontal pass, but the vertical one has the same problem:

- If the radius is 1, we're sampling pixels (9,10), (10,10) and (11,10) with the assigned weights, which is perfect, as we're averaging the values of nearby pixels.

- If the radius grows to 5, then we're sampling pixels (5,10), (10,10) and (15,10). But those pixels are not close to the target pixel, so their content is not related to it, and they will corrupt the result. Keep in mind that in the normal case the second biggest weight is used for the pixels right beside the target one, while here it's applied to a pixel far from it.

The test attached by Carlos uses a radius of 60px. This means we're sampling 10 pixels to each side, one every (60 pixels / 10 samples) 6 pixels, which shows the strange pixelated effect. As the radius gets reduced, the result gets closer and closer to the cairo one, because the sampled pixels get closer and closer to the target.

There's also a detail in the implementation that I think is a bug: the definition of GaussianKernelStep as 0.2. This constant defines the advance of each sample, and the sample positions are calculated as i * step * radius (i = 1..10). If we're using 10 samples in each direction it should be 0.1; with 0.2 we're actually sampling up to double the radius, which makes the result even worse (see the sketch below).
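To make the sampling math concrete, here is a minimal standalone sketch (my own illustration, not WebKit code; the constant name and values are the ones quoted in this comment) that prints where the 10 taps per side land for the 60px radius used in the attached test, with the buggy 0.2 step and the proposed 0.1 step:

```cpp
#include <cstdio>
#include <initializer_list>

// Name and value borrowed from the TextureMapper code as described above:
// 11 kernel entries, i.e. the target pixel plus 10 samples per side.
static const int GaussianKernelHalfWidth = 11;

// Sample position of tap i (i = 1..10), as described: i * step * radius.
static float samplePosition(int i, float step, float radius)
{
    return i * step * radius;
}

int main()
{
    const float radius = 60; // the radius used by the attached test case

    for (float step : { 0.2f, 0.1f }) { // buggy GaussianKernelStep vs proposed fix
        std::printf("step = %.1f:\n", step);
        for (int i = 1; i < GaussianKernelHalfWidth; ++i)
            std::printf("  tap %2d at %5.1f px from the target pixel\n",
                        i, samplePosition(i, step, radius));
        // With step 0.2 the taps land every 12px and the last one at 120px,
        // double the radius; with step 0.1 they land every 6px, the last at 60px.
    }
    return 0;
}
```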
So, after all this mess, what's the proper fix?

- First would be changing that GaussianKernelStep from 0.2 to 0.1. That keeps the result correct for bigger radii.

- Eventually the radius will get big enough to cause the buggy rendering anyway. There are several options here (see the sketch after this list for the last one):

  * Increase the number of samples to match the radius. This would always produce the perfect result, but the computational cost grows a lot with the radius.

  * Increase the number of samples from the current 10 to something bigger, like 30. This would keep the result fine for bigger radius values, but there would still be a point where the result gets buggy.

  * Some implementations use a trick here. Taking advantage of the interpolation done by OpenGL when copying textures to different sizes, they reduce the size of the image to filter, and the radius, until the radius matches the number of pixels being sampled. So if the image is 100x100 and we want to use a radius of 40, that's the same as sampling a 50x50 image with a radius of 20, and the same as a 25x25 image with a radius of 10, which we can do with our current 10 samples! I think this is the way to go, even though it requires some not-so-obvious changes to TextureMapperLayer to perform the downscaling.
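A minimal sketch of the downscaling trick from the last bullet, using a hypothetical planBlur() helper (this is not TextureMapperLayer code, just an illustration of the idea): halve the image dimensions and the radius together until the radius fits the 10 taps per side we already sample, and let the GPU's interpolation handle the scaling.

```cpp
#include <cstdio>

// Hypothetical plan for the proposed downscaling approach.
struct BlurPlan {
    int width;
    int height;
    float radius;
    int downscalePasses; // how many times to halve the image before blurring
};

// maxSampledRadius matches the 10 samples per side discussed above.
static BlurPlan planBlur(int width, int height, float radius, float maxSampledRadius = 10)
{
    BlurPlan plan { width, height, radius, 0 };
    // Halving the image and the radius together keeps the visual result roughly
    // the same, as in the 100x100/40 -> 50x50/20 -> 25x25/10 example above.
    while (plan.radius > maxSampledRadius && plan.width > 1 && plan.height > 1) {
        plan.width /= 2;
        plan.height /= 2;
        plan.radius /= 2;
        plan.downscalePasses++;
    }
    return plan;
}

int main()
{
    BlurPlan plan = planBlur(100, 100, 40);
    std::printf("blur at %dx%d with radius %.0f after %d downscale passes\n",
                plan.width, plan.height, plan.radius, plan.downscalePasses);
    // Prints: blur at 25x25 with radius 10 after 2 downscale passes
    return 0;
}
```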
> - In theory, we should be sampling as many pixels in each direction as the
> radius, but we're not doing so. We are always sampling 10 pixels in each
> direction (defined in GaussianKernelHalfWidth, which is 11, but it includes
> the target pixel, so it really means 10). This means that we sample 10
> pixels, the original pixel, and then another 10 pixels, and this happens in
> both passes, horizontally and vertically.
I mean 10 pixels to each side of the target pixel. So 10 to the left and 10 to the right.
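For reference, here is a small CPU sketch of the whole two-pass scheme with that fixed 10-taps-per-side kernel, operating on a grayscale float image. This is my own illustration of the approach described in this thread, not the TextureMapper implementation; in particular, sigma = radius / 3 is an assumption.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

static const int kHalfTaps = 10; // 10 samples per side, plus the target pixel

// Gaussian weights for taps 0..10; sigma = radius / 3 is an assumption.
static std::vector<float> gaussianWeights(float radius)
{
    const float sigma = radius / 3.0f;
    std::vector<float> weights(kHalfTaps + 1);
    float sum = 0;
    for (int i = 0; i <= kHalfTaps; ++i) {
        float offset = i * 0.1f * radius; // tap position with the fixed 0.1 step
        weights[i] = std::exp(-(offset * offset) / (2 * sigma * sigma));
        sum += (i == 0) ? weights[i] : 2 * weights[i]; // side taps are used twice
    }
    for (float& w : weights)
        w /= sum; // normalize so the weights sum to 1
    return weights;
}

// One 1D pass; (dx, dy) = (1, 0) for the horizontal pass, (0, 1) for the vertical one.
static std::vector<float> blurPass(const std::vector<float>& src, int w, int h,
                                   float radius, int dx, int dy)
{
    std::vector<float> weights = gaussianWeights(radius);
    std::vector<float> dst(src.size());
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            float acc = weights[0] * src[y * w + x];
            for (int i = 1; i <= kHalfTaps; ++i) {
                int offset = static_cast<int>(std::lround(i * 0.1f * radius));
                // Sample one tap on each side, clamping to the image edges.
                int xa = std::clamp(x + dx * offset, 0, w - 1);
                int ya = std::clamp(y + dy * offset, 0, h - 1);
                int xb = std::clamp(x - dx * offset, 0, w - 1);
                int yb = std::clamp(y - dy * offset, 0, h - 1);
                acc += weights[i] * (src[ya * w + xa] + src[yb * w + xb]);
            }
            dst[y * w + x] = acc;
        }
    }
    return dst;
}

static std::vector<float> blur(const std::vector<float>& src, int w, int h, float radius)
{
    // Horizontal pass followed by vertical pass, as in the TextureMapper approach.
    return blurPass(blurPass(src, w, h, radius, 1, 0), w, h, radius, 0, 1);
}

int main()
{
    // Smoke test mirroring the 20x20 example above: one bright pixel at (10,10).
    const int w = 20, h = 20;
    std::vector<float> image(w * h, 0.0f);
    image[10 * w + 10] = 1.0f;
    std::vector<float> blurred = blur(image, w, h, 5.0f);
    std::printf("center value after blur: %f\n", blurred[10 * w + 10]);
    return 0;
}
```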
Pixelated issue seems to be solved by a PR. Would it be possible to use the GPU for blur now?

Yes, bug #231653 is fixed. But you have to add the CSS property will-change: transform or transform: translateZ(0) to an element for it to be composited with the GPU.

(In reply to Carlos Garcia Campos from comment #3)
> It seems the blur filter done by the CPU is slow. We might try to ensure
> it's always done by the GPU, but unfortunately that doesn't seem to work
> properly. Mac uses the Accelerate library, which I guess makes it a lot
> faster. I'll submit a test case.

I was referring to this comment. It would be great if the GPU could handle all blurs, so as to take the load off the CPU entirely and greatly improve performance.

I see a number of related fixes in bug #261022, bug #261101, bug #261187, and bug #261102. These are all present in 2.41.92. Please check and see if the problem is resolved.

(In reply to Carlos Garcia Campos from comment #4)
> Created attachment 462701 [details]
> Test case
>
> There are two files in the tarball:
>
> - test-cpu.html: it has several images with a blur filter. It's slow; it's
> easier to reproduce by resizing the window than by just scrolling.
>
> - test-gpu: it's the same, but adding a translateZ transform to force the
> images to be accelerated. Performance is much better, but there are two
> issues:
> + Rendering is not the same as the software version; it looks pixelated.
> + Images disappear when scrolling and aren't rendered anymore, even if
> the window is resized.

I tried these test files with 2.41.92 (but not 2.41.91, so I don't know how bad it was before). I didn't notice any performance problems with either test. But with test-gpu, I noticed that rendering breaks after I scroll down and then scroll back up: the web content becomes all white. Should I report a new bug for this?

This should be fixed with Skia.