[Chromium] Avoid clear and blending for root layers
Created attachment 115299 [details] Patch
Created attachment 115300 [details] Patch
This seems like a lot of shaders. Did you consider controlling the alpha behavior with a uniform instead of a new shader? For non-composited content we don't need any opaque AA variants, since we don't draw the non-composited content tiles with the AA shaders. I'm not sure AA with a 1.0 alpha value makes too much sense anyway.
View in context: https://bugs.webkit.org/attachment.cgi?id=115300&action=review > Source/WebCore/platform/graphics/chromium/LayerRendererChromium.cpp:394 > m_context->clear(GraphicsContext3D::COLOR_BUFFER_BIT); Note: depending on the gpu architecture, doing the clear may actually be a performance improvement. For example, on a tiled architecture, clearing usually permits the driver to avoid loading in the previous frame's back buffer, saving bandwidth. But obviously on others, it has the cost of writing the back buffer with pixels that will be overwritten later. Not sure which way to go, maybe we'll want to make that an option at some point.
(In reply to comment #3) > This seems like a lot of shaders. Did you consider controlling the alpha behavior with a uniform instead of a new shader? It's a tradeoff. Adding a uniform and dynamically controlling this behavior will slow down all shaders. It's not even clear that it'd be a win over the current version. Same goes with the layer opacity. I bet no more than a couple of shaders are used for 90% of the pixels we draw, and I think it's worth optimizing those as much as possible. Is your concern the code complexity of replicating all these shaders? I think we could have good solutions there (e.g. combine shaders out of building blocks). > > For non-composited content we don't need any opaque AA variants, since we don't draw the non-composited content tiles with the AA shaders. I'm not sure AA with a 1.0 alpha value makes too much sense anyway.
(In reply to comment #3) > This seems like a lot of shaders. Did you consider controlling the alpha behavior with a uniform instead of a new shader? With the opaque shaders, we're just trying to avoid having to turn off alpha writes for the root layer, right? And, we only need that because we have one platform that trashes the alpha channel for text. And, that one platform doesn't really have this performance penalty. So, why don't we just only turn off alpha writes for that one platform? CCTiledLayerImpl::draw is getting ridiculous.
(In reply to comment #4) > View in context: https://bugs.webkit.org/attachment.cgi?id=115300&action=review > > > Source/WebCore/platform/graphics/chromium/LayerRendererChromium.cpp:394 > > m_context->clear(GraphicsContext3D::COLOR_BUFFER_BIT); > > Note: depending on the gpu architecture, doing the clear may actually be a performance improvement. For example, on a tiled architecture, clearing usually permits the driver to avoid loading in the previous frame's back buffer, saving bandwidth. But obviously on others, it has the cost of writing the back buffer with pixels that will be overwritten later. Not sure which way to go, maybe we'll want to make that an option at some point. Wait, turning off blending for the root layer and turning off masking out of the alpha channel means that it does not have to read anything from the back buffer, right? And that is the purpose of my patch, i.e. instead of writing the source alpha values, just write out 1.0, so it should always be a perf win.
(In reply to comment #5) > (In reply to comment #3) > > This seems like a lot of shaders. Did you consider controlling the alpha behavior with a uniform instead of a new shader? > > It's a tradeoff. Adding a uniform and dynamically controlling this behavior will slow down all shaders. It's not even clear that it'd be a win over the current version. > Same goes with the layer opacity. > I bet no more than a couple of shaders are used for 90% of the pixels we draw, and I think it's worth optimizing those as much as possible. > Although if we only need this for the very simple shaders (and not AA), we are probably not limited by the pixel operations there, so I could probably just add a uniform to be added to the alpha channel (and set it to 0 or 1 respectively). Alternatively, I could simply reduce the current patch to only add the new shader variants for simlpe and swizzled.
(In reply to comment #7) > (In reply to comment #4) > > View in context: https://bugs.webkit.org/attachment.cgi?id=115300&action=review > > > > > Source/WebCore/platform/graphics/chromium/LayerRendererChromium.cpp:394 > > > m_context->clear(GraphicsContext3D::COLOR_BUFFER_BIT); > > > > Note: depending on the gpu architecture, doing the clear may actually be a performance improvement. For example, on a tiled architecture, clearing usually permits the driver to avoid loading in the previous frame's back buffer, saving bandwidth. But obviously on others, it has the cost of writing the back buffer with pixels that will be overwritten later. Not sure which way to go, maybe we'll want to make that an option at some point. > > Wait, turning off blending for the root layer and turning off masking out of the alpha channel means that it does not have to read anything from the back buffer, right? And that is the purpose of my patch, i.e. instead of writing the source alpha values, just write out 1.0, so it should always be a perf win. On a tiled architecture, the GPU has a piece of embedded memory / cache that is smaller than the frame buffer. It collects all screen-space draw operations (clear, triangles post transform etc.), map them to a set of tiles that fit into the cache, then goes for all the tiles: 1- read frame buffer from memory to cache 2- do all draw operations scissored to the tile 3- write frame buffer from cache to memory If you clear first, it can skip (1) because it's easy to know it will erase all the data. Otherwise, it would have to go through all the triangles and check whether or not they fully cover the tile which would be expensive, so it generally doesn't do that.
(In reply to comment #8) > (In reply to comment #5) > > (In reply to comment #3) > > > This seems like a lot of shaders. Did you consider controlling the alpha behavior with a uniform instead of a new shader? > > > > It's a tradeoff. Adding a uniform and dynamically controlling this behavior will slow down all shaders. It's not even clear that it'd be a win over the current version. > > Same goes with the layer opacity. > > I bet no more than a couple of shaders are used for 90% of the pixels we draw, and I think it's worth optimizing those as much as possible. > > > > > Although if we only need this for the very simple shaders (and not AA), we are probably not limited by the pixel operations there, so I could probably just add a uniform to be added to the alpha channel (and set it to 0 or 1 respectively). We most likely are limited by pixel operations. It depends on the architecture, but it's not clear that a single texture load could fully hide the ALUs for both the opacity multiply and the alpha squelching. A benchmark would tell us... > > Alternatively, I could simply reduce the current patch to only add the new shader variants for simlpe and swizzled.
> On a tiled architecture, the GPU has a piece of embedded memory / cache that is smaller than the frame buffer. It collects all screen-space draw operations (clear, triangles post transform etc.), map them to a set of tiles that fit into the cache, then goes for all the tiles: > 1- read frame buffer from memory to cache > 2- do all draw operations scissored to the tile > 3- write frame buffer from cache to memory > > If you clear first, it can skip (1) because it's easy to know it will erase all the data. Otherwise, it would have to go through all the triangles and check whether or not they fully cover the tile which would be expensive, so it generally doesn't do that. Although I do believe you :) I feel like we should rather make the more naive assumption that clearing before rendering over the same area is potentially slower (and remove the clear as part of this patch). Ideally, this should be left to the driver or hardware to try to optimize these cases, as I find it counter-intuitive at the OpenGL API level.
(In reply to comment #10) > (In reply to comment #8) > > (In reply to comment #5) > > > (In reply to comment #3) > > > > This seems like a lot of shaders. Did you consider controlling the alpha behavior with a uniform instead of a new shader? > > > > > > It's a tradeoff. Adding a uniform and dynamically controlling this behavior will slow down all shaders. It's not even clear that it'd be a win over the current version. > > > Same goes with the layer opacity. > > > I bet no more than a couple of shaders are used for 90% of the pixels we draw, and I think it's worth optimizing those as much as possible. > > > > > > > > > Although if we only need this for the very simple shaders (and not AA), we are probably not limited by the pixel operations there, so I could probably just add a uniform to be added to the alpha channel (and set it to 0 or 1 respectively). > > We most likely are limited by pixel operations. It depends on the architecture, but it's not clear that a single texture load could fully hide the ALUs for both the opacity multiply and the alpha squelching. > > A benchmark would tell us... > Maybe go for the below then: > > > > Alternatively, I could simply reduce the current patch to only add the new shader variants for simlpe and swizzled.
Created attachment 115453 [details] Patch
New patch. Would there be any assert() needed for AA?
Comment on attachment 115453 [details] Patch One principle I've tried to follow for programs is to eagerly initialize any programs that are likely needed on every page and to lazily initialize other programs. Eagerly initialized programs induce a load hit on every compositor initialization, but it's a lower overall cost than the cost to lazily initialize a program. The reason is that lazily initialized programs are created during drawLayers() so querying the uniform locations has to wait for all the uploads on that frame to complete. Currently the non-AA render surface program and non-AA tiler programs are eagerly initialized and all others are lazy. For programs that we want to use on root tiles we definitely want to eagerly initialize.
Comment on attachment 115453 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=115453&action=review Daniel pointed out offline that he is eagerly initializing the non-swizzled program and lazily initializing the swizzled one, which matches what we currently do for other programs. Could you please move removing the compositor blue clear to a separate patch? We've had difficulties with that bit in particular and it's not inherently related to the other changes. > Source/WebCore/platform/graphics/chromium/LayerRendererChromium.cpp:397 > +#if !defined ( NDEBUG ) we tend to do #ifndef NDEBUG > Source/WebCore/platform/graphics/chromium/ShaderChromium.h:185 > + String getShaderString() const; nit: this shouldn't have a "get" prefix, in WebKit style this function should be called 'shaderString()'. It looks like everything else in this file uses the get prefix as well though.
Created attachment 115465 [details] Patch
Comment on attachment 115465 [details] Patch R=me. We can tune doing a glClear() separately, IMO
Comment on attachment 115465 [details] Patch Clearing flags on attachment: 115465 Committed r100536: <http://trac.webkit.org/changeset/100536>
All reviewed patches have been landed. Closing bug.