72452 – [Chromium] Avoid color mask operations for root layers

RESOLVED FIXED72452

[Chromium] Avoid color mask operations for root layers

https://bugs.webkit.org/show_bug.cgi?id=72452

Summary [Chromium] Avoid color mask operations for root layers

Daniel Sievers

Reported 2011-11-15 19:25:51 PST

[Chromium] Avoid clear and blending for root layers

Attachments
Patch (25.51 KB, patch) 2011-11-15 19:30 PST, Daniel Sievers	no flags	Details Formatted Diff Diff
Patch (25.52 KB, patch) 2011-11-15 19:33 PST, Daniel Sievers	no flags	Details Formatted Diff Diff
Patch (16.54 KB, patch) 2011-11-16 14:43 PST, Daniel Sievers	no flags	Details Formatted Diff Diff
Patch (15.87 KB, patch) 2011-11-16 15:51 PST, Daniel Sievers	no flags	Details Formatted Diff Diff
Show Obsolete (3) View All Add attachment proposed patch, testcase, etc.

Daniel Sievers

Comment 1 2011-11-15 19:30:23 PST

Created attachment 115299 [details] Patch

Daniel Sievers

Comment 2 2011-11-15 19:33:21 PST

Created attachment 115300 [details] Patch

James Robinson

Comment 3 2011-11-15 20:58:46 PST

This seems like a lot of shaders. Did you consider controlling the alpha behavior with a uniform instead of a new shader? For non-composited content we don't need any opaque AA variants, since we don't draw the non-composited content tiles with the AA shaders. I'm not sure AA with a 1.0 alpha value makes too much sense anyway.

Antoine Labour

Comment 4 2011-11-16 10:15:47 PST

View in context: https://bugs.webkit.org/attachment.cgi?id=115300&action=review > Source/WebCore/platform/graphics/chromium/LayerRendererChromium.cpp:394 > m_context->clear(GraphicsContext3D::COLOR_BUFFER_BIT); Note: depending on the gpu architecture, doing the clear may actually be a performance improvement. For example, on a tiled architecture, clearing usually permits the driver to avoid loading in the previous frame's back buffer, saving bandwidth. But obviously on others, it has the cost of writing the back buffer with pixels that will be overwritten later. Not sure which way to go, maybe we'll want to make that an option at some point.

Antoine Labour

Comment 5 2011-11-16 10:26:33 PST

(In reply to comment #3) > This seems like a lot of shaders. Did you consider controlling the alpha behavior with a uniform instead of a new shader? It's a tradeoff. Adding a uniform and dynamically controlling this behavior will slow down all shaders. It's not even clear that it'd be a win over the current version. Same goes with the layer opacity. I bet no more than a couple of shaders are used for 90% of the pixels we draw, and I think it's worth optimizing those as much as possible. Is your concern the code complexity of replicating all these shaders? I think we could have good solutions there (e.g. combine shaders out of building blocks). > > For non-composited content we don't need any opaque AA variants, since we don't draw the non-composited content tiles with the AA shaders. I'm not sure AA with a 1.0 alpha value makes too much sense anyway.

Adrienne Walker

Comment 6 2011-11-16 11:25:53 PST

(In reply to comment #3) > This seems like a lot of shaders. Did you consider controlling the alpha behavior with a uniform instead of a new shader? With the opaque shaders, we're just trying to avoid having to turn off alpha writes for the root layer, right? And, we only need that because we have one platform that trashes the alpha channel for text. And, that one platform doesn't really have this performance penalty. So, why don't we just only turn off alpha writes for that one platform? CCTiledLayerImpl::draw is getting ridiculous.

Daniel Sievers

Comment 7 2011-11-16 11:48:52 PST

(In reply to comment #4) > View in context: https://bugs.webkit.org/attachment.cgi?id=115300&action=review > > > Source/WebCore/platform/graphics/chromium/LayerRendererChromium.cpp:394 > > m_context->clear(GraphicsContext3D::COLOR_BUFFER_BIT); > > Note: depending on the gpu architecture, doing the clear may actually be a performance improvement. For example, on a tiled architecture, clearing usually permits the driver to avoid loading in the previous frame's back buffer, saving bandwidth. But obviously on others, it has the cost of writing the back buffer with pixels that will be overwritten later. Not sure which way to go, maybe we'll want to make that an option at some point. Wait, turning off blending for the root layer and turning off masking out of the alpha channel means that it does not have to read anything from the back buffer, right? And that is the purpose of my patch, i.e. instead of writing the source alpha values, just write out 1.0, so it should always be a perf win.

Daniel Sievers

Comment 8 2011-11-16 11:51:56 PST

(In reply to comment #5) > (In reply to comment #3) > > This seems like a lot of shaders. Did you consider controlling the alpha behavior with a uniform instead of a new shader? > > It's a tradeoff. Adding a uniform and dynamically controlling this behavior will slow down all shaders. It's not even clear that it'd be a win over the current version. > Same goes with the layer opacity. > I bet no more than a couple of shaders are used for 90% of the pixels we draw, and I think it's worth optimizing those as much as possible. > Although if we only need this for the very simple shaders (and not AA), we are probably not limited by the pixel operations there, so I could probably just add a uniform to be added to the alpha channel (and set it to 0 or 1 respectively). Alternatively, I could simply reduce the current patch to only add the new shader variants for simlpe and swizzled.

Antoine Labour

Comment 9 2011-11-16 13:06:54 PST

(In reply to comment #7) > (In reply to comment #4) > > View in context: https://bugs.webkit.org/attachment.cgi?id=115300&action=review > > > > > Source/WebCore/platform/graphics/chromium/LayerRendererChromium.cpp:394 > > > m_context->clear(GraphicsContext3D::COLOR_BUFFER_BIT); > > > > Note: depending on the gpu architecture, doing the clear may actually be a performance improvement. For example, on a tiled architecture, clearing usually permits the driver to avoid loading in the previous frame's back buffer, saving bandwidth. But obviously on others, it has the cost of writing the back buffer with pixels that will be overwritten later. Not sure which way to go, maybe we'll want to make that an option at some point. > > Wait, turning off blending for the root layer and turning off masking out of the alpha channel means that it does not have to read anything from the back buffer, right? And that is the purpose of my patch, i.e. instead of writing the source alpha values, just write out 1.0, so it should always be a perf win. On a tiled architecture, the GPU has a piece of embedded memory / cache that is smaller than the frame buffer. It collects all screen-space draw operations (clear, triangles post transform etc.), map them to a set of tiles that fit into the cache, then goes for all the tiles: 1- read frame buffer from memory to cache 2- do all draw operations scissored to the tile 3- write frame buffer from cache to memory If you clear first, it can skip (1) because it's easy to know it will erase all the data. Otherwise, it would have to go through all the triangles and check whether or not they fully cover the tile which would be expensive, so it generally doesn't do that.

Antoine Labour

Comment 10 2011-11-16 13:10:25 PST

(In reply to comment #8) > (In reply to comment #5) > > (In reply to comment #3) > > > This seems like a lot of shaders. Did you consider controlling the alpha behavior with a uniform instead of a new shader? > > > > It's a tradeoff. Adding a uniform and dynamically controlling this behavior will slow down all shaders. It's not even clear that it'd be a win over the current version. > > Same goes with the layer opacity. > > I bet no more than a couple of shaders are used for 90% of the pixels we draw, and I think it's worth optimizing those as much as possible. > > > > > Although if we only need this for the very simple shaders (and not AA), we are probably not limited by the pixel operations there, so I could probably just add a uniform to be added to the alpha channel (and set it to 0 or 1 respectively). We most likely are limited by pixel operations. It depends on the architecture, but it's not clear that a single texture load could fully hide the ALUs for both the opacity multiply and the alpha squelching. A benchmark would tell us... > > Alternatively, I could simply reduce the current patch to only add the new shader variants for simlpe and swizzled.

Daniel Sievers

Comment 11 2011-11-16 13:19:24 PST

> On a tiled architecture, the GPU has a piece of embedded memory / cache that is smaller than the frame buffer. It collects all screen-space draw operations (clear, triangles post transform etc.), map them to a set of tiles that fit into the cache, then goes for all the tiles: > 1- read frame buffer from memory to cache > 2- do all draw operations scissored to the tile > 3- write frame buffer from cache to memory > > If you clear first, it can skip (1) because it's easy to know it will erase all the data. Otherwise, it would have to go through all the triangles and check whether or not they fully cover the tile which would be expensive, so it generally doesn't do that. Although I do believe you :) I feel like we should rather make the more naive assumption that clearing before rendering over the same area is potentially slower (and remove the clear as part of this patch). Ideally, this should be left to the driver or hardware to try to optimize these cases, as I find it counter-intuitive at the OpenGL API level.

Daniel Sievers

Comment 12 2011-11-16 13:20:23 PST

(In reply to comment #10) > (In reply to comment #8) > > (In reply to comment #5) > > > (In reply to comment #3) > > > > This seems like a lot of shaders. Did you consider controlling the alpha behavior with a uniform instead of a new shader? > > > > > > It's a tradeoff. Adding a uniform and dynamically controlling this behavior will slow down all shaders. It's not even clear that it'd be a win over the current version. > > > Same goes with the layer opacity. > > > I bet no more than a couple of shaders are used for 90% of the pixels we draw, and I think it's worth optimizing those as much as possible. > > > > > > > > > Although if we only need this for the very simple shaders (and not AA), we are probably not limited by the pixel operations there, so I could probably just add a uniform to be added to the alpha channel (and set it to 0 or 1 respectively). > > We most likely are limited by pixel operations. It depends on the architecture, but it's not clear that a single texture load could fully hide the ALUs for both the opacity multiply and the alpha squelching. > > A benchmark would tell us... > Maybe go for the below then: > > > > Alternatively, I could simply reduce the current patch to only add the new shader variants for simlpe and swizzled.

Daniel Sievers

Comment 13 2011-11-16 14:43:15 PST

Created attachment 115453 [details] Patch

Daniel Sievers

Comment 14 2011-11-16 14:47:44 PST

New patch. Would there be any assert() needed for AA?

James Robinson

Comment 15 2011-11-16 14:59:55 PST

Comment on attachment 115453 [details] Patch One principle I've tried to follow for programs is to eagerly initialize any programs that are likely needed on every page and to lazily initialize other programs. Eagerly initialized programs induce a load hit on every compositor initialization, but it's a lower overall cost than the cost to lazily initialize a program. The reason is that lazily initialized programs are created during drawLayers() so querying the uniform locations has to wait for all the uploads on that frame to complete. Currently the non-AA render surface program and non-AA tiler programs are eagerly initialized and all others are lazy. For programs that we want to use on root tiles we definitely want to eagerly initialize.

James Robinson

Comment 16 2011-11-16 15:43:55 PST

Comment on attachment 115453 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=115453&action=review Daniel pointed out offline that he is eagerly initializing the non-swizzled program and lazily initializing the swizzled one, which matches what we currently do for other programs. Could you please move removing the compositor blue clear to a separate patch? We've had difficulties with that bit in particular and it's not inherently related to the other changes. > Source/WebCore/platform/graphics/chromium/LayerRendererChromium.cpp:397 > +#if !defined ( NDEBUG ) we tend to do #ifndef NDEBUG > Source/WebCore/platform/graphics/chromium/ShaderChromium.h:185 > + String getShaderString() const; nit: this shouldn't have a "get" prefix, in WebKit style this function should be called 'shaderString()'. It looks like everything else in this file uses the get prefix as well though.

Daniel Sievers

Comment 17 2011-11-16 15:51:24 PST

Created attachment 115465 [details] Patch

James Robinson

Comment 18 2011-11-16 16:22:13 PST

Comment on attachment 115465 [details] Patch R=me. We can tune doing a glClear() separately, IMO

WebKit Review Bot

Comment 19 2011-11-16 17:30:05 PST

Comment on attachment 115465 [details] Patch Clearing flags on attachment: 115465 Committed r100536: <http://trac.webkit.org/changeset/100536>

WebKit Review Bot

Comment 20 2011-11-16 17:30:10 PST

All reviewed patches have been landed. Closing bug.

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware Unspecified

OS Unspecified

Product WebKit

Component New Bugs

Assignee

Nobody

Reported

2011-11-15 19:25 PST

Modified

2011-11-16 17:30 PST History

CC List

5 users Show

URL

Keywords

Depends on

Blocks