Bug 72452 - [Chromium] Avoid color mask operations for root layers
Summary: [Chromium] Avoid color mask operations for root layers
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-11-15 19:25 PST by Daniel Sievers
Modified: 2011-11-16 17:30 PST (History)
5 users (show)

See Also:


Attachments
Patch (25.51 KB, patch)
2011-11-15 19:30 PST, Daniel Sievers
no flags Details | Formatted Diff | Diff
Patch (25.52 KB, patch)
2011-11-15 19:33 PST, Daniel Sievers
no flags Details | Formatted Diff | Diff
Patch (16.54 KB, patch)
2011-11-16 14:43 PST, Daniel Sievers
no flags Details | Formatted Diff | Diff
Patch (15.87 KB, patch)
2011-11-16 15:51 PST, Daniel Sievers
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Daniel Sievers 2011-11-15 19:25:51 PST
[Chromium] Avoid clear and blending for root layers
Comment 1 Daniel Sievers 2011-11-15 19:30:23 PST
Created attachment 115299 [details]
Patch
Comment 2 Daniel Sievers 2011-11-15 19:33:21 PST
Created attachment 115300 [details]
Patch
Comment 3 James Robinson 2011-11-15 20:58:46 PST
This seems like a lot of shaders.  Did you consider controlling the alpha behavior with a uniform instead of a new shader?

For non-composited content we don't need any opaque AA variants, since we don't draw the non-composited content tiles with the AA shaders.  I'm not sure AA with a 1.0 alpha value makes too much sense anyway.
Comment 4 Antoine Labour 2011-11-16 10:15:47 PST
View in context: https://bugs.webkit.org/attachment.cgi?id=115300&action=review

> Source/WebCore/platform/graphics/chromium/LayerRendererChromium.cpp:394
>      m_context->clear(GraphicsContext3D::COLOR_BUFFER_BIT);

Note: depending on the gpu architecture, doing the clear may actually be a performance improvement. For example, on a tiled architecture, clearing usually permits the driver to avoid loading in the previous frame's back buffer, saving bandwidth. But obviously on others, it has the cost of writing the back buffer with pixels that will be overwritten later. Not sure which way to go, maybe we'll want to make that an option at some point.
Comment 5 Antoine Labour 2011-11-16 10:26:33 PST
(In reply to comment #3)
> This seems like a lot of shaders.  Did you consider controlling the alpha behavior with a uniform instead of a new shader?

It's a tradeoff. Adding a uniform and dynamically controlling this behavior will slow down all shaders. It's not even clear that it'd be a win over the current version.
Same goes with the layer opacity.
I bet no more than a couple of shaders are used for 90% of the pixels we draw, and I think it's worth optimizing those as much as possible.

Is your concern the code complexity of replicating all these shaders? I think we could have good solutions there (e.g. combine shaders out of building blocks).

> 
> For non-composited content we don't need any opaque AA variants, since we don't draw the non-composited content tiles with the AA shaders.  I'm not sure AA with a 1.0 alpha value makes too much sense anyway.
Comment 6 Adrienne Walker 2011-11-16 11:25:53 PST
(In reply to comment #3)
> This seems like a lot of shaders.  Did you consider controlling the alpha behavior with a uniform instead of a new shader?

With the opaque shaders, we're just trying to avoid having to turn off alpha writes for the root layer, right? And, we only need that because we have one platform that trashes the alpha channel for text.  And, that one platform doesn't really have this performance penalty.  So, why don't we just only turn off alpha writes for that one platform?

CCTiledLayerImpl::draw is getting ridiculous.
Comment 7 Daniel Sievers 2011-11-16 11:48:52 PST
(In reply to comment #4)
> View in context: https://bugs.webkit.org/attachment.cgi?id=115300&action=review
> 
> > Source/WebCore/platform/graphics/chromium/LayerRendererChromium.cpp:394
> >      m_context->clear(GraphicsContext3D::COLOR_BUFFER_BIT);
> 
> Note: depending on the gpu architecture, doing the clear may actually be a performance improvement. For example, on a tiled architecture, clearing usually permits the driver to avoid loading in the previous frame's back buffer, saving bandwidth. But obviously on others, it has the cost of writing the back buffer with pixels that will be overwritten later. Not sure which way to go, maybe we'll want to make that an option at some point.

Wait, turning off blending for the root layer and turning off masking out of the alpha channel means that it does not have to read anything from the back buffer, right? And that is the purpose of my patch, i.e. instead of writing the source alpha values, just write out 1.0, so it should always be a perf win.
Comment 8 Daniel Sievers 2011-11-16 11:51:56 PST
(In reply to comment #5)
> (In reply to comment #3)
> > This seems like a lot of shaders.  Did you consider controlling the alpha behavior with a uniform instead of a new shader?
> 
> It's a tradeoff. Adding a uniform and dynamically controlling this behavior will slow down all shaders. It's not even clear that it'd be a win over the current version.
> Same goes with the layer opacity.
> I bet no more than a couple of shaders are used for 90% of the pixels we draw, and I think it's worth optimizing those as much as possible.
> 


Although if we only need this for the very simple shaders (and not AA), we are probably not limited by the pixel operations there, so I could probably just add a uniform to be added to the alpha channel (and set it to 0 or 1 respectively).

Alternatively, I could simply reduce the current patch to only add the new shader variants for simlpe and swizzled.
Comment 9 Antoine Labour 2011-11-16 13:06:54 PST
(In reply to comment #7)
> (In reply to comment #4)
> > View in context: https://bugs.webkit.org/attachment.cgi?id=115300&action=review
> > 
> > > Source/WebCore/platform/graphics/chromium/LayerRendererChromium.cpp:394
> > >      m_context->clear(GraphicsContext3D::COLOR_BUFFER_BIT);
> > 
> > Note: depending on the gpu architecture, doing the clear may actually be a performance improvement. For example, on a tiled architecture, clearing usually permits the driver to avoid loading in the previous frame's back buffer, saving bandwidth. But obviously on others, it has the cost of writing the back buffer with pixels that will be overwritten later. Not sure which way to go, maybe we'll want to make that an option at some point.
> 
> Wait, turning off blending for the root layer and turning off masking out of the alpha channel means that it does not have to read anything from the back buffer, right? And that is the purpose of my patch, i.e. instead of writing the source alpha values, just write out 1.0, so it should always be a perf win.

On a tiled architecture, the GPU has a piece of embedded memory / cache that is smaller than the frame buffer. It collects all screen-space draw operations (clear, triangles post transform etc.), map them to a set of tiles that fit into the cache, then goes for all the tiles:
1- read frame buffer from memory to cache
2- do all draw operations scissored to the tile
3- write frame buffer from cache to memory

If you clear first, it can skip (1) because it's easy to know it will erase all the data. Otherwise, it would have to go through all the triangles and check whether or not they fully cover the tile which would be expensive, so it generally doesn't do that.
Comment 10 Antoine Labour 2011-11-16 13:10:25 PST
(In reply to comment #8)
> (In reply to comment #5)
> > (In reply to comment #3)
> > > This seems like a lot of shaders.  Did you consider controlling the alpha behavior with a uniform instead of a new shader?
> > 
> > It's a tradeoff. Adding a uniform and dynamically controlling this behavior will slow down all shaders. It's not even clear that it'd be a win over the current version.
> > Same goes with the layer opacity.
> > I bet no more than a couple of shaders are used for 90% of the pixels we draw, and I think it's worth optimizing those as much as possible.
> > 
> 
> 
> Although if we only need this for the very simple shaders (and not AA), we are probably not limited by the pixel operations there, so I could probably just add a uniform to be added to the alpha channel (and set it to 0 or 1 respectively).

We most likely are limited by pixel operations. It depends on the architecture, but it's not clear that a single texture load could fully hide the ALUs for both the opacity multiply and the alpha squelching.

A benchmark would tell us...

> 
> Alternatively, I could simply reduce the current patch to only add the new shader variants for simlpe and swizzled.
Comment 11 Daniel Sievers 2011-11-16 13:19:24 PST
> On a tiled architecture, the GPU has a piece of embedded memory / cache that is smaller than the frame buffer. It collects all screen-space draw operations (clear, triangles post transform etc.), map them to a set of tiles that fit into the cache, then goes for all the tiles:
> 1- read frame buffer from memory to cache
> 2- do all draw operations scissored to the tile
> 3- write frame buffer from cache to memory
> 
> If you clear first, it can skip (1) because it's easy to know it will erase all the data. Otherwise, it would have to go through all the triangles and check whether or not they fully cover the tile which would be expensive, so it generally doesn't do that.


Although I do believe you :) I feel like we should rather make the more naive assumption that clearing before rendering over the same area is potentially slower (and remove the clear as part of this patch). Ideally, this should be left to the driver or hardware to try to optimize these cases, as I find it counter-intuitive at the OpenGL API level.
Comment 12 Daniel Sievers 2011-11-16 13:20:23 PST
(In reply to comment #10)
> (In reply to comment #8)
> > (In reply to comment #5)
> > > (In reply to comment #3)
> > > > This seems like a lot of shaders.  Did you consider controlling the alpha behavior with a uniform instead of a new shader?
> > > 
> > > It's a tradeoff. Adding a uniform and dynamically controlling this behavior will slow down all shaders. It's not even clear that it'd be a win over the current version.
> > > Same goes with the layer opacity.
> > > I bet no more than a couple of shaders are used for 90% of the pixels we draw, and I think it's worth optimizing those as much as possible.
> > > 
> > 
> > 
> > Although if we only need this for the very simple shaders (and not AA), we are probably not limited by the pixel operations there, so I could probably just add a uniform to be added to the alpha channel (and set it to 0 or 1 respectively).
> 
> We most likely are limited by pixel operations. It depends on the architecture, but it's not clear that a single texture load could fully hide the ALUs for both the opacity multiply and the alpha squelching.
> 
> A benchmark would tell us...
> 

Maybe go for the below then:

> > 
> > Alternatively, I could simply reduce the current patch to only add the new shader variants for simlpe and swizzled.
Comment 13 Daniel Sievers 2011-11-16 14:43:15 PST
Created attachment 115453 [details]
Patch
Comment 14 Daniel Sievers 2011-11-16 14:47:44 PST
New patch. Would there be any assert() needed for AA?
Comment 15 James Robinson 2011-11-16 14:59:55 PST
Comment on attachment 115453 [details]
Patch

One principle I've tried to follow for programs is to eagerly initialize any programs that are likely needed on every page and to lazily initialize other programs.  Eagerly initialized programs induce a load hit on every compositor initialization, but it's a lower overall cost than the cost to lazily initialize a program.  The reason is that lazily initialized programs are created during drawLayers() so querying the uniform locations has to wait for all the uploads on that frame to complete.

Currently the non-AA render surface program and non-AA tiler programs are eagerly initialized and all others are lazy.  For programs that we want to use on root tiles we definitely want to eagerly initialize.
Comment 16 James Robinson 2011-11-16 15:43:55 PST
Comment on attachment 115453 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=115453&action=review

Daniel pointed out offline that he is eagerly initializing the non-swizzled program and lazily initializing the swizzled one, which matches what we currently do for other programs.

Could you please move removing the compositor blue clear to a separate patch? We've had difficulties with that bit in particular and it's not inherently related to the other changes.

> Source/WebCore/platform/graphics/chromium/LayerRendererChromium.cpp:397
> +#if !defined ( NDEBUG )

we tend to do #ifndef NDEBUG

> Source/WebCore/platform/graphics/chromium/ShaderChromium.h:185
> +    String getShaderString() const;

nit: this shouldn't have a "get" prefix, in WebKit style this function should be called 'shaderString()'.  It looks like everything else in this file uses the get prefix as well though.
Comment 17 Daniel Sievers 2011-11-16 15:51:24 PST
Created attachment 115465 [details]
Patch
Comment 18 James Robinson 2011-11-16 16:22:13 PST
Comment on attachment 115465 [details]
Patch

R=me. We can tune doing a glClear() separately, IMO
Comment 19 WebKit Review Bot 2011-11-16 17:30:05 PST
Comment on attachment 115465 [details]
Patch

Clearing flags on attachment: 115465

Committed r100536: <http://trac.webkit.org/changeset/100536>
Comment 20 WebKit Review Bot 2011-11-16 17:30:10 PST
All reviewed patches have been landed.  Closing bug.