Bug 80870

Summary: [chromium] Use blit instead of copy for canvas presentation
Product: WebKit Reporter: Sami Kyostila <skyostil>
Component: CanvasAssignee: Sami Kyostila <skyostil>
Status: RESOLVED FIXED    
Severity: Normal CC: cc-bugs, danakj, enne, husky, jamesr, nduca, reveman, senorblanco, tomhudson, twiz, vangelis, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch none

Description Sami Kyostila 2012-03-12 12:39:35 PDT
Use glDrawArrays() instead of glCopyTexSubImage2D() to copy the accelerated canvas back buffer to the front buffer, because the latter hits a very expensive slow path on some mobile GPUs.
Comment 1 Sami Kyostila 2012-03-13 13:02:21 PDT
Created attachment 131698 [details]
Patch

Still need to write some tests and make WebGL use TextureCopier.
Comment 2 Sami Kyostila 2012-03-16 13:46:46 PDT
Created attachment 132368 [details]
Patch
Comment 3 Sami Kyostila 2012-03-16 13:51:32 PDT
Left out WebGL from this patch as it requires some more involved state gymnastics with the rendering context.

With this patch the Canvas "full clear translation" score in https://github.com/sibblingz/HTML5-Game-Benchmarks improves from ~330 to ~660 on a z600/Quadro FX 380.
Comment 4 Iain Merrick 2012-03-19 07:36:27 PDT
Very nice.

I'm not sure about TextureCopierTest, as it more or less just lists the commands that the copier issues. Is that really giving us good coverage? It seems like we want to check it actually outputs the right thing, but I guess that's what (existing) layout tests are for.
Comment 5 Sami Kyostila 2012-03-19 07:44:06 PDT
(In reply to comment #4)
> I'm not sure about TextureCopierTest, as it more or less just lists the commands that the copier issues. Is that really giving us good coverage? It seems like we want to check it actually outputs the right thing, but I guess that's what (existing) layout tests are for.

Yeah, I agree the test isn't ideal. Checking the actual output would require us to run the unit tests on top of a real GL implementation, but as far as I can see that isn't currently possible.

The existing layout tests do exercise this code when run in GPU mode, so we do have coverage. Perhaps we should just drop TextureCopierTest entirely, or what do you think?
Comment 6 Iain Merrick 2012-03-19 07:54:22 PDT
There is value in just compiling and running code, to guard against silly compile errors.

Maybe simplify the test by removing some of the methods in the mock? I believe if you don't add a gmock macro for a method, it will just be totally ignored.
Comment 7 Sami Kyostila 2012-03-19 07:59:11 PDT
(In reply to comment #6)
> There is value in just compiling and running code, to guard against silly compile errors.
> 
> Maybe simplify the test by removing some of the methods in the mock? I believe if you don't add a gmock macro for a method, it will just be totally ignored.

Good idea, I'll try to reduce the mocked function set to bare essentials.
Comment 8 Sami Kyostila 2012-03-19 08:10:22 PDT
Created attachment 132586 [details]
Patch

- Simplified TextureCopierTest.
Comment 9 Iain Merrick 2012-03-19 08:37:03 PDT
Cool, I like it. Much clearer. Maybe also add a comment explaining that it's stripped-down, though. :)
Comment 10 Sami Kyostila 2012-03-19 09:00:57 PDT
Created attachment 132591 [details]
Patch

- Added note to TextureCopierTest explaining the scope.
Comment 11 Sami Kyostila 2012-03-23 07:33:00 PDT
Ping, anyone care to review -- or suggest a reviewer?
Comment 12 Stephen White 2012-03-23 10:33:55 PDT
(In reply to comment #11)
> Ping, anyone care to review -- or suggest a reviewer?
Comment 13 Stephen White 2012-03-23 10:37:41 PDT
This looks like good stuff, but it also overlaps with the work Jeff Timanus is currently doing to support fast texture copies at the command buffer level; see crbug.com/101051.  His work is a bit broader, in that it has to support WebGL->canvas copies as well.

It might be ok to have both paths (depending on your time horizon), but I thought I'd mention it before diving in.
Comment 14 Dana Jansens 2012-03-23 10:41:47 PDT
It is also worth mentioning we're interested in improving other rectangular blits in the compositor.

http://code.google.com/p/chromium/issues/detail?id=116204

For instance when we blit a surface to its target using LayerRendererChromium::drawTexturedQuad.
Comment 15 Sami Kyostila 2012-03-23 10:49:31 PDT
Sounds like there is lots of potential overlap between this patch and the other two bugs. I'll talk to Jeff to see if there's some unification we could do.

In the meantime it would be nice for something like this to get landed since it gives Chrome on Android a modest 60x boost on the benchmark mentioned in comment #3 :)
Comment 16 Stephen White 2012-03-23 11:05:59 PDT
Comment on attachment 132591 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=132591&action=review

Leaving for compositor folks to chime in.

> Source/WebCore/platform/graphics/chromium/ShaderChromium.cpp:427
> +String FragmentShaderRGBATexCopy::getShaderString() const

<bikeshed> I don't think this class should really have "Copy" in its name.  Maybe just FragmentShaderRGBATex? </bikeshed>

> Source/WebCore/platform/graphics/chromium/TextureCopier.cpp:51
> +        {0, 0},
> +        {1, 0},
> +        {1, 1},
> +        {0, 1}

You could re-use the same canonical vert buffer above for texcoords, if you don't mind doing a little math in the vert shader to turn it from 2x2 to unit.  Failing that, you could also interleave verts and texcoords, as the compositor does, which would at least save you some binding calls.

> Source/WebCore/platform/graphics/chromium/TextureCopier.cpp:78
> +    // Note: this code does not restore the viewport, bound program, 2D texture, framebuffer, buffer or blend enable.

I don't know what assumptions the compositor makes about continuity of GL state; compositor folks should probably verify that this is ok.

> Source/WebCore/platform/graphics/chromium/TextureCopier.cpp:83
> +    GLC(context, context->clear(GraphicsContext3D::COLOR_BUFFER_BIT));

Why do you need to clear?  Is the draw below not going to write to all pixels?

> Source/WebCore/platform/graphics/chromium/TextureCopier.cpp:109
> +    GLC(context, context->texParameteri(GraphicsContext3D::TEXTURE_2D, GraphicsContext3D::TEXTURE_MIN_FILTER, GraphicsContext3D::LINEAR));
> +    GLC(context, context->texParameteri(GraphicsContext3D::TEXTURE_2D, GraphicsContext3D::TEXTURE_MAG_FILTER, GraphicsContext3D::LINEAR));

Compositor folks:  check this please (does the compositor assume LINEAR?)
Comment 17 Sami Kyostila 2012-03-23 11:20:25 PDT
Comment on attachment 132591 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=132591&action=review

Thanks Stephen.

>> Source/WebCore/platform/graphics/chromium/ShaderChromium.cpp:427
>> +String FragmentShaderRGBATexCopy::getShaderString() const
> 
> <bikeshed> I don't think this class should really have "Copy" in its name.  Maybe just FragmentShaderRGBATex? </bikeshed>

Agreed, I'll change this.

>> Source/WebCore/platform/graphics/chromium/TextureCopier.cpp:51
>> +        {0, 1}
> 
> You could re-use the same canonical vert buffer above for texcoords, if you don't mind doing a little math in the vert shader to turn it from 2x2 to unit.  Failing that, you could also interleave verts and texcoords, as the compositor does, which would at least save you some binding calls.

Good idea. I'll do the transformation in the vert shader, although that does make the shader a tad less general.

>> Source/WebCore/platform/graphics/chromium/TextureCopier.cpp:78
>> +    // Note: this code does not restore the viewport, bound program, 2D texture, framebuffer, buffer or blend enable.
> 
> I don't know what assumptions the compositor makes about continuity of GL state; compositor folks should probably verify that this is ok.

We'll need to save and restore the state anyway for WebGL, but for now I chose to avoid the round trip.

>> Source/WebCore/platform/graphics/chromium/TextureCopier.cpp:83
>> +    GLC(context, context->clear(GraphicsContext3D::COLOR_BUFFER_BIT));
> 
> Why do you need to clear?  Is the draw below not going to write to all pixels?

It's an optimization hint for tiling GPUs (e.g. Imagination SGX) so they know to discard the framebuffer contents before rendering. I was going to replace this with EXT_discard_framebuffer since tilers generally implement that, but did not around to it yet.

>> Source/WebCore/platform/graphics/chromium/TextureCopier.cpp:109
>> +    GLC(context, context->texParameteri(GraphicsContext3D::TEXTURE_2D, GraphicsContext3D::TEXTURE_MAG_FILTER, GraphicsContext3D::LINEAR));
> 
> Compositor folks:  check this please (does the compositor assume LINEAR?)

AFAIK the compositor resets the filter to linear every time it draws a quad -- which probably should be fixed :P
Comment 18 Adrienne Walker 2012-03-23 12:43:46 PDT
Comment on attachment 132591 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=132591&action=review

>>> Source/WebCore/platform/graphics/chromium/TextureCopier.cpp:78
>>> +    // Note: this code does not restore the viewport, bound program, 2D texture, framebuffer, buffer or blend enable.
>> 
>> I don't know what assumptions the compositor makes about continuity of GL state; compositor folks should probably verify that this is ok.
> 
> We'll need to save and restore the state anyway for WebGL, but for now I chose to avoid the round trip.

It looks like you're making these calls outside of a LayerRendererChromium frame.  We restore viewport, program, texture, framebuffer, buffer, and blend enable before each frame there.  LRC should probably be better documented, but I don't see anything wrong with this code.

>>> Source/WebCore/platform/graphics/chromium/TextureCopier.cpp:83
>>> +    GLC(context, context->clear(GraphicsContext3D::COLOR_BUFFER_BIT));
>> 
>> Why do you need to clear?  Is the draw below not going to write to all pixels?
> 
> It's an optimization hint for tiling GPUs (e.g. Imagination SGX) so they know to discard the framebuffer contents before rendering. I was going to replace this with EXT_discard_framebuffer since tilers generally implement that, but did not around to it yet.

Maybe you should #ifdef this then? Extraneous clears will hurt on some platforms.

>>> Source/WebCore/platform/graphics/chromium/TextureCopier.cpp:109
>>> +    GLC(context, context->texParameteri(GraphicsContext3D::TEXTURE_2D, GraphicsContext3D::TEXTURE_MAG_FILTER, GraphicsContext3D::LINEAR));
>> 
>> Compositor folks:  check this please (does the compositor assume LINEAR?)
> 
> AFAIK the compositor resets the filter to linear every time it draws a quad -- which probably should be fixed :P

It does set it to linear every time.
Comment 19 Nat Duca 2012-03-23 13:08:23 PDT
Haven't looked at this in detail, but I hope that the ability to do these types of copies is modular such that we can use this for tile blits as well.
Comment 20 Stephen White 2012-03-23 13:20:49 PDT
Comment on attachment 132591 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=132591&action=review

>>> Source/WebCore/platform/graphics/chromium/TextureCopier.cpp:78
>>> +    // Note: this code does not restore the viewport, bound program, 2D texture, framebuffer, buffer or blend enable.
>> 
>> I don't know what assumptions the compositor makes about continuity of GL state; compositor folks should probably verify that this is ok.
> 
> We'll need to save and restore the state anyway for WebGL, but for now I chose to avoid the round trip.

Not sure I understand the WebGL comment:  wouldn't this always be called in the compositor context, even for WebGL?
Comment 21 Jeff Timanus 2012-03-23 13:55:18 PDT
(In reply to comment #13)
> This looks like good stuff, but it also overlaps with the work Jeff Timanus is currently doing to support fast texture copies at the command buffer level; see crbug.com/101051.  His work is a bit broader, in that it has to support WebGL->canvas copies as well.
> 
> It might be ok to have both paths (depending on your time horizon), but I thought I'd mention it before diving in.

This patch does overlap a bit with my planned work.  I was adding similar functionality to the GPU process to allow for glCopyTexImage-like functionality for BGRA-to-BGRA copies.

The main differences would be that:
  - The extension has to respect the pixel-storage flags during the copy: UNPACK_FLIP_Y_WEBGL, UN​PACK_PREMULTIPLY_ALPHA_WEBGL, ​UNPACK_COLORSPACE_CONVERSION_​WEBGL
  - The context state would be explicitly tracked and preserved in the GPU process.  The extension can be invoked without concern of corrupting bound framebuffers, textures, etc.

In the longer term, I think that the work here could be replaced by a call into this extension.  However, in the near term, I think that this change can go in, because it may be another week or two before I'm able to land my change.

I will be able to lift some of the shader work, and framebuffer initialization/drawing code from this change.
Comment 22 Sami Kyostila 2012-03-26 03:38:01 PDT
Comment on attachment 132591 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=132591&action=review

>>>>> Source/WebCore/platform/graphics/chromium/TextureCopier.cpp:78
>>>>> +    // Note: this code does not restore the viewport, bound program, 2D texture, framebuffer, buffer or blend enable.
>>>> 
>>>> I don't know what assumptions the compositor makes about continuity of GL state; compositor folks should probably verify that this is ok.
>>> 
>>> We'll need to save and restore the state anyway for WebGL, but for now I chose to avoid the round trip.
>> 
>> It looks like you're making these calls outside of a LayerRendererChromium frame.  We restore viewport, program, texture, framebuffer, buffer, and blend enable before each frame there.  LRC should probably be better documented, but I don't see anything wrong with this code.
> 
> Not sure I understand the WebGL comment:  wouldn't this always be called in the compositor context, even for WebGL?

Thanks for the clarification Enne.

I guess I misunderstood how DrawingBuffer was being used with WebGL. If WebGL also uses the compositor context for presentation, then this code should work there as well.

>>>> Source/WebCore/platform/graphics/chromium/TextureCopier.cpp:83
>>>> +    GLC(context, context->clear(GraphicsContext3D::COLOR_BUFFER_BIT));
>>> 
>>> Why do you need to clear?  Is the draw below not going to write to all pixels?
>> 
>> It's an optimization hint for tiling GPUs (e.g. Imagination SGX) so they know to discard the framebuffer contents before rendering. I was going to replace this with EXT_discard_framebuffer since tilers generally implement that, but did not around to it yet.
> 
> Maybe you should #ifdef this then? Extraneous clears will hurt on some platforms.

Agreed, I'll restrict this to OS(ANDROID) for now.
Comment 23 Sami Kyostila 2012-03-26 03:55:15 PDT
(In reply to comment #19)
> Haven't looked at this in detail, but I hope that the ability to do these types of copies is modular such that we can use this for tile blits as well.

The copier itself is modular, but the interface would need some expansion to be useful with tile blitting -- mainly transform and blending support.

I guess fast pathing compatible tile blits to EXT_framebuffer_blit could yield some gains, but I'm also hoping to experiment with packing the tiles into a texture atlas to allow batching several blits into one.
Comment 24 Sami Kyostila 2012-03-26 04:10:27 PDT
(In reply to comment #21)
> This patch does overlap a bit with my planned work.  I was adding similar functionality to the GPU process to allow for glCopyTexImage-like functionality for BGRA-to-BGRA copies.
> 
> The main differences would be that:
>   - The extension has to respect the pixel-storage flags during the copy: UNPACK_FLIP_Y_WEBGL, UN​PACK_PREMULTIPLY_ALPHA_WEBGL, ​UNPACK_COLORSPACE_CONVERSION_​WEBGL
>   - The context state would be explicitly tracked and preserved in the GPU process.  The extension can be invoked without concern of corrupting bound framebuffers, textures, etc.
> 
> In the longer term, I think that the work here could be replaced by a call into this extension.  However, in the near term, I think that this change can go in, because it may be another week or two before I'm able to land my change.
> 
> I will be able to lift some of the shader work, and framebuffer initialization/drawing code from this change.

Thanks for the update. Doing this in the GPU process sounds like the way to go in the long term. For one, restoring the state will be much cheaper there, and it may avoid the need to expose some extensions all the way to the renderer.
Comment 25 Sami Kyostila 2012-03-26 04:14:49 PDT
Created attachment 133772 [details]
Patch

- Remove texcoord vertex buffer
- Only clear destination on Android
- Renamed FragmentShaderRGBATexCopy => FragmentShaderRGBATex
- Simplified test code
Comment 26 Adrienne Walker 2012-03-26 13:04:19 PDT
Comment on attachment 133772 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=133772&action=review

> Source/WebCore/platform/graphics/chromium/TextureCopier.h:68
> +    OwnPtr<BlitProgram> m_blitProgram;

I think you might also need to handle this program's cleanup in LayerRendererChromium::cleanupSharedObjects.
Comment 27 Stephen White 2012-03-26 13:22:01 PDT
Comment on attachment 133772 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=133772&action=review

> Source/WebCore/platform/graphics/chromium/Canvas2DLayerChromium.cpp:137
> +    m_frontTexture->allocate(updater.allocator());

Sorry I didn't catch this the first time around, but is this allocating a new texture ID each time?  Or calling glTexImage2D?  (I have no idea how the allocator works).  If so, this could be expensive.  Is there a way we could do this only on first call?
Comment 28 Sami Kyostila 2012-03-27 06:53:47 PDT
Comment on attachment 133772 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=133772&action=review

>> Source/WebCore/platform/graphics/chromium/Canvas2DLayerChromium.cpp:137
>> +    m_frontTexture->allocate(updater.allocator());
> 
> Sorry I didn't catch this the first time around, but is this allocating a new texture ID each time?  Or calling glTexImage2D?  (I have no idea how the allocator works).  If so, this could be expensive.  Is there a way we could do this only on first call?

Fair point. In this case m_frontTexture is a ManagedTexture whose allocation lifetime is controlled by a TextureManager. If there is enough GPU memory to go around, m_frontTexture will keep referring to the same GPU texture every time around and allocate() is a no-op.

>> Source/WebCore/platform/graphics/chromium/TextureCopier.h:68
>> +    OwnPtr<BlitProgram> m_blitProgram;
> 
> I think you might also need to handle this program's cleanup in LayerRendererChromium::cleanupSharedObjects.

Good catch! Fixed.
Comment 29 Sami Kyostila 2012-03-27 06:54:18 PDT
Created attachment 134049 [details]
Patch

- Cleanup TextureCopier in LRC::cleanupSharedObjects()
Comment 30 Stephen White 2012-03-27 10:49:45 PDT
Comment on attachment 134049 [details]
Patch

Looks good.  r=me
Comment 31 WebKit Review Bot 2012-03-27 11:18:59 PDT
Comment on attachment 134049 [details]
Patch

Clearing flags on attachment: 134049

Committed r112286: <http://trac.webkit.org/changeset/112286>
Comment 32 WebKit Review Bot 2012-03-27 11:19:06 PDT
All reviewed patches have been landed.  Closing bug.