Bug 85219 - [Chromium] Threaded compositor: canvas layers go blank when when out of GPU mem
Summary: [Chromium] Threaded compositor: canvas layers go blank when when out of GPU mem
Status: RESOLVED WONTFIX
Alias: None
Product: WebKit
Classification: Unclassified
Component: Canvas (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Justin Novosad
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-04-30 11:27 PDT by Justin Novosad
Modified: 2013-04-15 07:14 PDT (History)
12 users (show)

See Also:


Attachments
Patch (21.16 KB, patch)
2012-05-01 14:25 PDT, Justin Novosad
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Justin Novosad 2012-04-30 11:27:48 PDT
This is the WebKit counterpart of: http://code.google.com/p/chromium/issues/detail?id=125192

The threaded compositor causes canvas layers to be double buffered by default.  When low on GPU memory, the compositor may fail to allocate textures it needs to render different layers.

To illustrate the problem, run the fireflies IE demo ( http://ie.microsoft.com/testdrive/Performance/Fireflies/Default.html ) with a full-screen chromium browser window, with threaded compositing enabled.  Result: many layers of the demo fail to render.
Comment 1 Justin Novosad 2012-05-01 14:25:44 PDT
Created attachment 139680 [details]
Patch
Comment 2 Dana Jansens 2012-05-01 14:55:00 PDT
Comment on attachment 139680 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=139680&action=review

Hey Justin, our reviewers are out this week but here's my thoughts on the patch :)

> Source/WebCore/platform/graphics/chromium/Canvas2DLayerChromium.cpp:193
> +            ASSERT(0);

ASSERT_NOT_REACHED()

> Source/WebCore/platform/graphics/chromium/Canvas2DLayerChromium.h:69
> +    bool useDoubleBuffering() const {return m_useDoubleBuffering;}

Please insert a space after { and before }

> Source/WebCore/platform/graphics/chromium/TiledLayerChromium.cpp:395
> +                if (layerTreeHost() && layerTreeHost()->allowDoubleBuffering())
> +                    layerTreeHost()->disableDoubleBufferingAndRestartUpdate();

Can't this just return here and avoid the else clause? We're going to update again and come back to this layer right?

> Source/WebCore/platform/graphics/chromium/cc/CCLayerTreeHost.cpp:610
> +            updater.clear();

I feel like this updater.clear() is kind of hidden, and would be more obvious back in updateLayers() when we attempt to updatelayers(rootLayer) a second time. Wdyt?

> Source/WebCore/platform/graphics/chromium/cc/CCLayerTreeHost.h:222
> +    bool allowDoubleBuffering() {return m_allowDoubleBuffering;}
> +    void disableDoubleBufferingAndRestartUpdate();

Can these names be more specific to canvas layers?

> Source/WebCore/platform/graphics/chromium/cc/CCLayerTreeHost.h:247
> +    void resetAllowDoubleBuffering();

ditto.

> Source/WebKit/chromium/tests/Canvas2DLayerChromiumTest.cpp:92
> +        if (m_useMockTextureManager)
> +            return adoptPtr<TextureManager>(new MockTextureManager);
> +        return TextureManager::create(HighLimitBytes, ReclaimLimitBytes, MaxTextureSize);

It seems like you always use a mock texture manager? I think that would be preferable, and then I think you can avoid the changes in TextureManager? There's some big changes coming in TextureManager and it'd be nice to keep this isolated from them.

Can you stick the texture manager in CCTiledLayerTestCommon.h with the other FakeTexture*?
Comment 3 Dana Jansens 2012-05-01 14:57:19 PDT
Comment on attachment 139680 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=139680&action=review

>> Source/WebKit/chromium/tests/Canvas2DLayerChromiumTest.cpp:92
>> +        return TextureManager::create(HighLimitBytes, ReclaimLimitBytes, MaxTextureSize);
> 
> It seems like you always use a mock texture manager? I think that would be preferable, and then I think you can avoid the changes in TextureManager? There's some big changes coming in TextureManager and it'd be nice to keep this isolated from them.
> 
> Can you stick the texture manager in CCTiledLayerTestCommon.h with the other FakeTexture*?

Oh, I did read more after this comment, and I see that you do need to make a subclass the way you're testing this, so ignore the first part of this comment. :) But if you do always use the Mock (it seems like you do?) then dropping the bool and the TextureManager::create would be nice.
Comment 4 WebKit Review Bot 2012-05-01 15:00:02 PDT
Comment on attachment 139680 [details]
Patch

Attachment 139680 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/12597532
Comment 5 Justin Novosad 2012-05-02 07:05:17 PDT
(In reply to comment #2)
> > Source/WebCore/platform/graphics/chromium/TiledLayerChromium.cpp:395
> > +                if (layerTreeHost() && layerTreeHost()->allowDoubleBuffering())
> > +                    layerTreeHost()->disableDoubleBufferingAndRestartUpdate();
> 
> Can't this just return here and avoid the else clause? We're going to update again and come back to this layer right?
> 

When we come back (with double buffering disabled), that is when the else clause becomes useful.  The logic is: if a tile texture allocation still fails  after we disable double buffering, then the tile should silently fail.


> > Source/WebCore/platform/graphics/chromium/cc/CCLayerTreeHost.h:222
> > +    bool allowDoubleBuffering() {return m_allowDoubleBuffering;}
> > +    void disableDoubleBufferingAndRestartUpdate();
> 
> Can these names be more specific to canvas layers?

They could.  Although it is not currently the case, this is intended to eventually apply to WebGL layers as well.  But untils that happens, I agree with you.

> It seems like you always use a mock texture manager? 

The new test I added always use the mock texture manager, but the the test that was there before need to use the fake layer tree host without mocking the texture manager, so that calls propagate all the way down to a mock texture allocator.

If there is an easy way to avoid a collision with the upcoming changes to the texture manager, I'm glad to play along.  If the change are imminent, I can put this patch on hold.  All I really need is a way to predictably simulate a failed texture allocation.  I think I could achieve that by using the real TextureManager and playing with the canvas sizes and the texture memory limits, to create conditions that will cause allocation to fail.
Comment 6 Justin Novosad 2012-05-02 08:15:53 PDT
(In reply to comment #5)
> (In reply to comment #2)

> > > Source/WebCore/platform/graphics/chromium/cc/CCLayerTreeHost.h:222
> > > +    bool allowDoubleBuffering() {return m_allowDoubleBuffering;}
> > > +    void disableDoubleBufferingAndRestartUpdate();
> > 
> > Can these names be more specific to canvas layers?
> 
> They could.  Although it is not currently the case, this is intended to eventually apply to WebGL layers as well.  But untils that happens, I agree with you.

Referring specifically to canvas layers feels a little uncomfortable.  It comes down to deciding which code smell is the least unpleasant, Speculative Generality or Inappropriate Intimacy?  :-)
Comment 7 Dana Jansens 2012-05-02 09:19:04 PDT
(In reply to comment #5)
> (In reply to comment #2)
> > > Source/WebCore/platform/graphics/chromium/TiledLayerChromium.cpp:395
> > > +                if (layerTreeHost() && layerTreeHost()->allowDoubleBuffering())
> > > +                    layerTreeHost()->disableDoubleBufferingAndRestartUpdate();
> > 
> > Can't this just return here and avoid the else clause? We're going to update again and come back to this layer right?
> > 
> 
> When we come back (with double buffering disabled), that is when the else clause becomes useful.  The logic is: if a tile texture allocation still fails  after we disable double buffering, then the tile should silently fail.

Oh I see. I think we could avoid nesting in an else with a return though?

if (lth() && lth()->allowDB()) {
  ldh()->disableDBandRestart();
  return;
}
oldcodeasbefore? second time around should land here.

> > It seems like you always use a mock texture manager? 
> 
> The new test I added always use the mock texture manager, but the the test that was there before need to use the fake layer tree host without mocking the texture manager, so that calls propagate all the way down to a mock texture allocator.
> 
> If there is an easy way to avoid a collision with the upcoming changes to the texture manager, I'm glad to play along.  If the change are imminent, I can put this patch on hold.  All I really need is a way to predictably simulate a failed texture allocation.  I think I could achieve that by using the real TextureManager and playing with the canvas sizes and the texture memory limits, to create conditions that will cause allocation to fail.

Ah I see! There's some tests in TiledLayerChromiumTest.cpp that do this sort of thing, if you like. Specifically "pushTilesAfterIdlePaintFailed". I'm not sure how imminent the TM changes are going to be though - I think it's likely this will land first if the Reviewers-That-Be approve of the update-restarting. It is nice if a test can have a smaller interface to the code though, as it requires less test churn over time.

(In reply to comment #6)
> (In reply to comment #5)
> > (In reply to comment #2)
> 
> > > > Source/WebCore/platform/graphics/chromium/cc/CCLayerTreeHost.h:222
> > > > +    bool allowDoubleBuffering() {return m_allowDoubleBuffering;}
> > > > +    void disableDoubleBufferingAndRestartUpdate();
> > > 
> > > Can these names be more specific to canvas layers?
> > 
> > They could.  Although it is not currently the case, this is intended to eventually apply to WebGL layers as well.  But untils that happens, I agree with you.
> 
> Referring specifically to canvas layers feels a little uncomfortable.  It comes down to deciding which code smell is the least unpleasant, Speculative Generality or Inappropriate Intimacy?  :-)

Aha.. WebGL aka CCTextureLayer. I guess keeping them general is probably fine then :) 

We could potentially use this for tiled layers too in theory.. disabling tile double-buffering (and skipping frames until the commit completed). 

Speaking of which! I think after the update restarts there are some issues unaddressed by this patch.

I wonder if we need to do a bit of cleanup before the next update, in addition to updater.clear()? Do we need to get in a state where we can pretend we're coming back into a new fresh frame? State from the failed update is currently persisting into the new update.
  - Things in commitComplete() may apply. I think m_contentsTextureManager->unprotectAllTextures(); should be called. Clearing the double-buffer list should not though, as those textures are in use on impl.
  - Things done in updateLayers(rootLayer, updater) before the paintLayerContents may apply as well. I think that resetting the partial update state and calling reserveTextures() are what's needed there. Maybe these steps could be made more clear/resilient to churn with some helper functions?

I think I've convinced myself that for tiled layers, double-buffered updates will work, but I would appreciate a unit test to verify this and maintain it in the future. The behaviour of DB tile updates would look something like:
  - Pass 1: Tile A marked for double buffer update.
  - Pass 1: Tile A's texture T1 stolen from A and given to CCTLH.
  - Pass 1: Tile A's given new texture T2 (invalid texture)
  - Pass 1: Tile A reserves T2 (now a valid texture)
  - Pass 1: Some other layer OOMs.
  - Pass 1: pushPropertiesTo is *not* called yet (phew.)
  - Pass 2: Tile A's texture T2 is valid, A is dirty, but A is not "inUseOnImpl" so it will not double buffer a second time (yay!)

A unit test that OOMs on a second layer, and does a partial update and double-buffered update on the first layer, and verifies the partial update still actually happens, the double-buffered update still actually happens, and an extra 3rd texture is not allocated for the double-buffered update would be very comforting.
Comment 8 Justin Novosad 2012-05-02 09:56:59 PDT
(In reply to comment #7)

> Speaking of which! I think after the update restarts there are some issues unaddressed by this patch.
> 
> I wonder if we need to do a bit of cleanup before the next update, in addition to updater.clear()? Do we need to get in a state where we can pretend we're coming back into a new fresh frame? State from the failed update is currently persisting into the new update.
>   - Things in commitComplete() may apply. I think m_contentsTextureManager->unprotectAllTextures(); should be called. Clearing the double-buffer list should not though, as those textures are in use on impl.
>   - Things done in updateLayers(rootLayer, updater) before the paintLayerContents may apply as well. I think that resetting the partial update state and calling reserveTextures() are what's needed there. Maybe these steps could be made more clear/resilient to churn with some helper functions?

This is an excellent point. I know that the way it is now (just clearing the updater) is fine for canvas2d layers. But I confess that I haven't done my homework WRT other LayerChromium subclasses. I think that if updateLayers called itself recursively when a restart is requested would be a robust and maintainable way to make sure everything that needs to be reset gets reset for the second update.  What do you think? 

BTW, I decided to rename allowDoubleBuffering -> preferDoubleBuffering since different layer sub classes may or may not take it into account.
Comment 9 Dana Jansens 2012-05-02 10:23:43 PDT
(In reply to comment #8)
> (In reply to comment #7)
> 
> > Speaking of which! I think after the update restarts there are some issues unaddressed by this patch.
> > 
> > I wonder if we need to do a bit of cleanup before the next update, in addition to updater.clear()? Do we need to get in a state where we can pretend we're coming back into a new fresh frame? State from the failed update is currently persisting into the new update.
> >   - Things in commitComplete() may apply. I think m_contentsTextureManager->unprotectAllTextures(); should be called. Clearing the double-buffer list should not though, as those textures are in use on impl.
> >   - Things done in updateLayers(rootLayer, updater) before the paintLayerContents may apply as well. I think that resetting the partial update state and calling reserveTextures() are what's needed there. Maybe these steps could be made more clear/resilient to churn with some helper functions?
> 
> This is an excellent point. I know that the way it is now (just clearing the updater) is fine for canvas2d layers. But I confess that I haven't done my homework WRT other LayerChromium subclasses. I think that if updateLayers called itself recursively when a restart is requested would be a robust and maintainable way to make sure everything that needs to be reset gets reset for the second update.  What do you think? 

I think that sounds toward a clean approach, but I think you need to be careful about calling calcDrawTransformsEtc() again. If animations are running this may not give you back the same result. If we are unprotecting all textures then it should be alright, but maybe we want to just avoid the work in updateLayers(rootLayer, updater) above "// Reset partial texture update requests."? Take a look at CCLTHImpl.cpp, you'll see that same code has been recently broken out into a helper function there to allow for similar control of when it is called. Maybe we should do the same here, and call it explicitly from updateLayers(), before updateLayers(rootLayer, updater)?

> BTW, I decided to rename allowDoubleBuffering -> preferDoubleBuffering since different layer sub classes may or may not take it into account.

I have another question.. why do we restart the whole update and turn off double buffering for *all* canvas layers when we hit this problem? And why do we do this at all when we run out of texture memory in TiledLayerChromium? Why do we not just switch the current canvas layer when we have a problem, and turn off double buffering on canvas layers from that point forward? Maybe have CanvasLayer2DChromium::update() recurse and try again without double buffering or something?
Comment 10 Justin Novosad 2012-05-02 10:24:17 PDT
(In reply to comment #8)
> (In reply to comment #7)
> I think that if updateLayers called itself recursively when a restart is requested would be a robust and maintainable way to make sure everything that needs to be reset gets reset for the second update.  What do you think? 
> 

I just realized that won't work because the state init we are worrying about is upstream.  What I could do is restart the commit if that is reasonable. I think the layer update is the first task of any significance that gets performed within a commit, so restarting the whole commit should impose much extra overhead.  Does that sound reasonable?
Comment 11 Dana Jansens 2012-05-02 10:27:30 PDT
(In reply to comment #10)
> (In reply to comment #8)
> > (In reply to comment #7)
> > I think that if updateLayers called itself recursively when a restart is requested would be a robust and maintainable way to make sure everything that needs to be reset gets reset for the second update.  What do you think? 
> > 
> 
> I just realized that won't work because the state init we are worrying about is upstream.  What I could do is restart the commit if that is reasonable. I think the layer update is the first task of any significance that gets performed within a commit, so restarting the whole commit should impose much extra overhead.  Does that sound reasonable?

I had thought about that also, but it seemed like then you have to return a tristate back up to the proxy (update not possible | update needs restart | update completed), and dealing with commitComplete seemed more complicated.

I think the state init upstream is not a problem? I'm more worried about state that is constructed during updateLayers(). Any state from inside updateLayers() should be discarded, or considered carefully, when reentering updateLayers().
Comment 12 Justin Novosad 2012-05-02 10:38:59 PDT
> I have another question.. why do we restart the whole update and turn off double buffering for *all* canvas layers when we hit this problem? 

Once a single canvas layer is switched to single buffering. We are force to impose synchronization between the main thread and the impl thread to avoid compositing partially rendered canvas frames.  Doing this negates the benefit of double buffering other layers.  Given that we know we are low on GPU memory at this point, disabling double buffering globally makes sense.

> And why do we do this at all when we run out of texture memory in TiledLayerChromium? Why do we not just switch the current canvas layer when we have a problem, and turn off double buffering on canvas layers from that point forward? Maybe have CanvasLayer2DChromium::update() recurse and try again without double buffering or something?

When we run out of texture memory in TiledLayerChromium, what exactly is "the current canvas layer"?  The only way to make more GPU memory available for the current TiledLayerChromium is to reduce the GPU mem consumption of layers that were updated before it.  That requires switching layers to single buffering after they have been updated.  Hence the need to restart the update.  In some cases the restart is futile: when there are no earlier updated layers that are double buffered canvases.
Comment 13 Dana Jansens 2012-05-02 10:49:17 PDT
(In reply to comment #12)
> > I have another question.. why do we restart the whole update and turn off double buffering for *all* canvas layers when we hit this problem? 
> 
> Once a single canvas layer is switched to single buffering. We are force to impose synchronization between the main thread and the impl thread to avoid compositing partially rendered canvas frames.  Doing this negates the benefit of double buffering other layers.  Given that we know we are low on GPU memory at this point, disabling double buffering globally makes sense.

Ok I see what you mean. What if we just let it waste those resources on the double buffers it had already used for the current frame, we mark it as not using double buffer any more, and we mark setNeedsCommit() to say "try again without double buffering", but don't abort the current update and just go ahead with single buffer from this point on?

The downside I see to this is that we get a frame where fewer tiled layers are drawn than we could actually draw once double buffering is turned off. But we can OOM and skip layers for plenty of other reasons, and I don't see this as being a particularly special case for tiled layers?

> > And why do we do this at all when we run out of texture memory in TiledLayerChromium? Why do we not just switch the current canvas layer when we have a problem, and turn off double buffering on canvas layers from that point forward? Maybe have CanvasLayer2DChromium::update() recurse and try again without double buffering or something?
> 
> When we run out of texture memory in TiledLayerChromium, what exactly is "the current canvas layer"?

Right in that case it would be just ones in the future.

> The only way to make more GPU memory available for the current TiledLayerChromium is to reduce the GPU mem consumption of layers that were updated before it.  That requires switching layers to single buffering after they have been updated.  Hence the need to restart the update.  In some cases the restart is futile: when there are no earlier updated layers that are double buffered canvases.

There is some work to be done wrt hitting OOM on tiled layers in general.. I like that this is trying to be so considerate to tiled layers, but it feels like the CL would be much simpler if it didn't. Since this is already an issue for tiled layers, and this CL doesn't really change that, I think I'd prefer not restarting the update in this CL if that's reasonable for making canvas work.
Comment 14 Justin Novosad 2012-05-02 11:06:18 PDT
(In reply to comment #13)
> (In reply to comment #12)
> > > I have another question.. why do we restart the whole update and turn off double buffering for *all* canvas layers when we hit this problem? 
> > 
> > Once a single canvas layer is switched to single buffering. We are force to impose synchronization between the main thread and the impl thread to avoid compositing partially rendered canvas frames.  Doing this negates the benefit of double buffering other layers.  Given that we know we are low on GPU memory at this point, disabling double buffering globally makes sense.
> 
> Ok I see what you mean. What if we just let it waste those resources on the double buffers it had already used for the current frame, we mark it as not using double buffer any more, and we mark setNeedsCommit() to say "try again without double buffering", but don't abort the current update and just go ahead with single buffer from this point on?
> 
> The downside I see to this is that we get a frame where fewer tiled layers are drawn than we could actually draw once double buffering is turned off. But we can OOM and skip layers for plenty of other reasons, and I don't see this as being a particularly special case for tiled layers?
> 

Eureka! CCLTH is already attempting to group texture reservation into a preliminary step. updateLayers(rootLayer, updater) calls a method named reserveTextures before doing the actual update work.  The problem is that most LayerChromium subclasses are not overloading reserveTextures.  Instead, they reserve their texture in the update method (tsk-tsk).  This is easy to fix for Canvas2DLayerChromium.  I'm not so sure about TiledLayerChromium, but like you said, that does not need to be this patch's business. All I need to do is pass down success/failure of the reserveTextures, and if it failed, turn off double bufferirng, and try to reserve again. All other operation related to the layer update would not be repeated or restarted.
Comment 15 Dana Jansens 2012-05-02 11:10:21 PDT
(In reply to comment #14)
> Eureka! CCLTH is already attempting to group texture reservation into a preliminary step. updateLayers(rootLayer, updater) calls a method named reserveTextures before doing the actual update work.  The problem is that most LayerChromium subclasses are not overloading reserveTextures.  Instead, they reserve their texture in the update method (tsk-tsk).  This is easy to fix for Canvas2DLayerChromium.  I'm not so sure about TiledLayerChromium, but like you said, that does not need to be this patch's business. All I need to do is pass down success/failure of the reserveTextures, and if it failed, turn off double bufferirng, and try to reserve again. All other operation related to the layer update would not be repeated or restarted.

Oh, that sounds great :)
Comment 16 Dana Jansens 2012-05-02 11:14:40 PDT
I should point out that reserveTextures() is a hacky thing to make us reserve the root layer's scrollbars first basically to give them the highest priority. So please make sure that if any layer reserve fails, we unprotect all textures before we go at it again without doublebuffer.

@epenner Probably the reserveTextures() will go away when we use the PrioritizedTextureManager, but I think we will just want to bake this idea of "all or none" for these double buffers into the priority calculation phase.
Comment 17 Justin Novosad 2012-05-02 11:28:59 PDT
(In reply to comment #16)

> @epenner Probably the reserveTextures() will go away when we use the PrioritizedTextureManager, but I think we will just want to bake this idea of "all or none" for these double buffers into the priority calculation phase.

I am not familiar with the design of PrioritizedTextureManager, but I'm not sure that will come into play. We can only turn off double buffering during a commit because the compositor thread references the textureIds, which only get updated during commit (pushPropertiesTo).  Therefore, the textures remain permanently reserved as long as the layers are double buffered, so they are not candidates for eviction. 

If I put reserveTextures to good use, can I keep it? :-))
Comment 18 Dana Jansens 2012-05-02 11:32:32 PDT
(In reply to comment #17)
> (In reply to comment #16)
> 
> > @epenner Probably the reserveTextures() will go away when we use the PrioritizedTextureManager, but I think we will just want to bake this idea of "all or none" for these double buffers into the priority calculation phase.
> 
> I am not familiar with the design of PrioritizedTextureManager, but I'm not sure that will come into play. We can only turn off double buffering during a commit because the compositor thread references the textureIds, which only get updated during commit (pushPropertiesTo).  Therefore, the textures remain permanently reserved as long as the layers are double buffered, so they are not candidates for eviction. 
> 
> If I put reserveTextures to good use, can I keep it? :-))

The big difference is textures to reserve are chosen in a global computation/sort instead of first-come-first-serve front-to-back. It happens during the commit also, just before we actually update the layers. So I think it would look similar to now, where canvas layers would have to say "abort! abort!" to the "prioritization" pass, instead of the "reserve" pass.
Comment 19 Justin Novosad 2012-05-02 12:45:04 PDT
(In reply to comment #18)
> (In reply to comment #17)
> > (In reply to comment #16)
> > 
> > > @epenner Probably the reserveTextures() will go away when we use the PrioritizedTextureManager, but I think we will just want to bake this idea of "all or none" for these double buffers into the priority calculation phase.
> > 
> > I am not familiar with the design of PrioritizedTextureManager, but I'm not sure that will come into play. We can only turn off double buffering during a commit because the compositor thread references the textureIds, which only get updated during commit (pushPropertiesTo).  Therefore, the textures remain permanently reserved as long as the layers are double buffered, so they are not candidates for eviction. 
> > 
> > If I put reserveTextures to good use, can I keep it? :-))
> 
> The big difference is textures to reserve are chosen in a global computation/sort instead of first-come-first-serve front-to-back. It happens during the commit also, just before we actually update the layers. So I think it would look similar to now, where canvas layers would have to say "abort! abort!" to the "prioritization" pass, instead of the "reserve" pass.

That sounds like a great way to proceed.  I think this change should wait for the new texture manager. Putting on hold.