[chromium] Canvas2D should rate-limit drawing to prevent swamping the GPU process
Created attachment 111506 [details] Patch
Comment on attachment 111506 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=111506&action=review

> Source/WebCore/platform/graphics/chromium/Canvas2DLayerChromium.cpp:69
> +    return (m_context->getExtensions()->getGraphicsResetStatusARB() == GraphicsContext3D::NO_ERROR);

why is this changed?

> Source/WebCore/platform/graphics/chromium/Canvas2DLayerChromium.cpp:95
> +    m_rateLimitingTimer.startOneShot(0);

I'm a little worried about the use of WebCore::Timer for this. What's the intended behavior here? When exactly should this timer fire?
Comment on attachment 111506 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=111506&action=review

> Source/WebCore/platform/graphics/chromium/Canvas2DLayerChromium.cpp:60
> +    m_contextSupportsRateLimitingExtension = context->getExtensions()->supports("GL_CHROMIUM_rate_limit_offscreen_context");

Might want to double check that our implementation of supports() is handled client side. Would suck if this was a call to the GPU process for every layer. :'(

> Source/WebCore/platform/graphics/chromium/Canvas2DLayerChromium.cpp:92
> +void Canvas2DLayerChromium::contentChanged()

In the case where the user has three canvases, this will rate limit three times, right? Since it's all one context, we need to rate limit only once. The problem here is the rateLimit code basically says "insert a token. Increment a counter. Don't return until the GPU process has processed at least up to counter - 2." So, for example, 3 rateLimits in a row is a glFinish.

> Source/WebCore/platform/graphics/chromium/GraphicsLayerChromium.cpp:308
>  void GraphicsLayerChromium::setContentsNeedsDisplay()

Would this get called in unexpected cases? E.g. during layout or when someone changes the size of a div? The thing I'm worrying about is that we'll end up calling rateLimit when we didn't mean to... although, that might be OK.

> Source/WebCore/platform/graphics/chromium/LayerChromium.h:198
> +    virtual void contentChanged() { }

Might want to make a comment about what this means semantically.

I'm a bit worried about adding this on all layers, mainly because it's pretty complex as it is. This is from an external perspective; hopefully Vangelis has a smart thing to say here. Whatever semantic definition we agree on for contentChanged, we probably want to be self-consistent.
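For reference, the token scheme described above can be modeled in isolation. This is a hedged standalone sketch, not the actual command buffer code; all names (CommandBufferClientModel, waitForToken, etc.) are invented for illustration. It shows why three back-to-back rateLimit() calls with no intervening work degenerate into a glFinish-style full wait:

// Standalone model of "insert a token, increment a counter, don't return
// until the GPU process has processed at least up to counter - 2".
// Names are assumptions; the real code lives in gles2_implementation.cc.
#include <cstdio>

class CommandBufferClientModel {
public:
    void rateLimit() {
        int token = ++m_lastInsertedToken;
        // Allow at most two tokens in flight; otherwise block until the
        // service side catches up.
        if (token > 2)
            waitForToken(token - 2);
    }

private:
    void waitForToken(int token) {
        // Stand-in for blocking on the GPU process: pretend it processes
        // one token's worth of work per iteration.
        while (m_lastProcessedToken < token)
            ++m_lastProcessedToken;
    }
    int m_lastInsertedToken = 0;
    int m_lastProcessedToken = 0;
};

int main() {
    CommandBufferClientModel client;
    // Three consecutive rateLimit() calls with no work in between: the third
    // waits until all but the two newest tokens are processed, i.e. a finish.
    for (int i = 0; i < 3; ++i)
        client.rateLimit();
    std::printf("issued 3 rateLimit() calls\n");
    return 0;
}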
Comment on attachment 111506 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=111506&action=review

>> Source/WebCore/platform/graphics/chromium/Canvas2DLayerChromium.cpp:69
>> +    return (m_context->getExtensions()->getGraphicsResetStatusARB() == GraphicsContext3D::NO_ERROR);
>
> why is this changed?

I honestly can't remember the details (this tiny patch has been too long in the making). I think the reason is that, since the call to contentChanged() happens inside a "drawsContent()" check higher up the stack, we want the contentChanged() to happen even if the layer is currently disabled (texture ID 0), so that if a texture is created for it in the interim, it still gets rate limited. But I'm not certain about that.

>> Source/WebCore/platform/graphics/chromium/Canvas2DLayerChromium.cpp:95
>> +    m_rateLimitingTimer.startOneShot(0);
>
> I'm a little worried about the use of WebCore::Timer for this. What's the intended behavior here? When exactly should this timer fire?

Basically, it should fire after the current chunk of JavaScript code and any Canvas2D drawing calls made by it have finished executing. By putting it on a timer, we know it won't get called until V8 returns. The same trick is used in WebGLLayerChromium.
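The deferral trick the timer relies on can be shown with a minimal standalone model (assumed names, not WebCore's actual Timer machinery): a zero-delay task posted to the message loop cannot run until the task that posted it, which here stands in for the executing JavaScript, has returned.

// Minimal model of the zero-delay timer trick: posted tasks only run after
// the current task completes. MessageLoopModel is an assumption for this
// sketch; the patch gets the same effect from Timer::startOneShot(0).
#include <cstdio>
#include <functional>
#include <queue>

class MessageLoopModel {
public:
    void postTask(std::function<void()> task) { m_pending.push(std::move(task)); }
    void runTask(const std::function<void()>& task) {
        task(); // the current chunk of JS (and its canvas draws) finishes first...
        while (!m_pending.empty()) { // ...then pending zero-delay tasks fire
            m_pending.front()();
            m_pending.pop();
        }
    }

private:
    std::queue<std::function<void()>> m_pending;
};

int main() {
    MessageLoopModel loop;
    loop.runTask([&loop] {
        loop.postTask([] { std::printf("2: rate limiting fires after JS returns\n"); });
        std::printf("1: canvas drawing calls happen here\n");
    });
    return 0;
}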
Comment on attachment 111506 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=111506&action=review

>> Source/WebCore/platform/graphics/chromium/Canvas2DLayerChromium.cpp:60
>> +    m_contextSupportsRateLimitingExtension = context->getExtensions()->supports("GL_CHROMIUM_rate_limit_offscreen_context");
>
> Might want to double check that our implementation of supports() is handled client side. Would suck if this was a call to the GPU process for every layer. :'(

Hmmm. Well, WebGL does the same thing (although perhaps that's no excuse). I'll look into it.

>> Source/WebCore/platform/graphics/chromium/Canvas2DLayerChromium.cpp:92
>> +void Canvas2DLayerChromium::contentChanged()
>
> In the case where the user has three canvases, this will rate limit three times, right? Since it's all one context, we need to rate limit only once. The problem here is the rateLimit code basically says "insert a token. Increment a counter. Don't return until the GPU process has processed at least up to counter - 2." So, for example, 3 rateLimits in a row is a glFinish.

Yeah, that's what I thought. I expected to see a big perf penalty on multiple canvas layers. But there isn't, either on my test cases or on Fireflies and Fishbowl, two recent demos which use a lot of canvas layers. I was rather surprised. I did another implementation with a single timer for all canvases, and there was no difference. I'm not sure why, though.

>> Source/WebCore/platform/graphics/chromium/LayerChromium.h:198
>> +    virtual void contentChanged() { }
>
> Might want to make a comment about what this means semantically.
>
> I'm a bit worried about adding this on all layers, mainly because it's pretty complex as it is. This is from an external perspective; hopefully Vangelis has a smart thing to say here. Whatever semantic definition we agree on for contentChanged, we probably want to be self-consistent.

My hope was that we could re-use this for WebGL, remove some code from WebGLRenderingContext, and de-couple GraphicsContext3D from WebGLLayerChromium a bit too.
Comment on attachment 111506 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=111506&action=review

>>> Source/WebCore/platform/graphics/chromium/Canvas2DLayerChromium.cpp:92
>>> +void Canvas2DLayerChromium::contentChanged()
>>
>> In the case where the user has three canvases, this will rate limit three times, right? Since it's all one context, we need to rate limit only once. The problem here is the rateLimit code basically says "insert a token. Increment a counter. Don't return until the GPU process has processed at least up to counter - 2." So, for example, 3 rateLimits in a row is a glFinish.
>
> Yeah, that's what I thought. I expected to see a big perf penalty on multiple canvas layers. But there isn't, either on my test cases or on Fireflies and Fishbowl, two recent demos which use a lot of canvas layers. I was rather surprised. I did another implementation with a single timer for all canvases, and there was no difference. I'm not sure why, though.

Should we add some tracing around the place where the blocking happens, to make sure it functions as intended?

>>> Source/WebCore/platform/graphics/chromium/LayerChromium.h:198
>>> +    virtual void contentChanged() { }
>>
>> Might want to make a comment about what this means semantically.
>>
>> I'm a bit worried about adding this on all layers, mainly because it's pretty complex as it is. This is from an external perspective; hopefully Vangelis has a smart thing to say here. Whatever semantic definition we agree on for contentChanged, we probably want to be self-consistent.
>
> My hope was that we could re-use this for WebGL, remove some code from WebGLRenderingContext, and de-couple GraphicsContext3D from WebGLLayerChromium a bit too.

It seems reasonable to me to define a contentChanged() method for LCs, and I like the idea of using the same mechanism to rate-limit WebGL.
Comment on attachment 111506 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=111506&action=review

>>> Source/WebCore/platform/graphics/chromium/Canvas2DLayerChromium.cpp:60
>>> +    m_contextSupportsRateLimitingExtension = context->getExtensions()->supports("GL_CHROMIUM_rate_limit_offscreen_context");
>>
>> Might want to double check that our implementation of supports() is handled client side. Would suck if this was a call to the GPU process for every layer. :'(
>
> Hmmm. Well, WebGL does the same thing (although perhaps that's no excuse). I'll look into it.

Seems to be handled client-side.

>>> Source/WebCore/platform/graphics/chromium/Canvas2DLayerChromium.cpp:69
>>> +    return (m_context->getExtensions()->getGraphicsResetStatusARB() == GraphicsContext3D::NO_ERROR);
>>
>> why is this changed?
>
> I honestly can't remember the details (this tiny patch has been too long in the making). I think the reason is that, since the call to contentChanged() happens inside a "drawsContent()" check higher up the stack, we want the contentChanged() to happen even if the layer is currently disabled (texture ID 0), so that if a texture is created for it in the interim, it still gets rate limited. But I'm not certain about that.

Forget everything I said. This was a bad merge. Removed.

>>>> Source/WebCore/platform/graphics/chromium/Canvas2DLayerChromium.cpp:92
>>>> +void Canvas2DLayerChromium::contentChanged()
>>>
>>> In the case where the user has three canvases, this will rate limit three times, right? Since it's all one context, we need to rate limit only once. The problem here is the rateLimit code basically says "insert a token. Increment a counter. Don't return until the GPU process has processed at least up to counter - 2." So, for example, 3 rateLimits in a row is a glFinish.
>>
>> Yeah, that's what I thought. I expected to see a big perf penalty on multiple canvas layers. But there isn't, either on my test cases or on Fireflies and Fishbowl, two recent demos which use a lot of canvas layers. I was rather surprised. I did another implementation with a single timer for all canvases, and there was no difference. I'm not sure why, though.
>
> Should we add some tracing around the place where the blocking happens, to make sure it functions as intended?

Done.
See also: http://code.google.com/p/chromium/issues/detail?id=84781
Created attachment 111621 [details] Patch
Comment on attachment 111621 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=111621&action=review

> Source/WebCore/platform/graphics/chromium/Canvas2DLayerChromium.cpp:103
> +    Extensions3DChromium* extensions = static_cast<Extensions3DChromium*>(m_context->getExtensions());

So what does the trace look like if you have multiple canvases on a single page? Does throttling always kick in?
(In reply to comment #10)
> So what does the trace look like if you have multiple canvases on a single page? Does throttling always kick in?

Yes, the throttling works: in my test case and on Fireflies startup, Canvas2DLayerChromium::rateLimitContext() shows up fairly high in the trace. I've attached that test case here too, since it shows pretty clearly how bad things are without this patch.
Created attachment 111766 [details] Test case w/16 animated 1Kx1K canvases and a short setTimeout()
Ping? James, Vangelis, Nat: further comments/suggestions?
Comment on attachment 111621 [details]
Patch

The case where this patch doesn't work, I think, is where you had touched multiple canvases in different setTimeouts.

The rate limiter is here, btw: gpu/command_buffer/client/gles2_implementation.cc

When you touch N canvases inside one setTimeout, we post N rate-limiting tasks. Effectively, N canvases look like this to the command buffer client:

for (i = 0 .. N) {
    if (i > 2)
        waitToken(i - 2);
    setToken(i);
}

But, if you managed to squeeze multiple setTimeouts into one frame, e.g. by chaining setTimeouts together and drawing to a different canvas each time, then you'd get this on the client side:

for (i = 0 .. N) {
    if (i > 2)
        waitToken(i - 2);
    setToken(i);
    drawing commands for canvas i;
}

Code that would do that would be:

function renderAll() {
    var i = 0;
    for (i = 0 .. N) {
        if (i > 2)
            waitToken(i - 2);
        setToken(i);
    }
}

For that case, you'd get a performance penalty. So, the question is, does the fact that we over-penalize multiple setTimeouts matter?
The case that I think would overthrottle would be this:

function fillAllCanvases(canvases) {
    var i = 0;
    function drawOne() {
        canvases[i].getContext('2d').fillRect(0, 0, 100, 100);
        if (++i < canvases.length)
            setTimeout(drawOne, 0);
    }
    setTimeout(drawOne, 0);
}
(In reply to comment #14)
> (From update of attachment 111621 [details])
> The case where this patch doesn't work, I think, is where you had touched multiple canvases in different setTimeouts.
>
> The rate limiter is here, btw: gpu/command_buffer/client/gles2_implementation.cc
>
> When you touch N canvases inside one setTimeout, we post N rate-limiting tasks. Effectively, N canvases look like this to the command buffer client:
> for (i = 0 .. N) {
>     if (i > 2)
>         waitToken(i - 2);
>     setToken(i);
> }
>
> But, if you managed to squeeze multiple setTimeouts into one frame, e.g. by chaining setTimeouts together and drawing to a different canvas each time, then you'd get this on the client side: [elided]

That's a very good point, and explains why I wasn't running into this problem before: all the examples I looked at had the drawing code all running in the same JS timer callback. I'll try to work up a test case with multiple setTimeouts to see how it looks with/without this patch.
Created attachment 111830 [details] Test case w/16 animated 1Kx1K canvases drawn on separate timeouts
My JS-fu is pretty weak, but I've given it a shot (see attachment). It runs at ~35 FPS with this patch and ~1 FPS without it, basically identical to the single-timeout test case.

One strange thing: both with and without this patch, there's a ~2 sec startup while each canvas seems to allocate and do its first draw. I didn't see that in the other test case. It's entirely possible I've done something stupid in the JS, though. I'm going to try some variants with differing timeouts per canvas.
Just guessing, but I'd be curious about the performance of a 3-canvas case where the drawing in each canvas is actually time-consuming. That's where things go bad --- here, since there's very little GPU-process-side work to be done per canvas, a finish is of marginal impact.
(In reply to comment #19)
> Just guessing, but I'd be curious about the performance of a 3-canvas case where the drawing in each canvas is actually time-consuming. That's where things go bad --- here, since there's very little GPU-process-side work to be done per canvas, a finish is of marginal impact.

OK, I wrote up a 3-canvas test case with separate timers and a heavier GPU load. I'll attach it here. However, it doesn't seem to perform any worse with this patch than without it. I agree that in theory there should be some cases where having multiple rate-limiting callbacks on the same context would be problematic, but I can't seem to create one in practice. Since this patch addresses known issues with known test cases, could we land it as-is, and address any future problems in another patch?
Created attachment 112236 [details] Test case w/3 canvases, separate timers and a heavy GPU load
It's really your call.

I have substantial respect for all the work you've put in showing that the concern isn't important in practice. From a "land it" point of view, I'm fine.

But couldn't the fix just be to move the throttling timer up to CCLayerTreeHost? Is that not the case?
(In reply to comment #22)
> It's really your call.
>
> I have substantial respect for all the work you've put in showing that the concern isn't important in practice. From a "land it" point of view, I'm fine.
>
> But couldn't the fix just be to move the throttling timer up to CCLayerTreeHost? Is that not the case?

Yes, and the context to throttle. It's definitely doable; I just thought it was somewhat canvas-specific, since canvas is the only case that re-uses a single context for multiple layers, so baking that into CCLayerTreeHost felt weird to me, and it seemed to work fine hidden in Canvas2DLayerChromium. But let me give it a try to see how it looks.
Created attachment 112343 [details] new approach using refactored RateLimiter class
Be careful what you wish for... :)

The most recent patch refactors the rate limiting from WebGL into its own class (RateLimiter). This class is used internally by CCLayerTreeHost::startRateLimiter(ctx), which is called by both Canvas2DLayerChromium and WebGLLayerChromium. Canvas2DLayerChromium will always pass the same context; WebGL's will be different per layer. The context is hashed to find the appropriate RateLimiter, so canvas will only rate limit once for all canvas elements.

As threatened, I also refactored the WebGL layer invalidation (formerly WebGLLayerChromium::setTextureUpdated()) into the common path CanvasLayerChromium::contentChanged(), called from GraphicsLayerChromium::setContentsNeedsDisplay(). As a bonus, this helps decouple WebGLLayerChromium from GraphicsContext3D a bit (something twiz is grappling with in his DrawingBuffer refactor).

Note that RateLimiters are not currently deleted until the CCLayerTreeHost is freed. I could delete them after the timer fires, but that would thrash memory a bit more (once per frame * number of contexts) and would also require checking for the extension every frame. It does mean they may live longer than their context (they will just no longer be used).
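As a rough illustration of the bookkeeping described above, here is a standalone model of the per-context lookup, using std containers in place of WebKit's HashMap and RefPtr. Method names mirror the discussion (startRateLimiter/stopRateLimiter), but the bodies are assumptions for the sketch, not the patch itself.

// Standalone model of CCLayerTreeHost keying one RateLimiter per
// GraphicsContext3D. std::unordered_map/unique_ptr stand in for WebKit's
// HashMap/RefPtr; the actual timer scheduling is elided.
#include <memory>
#include <unordered_map>

struct GraphicsContext3D; // opaque in this sketch

class RateLimiter {
public:
    explicit RateLimiter(GraphicsContext3D* context) : m_context(context) {}
    void start() { /* arm a zero-delay timer that calls the rate-limit extension */ }
    void stop() { /* cancel the pending timer, if any */ }

private:
    GraphicsContext3D* m_context;
};

class CCLayerTreeHostModel {
public:
    void startRateLimiter(GraphicsContext3D* context) {
        std::unique_ptr<RateLimiter>& limiter = m_rateLimiters[context];
        if (!limiter)
            limiter = std::unique_ptr<RateLimiter>(new RateLimiter(context));
        // N canvas layers sharing one context hit the same entry, so the
        // context is only rate limited once.
        limiter->start();
    }
    void stopRateLimiter(GraphicsContext3D* context) {
        auto it = m_rateLimiters.find(context);
        if (it != m_rateLimiters.end())
            it->second->stop();
    }

private:
    // Entries live until the host is destroyed, matching the lifetime
    // caveat noted above.
    std::unordered_map<GraphicsContext3D*, std::unique_ptr<RateLimiter>> m_rateLimiters;
};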
Comment on attachment 112343 [details]
new approach using refactored RateLimiter class

This is awesome! Wrt the leak, maybe the right way to do this is to put the rate limiter on the WebGL layer, but put the rate limiter class on CCLTH? E.g.:

CanvasLayer::contentsChanged() {
    layerTreeHost()->getCanvasRateLimiter()->start();
}
Comment on attachment 112343 [details]
new approach using refactored RateLimiter class

View in context: https://bugs.webkit.org/attachment.cgi?id=112343&action=review

> Source/WebCore/platform/graphics/chromium/RateLimiter.h:53
> +    GraphicsContext3D* m_context;

if this object doesn't take a reference to the GraphicsContext3D, then what assurance do you have that this GC3D still exists when the timer fires? Couldn't it be GC'd by this point if it was a WebGL context?

> Source/WebCore/platform/graphics/chromium/cc/CCLayerTreeHost.cpp:436
> +    ASSERT(context->getExtensions());

not sure why this ASSERT() is here
(In reply to comment #27)
> (From update of attachment 112343 [details])
> This is awesome! Wrt the leak, maybe the right way to do this is to put the rate limiter on the WebGL layer, but put the rate limiter class on CCLTH? E.g.:
> CanvasLayer::contentsChanged() {
>     layerTreeHost()->getCanvasRateLimiter()->start()

Interesting. I'm not sure what the ownership semantics would be in that case. It'd be tempting to just have the layers ref a RateLimiter vended by the CCLTH, but since the CCLTH needs to do the hashing, it sort of needs to do the owning, too. Another option would be to hook GraphicsContext3D destruction somehow, and trigger destruction of the associated RateLimiter as well.
Comment on attachment 112343 [details]
new approach using refactored RateLimiter class

View in context: https://bugs.webkit.org/attachment.cgi?id=112343&action=review

>> Source/WebCore/platform/graphics/chromium/RateLimiter.h:53
>> +    GraphicsContext3D* m_context;
>
> if this object doesn't take a reference to the GraphicsContext3D, then what assurance do you have that this GC3D still exists when the timer fires? Couldn't it be GC'd by this point if it was a WebGL context?

None, and yes. This patch is a work-in-progress. :) I tried a few ways of breaking it, but couldn't. It might be that the short fuse (0 sec) of the timer prevents this, since a GC cycle can't sneak in, but I'd rather not count on that.

I suppose another way to do this would be to have the GraphicsContext3D own the rate-limiting timer. Then it would be certain never to outlive the GC3D, and we wouldn't need the hash either. Yet another way would be to find some place to hang a callback off V8-is-done-with-this-chunk-of-code, and use that instead of a timer.

>> Source/WebCore/platform/graphics/chromium/cc/CCLayerTreeHost.cpp:436
>> +    ASSERT(context->getExtensions());
>
> not sure why this ASSERT() is here

Hysterical reasons. Removed.
Created attachment 112367 [details] Patch
GC3D is refcounted so you could just grab a ref in the rate limiter, right?
Created attachment 112376 [details] new patch; added rate limiter cancelling
(In reply to comment #32)
> GC3D is refcounted so you could just grab a ref in the rate limiter, right?

I could, but I'm not a big fan of shared ownership semantics. Instead, I added code in WebGLLayerChromium::setContext() to stop any pending rate limiter calls on the old context. Let me know what you think.
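A sketch of what that cancellation might look like, reusing CCLayerTreeHostModel from the earlier sketch; the real WebGLLayerChromium::setContext() differs in detail, so treat the names and structure here as assumptions:

// Hedged sketch of stopping a pending rate limiter when the layer's context
// changes, so the timer can never fire against a destroyed context.
class WebGLLayerChromiumModel {
public:
    void setContext(GraphicsContext3D* context) {
        bool contextChanged = (m_context != context);
        if (m_host && contextChanged)
            m_host->stopRateLimiter(m_context); // cancel callbacks on the old context
        m_context = context;
    }

private:
    GraphicsContext3D* m_context = nullptr;
    CCLayerTreeHostModel* m_host = nullptr; // wired up by the tree host elsewhere
};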
Comment on attachment 112376 [details]
new patch; added rate limiter cancelling

Thanks, this approach makes me much happier.

Recapping an offline conversation: I asked about having a start/stopRateLimiting method on CCLTH and not having the RateLimiter class at all. However, it was pointed out to me that such an approach then has issues with deciding m_contextSupportsRateLimiting. We could make chrome-specific assumptions at that point, but to me it's a can of worms not worth pursuing, since this whole thing will change with the "new scheduler."

So, LGTM unofficially. I'm a bit too swamped to do a nit pass; can someone else jump in and make Stephen miserable? ;)
Comment on attachment 112376 [details]
new patch; added rate limiter cancelling

View in context: https://bugs.webkit.org/attachment.cgi?id=112376&action=review

R=me, usual assortment of nits

> Source/WebCore/platform/graphics/chromium/RateLimiter.cpp:8
> + *     * Redistributions of source code must retain the above copyright

should use 2-clause license for new files

> Source/WebCore/platform/graphics/chromium/WebGLLayerChromium.cpp:130
> +    if (layerTreeHost() && contextChanged)
> +        layerTreeHost()->stopRateLimiter(m_context);

do we call setContext(NULL) before destroying a WebGLLayerChromium?

> Source/WebCore/platform/graphics/chromium/cc/CCLayerTreeHost.cpp:438
> +    RateLimiterMap::iterator i = m_rateLimiters.find(context);

we normally use 'it' for iterators and reserve 'i' for loop counters

> Source/WebCore/platform/graphics/chromium/cc/CCLayerTreeHost.cpp:450
> +    RateLimiterMap::iterator i = m_rateLimiters.find(context);

same here - i -> it

> Source/WebCore/platform/graphics/chromium/cc/CCLayerTreeHost.h:53
> +typedef HashMap<GraphicsContext3D*, RefPtr<RateLimiter> > RateLimiterMap;

you could put this typedef inside the class next to the declaration so it's not in the WebCore namespace
Created attachment 112403 [details] fixed nits
Comment on attachment 112376 [details]
new patch; added rate limiter cancelling

View in context: https://bugs.webkit.org/attachment.cgi?id=112376&action=review

>> Source/WebCore/platform/graphics/chromium/RateLimiter.cpp:8
>> + *     * Redistributions of source code must retain the above copyright
>
> should use 2-clause license for new files

Done.

>> Source/WebCore/platform/graphics/chromium/WebGLLayerChromium.cpp:130
>> +        layerTreeHost()->stopRateLimiter(m_context);
>
> do we call setContext(NULL) before destroying a WebGLLayerChromium?

Dunno, but I put a call to stopRateLimiter() in the destructor, just in case.

>> Source/WebCore/platform/graphics/chromium/cc/CCLayerTreeHost.cpp:438
>> +    RateLimiterMap::iterator i = m_rateLimiters.find(context);
>
> we normally use 'it' for iterators and reserve 'i' for loop counters

Done.

>> Source/WebCore/platform/graphics/chromium/cc/CCLayerTreeHost.cpp:450
>> +    RateLimiterMap::iterator i = m_rateLimiters.find(context);
>
> same here - i -> it

Done.
Committed r98471: <http://trac.webkit.org/changeset/98471>