This first patch is for review only. It implements two things: (1) a scissor rect optimization: paint rects and layer changes are tracked so that the renderer can compute a rect that contains any changes that were made since the previous redraw. This rect can be used to scissor the rendering passes. Eventually it can also be used to copy sub-buffers in the GPU process. (2) a visualization of paint rects, draw rects, exposed rects, and the compositor scissor rect, on the CCHeadsUpDisplay. This should probably be submitted as a separate patch, when things are ready to commit. For now, if you want to try running the patch, you can hard-code the 3 options to true/false as desired. Reviewers please note: - most of the "TODO(shawnsingh)" comments are actually questions for reviewers. - please suggest how we might need to pipe this scissor rect to the GPU process - please suggest how we want to enable/disable it (e.g., with a command-line flag? where will the setting come from?) About testing: we will need rigorous unit tests, and possibly layout tests. This code has been informally tested on: - youtube html5 videos - khronos webgl+css demo - khronos webgl shiny teapot demo - poster circle - briefly tested canvas and layer masks - testing requires manually selecting text, performing mouse-overs and clicks. Known issues: - https://bugs.webkit.org/show_bug.cgi?id=67009 - youtube html5 seems to produce arbitrary large paint rects. Most of these weird rects are correlated to small popups associated with buttons.
Created attachment 105864 [details] Patch
Comment on attachment 105864 [details] Patch Attachment 105864 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/9566954
(In reply to comment #2) > (From update of attachment 105864 [details]) > Attachment 105864 [details] did not pass chromium-ews (chromium-xvfb): > Output: http://queues.webkit.org/results/9566954 The code is not expected to compile because of command-line options not being integrated with chromium, and I'll remove that OutputDebugStringA junk. Could some of you still please review this, anyway?
Comment on attachment 105864 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=105864&action=review > Source/WebCore/platform/graphics/chromium/LayerChromium.cpp:118 > +void LayerChromium::setNeedsCommitBecausePropertyChanged(bool layerHasMoved) Actually, I think we can get rid of this boolean switch. The only difference was whether we need to draw both the "old" and "new" locations of the layer because it moved, or just the existing location of the layer. In the latter case, the "old" and "new" locations are the same. So, it seems better to avoid this error-prone boolean flag. > Source/WebCore/platform/graphics/chromium/cc/CCLayerImpl.h:81 > + const FloatRect& dirtyRect() { return m_dirtyRect; } Whoops, I intended to rename this accessor and member variable as "repaintRect" instead of "dirtyRect". Also need to make the accessor const.
Comment on attachment 105864 [details] Patch Definitely pull this apart into more and smaller patches. There shouldn't be any visualization in this patch, just the logic to compute changed-ness. For the enable flag, you can put a bool on CCSettings, initialize it to false in the constructor, then for your testing purposes set it to true. We can figure out how to turn it on officially later. It feels to me that the "setNeedsCommitBecauseXXX" approach is quite fragile. It seems to me that there is a fair amount of fragility pushed into the layer system with this approach, as each layer has to very-carefully specify what kind of change is happening. Did you give any thought to doing this as a preprocessing step on a QuadList? E.g. make a modified version drawLayers that walks the tree but instead of drawing, for every drawCall, puts a quad into a QuadBuffer: struct DrawQuadInfo {FloatQuad screenspaceQuad (with depth), bool textureChanged, FloatQuad changedRectInsideTextureProjectedToScreenspace} Then, before drawlayers, build this quadlist. Remember the last frame's quadlist as well. Walk the list, if any quads have moved or their textures changed, use that to derive screenspace scissor rect. Then set the scissor and run the drawLayers. Webkit todos are ≈FIXME. E.g. // FIXME: blahblahblah. TODO(username) is chrome style. If you were going to stick with the setNeedsCommitBecause approach, maybe use an enum?
Sure, I'll remove the visualization code from this patch. > It feels to me that the "setNeedsCommitBecauseXXX" approach is quite fragile. It seems to me that there is a fair amount of fragility pushed into the layer system with this approach, as each layer has to very-carefully specify what kind of change is happening. After taking a step back yesterday, I realized the boolean flag is completely unnecessary, and reverting that significantly reduces the amount of changes in the LayerChromium. This includes removing the "setNeedsCommitBecause" naming stuff. > > Did you give any thought to doing this as a preprocessing step on a QuadList? > > E.g. make a modified version drawLayers that walks the tree but instead of drawing, for every drawCall, puts a quad into a QuadBuffer: > struct DrawQuadInfo {FloatQuad screenspaceQuad (with depth), bool textureChanged, FloatQuad changedRectInsideTextureProjectedToScreenspace} > > Then, before drawlayers, build this quadlist. Remember the last frame's quadlist as well. Walk the list, if any quads have moved or their textures changed, use that to derive screenspace scissor rect. Then set the scissor and run the drawLayers. > I think this is analogous to what the patch is already doing. If we follow your approach, we would need to add scissor rects and transforms to this QuadBuffer structure, because that is the actual information we need to know if the layer's quad has changed. Eventually then, it becomes two loops, like you said: (1) to determine what goes into the quad buffer, (2) to actually draw things in the quad buffer. In this patch, those two loops already exist: (1) LayerRendererChromium::computeCompositorScissorRect walks through all layers to determine what their actual draw rect will be. (this is not quite true in my current patch, but it is exactly poised for that role, e.g. computeAllScissorRects) and (2) the existing original nested loop in drawLayersInternal, which actually issues the drawQuad commands. Once we remove the redundancy from this loop, it does become essentially a quick walk through all layers to determine what gets drawn, and should not be any less efficient than walking a separate quad buffer. It seems to me that if we take both approaches through to completion, after cleaning away unnecessary quad buffer storage in your approach and redundant code in my approach, we'd end up with essentially the same solution. Thoughts?
Just to clarify... "Once we remove the redundancy from this loop," ...I'm referring to the redundancy between loops (1) and (2), since they are almost exactly the same in my patch. Most of those conditionals are probably more appropriate in loop (1), and loop (2) can do a quick check to see if the layer was "marked to be drawn", and only culls layers if userRenderSurface() does not work.
One last thing for now... catching things at the drawQuad level may allow us to hook into quads associated with individual tiles - that was another good point Nat had. But the way I see it: (1) The primary purpose of this patch is to compute the scissor rect, and there is no reason that computing the scissor needs to be tiling-aware. (2) The logic for deciding whether a layer (or tile) gets drawn, I'm actually not changing that logic. I might end up re-arranging the code, but not changing the logic. If tiling has other clever ways to cull tiles from being drawn, I think those are independent of this patch, and those optimizations seem to mostly already exist. Thoughts?
Created attachment 112600 [details] pseudcode of the final scissor approach
Attachment 112600 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCor..." exit_code: 1 Source/WebCore/platform/graphics/chromium/LayerRendererChromium.cpp:308: Should have only a single space after a punctuation in a comment. [whitespace/comments] [5] Source/WebCore/platform/graphics/chromium/LayerRendererChromium.cpp:309: Should have only a single space after a punctuation in a comment. [whitespace/comments] [5] Source/WebCore/platform/graphics/chromium/LayerRendererChromium.cpp:314: Should have only a single space after a punctuation in a comment. [whitespace/comments] [5] Source/WebCore/platform/graphics/chromium/LayerRendererChromium.cpp:339: This { should be at the end of the previous line [whitespace/braces] [4] Source/WebCore/platform/graphics/chromium/LayerRendererChromium.cpp:366: One line control clauses should not use braces. [whitespace/braces] [4] Total errors found: 5 in 2 files If any of these errors are false positives, please file a bug against check-webkit-style.
Comment on attachment 112600 [details] pseudcode of the final scissor approach Attachment 112600 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/10223120
Created attachment 112958 [details] please review, but also check comments about this version
Hi all, The most recent patch is a working version of scissor. Notes for reviewers: (1) the visualization code will not be in the final patch. you can ignore changes in CCHeadsUpDisplay. The most important code is in computeCompositorScissorRect(). (2) there are several FIXME comments, I believe all of them will disappear in the final version. You can read the comments to understand the issue. (3) The following edge cases (really, no pun intended) have not been fully addressed/tested yet. They may involve minor changes to the code, or they will require separate bugs because other code is doing something suboptimal. - non composited content layer - scrollbar layers - plugin layers - masks - replicas - appropriate testing for the compositor scissor rect. If you want to run: change the useCompositorScissor flag in CCLayerTreeHost to "true". optionally, change the showCompositorRects flag to "true", and ask me what the colors mean. With scissoring, you will see the "blue background" for any regions of the image that did not change on that particular frame. Enjoy!
Comment on attachment 112958 [details] please review, but also check comments about this version This is not ready for WebKit review, please don't set review?. This puts the patch in a global queue that all WebKit reviewers look at for patches to review.
(In reply to comment #14) > (From update of attachment 112958 [details]) > This is not ready for WebKit review, please don't set review?. This puts the patch in a global queue that all WebKit reviewers look at for patches to review. OK, my mistake. webkit-patch upload does review=? by default. There are two reasons I submitted it: (1) for a few people who wanted to actually run it (2) for vangelis, enne, or nduca, or jamesr to please give a look and give feedback, to reduce the number of iterations and delay of this patch. Thanks in advance, I would appreciate it.
Created attachment 113934 [details] latest working implementation, still needs tests
Comment on attachment 113934 [details] latest working implementation, still needs tests View in context: https://bugs.webkit.org/attachment.cgi?id=113934&action=review > Source/WebCore/platform/graphics/chromium/LayerRendererChromium.cpp:-586 > - setScissorToRect(layer->clipRect()); Lets put this in drawLayersToRenderSurface > Source/WebCore/platform/graphics/chromium/LayerRendererChromium.cpp:671 > + layer->resetLayerPropertyChanged(); awkward name to me, but I'm new to this code > Source/WebCore/platform/graphics/chromium/cc/CCRenderSurface.h:102 > + const FloatRect& scissorRect() const { return m_scissorEngine.currentScissorRect(); } Why pass through here? It seems to me that you should have a scissorEngine() accessor and someone who wants to manipulate the scissor engine would go directly to it. > Source/WebCore/platform/graphics/chromium/cc/CCScissorEngine.cpp:69 > + renderSurface->updateScissorRect(); Very strange that this leaves control of the class in order to come back to it. :( > Source/WebCore/platform/graphics/chromium/cc/CCScissorEngine.cpp:73 > +void CCScissorEngine::computeNextScissor(CCRenderSurface* targetSurface) Explain the algorithm here in a comment. Like calcDrawEtc does. > Source/WebCore/platform/graphics/chromium/cc/CCScissorEngine.h:38 > +// Computes the region where pixels have actually changed on a surface. This region is used rendersurface? > Source/WebCore/platform/graphics/chromium/cc/CCScissorEngine.h:42 > + CCScissorEngine(); Might have to follow the explicit constructor pattern --- c.f. CCLayerTreeHost or LayerTextureUpdater constructor. > Source/WebCore/platform/graphics/chromium/cc/CCScissorEngine.h:44 > + static void computeScissorRectForAllSurfaces(const Vector<RefPtr<CCLayerImpl> >& renderSurfaceLayerList); why is this static? > Source/WebCore/platform/graphics/chromium/cc/CCScissorEngine.h:46 > + void computeNextScissor(CCRenderSurface*); Its a little awkward to me that this stashes the scissor onto the class. Why not just return it to the person calling? FloatRect computeScissor(CCRenderSurface*); Ideally, you'd separate the logic of "computing the scissor" from 'updating the object for the next frame'. E.g. m_context->setScissor(m_renderSurface->scissorEngine()->computeScissor()); .. do stuff... m_renderSurface()->scissorEngine()->didDraw(); That has better testing properties. > Source/WebCore/platform/graphics/chromium/cc/CCScissorEngine.h:55 > + // these helper functions are used only within computeScissorForNewLayers. these->These > Source/WebKit/chromium/tests/CCScissorEngineTest.cpp:37 > + // testing update rect regions Stating the obvious, each of these should be a separate test. Looking forward to seeing the tests be written.
Comment on attachment 113934 [details] latest working implementation, still needs tests Thanks for the feedback! Only one question: View in context: https://bugs.webkit.org/attachment.cgi?id=113934&action=review >> Source/WebCore/platform/graphics/chromium/LayerRendererChromium.cpp:-586 > > Lets put this in drawLayersToRenderSurface The scissor changes for every layer individually, depending on the layer's clipRect. Putting it in drawLayersToRenderSurface will still be on that inner for-loop where we draw each layer. Do you still think we should do it? >> Source/WebCore/platform/graphics/chromium/LayerRendererChromium.cpp:671 > > awkward name to me, but I'm new to this code Actually this is a separate bugfix. I'll submit it as a separate trivial patch.
> The scissor changes for every layer individually, depending on the layer's clipRect. Putting it in drawLayersToRenderSurface will still be on that inner for-loop where we draw each layer. Do you still think we should do it? Yes. That was the architecture we agreed on in the brainstorming session, remember? You'll of course have to pass state up to the parent renderlayers but I think that complexity is ok given the clarity improvements it causes elsewhere.
Almost reaching the end of this scissoring journey. To do this cleanly, I have (yet again) separated this beast into smaller patches. With any luck this is the final roadmap to scissoring. The first two are already implemented, I just need to make them patch-presentable. - track property changes in render surfaces (similar to what we already did with layers) https://bugs.webkit.org/show_bug.cgi?id=72521 - the real "guts" of the scissoring, computing damaged regions on surfaces https://bugs.webkit.org/show_bug.cgi?id=72520 - this bug will bridge Jonathan's plumbing from https://bugs.webkit.org/show_bug.cgi?id=72028 with the damage tracking, to perform the actual scissoring. - After that, continuing to fix bugs as they arise.
Created attachment 117094 [details] Works pretty well, reviewers please see comments
(In reply to comment #21) > Created an attachment (id=117094) [details] > Works pretty well, reviewers please see comments Hi all, the latest patch is working pretty well. Here are the important points to know: (1) if you want to test this patch, your hardware needs to support partial swaps. As far as I know, chromium will support partial swaps on OS X or chromium OS. Jonathan Backer's OS X patch to support partial swaps can be found here: http://codereview.chromium.org/8726046/ (2) I'm requesting this patch (or some nearby revision) to be landed, even though all layout tests do not pass. The feature is disabled by default, and when it is enabled it doesn't explode, only a small portion of layout tests are failing. I would prefer to debug those in separate follow-up patches. (3) Antoine, or anyone interested: can someone please test this on chromium OS and verify that we actually do get the performance improvements we are hoping for?
(In reply to comment #22) > (In reply to comment #21) > > Created an attachment (id=117094) [details] [details] > > Works pretty well, reviewers please see comments > > Hi all, the latest patch is working pretty well. Here are the important points to know: > > (1) if you want to test this patch, your hardware needs to support partial swaps. As far as I know, chromium will support partial swaps on OS X or chromium OS. Jonathan Backer's OS X patch to support partial swaps can be found here: http://codereview.chromium.org/8726046/ > > (2) I'm requesting this patch (or some nearby revision) to be landed, even though all layout tests do not pass. The feature is disabled by default, and when it is enabled it doesn't explode, only a small portion of layout tests are failing. I would prefer to debug those in separate follow-up patches. > > (3) Antoine, or anyone interested: can someone please test this on chromium OS and verify that we actually do get the performance improvements we are hoping for? By the way, damage tracking does not include the HUD. Therefore, the "frame rate" option that shows on the top-left of the viewport will look like it doesn't update properly - because we aren't copying those pixels =)
Comment on attachment 117094 [details] Works pretty well, reviewers please see comments View in context: https://bugs.webkit.org/attachment.cgi?id=117094&action=review > Source/WebCore/platform/graphics/chromium/LayerRendererChromium.cpp:119 > + layerOriginTransform.translate3d(-0.5 * renderSurfaceLayer->bounds().width(), -0.5 * renderSurfaceLayer->bounds().height(), 0); just using translate() here will save you a multiplication for each component
This is awesome. I'm excited. Found a simple test case where things break (tested on OSX): http://www.webkit.org/blog-files/3d-transforms/poster-circle.html where the window is too short to fit the animation. I did some testing on OSX using the patch referred to above. To be fair, this is only for the 10.5 path which does a glReadPixels readback through SHM, so scissoring should be a big win here. What I saw is that when the partial update is quite small compared to the window size scissoring helps considerably: - launched chrome with --disable-gpu-vsync - made the window fullscreen - went to http://learningwebgl.com/lessons/lesson03/index.html - looked at the time to produce 10 frames in the GPU process (clearly SHM readback bound) using chrome://tracing to average out the variance - time with scissoring was 14.7 ms per frame - time without scissoring was 26.5 ms per frame When the window was smaller (the update region was comparable to the full frame) there is no difference within error --- as expected.
@backer: What GPU?
@nduca: Radeon HD 5770 in my mac.
(In reply to comment #27) > @nduca: Radeon HD 5770 in my mac. Awesome! Great work, everyone. :)
Did some testing on Aura with a chromeos device (Newton). Same test as with OSX similar results. Obviously fill bound: - 22 ms per frame with scissoring on full screen window (pushing 30" display) - 47 ms per frame without scissoring Again, when the window is the size of the update rect no difference (as expected). This was using CompositorGL. I'm having a few issues that I need to resolve with CompositorCC --- but once we have scissoring in the browser side display, I expect bigger wins.
This should produce a (possibly small) performance win even without the partial swap support by setting a tighter scissor on render surface clears, right?
(In reply to comment #30) > This should produce a (possibly small) performance win even without the partial swap support by setting a tighter scissor on render surface clears, right? It should be just a few additional lines of code, so I'll give it a try. I would like to push very very hard that this patch lands by Friday, so that its in for m17. I'm working on bug fixes and will submit a new patch ASAP. Already crushed two bugs with one fix, and Jonathan Backer is working on a patch to add support for damage from plugin layers.
We won't be enabling this for users by default m17, so there's no benefit to trying to squeeze it into the merge window. Don't stress out about it. When it's done and ready to enable for everyone we can see what milestone we are at. Before then just worry about trunk.
Created attachment 117382 [details] Only one remaining bug, probably in readback for layout tests
(In reply to comment #33) > Created an attachment (id=117382) [details] > Only one remaining bug, probably in readback for layout tests As best as I can, I verified that there are no regressions across all layout tests (I ran about 23k of them, before and after the patch). There are about 15-20 failing tests, but I verified manually that they all work correctly. The problem with those tests is in DumpRenderTree trying to capture the image. The image that gets captured is completely blue (the clearColor). Additionally, printfs suggest that the readback is causing an additional frame to be rendered, usually with no damage. So it seems like there is some problem with readback in relation to damage tracking. If someone can help me squash this tomorrow, that would be awesome. By myself I probably cannot do it in one day. Realistically, even though this patch has no regressions, I still feel like there will be a few bugs that come up. I'll leave it to Antoine and James (and anyone else with an opinion) to discuss whether to turn this on by default. Finally, James, apologies I did not yet add the partial clearing. I'll try to add it tomorrow for the final version of the patch.
Created attachment 117587 [details] All known issues are resolved
(In reply to comment #35) > Created an attachment (id=117587) [details] > All known issues are resolved Status: - fixed the scissor-outside-viewport problem - fixed a tricky perspective projection problem - also scissored the clear() calls (except for the root surface clear(), which we agreed to keep just for now) - auto-disable damage tracking when partial swap is not supported Because of that last feature, layout tests will work the same with/without the feature enabled, because DumpRenderTree does not support partial swap. All issues that I am aware of have been solved, but I think it needs more time to bake, so the patch has it disabled by default. James can you please review it? Thanks in advance.
Comment on attachment 117587 [details] All known issues are resolved You should switch from most of the code using CCSettings::useDamageTracker to a LayerRendererCapaibilities::contextSupportsPartialSwap. That should be false if the CCSetting is false and true if and only if the actual underlying context supports swaps.
Addressing comment 36: I'd really like to scissor the root layer clear. It did some testing on Intel graphics based ChromeOS device running Aura. When dragging around a smaller window (which is drawn with WK compositor): - 16 ms per frame without scissoring root clear - 10 ms per frame with scissoring root clear I expect the same sort of wins for Pepper flash. It was just a 2 line change in LRC. Is there a reason that this is hard to do properly?
(In reply to comment #38) > Addressing comment 36: I'd really like to scissor the root layer clear. It did some testing on Intel graphics based ChromeOS device running Aura. When dragging around a smaller window (which is drawn with WK compositor): > - 16 ms per frame without scissoring root clear > - 10 ms per frame with scissoring root clear > > I expect the same sort of wins for Pepper flash. It was just a 2 line change in LRC. Is there a reason that this is hard to do properly? There should be no limtations. I had also discussed this with James and Vangelis. For the moment we're leaving it there because it is helpful for debugging (for example, there's a scrollbar bug that shows blue because it doesn't draw itself), but eventually we should be able to remove that clear() entirely - better than just scissoring it!
> ... wrt scissoring clear... > For the moment we're leaving it there because it is helpful for debugging (for example, there's a scrollbar bug that shows blue because it doesn't draw itself), #ifdef DEBUG then
Created attachment 117676 [details] feature now toggled by capability flag, root clear is also scissored Discussed offline with Nat, as I understand we agreed to not put the #ifdef NDEBUG yet.
Comment on attachment 117676 [details] feature now toggled by capability flag, root clear is also scissored View in context: https://bugs.webkit.org/attachment.cgi?id=117676&action=review > Source/WebCore/ChangeLog:31 > +2011-12-02 Shawn Singh <shawnsingh@chromium.org> > + > + HOORAY IT IS READY TO SUBMIT > + Duplicate entry.
Comment on attachment 117676 [details] feature now toggled by capability flag, root clear is also scissored View in context: https://bugs.webkit.org/attachment.cgi?id=117676&action=review >> Source/WebCore/ChangeLog:31 >> + > > Duplicate entry. whoops. a leak of my local git changes. Will re-submit a fixed patch right now =)
Created attachment 117702 [details] removed duplicate changelog entry
Created attachment 117704 [details] resolved merge conflict on tip of tree
Comment on attachment 117704 [details] resolved merge conflict on tip of tree Great - R=me
Comment on attachment 117704 [details] resolved merge conflict on tip of tree Clearing flags on attachment: 117704 Committed r101908: <http://trac.webkit.org/changeset/101908>
All reviewed patches have been landed. Closing bug.