Bug 74037

Summary: [chromium] Remove assumption that empty surface is always at end of list.
Product: WebKit Reporter: Shawn Singh <shawnsingh>
Component: Layout and RenderingAssignee: Shawn Singh <shawnsingh>
Status: RESOLVED FIXED    
Severity: Normal CC: cc-bugs, enne, jamesr, webkit.review.bot
Priority: P1    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Patch
none
added test case that reproduces the crash none

Description Shawn Singh 2011-12-07 16:39:15 PST
This patch (coming in just a moment) hopefully fixes some crashes associated with http://code.google.com/p/chromium/issues/detail?id=106734 . It probably does not fix all causes of similar crashes, please see issue 106734 for details.

No tests are added yet, just to get the patch uploaded.  If people want this fix to actually land, then we can take some time to add a test for this.
Comment 1 Shawn Singh 2011-12-07 16:56:05 PST
Created attachment 118289 [details]
Patch
Comment 2 Shawn Singh 2011-12-07 16:57:49 PST
After checking on OSX, this does not fix the problem.  Please see the chromium issue 106734 for discussion.
Comment 3 Shawn Singh 2011-12-08 18:13:54 PST
Created attachment 118505 [details]
Patch

Please refer to chromium issue 106734 for continued discussion, comment #14 refers to this particular patch
Comment 4 Vangelis Kokkevis 2011-12-08 23:12:47 PST
Comment on attachment 118505 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=118505&action=review

> Source/WebCore/platform/graphics/chromium/cc/CCLayerTreeHostCommon.cpp:444
> +            while (renderSurfaceLayerList.last() != layer) {

I'm not sure I follow.  If layer->renderSurface() != NULL then the layer being processed has an associated RS and it was just added to the renderSurfaceLayerList so it must be the last element. How can we get here with the list having more elements?   
Also, in general, when layer A follows Layer B in the renderSurfaceLayerList, it doesn't necessarily mean than A is a descendant of B.  They could be siblings or even in unrelated parts of the tree.
Comment 5 Shawn Singh 2011-12-08 23:47:21 PST
Comment on attachment 118505 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=118505&action=review

>> Source/WebCore/platform/graphics/chromium/cc/CCLayerTreeHostCommon.cpp:444
>> +            while (renderSurfaceLayerList.last() != layer) {
> 
> I'm not sure I follow.  If layer->renderSurface() != NULL then the layer being processed has an associated RS and it was just added to the renderSurfaceLayerList so it must be the last element. How can we get here with the list having more elements?   
> Also, in general, when layer A follows Layer B in the renderSurfaceLayerList, it doesn't necessarily mean than A is a descendant of B.  They could be siblings or even in unrelated parts of the tree.

This portion of the code happens after the recursion.  That means that additional surfaces that may get created are completely within the subtree and eventually contribute to layer->renderSurface().

Enne and I discussed offline - we agreed that in theory, additional surfaces should not be created -- if the code were correct.  We're pretty sure the root of this is https://bugs.webkit.org/show_bug.cgi?id=74147 -- by not transforming the clipRect, it makes descendant surfaces think that they are not clipped.  But we also agreed that this current proposed patch is a safer fix for merging to m17.

Does this sound reasonable to you?
Comment 6 Shawn Singh 2011-12-08 23:49:08 PST
Correction - "pretty sure" sounds more confident than we actually are.  We're somewhat sure that's the root of the problem, and when we have time we'll create a reduced test case first, to verify.
Comment 7 James Robinson 2011-12-09 18:44:53 PST
This is deep magick. I know this is for a crash fix but I'd like to see Enne or Vangelis give unambiguous positive feedback, and ideally have a test case, before putting this one in. It's very easy to spend a lot of time running in circles with bugs like this so I'd like for at least two people to have a deep understanding of what's going on here and why the fix works so we don't end up just chasing issue behind this one for weeks.
Comment 8 Shawn Singh 2011-12-10 20:44:52 PST
(In reply to comment #7)
> This is deep magick. I know this is for a crash fix but I'd like to see Enne or Vangelis give unambiguous positive feedback, and ideally have a test case, before putting this one in. It's very easy to spend a lot of time running in circles with bugs like this so I'd like for at least two people to have a deep understanding of what's going on here and why the fix works so we don't end up just chasing issue behind this one for weeks.

I'll definitely make new patch with a test case asap.  Looking forward to hearing Vangelis's and Enne's feedback, too.

If it helps, I can give any verbose details about the debugging investigations I did.
Comment 9 Shawn Singh 2011-12-12 04:19:59 PST
Created attachment 118775 [details]
added test case that reproduces the crash
Comment 10 Shawn Singh 2011-12-12 04:32:01 PST
(In reply to comment #9)
> Created an attachment (id=118775) [details]
> added test case that reproduces the crash

Explanation of the crash:

The crash occurs somewhat indirectly from the cause.

1. When a layer creates its own renderSurface, that layer does not initialize its clipRect.  The clipRect remains whatever it most recently was from the previous frame.

2. Because of this, the layer may have an incorrect clipRect.  This layer and child layers will accidentally think that they should schedule themselves to be drawn, and this surface ultimately gets added to the renderSurfaceLayerList.

3. In combination with this, an ancestor surface knows that it should be clipped, and tries to remove itself from the renderSurfaceLayerList.  At this point, we had assumed this surface was the last item, but when Steps 1 and 2 occur, then this is not true, and we removed the wrong surface from the list.

4. The layer we intended to remove from renderSurfaceLayerList had already cleared its surface.  When we try to iterate over that empty list and access the size() parameter, it crashes.

This is what the unit test reproduces.

This fix is only half the solution.  I have an almost-working "correct" version, but I feel it is too risky to back-port that to m17.   By removing all subsequent layers (which we know are part of the subtree), I think we are keeping the intended correctness, since all those layers end up contributing to a surface that knows it should be clipped.  This seems safer to me for m17.
Comment 11 Adrienne Walker 2011-12-12 09:23:28 PST
Comment on attachment 118775 [details]
added test case that reproduces the crash

View in context: https://bugs.webkit.org/attachment.cgi?id=118775&action=review

Ah, we only modify the render surface layer's clip rect on line 420 after we've processed all the children.  Oops.

Excellent detective work and test case.  :)

> Source/WebCore/platform/graphics/chromium/cc/CCLayerTreeHostCommon.cpp:443
> +            // FIXME: Originally we asserted that this layer was already at the end of the
> +            //        list, and only needed to remove that layer. For now, we remove the
> +            //        entire subtree of surfaces to fix a crash bug. The root cause is
> +            //        https://bugs.webkit.org/show_bug.cgi?id=74147 and we should be able
> +            //        to put the original assert after fixing that.

Maybe for robustness we should just leave this in after the fact.  Maybe there's other edge cases we haven't caught or maybe somebody else will break this in the future.  I'm beginning to think just removing the last layer in the list was too clever of a solution.  All the other render surfaces that are in the list after this layer have to be children, so it should always be safe to remove them.
Comment 12 James Robinson 2011-12-12 10:23:15 PST
Comment on attachment 118775 [details]
added test case that reproduces the crash

Awesome test and explanation. R=me for this.
Comment 13 Shawn Singh 2011-12-12 10:49:33 PST
Comment on attachment 118775 [details]
added test case that reproduces the crash

Cool, thanks =)
Comment 14 WebKit Review Bot 2011-12-12 12:53:45 PST
Comment on attachment 118775 [details]
added test case that reproduces the crash

Clearing flags on attachment: 118775

Committed r102611: <http://trac.webkit.org/changeset/102611>
Comment 15 WebKit Review Bot 2011-12-12 12:53:49 PST
All reviewed patches have been landed.  Closing bug.