Bug 75046 - [chromium] CCLayerSorter accidentally reverses order of some layers, affecting z-index/preserves3D behavior
Summary: [chromium] CCLayerSorter accidentally reverses order of some layers, affectin...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Layout and Rendering (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Shawn Singh
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-12-21 15:00 PST by Shawn Singh
Modified: 2012-01-03 21:25 PST (History)
7 users (show)

See Also:


Attachments
includes fix and appropriate unit test (8.12 KB, patch)
2011-12-21 17:17 PST, Shawn Singh
no flags Details | Formatted Diff | Diff
updated to tip of tree, removed one inconsistent comment (8.09 KB, patch)
2011-12-21 18:31 PST, Shawn Singh
no flags Details | Formatted Diff | Diff
Sample page with 3d transformations having preserver-3d property. (1.55 KB, text/html)
2011-12-21 21:29 PST, Ravi Phaneendra Kasibhatla
no flags Details
Another sample page with preserve-3d property and 3d transformations (2.00 KB, text/html)
2011-12-21 21:33 PST, Ravi Phaneendra Kasibhatla
no flags Details
includes both bugfixes (9.42 KB, patch)
2012-01-03 14:34 PST, Shawn Singh
no flags Details | Formatted Diff | Diff
merged the existing exitThreshold and new nonZeroThreshold (9.46 KB, patch)
2012-01-03 15:07 PST, Shawn Singh
no flags Details | Formatted Diff | Diff
un-merged exitThreshold and nonZeroThreshold again (9.39 KB, patch)
2012-01-03 17:34 PST, Shawn Singh
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Shawn Singh 2011-12-21 15:00:32 PST
CCLayerSorter performs "topological sort", which enforces that nodes are ordered correctly only when there is a dependency between them.  Nodes that are not dependent on each other (explicitly or transitively) can be in any order.  In CCLayerSorter, when there are no explicit dependencies between two layers, we end up reversing the order of those layers (because we use removeLast() to get the next item efficiently).  But it turns out that ordering matters -- we should be preserving the ordering of layers that WebCore gave us when there is no other constraint on ordering of particular layers.

- The "theoretically correct" solution is to make these additional "preserve ordering" dependencies explicit in the topological sort.  However, this will mean that we always create O(n^2) edges for n nodes, which will hurt scalability of the implementation.

- The simpler solution, which I prefer, is simply to preserve the original ordering of the list of layers, when there are no explicit dependencies that require different ordering.  This can be done simply by using a FIFO queue instead of the current removeLast() stack-like behavior.

I'll proceed with the simpler solution unless someone finds a good reason to do the theoretically correct one =)

I already verified this will fix
  http://code.google.com/p/chromium/issues/detail?id=89955

It might also address
  http://code.google.com/p/chromium/issues/detail?id=99564
  http://code.google.com/p/chromium/issues/detail?id=102943
  http://code.google.com/p/chromium/issues/detail?id=102663
Comment 1 Shawn Singh 2011-12-21 17:17:02 PST
Created attachment 120246 [details]
includes fix and appropriate unit test
Comment 2 James Robinson 2011-12-21 17:19:16 PST
Does this fix the issues in the linked crbugs?
Comment 3 WebKit Review Bot 2011-12-21 17:20:26 PST
Attachment 120246 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/update-webkit']" exit_code: 9

Updating OpenSource
First, rewinding head to replay your work on top of it...
Applying: Inform the scrolling coordinator when scrollbar layers come and go
Using index info to reconstruct a base tree...
<stdin>:474806: trailing whitespace.
        [Chromium] DatabaseTrackerChromium: iterating DatabaseSet races with Database disposal on worker thread 
<stdin>:474827: trailing whitespace.
        Nothing to test, just removing redundant code. Correct behavior tested by 
<stdin>:475346: trailing whitespace.
    
warning: 3 lines add whitespace errors.
Falling back to patching base and 3-way merge...
warning: too many files (created: 167528 deleted: 3), skipping inexact rename detection
Auto-merging LayoutTests/ChangeLog
CONFLICT (content): Merge conflict in LayoutTests/ChangeLog
Auto-merging LayoutTests/platform/chromium/test_expectations.txt
CONFLICT (content): Merge conflict in LayoutTests/platform/chromium/test_expectations.txt
Auto-merging Source/WebCore/ChangeLog
CONFLICT (content): Merge conflict in Source/WebCore/ChangeLog
Auto-merging Source/WebCore/page/ScrollingCoordinator.h
CONFLICT (content): Merge conflict in Source/WebCore/page/ScrollingCoordinator.h
Auto-merging Source/WebCore/page/mac/ScrollingCoordinatorMac.mm
CONFLICT (content): Merge conflict in Source/WebCore/page/mac/ScrollingCoordinatorMac.mm
Auto-merging Source/WebCore/platform/ScrollView.cpp
Auto-merging Source/WebCore/rendering/RenderLayerCompositor.cpp
CONFLICT (content): Merge conflict in Source/WebCore/rendering/RenderLayerCompositor.cpp
Auto-merging Source/WebKit2/ChangeLog
CONFLICT (content): Merge conflict in Source/WebKit2/ChangeLog
Auto-merging Tools/ChangeLog
CONFLICT (content): Merge conflict in Tools/ChangeLog
Auto-merging Tools/Scripts/build-webkit
Auto-merging Tools/Scripts/webkitdirs.pm
CONFLICT (content): Merge conflict in Tools/Scripts/webkitdirs.pm
Failed to merge in the changes.
Patch failed at 0001 Inform the scrolling coordinator when scrollbar layers come and go

When you have resolved this problem run "git rebase --continue".
If you would prefer to skip this patch, instead run "git rebase --skip".
To restore the original branch and stop rebasing run "git rebase --abort".

rebase refs/remotes/origin/master: command returned error: 1

Died at Tools/Scripts/update-webkit line 158.


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 4 Shawn Singh 2011-12-21 17:38:53 PST
Comment on attachment 120246 [details]
includes fix and appropriate unit test

(In reply to comment #2)
> Does this fix the issues in the linked crbugs?

It does fix it for http://code.google.com/p/chromium/issues/detail?id=89955

I'll check the other issues soon, and let you know.  Will also update to tip of tree and submit new patch.
Comment 5 Shawn Singh 2011-12-21 18:08:46 PST
Tested on osx, this patch does fix:
  http://code.google.com/p/chromium/issues/detail?id=89955
  http://code.google.com/p/chromium/issues/detail?id=99564

It does not fix:
  http://code.google.com/p/chromium/issues/detail?id=102943
  http://code.google.com/p/chromium/issues/detail?id=102663

From the problems I see on those two issues, it looks like those remaining two issues also need an additional bugfix separate from this one.
Comment 6 James Robinson 2011-12-21 18:13:05 PST
Awesome news. When you get a new patch ready, I think this looks like a good thing for Vangelis to check over.
Comment 7 Shawn Singh 2011-12-21 18:31:32 PST
Created attachment 120255 [details]
updated to tip of tree, removed one inconsistent comment
Comment 8 Ravi Phaneendra Kasibhatla 2011-12-21 21:28:28 PST
(In reply to comment #6)
> Awesome news. When you get a new patch ready, I think this looks like a good thing for Vangelis to check over.

I verified couple of other issues along with http://code.google.com/p/chromium/issues/detail?id=89955 which are related with preserve-3d property but they are not getting fixed with this patch.

I am attaching the sample html pages which are still reproducible with the patch.
Comment 9 Ravi Phaneendra Kasibhatla 2011-12-21 21:29:31 PST
Created attachment 120266 [details]
Sample page with 3d transformations having preserver-3d property.
Comment 10 Ravi Phaneendra Kasibhatla 2011-12-21 21:33:37 PST
Created attachment 120267 [details]
Another sample page with preserve-3d property and 3d transformations
Comment 11 Shawn Singh 2011-12-21 21:57:06 PST
(In reply to comment #8)
> (In reply to comment #6)
> > Awesome news. When you get a new patch ready, I think this looks like a good thing for Vangelis to check over.
> 
> I verified couple of other issues along with http://code.google.com/p/chromium/issues/detail?id=89955 which are related with preserve-3d property but they are not getting fixed with this patch.
> 
> I am attaching the sample html pages which are still reproducible with the patch.

Ravi, can you please clarify, the original test case on 89955 does get fixed properly with this patch?

I will double-check, but I expect these other tests will need to be fixed in a separate bug that fixes z-ordering tie-breakers (which I did not change on this patch).

Thanks again for bringing up these additional cases!
Comment 12 Ravi Phaneendra Kasibhatla 2011-12-21 22:22:11 PST
(In reply to comment #11)
> (In reply to comment #8)
> > (In reply to comment #6)
> > > Awesome news. When you get a new patch ready, I think this looks like a good thing for Vangelis to check over.
> > 
> > I verified couple of other issues along with http://code.google.com/p/chromium/issues/detail?id=89955 which are related with preserve-3d property but they are not getting fixed with this patch.
> > 
> > I am attaching the sample html pages which are still reproducible with the patch.
> 
> Ravi, can you please clarify, the original test case on 89955 does get fixed properly with this patch?
Yes the test case of 89955 is fixed with the patch.
> 
> I will double-check, but I expect these other tests will need to be fixed in a separate bug that fixes z-ordering tie-breakers (which I did not change on this patch).
I added the new tests to this bug because 92720 was duped to 89955. Since you are planning to re-open 92720 after verification (as per last comments on 89955 and 92720) this patch is fine with me.
> 
> Thanks again for bringing up these additional cases!
Comment 13 Vangelis Kokkevis 2011-12-22 10:02:30 PST
Comment on attachment 120255 [details]
updated to tip of tree, removed one inconsistent comment

View in context: https://bugs.webkit.org/attachment.cgi?id=120255&action=review

That's a great find, Shawn!

I suspect that part of the problem is also the usage of m_zRange.  It's value is calculated based on the max difference between all the Z values found in the group of layers (line 325).  However, if all the layers to be sorted are on the same plane then m_zRange = 0 which means that we could mistakenly create dependencies between them. I think we should specify the exit threshold in line 263 as something like max(m_range * 0.01 , 0.01) .

> Source/WebCore/platform/graphics/chromium/cc/CCLayerSorter.cpp:432
> +            // z-index/layout ordering). To preserve this ordering, we use

You should probably rework this comment to not mention Vector::removeLast() as this reference only makes sense in the context of a diff with the previous implementation. It should suffice to say that "to preserve this ordering we process the nodes in the order they were stored with" or something like that.
Comment 14 Shawn Singh 2011-12-22 18:22:49 PST
(In reply to comment #13)
> (From update of attachment 120255 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=120255&action=review
> 
> That's a great find, Shawn!
> 
> I suspect that part of the problem is also the usage of m_zRange.  It's value is calculated based on the max difference between all the Z values found in the group of layers (line 325).  However, if all the layers to be sorted are on the same plane then m_zRange = 0 which means that we could mistakenly create dependencies between them. I think we should specify the exit threshold in line 263 as something like max(m_range * 0.01 , 0.01) .

After more investigation:

Thanks to these additional test cases and Vangelis's insight...  the next problem to address is definitely incorrect dependencies that get added due to numerical error.  But it turns out its not because of exitThreshold.  The exitThreshold only saves computation once the difference is large enough.  but incorrect dependencies are still created whenever zDiff is not exactly zero.

The fix is to add an epsilon in CCLayerSorter::checkOverlap (if zDiff > epsilon, if zDiff < epsilon).

I verified this solves Ravi's useful test cases.  However, this is a little bit fragile - there could always be the rare website that needs a different epsilon to work correctly.

Vangelis do you think there exists a more robust fix?  I can think of revisiting the triangle-triangle test code, to see if we can compute what is in front of what in a more numerically robust way, but I don't know that code yet, so I'm not convinced it will ultimately solve the problem.

By the way, this epsilon fix does not fix http://code.google.com/p/chromium/issues/detail?id=102943 or http://code.google.com/p/chromium/issues/detail?id=102663 ...  I think those pages might not even be triggering the layerSorter.
Comment 15 Vangelis Kokkevis 2011-12-26 23:50:59 PST
(In reply to comment #14)
> (In reply to comment #13)
> > (From update of attachment 120255 [details] [details])
> > View in context: https://bugs.webkit.org/attachment.cgi?id=120255&action=review
> > 
> > That's a great find, Shawn!
> > 
> > I suspect that part of the problem is also the usage of m_zRange.  It's value is calculated based on the max difference between all the Z values found in the group of layers (line 325).  However, if all the layers to be sorted are on the same plane then m_zRange = 0 which means that we could mistakenly create dependencies between them. I think we should specify the exit threshold in line 263 as something like max(m_range * 0.01 , 0.01) .
> 
> After more investigation:
> 
> Thanks to these additional test cases and Vangelis's insight...  the next problem to address is definitely incorrect dependencies that get added due to numerical error.  But it turns out its not because of exitThreshold.  The exitThreshold only saves computation once the difference is large enough.  but incorrect dependencies are still created whenever zDiff is not exactly zero.
> 
> The fix is to add an epsilon in CCLayerSorter::checkOverlap (if zDiff > epsilon, if zDiff < epsilon).
> 
> I verified this solves Ravi's useful test cases.  However, this is a little bit fragile - there could always be the rare website that needs a different epsilon to work correctly.
> 
> Vangelis do you think there exists a more robust fix?  I can think of revisiting the triangle-triangle test code, to see if we can compute what is in front of what in a more numerically robust way, but I don't know that code yet, so I'm not convinced it will ultimately solve the problem.

I think the epsilon approach is our best bet at the moment.  With floating point math there will always be numerical issues around testing for equality.  We could presumably also scale the epsilon value by the total z scale in the scene but cap it to some fairly low value (e.g. 0.001).  Out of curiosity, for the examples we fail, what's the magnitude of the zDiff? 

> 
> By the way, this epsilon fix does not fix http://code.google.com/p/chromium/issues/detail?id=102943 or http://code.google.com/p/chromium/issues/detail?id=102663 ...  I think those pages might not even be triggering the layerSorter.
Comment 16 Shawn Singh 2012-01-03 14:17:36 PST
> I think the epsilon approach is our best bet at the moment.  With floating point math there will always be numerical issues around testing for equality.  We could presumably also scale the epsilon value by the total z scale in the scene but cap it to some fairly low value (e.g. 0.001).  Out of curiosity, for the examples we fail, what's the magnitude of the zDiff? 

zDiff is on the order of +/-  .0001, however, I would expect that this magnitude changes based on how error accumulates in the transforms for the layer.

I'm submitting a new patch soon that includes both fixes.
Comment 17 Shawn Singh 2012-01-03 14:34:28 PST
Created attachment 121001 [details]
includes both bugfixes
Comment 18 Shawn Singh 2012-01-03 15:07:36 PST
Created attachment 121004 [details]
merged the existing exitThreshold and new nonZeroThreshold
Comment 19 Vangelis Kokkevis 2012-01-03 16:49:22 PST
Comment on attachment 121004 [details]
merged the existing exitThreshold and new nonZeroThreshold

View in context: https://bugs.webkit.org/attachment.cgi?id=121004&action=review

> Source/WebCore/platform/graphics/chromium/cc/CCLayerSorter.cpp:266
> +    float nonZeroThreshold = max(m_zRange * 0.001, 0.001);

Do we need to lower the range multiplier to 0.001 from 0.01 ?   I realize that this is all somewhat inaccurate but I'm curious if it's necessary to change the existing value in order to solve the case where m_zRange == 0.0 (which is the source of the issue here).
Comment 20 Shawn Singh 2012-01-03 17:24:17 PST
Comment on attachment 121004 [details]
merged the existing exitThreshold and new nonZeroThreshold

View in context: https://bugs.webkit.org/attachment.cgi?id=121004&action=review

>> Source/WebCore/platform/graphics/chromium/cc/CCLayerSorter.cpp:266
>> +    float nonZeroThreshold = max(m_zRange * 0.001, 0.001);
> 
> Do we need to lower the range multiplier to 0.001 from 0.01 ?   I realize that this is all somewhat inaccurate but I'm curious if it's necessary to change the existing value in order to solve the case where m_zRange == 0.0 (which is the source of the issue here).

Discussed offline with Vangelis.  It wasn't necessary to change the order of magnitude, and to minimize the change this patch incurs, I'll submit a new patch right now that doesn't change the original exitThreshold.
Comment 21 Shawn Singh 2012-01-03 17:34:28 PST
Created attachment 121026 [details]
un-merged exitThreshold and nonZeroThreshold again
Comment 22 Vangelis Kokkevis 2012-01-03 18:40:52 PST
Comment on attachment 121026 [details]
un-merged exitThreshold and nonZeroThreshold again

Looks good! Thanks, Shawn.
Comment 23 James Robinson 2012-01-03 19:18:06 PST
Comment on attachment 121026 [details]
un-merged exitThreshold and nonZeroThreshold again

R=me based on Vangelis' unofficial review
Comment 24 WebKit Review Bot 2012-01-03 19:37:00 PST
Comment on attachment 121026 [details]
un-merged exitThreshold and nonZeroThreshold again

Clearing flags on attachment: 121026

Committed r104002: <http://trac.webkit.org/changeset/104002>
Comment 25 WebKit Review Bot 2012-01-03 19:37:06 PST
All reviewed patches have been landed.  Closing bug.
Comment 26 Ravi Phaneendra Kasibhatla 2012-01-03 21:25:25 PST
(In reply to comment #24)
> (From update of attachment 121026 [details])
> Clearing flags on attachment: 121026
> 
> Committed r104002: <http://trac.webkit.org/changeset/104002>

Thanks Shawn for the fix. It helps us a lot. :)