Bug 109591

Summary: Fix hasOutOfFlowPositionedDescendant
Product: WebKit Reporter: vollick
Component: WebCore Misc.Assignee: Glenn Hartmann <hartmanng>
Status: NEW ---    
Severity: Normal CC: dglazkov, eric, esprehn+autocc, hartmanng, ojan.autocc, simon.fraser, vollick, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 109966    
Bug Blocks:    
Attachments:
Description Flags
Patch
none
Patch
none
Patch
webkit.review.bot: commit-queue-
Archive of layout-test-results from gce-cr-linux-05 for chromium-linux-x86_64 none

Description vollick 2013-02-12 08:57:36 PST
Currently, we do not check visibility when setting the m_hasOutOfFlowPositionedDescendant flag, but we should. We should also dirty the flag when visibility changes.
Comment 1 vollick 2013-02-12 09:03:09 PST
Created attachment 187880 [details]
Patch
Comment 2 Simon Fraser (smfr) 2013-02-12 09:16:39 PST
Comment on attachment 187880 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=187880&action=review

> Source/WebCore/ChangeLog:10
> +        Currently, we do not check visibility when setting the
> +        m_hasOutOfFlowPositionedDescendant flag, but we should. With this flag
> +        we do. We also dirty the flag when visibility changes.

I'm a little concerned that we're spending time maintaining the m_hasOutOfFlowPositionedDescendant when most of the time it's unused.

> Source/WebCore/rendering/RenderLayer.cpp:1010
> +            bool childIsOutOfFlowPositioned = child->renderer() && child->renderer()->isOutOfFlowPositioned() && child->isVisuallyNonEmpty();

isVisuallyNonEmpty() is a bit expensive to call here. I think you should just check the visibility flags (hasVisibleContent/hasVisibleDescendant).

> LayoutTests/compositing/overflow/do-not-opt-in-with-out-of-flow-descendant.html:2
> +<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
> +   "http://www.w3.org/TR/html4/loose.dtd">

Just <!DOCTYPE html> please

> LayoutTests/compositing/overflow/do-not-opt-in-with-out-of-flow-descendant.html:4
> +<html lang="en">

No need for lang.

> LayoutTests/compositing/overflow/do-not-opt-in-with-out-of-flow-descendant.html:6
> +  <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>

No need for this.

> LayoutTests/compositing/overflow/do-not-opt-in-with-out-of-flow-descendant.html:64
> +        var layerTree = window.internals.layerTreeAsText(document);

Doesn't layerTreeAsText() update layout already?
Comment 3 vollick 2013-02-12 21:08:32 PST
Created attachment 188002 [details]
Patch

(In reply to comment #2)
> (From update of attachment 187880 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=187880&action=review
>
> > Source/WebCore/ChangeLog:10
> > +        Currently, we do not check visibility when setting the
> > +        m_hasOutOfFlowPositionedDescendant flag, but we should. With this flag
> > +        we do. We also dirty the flag when visibility changes.
>
> I'm a little concerned that we're spending time maintaining the m_hasOutOfFlowPositionedDescendant when most of the time it's unused.
I've updated the code so we do much less work in updateDescendantDependentFlags if the opt-in feature is disabled. This seemed cleaner than doing the check at every spot where we muck with the dirty bit for this property. Is this better?
>
> > Source/WebCore/rendering/RenderLayer.cpp:1010
> > +            bool childIsOutOfFlowPositioned = child->renderer() && child->renderer()->isOutOfFlowPositioned() && child->isVisuallyNonEmpty();
>
> isVisuallyNonEmpty() is a bit expensive to call here. I think you should just check the visibility flags (hasVisibleContent/hasVisibleDescendant).
Done.
>
> > LayoutTests/compositing/overflow/do-not-opt-in-with-out-of-flow-descendant.html:2
> > +<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
> > +   "http://www.w3.org/TR/html4/loose.dtd">
>
> Just <!DOCTYPE html> please
Fixed.
>
> > LayoutTests/compositing/overflow/do-not-opt-in-with-out-of-flow-descendant.html:4
> > +<html lang="en">
>
> No need for lang.
Removed.
>
> > LayoutTests/compositing/overflow/do-not-opt-in-with-out-of-flow-descendant.html:6
> > +  <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
>
> No need for this.
Removed.
>
> > LayoutTests/compositing/overflow/do-not-opt-in-with-out-of-flow-descendant.html:64
> > +        var layerTree = window.internals.layerTreeAsText(document);
>
> Doesn't layerTreeAsText() update layout already?
Yep. Removed.
Comment 4 vollick 2013-03-13 08:09:32 PDT
Comment on attachment 188002 [details]
Patch

Need to ditch this approach. Glenn and I discussed the following approach which seems like the right way forward.
 - get rid of m_hasOutOfFlowEtcEtc
 - instead, on the fly in updateNeedsCompositedEtc, we'll
   - check if we have a descendant
   - if so, check if its containing block is us
   - if so, we form a containing block and don't need to worry about out-of-flow positioned descendants
   - if not, go through the list of positioned descendants maintained in RenderBlock and if any of them have containing block that is an ancestor of us, we can't opt in.
Comment 5 vollick 2013-03-15 10:29:36 PDT
(In reply to comment #4)
> (From update of attachment 188002 [details])
> Need to ditch this approach. Glenn and I discussed the following approach which seems like the right way forward.
>  - get rid of m_hasOutOfFlowEtcEtc
>  - instead, on the fly in updateNeedsCompositedEtc, we'll
>    - check if we have a descendant
>    - if so, check if its containing block is us
>    - if so, we form a containing block and don't need to worry about out-of-flow positioned descendants
>    - if not, go through the list of positioned descendants maintained in RenderBlock and if any of them have containing block that is an ancestor of us, we can't opt in.

Glenn pointed out that this won't work. According to the spec
(http://www.w3.org/TR/CSS2/visudet.html), the position attribute of an element
can change which element forms its containing block. This means that the
question "is this layer a containing block?" doesn't actually make any sense; it
depends on the descendant. So the revised (but naive) plan is:

1
Determine-if-we-have-an-out-of-flow-positioned-descendant-whose-containing-block-is-not-also-our-descendant:
2  for each ancestor that could be a containing block for our descendants*:
3    for each of its positioned descendants**
4      if the positioned descendant is also our descendant, don't opt in.

* It turns out that there are at most two of these ancestors we need to care
about, because:
    a) We will always opt out if we have fixed-pos descendants (this will be
dealt with in a separate, blocking bug), and 
    b) We trivially opt in if we're the root layer.
  So, according to the containing block definition in the spec
(http://www.w3.org/TR/CSS2/visudet.html) we only care about conditions 2 and 4.
That is, the containing block for:
    a) relative and static positioned descendants, and
    b) absolute positioned descendants.
  Also, if either of these containing blocks turn out to be us, then we don't
enter the 'for each' loop at line 2 of the algorithm; we already know we will
clip those descendants.

** Conveniently, RenderBlocks already maintain a list of positioned descendants
for us to consult.

If we use the above algorithm in updateNeedsCompositedScrolling, I believe we
are guaranteed to be correct. But this function gets called a LOT, and the
algorithm's complexity is O(n*d), where n = # of positioned descendants to
consider, d = tree height (because we have to check if a layer is a descendant
of another). I believe that the algorithm as presented here is a non starter --
it's just too expensive -- but I think that there are ways to speed it up
considerably.

First, we could dramatically reduce n if on line 3 of the algorithm we only had
to process those positioned descendants that had been added or removed from the
list since we last checked. And this isn't too hard to accomplish. In
RenderBlock.cpp, alongside gPositionedDescendantMap, we could store a list of
deltas. Every time we modify gPositionedDescendantMap, we can add a delta to the
list, and on line 3 of the algorithm we only need to process those deltas we
haven't seen yet.

There are two problems with the deltas approach:
  1. The list will grow without bound.
    A simple way to deal with this is to put a small limit on the number of
deltas we'll keep and clear the list periodically. If we keep track of the
number of times the list has been cleared, we'll be able to tell that on line 3
of the algorithm we have to fallback to processing the whole list again. This
saves us from having to push a notification to all layers that might care; we'll
essentially be pulling the 'freshness token' only for layers that could opt in.
At first we can keep it simple and clear the list of deltas whenever it passes
the size threshold, but in future, we could be smart and only clear when we go
idle.
  2. If a 'problematic' layer is removed in one of the deltas, how do we tell if
we can opt in?
    We're obviously sunk if we just store a boolean
(m_hasOutOfFlowPositionDescendant, say), but if instead we keep track of all
positioned descendants that are preventing us from opting in (in a global map,
just like in RenderBlock), we know it's safe to opt in whenever this collection
is empty.

Here is the revised algorithm:

1
Determine-if-we-have-an-out-of-flow-positioned-descendant-whose-containing-block-is-not-also-our-descendant:
2   for each ancestor that could be a containing block for our descendants:
3     if ancestor has cleared its delta list
4       clear our collection, P, of 'problematic' descendants
5       for each of its positioned descendants
6         if the positioned descendant is also our descendant, add it to P
7     else
8       for each unprocessed delta:
9         if delta.layer is our descendant
10          if delta.removal
11            remove the layer from P
12          else
13             add the layer to P
14   return !P.isEmpty

In the expected case, this algorithm is O(m*d), where m is the number of
unprocessed deltas. In the worst case it could be O(n*d), but that should be
very infrequent provided we don't clear the delta lists too often. As the page
stabilizes, m will approach zero. We will need to instrument this function and
measure its cost on heavy pages in the wild, but my intuition is that this will
be inexpensive.

As for storage, this requires O(d + m + s) storage, where 
  * d is all of the 'problematic' descendants we're keeping track of (but
remember, we only take this hit for overflow:scroll divs that cannot opt in.
  * m is the size of the delta lists we're maintaining, and
  * s is all the state we need to hang onto to determine which deltas we have
yet to process and if we need to fallback to worst case. This should be
sizeof(size_t)*2*number-of-candidates-for-opt-in.

We will also need to add instrumentation to allow us keep track of these sizes
(see RenderBlock::reportMemoryUsage).
Comment 6 Simon Fraser (smfr) 2013-03-15 11:09:54 PDT
Is this all really worth the benefit?
Comment 7 vollick 2013-03-15 11:32:13 PDT
(In reply to comment #6)
> Is this all really worth the benefit?

I hear you. This is fairly complex, and I really wish it wasn't. That said, I am entirely convinced that it's worth it. We've seen massive performance gains on the comp-scroll path, both because of the painting we avoid, and also because it allows us to process scroll updates off the main thread.

The more we can opt in, the better.. provided we don't break the web. And I don't think we can guarantee we won't without a reliable check for descendants that could extend beyond our clip. The test I suggested above is most efficient I could think of, but I'd be very eager to hear of any alternatives you might have.

I'd also love to know if I've got any flaws in my assumptions or reasoning above --if you notice any, please pass them along.
Comment 8 vollick 2013-04-02 16:28:07 PDT
Created attachment 196250 [details]
Patch

(In reply to comment #6)
> Is this all really worth the benefit?

Ack, I'm sorry. I think I misunderstood you. I think you were asking if all of
these optimizations were really worth the benefit. Turns out they weren't. I
went ahead and implemented them, but I found that even with the naive approach,
we only spend about 0.3ms in ::hasOutOfFlowPositionedDescendant while loading
gmail on my desktop machine. This simplifies things a lot.

This patch includes the changes for 109966. It doesn't perform correctly without
the fixes in that patch.
Comment 9 WebKit Review Bot 2013-04-03 02:44:49 PDT
Comment on attachment 196250 [details]
Patch

Attachment 196250 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://webkit-commit-queue.appspot.com/results/17330726

New failing tests:
platform/chromium/virtual/gpu/compositedscrolling/overflow/dynamic-composited-scrolling-status.html
compositing/overflow/overflow-clip-with-accelerated-scrolling-ancestor.html
compositing/overflow/content-gains-scrollbars.html
platform/chromium/virtual/gpu/compositedscrolling/scrollbars/scrollbar-iframe-click-does-not-blur-content.html
platform/chromium/virtual/gpu/compositedscrolling/overflow/do-not-paint-outline-into-composited-scrolling-contents.html
platform/chromium/virtual/gpu/compositedscrolling/overflow/overflow-auto-with-touch.html
compositing/overflow/scrolling-without-painting.html
compositing/overflow/paint-neg-z-order-descendants-into-scrolling-contents-layer.html
platform/chromium/virtual/gpu/compositedscrolling/overflow/nested-scrolling.html
fast/events/scale-and-scroll-window.html
fast/loader/text-document-wrapping.html
fast/text/international/hindi-spacing.html
platform/chromium/virtual/gpu/compositedscrolling/overflow/overflow-clip-with-accelerated-scrolling-ancestor.html
platform/chromium/virtual/gpu/compositedscrolling/overflow/textarea-scroll-touch.html
platform/chromium/virtual/gpu/compositedscrolling/overflow/scrolling-without-painting.html
platform/chromium/virtual/gpu/compositedscrolling/overflow/overflow-scrollbar-layers.html
platform/chromium/virtual/softwarecompositing/overflow/image-load-overflow-scrollbars.html
platform/chromium/virtual/gpu/compositedscrolling/overflow/paint-neg-z-order-descendants-into-scrolling-contents-layer.html
compositing/overflow/image-load-overflow-scrollbars.html
platform/chromium/virtual/softwarecompositing/overflow/content-gains-scrollbars.html
compositing/overflow/updating-scrolling-content.html
platform/chromium/virtual/gpu/compositedscrolling/overflow/overflow-overlay-with-touch.html
platform/chromium/virtual/gpu/compositedscrolling/overflow/content-gains-scrollbars.html
fast/preloader/document-write-noscript.html
compositing/overflow/nested-scrolling.html
fast/loader/javascript-url-in-object.html
platform/chromium/virtual/gpu/compositedscrolling/overflow/updating-scrolling-content.html
platform/chromium/virtual/gpu/compositedscrolling/overflow/overflow-auto-with-touch-toggle.html
platform/chromium/virtual/gpu/compositedscrolling/scrollbars/overflow-scrollbar-combinations.html
platform/chromium/virtual/gpu/compositedscrolling/overflow/clip-content-under-overflow-controls.html
Comment 10 WebKit Review Bot 2013-04-03 02:44:53 PDT
Created attachment 196308 [details]
Archive of layout-test-results from gce-cr-linux-05 for chromium-linux-x86_64

The attached test failures were seen while running run-webkit-tests on the chromium-ews.
Bot: gce-cr-linux-05  Port: chromium-linux-x86_64  Platform: Linux-3.3.8-gcg-201212281604-x86_64-with-GCEL-10.04-gcel_10.04
Comment 11 Andreas Kling 2014-02-05 11:29:03 PST
Comment on attachment 196250 [details]
Patch

Clearing review flag on patches from before 2014. If this patch is still relevant, please reset the r? flag.