Bug 176243

Summary: requestAnimationFrame causes bad location of position:fixed inside overflow:auto and iframe
Product: WebKit Reporter: Frédéric Wang (:fredw) <fred.wang>
Component: FramesAssignee: Ali Juma <ajuma>
Status: RESOLVED FIXED    
Severity: Normal CC: ajuma, cmarcelo, commit-queue, dvoytenko, ews-watchlist, jamesr, luiz, simon.fraser, tonikitoo, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: iPhone / iPad   
OS: All   
Bug Depends on:    
Bug Blocks: 154399    
Attachments:
Description Flags
Testcase
none
Alternative testcase
none
Experimental patch
none
Patch
none
Patch for landing none

Description Frédéric Wang (:fredw) 2017-09-01 13:04:04 PDT
Created attachment 319637 [details]
Testcase

The attached testcase is essentially attachment 316551 [details] + some regular calls to requestAnimationFrame(). After r220261, the flickering is gone, but we still see some "big jumps" when scrolling the frame.

This issue disappears if we remove these requestAnimationFrame() calls or reduce their frequency. It looks like these requestAnimationFrame might cause some out-of-sync scroll coordinates between the web and UI process.
Comment 1 Radar WebKit Bug Importer 2017-09-01 13:47:53 PDT
<rdar://problem/34214708>
Comment 2 Frédéric Wang (:fredw) 2017-11-13 08:28:19 PST
Created attachment 326761 [details]
Alternative testcase

This testcase relies on a more standard use of requestAnimationFrame.
Comment 3 Frédéric Wang (:fredw) 2017-11-13 10:15:48 PST
I did a quick debugging of this and the trace for requestAnimationFrame is something like:

RemoteLayerTreeDrawingArea::flushLayers
 |
(Timer)
 |
RemoteLayerTreeDrawingArea::scheduleCompositingLayerFlush
ScriptedAnimationController::scheduleAnimation
ScriptedAnimationController::registerCallback
Document::requestAnimationFrame
DOMWindow::requestAnimationFrame

where RemoteLayerTreeDrawingArea::flushLayers commit the scrolling state and is actually also called when we perform async scrolling on iOS. So I don't see anything abnormal. Maybe requestAnimationFrame should not commit the scrolling state and let that work be done by the async scrolling code to avoid conflicts?
Comment 4 Frédéric Wang (:fredw) 2017-11-14 08:33:11 PST
Created attachment 326879 [details]
Experimental patch

This patch is a quick dirty hack to force a delayed flush when RemoteLayerTreeDrawingArea::scheduleCompositingLayerFlush is caused by a requestAnimationFrame call. As you can see, it avoids big jumps of the fixed element in the testcase. I wonder whether we should do something like that when user scrolling is happening... @Simon: WDYT?
Comment 5 Simon Fraser (smfr) 2017-11-14 11:12:51 PST
I don't get why that's the right approach to fix this bug. Would need to analyze further.
Comment 6 Frédéric Wang (:fredw) 2018-10-31 12:37:07 PDT
(In reply to Simon Fraser (smfr) from comment #5)
> I don't get why that's the right approach to fix this bug. Would need to
> analyze further.

I just tried the testcase again. The issue is still present in trunk and the experimental patch still allows to workaround it. Any better idea to fix this bug?
Comment 7 Simon Fraser (smfr) 2018-10-31 13:50:45 PDT
I would have to do some debugging to figure it out.
Comment 8 Ali Juma 2018-11-01 12:04:07 PDT
Trying to debug this a bit, there seems to be an issue with the fixed position's layer's position getting clobbered by a layer tree commit. For example:
1) Scroll in the UI Process
1a) ScrollingTreeFixedNode::updateLayersAfterAncestorChange sets layer position's y coordinate to 107
2) Scroll more in the UI Process
2b) ScrollingTreeFixedNode::updateLayersAfterAncestorChange sets layer position's y coordinate to 115
3) WebProcess flushes layers. The most recent scroll that the WebProcess is aware of is (1), so the GraphicsLayer tree is updated to reflect that state.
4) A new layer tree and scroll tree is committed to the UI Process, and the layer position's y coordinate gets set back to 107

At this point, the fixed position's layer's position is wrong, since it's based on the first scroll rather than the most recent one. It won't get corrected until we scroll again in the UI Process. If we draw to the screen before that, we'll draw the layer at the wrong position.

If I hackily change ScrollingTreeFixedNode::commitStateBeforeChildren to restore the old layer position, the fixed-position layer stays perfectly in place.

It seems like we need something like the logic in ScrollingTreeFixedNode::updateLayersAfterAncestorChange to run whenever we commit a new layer tree and scroll tree, to account for any scrolling that's happened in the UIProcess that the WebProcess wasn't yet aware of when it last updated the layer tree.

It's still not completely clear why rAF makes this bug so much worse though.
Comment 9 Dima Voytenko 2018-11-01 12:46:08 PDT
To note: while the `position:fixed` gotten a lot better, it still relatively easy to trigger jumping even without the rAF.
Comment 10 Ali Juma 2018-11-02 07:04:15 PDT
(In reply to Ali Juma from comment #8)
>
> It seems like we need something like the logic in
> ScrollingTreeFixedNode::updateLayersAfterAncestorChange to run whenever we
> commit a new layer tree and scroll tree, to account for any scrolling that's
> happened in the UIProcess that the WebProcess wasn't yet aware of when it
> last updated the layer tree.

It turns out we *do* have logic for this, but it's broken for scrollers that are descendants of iframes.

Specifically, RemoteLayerTreeDrawingAreaProxy::commitLayerTree checks if there are any fixed or sticky nodes, and if so, it calls viewportChangedViaDelegatedScrolling() to update the positions of fixed and sticky layers. This calls ScrollingTreeFrameScrollingNodeIOS::updateLayersAfterViewportChange on the main frame scroll node, which in turn calls updateLayersAfterAncestorChange on the subframe's scroll node.

And this is where things go wrong -- ScrollingTreeFrameScrollingNode::updateLayersAfterAncestorChange has an empty implementation with a FIXME, and ScrollingTreeFrameScrollingNodeIOS doesn't override this either. So we never recurse into the subframe's scroll nodes.

Implementing ScrollingTreeFrameScrollingNodeIOS::updateLayersAfterAncestorChange fixes the bug. I'll put together a patch.

Any suggestions for how to write a test for this?
Comment 11 Ali Juma 2018-11-02 10:09:29 PDT
Created attachment 353708 [details]
Patch
Comment 12 Simon Fraser (smfr) 2018-11-02 10:15:14 PDT
Comment on attachment 353708 [details]
Patch

Very nice, would be even better with a test!
Comment 13 Ali Juma 2018-11-02 12:53:48 PDT
Created attachment 353724 [details]
Patch for landing

Add missing null check
Comment 14 Ali Juma 2018-11-02 13:01:19 PDT
(In reply to Simon Fraser (smfr) from comment #12)
> Comment on attachment 353708 [details]
> Patch
> 
> Very nice, would be even better with a test!

I tried to think of a good way to test this, but I think we'd need something similar to UIScriptController's immediateScrollToOffset, but for overflow scrolling, along with plumbing to dump the UIProcess' layer tree.
Comment 15 WebKit Commit Bot 2018-11-02 15:07:27 PDT
Comment on attachment 353724 [details]
Patch for landing

Clearing flags on attachment: 353724

Committed r237754: <https://trac.webkit.org/changeset/237754>
Comment 16 WebKit Commit Bot 2018-11-02 15:07:29 PDT
All reviewed patches have been landed.  Closing bug.
Comment 17 Frédéric Wang (:fredw) 2018-11-03 04:24:55 PDT
(In reply to Ali Juma from comment #10)
> (In reply to Ali Juma from comment #8)
> >
> > It seems like we need something like the logic in
> > ScrollingTreeFixedNode::updateLayersAfterAncestorChange to run whenever we
> > commit a new layer tree and scroll tree, to account for any scrolling that's
> > happened in the UIProcess that the WebProcess wasn't yet aware of when it
> > last updated the layer tree.
> 
> It turns out we *do* have logic for this, but it's broken for scrollers that
> are descendants of iframes.
> 
> Specifically, RemoteLayerTreeDrawingAreaProxy::commitLayerTree checks if
> there are any fixed or sticky nodes, and if so, it calls
> viewportChangedViaDelegatedScrolling() to update the positions of fixed and
> sticky layers. This calls
> ScrollingTreeFrameScrollingNodeIOS::updateLayersAfterViewportChange on the
> main frame scroll node, which in turn calls updateLayersAfterAncestorChange
> on the subframe's scroll node.
> 
> And this is where things go wrong --
> ScrollingTreeFrameScrollingNode::updateLayersAfterAncestorChange has an
> empty implementation with a FIXME, and ScrollingTreeFrameScrollingNodeIOS
> doesn't override this either. So we never recurse into the subframe's scroll
> nodes.
> 
> Implementing
> ScrollingTreeFrameScrollingNodeIOS::updateLayersAfterAncestorChange fixes
> the bug. I'll put together a patch.
> 
> Any suggestions for how to write a test for this?

It seems I had implemented it for subframes in my patch for bug 173833 but had left it unimplemented for the main frame so probably I did not see any improvements on this bug. Thanks for spending time on this Ali!