Bug 218859

Summary: REGRESSION(r269579) [WPE] Many tests with scrolling flaky after this revision
Product: WebKit Reporter: Lauro Moura <lmoura>
Component: ScrollingAssignee: Chris Lord <clord>
Status: RESOLVED FIXED    
Severity: Normal CC: bugs-noreply, cgarcia, clord, cmarcelo, darin, ews-watchlist, fred.wang, jamesr, koivisto, kondapallykalyan, luiz, simon.fraser, tonikitoo, webkit-bug-importer, zan
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugs.webkit.org/show_bug.cgi?id=214179
Attachments:
Description Flags
Patch
none
Patch
none
Patch none

Description Lauro Moura 2020-11-12 10:58:28 PST
Many tests are flaky with blank images either in the actual.png or expected.png. In some cases, it fails both runs and happen to  be registered as failure.

Link to first test run with the flakies: https://build.webkit.org/builders/WPE-Linux-64-bit-Release-Tests/builds/20967

Testing locally, the issue is reproducible most of the time (>80% here) with the following command:

./Tools/Scripts/run-webkit-tests --verbose --debug-rwt-logging --no-retry-failures --no-new-test-results --no-build --no-show-results --clobber-old-results --display-server=xvfb --results-directory layout-test-results --release --wpe fast/css/sticky/inflow-sticky.html

Also, running only on fast/css/sticky/, the following 3 tests (that use window.scrollTo) fail:

  fast/css/sticky/inflow-sticky.html
  fast/css/sticky/inline-sticky-abspos-child.html
  fast/css/sticky/sticky-both-sides.html

One complication is that it seems to depend on the hardware, as Chris could not reproduce it on his desktop but only on his laptop.
Comment 1 Chris Lord 2020-11-13 02:49:24 PST
Just to add a note that I'm working on this - I've narrowed it down to the layer UpdateScope that gets used in the Frame and Overflow scrolling nodes causing some kind of either deadlock or oscillation, but I've not been able to pin down exactly what's happening due to the difficulty in reproducing and its lack of effect in anything except testing situations (at least that I've found so far). I've been trying various fixes and work-arounds, but haven't had any success so far.
Comment 2 Chris Lord 2020-11-13 02:53:48 PST
Also a note, the reproduction instructions in comment one don't actually apply on my laptop (where I can reproduce this error). If I do 5 iterations of fast/css, I get a variable number of image-only failures and fast/css/pseudo-valid-fieldset.html gets highlighted as unexpected flakiness consistently. If I run that test in isolation, even for 50 iterations, it succeeds. All of the tests mentioned in the first comment succeed for me. 5 iterations might be overkill, I might be able to get away with 2 or 3. On my desktop, all tests pass fine even with 10 iterations.
Comment 3 Chris Lord 2020-11-16 03:09:06 PST
Created attachment 414214 [details]
Patch
Comment 4 Chris Lord 2020-11-16 03:14:06 PST
I believe this was happening due to updating the layer scroll node ID without flushing composition state and thus breaking the mapping of layer->scroll node. There was no actual deadlock (at least that I could find), but rather a race.

This doesn't affect GTK because GTK doesn't run tests with async composition enabled, afaik. This also, as far as I can tell, doesn't have any visually noticeable effect in practical use, I think the problem mostly just affected test results.
Comment 5 Simon Fraser (smfr) 2020-11-16 08:33:34 PST
Comment on attachment 414214 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=414214&action=review

> Source/WebCore/page/scrolling/nicosia/ScrollingCoordinatorNicosia.cpp:107
> +                if (auto* frameView = frameViewForScrollingNode(nodeID))
> +                    frameView->flushCompositingStateIncludingSubframes();

This should not be necessary here. This is called from compositing updates, so we're already going to flush.
Comment 6 Chris Lord 2020-11-16 08:35:31 PST
(In reply to Simon Fraser (smfr) from comment #5)
> Comment on attachment 414214 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=414214&action=review
> 
> > Source/WebCore/page/scrolling/nicosia/ScrollingCoordinatorNicosia.cpp:107
> > +                if (auto* frameView = frameViewForScrollingNode(nodeID))
> > +                    frameView->flushCompositingStateIncludingSubframes();
> 
> This should not be necessary here. This is called from compositing updates,
> so we're already going to flush.

That's even better, so this patch, minus this, is valid? I couldn't trace through enough to really understand if this was a situation where a flush was guaranteed or not.
Comment 7 Simon Fraser (smfr) 2020-11-16 08:41:38 PST
(In reply to Chris Lord from comment #6)
> (In reply to Simon Fraser (smfr) from comment #5)
> > Comment on attachment 414214 [details]
> > Patch
> > 
> > View in context:
> > https://bugs.webkit.org/attachment.cgi?id=414214&action=review
> > 
> > > Source/WebCore/page/scrolling/nicosia/ScrollingCoordinatorNicosia.cpp:107
> > > +                if (auto* frameView = frameViewForScrollingNode(nodeID))
> > > +                    frameView->flushCompositingStateIncludingSubframes();
> > 
> > This should not be necessary here. This is called from compositing updates,
> > so we're already going to flush.
> 
> That's even better, so this patch, minus this, is valid? I couldn't trace
> through enough to really understand if this was a situation where a flush
> was guaranteed or not.

Any changes to a ScrollingState* node will schedule a flush.
Comment 8 Simon Fraser (smfr) 2020-11-16 08:44:45 PST
Comment on attachment 414214 [details]
Patch

Actually this patch is wrong :)
Comment 9 Simon Fraser (smfr) 2020-11-16 08:48:57 PST
Accessing platformLayers from ScrollingCoordinatorNicosia code seems wrong.

On macOS, the scrolling tree nodeIDs are accessed on the scrolling thread by going from CALayer -> PlatformCALayer. Can Nicosia do something similar?
Comment 10 Zan Dobersek 2020-11-16 08:57:46 PST
Yes, it can. Node ID assignments should be handled through GraphicsLayer.
Comment 11 Chris Lord 2020-11-17 02:36:38 PST
Created attachment 414326 [details]
Patch
Comment 12 Chris Lord 2020-11-17 02:37:59 PST
Ok, that was much simpler once I knew where to look...
Comment 13 Chris Lord 2020-11-17 08:52:07 PST
Created attachment 414346 [details]
Patch
Comment 14 EWS 2020-11-17 09:38:50 PST
Committed r269909: <https://trac.webkit.org/changeset/269909>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 414346 [details].
Comment 15 Radar WebKit Bug Importer 2020-11-17 09:39:18 PST
<rdar://problem/71490175>