Bug 160450 - [GTK][Threaded Compositor] Several flaky tests due to differences in scrollbars
Summary: [GTK][Threaded Compositor] Several flaky tests due to differences in scrollbars
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKitGTK (show other bugs)
Version: WebKit Local Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Nobody
URL:
Keywords: Gtk, LayoutTestFailure
Depends on:
Blocks:
 
Reported: 2016-08-02 06:27 PDT by Carlos Garcia Campos
Modified: 2016-08-26 06:53 PDT (History)
5 users (show)

See Also:


Attachments
Speculative fix (6.57 KB, patch)
2016-08-03 00:58 PDT, Carlos Garcia Campos
mcatanzaro: review+
cgarcia: commit-queue-
Details | Formatted Diff | Diff
Simplified test case (149 bytes, text/html)
2016-08-07 23:09 PDT, Zan Dobersek
no flags Details
Patch (3.91 KB, patch)
2016-08-25 05:51 PDT, Carlos Garcia Campos
mcatanzaro: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Carlos Garcia Campos 2016-08-02 06:27:08 PDT
We still have a lot of flaky tests since we switched to the threaded compositor. I can't reproduce them though. I've seen several reftests failing because there's a part of the vertical scrollbar that is not rendered. It turns out that the part not rendered is the second tile of the scrollbar layer, that for some reason it seems to be rendered later. So, my guess is that we force a repaint that when finishes hasn't actually painted all the tiles. Then the UI process takes the screenshot too early, before the second tile of the scrollbar layer is actually painted. I took advantage of the DRI3 bug, to see how things happen when everything is slow, and that's exactly what happens, everything is rendered in one pass and then the second tile of the scrollbar layer. See:

http://people.igalia.com/cgarcia/webkitgtk-scrollbar-layer.mp4

I'm not sure how to fix it, because I don't know why that happens and I can't really reproduce the flakiness, but we could try to wait for more damage events in the UI process before taking the screenshot.
Comment 1 Carlos Garcia Campos 2016-08-03 00:58:16 PDT
Created attachment 285207 [details]
Speculative fix
Comment 2 WebKit Commit Bot 2016-08-03 01:01:35 PDT
Attachment 285207 [details] did not pass style-queue:


ERROR: Source/WebKit2/UIProcess/DrawingAreaProxyImpl.cpp:353:  Extra space before ( in function call  [whitespace/parens] [4]
ERROR: Source/WebKit2/UIProcess/DrawingAreaProxyImpl.h:94:  Extra space before ( in function call  [whitespace/parens] [4]
ERROR: Source/WebKit2/UIProcess/DrawingAreaProxyImpl.h:104:  Extra space before ( in function call  [whitespace/parens] [4]
Total errors found: 3 in 3 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 3 Michael Catanzaro 2016-08-05 10:14:36 PDT
Comment on attachment 285207 [details]
Speculative fix

View in context: https://bugs.webkit.org/attachment.cgi?id=285207&action=review

> Source/WebKit2/UIProcess/DrawingAreaProxyImpl.cpp:359
> +#if PLATFORM(GTK)
> +    g_signal_connect_swapped(m_webPage.viewWidget(), "draw", reinterpret_cast<GCallback>(webViewDrawCallback), this);
> +    m_timer.startOneShot(1);

So we have a DrawingMonitor class that doesn't actually monitor drawing unless PLATFORM(GTK). Consider guarding the entire class and putting an #ifdef in DrawingAreaProxyImpl::dispatchAfterEnsuringDrawing so it's not used at all on other platforms.

> Source/WebKit2/UIProcess/DrawingAreaProxyImpl.cpp:382
> +    // We wait up to 1 second for draw events. If there are several draw events queued quickly,
> +    // we want to wait until all of them have been processed, so after receiving a draw, we wait
> +    // up to 100ms for the next one or stop.

I have no clue if this is right or sensible, but OK.

> Source/WebKit2/UIProcess/DrawingAreaProxyImpl.h:89
> +    class DrawingMonitor {
> +        WTF_MAKE_NONCOPYABLE(DrawingMonitor); WTF_MAKE_FAST_ALLOCATED;

Does it really make sense to use WTF_MAKE_FAST_ALLOCATED for a class that will rarely be allocated?

> Source/WebKit2/UIProcess/DrawingAreaProxyImpl.h:136
> +    std::unique_ptr<DrawingMonitor> m_drawingMonitor;

This use of std::unique_ptr appears to be unnecessary, since you're not using it for polymorphism, you don't really need the null state, and you don't appear to need the lazy initialization. You can just keep a DrawingMonitor member, then you don't need to worry about creating it in DrawingAreaProxyImpl::dispatchAfterEnsuringDrawing or checking that it's not null.
Comment 4 Carlos Garcia Campos 2016-08-05 22:57:25 PDT
(In reply to comment #3)
> Comment on attachment 285207 [details]
> Speculative fix
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=285207&action=review

Thanks for the review. I think Zan was looking at the actual cause of the scrollbars drawing delay, so I'll wait until he confirms this is ok.

> > Source/WebKit2/UIProcess/DrawingAreaProxyImpl.cpp:359
> > +#if PLATFORM(GTK)
> > +    g_signal_connect_swapped(m_webPage.viewWidget(), "draw", reinterpret_cast<GCallback>(webViewDrawCallback), this);
> > +    m_timer.startOneShot(1);
> 
> So we have a DrawingMonitor class that doesn't actually monitor drawing
> unless PLATFORM(GTK). Consider guarding the entire class and putting an
> #ifdef in DrawingAreaProxyImpl::dispatchAfterEnsuringDrawing so it's not
> used at all on other platforms.

This file is actually only used by GTK anyway, what is protected by GTK is the actual use of GTK API. 

> > Source/WebKit2/UIProcess/DrawingAreaProxyImpl.cpp:382
> > +    // We wait up to 1 second for draw events. If there are several draw events queued quickly,
> > +    // we want to wait until all of them have been processed, so after receiving a draw, we wait
> > +    // up to 100ms for the next one or stop.
> 
> I have no clue if this is right or sensible, but OK.

Note that this patch is just an experiment to see if flaky tests are reduced in the bots, so I just took those numbers arbitrarily. I didn't want to delay tests for more than 1 second, and if we could finish earlier we do.

> > Source/WebKit2/UIProcess/DrawingAreaProxyImpl.h:89
> > +    class DrawingMonitor {
> > +        WTF_MAKE_NONCOPYABLE(DrawingMonitor); WTF_MAKE_FAST_ALLOCATED;
> 
> Does it really make sense to use WTF_MAKE_FAST_ALLOCATED for a class that
> will rarely be allocated?

I don't know, I guess it's harmless.

> > Source/WebKit2/UIProcess/DrawingAreaProxyImpl.h:136
> > +    std::unique_ptr<DrawingMonitor> m_drawingMonitor;
> 
> This use of std::unique_ptr appears to be unnecessary, since you're not
> using it for polymorphism, you don't really need the null state, and you
> don't appear to need the lazy initialization. You can just keep a
> DrawingMonitor member, then you don't need to worry about creating it in
> DrawingAreaProxyImpl::dispatchAfterEnsuringDrawing or checking that it's not
> null.

This is only used by tests, so when no running tests, it's better to not have that monitor thing created, that's why I create it on demand.
Comment 5 Carlos Garcia Campos 2016-08-05 22:57:46 PDT
Comment on attachment 285207 [details]
Speculative fix

cq- for now
Comment 6 Michael Catanzaro 2016-08-06 07:43:14 PDT
Zan :)

(In reply to comment #4)
> This is only used by tests, so when no running tests, it's better to not
> have that monitor thing created, that's why I create it on demand.

I'd add a comment to that effect if you don't want someone "optimizing" away the pointer in the future.
Comment 7 Zan Dobersek 2016-08-07 23:09:21 PDT
Created attachment 285558 [details]
Simplified test case

Just sets up a 400px-wide column that enforces a scrollbar.

Reproduces the scrollbar issue in MiniBrowser. It's easier to observe it if a sleep() call is placed into the CoordinatedLayerTreeHost timer callback, along with either using mock scrollbars or exporting GTK_OVERLAY_SCOLLING=0.
Comment 8 Zan Dobersek 2016-08-08 03:02:23 PDT
OK, the issue is that when loading the test case in MiniBrowser, ThreadedCompositor::didChangeVisibleRect() dispatches the setVisibleContentsRect() call that ends up in CompositingCoordinator. Since we're compositing the scrollbars as well, this visible contents rect needs to encompass the complete width of the view, but that's not happening.

In case of mock scrollbars, the rightmost 15px are clipped from this rect, but that doesn't prevent the scrollbar overlay layers to be flushed and rendered. What does happen is that during tile creation in the backing store the tiles that would normally intersect the visible rect of the view (if it were spanning over the whole actual visible area) are sorted by distance to the visible rect.

The top of the two tiles used for the scrollbar is closer to the visible rect, so that gets created and filled in first.  The second tile is stored as pending for creation, and does get rendered at the point of the next layer flush.

None of this would be a problem if the setVisibleContentsRect() was passed a proper rect that would cover the whole view.
Comment 9 Carlos Garcia Campos 2016-08-25 05:51:39 PDT
Created attachment 286961 [details]
Patch

This should be the proper fix, I think. At least it seems to work. Thanks for the explanation Zan, I've copied it to the changelog entry :-)
Comment 10 Carlos Garcia Campos 2016-08-25 08:41:10 PDT
Committed r204961: <http://trac.webkit.org/changeset/204961>
Comment 11 Michael Catanzaro 2016-08-25 17:00:37 PDT
Didn't work, we had more builds after this landed with huge number of flakes:

https://build.webkit.org/builders/GTK%20Linux%2064-bit%20Release%20(Tests)/builds/17873
https://build.webkit.org/builders/GTK%20Linux%2064-bit%20Release%20(Tests)/builds/17874
https://build.webkit.org/builders/GTK%20Linux%2064-bit%20Release%20(Tests)/builds/17877

We need to investigate the 40 unexpected passes in 17877; that is definitely related to the flakes and fallout from threaded compositor.
Comment 12 Carlos Garcia Campos 2016-08-25 23:28:18 PDT
(In reply to comment #11)
> Didn't work, we had more builds after this landed with huge number of flakes:

No, it worked, it's just that this was not the only cause of flakiness, if you look at the results, there isn't any case in which we don't render the full scrollbar, what the patch fixed. Now it seems we have problems with borders, and also with scrollbars that sometimes are shown (the whole scrollbar) when they shouldn't and the other way around. We need to investigate why.

> https://build.webkit.org/builders/GTK%20Linux%2064-bit%20Release%20(Tests)/
> builds/17873
> https://build.webkit.org/builders/GTK%20Linux%2064-bit%20Release%20(Tests)/
> builds/17874
> https://build.webkit.org/builders/GTK%20Linux%2064-bit%20Release%20(Tests)/
> builds/17877
> 
> We need to investigate the 40 unexpected passes in 17877; that is definitely
> related to the flakes and fallout from threaded compositor.

Yes.