161242 – [GTK][Threaded Compositor] Several flaky tests

Carlos Garcia Campos

Reported 2016-08-26 06:53:25 PDT

Even after fixing bug #160450 we still have a lot of laky tests sometimes in the bots. We no longer see the problem of scrollbars partially rendered, but we see scrollbars in some cases where they shouldn't be there at all (the contents are not bigger than the view size). That made me think that maybe there are tests leaving the view in an inconsistent state, most of the flaky tests are ref tests and all run by the same worker. But I tried to run all the tests using a single worker and I couldn't reproduce the problem so there must be something else. The other thing I can think of is that maybe in some cases the bot is more overloaded and a particular worker has less priority and things happen slower. We are still assuming that scheduling the task in the next loop iteration in the UI process after a force redraw ensures a redraw. The patch I initially posted in bug #160450 as a workaround tried to fix that assumption. I'm not sure that's the problem though, but we could try that patch and roll it out if it doesn't work. I'm a bit lost with this, because it only happens sometimes in the bots and I've never been able to reproduce it locally in any of machines I've tried.

Michael Catanzaro

Comment 1 2016-08-26 08:21:20 PDT

Doesn't hurt to try.

Carlos Alberto Lopez Perez

Comment 2 2016-08-26 10:19:58 PDT

(In reply to comment #0) > Even after fixing bug #160450 we still have a lot of laky tests sometimes in > the bots. We no longer see the problem of scrollbars partially rendered, but > we see scrollbars in some cases where they shouldn't be there at all (the > contents are not bigger than the view size). That made me think that maybe > there are tests leaving the view in an inconsistent state, most of the flaky > tests are ref tests and all run by the same worker. But I tried to run all > the tests using a single worker and I couldn't reproduce the problem so > there must be something else. The other thing I can think of is that maybe > in some cases the bot is more overloaded and a particular worker has less > priority and things happen slower. We are still assuming that scheduling the > task in the next loop iteration in the UI process after a force redraw > ensures a redraw. The patch I initially posted in bug #160450 as a > workaround tried to fix that assumption. I'm not sure that's the problem > though, but we could try that patch and roll it out if it doesn't work. I'm > a bit lost with this, because it only happens sometimes in the bots and I've > never been able to reproduce it locally in any of machines I've tried. I think you will be able to reproduce this if you run the whole test suite instead a subset of it. And this is something that I believe was happening before the threaded compositor with other kind of tests. I have found many tests that are flaky when the whole test suite is run, but if you run then manually or you run a subset of the test suite (for example: run only the fast/ tests). Then its not longer reproducible. I'm not sure if the problem is on webkit itself or is in the tooling and the way the tests are ran.

Carlos Garcia Campos

Comment 3 2016-08-26 23:42:33 PDT

(In reply to comment #2) > (In reply to comment #0) > > Even after fixing bug #160450 we still have a lot of laky tests sometimes in > > the bots. We no longer see the problem of scrollbars partially rendered, but > > we see scrollbars in some cases where they shouldn't be there at all (the > > contents are not bigger than the view size). That made me think that maybe > > there are tests leaving the view in an inconsistent state, most of the flaky > > tests are ref tests and all run by the same worker. But I tried to run all > > the tests using a single worker and I couldn't reproduce the problem so > > there must be something else. The other thing I can think of is that maybe > > in some cases the bot is more overloaded and a particular worker has less > > priority and things happen slower. We are still assuming that scheduling the > > task in the next loop iteration in the UI process after a force redraw > > ensures a redraw. The patch I initially posted in bug #160450 as a > > workaround tried to fix that assumption. I'm not sure that's the problem > > though, but we could try that patch and roll it out if it doesn't work. I'm > > a bit lost with this, because it only happens sometimes in the bots and I've > > never been able to reproduce it locally in any of machines I've tried. > > I think you will be able to reproduce this if you run the whole test suite > instead a subset of it. I did run all the tests using a single worker. > And this is something that I believe was happening before the threaded > compositor with other kind of tests. > > I have found many tests that are flaky when the whole test suite is run, but > if you run then manually or you run a subset of the test suite (for example: > run only the fast/ tests). Then its not longer reproducible. > > I'm not sure if the problem is on webkit itself or is in the tooling and the > way the tests are ran. I'm pretty sure the problem is in the tests infrastructure.

Carlos Garcia Campos

Comment 4 2016-08-27 02:14:32 PDT

(In reply to comment #1) > Doesn't hurt to try. Yes, I don't think it will fix it, to be honest, but actually ensuring a drawing in dispatchAfterEnsuringDrawing instead of assuming it will happen in the next run loop iteration makes sense in any case. So, let's see if it helps.

Carlos Garcia Campos

Comment 5 2016-08-27 02:20:53 PDT

Committed r205075: <http://trac.webkit.org/changeset/205075>

Michael Catanzaro

Comment 6 2016-08-27 17:58:43 PDT

(In reply to comment #4) > Yes, I don't think it will fix it, to be honest Right as usual. :( > but actually ensuring a > drawing in dispatchAfterEnsuringDrawing instead of assuming it will happen > in the next run loop iteration makes sense in any case. Agreed.

Michael Catanzaro

Comment 7 2016-08-27 17:59:14 PDT

(Reopening.)

Carlos Garcia Campos

Comment 8 2016-08-28 03:38:01 PDT

(In reply to comment #6) > (In reply to comment #4) > > Yes, I don't think it will fix it, to be honest > > Right as usual. :( However, it seems the number of flaky tests has been reduced a bit, and what is more important, the errors we see now are different too. Before the patch, there were tests in which we were showing scrollbars when we shouldn't. Now what I see is that contents are blurry and/or clipped, and in many of the cases both the expected and the actual rendering are wrong. So, I think there's a condition somewhere that can leave a different page scale or device scale factor in the web view that affects other tests. > > but actually ensuring a > > drawing in dispatchAfterEnsuringDrawing instead of assuming it will happen > > in the next run loop iteration makes sense in any case. > > Agreed.

Michael Catanzaro

Comment 9 2016-08-28 12:39:52 PDT

(In reply to comment #8) > However, it seems the number of flaky tests has been reduced a bit It was probably just luck: https://build.webkit.org/builders/GTK%20Linux%2064-bit%20Release%20%28Tests%29/builds/17930

Zan Dobersek

Comment 10 2016-08-28 23:43:09 PDT

Would it be possible to test on the bots without using the software rasterizer? The clipping issues look as if the resizing of the underlying pixmap is not yet complete, or as if just a part of it is rendered.

Carlos Garcia Campos

Comment 11 2016-08-29 01:21:31 PDT

(In reply to comment #10) > Would it be possible to test on the bots without using the software > rasterizer? Yes, please, swrast doesn't work with wayland either, because of eglCreateImage being dummy when using the software rasterizer. > The clipping issues look as if the resizing of the underlying pixmap is not > yet complete, or as if just a part of it is rendered. Why are we using the software path, to not depend on the bot drivers? Maybe that's why I could never reproduce those issues. And that would explain why all this started when switched to the threaded compositor.

Carlos Alberto Lopez Perez

Comment 12 2016-08-30 06:31:09 PDT

(In reply to comment #11) > (In reply to comment #10) > > Would it be possible to test on the bots without using the software > > rasterizer? > > Yes, please, swrast doesn't work with wayland either, because of > eglCreateImage being dummy when using the software rasterizer. > > > The clipping issues look as if the resizing of the underlying pixmap is not > > yet complete, or as if just a part of it is rendered. > > Why are we using the software path, to not depend on the bot drivers? Maybe > that's why I could never reproduce those issues. And that would explain why > all this started when switched to the threaded compositor. We use swrast because its the best way of ensuring that developers can get consistent results for the layout tests. Otherwise it will be a nightmare. Think in developers having nvidia cards, or intel ones. With different versions of the drivers, etc. I have tested the following: 1) Reproduce the problem. I was able to do this at the first try. The trick is run the whole test suite using the same workers as number of CPUs, which is the default. So if you want to reproduce this, don't force any number of workers and simply tun the full test suite. I did this: $ Tools/Scripts/run-webkit-tests --no-show-results --no-new-test-results --no-sample-on-timeout --results-directory layout-test-results-xvfb-swrast --debug-rwt-logging --release --gtk And I got this: https://people.igalia.com/clopez/wkbug/161242/xresults/layout-test-results-xvfb-swrast/results.html From the stdout.log, you see that I got 122 unexpected image-only flakiness (122): https://people.igalia.com/clopez/wkbug/161242/xresults/layout-test-results-xvfb-swrast/ Now I did the same but using my native x11 display with the opengl nvidia drivers. To do that, simply export the variable USE_NATIVE_XDISPLAY=1 before running the tests. I did several runs of this, storing each run in a different directory. This are the results: -> Run 1: Unexpected flakiness: image-only failures (159) https://people.igalia.com/clopez/wkbug/161242/xresults/layout-test-results-xvfb-xdisplay1/results.html -> Run 2: Unexpected flakiness: image-only failures (25) https://people.igalia.com/clopez/wkbug/161242/xresults/layout-test-results-xvfb-xdisplay2/results.html -> Run 3: Unexpected flakiness: image-only failures (22) https://people.igalia.com/clopez/wkbug/161242/xresults/layout-test-results-xvfb-xdisplay3/results.html So, my conclusion is that is not caused by swrast. I can reproduce the flakiness also with the nvidia driver. BTW, Check this results: - swrast: https://people.igalia.com/clopez/wkbug/161242/xresults/layout-test-results-xvfb-swrast/fast/flexbox/auto-height-with-flex-diffs.html - nvidia run 1: https://people.igalia.com/clopez/wkbug/161242/xresults/layout-test-results-xvfb-xdisplay1/fast/flexbox/auto-height-with-flex-diffs.html - nvidia run 2: https://people.igalia.com/clopez/wkbug/161242/xresults/layout-test-results-xvfb-xdisplay2/fast/flexbox/auto-height-with-flex-diffs.html As you can see. Both the expected and the actual images are different on each one of the three cases. On the nvidia run 2, the expected result didn't even reached the point to render anything than a blank viewport. I think that using swrast instead of hardware-backed opengl is not the problem here.

Carlos Garcia Campos

Comment 13 2016-08-30 06:47:28 PDT

Thanks for the so detailed analysis, Carlos. So, the results are the same than in the bots, both the expected and actual files are incorrectly rendered and scale factors seem to have something to do there. Maybe there's a condition (test timing out or crashing) causing that any of the scale factors is note correctly reset between tests. The device scale factor is changed from the UI process, while all other are changed in the web process on every BeginTest message.

Carlos Garcia Campos

Comment 14 2016-09-01 05:06:43 PDT

Finally found the issue. The problem is the viewport controller, that is changed by the fixed layout tests, but never reset. There's also a problem with the fixed layout tests that could leave the view in an inconsistent state as I explained in bug #160117. I'm already working on a proper fix, but in the meantime I've skipped fixed layout tests (they are only two) to make sure we don't see all those flaky tests in the bots anymore. To easily reproduce it, launch run-webkit-tests with a single worker job and run fixed layout test first and then other ref tests, for example: $ run-webkit-tests --gtk --release --child-processes=1 fast/fixed-layout/fixed-layout.html fast/regions

Michael Catanzaro

Comment 15 2016-09-01 10:33:52 PDT

(In reply to comment #14) > Finally found the issue. The problem is the viewport controller, that is > changed by the fixed layout tests, but never reset. There's also a problem > with the fixed layout tests that could leave the view in an inconsistent > state as I explained in bug #160117. I'm already working on a proper fix, > but in the meantime I've skipped fixed layout tests (they are only two) to > make sure we don't see all those flaky tests in the bots anymore. To easily > reproduce it, launch run-webkit-tests with a single worker job and run fixed > layout test first and then other ref tests, for example: > > $ run-webkit-tests --gtk --release --child-processes=1 > fast/fixed-layout/fixed-layout.html fast/regions It's clearly a big improvement, good job. But it's still not completely fixed. We have 50 unexpected passes in build #18009, for example. These are tests that normally always fail, but occasionally pass now that we've enabled threaded compositor.

Carlos Garcia Campos

Comment 16 2016-09-01 22:40:04 PDT

(In reply to comment #15) > (In reply to comment #14) > > Finally found the issue. The problem is the viewport controller, that is > > changed by the fixed layout tests, but never reset. There's also a problem > > with the fixed layout tests that could leave the view in an inconsistent > > state as I explained in bug #160117. I'm already working on a proper fix, > > but in the meantime I've skipped fixed layout tests (they are only two) to > > make sure we don't see all those flaky tests in the bots anymore. To easily > > reproduce it, launch run-webkit-tests with a single worker job and run fixed > > layout test first and then other ref tests, for example: > > > > $ run-webkit-tests --gtk --release --child-processes=1 > > fast/fixed-layout/fixed-layout.html fast/regions > > It's clearly a big improvement, good job. > > But it's still not completely fixed. We have 50 unexpected passes in build > #18009, for example. These are tests that normally always fail, but > occasionally pass now that we've enabled threaded compositor. You are assuming again that there's a single issue. This particular problem that was making both the expected and actual rendering of reftest fail, was caused by the fixed layout (it changes the viewport controller state that is never reset). Of course there are more issues, but I think the situation now is mostly the same to what we had before switching to the threaded compositor.

Michael Catanzaro

Comment 17 2016-09-02 04:52:41 PDT

(In reply to comment #16) > You are assuming again that there's a single issue. No, clearly there were many different issues, as you've fixed three different issues so far.... > I think the situation now > is mostly the same to what we had before switching to the threaded > compositor. Yes, except for the fact that sometimes a bunch of tests pass when they usually fail. In build #18023 we have 84 unexpected passes; that never happened before. What's interesting is these tests either always pass or always fail in a particular run of run-webkit-tests; they are clearly flaky, but they don't contribute to the flakiness count.

Carlos Garcia Campos

Comment 18 2016-09-02 06:25:48 PDT

(In reply to comment #17) > (In reply to comment #16) > > You are assuming again that there's a single issue. > > No, clearly there were many different issues, as you've fixed three > different issues so far.... > > > I think the situation now > > is mostly the same to what we had before switching to the threaded > > compositor. > > Yes, except for the fact that sometimes a bunch of tests pass when they > usually fail. In build #18023 we have 84 unexpected passes; that never > happened before. What's interesting is these tests either always pass or > always fail in a particular run of run-webkit-tests; they are clearly flaky, > but they don't contribute to the flakiness count. We should probably mark them as pass, to see what's wrong when they fail

Carlos Alberto Lopez Perez

Comment 19 2016-09-02 08:32:44 PDT

(In reply to comment #18) > (In reply to comment #17) > > (In reply to comment #16) > > > You are assuming again that there's a single issue. > > > > No, clearly there were many different issues, as you've fixed three > > different issues so far.... > > > > > I think the situation now > > > is mostly the same to what we had before switching to the threaded > > > compositor. > > > > Yes, except for the fact that sometimes a bunch of tests pass when they > > usually fail. In build #18023 we have 84 unexpected passes; that never > > happened before. What's interesting is these tests either always pass or > > always fail in a particular run of run-webkit-tests; they are clearly flaky, > > but they don't contribute to the flakiness count. > > We should probably mark them as pass, to see what's wrong when they fail That don't makes much sense to me. You can see the failure diff even if they are expected failures. You just have to navigate under https://build.webkit.org/results/GTK%20Linux%2064-bit%20Release%20%28Tests%29/ to find them. For example: 1) Check the unexpected passes at end of build #18023 https://build.webkit.org/builders/GTK%20Linux%2064-bit%20Release%20%28Tests%29/builds/18023/steps/layout-test/logs/stdio and pick 1 2) I pick: imported/mozilla/svg/blend-difference-stacking.html 3) Check that on the next build that test failed. Log of the next build: https://build.webkit.org/builders/GTK%20Linux%2064-bit%20Release%20%28Tests%29/builds/18024/steps/layout-test/logs/stdio <--- yes it failed 4) Go to https://build.webkit.org/results/GTK%20Linux%2064-bit%20Release%20%28Tests%29/ and find the results for build #18204 -> https://build.webkit.org/results/GTK%20Linux%2064-bit%20Release%20%28Tests%29/r205335%20%2818024%29/ 5) Now under that folder navigate to imported/mozilla/svg/ and look for the blend-difference-stacking diff 6) Here you have it: https://build.webkit.org/results/GTK%20Linux%2064-bit%20Release%20%28Tests%29/r205335%20%2818024%29/imported/mozilla/svg/blend-difference-stacking-diffs.html

Carlos Garcia Campos

Comment 20 2016-09-02 22:44:41 PDT

(In reply to comment #19) > (In reply to comment #18) > > (In reply to comment #17) > > > (In reply to comment #16) > > > > You are assuming again that there's a single issue. > > > > > > No, clearly there were many different issues, as you've fixed three > > > different issues so far.... > > > > > > > I think the situation now > > > > is mostly the same to what we had before switching to the threaded > > > > compositor. > > > > > > Yes, except for the fact that sometimes a bunch of tests pass when they > > > usually fail. In build #18023 we have 84 unexpected passes; that never > > > happened before. What's interesting is these tests either always pass or > > > always fail in a particular run of run-webkit-tests; they are clearly flaky, > > > but they don't contribute to the flakiness count. > > > > We should probably mark them as pass, to see what's wrong when they fail > > That don't makes much sense to me. > > You can see the failure diff even if they are expected failures. You just > have to navigate under > https://build.webkit.org/results/GTK%20Linux%2064- > bit%20Release%20%28Tests%29/ to find them. > > For example: > > 1) Check the unexpected passes at end of build #18023 > https://build.webkit.org/builders/GTK%20Linux%2064- > bit%20Release%20%28Tests%29/builds/18023/steps/layout-test/logs/stdio and > pick 1 > > 2) I pick: imported/mozilla/svg/blend-difference-stacking.html > > 3) Check that on the next build that test failed. Log of the next build: > https://build.webkit.org/builders/GTK%20Linux%2064- > bit%20Release%20%28Tests%29/builds/18024/steps/layout-test/logs/stdio <--- > yes it failed > > 4) Go to > https://build.webkit.org/results/GTK%20Linux%2064- > bit%20Release%20%28Tests%29/ and find the results for build #18204 -> > https://build.webkit.org/results/GTK%20Linux%2064- > bit%20Release%20%28Tests%29/r205335%20%2818024%29/ > > 5) Now under that folder navigate to imported/mozilla/svg/ and look for the > blend-difference-stacking diff > > 6) Here you have it: > https://build.webkit.org/results/GTK%20Linux%2064- > bit%20Release%20%28Tests%29/r205335%20%2818024%29/imported/mozilla/svg/blend- > difference-stacking-diffs.html hmm, well, not very convenient, but still. I have theory, though. When they unexpectedly pass, they don't actually pass, we just fail to render both the actual and expected files in the same way (for example a fully white image in both cases). That's a problem of the reftests. So, more interesting to see what's failing, which can be also reproduced locally more easily, would be to see what we render when they pass.

Carlos Garcia Campos

Comment 21 2016-09-06 03:35:41 PDT

Created attachment 288009 [details] Patch

Michael Catanzaro

Comment 22 2016-09-06 07:55:40 PDT

Comment on attachment 288009 [details] Patch (wrong bug)

Carlos Garcia Campos

Comment 23 2016-09-06 08:41:16 PDT

Created attachment 288025 [details] Patch

Michael Catanzaro

Comment 24 2016-09-06 09:32:54 PDT

Comment on attachment 288025 [details] Patch r=me for the CoordinatedGraphics changes. I'll let you decide whether or not to wait for an owner on the iOS code you moved; it looks clearly fine to me, though.

Carlos Garcia Campos

Comment 25 2016-09-06 23:43:08 PDT

Committed r205537: <http://trac.webkit.org/changeset/205537>

Michael Catanzaro

Comment 26 2016-09-07 05:26:20 PDT

Reopening since we still have one bug causing flakiness left (per comment #17)

Carlos Garcia Campos

Comment 27 2016-09-07 06:09:20 PDT

Please no. Use new bug reports for remaining flaky tests. We have always had flaky tests, so remaining ones might not be related to threaded compositor.

Michael Catanzaro

Comment 28 2016-09-07 08:31:14 PDT

(In reply to comment #27) > Please no. Use new bug reports for remaining flaky tests. We have always had > flaky tests, so remaining ones might not be related to threaded compositor. Bug #161688. They are almost certainly related to threaded compositor; we've never had more than maybe 10 flaky tests appear in unexpected passes until threaded compositor, now we have dozens.

Attachments
Patch (2.75 KB, patch) 2016-09-06 03:35 PDT, Carlos Garcia Campos	no flags	Details Formatted Diff Diff
Patch (8.72 KB, patch) 2016-09-06 08:41 PDT, Carlos Garcia Campos	mcatanzaro: review+	Details Formatted Diff Diff
Show Obsolete (1) View All Add attachment proposed patch, testcase, etc.