REOPENED 106884
[CoordinatedGraphics] Deadlock when running abspos-child-container-changes-from-relative-to-static.html
https://bugs.webkit.org/show_bug.cgi?id=106884
Summary [CoordinatedGraphics] Deadlock when running abspos-child-container-changes-fr...
Dominik Röttsches (drott)
Reported 2013-01-15 01:22:09 PST
When running $ run-webkit-tests -2 --efl --release fast/block on my Ubuntu 12.04 Installation, on a Dell XPS 3800 with GL configuration as follows: OpenGL vendor string: Intel Open Source Technology Center OpenGL renderer string: Mesa DRI Intel(R) Sandybridge Desktop OpenGL version string: 3.0 Mesa 9.0 OpenGL shading language version string: 1.30 Tests start to stall after about 8 seconds, or 320 test cases. And I start to see "Timed out waiting for repaint" in the test runner dumps of the single runs.
Attachments
Patch (2.19 KB, patch)
2013-01-15 06:18 PST, Dominik Röttsches (drott)
no flags
Patch v2, updated bug title. (2.18 KB, patch)
2013-01-15 06:38 PST, Dominik Röttsches (drott)
no flags
Patch v3, grammar in ChangeLog fixed. (2.19 KB, patch)
2013-01-15 07:44 PST, Dominik Röttsches (drott)
no flags
Dominik Röttsches (drott)
Comment 1 2013-01-15 01:43:04 PST
I should mention, this is with subpixel layout enabled. Will try without.
Dominik Röttsches (drott)
Comment 2 2013-01-15 04:19:27 PST
(In reply to comment #1) > I should mention, this is with subpixel layout enabled. Will try without. That doesn't make a difference, current trunk shows the same issues, in Debug as well as Release. That's on an Ubuntu 12.10 64bit system.
Dominik Röttsches (drott)
Comment 3 2013-01-15 05:02:53 PST
Mhm, much easier, just: $ run-webkit-tests -2 --efl --debug fast/block/abspos-child-container-changes-from-relative-to-static.html fails to repaint for me.
Dominik Röttsches (drott)
Comment 4 2013-01-15 05:52:53 PST
When running fast/block/abspos-child-container-changes-from-relative-to-static.html WebKitTestRunner forces a repaint on the WebProcess but we end up in a deadlock and WTR detects a repaint failure in dumpPixelsAndCompareWithExpected. As far as I can see it, this happens because, first CoordinatedLayerTreeHost::forceRepaintAsync is called, which then calls scheduleLayerFlush() - after this timer fires: performScheduledLayerFlush() is called, but here: m_waitingForUIProcess is true and the method returns early, while the UIProcess is waiting for the WebProcess.
Dominik Röttsches (drott)
Comment 5 2013-01-15 06:18:04 PST
Chris Dumez
Comment 6 2013-01-15 06:25:22 PST
FYI, I could not reproduce the issue on my machine. I tried: Tools/Scripts/run-webkit-tests --debug --efl -2 fast/block/abspos-child-container-changes-from-relative-to-static.html
Dominik Röttsches (drott)
Comment 7 2013-01-15 06:29:30 PST
(In reply to comment #6) > FYI, I could not reproduce the issue on my machine. I tried: > Tools/Scripts/run-webkit-tests --debug --efl -2 fast/block/abspos-child-container-changes-from-relative-to-static.html It was 100% reproducible on my machine - maybe timing differences in Ubuntu 12.10 vs. 12.04.
Chris Dumez
Comment 8 2013-01-15 06:31:29 PST
I think we should use [CoordinatedGraphics] tag instead of [EFL]. Otherwise, the patch makes sense to me.
Dominik Röttsches (drott)
Comment 9 2013-01-15 06:38:57 PST
Created attachment 182755 [details] Patch v2, updated bug title.
Noam Rosenthal
Comment 10 2013-01-15 07:41:13 PST
Comment on attachment 182755 [details] Patch v2, updated bug title. This is fine with me but I don't get to review patches in WebKit2 anymore. Please find a WebKit2 owner.
Dominik Röttsches (drott)
Comment 11 2013-01-15 07:44:32 PST
Created attachment 182772 [details] Patch v3, grammar in ChangeLog fixed.
Thiago Marcos P. Santos
Comment 12 2013-01-15 08:04:45 PST
(In reply to comment #10) > (From update of attachment 182755 [details]) > This is fine with me but I don't get to review patches in WebKit2 anymore. Please find a WebKit2 owner. Doesn't it fit into the "trivial" category? http://lists.webkit.org/pipermail/webkit-dev/2013-January/023263.html
WebKit Review Bot
Comment 13 2013-01-15 08:34:51 PST
Comment on attachment 182772 [details] Patch v3, grammar in ChangeLog fixed. Attachment 182772 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/15841996 New failing tests: inspector-protocol/debugger-terminate-dedicated-worker-while-paused.html
Dominik Röttsches (drott)
Comment 14 2013-01-15 08:40:06 PST
(In reply to comment #13) > (From update of attachment 182772 [details]) > Attachment 182772 [details] did not pass chromium-ews (chromium-xvfb): > Output: http://queues.webkit.org/results/15841996 > > New failing tests: > inspector-protocol/debugger-terminate-dedicated-worker-while-paused.html I think that's an artifact, I am changing WebKit2 code that's not run by Chromium tests and the previous two EWS runs showed green.
Benjamin Poulain
Comment 15 2013-01-15 13:15:43 PST
Comment on attachment 182772 [details] Patch v3, grammar in ChangeLog fixed. Sure, if No'am says okay :)
Dominik Röttsches (drott)
Comment 16 2013-01-15 13:21:21 PST
(In reply to comment #15) > (From update of attachment 182772 [details]) > Sure, if No'am says okay :) Thanks, Benjamin.
WebKit Review Bot
Comment 17 2013-01-15 13:41:53 PST
Comment on attachment 182772 [details] Patch v3, grammar in ChangeLog fixed. Clearing flags on attachment: 182772 Committed r139781: <http://trac.webkit.org/changeset/139781>
WebKit Review Bot
Comment 18 2013-01-15 13:41:59 PST
All reviewed patches have been landed. Closing bug.
Dominik Röttsches (drott)
Comment 19 2013-01-16 06:47:15 PST
I still needed http://cgit.freedesktop.org/xorg/xserver/patch/?id=65642ccb78aa2d4c4e17b9ac42e4ef625c4a6e8b to my xvfb package in order to be able to generate pixel results.
Chris Dumez
Comment 20 2013-01-16 06:56:43 PST
(In reply to comment #19) > I still needed http://cgit.freedesktop.org/xorg/xserver/patch/?id=65642ccb78aa2d4c4e17b9ac42e4ef625c4a6e8b to my xvfb package in order to be able to generate pixel results. Has this bug been reported against Ubuntu so that they consider applying the patch to their package?
Dominik Röttsches (drott)
Comment 21 2013-01-16 07:06:06 PST
(In reply to comment #20) > (In reply to comment #19) > > I still needed http://cgit.freedesktop.org/xorg/xserver/patch/?id=65642ccb78aa2d4c4e17b9ac42e4ef625c4a6e8b to my xvfb package in order to be able to generate pixel results. > > Has this bug been reported against Ubuntu so that they consider applying the patch to their package? Yes, just now, https://bugs.launchpad.net/ubuntu/+source/xorg-server/+bug/1100312
Rafael Brandao
Comment 22 2013-01-18 10:46:10 PST
(In reply to comment #4) > When running fast/block/abspos-child-container-changes-from-relative-to-static.html WebKitTestRunner forces a repaint on the WebProcess but we end up in a deadlock and WTR detects a repaint failure in dumpPixelsAndCompareWithExpected. > > As far as I can see it, this happens because, first > CoordinatedLayerTreeHost::forceRepaintAsync is called, > which then calls scheduleLayerFlush() - after this timer fires: > performScheduledLayerFlush() is called, but here: > m_waitingForUIProcess is true and the method returns early, while the UIProcess is waiting for the WebProcess. This timeout was happening on other ports like Qt?
Rafael Brandao
Comment 23 2013-01-18 11:58:00 PST
I believe this is a wrong fix for a EFL specific problem. UIProcess can expect the callback to arrive at some time, but it should never really wait for it (on CoordinatedGraphics). UIProcess has to keep up with its main loop so things like "viewNeedsDisplay" will be processed and then unlock the WebProcess to do the render of next frame. Once the next frame is rendered and the flush happens on WebProcess side, then the callback will be sent back to UIProcess. This is the only way to guarantee that the rendered frame is the one after the forceRepaint was requested. We've been observing some flakyness on our API tests after this patch and then I've observed TestController::platformRunUntil of EFL: if (timeout == m_noTimeout) { // Never timeout if we are debugging or not meant to timeout. while (!condition) { ecore_main_loop_iterate(); sleep(1); } return; } This sleep seems to be the source of the problem. You should keep up the main loop running in a non-blocking way, like Qt does. Could we rollout this change or make the proper fix soon?
Noam Rosenthal
Comment 24 2013-01-18 12:07:34 PST
I think you are right. Dominik?
Dominik Röttsches (drott)
Comment 25 2013-01-21 02:14:52 PST
(In reply to comment #24) > I think you are right. > Dominik? Agree that it has a different root cause. I can look into rolling out and doing an EFL fix later on today - thanks Rafael for pointing this out. If you wanna roll out earlier, feel free to go ahead and CC me on the rollout patch.
Chris Dumez
Comment 26 2013-01-21 05:50:19 PST
(In reply to comment #23) > I believe this is a wrong fix for a EFL specific problem. UIProcess can expect the callback to arrive at some time, but it should never really wait for it (on CoordinatedGraphics). UIProcess has to keep up with its main loop so things like "viewNeedsDisplay" will be processed and then unlock the WebProcess to do the render of next frame. Once the next frame is rendered and the flush happens on WebProcess side, then the callback will be sent back to UIProcess. This is the only way to guarantee that the rendered frame is the one after the forceRepaint was requested. > > We've been observing some flakyness on our API tests after this patch and then I've observed TestController::platformRunUntil of EFL: > > if (timeout == m_noTimeout) { > // Never timeout if we are debugging or not meant to timeout. > while (!condition) { > ecore_main_loop_iterate(); > sleep(1); > } > return; > } > > This sleep seems to be the source of the problem. You should keep up the main loop running in a non-blocking way, like Qt does. Could we rollout this change or make the proper fix soon? I'm removing the sleep() instruction in Bug 107442. I would like this fix to land before we roll out.
WebKit Review Bot
Comment 27 2013-01-21 06:06:11 PST
Re-opened since this is blocked by bug 107443
Jocelyn Turcotte
Comment 28 2013-01-21 06:23:29 PST
(In reply to comment #23) > This sleep seems to be the source of the problem. You should keep up the main loop running in a non-blocking way, like Qt does. Could we rollout this change or make the proper fix soon? Very nice catch!
Chris Dumez
Comment 29 2013-01-21 06:24:14 PST
Closing since it should be fixed by Bug 107442 (dependency).
Note You need to log in before you can comment on or make changes to this bug.