Bug 106884

Summary: [CoordinatedGraphics] Deadlock when running abspos-child-container-changes-from-relative-to-static.html
Product: WebKit Reporter: Dominik Röttsches (drott) <d-r>
Component: Tools / TestsAssignee: Nobody <webkit-unassigned>
Status: REOPENED ---    
Severity: Normal CC: andersca, benjamin, cdumez, cmarcelo, dglazkov, jturcotte, kalyan.kondapally, kenneth, laszlo.gombos, mikhail.pozdnyakov, noam, ostap73, rafael.lobo, sam, tmpsantos, webkit.review.bot, yael, zeno
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 107442, 107443    
Bug Blocks:    
Attachments:
Description Flags
Patch
none
Patch v2, updated bug title.
none
Patch v3, grammar in ChangeLog fixed. none

Description Dominik Röttsches (drott) 2013-01-15 01:22:09 PST
When running
$ run-webkit-tests -2 --efl  --release fast/block
on my Ubuntu 12.04 Installation, on a Dell XPS 3800
with GL configuration as follows:
OpenGL vendor string: Intel Open Source Technology Center
OpenGL renderer string: Mesa DRI Intel(R) Sandybridge Desktop 
OpenGL version string: 3.0 Mesa 9.0
OpenGL shading language version string: 1.30

Tests start to stall after about 8 seconds, or 320 test cases. And I start to see "Timed out waiting for repaint" in the test runner dumps of the single runs.
Comment 1 Dominik Röttsches (drott) 2013-01-15 01:43:04 PST
I should mention, this is with subpixel layout enabled. Will try without.
Comment 2 Dominik Röttsches (drott) 2013-01-15 04:19:27 PST
(In reply to comment #1)
> I should mention, this is with subpixel layout enabled. Will try without.

That doesn't make a difference, current trunk shows the same issues, in Debug as well as Release. That's on an Ubuntu 12.10 64bit system.
Comment 3 Dominik Röttsches (drott) 2013-01-15 05:02:53 PST
Mhm, much easier, just:

$ run-webkit-tests -2 --efl  --debug fast/block/abspos-child-container-changes-from-relative-to-static.html

fails to repaint for me.
Comment 4 Dominik Röttsches (drott) 2013-01-15 05:52:53 PST
When running fast/block/abspos-child-container-changes-from-relative-to-static.html WebKitTestRunner forces a repaint on the WebProcess but we end up in a deadlock and WTR detects a repaint failure in dumpPixelsAndCompareWithExpected.

As far as I can see it, this happens because, first
CoordinatedLayerTreeHost::forceRepaintAsync is called,
which then calls scheduleLayerFlush() - after this timer fires:
performScheduledLayerFlush() is called, but here:
m_waitingForUIProcess is true and the method returns early, while the UIProcess is waiting for the WebProcess.
Comment 5 Dominik Röttsches (drott) 2013-01-15 06:18:04 PST
Created attachment 182751 [details]
Patch
Comment 6 Chris Dumez 2013-01-15 06:25:22 PST
FYI, I could not reproduce the issue on my machine. I tried:
Tools/Scripts/run-webkit-tests --debug --efl -2 fast/block/abspos-child-container-changes-from-relative-to-static.html
Comment 7 Dominik Röttsches (drott) 2013-01-15 06:29:30 PST
(In reply to comment #6)
> FYI, I could not reproduce the issue on my machine. I tried:
> Tools/Scripts/run-webkit-tests --debug --efl -2 fast/block/abspos-child-container-changes-from-relative-to-static.html

It was 100% reproducible on my machine - maybe timing differences in Ubuntu 12.10 vs. 12.04.
Comment 8 Chris Dumez 2013-01-15 06:31:29 PST
I think we should use [CoordinatedGraphics] tag instead of [EFL]. Otherwise, the patch makes sense to me.
Comment 9 Dominik Röttsches (drott) 2013-01-15 06:38:57 PST
Created attachment 182755 [details]
Patch v2, updated bug title.
Comment 10 Noam Rosenthal 2013-01-15 07:41:13 PST
Comment on attachment 182755 [details]
Patch v2, updated bug title.

This is fine with me but I don't get to review patches in WebKit2 anymore. Please find a WebKit2 owner.
Comment 11 Dominik Röttsches (drott) 2013-01-15 07:44:32 PST
Created attachment 182772 [details]
Patch v3, grammar in ChangeLog fixed.
Comment 12 Thiago Marcos P. Santos 2013-01-15 08:04:45 PST
(In reply to comment #10)
> (From update of attachment 182755 [details])
> This is fine with me but I don't get to review patches in WebKit2 anymore. Please find a WebKit2 owner.

Doesn't it fit into the "trivial" category?

http://lists.webkit.org/pipermail/webkit-dev/2013-January/023263.html
Comment 13 WebKit Review Bot 2013-01-15 08:34:51 PST
Comment on attachment 182772 [details]
Patch v3, grammar in ChangeLog fixed.

Attachment 182772 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/15841996

New failing tests:
inspector-protocol/debugger-terminate-dedicated-worker-while-paused.html
Comment 14 Dominik Röttsches (drott) 2013-01-15 08:40:06 PST
(In reply to comment #13)
> (From update of attachment 182772 [details])
> Attachment 182772 [details] did not pass chromium-ews (chromium-xvfb):
> Output: http://queues.webkit.org/results/15841996
> 
> New failing tests:
> inspector-protocol/debugger-terminate-dedicated-worker-while-paused.html

I think that's an artifact, I am changing WebKit2 code that's not run by Chromium tests and the previous two EWS runs showed green.
Comment 15 Benjamin Poulain 2013-01-15 13:15:43 PST
Comment on attachment 182772 [details]
Patch v3, grammar in ChangeLog fixed.

Sure, if No'am says okay :)
Comment 16 Dominik Röttsches (drott) 2013-01-15 13:21:21 PST
(In reply to comment #15)
> (From update of attachment 182772 [details])
> Sure, if No'am says okay :)

Thanks, Benjamin.
Comment 17 WebKit Review Bot 2013-01-15 13:41:53 PST
Comment on attachment 182772 [details]
Patch v3, grammar in ChangeLog fixed.

Clearing flags on attachment: 182772

Committed r139781: <http://trac.webkit.org/changeset/139781>
Comment 18 WebKit Review Bot 2013-01-15 13:41:59 PST
All reviewed patches have been landed.  Closing bug.
Comment 19 Dominik Röttsches (drott) 2013-01-16 06:47:15 PST
I still needed http://cgit.freedesktop.org/xorg/xserver/patch/?id=65642ccb78aa2d4c4e17b9ac42e4ef625c4a6e8b to my xvfb package in order to be able to generate pixel results.
Comment 20 Chris Dumez 2013-01-16 06:56:43 PST
(In reply to comment #19)
> I still needed http://cgit.freedesktop.org/xorg/xserver/patch/?id=65642ccb78aa2d4c4e17b9ac42e4ef625c4a6e8b to my xvfb package in order to be able to generate pixel results.

Has this bug been reported against Ubuntu so that they consider applying the patch to their package?
Comment 21 Dominik Röttsches (drott) 2013-01-16 07:06:06 PST
(In reply to comment #20)
> (In reply to comment #19)
> > I still needed http://cgit.freedesktop.org/xorg/xserver/patch/?id=65642ccb78aa2d4c4e17b9ac42e4ef625c4a6e8b to my xvfb package in order to be able to generate pixel results.
> 
> Has this bug been reported against Ubuntu so that they consider applying the patch to their package?

Yes, just now, https://bugs.launchpad.net/ubuntu/+source/xorg-server/+bug/1100312
Comment 22 Rafael Brandao 2013-01-18 10:46:10 PST
(In reply to comment #4)
> When running fast/block/abspos-child-container-changes-from-relative-to-static.html WebKitTestRunner forces a repaint on the WebProcess but we end up in a deadlock and WTR detects a repaint failure in dumpPixelsAndCompareWithExpected.
> 
> As far as I can see it, this happens because, first
> CoordinatedLayerTreeHost::forceRepaintAsync is called,
> which then calls scheduleLayerFlush() - after this timer fires:
> performScheduledLayerFlush() is called, but here:
> m_waitingForUIProcess is true and the method returns early, while the UIProcess is waiting for the WebProcess.

This timeout was happening on other ports like Qt?
Comment 23 Rafael Brandao 2013-01-18 11:58:00 PST
I believe this is a wrong fix for a EFL specific problem. UIProcess can expect the callback to arrive at some time, but it should never really wait for it (on CoordinatedGraphics). UIProcess has to keep up with its main loop so things like "viewNeedsDisplay" will be processed and then unlock the WebProcess to do the render of next frame. Once the next frame is rendered and the flush happens on WebProcess side, then the callback will be sent back to UIProcess. This is the only way to guarantee that the rendered frame is the one after the forceRepaint was requested.

We've been observing some flakyness on our API tests after this patch and then I've observed TestController::platformRunUntil of EFL:

    if (timeout == m_noTimeout) {
        // Never timeout if we are debugging or not meant to timeout.
        while (!condition) {
            ecore_main_loop_iterate();
            sleep(1);
        }
        return;
    }

This sleep seems to be the source of the problem. You should keep up the main loop running in a non-blocking way, like Qt does. Could we rollout this change or make the proper fix soon?
Comment 24 Noam Rosenthal 2013-01-18 12:07:34 PST
I think you are right.
Dominik?
Comment 25 Dominik Röttsches (drott) 2013-01-21 02:14:52 PST
(In reply to comment #24)
> I think you are right.
> Dominik?

Agree that it has a different root cause. I can look into rolling out and doing an EFL fix later on today - thanks Rafael for pointing this out. If you wanna roll out earlier, feel free to go ahead and CC me on the rollout patch.
Comment 26 Chris Dumez 2013-01-21 05:50:19 PST
(In reply to comment #23)
> I believe this is a wrong fix for a EFL specific problem. UIProcess can expect the callback to arrive at some time, but it should never really wait for it (on CoordinatedGraphics). UIProcess has to keep up with its main loop so things like "viewNeedsDisplay" will be processed and then unlock the WebProcess to do the render of next frame. Once the next frame is rendered and the flush happens on WebProcess side, then the callback will be sent back to UIProcess. This is the only way to guarantee that the rendered frame is the one after the forceRepaint was requested.
> 
> We've been observing some flakyness on our API tests after this patch and then I've observed TestController::platformRunUntil of EFL:
> 
>     if (timeout == m_noTimeout) {
>         // Never timeout if we are debugging or not meant to timeout.
>         while (!condition) {
>             ecore_main_loop_iterate();
>             sleep(1);
>         }
>         return;
>     }
> 
> This sleep seems to be the source of the problem. You should keep up the main loop running in a non-blocking way, like Qt does. Could we rollout this change or make the proper fix soon?

I'm removing the sleep() instruction in Bug 107442. I would like this fix to land before we roll out.
Comment 27 WebKit Review Bot 2013-01-21 06:06:11 PST
Re-opened since this is blocked by bug 107443
Comment 28 Jocelyn Turcotte 2013-01-21 06:23:29 PST
(In reply to comment #23)
> This sleep seems to be the source of the problem. You should keep up the main loop running in a non-blocking way, like Qt does. Could we rollout this change or make the proper fix soon?

Very nice catch!
Comment 29 Chris Dumez 2013-01-21 06:24:14 PST
Closing since it should be fixed by Bug 107442 (dependency).