Bug 172167

Summary: [Threaded Compositor] SHOULD NEVER BE REACHED in WebKit::CompositingRunLoop::updateCompleted
Product: WebKit Reporter: Carlos Garcia Campos <cgarcia>
Component: WebKit2Assignee: Nobody <webkit-unassigned>
Status: REOPENED    
Severity: Normal CC: bugs-noreply, cturner, mcatanzaro, zan
Priority: P2 Keywords: Gtk, LayoutTestFailure
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugs.webkit.org/show_bug.cgi?id=171336
https://bugs.webkit.org/show_bug.cgi?id=172892
Attachments:
Description Flags
Patch mcatanzaro: review+

Carlos Garcia Campos
Reported 2017-05-16 06:39:16 PDT
This is still happening, even after r216182, less often but still happens. There are two conditions in which this can happen: - Again in force repaint. r216182 fixed the case of force repaint called when update state is completed, but it can also crash if update state is inProgress or PendingAfterCompletion when m_coordinateUpdateCompletionWithClient is true. - When the threaded compositor is invalidated right after renderLayerTree() starts, but before it finishes. I'll submit a patch to fix this.
Attachments
Patch (4.56 KB, patch)
2017-05-16 06:49 PDT, Carlos Garcia Campos
mcatanzaro: review+
Carlos Garcia Campos
Comment 1 2017-05-16 06:49:41 PDT
Carlos Garcia Campos
Comment 2 2017-05-16 06:54:23 PDT
I think this patch also fixes test imported/blink/compositing/layer-creation/incremental-destruction.html that is currently timing out in the release bot.
Michael Catanzaro
Comment 3 2017-05-16 10:16:40 PDT
Comment on attachment 310260 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=310260&action=review > Source/WebKit2/ChangeLog:23 > + (WebKit::ThreadedDisplayRefreshMonitor::dispatchDisplayRefreshCallback): Return early if the montior has been invalidated. monitor
Carlos Garcia Campos
Comment 4 2017-05-17 00:16:13 PDT
(In reply to Carlos Garcia Campos from comment #2) > I think this patch also fixes test > imported/blink/compositing/layer-creation/incremental-destruction.html that > is currently timing out in the release bot. And maybe this also fixes bug #172027, it could be similar to the timeout of imported/blink/compositing/layer-creation/incremental-destruction.html, because of a force repaint the compositing run loop gets stalled in PendingAfterCompletion state because the completion never happens.
Carlos Garcia Campos
Comment 5 2017-05-17 00:19:07 PDT
Zan Dobersek
Comment 6 2017-05-17 02:59:15 PDT
Comment on attachment 310260 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=310260&action=review > Source/WebKit2/Shared/CoordinatedGraphics/threadedcompositor/ThreadedCompositor.cpp:266 > - sceneUpdateFinished(); > + if (m_scene->isActive()) > + sceneUpdateFinished(); How can this occur? renderLayerTree() returns early at the beginning if the scene is not active. If it is active, the function then executes serially, which should make it impossible for any task to be scheduled on this thread that would make the scene inactive. > Source/WebKit2/Shared/CoordinatedGraphics/threadedcompositor/ThreadedCompositor.cpp:292 > + // Do not change m_coordinateUpdateCompletionWithClient while in force repaint. > + if (m_inForceRepaint) > + return; m_coordinateUpdateCompletion should be set here, but to false if m_inForceRepaint is false as well.
Carlos Garcia Campos
Comment 7 2017-05-17 03:03:01 PDT
(In reply to Zan Dobersek from comment #6) > Comment on attachment 310260 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=310260&action=review > > > Source/WebKit2/Shared/CoordinatedGraphics/threadedcompositor/ThreadedCompositor.cpp:266 > > - sceneUpdateFinished(); > > + if (m_scene->isActive()) > > + sceneUpdateFinished(); > > How can this occur? renderLayerTree() returns early at the beginning if the > scene is not active. If it is active, the function then executes serially, > which should make it impossible for any task to be scheduled on this thread > that would make the scene inactive. Because invalidate happens in the main thread. > > Source/WebKit2/Shared/CoordinatedGraphics/threadedcompositor/ThreadedCompositor.cpp:292 > > + // Do not change m_coordinateUpdateCompletionWithClient while in force repaint. > > + if (m_inForceRepaint) > > + return; > > m_coordinateUpdateCompletion should be set here, but to false if > m_inForceRepaint is false as well. That happens, because it doesn't return early.
Zan Dobersek
Comment 8 2017-05-17 03:08:58 PDT
Comment on attachment 310260 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=310260&action=review >>> Source/WebKit2/Shared/CoordinatedGraphics/threadedcompositor/ThreadedCompositor.cpp:266 >>> + sceneUpdateFinished(); >> >> How can this occur? renderLayerTree() returns early at the beginning if the scene is not active. If it is active, the function then executes serially, which should make it impossible for any task to be scheduled on this thread that would make the scene inactive. > > Because invalidate happens in the main thread. That's doing unsafe cross-thread modifications then. >>> Source/WebKit2/Shared/CoordinatedGraphics/threadedcompositor/ThreadedCompositor.cpp:292 >>> + return; >> >> m_coordinateUpdateCompletion should be set here, but to false if m_inForceRepaint is false as well. > > That happens, because it doesn't return early. I'm saying it should be set regardless of the value of m_inForceRepaint. But if m_inForceRepaint is false, it should be set to false. bool coordinateUpdate = m_inForceRepaint && std::any_of(...
Carlos Garcia Campos
Comment 9 2017-05-17 03:32:38 PDT
(In reply to Zan Dobersek from comment #8) > Comment on attachment 310260 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=310260&action=review > > >>> Source/WebKit2/Shared/CoordinatedGraphics/threadedcompositor/ThreadedCompositor.cpp:266 > >>> + sceneUpdateFinished(); > >> > >> How can this occur? renderLayerTree() returns early at the beginning if the scene is not active. If it is active, the function then executes serially, which should make it impossible for any task to be scheduled on this thread that would make the scene inactive. > > > > Because invalidate happens in the main thread. > > That's doing unsafe cross-thread modifications then. We should probably make m_isActive an Atomic<bool>, yes. > >>> Source/WebKit2/Shared/CoordinatedGraphics/threadedcompositor/ThreadedCompositor.cpp:292 > >>> + return; > >> > >> m_coordinateUpdateCompletion should be set here, but to false if m_inForceRepaint is false as well. > > > > That happens, because it doesn't return early. > > I'm saying it should be set regardless of the value of m_inForceRepaint. But > if m_inForceRepaint is false, it should be set to false. > > bool coordinateUpdate = m_inForceRepaint && std::any_of(... That's what caused the timeout, and I suspect bug #172027 too. If we change coordinateUpdate for the force repaint, the frame is never completed. If the state is PendingAfterCompletion, we won't process more updates. So, what we do here is handling the force update like it didn't happen, to ensure frame updates continue working as expected.
Michael Catanzaro
Comment 10 2017-05-17 05:49:12 PDT
Seems like we need to reopen this...
Zan Dobersek
Comment 11 2017-05-17 06:54:24 PDT
Comment on attachment 310260 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=310260&action=review >>>>> Source/WebKit2/Shared/CoordinatedGraphics/threadedcompositor/ThreadedCompositor.cpp:292 >>>>> + return; >>>> >>>> m_coordinateUpdateCompletion should be set here, but to false if m_inForceRepaint is false as well. >>> >>> That happens, because it doesn't return early. >> >> I'm saying it should be set regardless of the value of m_inForceRepaint. But if m_inForceRepaint is false, it should be set to false. >> >> bool coordinateUpdate = m_inForceRepaint && std::any_of(... > > That's what caused the timeout, and I suspect bug #172027 too. If we change coordinateUpdate for the force repaint, the frame is never completed. If the state is PendingAfterCompletion, we won't process more updates. So, what we do here is handling the force update like it didn't happen, to ensure frame updates continue working as expected. When do you expect a forced updated to be finished? And do you expect the DisplayRefreshMonitor callback to be scheduled for forced updates? You also don't have any guarantee that there is a scene state update pending that would be executed during a forced repaint, meaning m_clientRendersNextFrame isn't necessarily flipped to true, which would end up not scheduling a DisplayRefreshMonitor callback in sceneUpdateFinished() that would complete the update.
Carlos Garcia Campos
Comment 12 2017-05-17 23:44:15 PDT
(In reply to Zan Dobersek from comment #11) > Comment on attachment 310260 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=310260&action=review > > >>>>> Source/WebKit2/Shared/CoordinatedGraphics/threadedcompositor/ThreadedCompositor.cpp:292 > >>>>> + return; > >>>> > >>>> m_coordinateUpdateCompletion should be set here, but to false if m_inForceRepaint is false as well. > >>> > >>> That happens, because it doesn't return early. > >> > >> I'm saying it should be set regardless of the value of m_inForceRepaint. But if m_inForceRepaint is false, it should be set to false. > >> > >> bool coordinateUpdate = m_inForceRepaint && std::any_of(... > > > > That's what caused the timeout, and I suspect bug #172027 too. If we change coordinateUpdate for the force repaint, the frame is never completed. If the state is PendingAfterCompletion, we won't process more updates. So, what we do here is handling the force update like it didn't happen, to ensure frame updates continue working as expected. > > When do you expect a forced updated to be finished? And do you expect the > DisplayRefreshMonitor callback to be scheduled for forced updates? > > You also don't have any guarantee that there is a scene state update pending > that would be executed during a forced repaint, meaning > m_clientRendersNextFrame isn't necessarily flipped to true, which would end > up not scheduling a DisplayRefreshMonitor callback in sceneUpdateFinished() > that would complete the update. A fore repaint is always synchronous, it calls renderLayerTree directly so it's not expected to call updateCompleted(). I don't know if we should do anything special with the refresh monitor in that case. These are the situations that can happen when a forceRepaint is scheduled. 1) Previous update is complete. If updateCompleted() is called, it will crash in SHOULD NEVER BE REACHED. 2) Previous frame is InProgress and m_coordinateUpdateCompletionWithClient is false. If the frame renderLayerTree happens before the force repaint, then updateCompleted() will be called, so it would be like previous case. If it happens after, the forceRepaint one will not calle updateCompleted() and the InProgress frame will. At some point the scene is committed and the update lambda is called. If the force repaint happens before the update would do nothing, leaving m_coordinateUpdateCompletionWithClient as false. When completeCoordinatedUpdateIfNeeded is called, m_coordinateUpdateCompletionWithClient is still false as expected and updateCompleted() is not called (it was already called once for this frame). 3) Previous frame is InProgress and m_coordinateUpdateCompletionWithClient is true. In this case updateCompleted() won't be called no matter if one renderLayerTree is called before the other. But at some point the scene is committed and the update lambda is called. If the force repaint happens before the update would do nothing, leaving the m_coordinateUpdateCompletionWithClient as true. When completeCoordinatedUpdateIfNeeded is called, m_coordinateUpdateCompletionWithClient is still true as expected and updateCompleted() is called. 4) Previous frame if in PendingAfterCompletion. So this is like 2) or 3) but another frame has been scheduled and should be fired after previous one finishes. 1) was causing SHOULD NEVER BE REACHED, and it was fixed in r216182. In 2) and 3) the value of m_coordinateUpdateCompletionWithClient might be changed in an unexpected way by the forceRepaint commit, causing updateCompleted() to be called twice and then crashing in SHOULD NEVER BE REACHED. This was fixed by this bug. In 4) the value of m_coordinateUpdateCompletionWithClient might be changed in an unexpected way by the forceRepaint commit, causing updateCompleted() to never be called, causing the timeout in imported/blink/compositing/layer-creation/incremental-destruction.html and maybe bug #172027 too. This was fixed by this bug. Maybe there are still cases not considered or we need to do something else with the refresh monitor, but at least after r216970 all tests crashing in SHOULD NEVER BE REACHED stopped crashing, test imported/blink/compositing/layer-creation/incremental-destruction.html is no longer timing out, and Adri can use github again. I admit I don't completely understand the whole frame update mechanism and refresh monitor, but it's a lot more stable now. Let's improve it on top of this if needed.
Charlie Turner
Comment 13 2017-06-13 14:01:08 PDT
I'm still hitting case 1). I can pretty reliably trigger it by watching any YouTube video with Epiphany built against r218095. It's non-deterministic if you just let videos play, but I found that interacting with any of the controls is a sure fire way of getting this backtrace (among others unfortunately, but you always get one :)) The m_updateState is Completed when I enter the updateCompleted(): SHOULD NEVER BE REACHED ../../Source/WebKit2/Shared/CoordinatedGraphics/threadedcompositor/CompositingRunLoop.cpp(188) : void WebKit::CompositingRunLoop::updateCompleted() 1 0x7f0dc5e50818 /home/charlie/jhbuild/install/lib/libjavascriptcoregtk-4.0.so.18(WTFCrash+0x1e) [0x7f0dc5e50818] 2 0x7f0dcd8443a8 /home/charlie/jhbuild/install/lib/libwebkit2gtk-4.0.so.37(WebKit::CompositingRunLoop::updateCompleted()+0xe4) [0x7f0dcd8443a8] 3 0x7f0dcd849d6a /home/charlie/jhbuild/install/lib/libwebkit2gtk-4.0.so.37(WebKit::ThreadedCompositor::sceneUpdateFinished()+0xb8) [0x7f0dcd849d6a] 4 0x7f0dcd84a33f /home/charlie/jhbuild/install/lib/libwebkit2gtk-4.0.so.37(WebKit::ThreadedCompositor::frameComplete()+0x45) [0x7f0dcd84a33f] 5 0x7f0dcda48d06 /home/charlie/jhbuild/install/lib/libwebkit2gtk-4.0.so.37(WebKit::ThreadedCoordinatedLayerTreeHost::frameComplete()+0x26) [0x7f0dcda48d06] 6 0x7f0dcda50c15 /home/charlie/jhbuild/install/lib/libwebkit2gtk-4.0.so.37(WebKit::AcceleratedSurfaceX11::didRenderFrame()+0x27) [0x7f0dcda50c15] 7 0x7f0dcda48e55 /home/charlie/jhbuild/install/lib/libwebkit2gtk-4.0.so.37(WebKit::ThreadedCoordinatedLayerTreeHost::didRenderFrame()+0x43) [0x7f0dcda48e55] 8 0x7f0dcda49bdc /home/charlie/jhbuild/install/lib/libwebkit2gtk-4.0.so.37(WebKit::ThreadedCoordinatedLayerTreeHost::CompositorClient::didRenderFrame()+0x1c) [0x7f0dcda49bdc] 9 0x7f0dcd849c8e /home/charlie/jhbuild/install/lib/libwebkit2gtk-4.0.so.37(WebKit::ThreadedCompositor::renderLayerTree()+0x372) [0x7f0dcd849c8e] 10 0x7f0dcd8489cd /home/charlie/jhbuild/install/lib/libwebkit2gtk-4.0.so.37(+0x64e59cd) [0x7f0dcd8489cd] 11 0x7f0dcd84c916 /home/charlie/jhbuild/install/lib/libwebkit2gtk-4.0.so.37(+0x64e9916) [0x7f0dcd84c916] 12 0x357abca /home/charlie/jhbuild/install/libexec/webkit2gtk-4.0/WebKitWebProcess(std::function<void ()>::operator()() const+0x32) [0x357abca] 13 0x7f0dcd8443c8 /home/charlie/jhbuild/install/lib/libwebkit2gtk-4.0.so.37(WebKit::CompositingRunLoop::updateTimerFired()+0x1c) [0x7f0dcd8443c8] 14 0x7f0dcd847e58 /home/charlie/jhbuild/install/lib/libwebkit2gtk-4.0.so.37(WTF::RunLoop::Timer<WebKit::CompositingRunLoop>::fired()+0x66) [0x7f0dcd847e58] 15 0x7f0dc5ebeaf1 /home/charlie/jhbuild/install/lib/libjavascriptcoregtk-4.0.so.18(+0x2a78af1) [0x7f0dc5ebeaf1] 16 0x7f0dc5ebeb2d /home/charlie/jhbuild/install/lib/libjavascriptcoregtk-4.0.so.18(+0x2a78b2d) [0x7f0dc5ebeb2d] 17 0x7f0dc5ebe1a2 /home/charlie/jhbuild/install/lib/libjavascriptcoregtk-4.0.so.18(+0x2a781a2) [0x7f0dc5ebe1a2] 18 0x7f0dc5ebe1d1 /home/charlie/jhbuild/install/lib/libjavascriptcoregtk-4.0.so.18(+0x2a781d1) [0x7f0dc5ebe1d1] 19 0x7f0dc0c59dca /home/charlie/jhbuild/install/lib/libglib-2.0.so.0(g_main_context_dispatch+0x15a) [0x7f0dc0c59dca] 20 0x7f0dc0c5a170 /home/charlie/jhbuild/install/lib/libglib-2.0.so.0(+0x4b170) [0x7f0dc0c5a170] 21 0x7f0dc0c5a492 /home/charlie/jhbuild/install/lib/libglib-2.0.so.0(g_main_loop_run+0xc2) [0x7f0dc0c5a492] 22 0x7f0dc5ebe738 /home/charlie/jhbuild/install/lib/libjavascriptcoregtk-4.0.so.18(WTF::RunLoop::run()+0xba) [0x7f0dc5ebe738] 23 0x7f0dc5ebc20e /home/charlie/jhbuild/install/lib/libjavascriptcoregtk-4.0.so.18(+0x2a7620e) [0x7f0dc5ebc20e] 24 0x7f0dc5ebdc82 /home/charlie/jhbuild/install/lib/libjavascriptcoregtk-4.0.so.18(+0x2a77c82) [0x7f0dc5ebdc82] 25 0x351a3cf /home/charlie/jhbuild/install/libexec/webkit2gtk-4.0/WebKitWebProcess(WTF::Function<void ()>::operator()() const+0x37) [0x351a3cf] 26 0x7f0dc5e74e9e /home/charlie/jhbuild/install/lib/libjavascriptcoregtk-4.0.so.18(+0x2a2ee9e) [0x7f0dc5e74e9e] 27 0x7f0dc5eba77b /home/charlie/jhbuild/install/lib/libjavascriptcoregtk-4.0.so.18(+0x2a7477b) [0x7f0dc5eba77b] 28 0x7f0dbecfa6ba /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f0dbecfa6ba] 29 0x7f0dbdd7382d /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f0dbdd7382d]
Charlie Turner
Comment 14 2017-06-13 17:10:38 PDT
FWIW, I did test the change to have m_isActive an atomic bool, but that didn't slow down the frequency of crashing much.
Note You need to log in before you can comment on or make changes to this bug.