Bug 63189

Summary: Fix latch deadlock when GPU process crashes or context is lost
Product: WebKit Reporter: John Bates <jbates@google.com>
Component: New BugsAssignee: John Bates <jbates@google.com>
Status: RESOLVED FIXED    
Severity: Normal CC: enne@google.com, jamesr@chromium.org, kbr@google.com, vangelis@chromium.org, webkit.review.bot@gmail.com
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Patch none

Description From 2011-06-22 14:56:28 PST
Fix latch deadlock when GPU process crashes or context is lost
------- Comment #1 From 2011-06-22 15:15:21 PST -------
Created an attachment (id=98248) [details]
Patch
------- Comment #2 From 2011-06-22 15:21:28 PST -------
This patch solves the deadlock problem with latches when the GPU process crashes or in some cases when a context is lost because of errors.

This does not recreate the failed child contexts, because that is already done by WebGLRenderingContext when the application supports it.

This also does not reparent the child contexts to the new compositor context when the LayerRendererChromium is recreated. A separate patch needs to do that.
------- Comment #3 From 2011-06-22 15:40:45 PST -------
(From update of attachment 98248 [details])
View in context: https://bugs.webkit.org/attachment.cgi?id=98248&action=review

> Source/WebCore/ChangeLog:9
> +
> +        Test: open particles WebGL demo in chrome, kill GPU process from Task Manager; observe no deadlock.
> +

you need at least a little bit of explanation here of what's going on

> Source/WebCore/platform/graphics/chromium/LayerRendererChromium.cpp:-379
> -
> -    // After updateCompositorResources, set/wait latches for all child
> -    // contexts. This will prevent the compositor from using any of the child
> -    // parent textures while WebGL commands are executing from javascript *and*
> -    // while the final parent texture is being blit'd. copyTexImage2D
> -    // uses the parent texture as a temporary resolve buffer, so that's why the
> -    // waitLatch is below, to block the compositor from using the parent texture
> -    // until the next WebGL SwapBuffers (or copyTextureToParentTexture for
> -    // Canvas2D).
> -    if (hardwareCompositing() && m_contextSupportsLatch) {
> -        m_childContextsWereCopied = true;
> -        // For each child context:
> -        //   glSetLatch(Offscreen->Compositor);
> -        //   glWaitLatch(Compositor->Offscreen);
> -        ChildContextMap::iterator i = m_childContexts.begin();
> -        for (; i != m_childContexts.end(); ++i) {
> -            Extensions3DChromium* ext = static_cast<Extensions3DChromium*>(i->first->getExtensions());
> -            GC3Duint childToParentLatchId, parentToChildLatchId;
> -            ext->getParentToChildLatchCHROMIUM(&parentToChildLatchId);
> -            ext->getChildToParentLatchCHROMIUM(&childToParentLatchId);
> -            ext->setLatchCHROMIUM(childToParentLatchId);
> -            ext->waitLatchCHROMIUM(parentToChildLatchId);
> -        }
> -    }

why's this going away?
------- Comment #4 From 2011-06-22 17:51:08 PST -------
(From update of attachment 98248 [details])
View in context: https://bugs.webkit.org/attachment.cgi?id=98248&action=review

>> Source/WebCore/ChangeLog:9
>> +
> 
> you need at least a little bit of explanation here of what's going on

Will do.

>> Source/WebCore/platform/graphics/chromium/LayerRendererChromium.cpp:-379
>> -    }
> 
> why's this going away?

It turned out to be premature to try and plan ahead for the threaded compositor with lots of nasty comments. For now, it's much safer to do all latching in one place (above in updateAndDrawLayers). It's also best to do the corresponding set before the wait, so consolidating the latching code above allows us to do that. When the threaded compositor comes online, we need to redo the latching code.
------- Comment #5 From 2011-06-22 18:11:01 PST -------
(From update of attachment 98248 [details])
View in context: https://bugs.webkit.org/attachment.cgi?id=98248&action=review

>>> Source/WebCore/ChangeLog:9
>>> +
>> 
>> you need at least a little bit of explanation here of what's going on
> 
> Will do.

This is what I'm adding:
The main bug fix is to only set/wait latches if the child context has no errors.
Additionally, the LayerChromium classes needed to be modified to not continue drawing when their corresponding contexts have errors. Otherwise, they would draw with invalid texture ids.
------- Comment #6 From 2011-06-22 18:45:00 PST -------
(From update of attachment 98248 [details])
View in context: https://bugs.webkit.org/attachment.cgi?id=98248&action=review

The logic changes look fine but there's an important documentation update that needs to be incorporated into this patch.

> Source/WebCore/platform/graphics/chromium/Canvas2DLayerChromium.cpp:68
> +            && (context->getExtensions()->getGraphicsResetStatusARB() == GraphicsContext3D::NO_ERROR));

I made a mistake when originally adding getGraphicsResetStatusARB to Extensions3D.h. In the real OpenGL API, this function only returns the error state once; subsequent calls generally return NO_ERROR. In WebKit, this entry point's state needs to be sticky; it needs to continue to return the error state essentially forever, because context recovery is implemented by throwing away the GraphicsContext3D instance and creating a new one. Please update the comments in Extensions3D.h indicating that any error returned needs to be persistent for the lifetime of the GraphicsContext3D object, unlike the behavior of the actual GL_ARB_robustness extension.
------- Comment #7 From 2011-06-23 15:59:38 PST -------
Created an attachment (id=98423) [details]
Patch
------- Comment #8 From 2011-06-23 16:00:53 PST -------
(From update of attachment 98248 [details])
View in context: https://bugs.webkit.org/attachment.cgi?id=98248&action=review

>> Source/WebCore/platform/graphics/chromium/Canvas2DLayerChromium.cpp:68
>> +            && (context->getExtensions()->getGraphicsResetStatusARB() == GraphicsContext3D::NO_ERROR));
> 
> I made a mistake when originally adding getGraphicsResetStatusARB to Extensions3D.h. In the real OpenGL API, this function only returns the error state once; subsequent calls generally return NO_ERROR. In WebKit, this entry point's state needs to be sticky; it needs to continue to return the error state essentially forever, because context recovery is implemented by throwing away the GraphicsContext3D instance and creating a new one. Please update the comments in Extensions3D.h indicating that any error returned needs to be persistent for the lifetime of the GraphicsContext3D object, unlike the behavior of the actual GL_ARB_robustness extension.

Done.
------- Comment #9 From 2011-06-23 16:01:44 PST -------
(From update of attachment 98423 [details])
Seems good
------- Comment #10 From 2011-06-23 17:01:00 PST -------
(From update of attachment 98423 [details])
As we've discussed offline, I have some remaining concerns about the asynchronous nature of the loss of contexts and whether a synchronous check before issuing the latch command will really be sufficient. However this seems to be a step forward. cq=me
------- Comment #11 From 2011-06-23 17:12:04 PST -------
(From update of attachment 98423 [details])
Clearing flags on attachment: 98423

Committed r89635: <http://trac.webkit.org/changeset/89635>
------- Comment #12 From 2011-06-23 17:12:08 PST -------
All reviewed patches have been landed.  Closing bug.