rdar://problem/58227967 conformance/textures/texture-copying-feedback-loops.html started being a flaky crash after switching to ANGLE. https://results.webkit.org/?suite=layout-tests&test=webgl%2F1.0.3%2Fconformance%2Ftextures%2Ftexture-copying-feedback-loops.html Exception Type: EXC_CRASH (SIGABRT) Exception Codes: 0x0000000000000000, 0x0000000000000000 Exception Note: EXC_CORPSE_NOTIFY Application Specific Information: abort() called Application Specific Signatures: Graphics kernel error: 0xfffffffe Thread 0 Crashed:: Dispatch queue: com.apple.main-thread 0 libsystem_kernel.dylib 0x00007fff6ab9765a __pthread_kill + 10 1 libsystem_pthread.dylib 0x00007fff6ac53e60 pthread_kill + 430 2 libsystem_c.dylib 0x00007fff6ab1e808 abort + 120 3 libGPUSupportMercury.dylib 0x00007fff53a380a5 gpusGenerateCrashLog.cold.1 + 95 4 libGPUSupportMercury.dylib 0x00007fff53a2f18f gpusGenerateCrashLog + 89 5 com.apple.driver.AppleIntelKBLGraphicsGLDriver 0x00007fff29795e58 gpusKillClientExt + 9 6 libGPUSupportMercury.dylib 0x00007fff53a3055b gpusSubmitDataBuffers + 164 7 com.apple.driver.AppleIntelKBLGraphicsGLDriver 0x00007fff28a0da36 IntelCommandBuffer::getNew(GLDContextRec*) + 48 8 com.apple.driver.AppleIntelKBLGraphicsGLDriver 0x00007fff28a0d844 intelSubmitCommands + 171
Looks like a crash inside Intel's graphics driver. There aren't any suppressions for this test in Chromium on macOS so it should be possible to make WebKit's WebGL validation behave exactly the same. I'll investigate locally.
More complete crash stack: Thread 0 Crashed:: Dispatch queue: com.apple.main-thread 0 libsystem_kernel.dylib 0x00007fff6d24d2c6 __pthread_kill + 10 1 libsystem_pthread.dylib 0x00007fff6d308bf1 pthread_kill + 284 2 libsystem_c.dylib 0x00007fff6d1b76a6 abort + 127 3 libGPUSupportMercury.dylib 0x00007fff5c222240 gpusGenerateCrashLog + 166 4 com.apple.driver.AppleIntelKBLGraphicsGLDriver 0x00007fff3a511578 gpusKillClientExt + 9 5 libGPUSupportMercury.dylib 0x00007fff5c2235ea gpusSubmitDataBuffers + 164 6 com.apple.driver.AppleIntelKBLGraphicsGLDriver 0x00007fff3996aea8 IntelCommandBuffer::getNew(GLDContextRec*) + 48 7 com.apple.driver.AppleIntelKBLGraphicsGLDriver 0x00007fff3996acba intelSubmitCommands + 171 8 GLEngine 0x00007fff4af5f671 gleUnbindTextureObject + 58 9 GLEngine 0x00007fff4af30477 gleUnbindDeleteHashNamesAndObjects + 159 10 GLEngine 0x00007fff4ae3eaae glDeleteTextures_Exec + 697 11 libANGLE-shared.dylib 0x000000010b494ece rx::StateManagerGL::deleteTexture(unsigned int) + 278 (StateManagerGL.cpp:246) 12 libANGLE-shared.dylib 0x000000010b4aef61 rx::TextureGL::onDestroy(gl::Context const*) + 31 (TextureGL.cpp:156) 13 libANGLE-shared.dylib 0x000000010b4a9b0b gl::Texture::onDestroy(gl::Context const*) + 171 (Texture.cpp:675) 14 libANGLE-shared.dylib 0x000000010b4624c6 angle::RefCountObject<gl::Context, angle::Result>::release(gl::Context const*) + 266 (RefCountObject.h:45) 15 libANGLE-shared.dylib 0x000000010b46fd2a gl::TypedResourceManager<gl::Texture, gl::HandleAllocator, gl::TextureManager, gl::TextureID>::deleteObject(gl::Context const*, gl::TextureID) + 78 (ResourceManager.cpp:102) 16 libANGLE-shared.dylib 0x000000010b263cd6 gl::Context::deleteTextures(int, gl::TextureID const*) + 42 (Context.cpp:6210) 17 libANGLE-shared.dylib 0x000000010b2c2773 gl::DeleteTextures(int, unsigned int const*) + 142 (entry_points_gles_2_0_autogen.cpp:799) 18 com.apple.WebCore 0x000000010f64ef35 WebCore::GraphicsContext3D::deleteTexture(unsigned int) + 245 (GraphicsContext3DANGLE.cpp:1763) 19 com.apple.WebCore 0x00000001106e9cdc WebCore::WebGLObject::deleteObject(WebCore::GraphicsContext3D*) + 76 (WebGLObject.cpp:63) 20 com.apple.WebCore 0x000000011070f800 WebCore::WebGLSharedObject::detachContextGroup() + 32 (WebGLSharedObject.cpp:52) 21 com.apple.WebCore 0x00000001106e8838 WebCore::WebGLContextGroup::loseContextGroup(WebCore::WebGLRenderingContextBase::LostContextMode) + 184 (WebGLContextGroup.cpp:88) 22 com.apple.WebCore 0x0000000110709560 non-virtual thunk to WebCore::WebGLRenderingContextBase::stop() + 96 (WebGLRenderingContextBase.cpp:5180) 23 com.apple.WebCore 0x00000001104af3cd WTF::Detail::CallableWrapper<WebCore::ScriptExecutionContext::stopActiveDOMObjects()::$_7, WebCore::ScriptExecutionContext::ShouldContinue, WebCore::ActiveDOMObject&>::call(WebCore::ActiveDOMObject&) + 13 (Function.h:52) 24 com.apple.WebCore 0x00000001104aaa59 WebCore::ScriptExecutionContext::forEachActiveDOMObject(WTF::Function<WebCore::ScriptExecutionContext::ShouldContinue (WebCore::ActiveDOMObject&)> const&) const + 489 (ScriptExecutionContext.cpp:244) 25 com.apple.WebCore 0x00000001104aac06 WebCore::ScriptExecutionContext::stopActiveDOMObjects() + 70 (ScriptExecutionContext.cpp:298) 26 com.apple.WebCore 0x000000011085c042 WebCore::FrameLoader::frameDetached() + 178 (FrameLoader.cpp:2817) 27 com.apple.WebCore 0x00000001105fcabf WebCore::HTMLFrameOwnerElement::disconnectContentFrame() + 31 (HTMLFrameOwnerElement.cpp:81) 28 com.apple.WebCore 0x00000001103f300a WebCore::disconnectSubframes(WebCore::ContainerNode&, WebCore::SubframeDisconnectPolicy) + 218 (ContainerNodeAlgorithms.cpp:304) 29 com.apple.WebCore 0x00000001103efc3c WebCore::ContainerNode::removeChildren() + 540 (ContainerNode.cpp:104) 30 com.apple.WebCore 0x000000011059c74c WebCore::replaceChildrenWithFragment(WebCore::ContainerNode&, WTF::Ref<WebCore::DocumentFragment, WTF::DumbPtrTraits<WebCore::DocumentFragment> >&&) + 92 (markup.cpp:1334) The crash is occurring during test teardown. I can reproduce this running in WebKit locally.
This crash is strange. All of ANGLE's state appears to be set up correctly. The egl::Display is initialized, and the CGLContextObj that it refers to internally is still current, per CGLGetCurrentContext(). It doesn't look like the CGLContextObj has been deleted out from under the library - assuming that would have caused the current context to be null'ed out. Still investigating.
Still trying to figure out what is going on. Have checked out two copies of WebKit at this revision: Provide pid to crashing service worker process and GPU process 617c711307e3fb84a29000b2010bbf95b43773cc one using ANGLE, the other not. Built both Debug, running both with minibrowser in WK1 mode (for easier debugging). Can see both versions allocating and deleting texture objects. The ANGLE version seems to be allocating a few more textures per iteration, but that may be because the instrumentation I added is at a lower level in ANGLE. The crash is somehow related to running the test in an iframe. With the ANGLE backend, when loading LayoutTests/webgl/1.0.3/conformance/textures/texture-copying-feedback-loops.html (served up via a local HTTP server), it reliably crashes upon the first or second reload. The iframe never displays correctly with the ANGLE backend, even if attempting to resize the window to force a repaint. Loading LayoutTests/webgl/1.0.3/resources/webgl_test_files/conformance/textures/texture-copying-feedback-loops.html, the page can be reloaded over and over again reliably. Wondering whether some other part of WebKit which is using raw OpenGL is coming in and setting the CGLContextObj out from under ANGLE, causing ANGLE to inadvertently modify OpenGL context state belonging to WebKit, and ultimately leading to corrupted OpenGL state and a crash. I've tried modifying EGL_MakeCurrent (Source/ThirdParty/ANGLE/src/libGLESv2/entry_points_egl.cpp) to always call SetContextCurrent (and GraphicsContext3D::makeContextCurrent to always call EGL_MakeCurrent), but the crash persists. Still investigating.
Looking more deeply into ANGLE, it assumes no other code in the process is making OpenGL calls - not a good assumption when it's being embedded as a library (but not the base OpenGL ES implementation) in a larger application. Have tried modifying GraphicsContext3D::makeContextCurrent (Source/WebCore/platform/graphics/cocoa/GraphicsContext3DCocoa.mm) to forcibly set ANGLE's internal CGLContextObj to be current, via CGLSetCurrentContext: ... #elif USE(ANGLE) bool res = EGL_MakeCurrent(m_displayObj, EGL_NO_SURFACE, EGL_NO_SURFACE, m_contextObj); if (res) { EGLDeviceEXT device = nullptr; EGL_QueryDisplayAttribEXT(m_displayObj, EGL_DEVICE_EXT, reinterpret_cast<EGLAttrib*>(&device)); CGLContextObj cglContext = nullptr; EGL_QueryDeviceAttribEXT(device, EGL_CGL_CONTEXT_ANGLE, reinterpret_cast<EGLAttrib*>(&cglContext)); CGLSetCurrentContext(cglContext); } return res; #endif ... Have added some logging and confirmed that CGLSetCurrentContext is being called for each WebGL entry point. Unfortunately, this doesn't work around the problem; the same crash happens after reloading the page once or twice. I'd like to figure out a way to reset the CGLContextObj to null upon returning from the JavaScript callback that's making WebGL API calls, to see whether that helps isolate ANGLE from the rest of WebKit's rendering. Have looked a bit into RenderIFrame and superclasses, PlatformCALayerCocoa and the layer types it uses, GraphicsLayerCA, TileController, WidgetMac.mm, and some others, to try to understand how iframes are rendered. Are they another CALayer in the hierarchy? Do they have their own NSView? The question remains why the iframe running this test is disappearing when the ANGLE backend is enabled, and what interference there is between ANGLE's OpenGL context and the rest of WebKit's rendering that's causing this crash in the OpenGL driver. Still investigating - would appreciate any feedback on the questions above.
In GraphicsContext3D::markLayerComposited (GraphicsContext3DANGLE.cpp), tried resetting ANGLE's notion of the EGL context to EGL_NO_CONTEXT, and calling CGLSetCurrentContext(nullptr). WebGL still works, but the crash still persists. If a workaround like this is to be attempted then it needs to be bounded more tightly to the JavaScript callback currently running WebGL.
It turns out the disappearance of the iframe was unrelated. Apple's layout test wrapper for the WebGL conformance suite removes the iframe if all sub-tests pass, and for this test, the switch to ANGLE fixes the remaining two preexisting failures. Remembering that a workaround was recently added to Chromium for an Intel graphics driver bug where operations were being reordered with respect to glBindFramebuffer calls, I dug in the version history and found this: Fixed WebGL flush bug for old Intel driver https://chromium-review.googlesource.com/1912790 Applying this workaround in ANGLE conclusively works around this crash in the OpenGL driver! Running the test in the Minibrowser, it can be reloaded 40+ times reliably, where it used to crash within the first couple of reloads. It's likely to solve several of the other layout test flakes caused by the switch to ANGLE. Likely not all, though - will have to analyze them all case-by-case. Patch incoming. It will be upstreamed to ANGLE under this bug: Need flush workaround before/after bindFramebuffer on Intel drivers on macOS https://bugs.chromium.org/p/angleproject/issues/detail?id=4267
Created attachment 386796 [details] Patch
Note that there are important steps to take when updating ANGLE. See http://trac.webkit.org/wiki/UpdatingANGLE
This patch is being upstreamed to ANGLE under: https://chromium-review.googlesource.com/1987932 I think it's worth adding to WebKit's copy of ANGLE sooner rather than later because this may fix many flaky failures of the WebGL layout / conformance tests. Raising bug to P1 because this was a crash.
Comment on attachment 386796 [details] Patch Clearing flags on attachment: 386796 Committed r254045: <https://trac.webkit.org/changeset/254045>
All reviewed patches have been landed. Closing bug.
Upon review of this patch upstream in ANGLE in https://chromium-review.googlesource.com/1987932 , more effort was devoted to creating a reduced test case, and it came to light that this workaround is not really sufficient. Will file a new bug, block it on this one, and propose a revised workaround.