Summary: | [GTK] Crash of WebProcess on the last WebView disconnect | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Milan Crha <mcrha> | ||||||||
Component: | WebKitGTK | Assignee: | Nobody <webkit-unassigned> | ||||||||
Status: | RESOLVED FIXED | ||||||||||
Severity: | Normal | CC: | bugs-noreply, cgarcia, mcatanzaro | ||||||||
Priority: | P2 | ||||||||||
Version: | WebKit Local Build | ||||||||||
Hardware: | Unspecified | ||||||||||
OS: | Unspecified | ||||||||||
Attachments: |
|
Description
Milan Crha
2016-09-05 08:09:25 PDT
valgrind output (without detailed debuginfo, because of having issues with it): ==16976== Thread 1: ==16976== Invalid read of size 8 ==16976== at 0x66581C7: WebCore::PlatformDisplay::~PlatformDisplay() (in /build/local/lib/libwebkit2gtk-4.0.so.37.14.4) ==16976== by 0x6575F68: WebCore::PlatformDisplayX11::~PlatformDisplayX11() (in /build/local/lib/libwebkit2gtk-4.0.so.37.14.4) ==16976== by 0x764FC07: __run_exit_handlers (in /usr/lib64/libc-2.23.so) ==16976== by 0x764FC54: exit (in /usr/lib64/libc-2.23.so) ==16976== by 0x7635727: (below main) (in /usr/lib64/libc-2.23.so) ==16976== Address 0x31788f70 is 0 bytes inside a block of size 64 free'd ==16976== at 0x4C2CD5A: free (vg_replace_malloc.c:530) ==16976== by 0x65539C5: WebCore::GLContext::cleanupActiveContextsAtExit() (in /build/local/lib/libwebkit2gtk-4.0.so.37.14.4) ==16976== by 0x764FC07: __run_exit_handlers (in /usr/lib64/libc-2.23.so) ==16976== by 0x764FC54: exit (in /usr/lib64/libc-2.23.so) ==16976== by 0x7635727: (below main) (in /usr/lib64/libc-2.23.so) ==16976== Block was alloc'd at ==16976== at 0x4C2BBAD: malloc (vg_replace_malloc.c:299) ==16976== by 0xA8EB868: WTF::fastMalloc(unsigned long) (in /build/local/lib/libjavascriptcoregtk-4.0.so.18.4.4) ==16976== by 0x6563F1D: WebCore::GLContextGLX::createPbufferContext(WebCore::PlatformDisplay&, __GLXcontextRec*) (in /build/local/lib/libwebkit2gtk-4.0.so.37.14.4) ==16976== by 0x6564272: WebCore::GLContextGLX::createSharingContext(WebCore::PlatformDisplay&) (in /build/local/lib/libwebkit2gtk-4.0.so.37.14.4) ==16976== by 0x6553ED6: WebCore::GLContext::createSharingContext(WebCore::PlatformDisplay&) (in /build/local/lib/libwebkit2gtk-4.0.so.37.14.4) ==16976== by 0x6658222: WebCore::PlatformDisplay::sharingGLContext() (in /build/local/lib/libwebkit2gtk-4.0.so.37.14.4) ==16976== by 0x656418D: WebCore::GLContextGLX::createContext(unsigned long, WebCore::PlatformDisplay&) (in /build/local/lib/libwebkit2gtk-4.0.so.37.14.4) ==16976== by 0x6553D6F: WebCore::GLContext::createContextForWindow(unsigned long, WebCore::PlatformDisplay*) (in /build/local/lib/libwebkit2gtk-4.0.so.37.14.4) ==16976== by 0x562DBD5: WebKit::ThreadedCompositor::makeContextCurrent() (in /build/local/lib/libwebkit2gtk-4.0.so.37.14.4) ==16976== by 0x562DCD4: WebKit::ThreadedCompositor::renderLayerTree() (in /build/local/lib/libwebkit2gtk-4.0.so.37.14.4) ==16976== by 0x562AB44: WebKit::CompositingRunLoop::updateTimerFired() (in /build/local/lib/libwebkit2gtk-4.0.so.37.14.4) ==16976== by 0xA92C78C: WTF::RunLoop::TimerBase::TimerBase(WTF::RunLoop&)::{lambda(void*)#1}::_FUN(void*) (in /build/local/lib/libjavascriptcoregtk-4.0.so.18.4.4) ==16976== ==16976== Jump to the invalid address stated on the next line ==16976== at 0x0: ??? ==16976== by 0x6575F68: WebCore::PlatformDisplayX11::~PlatformDisplayX11() (in /build/local/lib/libwebkit2gtk-4.0.so.37.14.4) ==16976== by 0x764FC07: __run_exit_handlers (in /usr/lib64/libc-2.23.so) ==16976== by 0x764FC54: exit (in /usr/lib64/libc-2.23.so) ==16976== by 0x7635727: (below main) (in /usr/lib64/libc-2.23.so) ==16976== Address 0x0 is not stack'd, malloc'd or (recently) free'd The valgrind claims more issues here, maybe consider giving it a try too. The claims are on usage of uninitialized memory. (In reply to comment #1) > valgrind output (without detailed debuginfo, because of having issues with > it): > > ==16976== Thread 1: > ==16976== Invalid read of size 8 > ==16976== at 0x66581C7: WebCore::PlatformDisplay::~PlatformDisplay() (in > /build/local/lib/libwebkit2gtk-4.0.so.37.14.4) > ==16976== by 0x6575F68: > WebCore::PlatformDisplayX11::~PlatformDisplayX11() (in > /build/local/lib/libwebkit2gtk-4.0.so.37.14.4) > ==16976== by 0x764FC07: __run_exit_handlers (in /usr/lib64/libc-2.23.so) > ==16976== by 0x764FC54: exit (in /usr/lib64/libc-2.23.so) > ==16976== by 0x7635727: (below main) (in /usr/lib64/libc-2.23.so) > ==16976== Address 0x31788f70 is 0 bytes inside a block of size 64 free'd > ==16976== at 0x4C2CD5A: free (vg_replace_malloc.c:530) > ==16976== by 0x65539C5: WebCore::GLContext::cleanupActiveContextsAtExit() > (in /build/local/lib/libwebkit2gtk-4.0.so.37.14.4) > ==16976== by 0x764FC07: __run_exit_handlers (in /usr/lib64/libc-2.23.so) > ==16976== by 0x764FC54: exit (in /usr/lib64/libc-2.23.so) > ==16976== by 0x7635727: (below main) (in /usr/lib64/libc-2.23.so) OK that shows the problem, good job getting the valgrind output! (In reply to comment #2) > The valgrind claims more issues here, maybe consider giving it a try too. > The claims are on usage of uninitialized memory. Consider filing a new bug if it's not bug #146729 OK, so... there are a couple different ways to fix this, but fundamentally the problem is the GLContext class hands off ownership of new GLContexts to callers in its create functions, returning std::unique_ptrs. Then it goes behind the callers' backs and deletes them all in an exit handler if they haven't already been deleted. That fails here, as PlatformDisplay also gets deleted in an exit handler, which runs after GLContext's exit handler, causing the same GLContext to be deleted twice. There are many ways to fix this... remove the GLContext exit handler (seems like the correct solution, but it carries a warning of possibly crashing the X server when using nvidia proprietary driver? do we care?), leak the PlatformDisplay's GLContext by calling release() on the unique_ptr and rely on the GLContext exit handler to free it, create a throwaway GLContext the first time PlatformDisplay::initializeEGLDisplay is called to ensure the GLContext exit handler gets registered first (and therefore executed last at exit time)... let's see if Carlos Garcia has an opinion on what's best. (In reply to comment #5) > OK, so... there are a couple different ways to fix this, but fundamentally > the problem is the GLContext class hands off ownership of new GLContexts to > callers in its create functions, returning std::unique_ptrs. Then it goes > behind the callers' backs and deletes them all in an exit handler if they > haven't already been deleted. That fails here, as PlatformDisplay also gets > deleted in an exit handler, which runs after GLContext's exit handler, > causing the same GLContext to be deleted twice. This is not accurate. PlatformDisplay is not deleted in an exit handler, it's a static std::unique_ptr that is deleted after all exit handlers. > There are many ways to fix this... remove the GLContext exit handler (seems > like the correct solution, but it carries a warning of possibly crashing the > X server when using nvidia proprietary driver? do we care?), Never liked that, TBH, I was tempted to remove it when reworked the GLContext. It's probably a workaround for a bug in nvidia drivers that might have already been fixed. In any case, all contexts created now except the sharing context that is owned by the PlatformDisplay should be freed already on exit. > leak the > PlatformDisplay's GLContext by calling release() on the unique_ptr and rely > on the GLContext exit handler to free it, This is not possible because the GLContext exit handler is a workaround used on for X11 contexts, and other classes shouldn't rely on that. > create a throwaway GLContext the > first time PlatformDisplay::initializeEGLDisplay is called to ensure the > GLContext exit handler gets registered first (and therefore executed last at > exit time)... let's see if Carlos Garcia has an opinion on what's best. This has nothing to do with EGLDisplay, the crash is happening on ~PlatformDisplay and the exit handler of the EGLDisplay doesn't delete the PlatformDisplay. So, the solution is either removing the GLContext active contents handling assuming the nvidia issue is now fixed, or the conservative solution could be to not handle the sharing context as active context, since we are sure PlatformDisplay destructor is always called. The original code deleted the GraphicsContext3D instead of the GLContext, so it seems to me this was a hack to workaround both a bug in nvidia drivers, and a leak in WebKit. I'll remove that code and if we get reports about X server crashes we can look for a better solution. Created attachment 288112 [details]
Patch
Comment on attachment 288112 [details]
Patch
This is probably too risky at this point of the cycle, I'll use a conservative approach instead.
Created attachment 288115 [details]
Patch
Let's use this for now.
Created attachment 288117 [details] valgrind log with applied patch (In reply to comment #4) > Consider filing a new bug if it's not bug #146729 Okay, I updated that bug report. (In reply to comment #10) > Let's use this for now. Thanks, it fixes the crash for me. Valgrind claims some leaked memory, though some of it goes down to the intel drivers which may or may not be the driver issue. See the attached valgrind log. (In reply to comment #9) > Comment on attachment 288112 [details] > Patch > > This is probably too risky at this point of the cycle, I'll use a > conservative approach instead. Maybe we can land this one in trunk and the other one in the stable branch. (In reply to comment #6) > This is not accurate. PlatformDisplay is not deleted in an exit handler, > it's a static std::unique_ptr that is deleted after all exit handlers. Where do you think static objects get deleted. :) > Never liked that, TBH, I was tempted to remove it when reworked the > GLContext. It's probably a workaround for a bug in nvidia drivers that might > have already been fixed. In any case, all contexts created now except the > sharing context that is owned by the PlatformDisplay should be freed already > on exit. Agreed. > This is not possible because the GLContext exit handler is a workaround used > on for X11 contexts, and other classes shouldn't rely on that. Good point. > > create a throwaway GLContext the > > first time PlatformDisplay::initializeEGLDisplay is called to ensure the > > GLContext exit handler gets registered first (and therefore executed last at > > exit time)... let's see if Carlos Garcia has an opinion on what's best. > > This has nothing to do with EGLDisplay, the crash is happening on > ~PlatformDisplay and the exit handler of the EGLDisplay doesn't delete the > PlatformDisplay. Yes, I was completely wrong; for some reason I thought we were deleting the PlatformDisplay in the exit handler registered in that function, but clearly that's not the case. Comment on attachment 288112 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=288112&action=review I prefer to use this patch in trunk. > Source/WebCore/ChangeLog:8 > + Stop tracking X11 GL contexts to be cleanered on an exit handler. This was added to work around bugs on drivers, cleared Comment on attachment 288115 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=288115&action=review rs=me for 2.14, but I prefer just removing the code in trunk. > Source/WebCore/ChangeLog:13 > + pointer ownership. Since this is specific to GLX, I've moed the code from GLContext to GLContextGLX and moved > Source/WebCore/platform/graphics/glx/GLContextGLX.cpp:209 > + clear(); > + activeContexts().remove(this); Good idea. > Source/WebCore/platform/graphics/glx/GLContextGLX.cpp:225 > + glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, 0); > + glXMakeCurrent(downcast<PlatformDisplayX11>(m_display).native(), None, None); Is this new? (In reply to comment #15) > Comment on attachment 288115 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=288115&action=review > > rs=me for 2.14, but I prefer just removing the code in trunk. Agree. > > Source/WebCore/ChangeLog:13 > > + pointer ownership. Since this is specific to GLX, I've moed the code from GLContext to GLContextGLX and > > moved > > > Source/WebCore/platform/graphics/glx/GLContextGLX.cpp:209 > > + clear(); > > + activeContexts().remove(this); > > Good idea. > > > Source/WebCore/platform/graphics/glx/GLContextGLX.cpp:225 > > + glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, 0); > > + glXMakeCurrent(downcast<PlatformDisplayX11>(m_display).native(), None, None); > > Is this new? No, it was in the destructor and now in clear(). Committed r205544: <http://trac.webkit.org/changeset/205544> |