Bug 201507 - [GTK] Crash in Nicosia::GC3DLayer::makeContextCurrent
Summary: [GTK] Crash in Nicosia::GC3DLayer::makeContextCurrent
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKitGTK
Version: WebKit Nightly Build
Hardware: PC Linux
Importance: P2 Normal
Assignee: Carlos Garcia Campos
URL:
Keywords:
Duplicates: 187958
Depends on:
Blocks: 192523
Reported: 2019-09-05 08:14 PDT by Michael Catanzaro
Modified: 2022-09-06 02:17 PDT (History)
11 users (show)

See Also:


Attachments
Patch (7.33 KB, patch)
2020-07-22 07:26 PDT, Adrian Perez
Patch v2 (13.01 KB, patch)
2020-07-28 07:59 PDT, Adrian Perez

Description Michael Catanzaro 2019-09-05 08:14:07 PDT
Visit https://www.washingtonpost.com/technology/2019/08/26/spy-your-wallet-credit-cards-have-privacy-problem/?noredirect=on in Tech Preview (2.25.4) and wait about 10-15 seconds. The page will crash:

Core was generated by `/usr/libexec/webkit2gtk-4.0/WebKitWebProcess 17 31'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f127fa8fa58 in Nicosia::GC3DLayer::makeContextCurrent (
    this=<optimized out>) at /usr/include/c++/9.2.0/bits/unique_ptr.h:352
352	      get() const noexcept

(gdb) bt
#0  0x00007f127fa8fa58 in Nicosia::GC3DLayer::makeContextCurrent() (this=<optimized out>)
    at /usr/include/c++/9.2.0/bits/unique_ptr.h:352
#1  0x00007f127fa84b80 in WebCore::GraphicsContext3D::makeContextCurrent() (this=this@entry=0x7f11e79dc600)
    at /usr/include/c++/9.2.0/bits/unique_ptr.h:352
#2  0x00007f127fa84de7 in WebCore::GraphicsContext3D::GraphicsContext3D(WebCore::GraphicsContext3DAttributes, WebCore::HostWindow*, WebCore::GraphicsContext3D::RenderStyle, WebCore::GraphicsContext3D*)
    (this=0x7f11e79dc600, attributes=..., renderStyle=WebCore::GraphicsContext3D::RenderOffscreen, sharedContext=<optimized out>) at ../Source/WebCore/platform/graphics/texmap/GraphicsContext3DTextureMapper.cpp:114
#3  0x00007f127fa859de in WebCore::GraphicsContext3D::create(WebCore::GraphicsContext3DAttributes, WebCore::HostWindow*, WebCore::GraphicsContext3D::RenderStyle) (attributes=..., hostWindow=hostWindow@entry=
    0x7f1275590060, renderStyle=renderStyle@entry=WebCore::GraphicsContext3D::RenderOffscreen)
    at DerivedSources/ForwardingHeaders/wtf/RefCounted.h:140
#4  0x00007f127f092c1f in WebCore::WebGLRenderingContextBase::create(WebCore::CanvasBase&, WebCore::GraphicsContext3DAttributes&, WTF::String const&) (canvas=..., attributes=..., type=...)
    at ../Source/WebCore/html/canvas/WebGLRenderingContextBase.cpp:601
#5  0x00007f127ef78603 in WebCore::HTMLCanvasElement::createContextWebGL(WTF::String const&, WebCore::GraphicsContext3DAttributes&&) (this=0x7f122426b610, type=..., attrs=...) at ../Source/WebCore/html/HTMLCanvasElement.cpp:408
#6  0x00007f127ef7c7d7 in WebCore::HTMLCanvasElement::getContext(JSC::ExecState&, WTF::String const&, WTF::Vector<JSC::Strong<JSC::Unknown>, 0ul, WTF::CrashOnOverflow, 16ul>&&)
    (this=this@entry=0x7f122426b610, state=..., contextId=..., arguments=...)
    at ../Source/WebCore/html/HTMLCanvasElement.cpp:276
#7  0x00007f127e4aea1d in WebCore::jsHTMLCanvasElementPrototypeFunctionGetContextBody
    (throwScope=..., castedThis=0x7f12013c6380, state=0x7fff32f9c750)
    at DerivedSources/WebCore/JSHTMLCanvasElement.cpp:291
#8  0x00007f127e4aea1d in WebCore::IDLOperation<WebCore::JSHTMLCanvasElement>::call<WebCore::jsHTMLCanvasElementPrototypeFunctionGetContextBody> (operationName=0x7f127fcc3fa6 "getContext", state=...)
    at ../Source/WebCore/bindings/js/JSDOMOperation.h:53
#9  0x00007f127e4aea1d in WebCore::jsHTMLCanvasElementPrototypeFunctionGetContext(JSC::ExecState*)
    (state=0x7fff32f9c750) at DerivedSources/WebCore/JSHTMLCanvasElement.cpp:296
#10 0x00007f1227fff16b in  ()
#11 0x00007fff32f9c860 in  ()
#12 0x00007f127b9df421 in llint_op_call () at /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18
#13 0x0000000000000000 in  ()


(gdb) bt full
#0  0x00007f127fa8fa58 in Nicosia::GC3DLayer::makeContextCurrent() (this=<optimized out>)
    at /usr/include/c++/9.2.0/bits/unique_ptr.h:352
#1  0x00007f127fa84b80 in WebCore::GraphicsContext3D::makeContextCurrent() (this=this@entry=0x7f11e79dc600)
    at /usr/include/c++/9.2.0/bits/unique_ptr.h:352
#2  0x00007f127fa84de7 in WebCore::GraphicsContext3D::GraphicsContext3D(WebCore::GraphicsContext3DAttributes, WebCore::HostWindow*, WebCore::GraphicsContext3D::RenderStyle, WebCore::GraphicsContext3D*)
    (this=0x7f11e79dc600, attributes=..., renderStyle=WebCore::GraphicsContext3D::RenderOffscreen, sharedContext=<optimized out>) at ../Source/WebCore/platform/graphics/texmap/GraphicsContext3DTextureMapper.cpp:114
        ANGLEResources = 
          {MaxVertexAttribs = 913974467, MaxVertexUniformVectors = 2051993642, MaxVaryingVectors = 1025385667, MaxVertexTextureImageUnits = 2073325422, MaxCombinedTextureImageUnits = 0, MaxTextureImageUnits = 32767, MaxFragmentUniformVectors = 2102440001, MaxDrawBuffers = 32530, OES_standard_derivatives = 2103732800, OES_EGL_image_external = 32530, OES_EGL_image_external_essl3 = 0, NV_EGL_stream_consumer_external = 0, ARB_texture_rectangle = 7, EXT_blend_func_extended = 32530, EXT_draw_buffers = 0, EXT_frag_depth = 0, EXT_shader_texture_lod = 7, WEBGL_debug_shader_precision = 0, EXT_shader_framebuffer_fetch = 320393242, NV_shader_framebuffer_fetch = 21954, ARM_shader_framebuffer_fetch = 112, OVR_multiview2 = 0, EXT_YUV_target = 94, EXT_geometry_shader = 0, OES_texture_storage_multisample_2d_array = 0, ANGLE_texture_multisample = 0, ANGLE_multi_draw = 320393272, NV_draw_buffers = 21954, FragmentPrecisionHigh = 144, MaxVertexOutputVectors = 0, MaxFragmentInputVectors = 1, MinProgramTexelOffset = 0, MaxProgramTexelOffset = 7, MaxDualSourceDrawBuffers = 49, MaxViewsOVR = 0, HashFunction = 0x0, ArrayIndexClampingStrategy = (unknown: 0), MaxExpressionComplexity = 0, MaxCallStackDepth = 124, MaxFunctionParameters = 119, MinProgramTextureGatherOffset = 110, MaxProgramTextureGatherOffset = 91, MaxImageUnits = 2040611200, MaxVertexImageUniforms = 32530, MaxFragmentImageUniforms = 5, MaxComputeImageUniforms = 0, MaxCombinedImageUniforms = 94, MaxUniformLocations = 0, MaxCombinedShaderOutputResources = 2103732704, MaxComputeWorkGroupCount = {_M_elems = {32530, 21, 0}}, MaxComputeWorkGroupSize = {_M_elems = {2040219680, 32530, -16}}, MaxComputeUniformComponents = -1, MaxComputeTextureImageUnits = 2102445017, MaxComputeAtomicCounters = 32530, MaxComputeAtomicCounterBuffers = 2040779328, MaxVertexAtomicCounters = 32530, MaxFragmentAtomicCounters = 2040779328, MaxCombinedAtomicCounters = 32530, MaxAtomicCounterBindings = 329073248, MaxVertexAtomicCounterBuffers = 21954, 
MaxFragmentAtomicCounterBuffers = 2040728757, MaxCombinedAtomicCounterBuffers = 32530, MaxAtomicCounterBufferSize = 0, MaxUniformBufferBindings = 0, MaxShaderStorageBufferBindings = 327478688, MaxPointSize = 3.07641065e-41, MaxGeometryUniformComponents = 2117502338, MaxGeometryUniformBlocks = 21, MaxGeometryInputComponents = 2101557456, MaxGeometryOutputComponents = 32530, MaxGeometryOutputVertices = 0, MaxGeometryTotalOutputComponents = 0, MaxGeometryTextureImageUnits = -2139654388, MaxGeometryAtomicCounterBuffers = 32530, MaxGeometryAtomicCounters = 2144069542, MaxGeometryShaderStorageBlocks = 32530, MaxGeometryShaderInvocations = 855229712, MaxGeometryImageUniforms = 32767}
        range = {1968767072, 32530}
        precision = 32529
#3  0x00007f127fa859de in WebCore::GraphicsContext3D::create(WebCore::GraphicsContext3DAttributes, WebCore::HostWindow*, WebCore::GraphicsContext3D::RenderStyle)
    (attributes=..., hostWindow=hostWindow@entry=0x7f1275590060, renderStyle=renderStyle@entry=WebCore::GraphicsContext3D::RenderOffscreen) at DerivedSources/ForwardingHeaders/wtf/RefCounted.h:140
        initialized = true
        success = true
        contexts = 
                @0x7f12807b9c00: {m_start = 0, m_end = 0, m_buffer = {<WTF::VectorBufferBase<WebCore::GraphicsContext3D*>> = {m_buffer = 0x7f12807b9c20 <WebCore::activeContexts()::s_activeContexts+32>, m_capacity = 16, m_size = 0}, m_inlineBuffer = {{__data = "\000\000\000\000\000\000\000", __align = {<No data fields>}} <repeats 16 times>}}}
#4  0x00007f127f092c1f in WebCore::WebGLRenderingContextBase::create(WebCore::CanvasBase&, WebCore::GraphicsContext3DAttributes&, WTF::String const&) (canvas=..., attributes=..., type=...)
    at ../Source/WebCore/html/canvas/WebGLRenderingContextBase.cpp:601
        isPendingPolicyResolution = false
        hostWindow = 0x7f1275590060
        canvasElement = <optimized out>
        context = 
          {static isRefPtr = <error reading variable: Missing ELF symbol "WTF::RefPtr<WebCore::GraphicsContext3D, WTF::DumbPtrTraits<WebCore::GraphicsContext3D> >::isRefPtr".>, m_ptr = 0x0}
        extensions = <optimized out>
        renderingContext = <optimized out>
#5  0x00007f127ef78603 in WebCore::HTMLCanvasElement::createContextWebGL(WTF::String const&, WebCore::GraphicsContext3DAttributes&&) (this=0x7f122426b610, type=..., attrs=...) at ../Source/WebCore/html/HTMLCanvasElement.cpp:408
#6  0x00007f127ef7c7d7 in WebCore::HTMLCanvasElement::getContext(JSC::ExecState&, WTF::String const&, WTF::Vector<JSC::Strong<JSC::Unknown>, 0ul, WTF::CrashOnOverflow, 16ul>&&) (this=this@entry=0x7f122426b610, state=..., contextId=..., arguments=...) at ../Source/WebCore/html/HTMLCanvasElement.cpp:276
        scope = {<JSC::ExceptionScope> = {m_vm = @0x7f1226b00000}, <No data fields>}
        attributes = {alpha = true, depth = true, stencil = false, antialias = true, premultipliedAlpha = true, preserveDrawingBuffer = false, failIfMajorPerformanceCaveat = false, powerPreference = WebCore::GraphicsContext3DPowerPreference::Default, shareResources = false, isWebGL2 = false, noExtensions = true, devicePixelRatio = 1, initialPowerPreference = WebCore::GraphicsContext3DPowerPreference::Default}
        context = <optimized out>
#7  0x00007f127e4aea1d in WebCore::jsHTMLCanvasElementPrototypeFunctionGetContextBody (throwScope=..., castedThis=0x7f12013c6380, state=0x7fff32f9c750) at DerivedSources/WebCore/JSHTMLCanvasElement.cpp:291
        impl = @0x7f122426b610: {<WebCore::HTMLElement> = {<WebCore::StyledElement> = {<WebCore::Element> = {<WebCore::ContainerNode> = {<WTF::CanMakeWeakPtr<WebCore::ContainerNode>> = {m_weakPtrFactory = {m_impl = {static isRefPtr = <error reading variable: Missing ELF symbol "WTF::RefPtr<WTF::WeakPtrImpl, WTF::DumbPtrTraits<WTF::WeakPtrImpl> >::isRefPtr".>, m_ptr = 0x0}}}, <WebCore::Node> = {<WebCore::EventTarget> = {<WebCore::ScriptWrappable> = {m_wrapper = {m_impl = 0x7f11e7933990}}, _vptr.EventTarget = 0x7f12806873d8 <vtable for WebCore::HTMLCanvasElement+16>}, static s_refCountIncrement = 2, static s_refCountMask = 4294967294, m_refCountAndParentBit = 2, m_nodeFlags = 524302, m_parentNode = 0x0, m_treeScope = 0x7f1224208c30, m_previous = 0x0, m_next = 0x0, m_data = {m_renderer = 0x0, m_rareData = 0x0}}, m_firstChild = 0x0, m_lastChild = 0x0}, m_tagName = {m_impl = {static isRefPtr = <error reading variable: Missing ELF symbol "WTF::RefPtr<WebCore::QualifiedName::QualifiedNameImpl, WTF::DumbPtrTraits<WebCore::QualifiedName::QualifiedNameImpl> >::isRefPtr".>, m_ptr = 0x7f12755880c8}}, m_elementData = {static isRefPtr = <error reading variable: Missing ELF symbol "WTF::RefPtr<WebCore::ElementData, WTF::DumbPtrTraits<WebCore::ElementData> >::isRefPtr".>, m_ptr = 0x0}}, <No data fields>}, <No data fields>}, <WebCore::CanvasBase> = {_vptr.CanvasBase = 0x7f12806878f8 <vtable for WebCore::HTMLCanvasElement+1328>, m_context = {_M_t = {_M_t = {<std::_Tuple_impl<0, WebCore::CanvasRenderingContext*, std::default_delete<WebCore::CanvasRenderingContext> >> = {<std::_Tuple_impl<1, std::default_delete<WebCore::CanvasRenderingContext> >> = {<std::_Head_base<1, std::default_delete<WebCore::CanvasRenderingContext>, true>> = {<std::default_delete<WebCore::CanvasRenderingContext>> = {<No data fields>}, <No data fields>}, <No data fields>}, <std::_Head_base<0, WebCore::CanvasRenderingContext*, false>> = {_M_head_impl = 0x0}, <No data fields>}, <No data fields>}}}, m_originClean = 
true, m_observers = {m_impl = {static m_maxLoad = 2, static m_minLoad = 6, m_table = 0x0, m_tableSize = 0, m_tableSizeMask = 0, m_keyCount = 0, m_deletedCount = 0}}}, m_dirtyRect = {m_location = {m_x = 0, m_y = 0}, m_size = {m_width = 0, m_height = 0}}, m_size = {m_width = 300, m_height = 150}, m_ignoreReset = false, m_usesDisplayListDrawing = false, m_tracksDisplayListReplay = false, m_imageBufferAssignmentLock = {static isHeldBit = 1 '\001', static hasParkedBit = 2 '\002', m_byte = {value = {<std::__atomic_base<unsigned char>> = {static _S_alignment = 1, _M_i = 0 '\000'}, static is_always_lock_free = true}}}, m_hasCreatedImageBuffer = false, m_didClearImageBuffer = false, m_imageBuffer = {_M_t = {_M_t = {<std::_Tuple_impl<0, WebCore::ImageBuffer*, std::default_delete<WebCore::ImageBuffer> >> = {<std::_Tuple_impl<1, std::default_delete<WebCore::ImageBuffer> >> = {<std::_Head_base<1, std::default_delete<WebCore::ImageBuffer>, true>> = {<std::default_delete<WebCore::ImageBuffer>> = {<No data fields>}, <No data fields>}, <No data fields>}, <std::_Head_base<0, WebCore::ImageBuffer*, false>> = {_M_head_impl = 0x0}, <No data fields>}, <No data fields>}}}, m_contextStateSaver = {_M_t = {_M_t = {<std::_Tuple_impl<0, WebCore::GraphicsContextStateSaver*, std::default_delete<WebCore::GraphicsContextStateSaver> >> = {<std::_Tuple_impl<1, std::default_delete<WebCore::GraphicsContextStateSaver> >> = {<std::_Head_base<1, std::default_delete<WebCore::GraphicsContextStateSaver>, true>> = {<std::default_delete<WebCore::GraphicsContextStateSaver>> = {<No data fields>}, <No data fields>}, <No data fields>}, <std::_Head_base<0, WebCore::GraphicsContextStateSaver*, false>> = {_M_head_impl = 0x0}, <No data fields>}, <No data fields>}}}, m_presentedImage = {static isRefPtr = <error reading variable: Missing ELF symbol "WTF::RefPtr<WebCore::Image, WTF::DumbPtrTraits<WebCore::Image> >::isRefPtr".>, m_ptr = 0x0}, m_copiedImage = {static isRefPtr = <error reading variable: Missing ELF symbol 
"WTF::RefPtr<WebCore::Image, WTF::DumbPtrTraits<WebCore::Image> >::isRefPtr".>, m_ptr = 0x0}}
        contextId = {static MaxLength = 2147483647, m_impl = {static isRefPtr = <error reading variable: Missing ELF symbol "WTF::RefPtr<WTF::StringImpl, WTF::DumbPtrTraits<WTF::StringImpl> >::isRefPtr".>, m_ptr = 0x7f11e79221e0}}
        arguments = {<WTF::VectorBuffer<JSC::Strong<JSC::Unknown>, 0>> = {<WTF::VectorBufferBase<JSC::Strong<JSC::Unknown> >> = {m_buffer = 0x0, m_capacity = 0, m_size = 0}, <No data fields>}, <No data fields>}
        throwScope = {<JSC::ExceptionScope> = {m_vm = @0x7f1226b00000}, <No data fields>}
        thisObject = 0x7f12013c6380
#8  0x00007f127e4aea1d in WebCore::IDLOperation<WebCore::JSHTMLCanvasElement>::call<WebCore::jsHTMLCanvasElementPrototypeFunctionGetContextBody> (operationName=0x7f127fcc3fa6 "getContext", state=...) at ../Source/WebCore/bindings/js/JSDOMOperation.h:53
        throwScope = {<JSC::ExceptionScope> = {m_vm = @0x7f1226b00000}, <No data fields>}
        thisObject = 0x7f12013c6380
#9  0x00007f127e4aea1d in WebCore::jsHTMLCanvasElementPrototypeFunctionGetContext(JSC::ExecState*) (state=0x7fff32f9c750) at DerivedSources/WebCore/JSHTMLCanvasElement.cpp:296
#10 0x00007f1227fff16b in  ()
#11 0x00007fff32f9c860 in  ()
#12 0x00007f127b9df421 in llint_op_call () at /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18
#13 0x0000000000000000 in  ()
Comment 1 Michael Catanzaro 2019-09-20 07:47:03 PDT
Today I noticed every attempt to view Google Maps -- even in new web views -- resulted in this crash. I had to restart the entire browser before viewing Google Maps worked again.
Comment 2 Michael Catanzaro 2019-09-21 10:01:36 PDT
(In reply to Michael Catanzaro from comment #1)
> Today I noticed every attempt to view Google Maps -- even in new web views
> -- resulted in this crash. I had to restart the entire browser before
> viewing Google Maps worked again.

And again today.
Comment 3 Michael Catanzaro 2019-10-04 06:28:41 PDT
OK I have a better reproducer! Visit https://q13fox.com/ and the page will crash immediately.
Comment 4 Michael Catanzaro 2019-10-04 06:46:20 PDT
OK wow, it's related to bug #202362. I think this bug only occurs when the browser gets into the "bad state" where bug #202362 occurs. Seems that some pages display white while others crash with this trace. In my journal, I see the same warnings from bug #202362:

Oct 04 08:44:16 chargestone-cave org.gnome.Epiphany.Devel.desktop[1896]: Cannot create EGL context: invalid display (last error: EGL_SUCCESS)

<web process crashes here>

Oct 04 08:44:16 chargestone-cave org.gnome.Epiphany.Devel.desktop[1896]: Cannot get default EGL display: EGL_BAD_PARAMETER
Comment 5 Michael Catanzaro 2019-10-07 07:14:05 PDT
I hit this crash 15-20 times this weekend.

It would be nice if a relevant developer would acknowledge this issue.
Comment 6 Carlos Garcia Campos 2019-10-08 02:43:17 PDT
This seems to be crashing because m_glContext is nullptr in GC3DLayer::makeContextCurrent(), which is called right after Nicosia::GC3DLayer is created, so the only way that can happen is because GLContext::createOffscreenContext() failed in the constructor. So, we need to know why creating an offscreen context fails in your system. The info of about:gpu in your system would help here.
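
The failure mode described above can be sketched in isolation. This is a minimal illustration and not WebKit's code: `FakeGLContext`, `FakeLayer`, and `createOffscreenContext(bool)` are hypothetical stand-ins. The null check shows one way a caller could detect a failed context creation instead of dereferencing an empty `std::unique_ptr`.

```cpp
#include <memory>

// Hypothetical stand-in for a GL context; not a WebKit type.
struct FakeGLContext {
    bool makeContextCurrent() { return true; }
};

// Hypothetical factory that returns nullptr when creation fails,
// mirroring how creating an offscreen context may fail.
std::unique_ptr<FakeGLContext> createOffscreenContext(bool displayIsValid)
{
    if (!displayIsValid)
        return nullptr;
    return std::make_unique<FakeGLContext>();
}

// Hypothetical layer that creates and owns its context in the constructor.
class FakeLayer {
public:
    explicit FakeLayer(bool displayIsValid)
        : m_glContext(createOffscreenContext(displayIsValid))
    {
    }

    // Reports failure instead of dereferencing a null pointer.
    bool makeContextCurrent()
    {
        if (!m_glContext)
            return false;
        return m_glContext->makeContextCurrent();
    }

private:
    std::unique_ptr<FakeGLContext> m_glContext;
};
```

With this shape, a layer built on an invalid display returns `false` from `makeContextCurrent()` rather than crashing, and the caller decides how to fall back.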
Comment 7 Carlos Garcia Campos 2019-10-08 02:44:59 PDT
Ah! I had forgotten the previous comments, so

Cannot create EGL context: invalid display (last error: EGL_SUCCESS)

that helps a bit
Comment 8 Carlos Garcia Campos 2019-10-08 02:50:00 PDT
about:gpu info is still useful in any case
Comment 9 Carlos Garcia Campos 2019-10-08 02:57:52 PDT
The origin is 

PlatformDisplayLibWPE: could not create the EGL display: EGL_SUCCESS.

which happens when 

m_eglDisplay = eglGetDisplay(wpe_renderer_backend_egl_get_native_display(m_backend));

fails. In that case we don't initialize the EGL display, so it's initialized on demand when PlatformDisplay::eglDisplay() is called, but eglGetDisplay(EGL_DEFAULT_DISPLAY) fails too (fortunately, because we don't really want to use the default display in this case).

So, the thing is why eglGetDisplay(wpe_renderer_backend_egl_get_native_display(m_backend)) is failing.
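
The on-demand initialization described above can be sketched as follows. This is only an illustration under stated assumptions: `LazyDisplay` and `FakeDisplay` are hypothetical names, the display lookup is injected as a callback instead of calling EGL, and a null handle stands in for "no display".

```cpp
#include <functional>
#include <utility>

// Hypothetical opaque display handle; nullptr plays the role of "no display".
using FakeDisplay = const void*;

// Hypothetical holder that obtains its display lazily and at most once.
class LazyDisplay {
public:
    explicit LazyDisplay(std::function<FakeDisplay()> getDisplay)
        : m_getDisplay(std::move(getDisplay))
    {
    }

    // Runs the lookup on first use; later calls reuse the cached result,
    // including a failed one, so callers see a consistent answer.
    FakeDisplay display()
    {
        if (!m_initialized) {
            m_display = m_getDisplay();
            m_initialized = true;
        }
        return m_display;
    }

    bool isValid() { return display() != nullptr; }

private:
    std::function<FakeDisplay()> m_getDisplay;
    FakeDisplay m_display { nullptr };
    bool m_initialized { false };
};
```

A caller can check `isValid()` before creating any context and pick a non-accelerated path when it returns `false`.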
Comment 10 Miguel Gomez 2019-10-08 03:16:02 PDT
(In reply to Carlos Garcia Campos from comment #9)
> The origin is 
> 
> PlatformDisplayLibWPE: could not create the EGL display: EGL_SUCCESS.
> 
> which happens when 
> 
> m_eglDisplay =
> eglGetDisplay(wpe_renderer_backend_egl_get_native_display(m_backend));
> 
> fails. In that case we don't initialize the EGL display, so it's initialized on
> demand when PlatformDisplay::eglDisplay() is called, but
> eglGetDisplay(EGL_DEFAULT_DISPLAY) fails too (fortunately, because we don't
> really want to use the default display in this case).
> 
> So, the thing is why
> eglGetDisplay(wpe_renderer_backend_egl_get_native_display(m_backend)) is
> failing.

Were you able to reproduce this, Carlos? Using which version exactly? I've tried ToT and I'm not able to see it. I'm now about to test 2.26.

Anyway, it's true that the problem is that we can't create a gl context because we can't get a valid display. What bothers me is this line:

Cannot get default EGL display: EGL_BAD_PARAMETER

because, at least according to the documentation, eglGetDisplay should not generate an EGL_BAD_PARAMETER error, which could mean that the error is coming from a previous EGL call.
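
The "error from a previous call" hypothesis can be modelled with a toy per-thread error slot. This is a sketch of the idea only, not a description of how any EGL implementation behaves: every function below is hypothetical, and only the two numeric error values are taken from the EGL headers.

```cpp
// Numeric values as defined in the EGL headers.
constexpr int kEglSuccess = 0x3000;      // EGL_SUCCESS
constexpr int kEglBadParameter = 0x300C; // EGL_BAD_PARAMETER

// Hypothetical model of a per-thread "last error" slot.
thread_local int g_lastError = kEglSuccess;

// Models an earlier call that fails and records an error.
bool earlierCallThatFails()
{
    g_lastError = kEglBadParameter;
    return false;
}

// Models a display lookup that fails without recording any error of its own.
const void* getDisplayWithoutRecordingError()
{
    return nullptr;
}

// Models an error query: returns the stored error and resets it to success.
int getLastError()
{
    int error = g_lastError;
    g_lastError = kEglSuccess;
    return error;
}
```

In this model, querying the error right after the failed display lookup reports the stale `EGL_BAD_PARAMETER` left by the earlier call, and a second query reports success.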
Comment 11 Miguel Gomez 2019-10-08 03:16:47 PDT
(In reply to Michael Catanzaro from comment #3)
> OK I have a better reproducer! Visit https://q13fox.com/ and the page will
> crash immediately.

Sadly this content is geoblocked :(
Comment 12 Carlos Garcia Campos 2019-10-08 03:23:49 PDT
No, I can't reproduce it.
Comment 13 Carlos Garcia Campos 2019-10-08 03:38:46 PDT
This is indeed a duplicate of bug #202362, it shows white pages for websites with normal AC content, and crashes when using WebGL. I've attached a patch to bug #202362 to handle the EGL display initialization failure and disable AC. That patch fixes both the white pages and this crash, so let's use this bug to figure out why EGL display creation fails.
Comment 14 Carlos Garcia Campos 2019-10-08 03:52:32 PDT
wpe_renderer_backend_egl_create() does the wayland display connection for the given fd (wl_display_connect_to_fd) and the returned WaylandDisplay is what is passed to eglGetDisplay(). So, it might be failing to connect to the wayland nested compositor and we are passing nullptr to eglGetDisplay(). Michael, it would help if you could add printfs to PlatformDisplayLibWPE::initialize() to show the hostFd and the native display returned by wpe_renderer_backend_egl_get_native_display() as a pointer.
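
A debug print of the kind requested above could look like the sketch below. `describeDisplayInit` is a hypothetical helper, not part of WebKit or libwpe; the two arguments stand for the host fd and the pointer returned by wpe_renderer_backend_egl_get_native_display().

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: formats the host fd and the native display pointer
// into a single log line.
std::string describeDisplayInit(int hostFd, const void* nativeDisplay)
{
    char buffer[128];
    // The text produced by %p is implementation-defined, so a null
    // pointer is also flagged explicitly.
    std::snprintf(buffer, sizeof(buffer), "hostFd=%d nativeDisplay=%p%s",
        hostFd, const_cast<void*>(nativeDisplay),
        nativeDisplay ? "" : " (null native display)");
    return buffer;
}
```

The returned string can be written to stderr or the journal from wherever the display is initialized, so that a null native display is visible in the logs the next time the bad state occurs.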
Comment 15 Carlos Garcia Campos 2019-10-08 05:00:53 PDT
I'm checking logs reported in bug #202362.

Oct 04 08:14:31 chargestone-cave org.gnome.Epiphany.Devel.desktop[1896]: Cannot get default EGL display: EGL_BAD_PARAMETER
Oct 04 08:14:31 chargestone-cave org.gnome.Epiphany.Devel.desktop[1896]: Cannot create EGL context: invalid display (last error: EGL_SUCCESS)

This is the initialization of the default display, which is also failing, so the problem is not specific to the wpe renderer. EGL_BAD_PARAMETER must be a previous error, as Miguel suggested, and the only EGL call that should happen before the EGL display initialization is eglGetPlatformDisplay. It returns EGL_BAD_PARAMETER when the given platform is not supported, but if that were the case it would always fail. I see we are checking for EGL_KHR_platform_wayland and always passing EGL_PLATFORM_WAYLAND_KHR even when using eglGetPlatformDisplayEXT, but that shouldn't be a problem because EGL_PLATFORM_WAYLAND_KHR and EGL_PLATFORM_WAYLAND_EXT are both defined as 0x31D8.
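
The point about the two platform enums can be checked at compile time. In this sketch the constants are redefined locally with the value quoted above so that it builds without the EGL headers; `isWaylandPlatform` is a hypothetical helper.

```cpp
// Value as quoted above; real code would take these from the EGL
// extension headers instead of redefining them.
constexpr unsigned kPlatformWaylandKHR = 0x31D8; // EGL_PLATFORM_WAYLAND_KHR
constexpr unsigned kPlatformWaylandEXT = 0x31D8; // EGL_PLATFORM_WAYLAND_EXT

// Fails the build if the two enums ever diverge.
static_assert(kPlatformWaylandKHR == kPlatformWaylandEXT,
    "the KHR and EXT Wayland platform enums are expected to match");

// Hypothetical helper: true when the enum names the Wayland platform
// under either extension.
constexpr bool isWaylandPlatform(unsigned platform)
{
    return platform == kPlatformWaylandKHR || platform == kPlatformWaylandEXT;
}
```

Because the values are identical, passing the KHR enum to the EXT entry point selects the same platform.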

Oct 04 08:14:32 chargestone-cave org.gnome.Epiphany.Devel.desktop[1896]: Cannot get default EGL display: EGL_BAD_PARAMETER
Oct 04 08:14:32 chargestone-cave org.gnome.Epiphany.Devel.desktop[1896]: PlatformDisplayLibWPE: could not create the EGL display: EGL_SUCCESS.

And this is creating the shared display for compositing. In this case I don't know where the EGL_BAD_PARAMETER comes from.
Comment 16 Michael Catanzaro 2019-10-08 09:12:10 PDT
(In reply to Michael Catanzaro from comment #3)
> OK I have a better reproducer! Visit https://q13fox.com/ and the page will
> crash immediately.

What I didn't realize when I added that comment is the bug is only reproducible once Epiphany gets into the "bad state" such that bug #202362 also occurs. Then it will crash 100% until Epiphany is restarted. But if you're not yet in this bad state, it works fine.

(In reply to Carlos Garcia Campos from comment #8)
> about:gpu info is still useful in any case

I can add a patch to our runtime to add about:gpu, if you want to provide a patch that builds against 2.26.1.

I won't update the runtime to 2.27.1 due to (a) the GitLab CI regression, it will become impossible to use Epiphany to develop Epiphany, and (b) the MSE regressions, I like to watch YouTube. So I want to keep Tech Preview on 2.26 for now.

(In reply to Carlos Garcia Campos from comment #14)
> Michael, it would help if you could add printfs to
> PlatformDisplayLibWPE::initialize() to show the hostFd and the native
> display returned by wpe_renderer_backend_egl_get_native_display() as a
> pointer.

This is more work than just adding a patch locally though. A local build isn't good enough because this bug is not reproducible; we need the patch in the real Tech Preview build for the debug to be available next time I notice the issue.

So I will add a debug patch to the runtime tomorrow. Please confirm that you still need it to print (a) hostFd, and (b) native display as a pointer, and (c) nothing else. If more would be helpful, now is the best time to add it because it would be nice to update the patch as few times as possible.
Comment 17 Michael Catanzaro 2019-10-08 09:19:02 PDT
(In reply to Michael Catanzaro from comment #16)
> I won't update the runtime to 2.27.1 due to (a) the GitLab CI regression, it
> will become impossible to use Epiphany to develop Epiphany, and (b) the MSE
> regressions, I like to watch YouTube. So I want to keep Tech Preview on 2.26
> for now.

That's bug #202594, bug #201726, bug #202078, and bug #202079.

(We also really need bug #202321, though that one is already broken in 2.26.)
Comment 18 Carlos Garcia Campos 2019-10-09 00:14:25 PDT
(In reply to Michael Catanzaro from comment #16)
> (In reply to Michael Catanzaro from comment #3)
> > OK I have a better reproducer! Visit https://q13fox.com/ and the page will
> > crash immediately.
> 
> What I didn't realize when I added that comment is the bug is only
> reproducible once Epiphany gets into the "bad state" such that bug #202362
> also occurs. Then it will crash 100% until Epiphany is restarted. But if
> you're not yet in this bad state, it works fine.

The fact that it doesn't always happen, and that it only starts happening after a while, makes me think it's not a problem with the EGL config, because that is always the same. It could be something like OOM, or running out of file descriptors, or something like that.

> (In reply to Carlos Garcia Campos from comment #8)
> > about:gpu info is still useful in any case
> 
> I can add a patch to our runtime to add about:gpu, if you want to provide a
> patch that builds against 2.26.1.
> 
> I won't update the runtime to 2.27.1 due to (a) the GitLab CI regression, it
> will become impossible to use Epiphany to develop Epiphany, and (b) the MSE
> regressions, I like to watch YouTube. So I want to keep Tech Preview on 2.26
> for now.
> 
> (In reply to Carlos Garcia Campos from comment #14)
> > Michael, it would help if you could add printfs to
> > PlatformDisplayLibWPE::initialize() to show the hostFd and the native
> > display returned by wpe_renderer_backend_egl_get_native_display() as a
> > pointer.
> 
> This is more work than just adding a patch locally though. A local build
> isn't good enough because this bug is not reproducible; we need the patch in
> the real Tech Preview build for the debug to be available next time I notice
> the issue.
> 
> So I will add a debug patch to the runtime tomorrow. Please confirm that you
> still need it to print (a) hostFd, and (b) native display as a pointer, and
> (c) nothing else. If more would be helpful, now is the best time to add it
> because it would be nice to update the patch as few times as possible.

No, I don't really need that, because it also happens when initializing the main display, so it's not related to wpe renderer.
Comment 19 Michael Catanzaro 2019-10-09 08:01:27 PDT
(In reply to Carlos Garcia Campos from comment #18)
> The fact that it doesn't always happen, and it starts happening after a while,
> makes me think it's not a problem with the EGL config, because it's always
> the same. It could be something like OOM or that you run out of file
> descriptors or something like that.

It's definitely not OOM.

I can check to see if an fd leak might be a problem next time this happens. (Sadly I just had it in a bad state a couple minutes ago and restarted so that I could use some website.)
Comment 20 Carlos Alberto Lopez Perez 2019-10-09 08:13:56 PDT
(In reply to Michael Catanzaro from comment #19)
> (In reply to Carlos Garcia Campos from comment #18)
> > The fact that it doesn't always happen, and it starts happening after a while,
> > makes me think it's not a problem with the EGL config, because it's always
> > the same. It could be something like OOM or that you run out of file
> > descriptors or something like that.
> 
> It's definitely not OOM.
> 
> I can check to see if an fd leak might be a problem next time this happens.
> (Sadly I just had it in a bad state a couple minutes ago and restarted so
> that I could use some website.)

It could also be related to flatpak or to some resource limit inside the container.

Can you reproduce it outside of flatpak?
Comment 21 Michael Catanzaro 2019-10-09 08:23:08 PDT
(In reply to Carlos Alberto Lopez Perez from comment #20)
> It could also be related to flatpak or to some resource limit inside the
> container.
> 
> Can you reproduce it outside of flatpak?

I am not going to try. The only way to notice bugs like this is to use the browser all day, every day, for several days. That's too long for me to test something.
Comment 22 Michael Catanzaro 2019-10-15 02:33:26 PDT
I have it in the error state again right now and the problem is not fd exhaustion... the UI process has <100 open fds.

I will try to resist the urge to restart my browser to fix this for about half an hour, in case someone sees this immediately and wants me to try something else for debugging.
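
For reference, the fd count mentioned above can be obtained by counting the entries under /proc/<pid>/fd. The sketch below assumes Linux and C++17; `countOpenFds` is a hypothetical helper, and reading another user's process may fail for lack of permission.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

// Counts the entries in /proc/<pid>/fd (Linux only).
// Returns -1 if the directory cannot be read.
// When counting for "self", the result includes the descriptor
// opened to read the directory itself.
long countOpenFds(const std::string& pid = "self")
{
    std::error_code ec;
    std::filesystem::directory_iterator it("/proc/" + pid + "/fd", ec);
    if (ec)
        return -1;

    const std::filesystem::directory_iterator end;
    long count = 0;
    while (it != end) {
        ++count;
        it.increment(ec);
        if (ec)
            return -1;
    }
    return count;
}
```

Comparing the result against the process's `RLIMIT_NOFILE` soft limit shows how close the process is to exhaustion.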
Comment 23 Michael Catanzaro 2019-10-29 07:05:06 PDT
Well I don't know what we can do about this. I'm hitting the bug now for the first time in about a week, but can't think of anything to do other than restart my browser so I can browse the web again and thereby throw away the opportunity to debug it for another week.

It seems nobody else has any idea how we can even attempt to debug it, either.
Comment 24 Michael Catanzaro 2019-12-03 07:16:27 PST
I have Epiphany in the bad state again right now. I discovered that, while the fallback to disable AC mode in this state that was implemented in bug #202362 *usually* works properly, it currently fails on https://www.linuxjournal.com/content/job-control-bash-feature-you-only-think-you-dont-need and we still get the same crash in comment #0. I think there is something about the web content on this page that triggers the crash.
Comment 25 Michael Catanzaro 2019-12-04 05:10:00 PST
(In reply to Michael Catanzaro from comment #24)
> I think there is something about the web content on this page that triggers the crash.

I have it in the bad state again today. https://riot.igalia.com/ is also a guaranteed crash. I noticed this warning:

** (WebKitWebProcess:16376): CRITICAL **: 07:08:38.759: gst_gl_display_egl_new_with_egl_display: assertion 'display != NULL' failed
Comment 26 Michael Catanzaro 2019-12-04 05:34:20 PST
Split into bug #204848
Comment 27 Philippe Normand 2019-12-05 05:15:53 PST
(In reply to Michael Catanzaro from comment #26)
> Split into bug #204848

I don't really see the point of filing another bug?
Shouldn't there at least be an ASSERT in PlatformDisplay::sharedDisplayForCompositing()?
Comment 28 Michael Catanzaro 2019-12-05 07:25:46 PST
I'd do a RELEASE_ASSERT, since we keep hitting it.

Bug #204848: ensure WebKit disables compositing instead of crashing when in this bad state

This bug: ensure WebKit does not enter this broken state in the first place
Comment 29 Michael Catanzaro 2019-12-08 08:38:27 PST
Here is the output from my about:gpu. I think we could use some work on the WebKit side to make this easier to copy/paste into bug reports; it's not very readable by default. I've added newlines to the output to make it easier to read.

Version Information

WebKit version
WebKitGTK 2.27.3 (tarball)

Operating system
Linux 5.3.13-300.fc31.x86_64 #1 SMP Mon Nov 25 17:25:25 UTC 2019 x86_64

Desktop
GNOME

Cairo version
1.16.0 (build) 1.16.0 (runtime)

GTK version
3.24.13 (build) 3.24.13 (runtime)

WPE version
1.4.0 (using fdo backend 1.4.0)


Display Information

Type
Wayland

Screen geometry
0,0 1920x1080

Screen work area
0,0 1920x1080

Depth
32

Bits per color component
8

DPI
96.00


Hardware Acceleration Information

Policy
on demand

WebGL enabled
Yes

API
OpenGL

Native interface
EGL

GL_RENDERER
Radeon RX 570 Series (POLARIS10, DRM 3.33.0, 5.3.13-300.fc31.x86_64, LLVM 9.0.0)

GL_VENDOR
X.Org

GL_VERSION
4.5 (Core Profile) Mesa 19.2.1

GL_SHADING_LANGUAGE_VERSION
4.50

GL_EXTENSIONS
GL_AMD_conservative_depth GL_AMD_depth_clamp_separate GL_AMD_draw_buffers_blend GL_AMD_framebuffer_multisample_advanced GL_AMD_gpu_shader_int64 GL_AMD_multi_draw_indirect GL_AMD_performance_monitor GL_AMD_pinned_memory GL_AMD_query_buffer_object GL_AMD_seamless_cubemap_per_texture GL_AMD_shader_stencil_export GL_AMD_shader_trinary_minmax GL_AMD_texture_texture4 GL_AMD_vertex_shader_layer GL_AMD_vertex_shader_viewport_index GL_ANGLE_texture_compression_dxt3 GL_ANGLE_texture_compression_dxt5 GL_ARB_ES2_compatibility GL_ARB_ES3_1_compatibility GL_ARB_ES3_2_compatibility GL_ARB_ES3_compatibility GL_ARB_arrays_of_arrays GL_ARB_base_instance GL_ARB_bindless_texture GL_ARB_blend_func_extended GL_ARB_buffer_storage GL_ARB_clear_buffer_object GL_ARB_clear_texture GL_ARB_clip_control GL_ARB_color_buffer_float GL_ARB_compressed_texture_pixel_storage GL_ARB_compute_shader GL_ARB_compute_variable_group_size GL_ARB_conditional_render_inverted GL_ARB_conservative_depth GL_ARB_copy_buffer GL_ARB_copy_image GL_ARB_cull_distance GL_ARB_debug_output GL_ARB_depth_buffer_float GL_ARB_depth_clamp GL_ARB_derivative_control GL_ARB_direct_state_access GL_ARB_draw_buffers GL_ARB_draw_buffers_blend GL_ARB_draw_elements_base_vertex GL_ARB_draw_indirect GL_ARB_draw_instanced GL_ARB_enhanced_layouts GL_ARB_explicit_attrib_location GL_ARB_explicit_uniform_location GL_ARB_fragment_coord_conventions GL_ARB_fragment_layer_viewport GL_ARB_fragment_shader GL_ARB_framebuffer_no_attachments GL_ARB_framebuffer_object GL_ARB_framebuffer_sRGB GL_ARB_get_program_binary GL_ARB_get_texture_sub_image GL_ARB_gpu_shader5 GL_ARB_gpu_shader_fp64 GL_ARB_gpu_shader_int64 GL_ARB_half_float_pixel GL_ARB_half_float_vertex GL_ARB_indirect_parameters GL_ARB_instanced_arrays GL_ARB_internalformat_query GL_ARB_internalformat_query2 GL_ARB_invalidate_subdata GL_ARB_map_buffer_alignment GL_ARB_map_buffer_range GL_ARB_multi_bind GL_ARB_multi_draw_indirect GL_ARB_occlusion_query2 GL_ARB_parallel_shader_compile 
GL_ARB_pipeline_statistics_query GL_ARB_pixel_buffer_object GL_ARB_point_sprite GL_ARB_polygon_offset_clamp GL_ARB_program_interface_query GL_ARB_provoking_vertex GL_ARB_query_buffer_object GL_ARB_robust_buffer_access_behavior GL_ARB_robustness GL_ARB_sample_shading GL_ARB_sampler_objects GL_ARB_seamless_cube_map GL_ARB_seamless_cubemap_per_texture GL_ARB_separate_shader_objects GL_ARB_shader_atomic_counter_ops GL_ARB_shader_atomic_counters GL_ARB_shader_ballot GL_ARB_shader_bit_encoding GL_ARB_shader_clock GL_ARB_shader_draw_parameters GL_ARB_shader_group_vote GL_ARB_shader_image_load_store GL_ARB_shader_image_size GL_ARB_shader_objects GL_ARB_shader_precision GL_ARB_shader_stencil_export GL_ARB_shader_storage_buffer_object GL_ARB_shader_subroutine GL_ARB_shader_texture_image_samples GL_ARB_shader_texture_lod GL_ARB_shader_viewport_layer_array GL_ARB_shading_language_420pack GL_ARB_shading_language_packing GL_ARB_sparse_buffer GL_ARB_stencil_texturing GL_ARB_sync GL_ARB_tessellation_shader GL_ARB_texture_barrier GL_ARB_texture_buffer_object GL_ARB_texture_buffer_object_rgb32 GL_ARB_texture_buffer_range GL_ARB_texture_compression_bptc GL_ARB_texture_compression_rgtc GL_ARB_texture_cube_map_array GL_ARB_texture_filter_anisotropic GL_ARB_texture_float GL_ARB_texture_gather GL_ARB_texture_mirror_clamp_to_edge GL_ARB_texture_multisample GL_ARB_texture_non_power_of_two GL_ARB_texture_query_levels GL_ARB_texture_query_lod GL_ARB_texture_rectangle GL_ARB_texture_rg GL_ARB_texture_rgb10_a2ui GL_ARB_texture_stencil8 GL_ARB_texture_storage GL_ARB_texture_storage_multisample GL_ARB_texture_swizzle GL_ARB_texture_view GL_ARB_timer_query GL_ARB_transform_feedback2 GL_ARB_transform_feedback3 GL_ARB_transform_feedback_instanced GL_ARB_transform_feedback_overflow_query GL_ARB_uniform_buffer_object GL_ARB_vertex_array_bgra GL_ARB_vertex_array_object GL_ARB_vertex_attrib_64bit GL_ARB_vertex_attrib_binding GL_ARB_vertex_buffer_object GL_ARB_vertex_shader 
GL_ARB_vertex_type_10f_11f_11f_rev GL_ARB_vertex_type_2_10_10_10_rev GL_ARB_viewport_array GL_ATI_blend_equation_separate GL_ATI_meminfo GL_ATI_texture_float GL_ATI_texture_mirror_once GL_EXT_abgr GL_EXT_blend_equation_separate GL_EXT_depth_bounds_test GL_EXT_draw_buffers2 GL_EXT_draw_instanced GL_EXT_framebuffer_blit GL_EXT_framebuffer_multisample GL_EXT_framebuffer_multisample_blit_scaled GL_EXT_framebuffer_object GL_EXT_framebuffer_sRGB GL_EXT_memory_object GL_EXT_memory_object_fd GL_EXT_packed_depth_stencil GL_EXT_packed_float GL_EXT_pixel_buffer_object GL_EXT_polygon_offset_clamp GL_EXT_provoking_vertex GL_EXT_semaphore GL_EXT_semaphore_fd GL_EXT_shader_image_load_formatted GL_EXT_shader_integer_mix GL_EXT_texture_array GL_EXT_texture_compression_dxt1 GL_EXT_texture_compression_rgtc GL_EXT_texture_compression_s3tc GL_EXT_texture_filter_anisotropic GL_EXT_texture_integer GL_EXT_texture_mirror_clamp GL_EXT_texture_sRGB GL_EXT_texture_sRGB_R8 GL_EXT_texture_sRGB_decode GL_EXT_texture_shared_exponent GL_EXT_texture_snorm GL_EXT_texture_swizzle GL_EXT_timer_query GL_EXT_transform_feedback GL_EXT_vertex_array_bgra GL_EXT_vertex_attrib_64bit GL_EXT_window_rectangles GL_IBM_multimode_draw_arrays GL_KHR_blend_equation_advanced GL_KHR_context_flush_control GL_KHR_debug GL_KHR_no_error GL_KHR_parallel_shader_compile GL_KHR_robust_buffer_access_behavior GL_KHR_robustness GL_KHR_texture_compression_astc_ldr GL_KHR_texture_compression_astc_sliced_3d GL_MESA_pack_invert GL_MESA_shader_integer_functions GL_MESA_texture_signed_rgba GL_NVX_gpu_memory_info GL_NV_conditional_render GL_NV_depth_clamp GL_NV_packed_depth_stencil GL_NV_texture_barrier GL_NV_vdpau_interop GL_OES_EGL_image GL_S3_s3tc

EGL_VERSION
1.5

EGL_VENDOR
Mesa Project

EGL_EXTENSIONS
EGL_EXT_device_base EGL_EXT_device_enumeration EGL_EXT_device_query EGL_EXT_platform_base EGL_KHR_client_get_all_proc_addresses EGL_EXT_client_extensions EGL_KHR_debug EGL_EXT_platform_wayland EGL_EXT_platform_x11 EGL_MESA_platform_gbm EGL_MESA_platform_surfaceless EGL_EXT_platform_device EGL_ANDROID_blob_cache EGL_ANDROID_native_fence_sync EGL_EXT_buffer_age EGL_EXT_create_context_robustness EGL_EXT_image_dma_buf_import EGL_EXT_swap_buffers_with_damage EGL_KHR_cl_event2 EGL_KHR_config_attribs EGL_KHR_create_context EGL_KHR_create_context_no_error EGL_KHR_fence_sync EGL_KHR_get_all_proc_addresses EGL_KHR_gl_colorspace EGL_KHR_gl_renderbuffer_image EGL_KHR_gl_texture_2D_image EGL_KHR_gl_texture_3D_image EGL_KHR_gl_texture_cubemap_image EGL_KHR_image_base EGL_KHR_no_config_context EGL_KHR_reusable_sync EGL_KHR_surfaceless_context EGL_KHR_swap_buffers_with_damage EGL_EXT_pixel_format_float EGL_KHR_wait_sync EGL_MESA_configless_context EGL_MESA_drm_image EGL_MESA_image_dma_buf_export EGL_MESA_query_driver EGL_WL_bind_wayland_display EGL_WL_create_wayland_buffer_from_image
Comment 30 Michael Catanzaro 2019-12-16 14:29:05 PST
Hi, it seems this issue is stalled.

As far as I know, this is a regression from the switch to WPE renderer? (I'm not certain of this, but I definitely never noticed the issue before this September, so the timing seems right.)

For Epiphany, I am aiming to reenable AC mode in upcoming Epiphany 3.34.3 and 3.32.6, because we've discovered various sites that require 3D transforms, and all known AC mode bugs other than this one have been fixed. This bug is my remaining hesitation. I wonder if we should switch WebKitGTK's build default from WPE renderer back to the WaylandCompositor until we have time to track this down and debug it? I'm a bit concerned because Epiphany does not have any control over whether WPE renderer or WaylandCompositor gets used.
Comment 31 Carlos Garcia Campos 2019-12-17 00:43:24 PST
(In reply to Michael Catanzaro from comment #30)
> Hi, it seems this issue is stalled.

I don't think I can do more without a way to reproduce it.

> As far as I know, this is a regression from the switch to WPE renderer? (I'm
> not certain of this, but I definitely never noticed the issue before this
> September, so the timing seems right.)

This is just a guess, we would need to confirm it. Since you seem to be the only one affected, we would need you to build WebKit without WPE renderer and check if the problem is gone.

> For Epiphany, I am aiming to reenable AC mode in upcoming Epiphany 3.34.3
> and 3.32.6, because we've discovered various sites that require 3D
> transforms, and all known AC mode bugs other than this one have been fixed.
> This bug is my remaining hesitation. I wonder if we should switch
> WebKitGTK's build default from WPE renderer back to the WaylandCompositor
> until we have time to track this down and debug it? I'm a bit concerned
> because Epiphany does not have any control over whether WPE renderer or
> WaylandCompositor gets used.

Let's confirm it's a regression of wpe renderer, because I don't think it is.
Comment 32 Michael Catanzaro 2019-12-17 07:02:41 PST
(In reply to Carlos Garcia Campos from comment #31)
> This is just a guess, we would need to confirm it. Since you seem to be the
> only one affected, we would need you to build WebKit without WPE renderer
> and check if the problem is gone.

Problem is, without a reproducer, it's impossible to ever know for sure if it's fixed. I can only guess based on whether I see pages that include video crashing (or, in the unlikely event I'm running in a terminal, if I see the error messages appearing there). I haven't noticed the issue for about two weeks now, when it happened two or three days in a row, and then there were several calm weeks before that.

Since multiple weeks without hitting the bug isn't evidence that it's fixed, might I suggest we build some debugging into WPEBackend-fdo to try to figure out what's going wrong next time this happens? Something must be going wrong in Instance::initialize in ws.cpp. Instead of returning false when initialization fails, how about we make this a fatal error and crash the web process instead? If we change each return false to g_error() then the backtrace from the crash will tell us exactly where it's failing inside this function. I don't see any other way to proceed, because the only way we have to indicate failure other than crashing is the bool return value, which doesn't convey enough information about the failure:

bool Instance::initialize(EGLDisplay eglDisplay)
{
    if (m_eglDisplay == eglDisplay)
        return true;

    if (m_eglDisplay != EGL_NO_DISPLAY) {
        g_warning("Multiple EGL displays are not supported.\n");
        return false;
    }

    const char* extensions = eglQueryString(eglDisplay, EGL_EXTENSIONS);
    if (isEGLExtensionSupported(extensions, "EGL_WL_bind_wayland_display")) {
        s_eglBindWaylandDisplayWL = reinterpret_cast<PFNEGLBINDWAYLANDDISPLAYWL>(eglGetProcAddress("eglBindWaylandDisplayWL"));
        assert(s_eglBindWaylandDisplayWL);
        s_eglQueryWaylandBufferWL = reinterpret_cast<PFNEGLQUERYWAYLANDBUFFERWL>(eglGetProcAddress("eglQueryWaylandBufferWL"));
        assert(s_eglQueryWaylandBufferWL);
    }
    if (!s_eglBindWaylandDisplayWL || !s_eglQueryWaylandBufferWL)
        return false;

    if (isEGLExtensionSupported(extensions, "EGL_KHR_image_base")) {
        s_eglCreateImageKHR = reinterpret_cast<PFNEGLCREATEIMAGEKHRPROC>(eglGetProcAddress("eglCreateImageKHR"));
        assert(s_eglCreateImageKHR);
        s_eglDestroyImageKHR = reinterpret_cast<PFNEGLDESTROYIMAGEKHRPROC>(eglGetProcAddress("eglDestroyImageKHR"));
        assert(s_eglDestroyImageKHR);
    }
    if (!s_eglCreateImageKHR || !s_eglDestroyImageKHR)
        return false;

    if (!s_eglBindWaylandDisplayWL(eglDisplay, m_display))
        return false;

    m_eglDisplay = eglDisplay;

    /* Initialize Linux dmabuf subsystem. */
    if (isEGLExtensionSupported(extensions, "EGL_EXT_image_dma_buf_import")
        && isEGLExtensionSupported(extensions, "EGL_EXT_image_dma_buf_import_modifiers")) {
        s_eglQueryDmaBufFormatsEXT = reinterpret_cast<PFNEGLQUERYDMABUFFORMATSEXTPROC>(eglGetProcAddress("eglQueryDmaBufFormatsEXT"));
        assert(s_eglQueryDmaBufFormatsEXT);
        s_eglQueryDmaBufModifiersEXT = reinterpret_cast<PFNEGLQUERYDMABUFMODIFIERSEXTPROC>(eglGetProcAddress("eglQueryDmaBufModifiersEXT"));
        assert(s_eglQueryDmaBufModifiersEXT);
    }

    if (s_eglQueryDmaBufFormatsEXT && s_eglQueryDmaBufModifiersEXT) {
        if (m_linuxDmabuf)
            assert(!"Linux-dmabuf has already been initialized");
        m_linuxDmabuf = linux_dmabuf_setup(m_display);
    }

    return true;
}
Comment 33 Michael Catanzaro 2019-12-17 07:05:36 PST
I notice my EGL doesn't support EGL_EXT_image_dma_buf_import_modifiers, but it looks like that shouldn't be causing any failure here.
Comment 34 Carlos Garcia Campos 2019-12-17 07:12:09 PST
Or we can add g_warning() before every "return false" there
Comment 35 Michael Catanzaro 2019-12-17 07:15:50 PST
That would be good too.

I might start with errors (crashes), so we don't fail to notice the warnings (it's impossible to notice warnings except when running from a terminal, which I rarely do), and then we can change them to warnings once we have solved this bug?

Or: we could do warnings upstream, and I can patch the GNOME runtime to change them into crashes.
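
A minimal sketch of the idea discussed in comments 32 to 35: give every early return its own reason and log it, so the journal identifies which check failed. This is not the WPEBackend-fdo code. `InitFailure`, `InitInputs`, `describe`, `fail` and `initializeSketch` are hypothetical names, and plain booleans stand in for the real EGL queries.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical failure reasons, one per early return.
enum class InitFailure {
    None,
    MultipleDisplays,
    MissingWaylandBindExtension,
    MissingImageBaseExtension,
    BindWaylandDisplayFailed,
};

// Hypothetical inputs standing in for the real EGL state and queries.
struct InitInputs {
    bool otherDisplayAlreadyBound;
    bool hasWaylandBindExtension;
    bool hasImageBaseExtension;
    bool bindSucceeds;
};

const char* describe(InitFailure failure)
{
    switch (failure) {
    case InitFailure::None:
        return "initialized";
    case InitFailure::MultipleDisplays:
        return "multiple EGL displays are not supported";
    case InitFailure::MissingWaylandBindExtension:
        return "EGL_WL_bind_wayland_display entry points unavailable";
    case InitFailure::MissingImageBaseExtension:
        return "EGL_KHR_image_base entry points unavailable";
    case InitFailure::BindWaylandDisplayFailed:
        return "binding the Wayland display failed";
    }
    return "unknown";
}

// Logs the reason once, then hands it back so the caller can return it.
static InitFailure fail(InitFailure failure)
{
    assert(failure != InitFailure::None);
    std::fprintf(stderr, "initialize failed: %s\n", describe(failure));
    return failure;
}

InitFailure initializeSketch(const InitInputs& in)
{
    if (in.otherDisplayAlreadyBound)
        return fail(InitFailure::MultipleDisplays);
    if (!in.hasWaylandBindExtension)
        return fail(InitFailure::MissingWaylandBindExtension);
    if (!in.hasImageBaseExtension)
        return fail(InitFailure::MissingImageBaseExtension);
    if (!in.bindSucceeds)
        return fail(InitFailure::BindWaylandDisplayFailed);
    return InitFailure::None;
}
```

Swapping the `fprintf` in `fail` for a fatal call would give the crash-with-backtrace variant. Either way the caller receives more than a bare `bool`.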
Comment 36 Carlos Alberto Lopez Perez 2019-12-17 07:32:12 PST
(In reply to Michael Catanzaro from comment #33)
> I notice my EGL doesn't support EGL_EXT_image_dma_buf_import_modifiers, but
> it looks like that shouldn't be causing any failure here.

Mmmm....

If your EGL doesn't support that, then s_eglQueryDmaBufModifiersEXT() is NULL.
It seems also s_eglQueryDmaBufFormatsEXT() would be NULL in that case (looking at the if-code block guarding it, it only enters into it if both extensions are supported).

The code block you pasted here checks for that, but below that code, in Instance::foreachDmaBufModifier() it calls both s_eglQueryDmaBufModifiersEXT() and s_eglQueryDmaBufFormatsEXT() and doesn't seem to check the function pointers are valid.

https://github.com/Igalia/WPEBackend-fdo/blob/bee4104/src/ws.cpp#L534

Might this explain your issue?
Comment 37 Michael Catanzaro 2019-12-17 08:29:57 PST
I don't think so. I'd expect crashes way more often if that was happening. Instance::foreachDmaBufModifier() is only called from bind_linux_dmabuf() in linux-dmabuf.cpp, and that's only called from linux_dmabuf_setup(), and that's only called from Instance::initialize in an if (s_eglQueryDmaBufFormatsEXT && s_eglQueryDmaBufModifiersEXT) block. So that shouldn't happen.

Note: I have EGL_EXT_image_dma_buf_import, just not EGL_EXT_image_dma_buf_import_modifiers. I wonder why it's not available?
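
For reference, a minimal sketch of the defensive pattern raised in comment 36: check optional entry points at the call site instead of relying on every caller being guarded. `DmaBufEntryPoints`, `modifiersFor` and `forEachModifierSketch` are hypothetical names, and the function-pointer signatures are simplified stand-ins, not the real EGL prototypes.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for the optional EGL entry points. A null pointer
// means the corresponding extension was not advertised at initialization.
struct DmaBufEntryPoints {
    bool (*queryFormats)(std::vector<int>& formats) = nullptr;
    bool (*queryModifiers)(int format, std::vector<long>& modifiers) = nullptr;

    bool isComplete() const { return queryFormats && queryModifiers; }
};

// Internal helper: callers must have checked isComplete() already.
static std::vector<long> modifiersFor(const DmaBufEntryPoints& egl, int format)
{
    assert(egl.isComplete());
    std::vector<long> modifiers;
    if (!egl.queryModifiers(format, modifiers))
        modifiers.clear();
    return modifiers;
}

// Visits every (format, modifier) pair and returns how many were visited.
// Returns 0 without calling anything when either entry point is missing.
template<typename Callback>
int forEachModifierSketch(const DmaBufEntryPoints& egl, Callback&& callback)
{
    if (!egl.isComplete())
        return 0;

    std::vector<int> formats;
    if (!egl.queryFormats(formats))
        return 0;

    int visited = 0;
    for (int format : formats) {
        for (long modifier : modifiersFor(egl, format)) {
            callback(format, modifier);
            ++visited;
        }
    }
    return visited;
}
```

With the guard inside the iteration function, a future caller that skips the outer `if` gets an empty iteration instead of a null function-pointer call.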
Comment 38 Michael Catanzaro 2019-12-30 07:43:41 PST
Still crashing when creating WebGL context with 2.27.3. I thought we had downgraded this from a crash to disabling AC mode? I'm in the bad state again and every attempt to load a page that uses WebGL results in a crash. Here is a backtrace with 2.27.3:

#0  0x00007fd504da5c18 in Nicosia::GC3DLayer::makeContextCurrent() (this=<optimized out>)
    at /usr/include/c++/9.2.0/bits/unique_ptr.h:352
#1  0x00007fd504d9abc7 in WebCore::GraphicsContext3D::GraphicsContext3D(WebCore::GraphicsContext3DAttributes, WebCore::HostWindow*, WebCore::GraphicsContext3D::RenderStyle, WebCore::GraphicsContext3D*)
    (this=0x7fd493eb7000, attributes=..., renderStyle=WebCore::GraphicsContext3D::RenderOffscreen, sharedContext=<optimized out>) at ../Source/WebCore/platform/graphics/texmap/GraphicsContext3DTextureMapper.cpp:216
#2  0x00007fd504d9b7ce in WebCore::GraphicsContext3D::create(WebCore::GraphicsContext3DAttributes, WebCore::HostWindow*, WebCore::GraphicsContext3D::RenderStyle) (attributes=..., hostWindow=hostWindow@entry=
    0x7fd4fba5ba80, renderStyle=renderStyle@entry=WebCore::GraphicsContext3D::RenderOffscreen)
    at DerivedSources/ForwardingHeaders/wtf/RefCounted.h:185
#3  0x00007fd50432ce0f in WebCore::WebGLRenderingContextBase::create(WebCore::CanvasBase&, WebCore::GraphicsContext3DAttributes&, WTF::String const&) (canvas=..., attributes=..., type=...)
    at ../Source/WebCore/html/canvas/WebGLRenderingContextBase.cpp:606
#4  0x00007fd504213633 in WebCore::HTMLCanvasElement::createContextWebGL(WTF::String const&, WebCore::GraphicsContext3DAttributes&&) (this=0x7fd4abf4eaa0, type=..., attrs=...) at ../Source/WebCore/html/HTMLCanvasElement.cpp:411
#5  0x00007fd5042187a2 in WebCore::HTMLCanvasElement::getContext(JSC::JSGlobalObject&, WTF::String const&, WTF::Vector<JSC::Strong<JSC::Unknown, (JSC::ShouldStrongDestructorGrabLock)0>, 0ul, WTF::CrashOnOverflow, 16ul>&&)
    (this=this@entry=0x7fd4abf4eaa0, state=..., contextId=..., arguments=...)
    at ../Source/WebCore/html/HTMLCanvasElement.cpp:279
#6  0x00007fd50375056d in WebCore::jsHTMLCanvasElementPrototypeFunctionGetContextBody
    (throwScope=..., castedThis=0x7fd4abca9b80, callFrame=<optimized out>, lexicalGlobalObject=0x7fd4abcddf60)
    at DerivedSources/WebCore/JSHTMLCanvasElement.cpp:297
#7  0x00007fd50375056d in WebCore::IDLOperation<WebCore::JSHTMLCanvasElement>::call<WebCore::jsHTMLCanvasElementPrototypeFunctionGetContextBody> (operationName=0x7fd50500c689 "getContext", callFrame=..., lexicalGlobalObject=...)
    at ../Source/WebCore/bindings/js/JSDOMOperation.h:53
#8  0x00007fd50375056d in WebCore::jsHTMLCanvasElementPrototypeFunctionGetContext(JSC::JSGlobalObject*, JSC::CallFrame*) (lexicalGlobalObject=0x7fd4abcddf60, callFrame=<optimized out>)
    at DerivedSources/WebCore/JSHTMLCanvasElement.cpp:302
#9  0x00007fd4abfff16b in  ()
#10 0x00007ffc85385b50 in  ()
#11 0x00007fd500a54977 in llint_op_call () at /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18
#12 0x0000000000000000 in  ()

Sadly I'm still unable to debug because I'm using flatpak 1.4.3, which has a broken 'flatpak enter' so no way to enter the sandbox environment.
Comment 39 Michael Catanzaro 2020-01-04 08:55:36 PST
(In reply to Carlos Garcia Campos from comment #15)
> I'm checking logs reported in bug #202362.
> 
> Oct 04 08:14:31 chargestone-cave org.gnome.Epiphany.Devel.desktop[1896]:
> Cannot get default EGL display: EGL_BAD_PARAMETER
> Oct 04 08:14:31 chargestone-cave org.gnome.Epiphany.Devel.desktop[1896]:
> Cannot create EGL context: invalid display (last error: EGL_SUCCESS)
> 
> This is the initialization of the default display, that is also failing so
> problem is not specific to wpe renderer. EGL_BAD_PARAMETER must be a
> previous error, as Miguel suggested, and the only egl call that should
> happen before the egl display initialization is eglGetPlatformDisplay. It
> returns a EGL_BAD_PARAMETER when the given platform is not supported, but if
> that was the case it would always fail. I see we are checking for
> EGL_KHR_platform_wayland and always passing EGL_PLATFORM_WAYLAND_KHR even
> when using eglGetPlatformDisplayEXT, but that shouldn't be a problem because
> EGL_PLATFORM_WAYLAND_KHR and EGL_PLATFORM_WAYLAND_EXT are both defined as
> 0x31D8. 
> 
> Oct 04 08:14:32 chargestone-cave org.gnome.Epiphany.Devel.desktop[1896]:
> Cannot get default EGL display: EGL_BAD_PARAMETER
> Oct 04 08:14:32 chargestone-cave org.gnome.Epiphany.Devel.desktop[1896]:
> PlatformDisplayLibWPE: could not create the EGL display: EGL_SUCCESS.
> 
> And this is creating the shared display for compositing. In this case I don't
> know where the EGL_BAD_PARAMETER comes from.

In desperation, I'm looking at eglapi.c in mesa:

static EGLBoolean EGLAPIENTRY
eglBindWaylandDisplayWL(EGLDisplay dpy, struct wl_display *display)
{
   _EGLDisplay *disp = _eglLockDisplay(dpy);
   _EGLDriver *drv;
   EGLBoolean ret;

   _EGL_FUNC_START(disp, EGL_OBJECT_DISPLAY_KHR, NULL, EGL_FALSE);

   _EGL_CHECK_DISPLAY(disp, EGL_FALSE, drv);
   assert(disp->Extensions.WL_bind_wayland_display);

   if (!display)
      RETURN_EGL_ERROR(disp, EGL_BAD_PARAMETER, EGL_FALSE);

   ret = drv->API.BindWaylandDisplayWL(drv, disp, display);

   RETURN_EGL_EVAL(disp, ret);
}

Could wl_display be NULL? It's created in the WS::Instance constructor in ws.cpp, in WPEBackend-fdo, using wl_display_create(). It's documented to return NULL on failure and it looks like a WPEBackend-fdo bug that it's not checking for possible failure there.

I know this isn't likely. We need better debugging to figure out what is going on. I've opened https://github.com/Igalia/WPEBackend-fdo/pull/89 to add debug crashes, which I recommend we use in production until we figure out what's going on here. (Otherwise, at the rate we're going, we might never find the bug.)
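
A minimal sketch of the missing check described above, assuming a C-style create function that returns NULL on failure. `DisplayHandle`, `DisplayFactory` and `InstanceSketch` are hypothetical stand-ins so the example compiles without the Wayland headers. It is not the WS::Instance code.

```cpp
#include <cassert>
#include <cstdio>
#include <memory>

// Hypothetical stand-in for an opaque display handle such as struct wl_display.
struct DisplayHandle {
    int id;
};

// Hypothetical factory type: returns nullptr on failure, as a C create function would.
using DisplayFactory = DisplayHandle* (*)();

class InstanceSketch {
public:
    // Returns nullptr (and logs why) when the display could not be created,
    // so later code never hands a null display to the EGL bind call.
    static std::unique_ptr<InstanceSketch> create(DisplayFactory factory)
    {
        DisplayHandle* display = factory();
        if (!display) {
            std::fprintf(stderr, "InstanceSketch: display creation failed\n");
            return nullptr;
        }
        return std::unique_ptr<InstanceSketch>(new InstanceSketch(display));
    }

    const DisplayHandle& display() const { return *m_display; }

private:
    explicit InstanceSketch(DisplayHandle* display)
        : m_display(display)
    {
        assert(m_display); // Guaranteed by create().
    }

    std::unique_ptr<DisplayHandle> m_display;
};
```

Routing construction through a factory function lets the failure surface at creation time with its own message, rather than later as an EGL_BAD_PARAMETER from an unrelated call.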
Comment 40 Michael Catanzaro 2020-03-19 18:54:33 PDT
BTW this is still crashing with WebKitGTK 2.28.0, libwpe 1.6.0, wpebackend-fdo-1.6.0. It's still impossible to reproduce except when it randomly happens. The backtrace has changed a bit, it now looks like this:

#0  0x00007f50d5a5c128 in Nicosia::GC3DLayer::makeContextCurrent() (this=<optimized out>)
    at /usr/include/c++/9.2.0/bits/unique_ptr.h:352
#1  0x00007f50d5a51133 in WebCore::GraphicsContextGLOpenGL::GraphicsContextGLOpenGL(WebCore::GraphicsContextGLAttributes, WebCore::HostWindow*, WebCore::GraphicsContextGL::Destination, WebCore::GraphicsContextGLOpenGL*)
    (this=0x7f50c4ee1b80, attributes=..., destination=<optimized out>, sharedContext=<optimized out>)
    at ../Source/WebCore/platform/graphics/texmap/GraphicsContextGLTextureMapper.cpp:215
#2  0x00007f50d5a516ed in WebCore::GraphicsContextGLOpenGL::create(WebCore::GraphicsContextGLAttributes, WebCore::HostWindow*, WebCore::GraphicsContextGL::Destination) (attributes=..., hostWindow=hostWindow@entry=
    0x7f50cc194ae0, destination=destination@entry=WebCore::GraphicsContextGL::Destination::Offscreen)
    at DerivedSources/ForwardingHeaders/wtf/RefCounted.h:185
#3  0x00007f50d4fa8b77 in WebCore::WebGLRenderingContextBase::create(WebCore::CanvasBase&, WebCore::GraphicsContextGLAttributes&, WTF::String const&) (canvas=..., attributes=..., type=...)
    at ../Source/WebCore/html/canvas/WebGLRenderingContextBase.cpp:580
#4  0x00007f50d4e83bd3 in WebCore::HTMLCanvasElement::createContextWebGL(WTF::String const&, WebCore::GraphicsContextGLAttributes&&) (this=0x7f507ca94580, type=..., attrs=...) at ../Source/WebCore/html/HTMLCanvasElement.cpp:415
#5  0x00007f50d4e87552 in WebCore::HTMLCanvasElement::getContext(JSC::JSGlobalObject&, WTF::String const&, WTF::Vector<JSC::Strong<JSC::Unknown, (JSC::ShouldStrongDestructorGrabLock)0>, 0ul, WTF::CrashOnOverflow, 16ul, WTF::FastMalloc>&&) (this=this@entry=0x7f507ca94580, state=..., contextId=..., arguments=...)
    at ../Source/WebCore/html/HTMLCanvasElement.cpp:283
#6  0x00007f50d439e35d in WebCore::jsHTMLCanvasElementPrototypeFunctionGetContextBody
    (throwScope=..., castedThis=0x7f5065925320, callFrame=<optimized out>, lexicalGlobalObject=0x7f50c4cea068)
    at DerivedSources/WebCore/JSHTMLCanvasElement.cpp:298
#7  0x00007f50d439e35d in WebCore::IDLOperation<WebCore::JSHTMLCanvasElement>::call<WebCore::jsHTMLCanvasElementPrototypeFunctionGetContextBody> (operationName=0x7f50d5cb7d1e "getContext", callFrame=..., lexicalGlobalObject=...)
    at ../Source/WebCore/bindings/js/JSDOMOperation.h:53
#8  0x00007f50d439e35d in WebCore::jsHTMLCanvasElementPrototypeFunctionGetContext(JSC::JSGlobalObject*, JSC::CallFrame*) (lexicalGlobalObject=0x7f50c4cea068, callFrame=<optimized out>)
    at DerivedSources/WebCore/JSHTMLCanvasElement.cpp:303
#9  0x00007f507ffff178 in  ()
#10 0x00007ffc8f222500 in  ()
#11 0x00007f50d164428f in llint_op_call () at /usr/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18
#12 0x0000000000000000 in  ()
Comment 41 Michael Catanzaro 2020-06-03 07:39:04 PDT
OK it's been over half a year, I'm out of ideas and am considering switching GNOME back to WaylandCompositor to avoid these crashes. Please, if there's any debugging we can add to the code to help with this, let's add it.
Comment 42 Michael Catanzaro 2020-06-16 07:27:35 PDT
(In reply to Michael Catanzaro from comment #41)
> OK it's been over half a year, I'm out of ideas and am considering switching
> GNOME back to WaylandCompositor to avoid these crashes. Please, if there's
> any debugging we can add to the code to help with this, let's add it.

Since we are not making any progress on this issue, I'm going to disable WPE renderer again, for both GNOME and Fedora. For Fedora, we'll keep the WPE dependencies around indefinitely, but I won't update them anymore since WebKit will no longer use them. For GNOME, I will remove the deps from the SDK until WebKit is ready to use them again.
Comment 43 Carlos Garcia Campos 2020-06-16 08:12:48 PDT
Are you really getting crash reports about this in fedora? We don't have more reports upstream.
Comment 44 Michael Catanzaro 2020-06-16 08:30:24 PDT
(In reply to Carlos Garcia Campos from comment #43)
> Are you really getting crash reports about this in fedora? We don't have
> more reports upstream.

We're not, but I think that's just because our crash reporting infrastructure is broken. We've hardly received any crash reports from WebKit in the past year or two. It's possible that we've fixed all the bugs and WebKit has become nearly perfect... but I don't think so. ;)
Comment 45 Michael Catanzaro 2020-06-16 08:32:18 PDT
BTW, I'm OK with keeping WPE renderer as long as we add some sort of debugging to make it possible to solve this bug when crashes occur. My attempt in https://github.com/Igalia/WPEBackend-fdo/pull/89 was not successful.
Comment 46 Carlos Garcia Campos 2020-06-17 00:21:55 PDT
(In reply to Michael Catanzaro from comment #45)
> BTW, I'm OK with keeping WPE renderer as long as we add some sort of
> debugging to make it possible to solve this bug when crashes occur. My
> attempt in https://github.com/Igalia/WPEBackend-fdo/pull/89 was not
> successful.

Let's add it then, Adrian?
Comment 47 Michael Catanzaro 2020-07-17 08:11:51 PDT
Help? :)
Comment 48 Adrian Perez 2020-07-22 07:26:35 PDT
Created attachment 404920 [details]
Patch
Comment 49 Michael Catanzaro 2020-07-22 07:34:20 PDT
Comment on attachment 404920 [details]
Patch

Thanks!
Comment 50 Carlos Garcia Campos 2020-07-23 00:46:07 PDT
Comment on attachment 404920 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=404920&action=review

> Source/WebCore/platform/graphics/egl/GLContextEGL.cpp:195
> +    default:
> +        RELEASE_ASSERT_NOT_REACHED();

I think it's better not to add default here, since we are handling all possible cases (supported for the given build).

> Source/WebCore/platform/graphics/egl/GLContextEGL.cpp:248
> +        WTFLogAlways("Cannot create surfaceless EGL context: required extensions missing.");

I prefer not to add messages for things that are not errors. We did in the past and people thought they were errors, reporting them as possible cause of other bugs.

> Source/WebCore/platform/graphics/egl/GLContextEGL.cpp:306
> +        default:
> +            RELEASE_ASSERT_NOT_REACHED();
> +        }

Same here about the default, let the compiler complain.

> Source/WebCore/platform/graphics/egl/GLContextEGL.cpp:353
> +        default:
> +            RELEASE_ASSERT_NOT_REACHED();

Ditto.
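
A minimal sketch of the review point about omitting `default:`, assuming a build with `-Wswitch` (part of `-Wall` in GCC, on by default in Clang). `DisplayType` and `surfaceKindFor` are hypothetical and do not mirror the GLContextEGL code.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical display types; the real set depends on how WebKit is built.
enum class DisplayType { X11, Wayland, WPE };

// No `default:` label: if a new enumerator is added and not handled here,
// the compiler warns at this switch instead of the bug appearing at run time.
std::string surfaceKindFor(DisplayType type)
{
    switch (type) {
    case DisplayType::X11:
        return "x11-window";
    case DisplayType::Wayland:
        return "wayland-egl-window";
    case DisplayType::WPE:
        return "wpe-offscreen-target";
    }

    // Only reachable if `type` holds a value outside the enumeration.
    assert(!"unhandled DisplayType");
    std::abort();
}
```

Placing the not-reached check after the switch keeps the run-time safety net while preserving the compile-time warning that a `default:` label would suppress.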
Comment 51 Adrian Perez 2020-07-28 07:59:14 PDT
Created attachment 405357 [details]
Patch v2
Comment 52 EWS 2020-07-28 08:31:52 PDT
Committed r264986: <https://trac.webkit.org/changeset/264986>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 405357 [details].
Comment 53 Adrian Perez 2020-07-28 08:56:59 PDT
The landed patch was to add additional logging, let's keep the
bug open until we find out the root cause and fix it :)
Comment 54 Fujii Hironori 2020-07-28 23:48:00 PDT
Committed r265031: <https://trac.webkit.org/changeset/265031>
Comment 55 Fujii Hironori 2020-07-28 23:50:21 PDT
oops, reopened.
Comment 56 Michael Catanzaro 2020-08-05 08:55:56 PDT
I hit this again today, but discovered that the RELEASE_LOGS kinda failed since they are not activated by default. Can we change this to WTFLogAlways?

Here's what I see:

Aug 05 10:50:42 chargestone-cave geary[14241]: Cannot create EGL context: invalid display (last error: EGL_SUCCESS)
Aug 05 10:50:42 chargestone-cave kernel: WebKitWebProces[14241]: segfault at 0 ip 00007f6ca3dee168 sp 00007ffd6b29e798 error 4 in libwebkit2gtk-4.0.so.37.49.0[7f6ca17db000+32cb000]
Aug 05 10:50:42 chargestone-cave kernel: Code: c4 08 48 01 d8 5b 5d c3 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 0f 1e fa 48 8b 7f 10 <48> 8b 07 ff 60 10 66 90 f3 0f 1e fa 48 8b 7f 10 48 8b 07 ff 60 50
Aug 05 10:50:42 chargestone-cave audit[14241]: ANOM_ABEND auid=1000 uid=1000 gid=1000 ses=2 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 pid=14241 comm="WebKitWebProces" exe="/usr/libexec/webkit2gtk-4.0/WebKitWebProcess" sig=11 res=1
Aug 05 10:50:42 chargestone-cave audit: BPF prog-id=72 op=LOAD
Aug 05 10:50:42 chargestone-cave audit: BPF prog-id=73 op=LOAD
Aug 05 10:50:42 chargestone-cave audit: BPF prog-id=74 op=LOAD
Aug 05 10:50:42 chargestone-cave audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-coredump@5-16310-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Aug 05 10:50:42 chargestone-cave systemd[1]: Started Process Core Dump (PID 16310/UID 0).
Aug 05 10:50:42 chargestone-cave epiphany[5820]: Web process crashed
Aug 05 10:50:43 chargestone-cave systemd-coredump[16311]: Process 14241 (WebKitWebProces) of user 1000 dumped core.
                                                          
                                                          Stack trace of thread 2305:
                                                          #0  0x00007f6ca3dee168 n/a (/usr/lib/x86_64-linux-gnu/libwebkit2gtk-4.0.so.37.49.0 + 0x2613168)
Aug 05 10:50:43 chargestone-cave systemd[1]: systemd-coredump@5-16310-0.service: Succeeded.
Aug 05 10:50:43 chargestone-cave audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-coredump@5-16310-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Aug 05 10:50:43 chargestone-cave geary[16318]: Cannot get default EGL display: EGL_BAD_PARAMETER
Aug 05 10:50:43 chargestone-cave geary[16318]: PlatformDisplayLibWPE: could not create the EGL display: EGL_SUCCESS.

Notice that geary seems to be failing in the same way at exactly the same time that Epiphany crashed, so whatever has gone wrong has happened in multiple WebKit UI processes at the same time. I've never noticed that before. But unlike Epiphany, Geary never crashes.
Comment 57 Michael Catanzaro 2020-08-18 14:46:59 PDT
Adrian, would you be OK with a patch that changes your debugging to use WTFLogAlways instead of RELEASE_LOG?
Comment 58 Michael Catanzaro 2020-08-25 11:31:42 PDT
We discussed this on IRC yesterday. Adrian's concern is that some of the log messages are really debug messages rather than warnings, and I agreed they shouldn't print by default. But neither of us was able to figure out how to get the messages to appear in the journal. We were hoping WEBKIT_DEBUG=Compositing=info would work, but no such luck.
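
A minimal sketch of the split discussed above: errors are always recorded, while debug messages are recorded only when a channel is switched on. `Severity` and `LogSink` are hypothetical and only model the policy. They are not WebKit's logging macros or its channel machinery.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Severity { Debug, Error };

// Hypothetical in-memory sink so the policy is easy to inspect.
struct LogSink {
    bool channelEnabled = false; // Stands in for a log channel that is off by default.
    std::vector<std::string> lines;

    void log(Severity severity, const std::string& message)
    {
        assert(!message.empty()); // An empty message is a caller bug.

        // Errors are always recorded; debug messages need the channel enabled.
        if (severity == Severity::Error || channelEnabled)
            lines.push_back(message);
    }
};
```

Under this policy, a failure path that matters for a rare crash would use the always-on severity, so it reaches the journal even when nobody started the browser with extra logging enabled.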
Comment 59 Michael Catanzaro 2020-10-11 10:13:33 PDT
I think this bug is associated with video playback failure on reddit.com and many other websites, which seems to happen when the UI process gets stuck in this mysterious broken state.

I've tried debugging with 'flatpak enter' when it happens, but we have no gdb inside the Platform runtime, only in Sdk. Extremely frustrating. We won't make any progress without more logging or more assertions.
Comment 60 Michael Catanzaro 2020-10-11 10:53:20 PDT
Here's the current version of the crash with 2.30.1, showing everything that gets logged, with kernel and audit stuff trimmed out except for the one line showing where the crash occurs:

Oct 11 12:43:01 chargestone-cave org.gnome.Epiphany.Devel.desktop[394756]: Cannot create EGL context: invalid display (last error: EGL_SUCCESS)
Oct 11 12:43:01 chargestone-cave kernel: WebKitWebProces[394756]: segfault at 0 ip 00007f1b373ea198 sp 00007ffe1f72dd68 error 4 in libwebkit2gtk-4.0.so.37.49.5[7f1b34dd0000+32d1000]
Oct 11 12:43:01 chargestone-cave epiphany[371249]: Web process crashed
Oct 11 12:43:01 chargestone-cave org.gnome.Epiphany.Devel.desktop[395171]: Cannot get default EGL display: EGL_BAD_PARAMETER
Oct 11 12:43:01 chargestone-cave org.gnome.Epiphany.Devel.desktop[395171]: PlatformDisplayLibWPE: could not create the EGL display: EGL_SUCCESS.

(I never figured out how to enable Adrian's new debug logs, despite some effort, so this only shows the logs enabled by default. Even if I knew how to enable the extra logging, there's no way to enable it when the crash occurs; I would have to remember to always start Epiphany with the extra environment variable for however many days or weeks required to get it to crash again. So we are going to have to make do with whatever logging we can print by default.)
Comment 61 Michael Catanzaro 2020-10-11 10:53:59 PDT
*** Bug 187958 has been marked as a duplicate of this bug. ***
Comment 62 Michael Catanzaro 2020-10-11 11:09:40 PDT
Looking through my journal, sometimes I see:

Oct 11 12:38:05 chargestone-cave org.gnome.Epiphany.Devel.desktop[393674]: Cannot get default EGL display: EGL_BAD_PARAMETER
Oct 11 12:38:05 chargestone-cave org.gnome.Epiphany.Devel.desktop[393674]: PlatformDisplayLibWPE: could not create the EGL display: EGL_SUCCESS.
Oct 11 12:38:10 chargestone-cave org.gnome.Epiphany.Devel.desktop[393719]: Cannot get default EGL display: EGL_BAD_PARAMETER
Oct 11 12:38:10 chargestone-cave org.gnome.Epiphany.Devel.desktop[393719]: PlatformDisplayLibWPE: could not create the EGL display: EGL_SUCCESS.
Oct 11 12:38:18 chargestone-cave org.gnome.Epiphany.Devel.desktop[393745]: Cannot get default EGL display: EGL_BAD_PARAMETER
Oct 11 12:38:18 chargestone-cave org.gnome.Epiphany.Devel.desktop[393745]: PlatformDisplayLibWPE: could not create the EGL display: EGL_SUCCESS.

Then begins the crash from comment #60:

Oct 11 12:40:47 chargestone-cave org.gnome.Epiphany.Devel.desktop[391553]: Cannot create EGL context: invalid display (last error: EGL_SUCCESS)

At least I think I know why we are printing EGL_SUCCESS. We have simply never made any EGL calls that can set errors prior to printing the error, so there's nothing for eglGetError() to return:

 1. Notice that GLContextEGL::createSharingContext returns immediately when it sees that platformDisplay.eglDisplay() == EGL_NO_DISPLAY, without making any EGL calls other than eglGetError() (in GLContextEGL::lastErrorString).

 2. In PlatformDisplayLibWPE::initialize, notice that the only EGL call we make is eglGetDisplay(). However, this is documented to never set an error: https://www.khronos.org/registry/EGL/sdk/docs/man/html/eglGetDisplay.xhtml. It's possible that the implementation of wpe_renderer_backend_egl_get_native_display() could be making more EGL calls, but clearly not any that can set an error.

So I think, in those cases, WebKit should not try printing an error without first ensuring that we have called some function that can cause an error.
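
The reasoning above can be illustrated with a minimal, self-contained sketch. It does not use EGL or WebKit code: FakeEGLError, fakeGetError, fakeGetDisplay and describeDisplayFailure are hypothetical stand-ins that only model a per-thread "last error" slot and a display lookup that reports failure through its return value alone. Under those assumptions, the error slot still reads as success after the lookup fails, so the helper omits the error code rather than print a misleading EGL_SUCCESS.

```cpp
// Stand-in model of a per-thread "last error" slot. None of these names are
// real EGL or WebKit API; they only mimic the behaviour described above.
#include <cassert>
#include <string>

enum class FakeEGLError { Success, BadParameter };

// Per-thread slot; starts as Success, like a thread with no failing call yet.
thread_local FakeEGLError g_lastError = FakeEGLError::Success;

// Mimics eglGetError(): returns the recorded error and resets the slot.
FakeEGLError fakeGetError()
{
    FakeEGLError error = g_lastError;
    g_lastError = FakeEGLError::Success;
    return error;
}

// Mimics eglGetDisplay(): reports failure only through its return value and
// never touches the error slot. nullptr stands in for EGL_NO_DISPLAY.
void* fakeGetDisplay()
{
    return nullptr;
}

// Hypothetical helper: append an error code to the message only when one was
// actually recorded.
std::string describeDisplayFailure()
{
    if (fakeGetDisplay())
        return "ok";

    FakeEGLError error = fakeGetError();
    if (error == FakeEGLError::Success)
        return "could not create the EGL display";
    return "could not create the EGL display: EGL_BAD_PARAMETER";
}
```

Calling describeDisplayFailure() on a fresh thread yields the message without any error code, because nothing has written to the slot before it is read.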
Comment 63 Michael Catanzaro 2020-10-11 11:28:57 PDT
Tracing the crash a bit... GraphicsContextGLOpenGL::GraphicsContextGLOpenGL is called. At the very top, we have:

#if USE(NICOSIA)
    m_nicosiaLayer = WTF::makeUnique<Nicosia::GCGLANGLELayer>(*this, destination);
#else
    m_texmapLayer = WTF::makeUnique<TextureMapperGCGLPlatformLayer>(*this, destination);
#endif
    makeContextCurrent();

The crash occurs inside makeContextCurrent. If gdb is to be trusted, it actually occurs inside the parent class function Nicosia::GCGLLayer::makeContextCurrent, not inside the child class function GCGLANGLELayer::makeContextCurrent. I'm not sure if gdb is to be trusted, because that shouldn't happen.

Shame all of these asserts are disabled in release builds, because it would help debugging a lot if they were enabled. Can I
Comment 64 Michael Catanzaro 2020-10-11 12:08:48 PDT
Ugh, I think I hit tab then space, and Bugzilla submitted my comment way too early. My question was: can I change the asserts to release asserts?

> If gdb is to be trusted, it actually occurs inside the parent class function Nicosia::GCGLLayer::makeContextCurrent, not inside the child class function GCGLANGLELayer::makeContextCurrent. I'm not sure if gdb is to be trusted, because that shouldn't happen.

Ah, I just got confused. There are two definitions of this function, one for #if USE(ANGLE) and a different one otherwise. And release builds do not enable USE(ANGLE), so it's expected that we create a base class Nicosia::GCGLLayer and wind up in base class Nicosia::GCGLLayer::makeContextCurrent. So in comment #63 I copied the wrong version of the function. Right before the crash, we are actually here:

// GraphicsContextGLTextureMapper.cpp
GraphicsContextGLOpenGL::GraphicsContextGLOpenGL(GraphicsContextGLAttributes attributes, HostWindow*, GraphicsContextGLOpenGL::Destination destination, GraphicsContextGLOpenGL* sharedContext)
    : GraphicsContextGL(attributes, destination, sharedContext)
{
    ASSERT_UNUSED(sharedContext, !sharedContext);
#if USE(NICOSIA)
    m_nicosiaLayer = makeUnique<Nicosia::GCGLLayer>(*this, destination);
#else
    m_texmapLayer = makeUnique<TextureMapperGCGLPlatformLayer>(*this, destination);
#endif

    makeContextCurrent();

I can see in gdb that m_nicosiaLayer and m_nicosiaLayer.get() are both non-null, so at least m_nicosiaLayer is valid. Then we call makeContextCurrent:

// GraphicsContextGLTextureMapper.cpp
bool GraphicsContextGLOpenGL::makeContextCurrent()
{
#if USE(NICOSIA)
    return m_nicosiaLayer->makeContextCurrent();
#else
    return m_texmapLayer->makeContextCurrent();
#endif
}

Which calls:

// NicosiaGCGLLayer.cpp
bool GCGLLayer::makeContextCurrent()
{
    ASSERT(m_glContext);
    return m_glContext->makeContextCurrent();
}

Then we crash immediately, so m_glContext must be nullptr. That gets set in the GCGLLayer::GCGLLayer constructor, here:

// NicosiaGCGLLayer.cpp
GCGLLayer::GCGLLayer(GraphicsContextGLOpenGL& context, GraphicsContextGLOpenGL::Destination destination)
    : m_context(context)
    , m_contentLayer(Nicosia::ContentLayer::create(Nicosia::ContentLayerTextureMapperImpl::createFactory(*this)))
{
    switch (destination) {
    case GraphicsContextGLOpenGL::Destination::Offscreen:
        m_glContext = GLContext::createOffscreenContext(&PlatformDisplay::sharedDisplayForCompositing());
        break;
    case GraphicsContextGLOpenGL::Destination::DirectlyToHostWindow:
        ASSERT_NOT_REACHED();
        break;
    }
}

Unfortunately destination is optimized out in my backtrace, and host gdb seems almost useless for poking my epiphany process inside the flatpak container. So without gdb inside my flatpak environment, I don't think there's anything I can do to see what its value was for sure, but let's assume GLContext::createOffscreenContext gets called, because that would be consistent with the log messages that are printed.

// GLContext.cpp
std::unique_ptr<GLContext> GLContext::createOffscreenContext(PlatformDisplay* platformDisplay)
{
    if (!initializeOpenGLShimsIfNeeded())
        return nullptr;

    return createContextForWindow(0, platformDisplay ? platformDisplay : &PlatformDisplay::sharedDisplay());
}

We know initializeOpenGLShimsIfNeeded() cannot fail, because it always returns true #if USE(OPENGL_ES) || USE(LIBEPOXY), and both those are true in our release builds. That means GLContext::createContextForWindow must be returning nullptr. But we know that is expected to happen, because we have lots of failure paths where that happens! Conclusion: it is *expected* that m_glContext may be nullptr! So either GCGLLayer::makeContextCurrent is wrong to ASSERT(m_glContext), or else GraphicsContextGLOpenGL::GraphicsContextGLOpenGL is wrong to call GCGLLayer::makeContextCurrent, and the latter seems unlikely. Hence, I think the bug is in NicosiaGCGLLayer.cpp.

Of course, I mean the second bug that results in the crash. I still have no clue what the original bug is that causes EGL to get into this weird state. But I have discovered something very interesting there too, which I hadn't noticed before! The crashes occur in new incognito windows created from Epiphany's hamburger menu, but NOT in new incognito windows created from the action in gnome-shell's jumplist. This disproves my previous theory that the UI process is getting into some bad state! The difference is that when creating the new incognito window within Epiphany itself, the new window is created within the same flatpak environment as the original Epiphany. But when using gnome-shell's jumplist, you get a new flatpak environment. So it seems all EGL in the original flatpak environment is broken when this happens, even for newly-created processes! That is surprising.

I think that's as far as I can go without making production changes to WebKit in the GNOME runtime, or to Tech Preview (maybe we could change it to always use org.gnome.Sdk instead of org.gnome.Platform, to include gdb?).
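
For the crash itself (the second bug, in NicosiaGCGLLayer.cpp), a null check along the following lines would avoid the dereference. This is only a sketch under stated assumptions, not the patch that eventually landed: FakeGLContext and FakeLayer are hypothetical stand-ins that mirror the shape of GLContext and Nicosia::GCGLLayer quoted above, and the displayAvailable parameter is invented purely to simulate the failure path.

```cpp
// Stand-in types only: FakeGLContext and FakeLayer are hypothetical and merely
// mirror the shape of GLContext and Nicosia::GCGLLayer quoted above.
#include <cassert>
#include <memory>

struct FakeGLContext {
    bool makeContextCurrent() { return true; }
};

// Mimics GLContext::createOffscreenContext(): may legitimately return nullptr,
// for example when no EGL display is available.
std::unique_ptr<FakeGLContext> createOffscreenContext(bool displayAvailable)
{
    if (!displayAvailable)
        return nullptr;
    return std::make_unique<FakeGLContext>();
}

class FakeLayer {
public:
    explicit FakeLayer(bool displayAvailable)
        : m_glContext(createOffscreenContext(displayAvailable))
    {
    }

    // Report failure to the caller instead of dereferencing a null context.
    bool makeContextCurrent()
    {
        if (!m_glContext)
            return false;
        return m_glContext->makeContextCurrent();
    }

private:
    std::unique_ptr<FakeGLContext> m_glContext;
};
```

With this shape, FakeLayer(false).makeContextCurrent() returns false rather than crashing, which leaves it to the caller to decide how to handle a context that could not be created.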
Comment 65 Michael Catanzaro 2020-10-11 12:19:50 PDT
(In reply to Michael Catanzaro from comment #64)
> So it
> seems all EGL in the original flatpak environment is broken when this
> happens, even for newly-created processes! That is surprising.

OK, I was able to use this to make a tiny bit more progress:

$ flatpak ps
Instance   PID    Application              Runtime
34082139   395416 org.gnome.Epiphany.Devel org.gnome.Sdk
1774958924 371237 org.gnome.Epiphany.Devel org.gnome.Platform

The first one is flatpak-coredumpctl displaying my backtrace in gdb. The second one is my crashing browser instance. Now I can:

$ flatpak enter 1774958924 /bin/bash
[📦 org.gnome.Epiphany.Devel ~]$ epiphany -p https://greatriversgreenway.org/greenway-search/?mode=map

(epiphany:10230): Gdk-WARNING **: 14:14:59.686: Settings portal not found: Key/Value pair 0, “/run/user/1000/bus”, in address element “unix:/run/user/1000/bus” does not contain an equal sign
Cannot get default EGL display: EGL_BAD_PARAMETER

(epiphany:10230): libsecret-WARNING **: 14:14:59.740: couldn't get session bus: Key/Value pair 0, “/run/user/1000/bus”, in address element “unix:/run/user/1000/bus” does not contain an equal sign

** (epiphany:10230): WARNING **: 14:14:59.874: Failed to search secrets in password schema: Key/Value pair 0, “/run/user/1000/bus”, in address element “unix:/run/user/1000/bus” does not contain an equal sign

(WebKitWebProcess:10249): Gdk-WARNING **: 14:14:59.905: Settings portal not found: Key/Value pair 0, “/run/user/1000/bus”, in address element “unix:/run/user/1000/bus” does not contain an equal sign
Cannot get default EGL display: EGL_BAD_PARAMETER
PlatformDisplayLibWPE: could not create the EGL display: EGL_SUCCESS.
Cannot create EGL context: invalid display (last error: EGL_SUCCESS)

** (epiphany:10230): WARNING **: 14:15:01.435: Web process crashed

(WebKitWebProcess:10283): Gdk-WARNING **: 14:15:01.473: Settings portal not found: Key/Value pair 0, “/run/user/1000/bus”, in address element “unix:/run/user/1000/bus” does not contain an equal sign
Cannot get default EGL display: EGL_BAD_PARAMETER
PlatformDisplayLibWPE: could not create the EGL display: EGL_SUCCESS.

Nothing special about that URL: it is just some page that uses WebGL, triggering the crash. So now I can run arbitrary commands in the affected sandbox, and can try running with special environment variables if someone else wants to figure out how to enable those release logs. But sadly, no access to gdb or other devel tools.

If I 'flatpak enter 34082139 /bin/bash' instead, and run 'epiphany -p' there, the crash does not occur, which is expected because only the original flatpak environment is in the broken state, not the flatpak-coredumpctl environment.

Note the settings portal warnings are unrelated (they occur in both broken and non-broken environments). That might be a bug for Patrick, but clearly unrelated.
Comment 66 Michael Catanzaro 2020-10-11 12:34:25 PDT
(In reply to Michael Catanzaro from comment #64)
> So it
> seems all EGL in the original flatpak environment is broken when this
> happens, even for newly-created processes! That is surprising.

I forgot to mention the most important implication: this means that we can install any sort of debug program into the flatpak environment to try testing and printing whatever we want, and then next time I notice this crash, I can enter the flatpak environment and try running the debug program! That possibility did not exist when I thought the bad state was specific to the current UI process.

Caveat: if we fix the crash in NicosiaGCGLLayer.cpp (which we should do ASAP, because it's horrible), then I'll no longer notice when the EGL bug occurs, and won't know when to run the debug program.
Comment 67 Michael Catanzaro 2021-01-08 09:40:22 PST
Today I noticed that fullscreen video was broken again. Of course it's this bug. We have made no progress on this, and don't have any prospects of successfully debugging it anytime soon, and I'm tired of it, so I'm going to disable WPE renderer in the GNOME flatpak runtime for now.

I don't have any evidence that this problem occurs outside flatpak, so I'll leave it enabled in Fedora for now. (It's much more work to remove and revive packages in Fedora than it is to remove/revive buildstream elements for a flatpak runtime.)
Comment 68 Michael Catanzaro 2021-07-30 06:52:50 PDT
Still broken with the 21.08 runtime. Still don't know how to reproduce.
Comment 69 Alice Mikhaylenko 2021-10-28 07:42:07 PDT
I'm not sure if it's related, but EGL seems completely broken in the 21.08 SDK.

```
alexm@lenovo-thinkpad-x1-yoga ~/P/W/GTK4> FLATPAK_USER_DIR=WebKitBuild/UserFlatpak/ flatpak run --device=dri --socket=wayland --command=gtk4-widget-factory org.webkit.Sdk//21.08
Gsk-Message: 19:42:21.136: Failed to realize renderer of type 'GskNglRenderer' for surface 'GdkWaylandToplevel': No GL implementation is available
```
Comment 70 Michael Catanzaro 2021-11-29 11:41:37 PST
Bug #233578 looks related, and the user there is reportedly able to reproduce much more easily than I am.
Comment 71 Michael Catanzaro 2022-02-22 11:07:46 PST
I'm hitting this crash whenever clicking on the Layers tab in the web inspector.
Comment 72 Michael Catanzaro 2022-09-02 11:20:24 PDT
Updated backtrace from today, which I hit 100% of the time about five seconds after loading https://id.spectrum.net/

Core was generated by `/usr/libexec/webkit2gtk-4.1/WebKitWebProcess 266 97'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f2f7e68a428 in Nicosia::GCGLLayer::makeContextCurrent (this=<optimized out>)
    at /usr/lib/debug/source/sdk/webkit2gtk-4.1.bst/Source/WebCore/platform/graphics/nicosia/texmap/NicosiaGCGLLayer.cpp:61
61	    return m_glContext->makeContextCurrent();
[Current thread is 1 (Thread 0x7f2f7697be80 (LWP 2))]
(gdb) bt
#0  0x00007f2f7e68a428 in Nicosia::GCGLLayer::makeContextCurrent() (this=<optimized out>)
    at /usr/lib/debug/source/sdk/webkit2gtk-4.1.bst/Source/WebCore/platform/graphics/nicosia/texmap/NicosiaGCGLLayer.cpp:61
#1  0x00007f2f7e5d1671 in WebCore::GraphicsContextGLOpenGL::GraphicsContextGLOpenGL(WebCore::GraphicsContextGLAttributes) (this=this@entry=0x7f2de0603c00, attributes=...)
    at /usr/lib/debug/source/sdk/webkit2gtk-4.1.bst/Source/WebCore/platform/graphics/opengl/GraphicsContextGLOpenGL.cpp:132
#2  0x00007f2f7e65d027 in WebCore::GraphicsContextGLTextureMapper::GraphicsContextGLTextureMapper(WebCore::GraphicsContextGLAttributes&&) (this=0x7f2de0603c00, attributes=<optimized out>)
    at /usr/lib/debug/source/sdk/webkit2gtk-4.1.bst/Source/WebCore/platform/graphics/texmap/GraphicsContextGLTextureMapper.cpp:101
#3  0x00007f2f7e65d07f in WebCore::GraphicsContextGLTextureMapper::create(WebCore::GraphicsContextGLAttributes&&)
    (attributes=...)
    at /usr/lib/debug/source/sdk/webkit2gtk-4.1.bst/Source/WebCore/platform/graphics/texmap/GraphicsContextGLTextureMapper.cpp:94
#4  0x00007f2f7e65d1a8 in WebCore::createWebProcessGraphicsContextGL(WebCore::GraphicsContextGLAttributes const&)
    (attributes=...)
    at /usr/lib/debug/source/sdk/webkit2gtk-4.1.bst/Source/WebCore/platform/graphics/texmap/GraphicsContextGLTextureMapper.cpp:81
#5  0x00007f2f7e51bec7 in WebKit::WebChromeClient::createGraphicsContextGL(WebCore::GraphicsContextGLAttributes const&) const (this=<optimized out>, attributes=<optimized out>)
    at /usr/lib/debug/source/sdk/webkit2gtk-4.1.bst/Source/WebKit/WebProcess/WebCoreSupport/WebChromeClient.cpp:936
#6  0x00007f2f7fafd2bc in WebCore::Chrome::createGraphicsContextGL(WebCore::GraphicsContextGLAttributes const&) const
    (this=<optimized out>, attributes=<optimized out>)
    at /usr/lib/debug/source/sdk/webkit2gtk-4.1.bst/Source/WebCore/page/Chrome.cpp:545
#7  0x00007f2f7f8ae325 in WebCore::WebGLRenderingContextBase::create(WebCore::CanvasBase&, WebCore::GraphicsContextGLAttributes&, WebCore::GraphicsContextGLWebGLVersion)
     (canvas=..., attributes=..., type=type@entry=WebCore::GraphicsContextGLWebGLVersion::WebGL1)
    at /usr/lib/debug/source/sdk/webkit2gtk-4.1.bst/Source/WebCore/html/canvas/WebGLRenderingContextBase.cpp:926
#8  0x00007f2f7f7745e9 in WebCore::HTMLCanvasElement::createContextWebGL(WebCore::GraphicsContextGLWebGLVersion, WebCore::GraphicsContextGLAttributes&&)
    (this=this@entry=0x7f2f060d58c0, type=type@entry=WebCore::GraphicsContextGLWebGLVersion::WebGL1, attrs=...)
    at /usr/lib/debug/source/sdk/webkit2gtk-4.1.bst/Source/WebCore/html/HTMLCanvasElement.cpp:477
#9  0x00007f2f7f7785b0 in WebCore::HTMLCanvasElement::getContext(JSC::JSGlobalObject&, WTF::String const&, WTF::FixedVector<JSC::Strong<JSC::Unknown, (JSC::ShouldStrongDestructorGrabLock)0> >&&)
     (this=this@entry=0x7f2f060d58c0, state=..., contextId=..., arguments=...)
    at /usr/lib/debug/source/sdk/webkit2gtk-4.1.bst/Source/WebCore/html/HTMLCanvasElement.cpp:335
#10 0x00007f2f7ea279cd in WebCore::jsHTMLCanvasElementPrototypeFunction_getContextBody(JSC::JSGlobalObject*, JSC::CallFrame*, WebCore::IDLOperation<WebCore::JSHTMLCanvasElement>::ClassParameter)
    (lexicalGlobalObject=0x7f2f1e270e68, callFrame=0x7fffba87d930, castedThis=0x7f2de2f73848)
    at /usr/lib/debug/source/sdk/webkit2gtk-4.1.bst/_builddir/WebCore/DerivedSources/JSHTMLCanvasElement.cpp:317
#11 0x00007f2f7ea27e58 in WebCore::IDLOperation<WebCore::JSHTMLCanvasElement>::call<WebCore::jsHTMLCanvasElementPrototypeFunction_getContextBody>
    (operationName=<optimized out>, callFrame=<optimized out>, lexicalGlobalObject=<optimized out>)
    at /usr/lib/debug/source/sdk/webkit2gtk-4.1.bst/Source/WebCore/bindings/js/JSDOMOperation.h:60
#12 WebCore::jsHTMLCanvasElementPrototypeFunction_getContext(JSC::JSGlobalObject*, JSC::CallFrame*)
    (lexicalGlobalObject=<optimized out>, callFrame=<optimized out>)
    at /usr/lib/debug/source/sdk/webkit2gtk-4.1.bst/_builddir/WebCore/DerivedSources/JSHTMLCanvasElement.cpp:322
#13 0x00007f2f20008038 in  ()
#14 0x00007fffba87d9d0 in  ()
#15 0x00007f2f7be7b7aa in op_call_slow_return_location ()
    at /usr/lib/debug/source/sdk/webkit2gtk-4.1.bst/Source/JavaScriptCore/llint/LowLevelInterpreter.asm:1179
#16 0x0000000000000000 in  ()

Today the "broken global state" bug comes with a bonus: I cannot scroll Epiphany Tech Preview using the mouse wheel (but *can* scroll fine using the scrollbar, or if I switch to my jhbuild Epiphany). I bet it will be fixed if I reboot, but then I no doubt won't be able to reproduce this bug anymore.
Comment 73 Michael Catanzaro 2022-09-02 11:34:13 PDT
I'm starting to suspect this somehow only affects flatpaks.

https://webkit.org/blog-files/3d-transforms/poster-circle.html is broken in Ephy Tech Preview, but *NOT* in my jhbuild Epiphany. I can also scroll fine in jhbuild Epiphany, and it doesn't crash when loading https://id.spectrum.net/.

Still seeing the familiar spam in my journal:

Sep 02 13:01:28 chargestone-cave org.gnome.Epiphany.Devel.desktop[55252]: Cannot get default EGL display: EGL_BAD_PARAMETER
Sep 02 13:01:28 chargestone-cave org.gnome.Epiphany.Devel.desktop[55247]: Cannot get default EGL display: EGL_BAD_PARAMETER
Sep 02 13:01:28 chargestone-cave org.gnome.Epiphany.Devel.desktop[55249]: Cannot get default EGL display: EGL_BAD_PARAMETER
Sep 02 13:01:28 chargestone-cave org.gnome.Epiphany.Devel.desktop[55252]: PlatformDisplayLibWPE: could not create the EGL display: EGL_SUCCESS.
Sep 02 13:01:28 chargestone-cave org.gnome.Epiphany.Devel.desktop[55249]: PlatformDisplayLibWPE: could not create the EGL display: EGL_SUCCESS.
Sep 02 13:01:28 chargestone-cave org.gnome.Epiphany.Devel.desktop[55247]: PlatformDisplayLibWPE: could not create the EGL display: EGL_SUCCESS.
Sep 02 13:01:28 chargestone-cave org.gnome.Epiphany.Devel.desktop[55261]: Cannot get default EGL display: EGL_BAD_PARAMETER
Sep 02 13:01:28 chargestone-cave org.gnome.Epiphany.Devel.desktop[55261]: PlatformDisplayLibWPE: could not create the EGL display: EGL_SUCCESS.
Sep 02 13:01:28 chargestone-cave org.gnome.Epiphany.Devel.desktop[55268]: Cannot get default EGL display: EGL_BAD_PARAMETER
Sep 02 13:01:28 chargestone-cave org.gnome.Epiphany.Devel.desktop[55273]: Cannot get default EGL display: EGL_BAD_PARAMETER
Sep 02 13:01:28 chargestone-cave org.gnome.Epiphany.Devel.desktop[55268]: PlatformDisplayLibWPE: could not create the EGL display: EGL_SUCCESS.
Sep 02 13:01:28 chargestone-cave org.gnome.Epiphany.Devel.desktop[55272]: Cannot get default EGL display: EGL_BAD_PARAMETER
Sep 02 13:01:28 chargestone-cave org.gnome.Epiphany.Devel.desktop[55273]: PlatformDisplayLibWPE: could not create the EGL display: EGL_SUCCESS.
Sep 02 13:01:28 chargestone-cave org.gnome.Epiphany.Devel.desktop[55272]: PlatformDisplayLibWPE: could not create the EGL display: EGL_SUCCESS.
Sep 02 13:01:28 chargestone-cave org.gnome.Epiphany.Devel.desktop[55286]: Cannot get default EGL display: EGL_BAD_PARAMETER
Sep 02 13:01:28 chargestone-cave org.gnome.Epiphany.Devel.desktop[55286]: PlatformDisplayLibWPE: could not create the EGL display: EGL_SUCCESS.
Sep 02 13:01:28 chargestone-cave org.gnome.Epiphany.Devel.desktop[55275]: Cannot get default EGL display: EGL_BAD_PARAMETER
Sep 02 13:01:28 chargestone-cave org.gnome.Epiphany.Devel.desktop[55285]: Cannot get default EGL display: EGL_BAD_PARAMETER
Sep 02 13:01:28 chargestone-cave org.gnome.Epiphany.Devel.desktop[55285]: PlatformDisplayLibWPE: could not create the EGL display: EGL_SUCCESS.
Sep 02 13:01:28 chargestone-cave org.gnome.Epiphany.Devel.desktop[55275]: PlatformDisplayLibWPE: could not create the EGL display: EGL_SUCCESS.

So EGL is broken system-wide... but somehow only for Epiphany Tech Preview?

I think WebKit should crash more gracefully when EGL is broken. That might be the best we can do here. This is really starting to smell like a mesa bug, isn't it?
Comment 74 Michael Catanzaro 2022-09-02 11:43:38 PDT
At Exalm's suggestion, I managed to reproduce the EGL display failure without using WebKit at all:

$ flatpak run -d --command=gtk4-widget-factory org.gnome.Epiphany.Devel
Gsk-Message: 13:39:27.787: Failed to realize renderer of type 'GskGLRenderer' for surface 'GdkWaylandToplevel': Failed to create EGL display


(gst-plugin-scanner:12): GLib-GObject-WARNING **: 13:39:27.982: type name '-a-png-encoder-pred' contains invalid characters

(gst-plugin-scanner:12): GLib-GObject-CRITICAL **: 13:39:27.982: g_type_set_qdata: assertion 'node != NULL' failed

(gst-plugin-scanner:12): GLib-GObject-CRITICAL **: 13:39:27.983: g_type_set_qdata: assertion 'node != NULL' failed
Gsk-Message: 13:39:38.762: Failed to realize renderer of type 'GskGLRenderer' for surface 'GdkWaylandPopup': Failed to create EGL display


So something is busted at the graphics driver level, at least in the flatpak environment. gtk4-widget-factory works perfectly fine on my host. This bug should remain open, though, because WebKit should still not crash.
Comment 75 Michael Catanzaro 2022-09-02 11:47:06 PDT
No errors if I do:

$ flatpak run -d --command=gtk4-widget-factory com.belmoussaoui.Authenticator

which uses org.gnome.Sdk//42 rather than org.gnome.Sdk//master. So the bug, or at least this particular case of the bug, seems limited to the nightly runtime?
Comment 76 Michael Catanzaro 2022-09-02 13:22:22 PDT
"Fixed" by running:

$ flatpak install org.gnome.Builder.Devel
Looking for matches…

org.gnome.Builder.Devel permissions:
    ipc                       network       fallback-x11       session-bus
    ssh-auth                  system-bus    wayland            x11
    dri                       devel         file access [1]    dbus access [2]
    system dbus access [3]    tags [4]

    [1] /var/lib/flatpak, home, host, xdg-data/meson, xdg-run/gvfsd,
        xdg-run/keyring, ~/.local/share/flatpak
    [2] org.freedesktop.FileManager1, org.freedesktop.Flatpak,
        org.freedesktop.PackageKit, org.freedesktop.secrets, org.gtk.vfs.*
    [3] org.freedesktop.Avahi, org.freedesktop.PolicyKit1, org.gnome.Sysprof3
    [4] nightly


        ID                                  Branch Op Remote        Download
 1. [✓] org.freedesktop.Platform.GL.default 22.08  i  gnome-nightly   2.3 kB / 131.0 MB
 2. [✓] org.gnome.Builder.Devel.Locale      master i  gnome-nightly   8.0 kB / 3.3 MB
 3. [✓] org.gnome.Sdk.Debug                 master u  gnome-nightly 279.0 kB / 5.6 GB
 4. [✓] org.gnome.Sdk.Locale                master u  gnome-nightly  17.5 kB / 344.8 MB
 5. [✓] org.gnome.Sdk                       master u  gnome-nightly   5.2 MB / 659.8 MB
 6. [✓] org.gnome.Builder.Devel             master i  gnome-nightly 205.6 MB / 196.8 MB

Changes complete.


Note that org.freedesktop.Platform.GL.default is an 'i' install, not 'u' update. So we now have a reproducer:

$ flatpak remove org.freedesktop.Platform.GL.default//22.08

will trigger the bug, and:

$ flatpak install org.freedesktop.Platform.GL.default//22.08

will fix the bug.

What we don't know is what went wrong for me today: how did my org.freedesktop.Platform.GL.default get uninstalled? My browser was working fine this morning but started crashing about two hours ago. Did Software decide to remove the GL extension for some reason? Who knows.

But at least finally, after nearly three years, we at last have enough information to reproduce and fix the WebKit crash. \o/
Comment 77 Carlos Garcia Campos 2022-09-05 05:45:06 PDT
Pull request: https://github.com/WebKit/WebKit/pull/4029
Comment 78 Carlos Garcia Campos 2022-09-05 05:46:44 PDT
(In reply to Carlos Garcia Campos from comment #77)
> Pull request: https://github.com/WebKit/WebKit/pull/4029

Could you try this PR to confirm it fixes the crash? And only the crash, rendering will be broken I guess.
Comment 79 Michael Catanzaro 2022-09-05 06:43:53 PDT
(In reply to Carlos Garcia Campos from comment #78)
> Could you try this PR to confirm it fixes the crash? And only the crash,
> rendering will be broken I guess.

If you think there's a substantial risk your patch is not correct, then I can add it to the GNOME runtime now for testing purposes and find out. But this is annoying to do, so it's easier to wait until it lands in a released version of WebKitGTK and check again then.
Comment 80 Michael Catanzaro 2022-09-05 06:45:58 PDT
See also: bug #233578 and bug #239429, which look similar, but not the same
Comment 81 Michael Catanzaro 2022-09-05 06:54:41 PDT
Thinking about this more, the easiest way to test is probably to locally hack GLContext::createOffscreenContext to always return nullptr.
Comment 82 EWS 2022-09-06 02:17:09 PDT
Committed 254183@main (b7d555805988): <https://commits.webkit.org/254183@main>

Reviewed commits have been landed. Closing PR #4029 and removing active labels.