Bug 261874 - REGRESSION(2.42): [GTK] GTK 3 rendering broken with 2.42 on NVIDIA graphics
Status: RESOLVED MOVED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKitGTK
Version: WebKit Nightly Build
Hardware: PC Linux
Importance: P3 Normal
Assignee: Nobody
URL:
Keywords: Gtk
Duplicates: 259644
Depends on:
Blocks:
 
Reported: 2023-09-21 05:29 PDT by gfrank227
Modified: 2024-02-25 09:38 PST
CC List: 13 users

See Also:


Attachments
Screenshot of white screen (827.93 KB, image/png)
2023-09-21 05:29 PDT, gfrank227
Coredump (13.48 KB, text/plain)
2023-09-27 07:34 PDT, gfrank227
Screenshot showing Epiphany Tech Preview having non-blank WebView despite not setting any environment variables (529.82 KB, image/png)
2023-10-27 06:50 PDT, Kdwk
Backtrace #1 - running trgui-ng normally. (10.42 KB, text/plain)
2023-10-31 06:21 PDT, gfrank227

Description gfrank227 2023-09-21 05:29:55 PDT
Created attachment 467810 [details]
Screenshot of white screen

For context, I'm using Arch and recently updated webkit2gtk from 2.40.5-2 to 2.42.0-1. One of my programs, trgui-ng, stopped rendering properly. I tracked the problem down to webkit2gtk: with the old version the program still works, but with the new version it does not. I filed a bug report with trgui-ng here: https://github.com/openscopeproject/TrguiNG/issues/84, and the developer concluded that it is a bug in webkit2gtk, which makes sense.

The program still seems to work properly - it's just completely white. I'll add a screenshot attachment as well. Hopefully you guys can correct the error. Thanks!
Comment 1 Michael Catanzaro 2023-09-21 06:10:12 PDT
Any warning messages when you run it on the command line?
Comment 2 Michael Catanzaro 2023-09-21 06:20:14 PDT
Also, I assume you have NVIDIA graphics and are using the proprietary graphics driver. Is that right?
Comment 3 Michael Catanzaro 2023-09-21 06:22:12 PDT
You're probably hitting bug #259644. Please confirm.
Comment 4 gfrank227 2023-09-21 06:36:19 PDT
Yes, I am running NVIDIA with the proprietary drivers. That's correct.

The terminal output is identical with the old and new versions of webkit2gtk. It only has something about parsing the settings, nothing about webkit2gtk.

It's not clear to me whether it's the same bug. It might be, but I'm not getting any of those permission denied messages on the command line.
Comment 5 Michael Catanzaro 2023-09-21 06:54:25 PDT
Please navigate to webkit://gpu and click the paste to clipboard button, and attach it here. (You can use Epiphany if there's no easy way to do this with trgui-ng.)
Comment 6 gfrank227 2023-09-21 09:58:33 PDT
(In reply to Michael Catanzaro from comment #5)
> Please navigate to webkit://gpu and click the paste to clipboard button, and
> attach it here. (You can use Epiphany if there's no easy way to do this with
> trgui-ng.)

OK so trgui-ng can't do that. I installed epiphany and attempted to go to that page. The program quits on me. In the terminal output, I get the following message:

158739 segmentation fault (core dumped)  epiphany
Comment 7 gfrank227 2023-09-21 10:00:58 PDT
But your line of thought got me to try something else. With the new version of webkit, I switched from the NVIDIA graphics to the integrated AMD GPU, and the problem disappears.

Therefore, I do think we can safely say that the problem is nvidia-related.
Comment 8 Carlos Garcia Campos 2023-09-25 00:48:46 PDT
Could you check the output of 

$ cat /sys/module/nvidia_drm/parameters/modeset
Comment 9 Carlos Garcia Campos 2023-09-25 00:49:10 PDT
Are you under X11 or Wayland?
Comment 10 Michael Catanzaro 2023-09-25 05:20:43 PDT
(In reply to gfrank227 from comment #6)
> 158739 segmentation fault (core dumped)  epiphany

Attach a backtrace please.
Comment 11 gfrank227 2023-09-25 18:49:45 PDT
(In reply to Carlos Garcia Campos from comment #8)
> Could you check the output of 
> 
> $ cat /sys/module/nvidia_drm/parameters/modeset

OK so this command (with sudo) returns Y.
I'm on X11
Comment 12 gfrank227 2023-09-25 19:08:28 PDT
(In reply to Michael Catanzaro from comment #10)
> (In reply to gfrank227 from comment #6)
> > 158739 segmentation fault (core dumped)  epiphany
> 
> Attach a backtrace please.

OK so this is my first time doing this. I think I got what you needed from gdb:

Thread 1 "epiphany" received signal SIGSEGV, Segmentation fault.
0x00007fff82a032cc in ?? () from /usr/lib/libnvidia-eglcore.so.535.113.01

If this is not what you needed, please help me out with some detailed instructions on how to get what you need. Thanks!
Comment 13 Carlos Garcia Campos 2023-09-26 00:40:02 PDT
(In reply to gfrank227 from comment #11)
> (In reply to Carlos Garcia Campos from comment #8)
> > Could you check the output of 
> > 
> > $ cat /sys/module/nvidia_drm/parameters/modeset
> 
> OK so this command (with sudo) returns Y.
> I'm on X11

So, if you don't see any error message and you have DRM modeset enabled, I think the problem is that for some reason it's failing to map the imported buffer. It's difficult to know why without the NVIDIA GBM sources... Is trgui-ng using GTK4? In that case you can try forcing EGL with GDK_DEBUG=gl-egl
Comment 14 gfrank227 2023-09-26 05:26:22 PDT
I can check with the developer, but it appears to be using gtk-3.0, and I'm uncertain how to do this anyway without more specific instructions.
Comment 15 Michael Catanzaro 2023-09-26 08:47:26 PDT
(In reply to gfrank227 from comment #12)
> OK so this is my first time doing this. I think i got what you needed from
> gdb:
> 
> Thread 1 "epiphany" received signal SIGSEGV, Segmentation fault.
> 0x00007fff82a032cc in ?? () from /usr/lib/libnvidia-eglcore.so.535.113.01
> 
> If this is not what you needed, please help me out with some detailed
> instructions on how to get what you need. Thanks!

For future reference you can find instructions for getting backtraces here: https://blogs.gnome.org/mcatanzaro/2021/09/18/creating-quality-backtraces-for-crash-reports/

But in this particular case, it's not going to help because you're crashing inside the NVIDIA driver. Since the driver is proprietary and we don't have source code, it's the end of the road. You can report a bug to NVIDIA and hope they investigate.

This crash may or may not be related to your rendering issue.
Comment 16 gfrank227 2023-09-26 08:55:59 PDT
(In reply to Michael Catanzaro from comment #15)
> (In reply to gfrank227 from comment #12)
> > OK so this is my first time doing this. I think i got what you needed from
> > gdb:
> > 
> > Thread 1 "epiphany" received signal SIGSEGV, Segmentation fault.
> > 0x00007fff82a032cc in ?? () from /usr/lib/libnvidia-eglcore.so.535.113.01
> > 
> > If this is not what you needed, please help me out with some detailed
> > instructions on how to get what you need. Thanks!
> 
> For future reference you can find instructions for getting backtraces here:
> https://blogs.gnome.org/mcatanzaro/2021/09/18/creating-quality-backtraces-
> for-crash-reports/
> 
> But in this particular case, it's not going to help because you're crashing
> inside the NVIDIA driver. Since the driver is proprietary and we don't have
> source code, it's the end of the road. You can report a bug to NVIDIA and
> hope they investigate.
> 
> This crash may or may not be related to your rendering issue.

I hear what you're saying, but it works perfectly on the previous version of webkit. When I downgrade the package to the prior one, the program renders just fine. Therefore, something about the new version of webkit is clearly doing something that the old version is not.
Comment 17 Michael Catanzaro 2023-09-26 09:13:11 PDT
Well yeah of course, the rendering code is very different. But we aren't switching back.
Comment 18 gfrank227 2023-09-26 09:35:22 PDT
(In reply to Michael Catanzaro from comment #17)
> Well yeah of course, the rendering code is very different. But we aren't
> switching back.

I understand. Changing code around is a very important thing. But something you changed causes it not to work well with NVIDIA; that doesn't make it an NVIDIA bug. It seems like everybody is pushing the bug off to someone else: the developer of trgui-ng pushed it off to webkit, you're pushing it off to NVIDIA, and surely NVIDIA will push it right back to webkit, and the problem never gets solved. :-(
Comment 19 Michael Catanzaro 2023-09-26 10:01:12 PDT
Yeah I know. Actually NVIDIA will probably not respond at all. But maybe.

The rendering issue is probably a WebKitGTK bug and the crash is probably an NVIDIA bug, but only NVIDIA can tell us for sure. Sorry. :(
Comment 20 Carlos Garcia Campos 2023-09-27 00:36:35 PDT
If the previous version worked, you can just disable the DMA-BUF renderer (WEBKIT_DISABLE_DMABUF_RENDERER=1) for now, until we figure out how to make the NVIDIA driver work with the DMA-BUF renderer.
Comment 21 Michael Catanzaro 2023-09-27 04:01:51 PDT
(In reply to Carlos Garcia Campos from comment #20)
> If previous version worked, you can just disable the dmabuf renderer
> (WEBKIT_DISABLE_DMABUF_RENDERER=1) for now until we figure out how to make
> the nvidia driver work with the dmabuf renderer.

Maybe we should just do it ourselves by checking the GL vendor string? Seems hacky, but would presumably avoid a lot of disappointed users? strstr(glGetString(GL_VENDOR), "NVIDIA") != nullptr?
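
As a rough illustration of the vendor-string idea above, here is a minimal Python sketch. It is not WebKit code: the helper name `should_disable_dmabuf_renderer` is hypothetical, and the vendor string, which would really come from `glGetString(GL_VENDOR)`, is simply passed in as text.

```python
from typing import Optional


def should_disable_dmabuf_renderer(gl_vendor: Optional[str]) -> bool:
    """Return True when the GL vendor string suggests the NVIDIA driver.

    Hypothetical helper mirroring the strstr(..., "NVIDIA") check
    suggested in this comment.
    """
    if not gl_vendor:
        # No GL context or no vendor string: make no assumption.
        return False
    return "NVIDIA" in gl_vendor


print(should_disable_dmabuf_renderer("NVIDIA Corporation"))
```

A plain substring match is the simplest form of the check; it would also match any other vendor string that happens to contain "NVIDIA", which is part of why the idea is described as hacky.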

(In reply to Michael Catanzaro from comment #15)
> But in this particular case, it's not going to help because you're crashing
> inside the NVIDIA driver. Since the driver is proprietary and we don't have
> source code, it's the end of the road. You can report a bug to NVIDIA and
> hope they investigate.

Actually, looking at this again, I'd like to see as much as possible. So please go ahead and try those instructions anyway and we'll just see how much of the backtrace is possible to get, even though we won't be able to see where it ends.
Comment 22 gfrank227 2023-09-27 06:06:34 PDT
(In reply to Michael Catanzaro from comment #21)
> (In reply to Carlos Garcia Campos from comment #20)
> > If previous version worked, you can just disable the dmabuf renderer
> > (WEBKIT_DISABLE_DMABUF_RENDERER=1) for now until we figure out how to make
> > the nvidia driver work with the dmabuf renderer.
> 
> Maybe we should just do it ourselves by checking the GL vendor string? Seems
> hacky, but would presumably avoid a lot of disappointed users?
> strstr(glGetString(GL_VENDOR), "NVIDIA") != nullptr?
> 
> (In reply to Michael Catanzaro from comment #15)
> > But in this particular case, it's not going to help because you're crashing
> > inside the NVIDIA driver. Since the driver is proprietary and we don't have
> > source code, it's the end of the road. You can report a bug to NVIDIA and
> > hope they investigate.
> 
> Actually, looking at this again, I'd like to see as much as possible. So
> please go ahead and try those instructions anyway and we'll just see how
> much of the backtrace is possible to get, even though we won't be able to
> see where it ends.

OK, I'll read the instructions (which I already saved as a bookmark for future reference) and try to get the backtrace. I'll try to get back to you later today. Meanwhile, is there a settings file where I need to set this WEBKIT_DISABLE_DMABUF_RENDERER? Please advise. Thanks!
Comment 23 gfrank227 2023-09-27 07:34:43 PDT
Created attachment 467898 [details]
Coredump

Is this, perhaps, what you're looking for? This is from the epiphany crash when I went to webkit://gpu with the nvidia driver.
I am not sure that this is related to the rendering issue, to be honest.
Comment 24 gfrank227 2023-09-27 07:53:19 PDT
(In reply to gfrank227 from comment #22)
> (In reply to Michael Catanzaro from comment #21)
> > (In reply to Carlos Garcia Campos from comment #20)
> > > If previous version worked, you can just disable the dmabuf renderer
> > > (WEBKIT_DISABLE_DMABUF_RENDERER=1) for now until we figure out how to make
> > > the nvidia driver work with the dmabuf renderer.
> > 
> > Maybe we should just do it ourselves by checking the GL vendor string? Seems
> > hacky, but would presumably avoid a lot of disappointed users?
> > strstr(glGetString(GL_VENDOR), "NVIDIA") != nullptr?
> > 
> > (In reply to Michael Catanzaro from comment #15)
> > > But in this particular case, it's not going to help because you're crashing
> > > inside the NVIDIA driver. Since the driver is proprietary and we don't have
> > > source code, it's the end of the road. You can report a bug to NVIDIA and
> > > hope they investigate.
> > 
> > Actually, looking at this again, I'd like to see as much as possible. So
> > please go ahead and try those instructions anyway and we'll just see how
> > much of the backtrace is possible to get, even though we won't be able to
> > see where it ends.
> 
> OK i'll read the instructions (which I already saved as a bookmark for
> future reference) and try to get the backtrace. I'll try to get back to you
> later today. Meanwhile, is there a settings file that I need to change this
> WEBKIT_DISABLE_DMABUF_RENDERER in? Please advise. Thanks!

NM, I figured out the environment variable. And with the environment variable set (so the DMA-BUF renderer is disabled), it renders properly!! Thank you for this workaround. This is much better than keeping a downgraded package. I configured my DE to run a script that sets this environment variable automatically when I open this program. Meanwhile, I'm so thankful for your help.
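
For anyone else wanting a similar wrapper, here is a minimal sketch of such a launcher script in Python. The only thing taken from this thread is the variable WEBKIT_DISABLE_DMABUF_RENDERER=1; the function names and the `trgui-ng` command name are assumptions for illustration.

```python
import os
import subprocess


def dmabuf_disabled_env(base_env=None):
    """Copy an environment and set WEBKIT_DISABLE_DMABUF_RENDERER=1 in it."""
    env = dict(os.environ if base_env is None else base_env)
    env["WEBKIT_DISABLE_DMABUF_RENDERER"] = "1"
    return env


def launch(argv):
    """Run a command with the DMA-BUF renderer disabled; return its exit code."""
    return subprocess.run(argv, env=dmabuf_disabled_env()).returncode


# Example (assumed binary name): launch(["trgui-ng"])
```

The variable is only set for the child process, so other WebKitGTK applications keep using the default renderer.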
Comment 25 Michael Catanzaro 2023-09-27 08:11:15 PDT
(In reply to gfrank227 from comment #23)
> Is this, perhaps, what you're looking for? This is from the epiphany crash
> when I went to webkit://gpu with the nvidia driver.
> I am not sure that this is related to the rendering issue, to be honest.

No, that looks like the output coredumpctl prints before opening the core dump in gdb. We want to see a backtrace taken with gdb, with debuginfo installed so there are no ??? lines except for the NVIDIA driver. The blog post provides instructions for using gdb to print the backtrace.
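
As a rough sketch only (the blog post remains the reference for the full procedure, including installing debuginfo), a non-interactive gdb run over a saved core file can be assembled like this. The executable path and the core file name `core.158739` are hypothetical placeholders.

```python
import shlex


def gdb_backtrace_command(executable, core_file):
    """Build a gdb command line that prints a full backtrace for all threads.

    -batch makes gdb exit after running the commands given with -ex.
    """
    return [
        "gdb", "-batch",
        "-ex", "thread apply all bt full",
        executable, core_file,
    ]


# Hypothetical paths; substitute the real binary and core file.
print(shlex.join(gdb_backtrace_command("/usr/bin/epiphany", "core.158739")))
```

Redirecting the output of that command to a text file gives something that can be attached here.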
Comment 26 Arthur Moraes do Lago 2023-10-04 09:46:08 PDT
I'm having the same problem: lightdm-webkit2-greeter stopped working, epiphany shows a blank screen, etc. I intended to help by capturing the core dump you guys have been asking for, but I did not encounter a segfault at any point. I'm running an NVIDIA graphics card too. /sys/module/nvidia_drm/parameters/modeset is set to N.
I get the following errors in logs of both programs:
```
src/nv_gbm.c:99: GBM-DRV error (nv_gbm_bo_create): DRM_IOCTL_NVIDIA_GEM_ALLOC_NVKMS_MEMORY failed (ret=-1)

Failed to create GBM buffer of size 3840x2160: Invalid argument
src/nv_gbm.c:99: GBM-DRV error (nv_gbm_bo_create): DRM_IOCTL_NVIDIA_GEM_ALLOC_NVKMS_MEMORY failed (ret=-1)

Failed to create GBM buffer of size 3840x2160: Invalid argument
src/nv_gbm.c:99: GBM-DRV error (nv_gbm_bo_create): DRM_IOCTL_NVIDIA_GEM_ALLOC_NVKMS_MEMORY failed (ret=-1)

Failed to create GBM buffer of size 3840x2160: Invalid argument
Failed to create EGL images for DMABufs with file descriptors -1, -1 and -1

```
Comment 27 Carlos Garcia Campos 2023-10-05 00:18:48 PDT
(In reply to Arthur Moraes do Lago from comment #26)
> I'm having the same problem, lightdm-webkit2-greeter stopped, epiphany shows
> a blank screen, etc. I intended to help capturing the core dump you guys
> have been asking, but I did not encounter a segfault at any point. I'm
> running an NVIDIA graphics card too.
> /sys/module/nvidia_drm/parameters/modeset is set to N.

This is the problem in your case, then: this needs to be enabled to be able to create GPU buffers with GBM.

> I get the following errors in logs of both programs:
> ```
> src/nv_gbm.c:99: GBM-DRV error (nv_gbm_bo_create):
> DRM_IOCTL_NVIDIA_GEM_ALLOC_NVKMS_MEMORY failed (ret=-1)
> 
> Failed to create GBM buffer of size 3840x2160: Invalid argument
> src/nv_gbm.c:99: GBM-DRV error (nv_gbm_bo_create):
> DRM_IOCTL_NVIDIA_GEM_ALLOC_NVKMS_MEMORY failed (ret=-1)
> 
> Failed to create GBM buffer of size 3840x2160: Invalid argument
> src/nv_gbm.c:99: GBM-DRV error (nv_gbm_bo_create):
> DRM_IOCTL_NVIDIA_GEM_ALLOC_NVKMS_MEMORY failed (ret=-1)
> 
> Failed to create GBM buffer of size 3840x2160: Invalid argument
> Failed to create EGL images for DMABufs with file descriptors -1, -1 and -1
> 
> ```
Comment 28 anlutsenko 2023-10-11 22:53:07 PDT
I have encountered this issue in a VirtualBox VM so I don't think it is limited to NVIDIA graphics.

I was running ubuntu 23 KDE live image with VMSVGA virtual graphics, 128MB, 3d acceleration enabled.
Comment 29 Erik Kurzinger 2023-10-17 15:20:12 PDT
The NVIDIA driver's GBM implementation requires that the nvidia-drm kernel module is loaded with the parameter "modeset=1". See http://us.download.nvidia.com/XFree86/Linux-x86_64/545.23.06/README/gbm.html

With modeset=1 epiphany using the latest WebKit works fine for me, but without it I see an empty window as others have reported here.

What does appear to be a driver bug is that gbm_create_device will still succeed if this requirement is not satisfied. It's not until the application tries to actually allocate a buffer that it will fail.

I was hoping if I fixed that WebKit would fall back to one of its other renderers, but unfortunately that doesn't seem to be the case. The web process just crashes.
Comment 30 gfrank227 2023-10-17 16:58:00 PDT
(In reply to Erik Kurzinger from comment #29)
> The NVIDIA driver's GBM implementation requires that the nvidia-drm kernel
> module is loaded with the parameter "modeset=1". See
> http://us.download.nvidia.com/XFree86/Linux-x86_64/545.23.06/README/gbm.html
> 
> With modeset=1 epiphany using the latest WebKit works fine for me, but
> without it I see an empty window as others have reported here.
> 
> What does appear to be a driver bug is that gbm_create_device will still
> succeed if this requirement is not satisfied. It's not until the application
> tries to actually allocate a buffer that it will fail.
> 
> I was hoping if I fixed that WebKit would fall back to one of its other
> renderers, but unfortunately that doesn't seem to be the case. The web
> process just crashes.

Are you using Wayland? I could be wrong, but your link makes it sound like a wayland thing. Adding this kernel parameter didn't help me.
Comment 31 Erik Kurzinger 2023-10-17 17:06:46 PDT
> Are you using Wayland? I could be wrong, but your link makes it sound like a wayland thing.

I was using X11.

> Adding this kernel parameter didn't help me.

Can you confirm that reading /sys/module/nvidia_drm/parameters/modeset shows "Y"?
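
For reference, a small Python sketch of that check. The sysfs path is the one quoted in this thread; the function name is made up, and the path is a parameter only so the function can be exercised against a test file.

```python
from pathlib import Path

# Path quoted in this thread; reading it may require root on some systems.
MODESET_PATH = "/sys/module/nvidia_drm/parameters/modeset"


def nvidia_modeset_enabled(path=MODESET_PATH):
    """Return True for "Y", False for anything else, None if unreadable."""
    try:
        value = Path(path).read_text().strip()
    except OSError:
        # Module not loaded, file missing, or permission denied.
        return None
    return value == "Y"


print(nvidia_modeset_enabled())
```

A result of None means the file could not be read at all, which is a different situation from the parameter being set to N.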
Comment 32 Carlos Garcia Campos 2023-10-18 02:04:32 PDT
The problem with NVIDIA is that there are several issues, all of them causing the same result: a blank screen. And all the possible combinations of GTK3, GTK4, X11 and Wayland make it even more complex. After several bug reports and my own testing, this is what I've found so far:

If you get errors like:

KMS: DRM_IOCTL_MODE_CREATE_DUMB failed: Permission denied
Failed to create GBM buffer of size 1148x893: Permission denied

it's probably because the NVIDIA GBM library is not installed (I don't know why it's not always installed as part of the driver). In this case, the EGL display claims to support the GBM platform, so we use it, but apparently it's Mesa using the llvmpipe renderer, so the KMS backend is used and we fail to allocate buffers. This could be fixed just by installing the NVIDIA GBM library.

If you get errors like:

src/nv_gbm.c:99: GBM-DRV error (nv_gbm_bo_create): DRM_IOCTL_NVIDIA_GEM_ALLOC_NVKMS_MEMORY failed (ret=-1)
Failed to create GBM buffer of size 3840x2160: Invalid argument

In this case it's clear that the NVIDIA GBM library is used, but you probably don't have modeset enabled. Check the contents of /sys/module/nvidia_drm/parameters/modeset; if you get N, then try enabling it.

If GBM lib is installed and modeset enabled and you still get a white screen:

I guess you are under X11. In this case I think we are failing to map the imported DMA-BUF: gbm_bo_map() fails, but I still don't know why. A possible workaround if you are using GTK4 is to use GDK_DEBUG=gl-egl to force GTK to use EGL, so that we don't need to map the imported buffer. There's no workaround for GTK3; we need to figure out why gbm_bo_map is failing and fix it.

I'm not sure if there are more issues. I guess people using old NVIDIA drivers not supporting GBM will also see the KMS permission denied errors. Maybe we can try to detect all those situations and simply disable the DMA-BUF renderer in those cases, or even accelerated compositing mode.
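
The three cases in this comment could be summarised as a small triage helper. This is only a sketch in Python: the matched substrings are the error messages quoted above, while the function name and the case labels are made up for illustration.

```python
def diagnose_blank_webview(log_text, modeset_enabled):
    """Map terminal output and modeset state to one of the cases above."""
    if "DRM_IOCTL_MODE_CREATE_DUMB failed: Permission denied" in log_text:
        # Mesa/llvmpipe KMS backend in use: NVIDIA GBM library likely missing.
        return "missing-gbm-library"
    if "DRM_IOCTL_NVIDIA_GEM_ALLOC_NVKMS_MEMORY failed" in log_text:
        # NVIDIA GBM library in use, but buffers cannot be allocated.
        return "modeset-disabled"
    if modeset_enabled:
        # No errors and modeset on: suspected gbm_bo_map() failure (X11).
        return "buffer-map-failure"
    return "unknown"


log = "KMS: DRM_IOCTL_MODE_CREATE_DUMB failed: Permission denied"
print(diagnose_blank_webview(log, modeset_enabled=True))
```

The order of the checks matters: the log messages are tested first because they identify a case regardless of what the modeset parameter reports.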
Comment 33 Michael Catanzaro 2023-10-18 06:25:49 PDT
Very interesting.

Ideally our bug reports in WebKit Bugzilla would correspond to exactly one underlying configuration/issue. Currently for NVIDIA blank web view issues we have this one, bug #259644, and bug #228268. The status of each and how they correspond to the three separate issues you've just described is unclear.
Comment 34 Erik Kurzinger 2023-10-18 11:17:40 PDT
> gbm_bo_map() fails, but I still don't know why.

Note that gbm_bo_map only works for linear buffers, which our hardware cannot render to.
Comment 35 Kdwk 2023-10-19 07:47:41 PDT
For the egl-gbm package not installed problem, it seems this package or an equivalent is not installed in the Flatpak runtime. Which project should this be reported to?
Comment 36 Michael Catanzaro 2023-10-19 08:51:49 PDT
(In reply to Kdwk from comment #35)
> For the egl-gbm package not installed problem, it seems this package or
> equivalent is not installed in the Flatpak runtime. Which project should
> this be reported to?

If using the GNOME flatpak runtime, report to gnome-build-meta. If using the WebKit runtime, report on this Bugzilla.
Comment 37 Kdwk 2023-10-20 02:17:47 PDT
Using the repo version of Epiphany on a host system with egl-gbm installed and modesetting turned on, web pages load normally with the DMABUF renderer. However, videos still don't work with this terminal output:

Failed to get GBM buffer from swap chain: error creating plane 0 of size 2048x858 and format 538982482: Invalid argument
src/gbm_drv_common.c:57: GBM-DRV error (get_bytes_per_component): Unknown or not supported format: 538982482
Comment 38 Carlos Garcia Campos 2023-10-20 02:59:41 PDT
I think we force linear usage for video frames, so I guess that's why.
Comment 39 Erik Kurzinger 2023-10-20 06:44:13 PDT
> Failed to get GBM buffer from swap chain: error creating plane 0 of size 2048x858 and format 538982482: Invalid argument
> src/gbm_drv_common.c:57: GBM-DRV error (get_bytes_per_component): Unknown or not supported format: 538982482 

Yeah, unfortunately we don't currently support the R8 format. We're actually working on adding this at the moment, along with a few other formats like YUV420 and NV12.
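
For readers wondering how 538982482 maps to R8: DRM format codes are four ASCII characters packed into a 32-bit little-endian integer, so the number from the log can be decoded with a short Python sketch (the helper name is made up):

```python
def fourcc_to_str(code):
    """Decode a DRM fourcc integer into its four-character ASCII name."""
    return code.to_bytes(4, "little").decode("ascii")


print(repr(fourcc_to_str(538982482)))  # 'R8  '
```

The two trailing spaces are padding, since the R8 name is shorter than four characters.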
Comment 40 Kdwk 2023-10-20 06:45:47 PDT
I'm a little confused. I can play this video on other hardware and with software rendering. If WebKitGTK doesn't support this format wouldn't the video not be playable at all?
Comment 41 Erik Kurzinger 2023-10-20 06:47:56 PDT
> I'm a little confused. I can play this video on other hardware and with software rendering. If WebKitGTK doesn't support this format wouldn't the video not be playable at all?

Sorry, I meant "we" as in the NVIDIA driver. Or more specifically, the NVIDIA driver's GBM library.
Comment 42 Michael Catanzaro 2023-10-20 07:44:09 PDT
(In reply to Carlos Garcia Campos from comment #38)
> I think we force linear usage for video frame, so I guess that's why.

Let's use bug #260654 for the problem with video playback.
Comment 43 Erik Kurzinger 2023-10-20 09:35:56 PDT
Regarding the gbm_bo_map failure mentioned earlier, I was able to reproduce this if I force GTK to use GLX and the issue does appear to be that you're trying to map a non-linear buffer.

Our hardware can only render to memory with a so-called block-linear (tiled) layout, so if you allocate a renderable GBM buffer that is what you will get. Our GBM implementation does not allow CPU mapping such buffers, though.
Comment 44 Kdwk 2023-10-22 05:27:47 PDT
Regarding egl-gbm not present/ not working in Flatpak, I have filed this at gnome-build-meta: https://gitlab.gnome.org/GNOME/gnome-build-meta/-/issues/754
Comment 45 Kdwk 2023-10-27 06:50:49 PDT
Created attachment 468366 [details]
Screenshot showing Epiphany Tech Preview having non-blank WebView despite not setting any environment variables

I am amazed to see that Epiphany Technology Preview can now show web content (not be blank) without a WebKitGTK update or setting WEBKIT_DISABLE_DMABUF_RENDERER=1. Perhaps egl-gbm was fixed in the Flatpak runtime? Is anyone else able to observe this?

Tested on Epiphany Technology Preview 45.0-39-g2e99d99cb+/ WebKitGTK 2.42.1
Comment 46 gfrank227 2023-10-27 07:02:48 PDT
Yes, I can confirm. I just opened all three programs that were giving me issues, without setting any parameters:

Trgui-NG
Yuzu
webkit://gpu in epiphany

They all worked properly!!!

It could be the latest version of the nvidia driver that was updated yesterday. But I noticed that webkit2gtk was updated 8 days ago. I believe that I still had the problem 8 days ago but I can't swear to it.
Comment 47 Kdwk 2023-10-27 07:16:42 PDT
Are you also running into https://bugs.webkit.org/show_bug.cgi?id=263779 ?
Comment 48 gfrank227 2023-10-27 07:37:33 PDT
(In reply to Kdwk from comment #47)
> Are you also running into https://bugs.webkit.org/show_bug.cgi?id=263779 ?

Using flatpak's epiphany, all websites are white, which is quite interesting. They don't render at all (and that would be using flatpak's own runtimes). But on the regular epiphany through the arch repositories, everything seems fine, including reddit.
Comment 49 Kdwk 2023-10-27 07:39:32 PDT
Try Epiphany Tech Preview (org.gnome.Epiphany.Devel) from GNOME Nightly. That is the one that works. GNOME Web stable still doesn't work.
Comment 50 gfrank227 2023-10-31 06:21:23 PDT
OK so as of about two days ago, the program Trgui-ng, which is what I started this thread with, doesn't open AT ALL. It gives a core dump.

It appears that it may be related somehow to this same issue (I'm not certain; hoping you guys can look). I'm attaching two backtraces of the errors I'm getting. The first one is the backtrace from the core dump when running the program normally, and the second (backtrace2.txt) is a backtrace from attempting to run the program with the WEBKIT_DISABLE_DMABUF_RENDERER=1 variable.

I'm hoping that you might be able to figure out what's causing the crash, and perhaps even be able to use it to fix any underlying issues related to the bug. Thanks!
Comment 51 gfrank227 2023-10-31 06:21:57 PDT
Created attachment 468424 [details]
Backtrace #1 - running trgui-ng normally.
Comment 52 Michael Catanzaro 2023-10-31 06:30:50 PDT
Hi, please report a separate bug for this crash. There are already at least three or four separate issues here. Realistically, only one bug is likely to be fixed per issue report and all the others are likely to be forgotten, so it's really important to have separate bug reports for separate issues.
Comment 53 Michael Catanzaro 2023-11-08 13:41:20 PST
*** Bug 259644 has been marked as a duplicate of this bug. ***
Comment 54 Michael Catanzaro 2024-02-07 09:04:47 PST
Hi gfrank, are you still experiencing this issue?

What WebKitGTK and NVIDIA driver versions are you using currently?
Comment 55 gfrank227 2024-02-07 09:22:06 PST
(In reply to Michael Catanzaro from comment #54)
> Hi gfrank, are you still experiencing this issue?
> 
> What WebKitGTK and NVIDIA driver versions are you using currently?

Yes, the issue is still present, but for a while now it has been slightly different. Now the program segfaults when I try to start it up without the environment variable. It only happens when running on the NVIDIA card. The workaround with the environment variable still works as expected, which is how I know it's the same problem.

I just updated to the latest WebKitGTK, 2.42.5, and the NVIDIA driver is 545.29.06.
Comment 56 Georges Basile Stavracas Neto 2024-02-21 08:15:41 PST
Erik, do you know why the NVIDIA driver is doing a blocking D-Bus call using libdbus when initializing EGL?
Comment 57 Erik Kurzinger 2024-02-21 08:40:57 PST
Yeah, we use dbus to communicate with the nvidia-powerd service as part of our "dynamic boost" power management feature https://download.nvidia.com/XFree86/Linux-x86_64/530.41.03/README/dynamicboost.html

The reason it's crashing is actually somewhat interesting. It's due to a bug/quirk in the rust tool-chain which I described here https://internals.rust-lang.org/t/global-symbols-from-statically-linked-system-libraries/19954
Comment 58 Georges Basile Stavracas Neto 2024-02-21 09:03:15 PST
(In reply to Erik Kurzinger from comment #57)
> Yeah, we use dbus to communicate with the nvidia-powerd service as part of
> our "dynamic boost" power management feature
> https://download.nvidia.com/XFree86/Linux-x86_64/530.41.03/README/
> dynamicboost.html

I see, thanks.

> The reason it's crashing is actually somewhat interesting. It's due to a
> bug/quirk in the rust tool-chain which I described here
> https://internals.rust-lang.org/t/global-symbols-from-statically-linked-
> system-libraries/19954


Hah, nice. Good read. So at least that is not caused by WebKitGTK directly.
Comment 59 Michael Catanzaro 2024-02-22 05:45:26 PST
(In reply to Carlos Garcia Campos from comment #32)
> If GBM lib is installed and modeset enabled and you still get a white screen:
> 
> I guess you are under X11. In this case I think we are failing to map the
> imported DMA-BUF, gbm_bo_map() fails, but I still don't know why. A possible
> workaround if you are using GTK4 is to use GDK_DEBUG=gl-egl to force GTK to
> use EGL, so that we don't need to map the imported buffer. There's no
> workaround for GTK3, we need to figure out why gbm_bo_map is failing and fix
> it.

Did we figure out what's wrong in this scenario?

(In reply to gfrank227 from comment #55)
> Yes, the issue is still present, but for awhile now, but it has changed
> slightly. Now, the program gives a segfault when I try to start it up
> without the environment variable. It only happens when running on nvidia
> card. The workaround with the environment variable still works as expected,
> which is how I know it's the same problem.

Can you please test with a different application that doesn't use Rust? Example: Epiphany, or WebKitGTK's MiniBrowser
Comment 60 Michael Catanzaro 2024-02-22 05:47:42 PST
(In reply to Erik Kurzinger from comment #57)
> The reason it's crashing is actually somewhat interesting. It's due to a
> bug/quirk in the rust tool-chain which I described here
> https://internals.rust-lang.org/t/global-symbols-from-statically-linked-
> system-libraries/19954

This is a really good explanation. Thanks for investigating it, Erik. I'd say there's clearly nothing we can do about that and it's a problem for application developers to deal with.
Comment 61 Carlos Garcia Campos 2024-02-22 07:48:51 PST
(In reply to Michael Catanzaro from comment #59)
> (In reply to Carlos Garcia Campos from comment #32)
> > If GBM lib is installed and modeset enabled and you still get a white screen:
> > 
> > I guess you are under X11. In this case I think we are failing to map the
> > imported DMA-BUF, gbm_bo_map() fails, but I still don't know why. A possible
> > workaround if you are using GTK4 is to use GDK_DEBUG=gl-egl to force GTK to
> > use EGL, so that we don't need to map the imported buffer. There's no
> > workaround for GTK3, we need to figure out why gbm_bo_map is failing and fix
> > it.
> 
> Did we figure out what's wrong in this scenario?

See comment 34
Comment 62 gfrank227 2024-02-22 14:54:21 PST
> (In reply to gfrank227 from comment #55)
> > Yes, the issue is still present, but for awhile now, but it has changed
> > slightly. Now, the program gives a segfault when I try to start it up
> > without the environment variable. It only happens when running on nvidia
> > card. The workaround with the environment variable still works as expected,
> > which is how I know it's the same problem.
> 
> Can you please test with a different application that doesn't use Rust?
> Example: Epiphany, or WebKitGTK's MiniBrowser

Yes, I hadn't done that in a while. When I first saw this bug, it was happening in Epiphany as well: the webkit://gpu page would crash the system. That is no longer happening; everything seems to be working as expected with Epiphany.
Comment 63 Michael Catanzaro 2024-02-22 15:32:16 PST
OK, thanks everyone: gfrank, Erik, Carlos Garcia, and Georges.

I understand there really was either a WebKit or an NVIDIA bug here originally, but that's no longer the case with current WebKitGTK and NVIDIA driver versions. This remaining problem with the statically-linked libdbus is very unfortunate, but it's clearly not WebKit's fault and it's not NVIDIA's fault either. This falls squarely on the application (or the fragile Rust static linking ecosystem). I'm going to close this now as MOVED. Please move back to the trgui-ng issue tracker https://github.com/openscopeproject/TrguiNG/issues/84 for tracking the problem with the two conflicting libdbus instances.

(In reply to Arthur Moraes do Lago from comment #26)
> I'm having the same problem, lightdm-webkit2-greeter stopped, epiphany shows
> a blank screen, etc. I intended to help capturing the core dump you guys
> have been asking, but I did not encounter a segfault at any point. I'm
> running an NVIDIA graphics card too.
> /sys/module/nvidia_drm/parameters/modeset is set to N.
> I get the following errors in logs of both programs:
> ```
> src/nv_gbm.c:99: GBM-DRV error (nv_gbm_bo_create):
> DRM_IOCTL_NVIDIA_GEM_ALLOC_NVKMS_MEMORY failed (ret=-1)
> 
> Failed to create GBM buffer of size 3840x2160: Invalid argument
> src/nv_gbm.c:99: GBM-DRV error (nv_gbm_bo_create):
> DRM_IOCTL_NVIDIA_GEM_ALLOC_NVKMS_MEMORY failed (ret=-1)
> 
> Failed to create GBM buffer of size 3840x2160: Invalid argument
> src/nv_gbm.c:99: GBM-DRV error (nv_gbm_bo_create):
> DRM_IOCTL_NVIDIA_GEM_ALLOC_NVKMS_MEMORY failed (ret=-1)
> 
> Failed to create GBM buffer of size 3840x2160: Invalid argument
> Failed to create EGL images for DMABufs with file descriptors -1, -1 and -1
> 
> ```

I think this is the second case from comment #32, right? Try following Carlos Garcia's suggestion; I assume that should fix this for you. If not, new bug report please.
Comment 64 Thomas Zajic 2024-02-25 01:16:57 PST
Hi Michael,

I'm a bit confused now, TBH - when talking about "Carlos Garcia's suggestion", do you mean setting either of WEBKIT_DISABLE_DMABUF_RENDERER or WEBKIT_DISABLE_COMPOSITING_MODE to 1?

I'm currently using webkitgtk-2.42.5. With NVIDIA driver version 535.154.05, I used to get errors like this:

> [zlatko@disclosure:~]$ liferea
> src/nv_gbm.c:99: GBM-DRV error (nv_gbm_bo_create): DRM_IOCTL_NVIDIA_GEM_ALLOC_NVKMS_MEMORY failed (ret=-1)
> 
> Failed to create GBM buffer of size 469x191: Invalid argument
> src/nv_gbm.c:99: GBM-DRV error (nv_gbm_bo_create): DRM_IOCTL_NVIDIA_GEM_ALLOC_NVKMS_MEMORY failed (ret=-1)
> 
> Failed to create GBM buffer of size 469x191: Invalid argument
> src/nv_gbm.c:99: GBM-DRV error (nv_gbm_bo_create): DRM_IOCTL_NVIDIA_GEM_ALLOC_NVKMS_MEMORY failed (ret=-1)
> 
> Failed to create GBM buffer of size 469x191: Invalid argument
> Failed to create EGL images for DMABufs with file descriptors -1, -1 and -1

Yesterday I upgraded to NVIDIA driver version 550.54.14, and the error messages changed to:

> [zlatko@disclosure:~]$ WEBKIT_DISABLE_DMABUF_RENDERER=0 liferea
> src/nv_gbm.c:288: GBM-DRV error (nv_gbm_create_device_native): nv_common_gbm_create_device failed (ret=-1)
> 
> Failed to create GBM buffer of size 469x191: Permission denied
> Failed to create GBM buffer of size 469x191: Permission denied
> Failed to create GBM buffer of size 469x191: Permission denied
> Failed to create EGL images for DMABufs with file descriptors -1, -1 and -1
> [zlatko@disclosure:~]$ WEBKIT_DISABLE_DMABUF_RENDERER=0 epiphany 
> src/nv_gbm.c:288: GBM-DRV error (nv_gbm_create_device_native): nv_common_gbm_create_device failed (ret=-1)
> 
> Failed to create GBM buffer of size 1910x964: Permission denied
> Failed to create GBM buffer of size 1910x964: Permission denied
> Failed to create GBM buffer of size 1910x964: Permission denied
> Failed to create EGL images for DMABufs with file descriptors -1, -1 and -1

With both driver versions and both applications tested, the applications don't crash or segfault, but the UI space where webkitgtk is supposed to be rendering something is empty. The content is actually there, as the mouse pointer changes if you hover above an input form, button, etc., but the rendered elements are invisible. Also, with both driver versions and both applications tested, the problem can be fixed/worked around by setting either WEBKIT_DISABLE_DMABUF_RENDERER or WEBKIT_DISABLE_COMPOSITING_MODE to 1.

liferea is using GTK3 and the 4.0 ABI (ie. libwebkit2gtk-4.0.so), epiphany is using GTK4 and the 6.0 ABI (ie. libwebkitgtk-6.0.so). The GTK versions I'm using are gtk+-3.24.41 and gtk-4.12.5.

So, my questions are:

1. Is this the same bug/problem, or are these different ones for GTK3 and GTK4?
2. If this is the same bug, is there already an open bugzilla bug# for it (and what's the actual bug#), or should I open a new one?
3. If these are two different bugs, are there already open bugzilla bug#s for them (and what's the actual bug#s), or should I open new ones?
4. Or is this actually a bug somewhere else (ie. NVIDIA, GTK, Rust, ...) which webkitgtk can't really do anything about, other than working around with setting environment variables?
5. If 4., am I right in assuming that anything GBM buffer (whatever GBM buffers might be :-D) related is caused by the rust linking problem that popped up here & there, and that exactly is the actual root of the problem?

I admit all this hopping back & forth between #261874 and #228268, and both bug#s cross referencing both each other and various more bug#s (which are not related to the problem *I* see) has got me quite confused ... sorry! :-)

Thanks for your patience,
Thomas
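
For anyone who wants to script the workaround described in this comment, here is a minimal sketch in Python. The two environment variable names are the ones quoted above; the helper names `workaround_env` and `launch_with_workaround` are hypothetical and exist only for illustration. It works around the blank view and does not fix the underlying GBM failure.

```python
import os
import subprocess


def workaround_env(base_env=None, disable_compositing=False):
    """Return a copy of the environment with one workaround variable set.

    By default this sets WEBKIT_DISABLE_DMABUF_RENDERER=1; pass
    disable_compositing=True to set WEBKIT_DISABLE_COMPOSITING_MODE=1 instead.
    Either one alone was reported above as sufficient.
    """
    env = dict(os.environ if base_env is None else base_env)
    if disable_compositing:
        env["WEBKIT_DISABLE_COMPOSITING_MODE"] = "1"
    else:
        env["WEBKIT_DISABLE_DMABUF_RENDERER"] = "1"
    return env


def launch_with_workaround(command, disable_compositing=False):
    """Start `command` (a list such as ["liferea"]) with the workaround applied."""
    env = workaround_env(disable_compositing=disable_compositing)
    return subprocess.Popen(command, env=env)
```

Calling `launch_with_workaround(["liferea"])` should behave like running `WEBKIT_DISABLE_DMABUF_RENDERER=1 liferea` from a shell.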
Comment 65 Michael Catanzaro 2024-02-25 06:46:32 PST
(In reply to Thomas Zajic from comment #64)
> Yesterday I upgraded to NVIDIA driver version 550.54.14, and the error
> messages changed to:
> 
> > [zlatko@disclosure:~]$ WEBKIT_DISABLE_DMABUF_RENDERER=0 liferea
> > src/nv_gbm.c:288: GBM-DRV error (nv_gbm_create_device_native): nv_common_gbm_create_device failed (ret=-1)

Nobody has reported this error before, so you'll want to create a new bug report. But make sure to review comment #32 first.
Comment 66 Erik Kurzinger 2024-02-25 09:36:57 PST
> src/nv_gbm.c:288: GBM-DRV error (nv_gbm_create_device_native): nv_common_gbm_create_device failed (ret=-1)

Note that with 535 and later NVIDIA drivers gbm_create_device will fail if the nvidia-drm kernel module is not loaded with the parameter "modeset=1". With earlier drivers, gbm_create_device would succeed in that case but trying to allocate a buffer would fail.
Comment 67 Erik Kurzinger 2024-02-25 09:38:24 PST
Sorry, typo in my previous comment, I meant 545 and later.
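
As a rough way to check the condition described above before filing a new report, the following Python sketch reads the nvidia-drm modeset parameter from the sysfs path quoted in comment #26. The function name `nvidia_drm_modeset` is hypothetical. Treating "1" and "0" as equivalent to "Y" and "N" is an assumption; the file normally holds "Y" or "N".

```python
from pathlib import Path

# Path quoted in comment #26; it only exists while the nvidia_drm module is loaded.
MODESET_PARAM = Path("/sys/module/nvidia_drm/parameters/modeset")


def nvidia_drm_modeset(param_path=MODESET_PARAM):
    """Return True if modeset is enabled, False if disabled, None if unknown.

    None means the file could not be read (for example, the module is not
    loaded) or held an unexpected value.
    """
    try:
        value = Path(param_path).read_text().strip()
    except OSError:
        return None
    if value in ("Y", "1"):  # "1" accepted as an assumption, see lead-in
        return True
    if value in ("N", "0"):
        return False
    return None
```

A result of `False` matches the situation Erik describes, where the module was not loaded with "modeset=1". `None` most likely means the nvidia_drm module is not loaded at all.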