Created attachment 444547 [details] An animated gif that illustrates my problem. * Browser: Epiphany (Gnome Web) version 41, RPM version. * FlatPak from Flathub also tested, bug was reproduced * WebKitGTK 2.34.1 * Fedora 35 using Gnome 41 - 5.14.17-301.fc35.x86_64 Please see Fedora Project forum thread for reference: https://ask.fedoraproject.org/t/epiphany-gnome-web-webkit-intermittently-breaks I am getting browser-breaking DNS failures and content timeouts that are rendering Epiphany unusable. These problems are not occurring with Firefox or any Blink (Chromium) based browser. I am unsure of how to troubleshoot this issue and am happy to follow any steps required.
I've done some troubleshooting with Red Hat to try and rule out my system configuration as a problem. It looks like there's decent enough reason to believe that this is some sort of problem with Epiphany/WebKit: https://bugzilla.redhat.com/show_bug.cgi?id=2024330
(In reply to Trev from comment #0) > I am getting browser-breaking DNS failures and content timeouts that are > rendering Epiphany unusable. These problems are not occurring with Firefox > or any Blink (Chromium) based browser. I am unsure of how to troubleshoot > this issue and am happy to follow any steps required. What we've established is that GResolver is working fine outside WebKit, but WebKit is nevertheless blocking for very long periods of time when making DNS requests. So whatever is going wrong is happening at either libsoup level or WebKit level. I guess WebKit is more likely. I wonder if the NetworkProcess is somehow struggling or blocking on something unexpected, but tbh I don't know how to debug this either.
https://bugzilla-attachments.redhat.com/attachment.cgi?id=1842443 is the proof that something is going wrong with DNS.
So looking at where the timings come from, I found NetworkDataTaskSoup.cpp, which is just watching GSocketClientEvents. Next step might be to add some debug there to see what it's doing. Alternatively, we could write a test script that just tests GSocketClient on its own, though I have a suspicion that it will run fine, just like the GResolver test did.
BTW just to be certain: no network proxies configured, yes?
(In reply to Michael Catanzaro from comment #5) > BTW just to be certain: no network proxies configured, yes? Other than using public DNS, no proxies are being used. Problem was reproduced with both Google and Cloudflare public DNS services. As you know, Fedora tends to serve resolutions from a local stub after the fact.
Yeah, no proxies then. Network proxies are different. ;)
Created attachment 444970 [details] Another test script One more test script. This is to see if we can reproduce the problem using GSocketClient directly. Note this test script discovered a strange bug in GSocketClient: it is emitting the G_SOCKET_CLIENT_CONNECTING event twice per connection, which is weird because we ought to have tests to notice that.
(In reply to Michael Catanzaro from comment #8) > Note this test script discovered a strange bug in GSocketClient: it is > emitting the G_SOCKET_CLIENT_CONNECTING event twice per connection, which is > weird because we ought to have tests to notice that. Apparently it's actually allowed, per the documentation. This happens when connecting to a particular IP address times out, and the next one in the list is attempted.
(In reply to Michael Catanzaro from comment #8) > Created attachment 444970 [details] > Another test script > > One more test script. This is to see if we can reproduce the problem using > GSocketClient directly. > > Note this test script discovered a strange bug in GSocketClient: it is > emitting the G_SOCKET_CLIENT_CONNECTING event twice per connection, which is > weird because we ought to have tests to notice that. Output: ``` λ ./test-socket-client Gjs-Message: 11:03:57.847: JS LOG: Connecting to example.com... Gjs-Message: 11:03:57.849: JS LOG: Resolving example.com... Gjs-Message: 11:03:57.850: JS LOG: Connecting to discussion.fedoraproject.org... Gjs-Message: 11:03:57.850: JS LOG: Resolving discussion.fedoraproject.org... Gjs-Message: 11:04:01.651: JS LOG: Resolved example.com after 3802.394 ms Gjs-Message: 11:04:01.652: JS LOG: TCP/IP connecting to example.com... Gjs-Message: 11:04:01.673: JS LOG: TCP/IP connected to example.com after 21.029 ms Gjs-Message: 11:04:01.730: JS LOG: TLS handshaking for example.com... Gjs-Message: 11:04:01.756: JS LOG: TLS handshaked for example.com after 26.555 ms Gjs-Message: 11:04:01.757: JS LOG: Finished connection to example.com after 3909.739 ms total Gjs-Message: 11:04:05.401: JS LOG: Resolved discussion.fedoraproject.org after 7550.626 ms Gjs-Message: 11:04:05.401: JS LOG: TCP/IP connecting to discussion.fedoraproject.org... Gjs-Message: 11:04:05.417: JS LOG: TCP/IP connected to discussion.fedoraproject.org after 15.541 ms Gjs-Message: 11:04:05.420: JS LOG: TLS handshaking for discussion.fedoraproject.org... Gjs-Message: 11:04:05.468: JS LOG: TLS handshaked for discussion.fedoraproject.org after 48.148 ms Gjs-Message: 11:04:05.468: JS LOG: Finished connection to discussion.fedoraproject.org after 7618.382 ms total ```
OK, so not WebKit after all. Wasn't expecting that. This is weird: how is it that GResolver can resolve the hostnames quickly, but GSocketClient goes super slow? That doesn't make sense, because it's just using GResolver. Whatever. This is actually good because debugging GSocketClient should be way easier than debugging WebKit. Let's continue back in https://bugzilla.redhat.com/show_bug.cgi?id=2024330.