I'm filing this against WebKitGtk to be conservative, but maybe it's a more general problem. There is a problem in epiphany where sometimes you click on a link and it claims there's no such host, even when that's obviously untrue, and in fact, clicking reload will load the page successfully at this point. The bug seems to go like this: 1. User loads a page with a billion links 2. WebKit calls prefetchDNS a billion times 3. In the Gtk version at least, this results in a billion getaddrinfo() calls 4. The upstream DNS server starts sending back DNS responses, but does so slowly. Probably because it's a wi-fi hub powered by a relatively slow processor and it's processing the requests more-or-less in serial rather than in parallel. 5. User clicks a link partway down the page. Meanwhile, the wi-fi hub continues slowly sending back DNS responses. 6. WebKitGtk/libsoup sees that there's an outstanding DNS request for the host in the clicked-on link, and waits for it to complete. Meanwhile, the wi-fi hub continues slowly sending back DNS responses. 7. After 5 seconds, glibc times out all of the DNS requests that are still outstanding. 8. WebKitGtk/libsoup get the DNS error and show it to the user. network/cf/DNSCFNet.cpp seems to have some spam-preventing code. Perhaps that should be happening at a higher level. Alternatively, maybe we don't need to prefetchDNS on every single link on the page at page load time. For Gtk at least, we're going to want to distinguish page-load prefetching from on-hover prefetching anyway, because we want to do proxy URI prefetching on hover, but not on page load (qv https://bugzilla.gnome.org/show_bug.cgi?id=605065)
I actually think that this should happen at OS name resolver level (for OSes that have one) - there is more than one process running at a time, so there is nothing a web browser can do to reliably prevent this sort of problems. But that's just wishful thinking, in practice WebKit has to throttle DNS requests on its own.
I also suspect that this may be why when I upgraded to Safari 5 that loading a large number of tabs simultaneously caused many of the later tabs to time out. The first couple of tabs would presumably load, causing many DNS look ups to be fired off, and cause some of the later DNS look ups to load other tabs to time out. If you load multiple pages simultaneously (which I do all the time, with a folder set to open multiple tabs) then these top level tabs should take precedence over DNS look ups that are the result of loading a page.
> many of the later tabs to time out Is that fixed for you in nightly builds?
I'll check tonight on my home computer - I ended up switching to use Google's DNS servers, which improved it a lot, as my ISP's DNS servers seemed to time out 75% of the tabs that were loading.
The latest nightly, r60868 does much better when I use my ISPs DNS name server, with none of the dozen or tabs timing out on their DNS look ups. Look forward to that change being released in a Safari update.
Actually I think this issue is the source of this https://bugs.webkit.org/show_bug.cgi?id=42052 That page is full of links. Firefox loads it reasonably fast, but on epiphany it takes a lot and some of the images are not even shown at the end, most likely due to DNS errors
(In reply to comment #6) > Actually I think this issue is the source of this https://bugs.webkit.org/show_bug.cgi?id=42052 > > That page is full of links. Firefox loads it reasonably fast, but on epiphany it takes a lot and some of the images are not even shown at the end, most likely due to DNS errors Actually it's not caused by DNSprefetching, disabling it does not fix anything. What's more, oprofile shows libz and libfreetype on top so forget about my suggestion that they both could be the same bug.
OK so after discussing this with Dan I decided to implement a queue for DNS prefetching control for the libsoup backend similar to what other ports have. It works pretty well (for example bug 80587 is fixed by that patch) but there are two important things missing: 1- proxy detection. Is there any reliable way to detect that we're under a proxy without depending on GNOME system settings and the like? I guess we can completely get rid of prefetching when behind a proxy. 2- soup_session_prepare_for_uri() does not have a callback. We need it to limit the maximum amount of simultaneous DNS requests we issue. We can add a signal to libsoup or deprecate it and add a new one with the usual _async() parameters. I think I'm more in favour of the latter. Dan?
*** Bug 80587 has been marked as a duplicate of this bug. ***
Created attachment 131829 [details] libsoup Patch That'd be the libsoup patch. This adds some new API and the GNOME 3.4 API freeze is already gone. If having this patch is interesting enough for 3.4 we'd have to ask for an exception.
Created attachment 131831 [details] WK Patch Adds a queue (stolen from CFNET network code) that gracefully throttles DNS pre-fetching requests.
(In reply to comment #10) > That'd be the libsoup patch. This adds some new API and the GNOME 3.4 API freeze is already gone. If having this patch is interesting enough for 3.4 we'd have to ask for an exception. API Freeze only applies to the "Platform" components (glib, gtk, etc). Although it's not encouraged, we can change libsoup's API right up until hard code freeze. (Which is Monday.) And it's clear that the libsoup patch is pretty safe, so I don't mind adding it. (The WebKit patch is bigger, but that's someone else's problem :) On the patch itself, since we've given up on doing anything more clever than just DNS, I'd make it "soup_session_prefetch_dns", with a hostname, instead of "prefetch_uri" (which then also saves you having to fake up a URI in DNSResolveQueue::resolve()). Also, add this bug URL to the commit message. (In reply to comment #11) > WK Patch drop the proxyIsEnabledForDefaultSession() stuff; it's not really reliably in either direction; just because a proxy resolver is in use doesn't mean that there's a proxy configured. And OTOH if you're using SOCKS4, you have to do DNS on the client anyway (because the protocol only uses IP addresses). We can add smarts to SoupSession/SoupProxyURIResolver/GProxyResolver later if we want to implement this check correctly.
(In reply to comment #11) > Created an attachment (id=131831) [details] > WK Patch > > Adds a queue (stolen from CFNET network code) that gracefully throttles DNS pre-fetching requests. I'm no expert here, but this sees sane enough for me. Perhaps we can share most of this with the other port in a base class, since it looks quite generic.
Comment on attachment 131831 [details] WK Patch View in context: https://bugs.webkit.org/attachment.cgi?id=131831&action=review Looks good. I have a couple nits below. We should probably at least have a plan with how to deal with the proxy issue. > Source/WebCore/platform/network/soup/DNSSoup.cpp:30 > > #include "CString.h" I think you need to expand the Apple copyright to 2009 to match DNSCFNet. > Source/WebCore/platform/network/soup/DNSSoup.cpp:102 > + // if (proxyURI || soup_session_get_feature(soupSession, SOUP_TYPE_PROXY_URI_RESOLVER)) Simply remove commented out code. > Source/WebCore/platform/network/soup/DNSSoup.cpp:158 > +static void resolvedCallback(SoupAddress *soupAddress, guint status, gpointer userData) Small style issue here with asterisk placement. Please use void* intead of gpointer.
Created attachment 132010 [details] Patch
Created attachment 132012 [details] libsoup Patch libsoup patch with soup_session_prefetch_dns()
Created attachment 132013 [details] Patch properly rebased now
Comment on attachment 132013 [details] Patch Attachment 132013 [details] did not pass efl-ews (efl): Output: http://queues.webkit.org/results/11952738
BTW this patch should also bump the libsoup dependency up to 2.38
Comment on attachment 132013 [details] Patch Attachment 132013 [details] did not pass win-ews (win): Output: http://queues.webkit.org/results/11951695
Comment on attachment 132013 [details] Patch Attachment 132013 [details] did not pass gtk-ews (gtk): Output: http://queues.webkit.org/results/11953739
Created attachment 132029 [details] Patch next attempt to fix windows build
Comment on attachment 132029 [details] Patch Attachment 132029 [details] did not pass efl-ews (efl): Output: http://queues.webkit.org/results/11957554
Comment on attachment 132012 [details] libsoup Patch hm... ok, I'd forgotten how some of this worked. In addition to being a mess, this patch only works for the case of http://hostname:80; prefetching an https URI or a URI on a non-default port will have no effect. This could be fixed, but it would require more rewriting, so I'm thinking maybe not for 2.38 now? You could still change the webkit side to use the common infrastructure if you wanted, and just fake up the URI on the webkit side as in the earlier patch (though again, this would still have the same problem that it would only work for http-on-the-default-port, until we fixed the libsoup side)
(In reply to comment #24) > (From update of attachment 132012 [details]) > hm... ok, I'd forgotten how some of this worked. In addition to being a mess, this patch only works for the case of http://hostname:80; prefetching an https URI or a URI on a non-default port will have no effect. So why not use a SoupURI as the first version of the patch? > You could still change the webkit side to use the common infrastructure if you wanted, and just fake up the URI on the webkit side as in the earlier patch (though again, this would still have the same problem that it would only work for http-on-the-default-port, until we fixed the libsoup side) Unfortunately the wk side requires a callback in the prefetch libsoup call to work.
Comment on attachment 132029 [details] Patch Attachment 132029 [details] did not pass gtk-ews (gtk): Output: http://queues.webkit.org/results/11957571
(In reply to comment #25) > So why not use a SoupURI as the first version of the patch? We could, but that would require keeping the special libsoup-only prefetch code in WebKit. > Unfortunately the wk side requires a callback in the prefetch libsoup call to work. ah... yeah... well, maybe this is OK for now then
Comment on attachment 132012 [details] libsoup Patch View in context: https://bugs.webkit.org/attachment.cgi?id=132012&action=review > libsoup/soup-session.c:2376 > + soup_session_prefetch_dns (session, uri->host, NULL, NULL, NULL); Actually, given that we need to use a URI internally, it makes more sense to leave soup_session_prepare_for_uri() as it is, and have soup_session_prefetch_dns() call that, rather than vice versa. > libsoup/soup-session.c:2406 > +* shortly, and so the session can try to prepare (resolving the domain > +* name, obtaining proxy address, etc.) in order to work more quickly The proxy stuff isn't relevant any more. "...and so the session can try to prepare by resolving the domain name in advance, in order to work more quickly..." > libsoup/soup-session.c:2434 > + uri = soup_uri_new (NULL); Add a comment before this: /* FIXME: Prefetching should work for both HTTP and HTTPS */ > libsoup/soup-session.c:2439 > + if (!SOUP_URI_IS_VALID(uri)) { This can't happen; SOUP_URI_IS_VALID just checks that scheme and path are set, and you just set both of them yourself. So just remove this whole part. (If @hostname is invalid then soup_address_resolve_async() will fail with an appropriate error.) > libsoup/soup-session.h:10 > +#include <libsoup/soup-address.h> > #include <libsoup/soup-types.h> swap those. (soup-types.h always comes first, by convention)
Created attachment 132062 [details] libsoup Patch I guess something like this then.
Comment on attachment 132062 [details] libsoup Patch looks good. feel free to commit.
Comment on attachment 132029 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=132029&action=review It's great that you're trying to share this code. Hook up with some people on the Windows and Mac ports to see how you can add the appropriate files to the build (see below). > Source/WebCore/platform/network/DNS.h:38 > +class DNSResolveQueue : public TimerBase { Ideally DNSResolveQueue should have it's own header and source file. Perhaps ping someone from the Mac port about adding the new files to the Xcode project. > Source/WebCore/platform/network/DNS.h:65 > + void add(const String& hostname) > + { > + // If there are no names queued, and few enough are in flight, resolve immediately (the mouse may be over a link). > + if (!m_names.size()) { > + if (proxyIsEnabledInSystemPreferences()) > + return; A method of this size should be in source file instead of a header. > Source/WebCore/platform/network/DNS.h:90 > + virtual bool proxyIsEnabledInSystemPreferences() = 0; > + virtual void resolve(const String&) = 0; You actually don't need to make this an abstract class. While that's useful to do if need to instantiate different types of DNSResolveQueues at runtime, that isn't the case. There is only ever one networking backend at a time in WebKit. Instead what you can do here is to use the pattern that GraphicsContext and the like do where there's a bool platformProxyIsEnabledInSystemPreferences(); void platformResolve(const String& address); And those implementations are in the platform-specific directory in a different file. I think this is the preferred method in WebCore. > Source/WebCore/platform/network/DNS.h:94 > + void fired() > + { > + if (proxyIsEnabledInSystemPreferences()) { Ditto. > Source/WebCore/platform/network/cf/DNSCFNet.cpp:55 > +const int DNSResolveQueue::namesToResolveImmediately = 4; > +const double DNSResolveQueue::coalesceDelayInSeconds = 1.0; > +// CFHost doesn't currently throttle for us, see <rdar://8105550>. > +const int DNSResolveQueue::maxSimultaneousRequests = 8; > +const int DNSResolveQueue::maxRequestsToQueue = 64; > +const double DNSResolveQueue::retryResolvingInSeconds = 0.1; These don't differ between ports, so I prefer to keep them as static variables here. There are all kinds of tricky issues with static members that you probably don't even want to get near if you don't have to. http://www.parashift.com/c++-faq-lite/ctors.html#faq-10.14
Created attachment 132560 [details] Patch
Comment on attachment 132560 [details] Patch Attachment 132560 [details] did not pass efl-ews (efl): Output: http://queues.webkit.org/results/11984402
(In reply to comment #32) > Created an attachment (id=132560) [details] > Patch I noticed that I forgot to remove a g_print. It also needs a libsoup version bump.
Comment on attachment 132560 [details] Patch Attachment 132560 [details] did not pass gtk-ews (gtk): Output: http://queues.webkit.org/results/11989053
Comment on attachment 132560 [details] Patch Attachment 132560 [details] did not pass mac-ews (mac): Output: http://queues.webkit.org/results/11980675
Comment on attachment 132560 [details] Patch Attachment 132560 [details] did not pass win-ews (win): Output: http://queues.webkit.org/results/11980939
Comment on attachment 132560 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=132560&action=review You need to bump LIBSOUP_REQUIRED_VERSION to 2.37.92 in configure.ac > Source/WebCore/platform/network/soup/DNSSoup.cpp:55 > + g_print("Resolving %s\n", hostname.utf8().data()); debugging leftover...
(In reply to comment #38) > (From update of attachment 132560 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=132560&action=review > > You need to bump LIBSOUP_REQUIRED_VERSION to 2.37.92 in configure.ac If you do so, please remember to bump the libsoup version in Source/cmake/OptionsEfl.cmake too.
(In reply to comment #39) > (In reply to comment #38) > > (From update of attachment 132560 [details] [details]) > > View in context: https://bugs.webkit.org/attachment.cgi?id=132560&action=review > > > > You need to bump LIBSOUP_REQUIRED_VERSION to 2.37.92 in configure.ac > > If you do so, please remember to bump the libsoup version in Source/cmake/OptionsEfl.cmake too. And perhaps in gtk's and efl's jhbuild.modules too?
Created attachment 133282 [details] Patch
Comment on attachment 133282 [details] Patch Attachment 133282 [details] did not pass gtk-ews (gtk): Output: http://queues.webkit.org/results/12117101
Comment on attachment 133282 [details] Patch Attachment 133282 [details] did not pass efl-ews (efl): Output: http://queues.webkit.org/results/12117105
Comment on attachment 133282 [details] Patch Attachment 133282 [details] did not pass win-ews (win): Output: http://queues.webkit.org/results/12116125
Created attachment 133320 [details] Patch
Comment on attachment 133320 [details] Patch Attachment 133320 [details] did not pass win-ews (win): Output: http://queues.webkit.org/results/12121093
Created attachment 133325 [details] Patch
(In reply to comment #47) > Created an attachment (id=133325) [details] > Patch Hopefully everything is ready now (forgot to move static members => static vars).
Comment on attachment 133325 [details] Patch Attachment 133325 [details] did not pass efl-ews (efl): Output: http://queues.webkit.org/results/12120168
Comment on attachment 133325 [details] Patch You probably need to update Tools/efl/jhbuild.modules as well.
Created attachment 133450 [details] Patch
Comment on attachment 133450 [details] Patch Attachment 133450 [details] did not pass efl-ews (efl): Output: http://queues.webkit.org/results/12113589
While trying to check what's going on with the EFL bot, I applied the patch locally and got the following error while building the new libsoup: make[3]: Entering directory `/home/rakuco/dev/webkit/WebKit/WebKitBuild/Dependencies/Source/libsoup/libsoup' GISCAN Soup-2.4.gir /home/rakuco/dev/webkit/WebKit/WebKitBuild/Dependencies/Source/libsoup/libsoup/tmp-introspect6ZUrTM/Soup-2.4.c: In function 'main': /home/rakuco/dev/webkit/WebKit/WebKitBuild/Dependencies/Source/libsoup/libsoup/tmp-introspect6ZUrTM/Soup-2.4.c:577:3: warning: 'g_thread_init' is deprecated (declared at /home/rakuco/dev/webkit/WebKit/WebKitBuild/Dependencies/Root/include/glib-2.0/glib/deprecated/gthread.h:259) [-Wdeprecated-declarations] /usr/bin/ld: /home/rakuco/dev/webkit/WebKit/WebKitBuild/Dependencies/Source/libsoup/libsoup/tmp-introspect6ZUrTM/Soup-2.4.o: undefined reference to symbol 'g_module_open' /usr/bin/ld: note: 'g_module_open' is defined in DSO /home/rakuco/dev/webkit/WebKit/WebKitBuild/Dependencies/Root/lib/libgmodule-2.0.so.0 so try adding it to the linker command line /home/rakuco/dev/webkit/WebKit/WebKitBuild/Dependencies/Root/lib/libgmodule-2.0.so.0: could not read symbols: Invalid operation collect2: ld returned 1 exit status linking of temporary binary failed: Command '['/bin/sh', '../libtool', '--mode=link', '--tag=CC', '--silent', '/opt/icecream/bin/gcc', '-o', '/home/rakuco/dev/webkit/WebKit/WebKitBuild/Dependencies/Source/libsoup/libsoup/tmp-introspect6ZUrTM/Soup-2.4', '-export-dynamic', '-L.', 'libsoup-2.4.la', '-pthread', '-L/home/rakuco/dev/webkit/WebKit/WebKitBuild/Dependencies/Root/lib', '-lgio-2.0', '-lgobject-2.0', '-lgthread-2.0', '-lrt', '-lglib-2.0', '/home/rakuco/dev/webkit/WebKit/WebKitBuild/Dependencies/Source/libsoup/libsoup/tmp-introspect6ZUrTM/Soup-2.4.o']' returned non-zero exit status 1 make[3]: *** [Soup-2.4.gir] Error 1 make[3]: Leaving directory `/home/rakuco/dev/webkit/WebKit/WebKitBuild/Dependencies/Source/libsoup/libsoup'
that's caused by mismatched use of system and jhbuild libraries. Possibly, trying to use an older system-installed g-ir-scanner against libraries linked to a newer glib.
(In reply to comment #54) > that's caused by mismatched use of system and jhbuild libraries. Possibly, trying to use an older system-installed g-ir-scanner against libraries linked to a newer glib. Hints on how to solve that? Does any specific environment variable need to be exported to jhbuild to make it look for the right files?
The only sane approach going forward I think is to include gobject-introspection in the jhbuild setup. For the short term, though, I've been considering disabling introspection all the way.
Comment on attachment 133450 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=133450&action=review Looks okay, but before landing: 1. Figure out how to not break EFL. :) 2. Take a look at the line below: > Source/WebCore/platform/network/cf/DNSCFNet.cpp:95 > - atomicDecrement(&m_requestsInFlight); > + DNSResolveQueue::shared().decrementRequestCount(); Can't you just do: decrementRequestCount(); here. Isn't 'this' already 'DNSResolveQueue::shared()' ?
(In reply to comment #57) > (From update of attachment 133450 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=133450&action=review > > Looks okay, but before landing: > > 1. Figure out how to not break EFL. :) FYI, 1) build bot/Efl looks updated. 2) WebCore/CMakeLists.txt should know DNSResolveQueue.cpp. I got below with attachment 133450 [details] in my pc. [100%] Building CXX object Tools/DumpRenderTree/efl/CMakeFiles/bin/DumpRenderTree.dir/PixelDumpSupportEfl.cpp.o ../../../lib/libwebcore_efl.so.0.1.0: undefined reference to `vtable for WebCore::DNSResolveQueue' ../../../lib/libwebcore_efl.so.0.1.0: undefined reference to `WebCore::DNSResolveQueue::add(WTF::String const&)' collect2: ld returned 1 exit status make[2]: *** [bin/ImageDiff] Error 1 make[1]: *** [Tools/DumpRenderTree/efl/CMakeFiles/bin/ImageDiff.dir/all] Error 2
(In reply to comment #58) > (In reply to comment #57) > > (From update of attachment 133450 [details] [details]) > > View in context: https://bugs.webkit.org/attachment.cgi?id=133450&action=review > > > > Looks okay, but before landing: > > > > 1. Figure out how to not break EFL. :) > > FYI, > 1) build bot/Efl looks updated. s/looks/should be/
(In reply to comment #58) > (In reply to comment #57) > > (From update of attachment 133450 [details] [details]) > > View in context: https://bugs.webkit.org/attachment.cgi?id=133450&action=review > > > > Looks okay, but before landing: > > > > 1. Figure out how to not break EFL. :) > > FYI, > 1) build bot/Efl looks updated. > > 2) WebCore/CMakeLists.txt should know DNSResolveQueue.cpp. > > I got below with attachment 133450 [details] in my pc. Does it build properly if you add it to the CMakeLists.txt file? Because if that's the only remaining thing to do, I can add it to the patch and land it.
Committed r112396: <http://trac.webkit.org/changeset/112396>