Bug 20690

Summary: Support for DNS prefetch
Product: WebKit Reporter: Collin Jackson <collinj>
Component: WebCore Misc.Assignee: Collin Jackson <collinj>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, eric, jar, koivisto, laszlo.gombos, mbelshe, mrowe, opendarwin, sam, slewis
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Attachments:
Description Flags
Proposed patch
koivisto: review-
Updated patch with comments from Antti
none
Updated patch with improved layout test
none
Updated patch more comments from Antti
none
Updated patch with comments from Mark Rowe
mrowe: review+
Updated patch with more comments from Mark Rowe none

Description Collin Jackson 2008-09-06 12:49:06 PDT
DNS prefetching speculatively resolves host names in hyperlinks.  Once a host name has been resolved, if the user does navigate to that URL, the DNS lookup will already be in the local cache, reducing DNS resolution time.  This is already implemented in Chromium, but it would be better if we could merge the WebCore part of it into the WebKit repository so that other WebKit clients can use it as well.
Comment 1 Collin Jackson 2008-09-06 12:54:17 PDT
Created attachment 23220 [details]
Proposed patch
Comment 2 Antti Koivisto 2008-09-06 14:50:16 PDT
Comment on attachment 23220 [details]
Proposed patch

Looks generally good except for the code changes to Document::visitedLinkHash(). That code needs better place though so r-.

Could this be tested with HTTP layout test (at least for code coverage)?

+        If ENABLE(DNS_PREFETCH) is true, notify the FrameLoaderClient 
+        about hyperlinks and <link rel="dns-prefetch"> tags that are
+        encountered so that it can speculatively resolve hostnames.

I would prefer not to have a feature define for this. Having it enabled on platforms that don't support it looks harmless and avoiding configuration proliferation is a good thing.
 
+#if ENABLE(DNS_PREFETCH)
+#include "FrameLoaderClient.h"
+#endif

No need to make this conditional.

 void Document::updateFocusAppearanceSoon()
@@ -4188,6 +4202,18 @@ unsigned Document::visitedLinkHash(const
 
     bool hasColonSlashSlash = containsColonSlashSlash(characters, length);
 
+#if ENABLE(DNS_PREFETCH)
+    String prefetchHost;
+    if (hasColonSlashSlash) {
+        KURL parsedURL(attributeURL, m_baseURL);
+        prefetchHost = parsedURL.host();
+    } else
+        prefetchHost = m_baseURL.host();
+
+    if (frame() && (m_isDNSPrefetchEnabled || prefetchHost == m_securityOrigin->host()))
+        frame()->loader()->client()->prefetchDNS(prefetchHost);
+#endif

This function is for calculating a hash and this kind of code is inappropriate here. Please find the right place to put this code. Would simply putting it to HTMLAnchorElement::parseMappedAttribute() work?

+void Document::initDNSPrefetchEnabled()
+{

Maybe just call it initDNSPrefetch()?

+    m_haveExplicitlyDisabledDNSPrefetch = false;
+    m_isDNSPrefetchEnabled = (securityOrigin()->protocol() == "http");

No need for parenthesis here.

+    // Inherit DNS prefetch opt-out from parent frame    
+    if (Document* parent = parentDocument())
+        if (!parent->isDNSPrefetchEnabled())
+            m_isDNSPrefetchEnabled = false;

Outer if needs curly brackets according to the coding style.

+void Document::setDNSPrefetchControl(const String& dnsPrefetchControl)
+{

This is not really a setter, how about parseDNSPrefetchControlHeader() or something?

+    if (equalIgnoringCase(dnsPrefetchControl, "on") && 
+        !m_haveExplicitlyDisabledDNSPrefetch) {

can be single line

+        m_isDNSPrefetchEnabled = true;
+        return;
+    }
+
+    m_isDNSPrefetchEnabled = false;
+    m_haveExplicitlyDisabledDNSPrefetch = true;
+}
+#endif
+

+    void setDNSPrefetchControl(const WebCore::String&);

No need for WebCore:: prefix.

@@ -111,6 +111,7 @@ protected:
     bool m_alternate;
     bool m_isStyleSheet;
     bool m_isIcon;
+    bool m_isDnsPrefetch;
     bool m_createdByParser;
 };

m_isDNSPrefetch to match coding style.
Comment 3 Collin Jackson 2008-09-15 12:57:04 PDT
Created attachment 23445 [details]
Updated patch with comments from Antti

(In reply to comment #2)
> Could this be tested with HTTP layout test (at least for code coverage)?

Good idea. I've added one. 

> +        If ENABLE(DNS_PREFETCH) is true, notify the FrameLoaderClient 
> +        about hyperlinks and <link rel="dns-prefetch"> tags that are
> +        encountered so that it can speculatively resolve hostnames.
> 
> I would prefer not to have a feature define for this. Having it enabled on
> platforms that don't support it looks harmless and avoiding configuration
> proliferation is a good thing.

Ok. I've removed the feature define.

> +#if ENABLE(DNS_PREFETCH)
> +#include "FrameLoaderClient.h"
> +#endif
> 
> No need to make this conditional.

Ok. This include isn't used in Document.cpp any more, so I just removed it.

>  void Document::updateFocusAppearanceSoon()
> @@ -4188,6 +4202,18 @@ unsigned Document::visitedLinkHash(const
> 
>      bool hasColonSlashSlash = containsColonSlashSlash(characters, length);
> 
> +#if ENABLE(DNS_PREFETCH)
> +    String prefetchHost;
> +    if (hasColonSlashSlash) {
> +        KURL parsedURL(attributeURL, m_baseURL);
> +        prefetchHost = parsedURL.host();
> +    } else
> +        prefetchHost = m_baseURL.host();
> +
> +    if (frame() && (m_isDNSPrefetchEnabled || prefetchHost ==
> m_securityOrigin->host()))
> +        frame()->loader()->client()->prefetchDNS(prefetchHost);
> +#endif
> 
> This function is for calculating a hash and this kind of code is inappropriate
> here. Please find the right place to put this code. Would simply putting it to
> HTMLAnchorElement::parseMappedAttribute() work?

Ok, I moved it.

> +void Document::initDNSPrefetchEnabled()
> +{
> 
> Maybe just call it initDNSPrefetch()?

Ok. I renamed it to initDNSPrefetch().

> +    m_haveExplicitlyDisabledDNSPrefetch = false;
> +    m_isDNSPrefetchEnabled = (securityOrigin()->protocol() == "http");
> 
> No need for parenthesis here.

Fixed.

> +    // Inherit DNS prefetch opt-out from parent frame    
> +    if (Document* parent = parentDocument())
> +        if (!parent->isDNSPrefetchEnabled())
> +            m_isDNSPrefetchEnabled = false;
> 
> Outer if needs curly brackets according to the coding style.

Fixed.

> +void Document::setDNSPrefetchControl(const String& dnsPrefetchControl)
> +{
> 
> This is not really a setter, how about parseDNSPrefetchControlHeader() or
> something?

Ok. I've renamed it to parseDNSPrefetchControlHeader().

> +    if (equalIgnoringCase(dnsPrefetchControl, "on") && 
> +        !m_haveExplicitlyDisabledDNSPrefetch) {
> 
> can be single line

Fixed.

> +        m_isDNSPrefetchEnabled = true;
> +        return;
> +    }
> +
> +    m_isDNSPrefetchEnabled = false;
> +    m_haveExplicitlyDisabledDNSPrefetch = true;
> +}
> +#endif
> +
> 
> +    void setDNSPrefetchControl(const WebCore::String&);
> 
> No need for WebCore:: prefix.

Fixed.

> @@ -111,6 +111,7 @@ protected:
>      bool m_alternate;
>      bool m_isStyleSheet;
>      bool m_isIcon;
> +    bool m_isDnsPrefetch;
>      bool m_createdByParser;
>  };
> 
> m_isDNSPrefetch to match coding style.

Fixed.
Comment 4 Jeff Johnson 2008-09-15 23:24:50 PDT
I hope there will be a thorough analysis of the security, privacy, and performance implications of this before it's implemented. A few thoughts/questions:

1) I'm reading a page with a discussion of DNS cache poisoning, and the page contains, as an example, a link to a domain whose authoritative DNS server has been poisoned. Any problems there?

2) I'm reading a local html that contains links. Will that cause network access?

3) I innocently navigate to a page that happens to have a link to a porn web site. Perhaps it's a page of search results, or it's a link in some web advertising. Now my ISP has a record of me asking their DNS servers to resolve a porn web site's domain.

4) I navigate to a page with a very large number of links from different domains, e.g., a page of search results. Will that cause hundreds of DNS requests?

Admittedly, I don't know much about DNS prefetching, so maybe I'm misunderstanding the whole process, but I think these are the concerns people would have.

Comment 5 Collin Jackson 2008-09-16 11:56:46 PDT
Created attachment 23479 [details]
Updated patch with improved layout test

I changed the layout test to remove some redundant cases and replace them with different ones.
Comment 6 Antti Koivisto 2008-09-17 00:11:50 PDT
I don't understand the addition of FrameLoaderClient::didParseAnchor(). Why not figure out the domain in WebCore and just invoke prefetchDNS() for links too like the first patch revision did?

Not really important for the first version but do you think the WebCore should track which domains have already been prefetched for the current document? It needs to be done in some level of the stack anyway and doing it early might be faster. 

As a minimal limiting measure it could avoid making prefetch callbacks for the domain of the current document (like simple relative URLs) since that is very likely already cached.
Comment 7 Collin Jackson 2008-09-17 00:52:00 PDT
(In reply to comment #6)
> I don't understand the addition of FrameLoaderClient::didParseAnchor(). Why not
> figure out the domain in WebCore and just invoke prefetchDNS() for links too
> like the first patch revision did?

The reason why I did it this way is that there is a "poor man's completeURL" in Document.cpp (lines 4080-4232) that tries to extract various features of hyperlinks that are encountered while parsing the page without calling the KURL constructor. According to a comment, it's "faster and less memory consumption" to do it this way. If we call the KURL constuctor for every hyperlink to figure out the domain we'd probably want to delete the "poor man's completeURL" since it's no longer providing any performance benefit; the URL has already been fully parsed. By passing the anchor's href and document off to the WebKit client we can let the client parse out the hostname on a separate thread if necessary. Also, clients that choose not to support DNS prefetch won't need to call the KURL constructor at all.

> Not really important for the first version but do you think the WebCore should
> track which domains have already been prefetched for the current document? It
> needs to be done in some level of the stack anyway and doing it early might be
> faster. 

The client may be running multiple instances of WebCore, and thus has a more complete picture of what's going on. I think that's a better place to prune duplicates.

> As a minimal limiting measure it could avoid making prefetch callbacks for the
> domain of the current document (like simple relative URLs) since that is very
> likely already cached.

It would be easy to implement this policy using the FrameLoaderClient API provided. However, many servers use short DNS TTLs so it turns out that re-warming the current domain when a new hyperlink is created can often be helpful. Different clients may have different network stacks and thus may have different desired prefetch behavior, so I think it's ok to leave rate limiting up to the client.
Comment 8 Antti Koivisto 2008-09-17 02:46:22 PDT
(In reply to comment #7)
> (In reply to comment #6)
> > I don't understand the addition of FrameLoaderClient::didParseAnchor(). Why not
> > figure out the domain in WebCore and just invoke prefetchDNS() for links too
> > like the first patch revision did?
> 
> The reason why I did it this way is that there is a "poor man's completeURL" in
> Document.cpp (lines 4080-4232) that tries to extract various features of
> hyperlinks that are encountered while parsing the page without calling the KURL
> constructor. According to a comment, it's "faster and less memory consumption"
> to do it this way. If we call the KURL constuctor for every hyperlink to figure
> out the domain we'd probably want to delete the "poor man's completeURL" since
> it's no longer providing any performance benefit; the URL has already been
> fully parsed. By passing the anchor's href and document off to the WebKit
> client we can let the client parse out the hostname on a separate thread if
> necessary. Also, clients that choose not to support DNS prefetch won't need to
> call the KURL constructor at all.

I don't think it is good to have two callbacks that do essentially the same thing. Also didParseAnchor() tells nothing about what should be implemented there where as prefetchDNS() is a good name.

Document::visitedLinkHash() is a hash function used by style selection and can get invoked interactively and frequently. It does not need full URL parsing either. In contrast link attribute parsing is done (mostly) only once during document parsing.

I think relative link URLs (or actually anything that does not start with http:// or https://) can be ignored since the domain is the same as the current document's. That would help cutting out many links on many pages. At some level you will need to parse the URL enough to figure out the domain anyway. KURL isn't optimal but it has not turned out to be significant bottleneck so far either, I wouldn't worry too much about that. Faster implementations are of course welcome.

It is generally good to push as much logic as possible to WebCore level where it can be shared between implementations.

> > Not really important for the first version but do you think the WebCore should
> > track which domains have already been prefetched for the current document? It
> > needs to be done in some level of the stack anyway and doing it early might be
> > faster. 
> 
> The client may be running multiple instances of WebCore, and thus has a more
> complete picture of what's going on. I think that's a better place to prune
> duplicates.

Wouldn't it be even more important in that case to cut down unnecessary ipc traffic? Pruning duplicates here does not prevent pruning them more later. In any case, this can wait.

> It would be easy to implement this policy using the FrameLoaderClient API
> provided. However, many servers use short DNS TTLs so it turns out that
> re-warming the current domain when a new hyperlink is created can often be
> helpful. Different clients may have different network stacks and thus may have
> different desired prefetch behavior, so I think it's ok to leave rate limiting
> up to the client.

Right, but with current code the prefetching is anyway only invoked when the document is first parsed (at which point the current document domain is certainly in cache). Short TTL would expire all prefetched DSNs on the current page. In any case it is not necessary to make the prefetch call repeatedly for the current domain for every relative link in the document, once should be enough.
Comment 9 Collin Jackson 2008-09-17 11:08:59 PDT
Created attachment 23504 [details]
Updated patch more comments from Antti

(In reply to comment #8)
> I don't think it is good to have two callbacks that do essentially the same
> thing. Also didParseAnchor() tells nothing about what should be implemented
> there where as prefetchDNS() is a good name.

Ok, I got rid of didParseAnchor.

> I think relative link URLs (or actually anything that does not start with
> http:// or https://) can be ignored since the domain is the same as the current
> document's. That would help cutting out many links on many pages. 

Ok, I've implemented this suggestion. I also added a check for scheme-relative URLs (starting with "//"), since they are used heavily on certain sites (e.g. Slashdot).
Comment 10 Antti Koivisto 2008-09-17 12:51:37 PDT
Comment on attachment 23504 [details]
Updated patch more comments from Antti

Looks good, r=me
Comment 11 Mark Rowe (bdash) 2008-09-17 19:42:38 PDT
Why is the prefetching proper handled at the FrameLoaderClient level?  This pushes the actual prefetching into WebKit when it should really live in WebCore.   The policy of whether to prefetch a given URL could still be left up to FrameLoaderClient.
Comment 12 Collin Jackson 2008-09-18 01:18:39 PDT
Comment on attachment 23504 [details]
Updated patch more comments from Antti

Clearing the review flag while I work on this some more.
Comment 13 Maciej Stachowiak 2008-09-18 03:55:04 PDT
Added Stephanie to Cc as this should probably be run through the who range of page loading perf tests before getting landed.
Comment 14 Collin Jackson 2008-09-18 11:57:00 PDT
Created attachment 23534 [details]
Updated patch with comments from Mark Rowe

Here's a new version that puts the responsibility of prefetching in WebCore. Mark, can you please take a look?
Comment 15 Mark Rowe (bdash) 2008-09-18 12:36:44 PDT
Comment on attachment 23534 [details]
Updated patch with comments from Mark Rowe

 281             if (value.startsWith("http:") || value.startsWith("https:") || value.startsWith("//")) {
 282                 prefetchDNS(document()->completeURL(value).host());
 283             }

You have an unnecessary set of braces around this if statement.

There's no need for a separate DNSMac.cpp and DNSCFNet.cpp -- both Mac and Windows will probably use CFHost for prefetching.

The header guard on DNS.h is named DNSPrefetch_h, which doesn't match the file name.

You'll also need to stub out prefetchDNS for other platforms in order to not break Qt, Gtk and wx builds.  Adding to their TemporaryLinkStubs.cpp file is probably the right approach for those ports.

Unless I'm not understanding the test, it doesn't appear to actually test whether DNS prefetching has worked nor whether parsing the relevant headers and attributes were parsed, only that it does not crash while parsing.  The output from DRT could be a little clearer about this as it currently says "Browsers MAY do a DNS prefetch on the following links:" then doesn't list any links.  There are also coding style issues in the JavaScript, such as braces around one-line if statements.

r=me if you address these issues.
Comment 16 Darin Adler 2008-09-18 12:39:30 PDT
Comment on attachment 23534 [details]
Updated patch with comments from Mark Rowe

+            if (value.startsWith("http:") || value.startsWith("https:") || value.startsWith("//")) {

The protocolIs function (it's in KURL.h, but it works on a String) should be used for these types of checks. It's efficient and does a case-folding comparison.

Unless there's a reason to not use it here?
Comment 17 Collin Jackson 2008-09-18 14:28:33 PDT
Created attachment 23538 [details]
Updated patch with more comments from Mark Rowe

Stephanie, could you please run the page loading performance tests on this version of the patch?
Comment 18 Stephanie Lewis 2008-09-18 19:01:36 PDT
PLT numbers are good
Comment 19 Collin Jackson 2008-09-19 00:07:02 PDT
Landed in r36650 (with some minor fixes in r36654 and r36658).