Bug 20690 - Support for DNS prefetch
: Support for DNS prefetch
Status: RESOLVED FIXED
: WebKit
WebCore Misc.
: 528+ (Nightly build)
: All All
: P2 Normal
Assigned To:
:
:
:
:
  Show dependency treegraph
 
Reported: 2008-09-06 12:49 PST by
Modified: 2009-06-14 15:34 PST (History)


Attachments
Proposed patch (10.78 KB, patch)
2008-09-06 12:54 PST, Collin Jackson
koivisto: review-
Review Patch | Details | Formatted Diff | Diff
Updated patch with comments from Antti (29.39 KB, patch)
2008-09-15 12:57 PST, Collin Jackson
no flags Review Patch | Details | Formatted Diff | Diff
Updated patch with improved layout test (29.45 KB, patch)
2008-09-16 11:56 PST, Collin Jackson
no flags Review Patch | Details | Formatted Diff | Diff
Updated patch more comments from Antti (28.42 KB, patch)
2008-09-17 11:08 PST, Collin Jackson
no flags Review Patch | Details | Formatted Diff | Diff
Updated patch with comments from Mark Rowe (26.32 KB, patch)
2008-09-18 11:57 PST, Collin Jackson
mrowe: review+
Review Patch | Details | Formatted Diff | Diff
Updated patch with more comments from Mark Rowe (27.02 KB, patch)
2008-09-18 14:28 PST, Collin Jackson
no flags Review Patch | Details | Formatted Diff | Diff


Note

You need to log in before you can comment on or make changes to this bug.


Description From 2008-09-06 12:49:06 PST
DNS prefetching speculatively resolves host names in hyperlinks.  Once a host name has been resolved, if the user does navigate to that URL, the DNS lookup will already be in the local cache, reducing DNS resolution time.  This is already implemented in Chromium, but it would be better if we could merge the WebCore part of it into the WebKit repository so that other WebKit clients can use it as well.
------- Comment #1 From 2008-09-06 12:54:17 PST -------
Created an attachment (id=23220) [details]
Proposed patch
------- Comment #2 From 2008-09-06 14:50:16 PST -------
(From update of attachment 23220 [details])
Looks generally good except for the code changes to Document::visitedLinkHash(). That code needs better place though so r-.

Could this be tested with HTTP layout test (at least for code coverage)?

+        If ENABLE(DNS_PREFETCH) is true, notify the FrameLoaderClient 
+        about hyperlinks and <link rel="dns-prefetch"> tags that are
+        encountered so that it can speculatively resolve hostnames.

I would prefer not to have a feature define for this. Having it enabled on platforms that don't support it looks harmless and avoiding configuration proliferation is a good thing.

+#if ENABLE(DNS_PREFETCH)
+#include "FrameLoaderClient.h"
+#endif

No need to make this conditional.

 void Document::updateFocusAppearanceSoon()
@@ -4188,6 +4202,18 @@ unsigned Document::visitedLinkHash(const

     bool hasColonSlashSlash = containsColonSlashSlash(characters, length);

+#if ENABLE(DNS_PREFETCH)
+    String prefetchHost;
+    if (hasColonSlashSlash) {
+        KURL parsedURL(attributeURL, m_baseURL);
+        prefetchHost = parsedURL.host();
+    } else
+        prefetchHost = m_baseURL.host();
+
+    if (frame() && (m_isDNSPrefetchEnabled || prefetchHost == m_securityOrigin->host()))
+        frame()->loader()->client()->prefetchDNS(prefetchHost);
+#endif

This function is for calculating a hash and this kind of code is inappropriate here. Please find the right place to put this code. Would simply putting it to HTMLAnchorElement::parseMappedAttribute() work?

+void Document::initDNSPrefetchEnabled()
+{

Maybe just call it initDNSPrefetch()?

+    m_haveExplicitlyDisabledDNSPrefetch = false;
+    m_isDNSPrefetchEnabled = (securityOrigin()->protocol() == "http");

No need for parenthesis here.

+    // Inherit DNS prefetch opt-out from parent frame    
+    if (Document* parent = parentDocument())
+        if (!parent->isDNSPrefetchEnabled())
+            m_isDNSPrefetchEnabled = false;

Outer if needs curly brackets according to the coding style.

+void Document::setDNSPrefetchControl(const String& dnsPrefetchControl)
+{

This is not really a setter, how about parseDNSPrefetchControlHeader() or something?

+    if (equalIgnoringCase(dnsPrefetchControl, "on") && 
+        !m_haveExplicitlyDisabledDNSPrefetch) {

can be single line

+        m_isDNSPrefetchEnabled = true;
+        return;
+    }
+
+    m_isDNSPrefetchEnabled = false;
+    m_haveExplicitlyDisabledDNSPrefetch = true;
+}
+#endif
+

+    void setDNSPrefetchControl(const WebCore::String&);

No need for WebCore:: prefix.

@@ -111,6 +111,7 @@ protected:
     bool m_alternate;
     bool m_isStyleSheet;
     bool m_isIcon;
+    bool m_isDnsPrefetch;
     bool m_createdByParser;
 };

m_isDNSPrefetch to match coding style.
------- Comment #3 From 2008-09-15 12:57:04 PST -------
Created an attachment (id=23445) [details]
Updated patch with comments from Antti

(In reply to comment #2)
> Could this be tested with HTTP layout test (at least for code coverage)?

Good idea. I've added one. 

> +        If ENABLE(DNS_PREFETCH) is true, notify the FrameLoaderClient 
> +        about hyperlinks and <link rel="dns-prefetch"> tags that are
> +        encountered so that it can speculatively resolve hostnames.
> 
> I would prefer not to have a feature define for this. Having it enabled on
> platforms that don't support it looks harmless and avoiding configuration
> proliferation is a good thing.

Ok. I've removed the feature define.

> +#if ENABLE(DNS_PREFETCH)
> +#include "FrameLoaderClient.h"
> +#endif
> 
> No need to make this conditional.

Ok. This include isn't used in Document.cpp any more, so I just removed it.

>  void Document::updateFocusAppearanceSoon()
> @@ -4188,6 +4202,18 @@ unsigned Document::visitedLinkHash(const
> 
>      bool hasColonSlashSlash = containsColonSlashSlash(characters, length);
> 
> +#if ENABLE(DNS_PREFETCH)
> +    String prefetchHost;
> +    if (hasColonSlashSlash) {
> +        KURL parsedURL(attributeURL, m_baseURL);
> +        prefetchHost = parsedURL.host();
> +    } else
> +        prefetchHost = m_baseURL.host();
> +
> +    if (frame() && (m_isDNSPrefetchEnabled || prefetchHost ==
> m_securityOrigin->host()))
> +        frame()->loader()->client()->prefetchDNS(prefetchHost);
> +#endif
> 
> This function is for calculating a hash and this kind of code is inappropriate
> here. Please find the right place to put this code. Would simply putting it to
> HTMLAnchorElement::parseMappedAttribute() work?

Ok, I moved it.

> +void Document::initDNSPrefetchEnabled()
> +{
> 
> Maybe just call it initDNSPrefetch()?

Ok. I renamed it to initDNSPrefetch().

> +    m_haveExplicitlyDisabledDNSPrefetch = false;
> +    m_isDNSPrefetchEnabled = (securityOrigin()->protocol() == "http");
> 
> No need for parenthesis here.

Fixed.

> +    // Inherit DNS prefetch opt-out from parent frame    
> +    if (Document* parent = parentDocument())
> +        if (!parent->isDNSPrefetchEnabled())
> +            m_isDNSPrefetchEnabled = false;
> 
> Outer if needs curly brackets according to the coding style.

Fixed.

> +void Document::setDNSPrefetchControl(const String& dnsPrefetchControl)
> +{
> 
> This is not really a setter, how about parseDNSPrefetchControlHeader() or
> something?

Ok. I've renamed it to parseDNSPrefetchControlHeader().

> +    if (equalIgnoringCase(dnsPrefetchControl, "on") && 
> +        !m_haveExplicitlyDisabledDNSPrefetch) {
> 
> can be single line

Fixed.

> +        m_isDNSPrefetchEnabled = true;
> +        return;
> +    }
> +
> +    m_isDNSPrefetchEnabled = false;
> +    m_haveExplicitlyDisabledDNSPrefetch = true;
> +}
> +#endif
> +
> 
> +    void setDNSPrefetchControl(const WebCore::String&);
> 
> No need for WebCore:: prefix.

Fixed.

> @@ -111,6 +111,7 @@ protected:
>      bool m_alternate;
>      bool m_isStyleSheet;
>      bool m_isIcon;
> +    bool m_isDnsPrefetch;
>      bool m_createdByParser;
>  };
> 
> m_isDNSPrefetch to match coding style.

Fixed.
------- Comment #4 From 2008-09-15 23:24:50 PST -------
I hope there will be a thorough analysis of the security, privacy, and performance implications of this before it's implemented. A few thoughts/questions:

1) I'm reading a page with a discussion of DNS cache poisoning, and the page contains, as an example, a link to a domain whose authoritative DNS server has been poisoned. Any problems there?

2) I'm reading a local html that contains links. Will that cause network access?

3) I innocently navigate to a page that happens to have a link to a porn web site. Perhaps it's a page of search results, or it's a link in some web advertising. Now my ISP has a record of me asking their DNS servers to resolve a porn web site's domain.

4) I navigate to a page with a very large number of links from different domains, e.g., a page of search results. Will that cause hundreds of DNS requests?

Admittedly, I don't know much about DNS prefetching, so maybe I'm misunderstanding the whole process, but I think these are the concerns people would have.
------- Comment #5 From 2008-09-16 11:56:46 PST -------
Created an attachment (id=23479) [details]
Updated patch with improved layout test

I changed the layout test to remove some redundant cases and replace them with different ones.
------- Comment #6 From 2008-09-17 00:11:50 PST -------
I don't understand the addition of FrameLoaderClient::didParseAnchor(). Why not figure out the domain in WebCore and just invoke prefetchDNS() for links too like the first patch revision did?

Not really important for the first version but do you think the WebCore should track which domains have already been prefetched for the current document? It needs to be done in some level of the stack anyway and doing it early might be faster. 

As a minimal limiting measure it could avoid making prefetch callbacks for the domain of the current document (like simple relative URLs) since that is very likely already cached.
------- Comment #7 From 2008-09-17 00:52:00 PST -------
(In reply to comment #6)
> I don't understand the addition of FrameLoaderClient::didParseAnchor(). Why not
> figure out the domain in WebCore and just invoke prefetchDNS() for links too
> like the first patch revision did?

The reason why I did it this way is that there is a "poor man's completeURL" in Document.cpp (lines 4080-4232) that tries to extract various features of hyperlinks that are encountered while parsing the page without calling the KURL constructor. According to a comment, it's "faster and less memory consumption" to do it this way. If we call the KURL constuctor for every hyperlink to figure out the domain we'd probably want to delete the "poor man's completeURL" since it's no longer providing any performance benefit; the URL has already been fully parsed. By passing the anchor's href and document off to the WebKit client we can let the client parse out the hostname on a separate thread if necessary. Also, clients that choose not to support DNS prefetch won't need to call the KURL constructor at all.

> Not really important for the first version but do you think the WebCore should
> track which domains have already been prefetched for the current document? It
> needs to be done in some level of the stack anyway and doing it early might be
> faster. 

The client may be running multiple instances of WebCore, and thus has a more complete picture of what's going on. I think that's a better place to prune duplicates.

> As a minimal limiting measure it could avoid making prefetch callbacks for the
> domain of the current document (like simple relative URLs) since that is very
> likely already cached.

It would be easy to implement this policy using the FrameLoaderClient API provided. However, many servers use short DNS TTLs so it turns out that re-warming the current domain when a new hyperlink is created can often be helpful. Different clients may have different network stacks and thus may have different desired prefetch behavior, so I think it's ok to leave rate limiting up to the client.
------- Comment #8 From 2008-09-17 02:46:22 PST -------
(In reply to comment #7)
> (In reply to comment #6)
> > I don't understand the addition of FrameLoaderClient::didParseAnchor(). Why not
> > figure out the domain in WebCore and just invoke prefetchDNS() for links too
> > like the first patch revision did?
> 
> The reason why I did it this way is that there is a "poor man's completeURL" in
> Document.cpp (lines 4080-4232) that tries to extract various features of
> hyperlinks that are encountered while parsing the page without calling the KURL
> constructor. According to a comment, it's "faster and less memory consumption"
> to do it this way. If we call the KURL constuctor for every hyperlink to figure
> out the domain we'd probably want to delete the "poor man's completeURL" since
> it's no longer providing any performance benefit; the URL has already been
> fully parsed. By passing the anchor's href and document off to the WebKit
> client we can let the client parse out the hostname on a separate thread if
> necessary. Also, clients that choose not to support DNS prefetch won't need to
> call the KURL constructor at all.

I don't think it is good to have two callbacks that do essentially the same thing. Also didParseAnchor() tells nothing about what should be implemented there where as prefetchDNS() is a good name.

Document::visitedLinkHash() is a hash function used by style selection and can get invoked interactively and frequently. It does not need full URL parsing either. In contrast link attribute parsing is done (mostly) only once during document parsing.

I think relative link URLs (or actually anything that does not start with http:// or https://) can be ignored since the domain is the same as the current document's. That would help cutting out many links on many pages. At some level you will need to parse the URL enough to figure out the domain anyway. KURL isn't optimal but it has not turned out to be significant bottleneck so far either, I wouldn't worry too much about that. Faster implementations are of course welcome.

It is generally good to push as much logic as possible to WebCore level where it can be shared between implementations.

> > Not really important for the first version but do you think the WebCore should
> > track which domains have already been prefetched for the current document? It
> > needs to be done in some level of the stack anyway and doing it early might be
> > faster. 
> 
> The client may be running multiple instances of WebCore, and thus has a more
> complete picture of what's going on. I think that's a better place to prune
> duplicates.

Wouldn't it be even more important in that case to cut down unnecessary ipc traffic? Pruning duplicates here does not prevent pruning them more later. In any case, this can wait.

> It would be easy to implement this policy using the FrameLoaderClient API
> provided. However, many servers use short DNS TTLs so it turns out that
> re-warming the current domain when a new hyperlink is created can often be
> helpful. Different clients may have different network stacks and thus may have
> different desired prefetch behavior, so I think it's ok to leave rate limiting
> up to the client.

Right, but with current code the prefetching is anyway only invoked when the document is first parsed (at which point the current document domain is certainly in cache). Short TTL would expire all prefetched DSNs on the current page. In any case it is not necessary to make the prefetch call repeatedly for the current domain for every relative link in the document, once should be enough.
------- Comment #9 From 2008-09-17 11:08:59 PST -------
Created an attachment (id=23504) [details]
Updated patch more comments from Antti

(In reply to comment #8)
> I don't think it is good to have two callbacks that do essentially the same
> thing. Also didParseAnchor() tells nothing about what should be implemented
> there where as prefetchDNS() is a good name.

Ok, I got rid of didParseAnchor.

> I think relative link URLs (or actually anything that does not start with
> http:// or https://) can be ignored since the domain is the same as the current
> document's. That would help cutting out many links on many pages. 

Ok, I've implemented this suggestion. I also added a check for scheme-relative URLs (starting with "//"), since they are used heavily on certain sites (e.g. Slashdot).
------- Comment #10 From 2008-09-17 12:51:37 PST -------
(From update of attachment 23504 [details])
Looks good, r=me
------- Comment #11 From 2008-09-17 19:42:38 PST -------
Why is the prefetching proper handled at the FrameLoaderClient level?  This pushes the actual prefetching into WebKit when it should really live in WebCore.   The policy of whether to prefetch a given URL could still be left up to FrameLoaderClient.
------- Comment #12 From 2008-09-18 01:18:39 PST -------
(From update of attachment 23504 [details])
Clearing the review flag while I work on this some more.
------- Comment #13 From 2008-09-18 03:55:04 PST -------
Added Stephanie to Cc as this should probably be run through the who range of page loading perf tests before getting landed.
------- Comment #14 From 2008-09-18 11:57:00 PST -------
Created an attachment (id=23534) [details]
Updated patch with comments from Mark Rowe

Here's a new version that puts the responsibility of prefetching in WebCore. Mark, can you please take a look?
------- Comment #15 From 2008-09-18 12:36:44 PST -------
(From update of attachment 23534 [details])
 281             if (value.startsWith("http:") || value.startsWith("https:") || value.startsWith("//")) {
 282                 prefetchDNS(document()->completeURL(value).host());
 283             }

You have an unnecessary set of braces around this if statement.

There's no need for a separate DNSMac.cpp and DNSCFNet.cpp -- both Mac and Windows will probably use CFHost for prefetching.

The header guard on DNS.h is named DNSPrefetch_h, which doesn't match the file name.

You'll also need to stub out prefetchDNS for other platforms in order to not break Qt, Gtk and wx builds.  Adding to their TemporaryLinkStubs.cpp file is probably the right approach for those ports.

Unless I'm not understanding the test, it doesn't appear to actually test whether DNS prefetching has worked nor whether parsing the relevant headers and attributes were parsed, only that it does not crash while parsing.  The output from DRT could be a little clearer about this as it currently says "Browsers MAY do a DNS prefetch on the following links:" then doesn't list any links.  There are also coding style issues in the JavaScript, such as braces around one-line if statements.

r=me if you address these issues.
------- Comment #16 From 2008-09-18 12:39:30 PST -------
(From update of attachment 23534 [details])
+            if (value.startsWith("http:") || value.startsWith("https:") || value.startsWith("//")) {

The protocolIs function (it's in KURL.h, but it works on a String) should be used for these types of checks. It's efficient and does a case-folding comparison.

Unless there's a reason to not use it here?
------- Comment #17 From 2008-09-18 14:28:33 PST -------
Created an attachment (id=23538) [details]
Updated patch with more comments from Mark Rowe

Stephanie, could you please run the page loading performance tests on this version of the patch?
------- Comment #18 From 2008-09-18 19:01:36 PST -------
PLT numbers are good
------- Comment #19 From 2008-09-19 00:07:02 PST -------
Landed in r36650 (with some minor fixes in r36654 and r36658).