Bug 17480 - Implement speculative preloading
Summary: Implement speculative preloading
Alias: None
Product: WebKit
Classification: Unclassified
Component: Page Loading (show other bugs)
Version: 528+ (Nightly build)
Hardware: All OS X 10.5
: P2 Enhancement
Assignee: Antti Koivisto
Depends on:
Reported: 2008-02-21 15:28 PST by Antti Koivisto
Modified: 2008-03-13 14:33 PDT (History)
5 users (show)

See Also:

preliminary patch (64.87 KB, patch)
2008-02-21 18:18 PST, Antti Koivisto
no flags Details | Formatted Diff | Diff
updated patch (66.96 KB, patch)
2008-02-26 23:47 PST, Antti Koivisto
no flags Details | Formatted Diff | Diff
another updated patch (69.28 KB, patch)
2008-03-07 00:13 PST, Antti Koivisto
no flags Details | Formatted Diff | Diff
yet another updated patch (70.05 KB, patch)
2008-03-11 17:19 PDT, Antti Koivisto
no flags Details | Formatted Diff | Diff
Darin's comments etc. (79.87 KB, patch)
2008-03-12 00:25 PDT, Antti Koivisto
darin: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Antti Koivisto 2008-02-21 15:28:26 PST
Currently when the parser encounters an external script resource it halts parsing and continues only after the resource has been received from the network. A problem with this is that we are not issuing any more load requests to the network layer. Loading performance depends heavily on parallel loading to hide network latency. With parser halted we quickly run out of things to load leading to poor network bandwidth utilizations.

This effect can be can reduced by introducing a side parser that parses the document source after the resource that halted the main parse. It searches for resources that are likely to be needed to complete the load and issues requests for them. This also makes it possible to schedule script and stylesheet requests (which block rendering) before any image requests.
Comment 1 Antti Koivisto 2008-02-21 18:18:33 PST
Created attachment 19269 [details]
preliminary patch

The patch implements speculative preloading. It also implements reordering of resource loads to schedule scripts and stylesheets first.
Comment 2 Alexey Proskuryakov 2008-02-21 21:42:44 PST
See also: bug 7722, bug 11838, bug 17040.
Comment 3 Antti Koivisto 2008-02-26 23:47:10 PST
Created attachment 19398 [details]
updated patch

Added code to track how well preloads are utilized and debug output that shows some hit rate statistics after page load. Some minor bug fixes too and made it apply to current ToT.
Comment 4 Antti Koivisto 2008-03-07 00:13:29 PST
Created attachment 19580 [details]
another updated patch

- maintain preloads in DocLoader, not HTMLPreloadTokenizer 
- preparse document.write() output, a major performance win in some cases
- fix reversed load ordering
Comment 5 Antti Koivisto 2008-03-11 17:19:02 PDT
Created attachment 19691 [details]
yet another updated patch

Some cleanups and more optimal loading for file: protocol. Performance looks ok in PLT too.

Flipping the review bit to troll for comments.
Comment 6 Darin Adler 2008-03-11 17:32:23 PDT
Comment on attachment 19691 [details]
yet another updated patch

The basic structure of the code looks really good!

+// FIXME main tokenizer includes this too, are we getting two copies of the code and data?

We are! We should fix that.

I am sad to see so much of the HTML tokenizing/parsing code copied and pasted, but I don't have a better suggestion.

+    class HTMLPreloadTokenizer : Noncopyable
+    {

Please move brace up.

+        HTMLPreloadTokenizer(Document* doc);

Please omit the name "doc" here.

+        void emitCharacter(UChar c);
+        void tokenizeCSS(UChar c);

And the "c" here.

+    , m_preloadTokenizer(0)

Not needed. OwnPtr gets initialized to 0 automatically.

     bool referenced() const { return !m_clients.isEmpty(); }
+    enum PreloadResult {

I think a blank line here would make this read better.

+#include "CString.h"

+#include "CString.h"

I think it's OK to include this header unconditionally in these places.

+    void registerPreload(CachedResource* resource);

No need for the word resource here.

+// Match the parallel connection count used by the networking layer
+// FIXME should not hardcode something like this
+const unsigned maxRequestsInFlightPerHost = 4;
+// Having a limit might still help getting more important resources first
+const unsigned maxRequestsInFlightForNonHTTPProtocols= 20;

These should be "static const" rather than just "const".

+        case CachedResource::ImageResource:
+            return Low;
+        default:
+            ASSERT_NOT_REACHED();
+            return Low;

Moving the default case out of hte switch will cause us to get a warning if there are any unhandled cases.

+    KURL url(resource->url());

If you use a const KURL& you can save a little refcount thrashing.

+    bool isHTTP = url.protocol() == "http" || url.protocol() == "https";

This should use the new protocolIs function. For one thing, it makes us not be case sensitive. And it's also more efficient.

+    if (priority > Low || !isHTTP)
+        // Try to request important resources immediately
+        host->servePendingRequests(priority);
+    else
+        // Handle asynchronously so early low priority requests don't get scheduled before later high priority ones
+        scheduleServePendingRequests();

Please put braces around these multiline bodies, even though they are one statement (the comma makes it two lines).

+        enum Priority {
+            Low, Medium, High
+        };

I'd like this better all on a single line.

+        Priority determinePriority(const CachedResource* resource) const;

No need for the word resource here.

+            void cancelPendingRequests(RequestQueue& requestsPending, DocLoader* docLoader);

No need for the word "docLoader" here.
Comment 7 Antti Koivisto 2008-03-12 00:25:49 PDT
Created attachment 19694 [details]
Darin's comments etc.

- Implemented Darin's comments
- Renamed the preloader class to PreloadScanner as suggested by Mitz
- ChangeLogs
- Some other minor tweaks
Comment 8 Darin Adler 2008-03-13 10:18:48 PDT
Comment on attachment 19694 [details]
Darin's comments etc.

+// FIXME: temporary, add PreloadScanner to all build files

I think this should go in the header, not the .cpp file.

+            // printf("DOCUMENT.WRITE PRELOAD\n");

Please don't check in commented-out printf.

+#include "config.h"
+#include "PreloadScanner.h"

Normally we don't leave a blank line there.

+    // FIXME there is a table for more exceptions in the spec

We'd normally use a colon and use a sentence (capital first letter, period) in a comment like this one.

It would be good to say which specification you're talking about. In general it seems that a comment saying that this was derived from the HTML 5 specification and which draft you used would be helpful.

+#define PreloadScanner_h
+#include "AtomicString.h"
+#include "PlatformString.h"
+#include "SegmentedString.h"

No need to include PlatformString.h here.

+#include <wtf/ListHashSet.h>
+#include <wtf/Noncopyable.h>
+#include <wtf/OwnPtr.h>

No reason to include ListHashSet.h or OwnPtr.h in this header.

+    unsigned m_preloadCount;
+    PreloadResult m_preloadResult;
+    bool m_requestedFromNetworkingLayer;

Why are these protected rather than private? Please make things private unless there is a reason not to.

+static const unsigned maxRequestsInFlightForNonHTTPProtocols= 20;

+static const unsigned maxRequestsInFlightForNonHTTPProtocols= 10000;

Missing space before "=".

+    KURL u(resource->url());
+    printf("HOST %s COUNT %d RECEIVED %s\n", u.host().latin1().data(), m_requestsLoading.size(), resource->url().latin1().data());

I think we need to add easier ways to log things. All this x.latin1().data() and x.utf8().data() gets tiring after a while!

These are all minor editorial comments, so r=me
Comment 9 Mike Belshe 2008-03-13 10:42:29 PDT

I believe in CancelRequests, the ';' at the end of the if statement is a bug:
+        if (host->hasRequests());
+            host->cancelRequests(docLoader);
Comment 10 Mike Belshe 2008-03-13 11:19:32 PDT
In PreloadScanner::tokenize
   if (!match || (!isWhitespace(tmpChar) && tmpChar != '>' & tmpChar != '/')) {

I believe that should be a "&&"
Comment 11 Antti Koivisto 2008-03-13 14:33:48 PDT
r31038 plus fixes to issues spotted by Mike in r31041