Currently when the parser encounters an external script resource it halts parsing and continues only after the resource has been received from the network. A problem with this is that we are not issuing any more load requests to the network layer. Loading performance depends heavily on parallel loading to hide network latency. With parser halted we quickly run out of things to load leading to poor network bandwidth utilizations. This effect can be can reduced by introducing a side parser that parses the document source after the resource that halted the main parse. It searches for resources that are likely to be needed to complete the load and issues requests for them. This also makes it possible to schedule script and stylesheet requests (which block rendering) before any image requests.
Created attachment 19269 [details] preliminary patch The patch implements speculative preloading. It also implements reordering of resource loads to schedule scripts and stylesheets first.
See also: bug 7722, bug 11838, bug 17040.
Created attachment 19398 [details] updated patch Added code to track how well preloads are utilized and debug output that shows some hit rate statistics after page load. Some minor bug fixes too and made it apply to current ToT.
Created attachment 19580 [details] another updated patch - maintain preloads in DocLoader, not HTMLPreloadTokenizer - preparse document.write() output, a major performance win in some cases - fix reversed load ordering
Created attachment 19691 [details] yet another updated patch Some cleanups and more optimal loading for file: protocol. Performance looks ok in PLT too. Flipping the review bit to troll for comments.
Comment on attachment 19691 [details] yet another updated patch The basic structure of the code looks really good! +// FIXME main tokenizer includes this too, are we getting two copies of the code and data? We are! We should fix that. I am sad to see so much of the HTML tokenizing/parsing code copied and pasted, but I don't have a better suggestion. + class HTMLPreloadTokenizer : Noncopyable + { Please move brace up. + HTMLPreloadTokenizer(Document* doc); Please omit the name "doc" here. + void emitCharacter(UChar c); + + void tokenizeCSS(UChar c); And the "c" here. + , m_preloadTokenizer(0) Not needed. OwnPtr gets initialized to 0 automatically. bool referenced() const { return !m_clients.isEmpty(); } + enum PreloadResult { I think a blank line here would make this read better. +#if PRELOAD_DEBUG +#include "CString.h" +#endif +#if REQUEST_DEBUG +#include "CString.h" +#endif I think it's OK to include this header unconditionally in these places. + void registerPreload(CachedResource* resource); No need for the word resource here. +// Match the parallel connection count used by the networking layer +// FIXME should not hardcode something like this +const unsigned maxRequestsInFlightPerHost = 4; +// Having a limit might still help getting more important resources first +const unsigned maxRequestsInFlightForNonHTTPProtocols= 20; These should be "static const" rather than just "const". + case CachedResource::ImageResource: + return Low; + default: + ASSERT_NOT_REACHED(); + return Low; Moving the default case out of hte switch will cause us to get a warning if there are any unhandled cases. + KURL url(resource->url()); If you use a const KURL& you can save a little refcount thrashing. + bool isHTTP = url.protocol() == "http" || url.protocol() == "https"; This should use the new protocolIs function. For one thing, it makes us not be case sensitive. And it's also more efficient. + if (priority > Low || !isHTTP) + // Try to request important resources immediately + host->servePendingRequests(priority); + else + // Handle asynchronously so early low priority requests don't get scheduled before later high priority ones + scheduleServePendingRequests(); Please put braces around these multiline bodies, even though they are one statement (the comma makes it two lines). + enum Priority { + Low, Medium, High + }; I'd like this better all on a single line. + Priority determinePriority(const CachedResource* resource) const; No need for the word resource here. + void cancelPendingRequests(RequestQueue& requestsPending, DocLoader* docLoader); No need for the word "docLoader" here.
Created attachment 19694 [details] Darin's comments etc. - Implemented Darin's comments - Renamed the preloader class to PreloadScanner as suggested by Mitz - ChangeLogs - Some other minor tweaks
Comment on attachment 19694 [details] Darin's comments etc. +// FIXME: temporary, add PreloadScanner to all build files +#if PLATFORM(MAC) +#define PRELOAD_SCANNER_ENABLED 1 +#else +#define PRELOAD_SCANNER_ENABLED 0 +#endif I think this should go in the header, not the .cpp file. + // printf("DOCUMENT.WRITE PRELOAD\n"); Please don't check in commented-out printf. +#include "config.h" + +#include "PreloadScanner.h" Normally we don't leave a blank line there. + // FIXME there is a table for more exceptions in the spec We'd normally use a colon and use a sentence (capital first letter, period) in a comment like this one. It would be good to say which specification you're talking about. In general it seems that a comment saying that this was derived from the HTML 5 specification and which draft you used would be helpful. +#define PreloadScanner_h + +#include "AtomicString.h" +#include "PlatformString.h" +#include "SegmentedString.h" No need to include PlatformString.h here. +#include <wtf/ListHashSet.h> +#include <wtf/Noncopyable.h> +#include <wtf/OwnPtr.h> No reason to include ListHashSet.h or OwnPtr.h in this header. + unsigned m_preloadCount; + PreloadResult m_preloadResult; + bool m_requestedFromNetworkingLayer; Why are these protected rather than private? Please make things private unless there is a reason not to. +static const unsigned maxRequestsInFlightForNonHTTPProtocols= 20; +static const unsigned maxRequestsInFlightForNonHTTPProtocols= 10000; Missing space before "=". +#if REQUEST_DEBUG + KURL u(resource->url()); + printf("HOST %s COUNT %d RECEIVED %s\n", u.host().latin1().data(), m_requestsLoading.size(), resource->url().latin1().data()); +#endif I think we need to add easier ways to log things. All this x.latin1().data() and x.utf8().data() gets tiring after a while! These are all minor editorial comments, so r=me
Note: I believe in CancelRequests, the ';' at the end of the if statement is a bug: + if (host->hasRequests()); + host->cancelRequests(docLoader);
In PreloadScanner::tokenize if (!match || (!isWhitespace(tmpChar) && tmpChar != '>' & tmpChar != '/')) { I believe that should be a "&&"
r31038 plus fixes to issues spotted by Mike in r31041