Bug 162811 - Update Resource Load Statistics
Summary: Update Resource Load Statistics
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKit2 (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: John Wilander
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-09-30 14:40 PDT by John Wilander
Modified: 2016-10-06 10:42 PDT (History)
5 users (show)

See Also:


Attachments
Patch (27.84 KB, patch)
2016-09-30 15:14 PDT, John Wilander
no flags Details | Formatted Diff | Diff
Patch (28.03 KB, patch)
2016-10-03 17:34 PDT, John Wilander
no flags Details | Formatted Diff | Diff
Patch (28.20 KB, patch)
2016-10-03 18:09 PDT, John Wilander
no flags Details | Formatted Diff | Diff
Patch (30.06 KB, patch)
2016-10-05 10:50 PDT, John Wilander
no flags Details | Formatted Diff | Diff
Patch (28.98 KB, patch)
2016-10-05 12:37 PDT, John Wilander
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description John Wilander 2016-09-30 14:40:18 PDT
Update resource load statistics. Use the Public Suffix list, capture web sockets statistics, capture synchronous XHR requests, skip main resource loads done in SubresourceLoader, and clear data records.
Comment 1 John Wilander 2016-09-30 15:14:33 PDT
Created attachment 290395 [details]
Patch
Comment 2 Andy Estes 2016-09-30 17:00:44 PDT
Comment on attachment 290395 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=290395&action=review

> Source/WebCore/Modules/websockets/WebSocket.cpp:323
> +        } else {
> +            ResourceLoadObserver::sharedObserver().logWebSocketLoading(document.frame(), m_url);

Our style is to omit braces for single-line control clauses.

> Source/WebCore/loader/ResourceLoadObserver.cpp:61
> +static inline bool is3xxRedirect(const ResourceResponse& response)

Can you use ResourceResponse::isRedirected() instead of this function?

> Source/WebCore/loader/ResourceLoadObserver.cpp:63
> +    return !response.isNull() && response.httpStatusCode() / 100 == 3;

Probably ok to omit the isNull() check. Null responses should have a status code of 0, and 0 / 100 == 0.

> Source/WebCore/loader/ResourceLoadObserver.cpp:66
> +static inline bool shouldLog(bool usesEphemeralSession, RefPtr<ResourceLoadStatisticsStore> store)

Passing a RefPtr by value here causes unnecessary ref count churn.

This would make more sense a private member function that takes a Page* as an argument, since you can derive usesEphemeralSession from a Page, and can access m_store from 'this'.

> Source/WebCore/loader/ResourceLoadObserver.cpp:163
> +    if (frame && frame->page() && frame->page()->usesEphemeralSession())

You assert that frame and frame->page() are non-null above, so you don't need to check for that here.

> Source/WebCore/loader/ResourceLoadStatisticsStore.h:67
> +    WEBCORE_EXPORT Vector<String> getPrevalentResourceDomainsWithoutUserInteraction();

Our style is to omit 'get' from getters. This should be called 'prevalentResourceDomainsWithoutUserInteraction()'.

> Source/WebKit2/UIProcess/WebResourceLoadStatisticsStore.cpp:47
> +static const double numberOfSecondsBetweenClearingDataRecords = 600.0;
> +static const double featureVectorLengthThreshold = 3.0;

Our style is to omit the '.0' from literals.

Do these really need to be doubles? Seems like auto would work fine here.

> Source/WebKit2/UIProcess/WebResourceLoadStatisticsStore.cpp:69
> +    unsigned featureOne = resourceStatistic.subresourceUnderTopFrameOrigins.size();
> +    unsigned featureTwo = resourceStatistic.subresourceUniqueRedirectsTo.size();
> +    unsigned featureThree = resourceStatistic.subframeUnderTopFrameOrigins.size();

I'd use auto instead of unsigned. Also, 'feature{One,Two,Three}' seem like poor names. I'd call these subresourceUnderTopFrameOriginsCount, etc.

> Source/WebKit2/UIProcess/WebResourceLoadStatisticsStore.cpp:79
> +    double vectorLength = 0.0;

Our style is to omit the '.0' from literals.

> Source/WebKit2/UIProcess/WebResourceLoadStatisticsStore.cpp:86
> +    vectorLength += pow(featureOne, 2);
> +    vectorLength += pow(featureTwo, 2);
> +    vectorLength += pow(featureThree, 2);
> +
> +    ASSERT(vectorLength > 0.0);
> +
> +    return sqrt(vectorLength) > featureVectorLengthThreshold;

This could use a 'why' comment.

> Source/WebKit2/UIProcess/WebResourceLoadStatisticsStore.cpp:96
> +void WebResourceLoadStatisticsStore::clearDataRecords(Vector<String> prevalentResources)

Passing a Vector by value doesn't seem like the right thing to do here. From looking at this function's caller, I think you want to pass an rvalue reference (Vector<String>&&).

> Source/WebKit2/UIProcess/WebResourceLoadStatisticsStore.cpp:114
> +    RunLoop::main().dispatch([prevalentResources, webResourceLoadStatisticsStore = this] {

Capturing prevalentResources by value makes a copy. This can be avoided by using move capture: [prevalentResources = WTFMove(prevalentResources), ...]. Also, I think you can just capture 'this' instead of assigning it to a new variable.

> Source/WebKit2/UIProcess/WebResourceLoadStatisticsStore.cpp:117
> +        websiteDataStore.fetchData(WebsiteDataType::Cookies, { }, [prevalentResources, webResourceLoadStatisticsStore](auto websiteDataRecords) {

Ditto.

> Source/WebKit2/UIProcess/WebResourceLoadStatisticsStore.cpp:132
> +            websiteDataStore.removeData(WebsiteDataType::Cookies, { WTFMove(dataRecords) }, [webResourceLoadStatisticsStore] {

Ditto regarding the 'this' capture issue.

> Source/WebKit2/UIProcess/WebResourceLoadStatisticsStore.cpp:147
> +            Vector<String> prevalentResources;
> +            webResourceLoadStatisticsStore->classifyResource(resourceStatistic);
> +            prevalentResources = webResourceLoadStatisticsStore->coreStore().getPrevalentResourceDomainsWithoutUserInteraction();

Instead of default-constructing prevalentResources then assigning it a new value, you should construct it with the return value of getPrevalentResourceDomainsWithoutUserInteraction() to take advantage of the return value optimization.

> Source/WebKit2/UIProcess/WebResourceLoadStatisticsStore.cpp:149
> +            if (prevalentResources.size() > 0)
> +                webResourceLoadStatisticsStore->clearDataRecords(prevalentResources);

I'd move the 'prevalentResources.size() > 0' check into WebResourceLoadStatisticsStore::clearDataRecords(), then either WTFMove() prevalentResources or just move the call to getPrevalentResourceDomainsWithoutUserInteraction() into this statement.

> Source/WebKit2/UIProcess/WebResourceLoadStatisticsStore.h:92
> +    double m_lastTimeDataRecordsWereCleared { 0.0 };

Our style is to omit the '.0' from literals.
Comment 3 Andy Estes 2016-09-30 17:08:57 PDT
Comment on attachment 290395 [details]
Patch

r- due to the EWS failures.
Comment 4 Andy Estes 2016-09-30 17:22:32 PDT
Comment on attachment 290395 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=290395&action=review

> Source/WebKit2/UIProcess/WebResourceLoadStatisticsStore.cpp:122
> +                    if (websiteDataRecord.displayName == prevalentResource || websiteDataRecord.displayName.endsWith("." + prevalentResource))
> +                        dataRecords.append(websiteDataRecord);

Do websiteDataRecord.displayName and prevalentResource represent hostnames? If so, what if the names differ in case but are otherwise equal?

It's not great to do two separate string comparisons here. You could do something like this instead:

    if (websiteDataRecord.displayName.endsWith(prevalentResource)) { // should we use endsWithIgnoringASCIICase() instead?
        unsigned suffixStart = websiteDataRecord.displayName.length() - prevalentResource.length();
        if (suffixStart == 0 || websiteDataRecord.displayName[suffixStart - 1] == '.')
            dataRecords.append(websiteDataRecord);
    }
Comment 5 Andy Estes 2016-09-30 17:25:44 PDT
(In reply to comment #4)
> Comment on attachment 290395 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=290395&action=review
> 
> > Source/WebKit2/UIProcess/WebResourceLoadStatisticsStore.cpp:122
> > +                    if (websiteDataRecord.displayName == prevalentResource || websiteDataRecord.displayName.endsWith("." + prevalentResource))
> > +                        dataRecords.append(websiteDataRecord);
> 
> Do websiteDataRecord.displayName and prevalentResource represent hostnames?
> If so, what if the names differ in case but are otherwise equal?
> 
> It's not great to do two separate string comparisons here. You could do
> something like this instead:
> 
>     if (websiteDataRecord.displayName.endsWith(prevalentResource)) { //
> should we use endsWithIgnoringASCIICase() instead?
>         unsigned suffixStart = websiteDataRecord.displayName.length() -
> prevalentResource.length();
>         if (suffixStart == 0 || websiteDataRecord.displayName[suffixStart -
> 1] == '.')
>             dataRecords.append(websiteDataRecord);
>     }

Note that we already do this for computing cookie partitions, so there is a similar function in NetworkStorageSessionCFNet.cpp: static inline bool hostIsInDomain(). I wonder if we should move that somewhere where it can be reused.
Comment 6 John Wilander 2016-10-03 17:34:02 PDT
Created attachment 290541 [details]
Patch
Comment 7 John Wilander 2016-10-03 17:38:34 PDT
Thanks for great review comments, Andy!

I picked up all of it except:

1. ResourceResponse::isRedirected(). I did see this function and considered it. But I want to capture "isRedirecting" which is the 3xx response whereas "isRedirected" I believe is a response that has already been redirected. I might be wrong. We should figure it out and make sure the naming is correct. Maybe also introduce ResourceResponse::isRedirecting().

2. hostIsInDomain(). I agree this would be nice to have as a shared function.
Comment 8 John Wilander 2016-10-03 18:09:47 PDT
Created attachment 290542 [details]
Patch
Comment 9 John Wilander 2016-10-03 18:10:36 PDT
Fixed a failing test case where a web socket was constructed in a detached frame.
Comment 10 John Wilander 2016-10-03 19:03:53 PDT
r206763 seems to be failing on Windows.
Comment 11 John Wilander 2016-10-05 10:50:24 PDT
Created attachment 290725 [details]
Patch
Comment 12 John Wilander 2016-10-05 10:51:09 PDT
Fixed Windows build error.
Comment 13 Alex Christensen 2016-10-05 12:10:22 PDT
Comment on attachment 290725 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=290725&action=review

> Source/WebCore/ChangeLog:9
> +        No new tests. The counting is based on top privately owned domains
> +        which currently is not supported by layout tests nor API tests.

We need to make a way to synthesize things to test this.  We can't have such a large feature completely untested.

> Source/WebCore/loader/ResourceLoadObserver.cpp:64
> +static inline bool is3xxRedirect(const ResourceResponse& response)
>  {
> -    if (!Settings::resourceLoadStatisticsEnabled())
> -        return;
> +    return response.httpStatusCode() / 100 == 3;
> +}

I'd prefer range checks >= 300 && <= 399
Do we want to support unknown status codes, like 366 (which has no defined meaning)?

> Source/WebCore/loader/ResourceLoadObserver.cpp:68
> +    // Err on the safe side until we have sorted out what to do in worker contexts

This should be a FIXME.  Workers should have started with something that was once connected to a page.  We'll probably need to pipe something refcounted around like I did in https://trac.webkit.org/changeset/202930

> Source/WebCore/loader/ResourceLoadObserver.cpp:231
> +    const URL& mainFrameURL = frame ? frame->mainFrame().document()->url() : URL();

We already did an early return if frame were null.

> Source/WebCore/loader/ResourceLoadObserver.cpp:278
>  String ResourceLoadObserver::primaryDomain(const URL& url)

This logic ought to be in URL and it should definitely have unit tests.  It should just return a substring of URL.m_string instead of using a StringBuilder to reconstruct a String we've split apart.

> Source/WebKit2/UIProcess/WebResourceLoadStatisticsStore.cpp:88
> +    vectorLength += pow(subresourceUnderTopFrameOriginsCount, 2);
> +    vectorLength += pow(subresourceUniqueRedirectsToCount, 2);
> +    vectorLength += pow(subframeUnderTopFrameOriginsCount, 2);

Don't use pow(x, 2).  Just use x * x.
Comment 14 John Wilander 2016-10-05 12:37:18 PDT
Created attachment 290740 [details]
Patch
Comment 15 John Wilander 2016-10-05 12:42:10 PDT
Thanks for the review, Alex! Fixed all of it. Some comments inline below.

(In reply to comment #13)
> Comment on attachment 290725 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=290725&action=review
> 
> > Source/WebCore/ChangeLog:9
> > +        No new tests. The counting is based on top privately owned domains
> > +        which currently is not supported by layout tests nor API tests.
> 
> We need to make a way to synthesize things to test this.  We can't have such
> a large feature completely untested.

I agree fully and I do have a plan.

> > Source/WebCore/loader/ResourceLoadObserver.cpp:64
> > +static inline bool is3xxRedirect(const ResourceResponse& response)
> >  {
> > -    if (!Settings::resourceLoadStatisticsEnabled())
> > -        return;
> > +    return response.httpStatusCode() / 100 == 3;
> > +}
> 
> I'd prefer range checks >= 300 && <= 399

Fixed.

> Do we want to support unknown status codes, like 366 (which has no defined
> meaning)?

I believe our network stack handles all of HTTP 300-399 as redirects. That's why I did it this way.

> > Source/WebCore/loader/ResourceLoadObserver.cpp:68
> > +    // Err on the safe side until we have sorted out what to do in worker contexts
> 
> This should be a FIXME.  Workers should have started with something that was
> once connected to a page.  We'll probably need to pipe something refcounted
> around like I did in https://trac.webkit.org/changeset/202930

Fixed.

> > Source/WebCore/loader/ResourceLoadObserver.cpp:231
> > +    const URL& mainFrameURL = frame ? frame->mainFrame().document()->url() : URL();
> 
> We already did an early return if frame were null.

Fixed.

> > Source/WebCore/loader/ResourceLoadObserver.cpp:278
> >  String ResourceLoadObserver::primaryDomain(const URL& url)
> 
> This logic ought to be in URL and it should definitely have unit tests.  It
> should just return a substring of URL.m_string instead of using a
> StringBuilder to reconstruct a String we've split apart.

Although that code didn't really change with this patch I agree it should have tests. So I removed it instead. WebKit ports without public suffix list support can decide how to go about this if they want to use the feature.

> > Source/WebKit2/UIProcess/WebResourceLoadStatisticsStore.cpp:88
> > +    vectorLength += pow(subresourceUnderTopFrameOriginsCount, 2);
> > +    vectorLength += pow(subresourceUniqueRedirectsToCount, 2);
> > +    vectorLength += pow(subframeUnderTopFrameOriginsCount, 2);
> 
> Don't use pow(x, 2).  Just use x * x.

Fixed.
Comment 16 Alex Christensen 2016-10-06 09:21:11 PDT
Comment on attachment 290740 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=290740&action=review

> Source/WebKit2/UIProcess/WebResourceLoadStatisticsStore.cpp:85
> +    // Vector length for n dimensions is sqrt(a^2 + (...) + n^2).
> +    double vectorLength = 0;

I'm not sure this comment is necessary.  I think it would be obvious if we did something like this:
double vectorLength = sqrt(
    a * a
    + b * b 
    + c * c);
Comment 17 John Wilander 2016-10-06 10:19:51 PDT
Comment on attachment 290740 [details]
Patch

Thanks, Alex! I'll revisit the vector bit in a subsequent patch since we have more logic to add there anyway.
Comment 18 WebKit Commit Bot 2016-10-06 10:42:43 PDT
Comment on attachment 290740 [details]
Patch

Clearing flags on attachment: 290740

Committed r206869: <http://trac.webkit.org/changeset/206869>
Comment 19 WebKit Commit Bot 2016-10-06 10:42:47 PDT
All reviewed patches have been landed.  Closing bug.