Bug 174215

Summary: Resource Load Statistics: Prune statistics in order of importance
Product: WebKit Reporter: John Wilander <wilander>
Component: WebKit2Assignee: John Wilander <wilander>
Status: RESOLVED FIXED    
Severity: Normal CC: bfulgham, buildbot, cdumez, dbates, japhet, webkit-bug-importer, wilander
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Patch
none
Patch (Rebaselined)
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch cdumez: review+

Description John Wilander 2017-07-06 12:55:19 PDT
We should cap the size of stored statistics and prune the store in order of importance, less to more.
Comment 1 Radar WebKit Bug Importer 2017-07-06 12:55:49 PDT
<rdar://problem/33164403>
Comment 2 John Wilander 2017-07-06 13:15:48 PDT
Created attachment 314750 [details]
Patch
Comment 3 John Wilander 2017-07-06 13:16:58 PDT
I know there are a couple of whitespace changes that shouldn't be there but I just can't get Xcode to not add them back in so I'll have to fix the files in some other editor before landing.
Comment 4 John Wilander 2017-07-06 13:17:26 PDT
Oh, patch doesn't seem to apply to trunk. Will merge.
Comment 5 John Wilander 2017-07-06 16:02:07 PDT
Created attachment 314767 [details]
Patch
Comment 6 John Wilander 2017-07-06 16:31:20 PDT
Created attachment 314776 [details]
Patch
Comment 7 Chris Dumez 2017-07-06 18:22:58 PDT
Comment on attachment 314776 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=314776&action=review

> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:47
> +static size_t maxStatisticsEntries = 1000;

All those overridable statics are wrong and should be data members. There can be several ResourceLoadStatisticsStore per process at least in theory since there is one store per WebSiteDataStore.
I believe those should be data members.
Comment 8 Chris Dumez 2017-07-06 18:34:19 PDT
Comment on attachment 314776 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=314776&action=review

> Source/WebCore/loader/ResourceLoadStatistics.h:65
> +    WallTime lastSeen { WallTime::fromRawSeconds(-1) };

Do we really need this magic -1 value here? Do we really need to distinguish default value from reset?

> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:295
> +void ResourceLoadStatisticsStore::setMaxStatisticsEntries(size_t entries)

entries is a bad name

> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:300
> +void ResourceLoadStatisticsStore::setPruneEntriesDownTo(size_t entries)

entries is a bad name

> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:454
> +    size_t numberOfEntriesLeftToPrune = m_resourceStatisticsMap.size() - pruneEntriesDownTo;

If this is 0, we're in trouble, based on your implementation of sortAndPrune()
Comment 9 Chris Dumez 2017-07-06 18:53:54 PDT
Comment on attachment 314776 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=314776&action=review

> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:435
> +    std::sort(statisticsToPrune.begin(), statisticsToPrune.end(), [](const StatisticsLastSeen& a, const StatisticsLastSeen& b) {

If statisticsToPrune.size() <= numberOfEntriesToPrune, then sorting is useless, right? We are going to remove ALL the entries in statisticsToPrune in this case, and it does not matter in which order they are removed.

> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:448
> +void ResourceLoadStatisticsStore::pruneStatisticsIfNeeded()

How often is this called?
Comment 10 Chris Dumez 2017-07-06 19:07:24 PDT
Comment on attachment 314776 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=314776&action=review

> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:441
> +        if (!--numberOfEntriesToPrune)

This is bad if numberOfEntriesToPrune is 0.

> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:456
> +    Vector<StatisticsLastSeen> nonPrevalentResourcesWithoutUserInteraction;

Feels like this could be an array of Vectors, then the pruning code could be written as a loop. e.g. roughly 

Vector<StatisticsLastSeen> resourcesToPrunePerImportance[maxImportance + 1];
for (auto& resourceStatistic : m_resourceStatisticsMap.values())
    resourcesToPruneInPriority[computeImportance(resourceStatistic)].append({ resourceStatistic.highLevelDomain, resourceStatistic.lastSeen });

unsigned importance = 0;
while (numberOfEntriesLeftToPrune) {
    ASSERT(importance <= maxImportance);
    numberOfEntriesLeftToPrune = pruneResources(m_resourceStatisticsMap, resourcesToPrunePerImportance[importance++], numberOfEntriesLeftToPrune);
}
Comment 11 Chris Dumez 2017-07-06 19:14:34 PDT
Comment on attachment 314776 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=314776&action=review

> Source/WebCore/loader/ResourceLoadObserver.cpp:82
> +static WallTime reduceTimeResolution(WallTime time)

What's the reason we need to reduce the resolution of the lastSeen timestamp?
Comment 12 Brent Fulgham 2017-07-10 12:45:59 PDT
Comment on attachment 314776 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=314776&action=review

>> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:47
>> +static size_t maxStatisticsEntries = 1000;
> 
> All those overridable statics are wrong and should be data members. There can be several ResourceLoadStatisticsStore per process at least in theory since there is one store per WebSiteDataStore.
> I believe those should be data members.

Will fix.

>> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:295
>> +void ResourceLoadStatisticsStore::setMaxStatisticsEntries(size_t entries)
> 
> entries is a bad name

Will fix.

>> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:300
>> +void ResourceLoadStatisticsStore::setPruneEntriesDownTo(size_t entries)
> 
> entries is a bad name

Will fix.

I also think we should assert if we set "pruneEntriesDownTo" some value that is greater than the maximum number of allowable entries.

>> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:435
>> +    std::sort(statisticsToPrune.begin(), statisticsToPrune.end(), [](const StatisticsLastSeen& a, const StatisticsLastSeen& b) {
> 
> If statisticsToPrune.size() <= numberOfEntriesToPrune, then sorting is useless, right? We are going to remove ALL the entries in statisticsToPrune in this case, and it does not matter in which order they are removed.

Good point -- since at each pruning step we are seeking to remove a smaller number of records, we may hit this routine at a point where we need to remove all, and a later point where we only remove some.

>> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:441
>> +        if (!--numberOfEntriesToPrune)
> 
> This is bad if numberOfEntriesToPrune is 0.

Again, I'm pretty sure we can't enter this function with a non-zero 'numberOfEntriesToPrune', but I added an ASSERT in this loop just to prove we don't do so.

>> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:448
>> +void ResourceLoadStatisticsStore::pruneStatisticsIfNeeded()
> 
> How often is this called?

It looks like this is triggered when the WebProcess completes a load and thinks its found new data to share.

>> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:454
>> +    size_t numberOfEntriesLeftToPrune = m_resourceStatisticsMap.size() - pruneEntriesDownTo;
> 
> If this is 0, we're in trouble, based on your implementation of sortAndPrune()

We only hit this if m_resourceStatisticsMap.size() > the max count, and the pruning target count should always be less than the overall max (by default, by around 200 or so). I don't think we can hit this, but I'll add an assert.

Would you prefer a test and early return?

>> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:456
>> +    Vector<StatisticsLastSeen> nonPrevalentResourcesWithoutUserInteraction;
> 
> Feels like this could be an array of Vectors, then the pruning code could be written as a loop. e.g. roughly 
> 
> Vector<StatisticsLastSeen> resourcesToPrunePerImportance[maxImportance + 1];
> for (auto& resourceStatistic : m_resourceStatisticsMap.values())
>     resourcesToPruneInPriority[computeImportance(resourceStatistic)].append({ resourceStatistic.highLevelDomain, resourceStatistic.lastSeen });
> 
> unsigned importance = 0;
> while (numberOfEntriesLeftToPrune) {
>     ASSERT(importance <= maxImportance);
>     numberOfEntriesLeftToPrune = pruneResources(m_resourceStatisticsMap, resourcesToPrunePerImportance[importance++], numberOfEntriesLeftToPrune);
> }

Seems reasonable.
Comment 13 Brent Fulgham 2017-07-10 13:03:26 PDT
Comment on attachment 314776 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=314776&action=review

>> Source/WebCore/loader/ResourceLoadObserver.cpp:82
>> +static WallTime reduceTimeResolution(WallTime time)
> 
> What's the reason we need to reduce the resolution of the lastSeen timestamp?

I'm not sure, either. I left it in for now, but it seems like this could be omitted if we found it was a hotspot.
Comment 14 Chris Dumez 2017-07-10 13:04:14 PDT
Comment on attachment 314776 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=314776&action=review

>>> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:454
>>> +    size_t numberOfEntriesLeftToPrune = m_resourceStatisticsMap.size() - pruneEntriesDownTo;
>> 
>> If this is 0, we're in trouble, based on your implementation of sortAndPrune()
> 
> We only hit this if m_resourceStatisticsMap.size() > the max count, and the pruning target count should always be less than the overall max (by default, by around 200 or so). I don't think we can hit this, but I'll add an assert.
> 
> Would you prefer a test and early return?

??

The check 3 lines above is:
if (m_resourceStatisticsMap.size() < maxStatisticsEntries)
    return;

So if m_resourceStatisticsMap.size() == maxStatisticsEntries then you can end up with 0 here, no?
Comment 15 Brent Fulgham 2017-07-10 13:07:42 PDT
(In reply to Chris Dumez from comment #14)
> Comment on attachment 314776 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=314776&action=review
> 
> >>> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:454
> >>> +    size_t numberOfEntriesLeftToPrune = m_resourceStatisticsMap.size() - pruneEntriesDownTo;
> 
> The check 3 lines above is:
> if (m_resourceStatisticsMap.size() < maxStatisticsEntries)
>     return;
> 
> So if m_resourceStatisticsMap.size() == maxStatisticsEntries then you can
> end up with 0 here, no?

If m_resourceStatisticsMap.size() == maxStatisticsEntries, we can only get zero if maxStatisticsEntries also equals "pruneEntriesDownTo", right?

It's all academic now, since your revised algorithm protects against it.
Comment 16 Brent Fulgham 2017-07-10 13:11:06 PDT
Created attachment 315014 [details]
Patch
Comment 17 Brent Fulgham 2017-07-10 14:26:40 PDT
Created attachment 315029 [details]
Patch (Rebaselined)
Comment 18 Brent Fulgham 2017-07-10 16:31:48 PDT
Created attachment 315042 [details]
Patch
Comment 19 Chris Dumez 2017-07-10 16:32:12 PDT
Comment on attachment 315029 [details]
Patch (Rebaselined)

View in context: https://bugs.webkit.org/attachment.cgi?id=315029&action=review

> Source/WebCore/loader/ResourceLoadStatistics.h:65
> +    WallTime lastSeen { WallTime::fromRawSeconds(-1) };

As I commented on the earlier iteration. I do not think we need this magic -1 value. Why do we need to distinguish reset from default value here? I think this was just a bad copy/paste from the other WallTime member in statistics.

> Source/WebKit2/UIProcess/WebResourceLoadStatisticsStore.cpp:150
> +        coreStore().pruneStatisticsIfNeeded();

Will need some rebaselining now that I merged the 2 stores.

> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:257
> +    ASSERT(pruneTargetCount <= m_maxStatisticsEntries);

It is best to check this when pruning than here since you could imagine someone calling setPruneEntriesDownTo() then setMaxStatisticsEntries(), in this order, and yet leading to something valid.

> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:367
> +static size_t pruneResources(HashMap<String, WebCore::ResourceLoadStatistics>& statisticsMap, Vector<StatisticsLastSeen>& statisticsToPrune, size_t numberOfEntriesToPrune)

Looks like the Vector should be const.

> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:375
> +    for (auto& statistic : statisticsToPrune) {

I think it would be better as:
for (size_t i = 0, end = std::min(numberOfEntriesToPrune, statisticsToPrune.size()); i != end; ++i)
    statisticsMap.remove(statisticsToPrune[i].topPrivatelyOwnedDomain);

> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:382
> +    return numberOfEntriesToPrune;

Could we just pass numberOfEntriesToPrune by reference instead of returning it. It would seem better to me.

> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:385
> +constexpr size_t maxImportance { 3 };

I think global constants should go to the top of the cpp file.
I think using unsigned should be enough for importance, both here and below.

> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:389
> +    if (!resourceStatistic.isPrevalentResource)

I think this may be a bit more readable:
unsigned importance = maxImportance;
if (!resourceStatistic.isPrevalentResource)
    importance -= 1;
if (!resourceStatistic.hadUserInteraction)
    importance -= 2;
return importance;

> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:398
> +    if (m_resourceStatisticsMap.size() < m_maxStatisticsEntries)

Shouldn't this be <= ?
If we have max entries, it sounds like we don't need pruning yet.

> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:409
> +    while (numberOfEntriesLeftToPrune) {

I think this would look better as:
for (unsigned importance = 0; numberOfEntriesLeftToPrune && importance <= maxImportance; ++importance)
    numberOfEntriesLeftToPrune = pruneResources(m_resourceStatisticsMap, resourcesToPrunePerImportance[importance], numberOfEntriesLeftToPrune);
Comment 20 Chris Dumez 2017-07-10 16:33:23 PDT
Comment on attachment 315029 [details]
Patch (Rebaselined)

View in context: https://bugs.webkit.org/attachment.cgi?id=315029&action=review

> Source/WebKit2/UIProcess/WebResourceLoadStatisticsStore.cpp:510
> +void WebResourceLoadStatisticsStore::setLastSeen(Seconds seconds, const URL& url)

The parameters look backward to me.
Comment 21 Brent Fulgham 2017-07-10 16:46:39 PDT
Comment on attachment 315029 [details]
Patch (Rebaselined)

View in context: https://bugs.webkit.org/attachment.cgi?id=315029&action=review

>> Source/WebCore/loader/ResourceLoadStatistics.h:65
>> +    WallTime lastSeen { WallTime::fromRawSeconds(-1) };
> 
> As I commented on the earlier iteration. I do not think we need this magic -1 value. Why do we need to distinguish reset from default value here? I think this was just a bad copy/paste from the other WallTime member in statistics.

Ok. I'll get rid of it.

>> Source/WebKit2/UIProcess/WebResourceLoadStatisticsStore.cpp:150
>> +        coreStore().pruneStatisticsIfNeeded();
> 
> Will need some rebaselining now that I merged the 2 stores.

No kidding! It wasn't too bad, but a new patch is up (missing some of your comments, so don't review it yet).

>> Source/WebKit2/UIProcess/WebResourceLoadStatisticsStore.cpp:510
>> +void WebResourceLoadStatisticsStore::setLastSeen(Seconds seconds, const URL& url)
> 
> The parameters look backward to me.

Switched.

>> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:257
>> +    ASSERT(pruneTargetCount <= m_maxStatisticsEntries);
> 
> It is best to check this when pruning than here since you could imagine someone calling setPruneEntriesDownTo() then setMaxStatisticsEntries(), in this order, and yet leading to something valid.

Will do.

>> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:367
>> +static size_t pruneResources(HashMap<String, WebCore::ResourceLoadStatistics>& statisticsMap, Vector<StatisticsLastSeen>& statisticsToPrune, size_t numberOfEntriesToPrune)
> 
> Looks like the Vector should be const.

Will do.

>> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:375
>> +    for (auto& statistic : statisticsToPrune) {
> 
> I think it would be better as:
> for (size_t i = 0, end = std::min(numberOfEntriesToPrune, statisticsToPrune.size()); i != end; ++i)
>     statisticsMap.remove(statisticsToPrune[i].topPrivatelyOwnedDomain);

Eh. Ok.

>> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:382
>> +    return numberOfEntriesToPrune;
> 
> Could we just pass numberOfEntriesToPrune by reference instead of returning it. It would seem better to me.

Yes. Good idea.

>> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:385
>> +constexpr size_t maxImportance { 3 };
> 
> I think global constants should go to the top of the cpp file.
> I think using unsigned should be enough for importance, both here and below.

OK.

>> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:389
>> +    if (!resourceStatistic.isPrevalentResource)
> 
> I think this may be a bit more readable:
> unsigned importance = maxImportance;
> if (!resourceStatistic.isPrevalentResource)
>     importance -= 1;
> if (!resourceStatistic.hadUserInteraction)
>     importance -= 2;
> return importance;

Sure!

>> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:398
>> +    if (m_resourceStatisticsMap.size() < m_maxStatisticsEntries)
> 
> Shouldn't this be <= ?
> If we have max entries, it sounds like we don't need pruning yet.

Reasonable. I'll change it.

>> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:409
>> +    while (numberOfEntriesLeftToPrune) {
> 
> I think this would look better as:
> for (unsigned importance = 0; numberOfEntriesLeftToPrune && importance <= maxImportance; ++importance)
>     numberOfEntriesLeftToPrune = pruneResources(m_resourceStatisticsMap, resourcesToPrunePerImportance[importance], numberOfEntriesLeftToPrune);

Okay!
Comment 22 Brent Fulgham 2017-07-10 16:49:16 PDT
Created attachment 315043 [details]
Patch
Comment 23 Chris Dumez 2017-07-10 16:53:05 PDT
Comment on attachment 315043 [details]
Patch

r=me if the bots are happy.
Comment 24 Brent Fulgham 2017-07-10 17:01:22 PDT
Created attachment 315048 [details]
Patch
Comment 25 Brent Fulgham 2017-07-10 17:23:45 PDT
Comment on attachment 315029 [details]
Patch (Rebaselined)

View in context: https://bugs.webkit.org/attachment.cgi?id=315029&action=review

>>> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:367
>>> +static size_t pruneResources(HashMap<String, WebCore::ResourceLoadStatistics>& statisticsMap, Vector<StatisticsLastSeen>& statisticsToPrune, size_t numberOfEntriesToPrune)
>> 
>> Looks like the Vector should be const.
> 
> Will do.

Bah -- it can't be -- the sort operation needs to modify the statisticsToPrune Vector.
Comment 26 Brent Fulgham 2017-07-10 17:25:03 PDT
Created attachment 315053 [details]
Patch
Comment 27 Chris Dumez 2017-07-10 18:16:14 PDT
(In reply to Brent Fulgham from comment #25)
> Comment on attachment 315029 [details]
> Patch (Rebaselined)
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=315029&action=review
> 
> >>> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:367
> >>> +static size_t pruneResources(HashMap<String, WebCore::ResourceLoadStatistics>& statisticsMap, Vector<StatisticsLastSeen>& statisticsToPrune, size_t numberOfEntriesToPrune)
> >> 
> >> Looks like the Vector should be const.
> > 
> > Will do.
> 
> Bah -- it can't be -- the sort operation needs to modify the
> statisticsToPrune Vector.

oh, duh :) sorry.
Comment 28 Chris Dumez 2017-07-10 18:16:54 PDT
Your latest patch only seems to include the changes to TestRunner?
Comment 29 Brent Fulgham 2017-07-10 19:00:45 PDT
Created attachment 315061 [details]
Patch
Comment 30 Brent Fulgham 2017-07-10 19:02:38 PDT
(In reply to Chris Dumez from comment #28)
> Your latest patch only seems to include the changes to TestRunner?

Oh, rats. I must have uploaded from the Tools directory (which was the last place I was manually patching the changes). Grrr!

Uploaded the hopefully final version!
Comment 31 Brent Fulgham 2017-07-10 19:52:09 PDT
Committed r219319: <http://trac.webkit.org/changeset/219319>
Comment 32 Brent Fulgham 2017-07-10 20:31:20 PDT
Reopening to attach new patch.
Comment 33 Brent Fulgham 2017-07-10 20:31:21 PDT
Created attachment 315069 [details]
Patch
Comment 34 Chris Dumez 2017-07-10 23:32:34 PDT
Comment on attachment 315069 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=315069&action=review

> Source/WebKit2/UIProcess/WebResourceLoadStatisticsStore.cpp:947
> +        ++removed;

Why not directly --numberOfEntriesToPrune; ? instead of having an extra variable?
Comment 35 Brent Fulgham 2017-07-11 09:35:13 PDT
Comment on attachment 315069 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=315069&action=review

>> Source/WebKit2/UIProcess/WebResourceLoadStatisticsStore.cpp:947
>> +        ++removed;
> 
> Why not directly --numberOfEntriesToPrune; ? instead of having an extra variable?

I was worried that std::min would be evaluated at each iteration loop, but now I see that it was being assigned to 'end', so would not.

I'll land a correction to tidy that up.
Comment 36 Brent Fulgham 2017-07-11 09:35:37 PDT
Second patch:
Committed r219323: <http://trac.webkit.org/changeset/219323>
Comment 37 Brent Fulgham 2017-07-11 09:52:40 PDT
Final cleanup:
Committed r219337: <http://trac.webkit.org/changeset/219337>