RESOLVED FIXED 174215
Resource Load Statistics: Prune statistics in order of importance
https://bugs.webkit.org/show_bug.cgi?id=174215
Summary Resource Load Statistics: Prune statistics in order of importance
John Wilander
Reported 2017-07-06 12:55:19 PDT
We should cap the size of stored statistics and prune the store in order of importance, less to more.
Attachments
Patch (43.28 KB, patch)
2017-07-06 13:15 PDT, John Wilander
no flags
Patch (43.49 KB, patch)
2017-07-06 16:02 PDT, John Wilander
no flags
Patch (43.51 KB, patch)
2017-07-06 16:31 PDT, John Wilander
no flags
Patch (43.34 KB, patch)
2017-07-10 13:11 PDT, Brent Fulgham
no flags
Patch (Rebaselined) (43.42 KB, patch)
2017-07-10 14:26 PDT, Brent Fulgham
no flags
Patch (41.99 KB, patch)
2017-07-10 16:31 PDT, Brent Fulgham
no flags
Patch (41.76 KB, patch)
2017-07-10 16:49 PDT, Brent Fulgham
no flags
Patch (41.87 KB, patch)
2017-07-10 17:01 PDT, Brent Fulgham
no flags
Patch (12.78 KB, patch)
2017-07-10 17:25 PDT, Brent Fulgham
no flags
Patch (41.84 KB, patch)
2017-07-10 19:00 PDT, Brent Fulgham
no flags
Patch (4.31 KB, patch)
2017-07-10 20:31 PDT, Brent Fulgham
cdumez: review+
Radar WebKit Bug Importer
Comment 1 2017-07-06 12:55:49 PDT
John Wilander
Comment 2 2017-07-06 13:15:48 PDT
John Wilander
Comment 3 2017-07-06 13:16:58 PDT
I know there are a couple of whitespace changes that shouldn't be there but I just can't get Xcode to not add them back in so I'll have to fix the files in some other editor before landing.
John Wilander
Comment 4 2017-07-06 13:17:26 PDT
Oh, patch doesn't seem to apply to trunk. Will merge.
John Wilander
Comment 5 2017-07-06 16:02:07 PDT
John Wilander
Comment 6 2017-07-06 16:31:20 PDT
Chris Dumez
Comment 7 2017-07-06 18:22:58 PDT
Comment on attachment 314776 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=314776&action=review > Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:47 > +static size_t maxStatisticsEntries = 1000; All those overridable statics are wrong and should be data members. There can be several ResourceLoadStatisticsStore per process at least in theory since there is one store per WebSiteDataStore. I believe those should be data members.
Chris Dumez
Comment 8 2017-07-06 18:34:19 PDT
Comment on attachment 314776 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=314776&action=review > Source/WebCore/loader/ResourceLoadStatistics.h:65 > + WallTime lastSeen { WallTime::fromRawSeconds(-1) }; Do we really need this magic -1 value here? Do we really need to distinguish default value from reset? > Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:295 > +void ResourceLoadStatisticsStore::setMaxStatisticsEntries(size_t entries) entries is a bad name > Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:300 > +void ResourceLoadStatisticsStore::setPruneEntriesDownTo(size_t entries) entries is a bad name > Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:454 > + size_t numberOfEntriesLeftToPrune = m_resourceStatisticsMap.size() - pruneEntriesDownTo; If this is 0, we're in trouble, based on your implementation of sortAndPrune()
Chris Dumez
Comment 9 2017-07-06 18:53:54 PDT
Comment on attachment 314776 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=314776&action=review > Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:435 > + std::sort(statisticsToPrune.begin(), statisticsToPrune.end(), [](const StatisticsLastSeen& a, const StatisticsLastSeen& b) { If statisticsToPrune.size() <= numberOfEntriesToPrune, then sorting is useless, right? We are going to remove ALL the entries in statisticsToPrune in this case, and it does not matter in which order they are removed. > Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:448 > +void ResourceLoadStatisticsStore::pruneStatisticsIfNeeded() How often is this called?
Chris Dumez
Comment 10 2017-07-06 19:07:24 PDT
Comment on attachment 314776 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=314776&action=review > Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:441 > + if (!--numberOfEntriesToPrune) This is bad if numberOfEntriesToPrune is 0. > Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:456 > + Vector<StatisticsLastSeen> nonPrevalentResourcesWithoutUserInteraction; Feels like this could be an array of Vectors, then the pruning code could be written as a loop. e.g. roughly Vector<StatisticsLastSeen> resourcesToPrunePerImportance[maxImportance + 1]; for (auto& resourceStatistic : m_resourceStatisticsMap.values()) resourcesToPruneInPriority[computeImportance(resourceStatistic)].append({ resourceStatistic.highLevelDomain, resourceStatistic.lastSeen }); unsigned importance = 0; while (numberOfEntriesLeftToPrune) { ASSERT(importance <= maxImportance); numberOfEntriesLeftToPrune = pruneResources(m_resourceStatisticsMap, resourcesToPrunePerImportance[importance++], numberOfEntriesLeftToPrune); }
Chris Dumez
Comment 11 2017-07-06 19:14:34 PDT
Comment on attachment 314776 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=314776&action=review > Source/WebCore/loader/ResourceLoadObserver.cpp:82 > +static WallTime reduceTimeResolution(WallTime time) What's the reason we need to reduce the resolution of the lastSeen timestamp?
Brent Fulgham
Comment 12 2017-07-10 12:45:59 PDT
Comment on attachment 314776 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=314776&action=review >> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:47 >> +static size_t maxStatisticsEntries = 1000; > > All those overridable statics are wrong and should be data members. There can be several ResourceLoadStatisticsStore per process at least in theory since there is one store per WebSiteDataStore. > I believe those should be data members. Will fix. >> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:295 >> +void ResourceLoadStatisticsStore::setMaxStatisticsEntries(size_t entries) > > entries is a bad name Will fix. >> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:300 >> +void ResourceLoadStatisticsStore::setPruneEntriesDownTo(size_t entries) > > entries is a bad name Will fix. I also think we should assert if we set "pruneEntriesDownTo" some value that is greater than the maximum number of allowable entries. >> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:435 >> + std::sort(statisticsToPrune.begin(), statisticsToPrune.end(), [](const StatisticsLastSeen& a, const StatisticsLastSeen& b) { > > If statisticsToPrune.size() <= numberOfEntriesToPrune, then sorting is useless, right? We are going to remove ALL the entries in statisticsToPrune in this case, and it does not matter in which order they are removed. Good point -- since at each pruning step we are seeking to remove a smaller number of records, we may hit this routine at a point where we need to remove all, and a later point where we only remove some. >> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:441 >> + if (!--numberOfEntriesToPrune) > > This is bad if numberOfEntriesToPrune is 0. Again, I'm pretty sure we can't enter this function with a non-zero 'numberOfEntriesToPrune', but I added an ASSERT in this loop just to prove we don't do so. >> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:448 >> +void ResourceLoadStatisticsStore::pruneStatisticsIfNeeded() > > How often is this called? It looks like this is triggered when the WebProcess completes a load and thinks its found new data to share. >> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:454 >> + size_t numberOfEntriesLeftToPrune = m_resourceStatisticsMap.size() - pruneEntriesDownTo; > > If this is 0, we're in trouble, based on your implementation of sortAndPrune() We only hit this if m_resourceStatisticsMap.size() > the max count, and the pruning target count should always be less than the overall max (by default, by around 200 or so). I don't think we can hit this, but I'll add an assert. Would you prefer a test and early return? >> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:456 >> + Vector<StatisticsLastSeen> nonPrevalentResourcesWithoutUserInteraction; > > Feels like this could be an array of Vectors, then the pruning code could be written as a loop. e.g. roughly > > Vector<StatisticsLastSeen> resourcesToPrunePerImportance[maxImportance + 1]; > for (auto& resourceStatistic : m_resourceStatisticsMap.values()) > resourcesToPruneInPriority[computeImportance(resourceStatistic)].append({ resourceStatistic.highLevelDomain, resourceStatistic.lastSeen }); > > unsigned importance = 0; > while (numberOfEntriesLeftToPrune) { > ASSERT(importance <= maxImportance); > numberOfEntriesLeftToPrune = pruneResources(m_resourceStatisticsMap, resourcesToPrunePerImportance[importance++], numberOfEntriesLeftToPrune); > } Seems reasonable.
Brent Fulgham
Comment 13 2017-07-10 13:03:26 PDT
Comment on attachment 314776 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=314776&action=review >> Source/WebCore/loader/ResourceLoadObserver.cpp:82 >> +static WallTime reduceTimeResolution(WallTime time) > > What's the reason we need to reduce the resolution of the lastSeen timestamp? I'm not sure, either. I left it in for now, but it seems like this could be omitted if we found it was a hotspot.
Chris Dumez
Comment 14 2017-07-10 13:04:14 PDT
Comment on attachment 314776 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=314776&action=review >>> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:454 >>> + size_t numberOfEntriesLeftToPrune = m_resourceStatisticsMap.size() - pruneEntriesDownTo; >> >> If this is 0, we're in trouble, based on your implementation of sortAndPrune() > > We only hit this if m_resourceStatisticsMap.size() > the max count, and the pruning target count should always be less than the overall max (by default, by around 200 or so). I don't think we can hit this, but I'll add an assert. > > Would you prefer a test and early return? ?? The check 3 lines above is: if (m_resourceStatisticsMap.size() < maxStatisticsEntries) return; So if m_resourceStatisticsMap.size() == maxStatisticsEntries then you can end up with 0 here, no?
Brent Fulgham
Comment 15 2017-07-10 13:07:42 PDT
(In reply to Chris Dumez from comment #14) > Comment on attachment 314776 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=314776&action=review > > >>> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:454 > >>> + size_t numberOfEntriesLeftToPrune = m_resourceStatisticsMap.size() - pruneEntriesDownTo; > > The check 3 lines above is: > if (m_resourceStatisticsMap.size() < maxStatisticsEntries) > return; > > So if m_resourceStatisticsMap.size() == maxStatisticsEntries then you can > end up with 0 here, no? If m_resourceStatisticsMap.size() == maxStatisticsEntries, we can only get zero if maxStatisticsEntries also equals "pruneEntriesDownTo", right? It's all academic now, since your revised algorithm protects against it.
Brent Fulgham
Comment 16 2017-07-10 13:11:06 PDT
Brent Fulgham
Comment 17 2017-07-10 14:26:40 PDT
Created attachment 315029 [details] Patch (Rebaselined)
Brent Fulgham
Comment 18 2017-07-10 16:31:48 PDT
Chris Dumez
Comment 19 2017-07-10 16:32:12 PDT
Comment on attachment 315029 [details] Patch (Rebaselined) View in context: https://bugs.webkit.org/attachment.cgi?id=315029&action=review > Source/WebCore/loader/ResourceLoadStatistics.h:65 > + WallTime lastSeen { WallTime::fromRawSeconds(-1) }; As I commented on the earlier iteration. I do not think we need this magic -1 value. Why do we need to distinguish reset from default value here? I think this was just a bad copy/paste from the other WallTime member in statistics. > Source/WebKit2/UIProcess/WebResourceLoadStatisticsStore.cpp:150 > + coreStore().pruneStatisticsIfNeeded(); Will need some rebaselining now that I merged the 2 stores. > Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:257 > + ASSERT(pruneTargetCount <= m_maxStatisticsEntries); It is best to check this when pruning than here since you could imagine someone calling setPruneEntriesDownTo() then setMaxStatisticsEntries(), in this order, and yet leading to something valid. > Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:367 > +static size_t pruneResources(HashMap<String, WebCore::ResourceLoadStatistics>& statisticsMap, Vector<StatisticsLastSeen>& statisticsToPrune, size_t numberOfEntriesToPrune) Looks like the Vector should be const. > Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:375 > + for (auto& statistic : statisticsToPrune) { I think it would be better as: for (size_t i = 0, end = std::min(numberOfEntriesToPrune, statisticsToPrune.size()); i != end; ++i) statisticsMap.remove(statisticsToPrune[i].topPrivatelyOwnedDomain); > Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:382 > + return numberOfEntriesToPrune; Could we just pass numberOfEntriesToPrune by reference instead of returning it. It would seem better to me. > Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:385 > +constexpr size_t maxImportance { 3 }; I think global constants should go to the top of the cpp file. I think using unsigned should be enough for importance, both here and below. > Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:389 > + if (!resourceStatistic.isPrevalentResource) I think this may be a bit more readable: unsigned importance = maxImportance; if (!resourceStatistic.isPrevalentResource) importance -= 1; if (!resourceStatistic.hadUserInteraction) importance -= 2; return importance; > Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:398 > + if (m_resourceStatisticsMap.size() < m_maxStatisticsEntries) Shouldn't this be <= ? If we have max entries, it sounds like we don't need pruning yet. > Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:409 > + while (numberOfEntriesLeftToPrune) { I think this would look better as: for (unsigned importance = 0; numberOfEntriesLeftToPrune && importance <= maxImportance; ++importance) numberOfEntriesLeftToPrune = pruneResources(m_resourceStatisticsMap, resourcesToPrunePerImportance[importance], numberOfEntriesLeftToPrune);
Chris Dumez
Comment 20 2017-07-10 16:33:23 PDT
Comment on attachment 315029 [details] Patch (Rebaselined) View in context: https://bugs.webkit.org/attachment.cgi?id=315029&action=review > Source/WebKit2/UIProcess/WebResourceLoadStatisticsStore.cpp:510 > +void WebResourceLoadStatisticsStore::setLastSeen(Seconds seconds, const URL& url) The parameters look backward to me.
Brent Fulgham
Comment 21 2017-07-10 16:46:39 PDT
Comment on attachment 315029 [details] Patch (Rebaselined) View in context: https://bugs.webkit.org/attachment.cgi?id=315029&action=review >> Source/WebCore/loader/ResourceLoadStatistics.h:65 >> + WallTime lastSeen { WallTime::fromRawSeconds(-1) }; > > As I commented on the earlier iteration. I do not think we need this magic -1 value. Why do we need to distinguish reset from default value here? I think this was just a bad copy/paste from the other WallTime member in statistics. Ok. I'll get rid of it. >> Source/WebKit2/UIProcess/WebResourceLoadStatisticsStore.cpp:150 >> + coreStore().pruneStatisticsIfNeeded(); > > Will need some rebaselining now that I merged the 2 stores. No kidding! It wasn't too bad, but a new patch is up (missing some of your comments, so don't review it yet). >> Source/WebKit2/UIProcess/WebResourceLoadStatisticsStore.cpp:510 >> +void WebResourceLoadStatisticsStore::setLastSeen(Seconds seconds, const URL& url) > > The parameters look backward to me. Switched. >> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:257 >> + ASSERT(pruneTargetCount <= m_maxStatisticsEntries); > > It is best to check this when pruning than here since you could imagine someone calling setPruneEntriesDownTo() then setMaxStatisticsEntries(), in this order, and yet leading to something valid. Will do. >> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:367 >> +static size_t pruneResources(HashMap<String, WebCore::ResourceLoadStatistics>& statisticsMap, Vector<StatisticsLastSeen>& statisticsToPrune, size_t numberOfEntriesToPrune) > > Looks like the Vector should be const. Will do. >> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:375 >> + for (auto& statistic : statisticsToPrune) { > > I think it would be better as: > for (size_t i = 0, end = std::min(numberOfEntriesToPrune, statisticsToPrune.size()); i != end; ++i) > statisticsMap.remove(statisticsToPrune[i].topPrivatelyOwnedDomain); Eh. Ok. >> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:382 >> + return numberOfEntriesToPrune; > > Could we just pass numberOfEntriesToPrune by reference instead of returning it. It would seem better to me. Yes. Good idea. >> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:385 >> +constexpr size_t maxImportance { 3 }; > > I think global constants should go to the top of the cpp file. > I think using unsigned should be enough for importance, both here and below. OK. >> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:389 >> + if (!resourceStatistic.isPrevalentResource) > > I think this may be a bit more readable: > unsigned importance = maxImportance; > if (!resourceStatistic.isPrevalentResource) > importance -= 1; > if (!resourceStatistic.hadUserInteraction) > importance -= 2; > return importance; Sure! >> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:398 >> + if (m_resourceStatisticsMap.size() < m_maxStatisticsEntries) > > Shouldn't this be <= ? > If we have max entries, it sounds like we don't need pruning yet. Reasonable. I'll change it. >> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:409 >> + while (numberOfEntriesLeftToPrune) { > > I think this would look better as: > for (unsigned importance = 0; numberOfEntriesLeftToPrune && importance <= maxImportance; ++importance) > numberOfEntriesLeftToPrune = pruneResources(m_resourceStatisticsMap, resourcesToPrunePerImportance[importance], numberOfEntriesLeftToPrune); Okay!
Brent Fulgham
Comment 22 2017-07-10 16:49:16 PDT
Chris Dumez
Comment 23 2017-07-10 16:53:05 PDT
Comment on attachment 315043 [details] Patch r=me if the bots are happy.
Brent Fulgham
Comment 24 2017-07-10 17:01:22 PDT
Brent Fulgham
Comment 25 2017-07-10 17:23:45 PDT
Comment on attachment 315029 [details] Patch (Rebaselined) View in context: https://bugs.webkit.org/attachment.cgi?id=315029&action=review >>> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:367 >>> +static size_t pruneResources(HashMap<String, WebCore::ResourceLoadStatistics>& statisticsMap, Vector<StatisticsLastSeen>& statisticsToPrune, size_t numberOfEntriesToPrune) >> >> Looks like the Vector should be const. > > Will do. Bah -- it can't be -- the sort operation needs to modify the statisticsToPrune Vector.
Brent Fulgham
Comment 26 2017-07-10 17:25:03 PDT
Chris Dumez
Comment 27 2017-07-10 18:16:14 PDT
(In reply to Brent Fulgham from comment #25) > Comment on attachment 315029 [details] > Patch (Rebaselined) > > View in context: > https://bugs.webkit.org/attachment.cgi?id=315029&action=review > > >>> Source/WebKit2/UIProcess/Storage/ResourceLoadStatisticsStore.cpp:367 > >>> +static size_t pruneResources(HashMap<String, WebCore::ResourceLoadStatistics>& statisticsMap, Vector<StatisticsLastSeen>& statisticsToPrune, size_t numberOfEntriesToPrune) > >> > >> Looks like the Vector should be const. > > > > Will do. > > Bah -- it can't be -- the sort operation needs to modify the > statisticsToPrune Vector. oh, duh :) sorry.
Chris Dumez
Comment 28 2017-07-10 18:16:54 PDT
Your latest patch only seems to include the changes to TestRunner?
Brent Fulgham
Comment 29 2017-07-10 19:00:45 PDT
Brent Fulgham
Comment 30 2017-07-10 19:02:38 PDT
(In reply to Chris Dumez from comment #28) > Your latest patch only seems to include the changes to TestRunner? Oh, rats. I must have uploaded from the Tools directory (which was the last place I was manually patching the changes). Grrr! Uploaded the hopefully final version!
Brent Fulgham
Comment 31 2017-07-10 19:52:09 PDT
Brent Fulgham
Comment 32 2017-07-10 20:31:20 PDT
Reopening to attach new patch.
Brent Fulgham
Comment 33 2017-07-10 20:31:21 PDT
Chris Dumez
Comment 34 2017-07-10 23:32:34 PDT
Comment on attachment 315069 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=315069&action=review > Source/WebKit2/UIProcess/WebResourceLoadStatisticsStore.cpp:947 > + ++removed; Why not directly --numberOfEntriesToPrune; ? instead of having an extra variable?
Brent Fulgham
Comment 35 2017-07-11 09:35:13 PDT
Comment on attachment 315069 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=315069&action=review >> Source/WebKit2/UIProcess/WebResourceLoadStatisticsStore.cpp:947 >> + ++removed; > > Why not directly --numberOfEntriesToPrune; ? instead of having an extra variable? I was worried that std::min would be evaluated at each iteration loop, but now I see that it was being assigned to 'end', so would not. I'll land a correction to tidy that up.
Brent Fulgham
Comment 36 2017-07-11 09:35:37 PDT
Brent Fulgham
Comment 37 2017-07-11 09:52:40 PDT
Note You need to log in before you can comment on or make changes to this bug.