Bug 212034 - Support operating dates in ResourceLoadStatisticsDatabaseStore
Summary: Support operating dates in ResourceLoadStatisticsDatabaseStore
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKit Misc. (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Kate Cheney
URL:
Keywords: InRadar
: 211775 (view as bug list)
Depends on:
Blocks:
 
Reported: 2020-05-18 10:52 PDT by Kate Cheney
Modified: 2022-02-12 21:01 PST (History)
4 users (show)

See Also:


Attachments
Patch (36.16 KB, patch)
2020-05-18 11:04 PDT, Kate Cheney
no flags Details | Formatted Diff | Diff
Patch (122.86 KB, patch)
2020-05-18 16:13 PDT, Kate Cheney
no flags Details | Formatted Diff | Diff
Patch (123.58 KB, patch)
2020-05-19 15:50 PDT, Kate Cheney
no flags Details | Formatted Diff | Diff
Patch (118.21 KB, patch)
2020-05-20 12:19 PDT, Kate Cheney
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Kate Cheney 2020-05-18 10:52:46 PDT
We should add an Operating Dates table to the database schema.
Comment 1 Radar WebKit Bug Importer 2020-05-18 10:53:28 PDT
<rdar://problem/63349242>
Comment 2 Kate Cheney 2020-05-18 11:04:40 PDT
Created attachment 399658 [details]
Patch
Comment 3 Kate Cheney 2020-05-18 16:13:16 PDT
Created attachment 399681 [details]
Patch
Comment 4 John Wilander 2020-05-19 14:47:26 PDT
Comment on attachment 399681 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=399681&action=review

I like the tests. Was there something done to migrate the schema here?

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:3027
> +Optional<Seconds> ResourceLoadStatisticsDatabaseStore::statisticsExpirationTime() const

Looking at this function name and body, it's unclear what it does and what the scope of it is. It's const so nothing changes and it looks like a getter. But statisticsExpirationTime sounds like either the expiration time for an individual piece of statistics, in which case I'd expect the function to take a parameter, or some global expiration time which I don't think exists. If you explain to me what it does or is for, maybe we can come up with a better name together.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:3157
> +void ResourceLoadStatisticsDatabaseStore::insertExpiredStatisticForTesting(const RegistrableDomain& domain, bool hasUserInteraction, bool isScheduledForAllButCookieDataRemoval, bool isPrevalent)

I wonder if we should break out these test functions into a friend class at some point. I don't know if that's something we do in WebKit but it sure would be nice to declutter the production logic.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsMemoryStore.cpp:1201
> +

Were these 7 functions just moved from the base class? I know your change log entry says "Moves functions out of ResourceLoadStatisticsStore …" but it doesn't list which. Please make it more explicit in the change log since it looks like a big chunk of new functionality on the surface of it.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsPersistentStorage.cpp:211
> +    // Debug mode has a prepopulated memory store.

Nice.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsStore.cpp:-57
> -constexpr Seconds operatingTimeWindowForLiveOnTesting { 1_h };

Please comment in the change log that these are moved to the header file.
Comment 5 Kate Cheney 2020-05-19 15:06:26 PDT
(In reply to John Wilander from comment #4)
> Comment on attachment 399681 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=399681&action=review
> 
> I like the tests. Was there something done to migrate the schema here?
> 

No additional migration code necessary, I recently added support for adding new tables and migrating existing data from the old tables over :)

> > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:3027
> > +Optional<Seconds> ResourceLoadStatisticsDatabaseStore::statisticsExpirationTime() const
> 
> Looking at this function name and body, it's unclear what it does and what
> the scope of it is. It's const so nothing changes and it looks like a
> getter. But statisticsExpirationTime sounds like either the expiration time
> for an individual piece of statistics, in which case I'd expect the function
> to take a parameter, or some global expiration time which I don't think
> exists. If you explain to me what it does or is for, maybe we can come up
> with a better name together.
> 

Yes, definitely. This is used to clear expired user interactions in ResourceLoadStatisticsDatabaseStore::registrableDomainsToDeleteOrRestrictWebsiteDataFor(). It checks for the least recent Operating Date and will signal the store to clear UI that happened before that time. We could perhaps update the name to be "getLeastRecentOperatingDate()" or something of the sort. Or, we could have it take a statistic and return a bool deciding whether it was expired or not.

> > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:3157
> > +void ResourceLoadStatisticsDatabaseStore::insertExpiredStatisticForTesting(const RegistrableDomain& domain, bool hasUserInteraction, bool isScheduledForAllButCookieDataRemoval, bool isPrevalent)
> 
> I wonder if we should break out these test functions into a friend class at
> some point. I don't know if that's something we do in WebKit but it sure
> would be nice to declutter the production logic.
> 

Agreed. Most of the setters in this file are for testing only.

> > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsMemoryStore.cpp:1201
> > +
> 
> Were these 7 functions just moved from the base class? I know your change
> log entry says "Moves functions out of ResourceLoadStatisticsStore …" but it
> doesn't list which. Please make it more explicit in the change log since it
> looks like a big chunk of new functionality on the surface of it.
> 
> > 

Yes, will specify in ChangeLog.

Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsPersistentStorage.cpp:211
> > +    // Debug mode has a prepopulated memory store.
> 
> Nice.
> 
> > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsStore.cpp:-57
> > -constexpr Seconds operatingTimeWindowForLiveOnTesting { 1_h };
> 
> Please comment in the change log that these are moved to the header file.

Will do!
Comment 6 Kate Cheney 2020-05-19 15:38:19 PDT
Comment on attachment 399681 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=399681&action=review

>>> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:3027
>>> +Optional<Seconds> ResourceLoadStatisticsDatabaseStore::statisticsExpirationTime() const
>> 
>> Looking at this function name and body, it's unclear what it does and what the scope of it is. It's const so nothing changes and it looks like a getter. But statisticsExpirationTime sounds like either the expiration time for an individual piece of statistics, in which case I'd expect the function to take a parameter, or some global expiration time which I don't think exists. If you explain to me what it does or is for, maybe we can come up with a better name together.
> 
> Yes, definitely. This is used to clear expired user interactions in ResourceLoadStatisticsDatabaseStore::registrableDomainsToDeleteOrRestrictWebsiteDataFor(). It checks for the least recent Operating Date and will signal the store to clear UI that happened before that time. We could perhaps update the name to be "getLeastRecentOperatingDate()" or something of the sort. Or, we could have it take a statistic and return a bool deciding whether it was expired or not.

I just noticed statisticsExpirationTime() is unused in the memory store, so I will delete that.
Comment 7 Kate Cheney 2020-05-19 15:50:44 PDT
Created attachment 399776 [details]
Patch
Comment 8 Brent Fulgham 2020-05-19 17:42:17 PDT
Comment on attachment 399776 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=399776&action=review

I think this looks good - -and thank you for building such good test infrastructure. But I think you are doing way too many DB reads for this purpose. Instead, I think you should check and store these values at launch (and when you change them), but otherwise avoid hitting the DB. r- to fix that performance issue, but otherwise this looks great!

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:3027
> +Optional<Seconds> ResourceLoadStatisticsDatabaseStore::getLeastRecentOperatingDate() const

I prefer the original "statisticsExperationTime" name :-)

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:3034
> +    if (m_countOperatingDatesStatement.step() != SQLITE_ROW) {

We call this method every time we do a sweep of the data for stuff to delete (every few seconds), but this count only changes once a day, in includeTodayAsOperatingDateIfNecessary. I think we should compute its value in that method, and refer to it here.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:3040
> +    const unsigned operatingDatesSize = m_countOperatingDatesStatement.getColumnInt(0);

I think this should be a member variable we compute in includeTodayAsOperatingDateIfNecessary and refer to elsewhere.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:3068
> +    auto operatingDatesSize = m_countOperatingDatesStatement.getColumnInt(0);

I think this should be a member variable we can refer to later.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:3080
> +        resetStatement(m_getMostRecentOperatingDateStatement);

It's almost like we need a scoped destructing object that resets a statement when leaving a method like this. Then you wouldn't have to have these cleanup steps everywhere.
But we probably only call this once a day -- it might be simpler to just have this statement created, prepared, and used in this method as a local.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:3088
> +        if (m_deleteLeastRecentOperatingDatesStatement.bindInt(1, rowsToPrune) != SQLITE_OK

This statement is used rarely, too. I wonder if we couldn't just build it from a string right here and execute it, rather than having a member to hold it (and remember to reset it at the end, etc.).

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:3096
> +    if (m_insertOperatingDateStatement.bindInt(1, today.year()) != SQLITE_OK

Ditto. We insert at most one time a day, so I'm not sure the complexity of storing this statement for later use is paid for in performance benefit.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:3125
> +    if (m_countOperatingDatesStatement.step() != SQLITE_ROW) {

This method gets called every time we update cookie blocking, which might be very frequent. But our count of operating dates only changes once a day. So I think making this database call every time we hit this function is wrong.

I suggest that you store the operating date count as a member, updating it each time you call  includeTodayAsOperatingDateIfNecessary

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:3142
> +        resetStatement(m_getLeastRecentOperatingDateStatement);

I suggest you grab this value in includeTodayAsOperatingDateIfNecessary, and store it to be checked here. (It only changes once a day, blah blah blah ...) :-)

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:3175
> +    }

If you make the changes I suggest, you will need to re-read the count (or just update the member variables) here.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.h:286
> +    mutable WebCore::SQLiteStatement m_getLeastRecentOperatingDateStatement;

I think we could get rid of a few of these stored queries, since they are only run once a day.
Comment 9 Kate Cheney 2020-05-20 12:19:59 PDT
Created attachment 399870 [details]
Patch
Comment 10 Brent Fulgham 2020-05-20 14:53:37 PDT
Comment on attachment 399870 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=399870&action=review

This looks great! r=me.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:2428
> +    Optional<Seconds> expirationDateTime = statisticsExpirationTime();

Oh gosh. What a typo! Thanks! :-)

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:3059
> +        double daysToSubtract = Seconds::fromHours(24 * i).value();

If we have to do this frequently, we should consider making a 'Seconds::fromDays()' helper. But not right now.
Comment 11 EWS 2020-05-20 15:14:17 PDT
Committed r261963: <https://trac.webkit.org/changeset/261963>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 399870 [details].
Comment 12 Brent Fulgham 2022-02-12 21:01:34 PST
*** Bug 211775 has been marked as a duplicate of this bug. ***