Bug 229527 - Separate PrivateClickMeasurement database from ResourceLoadStatistics database and add SPI to set its location
Summary: Separate PrivateClickMeasurement database from ResourceLoadStatistics databas...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Alex Christensen
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2021-08-25 16:27 PDT by Alex Christensen
Modified: 2021-08-27 14:23 PDT (History)
11 users (show)

See Also:


Attachments
Patch (198.05 KB, patch)
2021-08-25 16:35 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (198.18 KB, patch)
2021-08-25 20:15 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (201.45 KB, patch)
2021-08-25 22:19 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (215.35 KB, patch)
2021-08-26 11:41 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (235.75 KB, patch)
2021-08-26 17:40 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (242.23 KB, patch)
2021-08-27 11:39 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff
Patch (242.24 KB, patch)
2021-08-27 12:08 PDT, Alex Christensen
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Alex Christensen 2021-08-25 16:27:14 PDT
Separate PrivateClickMeasurement database from ResourceLoadStatistics database and add SPI to set its location
Comment 1 Alex Christensen 2021-08-25 16:35:43 PDT
Created attachment 436437 [details]
Patch
Comment 2 Alex Christensen 2021-08-25 20:15:42 PDT
Created attachment 436461 [details]
Patch
Comment 3 Alex Christensen 2021-08-25 22:19:40 PDT
Created attachment 436473 [details]
Patch
Comment 4 Alex Christensen 2021-08-26 11:41:55 PDT
Created attachment 436540 [details]
Patch
Comment 5 Kate Cheney 2021-08-26 13:26:14 PDT
Comment on attachment 436540 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=436540&action=review

I like this change, it's very clever. My main question is why we are storing the PCM store inside the ITP store? I think it could be separate and use separate queues now that they are in different databases.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:561
>  void ResourceLoadStatisticsDatabaseStore::renameColumnInTable(String&& tableName, ExistingColumnName&& existingColumnName, ExpectedColumnName&& expectedColumnName)

You should delete this function, it is only used by the functions you deleted.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:795
>      m_updateAttributionsEarliestTimeToSendStatement = nullptr;

This seems like it can be deleted.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.h:60
>  using ExpectedColumnName = String;

ExistingColumns, ExpectedColumns, ExistingColumnName and ExpectedColumnName are all used in the migration functions you deleted. You can delete them here as well.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.h:133
>      Vector<String> columnsForTable(const String&);

Ditto, you should delete this function now that it is no longer used.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.h:142
>      void addMissingTablesIfNecessary();

Ditto.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.h:145
>      void renameColumnInTable(String&&, ExistingColumnName&&, ExpectedColumnName&&);

Ditto.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.h:146
>      void addMissingColumnsToTable(String&&, const ExistingColumns&, const ExpectedColumns&);

Ditto.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.h:147
>      void renameColumnInTable();

Ditto.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.h:149
>      void updatePrivateClickMeasurementSchemaIfNecessary();

Ditto.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.h:267
>      std::unique_ptr<WebCore::SQLiteStatement> m_updateAttributionsEarliestTimeToSendStatement;

Please delete this.

> Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.cpp:170
> +    , m_pcmStore(PCM::Store::create(sharedStatisticsQueue(), pcmStoreDirectory(networkSession, resourceLoadStatisticsDirectory, privateClickMeasurementStorageDirectory)))

We don't need PCM and ITP to share a queue because they have different databases now. Could we separate them out so we can do more work in parallel?

> Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.h:243
> +    PCM::Store& privateClickMeasurementStore() { return m_pcmStore.get();}

It seems strange to me to keep the PCM::Store as a member of the WebResourceLoadStatisticsStore instead of in the NetworkSession. Is there a reason you did it this way?

> Source/WebKit/NetworkProcess/NetworkSession.cpp:131
> +            return completionHandler(ResourceError(ResourceError::Type::Cancellation), { }, { });

Nice catch.

> Source/WebKit/NetworkProcess/PrivateClickMeasurement/PrivateClickMeasurementStore.cpp:43
> +    if (!databaseDirectory.isEmpty()) {

I can't find the place in code where you set this to be empty if the session is ephemeral, where does that happen?
Comment 6 Alex Christensen 2021-08-26 13:38:17 PDT
(In reply to Kate Cheney from comment #5)
> Comment on attachment 436540 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=436540&action=review
> 
> I like this change, it's very clever. My main question is why we are storing
> the PCM store inside the ITP store? I think it could be separate and use
> separate queues now that they are in different databases.
I'm just writing the two database files to the same directory by default so we don't have to add another directory, which would hurt startup time.  I like the idea of separate queues, though.

> > Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.h:243
> > +    PCM::Store& privateClickMeasurementStore() { return m_pcmStore.get();}
> 
> It seems strange to me to keep the PCM::Store as a member of the
> WebResourceLoadStatisticsStore instead of in the NetworkSession. Is there a
> reason you did it this way?
That was necessary for the incremental approach with which I wrote this patch.  Now that it's a self-contained object, it can go anywhere.  I'll move it.

> > Source/WebKit/NetworkProcess/PrivateClickMeasurement/PrivateClickMeasurementStore.cpp:43
> > +    if (!databaseDirectory.isEmpty()) {
> 
> I can't find the place in code where you set this to be empty if the session
> is ephemeral, where does that happen?
currently in a function called pcmStoreDirectory.
Comment 7 Kate Cheney 2021-08-26 15:24:59 PDT
Comment on attachment 436540 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=436540&action=review

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:238
>  const int expectedIndexCount = 14;

I think you should decrement this now that you've removed some indexes.
Comment 8 Alex Christensen 2021-08-26 17:36:38 PDT
(In reply to Kate Cheney from comment #7)
> Comment on attachment 436540 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=436540&action=review
> 
> > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:238
> >  const int expectedIndexCount = 14;
> 
> I think you should decrement this now that you've removed some indexes.

Yes, now it needs to be 12.

(In reply to Alex Christensen from comment #6)
> > > Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.h:243
> > > +    PCM::Store& privateClickMeasurementStore() { return m_pcmStore.get();}
> > 
> > It seems strange to me to keep the PCM::Store as a member of the
> > WebResourceLoadStatisticsStore instead of in the NetworkSession. Is there a
> > reason you did it this way?
> That was necessary for the incremental approach with which I wrote this
> patch.  Now that it's a self-contained object, it can go anywhere.  I'll
> move it.
Although it's easy to move now, I think I'm going to keep it owned by WebResourceLoadStatisticsStore because of the lifetime of that object, which I want to be the same as the lifetime of PCM::Store.  Adding that lifetime management logic would look stranger than leaving this.
Comment 9 Alex Christensen 2021-08-26 17:40:05 PDT
Created attachment 436593 [details]
Patch
Comment 10 Alex Christensen 2021-08-26 17:40:24 PDT
Still no data migration, but this is getting closer.
Comment 11 Kate Cheney 2021-08-27 10:33:44 PDT
Comment on attachment 436593 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=436593&action=review

Ok, took a closer pass today and I think this is almost there. I suggested one optimization for clearing now that we have a new schema, and had a few other nits. Then the grandfathering API tests seem to be failing as well. Grandfathering happens in WebResourceLoadStatisticsStore::populateMemoryStoreFromDisk() based on whether the file is new or not, so maybe a bug in the database open code?

> Source/WebKit/NetworkProcess/DatabaseUtilities.cpp:110
> +            return createdNewFile;

This return is not needed.

> Source/WebKit/NetworkProcess/PrivateClickMeasurement/PrivateClickMeasurementDatabase.cpp:425
> +String Database::attributionToString(WebCore::SQLiteStatement* statement, PrivateClickMeasurementAttributionType attributionType)

Optional, could add ForTesting at the end of this function.

> Source/WebKit/NetworkProcess/PrivateClickMeasurement/PrivateClickMeasurementDatabase.cpp:482
> +    ASSERT(!RunLoop::isMain());

Now that we have a separate table with all PCM domains and foreign key constraints, we can create one query to remove the domains from the PCMObservedDomains table to accomplish this task. Then we could replace this function with something simpler, like this:

void Database::clearPrivateClickMeasurement(std::optional<RegistrableDomain> domain)
{
ASSERT(!RunLoop::isMain());

    // Default to clear all entries if no domain is specified.
    String bindParameter = "%";
    if (domain) {
        auto domainIDToMatch = domainID(*domain);
        if (!domainIDToMatch)
            return;

        bindParameter = String::number(*domainIDToMatch);
    }

    auto transactionScope = beginTransactionIfNecessary();

    auto clearAllPrivateClickMeasurementScopedStatement = this->scopedStatement(m_ clearAllPrivateClickMeasurementStatement, clearAllPrivateClickMeasurementQuery, "clearPrivateClickMeasurement"_s);

    if (!clearAllPrivateClickMeasurementScopedStatement
        || clearAllPrivateClickMeasurementScopedStatement->bindText(1, bindParameter) != SQLITE_OK
        || clearAllPrivateClickMeasurementScopedStatement->step() != SQLITE_DONE) {
        ITP_RELEASE_LOG_ERROR(m_sessionID, "%p - ResourceLoadStatisticsStore::clearPrivateClickMeasurement clearAllPrivateClickMeasurementScopedStatement, error message: %" PRIVATE_LOG_STRING, this, m_database.lastErrorMsg());
        ASSERT_NOT_REACHED();
    }
}



You will need a new clearAllPrivateClickMeasurementQuery (DELETE FROM PCMObservedDomains WHERE domainID LIKE ?) and statement that remove the domains from the PCMObservedDomains table, but can remove the two others used in this function.

> Source/WebKit/NetworkProcess/PrivateClickMeasurement/PrivateClickMeasurementDatabase.cpp:679
> +        RELEASE_LOG_ERROR(PrivateClickMeasurement, "%p - Database::insertObservedDomain failed to bind, error message: %" PRIVATE_LOG_STRING, this, m_database.lastErrorMsg());

Database::insertObservedDomain -> Database::ensureDomainID

> Source/WebKit/NetworkProcess/PrivateClickMeasurement/PrivateClickMeasurementDatabase.cpp:685
> +        RELEASE_LOG_ERROR(PrivateClickMeasurement, "%p - Database::insertObservedDomain failed to commit, error message: %" PRIVATE_LOG_STRING, this, m_database.lastErrorMsg());

Ditto.

> Source/WebKit/NetworkProcess/PrivateClickMeasurement/PrivateClickMeasurementStore.cpp:102
> +    postTask([this, protectedThis = makeRef(*this), sourceSite = sourceSite.isolatedCopy(), destinationSite = destinationSite.isolatedCopy(), attributionTriggerData = WTFMove(attributionTriggerData), ephemeralMeasurement = crossThreadCopy(ephemeralMeasurement), completionHandler = WTFMove(completionHandler)] () mutable {

Why a crossThreadCopy of ephemeralMeasurement instead of an isolatedCopy?
Comment 12 Kate Cheney 2021-08-27 10:38:13 PDT
And, I think doing migration in a separate patch is fine, this one is quite big.
Comment 13 Kate Cheney 2021-08-27 11:18:35 PDT
Comment on attachment 436593 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=436593&action=review

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:310
>      enableForeignKeys();

You should add this in the PCM database as well, so you can use PCMObservedDomains to perform cascading deletions.
Comment 14 Alex Christensen 2021-08-27 11:38:53 PDT
(In reply to Kate Cheney from comment #11)
> > +    postTask([this, protectedThis = makeRef(*this), sourceSite = sourceSite.isolatedCopy(), destinationSite = destinationSite.isolatedCopy(), attributionTriggerData = WTFMove(attributionTriggerData), ephemeralMeasurement = crossThreadCopy(ephemeralMeasurement), completionHandler = WTFMove(completionHandler)] () mutable {
> 
> Why a crossThreadCopy of ephemeralMeasurement instead of an isolatedCopy?

ephemeralMeasurement is a std::optional, which has no isolatedCopy member.  crossThreadCopy deals with this and calls isolatedCopy() of the contained type if the optional is engaged.
Comment 15 Alex Christensen 2021-08-27 11:39:30 PDT
Created attachment 436646 [details]
Patch
Comment 16 Kate Cheney 2021-08-27 12:01:21 PDT
Comment on attachment 436646 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=436646&action=review

r=me as soon as EWS is happy.

> Source/WebKit/NetworkProcess/DatabaseUtilities.cpp:109
> +            return createdNewFile;

This return is still not needed :P
Comment 17 Alex Christensen 2021-08-27 12:08:45 PDT
Created attachment 436654 [details]
Patch
Comment 18 Alex Christensen 2021-08-27 14:22:52 PDT
http://trac.webkit.org/r281721
Comment 19 Radar WebKit Bug Importer 2021-08-27 14:23:21 PDT
<rdar://problem/82454337>