RESOLVED FIXED 229527
Separate PrivateClickMeasurement database from ResourceLoadStatistics database and add SPI to set its location
https://bugs.webkit.org/show_bug.cgi?id=229527
Summary Separate PrivateClickMeasurement database from ResourceLoadStatistics databas...
Alex Christensen
Reported 2021-08-25 16:27:14 PDT
Separate PrivateClickMeasurement database from ResourceLoadStatistics database and add SPI to set its location
Attachments
Patch (198.05 KB, patch)
2021-08-25 16:35 PDT, Alex Christensen
no flags
Patch (198.18 KB, patch)
2021-08-25 20:15 PDT, Alex Christensen
no flags
Patch (201.45 KB, patch)
2021-08-25 22:19 PDT, Alex Christensen
no flags
Patch (215.35 KB, patch)
2021-08-26 11:41 PDT, Alex Christensen
no flags
Patch (235.75 KB, patch)
2021-08-26 17:40 PDT, Alex Christensen
no flags
Patch (242.23 KB, patch)
2021-08-27 11:39 PDT, Alex Christensen
no flags
Patch (242.24 KB, patch)
2021-08-27 12:08 PDT, Alex Christensen
no flags
Alex Christensen
Comment 1 2021-08-25 16:35:43 PDT
Alex Christensen
Comment 2 2021-08-25 20:15:42 PDT
Alex Christensen
Comment 3 2021-08-25 22:19:40 PDT
Alex Christensen
Comment 4 2021-08-26 11:41:55 PDT
Kate Cheney
Comment 5 2021-08-26 13:26:14 PDT
Comment on attachment 436540 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=436540&action=review I like this change, it's very clever. My main question is why we are storing the PCM store inside the ITP store? I think it could be separate and use separate queues now that they are in different databases. > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:561 > void ResourceLoadStatisticsDatabaseStore::renameColumnInTable(String&& tableName, ExistingColumnName&& existingColumnName, ExpectedColumnName&& expectedColumnName) You should delete this function, it is only used by the functions you deleted. > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:795 > m_updateAttributionsEarliestTimeToSendStatement = nullptr; This seems like it can be deleted. > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.h:60 > using ExpectedColumnName = String; ExistingColumns, ExpectedColumns, ExistingColumnName and ExpectedColumnName are all used in the migration functions you deleted. You can delete them here as well. > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.h:133 > Vector<String> columnsForTable(const String&); Ditto, you should delete this function now that it is no longer used. > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.h:142 > void addMissingTablesIfNecessary(); Ditto. > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.h:145 > void renameColumnInTable(String&&, ExistingColumnName&&, ExpectedColumnName&&); Ditto. > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.h:146 > void addMissingColumnsToTable(String&&, const ExistingColumns&, const ExpectedColumns&); Ditto. > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.h:147 > void renameColumnInTable(); Ditto. > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.h:149 > void updatePrivateClickMeasurementSchemaIfNecessary(); Ditto. > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.h:267 > std::unique_ptr<WebCore::SQLiteStatement> m_updateAttributionsEarliestTimeToSendStatement; Please delete this. > Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.cpp:170 > + , m_pcmStore(PCM::Store::create(sharedStatisticsQueue(), pcmStoreDirectory(networkSession, resourceLoadStatisticsDirectory, privateClickMeasurementStorageDirectory))) We don't need PCM and ITP to share a queue because they have different databases now. Could we separate them out so we can do more work in parallel? > Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.h:243 > + PCM::Store& privateClickMeasurementStore() { return m_pcmStore.get();} It seems strange to me to keep the PCM::Store as a member of the WebResourceLoadStatisticsStore instead of in the NetworkSession. Is there a reason you did it this way? > Source/WebKit/NetworkProcess/NetworkSession.cpp:131 > + return completionHandler(ResourceError(ResourceError::Type::Cancellation), { }, { }); Nice catch. > Source/WebKit/NetworkProcess/PrivateClickMeasurement/PrivateClickMeasurementStore.cpp:43 > + if (!databaseDirectory.isEmpty()) { I can't find the place in code where you set this to be empty if the session is ephemeral, where does that happen?
Alex Christensen
Comment 6 2021-08-26 13:38:17 PDT
(In reply to Kate Cheney from comment #5) > Comment on attachment 436540 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=436540&action=review > > I like this change, it's very clever. My main question is why we are storing > the PCM store inside the ITP store? I think it could be separate and use > separate queues now that they are in different databases. I'm just writing the two database files to the same directory by default so we don't have to add another directory, which would hurt startup time. I like the idea of separate queues, though. > > Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.h:243 > > + PCM::Store& privateClickMeasurementStore() { return m_pcmStore.get();} > > It seems strange to me to keep the PCM::Store as a member of the > WebResourceLoadStatisticsStore instead of in the NetworkSession. Is there a > reason you did it this way? That was necessary for the incremental approach with which I wrote this patch. Now that it's a self-contained object, it can go anywhere. I'll move it. > > Source/WebKit/NetworkProcess/PrivateClickMeasurement/PrivateClickMeasurementStore.cpp:43 > > + if (!databaseDirectory.isEmpty()) { > > I can't find the place in code where you set this to be empty if the session > is ephemeral, where does that happen? currently in a function called pcmStoreDirectory.
Kate Cheney
Comment 7 2021-08-26 15:24:59 PDT
Comment on attachment 436540 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=436540&action=review > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:238 > const int expectedIndexCount = 14; I think you should decrement this now that you've removed some indexes.
Alex Christensen
Comment 8 2021-08-26 17:36:38 PDT
(In reply to Kate Cheney from comment #7) > Comment on attachment 436540 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=436540&action=review > > > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:238 > > const int expectedIndexCount = 14; > > I think you should decrement this now that you've removed some indexes. Yes, now it needs to be 12. (In reply to Alex Christensen from comment #6) > > > Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.h:243 > > > + PCM::Store& privateClickMeasurementStore() { return m_pcmStore.get();} > > > > It seems strange to me to keep the PCM::Store as a member of the > > WebResourceLoadStatisticsStore instead of in the NetworkSession. Is there a > > reason you did it this way? > That was necessary for the incremental approach with which I wrote this > patch. Now that it's a self-contained object, it can go anywhere. I'll > move it. Although it's easy to move now, I think I'm going to keep it owned by WebResourceLoadStatisticsStore because of the lifetime of that object, which I want to be the same as the lifetime of PCM::Store. Adding that lifetime management logic would look stranger than leaving this.
Alex Christensen
Comment 9 2021-08-26 17:40:05 PDT
Alex Christensen
Comment 10 2021-08-26 17:40:24 PDT
Still no data migration, but this is getting closer.
Kate Cheney
Comment 11 2021-08-27 10:33:44 PDT
Comment on attachment 436593 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=436593&action=review Ok, took a closer pass today and I think this is almost there. I suggested one optimization for clearing now that we have a new schema, and had a few other nits. Then the grandfathering API tests seem to be failing as well. Grandfathering happens in WebResourceLoadStatisticsStore::populateMemoryStoreFromDisk() based on whether the file is new or not, so maybe a bug in the database open code? > Source/WebKit/NetworkProcess/DatabaseUtilities.cpp:110 > + return createdNewFile; This return is not needed. > Source/WebKit/NetworkProcess/PrivateClickMeasurement/PrivateClickMeasurementDatabase.cpp:425 > +String Database::attributionToString(WebCore::SQLiteStatement* statement, PrivateClickMeasurementAttributionType attributionType) Optional, could add ForTesting at the end of this function. > Source/WebKit/NetworkProcess/PrivateClickMeasurement/PrivateClickMeasurementDatabase.cpp:482 > + ASSERT(!RunLoop::isMain()); Now that we have a separate table with all PCM domains and foreign key constraints, we can create one query to remove the domains from the PCMObservedDomains table to accomplish this task. Then we could replace this function with something simpler, like this: void Database::clearPrivateClickMeasurement(std::optional<RegistrableDomain> domain) { ASSERT(!RunLoop::isMain()); // Default to clear all entries if no domain is specified. String bindParameter = "%"; if (domain) { auto domainIDToMatch = domainID(*domain); if (!domainIDToMatch) return; bindParameter = String::number(*domainIDToMatch); } auto transactionScope = beginTransactionIfNecessary(); auto clearAllPrivateClickMeasurementScopedStatement = this->scopedStatement(m_ clearAllPrivateClickMeasurementStatement, clearAllPrivateClickMeasurementQuery, "clearPrivateClickMeasurement"_s); if (!clearAllPrivateClickMeasurementScopedStatement || clearAllPrivateClickMeasurementScopedStatement->bindText(1, bindParameter) != SQLITE_OK || clearAllPrivateClickMeasurementScopedStatement->step() != SQLITE_DONE) { ITP_RELEASE_LOG_ERROR(m_sessionID, "%p - ResourceLoadStatisticsStore::clearPrivateClickMeasurement clearAllPrivateClickMeasurementScopedStatement, error message: %" PRIVATE_LOG_STRING, this, m_database.lastErrorMsg()); ASSERT_NOT_REACHED(); } } You will need a new clearAllPrivateClickMeasurementQuery (DELETE FROM PCMObservedDomains WHERE domainID LIKE ?) and statement that remove the domains from the PCMObservedDomains table, but can remove the two others used in this function. > Source/WebKit/NetworkProcess/PrivateClickMeasurement/PrivateClickMeasurementDatabase.cpp:679 > + RELEASE_LOG_ERROR(PrivateClickMeasurement, "%p - Database::insertObservedDomain failed to bind, error message: %" PRIVATE_LOG_STRING, this, m_database.lastErrorMsg()); Database::insertObservedDomain -> Database::ensureDomainID > Source/WebKit/NetworkProcess/PrivateClickMeasurement/PrivateClickMeasurementDatabase.cpp:685 > + RELEASE_LOG_ERROR(PrivateClickMeasurement, "%p - Database::insertObservedDomain failed to commit, error message: %" PRIVATE_LOG_STRING, this, m_database.lastErrorMsg()); Ditto. > Source/WebKit/NetworkProcess/PrivateClickMeasurement/PrivateClickMeasurementStore.cpp:102 > + postTask([this, protectedThis = makeRef(*this), sourceSite = sourceSite.isolatedCopy(), destinationSite = destinationSite.isolatedCopy(), attributionTriggerData = WTFMove(attributionTriggerData), ephemeralMeasurement = crossThreadCopy(ephemeralMeasurement), completionHandler = WTFMove(completionHandler)] () mutable { Why a crossThreadCopy of ephemeralMeasurement instead of an isolatedCopy?
Kate Cheney
Comment 12 2021-08-27 10:38:13 PDT
And, I think doing migration in a separate patch is fine, this one is quite big.
Kate Cheney
Comment 13 2021-08-27 11:18:35 PDT
Comment on attachment 436593 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=436593&action=review > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:310 > enableForeignKeys(); You should add this in the PCM database as well, so you can use PCMObservedDomains to perform cascading deletions.
Alex Christensen
Comment 14 2021-08-27 11:38:53 PDT
(In reply to Kate Cheney from comment #11) > > + postTask([this, protectedThis = makeRef(*this), sourceSite = sourceSite.isolatedCopy(), destinationSite = destinationSite.isolatedCopy(), attributionTriggerData = WTFMove(attributionTriggerData), ephemeralMeasurement = crossThreadCopy(ephemeralMeasurement), completionHandler = WTFMove(completionHandler)] () mutable { > > Why a crossThreadCopy of ephemeralMeasurement instead of an isolatedCopy? ephemeralMeasurement is a std::optional, which has no isolatedCopy member. crossThreadCopy deals with this and calls isolatedCopy() of the contained type if the optional is engaged.
Alex Christensen
Comment 15 2021-08-27 11:39:30 PDT
Kate Cheney
Comment 16 2021-08-27 12:01:21 PDT
Comment on attachment 436646 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=436646&action=review r=me as soon as EWS is happy. > Source/WebKit/NetworkProcess/DatabaseUtilities.cpp:109 > + return createdNewFile; This return is still not needed :P
Alex Christensen
Comment 17 2021-08-27 12:08:45 PDT
Alex Christensen
Comment 18 2021-08-27 14:22:52 PDT
Radar WebKit Bug Importer
Comment 19 2021-08-27 14:23:21 PDT
Note You need to log in before you can comment on or make changes to this bug.