Bug 223726

Summary: Implement PCM SQLite changes based on spec review
Product: WebKit Reporter: Kate Cheney <katherine_cheney>
Component: WebKit Misc.Assignee: Kate Cheney <katherine_cheney>
Status: RESOLVED FIXED    
Severity: Normal CC: beidson, bfulgham, sihui_liu, webkit-bug-importer, wilander
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Patch
none
Patch for landing none

Description Kate Cheney 2021-03-24 20:06:29 PDT
We should update some things in PCM’s SQLite database based on spec review.
Comment 1 Kate Cheney 2021-03-24 20:08:15 PDT
Created attachment 424212 [details]
Patch
Comment 2 Kate Cheney 2021-03-25 13:03:27 PDT
<rdar://problem/75818526>
Comment 3 Kate Cheney 2021-03-25 13:06:45 PDT
cq- was for an unrelated flaky test.
Comment 4 John Wilander 2021-03-25 14:58:08 PDT
Comment on attachment 424212 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=424212&action=review

Ugh. This is why I don't like databases. But I do like folks like yourself who are proficient at it and willing to make these kind of changes. "Name change? Let me just write hundreds of lines of code." I'll let a DB expert formally review this. From a spec standpoint, this looks good.

> Source/WebKit/ChangeLog:11
> +        based on spec review. Second, it adds support for sending reports to

You can add this link as a reference for anyone who wants to dig into the spec and name change: https://trac.webkit.org/changeset/275046/webkit
Comment 5 Brady Eidson 2021-03-25 16:43:46 PDT
SQLite stuff looks fine, but I didn't review the rest in depth.
Comment 6 Sihui Liu 2021-03-25 16:44:13 PDT
Comment on attachment 424212 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=424212&action=review

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:284
> +static Vector<String> expectedUnattributedColumns()

static const Vector<String> expectedUnattributedColumns()
{
    static const auto columns = makeNeverDestroyed(Vector<String> { ... });
    return columns;
}

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:291
> +    return { "sourceSiteDomainID", "destinationSiteDomainID", "sourceID", "attributionTriggerData", "priority", "timeOfAdClick", "earliestTimeToSendToSource", "token", "signature", "keyID", "earliestTimeToSendToDestination" };

Ditto.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:296
> +    return !schema.isEmpty() && schema.contains("REFERENCES TopLevelDomains");

!schema.isEmpty() is not needed.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:451
> +    auto currentSchema = tableSchema("AttributedPrivateClickMeasurement");

You are checking a different database scheme with the change. How can we make sure the old table schema is correct if this check pass? 

It seems we can add missing tables at different times. Maybe add asserts that the other tables are correct or add separate checks for each.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:540
> +{

You may want to assert size of existingColumns is less than or equal to size of expectedColumns; and no existingColumns is in unexpected

> Source/WebKit/UIProcess/WebsiteData/WebsiteDataStore.h:199
> +    void statisticsDatabaseColumnsForTable(const String&, CompletionHandler<void(Vector<String>&&)>&&);

I don't quite like this SPI because it is checking the low-level data, and I would expect an high-level error when a column is missing in the table. 
For example the test may verify this like: if the database is upgraded successfully, we can get the data from the store; otherwise we should see a failure/error.
Comment 7 Kate Cheney 2021-03-25 17:04:59 PDT
Comment on attachment 424212 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=424212&action=review

>> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:284
>> +static Vector<String> expectedUnattributedColumns()
> 
> static const Vector<String> expectedUnattributedColumns()
> {
>     static const auto columns = makeNeverDestroyed(Vector<String> { ... });
>     return columns;
> }

Will fix.

>> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:291
>> +    return { "sourceSiteDomainID", "destinationSiteDomainID", "sourceID", "attributionTriggerData", "priority", "timeOfAdClick", "earliestTimeToSendToSource", "token", "signature", "keyID", "earliestTimeToSendToDestination" };
> 
> Ditto.

Ditto.

>> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:296
>> +    return !schema.isEmpty() && schema.contains("REFERENCES TopLevelDomains");
> 
> !schema.isEmpty() is not needed.

Will remove.

>> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:451
>> +    auto currentSchema = tableSchema("AttributedPrivateClickMeasurement");
> 
> You are checking a different database scheme with the change. How can we make sure the old table schema is correct if this check pass? 
> 
> It seems we can add missing tables at different times. Maybe add asserts that the other tables are correct or add separate checks for each.

This will be the last thing that happens -- first we will add missing tables, add missing columns, and rename columns, then we check for something that requires a full migration. In this case, attributedPrivateClickMeasurementSchemaV1(). I don't think I understand your comment fully.

I can add asserts for the other tables if we think that is not wasteful or excessive. I hesitated on doing that because most other tables haven't changed since the initial implementation.

>> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:540
>> +{
> 
> You may want to assert size of existingColumns is less than or equal to size of expectedColumns; and no existingColumns is in unexpected

Good idea.

>> Source/WebKit/UIProcess/WebsiteData/WebsiteDataStore.h:199
>> +    void statisticsDatabaseColumnsForTable(const String&, CompletionHandler<void(Vector<String>&&)>&&);
> 
> I don't quite like this SPI because it is checking the low-level data, and I would expect an high-level error when a column is missing in the table. 
> For example the test may verify this like: if the database is upgraded successfully, we can get the data from the store; otherwise we should see a failure/error.

I think this is good in theory. In practice it may be difficult because of the way we fail if there is a database error. We mostly rely on debug asserts. I think it would be good in the future to surface errors in this case.
Comment 8 Kate Cheney 2021-03-25 17:23:59 PDT
Created attachment 424306 [details]
Patch
Comment 9 Kate Cheney 2021-03-25 20:09:07 PDT
Created attachment 424312 [details]
Patch
Comment 10 Kate Cheney 2021-03-25 20:13:37 PDT
(In reply to katherine_cheney from comment #9)
> Created attachment 424312 [details]
> Patch

This patch fixes everything from the comments except removing the testing SPI. We don't have a great way to unit test SQLite right now, or surface errors to the UI process if a schema migration fails to do what Sihui suggested. I think an SPI to check the schema is the best way to test this right now, but we should consider adding better infrastructure to test in the future.
Comment 11 Kate Cheney 2021-03-26 09:11:51 PDT
Created attachment 424362 [details]
Patch
Comment 12 Brent Fulgham 2021-03-26 10:32:26 PDT
Comment on attachment 424362 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=424362&action=review

r=me. I have a few minor suggestions, but otherwise looks good.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:274
> +static const Vector<String> expectedUnattributedColumns()

Since this is a never-destroyed object, I would return a const reference to avoid making a copy. Also, why not use the ExpectedColumns type alias?

static const ExpectedColumns& expectedUnattributedColumns()

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:280
> +static const Vector<String> expectedAttributedColumns()

Ditto "const ExpectedColumns&"

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:444
> +    auto currentSchema = tableSchema(table);

Maybe:
auto currentSchema = tableSchema("AttributedPrivateClickMeasurement"_s)

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:456
> +    auto oldSchema = tableSchema(table);

I wonder if the compiler could do a better job if you didn't stack allocate 'table' every time you hit this call.

Maybe:
auto oldSchema = tableSchema("TopFrameUniqueRedirectsTo"_s);

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:536
> +    Vector<String> columns;

Depending on how many columns table tend to have, there might be benefit to hinting at the capacity of 'columns'. (Vector::reserveCapacity)

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:571
> +    auto attributedTableName = "AttributedPrivateClickMeasurement"_s;

I would suggest making these 'const'.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:600
> +    auto foundAttributedTableColumns = columnsForTable(attributedTableName);

I think all of the above could be 'const'.
Comment 13 Kate Cheney 2021-03-26 11:33:29 PDT
Comment on attachment 424362 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=424362&action=review

Thanks for the review!

>> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:274
>> +static const Vector<String> expectedUnattributedColumns()
> 
> Since this is a never-destroyed object, I would return a const reference to avoid making a copy. Also, why not use the ExpectedColumns type alias?
> 
> static const ExpectedColumns& expectedUnattributedColumns()

Oops -- good catch. Will fix.

>> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:280
>> +static const Vector<String> expectedAttributedColumns()
> 
> Ditto "const ExpectedColumns&"

Will fix.

>> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:444
>> +    auto currentSchema = tableSchema(table);
> 
> Maybe:
> auto currentSchema = tableSchema("AttributedPrivateClickMeasurement"_s)

Will fix.

>> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:456
>> +    auto oldSchema = tableSchema(table);
> 
> I wonder if the compiler could do a better job if you didn't stack allocate 'table' every time you hit this call.
> 
> Maybe:
> auto oldSchema = tableSchema("TopFrameUniqueRedirectsTo"_s);

Will fix.

>> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:536
>> +    Vector<String> columns;
> 
> Depending on how many columns table tend to have, there might be benefit to hinting at the capacity of 'columns'. (Vector::reserveCapacity)

I think this would require knowing the expected number of columns, which we don't know in all cases (like WebResourceLoadStatisticsStore::statisticsDatabaseColumnsForTable). I could always allocate the max number of columns, but I don't know enough about vector allocation to know whether this is better than not reserving capacity at all.

>> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:571
>> +    auto attributedTableName = "AttributedPrivateClickMeasurement"_s;
> 
> I would suggest making these 'const'.

Will fix.

>> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:600
>> +    auto foundAttributedTableColumns = columnsForTable(attributedTableName);
> 
> I think all of the above could be 'const'.

Ditto.
Comment 14 Kate Cheney 2021-03-26 11:39:58 PDT
Created attachment 424384 [details]
Patch for landing
Comment 15 EWS 2021-03-26 12:14:08 PDT
Committed r275106: <https://commits.webkit.org/r275106>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 424384 [details].