Bug 205121

Summary: Add timeStamp to ITP database
Product: WebKit Reporter: Kate Cheney <katherine_cheney>
Component: WebKit Misc.Assignee: Kate Cheney <katherine_cheney>
Status: RESOLVED FIXED    
Severity: Normal CC: bfulgham, cdumez, commit-queue, dbates, ews-watchlist, japhet, wilander
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch none

Description Kate Cheney 2019-12-11 09:42:19 PST
Add a way to store most-recently-updated timestamps in the ITP database
Comment 1 Kate Cheney 2019-12-11 09:42:50 PST
<rdar://problem/57633021>
Comment 2 Kate Cheney 2019-12-11 10:23:48 PST
Created attachment 385405 [details]
Patch
Comment 3 Kate Cheney 2019-12-11 13:31:58 PST
Created attachment 385433 [details]
Patch
Comment 4 John Wilander 2019-12-13 16:22:50 PST
Comment on attachment 385433 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=385433&action=review

It's hard for me to see what we're trying to achieve here. I think it is a timestamp of the latest time a third-party was loaded under a specific first-party either as a sub frame or as a subresource. The change log comment on "frame navigation" and "cross site load with link decoration" indicates that there are exceptions. Are there? Also, frame navigation is used for both top and sub frame navigations.

Shouldn't these changes be done in the persistent store too? Or are we certain we will never go back?

> Source/WebKit/ChangeLog:14
> +        2. On a cross site load with link decoration

Can you reason about why these two and not just page loads?

> Source/WebKit/ChangeLog:17
> +        should be exposed via API.

You probably want to land the bug fix in a separate patch. That's good practice to 1) not get bug fixes rolled out because something else needs to be rolled out. and 2) patches should have a single purpose if possible.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:829
> +    return timesUnderFirstParty > 0 && isPrevalentResource(domain);

This is the bug fix, right? Looks like something that could land in its own patch with an updated test case to make sure the API doesn't report non-prevalent things.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:1841
> +        return Seconds { -1 };

This is reusing what we interpret as "never," right? A comment on what -1 means would be good. But since you use it three times here I don't know. We should probably have a const declaration of -1 Seconds that spells out what it means.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:1853
> +        return Seconds { -1 };

Ditto.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:1857
> +        return Seconds { -1 };

Ditto.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:1859
> +    double max = m_getMostRecentlyUpdatedTimestampStatement.getColumnDouble(0);

The name max here is vague. If it were the result of calculating the max out of two values I'd get it but now it just gets a value. Would mostRecentlyUpdatedTimestamp work?

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:1861
> +    return Seconds { max };

Here max gets even more confusing.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsMemoryStore.cpp:108
> +    ThirdPartyDataForSpecificFirstParty thirdPartyDataForSpecificFirstParty { firstPartyDomain, thirdPartyHasStorageAccess, Seconds { -1 }};

Another case of the -1 Seconds issue.

> Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.h:88
> +        return makeString("Has been granted storage access under ", firstPartyDomain.string(), ": ", storageAccessGranted ? '1' : '0', "; Has been seen under ", firstPartyDomain.string(), " within 24 hours: ", WallTime::now().secondsSinceEpoch() - timeLastUpdated < 24_h ? '1' : '0');

I think "in the last 24 hours" is better than "within."

> LayoutTests/ChangeLog:12
> +        prevalent.

Ah, so there's your test of the bug fix.

> LayoutTests/ChangeLog:1624
> +        [ MacOS ] http/tests/resourceLoadStatistics/cookie-deletion.html is timing out

This is unrelated, right? Better to land in an unreviewed test gardening patch.

> LayoutTests/ChangeLog:2192
> +        [ MacOS ] http/tests/resourceLoadStatistics/cookie-deletion.html is timing out

Ditto.
Comment 5 Kate Cheney 2019-12-16 09:23:11 PST
Comment on attachment 385433 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=385433&action=review

Thanks John! I think it would be best to make any updates to the memory store in a separate patch to keep this one smaller.

>> Source/WebKit/ChangeLog:14
>> +        2. On a cross site load with link decoration
> 
> Can you reason about why these two and not just page loads?

I did not include subresource loading in this list, but this patch will still update the timestamp for subresource loads when those statistics are merged into the database. I will update the ChangeLog to reflect this.

>> Source/WebKit/ChangeLog:17
>> +        should be exposed via API.
> 
> You probably want to land the bug fix in a separate patch. That's good practice to 1) not get bug fixes rolled out because something else needs to be rolled out. and 2) patches should have a single purpose if possible.

Good idea, I will put this in a new patch.

>> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:1841
>> +        return Seconds { -1 };
> 
> This is reusing what we interpret as "never," right? A comment on what -1 means would be good. But since you use it three times here I don't know. We should probably have a const declaration of -1 Seconds that spells out what it means.

Good idea. I'll add something like #define NO_EXISTING_TIMESTAMP -1
Comment 6 Kate Cheney 2019-12-16 10:14:13 PST
Created attachment 385778 [details]
Patch
Comment 7 Kate Cheney 2019-12-16 10:25:14 PST
Created attachment 385782 [details]
Patch
Comment 8 Kate Cheney 2019-12-16 17:08:56 PST
> > You probably want to land the bug fix in a separate patch. That's good practice to 1) not get bug fixes rolled out because something else needs to be rolled out. and 2) patches should have a single purpose if possible.
> 
> Good idea, I will put this in a new patch.
> 

Link:
https://bugs.webkit.org/show_bug.cgi?id=205281
Comment 9 Kate Cheney 2019-12-17 15:54:00 PST
Created attachment 385925 [details]
Patch
Comment 10 Kate Cheney 2019-12-17 15:54:16 PST
making this patch slightly bigger, but now that https://bugs.webkit.org/show_bug.cgi?id=204932 has landed, this patch should include the timestamp in the new API classes.
Comment 11 Kate Cheney 2019-12-19 14:16:08 PST
*** Bug 205289 has been marked as a duplicate of this bug. ***
Comment 12 John Wilander 2019-12-19 16:56:14 PST
Comment on attachment 385925 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=385925&action=review

I think we're good here. Please see my comments and questions before landing.

> Source/WebKit/ChangeLog:13
> +        _getResourceLoadStatisticsDataSummaryAPI.

Missing space before API.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:664
> +        if (insertRelationshipStatement.bindDouble(2, WallTime::now().secondsSinceEpoch().value()) != SQLITE_OK) {

Just double checking:
1) Does this cover all the types of data ITP collects? Subresource, sub frame, web sockets (which map to subresource)?
2) Is this timestamp double-keyed so that it's for the third-party under a specific first-party?

I was expecting a timestamp capture where we capture the loads but you can absolutely do it at the sink too.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:1859
> +    double mostRecentlyUpdatedTimestamp = m_getMostRecentlyUpdatedTimestampStatement.getColumnDouble(0);

Much clearer variable name.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsMemoryStore.cpp:108
> +    WebResourceLoadStatisticsStore::ThirdPartyDataForSpecificFirstParty thirdPartyDataForSpecificFirstParty { firstPartyDomain, thirdPartyHasStorageAccess, Seconds { NO_EXISTING_TIMESTAMP }};

Thanks for introducing a constant. See my comment on what its definition though.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsStore.h:42
> +#define NO_EXISTING_TIMESTAMP -1

I think this should instead be …

static constexpr Seconds NoExistingTimestamp = { -1 };

… in the ResourceLoadStatistics namespace and then used as ResourceLoadStatistics::NoExistingTimestamp. Note that you then don't have to instantiate new Seconds objects in all those places.

See for instance LoggedInStatus::TimeToLiveShort and WatchDog::noTimeLimit.

> Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.h:140
> +            return WTF::nullopt;

Will this lead to us dumping all the existing data? Because it won't have these timestamps. In the case of ResourceLoadStatistics.cpp, there's a modelVersion to check for and a graceful import of data when the source is less than the new model.

> Tools/ChangeLog:10
> +        in the Resource Load Statistics database backend.

You're mixing Resource Load Statistics and ITP in this entry. Let's stick to one.

> Tools/ChangeLog:11
> +        This also adds an api test case using the ITP database store.

api -> API

> Tools/TestWebKitAPI/Tests/WebKitCocoa/ResourceLoadStatistics.mm:901
> +    // Teach ITP about bad origins:

Change colon to period.

> Tools/TestWebKitAPI/Tests/WebKitCocoa/ResourceLoadStatistics.mm:923
> +    // capture time for comparison later

Capital c and period.

> Tools/TestWebKitAPI/Tests/WebKitCocoa/ResourceLoadStatistics.mm:926
> +    

Remove double linefeed.

> Tools/TestWebKitAPI/Tests/WebKitCocoa/ResourceLoadStatistics.mm:942
> +

Remove double linefeed.

> Tools/TestWebKitAPI/Tests/WebKitCocoa/ResourceLoadStatistics.mm:1020
> +    // it appears under or redirects to

Period.

> Tools/TestWebKitAPI/Tests/WebKitCocoa/ResourceLoadStatistics.mm:1049
> +        // check timestamp for evil3 is reported as being within the correct range

Start with capital C and end with a period.

> Tools/TestWebKitAPI/Tests/WebKitCocoa/ResourceLoadStatistics.mm:1083
> +        // check timestamp for evil1 is reported as being within the correct range

Start with capital C and end with a period.

> Tools/TestWebKitAPI/Tests/WebKitCocoa/ResourceLoadStatistics.mm:1108
> +        // check timestamp for evil2 is reported as being within the correct range

Start with capital C and end with a period.
Comment 13 Kate Cheney 2019-12-19 17:16:30 PST
Comment on attachment 385925 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=385925&action=review

Thanks! Addressing these before uploading.

>> Source/WebKit/ChangeLog:13
>> +        _getResourceLoadStatisticsDataSummaryAPI.
> 
> Missing space before API.

good catch!

>> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:664
>> +        if (insertRelationshipStatement.bindDouble(2, WallTime::now().secondsSinceEpoch().value()) != SQLITE_OK) {
> 
> Just double checking:
> 1) Does this cover all the types of data ITP collects? Subresource, sub frame, web sockets (which map to subresource)?
> 2) Is this timestamp double-keyed so that it's for the third-party under a specific first-party?
> 
> I was expecting a timestamp capture where we capture the loads but you can absolutely do it at the sink too.

1) It captures the following relationships: subresourceUnderTopFrame, subFrameUnderTopFrame, subresourceUniqueRedirectTo, and crossSiteLoadWithLinkDecoration. I see web sockets are added to subresourcesUnderTopFrameDomains in WebResourceLoadObserver.cpp, so they will be captured with a timestamp.

2) the schema is | thirdParty | firstParty | timestamp |, so the timestamp will always be associated with a specific pair.

I thought about that to, but this came out a lot cleaner and has everything in one place (as opposed to trying to log subresource and subframe loads at different places).

>> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsStore.h:42
>> +#define NO_EXISTING_TIMESTAMP -1
> 
> I think this should instead be …
> 
> static constexpr Seconds NoExistingTimestamp = { -1 };
> 
> … in the ResourceLoadStatistics namespace and then used as ResourceLoadStatistics::NoExistingTimestamp. Note that you then don't have to instantiate new Seconds objects in all those places.
> 
> See for instance LoggedInStatus::TimeToLiveShort and WatchDog::noTimeLimit.

Noted, I will change this

>> Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.h:140
>> +            return WTF::nullopt;
> 
> Will this lead to us dumping all the existing data? Because it won't have these timestamps. In the case of ResourceLoadStatistics.cpp, there's a modelVersion to check for and a graceful import of data when the source is less than the new model.

Thats a very good point, I'll try to copy that logic.

>> Tools/ChangeLog:10
>> +        in the Resource Load Statistics database backend.
> 
> You're mixing Resource Load Statistics and ITP in this entry. Let's stick to one.

Agreed.
Comment 14 Kate Cheney 2019-12-19 17:37:51 PST
> >> Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.h:140
> >> +            return WTF::nullopt;
> > 
> > Will this lead to us dumping all the existing data? Because it won't have these timestamps. In the case of ResourceLoadStatistics.cpp, there's a modelVersion to check for and a graceful import of data when the source is less than the new model.
> 
> Thats a very good point, I'll try to copy that logic.

Update: John and I talked offline and this is a larger fix that will be a separate patch.
Comment 15 Kate Cheney 2019-12-19 17:47:14 PST
Created attachment 386170 [details]
Patch
Comment 16 Kate Cheney 2019-12-19 17:48:00 PST
Going to let bots run their course before landing
Comment 17 WebKit Commit Bot 2019-12-20 00:11:51 PST
Comment on attachment 386170 [details]
Patch

Rejecting attachment 386170 [details] from commit-queue.

Failed to run "['/Volumes/Data/EWS/WebKit/Tools/Scripts/webkit-patch', '--status-host=webkit-queues.webkit.org', '--bot-id=webkit-cq-03', 'apply-attachment', '--no-update', '--non-interactive', 386170, '--port=mac']" exit_code: 2 cwd: /Volumes/Data/EWS/WebKit

Logging in as commit-queue@webkit.org...
Fetching: https://bugs.webkit.org/attachment.cgi?id=386170&action=edit
Fetching: https://bugs.webkit.org/show_bug.cgi?id=205121&ctype=xml&excludefield=attachmentdata
Processing 1 patch from 1 bug.
Processing patch 386170 from bug 205121.
Fetching: https://bugs.webkit.org/attachment.cgi?id=386170
Failed to run "[u'/Volumes/Data/EWS/WebKit/Tools/Scripts/svn-apply', '--force']" exit_code: 1 cwd: /Volumes/Data/EWS/WebKit

Parsed 20 diffs from patch file(s).
patching file Source/WebKit/ChangeLog
patching file Source/WebCore/loader/ResourceLoadStatistics.h
patching file Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp
patching file Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.h
patching file Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsMemoryStore.cpp
patching file Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.h
patching file Source/WebKit/UIProcess/API/APIResourceLoadStatisticsFirstParty.h
patching file Source/WebKit/UIProcess/API/Cocoa/WKWebViewConfigurationPrivate.h
patching file Source/WebKit/UIProcess/API/Cocoa/WKWebsiteDataStore.mm
patching file Source/WebKit/UIProcess/API/Cocoa/WKWebsiteDataStorePrivate.h
patching file Source/WebKit/UIProcess/API/Cocoa/_WKResourceLoadStatisticsFirstParty.h
patching file Source/WebKit/UIProcess/API/Cocoa/_WKResourceLoadStatisticsFirstParty.mm
patching file Tools/ChangeLog
Hunk #1 succeeded at 1 with fuzz 3.
patching file Tools/TestWebKitAPI/Tests/WebKitCocoa/ResourceLoadStatistics.mm
patching file LayoutTests/ChangeLog
Hunk #1 succeeded at 1 with fuzz 3.
Hunk #2 succeeded at 2772 (offset 60 lines).
Hunk #3 succeeded at 3340 (offset 60 lines).
patching file LayoutTests/http/tests/resourceLoadStatistics/aggregate-sorted-data-no-storage-access-database-expected.txt
patching file LayoutTests/http/tests/resourceLoadStatistics/aggregate-sorted-data-no-storage-access-expected.txt
patching file LayoutTests/http/tests/storageAccess/aggregate-sorted-data-with-storage-access-database-expected.txt
patching file LayoutTests/http/tests/storageAccess/aggregate-sorted-data-with-storage-access-expected.txt
patching file 'WebKit.xcworkspace/xcshareddata/xcschemes/All Source.xcscheme'
Hunk #1 FAILED at 57.
1 out of 1 hunk FAILED -- saving rejects to file 'WebKit.xcworkspace/xcshareddata/xcschemes/All Source.xcscheme.rej'

Failed to run "[u'/Volumes/Data/EWS/WebKit/Tools/Scripts/svn-apply', '--force']" exit_code: 1 cwd: /Volumes/Data/EWS/WebKit

Full output: https://webkit-queues.webkit.org/results/13293768
Comment 18 Kate Cheney 2019-12-20 21:09:14 PST
Created attachment 386286 [details]
Patch
Comment 19 WebKit Commit Bot 2019-12-21 06:58:09 PST
Comment on attachment 386286 [details]
Patch

Clearing flags on attachment: 386286

Committed r253863: <https://trac.webkit.org/changeset/253863>
Comment 20 WebKit Commit Bot 2019-12-21 06:58:11 PST
All reviewed patches have been landed.  Closing bug.