Bug 203432

Summary: Expose basic ITP data from the database for future API/SPI use
Product: WebKit Reporter: Kate Cheney <katherine_cheney>
Component: WebKit Misc.Assignee: Kate Cheney <katherine_cheney>
Status: RESOLVED FIXED    
Severity: Normal CC: bfulgham, cdumez, commit-queue, dbates, ews-watchlist, Hironori.Fujii, japhet, wilander
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch none

Description Kate Cheney 2019-10-25 13:02:03 PDT
Expose basic ITP data from the database for future API/SPI use. The details can be changed and developed as new needs arise for the Safari ITP UI or any other potential clients.
Comment 1 Kate Cheney 2019-10-25 13:03:05 PDT
Associated with <rdar://problem/56085040>
Comment 2 Kate Cheney 2019-10-25 14:18:09 PDT
Created attachment 381964 [details]
Patch
Comment 3 Kate Cheney 2019-12-02 11:04:22 PST
Created attachment 384641 [details]
Patch
Comment 4 Kate Cheney 2019-12-02 11:41:19 PST
Created attachment 384643 [details]
Patch
Comment 5 Kate Cheney 2019-12-02 12:34:28 PST
Created attachment 384649 [details]
Patch
Comment 6 Brent Fulgham 2019-12-02 12:41:59 PST
Comment on attachment 384649 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=384649&action=review

> Source/WebCore/loader/ResourceLoadStatistics.cpp:420
> +    return interactionTimeSeconds > 0 && WallTime::now().secondsSinceEpoch().value() - interactionTimeSeconds < SECONDS_IN_DAY;

This could probably be written as:

return interactionTimeSeconds > 0 && WallTime::now().secondsSinceEpoch().value() - interactionTimeSeconds < 24_h;

Then you wouldn't need the repeated #define for SECONDS_IN_DAY

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:2425
> +    return interactionTimeSeconds > 0 && WallTime::now().secondsSinceEpoch().value() - interactionTimeSeconds < SECONDS_IN_DAY;

Ditto:

return interactionTimeSeconds > 0 && WallTime::now().secondsSinceEpoch().value() - interactionTimeSeconds < 24_h;
Comment 7 John Wilander 2019-12-02 14:29:51 PST
Comment on attachment 384649 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=384649&action=review

> Source/WebCore/ChangeLog:17
> +

Nit: I would reverse this to "Updated the toString() to report only if it was in the last 24 hours as opposed to reporting an actual time since mostRecentUserInteractionTime changes with each test run."

> Source/WebKit/ChangeLog:14
> +        Adds ability to collect sorted ITP data for displaying in a

Nit: "Adds *the ability …"

> Source/WebKit/ChangeLog:15
> +        user interface. Additional registrable domain data can be added based

Nit: Not sure that we have to call out potential use cases since that's up to whichever application calls the API/SPI.

> Source/WebKit/ChangeLog:30
> +        (WebKit::CompletionHandler<void):

This looks weird. I know the change log script gets confused by the ResourceLoadStatisticsDatabaseStore class. Please check the change entries manually. (I had to do so the last time.)

> Source/WebKit/ChangeLog:33
> +        joinSubStatisticsForSorting() to be used to collect sorted ITP data

Please add the class to the function name so that it's clear where it belongs.

> Source/WebCore/loader/ResourceLoadStatistics.cpp:36
> +#define SECONDS_IN_DAY 86400

Skip the define and use 24_h instead. Search for 24_h and you'll see it used. Ah, I now saw Brent pointing out the same thing!

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:66
> +#define SECONDS_IN_DAY 86400

Ditto.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:809
> +static bool isThirdParty(unsigned timesUnderFirstParty)

By the look of its implementation, this function should probably be called hasBeenThirdParty().

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsMemoryStore.cpp:126
> +static bool isThirdParty(const ResourceLoadStatistics& statistic)

Similar, hasBeenThirdParty().

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsMemoryStore.cpp:556
> +        result.append("\nITP User Interface Data:\n");

I'd rather have this string say what we know in WebKit, such as just "ITP Data:".

> Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.h:82
> +    bool storageAccessGranted;

This doesn't really make sense to me since only third-parties can request and be granted storage access.

> Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.h:97
> +    Vector<FirstPartyData> associatedFirstParties;

This is vague. Associated how? We've used the term for domain names owned by the same organization which I don't think you mean here. Maybe underFirstParties?

> Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.h:236
> +    Vector<ThirdPartyData> gatherDataForUserInterface();

This sounds like WebKit decides where this data is to be used. But it's just an API, right? Also, it's a getter but it sounds like an action. We don't know what the data will be used for. Let's name it what it is, such as aggregatedThirdPartyData().
Comment 8 Kate Cheney 2019-12-02 14:45:43 PST
Comment on attachment 384649 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=384649&action=review

Thanks for the feedback!

>> Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.h:82
>> +    bool storageAccessGranted;
> 
> This doesn't really make sense to me since only third-parties can request and be granted storage access.

good point, maybe "hasGrantedStorageAccess" instead?

>> Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.h:97
>> +    Vector<FirstPartyData> associatedFirstParties;
> 
> This is vague. Associated how? We've used the term for domain names owned by the same organization which I don't think you mean here. Maybe underFirstParties?

good idea!
Comment 9 John Wilander 2019-12-02 15:00:53 PST
(In reply to katherine_cheney from comment #8)
> Comment on attachment 384649 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=384649&action=review
> 
> Thanks for the feedback!
> 
> >> Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.h:82
> >> +    bool storageAccessGranted;
> > 
> > This doesn't really make sense to me since only third-parties can request and be granted storage access.
> 
> good point, maybe "hasGrantedStorageAccess" instead?

Since the first party is not involved in granting storage access (except if it uses an iframe sandbox), I'd prefer if it's clear that it's about third-parties that have requested and been granted access under some first-party.

Maybe we can come up with something if you explain what this boolean means. It's part of FirstPartyData and can only be true/false. Is it "This first party has at least once been granted storage access when third-party"?


> >> Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.h:97
> >> +    Vector<FirstPartyData> associatedFirstParties;
> > 
> > This is vague. Associated how? We've used the term for domain names owned by the same organization which I don't think you mean here. Maybe underFirstParties?
> 
> good idea!
Comment 10 Kate Cheney 2019-12-02 15:04:58 PST
(In reply to John Wilander from comment #9)
> (In reply to katherine_cheney from comment #8)
> > Comment on attachment 384649 [details]
> > Patch
> > 
> > View in context:
> > https://bugs.webkit.org/attachment.cgi?id=384649&action=review
> > 
> > Thanks for the feedback!
> > 
> > >> Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.h:82
> > >> +    bool storageAccessGranted;
> > > 
> > > This doesn't really make sense to me since only third-parties can request and be granted storage access.
> > 
> > good point, maybe "hasGrantedStorageAccess" instead?
> 
> Since the first party is not involved in granting storage access (except if
> it uses an iframe sandbox), I'd prefer if it's clear that it's about
> third-parties that have requested and been granted access under some
> first-party.
> 
> Maybe we can come up with something if you explain what this boolean means.
> It's part of FirstPartyData and can only be true/false. Is it "This first
> party has at least once been granted storage access when third-party"?
> 

It's based on the data structure we talked about a few weeks ago with Matt, where each third party has a list of first parties based on what it has been a subframe/subresource under or redirected to. So this bool is whether the third party that this first party is listed under has been granted storage access for that specific first party
Comment 11 John Wilander 2019-12-02 15:17:35 PST
(In reply to katherine_cheney from comment #10)
> (In reply to John Wilander from comment #9)
> > (In reply to katherine_cheney from comment #8)
> > > Comment on attachment 384649 [details]
> > > Patch
> > > 
> > > View in context:
> > > https://bugs.webkit.org/attachment.cgi?id=384649&action=review
> > > 
> > > Thanks for the feedback!
> > > 
> > > >> Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.h:82
> > > >> +    bool storageAccessGranted;
> > > > 
> > > > This doesn't really make sense to me since only third-parties can request and be granted storage access.
> > > 
> > > good point, maybe "hasGrantedStorageAccess" instead?
> > 
> > Since the first party is not involved in granting storage access (except if
> > it uses an iframe sandbox), I'd prefer if it's clear that it's about
> > third-parties that have requested and been granted access under some
> > first-party.
> > 
> > Maybe we can come up with something if you explain what this boolean means.
> > It's part of FirstPartyData and can only be true/false. Is it "This first
> > party has at least once been granted storage access when third-party"?
> > 
> 
> It's based on the data structure we talked about a few weeks ago with Matt,
> where each third party has a list of first parties based on what it has been
> a subframe/subresource under or redirected to. So this bool is whether the
> third party that this first party is listed under has been granted storage
> access for that specific first party

I see. The confusion comes from the boolean about storage access being about a third-party but stored in a struct about a first-party. The boolean needs to be owned about the third-party struct or the FirstPartyData needs to have a name that indicates that it's about the third-party.

An easy way to think of this is to look at each of your two structs alone. It should make sense on its own. As it stands, the FirstPartyData struct only makes sense in conjunction with the ThirdPartyData struct.

I suggest you skip the FirstPartyData struct and instead …

1. Add a boolean enum HasBeenGrantedStorageAccess { Yes, No }.

2. Replace ThirdPartyData's Vector<FirstPartyData> with a HashMap of <RegistrableDomain, HasBeenGrantedStorageAccess>.
Comment 12 Kate Cheney 2019-12-02 15:22:28 PST
(In reply to John Wilander from comment #11)
> (In reply to katherine_cheney from comment #10)
> > (In reply to John Wilander from comment #9)
> > > (In reply to katherine_cheney from comment #8)
> > > > Comment on attachment 384649 [details]
> > > > Patch
> > > > 
> > > > View in context:
> > > > https://bugs.webkit.org/attachment.cgi?id=384649&action=review
> > > > 
> > > > Thanks for the feedback!
> > > > 
> > > > >> Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.h:82
> > > > >> +    bool storageAccessGranted;
> > > > > 
> > > > > This doesn't really make sense to me since only third-parties can request and be granted storage access.
> > > > 
> > > > good point, maybe "hasGrantedStorageAccess" instead?
> > > 
> > > Since the first party is not involved in granting storage access (except if
> > > it uses an iframe sandbox), I'd prefer if it's clear that it's about
> > > third-parties that have requested and been granted access under some
> > > first-party.
> > > 
> > > Maybe we can come up with something if you explain what this boolean means.
> > > It's part of FirstPartyData and can only be true/false. Is it "This first
> > > party has at least once been granted storage access when third-party"?
> > > 
> > 
> > It's based on the data structure we talked about a few weeks ago with Matt,
> > where each third party has a list of first parties based on what it has been
> > a subframe/subresource under or redirected to. So this bool is whether the
> > third party that this first party is listed under has been granted storage
> > access for that specific first party
> 
> I see. The confusion comes from the boolean about storage access being about
> a third-party but stored in a struct about a first-party. The boolean needs
> to be owned about the third-party struct or the FirstPartyData needs to have
> a name that indicates that it's about the third-party.
> 
> An easy way to think of this is to look at each of your two structs alone.
> It should make sense on its own. As it stands, the FirstPartyData struct
> only makes sense in conjunction with the ThirdPartyData struct.
> 
> I suggest you skip the FirstPartyData struct and instead …
> 
> 1. Add a boolean enum HasBeenGrantedStorageAccess { Yes, No }.
> 
> 2. Replace ThirdPartyData's Vector<FirstPartyData> with a HashMap of
> <RegistrableDomain, HasBeenGrantedStorageAccess>.

I see your point about how it should make sense alone, but my only thought is if we want to add more data about each third-party/first-party pair in the future, like a last seen time. Then we would have to rethink the hash map.
Comment 13 John Wilander 2019-12-02 15:33:47 PST
(In reply to katherine_cheney from comment #12)
> (In reply to John Wilander from comment #11)
> > (In reply to katherine_cheney from comment #10)
> > > (In reply to John Wilander from comment #9)
> > > > (In reply to katherine_cheney from comment #8)
> > > > > Comment on attachment 384649 [details]
> > > > > Patch
> > > > > 
> > > > > View in context:
> > > > > https://bugs.webkit.org/attachment.cgi?id=384649&action=review
> > > > > 
> > > > > Thanks for the feedback!
> > > > > 
> > > > > >> Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.h:82
> > > > > >> +    bool storageAccessGranted;
> > > > > > 
> > > > > > This doesn't really make sense to me since only third-parties can request and be granted storage access.
> > > > > 
> > > > > good point, maybe "hasGrantedStorageAccess" instead?
> > > > 
> > > > Since the first party is not involved in granting storage access (except if
> > > > it uses an iframe sandbox), I'd prefer if it's clear that it's about
> > > > third-parties that have requested and been granted access under some
> > > > first-party.
> > > > 
> > > > Maybe we can come up with something if you explain what this boolean means.
> > > > It's part of FirstPartyData and can only be true/false. Is it "This first
> > > > party has at least once been granted storage access when third-party"?
> > > > 
> > > 
> > > It's based on the data structure we talked about a few weeks ago with Matt,
> > > where each third party has a list of first parties based on what it has been
> > > a subframe/subresource under or redirected to. So this bool is whether the
> > > third party that this first party is listed under has been granted storage
> > > access for that specific first party
> > 
> > I see. The confusion comes from the boolean about storage access being about
> > a third-party but stored in a struct about a first-party. The boolean needs
> > to be owned about the third-party struct or the FirstPartyData needs to have
> > a name that indicates that it's about the third-party.
> > 
> > An easy way to think of this is to look at each of your two structs alone.
> > It should make sense on its own. As it stands, the FirstPartyData struct
> > only makes sense in conjunction with the ThirdPartyData struct.
> > 
> > I suggest you skip the FirstPartyData struct and instead …
> > 
> > 1. Add a boolean enum HasBeenGrantedStorageAccess { Yes, No }.
> > 
> > 2. Replace ThirdPartyData's Vector<FirstPartyData> with a HashMap of
> > <RegistrableDomain, HasBeenGrantedStorageAccess>.
> 
> I see your point about how it should make sense alone, but my only thought
> is if we want to add more data about each third-party/first-party pair in
> the future, like a last seen time. Then we would have to rethink the hash
> map.

That is true. I'm often in favor of keeping the change as simple as possible for the task at hand.

If you prefer a struct for the the kind of data we'd want to catch/expose, we have to come up with a better name than FirstPartyData. The stand-alone test needs to pass. :)

Maybe ThirdPartyDataForSpecificFirstParty? That makes it clear that it's data associated with the third-party but tied to what that third-party has been up to under a specific first-party.
Comment 14 Kate Cheney 2019-12-02 15:46:52 PST
Created attachment 384665 [details]
Patch
Comment 15 John Wilander 2019-12-03 17:50:45 PST
Comment on attachment 384665 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=384665&action=review

Please address comments before landing.

> Source/WebCore/loader/ResourceLoadStatistics.cpp:418
> +    return interactionTimeSeconds > 0 && WallTime::now().secondsSinceEpoch().value() - interactionTimeSeconds < Seconds(24_h).value();

24_h is a Seconds object. Do we need to wrap it this way and call value()? If not, please change before landing.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:789
> +Vector<ThirdPartyDataForSpecificFirstParty> ResourceLoadStatisticsDatabaseStore::getFirstPartyDataForDomain(unsigned domainID, const RegistrableDomain& domainString) const

I would prefer this function name to be aligned with the return type. Is the domainString the third-party? Maybe we could make that clearer too? Anyway, I trust you'll come up with a new function name parameter name that make sense given what the function does and what it returns.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:2423
> +    return interactionTimeSeconds > 0 && WallTime::now().secondsSinceEpoch().value() - interactionTimeSeconds < Seconds(24_h).value();

Same thing here with the Seconds(24_h) wrapping. Needed?

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsMemoryStore.cpp:112
> +static Vector<ThirdPartyDataForSpecificFirstParty> getFirstPartyDataForDomain(const ResourceLoadStatistics& statistic)

Same thing here for the function name and parameter name.

> Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.h:80
> +struct ThirdPartyDataForSpecificFirstParty {

Much better.

> Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.h:86
> +        return makeString(firstPartyDomain.string(), ", Has Granted Storage Access: ", storageAccessGranted);

Hmm. Now I'm confused. The user grants storage access, neither the first or third party. And the third-party requests storage access, not the first party. So the first party is never involved in anything around requesting or granting storage access. The data point should say whether or not a third party has been granted storage access under a first party. So maybe a generated string saying "Has been granted storage access under { first-party domain string } : { boolean }." Unfortunately, this means regenerating expect files for your tests.
Comment 16 Kate Cheney 2019-12-04 10:00:31 PST
Created attachment 384822 [details]
Patch
Comment 17 Kate Cheney 2019-12-04 10:01:22 PST
Thanks for all the help John! I made the naming changes you suggested plus a few others that clarified the distinction between third/first party information. I also had to update interactionTimeSeconds to be WTF::Seconds type instead of a double so it can be compared directly to 24_h (and I think it makes sense to have it as a seconds value instead of a double because the function compares time values).

Going to let it run through EWS before I cq+
Comment 18 WebKit Commit Bot 2019-12-04 12:06:34 PST
Comment on attachment 384822 [details]
Patch

Clearing flags on attachment: 384822

Committed r253118: <https://trac.webkit.org/changeset/253118>
Comment 19 WebKit Commit Bot 2019-12-04 12:06:36 PST
All reviewed patches have been landed.  Closing bug.
Comment 20 Fujii Hironori 2019-12-04 19:55:13 PST
Filed: Bug 204870 – [MSVC] WebResourceLoadStatisticsStore.h is reporting warning C4804: '/': unsafe use of type 'bool' in operation