Bug 203432

Summary: Expose basic ITP data from the database for future API/SPI use
Product: WebKit Reporter: Kate Cheney <katherine_cheney>
Component: WebKit Misc.Assignee: Kate Cheney <katherine_cheney>
Status: RESOLVED FIXED    
Severity: Normal CC: bfulgham, cdumez, commit-queue, dbates, ews-watchlist, Hironori.Fujii, japhet, wilander
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch none

Kate Cheney
Reported 2019-10-25 13:02:03 PDT
Expose basic ITP data from the database for future API/SPI use. The details can be changed and developed as new needs arise for the Safari ITP UI or any other potential clients.
Attachments
Patch (26.94 KB, patch)
2019-10-25 14:18 PDT, Kate Cheney
no flags
Patch (60.49 KB, patch)
2019-12-02 11:04 PST, Kate Cheney
no flags
Patch (60.44 KB, patch)
2019-12-02 11:41 PST, Kate Cheney
no flags
Patch (65.34 KB, patch)
2019-12-02 12:34 PST, Kate Cheney
no flags
Patch (64.87 KB, patch)
2019-12-02 15:46 PST, Kate Cheney
no flags
Patch (65.88 KB, patch)
2019-12-04 10:00 PST, Kate Cheney
no flags
Kate Cheney
Comment 1 2019-10-25 13:03:05 PDT
Associated with <rdar://problem/56085040>
Kate Cheney
Comment 2 2019-10-25 14:18:09 PDT
Kate Cheney
Comment 3 2019-12-02 11:04:22 PST
Kate Cheney
Comment 4 2019-12-02 11:41:19 PST
Kate Cheney
Comment 5 2019-12-02 12:34:28 PST
Brent Fulgham
Comment 6 2019-12-02 12:41:59 PST
Comment on attachment 384649 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=384649&action=review > Source/WebCore/loader/ResourceLoadStatistics.cpp:420 > + return interactionTimeSeconds > 0 && WallTime::now().secondsSinceEpoch().value() - interactionTimeSeconds < SECONDS_IN_DAY; This could probably be written as: return interactionTimeSeconds > 0 && WallTime::now().secondsSinceEpoch().value() - interactionTimeSeconds < 24_h; Then you wouldn't need the repeated #define for SECONDS_IN_DAY > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:2425 > + return interactionTimeSeconds > 0 && WallTime::now().secondsSinceEpoch().value() - interactionTimeSeconds < SECONDS_IN_DAY; Ditto: return interactionTimeSeconds > 0 && WallTime::now().secondsSinceEpoch().value() - interactionTimeSeconds < 24_h;
John Wilander
Comment 7 2019-12-02 14:29:51 PST
Comment on attachment 384649 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=384649&action=review > Source/WebCore/ChangeLog:17 > + Nit: I would reverse this to "Updated the toString() to report only if it was in the last 24 hours as opposed to reporting an actual time since mostRecentUserInteractionTime changes with each test run." > Source/WebKit/ChangeLog:14 > + Adds ability to collect sorted ITP data for displaying in a Nit: "Adds *the ability …" > Source/WebKit/ChangeLog:15 > + user interface. Additional registrable domain data can be added based Nit: Not sure that we have to call out potential use cases since that's up to whichever application calls the API/SPI. > Source/WebKit/ChangeLog:30 > + (WebKit::CompletionHandler<void): This looks weird. I know the change log script gets confused by the ResourceLoadStatisticsDatabaseStore class. Please check the change entries manually. (I had to do so the last time.) > Source/WebKit/ChangeLog:33 > + joinSubStatisticsForSorting() to be used to collect sorted ITP data Please add the class to the function name so that it's clear where it belongs. > Source/WebCore/loader/ResourceLoadStatistics.cpp:36 > +#define SECONDS_IN_DAY 86400 Skip the define and use 24_h instead. Search for 24_h and you'll see it used. Ah, I now saw Brent pointing out the same thing! > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:66 > +#define SECONDS_IN_DAY 86400 Ditto. > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:809 > +static bool isThirdParty(unsigned timesUnderFirstParty) By the look of its implementation, this function should probably be called hasBeenThirdParty(). > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsMemoryStore.cpp:126 > +static bool isThirdParty(const ResourceLoadStatistics& statistic) Similar, hasBeenThirdParty(). > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsMemoryStore.cpp:556 > + result.append("\nITP User Interface Data:\n"); I'd rather have this string say what we know in WebKit, such as just "ITP Data:". > Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.h:82 > + bool storageAccessGranted; This doesn't really make sense to me since only third-parties can request and be granted storage access. > Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.h:97 > + Vector<FirstPartyData> associatedFirstParties; This is vague. Associated how? We've used the term for domain names owned by the same organization which I don't think you mean here. Maybe underFirstParties? > Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.h:236 > + Vector<ThirdPartyData> gatherDataForUserInterface(); This sounds like WebKit decides where this data is to be used. But it's just an API, right? Also, it's a getter but it sounds like an action. We don't know what the data will be used for. Let's name it what it is, such as aggregatedThirdPartyData().
Kate Cheney
Comment 8 2019-12-02 14:45:43 PST
Comment on attachment 384649 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=384649&action=review Thanks for the feedback! >> Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.h:82 >> + bool storageAccessGranted; > > This doesn't really make sense to me since only third-parties can request and be granted storage access. good point, maybe "hasGrantedStorageAccess" instead? >> Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.h:97 >> + Vector<FirstPartyData> associatedFirstParties; > > This is vague. Associated how? We've used the term for domain names owned by the same organization which I don't think you mean here. Maybe underFirstParties? good idea!
John Wilander
Comment 9 2019-12-02 15:00:53 PST
(In reply to katherine_cheney from comment #8) > Comment on attachment 384649 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=384649&action=review > > Thanks for the feedback! > > >> Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.h:82 > >> + bool storageAccessGranted; > > > > This doesn't really make sense to me since only third-parties can request and be granted storage access. > > good point, maybe "hasGrantedStorageAccess" instead? Since the first party is not involved in granting storage access (except if it uses an iframe sandbox), I'd prefer if it's clear that it's about third-parties that have requested and been granted access under some first-party. Maybe we can come up with something if you explain what this boolean means. It's part of FirstPartyData and can only be true/false. Is it "This first party has at least once been granted storage access when third-party"? > >> Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.h:97 > >> + Vector<FirstPartyData> associatedFirstParties; > > > > This is vague. Associated how? We've used the term for domain names owned by the same organization which I don't think you mean here. Maybe underFirstParties? > > good idea!
Kate Cheney
Comment 10 2019-12-02 15:04:58 PST
(In reply to John Wilander from comment #9) > (In reply to katherine_cheney from comment #8) > > Comment on attachment 384649 [details] > > Patch > > > > View in context: > > https://bugs.webkit.org/attachment.cgi?id=384649&action=review > > > > Thanks for the feedback! > > > > >> Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.h:82 > > >> + bool storageAccessGranted; > > > > > > This doesn't really make sense to me since only third-parties can request and be granted storage access. > > > > good point, maybe "hasGrantedStorageAccess" instead? > > Since the first party is not involved in granting storage access (except if > it uses an iframe sandbox), I'd prefer if it's clear that it's about > third-parties that have requested and been granted access under some > first-party. > > Maybe we can come up with something if you explain what this boolean means. > It's part of FirstPartyData and can only be true/false. Is it "This first > party has at least once been granted storage access when third-party"? > It's based on the data structure we talked about a few weeks ago with Matt, where each third party has a list of first parties based on what it has been a subframe/subresource under or redirected to. So this bool is whether the third party that this first party is listed under has been granted storage access for that specific first party
John Wilander
Comment 11 2019-12-02 15:17:35 PST
(In reply to katherine_cheney from comment #10) > (In reply to John Wilander from comment #9) > > (In reply to katherine_cheney from comment #8) > > > Comment on attachment 384649 [details] > > > Patch > > > > > > View in context: > > > https://bugs.webkit.org/attachment.cgi?id=384649&action=review > > > > > > Thanks for the feedback! > > > > > > >> Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.h:82 > > > >> + bool storageAccessGranted; > > > > > > > > This doesn't really make sense to me since only third-parties can request and be granted storage access. > > > > > > good point, maybe "hasGrantedStorageAccess" instead? > > > > Since the first party is not involved in granting storage access (except if > > it uses an iframe sandbox), I'd prefer if it's clear that it's about > > third-parties that have requested and been granted access under some > > first-party. > > > > Maybe we can come up with something if you explain what this boolean means. > > It's part of FirstPartyData and can only be true/false. Is it "This first > > party has at least once been granted storage access when third-party"? > > > > It's based on the data structure we talked about a few weeks ago with Matt, > where each third party has a list of first parties based on what it has been > a subframe/subresource under or redirected to. So this bool is whether the > third party that this first party is listed under has been granted storage > access for that specific first party I see. The confusion comes from the boolean about storage access being about a third-party but stored in a struct about a first-party. The boolean needs to be owned about the third-party struct or the FirstPartyData needs to have a name that indicates that it's about the third-party. An easy way to think of this is to look at each of your two structs alone. It should make sense on its own. As it stands, the FirstPartyData struct only makes sense in conjunction with the ThirdPartyData struct. I suggest you skip the FirstPartyData struct and instead … 1. Add a boolean enum HasBeenGrantedStorageAccess { Yes, No }. 2. Replace ThirdPartyData's Vector<FirstPartyData> with a HashMap of <RegistrableDomain, HasBeenGrantedStorageAccess>.
Kate Cheney
Comment 12 2019-12-02 15:22:28 PST
(In reply to John Wilander from comment #11) > (In reply to katherine_cheney from comment #10) > > (In reply to John Wilander from comment #9) > > > (In reply to katherine_cheney from comment #8) > > > > Comment on attachment 384649 [details] > > > > Patch > > > > > > > > View in context: > > > > https://bugs.webkit.org/attachment.cgi?id=384649&action=review > > > > > > > > Thanks for the feedback! > > > > > > > > >> Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.h:82 > > > > >> + bool storageAccessGranted; > > > > > > > > > > This doesn't really make sense to me since only third-parties can request and be granted storage access. > > > > > > > > good point, maybe "hasGrantedStorageAccess" instead? > > > > > > Since the first party is not involved in granting storage access (except if > > > it uses an iframe sandbox), I'd prefer if it's clear that it's about > > > third-parties that have requested and been granted access under some > > > first-party. > > > > > > Maybe we can come up with something if you explain what this boolean means. > > > It's part of FirstPartyData and can only be true/false. Is it "This first > > > party has at least once been granted storage access when third-party"? > > > > > > > It's based on the data structure we talked about a few weeks ago with Matt, > > where each third party has a list of first parties based on what it has been > > a subframe/subresource under or redirected to. So this bool is whether the > > third party that this first party is listed under has been granted storage > > access for that specific first party > > I see. The confusion comes from the boolean about storage access being about > a third-party but stored in a struct about a first-party. The boolean needs > to be owned about the third-party struct or the FirstPartyData needs to have > a name that indicates that it's about the third-party. > > An easy way to think of this is to look at each of your two structs alone. > It should make sense on its own. As it stands, the FirstPartyData struct > only makes sense in conjunction with the ThirdPartyData struct. > > I suggest you skip the FirstPartyData struct and instead … > > 1. Add a boolean enum HasBeenGrantedStorageAccess { Yes, No }. > > 2. Replace ThirdPartyData's Vector<FirstPartyData> with a HashMap of > <RegistrableDomain, HasBeenGrantedStorageAccess>. I see your point about how it should make sense alone, but my only thought is if we want to add more data about each third-party/first-party pair in the future, like a last seen time. Then we would have to rethink the hash map.
John Wilander
Comment 13 2019-12-02 15:33:47 PST
(In reply to katherine_cheney from comment #12) > (In reply to John Wilander from comment #11) > > (In reply to katherine_cheney from comment #10) > > > (In reply to John Wilander from comment #9) > > > > (In reply to katherine_cheney from comment #8) > > > > > Comment on attachment 384649 [details] > > > > > Patch > > > > > > > > > > View in context: > > > > > https://bugs.webkit.org/attachment.cgi?id=384649&action=review > > > > > > > > > > Thanks for the feedback! > > > > > > > > > > >> Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.h:82 > > > > > >> + bool storageAccessGranted; > > > > > > > > > > > > This doesn't really make sense to me since only third-parties can request and be granted storage access. > > > > > > > > > > good point, maybe "hasGrantedStorageAccess" instead? > > > > > > > > Since the first party is not involved in granting storage access (except if > > > > it uses an iframe sandbox), I'd prefer if it's clear that it's about > > > > third-parties that have requested and been granted access under some > > > > first-party. > > > > > > > > Maybe we can come up with something if you explain what this boolean means. > > > > It's part of FirstPartyData and can only be true/false. Is it "This first > > > > party has at least once been granted storage access when third-party"? > > > > > > > > > > It's based on the data structure we talked about a few weeks ago with Matt, > > > where each third party has a list of first parties based on what it has been > > > a subframe/subresource under or redirected to. So this bool is whether the > > > third party that this first party is listed under has been granted storage > > > access for that specific first party > > > > I see. The confusion comes from the boolean about storage access being about > > a third-party but stored in a struct about a first-party. The boolean needs > > to be owned about the third-party struct or the FirstPartyData needs to have > > a name that indicates that it's about the third-party. > > > > An easy way to think of this is to look at each of your two structs alone. > > It should make sense on its own. As it stands, the FirstPartyData struct > > only makes sense in conjunction with the ThirdPartyData struct. > > > > I suggest you skip the FirstPartyData struct and instead … > > > > 1. Add a boolean enum HasBeenGrantedStorageAccess { Yes, No }. > > > > 2. Replace ThirdPartyData's Vector<FirstPartyData> with a HashMap of > > <RegistrableDomain, HasBeenGrantedStorageAccess>. > > I see your point about how it should make sense alone, but my only thought > is if we want to add more data about each third-party/first-party pair in > the future, like a last seen time. Then we would have to rethink the hash > map. That is true. I'm often in favor of keeping the change as simple as possible for the task at hand. If you prefer a struct for the the kind of data we'd want to catch/expose, we have to come up with a better name than FirstPartyData. The stand-alone test needs to pass. :) Maybe ThirdPartyDataForSpecificFirstParty? That makes it clear that it's data associated with the third-party but tied to what that third-party has been up to under a specific first-party.
Kate Cheney
Comment 14 2019-12-02 15:46:52 PST
John Wilander
Comment 15 2019-12-03 17:50:45 PST
Comment on attachment 384665 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=384665&action=review Please address comments before landing. > Source/WebCore/loader/ResourceLoadStatistics.cpp:418 > + return interactionTimeSeconds > 0 && WallTime::now().secondsSinceEpoch().value() - interactionTimeSeconds < Seconds(24_h).value(); 24_h is a Seconds object. Do we need to wrap it this way and call value()? If not, please change before landing. > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:789 > +Vector<ThirdPartyDataForSpecificFirstParty> ResourceLoadStatisticsDatabaseStore::getFirstPartyDataForDomain(unsigned domainID, const RegistrableDomain& domainString) const I would prefer this function name to be aligned with the return type. Is the domainString the third-party? Maybe we could make that clearer too? Anyway, I trust you'll come up with a new function name parameter name that make sense given what the function does and what it returns. > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:2423 > + return interactionTimeSeconds > 0 && WallTime::now().secondsSinceEpoch().value() - interactionTimeSeconds < Seconds(24_h).value(); Same thing here with the Seconds(24_h) wrapping. Needed? > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsMemoryStore.cpp:112 > +static Vector<ThirdPartyDataForSpecificFirstParty> getFirstPartyDataForDomain(const ResourceLoadStatistics& statistic) Same thing here for the function name and parameter name. > Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.h:80 > +struct ThirdPartyDataForSpecificFirstParty { Much better. > Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.h:86 > + return makeString(firstPartyDomain.string(), ", Has Granted Storage Access: ", storageAccessGranted); Hmm. Now I'm confused. The user grants storage access, neither the first or third party. And the third-party requests storage access, not the first party. So the first party is never involved in anything around requesting or granting storage access. The data point should say whether or not a third party has been granted storage access under a first party. So maybe a generated string saying "Has been granted storage access under { first-party domain string } : { boolean }." Unfortunately, this means regenerating expect files for your tests.
Kate Cheney
Comment 16 2019-12-04 10:00:31 PST
Kate Cheney
Comment 17 2019-12-04 10:01:22 PST
Thanks for all the help John! I made the naming changes you suggested plus a few others that clarified the distinction between third/first party information. I also had to update interactionTimeSeconds to be WTF::Seconds type instead of a double so it can be compared directly to 24_h (and I think it makes sense to have it as a seconds value instead of a double because the function compares time values). Going to let it run through EWS before I cq+
WebKit Commit Bot
Comment 18 2019-12-04 12:06:34 PST
Comment on attachment 384822 [details] Patch Clearing flags on attachment: 384822 Committed r253118: <https://trac.webkit.org/changeset/253118>
WebKit Commit Bot
Comment 19 2019-12-04 12:06:36 PST
All reviewed patches have been landed. Closing bug.
Fujii Hironori
Comment 20 2019-12-04 19:55:13 PST
Filed: Bug 204870 – [MSVC] WebResourceLoadStatisticsStore.h is reporting warning C4804: '/': unsafe use of type 'bool' in operation
Note You need to log in before you can comment on or make changes to this bug.