Bug 213493 - [WPE][GTK] Add API to support "Privacy Report"
Summary: [WPE][GTK] Add API to support "Privacy Report"
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKitGTK (show other bugs)
Version: WebKit Nightly Build
Hardware: PC Linux
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on: 214401
Blocks: 210184
  Show dependency treegraph
 
Reported: 2020-06-22 15:57 PDT by Michael Catanzaro
Modified: 2020-07-27 01:35 PDT (History)
13 users (show)

See Also:


Attachments
Patch (27.86 KB, patch)
2020-07-22 03:41 PDT, Carlos Garcia Campos
aperez: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Michael Catanzaro 2020-06-22 15:57:43 PDT
"""
Apple also announced that its upcoming macOS Big Sur will include an update to the company’s web browser that gives users an easy-to-access “Privacy Report” right up next to the URL field. In a single click, the report will tell users which website trackers the browser has blocked.
"""

https://cdn.vox-cdn.com/thumbor/nuA_JNtgc1U3xSaOVIpoYcgR8VY=/0x0:653x653/920x0/filters:focal(0x0:653x653):format(webp):no_upscale()/cdn.vox-cdn.com/uploads/chorus_asset/file/20047451/tracker.jpg

There must be some Cocoa SPI somewhere to enable this (where?). We could expose an equivalent API.
Comment 1 Carlos Garcia Campos 2020-06-24 06:42:14 PDT
John, Youenn is WebsiteDataStore::getResourceLoadStatisticsDataSummary() the function returning the data needed to create the privacy report?
Comment 2 Carlos Garcia Campos 2020-07-09 23:44:16 PDT
Ping.
Comment 3 John Wilander 2020-07-10 00:25:19 PDT
Sorry for the delay and thanks for the ping. I’m cc’ing my coworker who implemented the data reporting functionality from ITP.
Comment 4 Kate Cheney 2020-07-10 10:24:30 PDT
(In reply to Carlos Garcia Campos from comment #1)
> John, Youenn is WebsiteDataStore::getResourceLoadStatisticsDataSummary() the
> function returning the data needed to create the privacy report?

Yes, WebsiteDataStore::getResourceLoadStatisticsDataSummary() is the correct function.
Comment 5 John Wilander 2020-07-10 10:33:02 PDT
One thing to note here is that the reporting relies on full third-party cookie blocking. If your implementation only blocks some third-party cookies, you need to make sure the reporting only reports on those.
Comment 6 Michael Catanzaro 2020-07-10 11:04:05 PDT
We use ThirdPartyCookieBlockingMode::All since that's default in Safari. Our goal is to match what Safari does.

I wonder if the other ThirdPartyCookieBlockingMode options can be removed? I guess nobody is using ITP with any of the other modes anymore?
Comment 7 John Wilander 2020-07-10 11:10:36 PDT
(In reply to Michael Catanzaro from comment #6)
> We use ThirdPartyCookieBlockingMode::All since that's default in Safari. Our
> goal is to match what Safari does.
> 
> I wonder if the other ThirdPartyCookieBlockingMode options can be removed? I
> guess nobody is using ITP with any of the other modes anymore?

Yes, I started working on removing all of that in https://bugs.webkit.org/show_bug.cgi?id=208819 but we decided we wanted to wait with such refactoring until we've branched trunk for the upcoming releases.
Comment 8 Carlos Garcia Campos 2020-07-16 03:37:31 PDT
(In reply to katherine_cheney from comment #4)
> (In reply to Carlos Garcia Campos from comment #1)
> > John, Youenn is WebsiteDataStore::getResourceLoadStatisticsDataSummary() the
> > function returning the data needed to create the privacy report?
> 
> Yes, WebsiteDataStore::getResourceLoadStatisticsDataSummary() is the correct
> function.

Thanks! I started to play with this, but I noticed I was not getting the same results than Safari, actually I was getting no entries in the report. So, I tried using the database store and then I got the same results. It seems to be a bug in the memory store (see bug #214401). This makes me wonder if we should be using the database store like Safari. Why do we have memory and db stores? When are you supposed to use one or the other? Should we also expose a setting?
Comment 9 Carlos Garcia Campos 2020-07-16 03:56:07 PDT
I've also noticed that with the memory store ThirdPartyDataForSpecificFirstParty::timeLastUpdated is always NoExistingTimestamp
Comment 10 Michael Catanzaro 2020-07-16 05:55:06 PDT
(In reply to Carlos Garcia Campos from comment #8)
> Thanks! I started to play with this, but I noticed I was not getting the
> same results than Safari, actually I was getting no entries in the report.
> So, I tried using the database store and then I got the same results. It
> seems to be a bug in the memory store (see bug #214401). This makes me
> wonder if we should be using the database store like Safari. Why do we have
> memory and db stores? When are you supposed to use one or the other? Should
> we also expose a setting?

Good catch.

I don't think we need a setting for this. Memory store is for ephemeral mode and database store is for normal browsing mode. You can see if you restart Epiphany that all ITP data tracked in the Clear Data dialog gets deleted... oops, right? That's supposed to be persistent.
Comment 11 Carlos Garcia Campos 2020-07-16 06:04:53 PDT
(In reply to Michael Catanzaro from comment #10)
> (In reply to Carlos Garcia Campos from comment #8)
> > Thanks! I started to play with this, but I noticed I was not getting the
> > same results than Safari, actually I was getting no entries in the report.
> > So, I tried using the database store and then I got the same results. It
> > seems to be a bug in the memory store (see bug #214401). This makes me
> > wonder if we should be using the database store like Safari. Why do we have
> > memory and db stores? When are you supposed to use one or the other? Should
> > we also expose a setting?
> 
> Good catch.
> 
> I don't think we need a setting for this. Memory store is for ephemeral mode
> and database store is for normal browsing mode. You can see if you restart
> Epiphany that all ITP data tracked in the Clear Data dialog gets deleted...
> oops, right? That's supposed to be persistent.

It's persistent, stored in full_browsing_session_resourceLog.plist
Comment 12 Michael Catanzaro 2020-07-16 06:22:40 PDT
Hm... well then I don't understand either. How can it be a "memory store" if it's stored on disk? Maybe it's considered serialized memory, similar to WebKit session state?

It really does disappear after restarting the browser btw. I'm seeing *very* strange behavior. After restarting the browser and reloading this bug, I have no ITP data stored. After loading a reddit.com, I see ITP data stored for webkit.org (plausible since this bug was reloaded), mozilla.net (???), and firefox.com (???). After doing nothing and just waiting a few more seconds, ITP data appears for several other domains. This can't be right.
Comment 13 Michael Catanzaro 2020-07-19 12:09:53 PDT
I found this in WebResourceLoadStatisticsStore:

auto legacyPlistFilePath = FileSystem::pathByAppendingComponent(resourceLoadStatisticsDirectory, "full_browsing_session_resourceLog.plist");
if (FileSystem::fileExists(legacyPlistFilePath))
    FileSystem::deleteFile(legacyPlistFilePath);

Seems that ResourceLoadStatisticsMemoryStore is obsoleted by ResourceLoadStatisticsDatabaseStore, and enabling the database store will cause it to clean up the plist file.
Comment 14 Carlos Garcia Campos 2020-07-20 01:12:51 PDT
(In reply to Michael Catanzaro from comment #13)
> I found this in WebResourceLoadStatisticsStore:
> 
> auto legacyPlistFilePath =
> FileSystem::pathByAppendingComponent(resourceLoadStatisticsDirectory,
> "full_browsing_session_resourceLog.plist");
> if (FileSystem::fileExists(legacyPlistFilePath))
>     FileSystem::deleteFile(legacyPlistFilePath);
> 
> Seems that ResourceLoadStatisticsMemoryStore is obsoleted by
> ResourceLoadStatisticsDatabaseStore, and enabling the database store will
> cause it to clean up the plist file.

Let's switch to use the database store unconditionally then, but I would like someone from Apple to confirm we are right first.
Comment 15 Kate Cheney 2020-07-20 10:14:49 PDT
The database store is a SQLite version of the full_browsing_session_resourceLog.plist file. The "memory" store actually consists of both memory and persistent storage, and in this version data is stored in memory and flushed to disk periodically. The database store is a complete replacement for the memory store, and as you noticed, switching to the memory store from the database store is not supported. I would recommend using the database store unconditionally, as we may not continue to support new behavior with the memory store. Currently we are writing and maintaining tests for both versions, but this is not something we want to keep up indefinitely.
Comment 16 Carlos Garcia Campos 2020-07-22 03:41:31 PDT
Created attachment 404913 [details]
Patch
Comment 17 EWS Watchlist 2020-07-22 03:42:33 PDT
Thanks for the patch. If this patch contains new public API please make sure it follows the guidelines for new WebKit2 GTK+ API. See http://trac.webkit.org/wiki/WebKitGTK/AddingNewWebKit2API
Comment 18 Adrian Perez 2020-07-22 06:23:03 PDT
Comment on attachment 404913 [details]
Patch

Patch LGTM overall, but there are a couple of types and a an introspection
annotation which needs fixing, so please check the comments below before landing.

View in context: https://bugs.webkit.org/attachment.cgi?id=404913&action=review

> Source/WTF/ChangeLog:8
> +        Add support for using GRePtr with GDateTime.

Typo: GRePtr → GRefPtr

> Source/WebKit/ChangeLog:8
> +        Add WebKitITPFirstParty and WebKitITPThirdParty bosex types to expose the data returned by the ITP summary.

Typo: bosex → boxed

> Source/WebKit/UIProcess/API/glib/WebKitWebsiteDataManager.cpp:1221
> + * Returns: (transfer full): the last update time as a #GDateTime

The implementation returns the pointer directly, without adding a
reference, therefore the annotation here should be “(transfer none)”.
Comment 19 Michael Catanzaro 2020-07-22 09:38:46 PDT
Comment on attachment 404913 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=404913&action=review

API looks good.

> Source/WebKit/UIProcess/API/glib/WebKitWebsiteDataManager.cpp:1202
> + * Get whether website data access has been granted for a #WebKitITPThirdParty
> + * under @itp_first_party.

Hm, I understand the #WebkitITPFirstParty object corresponds to exactly one #WebKitITPThirdParty, yes? That is not very clear. Perhaps:

Get whether @itp_first_party has granted website data access to its #WebKitITPThirdParty. Each @WebKitITPFirstParty is created by webkit_itp_third_party_get_first_parties() and therefore corresponds to exactly one #WebKitITPThirdParty.
Comment 20 Carlos Garcia Campos 2020-07-27 01:31:30 PDT
(In reply to Adrian Perez from comment #18)
> Comment on attachment 404913 [details]
> Patch
> 
> Patch LGTM overall, but there are a couple of types and a an introspection
> annotation which needs fixing, so please check the comments below before
> landing.
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=404913&action=review
> 
> > Source/WTF/ChangeLog:8
> > +        Add support for using GRePtr with GDateTime.
> 
> Typo: GRePtr → GRefPtr
> 
> > Source/WebKit/ChangeLog:8
> > +        Add WebKitITPFirstParty and WebKitITPThirdParty bosex types to expose the data returned by the ITP summary.
> 
> Typo: bosex → boxed
> 
> > Source/WebKit/UIProcess/API/glib/WebKitWebsiteDataManager.cpp:1221
> > + * Returns: (transfer full): the last update time as a #GDateTime
> 
> The implementation returns the pointer directly, without adding a
> reference, therefore the annotation here should be “(transfer none)”.

Right!
Comment 21 Carlos Garcia Campos 2020-07-27 01:32:22 PDT
(In reply to Michael Catanzaro from comment #19)
> Comment on attachment 404913 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=404913&action=review
> 
> API looks good.
> 
> > Source/WebKit/UIProcess/API/glib/WebKitWebsiteDataManager.cpp:1202
> > + * Get whether website data access has been granted for a #WebKitITPThirdParty
> > + * under @itp_first_party.
> 
> Hm, I understand the #WebkitITPFirstParty object corresponds to exactly one
> #WebKitITPThirdParty, yes? That is not very clear. Perhaps:

Right.

> Get whether @itp_first_party has granted website data access to its
> #WebKitITPThirdParty. Each @WebKitITPFirstParty is created by
> webkit_itp_third_party_get_first_parties() and therefore corresponds to
> exactly one #WebKitITPThirdParty.

Great, thanks!
Comment 22 Carlos Garcia Campos 2020-07-27 01:35:44 PDT
Committed r264910: <https://trac.webkit.org/changeset/264910>