Bug 195923

Summary: Resource Load Statistics (experimental): Clear non-cookie website data for sites that have been navigated to, with link decoration, by a prevalent resource
Product: WebKit
Reporter: John Wilander <wilander>
Component: WebKit Misc.
Assignee: John Wilander <wilander>
Status: RESOLVED FIXED
Severity: Normal
CC: achristensen, bfulgham, cdumez, commit-queue, tsavell, webkit-bug-importer
Priority: P2
Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugs.webkit.org/show_bug.cgi?id=196017
https://bugs.webkit.org/show_bug.cgi?id=198185
Attachments:
Description        Flags
Patch              none
Patch              none
Patch for landing  none

Description John Wilander 2019-03-18 17:25:14 PDT
Cross-site trackers abuse link query parameters to transport user identifiers and then store them in first-party storage space.

https://bugs.webkit.org/show_bug.cgi?id=189933 capped all persistent client-side cookies to seven days of storage.
https://bugs.webkit.org/show_bug.cgi?id=195196 capped persistent client-side cookies to one day of storage for navigations with link decoration from prevalent resources.
https://bugs.webkit.org/show_bug.cgi?id=195301 added logging of navigations with link decoration from prevalent resources.

We should clear out non-cookie website data for sites that have been navigated to, with link decoration, by a prevalent resource. This makes sure tracker scripts cannot force first-party sites to store cross-site tracking data transferred in such navigations.
Comment 1 Radar WebKit Bug Importer 2019-03-18 17:25:47 PDT
<rdar://problem/49001272>
Comment 2 John Wilander 2019-03-18 18:05:06 PDT
Created attachment 365096 [details]
Patch
Comment 3 Alex Christensen 2019-03-19 11:42:13 PDT
Comment on attachment 365096 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=365096&action=review

> Source/WebCore/page/RuntimeEnabledFeatures.h:543
> +    bool m_isITPFirstPartyWebsiteDataRemovalEnabled { true };

DEFAULT_EXPERIMENTAL_FEATURES_ENABLED here?

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.h:163
> +    void registrableDomainsToRemoveWebsiteDataFor(HashMap<RegistrableDomain, WebsiteDataToRemove>&) override;

This should return a value instead of returning by reference.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsStore.cpp:66
> +static String domainsToString(HashMap<RegistrableDomain, WebsiteDataToRemove>& domainsToRemoveWebsiteDataFor)

const HashMap&

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsStore.h:75
> +constexpr unsigned operatingDatesWindowLong { 30 };
> +constexpr unsigned operatingDatesWindowShort { 7 };

I think operatingDatesLongWindow and operatingDatesShortWindow would be better names for these.  Long and short are adjectives that modify Window.  If they're at the end, it looks to me like the type (unsigned long vs unsigned short).

Even better, I think, would be to have this be an enum class:
enum class OperatingDatesWindow : bool { Long, Short };

Then most places would just pass one or the other and in one place we would need constants for how many days the enum values mean.

> Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.h:63
> +enum class WebsiteDataToRemove {

: uint8_t
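
A minimal sketch of the enum class suggestion above, assuming the 30 and 7 day values from the patch. The helper names `operatingDatesWindowInDays` and `hasBeenIdleForWindow` are hypothetical and do not come from the patch; they only illustrate keeping the day counts in one place while callers pass the enum.

```cpp
#include <cassert>

// Stand-alone illustration, not the WebKit implementation.
enum class OperatingDatesWindow : bool { Long, Short };

// The single place that knows how many days each window means.
// The values 30 and 7 are taken from the constants in the reviewed patch.
constexpr unsigned operatingDatesWindowInDays(OperatingDatesWindow window)
{
    return window == OperatingDatesWindow::Long ? 30 : 7;
}

// Hypothetical caller: it passes the enum and never sees a raw day count.
constexpr bool hasBeenIdleForWindow(unsigned daysSinceLastInteraction, OperatingDatesWindow window)
{
    return daysSinceLastInteraction >= operatingDatesWindowInDays(window);
}
```

With this shape, a call site reads `hasBeenIdleForWindow(days, OperatingDatesWindow::Short)`, which cannot be confused with an `unsigned short` type.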
Comment 4 John Wilander 2019-03-19 11:45:54 PDT
(In reply to Alex Christensen from comment #3)
> Comment on attachment 365096 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=365096&action=review
> 
> > Source/WebCore/page/RuntimeEnabledFeatures.h:543
> > +    bool m_isITPFirstPartyWebsiteDataRemovalEnabled { true };
> 
> DEFAULT_EXPERIMENTAL_FEATURES_ENABLED here?

You're right. Should be false. I wrote the code in the wrong order.

> > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.h:163
> > +    void registrableDomainsToRemoveWebsiteDataFor(HashMap<RegistrableDomain, WebsiteDataToRemove>&) override;
> 
> This should return a value instead of returning by reference.

I assume you mean for all instances of this function.

> > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsStore.cpp:66
> > +static String domainsToString(HashMap<RegistrableDomain, WebsiteDataToRemove>& domainsToRemoveWebsiteDataFor)
> 
> const HashMap&

Got it.

> > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsStore.h:75
> > +constexpr unsigned operatingDatesWindowLong { 30 };
> > +constexpr unsigned operatingDatesWindowShort { 7 };
> 
> I think operatingDatesLongWindow and operatingDatesShortWindow would be better
> names for these.  Long and short are adjectives that modify Window.  If
> they're at the end, it looks to me like the type (unsigned long vs unsigned
> short).

Agreed.

> Even better, I think, would be to have this be an enum class:
> enum class OperatingDatesWindow : bool { Long, Short };

Ah. I'll take a stab at this.

> Then most places would just pass one or the other and in one place we would
> need constants for how many days the enum values mean.
> 
> > Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.h:63
> > +enum class WebsiteDataToRemove {
> 
> : uint8_t

Got it.

Thanks, Alex!
Comment 5 John Wilander 2019-03-19 11:50:15 PDT
(In reply to John Wilander from comment #4)
> (In reply to Alex Christensen from comment #3)
> > Comment on attachment 365096 [details]
> > Patch
> > 
> > View in context:
> > https://bugs.webkit.org/attachment.cgi?id=365096&action=review
> > 
> > > Source/WebCore/page/RuntimeEnabledFeatures.h:543
> > > +    bool m_isITPFirstPartyWebsiteDataRemovalEnabled { true };
> > 
> > DEFAULT_EXPERIMENTAL_FEATURES_ENABLED here?
> 
> You're right. Should be false. I wrote the code in the wrong order.
> 
> > > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.h:163
> > > +    void registrableDomainsToRemoveWebsiteDataFor(HashMap<RegistrableDomain, WebsiteDataToRemove>&) override;
> > 
> > This should return a value instead of returning by reference.
> 
> I assume you mean for all instances of this function.

Looking at it now, I realize why I ended up where I did. I started out with two vectors: one for domains to remove all data for and one for domains to remove all data but cookies for. Since I couldn't return two vectors, I passed them by reference. But then everything became much more complicated and I ran the risk of mixing up the two vectors. Finally, I realized I want to support three different filters for data removal. That's why I switched to a HashMap.

Thanks for pointing this out.
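
A rough sketch of the return-by-value shape discussed above, using the standard library in place of WTF types. `std::map` and `std::string` stand in for `HashMap` and `RegistrableDomain`, the `DomainRecord` struct is hypothetical, and the rule that prevalent domains get all data removed is an assumption made for illustration. The thread mentions a third filter but does not name it, so only two enumerators appear here.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Stand-in for WebKit's RegistrableDomain (assumption for this sketch).
using RegistrableDomain = std::string;

// Two of the filters named in the thread; the third is not named there.
enum class WebsiteDataToRemove : uint8_t { All, AllButCookies };

// Hypothetical per-domain record used only by this example.
struct DomainRecord {
    RegistrableDomain domain;
    bool isPrevalentResource;
    bool gotLinkDecorationFromPrevalentResource;
};

// Returns the map by value instead of filling an out-parameter.
std::map<RegistrableDomain, WebsiteDataToRemove>
registrableDomainsToRemoveWebsiteDataFor(const std::vector<DomainRecord>& records)
{
    std::map<RegistrableDomain, WebsiteDataToRemove> domains;
    for (const auto& record : records) {
        if (record.isPrevalentResource)
            domains.emplace(record.domain, WebsiteDataToRemove::All); // assumed policy
        else if (record.gotLinkDecorationFromPrevalentResource)
            domains.emplace(record.domain, WebsiteDataToRemove::AllButCookies);
    }
    return domains; // moved or elided, so no copy is made
}
```

Because each domain maps to exactly one filter, the two-vector mix-up described above cannot occur.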

Comment 6 John Wilander 2019-03-19 12:23:31 PDT
Created attachment 365203 [details]
Patch
Comment 7 Alex Christensen 2019-03-19 15:30:54 PDT
Comment on attachment 365203 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=365203&action=review

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:1519
> +    HashMap<RegistrableDomain, WebsiteDataToRemove> domainsToRemoveWebsiteDataFor;

If we used a struct that contained 3 Vector<RegistrableDomain> instead of the enum, it would be more memory efficient.

> Tools/WebKitTestRunner/TestInvocation.cpp:1311
> +        WKRetainPtr<WKStringRef> fromHostKey(AdoptWK, WKStringCreateWithUTF8CString("FromHost"));

auto fromHostKey = adoptWK(WKStringCreateWithUTF8CString("FromHost"));
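
One possible reading of the struct suggestion above, sketched with `std::vector` and `std::string` in place of WTF's `Vector` and `RegistrableDomain`. The struct and member names are hypothetical, and the third member is a placeholder because the thread does not name the third filter. Presumably the saving comes from storing only the domains, without a per-entry enum value or hash table overhead.

```cpp
#include <string>
#include <vector>

// Stand-in for WebKit's RegistrableDomain (assumption for this sketch).
using RegistrableDomain = std::string;

// One vector per removal filter instead of a map from domain to enum.
struct DomainsToRemoveWebsiteDataFor {
    std::vector<RegistrableDomain> domainsToRemoveAllWebsiteDataFor;
    std::vector<RegistrableDomain> domainsToRemoveAllButCookiesFor;
    // Placeholder for the third filter, which the thread does not name.
    std::vector<RegistrableDomain> domainsForRemainingFilter;

    bool isEmpty() const
    {
        return domainsToRemoveAllWebsiteDataFor.empty()
            && domainsToRemoveAllButCookiesFor.empty()
            && domainsForRemainingFilter.empty();
    }
};
```

The named members keep the grouping explicit, which addresses the earlier concern about mixing up unnamed vectors.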
Comment 8 John Wilander 2019-03-19 15:36:49 PDT
Created attachment 365246 [details]
Patch for landing
Comment 9 John Wilander 2019-03-19 15:39:25 PDT
(In reply to Alex Christensen from comment #7)
> Comment on attachment 365203 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=365203&action=review
> 
> > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:1519
> > +    HashMap<RegistrableDomain, WebsiteDataToRemove> domainsToRemoveWebsiteDataFor;
> 
> If we used a struct that contained 3 Vector<RegistrableDomain> instead of
> the enum, it would be more memory efficient.

Good idea. Will do in a follow up, tracked in rdar://problem/49038934.

> > Tools/WebKitTestRunner/TestInvocation.cpp:1311
> > +        WKRetainPtr<WKStringRef> fromHostKey(AdoptWK, WKStringCreateWithUTF8CString("FromHost"));
> 
> auto fromHostKey = adoptWK(WKStringCreateWithUTF8CString("FromHost"));

Fixed.

Thanks for the review, Alex!
Comment 10 John Wilander 2019-03-19 16:57:50 PDT
Test failure is unrelated.
Comment 11 WebKit Commit Bot 2019-03-19 17:22:16 PDT
Comment on attachment 365246 [details]
Patch for landing

Clearing flags on attachment: 365246

Committed r243181: <https://trac.webkit.org/changeset/243181>
Comment 12 WebKit Commit Bot 2019-03-19 17:22:17 PDT
All reviewed patches have been landed.  Closing bug.
Comment 13 Truitt Savell 2019-03-20 09:23:00 PDT
Looks like the new test http/tests/resourceLoadStatistics/website-data-removal-for-site-navigated-to-with-link-decoration.html

added in https://trac.webkit.org/changeset/243181/webkit

is failing constantly on WK1. History:
http://webkit-test-results.webkit.org/dashboards/flakiness_dashboard.html#showAllRuns=true&tests=http%2Ftests%2FresourceLoadStatistics%2Fwebsite-data-removal-for-site-navigated-to-with-link-decoration.html

Diff:
--- /Volumes/Data/slave/highsierra-release-tests-wk2/build/layout-test-results/http/tests/resourceLoadStatistics/website-data-removal-for-site-navigated-to-with-link-decoration-expected.txt
+++ /Volumes/Data/slave/highsierra-release-tests-wk2/build/layout-test-results/http/tests/resourceLoadStatistics/website-data-removal-for-site-navigated-to-with-link-decoration-actual.txt
@@ -10,7 +10,7 @@
 After deletion: Client-side cookie exists.
 After deletion: Regular server-side cookie exists.
 
-After deletion: IDB entry does not exist.
+After deletion: IDB entry does exist.
 
 
 Resource load statistics:
@@ -30,6 +30,6 @@
     grandfathered: No
     topFrameLinkDecorationsFrom:
         localhost
-    gotLinkDecorationFromPrevalentResource: No    isPrevalentResource: No
+    gotLinkDecorationFromPrevalentResource: Yes    isPrevalentResource: No
     isVeryPrevalentResource: No
-    dataRecordsRemoved: 1
+    dataRecordsRemoved: 0
Comment 14 Truitt Savell 2019-03-20 09:23:34 PDT
Apologies, it is failing constantly on WK2, not WK1.
Comment 15 John Wilander 2019-03-20 10:03:34 PDT
OK, looking at this.
Comment 16 John Wilander 2019-03-20 10:43:54 PDT
I found the bug. ITP refuses (and should refuse) to remove website data repeatedly because of the minimumTimeBetweenDataRecordsRemoval setting. I just need to add an exception to this rule for parameters().isRunningTest.
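
A simplified sketch of the throttle and test exception described above. Only `minimumTimeBetweenDataRecordsRemoval` and `isRunningTest` come from the comment; the `Parameters` struct, the `DataRemovalThrottle` class, its method names, and the one-hour default are assumptions for illustration and do not reflect the actual WebKit code.

```cpp
#include <chrono>

// Hypothetical parameter bundle; the one-hour default is an arbitrary value for this sketch.
struct Parameters {
    std::chrono::seconds minimumTimeBetweenDataRecordsRemoval { std::chrono::hours(1) };
    bool isRunningTest { false };
};

class DataRemovalThrottle {
public:
    using Clock = std::chrono::steady_clock;

    explicit DataRemovalThrottle(Parameters parameters)
        : m_parameters(parameters)
    {
    }

    bool shouldRemoveDataRecords(Clock::time_point now) const
    {
        // Tests trigger removal repeatedly, so they bypass the throttle.
        if (m_parameters.isRunningTest)
            return true;
        if (!m_hasRemovedDataRecords)
            return true;
        return now - m_lastRemoval >= m_parameters.minimumTimeBetweenDataRecordsRemoval;
    }

    void didRemoveDataRecords(Clock::time_point now)
    {
        m_lastRemoval = now;
        m_hasRemovedDataRecords = true;
    }

private:
    Parameters m_parameters;
    Clock::time_point m_lastRemoval {};
    bool m_hasRemovedDataRecords { false };
};
```

Outside of tests, a second removal request inside the minimum interval is refused, which matches the failure seen when the layout test asked for removal twice in quick succession.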
Comment 17 John Wilander 2019-03-20 10:57:49 PDT
(In reply to John Wilander from comment #16)
> I found the bug. ITP refuses (and should refuse) to remove website data
> repeatedly because of the minimumTimeBetweenDataRecordsRemoval setting. I
> just need to add an exception to this rule for parameters().isRunningTest.

Fix is on the commit queue: https://bugs.webkit.org/show_bug.cgi?id=196017