Bug 219134 - PCM: Persist pending ad clicks and attributions so they can survive browser restart
Summary: PCM: Persist pending ad clicks and attributions so they can survive browser r...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKit Misc. (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Kate Cheney
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2020-11-18 16:46 PST by Kate Cheney
Modified: 2020-11-20 16:38 PST (History)
8 users (show)

See Also:


Attachments
Patch (125.01 KB, patch)
2020-11-18 17:24 PST, Kate Cheney
no flags Details | Formatted Diff | Diff
Patch (218.29 KB, patch)
2020-11-19 20:16 PST, Kate Cheney
no flags Details | Formatted Diff | Diff
Patch (219.79 KB, patch)
2020-11-20 09:45 PST, Kate Cheney
no flags Details | Formatted Diff | Diff
Patch for landing (220.07 KB, patch)
2020-11-20 15:59 PST, Kate Cheney
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Kate Cheney 2020-11-18 16:46:43 PST
Currently all PCM data is stored in memory. We should persist it.
Comment 1 Kate Cheney 2020-11-18 16:47:05 PST
<rdar://problem/70470129>
Comment 2 Kate Cheney 2020-11-18 17:24:27 PST
Created attachment 414519 [details]
Patch
Comment 3 John Wilander 2020-11-19 09:50:13 PST
Comment on attachment 414519 [details]
Patch

Quick initial question: How disruptive will it be to change these names? The W3C Privacy CG conversation has changed a lot of this and we should make sure the DB uses the right terminology. In essence, it's about not using advertising lingo and keeping it abstract. So it's "source ID" instead of "ad campaign ID" and "attribution triggering event" instead of "conversion event." I've made changes to the developer facing parts but not the internal naming in WebKit yet.
Comment 4 Kate Cheney 2020-11-19 11:02:02 PST
(In reply to John Wilander from comment #3)
> Comment on attachment 414519 [details]
> Patch
> 
> Quick initial question: How disruptive will it be to change these names? The
> W3C Privacy CG conversation has changed a lot of this and we should make
> sure the DB uses the right terminology. In essence, it's about not using
> advertising lingo and keeping it abstract. So it's "source ID" instead of
> "ad campaign ID" and "attribution triggering event" instead of "conversion
> event." I've made changes to the developer facing parts but not the internal
> naming in WebKit yet.

It will be somewhat disruptive but not terrible. There is a SQLite command called ALTER TABLE that can change column names. We will have to add code to check for the old table name and change if needed once this lands (if a name-change occurs after we land).

I tried to align the terminology with that in your recent patch https://bugs.webkit.org/show_bug.cgi?id=218967.
Comment 5 John Wilander 2020-11-19 12:02:32 PST
Comment on attachment 414519 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=414519&action=review

Great to see this be implemented! Overall, looks good.

We should rename DB entries like this:
· source –> sourceSite
· campaign/campaignID –> sourceID
· destination –> attributeOnSite
· conversion/conversionValue –> attributionTriggerData
· unconverted –> unattributed
· convert(ed) –> attribute(d)

You don't have to change the name of existing classes and variables. We can do that in a separate patch (or now if you want :P). I'm just trying to avoid a DB migration if possible.

Could we simulate a browser restart by terminating the network process in a test?

I'll hold off marking r+ until we've decided on naming.

> Source/WebCore/ChangeLog:21
> +        Moved logic to the ResourceLoadStatisticsDatabase class.

We should think through the scenarios for 1) ITP turned off now means PCM turned off, and 2) deletion of website data. What's your view?

> Source/WebKit/ChangeLog:35
> +        to wait to conver, the earliestTimeToSend parameter is set to now (0) and the

*convert
Comment 6 Kate Cheney 2020-11-19 12:17:51 PST
Comment on attachment 414519 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=414519&action=review

Thanks for taking a look. I can rename everything in this patch :) Also, I didn't think about restarting the network process in WebKitTestRunner (I didn't realize it was possible), but I see TestRunner::terminateNetworkProcess() which seems to be exactly a function for doing that -- I will see if I can refactor to use this.

>> Source/WebCore/ChangeLog:21
>> +        Moved logic to the ResourceLoadStatisticsDatabase class.
> 
> We should think through the scenarios for 1) ITP turned off now means PCM turned off, and 2) deletion of website data. What's your view?

1) This is true. One option (although I haven't spent a long time thinking this through) could be to refactor so the database still gets created, but only PCM interactions occur when Allow Cross-Site Tracking is on. Will PCM also have an on/off switch we should account for?
2) In this patch, deletion of website data removes PCM persistent data, which I think aligns with the purpose of the action. Do you think it should behave otherwise?

Another consideration is ephemeral mode -- is no PCM the expected ephemeral mode behavior, or should there be some in-memory storage?

>> Source/WebKit/ChangeLog:35
>> +        to wait to conver, the earliestTimeToSend parameter is set to now (0) and the
> 
> *convert

Oops. Will fix.

> Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.cpp:1533
> +        if (!m_statisticsStore)

I should call the completion handler here for an early return.

> Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.cpp:1554
> +            return;

Ditto.

> Source/WebKit/NetworkProcess/NetworkProcess.cpp:2570
> +    completionHandler();

I should forward this completion handler because it sets a flag in the ITP store on a background thread. I forgot to change this from when I was previously setting it only in the PCM manager.
Comment 7 Kate Cheney 2020-11-19 20:16:09 PST
Created attachment 414644 [details]
Patch
Comment 8 Kate Cheney 2020-11-19 20:17:18 PST
(In reply to John Wilander from comment #5)
> Comment on attachment 414519 [details]
> Patch
> 

> We should rename DB entries like this:
> · source –> sourceSite
> · campaign/campaignID –> sourceID
> · destination –> attributeOnSite
> · conversion/conversionValue –> attributionTriggerData
> · unconverted –> unattributed
> · convert(ed) –> attribute(d)
> 
> You don't have to change the name of existing classes and variables. We can
> do that in a separate patch (or now if you want :P). I'm just trying to
> avoid a DB migration if possible.
> 

I’m not sure I hit every name change, but I updated all names in the database store, the PCM manager, and the WebCore PCM object. I might need a followup to change the few remaining functions with “conversion” still in the name (the patch was growing quite large).  

> Could we simulate a browser restart by terminating the network process in a
> test?
> 

Instead of destroying the whole network process to test persistence, I ended up destroying and recreating the ResourceLoadStatisticsDatabaseStore object to simulate a session restart.
Comment 9 Kate Cheney 2020-11-20 09:45:35 PST
Created attachment 414691 [details]
Patch
Comment 10 John Wilander 2020-11-20 15:19:34 PST
Comment on attachment 414691 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=414691&action=review

R+ with comments. The existing tests should ensure that the logic is kept sound. As for database expertise, I can't help you much. I'm relying on your skill there. Just making sure - PCM is still disabled for ephemeral sessions, right? That is the intention.

> Source/WebCore/ChangeLog:18
> +        - convert(ed) â> attribute(d)

Seems to be a Unicode dash in these arrows. Please change to ancient ASCII. :/

> Source/WebKit/ChangeLog:21
> +        This adds 3 SQLite tables: one for ad clicks that haven't been

I'd just call them clicks to avoid advertising language.

> Source/WebCore/html/HTMLAnchorElement.cpp:405
> +    using AttributeOnSite = PrivateClickMeasurement::AttributeOnSite;

Thank you for doing this renaming work!

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:3064
> +    clearExpiredPrivateClickMeasurement();

We may want to add a comment here to make it clear that this call is important. We're lazily clearing before we attribute and that's how we make sure we don't attribute expired clicks.

> Source/WebKit/NetworkProcess/NetworkSession.cpp:193
> +    m_privateClickMeasurement->startTimer(0_s);

I seem to recall some startImmediate() function. Was this how it was done in the earlier implementation? Maybe I found that the *immediate function didn't behave as I wanted.

> Source/WebKit/NetworkProcess/PrivateClickMeasurementManager.cpp:61
> +    startTimer(5_s);

What is this timer for?

> Source/WebKit/NetworkProcess/PrivateClickMeasurementManager.h:43
> +enum class PrivateClickMeasurementAttributionType { Unattributed, Attributed };

bool
Comment 11 Kate Cheney 2020-11-20 15:32:27 PST
(In reply to John Wilander from comment #10)
> Comment on attachment 414691 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=414691&action=review
> 
> R+ with comments. The existing tests should ensure that the logic is kept
> sound. As for database expertise, I can't help you much. I'm relying on your
> skill there. Just making sure - PCM is still disabled for ephemeral
> sessions, right? That is the intention.
> 

Yes, there are checks in every WebResourceLoadStatistics function ensuring PCM is disabled for ephemeral sessions.

> > Source/WebCore/ChangeLog:18
> > +        - convert(ed) â> attribute(d)
> 
> Seems to be a Unicode dash in these arrows. Please change to ancient ASCII.
> :/

Will fix. 

> 
> > Source/WebKit/ChangeLog:21
> > +        This adds 3 SQLite tables: one for ad clicks that haven't been
> 
> I'd just call them clicks to avoid advertising language.
> 

Will fix.

> > Source/WebCore/html/HTMLAnchorElement.cpp:405
> > +    using AttributeOnSite = PrivateClickMeasurement::AttributeOnSite;
> 
> Thank you for doing this renaming work!
> 

Sure!

> > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:3064
> > +    clearExpiredPrivateClickMeasurement();
> 
> We may want to add a comment here to make it clear that this call is
> important. We're lazily clearing before we attribute and that's how we make
> sure we don't attribute expired clicks.
> 

Ok, I will add before landing.

> > Source/WebKit/NetworkProcess/NetworkSession.cpp:193
> > +    m_privateClickMeasurement->startTimer(0_s);
> 
> I seem to recall some startImmediate() function. Was this how it was done in
> the earlier implementation? Maybe I found that the *immediate function
> didn't behave as I wanted.

I did not see a startImmediate function. This call is for tests only to make sure we send the report before the test times out.

> 
> > Source/WebKit/NetworkProcess/PrivateClickMeasurementManager.cpp:61
> > +    startTimer(5_s);
> 
> What is this timer for?

After we start up the session, there may be reports that have 'expired' (i.e. the reporting delay time has passed) while the session was closed. This makes sure they get sent as soon as possible after session starts. The 5 seconds is to account for any delay in ITP starting up, since it is created after the PCM manager.

Let me know if you don't think this is the right behavior for persisted reports.

> 
> > Source/WebKit/NetworkProcess/PrivateClickMeasurementManager.h:43
> > +enum class PrivateClickMeasurementAttributionType { Unattributed, Attributed };
> 
> bool

Will fix.
Comment 12 John Wilander 2020-11-20 15:44:07 PST
(In reply to katherine_cheney from comment #11)
> (In reply to John Wilander from comment #10)
> > Comment on attachment 414691 [details]
> > Patch
> > 
> > View in context:
> > https://bugs.webkit.org/attachment.cgi?id=414691&action=review
> > 
> > R+ with comments. The existing tests should ensure that the logic is kept
> > sound. As for database expertise, I can't help you much. I'm relying on your
> > skill there. Just making sure - PCM is still disabled for ephemeral
> > sessions, right? That is the intention.
> > 
> 
> Yes, there are checks in every WebResourceLoadStatistics function ensuring
> PCM is disabled for ephemeral sessions.
> 
> > > Source/WebCore/ChangeLog:18
> > > +        - convert(ed) â> attribute(d)
> > 
> > Seems to be a Unicode dash in these arrows. Please change to ancient ASCII.
> > :/
> 
> Will fix. 
> 
> > 
> > > Source/WebKit/ChangeLog:21
> > > +        This adds 3 SQLite tables: one for ad clicks that haven't been
> > 
> > I'd just call them clicks to avoid advertising language.
> > 
> 
> Will fix.
> 
> > > Source/WebCore/html/HTMLAnchorElement.cpp:405
> > > +    using AttributeOnSite = PrivateClickMeasurement::AttributeOnSite;
> > 
> > Thank you for doing this renaming work!
> > 
> 
> Sure!
> 
> > > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsDatabaseStore.cpp:3064
> > > +    clearExpiredPrivateClickMeasurement();
> > 
> > We may want to add a comment here to make it clear that this call is
> > important. We're lazily clearing before we attribute and that's how we make
> > sure we don't attribute expired clicks.
> > 
> 
> Ok, I will add before landing.
> 
> > > Source/WebKit/NetworkProcess/NetworkSession.cpp:193
> > > +    m_privateClickMeasurement->startTimer(0_s);
> > 
> > I seem to recall some startImmediate() function. Was this how it was done in
> > the earlier implementation? Maybe I found that the *immediate function
> > didn't behave as I wanted.
> 
> I did not see a startImmediate function. This call is for tests only to make
> sure we send the report before the test times out.
> 
> > 
> > > Source/WebKit/NetworkProcess/PrivateClickMeasurementManager.cpp:61
> > > +    startTimer(5_s);
> > 
> > What is this timer for?
> 
> After we start up the session, there may be reports that have 'expired'
> (i.e. the reporting delay time has passed) while the session was closed.
> This makes sure they get sent as soon as possible after session starts. The
> 5 seconds is to account for any delay in ITP starting up, since it is
> created after the PCM manager.

Oh! Let's add a brief comment explaining that. Thanks!

> Let me know if you don't think this is the right behavior for persisted
> reports.
> 
> > 
> > > Source/WebKit/NetworkProcess/PrivateClickMeasurementManager.h:43
> > > +enum class PrivateClickMeasurementAttributionType { Unattributed, Attributed };
> > 
> > bool
> 
> Will fix.
Comment 13 Kate Cheney 2020-11-20 15:59:57 PST
Created attachment 414742 [details]
Patch for landing
Comment 14 Kate Cheney 2020-11-20 16:00:30 PST
Thanks for the review!

> > > > Source/WebKit/NetworkProcess/PrivateClickMeasurementManager.cpp:61
> > > > +    startTimer(5_s);
> > > 
> > > What is this timer for?
> > 
> > After we start up the session, there may be reports that have 'expired'
> > (i.e. the reporting delay time has passed) while the session was closed.
> > This makes sure they get sent as soon as possible after session starts. The
> > 5 seconds is to account for any delay in ITP starting up, since it is
> > created after the PCM manager.
> 
> Oh! Let's add a brief comment explaining that. Thanks!
> 

Added!
Comment 15 EWS 2020-11-20 16:38:52 PST
Committed r270136: <https://trac.webkit.org/changeset/270136>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 414742 [details].