Bug 182664

Summary: Resource Load Statistics: Classify resources as prevalent based on redirects to other prevalent resources
Product: WebKit Reporter: John Wilander <wilander>
Component: WebKit Misc.Assignee: John Wilander <wilander>
Status: RESOLVED FIXED    
Severity: Normal CC: achristensen, bfulgham, cdumez, commit-queue, dbates, ews-watchlist, japhet, mcatanzaro, webkit-bug-importer, wilander
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugs.webkit.org/show_bug.cgi?id=182879
https://bugs.webkit.org/show_bug.cgi?id=182983
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Patch
none
Patch for landing none

Description John Wilander 2018-02-09 16:33:21 PST
A and B should be classified as prevalent if domain A has redirected to domain B which in turn has redirected to prevalent resource domain C.
Comment 1 John Wilander 2018-02-09 16:33:31 PST
rdar://problem/37372572
Comment 2 John Wilander 2018-02-09 17:03:36 PST
Created attachment 333536 [details]
Patch
Comment 3 John Wilander 2018-02-09 17:12:42 PST
Created attachment 333539 [details]
Patch
Comment 4 Brent Fulgham 2018-02-12 10:28:43 PST
Comment on attachment 333539 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=333539&action=review

Very nice! I asked you to make a few adjustments to calm my fears, but otherwise this looks good.

> Source/WebCore/loader/ResourceLoadObserver.cpp:167
> +    auto& targetStatistics = ensureResourceStatisticsForPrimaryDomain(targetPrimaryDomain);

I don't see why this line had to be moved. It makes me nervous, because we have had many threading/memory corruption bugs caused by invalidated iterators, and I'd prefer to limit the scope of these references to the smaller area possible. It doesn't seem like this is referenced again in your new code below, so I don't think you should make this change.

Or did you do this to make sure there is an existing entry for the top frame in the redirect code below? If so, I propose you do so at the point you need it, and add the same comment you do elsewhere ("For consistency, make sure we also have a statistics entry for the top frame domain."). I would also give it the name "topFrameStatistics" in the redirect sections below.

> Source/WebCore/loader/ResourceLoadObserver.cpp:234
> +    auto& targetStatistics = ensureResourceStatisticsForPrimaryDomain(targetPrimaryDomain);

Ditto my comments in the "logFrameNavigation" code.

> Source/WebKit/UIProcess/WebResourceLoadStatisticsStore.cpp:488
> +{

Please add "ASSERT(!RunLoop::isMain());" here.
Comment 5 John Wilander 2018-02-12 10:40:35 PST
(In reply to Brent Fulgham from comment #4)
> Comment on attachment 333539 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=333539&action=review
> 
> Very nice! I asked you to make a few adjustments to calm my fears, but
> otherwise this looks good.
> 
> > Source/WebCore/loader/ResourceLoadObserver.cpp:167
> > +    auto& targetStatistics = ensureResourceStatisticsForPrimaryDomain(targetPrimaryDomain);
> 
> I don't see why this line had to be moved. It makes me nervous, because we
> have had many threading/memory corruption bugs caused by invalidated
> iterators, and I'd prefer to limit the scope of these references to the
> smaller area possible. It doesn't seem like this is referenced again in your
> new code below, so I don't think you should make this change.

I see. However, I do use targetStatistics in two more places now, which is why I moved it. Please see new lines 182 and 185 where I now add topFrameUniqueRedirectsFrom and subresourceUniqueRedirectsFrom respectively.

Previously we only added statistics to the redirecting domain but now we need to keep track of both directions for the purpose of traversing the redirect graph backwards.

Please advice.

> Or did you do this to make sure there is an existing entry for the top frame
> in the redirect code below? If so, I propose you do so at the point you need
> it, and add the same comment you do elsewhere ("For consistency, make sure
> we also have a statistics entry for the top frame domain."). I would also
> give it the name "topFrameStatistics" in the redirect sections below.
> 
> > Source/WebCore/loader/ResourceLoadObserver.cpp:234
> > +    auto& targetStatistics = ensureResourceStatisticsForPrimaryDomain(targetPrimaryDomain);
> 
> Ditto my comments in the "logFrameNavigation" code.

Ditto reply. Please advice.

> > Source/WebKit/UIProcess/WebResourceLoadStatisticsStore.cpp:488
> > +{
> 
> Please add "ASSERT(!RunLoop::isMain());" here.

Will do.
Comment 6 Brent Fulgham 2018-02-12 11:09:49 PST
Comment on attachment 333539 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=333539&action=review

Now that I understand better, this really was introducing a nasty bug. I'm changing to r- for you to fix. :-)

>>> Source/WebCore/loader/ResourceLoadObserver.cpp:167
>>> +    auto& targetStatistics = ensureResourceStatisticsForPrimaryDomain(targetPrimaryDomain);
>> 
>> I don't see why this line had to be moved. It makes me nervous, because we have had many threading/memory corruption bugs caused by invalidated iterators, and I'd prefer to limit the scope of these references to the smaller area possible. It doesn't seem like this is referenced again in your new code below, so I don't think you should make this change.
>> 
>> Or did you do this to make sure there is an existing entry for the top frame in the redirect code below? If so, I propose you do so at the point you need it, and add the same comment you do elsewhere ("For consistency, make sure we also have a statistics entry for the top frame domain."). I would also give it the name "topFrameStatistics" in the redirect sections below.
> 
> I see. However, I do use targetStatistics in two more places now, which is why I moved it. Please see new lines 182 and 185 where I now add topFrameUniqueRedirectsFrom and subresourceUniqueRedirectsFrom respectively.
> 
> Previously we only added statistics to the redirecting domain but now we need to keep track of both directions for the purpose of traversing the redirect graph backwards.
> 
> Please advice.

Ah! I missed that.

This is dangerous, because both calls to "ensureResourceStatisticsForPrimaryDomain" potentially insert content into the HashSet, invalidating the reference you obtained earlier.

I think in the redirect case you need to change to retrieving a copy of the content, then setting it back in the HashSet to avoid creating another set of difficult-to-debug crashes.

So:

1. Leave the original targetStatistics code on line 172 alone. That one is fine.
2. In the redirect section, do one of these two things:

(a) Assign the return value of the 'ensureResourceStatisticsForPrimaryDomain" to a temporary value (a copy), and then set the value back in the m_resourceStatisticsMap,
-or-
(b) Update the 'redirectingOriginStatistics" as you have done, then do a call to "ensureResourceStatistics..." to a local reference and then update the 'targetStatistics' value.

If you don't the 'targetStatistics' object will be potentially deallocated memory and your use of it will bring about the end of the world. Or at least, will cause some nasty security bugs.

> Source/WebCore/loader/ResourceLoadObserver.cpp:181
> +            isNewRedirectToEntry = redirectingOriginStatistics.topFrameUniqueRedirectsTo.add(targetPrimaryDomain).isNewEntry;

So I suggest that immediately after line 181 you do:

auto& targetStatistics = ensureResourceStatisticsForPrimaryDomain(targetPrimaryDomain);

> Source/WebCore/loader/ResourceLoadObserver.cpp:184
> +            isNewRedirectToEntry = redirectingOriginStatistics.subresourceUniqueRedirectsTo.add(targetPrimaryDomain).isNewEntry;

Ditto, right here after line 184:

auto& targetStatistics = ensureResourceStatisticsForPrimaryDomain(targetPrimaryDomain);
Comment 7 John Wilander 2018-02-12 11:13:33 PST
(In reply to Brent Fulgham from comment #6)
> Comment on attachment 333539 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=333539&action=review
> 
> Now that I understand better, this really was introducing a nasty bug. I'm
> changing to r- for you to fix. :-)
> 
> >>> Source/WebCore/loader/ResourceLoadObserver.cpp:167
> >>> +    auto& targetStatistics = ensureResourceStatisticsForPrimaryDomain(targetPrimaryDomain);
> >> 
> >> I don't see why this line had to be moved. It makes me nervous, because we have had many threading/memory corruption bugs caused by invalidated iterators, and I'd prefer to limit the scope of these references to the smaller area possible. It doesn't seem like this is referenced again in your new code below, so I don't think you should make this change.
> >> 
> >> Or did you do this to make sure there is an existing entry for the top frame in the redirect code below? If so, I propose you do so at the point you need it, and add the same comment you do elsewhere ("For consistency, make sure we also have a statistics entry for the top frame domain."). I would also give it the name "topFrameStatistics" in the redirect sections below.
> > 
> > I see. However, I do use targetStatistics in two more places now, which is why I moved it. Please see new lines 182 and 185 where I now add topFrameUniqueRedirectsFrom and subresourceUniqueRedirectsFrom respectively.
> > 
> > Previously we only added statistics to the redirecting domain but now we need to keep track of both directions for the purpose of traversing the redirect graph backwards.
> > 
> > Please advice.
> 
> Ah! I missed that.
> 
> This is dangerous, because both calls to
> "ensureResourceStatisticsForPrimaryDomain" potentially insert content into
> the HashSet, invalidating the reference you obtained earlier.
> 
> I think in the redirect case you need to change to retrieving a copy of the
> content, then setting it back in the HashSet to avoid creating another set
> of difficult-to-debug crashes.
> 
> So:
> 
> 1. Leave the original targetStatistics code on line 172 alone. That one is
> fine.
> 2. In the redirect section, do one of these two things:
> 
> (a) Assign the return value of the
> 'ensureResourceStatisticsForPrimaryDomain" to a temporary value (a copy),
> and then set the value back in the m_resourceStatisticsMap,
> -or-
> (b) Update the 'redirectingOriginStatistics" as you have done, then do a
> call to "ensureResourceStatistics..." to a local reference and then update
> the 'targetStatistics' value.
> 
> If you don't the 'targetStatistics' object will be potentially deallocated
> memory and your use of it will bring about the end of the world. Or at
> least, will cause some nasty security bugs.
> 
> > Source/WebCore/loader/ResourceLoadObserver.cpp:181
> > +            isNewRedirectToEntry = redirectingOriginStatistics.topFrameUniqueRedirectsTo.add(targetPrimaryDomain).isNewEntry;
> 
> So I suggest that immediately after line 181 you do:
> 
> auto& targetStatistics =
> ensureResourceStatisticsForPrimaryDomain(targetPrimaryDomain);
> 
> > Source/WebCore/loader/ResourceLoadObserver.cpp:184
> > +            isNewRedirectToEntry = redirectingOriginStatistics.subresourceUniqueRedirectsTo.add(targetPrimaryDomain).isNewEntry;
> 
> Ditto, right here after line 184:
> 
> auto& targetStatistics =
> ensureResourceStatisticsForPrimaryDomain(targetPrimaryDomain);

Thanks, Brent! I will do these changes and get it up here for you to look at again.
Comment 8 John Wilander 2018-02-12 13:40:14 PST
Created attachment 333626 [details]
Patch
Comment 9 John Wilander 2018-02-12 13:48:34 PST
Created attachment 333628 [details]
Patch
Comment 10 Brent Fulgham 2018-02-12 17:21:42 PST
Comment on attachment 333628 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=333628&action=review

Looks great! r=me (but please fix the boolean initialization stuff I mentioned).

> Source/WebCore/loader/ResourceLoadObserver.cpp:178
> +        bool isNewRedirectFromEntry;

I would initialize these to false, just because I am pedantic that way.

> Source/WebCore/loader/ResourceLoadObserver.cpp:247
> +        bool isNewRedirectFromEntry;

Don't declare these up here. Just declare them at initialization.

> Source/WebCore/loader/ResourceLoadObserver.cpp:249
> +        isNewRedirectToEntry = redirectingOriginStatistics.subresourceUniqueRedirectsTo.add(targetPrimaryDomain).isNewEntry;

bool isNewRedirectToEntry = redirectingOrigin....

> Source/WebCore/loader/ResourceLoadObserver.cpp:251
> +        isNewRedirectFromEntry = targetStatistics.subresourceUniqueRedirectsFrom.add(sourcePrimaryDomain).isNewEntry;

bool isNewRedirectFromEntry = targetStatistics.subresource...
Comment 11 John Wilander 2018-02-12 20:59:57 PST
Created attachment 333669 [details]
Patch for landing
Comment 12 John Wilander 2018-02-12 21:00:31 PST
Thanks for the review, Brent!
Comment 13 John Wilander 2018-02-12 22:07:59 PST
Comment on attachment 333669 [details]
Patch for landing

Tree seems red. Landing directly instead.
Comment 14 John Wilander 2018-02-12 22:08:20 PST
Committed r228416: <https://trac.webkit.org/changeset/228416>
Comment 15 Michael Catanzaro 2018-02-13 18:57:51 PST
Can I ask: what is a prevalent resource?
Comment 16 John Wilander 2018-02-13 20:18:54 PST
A prevalent resource is a domain that has shown its ability to track the user cross-site. Some simply call them trackers but since we don’t know what they do with their ability to track, we just call them prevalent.
Comment 17 Michael Catanzaro 2018-02-13 21:07:28 PST
Everything is much less confusing now, thanks!