RESOLVED FIXED Bug 182664
Resource Load Statistics: Classify resources as prevalent based on redirects to other prevalent resources
https://bugs.webkit.org/show_bug.cgi?id=182664
Summary Resource Load Statistics: Classify resources as prevalent based on redirects ...
John Wilander
Reported 2018-02-09 16:33:21 PST
A and B should be classified as prevalent if domain A has redirected to domain B which in turn has redirected to prevalent resource domain C.
Attachments
Patch (58.33 KB, patch)
2018-02-09 17:03 PST, John Wilander
no flags
Patch (59.63 KB, patch)
2018-02-09 17:12 PST, John Wilander
no flags
Patch (58.82 KB, patch)
2018-02-12 13:40 PST, John Wilander
no flags
Patch (58.69 KB, patch)
2018-02-12 13:48 PST, John Wilander
no flags
Patch for landing (58.63 KB, patch)
2018-02-12 20:59 PST, John Wilander
no flags
John Wilander
Comment 1 2018-02-09 16:33:31 PST
John Wilander
Comment 2 2018-02-09 17:03:36 PST
John Wilander
Comment 3 2018-02-09 17:12:42 PST
Brent Fulgham
Comment 4 2018-02-12 10:28:43 PST
Comment on attachment 333539 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=333539&action=review Very nice! I asked you to make a few adjustments to calm my fears, but otherwise this looks good. > Source/WebCore/loader/ResourceLoadObserver.cpp:167 > + auto& targetStatistics = ensureResourceStatisticsForPrimaryDomain(targetPrimaryDomain); I don't see why this line had to be moved. It makes me nervous, because we have had many threading/memory corruption bugs caused by invalidated iterators, and I'd prefer to limit the scope of these references to the smaller area possible. It doesn't seem like this is referenced again in your new code below, so I don't think you should make this change. Or did you do this to make sure there is an existing entry for the top frame in the redirect code below? If so, I propose you do so at the point you need it, and add the same comment you do elsewhere ("For consistency, make sure we also have a statistics entry for the top frame domain."). I would also give it the name "topFrameStatistics" in the redirect sections below. > Source/WebCore/loader/ResourceLoadObserver.cpp:234 > + auto& targetStatistics = ensureResourceStatisticsForPrimaryDomain(targetPrimaryDomain); Ditto my comments in the "logFrameNavigation" code. > Source/WebKit/UIProcess/WebResourceLoadStatisticsStore.cpp:488 > +{ Please add "ASSERT(!RunLoop::isMain());" here.
John Wilander
Comment 5 2018-02-12 10:40:35 PST
(In reply to Brent Fulgham from comment #4) > Comment on attachment 333539 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=333539&action=review > > Very nice! I asked you to make a few adjustments to calm my fears, but > otherwise this looks good. > > > Source/WebCore/loader/ResourceLoadObserver.cpp:167 > > + auto& targetStatistics = ensureResourceStatisticsForPrimaryDomain(targetPrimaryDomain); > > I don't see why this line had to be moved. It makes me nervous, because we > have had many threading/memory corruption bugs caused by invalidated > iterators, and I'd prefer to limit the scope of these references to the > smaller area possible. It doesn't seem like this is referenced again in your > new code below, so I don't think you should make this change. I see. However, I do use targetStatistics in two more places now, which is why I moved it. Please see new lines 182 and 185 where I now add topFrameUniqueRedirectsFrom and subresourceUniqueRedirectsFrom respectively. Previously we only added statistics to the redirecting domain but now we need to keep track of both directions for the purpose of traversing the redirect graph backwards. Please advice. > Or did you do this to make sure there is an existing entry for the top frame > in the redirect code below? If so, I propose you do so at the point you need > it, and add the same comment you do elsewhere ("For consistency, make sure > we also have a statistics entry for the top frame domain."). I would also > give it the name "topFrameStatistics" in the redirect sections below. > > > Source/WebCore/loader/ResourceLoadObserver.cpp:234 > > + auto& targetStatistics = ensureResourceStatisticsForPrimaryDomain(targetPrimaryDomain); > > Ditto my comments in the "logFrameNavigation" code. Ditto reply. Please advice. > > Source/WebKit/UIProcess/WebResourceLoadStatisticsStore.cpp:488 > > +{ > > Please add "ASSERT(!RunLoop::isMain());" here. Will do.
Brent Fulgham
Comment 6 2018-02-12 11:09:49 PST
Comment on attachment 333539 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=333539&action=review Now that I understand better, this really was introducing a nasty bug. I'm changing to r- for you to fix. :-) >>> Source/WebCore/loader/ResourceLoadObserver.cpp:167 >>> + auto& targetStatistics = ensureResourceStatisticsForPrimaryDomain(targetPrimaryDomain); >> >> I don't see why this line had to be moved. It makes me nervous, because we have had many threading/memory corruption bugs caused by invalidated iterators, and I'd prefer to limit the scope of these references to the smaller area possible. It doesn't seem like this is referenced again in your new code below, so I don't think you should make this change. >> >> Or did you do this to make sure there is an existing entry for the top frame in the redirect code below? If so, I propose you do so at the point you need it, and add the same comment you do elsewhere ("For consistency, make sure we also have a statistics entry for the top frame domain."). I would also give it the name "topFrameStatistics" in the redirect sections below. > > I see. However, I do use targetStatistics in two more places now, which is why I moved it. Please see new lines 182 and 185 where I now add topFrameUniqueRedirectsFrom and subresourceUniqueRedirectsFrom respectively. > > Previously we only added statistics to the redirecting domain but now we need to keep track of both directions for the purpose of traversing the redirect graph backwards. > > Please advice. Ah! I missed that. This is dangerous, because both calls to "ensureResourceStatisticsForPrimaryDomain" potentially insert content into the HashSet, invalidating the reference you obtained earlier. I think in the redirect case you need to change to retrieving a copy of the content, then setting it back in the HashSet to avoid creating another set of difficult-to-debug crashes. So: 1. Leave the original targetStatistics code on line 172 alone. That one is fine. 2. In the redirect section, do one of these two things: (a) Assign the return value of the 'ensureResourceStatisticsForPrimaryDomain" to a temporary value (a copy), and then set the value back in the m_resourceStatisticsMap, -or- (b) Update the 'redirectingOriginStatistics" as you have done, then do a call to "ensureResourceStatistics..." to a local reference and then update the 'targetStatistics' value. If you don't the 'targetStatistics' object will be potentially deallocated memory and your use of it will bring about the end of the world. Or at least, will cause some nasty security bugs. > Source/WebCore/loader/ResourceLoadObserver.cpp:181 > + isNewRedirectToEntry = redirectingOriginStatistics.topFrameUniqueRedirectsTo.add(targetPrimaryDomain).isNewEntry; So I suggest that immediately after line 181 you do: auto& targetStatistics = ensureResourceStatisticsForPrimaryDomain(targetPrimaryDomain); > Source/WebCore/loader/ResourceLoadObserver.cpp:184 > + isNewRedirectToEntry = redirectingOriginStatistics.subresourceUniqueRedirectsTo.add(targetPrimaryDomain).isNewEntry; Ditto, right here after line 184: auto& targetStatistics = ensureResourceStatisticsForPrimaryDomain(targetPrimaryDomain);
John Wilander
Comment 7 2018-02-12 11:13:33 PST
(In reply to Brent Fulgham from comment #6) > Comment on attachment 333539 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=333539&action=review > > Now that I understand better, this really was introducing a nasty bug. I'm > changing to r- for you to fix. :-) > > >>> Source/WebCore/loader/ResourceLoadObserver.cpp:167 > >>> + auto& targetStatistics = ensureResourceStatisticsForPrimaryDomain(targetPrimaryDomain); > >> > >> I don't see why this line had to be moved. It makes me nervous, because we have had many threading/memory corruption bugs caused by invalidated iterators, and I'd prefer to limit the scope of these references to the smaller area possible. It doesn't seem like this is referenced again in your new code below, so I don't think you should make this change. > >> > >> Or did you do this to make sure there is an existing entry for the top frame in the redirect code below? If so, I propose you do so at the point you need it, and add the same comment you do elsewhere ("For consistency, make sure we also have a statistics entry for the top frame domain."). I would also give it the name "topFrameStatistics" in the redirect sections below. > > > > I see. However, I do use targetStatistics in two more places now, which is why I moved it. Please see new lines 182 and 185 where I now add topFrameUniqueRedirectsFrom and subresourceUniqueRedirectsFrom respectively. > > > > Previously we only added statistics to the redirecting domain but now we need to keep track of both directions for the purpose of traversing the redirect graph backwards. > > > > Please advice. > > Ah! I missed that. > > This is dangerous, because both calls to > "ensureResourceStatisticsForPrimaryDomain" potentially insert content into > the HashSet, invalidating the reference you obtained earlier. > > I think in the redirect case you need to change to retrieving a copy of the > content, then setting it back in the HashSet to avoid creating another set > of difficult-to-debug crashes. > > So: > > 1. Leave the original targetStatistics code on line 172 alone. That one is > fine. > 2. In the redirect section, do one of these two things: > > (a) Assign the return value of the > 'ensureResourceStatisticsForPrimaryDomain" to a temporary value (a copy), > and then set the value back in the m_resourceStatisticsMap, > -or- > (b) Update the 'redirectingOriginStatistics" as you have done, then do a > call to "ensureResourceStatistics..." to a local reference and then update > the 'targetStatistics' value. > > If you don't the 'targetStatistics' object will be potentially deallocated > memory and your use of it will bring about the end of the world. Or at > least, will cause some nasty security bugs. > > > Source/WebCore/loader/ResourceLoadObserver.cpp:181 > > + isNewRedirectToEntry = redirectingOriginStatistics.topFrameUniqueRedirectsTo.add(targetPrimaryDomain).isNewEntry; > > So I suggest that immediately after line 181 you do: > > auto& targetStatistics = > ensureResourceStatisticsForPrimaryDomain(targetPrimaryDomain); > > > Source/WebCore/loader/ResourceLoadObserver.cpp:184 > > + isNewRedirectToEntry = redirectingOriginStatistics.subresourceUniqueRedirectsTo.add(targetPrimaryDomain).isNewEntry; > > Ditto, right here after line 184: > > auto& targetStatistics = > ensureResourceStatisticsForPrimaryDomain(targetPrimaryDomain); Thanks, Brent! I will do these changes and get it up here for you to look at again.
John Wilander
Comment 8 2018-02-12 13:40:14 PST
John Wilander
Comment 9 2018-02-12 13:48:34 PST
Brent Fulgham
Comment 10 2018-02-12 17:21:42 PST
Comment on attachment 333628 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=333628&action=review Looks great! r=me (but please fix the boolean initialization stuff I mentioned). > Source/WebCore/loader/ResourceLoadObserver.cpp:178 > + bool isNewRedirectFromEntry; I would initialize these to false, just because I am pedantic that way. > Source/WebCore/loader/ResourceLoadObserver.cpp:247 > + bool isNewRedirectFromEntry; Don't declare these up here. Just declare them at initialization. > Source/WebCore/loader/ResourceLoadObserver.cpp:249 > + isNewRedirectToEntry = redirectingOriginStatistics.subresourceUniqueRedirectsTo.add(targetPrimaryDomain).isNewEntry; bool isNewRedirectToEntry = redirectingOrigin.... > Source/WebCore/loader/ResourceLoadObserver.cpp:251 > + isNewRedirectFromEntry = targetStatistics.subresourceUniqueRedirectsFrom.add(sourcePrimaryDomain).isNewEntry; bool isNewRedirectFromEntry = targetStatistics.subresource...
John Wilander
Comment 11 2018-02-12 20:59:57 PST
Created attachment 333669 [details] Patch for landing
John Wilander
Comment 12 2018-02-12 21:00:31 PST
Thanks for the review, Brent!
John Wilander
Comment 13 2018-02-12 22:07:59 PST
Comment on attachment 333669 [details] Patch for landing Tree seems red. Landing directly instead.
John Wilander
Comment 14 2018-02-12 22:08:20 PST
Michael Catanzaro
Comment 15 2018-02-13 18:57:51 PST
Can I ask: what is a prevalent resource?
John Wilander
Comment 16 2018-02-13 20:18:54 PST
A prevalent resource is a domain that has shown its ability to track the user cross-site. Some simply call them trackers but since we don’t know what they do with their ability to track, we just call them prevalent.
Michael Catanzaro
Comment 17 2018-02-13 21:07:28 PST
Everything is much less confusing now, thanks!
Note You need to log in before you can comment on or make changes to this bug.