Bug 194791

Summary: Introduce and adopt new class RegistrableDomain for eTLD+1
Product: WebKit Reporter: John Wilander <wilander>
Component: WebKit Misc.Assignee: John Wilander <wilander>
Status: RESOLVED FIXED    
Severity: Normal CC: achristensen, bfulgham, cdumez, commit-queue, don.olmstead, ews-watchlist, Hironori.Fujii, rniwa, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugs.webkit.org/show_bug.cgi?id=194664
https://bugs.webkit.org/show_bug.cgi?id=195071
Attachments:
Description Flags
Patch
none
Archive of layout-test-results from ews104 for mac-highsierra-wk2
none
Archive of layout-test-results from ews124 for ios-simulator-wk2
none
Patch
none
Patch for landing none

Description John Wilander 2019-02-18 14:47:30 PST
We should have a dedicated class representing a "registrable domain" or eTLD+1. This concept is used heavily in WebKit, for instance in partitioning, SameSite cookies, and Resource Load Statistics (a.k.a. Intelligent Tracking Prevention). Currently, registrable domains are passed as strings and we've had at least two bugs where the host, including subdomains, was passed instead. Type safety will prevent such bugs in the future.
Comment 1 Radar WebKit Bug Importer 2019-02-18 14:48:23 PST
<rdar://problem/48179240>
Comment 2 John Wilander 2019-02-21 18:59:26 PST
I will not adopt the new class for partitioning in this bug. The patch is large enough as it is. Retitling.
Comment 3 John Wilander 2019-02-21 19:21:24 PST
Created attachment 362683 [details]
Patch
Comment 4 John Wilander 2019-02-21 19:24:26 PST
I'm sorry for the patch size but this was a natural chunk size.
Comment 5 EWS Watchlist 2019-02-21 20:51:28 PST
Comment on attachment 362683 [details]
Patch

Attachment 362683 [details] did not pass mac-wk2-ews (mac-wk2):
Output: https://webkit-queues.webkit.org/results/11241661

New failing tests:
imported/w3c/web-platform-tests/html/semantics/scripting-1/the-script-element/module/dynamic-import/dynamic-imports-fetch-error.sub.html
imported/w3c/web-platform-tests/infrastructure/server/wpt-server-http.sub.html
imported/w3c/web-platform-tests/credential-management/passwordcredential-framed-get.sub.https.html
imported/w3c/web-platform-tests/cors/redirect-userinfo.htm
imported/w3c/web-platform-tests/resource-timing/resource_timing_cross_origin_redirect.html
imported/w3c/web-platform-tests/html/dom/reflection-embedded.html
imported/w3c/web-platform-tests/eventsource/eventsource-cross-origin.htm
imported/w3c/web-platform-tests/credential-management/federatedcredential-framed-get.sub.https.html
http/tests/xmlhttprequest/access-control-and-redirects-async.html
Comment 6 EWS Watchlist 2019-02-21 20:51:30 PST
Created attachment 362692 [details]
Archive of layout-test-results from ews104 for mac-highsierra-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews104  Port: mac-highsierra-wk2  Platform: Mac OS X 10.13.6
Comment 7 Ryosuke Niwa 2019-02-21 20:59:42 PST
Comment on attachment 362683 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=362683&action=review

Just a drive by comment...

> Source/WebCore/platform/RegistrableDomain.h:35
> +// This class represents a domain's eTLD+1 (effective top level domain plus

We don't usually add a class description like this.

> Source/WebCore/platform/RegistrableDomain.h:40
> +// the Public Suffix List (https://publicsuffix.org). This class just
> +// uses the full domain for ports that haven't enabled PUBLIC_SUFFIX_LIST.

In particular, this is the kind of information you can directly get from reading code, and tends to get outdated rather quickly.
Comment 8 EWS Watchlist 2019-02-21 21:36:40 PST
Comment on attachment 362683 [details]
Patch

Attachment 362683 [details] did not pass ios-sim-ews (ios-simulator-wk2):
Output: https://webkit-queues.webkit.org/results/11241767

New failing tests:
imported/w3c/web-platform-tests/html/semantics/scripting-1/the-script-element/module/dynamic-import/dynamic-imports-fetch-error.sub.html
imported/w3c/web-platform-tests/infrastructure/server/wpt-server-http.sub.html
imported/w3c/web-platform-tests/credential-management/passwordcredential-framed-get.sub.https.html
imported/w3c/web-platform-tests/cors/redirect-userinfo.htm
imported/w3c/web-platform-tests/resource-timing/resource_timing_cross_origin_redirect.html
imported/w3c/web-platform-tests/html/dom/reflection-embedded.html
http/tests/quicklook/secure-document-with-subresources.html
imported/w3c/web-platform-tests/eventsource/eventsource-cross-origin.htm
imported/w3c/web-platform-tests/credential-management/federatedcredential-framed-get.sub.https.html
http/tests/xmlhttprequest/access-control-and-redirects-async.html
Comment 9 EWS Watchlist 2019-02-21 21:36:41 PST
Created attachment 362696 [details]
Archive of layout-test-results from ews124 for ios-simulator-wk2

The attached test failures were seen while running run-webkit-tests on the ios-sim-ews.
Bot: ews124  Port: ios-simulator-wk2  Platform: Mac OS X 10.13.6
Comment 10 Fujii Hironori 2019-02-22 02:40:09 PST
Comment on attachment 362683 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=362683&action=review

> Source/WebCore/loader/AdClickAttribution.h:79
> +            return registrableDomain == RegistrableDomain { url.host().toString() };

This can be 'RegistrableDomain { url }'.

> Source/WebCore/loader/AdClickAttribution.h:249
> +    encoder << m_campaign.id << m_source.registrableDomain.string() << m_destination.registrableDomain.string() << m_timeOfAdClick << m_conversion << m_earliestTimeToSend;

Is '.string()' needed? Even though RegistrableDomain has 'encode' method.

> Source/WebCore/platform/RegistrableDomain.h:67
> +    bool operator!=(const RegistrableDomain& other) const { return m_registrableDomain != other.m_registrableDomain; }

I want a method to compare with URL, for example 'matches'.
Can it be implemented by using StringView.endsWith?

bool matches(URL url) {
  auto host = url.host();
  if (!host.endsWith(m_registrableDomain))
    return fasle;
   if (host.length() == m_registrableDomain.length())
   return true;
  return host[host.length() - m_registrableDomain.length()] == '.'
}

> Source/WebKit/UIProcess/WebProcessProxy.cpp:334
> +        page.value->postMessageToInjectedBundle("WebsiteDataDeletionForRegistrableDomainsFinished", nullptr);

Wow! WK2 common code is sending messages to InjectedBundle. IIUC, this is not allowed.
Comment 11 Fujii Hironori 2019-02-22 02:43:04 PST
Comment on attachment 362683 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=362683&action=review

> Source/WebCore/html/HTMLAnchorElement.cpp:446
> +    if (RegistrableDomain(documentDomain) == RegistrableDomain(adDestinationHost)) {

For example, this can be 'if (RegistrableDomain(documentDomain).matches(adDestinationURL))' if RegistrableDomain has 'matches' method.

>> Source/WebCore/loader/AdClickAttribution.h:79
>> +            return registrableDomain == RegistrableDomain { url.host().toString() };
> 
> This can be 'RegistrableDomain { url }'.

And, this can be 'registrableDomain.matches(url) ' if it has 'matches' method.
Comment 12 Don Olmstead 2019-02-22 09:15:03 PST
John the WinCairo build is just failing because we haven't started enumerating the WebCore headers for copying. You can ignore the build failure.
Comment 13 Brent Fulgham 2019-02-22 09:52:49 PST
Comment on attachment 362683 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=362683&action=review

I think this change is really, really great! I think it could be improved a bit as suggested by Ryousuke, Fuji, and others, but it's a huge improvement.

>> Source/WebCore/loader/AdClickAttribution.h:249
>> +    encoder << m_campaign.id << m_source.registrableDomain.string() << m_destination.registrableDomain.string() << m_timeOfAdClick << m_conversion << m_earliestTimeToSend;
> 
> Is '.string()' needed? Even though RegistrableDomain has 'encode' method.

Yes, this doesn't seem necessary if RD can encode/decode itself.

>> Source/WebCore/platform/RegistrableDomain.h:40
>> +// uses the full domain for ports that haven't enabled PUBLIC_SUFFIX_LIST.
> 
> In particular, this is the kind of information you can directly get from reading code, and tends to get outdated rather quickly.

This would be good information in the ChangeLog, though.

> Source/WebCore/platform/RegistrableDomain.h:47
> +        : m_registrableDomain { topPrivatelyControlledDomain(url.host().toString()) }

Is this right? Our old code did this:

    auto domain = topPrivatelyControlledDomain(url.host().toString());
    if (domain.isEmpty())
        domain = url.host().toString();

>> Source/WebCore/platform/RegistrableDomain.h:67
>> +    bool operator!=(const RegistrableDomain& other) const { return m_registrableDomain != other.m_registrableDomain; }
> 
> I want a method to compare with URL, for example 'matches'.
> Can it be implemented by using StringView.endsWith?
> 
> bool matches(URL url) {
>   auto host = url.host();
>   if (!host.endsWith(m_registrableDomain))
>     return fasle;
>    if (host.length() == m_registrableDomain.length())
>    return true;
>   return host[host.length() - m_registrableDomain.length()] == '.'
> }

This does seem like a good idea, especially if we can avoid converting the entire URL into a temporary string just to compare the host portion..

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsMemoryStore.cpp:255
> +            workQueue->dispatch([domainsWithDeletedWebsiteData = crossThreadCopy(domainsWithDeletedWebsiteData), callback = WTFMove(callback), weakThis = WTFMove(weakThis)] () mutable {

It seems like both tiers of lambdas here should work in terms of HashSets of RegistrableDomains, rather than strings that have to be converted to RegistrableDomains later.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsMemoryStore.cpp:260
> +                for (auto& domain : domainsWithDeletedWebsiteData) {

Much better naming!

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsMemoryStore.cpp:289
> +        auto mapEntry = m_resourceStatisticsMap.find(RegistrableDomain { subresourceUniqueRedirectFromDomain });

We should make 'resourceStatistic.subresourceUniqueRedirectsFrom' (and the others) containers of RegistrableDomain when they are serialized out of the plist. Then you don't need to do this conversion and string copying every time you loop or search.

Likewise, m_resourceStatisticsMap should be a container keyed by RegistrableDomain for the same reason.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsMemoryStore.h:58
> +    using SubFrameDomain = WebCore::RegistrableDomain;

Great!

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsMemoryStore.h:222
> +

Extra whitespace

> Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.h:63
> +    using ResourceLoadStatistics = WebCore::ResourceLoadStatistics;

I really like this approach. I wonder if all of this code would be better/less error prone if instead of 'using' you actually had subclasses for each of these concepts. This would prevent us from accidentally mixing up the TopFrameDomain and SubFrameDomain when passing to a helper function.

> Source/WebKit/UIProcess/WebsiteData/WebsiteDataStore.cpp:-622
> -void WebsiteDataStore::fetchDataForTopPrivatelyControlledDomains(OptionSet<WebsiteDataType> dataTypes, OptionSet<WebsiteDataFetchOption> fetchOptions, const Vector<String>& topPrivatelyControlledDomains, Function<void(Vector<WebsiteDataRecord>&&, HashSet<String>&&)>&& completionHandler)

Wow! Was this never used?
Comment 14 Brent Fulgham 2019-02-22 09:56:48 PST
(In reply to Build Bot from comment #6)
> Created attachment 362692 [details]
> Archive of layout-test-results from ews104 for mac-highsierra-wk2
> 
> The attached test failures were seen while running run-webkit-tests on the
> mac-wk2-ews.
> Bot: ews104  Port: mac-highsierra-wk2  Platform: Mac OS X 10.13.6

It looks like you introduced a nullptr dereference somewhere.
Comment 15 Alex Christensen 2019-02-22 10:39:47 PST
Comment on attachment 362683 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=362683&action=review

>>> Source/WebCore/loader/AdClickAttribution.h:79
>>> +            return registrableDomain == RegistrableDomain { url.host().toString() };
>> 
>> This can be 'RegistrableDomain { url }'.
> 
> And, this can be 'registrableDomain.matches(url) ' if it has 'matches' method.

Let's not add a matches method to RegistrableDomain.  That's what operator== is for.
Comment 16 Brent Fulgham 2019-02-22 10:43:52 PST
Comment on attachment 362683 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=362683&action=review

>>>> Source/WebCore/loader/AdClickAttribution.h:79
>>>> +            return registrableDomain == RegistrableDomain { url.host().toString() };
>>> 
>>> This can be 'RegistrableDomain { url }'.
>> 
>> And, this can be 'registrableDomain.matches(url) ' if it has 'matches' method.
> 
> Let's not add a matches method to RegistrableDomain.  That's what operator== is for.

How about operator== that takes a URL argument?
Comment 17 John Wilander 2019-02-22 13:36:59 PST
Thanks Ryosuke, Fuji, Brent, and Alex for all your comments. I will work through them now. And thanks Don, for the heads up on the WinCairo bot.
Comment 18 Chris Dumez 2019-02-22 13:40:53 PST
(In reply to Brent Fulgham from comment #16)
> Comment on attachment 362683 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=362683&action=review
> 
> >>>> Source/WebCore/loader/AdClickAttribution.h:79
> >>>> +            return registrableDomain == RegistrableDomain { url.host().toString() };
> >>> 
> >>> This can be 'RegistrableDomain { url }'.
> >> 
> >> And, this can be 'registrableDomain.matches(url) ' if it has 'matches' method.
> > 
> > Let's not add a matches method to RegistrableDomain.  That's what operator== is for.
> 
> How about operator== that takes a URL argument?

I think an operator== on RegistrableDomain that would take in a URL in parameter would be confusing. They are 2 very different things. I'd much prefer a matches() method.
Comment 19 John Wilander 2019-02-22 13:44:38 PST
(In reply to Alex Christensen from comment #15)
> Comment on attachment 362683 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=362683&action=review
> 
> >>> Source/WebCore/loader/AdClickAttribution.h:79
> >>> +            return registrableDomain == RegistrableDomain { url.host().toString() };
> >> 
> >> This can be 'RegistrableDomain { url }'.
> > 
> > And, this can be 'registrableDomain.matches(url) ' if it has 'matches' method.
> 
> Let's not add a matches method to RegistrableDomain.  That's what operator==
> is for.

I generally agree, but I think what Fuji is suggesting is different than the == operator.

His proposed .matches() function considers "https://sub.domain.webkit.org" as matching the registrable domain webkit.org. However, I would expect an == comparing a URL with a RegistrableDomain to return true only if the URL's host is equal to the registrable domain. Do you agree?
Comment 20 Alex Christensen 2019-02-22 13:58:46 PST
(In reply to John Wilander from comment #19)
> (In reply to Alex Christensen from comment #15)
> > Comment on attachment 362683 [details]
> > Patch
> > 
> > View in context:
> > https://bugs.webkit.org/attachment.cgi?id=362683&action=review
> > 
> > >>> Source/WebCore/loader/AdClickAttribution.h:79
> > >>> +            return registrableDomain == RegistrableDomain { url.host().toString() };
> > >> 
> > >> This can be 'RegistrableDomain { url }'.
> > > 
> > > And, this can be 'registrableDomain.matches(url) ' if it has 'matches' method.
> > 
> > Let's not add a matches method to RegistrableDomain.  That's what operator==
> > is for.
> 
> I generally agree, but I think what Fuji is suggesting is different than the
> == operator.
> 
> His proposed .matches() function considers "https://sub.domain.webkit.org"
> as matching the registrable domain webkit.org. However, I would expect an ==
> comparing a URL with a RegistrableDomain to return true only if the URL's
> host is equal to the registrable domain. Do you agree?

I guess operator==(URL&, RegistrableDomain&) doesn't make sense.  RegistrableDomain.matches makes less sense than URL.hasRegistrableDomain, but I guess it doesn't matter too much.
Comment 21 John Wilander 2019-02-22 15:03:52 PST
(In reply to Ryosuke Niwa from comment #7)
> Comment on attachment 362683 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=362683&action=review
> 
> Just a drive by comment...
> 
> > Source/WebCore/platform/RegistrableDomain.h:35
> > +// This class represents a domain's eTLD+1 (effective top level domain plus
> 
> We don't usually add a class description like this.
> 
> > Source/WebCore/platform/RegistrableDomain.h:40
> > +// the Public Suffix List (https://publicsuffix.org). This class just
> > +// uses the full domain for ports that haven't enabled PUBLIC_SUFFIX_LIST.
> 
> In particular, this is the kind of information you can directly get from
> reading code, and tends to get outdated rather quickly.

I'll move the description to the Changelog.
Comment 22 John Wilander 2019-02-22 15:05:55 PST
(In reply to Fujii Hironori from comment #10)
> Comment on attachment 362683 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=362683&action=review
> 
> > Source/WebCore/loader/AdClickAttribution.h:79
> > +            return registrableDomain == RegistrableDomain { url.host().toString() };
> 
> This can be 'RegistrableDomain { url }'.

Of course. :) I think I wrote this line before adding the URL constructor to RegistrableDomain.

> > Source/WebCore/loader/AdClickAttribution.h:249
> > +    encoder << m_campaign.id << m_source.registrableDomain.string() << m_destination.registrableDomain.string() << m_timeOfAdClick << m_conversion << m_earliestTimeToSend;
> 
> Is '.string()' needed? Even though RegistrableDomain has 'encode' method.

You're right.

> > Source/WebCore/platform/RegistrableDomain.h:67
> > +    bool operator!=(const RegistrableDomain& other) const { return m_registrableDomain != other.m_registrableDomain; }
> 
> I want a method to compare with URL, for example 'matches'.
> Can it be implemented by using StringView.endsWith?
> 
> bool matches(URL url) {
>   auto host = url.host();
>   if (!host.endsWith(m_registrableDomain))
>     return fasle;
>    if (host.length() == m_registrableDomain.length())
>    return true;
>   return host[host.length() - m_registrableDomain.length()] == '.'
> }

I implemented this and added an API test … which revealed an off-by-one bug in the final line. :P But thanks for the suggestion. It makes sense to have this capability.

> > Source/WebKit/UIProcess/WebProcessProxy.cpp:334
> > +        page.value->postMessageToInjectedBundle("WebsiteDataDeletionForRegistrableDomainsFinished", nullptr);
> 
> Wow! WK2 common code is sending messages to InjectedBundle. IIUC, this is
> not allowed.
Comment 23 John Wilander 2019-02-22 15:07:14 PST
(In reply to Fujii Hironori from comment #11)
> Comment on attachment 362683 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=362683&action=review
> 
> > Source/WebCore/html/HTMLAnchorElement.cpp:446
> > +    if (RegistrableDomain(documentDomain) == RegistrableDomain(adDestinationHost)) {
> 
> For example, this can be 'if
> (RegistrableDomain(documentDomain).matches(adDestinationURL))' if
> RegistrableDomain has 'matches' method.

Yes. Will fix.

> >> Source/WebCore/loader/AdClickAttribution.h:79
> >> +            return registrableDomain == RegistrableDomain { url.host().toString() };
> > 
> > This can be 'RegistrableDomain { url }'.
> 
> And, this can be 'registrableDomain.matches(url) ' if it has 'matches'
> method.

Will fix.
Comment 24 John Wilander 2019-02-22 15:13:40 PST
(In reply to Brent Fulgham from comment #13)
> Comment on attachment 362683 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=362683&action=review
> 
> I think this change is really, really great! I think it could be improved a
> bit as suggested by Ryousuke, Fuji, and others, but it's a huge improvement.
> 
> >> Source/WebCore/loader/AdClickAttribution.h:249
> >> +    encoder << m_campaign.id << m_source.registrableDomain.string() << m_destination.registrableDomain.string() << m_timeOfAdClick << m_conversion << m_earliestTimeToSend;
> > 
> > Is '.string()' needed? Even though RegistrableDomain has 'encode' method.
> 
> Yes, this doesn't seem necessary if RD can encode/decode itself.

Will fix.

> >> Source/WebCore/platform/RegistrableDomain.h:40
> >> +// uses the full domain for ports that haven't enabled PUBLIC_SUFFIX_LIST.
> > 
> > In particular, this is the kind of information you can directly get from reading code, and tends to get outdated rather quickly.
> 
> This would be good information in the ChangeLog, though.

Yup. Will do that.

> > Source/WebCore/platform/RegistrableDomain.h:47
> > +        : m_registrableDomain { topPrivatelyControlledDomain(url.host().toString()) }
> 
> Is this right? Our old code did this:
> 
>     auto domain = topPrivatelyControlledDomain(url.host().toString());
>     if (domain.isEmpty())
>         domain = url.host().toString();

In the places where we need to have some string representing the "domain," the old code makes sense. This new class should only represent real registrable domains though. I.e. code that needs a representation where there is no registrable domain needs to handle that themselves. Did I remove such handling somewhere?

> >> Source/WebCore/platform/RegistrableDomain.h:67
> >> +    bool operator!=(const RegistrableDomain& other) const { return m_registrableDomain != other.m_registrableDomain; }
> > 
> > I want a method to compare with URL, for example 'matches'.
> > Can it be implemented by using StringView.endsWith?
> > 
> > bool matches(URL url) {
> >   auto host = url.host();
> >   if (!host.endsWith(m_registrableDomain))
> >     return fasle;
> >    if (host.length() == m_registrableDomain.length())
> >    return true;
> >   return host[host.length() - m_registrableDomain.length()] == '.'
> > }
> 
> This does seem like a good idea, especially if we can avoid converting the
> entire URL into a temporary string just to compare the host portion..

Yes, I've added it.

> > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsMemoryStore.cpp:255
> > +            workQueue->dispatch([domainsWithDeletedWebsiteData = crossThreadCopy(domainsWithDeletedWebsiteData), callback = WTFMove(callback), weakThis = WTFMove(weakThis)] () mutable {
> 
> It seems like both tiers of lambdas here should work in terms of HashSets of
> RegistrableDomains, rather than strings that have to be converted to
> RegistrableDomains later.

I started out working on that but the resulting HashSet<String>& domainsWithDeletedWebsiteData is built up from calls in to various storage management functions that yet don't know about RegistrableDomain. Therefore I left this particular structure as-is to limit the changes in this one patch.

> > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsMemoryStore.cpp:260
> > +                for (auto& domain : domainsWithDeletedWebsiteData) {
> 
> Much better naming!
> 
> > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsMemoryStore.cpp:289
> > +        auto mapEntry = m_resourceStatisticsMap.find(RegistrableDomain { subresourceUniqueRedirectFromDomain });
> 
> We should make 'resourceStatistic.subresourceUniqueRedirectsFrom' (and the
> others) containers of RegistrableDomain when they are serialized out of the
> plist. Then you don't need to do this conversion and string copying every
> time you loop or search.

Yes. However, since these are the data structures that are persisted, I wanted to leave them untouched in this patch and change them in a standalone one. If we regress something in the persistence layer we can just roll out the small patch.

> Likewise, m_resourceStatisticsMap should be a container keyed by
> RegistrableDomain for the same reason.
> 
> > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsMemoryStore.h:58
> > +    using SubFrameDomain = WebCore::RegistrableDomain;
> 
> Great!
> 
> > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsMemoryStore.h:222
> > +
> 
> Extra whitespace

Thanks. Will fix.

> > Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.h:63
> > +    using ResourceLoadStatistics = WebCore::ResourceLoadStatistics;
> 
> I really like this approach. I wonder if all of this code would be
> better/less error prone if instead of 'using' you actually had subclasses
> for each of these concepts. This would prevent us from accidentally mixing
> up the TopFrameDomain and SubFrameDomain when passing to a helper function.

I considered that and I still think it's a good idea. But I'd like to add the sub classes in a separate patch.

> > Source/WebKit/UIProcess/WebsiteData/WebsiteDataStore.cpp:-622
> > -void WebsiteDataStore::fetchDataForTopPrivatelyControlledDomains(OptionSet<WebsiteDataType> dataTypes, OptionSet<WebsiteDataFetchOption> fetchOptions, const Vector<String>& topPrivatelyControlledDomains, Function<void(Vector<WebsiteDataRecord>&&, HashSet<String>&&)>&& completionHandler)
> 
> Wow! Was this never used?

At least not now. It may be a remnant from your recent refactoring. :)
Comment 25 Fujii Hironori 2019-02-22 16:05:16 PST
Comment on attachment 362683 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=362683&action=review

> Source/WebCore/platform/RegistrableDomain.h:103
> +    return RegistrableDomain { *domain };

Calling ctor is calling topPrivatelyControlledDomain again. Is there a way to avoid calling topPrivatelyControlledDomain?
Comment 26 John Wilander 2019-02-22 16:51:23 PST
(In reply to John Wilander from comment #24)
> (In reply to Brent Fulgham from comment #13)
> > 
> > Is this right? Our old code did this:
> > 
> >     auto domain = topPrivatelyControlledDomain(url.host().toString());
> >     if (domain.isEmpty())
> >         domain = url.host().toString();
> 
> In the places where we need to have some string representing the "domain,"
> the old code makes sense. This new class should only represent real
> registrable domains though. I.e. code that needs a representation where
> there is no registrable domain needs to handle that themselves. Did I remove
> such handling somewhere?

I see what you were saying now. And this was actually the cause of the test case failures. I've updated the RegistrableDomain class to handle the null origin and cases where the Public Suffix List cannot deduce an eTLD+1 for us. In the first case we represent it internally with the string "nullOrigin" and in the latter case we use the URL's host. Both these cases match what we did in the old code.
Comment 27 John Wilander 2019-02-22 16:53:33 PST
(In reply to Fujii Hironori from comment #25)
> Comment on attachment 362683 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=362683&action=review
> 
> > Source/WebCore/platform/RegistrableDomain.h:103
> > +    return RegistrableDomain { *domain };
> 
> Calling ctor is calling topPrivatelyControlledDomain again. Is there a way
> to avoid calling topPrivatelyControlledDomain?

I did this deliberately so that the decoder wouldn't be a way to sneak in a string that doesn't represent a registrable domain. Maybe that's too paranoid?
Comment 28 John Wilander 2019-02-22 16:54:08 PST
Created attachment 362789 [details]
Patch
Comment 29 Alex Christensen 2019-02-25 08:38:45 PST
Comment on attachment 362789 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=362789&action=review

r=me with a few nits.

> Source/WebCore/loader/AdClickAttribution.h:63
> +            : registrableDomain { RegistrableDomain { host } }

This can be made more concise and avoid an unnecessary copy which is probably optimized out.
: registrableDomain { host }

> Source/WebCore/loader/ResourceLoadObserver.cpp:196
> -    if (url.protocolIsAbout() || url.isEmpty())
> +    if (url.protocolIsAbout() || url.isLocalFile() || url.isEmpty())

This is a change in behavior, and I don't see why it's necessary in this patch.

> Source/WebCore/platform/RegistrableDomain.h:75
> +        if (host.length() == m_registrableDomain.length())

I would make this <= to protect against reading out of bounds below.  Otherwise the line below needs to make sure it's reading in bounds.

> Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsMemoryStore.h:216
> +    const RegistrableDomain m_debugStaticPrevalentResource { "3rdpartytestwebkit.org" };

A _s suffix on the string might help something.  It indicates it's a String and not just a const char*.  We used to have ASCIILiteral.  Those details are always a little unclear to me.

> Source/WebKit/NetworkProcess/NetworkConnectionToWebProcess.cpp:66
>  using namespace WebCore;
> +using RegistrableDomain = WebCore::RegistrableDomain;

This seems redundant.

> Source/WebKit/UIProcess/WebsiteData/WebsiteDataStore.cpp:1518
> -            process->clearPrevalentResource(m_sessionID, primaryDomain, [processPool, callbackAggregator = callbackAggregator.copyRef()] { });
> +            process->clearPrevalentResource(m_sessionID, url.host().toString(), [processPool, callbackAggregator = callbackAggregator.copyRef()] { });

This could just pass url, right?
Comment 30 John Wilander 2019-02-25 11:05:06 PST
(In reply to Alex Christensen from comment #29)
> Comment on attachment 362789 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=362789&action=review
> 
> r=me with a few nits.
> 
> > Source/WebCore/loader/AdClickAttribution.h:63
> > +            : registrableDomain { RegistrableDomain { host } }
> 
> This can be made more concise and avoid an unnecessary copy which is
> probably optimized out.
> : registrableDomain { host }

Will fix.

> > Source/WebCore/loader/ResourceLoadObserver.cpp:196
> > -    if (url.protocolIsAbout() || url.isEmpty())
> > +    if (url.protocolIsAbout() || url.isLocalFile() || url.isEmpty())
> 
> This is a change in behavior, and I don't see why it's necessary in this
> patch.

I had to fix it because I kept getting assert crashes on debug builds when I clicked the test results page (served as a file).

> > Source/WebCore/platform/RegistrableDomain.h:75
> > +        if (host.length() == m_registrableDomain.length())
> 
> I would make this <= to protect against reading out of bounds below. 
> Otherwise the line below needs to make sure it's reading in bounds.

But that would make it do the wrong thing, right? At least this line, since it would then say "ebkit.org" matches "webkit.org."

The current implementation says:
1. It has to end with the registrable domain. This rules out anything that's shorter than what we're matching against. (I just added an API test for this to my patch to be sure.)
2. Now, if they are the exact same length, they match.
3. If not, then check if the next character to the left is a period. Since what we're matching with cannot be shorter and is not the same length, it has to be longer.

This would still work with your suggested <=, but the line would indicate the wrong thing.

It's a tradeoff, I guess. Please advise.

> > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsMemoryStore.h:216
> > +    const RegistrableDomain m_debugStaticPrevalentResource { "3rdpartytestwebkit.org" };
> 
> A _s suffix on the string might help something.  It indicates it's a String
> and not just a const char*.  We used to have ASCIILiteral.  Those details
> are always a little unclear to me.

Good catch. Will fix.

> > Source/WebKit/NetworkProcess/NetworkConnectionToWebProcess.cpp:66
> >  using namespace WebCore;
> > +using RegistrableDomain = WebCore::RegistrableDomain;
> 
> This seems redundant.

Correct! I started out with all these "using" statements only to find that we are using the whole namespace in many of the implementation files. Will fix.

> > Source/WebKit/UIProcess/WebsiteData/WebsiteDataStore.cpp:1518
> > -            process->clearPrevalentResource(m_sessionID, primaryDomain, [processPool, callbackAggregator = callbackAggregator.copyRef()] { });
> > +            process->clearPrevalentResource(m_sessionID, url.host().toString(), [processPool, callbackAggregator = callbackAggregator.copyRef()] { });
> 
> This could just pass url, right?

Yes. Will fix.

I'll fix the decoder issue Fuji brought up, like so:

template<class Decoder>
Optional<RegistrableDomain> RegistrableDomain::decode(Decoder& decoder)
{
    Optional<String> domain;
    decoder >> domain;
    if (!domain)
        return WTF::nullopt;

    RegistrableDomain registrableDomain;
    registrableDomain.m_registrableDomain = WTFMove(*domain);
    return registrableDomain;
}
Comment 31 Alex Christensen 2019-02-25 11:19:58 PST
Comment on attachment 362789 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=362789&action=review

>>> Source/WebCore/loader/ResourceLoadObserver.cpp:196
>>> +    if (url.protocolIsAbout() || url.isLocalFile() || url.isEmpty())
>> 
>> This is a change in behavior, and I don't see why it's necessary in this patch.
> 
> I had to fix it because I kept getting assert crashes on debug builds when I clicked the test results page (served as a file).

Could  we put this in a separate change?  It seems like you should get the assert crashes without the rest of this change.

>>> Source/WebCore/platform/RegistrableDomain.h:75
>>> +        if (host.length() == m_registrableDomain.length())
>> 
>> I would make this <= to protect against reading out of bounds below.  Otherwise the line below needs to make sure it's reading in bounds.
> 
> But that would make it do the wrong thing, right? At least this line, since it would then say "ebkit.org" matches "webkit.org."
> 
> The current implementation says:
> 1. It has to end with the registrable domain. This rules out anything that's shorter than what we're matching against. (I just added an API test for this to my patch to be sure.)
> 2. Now, if they are the exact same length, they match.
> 3. If not, then check if the next character to the left is a period. Since what we're matching with cannot be shorter and is not the same length, it has to be longer.
> 
> This would still work with your suggested <=, but the line would indicate the wrong thing.
> 
> It's a tradeoff, I guess. Please advise.

Hmmm. You're right.  The endsWith call prevents any reading out of bounds.
Comment 32 John Wilander 2019-02-25 11:33:50 PST
(In reply to Alex Christensen from comment #31)
> Comment on attachment 362789 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=362789&action=review
> 
> >>> Source/WebCore/loader/ResourceLoadObserver.cpp:196
> >>> +    if (url.protocolIsAbout() || url.isLocalFile() || url.isEmpty())
> >> 
> >> This is a change in behavior, and I don't see why it's necessary in this patch.
> > 
> > I had to fix it because I kept getting assert crashes on debug builds when I clicked the test results page (served as a file).
> 
> Could  we put this in a separate change?  It seems like you should get the
> assert crashes without the rest of this change.

Sure. It might also have been fixed by the handling of null origins since I believe that's what happened for the file document.

> >>> Source/WebCore/platform/RegistrableDomain.h:75
> >>> +        if (host.length() == m_registrableDomain.length())
> >> 
> >> I would make this <= to protect against reading out of bounds below.  Otherwise the line below needs to make sure it's reading in bounds.
> > 
> > But that would make it do the wrong thing, right? At least this line, since it would then say "ebkit.org" matches "webkit.org."
> > 
> > The current implementation says:
> > 1. It has to end with the registrable domain. This rules out anything that's shorter than what we're matching against. (I just added an API test for this to my patch to be sure.)
> > 2. Now, if they are the exact same length, they match.
> > 3. If not, then check if the next character to the left is a period. Since what we're matching with cannot be shorter and is not the same length, it has to be longer.
> > 
> > This would still work with your suggested <=, but the line would indicate the wrong thing.
> > 
> > It's a tradeoff, I guess. Please advise.
> 
> Hmmm. You're right.  The endsWith call prevents any reading out of bounds.

Yeah. But I agree with you that neither of these two look clear and nice. :/
Comment 33 John Wilander 2019-02-25 12:49:06 PST
(In reply to John Wilander from comment #30)
> (In reply to Alex Christensen from comment #29)
> > Comment on attachment 362789 [details]
> 
> > > Source/WebKit/UIProcess/WebsiteData/WebsiteDataStore.cpp:1518
> > > -            process->clearPrevalentResource(m_sessionID, primaryDomain, [processPool, callbackAggregator = callbackAggregator.copyRef()] { });
> > > +            process->clearPrevalentResource(m_sessionID, url.host().toString(), [processPool, callbackAggregator = callbackAggregator.copyRef()] { });
> > 
> > This could just pass url, right?
> 
> Yes. Will fix.

I did this change and got a test failure. Tons of logging and debugging later, I realize this takes the trip through the NetworkProcessProxy which is still handling strings. So the URL is converted to a string which includes scheme and port, not just the host. This in turn calls RegistrableDomain's string constructor which decides to use the whole thing as its internal representation since the Public Suffix functionality returns an empty result.
   This all tells me we should try to get rid of the RegistrableDomain's string constructor in a follow up patch. Callers will have to construct URLs if they just have a string. Do you agree?
Comment 34 John Wilander 2019-02-25 13:13:11 PST
Created attachment 362920 [details]
Patch for landing
Comment 35 WebKit Commit Bot 2019-02-25 13:53:03 PST
Comment on attachment 362920 [details]
Patch for landing

Clearing flags on attachment: 362920

Committed r242056: <https://trac.webkit.org/changeset/242056>
Comment 36 WebKit Commit Bot 2019-02-25 13:53:05 PST
All reviewed patches have been landed.  Closing bug.