RESOLVED FIXED 194791
Introduce and adopt new class RegistrableDomain for eTLD+1
https://bugs.webkit.org/show_bug.cgi?id=194791
Summary Introduce and adopt new class RegistrableDomain for eTLD+1
John Wilander
Reported 2019-02-18 14:47:30 PST
We should have a dedicated class representing a "registrable domain" or eTLD+1. This concept is used heavily in WebKit, for instance in partitioning, SameSite cookies, and Resource Load Statistics (a.k.a. Intelligent Tracking Prevention). Currently, registrable domains are passed as strings and we've had at least two bugs where the host, including subdomains, was passed instead. Type safety will prevent such bugs in the future.
Attachments
Patch (344.29 KB, patch)
2019-02-21 19:21 PST, John Wilander
no flags
Archive of layout-test-results from ews104 for mac-highsierra-wk2 (3.09 MB, application/zip)
2019-02-21 20:51 PST, EWS Watchlist
no flags
Archive of layout-test-results from ews124 for ios-simulator-wk2 (3.14 MB, application/zip)
2019-02-21 21:36 PST, EWS Watchlist
no flags
Patch (353.98 KB, patch)
2019-02-22 16:54 PST, John Wilander
no flags
Patch for landing (355.02 KB, patch)
2019-02-25 13:13 PST, John Wilander
no flags
Radar WebKit Bug Importer
Comment 1 2019-02-18 14:48:23 PST
John Wilander
Comment 2 2019-02-21 18:59:26 PST
I will not adopt the new class for partitioning in this bug. The patch is large enough as it is. Retitling.
John Wilander
Comment 3 2019-02-21 19:21:24 PST
John Wilander
Comment 4 2019-02-21 19:24:26 PST
I'm sorry for the patch size but this was a natural chunk size.
EWS Watchlist
Comment 5 2019-02-21 20:51:28 PST
Comment on attachment 362683 [details] Patch Attachment 362683 [details] did not pass mac-wk2-ews (mac-wk2): Output: https://webkit-queues.webkit.org/results/11241661 New failing tests: imported/w3c/web-platform-tests/html/semantics/scripting-1/the-script-element/module/dynamic-import/dynamic-imports-fetch-error.sub.html imported/w3c/web-platform-tests/infrastructure/server/wpt-server-http.sub.html imported/w3c/web-platform-tests/credential-management/passwordcredential-framed-get.sub.https.html imported/w3c/web-platform-tests/cors/redirect-userinfo.htm imported/w3c/web-platform-tests/resource-timing/resource_timing_cross_origin_redirect.html imported/w3c/web-platform-tests/html/dom/reflection-embedded.html imported/w3c/web-platform-tests/eventsource/eventsource-cross-origin.htm imported/w3c/web-platform-tests/credential-management/federatedcredential-framed-get.sub.https.html http/tests/xmlhttprequest/access-control-and-redirects-async.html
EWS Watchlist
Comment 6 2019-02-21 20:51:30 PST
Created attachment 362692 [details] Archive of layout-test-results from ews104 for mac-highsierra-wk2 The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews. Bot: ews104 Port: mac-highsierra-wk2 Platform: Mac OS X 10.13.6
Ryosuke Niwa
Comment 7 2019-02-21 20:59:42 PST
Comment on attachment 362683 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=362683&action=review Just a drive by comment... > Source/WebCore/platform/RegistrableDomain.h:35 > +// This class represents a domain's eTLD+1 (effective top level domain plus We don't usually add a class description like this. > Source/WebCore/platform/RegistrableDomain.h:40 > +// the Public Suffix List (https://publicsuffix.org). This class just > +// uses the full domain for ports that haven't enabled PUBLIC_SUFFIX_LIST. In particular, this is the kind of information you can directly get from reading code, and tends to get outdated rather quickly.
EWS Watchlist
Comment 8 2019-02-21 21:36:40 PST
Comment on attachment 362683 [details] Patch Attachment 362683 [details] did not pass ios-sim-ews (ios-simulator-wk2): Output: https://webkit-queues.webkit.org/results/11241767 New failing tests: imported/w3c/web-platform-tests/html/semantics/scripting-1/the-script-element/module/dynamic-import/dynamic-imports-fetch-error.sub.html imported/w3c/web-platform-tests/infrastructure/server/wpt-server-http.sub.html imported/w3c/web-platform-tests/credential-management/passwordcredential-framed-get.sub.https.html imported/w3c/web-platform-tests/cors/redirect-userinfo.htm imported/w3c/web-platform-tests/resource-timing/resource_timing_cross_origin_redirect.html imported/w3c/web-platform-tests/html/dom/reflection-embedded.html http/tests/quicklook/secure-document-with-subresources.html imported/w3c/web-platform-tests/eventsource/eventsource-cross-origin.htm imported/w3c/web-platform-tests/credential-management/federatedcredential-framed-get.sub.https.html http/tests/xmlhttprequest/access-control-and-redirects-async.html
EWS Watchlist
Comment 9 2019-02-21 21:36:41 PST
Created attachment 362696 [details] Archive of layout-test-results from ews124 for ios-simulator-wk2 The attached test failures were seen while running run-webkit-tests on the ios-sim-ews. Bot: ews124 Port: ios-simulator-wk2 Platform: Mac OS X 10.13.6
Fujii Hironori
Comment 10 2019-02-22 02:40:09 PST
Comment on attachment 362683 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=362683&action=review > Source/WebCore/loader/AdClickAttribution.h:79 > + return registrableDomain == RegistrableDomain { url.host().toString() }; This can be 'RegistrableDomain { url }'. > Source/WebCore/loader/AdClickAttribution.h:249 > + encoder << m_campaign.id << m_source.registrableDomain.string() << m_destination.registrableDomain.string() << m_timeOfAdClick << m_conversion << m_earliestTimeToSend; Is '.string()' needed? Even though RegistrableDomain has 'encode' method. > Source/WebCore/platform/RegistrableDomain.h:67 > + bool operator!=(const RegistrableDomain& other) const { return m_registrableDomain != other.m_registrableDomain; } I want a method to compare with URL, for example 'matches'. Can it be implemented by using StringView.endsWith? bool matches(URL url) { auto host = url.host(); if (!host.endsWith(m_registrableDomain)) return fasle; if (host.length() == m_registrableDomain.length()) return true; return host[host.length() - m_registrableDomain.length()] == '.' } > Source/WebKit/UIProcess/WebProcessProxy.cpp:334 > + page.value->postMessageToInjectedBundle("WebsiteDataDeletionForRegistrableDomainsFinished", nullptr); Wow! WK2 common code is sending messages to InjectedBundle. IIUC, this is not allowed.
Fujii Hironori
Comment 11 2019-02-22 02:43:04 PST
Comment on attachment 362683 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=362683&action=review > Source/WebCore/html/HTMLAnchorElement.cpp:446 > + if (RegistrableDomain(documentDomain) == RegistrableDomain(adDestinationHost)) { For example, this can be 'if (RegistrableDomain(documentDomain).matches(adDestinationURL))' if RegistrableDomain has 'matches' method. >> Source/WebCore/loader/AdClickAttribution.h:79 >> + return registrableDomain == RegistrableDomain { url.host().toString() }; > > This can be 'RegistrableDomain { url }'. And, this can be 'registrableDomain.matches(url) ' if it has 'matches' method.
Don Olmstead
Comment 12 2019-02-22 09:15:03 PST
John the WinCairo build is just failing because we haven't started enumerating the WebCore headers for copying. You can ignore the build failure.
Brent Fulgham
Comment 13 2019-02-22 09:52:49 PST
Comment on attachment 362683 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=362683&action=review I think this change is really, really great! I think it could be improved a bit as suggested by Ryousuke, Fuji, and others, but it's a huge improvement. >> Source/WebCore/loader/AdClickAttribution.h:249 >> + encoder << m_campaign.id << m_source.registrableDomain.string() << m_destination.registrableDomain.string() << m_timeOfAdClick << m_conversion << m_earliestTimeToSend; > > Is '.string()' needed? Even though RegistrableDomain has 'encode' method. Yes, this doesn't seem necessary if RD can encode/decode itself. >> Source/WebCore/platform/RegistrableDomain.h:40 >> +// uses the full domain for ports that haven't enabled PUBLIC_SUFFIX_LIST. > > In particular, this is the kind of information you can directly get from reading code, and tends to get outdated rather quickly. This would be good information in the ChangeLog, though. > Source/WebCore/platform/RegistrableDomain.h:47 > + : m_registrableDomain { topPrivatelyControlledDomain(url.host().toString()) } Is this right? Our old code did this: auto domain = topPrivatelyControlledDomain(url.host().toString()); if (domain.isEmpty()) domain = url.host().toString(); >> Source/WebCore/platform/RegistrableDomain.h:67 >> + bool operator!=(const RegistrableDomain& other) const { return m_registrableDomain != other.m_registrableDomain; } > > I want a method to compare with URL, for example 'matches'. > Can it be implemented by using StringView.endsWith? > > bool matches(URL url) { > auto host = url.host(); > if (!host.endsWith(m_registrableDomain)) > return fasle; > if (host.length() == m_registrableDomain.length()) > return true; > return host[host.length() - m_registrableDomain.length()] == '.' > } This does seem like a good idea, especially if we can avoid converting the entire URL into a temporary string just to compare the host portion.. > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsMemoryStore.cpp:255 > + workQueue->dispatch([domainsWithDeletedWebsiteData = crossThreadCopy(domainsWithDeletedWebsiteData), callback = WTFMove(callback), weakThis = WTFMove(weakThis)] () mutable { It seems like both tiers of lambdas here should work in terms of HashSets of RegistrableDomains, rather than strings that have to be converted to RegistrableDomains later. > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsMemoryStore.cpp:260 > + for (auto& domain : domainsWithDeletedWebsiteData) { Much better naming! > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsMemoryStore.cpp:289 > + auto mapEntry = m_resourceStatisticsMap.find(RegistrableDomain { subresourceUniqueRedirectFromDomain }); We should make 'resourceStatistic.subresourceUniqueRedirectsFrom' (and the others) containers of RegistrableDomain when they are serialized out of the plist. Then you don't need to do this conversion and string copying every time you loop or search. Likewise, m_resourceStatisticsMap should be a container keyed by RegistrableDomain for the same reason. > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsMemoryStore.h:58 > + using SubFrameDomain = WebCore::RegistrableDomain; Great! > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsMemoryStore.h:222 > + Extra whitespace > Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.h:63 > + using ResourceLoadStatistics = WebCore::ResourceLoadStatistics; I really like this approach. I wonder if all of this code would be better/less error prone if instead of 'using' you actually had subclasses for each of these concepts. This would prevent us from accidentally mixing up the TopFrameDomain and SubFrameDomain when passing to a helper function. > Source/WebKit/UIProcess/WebsiteData/WebsiteDataStore.cpp:-622 > -void WebsiteDataStore::fetchDataForTopPrivatelyControlledDomains(OptionSet<WebsiteDataType> dataTypes, OptionSet<WebsiteDataFetchOption> fetchOptions, const Vector<String>& topPrivatelyControlledDomains, Function<void(Vector<WebsiteDataRecord>&&, HashSet<String>&&)>&& completionHandler) Wow! Was this never used?
Brent Fulgham
Comment 14 2019-02-22 09:56:48 PST
(In reply to Build Bot from comment #6) > Created attachment 362692 [details] > Archive of layout-test-results from ews104 for mac-highsierra-wk2 > > The attached test failures were seen while running run-webkit-tests on the > mac-wk2-ews. > Bot: ews104 Port: mac-highsierra-wk2 Platform: Mac OS X 10.13.6 It looks like you introduced a nullptr dereference somewhere.
Alex Christensen
Comment 15 2019-02-22 10:39:47 PST
Comment on attachment 362683 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=362683&action=review >>> Source/WebCore/loader/AdClickAttribution.h:79 >>> + return registrableDomain == RegistrableDomain { url.host().toString() }; >> >> This can be 'RegistrableDomain { url }'. > > And, this can be 'registrableDomain.matches(url) ' if it has 'matches' method. Let's not add a matches method to RegistrableDomain. That's what operator== is for.
Brent Fulgham
Comment 16 2019-02-22 10:43:52 PST
Comment on attachment 362683 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=362683&action=review >>>> Source/WebCore/loader/AdClickAttribution.h:79 >>>> + return registrableDomain == RegistrableDomain { url.host().toString() }; >>> >>> This can be 'RegistrableDomain { url }'. >> >> And, this can be 'registrableDomain.matches(url) ' if it has 'matches' method. > > Let's not add a matches method to RegistrableDomain. That's what operator== is for. How about operator== that takes a URL argument?
John Wilander
Comment 17 2019-02-22 13:36:59 PST
Thanks Ryosuke, Fuji, Brent, and Alex for all your comments. I will work through them now. And thanks Don, for the heads up on the WinCairo bot.
Chris Dumez
Comment 18 2019-02-22 13:40:53 PST
(In reply to Brent Fulgham from comment #16) > Comment on attachment 362683 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=362683&action=review > > >>>> Source/WebCore/loader/AdClickAttribution.h:79 > >>>> + return registrableDomain == RegistrableDomain { url.host().toString() }; > >>> > >>> This can be 'RegistrableDomain { url }'. > >> > >> And, this can be 'registrableDomain.matches(url) ' if it has 'matches' method. > > > > Let's not add a matches method to RegistrableDomain. That's what operator== is for. > > How about operator== that takes a URL argument? I think an operator== on RegistrableDomain that would take in a URL in parameter would be confusing. They are 2 very different things. I'd much prefer a matches() method.
John Wilander
Comment 19 2019-02-22 13:44:38 PST
(In reply to Alex Christensen from comment #15) > Comment on attachment 362683 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=362683&action=review > > >>> Source/WebCore/loader/AdClickAttribution.h:79 > >>> + return registrableDomain == RegistrableDomain { url.host().toString() }; > >> > >> This can be 'RegistrableDomain { url }'. > > > > And, this can be 'registrableDomain.matches(url) ' if it has 'matches' method. > > Let's not add a matches method to RegistrableDomain. That's what operator== > is for. I generally agree, but I think what Fuji is suggesting is different than the == operator. His proposed .matches() function considers "https://sub.domain.webkit.org" as matching the registrable domain webkit.org. However, I would expect an == comparing a URL with a RegistrableDomain to return true only if the URL's host is equal to the registrable domain. Do you agree?
Alex Christensen
Comment 20 2019-02-22 13:58:46 PST
(In reply to John Wilander from comment #19) > (In reply to Alex Christensen from comment #15) > > Comment on attachment 362683 [details] > > Patch > > > > View in context: > > https://bugs.webkit.org/attachment.cgi?id=362683&action=review > > > > >>> Source/WebCore/loader/AdClickAttribution.h:79 > > >>> + return registrableDomain == RegistrableDomain { url.host().toString() }; > > >> > > >> This can be 'RegistrableDomain { url }'. > > > > > > And, this can be 'registrableDomain.matches(url) ' if it has 'matches' method. > > > > Let's not add a matches method to RegistrableDomain. That's what operator== > > is for. > > I generally agree, but I think what Fuji is suggesting is different than the > == operator. > > His proposed .matches() function considers "https://sub.domain.webkit.org" > as matching the registrable domain webkit.org. However, I would expect an == > comparing a URL with a RegistrableDomain to return true only if the URL's > host is equal to the registrable domain. Do you agree? I guess operator==(URL&, RegistrableDomain&) doesn't make sense. RegistrableDomain.matches makes less sense than URL.hasRegistrableDomain, but I guess it doesn't matter too much.
John Wilander
Comment 21 2019-02-22 15:03:52 PST
(In reply to Ryosuke Niwa from comment #7) > Comment on attachment 362683 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=362683&action=review > > Just a drive by comment... > > > Source/WebCore/platform/RegistrableDomain.h:35 > > +// This class represents a domain's eTLD+1 (effective top level domain plus > > We don't usually add a class description like this. > > > Source/WebCore/platform/RegistrableDomain.h:40 > > +// the Public Suffix List (https://publicsuffix.org). This class just > > +// uses the full domain for ports that haven't enabled PUBLIC_SUFFIX_LIST. > > In particular, this is the kind of information you can directly get from > reading code, and tends to get outdated rather quickly. I'll move the description to the Changelog.
John Wilander
Comment 22 2019-02-22 15:05:55 PST
(In reply to Fujii Hironori from comment #10) > Comment on attachment 362683 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=362683&action=review > > > Source/WebCore/loader/AdClickAttribution.h:79 > > + return registrableDomain == RegistrableDomain { url.host().toString() }; > > This can be 'RegistrableDomain { url }'. Of course. :) I think I wrote this line before adding the URL constructor to RegistrableDomain. > > Source/WebCore/loader/AdClickAttribution.h:249 > > + encoder << m_campaign.id << m_source.registrableDomain.string() << m_destination.registrableDomain.string() << m_timeOfAdClick << m_conversion << m_earliestTimeToSend; > > Is '.string()' needed? Even though RegistrableDomain has 'encode' method. You're right. > > Source/WebCore/platform/RegistrableDomain.h:67 > > + bool operator!=(const RegistrableDomain& other) const { return m_registrableDomain != other.m_registrableDomain; } > > I want a method to compare with URL, for example 'matches'. > Can it be implemented by using StringView.endsWith? > > bool matches(URL url) { > auto host = url.host(); > if (!host.endsWith(m_registrableDomain)) > return fasle; > if (host.length() == m_registrableDomain.length()) > return true; > return host[host.length() - m_registrableDomain.length()] == '.' > } I implemented this and added an API test … which revealed an off-by-one bug in the final line. :P But thanks for the suggestion. It makes sense to have this capability. > > Source/WebKit/UIProcess/WebProcessProxy.cpp:334 > > + page.value->postMessageToInjectedBundle("WebsiteDataDeletionForRegistrableDomainsFinished", nullptr); > > Wow! WK2 common code is sending messages to InjectedBundle. IIUC, this is > not allowed.
John Wilander
Comment 23 2019-02-22 15:07:14 PST
(In reply to Fujii Hironori from comment #11) > Comment on attachment 362683 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=362683&action=review > > > Source/WebCore/html/HTMLAnchorElement.cpp:446 > > + if (RegistrableDomain(documentDomain) == RegistrableDomain(adDestinationHost)) { > > For example, this can be 'if > (RegistrableDomain(documentDomain).matches(adDestinationURL))' if > RegistrableDomain has 'matches' method. Yes. Will fix. > >> Source/WebCore/loader/AdClickAttribution.h:79 > >> + return registrableDomain == RegistrableDomain { url.host().toString() }; > > > > This can be 'RegistrableDomain { url }'. > > And, this can be 'registrableDomain.matches(url) ' if it has 'matches' > method. Will fix.
John Wilander
Comment 24 2019-02-22 15:13:40 PST
(In reply to Brent Fulgham from comment #13) > Comment on attachment 362683 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=362683&action=review > > I think this change is really, really great! I think it could be improved a > bit as suggested by Ryousuke, Fuji, and others, but it's a huge improvement. > > >> Source/WebCore/loader/AdClickAttribution.h:249 > >> + encoder << m_campaign.id << m_source.registrableDomain.string() << m_destination.registrableDomain.string() << m_timeOfAdClick << m_conversion << m_earliestTimeToSend; > > > > Is '.string()' needed? Even though RegistrableDomain has 'encode' method. > > Yes, this doesn't seem necessary if RD can encode/decode itself. Will fix. > >> Source/WebCore/platform/RegistrableDomain.h:40 > >> +// uses the full domain for ports that haven't enabled PUBLIC_SUFFIX_LIST. > > > > In particular, this is the kind of information you can directly get from reading code, and tends to get outdated rather quickly. > > This would be good information in the ChangeLog, though. Yup. Will do that. > > Source/WebCore/platform/RegistrableDomain.h:47 > > + : m_registrableDomain { topPrivatelyControlledDomain(url.host().toString()) } > > Is this right? Our old code did this: > > auto domain = topPrivatelyControlledDomain(url.host().toString()); > if (domain.isEmpty()) > domain = url.host().toString(); In the places where we need to have some string representing the "domain," the old code makes sense. This new class should only represent real registrable domains though. I.e. code that needs a representation where there is no registrable domain needs to handle that themselves. Did I remove such handling somewhere? > >> Source/WebCore/platform/RegistrableDomain.h:67 > >> + bool operator!=(const RegistrableDomain& other) const { return m_registrableDomain != other.m_registrableDomain; } > > > > I want a method to compare with URL, for example 'matches'. > > Can it be implemented by using StringView.endsWith? > > > > bool matches(URL url) { > > auto host = url.host(); > > if (!host.endsWith(m_registrableDomain)) > > return fasle; > > if (host.length() == m_registrableDomain.length()) > > return true; > > return host[host.length() - m_registrableDomain.length()] == '.' > > } > > This does seem like a good idea, especially if we can avoid converting the > entire URL into a temporary string just to compare the host portion.. Yes, I've added it. > > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsMemoryStore.cpp:255 > > + workQueue->dispatch([domainsWithDeletedWebsiteData = crossThreadCopy(domainsWithDeletedWebsiteData), callback = WTFMove(callback), weakThis = WTFMove(weakThis)] () mutable { > > It seems like both tiers of lambdas here should work in terms of HashSets of > RegistrableDomains, rather than strings that have to be converted to > RegistrableDomains later. I started out working on that but the resulting HashSet<String>& domainsWithDeletedWebsiteData is built up from calls in to various storage management functions that yet don't know about RegistrableDomain. Therefore I left this particular structure as-is to limit the changes in this one patch. > > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsMemoryStore.cpp:260 > > + for (auto& domain : domainsWithDeletedWebsiteData) { > > Much better naming! > > > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsMemoryStore.cpp:289 > > + auto mapEntry = m_resourceStatisticsMap.find(RegistrableDomain { subresourceUniqueRedirectFromDomain }); > > We should make 'resourceStatistic.subresourceUniqueRedirectsFrom' (and the > others) containers of RegistrableDomain when they are serialized out of the > plist. Then you don't need to do this conversion and string copying every > time you loop or search. Yes. However, since these are the data structures that are persisted, I wanted to leave them untouched in this patch and change them in a standalone one. If we regress something in the persistence layer we can just roll out the small patch. > Likewise, m_resourceStatisticsMap should be a container keyed by > RegistrableDomain for the same reason. > > > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsMemoryStore.h:58 > > + using SubFrameDomain = WebCore::RegistrableDomain; > > Great! > > > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsMemoryStore.h:222 > > + > > Extra whitespace Thanks. Will fix. > > Source/WebKit/NetworkProcess/Classifier/WebResourceLoadStatisticsStore.h:63 > > + using ResourceLoadStatistics = WebCore::ResourceLoadStatistics; > > I really like this approach. I wonder if all of this code would be > better/less error prone if instead of 'using' you actually had subclasses > for each of these concepts. This would prevent us from accidentally mixing > up the TopFrameDomain and SubFrameDomain when passing to a helper function. I considered that and I still think it's a good idea. But I'd like to add the sub classes in a separate patch. > > Source/WebKit/UIProcess/WebsiteData/WebsiteDataStore.cpp:-622 > > -void WebsiteDataStore::fetchDataForTopPrivatelyControlledDomains(OptionSet<WebsiteDataType> dataTypes, OptionSet<WebsiteDataFetchOption> fetchOptions, const Vector<String>& topPrivatelyControlledDomains, Function<void(Vector<WebsiteDataRecord>&&, HashSet<String>&&)>&& completionHandler) > > Wow! Was this never used? At least not now. It may be a remnant from your recent refactoring. :)
Fujii Hironori
Comment 25 2019-02-22 16:05:16 PST
Comment on attachment 362683 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=362683&action=review > Source/WebCore/platform/RegistrableDomain.h:103 > + return RegistrableDomain { *domain }; Calling ctor is calling topPrivatelyControlledDomain again. Is there a way to avoid calling topPrivatelyControlledDomain?
John Wilander
Comment 26 2019-02-22 16:51:23 PST
(In reply to John Wilander from comment #24) > (In reply to Brent Fulgham from comment #13) > > > > Is this right? Our old code did this: > > > > auto domain = topPrivatelyControlledDomain(url.host().toString()); > > if (domain.isEmpty()) > > domain = url.host().toString(); > > In the places where we need to have some string representing the "domain," > the old code makes sense. This new class should only represent real > registrable domains though. I.e. code that needs a representation where > there is no registrable domain needs to handle that themselves. Did I remove > such handling somewhere? I see what you were saying now. And this was actually the cause of the test case failures. I've updated the RegistrableDomain class to handle the null origin and cases where the Public Suffix List cannot deduce an eTLD+1 for us. In the first case we represent it internally with the string "nullOrigin" and in the latter case we use the URL's host. Both these cases match what we did in the old code.
John Wilander
Comment 27 2019-02-22 16:53:33 PST
(In reply to Fujii Hironori from comment #25) > Comment on attachment 362683 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=362683&action=review > > > Source/WebCore/platform/RegistrableDomain.h:103 > > + return RegistrableDomain { *domain }; > > Calling ctor is calling topPrivatelyControlledDomain again. Is there a way > to avoid calling topPrivatelyControlledDomain? I did this deliberately so that the decoder wouldn't be a way to sneak in a string that doesn't represent a registrable domain. Maybe that's too paranoid?
John Wilander
Comment 28 2019-02-22 16:54:08 PST
Alex Christensen
Comment 29 2019-02-25 08:38:45 PST
Comment on attachment 362789 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=362789&action=review r=me with a few nits. > Source/WebCore/loader/AdClickAttribution.h:63 > + : registrableDomain { RegistrableDomain { host } } This can be made more concise and avoid an unnecessary copy which is probably optimized out. : registrableDomain { host } > Source/WebCore/loader/ResourceLoadObserver.cpp:196 > - if (url.protocolIsAbout() || url.isEmpty()) > + if (url.protocolIsAbout() || url.isLocalFile() || url.isEmpty()) This is a change in behavior, and I don't see why it's necessary in this patch. > Source/WebCore/platform/RegistrableDomain.h:75 > + if (host.length() == m_registrableDomain.length()) I would make this <= to protect against reading out of bounds below. Otherwise the line below needs to make sure it's reading in bounds. > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsMemoryStore.h:216 > + const RegistrableDomain m_debugStaticPrevalentResource { "3rdpartytestwebkit.org" }; A _s suffix on the string might help something. It indicates it's a String and not just a const char*. We used to have ASCIILiteral. Those details are always a little unclear to me. > Source/WebKit/NetworkProcess/NetworkConnectionToWebProcess.cpp:66 > using namespace WebCore; > +using RegistrableDomain = WebCore::RegistrableDomain; This seems redundant. > Source/WebKit/UIProcess/WebsiteData/WebsiteDataStore.cpp:1518 > - process->clearPrevalentResource(m_sessionID, primaryDomain, [processPool, callbackAggregator = callbackAggregator.copyRef()] { }); > + process->clearPrevalentResource(m_sessionID, url.host().toString(), [processPool, callbackAggregator = callbackAggregator.copyRef()] { }); This could just pass url, right?
John Wilander
Comment 30 2019-02-25 11:05:06 PST
(In reply to Alex Christensen from comment #29) > Comment on attachment 362789 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=362789&action=review > > r=me with a few nits. > > > Source/WebCore/loader/AdClickAttribution.h:63 > > + : registrableDomain { RegistrableDomain { host } } > > This can be made more concise and avoid an unnecessary copy which is > probably optimized out. > : registrableDomain { host } Will fix. > > Source/WebCore/loader/ResourceLoadObserver.cpp:196 > > - if (url.protocolIsAbout() || url.isEmpty()) > > + if (url.protocolIsAbout() || url.isLocalFile() || url.isEmpty()) > > This is a change in behavior, and I don't see why it's necessary in this > patch. I had to fix it because I kept getting assert crashes on debug builds when I clicked the test results page (served as a file). > > Source/WebCore/platform/RegistrableDomain.h:75 > > + if (host.length() == m_registrableDomain.length()) > > I would make this <= to protect against reading out of bounds below. > Otherwise the line below needs to make sure it's reading in bounds. But that would make it do the wrong thing, right? At least this line, since it would then say "ebkit.org" matches "webkit.org." The current implementation says: 1. It has to end with the registrable domain. This rules out anything that's shorter than what we're matching against. (I just added an API test for this to my patch to be sure.) 2. Now, if they are the exact same length, they match. 3. If not, then check if the next character to the left is a period. Since what we're matching with cannot be shorter and is not the same length, it has to be longer. This would still work with your suggested <=, but the line would indicate the wrong thing. It's a tradeoff, I guess. Please advise. > > Source/WebKit/NetworkProcess/Classifier/ResourceLoadStatisticsMemoryStore.h:216 > > + const RegistrableDomain m_debugStaticPrevalentResource { "3rdpartytestwebkit.org" }; > > A _s suffix on the string might help something. It indicates it's a String > and not just a const char*. We used to have ASCIILiteral. Those details > are always a little unclear to me. Good catch. Will fix. > > Source/WebKit/NetworkProcess/NetworkConnectionToWebProcess.cpp:66 > > using namespace WebCore; > > +using RegistrableDomain = WebCore::RegistrableDomain; > > This seems redundant. Correct! I started out with all these "using" statements only to find that we are using the whole namespace in many of the implementation files. Will fix. > > Source/WebKit/UIProcess/WebsiteData/WebsiteDataStore.cpp:1518 > > - process->clearPrevalentResource(m_sessionID, primaryDomain, [processPool, callbackAggregator = callbackAggregator.copyRef()] { }); > > + process->clearPrevalentResource(m_sessionID, url.host().toString(), [processPool, callbackAggregator = callbackAggregator.copyRef()] { }); > > This could just pass url, right? Yes. Will fix. I'll fix the decoder issue Fuji brought up, like so: template<class Decoder> Optional<RegistrableDomain> RegistrableDomain::decode(Decoder& decoder) { Optional<String> domain; decoder >> domain; if (!domain) return WTF::nullopt; RegistrableDomain registrableDomain; registrableDomain.m_registrableDomain = WTFMove(*domain); return registrableDomain; }
Alex Christensen
Comment 31 2019-02-25 11:19:58 PST
Comment on attachment 362789 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=362789&action=review >>> Source/WebCore/loader/ResourceLoadObserver.cpp:196 >>> + if (url.protocolIsAbout() || url.isLocalFile() || url.isEmpty()) >> >> This is a change in behavior, and I don't see why it's necessary in this patch. > > I had to fix it because I kept getting assert crashes on debug builds when I clicked the test results page (served as a file). Could we put this in a separate change? It seems like you should get the assert crashes without the rest of this change. >>> Source/WebCore/platform/RegistrableDomain.h:75 >>> + if (host.length() == m_registrableDomain.length()) >> >> I would make this <= to protect against reading out of bounds below. Otherwise the line below needs to make sure it's reading in bounds. > > But that would make it do the wrong thing, right? At least this line, since it would then say "ebkit.org" matches "webkit.org." > > The current implementation says: > 1. It has to end with the registrable domain. This rules out anything that's shorter than what we're matching against. (I just added an API test for this to my patch to be sure.) > 2. Now, if they are the exact same length, they match. > 3. If not, then check if the next character to the left is a period. Since what we're matching with cannot be shorter and is not the same length, it has to be longer. > > This would still work with your suggested <=, but the line would indicate the wrong thing. > > It's a tradeoff, I guess. Please advise. Hmmm. You're right. The endsWith call prevents any reading out of bounds.
John Wilander
Comment 32 2019-02-25 11:33:50 PST
(In reply to Alex Christensen from comment #31) > Comment on attachment 362789 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=362789&action=review > > >>> Source/WebCore/loader/ResourceLoadObserver.cpp:196 > >>> + if (url.protocolIsAbout() || url.isLocalFile() || url.isEmpty()) > >> > >> This is a change in behavior, and I don't see why it's necessary in this patch. > > > > I had to fix it because I kept getting assert crashes on debug builds when I clicked the test results page (served as a file). > > Could we put this in a separate change? It seems like you should get the > assert crashes without the rest of this change. Sure. It might also have been fixed by the handling of null origins since I believe that's what happened for the file document. > >>> Source/WebCore/platform/RegistrableDomain.h:75 > >>> + if (host.length() == m_registrableDomain.length()) > >> > >> I would make this <= to protect against reading out of bounds below. Otherwise the line below needs to make sure it's reading in bounds. > > > > But that would make it do the wrong thing, right? At least this line, since it would then say "ebkit.org" matches "webkit.org." > > > > The current implementation says: > > 1. It has to end with the registrable domain. This rules out anything that's shorter than what we're matching against. (I just added an API test for this to my patch to be sure.) > > 2. Now, if they are the exact same length, they match. > > 3. If not, then check if the next character to the left is a period. Since what we're matching with cannot be shorter and is not the same length, it has to be longer. > > > > This would still work with your suggested <=, but the line would indicate the wrong thing. > > > > It's a tradeoff, I guess. Please advise. > > Hmmm. You're right. The endsWith call prevents any reading out of bounds. Yeah. But I agree with you that neither of these two look clear and nice. :/
John Wilander
Comment 33 2019-02-25 12:49:06 PST
(In reply to John Wilander from comment #30) > (In reply to Alex Christensen from comment #29) > > Comment on attachment 362789 [details] > > > > Source/WebKit/UIProcess/WebsiteData/WebsiteDataStore.cpp:1518 > > > - process->clearPrevalentResource(m_sessionID, primaryDomain, [processPool, callbackAggregator = callbackAggregator.copyRef()] { }); > > > + process->clearPrevalentResource(m_sessionID, url.host().toString(), [processPool, callbackAggregator = callbackAggregator.copyRef()] { }); > > > > This could just pass url, right? > > Yes. Will fix. I did this change and got a test failure. Tons of logging and debugging later, I realize this takes the trip through the NetworkProcessProxy which is still handling strings. So the URL is converted to a string which includes scheme and port, not just the host. This in turn calls RegistrableDomain's string constructor which decides to use the whole thing as its internal representation since the Public Suffix functionality returns an empty result. This all tells me we should try to get rid of the RegistrableDomain's string constructor in a follow up patch. Callers will have to construct URLs if they just have a string. Do you agree?
John Wilander
Comment 34 2019-02-25 13:13:11 PST
Created attachment 362920 [details] Patch for landing
WebKit Commit Bot
Comment 35 2019-02-25 13:53:03 PST
Comment on attachment 362920 [details] Patch for landing Clearing flags on attachment: 362920 Committed r242056: <https://trac.webkit.org/changeset/242056>
WebKit Commit Bot
Comment 36 2019-02-25 13:53:05 PST
All reviewed patches have been landed. Closing bug.
Note You need to log in before you can comment on or make changes to this bug.