There should be an API for prevalent domains on a per page basis.
rdar://problem/59270758
Created attachment 390332 [details] Patch
Created attachment 390370 [details] Patch
Comment on attachment 390370 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=390370&action=review > Source/WebKit/NetworkProcess/NetworkResourceLoader.cpp:302 > + updatePrevalentDomains(request.url()); For every sub-resource load, we may now send a NetworkProcessProxy::UpdatePrevalentDomains() to the UIProcess. Considering that there may be hundreds of sub-resources loads in a single page, many of them for the same registrable domain, this seems excessive. > Source/WebKit/NetworkProcess/NetworkResourceLoader.cpp:342 > + const auto domain = RegistrableDomain(url); We usually don't use const locals in WebKit. > Source/WebKit/UIProcess/API/Cocoa/WKWebsiteDataStore.mm:413 > + completionHandler({ }); Shouldn't this be completionHandler(nil); ? > Source/WebKit/UIProcess/API/Cocoa/WKWebsiteDataStore.mm:417 > + auto webPageProxy = [webView _page]; auto* > Source/WebKit/UIProcess/API/Cocoa/WKWebsiteDataStore.mm:419 > + completionHandler({ }); Shouldn't this be completionHandler(nil); ? > Source/WebKit/UIProcess/API/Cocoa/WKWebsiteDataStore.mm:436 > + auto webPageProxy = [webView _page]; auto* > Source/WebKit/UIProcess/Network/NetworkProcessProxy.cpp:521 > +void NetworkProcessProxy::updatePrevalentDomains(WebPageProxyIdentifier pageID, RegistrableDomain domain) RegistrableDomain&& since it is coming from IPC and you're moving it inside the implementation. > Source/WebKit/UIProcess/WebPageProxy.cpp:9859 > +void WebPageProxy::updatePrevalentDomains(WebCore::RegistrableDomain&& domain) WebCore:: is likely not needed. > Source/WebKit/UIProcess/WebPageProxy.h:1688 > + HashSet<WebCore::RegistrableDomain>& prevalentDomains() { return m_prevalentDomains; } should this be a const getter returning a const HashSet?
Comment on attachment 390370 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=390370&action=review > Tools/WebKitTestRunner/InjectedBundle/InjectedBundle.cpp:419 > + domains.append(toWTFString(static_cast<WKStringRef>(item))); You can use uncheckedAppend() since you called reserveInitialCapacity() > Tools/WebKitTestRunner/InjectedBundle/InjectedBundle.cpp:422 > + m_testRunner->callDidReceivePrevalentDomainsCallback(domains); WTFMove(domains) > Tools/WebKitTestRunner/InjectedBundle/TestRunner.cpp:2361 > + WKBundlePostSynchronousMessage(InjectedBundle::singleton().bundle(), messageName.get(), nullptr, nullptr); No, this should not be a synchronous message. It should be an asynchronous one since you're taking a callback. > Tools/WebKitTestRunner/InjectedBundle/TestRunner.cpp:2364 > +void TestRunner::callDidReceivePrevalentDomainsCallback(Vector<String>& domains) Vector<String>&& > Tools/WebKitTestRunner/InjectedBundle/TestRunner.cpp:2370 > + stringBuilder.appendLiteral("["); append('[') > Tools/WebKitTestRunner/InjectedBundle/TestRunner.cpp:2371 > + bool firstDomain = true; Boolean variables need a prefix: isFirstDomain for e.g. > Tools/WebKitTestRunner/InjectedBundle/TestRunner.cpp:2381 > + stringBuilder.appendLiteral("]"); append(']') > Tools/WebKitTestRunner/TestInvocation.cpp:938 > + if (WKStringIsEqualToUTF8CString(messageName, "GetPrevalentDomains")) { Should not be in didReceiveSynchronousMessage*(), use the async message version please. > Tools/WebKitTestRunner/TestInvocation.h:89 > + void didReceivePrevalentDomains(Vector<String>& domains); Vector<String>&& > Tools/WebKitTestRunner/cocoa/TestControllerCocoa.mm:346 > + m_currentInvocation->didReceivePrevalentDomains(domains); WTFMove(domains)
(In reply to Chris Dumez from comment #4) > Comment on attachment 390370 [details] > Patch > Thanks for the comments! I'll make these changes. > View in context: > https://bugs.webkit.org/attachment.cgi?id=390370&action=review > > > Source/WebKit/NetworkProcess/NetworkResourceLoader.cpp:302 > > + updatePrevalentDomains(request.url()); > > For every sub-resource load, we may now send a > NetworkProcessProxy::UpdatePrevalentDomains() to the UIProcess. Considering > that there may be hundreds of sub-resources loads in a single page, many of > them for the same registrable domain, this seems excessive. > Do you think a good solution is to batch in a HashSet and send every few seconds, like WebResourceLoadObserver?
Created attachment 390472 [details] Patch
(In reply to Chris Dumez from comment #5) > Comment on attachment 390370 [details] > Patch > Thanks for the comments! (In reply to Chris Dumez from comment #4) > Comment on attachment 390370 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=390370&action=review > > > Source/WebKit/UIProcess/API/Cocoa/WKWebsiteDataStore.mm:417 > > + auto webPageProxy = [webView _page]; > > auto* > This one gave me a compiler error, but I addressed the rest.
Created attachment 390474 [details] Patch
Created attachment 390524 [details] Patch
Comment on attachment 390474 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=390474&action=review This patch seems to completely ignore loads served by the memory cache and ping/beacon loads, which seems wrong? I think you likely want centralized logging in CachedResourceLoader::requestResource() which I believe is our bottleneck *before* the memory cache (Youenn can confirm if this would cover everything or not). I think it would cover everything. > Source/WebKit/NetworkProcess/NetworkConnectionToWebProcess.cpp:791 > +void NetworkConnectionToWebProcess::checkForPrevalentSubresourceLoad(WebPageProxyIdentifier pageID, RegistrableDomain&& domain) Please use webPageProxyID instead of pageID. We stopped using pageID now that we have 2 identifier types for pages (one for WebPageProxy and another for WebPage/Page). > Source/WebKit/NetworkProcess/NetworkConnectionToWebProcess.cpp:797 > + m_networkProcess->parentProcessConnection()->send(Messages::NetworkProcessProxy::UpdatePrevalentDomains(pageID, domain), 0); You should be able to call send() on m_networkProcess directly since NetworkProcess class subclasses IPC::MessageSender. > Source/WebKit/UIProcess/API/Cocoa/WKWebsiteDataStore.mm:417 > + auto webPageProxy = [webView _page]; auto*, assuming it returns a raw pointer. > Source/WebKit/UIProcess/Network/NetworkProcessProxy.cpp:1151 > +void NetworkProcessProxy::updatePrevalentDomains(WebKit::WebPageProxyIdentifier pageID, WebCore::RegistrableDomain&& domain) WebKit:: is unnecessary. WebCore:: likely is too. > Source/WebKit/UIProcess/Network/NetworkProcessProxy.messages.in:53 > + UpdatePrevalentDomains(WebKit::WebPageProxyIdentifier pageIdentifier, WebCore::RegistrableDomain domain) Shouldn't this be called AddPrevalentDomainForPage() ? > Source/WebKit/UIProcess/WebPageProxy.cpp:9860 > +void WebPageProxy::updatePrevalentDomains(RegistrableDomain&& domain) Seems like this should be named addPrevalentDomain(). > Source/WebKit/UIProcess/WebPageProxy.cpp:9868 > + send(Messages::WebPage::ClearLoadedSubDomains()); What's a SubDomain? I don't think this is a term we normally use in WebKit. We use registrable domain, domain or host. Here I assume you mean registrable domain. > Source/WebKit/UIProcess/WebPageProxy.h:1548 > + HashSet<WebCore::RegistrableDomain>& prevalentDomains() { return m_prevalentDomains; } As previously commented, this getter should likely be const and return a const HashSet<>&. > Source/WebKit/WebProcess/Network/WebLoaderStrategy.cpp:108 > void WebLoaderStrategy::loadResource(Frame& frame, CachedResource& resource, ResourceRequest&& request, const ResourceLoaderOptions& options, CompletionHandler<void(RefPtr<SubresourceLoader>&&)>&& completionHandler) How about ping loads (or beacons), do you intend to log those too? If so, you likely need to add logging to startPingLoad() > Source/WebKit/WebProcess/WebCoreSupport/WebFrameLoaderClient.cpp:908 > + webPage->updateLoadedSubDomains(RegistrableDomain(request.url())); This is not quite right, at this point we have not started to load the domain yet and the client may refuse to proceed with the load via the policy delegate. Odds are that you want to do this in didStartProvisionalLoad(). Also, you are again using an odd naming for this method. Seems to me it should be called something like addLoadedRegistrableDomain(). Also you should check but I believe decidePolicyForNavigationAction() / didStartProvisionalLoadForFrame() and WebLoaderStrategy::loadResource() are all called for main resources. Therefore, your code in WebLoaderStrategy::loadResource() may be sufficient. I believe WebLoaderStrategy::loadResource() is called for all async resource loads (main and sub resources). Not sure which domains you are attempting to log exactly (I assume all). > Source/WebKit/WebProcess/WebPage/WebPage.cpp:6672 > + if (!m_loadedSubDomains.contains(domain)) { You're doing a double hash lookup here, once in contains, and then again in add(). You should call add() and use the returned AddResult to check it is was new or not. > Source/WebKit/WebProcess/WebPage/WebPage.cpp:6674 > + WebProcess::singleton().ensureNetworkProcessConnection().connection().send(Messages::NetworkConnectionToWebProcess::CheckForPrevalentSubresourceLoad(m_webPageProxyIdentifier, domain), 0); You're calling this "Check" but it does not actually expect a response. Seems like you are "Reporting" instead.
Comment on attachment 390524 [details] Patch See comments on previous patch.
(In reply to Chris Dumez from comment #11) > Comment on attachment 390474 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=390474&action=review > > This patch seems to completely ignore loads served by the memory cache and > ping/beacon loads, which seems wrong? I think you likely want centralized > logging in CachedResourceLoader::requestResource() which I believe is our > bottleneck *before* the memory cache (Youenn can confirm if this would cover > everything or not). I think it would cover everything. > I think this covers cached loads. I say this because with the original patch logging domains in NetworkResourceLoader, the layout tests I added were failing because the iframe was in the memory cache and never made it to the Network Process. But here the tests run fine. I could be wrong though, Youenn should definitely confirm or deny.
(In reply to katherine_cheney from comment #13) > (In reply to Chris Dumez from comment #11) > > Comment on attachment 390474 [details] > > Patch > > > > View in context: > > https://bugs.webkit.org/attachment.cgi?id=390474&action=review > > > > This patch seems to completely ignore loads served by the memory cache and > > ping/beacon loads, which seems wrong? I think you likely want centralized > > logging in CachedResourceLoader::requestResource() which I believe is our > > bottleneck *before* the memory cache (Youenn can confirm if this would cover > > everything or not). I think it would cover everything. > > > > I think this covers cached loads. I say this because with the original patch > logging domains in NetworkResourceLoader, the layout tests I added were > failing because the iframe was in the memory cache and never made it to the > Network Process. But here the tests run fine. I could be wrong though, > Youenn should definitely confirm or deny. You are NOT covering *memory* cache loads. The only improvement in your new patch is that you are now covering *disk* cache loads.
Comment on attachment 390524 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=390524&action=review > Source/WebKit/WebProcess/Network/WebLoaderStrategy.cpp:112 > + auto* webPage = webFrame ? webFrame->page() : nullptr; Ping/beacon loads nowadays should go through this code path. This is indeed missing all loads that are served by MemoryCache (content served directly by an existing CachedResource). CachedResourceLoader::requestResource is the place for all loads. At the same time, if it is in the memory cache, we should have already queried network process to know whether that is a prevalent resource. I guess we might miss the case where a cached resource, not prevalent at the time, is becoming prevalent after this load. Not sure whether this has an impact here. > Source/WebKit/WebProcess/WebPage/WebPage.cpp:6670 > +void WebPage::updateLoadedSubDomains(WebCore::RegistrableDomain domain) s/WebCore::RegistrableDomain/RegistrableDomain&&/ or const& > Source/WebKit/WebProcess/WebPage/WebPage.cpp:6674 > + WebProcess::singleton().ensureNetworkProcessConnection().connection().send(Messages::NetworkConnectionToWebProcess::CheckForPrevalentSubresourceLoad(m_webPageProxyIdentifier, domain), 0); We will send two IPC messages for every request (at least those that are not prevalent). If we do not need to check for cached resources, maybe we could just send the regular request and add an option to the regular request to query the prevalent load.
Comment on attachment 390524 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=390524&action=review > Source/WebKit/UIProcess/WebPageProxy.cpp:9867 > + m_prevalentDomains.clear(); Shouldn't we clear these when the page commits a new main frame load (i.e. the user navigated away)? >> Source/WebKit/WebProcess/Network/WebLoaderStrategy.cpp:112 >> + auto* webPage = webFrame ? webFrame->page() : nullptr; > > Ping/beacon loads nowadays should go through this code path. > This is indeed missing all loads that are served by MemoryCache (content served directly by an existing CachedResource). > CachedResourceLoader::requestResource is the place for all loads. > > At the same time, if it is in the memory cache, we should have already queried network process to know whether that is a prevalent resource. > I guess we might miss the case where a cached resource, not prevalent at the time, is becoming prevalent after this load. > Not sure whether this has an impact here. I was referring to this branch in CachedResource::load(): if (m_options.keepAlive && shouldUsePingLoad(type()) && platformStrategies()->loaderStrategy()->usePingLoad()) { }
Comment on attachment 390524 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=390524&action=review > Source/WebKit/ChangeLog:5 > + rdar://problem/59270758 Nit: <rdar://problem/59270758> >> Source/WebKit/WebProcess/Network/WebLoaderStrategy.cpp:112 >> + auto* webPage = webFrame ? webFrame->page() : nullptr; > > Ping/beacon loads nowadays should go through this code path. > This is indeed missing all loads that are served by MemoryCache (content served directly by an existing CachedResource). > CachedResourceLoader::requestResource is the place for all loads. > > At the same time, if it is in the memory cache, we should have already queried network process to know whether that is a prevalent resource. > I guess we might miss the case where a cached resource, not prevalent at the time, is becoming prevalent after this load. > Not sure whether this has an impact here. I think we do want to capture information about prevalent resources that were requested from cached content, even though we would have taken corrective action. Ideally, a cold page load and a warm load from the cache would produce the same set of prevalent resources encountered. Of course, unclassified domains that later became classified would introduce some variance, but I think that is acceptable for this use case. >> Source/WebKit/WebProcess/WebPage/WebPage.cpp:6674 >> + WebProcess::singleton().ensureNetworkProcessConnection().connection().send(Messages::NetworkConnectionToWebProcess::CheckForPrevalentSubresourceLoad(m_webPageProxyIdentifier, domain), 0); > > We will send two IPC messages for every request (at least those that are not prevalent). > If we do not need to check for cached resources, maybe we could just send the regular request and add an option to the regular request to query the prevalent load. Maybe we could collect these domains as the sub resource loads occur, then send one message when the page load completes?
I missed the fact that the hash set is for subdomains, not prevalent subdomains. Might not be a big deal then.
Created attachment 390541 [details] Patch
Thanks for the feedback everyone! This patch addresses all comments except for this one (from Chris): > Source/WebKit/UIProcess/API/Cocoa/WKWebsiteDataStore.mm:417 > + auto webPageProxy = [webView _page]; auto*, assuming it returns a raw pointer. I get error: variable 'webPageProxy' with type 'auto *' has incompatible initializer of type 'NakedPtr<WebKit::WebPageProxy>’
Created attachment 390542 [details] Patch
Comment on attachment 390542 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=390542&action=review This is much better but I am not sure I understand why we want to store these registrable domains in the UIProcess at all. > Source/WebCore/loader/FrameLoaderClient.h:37 > +#include "RegistrableDomain.h" Can be forward-declared instead. > Source/WebCore/loader/cache/CachedResourceLoader.cpp:794 > + m_documentLoader->frameLoader()->client().addPrevalentDomain(RegistrableDomain(request.resourceRequest().url())); Seems like you could get the frame loader like so: frame->loader() since you already have the frame. Seems more straightforward > Source/WebKit/NetworkProcess/NetworkConnectionToWebProcess.cpp:791 > +void NetworkConnectionToWebProcess::reportSubresourceLoad(WebPageProxyIdentifier webPageProxyID, RegistrableDomain&& domain) I don't really like the name, seems too generic. Maybe something like reportSubresourceLoadToDomain() > Source/WebKit/UIProcess/API/Cocoa/WKWebsiteDataStorePrivate.h:65 > +- (void)_getPrevalentDomainsFor:(WKWebView *)webView completionHandler:(void (^)(NSArray<NSString *> *domains))completionHandler WK_API_AVAILABLE(macos(WK_MAC_TBA), ios(WK_IOS_TBA)); Do we expect this to get called often? Given that this takes in a completion handler, another (simpler) approach would have been to keep the registrable domains on the WebPage (not the WebPageProxy) and then send an async IPC to the WebProcess to get the list of Registrable domains. This would avoid the awkward IPC from the NetworkProcess to the UIProcess to tell it to add a registrable domain for a WebPage for e.g. This would also simplify invalidation of registrable domains when necessary. > Source/WebKit/UIProcess/WebPageProxy.cpp:4529 > + if (frame->isMainFrame()) Can you reuse the if (frame->isMainFrame()) { block at line 4570. > Source/WebKit/WebProcess/WebCoreSupport/WebFrameLoaderClient.cpp:163 > + webPage->addLoadedRegistrableDomain(WTFMove(domain)); Something weird here: addPrevalentDomain() calls addLoadedRegistrableDomain() ??
Created attachment 390580 [details] Patch
Comment on attachment 390580 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=390580&action=review Almost perfect, just a few more improvements. > Source/WebCore/loader/cache/CachedResourceLoader.cpp:794 > + frame.loader().client().addLoadedRegistrableDomain(RegistrableDomain(request.resourceRequest().url())); You may need something in CachedResource::redirectReceived() too to handle redirects. Please double check to Youenn where is the best place to add the addLoadedRegistrableDomain() for handling redirect cases. > Source/WebKit/NetworkProcess/NetworkConnectionToWebProcess.cpp:791 > +void NetworkConnectionToWebProcess::isPrevalentSubresourceLoad(WebPageProxyIdentifier webPageProxyID, RegistrableDomain&& domain, CompletionHandler<void(bool)>&& completionHandler) webPageProxyID parameter is unused and should be dropped. > Source/WebKit/NetworkProcess/NetworkConnectionToWebProcess.cpp:801 > +#endif Please add a blank line after this one. > Source/WebKit/UIProcess/API/Cocoa/WKWebsiteDataStore.mm:423 > +#if ENABLE(RESOURCE_LOAD_STATISTICS) Why isn't this the first line of this method? > Source/WebKit/UIProcess/API/Cocoa/WKWebsiteDataStore.mm:424 > + webPageProxy->getPrevalentDomains([completionHandler = makeBlockPtr(completionHandler)] (HashSet<WebCore::RegistrableDomain> prevalentDomains) { HashSet<WebCore::RegistrableDomain>&& > Source/WebKit/UIProcess/API/Cocoa/WKWebsiteDataStore.mm:428 > + apiDomains.uncheckedAppend(API::String::create(domain.string())); WTFMove(domain.string()) but you will need to add this to RegistrableDomain: String& string() { return m_registrableDomain; } > Source/WebKit/UIProcess/API/Cocoa/WKWebsiteDataStore.mm:445 > +#if ENABLE(RESOURCE_LOAD_STATISTICS) Why isn't this the first line of this method? > Source/WebKit/UIProcess/WebPageProxy.cpp:4567 > + clearPrevalentDomains(); This is weird, it should be in the didCommitLoadForFrame() in the WebProcess instead, now that we don't store them in the UIProcess. It does unnecessary IPC otherwise. > Source/WebKit/WebProcess/WebCoreSupport/WebFrameLoaderClient.cpp:159 > + WebPage* webPage = m_frame->page(); auto* > Source/WebKit/WebProcess/WebPage/WebPage.cpp:6593 > + clearPrevalentDomains(); Not needed anymore now that things are only stored in the WebProcess. You want this in didCommitLoadForFrame() when we are the main frame though (in WebProcess side). > Source/WebKit/WebProcess/WebPage/WebPage.cpp:6673 > +void WebPage::addLoadedRegistrableDomain(WebCore::RegistrableDomain&& domain) I doubt WebCore:: is really needed. > Source/WebKit/WebProcess/WebPage/WebPage.cpp:6677 > + WebProcess::singleton().ensureNetworkProcessConnection().connection().sendWithAsyncReply(Messages::NetworkConnectionToWebProcess::IsPrevalentSubresourceLoad(m_webPageProxyIdentifier, domain), [this, protectedThis = makeRefPtr(*this), domain] (bool isPrevalent) { m_webPageProxyIdentifier parameter is unnecessary.
Comment on attachment 390580 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=390580&action=review > Source/WebKit/UIProcess/API/Cocoa/WKWebsiteDataStore.mm:427 > + for (auto& domain : prevalentDomains) Youenn speaking through my hands: "this should use WTF::map(prevalentDomains, [](auto& domain) { return API::String::create(WTFMove(domain.string())) });"
Created attachment 390706 [details] Patch
Comment on attachment 390706 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=390706&action=review r=me with changes > Source/WebCore/loader/cache/CachedResourceLoader.cpp:794 > + frame.loader().client().addLoadedRegistrableDomain(RegistrableDomain(request.resourceRequest().url())); I think this should be moved towards the end of the method to take into account content extensions/blockers for example (we may block the load or update the url). Probably by this line at the end: m_documentResources.set(resource->url(), resource); > Source/WebKit/UIProcess/API/Cocoa/WKWebsiteDataStore.mm:426 > + apiDomains = WTF::map(prevalentDomains, [](auto& domain) { this can be on the previous line: Vector<RefPtr<API::Object>> apiDomains = WTF::map(); > Source/WebKit/WebProcess/WebPage/WebPage.cpp:6678 > + WebProcess::singleton().ensureNetworkProcessConnection().connection().sendWithAsyncReply(Messages::NetworkConnectionToWebProcess::IsPrevalentSubresourceLoad(domain), [this, protectedThis = makeRefPtr(*this), domain] (bool isPrevalent) { makeRef(), not makeRefPtr() > Source/WebKit/WebProcess/WebPage/WebPage.cpp:6687 > + nit: extra blank line
Created attachment 390715 [details] Patch for landing
Thanks Chris and Youenn!
The commit-queue encountered the following flaky tests while processing attachment 390715 [details]: editing/spelling/spellcheck-attribute.html bug 206178 (authors: g.czajkowski@samsung.com, mark.lam@apple.com, and rniwa@webkit.org) The commit-queue is continuing to process your patch.
Comment on attachment 390715 [details] Patch for landing Clearing flags on attachment: 390715 Committed r256583: <https://trac.webkit.org/changeset/256583>
All reviewed patches have been landed. Closing bug.
<rdar://problem/59448257>
The new test http/tests/resourceLoadStatistics/prevalent-domains-per-page-database.html is a flaky timeout on iOS, tracking in https://bugs.webkit.org/show_bug.cgi?id=207944