Use a NetworkProcessProxy singleton instead of one per WebProcessPool
Created attachment 382149 [details] Patch
Comment on attachment 382149 [details] Patch Not quite ready yet, but I thought I'd upload my WIP patch. Most things work.
Created attachment 383180 [details] Patch
Just rebasing. Still not ready yet.
Created attachment 383233 [details] Patch
Thanks for the patch. If this patch contains new public API please make sure it follows the guidelines for new WebKit2 GTK+ API. See http://trac.webkit.org/wiki/WebKitGTK/AddingNewWebKit2API
Created attachment 383235 [details] Patch
Created attachment 383242 [details] Patch
Created attachment 383243 [details] Patch
Created attachment 383244 [details] Patch
Comment on attachment 383244 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=383244&action=review > Source/WebKit/ChangeLog:4 > + Use a NetworkProcessProxy singleton instead of one per WebProcessPool > + https://bugs.webkit.org/show_bug.cgi?id=203547 Please include information about why you are making this change and how it is beneficial.
Created attachment 391917 [details] Patch
Comment on attachment 391917 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=391917&action=review > Source/WebKit/UIProcess/Network/NetworkProcessProxy.cpp:161 > + , m_throttler(*this, false) Why is this OK?
It's not. As you can tell, this latest patch is not quite ready yet. I wasn't sure what to do with the process throttlers, so I removed that for now. We will need to fix that before landing, as well as fix other things.
Created attachment 391956 [details] Patch
Closer, but still not quite ready. Chris, I'm interested in what you think we should do about the process throttlers
Would still love to know the "why" here.
Still plan to explain that in the changelog once it works. Until then you can read the test I added and see the kind of thing that developers do a lot and doesn't work until this fix.
(In reply to Alex Christensen from comment #16) > Closer, but still not quite ready. Chris, I'm interested in what you think > we should do about the process throttlers Maybe get the flag from the first process pool you find?
and rdar://problem/56027111
Created attachment 406104 [details] Patch
Created attachment 406105 [details] Patch
Created attachment 406434 [details] Patch
Created attachment 406435 [details] Patch
Do you intend for this to be up for review? It still lacks any details in the ChangeLog? -- Alex has explained in a separate situation that the impetus behind this change (and please correct me if I have the details wrong) is due to the fact that multiple network processes can't share an http cookie store. So if you make multiple WKWebViews, all sharing the same default WKWebsiteDataStore, and then try to get the httpCookieStore from the WKWebsiteDataStore, it will return one of the network processes cookie stores at random. This is clearly bad. So, assuming we can't fix the underlying issue and make the network libraries cookie stores work in multiple processes at time, we likely have to change how network process sharing works. An alternative approach to solving this problem, which I think has various nice properties, would be to make the network process per-WKWebsiteDataStore (and probably having one global one for all ephemeral WKWebsiteDataStores). In practice, this will not be hugely different than what I think is being proposed here, as for public api, there is only the defaultDataStore global and the nonPersistentDataStores (e.g. ephemeral). The benefit though is that it provides flexibility to expand in the future, for instance, if there is desire to allow third parties to create their own WKWebsiteDataStores so they can keep certain data segregated from each other (e.g. an app that uses WKWebViews for both ads and their own content, might not want to share website data between them). While in principle, it might be possible that a singleton network process could provide that segregation if there was not global state, history has shown us that system libraries do not always cooperate. So to sum up, I think a better more conservative and future proof version of this would be to make the network process not a singleton, but per-WKWebsiteDataStores.
Created attachment 406446 [details] Patch
No review yet. There's still something wrong with service workers, but I'm letting EWS tell me if there's anything else wrong. Sam, you are correct that cookie storage and correctness is the motivation for this. Making one network process per WKWebsiteDataStore would also solve that correctness problem, but it would add the cost of launching and running another process per private tab, which would be a considerable performance regression I don't think we want. The ability to create a persistent cookie storage from a path already exists, so that is not a motivation to separate network processes.
Created attachment 406447 [details] Patch
Hm, I see this was a lot of work. Why are you trying to move to single NetworkProcessProxy? I only looked at this briefly, but I'm not sure this can be made to work easily for soup. Problem is WebProcessPool corresponds to WebKitWebContext, the library context object. Remember WebProcessPool used to be called WebContext, right? Separate WebProcessPools currently have to use separate network processes, otherwise they're no longer really separate contexts at all. Say library A and library B are trying to use WebKit with different network settings, or say an application is trying to use WebKit in two different ways in two different places... the same network process shouldn't be used for both contexts because they have process-level settings that would clobber each other. That said, I think it might be possible to do without sacrificing correctness. We would need to move all global NetworkProcess settings to be properties of NetworkSession instead (there are currently many that aren't, e.g. we never fixed bug #195140), then ensure a different NetworkSession gets used for each WebProcessPool (currently not the case), and then somehow get rid of SessionID::DefaultSessionID. That's probably easier said than done....
(In reply to Alex Christensen from comment #27) > No review yet. There's still something wrong with service workers, but I'm > letting EWS tell me if there's anything else wrong. > > Sam, you are correct that cookie storage and correctness is the motivation > for this. Making one network process per WKWebsiteDataStore would also > solve that correctness problem, but it would add the cost of launching and > running another process per private tab, which would be a considerable > performance regression I don't think we want. As I noted, having a single network process for all ephemeral data stores would likely be okay, or if you wanted to, having a way for ephemeral data stores to use the default data stores network process might also be fine as an optimization. > The ability to create a > persistent cookie storage from a path already exists, so that is not a > motivation to separate network processes. I don't think I stated it was. I really don't think a singleton is the right architectural or future thinking option here, and before committing to such a change, we should all agree this is the right direction.
Michael, what are these "process-level settings" and why can't they be made to be properties of the NetworkSession? That's what I've been doing to prepare for this, and that is why I asked the libsoup network implementation maintainers to do the same. Sam, could you give a concrete example of why having more processes would be better "future thinking"? If we need an optimization to use one process for multiple sessions, why would we not make that the default?
(In reply to Sam Weinig from comment #25) > Alex has explained in a separate situation that the impetus behind this > change (and please correct me if I have the details wrong) is due to the > fact that multiple network processes can't share an http cookie store. So if > you make multiple WKWebViews, all sharing the same default > WKWebsiteDataStore, and then try to get the httpCookieStore from the > WKWebsiteDataStore, it will return one of the network processes cookie > stores at random. This is clearly bad. > > So, assuming we can't fix the underlying issue and make the network > libraries cookie stores work in multiple processes at time, we likely have > to change how network process sharing works. Oh OK... sorry for failing to read. Yeah that is not good. I wonder if the libsoup backend might have similar problems.... > An alternative approach to solving this problem, which I think has various > nice properties, would be to make the network process per-WKWebsiteDataStore > (and probably having one global one for all ephemeral WKWebsiteDataStores). > In practice, this will not be hugely different than what I think is being > proposed here, as for public api, there is only the defaultDataStore global > and the nonPersistentDataStores (e.g. ephemeral). Sadly, we allow setting a WebKitWebsiteDataStore on both a WebkitWebContext or a WebKitWebView, where the web view setting overrides the web context setting. That is, the same website data store can be shared between two different views that use two different web contexts. And this is baked into public API, so it can't be changed. Now that's not a total blocker, but it makes any change quite tricky for the same reason that a singleton network process is tricky: settings that are currently global to the network process would need to somehow be associated with the correct web context (probably via NetworkSession) because one network process could be used by multiple web contexts. That said, removing global state is good and we should try to do it regardless. But I think we have to completely remove SessionID::DefaultSessionID from the codebase to make that work, and I'm not sure if that's practical to do.
(In reply to Alex Christensen from comment #31) > Michael, what are these "process-level settings" and why can't they be made > to be properties of the NetworkSession? Things like: webkit_web_context_set_tls_errors_policy(), webkit_web_context_allow_tls_certificate_for_host(), webkit_web_context_set_preferred_languages(), webkit_cookie_manager_set_accept_policy(), etc. We actually *can* make them all properties of the NetworkSession, at least I think so. Remember I tried to help with this a year ago because you requested it, then abandoned the effort when you suggested it was no longer a priority. (Some history in bug #195140 where I was trying to move the accept policy setting.) I don't have time to retake that work myself, but maybe someone from Igalia could push it through if you want to reopen that bug. In theory, once everything is removed from NetworkProcessCreationParameters, we should be almost good to go from the network process side of things... but then the next problem is that we need to fix up the UI process to stop using SessionID::DefaultSessionID all over the place. Soup port code has a lot of assumptions that the only NetworkSession is the default NetworkSession, and that all will have to be audited and fixed up. The soup backend has never supported multiple NetworkSessions per web context so it was just never important before.
We don't need to remove everything from NetworkProcessCreationParameters, just everything that is not also global to the UI process. Right now it's being used as a way to make global things in the network process that are just properties of one WebProcessPool that could be different in a different WebProcessPool. This is what must be shifted away from in order to fix these cookie bugs. GTK API can be shimmed to do basically what it used to, deprecated, and replaced. This is part of why we haven't exposed any public API on WKProcessPool.
(In reply to Alex Christensen from comment #31) > > Sam, could you give a concrete example of why having more processes would be > better "future thinking"? Sure. If we wanted to improve defense against Spectre style attacks on user data, using multiple processes is the best option going. Also, as I mentioned above, historically, there have been system APIs that are only exposed on a per-process level. So, if we added API to created data segregated WKWebsiteDataStores that depended on any of those, we would need multiple network processes (this was the case for the shimmed credentials for quite a while, I am not sure if it is still the case). > If we need an optimization to use one process for > multiple sessions, why would we not make that the default? I don't think I said don't make it the default. What I want to avoid is reducing the flexibility of the architecture by baking in the concept of a singleton network process, when conceptually, keeping it multi-instance and aligning it with WKWebsiteDataStores would work just as well. What concerns do you have the approach I have outlined? It seem like a nice compromise. I achieves the goals you have set out and maintains flexibility.
Created attachment 406544 [details] Patch
(In reply to Sam Weinig from comment #35) > (In reply to Alex Christensen from comment #31) > > > > Sam, could you give a concrete example of why having more processes would be > > better "future thinking"? > > Sure. If we wanted to improve defense against Spectre style attacks on user > data, using multiple processes is the best option going. Spectre style attacks are easily achievable in the web processes because they run untrusted JavaScript. That is not true of the networking process. I suppose if we do find some kind of speculative execution attack that can be performed without execution, we would need to consider process separation. Until then, I think it is better to have a smaller number of processes because of the substantial overhead of launching and sustaining a process. > Also, as I mentioned above, historically, there have been system APIs that > are only exposed on a per-process level. So, if we added API to created data > segregated WKWebsiteDataStores that depended on any of those, we would need > multiple network processes (this was the case for the shimmed credentials > for quite a while, I am not sure if it is still the case). Historically this has been true. I have worked long and hard to make sure it is no longer true. Bug 213048 is me finishing that work. > > If we need an optimization to use one process for > > multiple sessions, why would we not make that the default? > > I don't think I said don't make it the default. What I want to avoid is > reducing the flexibility of the architecture by baking in the concept of a > singleton network process, when conceptually, keeping it multi-instance and > aligning it with WKWebsiteDataStores would work just as well. If it's not the default, then nobody would choose to use an obscure setting to slow down their application for no benefit. > What concerns do you have the approach I have outlined? It seem like a nice > compromise. I achieves the goals you have set out and maintains flexibility. Process overhead.
(In reply to Alex Christensen from comment #37) > (In reply to Sam Weinig from comment #35) > > (In reply to Alex Christensen from comment #31) > > > > > > Sam, could you give a concrete example of why having more processes would be > > > better "future thinking"? > > > > Sure. If we wanted to improve defense against Spectre style attacks on user > > data, using multiple processes is the best option going. > Spectre style attacks are easily achievable in the web processes because > they run untrusted JavaScript. That is not true of the networking process. > I suppose if we do find some kind of speculative execution attack that can > be performed without execution, we would need to consider process > separation. Until then, I think it is better to have a smaller number of > processes because of the substantial overhead of launching and sustaining a > process. You are also conflating the architecture and the number of processes used. I am not suggesting we use more processes. I am suggesting we don't use a singleton pattern, and instead tie network processor ownership to WebsiteDataStore. As I indicated when responding to your concern about ephemeral pages, the number of actual processes used can be optimized how ever one wants. > > > Also, as I mentioned above, historically, there have been system APIs that > > are only exposed on a per-process level. So, if we added API to created data > > segregated WKWebsiteDataStores that depended on any of those, we would need > > multiple network processes (this was the case for the shimmed credentials > > for quite a while, I am not sure if it is still the case). > Historically this has been true. I have worked long and hard to make sure > it is no longer true. Bug 213048 is me finishing that work. There is no reason to believe this wouldn't be true in the future again. System frameworks often make assumptions about global state assuming all their clients will be applications and without thinking about the ramifications of other frameworks that need to use that functionality. > > > > If we need an optimization to use one process for > > > multiple sessions, why would we not make that the default? > > > > I don't think I said don't make it the default. What I want to avoid is > > reducing the flexibility of the architecture by baking in the concept of a > > singleton network process, when conceptually, keeping it multi-instance and > > aligning it with WKWebsiteDataStores would work just as well. > If it's not the default, then nobody would choose to use an obscure setting > to slow down their application for no benefit. I don't understand this. What obscure setting are you referring to? > > > What concerns do you have the approach I have outlined? It seem like a nice > > compromise. I achieves the goals you have set out and maintains flexibility. > > Process overhead. So far, what I have proposed does not create any additional processes overhead.
Created attachment 406548 [details] Patch
(In reply to Sam Weinig from comment #38) > > > Also, as I mentioned above, historically, there have been system APIs that > > > are only exposed on a per-process level. So, if we added API to created data > > > segregated WKWebsiteDataStores that depended on any of those, we would need > > > multiple network processes (this was the case for the shimmed credentials > > > for quite a while, I am not sure if it is still the case). > > Historically this has been true. I have worked long and hard to make sure > > it is no longer true. Bug 213048 is me finishing that work. > > There is no reason to believe this wouldn't be true in the future again. > System frameworks often make assumptions about global state assuming all > their clients will be applications and without thinking about the > ramifications of other frameworks that need to use that functionality. I disagree. Session based networking and not breaking session boundaries is pretty fundamental these days, and if anyone breaks the boundaries it will already be a substantial privacy breach. > > > > If we need an optimization to use one process for > > > > multiple sessions, why would we not make that the default? > > > > > > I don't think I said don't make it the default. What I want to avoid is > > > reducing the flexibility of the architecture by baking in the concept of a > > > singleton network process, when conceptually, keeping it multi-instance and > > > aligning it with WKWebsiteDataStores would work just as well. > > If it's not the default, then nobody would choose to use an obscure setting > > to slow down their application for no benefit. > > I don't understand this. What obscure setting are you referring to? I was referring to a switch somewhere in your proposed solution to switch between one network process and a network process per WebsiteDataStore. I think such a switch would be hard to find and not very useful. > So far, what I have proposed does not create any additional processes > overhead. If I have Safari open with 10 private tabs, your proposal creates 9 additional processes.
I think doing this does not prevent us from making a network process per WebsiteDataStore if we find a reason that paying the process price is worth it. I just don't think we have such a reason.
(In reply to Alex Christensen from comment #41) > I think doing this does not prevent us from making a network process per > WebsiteDataStore if we find a reason that paying the process price is worth > it. I just don't think we have such a reason. Maybe we need it for the other ports, at least for the soup based ports using the glib API. This is how the GLib API currently works: - WebKitWebsiteDataManager: this is the WebsiteDataStore. If you use _new() constructor you get a persistent data store with the default session ID, so two persistent data managers both use the same default session ID (this is not a problem with the current implementation). If you use new_ephemeral() a non-persistent data store is created. - WebKitWebContext: this is the WebProcessPool. It has a construct only property for the website data manager, which means it's required for constructing the object and it can't change for its lifetime. This is always the primary data store for the WebProcessPool and the only one in our case. - WebKitWebView: this is WebPageProxy. The web view also has a website data manager associated, but *only* to allow ephemeral web views on non-ephemeral web contexts. So, by default the web view uses the data manager of its web context, and there's no API to set a different data manager for a web view. When a web view is created with new_ephemeral() and the web context data manager is persistent a new ephemeral data manager is created and associated to the web view. Again this happens at construction and can't change. So, with this model, it's not a problem to have two persistent data managers using the same default session id, because it's not possible to have both in the same network process. If two contexts are created we will have two network processes each with a default session. If you create two contexts sharing the same data manager, both network process would be accessing the same cookies storage, which is what this bug is trying to solve, right? So, the problem is present in soup based ports too, but not that easy to happen, most of the applications create just one web context and if they create more than one is precisely to use a different data store configuration. If we follow Sam's suggestion, and we use a network process per data store, the bug would be solved for us (unless you create two different data stores with the same cookies configuration, but I think we can consider that a programmer error and just document it in the API). I like the idea of using one network process per persistent data store and one network process for all non-persistent data stores.
(In reply to Alex Christensen from comment #40) > (In reply to Sam Weinig from comment #38) > > > > Also, as I mentioned above, historically, there have been system APIs that > > > > are only exposed on a per-process level. So, if we added API to created data > > > > segregated WKWebsiteDataStores that depended on any of those, we would need > > > > multiple network processes (this was the case for the shimmed credentials > > > > for quite a while, I am not sure if it is still the case). > > > Historically this has been true. I have worked long and hard to make sure > > > it is no longer true. Bug 213048 is me finishing that work. > > > > There is no reason to believe this wouldn't be true in the future again. > > System frameworks often make assumptions about global state assuming all > > their clients will be applications and without thinking about the > > ramifications of other frameworks that need to use that functionality. > I disagree. Session based networking and not breaking session boundaries is > pretty fundamental these days, and if anyone breaks the boundaries it will > already be a substantial privacy breach. The network process is used for a lot more than networking today. For sure it should be named better, but that's where we stand. > > > > > > If we need an optimization to use one process for > > > > > multiple sessions, why would we not make that the default? > > > > > > > > I don't think I said don't make it the default. What I want to avoid is > > > > reducing the flexibility of the architecture by baking in the concept of a > > > > singleton network process, when conceptually, keeping it multi-instance and > > > > aligning it with WKWebsiteDataStores would work just as well. > > > If it's not the default, then nobody would choose to use an obscure setting > > > to slow down their application for no benefit. > > > > I don't understand this. What obscure setting are you referring to? > I was referring to a switch somewhere in your proposed solution to switch > between one network process and a network process per WebsiteDataStore. I > think such a switch would be hard to find and not very useful. > > > So far, what I have proposed does not create any additional processes > > overhead. > > If I have Safari open with 10 private tabs, your proposal creates 9 > additional processes. I can't tell if you are arguing in good faith here, but I have said numerous times that there is nothing in my proposal that would require this. We could still implement ephemeral data stores (e.g. safari private tabs) to use the network process from the default data store. I think I have come to my limit in debating this. You seem to have come to the conclusion on how you wish to fix this bug long ago, no more strongly demonstrated than by by the title of the bug being "Use a NetworkProcessProxy singleton instead of one per WebProcessPool" rather than the symptom you are trying to fix and you don't seem interested in feedback, compromise or consensus here, which is really disappointing.
I am indeed interested in feedback, which is why I not only have been discussing here but I went to webkit-dev to get more feedback. I haven't been taking Sam's suggestion to describe my changes more in the change log because I was prototyping something I didn't know could be done without breaking things. Now that I know it can be done, I'm planning to document more and explore tweaking and improving the solution. Comment 43 is the first concrete proposal for an alternative that does not introduce many unnecessary processes. Comment 30's "having a way for ephemeral data stores to use the default data stores network process might also be fine as an optimization" sounded to me like introducing SPI to do such, which I thought would be a bad solution. Feel free to be disappointed, but be disappointed in the miscommunication.
I renamed this bug based on feedback from comment 43 that it was biased towards my initial approach to solving the problem.
I plan to fix this in https://bugs.webkit.org/show_bug.cgi?id=216041 *** This bug has been marked as a duplicate of bug 216041 ***