168479 – [WK2] Support termination of unresponsive WebContent processes that are in the background

Chris Dumez

Reported 2017-02-16 15:51:29 PST

Support termination of unresponsive WebContent processes that are in the background. This protects us against cases where a background tab is unresponsive and has high CPU usage, thus draining the battery without the user knowing about it.

Chris Dumez

Comment 1 2017-02-16 15:52:05 PST

<rdar://problem/30558745>

Chris Dumez

Comment 2 2017-02-16 16:49:23 PST

Created attachment 301856 [details] Patch

WebKit Commit Bot

Comment 3 2017-02-16 16:50:32 PST

Attachment 301856 [details] did not pass style-queue: ERROR: Source/WebKit2/UIProcess/WebProcessProxy.cpp:106: Code inside a namespace should not be indented. [whitespace/indent] [4] Total errors found: 1 in 12 files If any of these errors are false positives, please file a bug against check-webkit-style.

Keith Rollin

Comment 4 2017-02-16 20:16:22 PST

Comment on attachment 301856 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=301856&action=review > Source/WebKit2/UIProcess/BackgroundWebProcessResponsivenessTimer.cpp:34 > +static const std::chrono::seconds maximumInterval { 86400 }; // 24 hours. How was 86400 decided? My inclination would be to say it's too high. If the wedged webpage is on a laptop or phone, it could drain the battery in an hour or two, if not sooner. If I have a web page loaded and it gets wedged 2 days later, we won't know it for another day (other than by the battery dying). We'd like to catch issues before then. Perhaps a max time of 1 hour? Question from a noob: what happens to the timer and the checking when a device is turned off or MobileSafari is switched out? Does the timer continue to run? What happens when the device is turned back on or MobileSafari is re-entered? Does the timer continue from where it was when MobileSafari was exited? Or does it resume after incorporating the full elapsed time? If the latter, do we want that? In other words, if we were due to check the process in an hour, but then turned off the phone for two hours, what behavior do we want when MobileSafari starts again? This patch was inspired by a web page that responded poorly while in the background, and while Safari was started with this page already in the background. Does this patch do what we want in that case? In particular, I'm thinking that we should perhaps reset the interval back to 20 seconds in order to catch pages that don't awaken in the background politely. (Crib notes on checking) Intervals: 20 40 80 160 320 640 1280 2560 3600 3600 ... Checks: 20 60 140 300 620 1260 2540 5100 8700 12300 ... Or: 20s 1m 2m20s 5min 10m20s 21m 42m20s 1h25m 2h25m 3h25m ... > Source/WebKit2/UIProcess/BackgroundWebProcessResponsivenessTimer.cpp:70 > + page->terminateProcess(WebPageProxy::TerminationReason::UnresponsiveWhileInBackground); Are there possible race conditions here? I don't know what calling terminateProcess could lead to. Could the other pages in the process get notification that the process has terminated and then delete themselves before you get a chance to mark them with UnresponsiveWhileInBackground? Could they be removed from m_webProcessProxy.m_pageMap while you're still iterating over it (leading to the iterator becoming invalid, or the pages returned from it being invalid)? Could the WebProcessProxy that m_webProcessProxy refers to be deleted out from under you (you're not doing anything to protect its lifetime)? Or is all that safely deferred until timerFired is done?

Chris Dumez

Comment 5 2017-02-16 20:48:29 PST

Comment on attachment 301856 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=301856&action=review >> Source/WebKit2/UIProcess/BackgroundWebProcessResponsivenessTimer.cpp:34 >> +static const std::chrono::seconds maximumInterval { 86400 }; // 24 hours. > > How was 86400 decided? My inclination would be to say it's too high. If the wedged webpage is on a laptop or phone, it could drain the battery in an hour or two, if not sooner. If I have a web page loaded and it gets wedged 2 days later, we won't know it for another day (other than by the battery dying). We'd like to catch issues before then. Perhaps a max time of 1 hour? > > Question from a noob: what happens to the timer and the checking when a device is turned off or MobileSafari is switched out? Does the timer continue to run? What happens when the device is turned back on or MobileSafari is re-entered? Does the timer continue from where it was when MobileSafari was exited? Or does it resume after incorporating the full elapsed time? If the latter, do we want that? In other words, if we were due to check the process in an hour, but then turned off the phone for two hours, what behavior do we want when MobileSafari starts again? > > This patch was inspired by a web page that responded poorly while in the background, and while Safari was started with this page already in the background. Does this patch do what we want in that case? In particular, I'm thinking that we should perhaps reset the interval back to 20 seconds in order to catch pages that don't awaken in the background politely. > > (Crib notes on checking) > Intervals: 20 40 80 160 320 640 1280 2560 3600 3600 ... > Checks: 20 60 140 300 620 1260 2540 5100 8700 12300 ... > Or: 20s 1m 2m20s 5min 10m20s 21m 42m20s 1h25m 2h25m 3h25m ... 1. Let's discuss the max interval with Gavin. I came up with the current value myself and was mostly used as a way not to overflow. 2. I am not 100% sure but I believe that if you close your laptop lid and a timer was scheduled to fire while the system is suspended, then it will fire as soon as the system resumes. 3. The patch works great in cases like <rdar://problem/29808005>. I do not understand why you want to reset to 20 seconds, or when you would want to reset to 20 seconds. To be clear, in the case of a session restore, if one of the background tabs that was restored starts mis-behaving right away, then we will notice it after 20 seconds (because the responsiveness timer started as soon as the tab started loading, given that the tab was in the background). >> Source/WebKit2/UIProcess/BackgroundWebProcessResponsivenessTimer.cpp:70 >> + page->terminateProcess(WebPageProxy::TerminationReason::UnresponsiveWhileInBackground); > > Are there possible race conditions here? I don't know what calling terminateProcess could lead to. Could the other pages in the process get notification that the process has terminated and then delete themselves before you get a chance to mark them with UnresponsiveWhileInBackground? Could they be removed from m_webProcessProxy.m_pageMap while you're still iterating over it (leading to the iterator becoming invalid, or the pages returned from it being invalid)? Could the WebProcessProxy that m_webProcessProxy refers to be deleted out from under you (you're not doing anything to protect its lifetime)? > > Or is all that safely deferred until timerFired is done? This could indeed be unsafe if the client were to mess with the pages in a delegate. I think I will have to copy the pages before looping.

Chris Dumez

Comment 6 2017-02-17 09:14:59 PST

Created attachment 301941 [details] Patch

WebKit Commit Bot

Comment 7 2017-02-17 09:16:32 PST

Attachment 301941 [details] did not pass style-queue: ERROR: Source/WebKit2/UIProcess/WebProcessProxy.cpp:106: Code inside a namespace should not be indented. [whitespace/indent] [4] Total errors found: 1 in 13 files If any of these errors are false positives, please file a bug against check-webkit-style.

Chris Dumez

Comment 8 2017-02-17 10:19:46 PST

Created attachment 301951 [details] Patch

WebKit Commit Bot

Comment 9 2017-02-17 10:22:29 PST

Attachment 301951 [details] did not pass style-queue: ERROR: Source/WebKit2/UIProcess/WebProcessProxy.cpp:106: Code inside a namespace should not be indented. [whitespace/indent] [4] Total errors found: 1 in 13 files If any of these errors are false positives, please file a bug against check-webkit-style.

Gavin Barraclough

Comment 10 2017-02-17 10:53:28 PST

Comment on attachment 301941 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=301941&action=review > Source/WebKit2/UIProcess/BackgroundWebProcessResponsivenessTimer.cpp:34 > +static const std::chrono::seconds maximumInterval { 14400 }; // 4 hours. Per conversation I like 8hrs. :-) > Source/WebKit2/UIProcess/WebProcessProxy.cpp:274 > + updateBackgroundResponsivenessTimer(); Is this call not redundant? When a page is created/destroyed, if it changes the visible page count that automatically triggers updateBackgroundResponsivenessTimer. If it doesn't nothing to do. I guess you do need a one time updateBackgroundResponsivenessTimer when the WebProcessProxy is first created (or the first page added) to get the timer going in the first place – maybe that could be from the constructor? Do we need any special handling around process disconnect / reconnect? (If the processes crashes should we be suspending the timer.) > Source/WebKit2/UIProcess/WebProcessProxy.cpp:282 > + updateBackgroundResponsivenessTimer(); ditto

Chris Dumez

Comment 11 2017-02-17 11:08:44 PST

Comment on attachment 301941 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=301941&action=review >> Source/WebKit2/UIProcess/BackgroundWebProcessResponsivenessTimer.cpp:34 >> +static const std::chrono::seconds maximumInterval { 14400 }; // 4 hours. > > Per conversation I like 8hrs. :-) Yes, this is taken care of in the latest iteration. >> Source/WebKit2/UIProcess/WebProcessProxy.cpp:274 >> + updateBackgroundResponsivenessTimer(); > > Is this call not redundant? > > When a page is created/destroyed, if it changes the visible page count that automatically triggers updateBackgroundResponsivenessTimer. If it doesn't nothing to do. > > I guess you do need a one time updateBackgroundResponsivenessTimer when the WebProcessProxy is first created (or the first page added) to get the timer going in the first place – maybe that could be from the constructor? > > Do we need any special handling around process disconnect / reconnect? (If the processes crashes should we be suspending the timer.) This function is called when a page's process has crashed and we reconnect it to a fresh WebProcess. Our policy to start the responsiveness timer is that: 1. There should be pages 2. There should be no-visible pages In the case of a page crash and then reconnect, this method will be called. It is possible that when this is called, the page is not visible and we want to start the responsiveness timer. So I do not think this is redundant. >> Source/WebKit2/UIProcess/WebProcessProxy.cpp:282 >> + updateBackgroundResponsivenessTimer(); > > ditto If you remove the last page that was visible, and they are still pages, then we want to start the responsiveness timer. If we remove the last page (i.e. the process has no more pages), we want to stop the responsiveness timer. So I think his call is needed.

Antti Koivisto

Comment 12 2017-02-19 12:29:50 PST

Comment on attachment 301951 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=301951&action=review > Source/WebKit2/UIProcess/BackgroundWebProcessResponsivenessTimer.cpp:34 > +static const std::chrono::seconds initialInterval { 20 }; > +static const std::chrono::seconds maximumInterval { 28800 }; // 8 hours. These are better written as static const std::chrono::seconds initialInterval { 20s }; static const std::chrono::seconds maximumInterval { 8h }; No need for comment. > Source/WebKit2/UIProcess/BackgroundWebProcessResponsivenessTimer.h:33 > +class BackgroundWebProcessResponsivenessTimer { It has a timer but is it a timer? Maybe UnresponsiveWebProcessTerminator or similar would describe it better? > Source/WebKit2/UIProcess/WebPageProxy.cpp:1592 > + m_process->responsivenessTimer().stop(); ...especially since we have a ResponsivenessTimer already that is something else.

Chris Dumez

Comment 13 2017-02-19 16:23:06 PST

Created attachment 302099 [details] Patch

WebKit Commit Bot

Comment 14 2017-02-19 16:24:41 PST

Attachment 302099 [details] did not pass style-queue: ERROR: Source/WebKit2/UIProcess/WebProcessProxy.cpp:106: Code inside a namespace should not be indented. [whitespace/indent] [4] Total errors found: 1 in 13 files If any of these errors are false positives, please file a bug against check-webkit-style.

WebKit Commit Bot

Comment 15 2017-02-19 17:03:09 PST

Comment on attachment 302099 [details] Patch Rejecting attachment 302099 [details] from commit-queue. Failed to run "['/Volumes/Data/EWS/WebKit/Tools/Scripts/webkit-patch', '--status-host=webkit-queues.webkit.org', '--bot-id=webkit-cq-01', 'validate-changelog', '--check-oops', '--non-interactive', 302099, '--port=mac']" exit_code: 1 cwd: /Volumes/Data/EWS/WebKit ChangeLog entry in Source/WebKit2/ChangeLog contains OOPS!. Full output: http://webkit-queues.webkit.org/results/3156555

Chris Dumez

Comment 16 2017-02-19 18:02:52 PST

Created attachment 302105 [details] Patch

WebKit Commit Bot

Comment 17 2017-02-19 18:05:38 PST

Attachment 302105 [details] did not pass style-queue: ERROR: Source/WebKit2/UIProcess/WebProcessProxy.cpp:106: Code inside a namespace should not be indented. [whitespace/indent] [4] Total errors found: 1 in 13 files If any of these errors are false positives, please file a bug against check-webkit-style.

WebKit Commit Bot

Comment 18 2017-02-19 18:40:27 PST

Comment on attachment 302105 [details] Patch Clearing flags on attachment: 302105 Committed r212619: <http://trac.webkit.org/changeset/212619>

WebKit Commit Bot

Comment 19 2017-02-19 18:40:32 PST

All reviewed patches have been landed. Closing bug.

mitz

Comment 20 2017-02-19 18:44:33 PST

What’s the rationale behind reporting these via the API differently from how termination of a Web process by the OS due to memory pressure is reported to the client?

Chris Dumez

Comment 21 2017-02-19 18:49:32 PST

(In reply to comment #20) > What’s the rationale behind reporting these via the API differently from how > termination of a Web process by the OS due to memory pressure is reported to > the client? Can you clarify? We do call the processDidCrash client delegate in most cases. The only case we opted not to call this delegate is when the tab has never been visible. We thought it was not worth displaying a crash banner in this case since it is not useful to the user. To the user, it will just look as if the tab started loading when he switched to it, which is not surprising behavior in case of session restore.

mitz

Comment 22 2017-02-19 18:59:32 PST

(In reply to comment #21) > (In reply to comment #20) > > What’s the rationale behind reporting these via the API differently from how > > termination of a Web process by the OS due to memory pressure is reported to > > the client? > > Can you clarify? We do call the processDidCrash client delegate in most > cases. The only case we opted not to call this delegate is when the tab has > never been visible. And that differs from the case where the OS would have killed such a Web content process, I believe, which seems like an arbitrary inconsistency. > We thought it was not worth displaying a crash banner in > this case since it is not useful to the user. Displaying crash UI is the client app’s domain. I think it’s best if the framework faithfully, or at least consistently, represents to the client what is happening. It seems like this change also introduces another API curiosity: a navigation (the reload) triggered not by the user nor by the webpage nor by the API. It seems like this could be confusing to the client’s navigation delegate, depending on how it’s reported (what is the navigation action being reported by the API in this case?). > To the user, it will just look > as if the tab started loading when he switched to it, which is not > surprising behavior in case of session restore. I do think that what things look like “to the user” is really the domain of the app, not the framework. Admittedly, this new behavior is opt-in, so apps sort of know what they’re signing up for when they opt in, but it seems like a weird contract. Consider an app like an email client, that after initially loading an email message in an off-screen web view performs some operations that read or modify the content. It may not know to repeat those operations after an automatic reload happened. Or, if it naively assumes that email messages don’t reload, then it may erroneously repeat something that shouldn’t be repeated (say: add a calendar event from an email message).

mitz

Comment 23 2017-02-19 19:03:00 PST

(In reply to comment #22) > I do think that what things look like “to the user” is really the domain of > the app, not the framework. Admittedly, this new behavior is opt-in, so apps > sort of know what they’re signing up for when they opt in, but it seems like > a weird contract. Consider an app like an email client, that after initially > loading an email message in an off-screen web view performs some operations > that read or modify the content. It may not know to repeat those operations > after an automatic reload happened. Or, if it naively assumes that email > messages don’t reload, then it may erroneously repeat something that > shouldn’t be repeated (say: add a calendar event from an email message). Further, consider that the email client executes some JavaScript code when an email message loads (in the offscreen web view) and that it’s this code that causes the Web process to become unresponsive. Now, the email app may have a failsafe mechanism that after some number of crashes of the Web process, it stops trying to reload the message into it. But if it is not aware of these crashes nor of the unresponsiveness that induced them, then it may just repeatedly enter the high-CPU-usage condition.

mitz

Comment 24 2017-02-19 19:24:33 PST

Given that there is already API for reporting to the client app that a Web process is unresponsive, and given that this new API is opt-in, it seems plainly redundant. An app that wants this behavior can just implement it using the existing API. It seems like the issue this change was meant to address is that we only classify a Web process as “unresponsive” if someone is actually expecting a response from it, but off-screen web views may not be expected to respond, and therefor may be “unresponsive” for extended periods of time. It seems like this may also be the case for an on-screen web view, if it’s done loading and no user interaction is taking place. So really, shouldn’t we just address this by expanding the notion of “unresponsive” to include “would have been unresponsive if anyone actually bothered to ask it to do anything”?

Chris Dumez

Comment 25 2017-02-19 19:29:09 PST

(In reply to comment #22) > (In reply to comment #21) > > (In reply to comment #20) > > > What’s the rationale behind reporting these via the API differently from how > > > termination of a Web process by the OS due to memory pressure is reported to > > > the client? > > > > Can you clarify? We do call the processDidCrash client delegate in most > > cases. The only case we opted not to call this delegate is when the tab has > > never been visible. > > And that differs from the case where the OS would have killed such a Web > content process, I believe, which seems like an arbitrary inconsistency. To be clear, the part that is inconsistent and when we decide to reload instead of calling processDidCrash, right?

mitz

Comment 26 2017-02-19 19:30:22 PST

(In reply to comment #25) > (In reply to comment #22) > > (In reply to comment #21) > > > (In reply to comment #20) > > > > What’s the rationale behind reporting these via the API differently from how > > > > termination of a Web process by the OS due to memory pressure is reported to > > > > the client? > > > > > > Can you clarify? We do call the processDidCrash client delegate in most > > > cases. The only case we opted not to call this delegate is when the tab has > > > never been visible. > > > > And that differs from the case where the OS would have killed such a Web > > content process, I believe, which seems like an arbitrary inconsistency. > > To be clear, the part that is inconsistent and when we decide to reload > instead of calling processDidCrash, right? Right.

Chris Dumez

Comment 27 2017-02-19 19:30:54 PST

(In reply to comment #22) > (In reply to comment #21) > > (In reply to comment #20) > > > What’s the rationale behind reporting these via the API differently from how > > > termination of a Web process by the OS due to memory pressure is reported to > > > the client? > > > > Can you clarify? We do call the processDidCrash client delegate in most > > cases. The only case we opted not to call this delegate is when the tab has > > never been visible. > > And that differs from the case where the OS would have killed such a Web > content process, I believe, which seems like an arbitrary inconsistency. > > > We thought it was not worth displaying a crash banner in > > this case since it is not useful to the user. > > Displaying crash UI is the client app’s domain. I think it’s best if the > framework faithfully, or at least consistently, represents to the client > what is happening. I think this is a good point. I'll discuss this with Gavin next week and he suggested this.

Chris Dumez

Comment 28 2017-02-19 19:36:43 PST

Note that part of the reason we wanted to do this in WebKit instead of client side is: 1. The next step is to kill processes that use too much CPU in the background too, not just unresponsive ones. 2. I believe doing this on client side is a bit more awkward because I don't think there is an API abstraction for a WebContent process, only WKPages.

mitz

Comment 29 2017-02-19 19:42:45 PST

(In reply to comment #28) > Note that part of the reason we wanted to do this in WebKit instead of > client side is: > 1. The next step is to kill processes that use too much CPU in the > background too, not just unresponsive ones. I can see this as a natural extension of the operating system’s existing behavior of killing processes when under memory pressure, so it seems OK if the framework does it and reports it to the client the same way. > 2. I believe doing this on client side is a bit more awkward because I don't > think there is an API abstraction for a WebContent process, only WKPages. Advanced clients already take process-sharing considerations into account when they respond to unresponsiveness (Safari on macOS is one such client that I know of, and I think the iOS version does that too). Perhaps the API that allows them to do this can be made better. Less advanced clients are probably going to only ever have one web view per process.

Michael Catanzaro

Comment 30 2017-02-20 07:36:45 PST

(In reply to comment #24) > Given that there is already API for reporting to the client app that a Web > process is unresponsive Really? What's that API? Mildly-related problem: We have an issue in Epiphany where attempting to Stop or close an unresponsive web process takes a very long time (I think 30 seconds). It would be useful for us to have a way to detect this and kill the process immediately. I do think this should really be handled by WebKit, not the application, though: if the web process doesn't respond to Stop or close within a second or two, something's gone wrong and it surely needs to be killed.

mitz

Comment 31 2017-02-20 07:48:02 PST

(In reply to comment #30) > (In reply to comment #24) > > Given that there is already API for reporting to the client app that a Web > > process is unresponsive > > Really? What's that API? _webViewWebProcessDidBecomeUnresponsive: in WKNavigationDelegatePrivate.

Michael Catanzaro

Comment 32 2017-02-20 09:14:59 PST

(In reply to comment #31) > _webViewWebProcessDidBecomeUnresponsive: in WKNavigationDelegatePrivate. OK thanks, bug #168604.

Chris Dumez

Comment 33 2017-02-21 12:42:28 PST

(In reply to comment #27) > (In reply to comment #22) > > (In reply to comment #21) > > > (In reply to comment #20) > > > > What’s the rationale behind reporting these via the API differently from how > > > > termination of a Web process by the OS due to memory pressure is reported to > > > > the client? > > > > > > Can you clarify? We do call the processDidCrash client delegate in most > > > cases. The only case we opted not to call this delegate is when the tab has > > > never been visible. > > > > And that differs from the case where the OS would have killed such a Web > > content process, I believe, which seems like an arbitrary inconsistency. > > > > > We thought it was not worth displaying a crash banner in > > > this case since it is not useful to the user. > > > > Displaying crash UI is the client app’s domain. I think it’s best if the > > framework faithfully, or at least consistently, represents to the client > > what is happening. > > I think this is a good point. I'll discuss this with Gavin next week and he > suggested this. Gavin is OK always reporting this to the client as a crash.

Attachments
Patch (31.33 KB, patch) 2017-02-16 16:49 PST, Chris Dumez	no flags	Details Formatted Diff Diff
Patch (30.82 KB, patch) 2017-02-17 09:14 PST, Chris Dumez	no flags	Details Formatted Diff Diff
Patch (30.82 KB, patch) 2017-02-17 10:19 PST, Chris Dumez	no flags	Details Formatted Diff Diff
Patch (30.78 KB, patch) 2017-02-19 16:23 PST, Chris Dumez	no flags	Details Formatted Diff Diff
Patch (30.78 KB, patch) 2017-02-19 18:02 PST, Chris Dumez	no flags	Details Formatted Diff Diff
Show Obsolete (4) View All Add attachment proposed patch, testcase, etc.