Bug 168479

Summary: [WK2] Support termination of unresponsive WebContent processes that are in the background
Product: WebKit Reporter: Chris Dumez <cdumez>
Component: WebKit2Assignee: Chris Dumez <cdumez>
Status: RESOLVED FIXED    
Severity: Normal CC: andersca, barraclough, cgarcia, commit-queue, ggaren, kling, koivisto, krollin, mcatanzaro, mitz, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugs.webkit.org/show_bug.cgi?id=168604
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Patch
none
Patch none

Description Chris Dumez 2017-02-16 15:51:29 PST
Support termination of unresponsive WebContent processes that are in the background. This protects us against cases where a background tab is unresponsive and has high CPU usage, thus draining the battery without the user knowing about it.
Comment 1 Chris Dumez 2017-02-16 15:52:05 PST
<rdar://problem/30558745>
Comment 2 Chris Dumez 2017-02-16 16:49:23 PST
Created attachment 301856 [details]
Patch
Comment 3 WebKit Commit Bot 2017-02-16 16:50:32 PST
Attachment 301856 [details] did not pass style-queue:


ERROR: Source/WebKit2/UIProcess/WebProcessProxy.cpp:106:  Code inside a namespace should not be indented.  [whitespace/indent] [4]
Total errors found: 1 in 12 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 4 Keith Rollin 2017-02-16 20:16:22 PST
Comment on attachment 301856 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=301856&action=review

> Source/WebKit2/UIProcess/BackgroundWebProcessResponsivenessTimer.cpp:34
> +static const std::chrono::seconds maximumInterval { 86400 }; // 24 hours.

How was 86400 decided? My inclination would be to say it's too high. If the wedged webpage is on a laptop or phone, it could drain the battery in an hour or two, if not sooner. If I have a web page loaded and it gets wedged 2 days later, we won't know it for another day (other than by the battery dying). We'd like to catch issues before then. Perhaps a max time of 1 hour?

Question from a noob: what happens to the timer and the checking when a device is turned off or MobileSafari is switched out? Does the timer continue to run? What happens when the device is turned back on or MobileSafari is re-entered? Does the timer continue from where it was when MobileSafari was exited? Or does it resume after incorporating the full elapsed time? If the latter, do we want that? In other words, if we were due to check the process in an hour, but then turned off the phone for two hours, what behavior do we want when MobileSafari starts again?

This patch was inspired by a web page that responded poorly while in the background, and while Safari was started with this page already in the background. Does this patch do what we want in that case? In particular, I'm thinking that we should perhaps reset the interval back to 20 seconds in order to catch pages that don't awaken in the background politely.

(Crib notes on checking)
Intervals: 20 40 80 160 320 640 1280 2560 3600 3600 ...
Checks: 20 60 140 300 620 1260 2540 5100 8700 12300 ...
Or: 20s 1m 2m20s 5min 10m20s 21m 42m20s 1h25m 2h25m 3h25m ...

> Source/WebKit2/UIProcess/BackgroundWebProcessResponsivenessTimer.cpp:70
> +            page->terminateProcess(WebPageProxy::TerminationReason::UnresponsiveWhileInBackground);

Are there possible race conditions here? I don't know what calling terminateProcess could lead to. Could the other pages in the process get notification that the process has terminated and then delete themselves before you get a chance to mark them with UnresponsiveWhileInBackground? Could they be removed from m_webProcessProxy.m_pageMap while you're still iterating over it (leading to the iterator becoming invalid, or the pages returned from it being invalid)? Could the WebProcessProxy that m_webProcessProxy refers to be deleted out from under you (you're not doing anything to protect its lifetime)?

Or is all that safely deferred until timerFired is done?
Comment 5 Chris Dumez 2017-02-16 20:48:29 PST
Comment on attachment 301856 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=301856&action=review

>> Source/WebKit2/UIProcess/BackgroundWebProcessResponsivenessTimer.cpp:34
>> +static const std::chrono::seconds maximumInterval { 86400 }; // 24 hours.
> 
> How was 86400 decided? My inclination would be to say it's too high. If the wedged webpage is on a laptop or phone, it could drain the battery in an hour or two, if not sooner. If I have a web page loaded and it gets wedged 2 days later, we won't know it for another day (other than by the battery dying). We'd like to catch issues before then. Perhaps a max time of 1 hour?
> 
> Question from a noob: what happens to the timer and the checking when a device is turned off or MobileSafari is switched out? Does the timer continue to run? What happens when the device is turned back on or MobileSafari is re-entered? Does the timer continue from where it was when MobileSafari was exited? Or does it resume after incorporating the full elapsed time? If the latter, do we want that? In other words, if we were due to check the process in an hour, but then turned off the phone for two hours, what behavior do we want when MobileSafari starts again?
> 
> This patch was inspired by a web page that responded poorly while in the background, and while Safari was started with this page already in the background. Does this patch do what we want in that case? In particular, I'm thinking that we should perhaps reset the interval back to 20 seconds in order to catch pages that don't awaken in the background politely.
> 
> (Crib notes on checking)
> Intervals: 20 40 80 160 320 640 1280 2560 3600 3600 ...
> Checks: 20 60 140 300 620 1260 2540 5100 8700 12300 ...
> Or: 20s 1m 2m20s 5min 10m20s 21m 42m20s 1h25m 2h25m 3h25m ...

1. Let's discuss the max interval with Gavin. I came up with the current value myself and was mostly used as a way not to overflow.

2. I am not 100% sure but I believe that if you close your laptop lid and a timer was scheduled to fire while the system is suspended, then it will fire as soon as the system resumes.

3. The patch works great in cases like <rdar://problem/29808005>. I do not understand why you want to reset to 20 seconds, or when you would want to reset to 20 seconds. To be clear, in the case of a session restore, if one of the background tabs that was restored starts mis-behaving right away, then we will notice it after 20 seconds (because the responsiveness timer started as soon as the tab started loading, given that the tab was in the background).

>> Source/WebKit2/UIProcess/BackgroundWebProcessResponsivenessTimer.cpp:70
>> +            page->terminateProcess(WebPageProxy::TerminationReason::UnresponsiveWhileInBackground);
> 
> Are there possible race conditions here? I don't know what calling terminateProcess could lead to. Could the other pages in the process get notification that the process has terminated and then delete themselves before you get a chance to mark them with UnresponsiveWhileInBackground? Could they be removed from m_webProcessProxy.m_pageMap while you're still iterating over it (leading to the iterator becoming invalid, or the pages returned from it being invalid)? Could the WebProcessProxy that m_webProcessProxy refers to be deleted out from under you (you're not doing anything to protect its lifetime)?
> 
> Or is all that safely deferred until timerFired is done?

This could indeed be unsafe if the client were to mess with the pages in a delegate. I think I will have to copy the pages before looping.
Comment 6 Chris Dumez 2017-02-17 09:14:59 PST
Created attachment 301941 [details]
Patch
Comment 7 WebKit Commit Bot 2017-02-17 09:16:32 PST
Attachment 301941 [details] did not pass style-queue:


ERROR: Source/WebKit2/UIProcess/WebProcessProxy.cpp:106:  Code inside a namespace should not be indented.  [whitespace/indent] [4]
Total errors found: 1 in 13 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 8 Chris Dumez 2017-02-17 10:19:46 PST
Created attachment 301951 [details]
Patch
Comment 9 WebKit Commit Bot 2017-02-17 10:22:29 PST
Attachment 301951 [details] did not pass style-queue:


ERROR: Source/WebKit2/UIProcess/WebProcessProxy.cpp:106:  Code inside a namespace should not be indented.  [whitespace/indent] [4]
Total errors found: 1 in 13 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 10 Gavin Barraclough 2017-02-17 10:53:28 PST
Comment on attachment 301941 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=301941&action=review

> Source/WebKit2/UIProcess/BackgroundWebProcessResponsivenessTimer.cpp:34
> +static const std::chrono::seconds maximumInterval { 14400 }; // 4 hours.

Per conversation I like 8hrs. :-)

> Source/WebKit2/UIProcess/WebProcessProxy.cpp:274
> +    updateBackgroundResponsivenessTimer();

Is this call not redundant?

When a page is created/destroyed, if it changes the visible page count that automatically triggers updateBackgroundResponsivenessTimer. If it doesn't nothing to do.

I guess you do need a one time updateBackgroundResponsivenessTimer when the WebProcessProxy is first created (or the first page added) to get the timer going in the first place – maybe that could be from the constructor?

Do we need any special handling around process disconnect / reconnect? (If the processes crashes should we be suspending the timer.)

> Source/WebKit2/UIProcess/WebProcessProxy.cpp:282
> +    updateBackgroundResponsivenessTimer();

ditto
Comment 11 Chris Dumez 2017-02-17 11:08:44 PST
Comment on attachment 301941 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=301941&action=review

>> Source/WebKit2/UIProcess/BackgroundWebProcessResponsivenessTimer.cpp:34
>> +static const std::chrono::seconds maximumInterval { 14400 }; // 4 hours.
> 
> Per conversation I like 8hrs. :-)

Yes, this is taken care of in the latest iteration.

>> Source/WebKit2/UIProcess/WebProcessProxy.cpp:274
>> +    updateBackgroundResponsivenessTimer();
> 
> Is this call not redundant?
> 
> When a page is created/destroyed, if it changes the visible page count that automatically triggers updateBackgroundResponsivenessTimer. If it doesn't nothing to do.
> 
> I guess you do need a one time updateBackgroundResponsivenessTimer when the WebProcessProxy is first created (or the first page added) to get the timer going in the first place – maybe that could be from the constructor?
> 
> Do we need any special handling around process disconnect / reconnect? (If the processes crashes should we be suspending the timer.)

This function is called when a page's process has crashed and we reconnect it to a fresh WebProcess.

Our policy to start the responsiveness timer is that:
1. There should be pages
2. There should be no-visible pages

In the case of a page crash and then reconnect, this method will be called. It is possible that when this is called, the page is not visible and we want to start the responsiveness timer. So I do not think this is redundant.

>> Source/WebKit2/UIProcess/WebProcessProxy.cpp:282
>> +    updateBackgroundResponsivenessTimer();
> 
> ditto

If you remove the last page that was visible, and they are still pages, then we want to start the responsiveness timer.
If we remove the last page (i.e. the process has no more pages), we want to stop the responsiveness timer.

So I think his call is needed.
Comment 12 Antti Koivisto 2017-02-19 12:29:50 PST
Comment on attachment 301951 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=301951&action=review

> Source/WebKit2/UIProcess/BackgroundWebProcessResponsivenessTimer.cpp:34
> +static const std::chrono::seconds initialInterval { 20 };
> +static const std::chrono::seconds maximumInterval { 28800 }; // 8 hours.

These are better written as

static const std::chrono::seconds initialInterval { 20s };
static const std::chrono::seconds maximumInterval { 8h };

No need for comment.

> Source/WebKit2/UIProcess/BackgroundWebProcessResponsivenessTimer.h:33
> +class BackgroundWebProcessResponsivenessTimer {

It has a timer but is it a timer? Maybe UnresponsiveWebProcessTerminator or similar would describe it better?

> Source/WebKit2/UIProcess/WebPageProxy.cpp:1592
> +            m_process->responsivenessTimer().stop();

...especially since we have a ResponsivenessTimer already that is something else.
Comment 13 Chris Dumez 2017-02-19 16:23:06 PST
Created attachment 302099 [details]
Patch
Comment 14 WebKit Commit Bot 2017-02-19 16:24:41 PST
Attachment 302099 [details] did not pass style-queue:


ERROR: Source/WebKit2/UIProcess/WebProcessProxy.cpp:106:  Code inside a namespace should not be indented.  [whitespace/indent] [4]
Total errors found: 1 in 13 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 15 WebKit Commit Bot 2017-02-19 17:03:09 PST
Comment on attachment 302099 [details]
Patch

Rejecting attachment 302099 [details] from commit-queue.

Failed to run "['/Volumes/Data/EWS/WebKit/Tools/Scripts/webkit-patch', '--status-host=webkit-queues.webkit.org', '--bot-id=webkit-cq-01', 'validate-changelog', '--check-oops', '--non-interactive', 302099, '--port=mac']" exit_code: 1 cwd: /Volumes/Data/EWS/WebKit

ChangeLog entry in Source/WebKit2/ChangeLog contains OOPS!.

Full output: http://webkit-queues.webkit.org/results/3156555
Comment 16 Chris Dumez 2017-02-19 18:02:52 PST
Created attachment 302105 [details]
Patch
Comment 17 WebKit Commit Bot 2017-02-19 18:05:38 PST
Attachment 302105 [details] did not pass style-queue:


ERROR: Source/WebKit2/UIProcess/WebProcessProxy.cpp:106:  Code inside a namespace should not be indented.  [whitespace/indent] [4]
Total errors found: 1 in 13 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 18 WebKit Commit Bot 2017-02-19 18:40:27 PST
Comment on attachment 302105 [details]
Patch

Clearing flags on attachment: 302105

Committed r212619: <http://trac.webkit.org/changeset/212619>
Comment 19 WebKit Commit Bot 2017-02-19 18:40:32 PST
All reviewed patches have been landed.  Closing bug.
Comment 20 mitz 2017-02-19 18:44:33 PST
What’s the rationale behind reporting these via the API differently from how termination of a Web process by the OS due to memory pressure is reported to the client?
Comment 21 Chris Dumez 2017-02-19 18:49:32 PST
(In reply to comment #20)
> What’s the rationale behind reporting these via the API differently from how
> termination of a Web process by the OS due to memory pressure is reported to
> the client?

Can you clarify? We do call the processDidCrash client delegate in most cases. The only case we opted not to call this delegate is when the tab has never been visible. We thought it was not worth displaying a crash banner in this case since it is not useful to the user. To the user, it will just look as if the tab started loading when he switched to it, which is not surprising behavior in case of session restore.
Comment 22 mitz 2017-02-19 18:59:32 PST
(In reply to comment #21)
> (In reply to comment #20)
> > What’s the rationale behind reporting these via the API differently from how
> > termination of a Web process by the OS due to memory pressure is reported to
> > the client?
> 
> Can you clarify? We do call the processDidCrash client delegate in most
> cases. The only case we opted not to call this delegate is when the tab has
> never been visible.

And that differs from the case where the OS would have killed such a Web content process, I believe, which seems like an arbitrary inconsistency.

> We thought it was not worth displaying a crash banner in
> this case since it is not useful to the user.

Displaying crash UI is the client app’s domain. I think it’s best if the framework faithfully, or at least consistently, represents to the client what is happening.

It seems like this change also introduces another API curiosity: a navigation (the reload) triggered not by the user nor by the webpage nor by the API. It seems like this could be confusing to the client’s navigation delegate, depending on how it’s reported (what is the navigation action being reported by the API in this case?).

> To the user, it will just look
> as if the tab started loading when he switched to it, which is not
> surprising behavior in case of session restore.

I do think that what things look like “to the user” is really the domain of the app, not the framework. Admittedly, this new behavior is opt-in, so apps sort of know what they’re signing up for when they opt in, but it seems like a weird contract. Consider an app like an email client, that after initially loading an email message in an off-screen web view performs some operations that read or modify the content. It may not know to repeat those operations after an automatic reload happened. Or, if it naively assumes that email messages don’t reload, then it may erroneously repeat something that shouldn’t be repeated (say: add a calendar event from an email message).
Comment 23 mitz 2017-02-19 19:03:00 PST
(In reply to comment #22)
> I do think that what things look like “to the user” is really the domain of
> the app, not the framework. Admittedly, this new behavior is opt-in, so apps
> sort of know what they’re signing up for when they opt in, but it seems like
> a weird contract. Consider an app like an email client, that after initially
> loading an email message in an off-screen web view performs some operations
> that read or modify the content. It may not know to repeat those operations
> after an automatic reload happened. Or, if it naively assumes that email
> messages don’t reload, then it may erroneously repeat something that
> shouldn’t be repeated (say: add a calendar event from an email message).

Further, consider that the email client executes some JavaScript code when an email message loads (in the offscreen web view) and that it’s this code that causes the Web process to become unresponsive. Now, the email app may have a failsafe mechanism that after some number of crashes of the Web process, it stops trying to reload the message into it. But if it is not aware of these crashes nor of the unresponsiveness that induced them, then it may just repeatedly enter the high-CPU-usage condition.
Comment 24 mitz 2017-02-19 19:24:33 PST
Given that there is already API for reporting to the client app that a Web process is unresponsive, and given that this new API is opt-in, it seems plainly redundant. An app that wants this behavior can just implement it using the existing API.

It seems like the issue this change was meant to address is that we only classify a Web process as “unresponsive” if someone is actually expecting a response from it, but off-screen web views may not be expected to respond, and therefor may be “unresponsive” for extended periods of time. It seems like this may also be the case for an on-screen web view, if it’s done loading and no user interaction is taking place. So really, shouldn’t we just address this by expanding the notion of “unresponsive” to include “would have been unresponsive if anyone actually bothered to ask it to do anything”?
Comment 25 Chris Dumez 2017-02-19 19:29:09 PST
(In reply to comment #22)
> (In reply to comment #21)
> > (In reply to comment #20)
> > > What’s the rationale behind reporting these via the API differently from how
> > > termination of a Web process by the OS due to memory pressure is reported to
> > > the client?
> > 
> > Can you clarify? We do call the processDidCrash client delegate in most
> > cases. The only case we opted not to call this delegate is when the tab has
> > never been visible.
> 
> And that differs from the case where the OS would have killed such a Web
> content process, I believe, which seems like an arbitrary inconsistency.

To be clear, the part that is inconsistent and when we decide to reload instead of calling processDidCrash, right?
Comment 26 mitz 2017-02-19 19:30:22 PST
(In reply to comment #25)
> (In reply to comment #22)
> > (In reply to comment #21)
> > > (In reply to comment #20)
> > > > What’s the rationale behind reporting these via the API differently from how
> > > > termination of a Web process by the OS due to memory pressure is reported to
> > > > the client?
> > > 
> > > Can you clarify? We do call the processDidCrash client delegate in most
> > > cases. The only case we opted not to call this delegate is when the tab has
> > > never been visible.
> > 
> > And that differs from the case where the OS would have killed such a Web
> > content process, I believe, which seems like an arbitrary inconsistency.
> 
> To be clear, the part that is inconsistent and when we decide to reload
> instead of calling processDidCrash, right?

Right.
Comment 27 Chris Dumez 2017-02-19 19:30:54 PST
(In reply to comment #22)
> (In reply to comment #21)
> > (In reply to comment #20)
> > > What’s the rationale behind reporting these via the API differently from how
> > > termination of a Web process by the OS due to memory pressure is reported to
> > > the client?
> > 
> > Can you clarify? We do call the processDidCrash client delegate in most
> > cases. The only case we opted not to call this delegate is when the tab has
> > never been visible.
> 
> And that differs from the case where the OS would have killed such a Web
> content process, I believe, which seems like an arbitrary inconsistency.
> 
> > We thought it was not worth displaying a crash banner in
> > this case since it is not useful to the user.
> 
> Displaying crash UI is the client app’s domain. I think it’s best if the
> framework faithfully, or at least consistently, represents to the client
> what is happening.

I think this is a good point. I'll discuss this with Gavin next week and he suggested this.
Comment 28 Chris Dumez 2017-02-19 19:36:43 PST
Note that part of the reason we wanted to do this in WebKit instead of client side is:
1. The next step is to kill processes that use too much CPU in the background too, not just unresponsive ones.
2. I believe doing this on client side is a bit more awkward because I don't think there is an API abstraction for a WebContent process, only WKPages.
Comment 29 mitz 2017-02-19 19:42:45 PST
(In reply to comment #28)
> Note that part of the reason we wanted to do this in WebKit instead of
> client side is:
> 1. The next step is to kill processes that use too much CPU in the
> background too, not just unresponsive ones.

I can see this as a natural extension of the operating system’s existing behavior of killing processes when under memory pressure, so it seems OK if the framework does it and reports it to the client the same way.

> 2. I believe doing this on client side is a bit more awkward because I don't
> think there is an API abstraction for a WebContent process, only WKPages.

Advanced clients already take process-sharing considerations into account when they respond to unresponsiveness (Safari on macOS is one such client that I know of, and I think the iOS version does that too). Perhaps the API that allows them to do this can be made better. Less advanced clients are probably going to only ever have one web view per process.
Comment 30 Michael Catanzaro 2017-02-20 07:36:45 PST
(In reply to comment #24)
> Given that there is already API for reporting to the client app that a Web
> process is unresponsive

Really? What's that API?

Mildly-related problem: We have an issue in Epiphany where attempting to Stop or close an unresponsive web process takes a very long time (I think 30 seconds). It would be useful for us to have a way to detect this and kill the process immediately. I do think this should really be handled by WebKit, not the application, though: if the web process doesn't respond to Stop or close within a second or two, something's gone wrong and it surely needs to be killed.
Comment 31 mitz 2017-02-20 07:48:02 PST
(In reply to comment #30)
> (In reply to comment #24)
> > Given that there is already API for reporting to the client app that a Web
> > process is unresponsive
> 
> Really? What's that API?

_webViewWebProcessDidBecomeUnresponsive: in WKNavigationDelegatePrivate.
Comment 32 Michael Catanzaro 2017-02-20 09:14:59 PST
(In reply to comment #31)
> _webViewWebProcessDidBecomeUnresponsive: in WKNavigationDelegatePrivate.

OK thanks, bug #168604.
Comment 33 Chris Dumez 2017-02-21 12:42:28 PST
(In reply to comment #27)
> (In reply to comment #22)
> > (In reply to comment #21)
> > > (In reply to comment #20)
> > > > What’s the rationale behind reporting these via the API differently from how
> > > > termination of a Web process by the OS due to memory pressure is reported to
> > > > the client?
> > > 
> > > Can you clarify? We do call the processDidCrash client delegate in most
> > > cases. The only case we opted not to call this delegate is when the tab has
> > > never been visible.
> > 
> > And that differs from the case where the OS would have killed such a Web
> > content process, I believe, which seems like an arbitrary inconsistency.
> > 
> > > We thought it was not worth displaying a crash banner in
> > > this case since it is not useful to the user.
> > 
> > Displaying crash UI is the client app’s domain. I think it’s best if the
> > framework faithfully, or at least consistently, represents to the client
> > what is happening.
> 
> I think this is a good point. I'll discuss this with Gavin next week and he
> suggested this.

Gavin is OK always reporting this to the client as a crash.