Bug 232603 - Terminate unresponsive network process by crashing it
Summary: Terminate unresponsive network process by crashing it
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Sihui Liu
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2021-11-01 21:58 PDT by Sihui Liu
Modified: 2021-11-02 13:25 PDT (History)
5 users (show)

See Also:


Attachments
Patch (5.63 KB, patch)
2021-11-01 22:18 PDT, Sihui Liu
no flags Details | Formatted Diff | Diff
Patch (8.48 KB, patch)
2021-11-02 10:04 PDT, Sihui Liu
ews-feeder: commit-queue-
Details | Formatted Diff | Diff
Patch (8.52 KB, patch)
2021-11-02 10:56 PDT, Sihui Liu
no flags Details | Formatted Diff | Diff
Patch for landing (8.82 KB, patch)
2021-11-02 11:58 PDT, Sihui Liu
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Sihui Liu 2021-11-01 21:58:38 PDT
...
Comment 1 Sihui Liu 2021-11-01 22:18:31 PDT
Created attachment 443059 [details]
Patch
Comment 2 Chris Dumez 2021-11-02 07:34:24 PDT
Comment on attachment 443059 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=443059&action=review

> Source/WebKit/Platform/IPC/Connection.cpp:759
> +    if (message->messageName() == MessageName::Terminate) {

I like the idea of generating a crash report on termination.

One advice though would be to call a NEVER_INLINE function with a name such as terminateDueToIPCTerminateMessage(). It will make the generated crash logs much easier to interpret and will help distinguish it from other crashes that could happen in processIncomingMessage().

> Source/WebKit/UIProcess/Network/NetworkProcessProxy.cpp:91
> +static constexpr Seconds unresponsivenessCheckPeriod = 15_s;

The responsiveness timeout is 6_s. So if the network process has to be unresponsive 3 times, it means it will take at least 3*6=18 seconds to happen, no? However, your new logic only triggers if this happens within 15 seconds. Therefore, I am not sure how it can trigger.

Also, I am not sure we want such logic at all? Why not be honest and report every network termination as a crash? Then we'd see how much of a problem is it and how urgent it is to address.
Comment 3 Sihui Liu 2021-11-02 09:43:25 PDT
(In reply to Chris Dumez from comment #2)
> Comment on attachment 443059 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=443059&action=review
> 
> > Source/WebKit/Platform/IPC/Connection.cpp:759
> > +    if (message->messageName() == MessageName::Terminate) {
> 
> I like the idea of generating a crash report on termination.
> 
> One advice though would be to call a NEVER_INLINE function with a name such
> as terminateDueToIPCTerminateMessage(). It will make the generated crash
> logs much easier to interpret and will help distinguish it from other
> crashes that could happen in processIncomingMessage().

Sure, will do.

> 
> > Source/WebKit/UIProcess/Network/NetworkProcessProxy.cpp:91
> > +static constexpr Seconds unresponsivenessCheckPeriod = 15_s;
> 
> The responsiveness timeout is 6_s. So if the network process has to be
> unresponsive 3 times, it means it will take at least 3*6=18 seconds to
> happen, no? However, your new logic only triggers if this happens within 15
> seconds. Therefore, I am not sure how it can trigger.
> 
> Also, I am not sure we want such logic at all? Why not be honest and report
> every network termination as a crash? Then we'd see how much of a problem is
> it and how urgent it is to address.


The timestamp is set when the first unresponsiveness case appears, so it's first case -6s- second case -6s- third case, which is about 12 seconds for 3 (I think there might be delay so I use 15).

I worried that will generate many same crash reports on user's device (which might be costly compared to our old behavior, since we know the network process can get terminated a lot by the number of web process crash reports rdar://84745326 (~20 failed attempts will cause one crash)). The goal is to identify the cause, and with the limit we will still know the magnitude of the issue.
Comment 4 Sihui Liu 2021-11-02 10:04:46 PDT
Created attachment 443102 [details]
Patch
Comment 5 Sihui Liu 2021-11-02 10:56:57 PDT
Created attachment 443109 [details]
Patch
Comment 6 Chris Dumez 2021-11-02 10:59:14 PDT
(In reply to Sihui Liu from comment #3)
> (In reply to Chris Dumez from comment #2)
> > Comment on attachment 443059 [details]
> > Patch
> > 
> > View in context:
> > https://bugs.webkit.org/attachment.cgi?id=443059&action=review
> > 
> > > Source/WebKit/Platform/IPC/Connection.cpp:759
> > > +    if (message->messageName() == MessageName::Terminate) {
> > 
> > I like the idea of generating a crash report on termination.
> > 
> > One advice though would be to call a NEVER_INLINE function with a name such
> > as terminateDueToIPCTerminateMessage(). It will make the generated crash
> > logs much easier to interpret and will help distinguish it from other
> > crashes that could happen in processIncomingMessage().
> 
> Sure, will do.
> 
> > 
> > > Source/WebKit/UIProcess/Network/NetworkProcessProxy.cpp:91
> > > +static constexpr Seconds unresponsivenessCheckPeriod = 15_s;
> > 
> > The responsiveness timeout is 6_s. So if the network process has to be
> > unresponsive 3 times, it means it will take at least 3*6=18 seconds to
> > happen, no? However, your new logic only triggers if this happens within 15
> > seconds. Therefore, I am not sure how it can trigger.
> > 
> > Also, I am not sure we want such logic at all? Why not be honest and report
> > every network termination as a crash? Then we'd see how much of a problem is
> > it and how urgent it is to address.
> 
> 
> The timestamp is set when the first unresponsiveness case appears, so it's
> first case -6s- second case -6s- third case, which is about 12 seconds for 3
> (I think there might be delay so I use 15).

Ah, indeed. I missed that.

> 
> I worried that will generate many same crash reports on user's device (which
> might be costly compared to our old behavior, since we know the network
> process can get terminated a lot by the number of web process crash reports
> rdar://84745326 (~20 failed attempts will cause one crash)). The goal is to
> identify the cause, and with the limit we will still know the magnitude of
> the issue.
Comment 7 Chris Dumez 2021-11-02 11:13:51 PDT
Comment on attachment 443109 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=443109&action=review

r=me with comment.

> Source/WebKit/UIProcess/Network/NetworkProcessProxy.cpp:153
> +    if (shouldTerminateNetworkProcessBySendingMessage()) {

Please add a comment to explain that we're doing this to generate a crash report when the network process is killed for being unresponsive several times in a row. It is other not clear why sometimes we're killing by sending an IPC and sometimes not.
Comment 8 Sihui Liu 2021-11-02 11:58:50 PDT
Created attachment 443118 [details]
Patch for landing
Comment 9 EWS 2021-11-02 12:42:13 PDT
Committed r285177 (243810@main): <https://commits.webkit.org/243810@main>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 443118 [details].
Comment 10 Radar WebKit Bug Importer 2021-11-02 12:43:18 PDT
<rdar://problem/84942667>
Comment 11 Simon Fraser (smfr) 2021-11-02 13:23:43 PDT
Did you consider using os_log_fault like we do for MESSAGE_CHECK?
Comment 12 Chris Dumez 2021-11-02 13:25:14 PDT
(In reply to Simon Fraser (smfr) from comment #11)
> Did you consider using os_log_fault like we do for MESSAGE_CHECK?

2 things about os_log_fault:
1. It generates a simulated crash, not a real crash (so it doesn't exit the process)
2. It only generates the simulated crash on iOS, not macOS.