RESOLVED FIXED 232603
Terminate unresponsive network process by crashing it
https://bugs.webkit.org/show_bug.cgi?id=232603
Summary Terminate unresponsive network process by crashing it
Sihui Liu
Reported 2021-11-01 21:58:38 PDT
...
Attachments
Patch (5.63 KB, patch)
2021-11-01 22:18 PDT, Sihui Liu
no flags
Patch (8.48 KB, patch)
2021-11-02 10:04 PDT, Sihui Liu
ews-feeder: commit-queue-
Patch (8.52 KB, patch)
2021-11-02 10:56 PDT, Sihui Liu
no flags
Patch for landing (8.82 KB, patch)
2021-11-02 11:58 PDT, Sihui Liu
no flags
Sihui Liu
Comment 1 2021-11-01 22:18:31 PDT
Chris Dumez
Comment 2 2021-11-02 07:34:24 PDT
Comment on attachment 443059 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=443059&action=review > Source/WebKit/Platform/IPC/Connection.cpp:759 > + if (message->messageName() == MessageName::Terminate) { I like the idea of generating a crash report on termination. One advice though would be to call a NEVER_INLINE function with a name such as terminateDueToIPCTerminateMessage(). It will make the generated crash logs much easier to interpret and will help distinguish it from other crashes that could happen in processIncomingMessage(). > Source/WebKit/UIProcess/Network/NetworkProcessProxy.cpp:91 > +static constexpr Seconds unresponsivenessCheckPeriod = 15_s; The responsiveness timeout is 6_s. So if the network process has to be unresponsive 3 times, it means it will take at least 3*6=18 seconds to happen, no? However, your new logic only triggers if this happens within 15 seconds. Therefore, I am not sure how it can trigger. Also, I am not sure we want such logic at all? Why not be honest and report every network termination as a crash? Then we'd see how much of a problem is it and how urgent it is to address.
Sihui Liu
Comment 3 2021-11-02 09:43:25 PDT
(In reply to Chris Dumez from comment #2) > Comment on attachment 443059 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=443059&action=review > > > Source/WebKit/Platform/IPC/Connection.cpp:759 > > + if (message->messageName() == MessageName::Terminate) { > > I like the idea of generating a crash report on termination. > > One advice though would be to call a NEVER_INLINE function with a name such > as terminateDueToIPCTerminateMessage(). It will make the generated crash > logs much easier to interpret and will help distinguish it from other > crashes that could happen in processIncomingMessage(). Sure, will do. > > > Source/WebKit/UIProcess/Network/NetworkProcessProxy.cpp:91 > > +static constexpr Seconds unresponsivenessCheckPeriod = 15_s; > > The responsiveness timeout is 6_s. So if the network process has to be > unresponsive 3 times, it means it will take at least 3*6=18 seconds to > happen, no? However, your new logic only triggers if this happens within 15 > seconds. Therefore, I am not sure how it can trigger. > > Also, I am not sure we want such logic at all? Why not be honest and report > every network termination as a crash? Then we'd see how much of a problem is > it and how urgent it is to address. The timestamp is set when the first unresponsiveness case appears, so it's first case -6s- second case -6s- third case, which is about 12 seconds for 3 (I think there might be delay so I use 15). I worried that will generate many same crash reports on user's device (which might be costly compared to our old behavior, since we know the network process can get terminated a lot by the number of web process crash reports rdar://84745326 (~20 failed attempts will cause one crash)). The goal is to identify the cause, and with the limit we will still know the magnitude of the issue.
Sihui Liu
Comment 4 2021-11-02 10:04:46 PDT
Sihui Liu
Comment 5 2021-11-02 10:56:57 PDT
Chris Dumez
Comment 6 2021-11-02 10:59:14 PDT
(In reply to Sihui Liu from comment #3) > (In reply to Chris Dumez from comment #2) > > Comment on attachment 443059 [details] > > Patch > > > > View in context: > > https://bugs.webkit.org/attachment.cgi?id=443059&action=review > > > > > Source/WebKit/Platform/IPC/Connection.cpp:759 > > > + if (message->messageName() == MessageName::Terminate) { > > > > I like the idea of generating a crash report on termination. > > > > One advice though would be to call a NEVER_INLINE function with a name such > > as terminateDueToIPCTerminateMessage(). It will make the generated crash > > logs much easier to interpret and will help distinguish it from other > > crashes that could happen in processIncomingMessage(). > > Sure, will do. > > > > > > Source/WebKit/UIProcess/Network/NetworkProcessProxy.cpp:91 > > > +static constexpr Seconds unresponsivenessCheckPeriod = 15_s; > > > > The responsiveness timeout is 6_s. So if the network process has to be > > unresponsive 3 times, it means it will take at least 3*6=18 seconds to > > happen, no? However, your new logic only triggers if this happens within 15 > > seconds. Therefore, I am not sure how it can trigger. > > > > Also, I am not sure we want such logic at all? Why not be honest and report > > every network termination as a crash? Then we'd see how much of a problem is > > it and how urgent it is to address. > > > The timestamp is set when the first unresponsiveness case appears, so it's > first case -6s- second case -6s- third case, which is about 12 seconds for 3 > (I think there might be delay so I use 15). Ah, indeed. I missed that. > > I worried that will generate many same crash reports on user's device (which > might be costly compared to our old behavior, since we know the network > process can get terminated a lot by the number of web process crash reports > rdar://84745326 (~20 failed attempts will cause one crash)). The goal is to > identify the cause, and with the limit we will still know the magnitude of > the issue.
Chris Dumez
Comment 7 2021-11-02 11:13:51 PDT
Comment on attachment 443109 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=443109&action=review r=me with comment. > Source/WebKit/UIProcess/Network/NetworkProcessProxy.cpp:153 > + if (shouldTerminateNetworkProcessBySendingMessage()) { Please add a comment to explain that we're doing this to generate a crash report when the network process is killed for being unresponsive several times in a row. It is other not clear why sometimes we're killing by sending an IPC and sometimes not.
Sihui Liu
Comment 8 2021-11-02 11:58:50 PDT
Created attachment 443118 [details] Patch for landing
EWS
Comment 9 2021-11-02 12:42:13 PDT
Committed r285177 (243810@main): <https://commits.webkit.org/243810@main> All reviewed patches have been landed. Closing bug and clearing flags on attachment 443118 [details].
Radar WebKit Bug Importer
Comment 10 2021-11-02 12:43:18 PDT
Simon Fraser (smfr)
Comment 11 2021-11-02 13:23:43 PDT
Did you consider using os_log_fault like we do for MESSAGE_CHECK?
Chris Dumez
Comment 12 2021-11-02 13:25:14 PDT
(In reply to Simon Fraser (smfr) from comment #11) > Did you consider using os_log_fault like we do for MESSAGE_CHECK? 2 things about os_log_fault: 1. It generates a simulated crash, not a real crash (so it doesn't exit the process) 2. It only generates the simulated crash on iOS, not macOS.
Note You need to log in before you can comment on or make changes to this bug.