Bug 180477

Summary: We should be able to recover after a Service Worker process crash
Product: WebKit Reporter: Chris Dumez <cdumez>
Component: WebKit2Assignee: Chris Dumez <cdumez>
Status: RESOLVED FIXED    
Severity: Normal CC: beidson, commit-queue, ggaren, webkit-bug-importer, youennf
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugs.webkit.org/show_bug.cgi?id=180659
Bug Depends on: 180496    
Bug Blocks:    
Attachments:
Description Flags
WIP Patch
none
WIP Patch
none
Patch
none
Patch
none
Patch
none
Patch none

Description Chris Dumez 2017-12-06 08:58:57 PST
We should be able to recover after a Service Worker process crash.
Comment 1 Chris Dumez 2017-12-06 17:01:48 PST
Created attachment 328654 [details]
WIP Patch
Comment 2 Chris Dumez 2017-12-06 17:04:02 PST
Created attachment 328656 [details]
WIP Patch

Will clean up tonight.
Comment 3 Chris Dumez 2017-12-06 17:04:12 PST
*** Bug 180496 has been marked as a duplicate of this bug. ***
Comment 4 Chris Dumez 2017-12-06 20:04:54 PST
Created attachment 328675 [details]
Patch
Comment 5 Brady Eidson 2017-12-06 21:27:31 PST
Comment on attachment 328675 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=328675&action=review

> Source/WebKit/StorageProcess/ServiceWorker/WebSWServerConnection.h:98
> +    template<typename U> static void sendToContextProcess(U&& message);

Why static?

In a future world of multiple context processes, how can this possibly be static?

If it's a convenience because of the lambda captures... can we capture the connection, or a way of looking the connection up?
Comment 6 Chris Dumez 2017-12-06 21:43:21 PST
Comment on attachment 328675 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=328675&action=review

>> Source/WebKit/StorageProcess/ServiceWorker/WebSWServerConnection.h:98
>> +    template<typename U> static void sendToContextProcess(U&& message);
> 
> Why static?
> 
> In a future world of multiple context processes, how can this possibly be static?
> 
> If it's a convenience because of the lambda captures... can we capture the connection, or a way of looking the connection up?

Yes, it is because of the lambda captures. It would be unsafe to capture |this| was I have no way to make sure the connection is still alive when the lambda is called. I'll try and find another way without making this static.
Comment 7 Chris Dumez 2017-12-06 21:48:24 PST
Comment on attachment 328675 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=328675&action=review

>>> Source/WebKit/StorageProcess/ServiceWorker/WebSWServerConnection.h:98
>>> +    template<typename U> static void sendToContextProcess(U&& message);
>> 
>> Why static?
>> 
>> In a future world of multiple context processes, how can this possibly be static?
>> 
>> If it's a convenience because of the lambda captures... can we capture the connection, or a way of looking the connection up?
> 
> Yes, it is because of the lambda captures. It would be unsafe to capture |this| was I have no way to make sure the connection is still alive when the lambda is called. I'll try and find another way without making this static.

WebSWServerConnection is not refcounted so I cannot protect it when I capture.
Comment 8 youenn fablet 2017-12-06 21:51:58 PST
Comment on attachment 328675 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=328675&action=review

> Source/WebCore/ChangeLog:12
> +        Once the connection is the context is established, process "run service worker"

s/is/in

> Source/WebCore/ChangeLog:17
> +        if processed. We used to assert that we had a connection to the context process.

s/if/is

> Source/WebKit/StorageProcess/ServiceWorker/WebSWServerConnection.cpp:123
> +        server().runServiceWorkerIfNecessary(*serviceWorkerIdentifier, [connectionIdentifier = identifier(), fetchIdentifier, serviceWorkerIdentifier, request, options, formData](bool success) {

We should probably have WebSWServerConnection::startFetch take rvalue references for request, options and formData to take benefit of these here.
Space between ] ( although I am not a fan either.

Ditto in below WebSWServerConnection::XX methods.

> Source/WebKit/StorageProcess/ServiceWorker/WebSWServerConnection.cpp:125
> +                sendToContextProcess(Messages::WebSWContextManagerConnection::StartFetch { connectionIdentifier, fetchIdentifier, serviceWorkerIdentifier, request, options, formData });

In case success is false, we need to return a DidNotHandle message on the web content process.

> Source/WebKit/StorageProcess/ServiceWorker/WebSWServerConnection.h:55
> +    IPC::Connection& ipcConnection() const { return m_contentConnection.get(); }

Not sure ipcConnection is a great name since there is also a service worker process connection.
Should it be something like contentConnection() or webContentConnection() instead?

> Source/WebKit/StorageProcess/ServiceWorker/WebSWServerConnection.h:101
>      Ref<IPC::Connection> m_contentConnection;

Hum, m_contextConnection is never used so we could remove it, or start using it.
Probably better to just remove it for now. We are probably not ready to split service workers into separate processes based on origins.

> Source/WebKit/StorageProcess/StorageProcess.cpp:112
> +    // Relaunch the context process if necessary.

Is this comment needed?

> Source/WebKit/StorageProcess/StorageProcess.cpp:124
> +    if (m_swServerConnections.size() == 1 && m_serverToContextConnection && &m_swServerConnections.begin()->value->ipcConnection() == m_serverToContextConnection->ipcConnection())

Instead, should we have WebSWServerConnection has a bool isServiceWorkerProcessConnection()?

> LayoutTests/http/tests/workers/service/resources/postmessage-after-sw-process-crash.js:19
> +        }, 0);

Are we sure 0 is sufficient to ensure it will crash before the messages?
Should we not try to make the test a little bit more robust with something like:
- a first message to the SW to set a value inside the worker, the SW having a default value.
- a second message to get from the SW the stored value, which should be the default value if the worker process was crashed and the service worker rerun.
Comment 9 Chris Dumez 2017-12-06 21:58:56 PST
Comment on attachment 328675 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=328675&action=review

>> LayoutTests/http/tests/workers/service/resources/postmessage-after-sw-process-crash.js:19
>> +        }, 0);
> 
> Are we sure 0 is sufficient to ensure it will crash before the messages?
> Should we not try to make the test a little bit more robust with something like:
> - a first message to the SW to set a value inside the worker, the SW having a default value.
> - a second message to get from the SW the stored value, which should be the default value if the worker process was crashed and the service worker rerun.

I am sure. See https://bugs.webkit.org/show_bug.cgi?id=180496 where the test was failing before I implemented recovery.
Comment 10 Chris Dumez 2017-12-06 22:06:04 PST
Comment on attachment 328675 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=328675&action=review

>> Source/WebKit/StorageProcess/ServiceWorker/WebSWServerConnection.h:55
>> +    IPC::Connection& ipcConnection() const { return m_contentConnection.get(); }
> 
> Not sure ipcConnection is a great name since there is also a service worker process connection.
> Should it be something like contentConnection() or webContentConnection() instead?

Just trying to be consistent with Brady here who used WebSWServerToContextConnection::ipcConnection().
Comment 11 Brady Eidson 2017-12-06 22:19:30 PST
(In reply to Chris Dumez from comment #10)
> Comment on attachment 328675 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=328675&action=review
> 
> >> Source/WebKit/StorageProcess/ServiceWorker/WebSWServerConnection.h:55
> >> +    IPC::Connection& ipcConnection() const { return m_contentConnection.get(); }
> > 
> > Not sure ipcConnection is a great name since there is also a service worker process connection.
> > Should it be something like contentConnection() or webContentConnection() instead?
> 
> Just trying to be consistent with Brady here who used
> WebSWServerToContextConnection::ipcConnection().

Meh - Use something more sensical if such a thing exists.
Comment 12 Chris Dumez 2017-12-06 22:23:22 PST
(In reply to Brady Eidson from comment #11)
> (In reply to Chris Dumez from comment #10)
> > Comment on attachment 328675 [details]
> > Patch
> > 
> > View in context:
> > https://bugs.webkit.org/attachment.cgi?id=328675&action=review
> > 
> > >> Source/WebKit/StorageProcess/ServiceWorker/WebSWServerConnection.h:55
> > >> +    IPC::Connection& ipcConnection() const { return m_contentConnection.get(); }
> > > 
> > > Not sure ipcConnection is a great name since there is also a service worker process connection.
> > > Should it be something like contentConnection() or webContentConnection() instead?
> > 
> > Just trying to be consistent with Brady here who used
> > WebSWServerToContextConnection::ipcConnection().
> 
> Meh - Use something more sensical if such a thing exists.

We have so many "connections", I am personally fine with ipcConnection() as it makes it clear it is the underlying IPC::Connection. contentConnection() or webContentConnection() are super vague and I would have no idea what they return from their name.
Comment 13 Chris Dumez 2017-12-06 22:24:53 PST
Created attachment 328679 [details]
Patch
Comment 14 WebKit Commit Bot 2017-12-06 22:27:45 PST
Comment on attachment 328679 [details]
Patch

Rejecting attachment 328679 [details] from commit-queue.

Failed to run "['/Volumes/Data/EWS/WebKit/Tools/Scripts/webkit-patch', '--status-host=webkit-queues.webkit.org', '--bot-id=webkit-cq-02', 'validate-changelog', '--check-oops', '--non-interactive', 328679, '--port=mac']" exit_code: 1 cwd: /Volumes/Data/EWS/WebKit

ChangeLog entry in LayoutTests/ChangeLog contains OOPS!.

Full output: http://webkit-queues.webkit.org/results/5525882
Comment 15 Chris Dumez 2017-12-06 22:31:45 PST
Created attachment 328680 [details]
Patch
Comment 16 WebKit Commit Bot 2017-12-06 22:53:35 PST
Comment on attachment 328680 [details]
Patch

Rejecting attachment 328680 [details] from commit-queue.

Failed to run "['/Volumes/Data/EWS/WebKit/Tools/Scripts/webkit-patch', '--status-host=webkit-queues.webkit.org', '--bot-id=webkit-cq-02', 'build', '--no-clean', '--no-update', '--build-style=release', '--port=mac']" exit_code: 2 cwd: /Volumes/Data/EWS/WebKit

Last 500 characters of output:
unsigned long long, std::optional<WTF::ObjectIdentifier<WebCore::ServiceWorkerIdentifierType> >, WebCore::ResourceRequest&&, WebCore::FetchOptions&&, IPC::FormDataReference&&) in WebSWServerConnection.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)

** BUILD FAILED **


The following build commands failed:
	Ld /Volumes/Data/EWS/WebKit/WebKitBuild/Release/WebKit.framework/Versions/A/WebKit normal x86_64
(1 failure)

Full output: http://webkit-queues.webkit.org/results/5526070
Comment 17 Chris Dumez 2017-12-06 22:59:08 PST
Created attachment 328681 [details]
Patch
Comment 18 youenn fablet 2017-12-06 23:08:03 PST
But do we really need this ipcConnection() if it is only used to know which server connection we are talking to, in which case a bool getter would be better?
Also ipcConnection may return a const ref if possible.
Comment 19 Chris Dumez 2017-12-06 23:10:16 PST
(In reply to youenn fablet from comment #18)
> But do we really need this ipcConnection() if it is only used to know which
> server connection we are talking to, in which case a bool getter would be
> better?
> Also ipcConnection may return a const ref if possible.

We do not have this information at the moment so adding the bool getter was not trivial. I can look into this in a follow up.
Comment 20 WebKit Commit Bot 2017-12-06 23:37:53 PST
Comment on attachment 328681 [details]
Patch

Clearing flags on attachment: 328681

Committed r225622: <https://trac.webkit.org/changeset/225622>
Comment 21 WebKit Commit Bot 2017-12-06 23:37:55 PST
All reviewed patches have been landed.  Closing bug.
Comment 22 Radar WebKit Bug Importer 2017-12-06 23:38:22 PST
<rdar://problem/35902914>