Bug 129684

Summary: [GTK] WebProcess leaked when closing pages with network process enabled
Product: WebKit Reporter: Carlos Garcia Campos <cgarcia>
Component: WebKit2Assignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: andersca, csaavedra, gns, svillar
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
Patch
none
Use a 1 second timeout for ShouldTerminate message
none
Patch andersca: review+

Description Carlos Garcia Campos 2014-03-04 08:25:30 PST
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f33596f2143 in WTF::ThreadCondition::timedWait(WTF::Mutex&, double) () from /opt/gnome-3.0/lib64/libjavascriptcoregtk-3.0.so.0
#2  0x00007f3359711491 in WTF::BinarySemaphore::wait(double) () from /opt/gnome-3.0/lib64/libjavascriptcoregtk-3.0.so.0
#3  0x00007f335bc3a266 in IPC::Connection::waitForSyncReply(unsigned long, std::chrono::duration<long, std::ratio<1l, 1000l> >, unsigned int) ()
   from /opt/gnome-3.0/lib64/libwebkit2gtk-3.0.so.25
#4  0x00007f335bc3b3a3 in IPC::Connection::sendSyncMessage(unsigned long, std::unique_ptr<IPC::MessageEncoder, std::default_delete<IPC::MessageEncoder> >, std::chrono::duration<long, std::ratio<1l, 1000l> >, unsigned int) () from /opt/gnome-3.0/lib64/libwebkit2gtk-3.0.so.25
#5  0x00007f335a89eb13 in bool IPC::Connection::sendSync<Messages::WebProcessProxy::ShouldTerminate>(Messages::WebProcessProxy::ShouldTerminate&&, Messages::WebProcessProxy::ShouldTerminate::Reply&&, unsigned long, std::chrono::duration<long, std::ratio<1l, 1000l> >, unsigned int) () from /opt/gnome-3.0/lib64/libwebkit2gtk-3.0.so.25
#6  0x00007f335a89adc5 in WebKit::WebProcess::shouldTerminate() () from /opt/gnome-3.0/lib64/libwebkit2gtk-3.0.so.25
#7  0x00007f335a721d7a in WebKit::ChildProcess::terminationTimerFired() () from /opt/gnome-3.0/lib64/libwebkit2gtk-3.0.so.25
#8  0x00007f335a8913bf in WebKit::WebPage::close() () from /opt/gnome-3.0/lib64/libwebkit2gtk-3.0.so.25
#9  0x00007f335a8bcdba in WebKit::WebPage::didReceiveWebPageMessage(IPC::Connection*, IPC::MessageDecoder&) () from /opt/gnome-3.0/lib64/libwebkit2gtk-3.0.so.25
#10 0x00007f335bc3ea9b in IPC::MessageReceiverMap::dispatchMessage(IPC::Connection*, IPC::MessageDecoder&) () from /opt/gnome-3.0/lib64/libwebkit2gtk-3.0.so.25
#11 0x00007f335a89a8d6 in WebKit::WebProcess::didReceiveMessage(IPC::Connection*, IPC::MessageDecoder&) () from /opt/gnome-3.0/lib64/libwebkit2gtk-3.0.so.25
#12 0x00007f335bc38c2b in IPC::Connection::dispatchMessage(std::unique_ptr<IPC::MessageDecoder, std::default_delete<IPC::MessageDecoder> >) ()
   from /opt/gnome-3.0/lib64/libwebkit2gtk-3.0.so.25
#13 0x00007f335bc38d73 in IPC::Connection::dispatchOneMessage() () from /opt/gnome-3.0/lib64/libwebkit2gtk-3.0.so.25
#14 0x00007f33596f0966 in WTF::RunLoop::performWork() () from /opt/gnome-3.0/lib64/libjavascriptcoregtk-3.0.so.0
#15 0x00007f33596fd3c9 in WTF::RunLoop::queueWork(WTF::RunLoop*) () from /opt/gnome-3.0/lib64/libjavascriptcoregtk-3.0.so.0
#16 0x00007f3354bb3268 in g_idle_dispatch (source=0x7f32ec002370, callback=0x7f33596fd3c0 <WTF::RunLoop::queueWork(WTF::RunLoop*)>, user_data=0x7f334afc7d90) at gmain.c:5319
#17 0x00007f3354bb0982 in g_main_dispatch (context=0xa4ec70) at gmain.c:3064
#18 0x00007f3354bb16f8 in g_main_context_dispatch (context=0xa4ec70) at gmain.c:3663
#19 0x00007f3354bb18ea in g_main_context_iterate (context=0xa4ec70, block=1, dispatch=1, self=0xa8f660) at gmain.c:3734
#20 0x00007f3354bb1d13 in g_main_loop_run (loop=0xb03930) at gmain.c:3928
#21 0x00007f335a81d974 in WebProcessMainGtk () from /opt/gnome-3.0/lib64/libwebkit2gtk-3.0.so.25
#22 0x0000003aa4e21d65 in __libc_start_main (main=0x400ad0 <main>, argc=2, argv=0x7fffda4b7ec8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
    stack_end=0x7fffda4b7eb8) at libc-start.c:285
#23 0x0000000000400b01 in _start ()
Comment 1 Carlos Garcia Campos 2014-03-04 08:36:02 PST
This happens sometimes when closing the last page of a web process:

1.- WebPageProxy::close() sends the Close message to the WebProcess
2.- WebProcessProxy::removeWebPage() checks that it's the last page and that the process can be terminated and closes the connection (this only happens when using the network process)
3.- The WebProcess receives the Close message and enables the process termination
4.- The WebProcess io thread notices the connection has been closed and invalidates it
5.- Connection::connectionDidClose() calls platformInvalidate and schedules dispatchConnectionDidClose on the main loop
6.- WebProcess::shouldTerminate() tries to send the sync message ShouldTerminate to the UI process

At this point the connection is still valid (it has a client because dispatchConnectionDidClose is scheduled but hasn't been handled yet) but the connection has been invalidated in plaformInvalidate. So we fail to send the message, of course, but the connection keeps waiting for the reply because it still has a client. 

This only happens sometimes because in other cases where the dispatchConnectionDidClose is run before the sendSync, the connection ends up in didFailToSendSyncMessage() and the processes exists.
Comment 2 Carlos Garcia Campos 2014-03-04 09:14:33 PST
Created attachment 225784 [details]
Patch

I think this patch fixes the race condition, by checking if the connection is still open before trying to send the sync message
Comment 3 Carlos Garcia Campos 2014-03-05 06:29:18 PST
I've noticed that is not enough. I think the connection can be closed between Connection::sendMessage() that schedules sendOutgoingMessages() and when sendOutgoingMessages() is called. In that case we actually check is connected in canSendOutgoingMessages() and we don't try to send the message, but Connection::sendSyncMessage has already called waitForSyncReply and keeps waiting for the reply of a sync message that hasn't been sent.
Comment 4 Carlos Garcia Campos 2014-03-05 08:30:58 PST
Maybe the only problem is that we are using the default timeout (std::chrono::milliseconds::max()) maybe we could use a 1 second timeout for the shouldTerminate message?
Comment 5 Carlos Garcia Campos 2014-03-06 02:18:36 PST
Created attachment 225973 [details]
Use a 1 second timeout for ShouldTerminate message
Comment 6 Carlos Garcia Campos 2014-05-26 05:27:09 PDT
I've finally found the problem here. It's not actually a race condition in the end, the problem is that the web process is not notified when the UI process closes the connection. It happens because the UI process uses close() on the socket fd, which only shuts down the connection when the socket is not shared by another process. We are already closing the sockets in ProcessLauncherGtk right after the spawn and in the child process callback, but that's not enough in multiprocess mode. We need to use set the cloexec flag to make sure the socket is not exposed to other future children.
Comment 7 Carlos Garcia Campos 2014-05-26 05:36:24 PDT
Created attachment 232072 [details]
Patch
Comment 8 Carlos Garcia Campos 2014-05-26 09:55:42 PDT
Committed r169352: <http://trac.webkit.org/changeset/169352>