Bug 51474

Summary: [Qt][WK2] Incomplete clean up on termination
Product: WebKit Reporter: Balazs Kelemen <kbalazs>
Component: WebKit2Assignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: kenneth, kimmo.t.kinnunen, kling, ossy, webkit-ews
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: PC   
OS: All   
Attachments:
Description Flags
Patch
none
Patch - with the new files
none
renaming patch
kenneth: review-
patch v4
none
patch v5
none
patch v5 build-fixed none

Description Balazs Kelemen 2010-12-22 09:25:15 PST
The cleanup logic is broken again. There are two main issue:
 - we are killing the web process so it does not have a chance to clean up
 - the CrashHandler do not do it's job
The first one can be fixed easily by sending a SIGTERM instead of a SIGKILL, the second one is more complicated.
Since we force the termination of the web process with a signal we cannot rely on QCoreApplication::aboutToQuit.
Actually we could stop the event loop explicitly in answer to the signal but after a lot of debugging I feel
that QCoreApplication::aboutToQuit is not very reliable so I decided to call the cleanup method explicitly instead of
through the signal-slot mechanism. Besides this we should try to handle a crash only once to avoid endless recursion
if we are crashing during cleanup. In summary I would do the following changes:
 - Sending SIGTERM instead of SIGKILL from the UI process to force termination of the web process.
 - Calling the cleanup method explicitly from the signal handler _and_ from WebProcess::platformShutdown (we can reach that before getting
   the signal)
 - fix the cleanup of the MappedMemoryPool. Currently this is only deleted if we get a signal. If the web process shutting down before
   getting signal this is not deleted.

I feel that doing this in one patch is more reasonable than separating the issues into parts.
All of this is belonging to cleanup on termination.
Comment 1 Balazs Kelemen 2010-12-22 10:58:49 PST
Created attachment 77237 [details]
Patch
Comment 2 Balazs Kelemen 2010-12-22 11:14:26 PST
Created attachment 77241 [details]
Patch - with the new files

Seems like webkit-patch cannot handle git mv.
Comment 3 Early Warning System Bot 2010-12-22 11:20:33 PST
Attachment 77237 [details] did not build on qt:
Build output: http://queues.webkit.org/results/7248106
Comment 4 Csaba Osztrogon√°c 2010-12-22 11:21:00 PST
Comment on attachment 77241 [details]
Patch - with the new files

View in context: https://bugs.webkit.org/attachment.cgi?id=77241&action=review

> WebKit2/Shared/qt/CleanupHandler.h:27
> +#ifndef CrashHandler_h
> +#define CrashHandler_h

Please rename them to CleanupHandler_h before landing.
Comment 5 Balazs Kelemen 2010-12-22 11:23:03 PST
Kling, could you also take a look at this?
Comment 6 Kenneth Rohde Christiansen 2010-12-22 11:33:45 PST
Comment on attachment 77241 [details]
Patch - with the new files

View in context: https://bugs.webkit.org/attachment.cgi?id=77241&action=review

I think Kimmo should look at this before it is being committed. Also I do not like that you do renaming and all these changes in the same patch... it is very hard to see what was actually changed. The rename could easily be another patch. R- for this reason.

> WebKit2/Platform/qt/SharedMemoryQt.cpp:139
> +    // Do not leave behind the shared memory segment!

Comments should not end with ! They should just be informative.

> WebKit2/Platform/qt/SharedMemoryQt.cpp:147
> +    CleanupHandler::instance()->willDelete(m_impl);

I do not like the name willDelete... I dont understand what it actually does.
Comment 7 Balazs Kelemen 2010-12-22 17:59:03 PST
Created attachment 77287 [details]
renaming patch
Comment 8 Kimmo Kinnunen 2010-12-22 23:15:11 PST
As for fixing the existing bugs caused by implementing the crashhandler in the first place, i think this patch is good in general.

Rest is feedback on the crash handler, sorry about this..

However, I think the whole approach here is maybe not that wise. 

IMO the best fix would be to remove the whole crashhandler and have following contract for all the IPC and SHM files:
1. Web process creates them and sends the filename to UI process
2. UI process opens the file and deletes it
3. Web process would delete the file in its object destructors in case ui process didn't delete it. 

Point 3. would be mainly to maintain consistency. In practice it wouldn't be exercised at all, because the case ui process wouldn't open the file most likely mean that ui process crashed, and that should mean that web process would be killed with SIGKILL (or, it would, if it wouldn't have been modified to be killed with SIGINT instead..).

In practice this would leave stale files only when UI process crashes before deleting the file. This can be made very rare occasion, b/c first code that UI process runs is webkit code when handling the msg. Of course, it's a tradeoff, but I'd take it any day compared to non-terminating web processes (see below).


As for the concrete shortcomings of current code (crashhandler):

1) Doesn't survive SIGKILL. The web process will receive sigkill signals.

2) Runs c++ destructors from address space of the crashed program. It could as well be the same destructors that caused the crash in the first place.

3) Deletes files in crash handler. The file names exist in the memory space of crashed process. In other words, those filenames can contain whatever valuable filename, like my $HOME/.emacs, due to some stack/heap smashing bug. When coding properly, I don't actually think there's much that you are allowed to do in signal handlers. For one, I wouldn't call any Qt functions.- 

4) One of the main features of WebKit is that it is fast to kill the process. Now you are again introducing code that will swap in random pages of memory during exit. 

5) Relies on the bug introduced in http://trac.webkit.org/changeset/72077. This means that there's no guarantees that Web Process will be properly killed when UI process dies.. To ensure this, we would need again to write some daemon monitoring stale, locked up web processes.


Here's the subset of POSIX that you might be able to run from signal handlers: https://www.securecoding.cert.org/confluence/display/seccode/SIG30-C.+Call+only+asynchronous-safe+functions+within+signal+handlers . 

However, I don't understand what you can do with the funcs if you cannot rely on the integrity of the memory.
Comment 9 Kimmo Kinnunen 2010-12-22 23:21:44 PST
Oops, a typo there:
So using SIGKILL was my original intention in r72077.

And point 5 should read
5) Relies on the bug introduced in http://trac.webkit.org/changeset/72088. ...

In other words, I think crash cleanup should be done otherwise, and sending SIGINT instead of SIGKILL is a bug.
Comment 10 Kenneth Rohde Christiansen 2010-12-24 02:51:15 PST
Comment on attachment 77287 [details]
renaming patch

r- due to Kimmo's comments
Comment 11 Balazs Kelemen 2011-01-03 04:42:55 PST
(In reply to comment #8)

Sorry for the long delay, I was on my vacation.

> As for fixing the existing bugs caused by implementing the crashhandler in the first place, i think this patch is good in general.
> 
> Rest is feedback on the crash handler, sorry about this..
> 
> However, I think the whole approach here is maybe not that wise. 
> 
> IMO the best fix would be to remove the whole crashhandler and have following contract for all the IPC and SHM files:
> 1. Web process creates them and sends the filename to UI process
> 2. UI process opens the file and deletes it
> 3. Web process would delete the file in its object destructors in case ui process didn't delete it. 

But we should assure that those destructors will be executed. By terminating the web process with SIGKILL that's not the case.
Furthermore, SHM segments must be released on both side unless the system won't free it up (that is the case on Unix).
 
> 3) Deletes files in crash handler. The file names exist in the memory space of crashed process. In other words, those filenames can contain whatever valuable filename, like my $HOME/.emacs, due to some stack/heap smashing bug. When coding properly, I don't actually think there's much that you are allowed to do in signal handlers. For one, I wouldn't call any Qt functions.- 

Of course crash handling is nasty. The main purpose of doing that was to save the bots from being full of stale files and shm segments in
the case when trunk is in a crashing state for a while. This is not that important however since a production release should be stable enough
to not suffering from this. However, at least on normal termination those staling stuff should be cleaned.

> 
> 4) One of the main features of WebKit is that it is fast to kill the process. Now you are again introducing code that will swap in random pages of memory during exit.

That's true but how could we do any cleaning without that? I know this is a policy but I do not think that it is a main feature :)
Comment 12 Balazs Kelemen 2011-01-03 04:47:26 PST
> In other words, I think crash cleanup should be done otherwise, and sending SIGINT instead of SIGKILL is a bug.

In the next patch I will remove the crash handling logic and restrict the patch
to handling cleanup at termination. About SIGKILL: I think about sending SIGTERM,
waiting for the process to finish and if it times out sending SIGKILL. This means
shutting down the UI process would need some more time :(
Comment 13 Balazs Kelemen 2011-01-03 06:12:57 PST
That's not just about the web process.
Comment 14 Balazs Kelemen 2011-01-03 10:15:48 PST
Created attachment 77816 [details]
patch v4

Returned to QCoreApplication::aboutToQuit. It had became reliable after I stopped ignoring thread affinity.
No crash handling, just termination. The patch contains the renames and the functional changes in one, but git diff -M is hopefully
making it easier to review (it just shows the differing lines).
Comment 15 Kimmo Kinnunen 2011-01-03 11:40:43 PST
(In reply to comment #14)
> Created an attachment (id=77816) [details]
> patch v4
> 
> Returned to QCoreApplication::aboutToQuit. It had became reliable after I stopped ignoring thread affinity.
> No crash handling, just termination. The patch contains the renames and the functional changes in one, but git diff -M is hopefully
> making it easier to review (it just shows the differing lines).


If I understand correctly, you moved the file deletion to UI process? I don't think that solves any existing problem correctly?

If I do killall -9 MiniBrowser, won't that fail?

How about 'killall -9 MiniBrowser QtWebProcess' ?

I don't think you can rely on crash handlers. You cannot rely on any explicit cleanup.  And you should not have to.

(In reply to comment #11)
> (In reply to comment #8)
> But we should assure that those destructors will be executed. By terminating the web process with SIGKILL that's not the case.
> Furthermore, SHM segments must be released on both side unless the system won't free it up (that is the case on Unix).

Sounds like that SHM API is not very robust nor up to modern standards. Maybe there's something better that actually functions correctly?

> Of course crash handling is nasty. The main purpose of doing that was to save the bots from being full of stale files and shm segments in
> the case when trunk is in a crashing state for a while. This is not that important however since a production release should be stable enough
> to not suffering from this. However, at least on normal termination those staling stuff should be cleaned.

This rationale is actually completely inverse of what is important.

On your buildbots, you can surely fix this by scripts:
'sudo find /tmp -iname '*QtWebKit*' -delete'

On "production" browser there can exist no possibility that stale files would prevent startup.

> > 4) One of the main features of WebKit is that it is fast to kill the process. Now you are again introducing code that will swap in random pages of memory during exit.
> That's true but how could we do any cleaning without that? I know this is a policy but I do not think that it is a main feature :)

I already explained:
1. open file
2. communicate file to the other process
3. open file in other process
3. delete file in other process

There's variations of this scheme, most of them a bit more robust than above simple strategy. I think all of them are more robust than deleting the files explicitly in some exit condition.
 
While the patch4 might solve current bugs, i don't think it's any solution to the problem of never leaving files behind. Whether or not that should block review, that's not for me to decide.
Comment 16 Kimmo Kinnunen 2011-01-03 13:20:53 PST
And let me clarify:
IHMO it is good to clean up the current implementation, if this is the only way to fix the buildbots.

However, IMO the underlying issue, should be fixed differently to this patch.
Comment 17 Kenneth Rohde Christiansen 2011-01-03 13:23:03 PST
(In reply to comment #16)
> And let me clarify:
> IHMO it is good to clean up the current implementation, if this is the only way to fix the buildbots.
> 
> However, IMO the underlying issue, should be fixed differently to this patch.

So you are for this patch as an intermediate step?
Comment 18 Kenneth Rohde Christiansen 2011-01-03 13:28:30 PST
Comment on attachment 77816 [details]
patch v4

View in context: https://bugs.webkit.org/attachment.cgi?id=77816&action=review

> WebKit2/ChangeLog:10
> +        Do not try to cleanup on crash because it is ont reliable.

not*

> WebKit2/ChangeLog:14
> +        do not own that. This is the case of a server connection.

s/that/it. s/of/with/

> WebKit2/ChangeLog:16
> +        Making MappedMemoryPool a QObject so the CleanupHandler can destruct it

Turn MappedMemoryPool into a ...

> WebKit2/Platform/qt/MappedMemoryPool.cpp:45
> +        // Do not leave mapping files on the disk!

s/!/.

> WebKit2/Platform/qt/SharedMemoryQt.cpp:102
> +    // Do not leave behind the shared memory segment.

move behind to the end. Do not leave the shared memory segment behind.

> WebKit2/Platform/qt/SharedMemoryQt.cpp:139
> +    // Do not leave behind the shared memory segment.

same here

> WebKit2/Shared/qt/CleanupHandler.cpp:41
> +    QCoreApplication* coreApplication = QCoreApplication::instance();

Maybe you can use qApp instead? just wondering...

> WebKit2/Shared/qt/CleanupHandler.h:63
> +    static void sigTermHandler(int);

Is this one still needed?

> WebKit2/UIProcess/Launcher/qt/ProcessLauncherQt.cpp:90
> +    // Do not leave behind the socket file.

move behind to the end.

> WebKit2/UIProcess/Launcher/qt/ProcessLauncherQt.cpp:102
> +        static bool registered = false;

isRegistered would be slightly more WebKitish
Comment 19 Kimmo Kinnunen 2011-01-03 21:52:51 PST
(In reply to comment #17)
> (In reply to comment #16)
> > And let me clarify:
> > IHMO it is good to clean up the current implementation, if this is the only way to fix the buildbots.
> > 
> > However, IMO the underlying issue, should be fixed differently to this patch.
> 
> So you are for this patch as an intermediate step?

Sure, if it's too much work to fix the buildbots by other means.
Comment 20 Balazs Kelemen 2011-01-04 04:52:09 PST
Created attachment 77878 [details]
patch v5
Comment 21 Balazs Kelemen 2011-01-04 04:55:42 PST
> Maybe you can use qApp instead? just wondering...
> 
> > WebKit2/Shared/qt/CleanupHandler.h:63
> > +    static void sigTermHandler(int);
> 
> Is this one still needed?

Yes, to be able to force termination of the web process on UNIX where
QProcess::terminate sends SIGTERM.

All the other suggestions has been done in the new patch.
Comment 22 Early Warning System Bot 2011-01-04 05:02:41 PST
Attachment 77878 [details] did not build on qt:
Build output: http://queues.webkit.org/results/7297371
Comment 23 Balazs Kelemen 2011-01-04 05:08:18 PST
Created attachment 77879 [details]
patch v5 build-fixed
Comment 24 Balazs Kelemen 2011-01-04 06:01:58 PST
Comment on attachment 77879 [details]
patch v5 build-fixed

Clearing flags on attachment: 77879

Committed r74967: <http://trac.webkit.org/changeset/74967>
Comment 25 Balazs Kelemen 2011-01-04 06:02:07 PST
All reviewed patches have been landed.  Closing bug.