Bug 166029 - [SOUP] SoupCookieJar is never released (resulting in sqlite temp files lying around)
Summary: [SOUP] SoupCookieJar is never released (resulting in sqlite temp files lying ...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKitGTK (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Michael Catanzaro
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-12-19 11:47 PST by Allison Lortie (desrt)
Modified: 2017-07-10 06:21 PDT (History)
9 users (show)

See Also:


Attachments
Patch (6.30 KB, patch)
2017-07-02 08:55 PDT, Michael Catanzaro
no flags Details | Formatted Diff | Diff
Alternative patch (4.08 KB, patch)
2017-07-10 05:36 PDT, Carlos Garcia Campos
mcatanzaro: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Allison Lortie (desrt) 2016-12-19 11:47:27 PST
When running epiphany, the SoupCookieJar opens the cookies.sqlite file, creating two temporary files for journalling.  Normally, on sqlite cleanup, these files are committed to the main file and deleted.

libsoup does this properly when the SoupCookieJar is freed.

Unfortunately, that never happens because of the global singleton WebKitWebContext (or in the case of ephy, its own EphyShell).

See https://bugzilla.gnome.org/show_bug.cgi?id=776287 for the original report.
Comment 1 Michael Catanzaro 2017-01-12 12:27:50 PST
(In reply to comment #0)
> When running epiphany, the SoupCookieJar opens the cookies.sqlite file,
> creating two temporary files for journalling.  Normally, on sqlite cleanup,
> these files are committed to the main file and deleted.
> 
> libsoup does this properly when the SoupCookieJar is freed.
> 
> Unfortunately, that never happens because of the global singleton
> WebKitWebContext (or in the case of ephy, its own EphyShell).

Something else must be wrong, because Ephy does not use the default global web context, it uses its own, and I was wrong: Ephy DOES delete its own web context.
Comment 2 Michael Catanzaro 2017-01-15 05:39:29 PST
I just found:

https://bugzilla.gnome.org/show_bug.cgi?id=774011

which I reported not so long ago, but forgot about.

There I discovered that we have the following code in WebKit in SoupCookieJar.cpp:

void deleteAllCookies(const NetworkStorageSession& session)
{
    SoupCookieJar* cookieJar = cookieJarForSession(session);
    GUniquePtr<GSList> cookies(soup_cookie_jar_all_cookies(cookieJar));
    for (GSList* item = cookies.get(); item; item = g_slist_next(item)) {
        SoupCookie* cookie = static_cast<SoupCookie*>(item->data);
        soup_cookie_jar_delete_cookie(cookieJar, cookie);
        soup_cookie_free(cookie);
    }
}

Note that we need to free the cookie jar there too, else the delete all cookies feature will not remove the .sqlite-wal and .sqlite-shm files.
Comment 3 Michael Catanzaro 2017-01-15 11:16:30 PST
The problem (after the recent cookie storage refactor in trunk) is that NetworkStorageSession::defaultSession is never destroyed:

static std::unique_ptr<NetworkStorageSession>& defaultSession()
{
    ASSERT(isMainThread());
    static NeverDestroyed<std::unique_ptr<NetworkStorageSession>> session;
    return session;
}

NetworkStorageSession& NetworkStorageSession::defaultStorageSession()
{
    if (!defaultSession())
        defaultSession() = std::make_unique<NetworkStorageSession>(SessionID::defaultSessionID(), nullptr);
    return *defaultSession();
}

This is a problem we have again, and again, and again (e.g. bug #163504 and bug #164303) and just don't know how to solve. It's a best-practices clash between C++ exit time destructors, C++ smart pointers, and GObject. In C++ it is an antipattern to ever destroy a static object, since exit-time destructors are a perennial source of crashes, so we use NeverDestroyed to ensure we leak these objects and to indicate that it is intentional. But, repeatedly, we find that the correctness of our code depends on destroying some GObject that we store in a GRefPtr inside a NeverDestroyed object, ensuring that the GObject never gets unreffed when we really need it to.

It's not even really a problem specific to GObject. If a big object is declared with NeverDestroyed, that means the destructors of all its member variables, and the members' members, and so on, never run. Nobody is likely to check a big tree of data members to make sure the destructors are not doing anything that really needs to happen at exit time.

The only clear options are to (a) get rid of the use of NeverDestroyed and hope for the best, or (b) set up an exit handler to manually unref the GObject. In general, both are poor choices. In general, GObjects assume that they will always be freed and may rightfully do important stuff in dispose and finalize, so we really shouldn't be leaking them.
Comment 4 Michael Catanzaro 2017-02-11 06:32:26 PST
Although bug #168126 is related and could certainly exacerbate this issue, note that fixing it will do nothing to solve this. We need to actually destroy the default NetworkStorageSession somewhere.
Comment 5 Dan Jacobson 2017-06-29 16:42:55 PDT
I had the dream that I could import a lot of cookies into epiphany.
I would examine the cookies.sqlite structures, and figure out what sqlite commands I needed to add them in.
However, no matter how I close epiphany, the cookies are always have newer shm and wal files.
Comment 6 Dan Jacobson 2017-06-29 16:48:33 PDT
cookies.sqlite always has newer shm and wal files.
OK today I will research:
Find out if there is some tool I can use, that after closing epiphany, I can
then give it the three cookies.* files, and it will delete the other two, and just leave an updated cookies.sqlite.
Comment 7 Dan Jacobson 2017-06-29 17:41:23 PDT
OK, I found that
$ sqlite3 cookies.sqlite .dump
apparently treats all three files as a whole, GOOD.

Also I found that sometimes
~/.config/epiphany/cookies.sqlite
is the only file of the three there, the shm and wal are gone.

But apparently this state is not achieved by any special way of closing
epiphany, but probably epiphany reaching some checkpointing level...

All I know is after every few {open browser, browse some stuff, close
browser} sessions, a single cookies.sqlite file remained. Usually not,
but sometimes I was lucky.
Comment 8 Michael Catanzaro 2017-06-29 20:43:43 PDT
Dan, this bug is to delete those temp files on exit. It's not an appropriate place to discuss anything else.
Comment 9 Michael Catanzaro 2017-06-29 20:46:47 PDT
By the way, Carlos, I have no clue what to do here. We need to really rethink how we handle static objects. See comment #3.
Comment 10 Dan Jacobson 2017-06-30 08:57:31 PDT
(In reply to Michael Catanzaro from comment #8)
Yes. I'm saying that this bug
Blocks: [many (other peoples') projects]
that one might have never imagined.
Comment 11 Dan Jacobson 2017-07-01 19:14:43 PDT
A workaround I discovered is
1. Close your browser.
2. $ sqlite3 cookies.db .dump > /dev/null
http://sqlite.1065341.n5.nabble.com/Doc-bug-wal-html-should-mention-proper-way-to-remove-wal-and-shm-files-td96459.html
This will clean up the shm and wal (until the next time you open your browser.)
Comment 12 Michael Catanzaro 2017-07-01 21:15:23 PDT
This implementation of NetworkStorageSession::defaultStorageSession fixes the bug:

NetworkStorageSession& NetworkStorageSession::defaultStorageSession()
{
    if (!defaultSession()) {
        defaultSession() = std::make_unique<NetworkStorageSession>(SessionID::defaultSessionID(), nullptr);
        std::atexit([] {
            // Ensure cookie storage is closed cleanly even though the default session is NeverDestroyed.
            g_object_unref(defaultSession()->m_cookieStorage.get());
            g_object_unref(defaultSession()->m_cookieStorage.get());
            g_object_unref(defaultSession()->m_cookieStorage.leakRef());
        });
    }
    return *defaultSession();
}

It requires one unref for the SoupCookieJar owned by the default NeverDestroyed NetworkStorageSession, another unref because it's also owned by the SoupSession owned by the default NeverDestroyed SoupNetworkSession, and a third unref because it's also owned by... I dunno, something else, haven't figured that out.

The first thing I would do to simplify this is have the NetworkStorageSession access its SoupCookieJar via the SoupNetworkSession instead of keeping its own reference.

Unfortunately we know we for sure cannot destroy the default SoupSession, because that has caused nasty exit handler crashes in the past. So if we want to destroy the SoupCookieJar, then we have no choice but to unref it from somewhere it is not owned, to balance the ref owned by the default SoupSession that we know will never be destroyed.

Alternatively, we could try to sqlite3_close() the database manually. This would require exposing the private SQLiteDatabase object owned by WebKitSoupCookieJarSqlite (not desirable) or else adding some way to tell WebKitSoupCookieJarSqlite to close its database even though it's not being destroyed. This seems like the best approach.
Comment 13 Michael Catanzaro 2017-07-02 08:55:52 PDT
Created attachment 314418 [details]
Patch
Comment 14 Carlos Garcia Campos 2017-07-10 05:26:10 PDT
(In reply to Michael Catanzaro from comment #12)
> This implementation of NetworkStorageSession::defaultStorageSession fixes
> the bug:
> 
> NetworkStorageSession& NetworkStorageSession::defaultStorageSession()
> {
>     if (!defaultSession()) {
>         defaultSession() =
> std::make_unique<NetworkStorageSession>(SessionID::defaultSessionID(),
> nullptr);
>         std::atexit([] {
>             // Ensure cookie storage is closed cleanly even though the
> default session is NeverDestroyed.
>             g_object_unref(defaultSession()->m_cookieStorage.get());
>             g_object_unref(defaultSession()->m_cookieStorage.get());
>             g_object_unref(defaultSession()->m_cookieStorage.leakRef());
>         });
>     }
>     return *defaultSession();
> }
> 
> It requires one unref for the SoupCookieJar owned by the default
> NeverDestroyed NetworkStorageSession, another unref because it's also owned
> by the SoupSession owned by the default NeverDestroyed SoupNetworkSession,
> and a third unref because it's also owned by... I dunno, something else,
> haven't figured that out.

SoupSession actually takes two references, one in the list of features and another one in the default implementation of SoupSessionFeature::attach. Both are removed in soup_session_remove_feature, called in dispose. So, I think we could try again to actually clear the soup session, that will properly lcear the cookie jar too.

> The first thing I would do to simplify this is have the
> NetworkStorageSession access its SoupCookieJar via the SoupNetworkSession
> instead of keeping its own reference.
> 
> Unfortunately we know we for sure cannot destroy the default SoupSession,
> because that has caused nasty exit handler crashes in the past. So if we
> want to destroy the SoupCookieJar, then we have no choice but to unref it
> from somewhere it is not owned, to balance the ref owned by the default
> SoupSession that we know will never be destroyed.

Well, the crashes happened when the soup session was destroyed in exit handlers. We never figured out why, but it might have to do with the fact that we don't really know in which order global static variables are removed. Also note that the crash was not in WebKit, not in libsoup, but in gbnutls, so maybe newer versions of gnutls don't have that problem. In any case, I think we can try to clear the session from ChildProcessMainBase::platformFinalize(), which is called right after the run loop quits, before any exit handler, so I think it should be safe. The only problem is in case the network process finishes with an exit() call instead of quiting the main loop. If we see that's happening (it doesn't seem to happen here right now) we can then try a hack like what you propose below.

> Alternatively, we could try to sqlite3_close() the database manually. This
> would require exposing the private SQLiteDatabase object owned by
> WebKitSoupCookieJarSqlite (not desirable) or else adding some way to tell
> WebKitSoupCookieJarSqlite to close its database even though it's not being
> destroyed. This seems like the best approach.

I prefer to avoid this, unless there's no other cleaner option.
Comment 15 Carlos Garcia Campos 2017-07-10 05:27:41 PDT
Comment on attachment 314418 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=314418&action=review

> Source/WebKit2/ChangeLog:37
> +        Also note that we might be able to get rid of WebKitSoupCookieJarSqlite, since it seems to
> +        use exactly the same database format as the upstream SoupCookieJarDb class. I suspect
> +        WebKitSoupCookieJarSqlite exists only because SoupCookieJarDb is relatively new, and the
> +        older SoupCookieJarSqlite class existed only in libsoup-gnome rather than libsoup proper.
> +        The advantage of continuing to use our WebKitSoupCookieJarSqlite class, besides that we
> +        already know it works, is that it utilizes the same SQLiteDatabase class that is used
> +        elsewhere in WebKit.

SoupCookieJarDb is part of libsoup-gnome and we depend on libsoup, not libsoup-gnome. That's why we added WebKitSoupCookieJarSqlite long time ago.
Comment 16 Carlos Garcia Campos 2017-07-10 05:36:16 PDT
Created attachment 314975 [details]
Alternative patch

Could you try this alternative patch?
Comment 17 Michael Catanzaro 2017-07-10 06:07:31 PDT
Comment on attachment 314975 [details]
Alternative patch

OK, but please don't backport this since it's not unlikely it will introduce crashes.
Comment 18 Michael Catanzaro 2017-07-10 06:09:21 PDT
(In reply to Carlos Garcia Campos from comment #15)
> SoupCookieJarDb is part of libsoup-gnome and we depend on libsoup, not
> libsoup-gnome. That's why we added WebKitSoupCookieJarSqlite long time ago.

SoupCookieJarSqlite is part of libsoup-gnome. SoupCookieJarDb is in libsoup, new in 2.42, which is now our min required version. The implementation was copied from SoupCookieJarSqlite to SoupCookieJarDb. Maybe we can start using SoupCookieJarDb.
Comment 19 Carlos Garcia Campos 2017-07-10 06:17:30 PDT
(In reply to Michael Catanzaro from comment #18)
> (In reply to Carlos Garcia Campos from comment #15)
> > SoupCookieJarDb is part of libsoup-gnome and we depend on libsoup, not
> > libsoup-gnome. That's why we added WebKitSoupCookieJarSqlite long time ago.
> 
> SoupCookieJarSqlite is part of libsoup-gnome. SoupCookieJarDb is in libsoup,
> new in 2.42, which is now our min required version. The implementation was
> copied from SoupCookieJarSqlite to SoupCookieJarDb. Maybe we can start using
> SoupCookieJarDb.

Ah, I didn't know that, then there's no a reason to use our own copy anymore.
Comment 20 Carlos Garcia Campos 2017-07-10 06:21:46 PDT
Committed r219290: <http://trac.webkit.org/changeset/219290>