Bug 135218

Summary: IDB transactions never reset if the Web Process ends before cleaning up
Product: WebKit Reporter: Vicki Pfau <jeffrey+webkit>
Component: WebKit2Assignee: Vicki Pfau <jeffrey+webkit>
Status: RESOLVED FIXED    
Severity: Normal CC: alecflett, beidson, commit-queue, jsbell
Priority: P2 Keywords: InRadar
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 135389    
Bug Blocks:    
Attachments:
Description Flags
Patch
none
Patch
darin: review+
Patch for testing (not yet for review)
none
Patch for testing v2 - Much better
none
Mega patch v1 ddkilzer: review+

Description Vicki Pfau 2014-07-23 16:17:35 PDT
With the Database Process enabled, IDB transactions are almost entirely managed by the Web Process, even though the backing store is handled by the Database Process. In the event that the Web Process ends before it cleans up an IDB transaction, either due to a bookkeeping error, or by crashing, the Database Process will never clean up the transactions that were being managed by the Web Process. This leads to memory leaks at best, or the Database Process getting wedged at worst.

<rdar://problem/17356958>
Comment 1 Vicki Pfau 2014-07-23 16:25:47 PDT
Created attachment 235388 [details]
Patch
Comment 2 Darin Adler 2014-07-23 17:51:03 PDT
Comment on attachment 235388 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=235388&action=review

r=me but this has some really inefficient idioms that involve copying whole hash tables multiple times, and could easily be made more efficient.

> Source/WebKit2/DatabaseProcess/DatabaseToWebProcessConnection.cpp:77
> -
> +    // The WebProcess has disconnected, close all of the connections associated with it
> +    IDBConnectionMap idbConnections = m_idbConnections;
> +    for (auto serverConnectionIdentifier : idbConnections)
> +        removeDatabaseProcessIDBConnection(serverConnectionIdentifier.key);

I suggest auto& rather than auto for the identifiers to avoid unnecessary copying of each key/value pair. And iterating keys() so we can get at the key without ".key()" each time.

But copying the entire map just to iterate the keys is not efficient. Instead, if we can’t iterate the map in place, we should copy the keys into a vector with copyKeysToVector and iterate the vector.

> Source/WebKit2/DatabaseProcess/IndexedDB/UniqueIDBDatabase.cpp:720
> +        HashSet<IDBIdentifier> transactions = m_establishedTransactions.get(&transactionIdentifier.connection());
> +        transactions.add(transactionIdentifier);
> +        m_establishedTransactions.set(&transactionIdentifier.connection(), transactions);

This can be done much more efficiently with add rather than get/set. Copying the set out of the map and back into the map every time is not good for performance, even if we added std::move to the set call to eliminate one of the copies. Instead we should do an add, passing an empty set, which will find the existing set. Then we call add on the set we found inside the map.

> Source/WebKit2/DatabaseProcess/IndexedDB/UniqueIDBDatabase.cpp:755
> +        HashSet<IDBIdentifier> transactions = m_establishedTransactions.get(&transactionIdentifier.connection());
> +        transactions.remove(transactionIdentifier);
> +        m_establishedTransactions.set(&transactionIdentifier.connection(), transactions);

Same comment as with add above.

> Source/WebKit2/DatabaseProcess/IndexedDB/UniqueIDBDatabase.cpp:1089
> +    if (!m_establishedTransactions.contains(&connection))
> +        return;
> +
> +    HashSet<IDBIdentifier> transactions = m_establishedTransactions.get(&connection);

Should not do a contains followed by an get, since that’s two hash table lookups.

Also, if we can’t iterate the transactions in place inside the hash table, we should copy out the identifiers into a vector rather than copying the set.

> Source/WebKit2/DatabaseProcess/IndexedDB/UniqueIDBDatabase.cpp:1091
> +    for (auto transactionIdentifier : transactions) {

I suggest auto& instead of auto.
Comment 3 Vicki Pfau 2014-07-23 18:04:03 PDT
(In reply to comment #2)
> (From update of attachment 235388 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=235388&action=review
> 
> r=me but this has some really inefficient idioms that involve copying whole hash tables multiple times, and could easily be made more efficient.
> 
> > Source/WebKit2/DatabaseProcess/DatabaseToWebProcessConnection.cpp:77
> > -
> > +    // The WebProcess has disconnected, close all of the connections associated with it
> > +    IDBConnectionMap idbConnections = m_idbConnections;
> > +    for (auto serverConnectionIdentifier : idbConnections)
> > +        removeDatabaseProcessIDBConnection(serverConnectionIdentifier.key);
> 
> I suggest auto& rather than auto for the identifiers to avoid unnecessary copying of each key/value pair. And iterating keys() so we can get at the key without ".key()" each time.
> 
> But copying the entire map just to iterate the keys is not efficient. Instead, if we can’t iterate the map in place, we should copy the keys into a vector with copyKeysToVector and iterate the vector.

The map cannot be iterated in place because it is changed by the function it calls. The suggestions for fixes make sense, although it should also be possible to do a "while (!m_idbConnection.isEmpty())", as the original map will be empty when the loop ends.

> > Source/WebKit2/DatabaseProcess/IndexedDB/UniqueIDBDatabase.cpp:720
> > +        HashSet<IDBIdentifier> transactions = m_establishedTransactions.get(&transactionIdentifier.connection());
> > +        transactions.add(transactionIdentifier);
> > +        m_establishedTransactions.set(&transactionIdentifier.connection(), transactions);
> 
> This can be done much more efficiently with add rather than get/set. Copying the set out of the map and back into the map every time is not good for performance, even if we added std::move to the set call to eliminate one of the copies. Instead we should do an add, passing an empty set, which will find the existing set. Then we call add on the set we found inside the map.

I'm not sure I follow how this idiom is supposed to work.

> > Source/WebKit2/DatabaseProcess/IndexedDB/UniqueIDBDatabase.cpp:755
> > +        HashSet<IDBIdentifier> transactions = m_establishedTransactions.get(&transactionIdentifier.connection());
> > +        transactions.remove(transactionIdentifier);
> > +        m_establishedTransactions.set(&transactionIdentifier.connection(), transactions);
> 
> Same comment as with add above.
> 
> > Source/WebKit2/DatabaseProcess/IndexedDB/UniqueIDBDatabase.cpp:1089
> > +    if (!m_establishedTransactions.contains(&connection))
> > +        return;
> > +
> > +    HashSet<IDBIdentifier> transactions = m_establishedTransactions.get(&connection);
> 
> Should not do a contains followed by an get, since that’s two hash table lookups.

There appears to be a bug in HashSet that will cause an assertion failure if the HashMap does not contain the key. It appears to create and then copy an empty HashSet, and the copying of the HashSet does not like the fact that it's empty, hence the check beforehand. I can remove the check, but it was causing the assertion to fail occasionally.

> Also, if we can’t iterate the transactions in place inside the hash table, we should copy out the identifiers into a vector rather than copying the set.

I believe we can iterate this in place, but I'm not 100% sure. I'll have to take a closer look.

> > Source/WebKit2/DatabaseProcess/IndexedDB/UniqueIDBDatabase.cpp:1091
> > +    for (auto transactionIdentifier : transactions) {
> 
> I suggest auto& instead of auto.

Can do.
Comment 4 Vicki Pfau 2014-07-25 19:31:31 PDT
Created attachment 235557 [details]
Patch
Comment 5 Darin Adler 2014-07-25 21:18:39 PDT
Comment on attachment 235557 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=235557&action=review

> Source/WebKit2/DatabaseProcess/IndexedDB/UniqueIDBDatabase.cpp:1084
> +    auto transactions = m_establishedTransactions.add(&transactionIdentifier.connection(), HashSet<IDBIdentifier>());

It’s a little strange to call this result of the add function “transactions”. I normally call such things “addResult”.

> Source/WebKit2/DatabaseProcess/IndexedDB/UniqueIDBDatabase.cpp:1095
> +    auto transactions = m_establishedTransactions.add(&transactionIdentifier.connection(), HashSet<IDBIdentifier>());
> +    transactions.iterator->value.remove(transactionIdentifier);

Here we could use find instead of add, since we don’t need to add anything if it’s empty:

    auto slot = m_establishedTransactions.find(&transactionIdentifier.connection());
    if (slot == m_establishedTransactions.end())
        return;
    slot->value.remove(transactionIdentifier);

It will be a little lower overhead than doing the add, avoiding the creation of the temporary HashSet.

We could even assert that it is not equal to end if we are sure that won’t happen.

> Source/WebKit2/DatabaseProcess/IndexedDB/UniqueIDBDatabase.cpp:1101
> +    auto transactionIdentifiers = m_establishedTransactions.find(&connection);

It’s a little peculiar to call the iterator “identifiers”. I normally call such things “slot” or sometimes “iterator”, or maybe “map entry”.

> Source/WebKit2/DatabaseProcess/IndexedDB/UniqueIDBDatabase.cpp:1105
> +    for (auto transactionIdentifier : transactionIdentifiers.get()->value) {

I am surprised the get() is needed here.

I also think we’d get something slightly more efficient if we used auto& instead of auto.
Comment 6 Vicki Pfau 2014-07-28 15:12:07 PDT
(In reply to comment #5)
> (From update of attachment 235557 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=235557&action=review
> 
> > Source/WebKit2/DatabaseProcess/IndexedDB/UniqueIDBDatabase.cpp:1084
> > +    auto transactions = m_establishedTransactions.add(&transactionIdentifier.connection(), HashSet<IDBIdentifier>());
> 
> It’s a little strange to call this result of the add function “transactions”. I normally call such things “addResult”.
> 
> > Source/WebKit2/DatabaseProcess/IndexedDB/UniqueIDBDatabase.cpp:1095
> > +    auto transactions = m_establishedTransactions.add(&transactionIdentifier.connection(), HashSet<IDBIdentifier>());
> > +    transactions.iterator->value.remove(transactionIdentifier);

I much prefer to name variables after what they contain over what the function of that specific variable is.

> Here we could use find instead of add, since we don’t need to add anything if it’s empty:
> 
>     auto slot = m_establishedTransactions.find(&transactionIdentifier.connection());
>     if (slot == m_establishedTransactions.end())
>         return;
>     slot->value.remove(transactionIdentifier);
> 
> It will be a little lower overhead than doing the add, avoiding the creation of the temporary HashSet.
> 
> We could even assert that it is not equal to end if we are sure that won’t happen.

I'm not 100% certain that we won't get into a state where it's completely legal for this to be equal, but otherwise this change makes sense.

> > Source/WebKit2/DatabaseProcess/IndexedDB/UniqueIDBDatabase.cpp:1101
> > +    auto transactionIdentifiers = m_establishedTransactions.find(&connection);
> 
> It’s a little peculiar to call the iterator “identifiers”. I normally call such things “slot” or sometimes “iterator”, or maybe “map entry”.

Ditto, although I should probably keep the names consistent.
Comment 7 Vicki Pfau 2014-07-28 15:27:32 PDT
Committed r171704: <http://trac.webkit.org/changeset/171704>
Comment 8 Alexey Proskuryakov 2014-07-28 17:13:41 PDT
storage/indexeddb/intversion-revert-on-abort.html has started to fail, looks like it was because of this patch.
Comment 9 Alexey Proskuryakov 2014-07-29 09:40:35 PDT
And storage/indexeddb/version-change-abort.html too.
Comment 10 WebKit Commit Bot 2014-07-29 10:21:27 PDT
Re-opened since this is blocked by bug 135389
Comment 11 Brady Eidson 2014-08-06 14:10:33 PDT
Based on Jeffrey's feedback and analysis of what's causing this, we're theorizing that making the reset and rollback messages from the WebProcess synchronous will fix the remaining issues here.

Attaching his original patch with changes that do just that.
Comment 12 Brady Eidson 2014-08-06 14:10:59 PDT
Created attachment 236133 [details]
Patch for testing (not yet for review)
Comment 13 Brady Eidson 2014-08-06 14:14:05 PDT
Ugh, patch not quite ready, never mind.
Comment 14 Brady Eidson 2014-08-06 14:28:45 PDT
Created attachment 236136 [details]
Patch for testing v2 - Much better
Comment 15 Brady Eidson 2014-08-06 14:38:02 PDT
(In reply to comment #14)
> Created an attachment (id=236136) [details]
> Patch for testing v2 - Much better

I'm stress testing the layout tests with this patch.

Regarding the two version change tests that were failing (at least flakily):

With just the patch that landed in r171704 these tests would easily flake (sometimes a little, sometimes a LOT) if you ran them for about 100 iterations.

With the most recent patch I attached I'm running them for 20000 iterations, am about 10000 iterations in, and am not seeing any failures.

Once I'm done with that run, I will stress test the entire IDB test suite with this patch.
Comment 16 Brady Eidson 2014-08-06 16:29:20 PDT
Jeffrey has tested and shown that the original bug remains squashed.

With Jeffrey's original patch, I can easily reproduce the flakey layout tests.  With this patch, I've hammered them and not seen a failure.

Putting up for review.

Darin's original review on Jeffrey's original patch stands - None of that has changed at all.

What needs review is the WebCore changes and everything relating to resetTransactionSync/rollbackTransactionSync in WK2.
Comment 17 Brady Eidson 2014-08-06 16:29:51 PDT
Created attachment 236149 [details]
Mega patch v1
Comment 18 WebKit Commit Bot 2014-08-06 16:31:10 PDT
Attachment 236149 [details] did not pass style-queue:


ERROR: Source/WebKit2/DatabaseProcess/IndexedDB/DatabaseProcessIDBConnection.h:35:  Alphabetical sorting problem.  [build/include_order] [4]
ERROR: Source/WebKit2/DatabaseProcess/IndexedDB/DatabaseProcessIDBConnection.h:59:  The parameter name "connection" adds no information, so it should be removed.  [readability/parameter_name] [5]
ERROR: Source/WebKit2/DatabaseProcess/IndexedDB/DatabaseProcessIDBConnection.h:59:  The parameter name "decoder" adds no information, so it should be removed.  [readability/parameter_name] [5]
Total errors found: 3 in 14 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 19 David Kilzer (:ddkilzer) 2014-08-06 16:41:57 PDT
Comment on attachment 236149 [details]
Mega patch v1

View in context: https://bugs.webkit.org/attachment.cgi?id=236149&action=review

r=me on the new parts not already reviewed by Darin.

> Source/WebCore/Modules/indexeddb/IDBServerConnection.h:75
>      virtual void resetTransaction(int64_t transactionID, std::function<void()> completionCallback) = 0;
> +    virtual bool resetTransactionSync(int64_t transactionID) = 0;
>      virtual void rollbackTransaction(int64_t transactionID, std::function<void()> completionCallback) = 0;
> +    virtual bool rollbackTransactionSync(int64_t transactionID) = 0;

We could put "#if USE(LEVELDB)/#endif" around the non-Sync code since it's not only used on the LevelDB code path.

> Source/WebCore/Modules/indexeddb/IDBTransactionBackend.cpp:140
> -        m_database->serverConnection().rollbackTransaction(m_id, []() { });
> +        m_database->serverConnection().resetTransactionSync(m_id);

This should be rollbackTransactionSync().
Comment 20 Brady Eidson 2014-08-06 17:24:25 PDT
(In reply to comment #19)
> (From update of attachment 236149 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=236149&action=review
> 
> r=me on the new parts not already reviewed by Darin.
> 
> > Source/WebCore/Modules/indexeddb/IDBServerConnection.h:75
> >      virtual void resetTransaction(int64_t transactionID, std::function<void()> completionCallback) = 0;
> > +    virtual bool resetTransactionSync(int64_t transactionID) = 0;
> >      virtual void rollbackTransaction(int64_t transactionID, std::function<void()> completionCallback) = 0;
> > +    virtual bool rollbackTransactionSync(int64_t transactionID) = 0;
> 
> We could put "#if USE(LEVELDB)/#endif" around the non-Sync code since it's not only used on the LevelDB code path.

I started to do this but it immediately became way too invasive.

I'll leave it alone for now, and we can just nuke all the LEVELDB code in one huge swath, when it's the right time.
Comment 21 Brady Eidson 2014-08-06 17:34:44 PDT
http://trac.webkit.org/changeset/172193