RESOLVED FIXED 107784
webdatabase: Refactor DatabaseContext lookup to be thread-safe
https://bugs.webkit.org/show_bug.cgi?id=107784
Summary webdatabase: Refactor DatabaseContext lookup to be thread-safe
Mark Lam
Reported 2013-01-23 22:01:57 PST
Here is the current state of how things work in the webdatabase module: 1. The existing code maps ScriptExecutionContexts to DatabaseContexts by implementing DatabaseContext as a Supplement to the ScriptExecutionContext's Supplementable. The Supplement mechanism is intended for access only within the thread that created it. 2. DatabaseTracker::interruptAllDatabasesForContext() uses the ScriptExecutionContext* as the identifier of databases to interrupt. However, it should be using the corresponding DatabaseContext*. This will become evident as the right thing to do when we implement a database process for webkit2. 3. DatabaseTracker::interruptAllDatabasesForContext() is called from the MainThread to interrupt WorkerThread databases. Later, when we change DatabaseTracker::interruptAllDatabasesForContext() to use the DatabaseContext (point 2) instead of the ScriptExecutionContext, we will need to get access the WorkerThread's SupplementMap (point 1) from the MainThread (point 3). This is inherently not thread-safe. We will fix this by changing DatabaseContext to not be a Supplement of ScriptExecutionContext. Instead, we'll manage the map from ScriptExecutionContext to DatabaseContext in the DatabaseManager, and guard it with a lock. This allows us to get the thread-safe access we need for mapping ScriptExecutionContexts to DatabaseContexts without changing the owner-thread only usage requirement of Supplement mechanism. About the life-cycle management of DatabaseContexts ================================== Previously, as a Supplement, a DatabaseContext will be destructed when the SupplementMap gets destructed during the ScriptExecutionContext's destruction. In this new scheme, we tie the life-cycle of the DatabaseContext to all the Databases associated with it. By definition, a Database should not out-live its DatabaseContext anyway. Here is the gory details: 1. There is a 1:1 association between a ScriptExecutionContext and its DatabaseContext. When the script accesses database functionality, a DatabaseContext will be created and associated with it. 2. The DatabaseManager's database factory will create a DatabaseContext before it can create a Database. If one already exists for the given ScriptExecutionContext, it will reuse that one. The DatabaseManager manages a context map that stores raw pointers to the DatabaseContext instances. 3. The created DatabaseContext is held in a RefPtr and is passed to the Database constructor which will adopt the ref in a RefPtr field. 4. If the database successfully opens its SQL backend, the Database's RefPtr field will keep the databaseContext alive. 5. If the database fails to open its SQL backend, both the database and databaseContext will not be ref'ed by anything when the DatabaseManager's database factory returns. Both will get destructed then. In addition to the above, the DatabaseContext constructor will register itself with the DatabaseManager's context map. Similarly, the DatabaseContext's destructor will unregister itself from the DatabaseManager's context map. The context map does not contain any RefPtrs to the DatabaseContexts. In summary, this is what happens: 1. If we successfully open a database, the database will keep the databaseContext alive. 2. If more than one database is opened by the same script, all those databases will ref the same databaseContext, thereby keeping it alive. 3. When the last database referencing the databaseContext is closed, the databaseContext will automatically destruct itself. One difference from the old life-cycle scheme: in the old scheme, once a databaseContext is created, it is retained until the owner document is destructed. In the new scheme, the databaseContext will be released once no more databases are referring to it i.e. when it is no longer needed. Note: the life-cycle scheme may change again later when we refactor the Database class to completely split between a front-end and a back-end. This scheme (above) is adopted for now to allow the existing database code to continue to work while the module gets refactored in parts.
Attachments
The patch. (24.99 KB, patch)
2013-01-23 22:11 PST, Mark Lam
webkit-ews: commit-queue-
Patch + fix for release builds. (25.08 KB, patch)
2013-01-23 22:56 PST, Mark Lam
webkit.review.bot: commit-queue-
The final patch. (27.54 KB, patch)
2013-01-25 00:27 PST, Mark Lam
no flags
the final final patch? (34.53 KB, patch)
2013-01-28 22:35 PST, Mark Lam
no flags
svn up'ed to fix patch apply problem. (34.47 KB, patch)
2013-01-29 00:15 PST, Mark Lam
ggaren: review-
buildbot: commit-queue-
Mark Lam
Comment 1 2013-01-23 22:11:04 PST
Created attachment 184400 [details] The patch. Uploading the patch for EWS bot testing. I'm not requesting a review yet because I still need to give a little thought to the life-cycle management of Database objects. This patch ties the life-cycle of DatabaseContexts to the life-cycle of Databases. This is the right thing to do. However, I need to do due diligence to make sure that the life-cycle of Databases are managed appropriately so that there is no leak.
Adam Barth
Comment 2 2013-01-23 22:14:00 PST
> We will fix this by changing DatabaseContext to not be a Supplement of ScriptExecutionContext. Instead, we'll manage the map from ScriptExecutionContext to DatabaseContext in the DatabaseManager, and guard it with a lock. This allows us to get the thread-safe access we need for mapping ScriptExecutionContexts to DatabaseContexts without changing the owner-thread only usage requirement of Supplement mechanism. Ok. What you've written sounds reasonable. I suspect that we could come up with a better design if we wanted to do more surgery, but that's likely a big project. I seem to remember that jochen and dcarney were working on a bug about the lifetime of databases. I've CCed them in case they have some insights.
Early Warning System Bot
Comment 3 2013-01-23 22:18:45 PST
WebKit Review Bot
Comment 4 2013-01-23 22:36:57 PST
Comment on attachment 184400 [details] The patch. Attachment 184400 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/16074591
Build Bot
Comment 5 2013-01-23 22:41:50 PST
Comment on attachment 184400 [details] The patch. Attachment 184400 [details] did not pass mac-wk2-ews (mac-wk2): Output: http://queues.webkit.org/results/16015034
Mark Lam
Comment 6 2013-01-23 22:56:00 PST
Created attachment 184408 [details] Patch + fix for release builds.
WebKit Review Bot
Comment 7 2013-01-24 03:33:04 PST
Comment on attachment 184408 [details] Patch + fix for release builds. Attachment 184408 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/16078671 New failing tests: fast/repaint/selection-clear.html
Mark Lam
Comment 8 2013-01-24 13:36:55 PST
Comment on attachment 184408 [details] Patch + fix for release builds. I think there is no implications with the DatabaseContext being tied to the Database life-cycle. The Database live-ness is determined by the JS garbage collector and destructs only when it is no longer needed. However, I'm obsoleting this patch for 2 reasons: 1. I discovered a bug in the handling of the DatabaseThread (owned by the DatabaseContext). The context needs to shut it down on destruction. I have a fix for this already. 2. There seems to be some regressions in the layout tests when run with a debug build. This doesn't seem to affect release builds. I need to investigate this a little more.
jochen
Comment 9 2013-01-24 13:55:18 PST
(In reply to comment #8) > (From update of attachment 184408 [details]) > I think there is no implications with the DatabaseContext being tied to the Database life-cycle. The Database live-ness is determined by the JS garbage collector and destructs only when it is no longer needed. The DatabaseThread has a RefPtr to all Databases that were opened during the page's lifetime. Even if JS drops the last reference to the Database, it won't be deleted as long as the DatabaseThread is alive :-/ See https://bugs.webkit.org/show_bug.cgi?id=68303 for a repro and a patch we proposed (but nobody liked). > > However, I'm obsoleting this patch for 2 reasons: > 1. I discovered a bug in the handling of the DatabaseThread (owned by the DatabaseContext). The context needs to shut it down on destruction. I have a fix for this already. > 2. There seems to be some regressions in the layout tests when run with a debug build. This doesn't seem to affect release builds. I need to investigate this a little more.
Mark Lam
Comment 10 2013-01-24 14:07:43 PST
(In reply to comment #9) > (In reply to comment #8) > > (From update of attachment 184408 [details] [details]) > > I think there is no implications with the DatabaseContext being tied to the Database life-cycle. The Database live-ness is determined by the JS garbage collector and destructs only when it is no longer needed. > > The DatabaseThread has a RefPtr to all Databases that were opened during the page's lifetime. Even if JS drops the last reference to the Database, it won't be deleted as long as the DatabaseThread is alive :-/ > > See https://bugs.webkit.org/show_bug.cgi?id=68303 for a repro and a patch we proposed (but nobody liked). Thanks for the tip. This issue will certainly get in the way of my implementation. I will investigate.
Mark Lam
Comment 11 2013-01-25 00:27:10 PST
Created attachment 184687 [details] The final patch. In addition to the work that has already been written up in the bug description, this final patch makes DatabaseContext a sub-class of ContextDestructionObserver. This allows it to be notified when the ScriptExecutionContext is destructed, and trigger its own shutdown as well as the shutdown of its DatabaseThread and close all the Database instances held by the thread. Previously, I thought that it is sufficient to depend on garbage collection of the Database instances to trigger the shutdown of the DatabaseContext and DatabaseThread. But Jochen pointed out that the Database instances is currently kept alive by the DatabaseThread itself (https://bugs.webkit.org/show_bug.cgi?id=68303). After thinking thru the issue some more, I decided that fixing 68303 is beyond the scope of this bug. I'm also not convinced yet that the solution proposed in the 68303 patch is the best way to go (I haven't had time to digest it properly yet). I'll leave that for a later time. Also, previously, I mentioned that I saw some regression in the layout tests with debug builds. They were actually not failures, but stderr diffs in the console out, but were consistently manifesting when I ran a debug build with this patch applied (in contrast with the baseline). It turned out that the issue there is due to https://bugs.webkit.org/show_bug.cgi?id=79013 which Adam provided a fix for. Once I incorporate Adam's fix, the stderr diffs went away completely. I should point out that with this patch, I also cleaned up the shutdown code of DatabaseContext and DatabaseThread. They are still only shutdown when their associated ScriptExecutionContext is destructed. However, they are now guaranteed to be cleaned up. I'm not completely sure, but there may have been a bug before where the DatabaseThread (and its databases) associated with the MainThread may be leaked if stopDatabases() wasn't called by someone external to the database system (because I couldn't find it in the code). I've changed the code so that the thread shutdown is done if it is not already triggered by someone else before the DatabaseContext is destructed. And I did some testing with some printfs to ensure that the shutdown did indeed occurred as expected with this patch. The patch is now ready for a review please. Thanks.
Adam Barth
Comment 12 2013-01-25 09:20:34 PST
> In addition to the work that has already been written up in the bug description, this final patch makes DatabaseContext a sub-class of ContextDestructionObserver. This allows it to be notified when the ScriptExecutionContext is destructed, and trigger its own shutdown as well as the shutdown of its DatabaseThread and close all the Database instances held by the thread. That's unlikely to be correct. Triggering shutdown via ContextDestructionObserver doesn't work if JavaScript can add a reference from some object retained by DatabaseContext to some object that retains ScriptExecutionContext. Instead, you'll probably need to use ActiveDOMObject, whose stop() notification is called at a predictable time.
Geoffrey Garen
Comment 13 2013-01-25 10:15:36 PST
Comment on attachment 184687 [details] The final patch. View in context: https://bugs.webkit.org/attachment.cgi?id=184687&action=review > Source/WebCore/Modules/webdatabase/AbstractDatabase.cpp:177 > +AbstractDatabase::AbstractDatabase(RefPtr<DatabaseContext>& databaseContext, const String& name, const String& expectedVersion, To pass a reference to an object, use PassRefPtr<T>, rather than RefPtr<T>&. This reduces refcount churn and is idiomatic. If you have a RefPtr<T> x, "x.release()" will set x to 0 and return a PassRefPtr<T>.
Mark Lam
Comment 14 2013-01-28 21:56:15 PST
Comment on attachment 184687 [details] The final patch. New patch with fixes coming soon.
Mark Lam
Comment 15 2013-01-28 22:35:28 PST
Created attachment 185158 [details] the final final patch? Addressed Adam's comment: DatabaseContext now implements ActiveDOMObject as opposed to ContextDestructionObserver. Addressed Geoff's comment: The DatabaseContext ref is now being passed as a PassRefPtr instead of a RefPtr&. Also fixed some race condition bugs pertaining the shutdown process of the DatabaseContext. These were encountered while teasing the changes. The shutdown code is now homogenized. The triggers sources for shutdown are: 1. the destructor: i.e. when all the Databases in this context has been GC'ed, and the ScriptExecutionContext may or may not be still alive. 2. ActiveDOMObject::stop(): called for shutting down ActiveDOMObjects before the ScriptExecutionContext goes away. The ScriptExecutionContext is still alive. 3. ContextDestructionObserver ::contextDestroyed(): this needed in case the ScriptExecutionContext dies before the DatabaseContext (which is kept alive by Database objects that yet to be GC'ed). We need to detach from the ScriptExecutionContext here. To be robust, we also shutdown the databases here in case the client did not call ActiveDOMObject::stop() before hand. 4. stopDatabases(): can be called by the client directly e.g. when frame loading gets aborted. While we could just leave the DatabaseContext and its databases around until the ScriptExecutionContext dies, I left this in there so that we can clean up the databases sooner. All of these paths now go thru / calls stopDatabases() to initiate orderly shutdown.
Mark Lam
Comment 16 2013-01-28 22:38:03 PST
Comment on attachment 185158 [details] the final final patch? May I have a review please? Thanks.
Mark Lam
Comment 17 2013-01-29 00:15:07 PST
Created attachment 185182 [details] svn up'ed to fix patch apply problem.
Build Bot
Comment 18 2013-01-29 07:04:50 PST
Comment on attachment 185182 [details] svn up'ed to fix patch apply problem. Attachment 185182 [details] did not pass mac-wk2-ews (mac-wk2): Output: http://queues.webkit.org/results/16151213 New failing tests: fast/workers/worker-lifecycle.html
Mark Lam
Comment 19 2013-01-29 09:33:55 PST
(In reply to comment #18) > (From update of attachment 185182 [details]) > Attachment 185182 [details] did not pass mac-wk2-ews (mac-wk2): > Output: http://queues.webkit.org/results/16151213 > > New failing tests: > fast/workers/worker-lifecycle.html fast/workers/worker-lifecycle.html also fails in the baseline before this patch is applied. So, it is not an issue with this patch.
Geoffrey Garen
Comment 20 2013-01-29 11:12:53 PST
Comment on attachment 185182 [details] svn up'ed to fix patch apply problem. View in context: https://bugs.webkit.org/attachment.cgi?id=185182&action=review r- because I think DatabaseContext::contextDestroyed() is a bug. The rest of my comments here would be good to resolve before landing, but don't stand in the way of an r+. > Source/WebCore/Modules/webdatabase/DatabaseContext.cpp:55 > + DatabaseManager::manager().notifyDatabaseContextConstructed(); I prefer for notifications to specify whether the event in question happened already, or is about to happen. In this case, I think we want to notify when construction has happened already, and when destruction is about to happen. Other options risk our client dereferencing an object before or after its valid state. I'd suggest moving this call to below the initializing call to suspendIfNeeded(), since this object is not fully constructed until after that call. Then, I would rename the notifications to "didConstructDatabaseContext" and "willDestructDatabaseContext". > Source/WebCore/Modules/webdatabase/DatabaseContext.cpp:65 > + // Make sure that the DatabaseThread is stopped before we destruct. > + stopDatabases(); The comment here duplicates what the code says almost verbatim, increasing total reading time for these two lines by about 3X. I think you should remove it. > Source/WebCore/Modules/webdatabase/DatabaseContext.cpp:70 > + // This is because the Databases that are managed by DatabaseThread still > + // relies on this ref between the context and the thread to execute the Grammar: "relies" should be "rely". > Source/WebCore/Modules/webdatabase/DatabaseContext.cpp:77 > if (m_databaseThread) { > ASSERT(m_databaseThread->terminationRequested()); > m_databaseThread = 0; > } This code is a no-op. Clearing is the default behavior of the RefPtr destructor. I would suggest this instead: stopDatabases(); ASSERT(!m_databaseThread || m_databaseThread->terminationRequested()); Your comment about why stopDatabases() can't clear our m_databaseThread reference should move into stopDatabases(). > Source/WebCore/Modules/webdatabase/DatabaseContext.cpp:95 > + // However, we're here because the destructor hasn't been called, and the > + // ScriptExecutionContext we're associated with is about to be destructed. I don't understand this comment or code. m_scriptExecutionContext is a RefPtr, so it should be impossible for this function to be called while m_scriptExecutionContext is non-NULL. What am I missing? > Source/WebCore/Modules/webdatabase/DatabaseManager.h:65 > + PassRefPtr<DatabaseContext> getDatabaseContext(ScriptExecutionContext*); We use the "get" prefix to indicate an out parameter. I'd rename this to "databaseContext" or "databaseContextFor". > Source/WebCore/Modules/webdatabase/DatabaseManager.h:127 > + PassRefPtr<DatabaseContext> getExistingDatabaseContext(ScriptExecutionContext*); We use the "get" prefix to indicate an out parameter. I'd rename this to "existingDatabaseContext" or "existingDatabaseContextFor". > Source/WebCore/Modules/webdatabase/DatabaseManager.h:129 > + typedef HashMap<ScriptExecutionContext*, DatabaseContext*, PtrHash<ScriptExecutionContext*> > ContextMap; PtrHash is the default hash for pointers, and doesn't need to be specified. > Source/WebCore/Modules/webdatabase/DatabaseManager.h:139 > +#if !ASSERT_DISABLED > + int m_databaseContextRegisteredCount; > + int m_databaseContextInstanceCount; > +#endif You should comment that these data members require locking m_contextMapLock. > Source/WebCore/Modules/webdatabase/DatabaseThread.cpp:58 > + // care of ensuring that a termination request has been issue. The Grammar: "issue" should be "issued". > Source/WebCore/Modules/webdatabase/DatabaseThread.cpp:60 > + // termination request will trigger an orderly shutdown of the thread, and > + // then deref itself. And then what will deref itself? The request, or the thread? > Source/WebCore/dom/ActiveDOMObject.cpp:60 > + // m_scriptExecutionContext would/should have been nullify by Grammar: "nullify" should be "nullified".
Mark Lam
Comment 21 2013-01-29 13:49:08 PST
Comment on attachment 185182 [details] svn up'ed to fix patch apply problem. View in context: https://bugs.webkit.org/attachment.cgi?id=185182&action=review >> Source/WebCore/Modules/webdatabase/DatabaseContext.cpp:55 >> + DatabaseManager::manager().notifyDatabaseContextConstructed(); > > I prefer for notifications to specify whether the event in question happened already, or is about to happen. > > In this case, I think we want to notify when construction has happened already, and when destruction is about to happen. Other options risk our client dereferencing an object before or after its valid state. > > I'd suggest moving this call to below the initializing call to suspendIfNeeded(), since this object is not fully constructed until after that call. Then, I would rename the notifications to "didConstructDatabaseContext" and "willDestructDatabaseContext". Fixed. >> Source/WebCore/Modules/webdatabase/DatabaseContext.cpp:65 >> + stopDatabases(); > > The comment here duplicates what the code says almost verbatim, increasing total reading time for these two lines by about 3X. I think you should remove it. Removed. >> Source/WebCore/Modules/webdatabase/DatabaseContext.cpp:70 >> + // relies on this ref between the context and the thread to execute the > > Grammar: "relies" should be "rely". Fixed. >> Source/WebCore/Modules/webdatabase/DatabaseContext.cpp:77 >> } > > This code is a no-op. Clearing is the default behavior of the RefPtr destructor. I would suggest this instead: > > stopDatabases(); > ASSERT(!m_databaseThread || m_databaseThread->terminationRequested()); > > Your comment about why stopDatabases() can't clear our m_databaseThread reference should move into stopDatabases(). Good point. Thanks for pointing that out. This code was inherited, but I'll delete it. The comment is moved to stopDatabases(). >> Source/WebCore/Modules/webdatabase/DatabaseContext.cpp:95 >> + // ScriptExecutionContext we're associated with is about to be destructed. > > I don't understand this comment or code. m_scriptExecutionContext is a RefPtr, so it should be impossible for this function to be called while m_scriptExecutionContext is non-NULL. What am I missing? m_scriptExecutionContext came from ContextDestructionObserver and is just a raw pointer, not a RefPtr. The whole purpose of contextDestroyed() is to inform the ContextDestructionObserver (which is the DatabaseContext in this case) that the ScriptExecutionContext is going down. Since the DatabaseContext can live on pass the ScriptExecutionContext, we need to detach from it. >> Source/WebCore/Modules/webdatabase/DatabaseManager.h:65 >> + PassRefPtr<DatabaseContext> getDatabaseContext(ScriptExecutionContext*); > > We use the "get" prefix to indicate an out parameter. I'd rename this to "databaseContext" or "databaseContextFor". OK. Renamed to databaseContextFor. >> Source/WebCore/Modules/webdatabase/DatabaseManager.h:127 >> + PassRefPtr<DatabaseContext> getExistingDatabaseContext(ScriptExecutionContext*); > > We use the "get" prefix to indicate an out parameter. I'd rename this to "existingDatabaseContext" or "existingDatabaseContextFor". Renamed to existingDatabaseContextFor. >> Source/WebCore/Modules/webdatabase/DatabaseManager.h:129 >> + typedef HashMap<ScriptExecutionContext*, DatabaseContext*, PtrHash<ScriptExecutionContext*> > ContextMap; > > PtrHash is the default hash for pointers, and doesn't need to be specified. Fixed. >> Source/WebCore/Modules/webdatabase/DatabaseManager.h:139 >> +#endif > > You should comment that these data members require locking m_contextMapLock. Grouped the fields together and added the comment. >> Source/WebCore/Modules/webdatabase/DatabaseThread.cpp:58 >> + // care of ensuring that a termination request has been issue. The > > Grammar: "issue" should be "issued". Typo. Fixed. >> Source/WebCore/Modules/webdatabase/DatabaseThread.cpp:60 >> + // then deref itself. > > And then what will deref itself? The request, or the thread? The databaseThread() thread function will deref the DatabaseThread object before returning. I conflate the function and the object as one because they work together. I've re-written the comment to be more clear about this. >> Source/WebCore/dom/ActiveDOMObject.cpp:60 >> + // m_scriptExecutionContext would/should have been nullify by > > Grammar: "nullify" should be "nullified". Thanks. Typo again. Fixed.
Mark Lam
Comment 22 2013-01-29 14:54:17 PST
Mark Lam
Comment 24 2013-01-30 09:31:46 PST
David Levin
Comment 25 2013-01-30 10:03:35 PST
(In reply to comment #24) > (In reply to comment #23) > > r141166 seems to have introduced some thread related assert failures on the Linux debug test bot. > > > > http://build.chromium.org/p/chromium.webkit/builders/WebKit%20Linux%20%28dbg%29/builds/5080 > > http://build.chromium.org/f/chromium/layout_test_results/WebKit_Linux__dbg_/results/layout-test-results/fast/workers/storage/change-version-sync-crash-log.txt > > ASSERTION FAILED: m_verifier.isSafeToUse() > > The bug for this is https://bugs.webkit.org/show_bug.cgi?id=108285. Note that this indicates a thread safety issue. This likely is an issue for the OSX port as well. It is just that the OSX port decided to turn off this assert everywhere so it doesn't show up.
Mark Lam
Comment 26 2013-01-30 10:05:21 PST
(In reply to comment #25) > Note that this indicates a thread safety issue. This likely is an issue for the OSX port as well. It is just that the OSX port decided to turn off this assert everywhere so it doesn't show up. I'm working on it.
Note You need to log in before you can comment on or make changes to this bug.