30954 – Possible race condition in Database.cpp

RESOLVED WONTFIX30954

Possible race condition in Database.cpp

https://bugs.webkit.org/show_bug.cgi?id=30954

Summary Possible race condition in Database.cpp

Xan Lopez

Reported 2009-10-30 07:02:31 PDT

While debugging some ASSERTions happening randomly in the storage code in the GTK port, I found the following with valgrind/helgrind: ==7268== Possible data race during write of size 1 at 0xa5c20a9 by thread #1 ==7268== at 0x47D4983: WebCore::Database::stop() (Database.cpp:361) ==7268== by 0x4357DAD: WebCore::Document::stopDatabases() (Document.cpp:4406) ==7268== by 0x455316C: WebCore::FrameLoader::stopLoading(WebCore::UnloadEventPolicy, WebCore::DatabasePolicy) (FrameLoader.cpp:554) ==7268== by 0x455331F: WebCore::FrameLoader::closeURL() (FrameLoader.cpp:585) ==7268== by 0x455BDC0: WebCore::FrameLoader::transitionToCommitted(WTF::PassRefPtr<WebCore::CachedPage>) (FrameLoader.cpp:2465) ==7268== by 0x455B86F: WebCore::FrameLoader::commitProvisionalLoad(WTF::PassRefPtr<WebCore::CachedPage>) (FrameLoader.cpp:2391) ==7268== by 0x453EAD1: WebCore::DocumentLoader::commitIfReady() (DocumentLoader.cpp:320) ==7268== by 0x453EB0A: WebCore::DocumentLoader::finishedLoading() (DocumentLoader.cpp:327) ==7268== by 0x455CA6D: WebCore::FrameLoader::finishedLoading() (FrameLoader.cpp:2697) ==7268== by 0x456CEA1: WebCore::MainResourceLoader::didFinishLoading() (MainResourceLoader.cpp:393) ==7268== by 0x456C660: WebCore::MainResourceLoader::continueAfterContentPolicy(WebCore::PolicyAction, WebCore::ResourceResponse const&) (MainResourceLoader.cpp:264) ==7268== by 0x456C777: WebCore::MainResourceLoader::continueAfterContentPolicy(WebCore::PolicyAction) (MainResourceLoader.cpp:278) ==7268== This conflicts with a previous read of size 1 by thread #3 ==7268== at 0x47F7BC4: WebCore::Database::stopped() const (Database.h:110) ==7268== by 0x47F5E71: WebCore::SQLTransaction::checkAndHandleClosedDatabase() (SQLTransaction.cpp:151) ==7268== by 0x47F60CD: WebCore::SQLTransaction::performNextStep() (SQLTransaction.cpp:182) ==7268== by 0x47DE215: WebCore::DatabaseTransactionTask::doPerformTask() (DatabaseTask.cpp:145) ==7268== by 0x47DDC7B: WebCore::DatabaseTask::performTask() (DatabaseTask.cpp:56) ==7268== by 0x47DEDB5: WebCore::DatabaseThread::databaseThread() (DatabaseThread.cpp:99) ==7268== by 0x47DED02: WebCore::DatabaseThread::databaseThreadStart(void*) (DatabaseThread.cpp:82) ==7268== by 0x41044B5: WTF::threadEntryPoint(void*) (Threading.cpp:64) Basically there's the possibility of the main thread being busy stopping all current transactions, while the database thread wants to check if they are stopped to do something else. This all happens without any kind of lock, so there's a real possibility of things going the wrong way I think. Not sure what would be the best fix here though, since I'm not very familiar with the code.

Attachments
patch: put m_stopped behind a lock (1.78 KB, patch) 2010-03-31 18:53 PDT, Dumitru Daniliuc	levin: review- dumi: commit-queue-	Details Formatted Diff Diff
View All Add attachment proposed patch, testcase, etc.

Xan Lopez

Comment 1 2009-11-02 03:28:26 PST

There's another possible race condition spotted by valgrind/helgrind running the same test (the test is storage/open-database-while-transaction-in-progress.html, btw): ==18197== Possible data race during write of size 4 at 0xa643cb4 by thread #1 ==18197== at 0x47F7607: WebCore::SQLTransaction::deliverSuccessCallback() (SQLTransaction.cpp:482) ==18197== by 0x47F6351: WebCore::SQLTransaction::performPendingCallback() (SQLTransaction.cpp:204) ==18197== by 0x47D63EA: WebCore::Database::deliverPendingCallback(void*) (Database.cpp:624) ==18197== by 0x41015EF: WTF::dispatchFunctionsFromMainThread() (MainThread.cpp:94) ==18197== by 0x41072FF: WTF::timeoutFired(void*) (MainThreadGtk.cpp:43) ==18197== by 0x7998D22: g_timeout_dispatch (gmain.c:3396) ==18197== by 0x7995EB9: g_main_dispatch (gmain.c:1960) ==18197== by 0x7997116: g_main_context_dispatch (gmain.c:2513) ==18197== by 0x7997536: g_main_context_iterate (gmain.c:2591) ==18197== by 0x7997C42: g_main_loop_run (gmain.c:2799) ==18197== by 0x7324C59: gtk_main (gtkmain.c:1218) ==18197== by 0x80509B1: runTest(std::string const&) (DumpRenderTree.cpp:488) ==18197== This conflicts with a previous read of size 4 by thread #3 ==18197== at 0x47F61A7: WebCore::SQLTransaction::performNextStep() (SQLTransaction.cpp:188) ==18197== by 0x47DE2A5: WebCore::DatabaseTransactionTask::doPerformTask() (DatabaseTask.cpp:145) ==18197== by 0x47DDD0B: WebCore::DatabaseTask::performTask() (DatabaseTask.cpp:56) ==18197== by 0x47DEE45: WebCore::DatabaseThread::databaseThread() (DatabaseThread.cpp:99) ==18197== by 0x47DED92: WebCore::DatabaseThread::databaseThreadStart(void*) (DatabaseThread.cpp:82) ==18197== by 0x41044B5: WTF::threadEntryPoint(void*) (Threading.cpp:64) ==18197== by 0x4009B47: mythread_wrapper (hg_intercepts.c:201) ==18197== by 0x56F51E: start_thread (pthread_create.c:297) From reading the code a bit it would seem that in most places the lock (m_lockAcquired) is supposed to be held when running any step of a transaction. In some cases, though, it's not explicitly enforced, and in this one it seems to be completely missing.

Dumitru Daniliuc

Comment 2 2010-03-31 18:53:10 PDT

Created attachment 52234 [details] patch: put m_stopped behind a lock

WebKit Review Bot

Comment 3 2010-03-31 18:57:21 PDT

Attachment 52234 [details] did not pass style-queue: Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1 WebCore/storage/Database.h:117: More than one command on the same line [whitespace/newline] [4] Total errors found: 1 in 3 files If any of these errors are false positives, please file a bug against check-webkit-style.

Dmitry Titov

Comment 4 2010-04-01 12:55:20 PDT

I'm not sure I understand how having a lock around assigning a bool may prevent anything. I see how it prevents the valgrind from complaining, but if there is an issue with setting/checking this bool on different threads, then adding a lock in this manner doesn't seem to fix that. The bool can be flipped by another thread right after the check anyways. Do I miss something?

Dumitru Daniliuc

Comment 5 2010-04-01 13:08:50 PDT

(In reply to comment #4) > I'm not sure I understand how having a lock around assigning a bool may prevent > anything. I see how it prevents the valgrind from complaining, but if there is > an issue with setting/checking this bool on different threads, then adding a > lock in this manner doesn't seem to fix that. The bool can be flipped by > another thread right after the check anyways. > Do I miss something? This flag is basically used to interrupt transactions. The database thread checks it before running any transaction step, and if it's set to 'true', then we roll back the transaction. So it doesn't matter when the main thread flips this bool -- we'll just keep executing transactions until we notice it's set to 'true'. This patch doesn't change any of that. It just makes sure that access to a variable used on multiple threads is properly synchronized, which should make valgrind happy.

Alexey Proskuryakov

Comment 6 2010-04-01 13:20:09 PDT

Can you use atomicIncrement() for this?

Dumitru Daniliuc

Comment 7 2010-04-01 14:58:17 PDT

(In reply to comment #6) > Can you use atomicIncrement() for this? I can, but I was told that valgrind doesn't know about atomic operations and will keep complaining about this problem. Also, there's not a single atomic operation in the entire DB code. Should I still use atomicIncrement()? I'm happy to go with whatever solution is best, just want to make sure that all opinions are taken into account.

Dmitry Titov

Comment 8 2010-04-01 15:13:38 PDT

I would rather not add any synchronization primitives if the functionality does not require it, and I take from your explanation that the code does not have a race, actually - it's just the valgrind complains that a variable not protected by a lock is used by 2 threads. Valgrind as a tool shoudl have some ways to exclude the particular false positive. It would be better to do that then add a lock that is not needed. Or is it actually needed? What ASSERT did you refer to?

David Levin

Comment 9 2010-04-01 16:05:22 PDT

Comment on attachment 52234 [details] patch: put m_stopped behind a lock It seems to have been noted by other folks that this doesn't really solve anything except to make an arbitrary tool happy by slowing things down and using more memory. If there is some core issue, then this definitely doesn't fix it. If there isn't an issue, then there perhaps there should be a comment in the code to explain why this is ok.

Dumitru Daniliuc

Comment 10 2010-04-01 16:44:21 PDT

OK, I guess I'm going to close this issue then. m_stopped: Doesn't need to be locked. It's just a flag used to signal that a particular database needs to be closed (either because the document is about to go away, or because we need to delete that database). The main thread and the DB thread check this flag at the beginning of every transaction step, and immediately roll it back if m_stopped is set to 'true'. m_lockAcquired: It is a flag used to make sure that no transaction gets to run any step on the DB thread without going through the SQLTransactionCoordinator first. We assert that m_lockAcquired == true in all SQLTransaction methods that run on the DB thread. However, there's no need to do that for the steps that run on the main thread, because the main thread can safely handle multiple transactions on different handles to the same database at the same time (even though in practice m_lockAcquired == true during those steps too: it gets set to 'true' at the beginning of a transaction, and is reset to 'false' after committing/rolling back the transaction).

David Levin

Comment 11 2010-04-01 17:05:02 PDT

(In reply to comment #10) > OK, I guess I'm going to close this issue then. > How is the race with reading and setting "m_nextStep" (that is mentioned in comment #2, ok?

Dumitru Daniliuc

Comment 12 2010-04-01 17:09:12 PDT

(In reply to comment #11) > (In reply to comment #10) > > OK, I guess I'm going to close this issue then. > > > > How is the race with reading and setting "m_nextStep" (that is mentioned in > comment #2, ok? There's no race there. At every point in time the transaction is either running on the main thread, or the DB thread (or none): it is impossible for the main thread and the DB thread to run code in the same SQLTransaction object at the same time.

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution WONTFIX

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware PC

OS OS X 10.5

Product WebKit

Component New Bugs

Assignee

Dumitru Daniliuc

Reported

2009-10-30 07:02 PDT

Modified

2010-04-01 17:09 PDT History

CC List

7 users Show

URL

Keywords

Depends on

Blocks