Bug 41105

Summary: fast/workers/storage/execute-sql-args-worker.html crashed on Leopard Commit Bot
Product: WebKit Reporter: Eric Seidel (no email) <eric>
Component: WebCore Misc.Assignee: Dumitru Daniliuc <dumi>
Status: RESOLVED FIXED    
Severity: Normal CC: dimich, dumi, ericu, michaeln
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: PC   
OS: OS X 10.5   
Attachments:
Description Flags
crash report from commit bot
none
patch
dumi: commit-queue-
patch eric: review+, dumi: commit-queue-

Description Eric Seidel (no email) 2010-06-23 14:52:27 PDT
fast/workers/storage/execute-sql-args-worker.html crashed on Leopard Commit Bot

I'll attach the crash report.

Exception Type:  EXC_BAD_ACCESS (SIGBUS)
Exception Codes: KERN_PROTECTION_FAILURE at 0x0000000000000034
Crashed Thread:  7

Thread 7 Crashed:
0   com.apple.WebCore                   0x021b762c WebCore::OriginUsageRecord::diskUsage() + 12 (OriginUsageRecord.cpp:75)
1   com.apple.WebCore                   0x01ae69ea WebCore::DatabaseTracker::getMaxSizeForDatabase(WebCore::AbstractDatabase const*) + 90 (DatabaseTracker
.cpp:230)
2   com.apple.WebCore                   0x01adb13a WebCore::Database::maximumSize() const + 26 (Database.cpp:471)
3   com.apple.WebCore                   0x02342d48 WebCore::SQLTransaction::openTransactionAndPreflight() + 536 (RefPtr.h:66)
4   com.apple.WebCore                   0x023416ba WebCore::SQLTransaction::performNextStep() + 42 (SQLTransaction.cpp:190)
5   com.apple.WebCore                   0x01ae3515 WebCore::DatabaseTransactionTask::doPerformTask() + 21 (DatabaseTask.cpp:146)
6   com.apple.WebCore                   0x01ae344d WebCore::DatabaseTask::performTask() + 29 (DatabaseTask.cpp:81)
7   com.apple.WebCore                   0x01ae413b WebCore::DatabaseThread::databaseThread() + 763 (DatabaseThread.cpp:100)
8   libSystem.B.dylib                   0x979a9155 _pthread_start + 321
9   libSystem.B.dylib                   0x979a9012 thread_start + 34
Comment 2 Eric Seidel (no email) 2010-06-23 14:54:13 PDT
Created attachment 59565 [details]
crash report from commit bot
Comment 3 Dumitru Daniliuc 2010-06-23 15:35:34 PDT
http://developer.apple.com/mac/library/technotes/tn2004/tn2123.html: "EXC_BAD_ACCESS/KERN_PROTECTION_FAILURE — This is caused by the thread trying to write to read-only memory. This is always caused by a data access."

Also, OriginUsageRecord:75 is: if (m_cachedDiskUsageIsValid), and m_cachedDiskUsageIsValid is just a bool field.

Some forums suggest that this might be caused by bad RAM.

In any case, I've never seen this crash before, and I really don't see how checking the value of a bool variable could've caused it. I think we should treat it as flakiness, unless it happens again.
Comment 4 Eric Seidel (no email) 2010-06-23 15:39:31 PDT
Exception Codes: KERN_PROTECTION_FAILURE at 0x0000000000000034

Means something is trying to grab at an offset 34 bytes from NULL.  Meaning that either OriginUsageRecord (or whatever object owns the OriginUsageRecord) is NULL.  Or rather something is tryihng to grab at that object from a NULL pointer.
Comment 5 Eric Seidel (no email) 2010-06-23 15:39:50 PDT
If the bug isn't useful, we can close it.  Just pointing out that your diagnosis of "bad ram" is wrong. :)
Comment 6 Eric Seidel (no email) 2010-06-23 15:42:57 PDT
Looks like:

unsigned long long OriginQuotaManager::diskUsage(SecurityOrigin* origin) const
126	{
127	    ASSERT(m_usageRecordGuardLocked);
128	
129	    OriginUsageRecord* usageRecord = m_usageMap.get(origin);
130	    ASSERT(usageRecord);
131	
132	    return usageRecord->diskUsage();
133	}

Got back a NULL OriginUsageRecord* from the m_usageMap.  The question is why.  How can that happen? :)  That's the reason it crashed.
Comment 7 Dumitru Daniliuc 2010-06-23 17:43:29 PDT
I think I found a scenario that would lead to this crash.

Short version: race condition that can happen only when using workers.

Long version:
In order to open a database, Database::openDatabase() calls (among other things) DatabaseTracker::canEstablishDatabase() and then DatabaseTracker::addOpenDatabase(). canEstablishDatabase() makes sure that the OriginQuotaManager instance used by DatabaseTracker starts tracking the given origin, but does not tell the DatabaseTracker that there's a new open database. addOpenDatabase() does the opposite: it tells DatabaseTracker() that a new database was opened, but it doesn't make sure that OriginQuotaManager is tracking this origin.

On the other hand, when a database is closed, it calls DatabaseTracker::removeOpenDatabase(), which tells DatabaseTracker to stop keeping track of this DB. Also, if this happens to be the last DB in this context that DatabaseTracker knows about, then it tells OriginQuotaManager to stop tracking this security origin too.

Now in the non-worker case, this is fine, because all these calls happen on the same context thread, so there's no race. When we have multiple workers though, the context threads are different for each worker, but they still share the same DatabaseTracker instance. So here's what I think happens:

1. Worker #1 runs DB test N.
2. OriginQuotaManager starts tracking this origin.
3. Worker #1 wants to shut down. This posts a task to the DatabaseThread of Worker #1, which will make sure all DBs are closed, all references are released, etc.
4. Meanwhile, Worker #2 starts running test (N+1), in the same security origin.
5. Worker #2 calls DatabaseTracker::canEstablishDatabase(). Nothing interesting happens, because OriginQuotaManager is still tracking this origin.
6. Worker #1 finally closes the DB used by test N, and calls DatabaseTracker::removeOpenDatabase(). At this point, DatabaseTracker doesn't know yet about the DB that will be opened in test (N+1), so as far as it's concerned, the last DB in this origin is being closed. Therefore, DatabaseTracker tells OriginQuotaManager to stop tracking this origin.
7. Worker #2 calls DatabaseTracker::addOpenDatabase(). DatabaseTracker is happy to start keeping track of this new DB, but doesn't tell OriginQuotaManager to re-start keeping track of this origin.
8. Now worker #2 gets to a transaction. Before the transaction can run, we need to know how much space is available to this origin. Since OriginQuotaManager is not tracking this origin, a NULL OriginUsageRecord is returned.


Solution: I think the best solution would be to call OriginQuotaManager::trackOrigin() whenever we call DatabaseTracker::addOpenDatabase().


I'm not sure if this is the problem that led to this crash, but it looks to me like it would lead to the same stack track, and I think it's worth fixing anyway.
Comment 8 Dumitru Daniliuc 2010-06-23 18:04:39 PDT
Created attachment 59592 [details]
patch
Comment 9 Eric U. 2010-06-28 13:49:18 PDT
Comment on attachment 59592 [details]
patch

> Index: WebCore/ChangeLog
> ===================================================================
> --- WebCore/ChangeLog	(revision 61722)
> +++ WebCore/ChangeLog	(working copy)
> @@ -1,3 +1,13 @@
> +2010-06-23  Dumitru Daniliuc  <dumi@chromium.org>
> +
> +        Reviewed by NOBODY (OOPS!).
> +
> +        Fix a race condition that can happen when using DBs in workers.
> +        https://bugs.webkit.org/show_bug.cgi?id=41105
> +
> +        * storage/DatabaseTracker.cpp:
> +        (WebCore::DatabaseTracker::addOpenDatabase):
> +
>  2010-06-23  Kwang Yul Seo  <skyul@company100.net>
>  
>          Reviewed by Kent Tamura.
> Index: WebCore/storage/DatabaseTracker.cpp
> ===================================================================
> --- WebCore/storage/DatabaseTracker.cpp	(revision 61722)
> +++ WebCore/storage/DatabaseTracker.cpp	(working copy)
> @@ -508,6 +508,12 @@ void DatabaseTracker::addOpenDatabase(Ab
>          LOG(StorageAPI, "Added open Database %s (%p)\n", database->stringIdentifier().ascii().data(), database);
>      }
>  
> +    {
> +        Locker<OriginQuotaManager> quotaManagerLocker(originQuotaManager());
> +        if (!originQuotaManager().tracksOrigin(database->securityOrigin()))
> +            originQuotaManager().trackOrigin(database->securityOrigin());
> +    }
> +
>      MutexLocker lockDatabase(m_databaseGuard);
>      doneCreatingDatabase(database->securityOrigin(), database->stringIdentifier());
>  }

WebCore/storage/DatabaseTracker.cpp:515
 +      }
I think that still doesn't work.
The deleting thread could just drop the openDatabaseMapLock in removeOpenDatabase, then wait there for a bit.  Meanwhile we hit and pass your new code.  Then the deleting thread locks OQM and removes the origin we just checked on, and we're back where we started.

One approach which I think would work would be to define originQuotaManager's lock as lockable under m_openDatabaseMapGuard in the DatabaseTracker locking hierarchy, so that you could hold the mapGuard through the whole thing on both sides..  I believe that would be safe, although it would restrict such changes in the future.
Comment 10 Dumitru Daniliuc 2010-06-28 14:51:43 PDT
Created attachment 59946 [details]
patch

Should be fixed now.
Comment 11 Eric U. 2010-06-28 15:01:22 PDT
Comment on attachment 59946 [details]
patch

> Index: WebCore/ChangeLog
> ===================================================================
> --- WebCore/ChangeLog	(revision 62036)
> +++ WebCore/ChangeLog	(working copy)
> @@ -1,3 +1,14 @@
> +2010-06-28  Dumitru Daniliuc  <dumi@chromium.org>
> +
> +        Reviewed by NOBODY (OOPS!).
> +
> +        Fix a race condition that can happen when using DBs in workers.
> +        https://bugs.webkit.org/show_bug.cgi?id=41105
> +
> +        * storage/DatabaseTracker.cpp:
> +        (WebCore::DatabaseTracker::addOpenDatabase):
> +        * storage/DatabaseTracker.h:
> +
>  2010-06-28  Beth Dakin  <bdakin@apple.com>
>  
>          Reviewed by Sam Weinig.
> Index: WebCore/storage/DatabaseTracker.cpp
> ===================================================================
> --- WebCore/storage/DatabaseTracker.cpp	(revision 62036)
> +++ WebCore/storage/DatabaseTracker.cpp	(working copy)
> @@ -506,6 +506,10 @@ void DatabaseTracker::addOpenDatabase(Ab
>          databaseSet->add(database);
>  
>          LOG(StorageAPI, "Added open Database %s (%p)\n", database->stringIdentifier().ascii().data(), database);
> +
> +        Locker<OriginQuotaManager> quotaManagerLocker(originQuotaManager());
> +        if (!originQuotaManager().tracksOrigin(database->securityOrigin()))
> +            originQuotaManager().trackOrigin(database->securityOrigin());
>      }
>  
>      MutexLocker lockDatabase(m_databaseGuard);
> @@ -553,10 +557,10 @@ void DatabaseTracker::removeOpenDatabase
>  
>          m_openDatabaseMap->remove(database->securityOrigin());
>          delete nameMap;
> -    }
>  
> -    Locker<OriginQuotaManager> quotaManagerLocker(originQuotaManager());
> -    originQuotaManager().removeOrigin(database->securityOrigin());
> +        Locker<OriginQuotaManager> quotaManagerLocker(originQuotaManager());
> +        originQuotaManager().removeOrigin(database->securityOrigin());
> +    }
>  }
>  
>  void DatabaseTracker::getOpenDatabases(SecurityOrigin* origin, const String& name, HashSet<RefPtr<AbstractDatabase> >* databases)
> Index: WebCore/storage/DatabaseTracker.h
> ===================================================================
> --- WebCore/storage/DatabaseTracker.h	(revision 62036)
> +++ WebCore/storage/DatabaseTracker.h	(working copy)
> @@ -63,7 +63,7 @@ public:
>      // This singleton will potentially be used from multiple worker threads and the page's context thread simultaneously.  To keep this safe, it's
>      // currently using 4 locks.  In order to avoid deadlock when taking multiple locks, you must take them in the correct order:
>      // m_databaseGuard before quotaManager if both locks are needed.
> -    // no other lock is taken in the code locked on m_openDatabaseMapGuard.
> +    // m_openDatabaseMapGuard before quotaManager if both locks are needed.
>      // notificationMutex() is currently independent of the other locks.
>  
>      bool canEstablishDatabase(ScriptExecutionContext*, const String& name, const String& displayName, unsigned long estimatedSize);

WebCore/storage/DatabaseTracker.h:66
 +      // m_openDatabaseMapGuard before quotaManager if both locks are needed.
Please point out that m_databaseGuard and m_openDatabaseMapGuard currently don't overlap at all.

WebCore/storage/DatabaseTracker.cpp:512
 +              originQuotaManager().trackOrigin(database->securityOrigin());
LGTM.
Comment 12 Dumitru Daniliuc 2010-06-28 15:02:38 PDT
> WebCore/storage/DatabaseTracker.h:66
>  +      // m_openDatabaseMapGuard before quotaManager if both locks are needed.
> Please point out that m_databaseGuard and m_openDatabaseMapGuard currently don't overlap at all.

Will do on landing. Thanks for the review!
Comment 13 Eric Seidel (no email) 2010-06-28 15:05:54 PDT
Comment on attachment 59946 [details]
patch

I don't fully understand this, but I trust eric's blessing.
Comment 14 Eric U. 2010-06-28 15:07:43 PDT
(In reply to comment #13)
> (From update of attachment 59946 [details])
> I don't fully understand this, but I trust eric's blessing.

Awesome.  I'm gonna get my reviewer bit before I get my committer bit ;'>.
Comment 15 Dumitru Daniliuc 2010-06-28 15:36:33 PDT
Landed: r62043.