Bug 155506

Summary: Database process crashes deleting a corrupt SQLite database file (null deref)
Product: WebKit Reporter: Ryan Haddad <ryanhaddad>
Component: WebCore Misc.Assignee: Brady Eidson <beidson>
Status: RESOLVED FIXED    
Severity: Normal CC: achristensen, alecflett, ap, beidson, commit-queue, jsbell, sam
Priority: P2    
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugs.webkit.org/show_bug.cgi?id=159371
https://bugs.webkit.org/show_bug.cgi?id=160780
Bug Depends on:    
Bug Blocks: 154968    
Attachments:
Description Flags
Crash log
none
Patch
none
Patch none

Comment 1 Alexey Proskuryakov 2016-03-16 20:41:33 PDT
Brady, does this sound like a real bug?
Comment 2 Brady Eidson 2016-03-16 21:11:49 PDT
(In reply to comment #1)
> Brady, does this sound like a real bug?

In all the failure stdio's: "Child process terminated with signal 15: Terminated"

It's definitely unexpected that one of the processes involved would terminate during the test.

That said, I'm more inclined to suspect it is something iOS-specific or simulator-specific about TestWebKitAPI, versus something about IDB code.

I'm truly out of my element with the iOS testing infrastructure, especially TestWebKitAPI. I'd appreciate help by somebody much more in tune with it taking a look.
Comment 3 Brady Eidson 2016-03-16 21:13:04 PDT
(In reply to comment #2)
> (In reply to comment #1)
> > Brady, does this sound like a real bug?
> 
> In all the failure stdio's: "Child process terminated with signal 15:
> Terminated"
> 
> It's definitely unexpected that one of the processes involved would
> terminate during the test.
> 
> That said, I'm more inclined to suspect it is something iOS-specific or
> simulator-specific about TestWebKitAPI, versus something about IDB code.
> 
> I'm truly out of my element with the iOS testing infrastructure, especially
> TestWebKitAPI. I'd appreciate help by somebody much more in tune with it
> taking a look.

One thing that comes to mind:

Very few of the API tests actually spawn multiple web processes...  Are the others that do use multiple web processes enabled on iOS-sim?
Comment 4 Alexey Proskuryakov 2016-03-16 22:30:06 PDT
> In all the failure stdio's: "Child process terminated with signal 15: Terminated"

This just means that run-webkit-tests killed it after 30 seconds.
Comment 5 Brady Eidson 2016-03-17 09:09:48 PDT
(In reply to comment #4)
> > In all the failure stdio's: "Child process terminated with signal 15: Terminated"
> 
> This just means that run-webkit-tests killed it after 30 seconds.

AFAIK run-webkit-tests doesn't run the API tests.

Do you mean run-api-tests?

"it" - which "it"? The test will never complete if two processes don't launch and run their parts of the tests, or if they can't successfully message back their results.

Again, I'm curious as to whether multi-process support is working in TWKA on iOS-sim.
Comment 6 Alexey Proskuryakov 2016-03-17 09:45:31 PDT
> Do you mean run-api-tests?

Yes.

> "it" - which "it"? The test will never complete if two processes don't launch and run their parts of the tests, or if they can't successfully message back their results.

run-api-tests script killed TestWebKitAPI process, which hasn't terminated on its own within 30 seconds that it had to execute this particular test.

> Again, I'm curious as to whether multi-process support is working in TWKA on iOS-sim.

I'm not really sure what this question means. It's just a regular WebKit2 API client, and shouldn't be doing anything hidden that can affect WebKit2.
Comment 7 Alexey Proskuryakov 2016-04-07 17:21:17 PDT
Happens on Mac too.
Comment 8 Ryan Haddad 2016-06-24 12:26:53 PDT
This test is now timing out on almost every run of El Capitan and Yosemite release WK1

https://build.webkit.org/builders/Apple%20El%20Capitan%20Release%20WK1%20(Tests)/builds/7303

https://build.webkit.org/builders/Apple%20Yosemite%20Release%20WK1%20(Tests)/builds/15739
Comment 9 Brady Eidson 2016-06-24 13:07:16 PDT
(In reply to comment #8)
> This test is now timing out on almost every run of El Capitan and Yosemite
> release WK1

This is an API test that tests WK2 API.

I'm not sure what it means that it's failing on WK1.
Comment 10 Alexey Proskuryakov 2016-06-24 14:55:08 PDT
We run all the same API tests on WK1 and WK2 bots. Only makes it more confusing why some of the machines started failing so much.
Comment 11 Alexey Proskuryakov 2016-06-30 14:31:40 PDT
Created attachment 282466 [details]
Crash log

What happens here is that the Databases process simply crashes.

That means that we have several bugs here:

1. Whatever the proximate cause of crashing is.

2. WebKit keeps waiting despite a Databases process crash, apparently never detecting the condition.

3. IndexedDB state is not cleaned up between TestWebKitAPI runs.
Comment 12 Alexey Proskuryakov 2016-06-30 14:42:07 PDT
I can reproduce 100% locally if I copy ~/Library/WebKit/TestWebKitAPI/WebsiteData/IndexedDB content from the bot. Brady, I'll e-mail it to you.
Comment 13 Brady Eidson 2016-07-01 13:56:13 PDT
Yup, instant repro.

Crash the database process.

#0	0x00000001150bda5f in std::__1::__atomic_base<unsigned char, false>::compare_exchange_weak(unsigned char&, unsigned char, std::__1::memory_order) [inlined] at /Applications/Xcode.app/Contents/Developer/Toolchains/OSX10.11.xctoolchain/usr/bin/../include/c++/v1/atomic:879
#1	0x00000001150bd92d in WTF::Atomic<unsigned char>::compareExchangeWeak(unsigned char, unsigned char, std::__1::memory_order) at usr/local/include/wtf/Atomics.h:67
#2	0x00000001150bd8c5 in WTF::LockBase::lock() at usr/local/include/wtf/Lock.h:51
#3	0x00000001151c34ed in WTF::Locker<WTF::LockBase>::lock() at usr/local/include/wtf/Locker.h:68
#4	0x00000001151c34b3 in WTF::Locker<WTF::LockBase>::Locker(WTF::LockBase&) at usr/local/include/wtf/Locker.h:41
#5	0x00000001151c1c4d in WTF::Locker<WTF::LockBase>::Locker(WTF::LockBase&) at usr/local/include/wtf/Locker.h:41
#6	0x000000011723a963 in WebCore::SQLiteStatement::prepare() at /Volumes/Data/git/OpenSource/Source/WebCore/platform/sql/SQLiteStatement.cpp:62
#7	0x0000000117226a96 in WebCore::IDBServer::SQLiteIDBBackingStore::deleteBackingStore() at /Volumes/Data/git/OpenSource/Source/WebCore/Modules/indexeddb/server/SQLiteIDBBackingStore.cpp:2060
#8	0x000000011755d3f6 in WebCore::IDBServer::UniqueIDBDatabase::deleteBackingStore(WebCore::IDBDatabaseIdentifier const&) at /Volumes/Data/git/OpenSource/Source/WebCore/Modules/indexeddb/server/UniqueIDBDatabase.cpp:243
#9	0x000000011757bb1f in void WTF::callMemberFunctionForCrossThreadTaskImpl<WebCore::IDBServer::UniqueIDBDatabase, void (WebCore::IDBServer::UniqueIDBDatabase::*)(WebCore::IDBDatabaseIdentifier const&), std::__1::tuple<WebCore::IDBDatabaseIdentifier>, 0ul>(WebCore::IDBServer::UniqueIDBDatabase*, void (WebCore::IDBServer::UniqueIDBDatabase::*)(WebCore::IDBDatabaseIdentifier const&), std::__1::tuple<WebCore::IDBDatabaseIdentifier>&&, std::__1::integer_sequence<unsigned long, 0ul>) at usr/local/include/wtf/CrossThreadTask.h:82

....
Comment 14 Brady Eidson 2016-07-01 14:09:18 PDT
(In reply to comment #11)
> Created attachment 282466 [details]
> Crash log
> 
> What happens here is that the Databases process simply crashes.
> 
> That means that we have several bugs here:
> 

> 
> 3. IndexedDB state is not cleaned up between TestWebKitAPI runs.

I'm not concerned about this one for now - The current state of the art is "tests should clean up stale state before running"

> 1. Whatever the proximate cause of crashing is.

I'll explore this here.

> 2. WebKit keeps waiting despite a Databases process crash, apparently never
> detecting the condition.

I'm exploring this in https://bugs.webkit.org/show_bug.cgi?id=159371
Comment 15 Brady Eidson 2016-07-03 11:07:21 PDT
Retitling:
Database process crashes deleting a corrupt SQLite database file (null deref)

The database files that reproduce this are inconsistent IDB databases. Probably generated from a previous DB process crash that wasn't recovered from.

When the backing store tries to read from the file, it sees things it doesn't expect, so it closes and nulls out the database handle.

Then, we create a SQLiteStatement with a null database handle, causing the crash.

Adding a null check in the right place is appropriate, and allows the database to be deleted as expected, getting the bot (or user) out of this bad state.

Now to write a test.
Comment 16 Brady Eidson 2016-07-03 12:02:26 PDT
Created attachment 282665 [details]
Patch
Comment 17 Brady Eidson 2016-07-03 12:03:12 PDT
The attached patch is ready to go, but it won't apply/won't run EWS until the patch in https://bugs.webkit.org/show_bug.cgi?id=159371 is first reviewed and landed.
Comment 18 Brady Eidson 2016-07-04 09:05:13 PDT
Created attachment 282726 [details]
Patch
Comment 19 WebKit Commit Bot 2016-07-05 11:06:05 PDT
Comment on attachment 282726 [details]
Patch

Clearing flags on attachment: 282726

Committed r202822: <http://trac.webkit.org/changeset/202822>
Comment 20 WebKit Commit Bot 2016-07-05 11:06:10 PDT
All reviewed patches have been landed.  Closing bug.
Comment 22 Alexey Proskuryakov 2016-08-11 13:24:09 PDT
There is no crash log on the bot in this case.
Comment 23 Alexey Proskuryakov 2016-08-11 13:27:18 PDT
Filed bug 160780.