Bug 36251

Summary: Turn on AUTO_VACUUM for all HTML5 DBs
Product: WebKit Reporter: Dumitru Daniliuc <dumi>
Component: New BugsAssignee: Dumitru Daniliuc <dumi>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, beidson, dglazkov, eric, levin, michaeln, sam, webkit.review.bot, yuzo
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Attachments:
Description Flags
patch
dumi: commit-queue-
patch
mrowe: review-, dumi: commit-queue-
patch
dumi: commit-queue-
patch
dumi: commit-queue-
patch
dglazkov: review+, dumi: commit-queue-
patch levin: review+, dumi: commit-queue-

Description Dumitru Daniliuc 2010-03-17 16:30:36 PDT
It looks like having AUTO_VACUUM on for all databases is the best vacuuming solution we have right now. We should turn AUTO_VACUUM on whenever we open a database.
Comment 1 Dumitru Daniliuc 2010-03-17 16:41:16 PDT
Created attachment 50971 [details]
patch
Comment 2 WebKit Review Bot 2010-03-17 16:46:25 PDT
Attachment 50971 [details] did not pass style-queue:

Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1
WebCore/platform/sql/SQLiteDatabase.cpp:376:  A case label should not be indented, but line up with its switch statement.  [whitespace/indent] [4]
Total errors found: 1 in 3 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 3 Dumitru Daniliuc 2010-03-17 17:01:50 PDT
Created attachment 50974 [details]
patch

Same patch, fixed indentation.
Comment 4 Mark Rowe (bdash) 2010-03-17 17:20:31 PDT
AUTO_VACUUM = FULL results in freelist pages being moved to the end of the database file and the database file being truncated after every transaction commits.  This increases the amount of I/O for each transaction, in some cases by a significant amount.  We’ve found that in various places on Mac OS X this results in an unacceptable performance hit when mutating the database, and have moved from using AUTO_VACUUM = FULL to AUTO_VACUUM = INCREMENTAL and incremental vacuums scheduled based on the number of pages in SQLite’s free list.  Is there some reason that an approach similar to this would not be suitable for HTML 5 database storage, or that the “FULL” setting is preferable to this approach?

Running a VACUUM command when opening a database will also result in a substantial amount of I/O, and may take some time to complete.  Is it ok to have the the database thread tied up for that entire time?
Comment 5 Mark Rowe (bdash) 2010-03-17 17:35:24 PDT
I also noticed that this bug is specifically about HTML 5 databases.  SQLiteDatabase is used for things other than the various HTML 5 database APIs.  It’s also used for the icon database, application cache, and other areas.  It seems wrong to me that a change in our policy for handling of HTML 5 databases would impact the manner in which these other features use SQLite.  Some of those features also open their databases on the main thread (ick, not good!), which makes the idea of performing the very expensive VACUUM operation during open a much more serious concern than if it were limited to situations in which databases were always opened in background threads.
Comment 6 Mark Rowe (bdash) 2010-03-17 17:35:51 PDT
Comment on attachment 50974 [details]
patch

Marking as r- for the reasons mentioned in my latest comment.
Comment 7 Dumitru Daniliuc 2010-03-17 18:25:28 PDT
Created attachment 50988 [details]
patch

Changed the patch to turn on AUTO_VACUUM only for HTML5 DBs.

According to sqlite engineers, the overhead of AUTO_VACUUM = FULL should be unnoticeable. Do you have a benchmark or a usage pattern that makes this overhead significant on Mac OS X? I'll try to run some tests with AUTO_VACUUM = FULL and AUTO_VACUUM = NONE and see what numbers I get. If we have to, we could set AUTO_VACUUM = INCREMENTAL and call PRAGMA incremental_vacuum when freelist_count > N, but I'd rather have sqlite take care of everything for us, if possible.

As far as the VACUUM call goes, I don't think it's a problem. According to Richard Hipp, when sqlite opens a database, it sets the AUTO_VACUUM flag to whatever value it had when the DB was last updated. So for new databases, this VACUUM call is pretty much a no-op. And for existing databases, we will run VACUUM at most once per database, to "convert" it from AUTO_VACUUM = NONE to AUTO_VACUUM = FULL. Taking into account that the default quota limit in Safari and Chromium is set to 5MB right now, I don't expect any user to have databases bigger than a few MB at this time. Running VACUUM on such a database will take a couple of seconds, which I think is an acceptable one-time performance hit.
Comment 8 WebKit Review Bot 2010-03-17 18:28:15 PDT
Attachment 50988 [details] did not pass style-queue:

Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1
WebCore/ChangeLog:12:  Line contains tab character.  [whitespace/tab] [5]
WebCore/ChangeLog:13:  Line contains tab character.  [whitespace/tab] [5]
Total errors found: 2 in 4 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 9 Mark Rowe (bdash) 2010-03-17 19:24:48 PDT
(In reply to comment #7)
> Created an attachment (id=50988) [details]
> patch
> 
> Changed the patch to turn on AUTO_VACUUM only for HTML5 DBs.
> 
> According to sqlite engineers, the overhead of AUTO_VACUUM = FULL should be
> unnoticeable. Do you have a benchmark or a usage pattern that makes this
> overhead significant on Mac OS X?

I can’t share any data from specific uses of SQLite, but I just tried a simple experiment on WebKit’s icon database.  I set up copies of it with AUTO_VACUUM set to OFF, FULL and INCREMENTAL.  I then ran simple queries that deleted a percentage of rows from the database.  The performance between OFF and INCREMENTAL was effectively identical.  FULL is slower by between 7 and 30% depending on the average size of the rows that are being deleted. This is an admittedly contrived case.  I didn’t bother to test inserting new rows, but based on the described behavior of AUTO_VACUUM the only case it should affect is data being removed.

> As far as the VACUUM call goes, I don't think it's a problem. According to
> Richard Hipp, when sqlite opens a database, it sets the AUTO_VACUUM flag to
> whatever value it had when the DB was last updated. So for new databases, this
> VACUUM call is pretty much a no-op. And for existing databases, we will run
> VACUUM at most once per database, to "convert" it from AUTO_VACUUM = NONE to
> AUTO_VACUUM = FULL. Taking into account that the default quota limit in Safari
> and Chromium is set to 5MB right now, I don't expect any user to have databases
> bigger than a few MB at this time. Running VACUUM on such a database will take
> a couple of seconds, which I think is an acceptable one-time performance hit.

The first-time conversion hit is the case that I was concerned about.  If this is restricted to HTML 5 Databases and we’re certain that they are only ever opened on background threads then I agree that the work VACUUM does is probably acceptable.
Comment 10 Dumitru Daniliuc 2010-03-19 21:17:11 PDT
I ran the following simple test:

CREATE TABLE Test (Data Text);
insertString = random string of length 1024
updateString = random string of length 1024
for (i = 0; i < 100; i++) {
  insert insertString i times in the Test table;
  update all insertStrings to updateStrings;
  delete all updateStrings; // this empties the database
}

and measured the total time for all insert transactions, all update transactions and all delete transactions. I ran this test as an HTML5 DB test on Safari/win and Chromium/win, and as a standalone test (using direct calls to sqlite functions) on Linux (the loop ran 500 times instead of 100). Here are the approximate results I got (total_insert_time/total_update_time/total_delete_time, in milliseconds):

Safari, AUTO_VACUUM = NONE: 6900/5600/2300
Safari, AUTO_VACUUM = FULL: 4200/5200/14000
Chromium, AUTO_VACUUM = NONE: 4000/3900/4400
Chromium, AUTO_VACUUM = FULL: 2500/3600/6200
Linux test, AUTO_VACUUM = NONE: 18000/21000/6000
Linux test, AUTO_VACUUM = FULL: 12000/14000/12000

I didn't have a Mac machine handy to run this test on, but if you're expecting to see different numbers, I'm sure I can find one.

Looking at these numbers, it seems to me that having AUTO_VACUUM = FULL will add a significant overhead to DELETE operations. However, the inserts (and even updates, to a lesser degree) that follow after a DELETE will be significantly faster. I'm not sure why this is so: at the end of each loop iteration the database is empty, so you'd think that inserting a bunch of data should be fast in both cases. However, these numbers seem to suggest that creating new pages is faster than reusing the existing empty ones.

Now it seems to me that a typical web app will SELECT things from the database most of the time, will INSERT/UPDATE data sometimes, and will DELETE data rarely. Based on this assumption and the numbers I got, I believe we should set AUTO_VACUUM to FULL. That will make sure that our database files are always as small as possible, and, on average, I think we should break even performance-wise.

Mark, what's your opinion on this? I know the test I ran is not perfect: if you want me to run a different test, let me know and I'll run it on all 3 platforms.
Comment 11 Dumitru Daniliuc 2010-03-26 19:02:10 PDT
Created attachment 51805 [details]
patch

Mark, I think this patch is ready for review.

Changes in the latest patch:
1. Fix SQLiteStatement::prepare() to work correctly with sqlite libraries < 3.6.16.
2. Turn on AUTO_VACUUM = 2 for all HTML5 databases.
3. When opening a database, if AUTO_VACUUM != 2, set it to 2 and VACUUM that database. This is a one-time performance hit per database, which should be acceptable.
4. Check the number of free pages at the end of every transaction and force a vacuum only if at least 10% of all pages are free. According to the number posted in my previous comments, calling PRAGMA incremental_vacuum could take a non-negligible amount of time; however, the next few operations on that database should be significantly faster too. In the long run, I believe the overhead of vacuuming the database when needed should be negligible.
Comment 12 Michael Nordman 2010-03-29 13:32:40 PDT
166 int64_t SQLiteDatabase::totalSize()
167 {
168     MutexLocker locker(m_authorizerLock);
169     enableAuthorizer(false);
170     SQLiteStatement statement(*this, "PRAGMA page_count");
171     int64_t size = statement.getColumnInt64(0) * pageSize();
172 
173     enableAuthorizer(true);
174     return size;
175 }

The pageSize() method also locks m_autorizerLock. Are webcore Mutex's recursive? It looks like the windows impl may be (by virtue of CRITCIAL_SECTION), but I don't see the pthread mutex type being set to RECURSIVE anywhere for the pthread impl.

419     switch (autoVacuumMode) {
 420     case 2: // AUTO_VACUUM is set to INCREMENTAL, no need to do anything
 421         return true;
 422     case 1: // AUTO_VACUUM is set to FULL, change it to INCREMENTAL
 423         return executeCommand("PRAGMA auto_vacuum = 2");
 424     case 0: // AUTO_VACUUM is set to NONE, set it to INCREMENTAL
 425     default: // AUTO_VACUUM is set to some unknown value, set it to INCREMENTAL
 426         if (!executeCommand("PRAGMA auto_vacuum = 2"))
 427             return false;
 428         runVacuumCommand();
 429         error = lastError();
 430         return ((error == SQLITE_OK) || (error = SQLITE_BUSY));
 431     }

Some constant definitions for these integer values would be nice and alleviate the need for the comment on each line. Something along the lines of what was done for setSynchronous could work.

    // The SQLite SYNCHRONOUS pragma can be either FULL, NORMAL, or OFF
    // FULL - Any writing calls to the DB block until the data is actually on the disk surface
    // NORMAL - SQLite pauses at some critical moments when writing, but much less than FULL
    // OFF - Calls return immediately after the data has been passed to disk
    enum SynchronousPragma { SyncOff = 0, SyncNormal = 1, SyncFull = 2 };
    void setSynchronous(SynchronousPragma);

 404 bool SQLiteDatabase::turnOnIncrementalAutoVacuum()

Feels awkward to have this method return true in cases where it didn't set the pragma. Similar pragma setting methods (setSynchronous(), setFullsync(), setMaximumSize()) don't return a success or error value at all.
Comment 13 Dumitru Daniliuc 2010-03-29 17:36:35 PDT
Created attachment 51987 [details]
patch

Addressed Michael's comments.

I refactored a few SQLiteStatement methods to make sure we don't try to recursively lock on m_authorizerLock.
Comment 14 Dimitri Glazkov (Google) 2010-04-08 08:25:15 PDT
This seems reasonable. I am ready to r+ unless there are any objections from Mark.
Comment 15 Dimitri Glazkov (Google) 2010-04-14 14:20:53 PDT
Comment on attachment 51987 [details]
patch

ok.
Comment 16 Dumitru Daniliuc 2010-04-26 15:59:04 PDT
Landed as r58270.
Comment 17 WebKit Review Bot 2010-04-26 16:55:21 PDT
http://trac.webkit.org/changeset/58270 might have broken SnowLeopard Intel Release (Tests)
Comment 18 Yuzo Fujishima 2010-04-26 19:13:15 PDT
Committed r58277: <http://trac.webkit.org/changeset/58277>
Comment 19 Dumitru Daniliuc 2010-04-27 17:29:46 PDT
Created attachment 54482 [details]
patch

r58270 + r58274. Chromium's side fix was landed, so it should be safe to re-land these patches.
Comment 20 David Levin 2010-04-27 17:42:47 PDT
Comment on attachment 54482 [details]
patch

This is basically the same patch that dglazkov gave an r+ to, but relanding after a fix to the chromium test_shell was done, so I'm basically re-adding dglazkov's r+.



> Index: WebCore/storage/SQLTransaction.cpp
> ===================================================================
> --- WebCore/storage/SQLTransaction.cpp	(revision 58335)
> +++ WebCore/storage/SQLTransaction.cpp	(working copy)
> @@ -466,6 +466,9 @@ void SQLTransaction::postflightAndCommit
>          return;
>      }
>  
> +    // The commit was successful, vacuum the database if needed

Consider: The commit was successful, so vacuum the database if needed.
Comment 21 Dumitru Daniliuc 2010-04-27 17:53:41 PDT
(In reply to comment #20)
> Consider: The commit was successful, so vacuum the database if needed.

done.
Comment 22 Dumitru Daniliuc 2010-04-27 18:24:51 PDT
Re-landed as r58366.