RESOLVED FIXED Bug 36251
Turn on AUTO_VACUUM for all HTML5 DBs
https://bugs.webkit.org/show_bug.cgi?id=36251
Summary Turn on AUTO_VACUUM for all HTML5 DBs
Dumitru Daniliuc
Reported 2010-03-17 16:30:36 PDT
It looks like having AUTO_VACUUM on for all databases is the best vacuuming solution we have right now. We should turn AUTO_VACUUM on whenever we open a database.
Attachments
patch (2.37 KB, patch)
2010-03-17 16:41 PDT, Dumitru Daniliuc
dumi: commit-queue-
patch (2.33 KB, patch)
2010-03-17 17:01 PDT, Dumitru Daniliuc
mrowe: review-
dumi: commit-queue-
patch (2.63 KB, patch)
2010-03-17 18:25 PDT, Dumitru Daniliuc
dumi: commit-queue-
patch (deleted)
2010-03-26 19:02 PDT, Dumitru Daniliuc
dumi: commit-queue-
patch (10.26 KB, patch)
2010-03-29 17:36 PDT, Dumitru Daniliuc
dglazkov: review+
dumi: commit-queue-
patch (14.74 KB, patch)
2010-04-27 17:29 PDT, Dumitru Daniliuc
levin: review+
dumi: commit-queue-
Dumitru Daniliuc
Comment 1 2010-03-17 16:41:16 PDT
WebKit Review Bot
Comment 2 2010-03-17 16:46:25 PDT
Attachment 50971 [details] did not pass style-queue: Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1 WebCore/platform/sql/SQLiteDatabase.cpp:376: A case label should not be indented, but line up with its switch statement. [whitespace/indent] [4] Total errors found: 1 in 3 files If any of these errors are false positives, please file a bug against check-webkit-style.
Dumitru Daniliuc
Comment 3 2010-03-17 17:01:50 PDT
Created attachment 50974 [details] patch Same patch, fixed indentation.
Mark Rowe (bdash)
Comment 4 2010-03-17 17:20:31 PDT
AUTO_VACUUM = FULL results in freelist pages being moved to the end of the database file and the database file being truncated after every transaction commits. This increases the amount of I/O for each transaction, in some cases by a significant amount. We’ve found that in various places on Mac OS X this results in an unacceptable performance hit when mutating the database, and have moved from using AUTO_VACUUM = FULL to AUTO_VACUUM = INCREMENTAL and incremental vacuums scheduled based on the number of pages in SQLite’s free list. Is there some reason that an approach similar to this would not be suitable for HTML 5 database storage, or that the “FULL” setting is preferable to this approach? Running a VACUUM command when opening a database will also result in a substantial amount of I/O, and may take some time to complete. Is it ok to have the the database thread tied up for that entire time?
Mark Rowe (bdash)
Comment 5 2010-03-17 17:35:24 PDT
I also noticed that this bug is specifically about HTML 5 databases. SQLiteDatabase is used for things other than the various HTML 5 database APIs. It’s also used for the icon database, application cache, and other areas. It seems wrong to me that a change in our policy for handling of HTML 5 databases would impact the manner in which these other features use SQLite. Some of those features also open their databases on the main thread (ick, not good!), which makes the idea of performing the very expensive VACUUM operation during open a much more serious concern than if it were limited to situations in which databases were always opened in background threads.
Mark Rowe (bdash)
Comment 6 2010-03-17 17:35:51 PDT
Comment on attachment 50974 [details] patch Marking as r- for the reasons mentioned in my latest comment.
Dumitru Daniliuc
Comment 7 2010-03-17 18:25:28 PDT
Created attachment 50988 [details] patch Changed the patch to turn on AUTO_VACUUM only for HTML5 DBs. According to sqlite engineers, the overhead of AUTO_VACUUM = FULL should be unnoticeable. Do you have a benchmark or a usage pattern that makes this overhead significant on Mac OS X? I'll try to run some tests with AUTO_VACUUM = FULL and AUTO_VACUUM = NONE and see what numbers I get. If we have to, we could set AUTO_VACUUM = INCREMENTAL and call PRAGMA incremental_vacuum when freelist_count > N, but I'd rather have sqlite take care of everything for us, if possible. As far as the VACUUM call goes, I don't think it's a problem. According to Richard Hipp, when sqlite opens a database, it sets the AUTO_VACUUM flag to whatever value it had when the DB was last updated. So for new databases, this VACUUM call is pretty much a no-op. And for existing databases, we will run VACUUM at most once per database, to "convert" it from AUTO_VACUUM = NONE to AUTO_VACUUM = FULL. Taking into account that the default quota limit in Safari and Chromium is set to 5MB right now, I don't expect any user to have databases bigger than a few MB at this time. Running VACUUM on such a database will take a couple of seconds, which I think is an acceptable one-time performance hit.
WebKit Review Bot
Comment 8 2010-03-17 18:28:15 PDT
Attachment 50988 [details] did not pass style-queue: Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1 WebCore/ChangeLog:12: Line contains tab character. [whitespace/tab] [5] WebCore/ChangeLog:13: Line contains tab character. [whitespace/tab] [5] Total errors found: 2 in 4 files If any of these errors are false positives, please file a bug against check-webkit-style.
Mark Rowe (bdash)
Comment 9 2010-03-17 19:24:48 PDT
(In reply to comment #7) > Created an attachment (id=50988) [details] > patch > > Changed the patch to turn on AUTO_VACUUM only for HTML5 DBs. > > According to sqlite engineers, the overhead of AUTO_VACUUM = FULL should be > unnoticeable. Do you have a benchmark or a usage pattern that makes this > overhead significant on Mac OS X? I can’t share any data from specific uses of SQLite, but I just tried a simple experiment on WebKit’s icon database. I set up copies of it with AUTO_VACUUM set to OFF, FULL and INCREMENTAL. I then ran simple queries that deleted a percentage of rows from the database. The performance between OFF and INCREMENTAL was effectively identical. FULL is slower by between 7 and 30% depending on the average size of the rows that are being deleted. This is an admittedly contrived case. I didn’t bother to test inserting new rows, but based on the described behavior of AUTO_VACUUM the only case it should affect is data being removed. > As far as the VACUUM call goes, I don't think it's a problem. According to > Richard Hipp, when sqlite opens a database, it sets the AUTO_VACUUM flag to > whatever value it had when the DB was last updated. So for new databases, this > VACUUM call is pretty much a no-op. And for existing databases, we will run > VACUUM at most once per database, to "convert" it from AUTO_VACUUM = NONE to > AUTO_VACUUM = FULL. Taking into account that the default quota limit in Safari > and Chromium is set to 5MB right now, I don't expect any user to have databases > bigger than a few MB at this time. Running VACUUM on such a database will take > a couple of seconds, which I think is an acceptable one-time performance hit. The first-time conversion hit is the case that I was concerned about. If this is restricted to HTML 5 Databases and we’re certain that they are only ever opened on background threads then I agree that the work VACUUM does is probably acceptable.
Dumitru Daniliuc
Comment 10 2010-03-19 21:17:11 PDT
I ran the following simple test: CREATE TABLE Test (Data Text); insertString = random string of length 1024 updateString = random string of length 1024 for (i = 0; i < 100; i++) { insert insertString i times in the Test table; update all insertStrings to updateStrings; delete all updateStrings; // this empties the database } and measured the total time for all insert transactions, all update transactions and all delete transactions. I ran this test as an HTML5 DB test on Safari/win and Chromium/win, and as a standalone test (using direct calls to sqlite functions) on Linux (the loop ran 500 times instead of 100). Here are the approximate results I got (total_insert_time/total_update_time/total_delete_time, in milliseconds): Safari, AUTO_VACUUM = NONE: 6900/5600/2300 Safari, AUTO_VACUUM = FULL: 4200/5200/14000 Chromium, AUTO_VACUUM = NONE: 4000/3900/4400 Chromium, AUTO_VACUUM = FULL: 2500/3600/6200 Linux test, AUTO_VACUUM = NONE: 18000/21000/6000 Linux test, AUTO_VACUUM = FULL: 12000/14000/12000 I didn't have a Mac machine handy to run this test on, but if you're expecting to see different numbers, I'm sure I can find one. Looking at these numbers, it seems to me that having AUTO_VACUUM = FULL will add a significant overhead to DELETE operations. However, the inserts (and even updates, to a lesser degree) that follow after a DELETE will be significantly faster. I'm not sure why this is so: at the end of each loop iteration the database is empty, so you'd think that inserting a bunch of data should be fast in both cases. However, these numbers seem to suggest that creating new pages is faster than reusing the existing empty ones. Now it seems to me that a typical web app will SELECT things from the database most of the time, will INSERT/UPDATE data sometimes, and will DELETE data rarely. Based on this assumption and the numbers I got, I believe we should set AUTO_VACUUM to FULL. That will make sure that our database files are always as small as possible, and, on average, I think we should break even performance-wise. Mark, what's your opinion on this? I know the test I ran is not perfect: if you want me to run a different test, let me know and I'll run it on all 3 platforms.
Dumitru Daniliuc
Comment 11 2010-03-26 19:02:10 PDT
Created attachment 51805 [details] patch Mark, I think this patch is ready for review. Changes in the latest patch: 1. Fix SQLiteStatement::prepare() to work correctly with sqlite libraries < 3.6.16. 2. Turn on AUTO_VACUUM = 2 for all HTML5 databases. 3. When opening a database, if AUTO_VACUUM != 2, set it to 2 and VACUUM that database. This is a one-time performance hit per database, which should be acceptable. 4. Check the number of free pages at the end of every transaction and force a vacuum only if at least 10% of all pages are free. According to the number posted in my previous comments, calling PRAGMA incremental_vacuum could take a non-negligible amount of time; however, the next few operations on that database should be significantly faster too. In the long run, I believe the overhead of vacuuming the database when needed should be negligible.
Michael Nordman
Comment 12 2010-03-29 13:32:40 PDT
166 int64_t SQLiteDatabase::totalSize() 167 { 168 MutexLocker locker(m_authorizerLock); 169 enableAuthorizer(false); 170 SQLiteStatement statement(*this, "PRAGMA page_count"); 171 int64_t size = statement.getColumnInt64(0) * pageSize(); 172 173 enableAuthorizer(true); 174 return size; 175 } The pageSize() method also locks m_autorizerLock. Are webcore Mutex's recursive? It looks like the windows impl may be (by virtue of CRITCIAL_SECTION), but I don't see the pthread mutex type being set to RECURSIVE anywhere for the pthread impl. 419 switch (autoVacuumMode) { 420 case 2: // AUTO_VACUUM is set to INCREMENTAL, no need to do anything 421 return true; 422 case 1: // AUTO_VACUUM is set to FULL, change it to INCREMENTAL 423 return executeCommand("PRAGMA auto_vacuum = 2"); 424 case 0: // AUTO_VACUUM is set to NONE, set it to INCREMENTAL 425 default: // AUTO_VACUUM is set to some unknown value, set it to INCREMENTAL 426 if (!executeCommand("PRAGMA auto_vacuum = 2")) 427 return false; 428 runVacuumCommand(); 429 error = lastError(); 430 return ((error == SQLITE_OK) || (error = SQLITE_BUSY)); 431 } Some constant definitions for these integer values would be nice and alleviate the need for the comment on each line. Something along the lines of what was done for setSynchronous could work. // The SQLite SYNCHRONOUS pragma can be either FULL, NORMAL, or OFF // FULL - Any writing calls to the DB block until the data is actually on the disk surface // NORMAL - SQLite pauses at some critical moments when writing, but much less than FULL // OFF - Calls return immediately after the data has been passed to disk enum SynchronousPragma { SyncOff = 0, SyncNormal = 1, SyncFull = 2 }; void setSynchronous(SynchronousPragma); 404 bool SQLiteDatabase::turnOnIncrementalAutoVacuum() Feels awkward to have this method return true in cases where it didn't set the pragma. Similar pragma setting methods (setSynchronous(), setFullsync(), setMaximumSize()) don't return a success or error value at all.
Dumitru Daniliuc
Comment 13 2010-03-29 17:36:35 PDT
Created attachment 51987 [details] patch Addressed Michael's comments. I refactored a few SQLiteStatement methods to make sure we don't try to recursively lock on m_authorizerLock.
Dimitri Glazkov (Google)
Comment 14 2010-04-08 08:25:15 PDT
This seems reasonable. I am ready to r+ unless there are any objections from Mark.
Dimitri Glazkov (Google)
Comment 15 2010-04-14 14:20:53 PDT
Comment on attachment 51987 [details] patch ok.
Dumitru Daniliuc
Comment 16 2010-04-26 15:59:04 PDT
Landed as r58270.
WebKit Review Bot
Comment 17 2010-04-26 16:55:21 PDT
http://trac.webkit.org/changeset/58270 might have broken SnowLeopard Intel Release (Tests)
Yuzo Fujishima
Comment 18 2010-04-26 19:13:15 PDT
Dumitru Daniliuc
Comment 19 2010-04-27 17:29:46 PDT
Created attachment 54482 [details] patch r58270 + r58274. Chromium's side fix was landed, so it should be safe to re-land these patches.
David Levin
Comment 20 2010-04-27 17:42:47 PDT
Comment on attachment 54482 [details] patch This is basically the same patch that dglazkov gave an r+ to, but relanding after a fix to the chromium test_shell was done, so I'm basically re-adding dglazkov's r+. > Index: WebCore/storage/SQLTransaction.cpp > =================================================================== > --- WebCore/storage/SQLTransaction.cpp (revision 58335) > +++ WebCore/storage/SQLTransaction.cpp (working copy) > @@ -466,6 +466,9 @@ void SQLTransaction::postflightAndCommit > return; > } > > + // The commit was successful, vacuum the database if needed Consider: The commit was successful, so vacuum the database if needed.
Dumitru Daniliuc
Comment 21 2010-04-27 17:53:41 PDT
(In reply to comment #20) > Consider: The commit was successful, so vacuum the database if needed. done.
Dumitru Daniliuc
Comment 22 2010-04-27 18:24:51 PDT
Re-landed as r58366.
Note You need to log in before you can comment on or make changes to this bug.