Created attachment 190129 [details]
disk stat speed test

OriginUsageRecord::diskUsage() provides the total disk usage of all databases in a specified origin. The current implementation relies on a cache of database sizes that the OriginUsageRecord tracks. The OriginUsageRecord is notified whenever any database activity occurs (within the same process) that may change the size of a database. OriginUsageRecord::diskUsage() then checks which databases have changed and updates their sizes by fetching the actual sizes from the file system. Thereafter, OriginUsageRecord::diskUsage() sums up the sizes of the databases from the specified origin and returns the sum as its result.

This strategy works as long as only one process creates and tracks databases. In a multi-process environment, the in-process cached sizes can easily become outdated when other processes create or add to databases in the same origin, because the activity of those processes does not trigger an update of the cached sizes in this process.

One solution to this issue is simply to do without the cached sizes and fetch the actual sizes from the file system every time. Intuitively, the file sizes should already be cached in data structures in the OS's memory, so fetching a file size should not incur much overhead. Here are some test results based on running the following test:

The test
=====
1. Create 10 databases (differently named, of course) from the same origin, and initialize them with some minimal data.
2. Measure the time it takes to do 1000 repetitions of adding a small record to only 1 of those databases.

Note: each measured unit of time covers a transaction cycle, starting just before the disk usage is computed (which slightly precedes the start of the transaction) and ending right after the transaction is committed. The intent here is that the disk usage is computed by summing up the sizes of the 10 databases, while only 1 of those 10 databases is being mutated with a small write. In the scenario where we use cached sizes, we should only be fetching the size of 1 database from the file system, while the other 9 (being unchanged) come from the cache. In the scenario where we fetch the sizes from the file system, we will be doing 10 file size fetches per iteration.

The results
=======
These measurements were done on a MacBook Pro (2.4 GHz Core i7) with an SSD. Times are in microseconds.

                           Run1     Run2     Run3    | Average
                           =======  =======  ======= | =======
Baseline (using a cache):  1491.97  1559.09  1483.12 | 1511.39
No cache:                  1688.97  1828.95  1656.79 | 1724.90

The difference: the no-cache approach is 1.14x slower (a difference of about 214 microseconds). That is for 10 file size fetches vs 1. The difference per additional database (when there is more than 1) is 1/9th of that, i.e. 0.016x slower, or roughly 24 microseconds. These results show that the cost of not using a cache is insignificant, while we get the benefit of a more reliable read on the disk usage of databases for a given origin.
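To make the no-cache approach concrete, here is a minimal sketch (not the actual patch) of what the computation reduces to. It assumes WebCore's FileSystem helpers listDirectory() and getFileSize(); the function name, the "*.db" filter, and the originDirectory parameter are illustrative assumptions, not code from the patch.

// Hypothetical sketch: compute an origin's disk usage by statting the file
// system every time, with no per-process cache of database sizes.
// (Uses WebCore's platform FileSystem helpers and WTF's String/Vector.)
static unsigned long long diskUsageForOriginDirectory(const String& originDirectory)
{
    unsigned long long totalUsage = 0;
    // Enumerate the database files that live in the origin's directory.
    Vector<String> databaseFiles = listDirectory(originDirectory, "*.db");
    for (size_t i = 0; i < databaseFiles.size(); ++i) {
        long long fileSize;
        // getFileSize() stats the file; the OS typically serves this from
        // cached metadata, so the per-file cost is small (roughly 24
        // microseconds per additional database in the test above).
        if (getFileSize(databaseFiles[i], fileSize))
            totalUsage += fileSize;
    }
    return totalUsage;
}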
The test file disk-stat-speed-test.html is attached. To run the test:

1. Do a debug build of WebKit and add instrumentation to the transaction code to measure the time from just before the call to DatabaseTracker::getMaxSizeForDatabase() until just after m_sqliteTransaction->commit(). Print the delta time (and other stats: total, average) after taking the end time (see the sketch after this list).
2. Delete all databases from the localhost origin.
3. Serve up the page on localhost.
4. Run the debug build of WebKit and load the test page.
5. When the test starts, it presents a "Start" button in an alert box. Click Start and watch your console for the time values. When the test ends, you will get an alert box with an "End" button.
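For step 1, the instrumentation could look roughly like the following. This is an illustrative sketch only: monotonicallyIncreasingTime() is WTF's monotonic clock (returns seconds as a double), and s_totalMicroseconds/s_count are made-up names for the ad hoc statics holding the running stats.

// Hypothetical timing instrumentation around one transaction cycle,
// inserted into the transaction code path.
double startTime = monotonicallyIncreasingTime();

// ... DatabaseTracker::getMaxSizeForDatabase() is called here, the
// transaction body runs, and then ...
m_sqliteTransaction->commit();

double deltaMicroseconds = (monotonicallyIncreasingTime() - startTime) * 1000000;
s_totalMicroseconds += deltaMicroseconds;
++s_count;
printf("delta %.2f us, total %.2f us, average %.2f us\n",
    deltaMicroseconds, s_totalMicroseconds, s_totalMicroseconds / s_count);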
Created attachment 190146 [details]
the fix.
Created attachment 190147 [details]
For consuming storage space to test the quota.
Comment on attachment 190146 [details]
the fix.

View in context: https://bugs.webkit.org/attachment.cgi?id=190146&action=review

> Source/WebCore/ChangeLog:8
> +        cached file size values.

Maybe I should add a comment to say that this is #ifdef'ed out for chromium because listDirectory() is not implemented in FileSystemChromium.cpp.
Comment on attachment 190146 [details]
the fix.

View in context: https://bugs.webkit.org/attachment.cgi?id=190146&action=review

> Source/WebCore/Modules/webdatabase/DatabaseTracker.cpp:297
> +#if !PLATFORM(CHROMIUM)

This file is not compiled in chromium, so there shouldn't be a need for this guard.

> Source/WebCore/Modules/webdatabase/OriginUsageRecord.cpp:39
> +#if PLATFORM(CHROMIUM)

Ditto, not compiled in chromium.
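(For context, the pattern under discussion is a compile-time platform guard like the one below; the body shown is illustrative, not from the patch. Since Chromium's build never compiles these files at all, the guard is redundant there.)

#if !PLATFORM(CHROMIUM)
    // ... cached-size bookkeeping that only non-Chromium ports would build ...
#endif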
Comment on attachment 190146 [details]
the fix.

Attachment 190146 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://queues.webkit.org/results/16752526
Comment on attachment 190146 [details]
the fix.

I'll redo the patch, since chromium doesn't need the deprecated code, and to fix the release build failure.
Comment on attachment 190146 [details]
the fix.

View in context: https://bugs.webkit.org/attachment.cgi?id=190146&action=review

Would be nice to get rid of all the quota cache code, since it's defunct now.

> Source/WebCore/ChangeLog:11
> +        The performance difference for this changes is a 1.6% degradation
> +        per additional database whose size needs to be fetched.

You should clarify that this is a worst-case performance difference. Your benchmark measured stat overhead against a write of only ~16 bytes of storage; most transactions do more work than that.
Created attachment 190190 [details]
updated fix patch.
Comment on attachment 190190 [details]
updated fix patch.

r=me
Thanks for the review. Landed in r144006: <http://trac.webkit.org/changeset/144006>.