Bug 110805 - webdatabase: Need more robust WebSQL disk usage computation.
Summary: webdatabase: Need more robust WebSQL disk usage computation.
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc.
Version: 528+ (Nightly build)
Hardware: Unspecified
OS: Unspecified
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks: 110600
Reported: 2013-02-25 14:36 PST by Mark Lam
Modified: 2013-02-25 21:11 PST
CC List: 6 users

See Also:


Attachments
disk stat speed test (4.89 KB, text/html)
2013-02-25 14:36 PST, Mark Lam (no flags)
the fix. (5.76 KB, patch)
2013-02-25 16:05 PST, Mark Lam (buildbot: commit-queue-)
For consuming storage space to test the quota. (11.48 KB, text/html)
2013-02-25 16:07 PST, Mark Lam (no flags)
updated fix patch. (50.84 KB, patch)
2013-02-25 20:02 PST, Mark Lam (ggaren: review+)

Description Mark Lam 2013-02-25 14:36:56 PST
Created attachment 190129 [details]
disk stat speed test

OriginUsageRecord::diskUsage() is used to provide the total disk usage of all databases in a specified origin.  The current implementation relies on a cache of database sizes that the OriginUsageRecord tracks.  The OriginUsageRecord is notified whenever any database activity occurs (within the same process) that may change the size of a database.  OriginUsageRecord::diskUsage() then checks which databases have changed and updates their sizes by fetching the actual sizes from the file system.  Thereafter, OriginUsageRecord::diskUsage() sums up the sizes of the databases for the specified origin and returns the sum as its result.
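
For reference, here is a minimal sketch of the cached-size scheme just described, written in plain C++ with hypothetical names rather than the actual OriginUsageRecord code:

    #include <cstdint>
    #include <filesystem>
    #include <set>
    #include <string>
    #include <system_error>
    #include <unordered_map>

    class OriginUsageRecordSketch {
    public:
        void addDatabase(const std::string& path)
        {
            m_cachedSizes[path] = 0;
            m_changed.insert(path);
        }

        // Called whenever in-process database activity may have changed a file's size.
        void markDatabaseChanged(const std::string& path) { m_changed.insert(path); }

        // Refreshes only the databases this process knows have changed, then sums
        // the cached sizes; changes made by other processes go unnoticed.
        uint64_t diskUsage()
        {
            for (const auto& path : m_changed) {
                std::error_code ec;
                auto size = std::filesystem::file_size(path, ec);
                m_cachedSizes[path] = ec ? 0 : uint64_t(size);
            }
            m_changed.clear();

            uint64_t sum = 0;
            for (const auto& entry : m_cachedSizes)
                sum += entry.second;
            return sum;
        }

    private:
        std::unordered_map<std::string, uint64_t> m_cachedSizes;
        std::set<std::string> m_changed;
    };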

This strategy works as long as only one process creates and tracks databases.  In a multi-process environment, the in-process cached sizes can easily become outdated because other processes may be creating or adding to databases in the same origin, and their activity will not trigger an update of the cached sizes in this process.

One solution to this issue is simply to do away with the cached sizes and fetch the actual sizes from the file system every time.  Intuitively, the file sizes should already be cached in data structures in the OS's memory, so fetching a file size should not incur much overhead.  Here are some test results from running the following test:

The test
=====
1. Create 10 databases (differently named, of course) from the same origin, and initialize them with some minimal data.
2. Measure the time it takes to do 1000 repetitions of adding a small record to only 1 of those databases.
Note: each measured interval spans a transaction cycle, starting just before the disk usage is computed (which slightly precedes the start of the transaction) and ending right after the transaction is committed.

The intent here is that the disk usage should be computed by summing up the sizes of the 10 databases, while only 1 of those 10 databases is being mutated with a small write.  In the scenario where we use cached sizes, we should only be fetching the size of 1 database from the file system, while the other 9 (being unchanged) come from the cache.  In the scenario where we fetch the sizes from the file system, we will be doing 10 file size fetches per iteration.

The results
=======
These measurements were done on a MacBook Pro (2.4 GHz Core i7) with an SSD.  Times are in microseconds.

                            Run1      Run2      Run3     |  Average
                            ====      ====      ====     |  =======
Baseline (using a cache):   1491.97   1559.09   1483.12  |  1511.39
No cache:                   1688.97   1828.95   1656.79  |  1724.90

The difference: the no-cache approach is about 1.14x slower (a difference of roughly 214 microseconds), and that is for 10 file size fetches vs. 1.  The incremental cost per additional database (beyond the first) is about 1/9th of that, i.e. roughly 1.6% slower, or about 24 microseconds.

These results show that the cost of not using a cache is insignificant, while we get the benefit of having a more reliable read on the disk usage of databases for a given origin.
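
For contrast, here is a sketch of the no-cache approach measured above, again with hypothetical names and simplified to sum every regular file in the origin's database directory:

    #include <cstdint>
    #include <filesystem>
    #include <string>
    #include <system_error>

    // Walks the origin's database directory and stats each file on every call,
    // so growth caused by other processes is always observed.
    uint64_t usageForOriginDirectory(const std::string& originDirectory)
    {
        uint64_t sum = 0;
        std::error_code ec;
        for (const auto& entry : std::filesystem::directory_iterator(originDirectory, ec)) {
            std::error_code entryError;
            if (!entry.is_regular_file(entryError) || entryError)
                continue;
            auto size = entry.file_size(entryError);
            if (!entryError)
                sum += size;
        }
        return sum;
    }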

The test file disk-stat-speed-test.html is attached.  To run the test:
1. Do a debug build of WebKit and add instrumentation to the transaction code to measure the time from just before the call to DatabaseTracker::getMaxSizeForDatabase() until just after m_sqliteTransaction->commit().  Print the delta time (and other stats: total, average) after taking the end time.  A sketch of this instrumentation appears after these steps.
2. Delete all databases for the localhost origin.
3. Serve the test page from localhost.
4. Run the debug build of WebKit and load the test page.
5. When the test starts, it presents a "Start" button in an alert box.  Click "Start" and watch the console for the time values.  When the test ends, an alert box with an "End" button appears.
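
A rough sketch of the kind of timing instrumentation step 1 describes (illustrative only, not the actual patch; the placement inside the transaction code path is assumed):

    #include <chrono>
    #include <cstdio>

    static double s_totalMicroseconds = 0;
    static unsigned s_iterations = 0;

    void instrumentedTransactionStep()
    {
        auto start = std::chrono::steady_clock::now();

        // ... DatabaseTracker::getMaxSizeForDatabase() is called here ...
        // ... transaction body runs, then m_sqliteTransaction->commit() ...

        auto end = std::chrono::steady_clock::now();
        double micros = std::chrono::duration<double, std::micro>(end - start).count();
        s_totalMicroseconds += micros;
        ++s_iterations;
        std::printf("delta = %.2f us, total = %.2f us, average = %.2f us\n",
                    micros, s_totalMicroseconds, s_totalMicroseconds / s_iterations);
    }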
Comment 1 Mark Lam 2013-02-25 16:05:29 PST
Created attachment 190146 [details]
the fix.
Comment 2 Mark Lam 2013-02-25 16:07:06 PST
Created attachment 190147 [details]
For consuming storage space to test the quota.
Comment 3 Mark Lam 2013-02-25 16:14:10 PST
Comment on attachment 190146 [details]
the fix.

View in context: https://bugs.webkit.org/attachment.cgi?id=190146&action=review

> Source/WebCore/ChangeLog:8
> +        cached file size values.

Maybe I should add a comment to say that this is #ifdef'ed out for chromium because listDirectory() is not implemented in FileSystemChromium.cpp.
Comment 4 Michael Nordman 2013-02-25 16:27:56 PST
Comment on attachment 190146 [details]
the fix.

View in context: https://bugs.webkit.org/attachment.cgi?id=190146&action=review

> Source/WebCore/Modules/webdatabase/DatabaseTracker.cpp:297
> +#if !PLATFORM(CHROMIUM)

This file is not compiled in chromium so there shouldn't be a need for this guard.

> Source/WebCore/Modules/webdatabase/OriginUsageRecord.cpp:39
> +#if PLATFORM(CHROMIUM)

Ditto, not compiled in chromium.
Comment 5 Build Bot 2013-02-25 16:31:55 PST
Comment on attachment 190146 [details]
the fix.

Attachment 190146 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://queues.webkit.org/results/16752526
Comment 6 Mark Lam 2013-02-25 16:34:22 PST
Comment on attachment 190146 [details]
the fix.

I'll redo the patch since chromium doesn't need the deprecated code, and to fix the release build failure.
Comment 7 Geoffrey Garen 2013-02-25 17:25:59 PST
Comment on attachment 190146 [details]
the fix.

View in context: https://bugs.webkit.org/attachment.cgi?id=190146&action=review

Would be nice to get rid of all the quota cache code, since it's defunct now.

> Source/WebCore/ChangeLog:11
> +        The performance difference for this change is a 1.6% degradation
> +        per additional database whose size needs to be fetched.

You should clarify that this is a worst-case performance difference. Your benchmark measured stat overhead vs ~16 bytes of storage. Most transactions do more work than that.
Comment 8 Mark Lam 2013-02-25 20:02:37 PST
Created attachment 190190 [details]
updated fix patch.
Comment 9 Geoffrey Garen 2013-02-25 20:09:58 PST
Comment on attachment 190190 [details]
updated fix patch.

r=me
Comment 10 Mark Lam 2013-02-25 21:11:53 PST
Thanks for the review.  Landed in r144006: <http://trac.webkit.org/changeset/144006>.