Bug 22765

Summary: Limit the size of local storage to a fixed quota.
Product: WebKit Reporter: Yael <yael>
Component: DOMAssignee: Nobody <webkit-unassigned>
Status: RESOLVED DUPLICATE    
Severity: Normal CC: beidson, benm, ddkilzer, hausmann, jorlow, kangax, laszlo.gombos
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: PC   
OS: All   
Attachments:
Description Flags
Setting a limit on local storage size.
none
Update previous patch. none

Description Yael 2008-12-09 14:55:39 PST
Set a limit on the size of local storage per origin.
Comment 1 Yael 2008-12-09 14:59:12 PST
Created attachment 25899 [details]
Setting a limit on local storage size.

This patch is setting a fixed limit on the size of local storage.
Some issues are still left open e.g. should DatabaseTracker be modified to support LocalStorage quota or is LocalStorage an entity on its own.
Comment 2 Simon Hausmann 2008-12-16 03:03:59 PST
I can review the Qt bits, but I don't feel comfortable reviewing the local storage changes.
Comment 3 Yael 2009-01-11 14:10:55 PST
Created attachment 26616 [details]
Update previous patch.

This patch updates the previous patch. It adds a callback to the ChromeClient when a new LocalStorageArea is created, to get the quota for the given origin.
This patch also adds the ability to delete LocalStorage or SessionStorage selectively (by origin)
Comment 4 Simon Hausmann 2009-01-29 02:40:47 PST
Comment on attachment 26616 [details]
Update previous patch.

Brady, could you take a look at the non-Qt bits of this patch?

Thanks!
Comment 5 Michael Nordman 2009-05-28 14:36:26 PDT
> virtual unsigned long long quotaForLocalStorage(Frame*) = 0;

Keying off of the Frame* may be problematic given a world of workers. I'm guessing there are two distinct purposes for the frame argument.
1) Figure out what SecurityOrigin the frame is in, and return the quota for that origin
2) Figure out where to present UI (should that be needed)

If so, maybe have this take two arguments, one for each purpose, where the frame can be NULL in the worker case. 
Comment 6 Yael 2009-05-29 05:50:31 PDT
(In reply to comment #5)
> 
> If so, maybe have this take two arguments, one for each purpose, where the
> frame can be NULL in the worker case. 
> 

First off, Thank you for reviewing the patch.
Regarding the added argument, do you have an example of another place in the code where the Frame argument is duplicated? reviewers tend to reject patches if there are extra arguments without a really good reason. 
Thanks!
Comment 7 Michael Nordman 2009-05-29 12:03:45 PDT
I put a question back to you...

Why is this not quotaForLocalStorage(SecurityOrigin*) <period>. What does the Frame have to do with the quota for an origin?
Comment 8 Yael 2009-05-30 05:27:47 PDT
(In reply to comment #7)
> I put a question back to you...
> 
> Why is this not quotaForLocalStorage(SecurityOrigin*) <period>. What does the
> Frame have to do with the quota for an origin?
> 

I cannot assume that all ports will show a UI in this callback. Passing the Frame gives enough context and flexibility for the ports to do what they need. As an example, the same is done in exceededDatabaseQuota(Frame*, const String&);

When WebWorkers start supporting LocalStorage, this will need to be revisited, and an additional parameter could be added at that time.
Comment 9 Jeremy Orlow 2009-05-31 03:46:37 PDT
FYI LocalStorage within web workers is not part of the spec anymore, so that shouldn't be a concern.

Also FYI I'm currently finishing up a patch for refactoring DOM storage for use in multi-process browsers (https://bugs.webkit.org/show_bug.cgi?id=25376) which will definitely conflict with this patch.  We may need to coordinate if this doesn't get landed soon (and the review gods are favorable to me :)
Comment 10 Jeremy Orlow 2009-05-31 03:56:50 PDT
Ugh.  I can't 'officially' review this due to permissions issues...  I tried to cut the files (and some of the diffs) out that I didn't comment on.


I haven't looked at the SQL or locking side of this very closely yet.  In general, I worry that it's adding a lot of complexity....but I can't think of any better ways to do a lot of this.


> Index: WebCore/storage/StorageArea.cpp
> ===================================================================
> --- WebCore/storage/StorageArea.cpp	(revision 39799)
> +++ WebCore/storage/StorageArea.cpp	(working copy)
> @@ -36,12 +36,16 @@
>  StorageArea::StorageArea(SecurityOrigin* origin)
>      : m_securityOrigin(origin)
>      , m_storageMap(StorageMap::create())
> +    , m_quota(UINT_MAX) // default used for sessionStorage which is unlimited
> +    , m_deleted(false)
>  {
>  }
>  
>  StorageArea::StorageArea(SecurityOrigin* origin, StorageArea* area)
>      : m_securityOrigin(origin)
>      , m_storageMap(area->m_storageMap)
> +    , m_quota(area->m_quota)
> +    , m_deleted(area->m_deleted)
>  {
>  }
>  
> @@ -51,11 +55,16 @@
>  
>  unsigned StorageArea::internalLength() const
>  {
> +    if (m_deleted)
> +        return 0;
>      return m_storageMap->length();
>  }
>  
>  String StorageArea::internalKey(unsigned index, ExceptionCode& ec) const
>  {
> +    if (m_deleted)
> +        return String();
> +    
>      String key;
>      
>      if (!m_storageMap->key(index, key)) {
> @@ -68,21 +77,29 @@
>  
>  String StorageArea::internalGetItem(const String& key) const
>  {
> +    if (m_deleted)
> +        return String();
>      return m_storageMap->getItem(key);
>  }
>  
> -void StorageArea::internalSetItem(const String& key, const String& value, ExceptionCode&, Frame* frame)
> +void StorageArea::internalSetItem(const String& key, const String& value, ExceptionCode& ec, Frame* frame)
>  {
> +    if (m_deleted)
> +        return;
>      ASSERT(!value.isNull());
>      
> -    // FIXME: For LocalStorage where a disk quota will be enforced, here is where we need to do quota checking.
> -    //        If we decide to enforce a memory quota for SessionStorage, this is where we'd do that, also.
> -    // if (<over quota>) {
> -    //     ec = INVALID_ACCESS_ERR;
> -    //     return;
> -    // }
> -    
> -    String oldValue;   
> +    String oldValue = m_storageMap->getItem(key);   
> +    unsigned long long usage = m_storageMap->totalUsage();

I think your equality means to check if oldValue is null (i.e. not present).  Also you use "usage = usage +" where "usage +=" is probably clearer.  That said, I kind of feel like doing this calculation here and in the storage map is just asking for accounting errors in the future.  Maybe you could pass in the quota delta to storageMap, create a helper function that they both share, or have storageMap make quota decisions (and return ec when the limit is hit)?

> +    if (oldValue != value) {
> +        usage = usage + value.length() * sizeof(UChar) - oldValue.length() * sizeof(UChar);
> +    }
> +    else {
> +        usage += (key.length() + value.length()) * sizeof(UChar);
> +    }
> +    if (usage > m_quota) {
> +        ec = QUOTA_EXCEEDED_ERR;
> +        return;
> +    }
>      RefPtr<StorageMap> newMap = m_storageMap->setItem(key, value, oldValue);
>      
>      if (newMap)
> Index: WebCore/storage/SessionStorage.cpp
> ===================================================================
> --- WebCore/storage/SessionStorage.cpp	(revision 39799)
> +++ WebCore/storage/SessionStorage.cpp	(working copy)
> @@ -72,4 +72,17 @@
>      return storageArea.release();
>  }
>  
> +
> +void SessionStorage::origins(Vector<RefPtr<SecurityOrigin> >& result)
> +{
> +    copyKeysToVector(m_storageAreaMap, result);
>  }
> +
> +void SessionStorage::deleteStorageArea(SecurityOrigin* origin)
> +{
> +    RefPtr<SessionStorageArea> storageArea = m_storageAreaMap.take(origin);

Storage object might still refer to the storage area.  Basically any page that's currently open when you clear storage will just stop working correctly because of your m_delete checks.

Why is this necessary?  Can't you just clear the storageMap and schedule a sync of the database?

> +    if (storageArea)
> +        storageArea.get()->markDeleted();
> +}
> +
> +}
> Index: WebCore/storage/LocalStorage.cpp
> ===================================================================
> --- WebCore/storage/LocalStorage.cpp	(revision 39799)
> +++ WebCore/storage/LocalStorage.cpp	(working copy)
> @@ -26,6 +26,7 @@
>  #include "config.h"
>  #include "LocalStorage.h"
>  
> +#include "ChromeClient.h"
>  #include "CString.h"
>  #include "EventNames.h"
>  #include "FileSystem.h"
> @@ -34,6 +35,7 @@
>  #include "LocalStorageArea.h"
>  #include "Page.h"
>  #include "PageGroup.h"
> +#include "SQLiteStatement.h"
>  #include "StorageArea.h"
>  #include <wtf/StdLibExtras.h>
>  
> @@ -62,6 +64,7 @@
>  
>  LocalStorage::LocalStorage(const String& path)
>      : m_path(path.copy())
> +    , m_importComplete(false)
>  {
>      // If the path is empty, we know we're never going to be using the thread for anything, so don't start it.
>      // In the future, we might also want to consider removing it from the DOM in that case - <rdar://problem/5960470>
> @@ -76,6 +79,7 @@
>  LocalStorage::~LocalStorage()
>  {
>      ASSERT(localStorageMap().get(m_path) == this);
> +    m_database.close();
>      localStorageMap().remove(m_path);
>  }
>  
> @@ -83,32 +87,55 @@
>  {
>      ASSERT(isMainThread());
>  
> -    // FIXME: If the security origin in question has never had a storage area established,
> -    // we need to ask a client call if establishing it is okay.  If the client denies the request,
> -    // this method will return null.
> -    // The sourceFrame argument exists for the purpose of asking a client.
> -    // To know if an area has previously been established, we need to wait until this LocalStorage 
> -    // object has finished it's AreaImport task.
> -
> -
> -    // FIXME: If the storage area is being established for the first time here, we need to 
> -    // sync its existance and quota out to disk via an task of type AreaSync
> -
>      RefPtr<LocalStorageArea> storageArea;
>      if (storageArea = m_storageAreaMap.get(origin))
>          return storageArea.release();
> +    // wait for import to complete before we know if this origin already has a quota associated with it
> +    waitForImportComplete();
> +    
> +    unsigned long long quota;
> +    
> +    {
> +        MutexLocker importLock(m_importLock);
> +        quota =  m_securityOriginQuoteMap.get(origin);
> +    }
> +    
> +    Page* page = sourceFrame->page();
> +    ASSERT(page);
> +    
> +    bool syncRequired(false);

Isn't the norm for primitives to initialize with = ?

So if something has a 0 quota we're always going to check with the client?  We should probably only do this if it's not in the quota map, right?

> +    if (quota == 0) {
> +        quota = page->chrome()->client()->quotaForLocalStorage(sourceFrame);
> +        syncRequired = true;
> +    }
>          
> +    if (quota == 0)
> +        return 0;
> +    
>      storageArea = LocalStorageArea::create(origin, this);
> +    storageArea->setQuota(quota);
>      m_storageAreaMap.set(origin, storageArea);
> +    
> +    if (syncRequired)
> +    {
> +        {
> +            MutexLocker importLock(m_importLock);
> +            m_securityOriginQuoteMap.set(origin, quota);
> +        }
> +        
> +        {
> +            MutexLocker syncLock(m_syncLock);
> +            m_securityOriginsWaitingSync.set(origin, quota);
> +        }
> +        if (m_thread)
> +            m_thread->scheduleSync(this);
> +    }
> +        
>      return storageArea.release();
>  }
>  
>  String LocalStorage::fullDatabaseFilename(SecurityOrigin* origin)
>  {
> -    // FIXME: Once we actually track origin/quota entries to see which origins have local storage established,
> -    // we will return an empty path name if the origin isn't allowed to have LocalStorage.
> -    // We'll need to wait here until the AreaImport task to complete before making that decision.
> -
>      if (m_path.isEmpty())
>          return String();
>  
> @@ -124,18 +151,165 @@
>      return pathByAppendingComponent(m_path, origin->databaseIdentifier() + ".localstorage");
>  }
>  
> +void LocalStorage::origins(Vector<RefPtr<SecurityOrigin> >& result)
> +{
> +    ASSERT(isMainThread());
> +    waitForImportComplete();
> +    MutexLocker importLock(m_importLock);
> +    copyKeysToVector(m_securityOriginQuoteMap, result);
> +}
> +
> +void LocalStorage::deleteStorageArea(SecurityOrigin* origin)
> +{
> +    ASSERT(isMainThread());
> +    RefPtr<LocalStorageArea> storageArea = m_storageAreaMap.take(origin);
> +    if (!storageArea)
> +        return;
> +    
> +    // clear
> +    storageArea->scheduleClear();
> +    
> +    // schedule removing from index database
> +    {
> +        MutexLocker syncLock(m_syncLock);
> +        m_securityOriginsWaitingSync.set(origin, 0);
> +    }
> +    if (m_thread)
> +        m_thread->scheduleSync(this);
> +    
> +    // kill the storageArea thread 
> +    storageArea->scheduleFinalSync();
> +    
> +    //delete file in the file system
> +    deleteFile(fullDatabaseFilename(origin));
> +    
> +    // remove from quota map
> +    {
> +        MutexLocker importLock(m_importLock);
> +        m_securityOriginQuoteMap.take(origin);
> +    }    
> +}
> +
> +bool LocalStorage::openIndexDatabase()
> +{
> +    ASSERT(!isMainThread());
> +    ASSERT(!m_database.isOpen());
> +
> +    if (m_path.isEmpty())
> +        return false;
> +    
> +    if (!makeAllDirectories(m_path)) {
> +        LOG_ERROR("Unabled to create LocalStorage database path %s", m_path.utf8().data());
> +        return false;
> +    }
> +    
> +    String databaseFilename = pathByAppendingComponent(m_path, "localstorage.db");
> +    if (databaseFilename.isEmpty()) {
> +        LOG_ERROR("Filename for local storage index database is empty - local storage path %s", m_path.utf8().data());
> +        return false;
> +    }

This shouldn't be possible since we currently do not create a thread for storage when m_path is empty.  I think you can just use an assert.

> +
> +    if (!m_database.open(databaseFilename)) {
> +        LOG_ERROR("Failed to open index database file %s for local storage", databaseFilename.utf8().data());
> +        return false;
> +    }
> +
> +    if (!m_database.executeCommand("CREATE TABLE IF NOT EXISTS Origins (origin TEXT UNIQUE ON CONFLICT REPLACE, quota INTEGER NOT NULL ON CONFLICT FAIL);")) {

What's this on conflict fail business?  That only applies when you use UNIQUE, right?

> +        LOG_ERROR("Failed to create table Origins for local storage %s", m_path.utf8().data());
> +        return false;
> +    }
> +    return true;
> +}
> +
>  void LocalStorage::performImport()
>  {
>      ASSERT(!isMainThread());
>  
> -    // FIXME: Import all known local storage origins here along with their quotas
> +    // Import all known local storage origins here along with their quotas
> +    if (!openIndexDatabase()) {
> +        markImported();
> +        return;

I wonder if it's better for localStorage to operate as if privateBrowsing is true when import errors happen.  Otherwise transient failures could cause some really weird behaviors...especially if there were values in the database that we start to overwrite when we later sync.

> +    }
> +    
> +    SQLiteStatement query(m_database, "SELECT origin, quota FROM Origins");
> +    if (query.prepare() != SQLResultOk) {
> +        LOG_ERROR("Unable to select origins from Origins table for local storage %s", m_path.utf8().data());
> +        markImported();
> +        return;
> +    }
> +
> +    int result;
> +    SecurityOriginQuoteMap importedItemsMap;
> +    while ((result = query.step()) == SQLResultRow) {
> +        RefPtr<SecurityOrigin> origin = SecurityOrigin::createFromDatabaseIdentifier(query.getColumnText(0));
> +        importedItemsMap.set(origin.get(), query.getColumnInt64(1));
> +    }
> +    
> +    {
> +        MutexLocker importLock(m_importLock);
> +        m_securityOriginQuoteMap.swap(importedItemsMap);
> +    }
> +
> +    if (result != SQLResultDone) {
> +        LOG_ERROR("Error reading Origins from Origins table for local storage %s", m_path.utf8().data());
> +        markImported();
> +        return;
> +    }
> +    markImported();
>  }
>  
>  void LocalStorage::performSync()
>  {
>      ASSERT(!isMainThread());
>  
> -    // FIXME: Write out new origins and quotas here
> +    // Write out new origins and quotas here
> +    
> +    if (!m_database.isOpen())
> +        return;
> +
> +    SQLiteStatement insertQuery(m_database, "INSERT INTO Origins VALUES (?, ?)");
> +    SQLiteStatement removeQuery(m_database, "DELETE FROM Origins WHERE origin=?");
> +    if (insertQuery.prepare() != SQLResultOk) {
> +        LOG_ERROR("Failed to prepare insert statement - cannot write to local storage database %s", m_path.utf8().data());
> +        return;
> +    }
> +    if (removeQuery.prepare() != SQLResultOk) {
> +        LOG_ERROR("Failed to prepare insert statement - cannot write to local storage database %s", m_path.utf8().data());
> +        return;
> +    }
> +    
> +    SecurityOriginQuoteMap itemsToSync;
> +    {
> +        MutexLocker syncLock(m_syncLock);
> +        m_securityOriginsWaitingSync.swap(itemsToSync);
> +    }
> +    
> +    SecurityOriginQuoteMap::iterator end = itemsToSync.end();
> +
> +    for (SecurityOriginQuoteMap::iterator it = itemsToSync.begin(); it != end; ++it) {
> +        if (it->second > 0) {
> +            insertQuery.bindText(1, it->first->databaseIdentifier());
> +            insertQuery.bindInt64(2, it->second);
> +    
> +            int result = insertQuery.step();
> +            if (result != SQLResultDone) {
> +                LOG_ERROR("Failed to add origin to the local storage database %s - %i", m_path.utf8().data(), result);
> +                break;
> +            }
> +            insertQuery.reset();
> +        }
> +        else {
> +            removeQuery.bindText(1, it->first->databaseIdentifier());
> +    
> +            int result = removeQuery.step();
> +            if (result != SQLResultDone) {
> +                LOG_ERROR("Failed to remove origin from the local storage database %s - %i", m_path.utf8().data(), result);
> +                break;
> +            }
> +            removeQuery.reset();
> +        }
> +    }
> +    itemsToSync.clear();
>  }
>  
>  void LocalStorage::close()
Comment 11 Jeremy Orlow 2009-05-31 04:27:28 PDT
Also, this patch seems to be designed so that, on the first time a site is accessed which tries to use local storage, page->chrome()->client()->quotaForLocalStorage() is called.  From what I understand, this is to allow a browser to ask the client whether the app should be given space.  Is my understanding correct?

It seems like a better model might be to give them some default amount of space (which can be set to 0...probably put this into page->settings()?) and then do a callback any time they want to go beyond this space.  The browser could decide whether or not to ignore it or present it to the user.

I like this better because it'd allow browsers to give sites a modest amount of space per site (where the definition of modest is decided by each browser...possibly 0 when space is tight or sites are not trusted) in return for the user being bugged less.  This also gives the browser full control over when to "bug" the user.

For example, lets say gmail used localStorage.  When I first load it up, it could store some basic machine-specific preferences without me being prompted.  Then, let's say that I wanted to sync my email box.  It'd start storing data until the limit is hit.  My browser could then ask me if I wanted to give gmail more space.  If I choose yes, it'd keep filling up until some other limit was hit (or maybe never if I told it I trust that origin).

In addition, there probably needs to be some interface to the quotas so that the browser can display the information to users and allow them to adjust the quota.  This makes me wonder if it should be in its own class.

Lastly, isn't there some other quota management already in WebKit?  It seems there probably should be some central way to manage this?  Definitely Database and maybe AppCache seem to fall into this category.
Comment 12 Yael 2009-06-01 06:15:19 PDT
(In reply to comment #11)
> Also, this patch seems to be designed so that, on the first time a site is
> accessed which tries to use local storage,
> page->chrome()->client()->quotaForLocalStorage() is called.  From what I
> understand, this is to allow a browser to ask the client whether the app should
> be given space.  Is my understanding correct?
> 
> It seems like a better model might be to give them some default amount of space
> (which can be set to 0...probably put this into page->settings()?) and then do
> a callback any time they want to go beyond this space.  The browser could
> decide whether or not to ignore it or present it to the user.
> 
> I like this better because it'd allow browsers to give sites a modest amount of
> space per site (where the definition of modest is decided by each
> browser...possibly 0 when space is tight or sites are not trusted) in return
> for the user being bugged less.  This also gives the browser full control over
> when to "bug" the user.
> 
> For example, lets say gmail used localStorage.  When I first load it up, it
> could store some basic machine-specific preferences without me being prompted. 
> Then, let's say that I wanted to sync my email box.  It'd start storing data
> until the limit is hit.  My browser could then ask me if I wanted to give gmail
> more space.  If I choose yes, it'd keep filling up until some other limit was
> hit (or maybe never if I told it I trust that origin).
> 
> In addition, there probably needs to be some interface to the quotas so that
> the browser can display the information to users and allow them to adjust the
> quota.  This makes me wonder if it should be in its own class.
> 
> Lastly, isn't there some other quota management already in WebKit?  It seems
> there probably should be some central way to manage this?  Definitely Database
> and maybe AppCache seem to fall into this category.
> 


Thank you for a very thorough review. I will remove the review flag for now, as this obviously needs more work :-)
Comment 13 Yael 2009-10-05 13:39:21 PDT

*** This bug has been marked as a duplicate of bug 29991 ***